Senior Software Engineer AI Evaluation & Benchmarks (Python) [Remote]
$80 - $100 per hour
G2i
Remote job
Before Applying
This role is open only to contractors in accepted locations. Please confirm that your country is on the list of accepted countries and locations before applying; we are unable to process applications from unlisted locations.
For US applicants: This is a 1099 independent contractor role. It is not compatible with F-1 OPT, STEM OPT, or any visa status that requires W-2 employment, guaranteed hours, or employer sponsorship. We are unable to provide offer letters or employment verification for this role.
What You'll Be Doing
Design and build the coding benchmarks and evaluation pipelines used to test frontier AI models on real software engineering work:
Design coding benchmarks that evaluate frontier models on real-world programming tasks — reasoning, debugging, and production-quality code
Build and maintain scalable data pipelines for evaluation workflows
Analyze model-generated code for correctness, reliability, and edge-case failures
Construct structured evaluation scenarios across large repos and multi-language environments
Provide detailed technical feedback on model performance and failure patterns
Contribute to evaluation frameworks that set the bar for how coding ability is measured
End result: benchmarks that meaningfully separate what frontier models can and can't do — and shape how the next generation is trained and improved.
AI coding evaluation in one line: Design task → build harness → run model → analyze failures → feed findings back into the benchmark → evaluations that actually distinguish strong models from weak ones.
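As a rough illustration of that loop, here is a minimal Python sketch. It is not the team's actual harness: the names `Task`, `run_benchmark`, `pass_rate`, `check_add`, `strong_model`, and `weak_model` are all hypothetical, the "models" are stub functions returning fixed code, and a real harness would run candidate code in a sandbox instead of calling `exec` directly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """One benchmark item: a prompt plus a checker for the model's code."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the candidate code passes


def run_benchmark(tasks: List[Task], model: Callable[[str], str]) -> Dict[str, bool]:
    """Run every task through the model and record pass/fail per task."""
    results = {}
    for task in tasks:
        candidate = model(task.prompt)
        try:
            results[task.name] = bool(task.check(candidate))
        except Exception:
            # Malformed or crashing output counts as a failure.
            results[task.name] = False
    return results


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of tasks passed; 0.0 for an empty result set."""
    return sum(results.values()) / len(results) if results else 0.0


def check_add(candidate: str) -> bool:
    """Hypothetical checker: execute the candidate and test add(2, 3)."""
    namespace = {}
    exec(candidate, namespace)  # illustration only; sandbox this in practice
    return namespace["add"](2, 3) == 5


tasks = [Task("add", "Write a function add(a, b) that returns a + b.", check_add)]


# Stub "models" standing in for real model calls (assumption for the sketch).
def strong_model(prompt: str) -> str:
    return "def add(a, b):\n    return a + b\n"


def weak_model(prompt: str) -> str:
    return "def add(a, b):\n    return a - b\n"


strong = run_benchmark(tasks, strong_model)
weak = run_benchmark(tasks, weak_model)
print(pass_rate(strong), pass_rate(weak))
```

The per-task results dictionary is what failure analysis would start from: tasks that every model passes or every model fails do not separate strong models from weak ones and are candidates for revision.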
What You'll Need
4+ years of professional software engineering experience (non-negotiable)
Expert Python — clean, performant, well-tested code
Hands-on experience working in large, complex codebases
Proven experience designing and implementing LLM coding benchmarks and evaluation data pipelines
Strong command of Git and modern development workflows
Track record at a high-growth tech company or top-tier software organization
Strong written English communication
Identity verification: Applicants will be required to verify their identity and confirm they have valid documentation to work as an independent contractor in their country of residence.
Nice to have
Senior or Lead-level profile with a history of technical ownership
Bachelor's or Master's in CS, ML, or related field (or equivalent professional experience)
Proficiency in additional languages: JavaScript, Go, C++, or others
Experience with CI/CD and with writing robust unit tests (pytest, Mocha, JUnit)
Background in security engineering or significant open-source contributions
Familiarity with AI/ML evaluation methodologies or model benchmarking
Logistics
Location: Fully remote — work from anywhere on the accepted locations list
Compensation: $80–$100/hr based on location and seniority
Contract length: 3 months, with potential for extension
Hours: Full-time availability preferred — hours vary by project and are not guaranteed week to week
Engagement: 1099 independent contractor
Payment: Weekly via PayPal or Stripe
⚠️ Important: Hours are project-dependent and can vary week to week. We recommend keeping other work options open alongside this engagement rather than relying on it as your sole source of income.

