Member of Technical Staff, LLM Evaluation
$119.8k - $234.7k
Microsoft Corporation
Overview

As a Member of Technical Staff, LLM Evaluation, you will develop and implement cutting‑edge methodologies to help us evaluate how well Copilot performs in real‑world usage scenarios. Users turn to Copilot for all types of endeavors, making it critical that we ensure our AI systems effectively help them meet their needs. Our vision for meeting user needs is expansive and includes not only task completion, but also affective aspects of the experience.

You will be responsible for developing new methods to evaluate LLMs, training classifiers, experimenting with data collection techniques, and implementing methodologies to provide real‑time signals on Copilot performance. We’re looking for outstanding individuals with experience in the social sciences, machine learning, and analysis of natural language. The right candidate is a creative problem solver who will work closely with user researchers and product leaders to build automated evaluation frameworks that help us drive improvements in Copilot.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

- Leverage expertise to measure the performance of Copilot and to identify failure modes and novel mitigation strategies, using techniques including data mining, prompt engineering, LLM‑as‑a‑judge, and classifier training.
- Solve problems creatively, navigating complexity with clarity, independently shaping direction, and delivering results even when the path isn’t obvious.
- Create and implement comprehensive evaluation frameworks across diverse scenarios, edge cases, and potential failure modes.
- Build automated testing systems, generalize solutions into repeatable frameworks, and write efficient code for model pipelines and intervention systems.
- Maintain a user‑oriented perspective by understanding user needs, validating approaches through user research, and serving as a trusted advisor on AI matters.
- Track advances in research, identify relevant state‑of‑the‑art techniques, and adapt algorithms to drive innovation in production systems serving millions of users.

Qualifications

Required Qualifications

- Bachelor’s Degree in Computer Science, Statistics, Economics, Psychology, Linguistics, or a related technical discipline AND 4 years of technical engineering experience coding in languages including Python and SQL.
- Experience prompting and working with large language models.
- Experience writing production‑quality Python code.

Preferred Qualifications

- Demonstrated interest in Responsible AI.

Data Science IC4: The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $158,400 - $258,000 per year.

Data Science IC5: The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $188,000 - $304,200 per year.

Certain roles may be eligible for benefits and other compensation.

Benefits

- Industry leading healthcare
- Educational resources
- Discounts on products and services
- Savings and investments
- Maternity and paternity leave
- Generous time away
- Giving programs
- Opportunities to network and connect

