Member of Technical Staff, LLM Evaluation

$119.8k - $234.7k

Microsoft Corporation

Overview

As a Member of Technical Staff, LLM Evaluation, you will develop and implement cutting‑edge methodologies to help us evaluate how well Copilot performs in real‑world usage scenarios. Users turn to Copilot for all types of endeavors, making it critical that we ensure our AI systems effectively help them meet their needs. Our vision for meeting user needs is expansive and includes not only task completion, but also affective aspects of the experience. You will be responsible for developing new methods to evaluate LLMs, training classifiers, experimenting with data collection techniques, and implementing methodologies to provide real‑time signals on Copilot performance.

We’re looking for outstanding individuals with experience in the social sciences, machine learning, and analysis of natural language. The right candidate is a creative problem solver who will work closely with user researchers and product leaders to build automated evaluation frameworks that help us drive improvements in Copilot.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

Leverage expertise to measure the performance of Copilot and to identify failure modes and novel mitigation strategies, using data mining, prompt engineering, LLM as a judge, and classifier training.
Solve problems creatively: navigate complexity with clarity, independently shape direction, and deliver results even when the path isn’t obvious.
Create and implement comprehensive evaluation frameworks across diverse scenarios, edge cases, and potential failure modes.
Build automated testing systems, generalize solutions into repeatable frameworks, and write efficient code for model pipelines and intervention systems.
Maintain a user‑oriented perspective by understanding users’ needs, validating approaches through user research, and serving as a trusted advisor on AI matters.
Track advances in research, identify relevant state‑of‑the‑art techniques, and adapt algorithms to drive innovation in production systems serving millions of users.

Qualifications

Required Qualifications

Bachelor’s Degree in Computer Science, Statistics, Economics, Psychology, Linguistics or related technical discipline AND 4 years technical engineering experience with coding in languages including Python and SQL.
Experience prompting and working with large language models.
Experience writing production‑quality Python code.

Preferred Qualifications

Demonstrated interest in Responsible AI.

Data Science IC4 – The typical base pay range for this role is $119,800 - $234,700 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $158,400 - $258,000 per year.

Data Science IC5 – The typical base pay range for this role is $139,900 - $274,800 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $188,000 - $304,200 per year.

Certain roles may be eligible for benefits and other compensation.

Benefits

Industry leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect

Vacancy posted 2 days ago
