Principal Software Engineer - CoreAI Model Inference & Serving

$142.8k - $274.8k

Microsoft Corporation

Overview
Join our team within CoreAI, where we are building the AI data-plane that powers all LLM inferencing workloads across Microsoft and Azure customers, from cutting-edge startups to Fortune 500 enterprises. Our converged AI fabric delivers inference capabilities for all LLMs in the Microsoft catalog, including OpenAI, Anthropic, Mistral, Cohere, Llama, and more. As a Principal Software Engineer, you will shape the future of one of the largest and fastest-growing services in Azure, foundational to Microsoft's AI strategy. Our mission is to serve models at scale (reliably, efficiently, and with ultra-low latency), enabling a rich set of AI-powered product experiences. This is a rapidly evolving space with immense opportunities to learn, innovate, and drive industry-wide impact!

Responsibilities

Be a hands-on technical leader, designing, coding, and shipping core serving systems, smart routing, and request distribution for a broad portfolio of LLMs, including OpenAI, Mistral, Grok, DeepSeek, and others.
Build large-scale AI services and platform capabilities that power new products and customer experiences.
Drive cutting-edge innovation in AI systems alongside world-class engineers and cross-functional partners.
Lead through architecture, code reviews, mentorship, and technical excellence while staying close to implementation.
Improve reliability, scalability, observability, efficiency, and performance across mission-critical services.

Qualifications
Required/Minimum Qualifications:

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Java
- OR equivalent experience.

Other Requirements:

Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred/Additional Qualifications:

4+ years of design and problem-solving experience, with understanding of system performance, scalability, and engineering best practices.
Understanding of distributed systems, specifically request serving at scale (e.g., inferencing, L7 gateways, high-performance storage, and distributed databases across global-scale infrastructure).
Demonstrated experience in building high-quality, reliable systems at scale.
Experience using modern AI-assisted development tools and workflows to move faster, improve quality, and amplify engineering impact.
Customer-obsessed approach to problem solving, with empathy and a drive to deliver impactful solutions.


Software Engineering IC5 - The typical base pay range for this role across the U.S. is USD $142,800.00 - $274,800.00 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000.00 - $304,200.00 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.


Vacancy posted 2 days ago
