Staff Software Engineer, Inference
CoreWeave
$188k - $275k
Job Description
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025.
What You'll Do:
Inference Platform Team
The Inference team builds and operates CoreWeave's Kubernetes-native inference platform, powering low-latency, high-throughput AI workloads at massive scale. The team is responsible for request routing, scheduling, GPU resource management, and system-wide optimizations that drive performance, efficiency, and reliability across real-time inference systems.
About the role:
As a Staff Software Engineer (IC5) on the Inference team, you will act as a technical leader driving architecture, performance, and reliability across multiple services and teams. Your day-to-day will involve leading cross-team design initiatives, optimizing inference performance (latency, throughput, and GPU utilization), and improving system reliability at scale. You will work deeply in distributed systems and Kubernetes-based infrastructure, focusing on areas like scheduling, batching, and memory optimization. This role requires hands-on technical leadership and the ability to influence engineering direction across the organization.
Who You Are:
- 8–12+ years of experience building and operating large-scale distributed systems or cloud platforms
- Proven experience leading cross-team technical initiatives impacting multiple services or organizations
- Strong programming skills in Go, Python, or C++
- Deep expertise in Kubernetes at production scale, including orchestration, scheduling, and service design
- Strong understanding of distributed systems, networking, and performance optimization
- Experience designing and operating low-latency, high-throughput systems with strict P95/P99 latency requirements
- Hands-on experience with inference systems, including batching or micro-batching strategies, caching, and memory optimization
- Experience improving system performance using metrics-driven approaches (e.g., latency, throughput, utilization)
- Familiarity with mixed precision (BF16, FP8) and streaming inference workloads
Preferred:
- Experience with inference frameworks such as vLLM, Triton, TensorRT-LLM, Ray Serve, or TorchServe
- Experience with GPU systems and performance optimization (CUDA, NCCL, RDMA, NUMA, GPU interconnects)
- Experience leading multi-team or org-level technical initiatives
- Exposure to large-scale AI/ML infrastructure or hyperscale cloud environments
Wondering if you're a good fit?
We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren't a 100% skill or experience match. Here are a few qualities we've found compatible with our team. If some of this describes you, we'd love to talk.
- You love to design and optimize high-performance distributed systems at scale
- You're curious about AI inference, GPU systems, and emerging performance techniques
- You're an expert in building reliable, low-latency infrastructure and driving system-wide improvements
Why CoreWeave?
At CoreWeave, we work hard, have fun, and move fast! We're in an exciting stage of hyper-growth that you will not want to miss out on. We're not afraid of a little chaos, and we're constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:
- Be Curious at Your Core
- Act Like an Owner
- Empower Employees
- Deliver Best-in-Class Client Experiences
- Achieve More Together
We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization's growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!
The base salary range for this role is $188,000 to $275,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).
What We Offer
The range we've posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location.
In addition to a competitive salary, we offer a variety of benefits to support your needs. The benefits below reflect our US-based offerings; for roles in other locations, benefits vary and are shared during the hiring process. These include:
- Medical, dental, and vision insurance - 100% paid for by CoreWeave
- Company-paid Life Insurance
- Voluntary supplemental life insurance
- Short and long-term disability insurance
- Flexible Spending Account
- Health Savings Account
- Tuition Reimbursement
- Ability to Participate in Employee Stock Purchase Program (ESPP)
- Mental Wellness Benefits through Spring Health
- Family-Forming support provided by Carrot
- Paid Parental Leave
- Flexible, full-service childcare support with Kinside
- 401(k) with a generous employer match
- Flexible PTO
- Catered lunch each day in our office and data center locations
- A casual work environment
- A work culture focused on innovative disruption
Equal Opportunity & Accommodations
CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.
As part of this commitment and consistent with the Americans with Disabilities Act (ADA), CoreWeave will ensure that qualified applicants and candidates with disabilities are provided reasonable accommodations for the hiring process, unless such accommodation would cause an undue hardship. If reasonable accommodation is needed, please contact: View email address on ziprecruiter.com.
Export Control Compliance
This position requires access to export controlled information. To conform to U.S. Government export regulations applicable to that information, applicant must either be (A) a U.S. person, defined as a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, (B) eligible to access the export controlled information without a required export authorization, or (C) eligible and reasonably likely to obtain the required export authorization from the applicable U.S. government agency. CoreWeave may, for legitimate business reasons, decline to pursue any export licensing process.


