Software Engineer, Inference - Performance Optimization
AI Chopping Block, Inc.
About the Team
Our team analyzes inference stack performance across the application, model, and fleet layers to identify bottlenecks and drive faster, cheaper inference. We combine systems profiling, benchmarking, and analysis to understand where time and cost are spent, then turn that understanding into performance optimizations and models that project performance and capacity needs for future launches.

About the Role
In this role, you will model inference performance across application, model, and fleet layers with higher fidelity. You will build cost‑to‑serve estimates from microbenchmarks and create tools that help cross‑functional teams reason about latency, capacity, utilization, and cost tradeoffs.

In this role, you will:
- Build and refine performance models that translate microbenchmark results into cost‑to‑serve estimates.
- Analyze inference workloads end to end across applications, models, and fleet infrastructure.
- Enhance tooling to identify bottlenecks across layers for latency and throughput.
- Partner with other teams to turn performance insights into concrete improvements and project how future changes affect inference.

You might thrive in this role if you:
- Enjoy reasoning from first principles about distributed systems, model inference, and hardware efficiency.
- Are comfortable working across abstraction layers, from application behavior to kernels, accelerators, networking, and fleet scheduling.
- Have deep expertise with performance profiling, benchmarking, analysis, and optimization.
- Enjoy collaborating with engineering and research teams to improve real production systems.

About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general‑purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products.
We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.


