Software Engineer, AI Training Infrastructure
$175k - $220k
Fireworks AI
About Us:
Here at Fireworks, we’re building the future of generative AI infrastructure. Fireworks offers the generative AI platform with the highest-quality models and the fastest, most scalable inference. We’ve been independently benchmarked to have the fastest LLM inference and have been getting great traction with innovative research projects, like our own function calling and multi-modal models. Fireworks is funded by top investors, like Benchmark and Sequoia, and we’re an ambitious, fun team composed primarily of veterans from PyTorch and Google Vertex AI.
The Role:
As a Training Infrastructure Engineer, you'll design, build, and optimize the infrastructure that powers our large-scale model training operations. Your work will be essential to developing high-performance AI training infrastructure. You'll collaborate with AI researchers and engineers to create robust training pipelines, optimize distributed training workloads, and ensure reliable model development.
Key Responsibilities:
- Design and implement scalable infrastructure for large-scale model training workloads
- Develop and maintain distributed training pipelines for LLMs and multimodal models
- Optimize training performance across multiple GPUs, nodes, and data centers
- Implement monitoring, logging, and debugging tools for training operations
- Architect and maintain data storage solutions for large-scale training datasets
- Automate infrastructure provisioning, scaling, and orchestration for model training
- Collaborate with researchers to implement and optimize training methodologies
- Analyze and improve efficiency, scalability, and cost-effectiveness of training systems
- Troubleshoot complex performance issues in distributed training environments
Minimum Qualifications:
- Bachelor's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
- 3+ years of experience with distributed systems and ML infrastructure
- Experience with PyTorch
- Proficiency in cloud platforms (AWS, GCP, Azure)
- Experience with containerization and orchestration (Docker, Kubernetes)
- Knowledge of distributed training techniques (data parallelism, model parallelism, FSDP)
Preferred Qualifications:
- Master's or PhD in Computer Science or related field
- Experience training large language models or multimodal AI systems
- Experience with ML workflow orchestration tools
- Background in optimizing high-performance distributed computing systems
- Familiarity with ML DevOps practices
- Contributions to open-source ML infrastructure or related projects
Compensation is determined by various factors including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range for this role is a guideline and may be modified.
Redwood City Pay Range
$175,000 - $220,000 USD
Why Fireworks AI?
- Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.
- Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.
- Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results.
- Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.
Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators.