Software Engineer, AI Training Infrastructure

$175k - $220k
Full-time

Fireworks AI

About Us:


Here at Fireworks, we’re building the future of generative AI infrastructure. Fireworks offers the generative AI platform with the highest-quality models and the fastest, most scalable inference. We’ve been independently benchmarked as having the fastest LLM inference, and we have been gaining strong traction with innovative research projects, such as our own function-calling and multimodal models. Fireworks is funded by top investors, including Benchmark and Sequoia, and we’re an ambitious, fun team composed primarily of veterans from PyTorch and Google Vertex AI.

The Role:  


As a Training Infrastructure Engineer, you'll design, build, and optimize the infrastructure that powers our large-scale model training operations. Your work will be essential to developing high-performance AI training infrastructure. You'll collaborate with AI researchers and engineers to create robust training pipelines, optimize distributed training workloads, and ensure reliable model development.

Key Responsibilities:



  • Design and implement scalable infrastructure for large-scale model training workloads

  • Develop and maintain distributed training pipelines for LLMs and multimodal models

  • Optimize training performance across multiple GPUs, nodes, and data centers

  • Implement monitoring, logging, and debugging tools for training operations

  • Architect and maintain data storage solutions for large-scale training datasets

  • Automate infrastructure provisioning, scaling, and orchestration for model training

  • Collaborate with researchers to implement and optimize training methodologies

  • Analyze and improve efficiency, scalability, and cost-effectiveness of training systems

  • Troubleshoot complex performance issues in distributed training environments

Minimum Qualifications:



  • Bachelor's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience

  • 3+ years of experience with distributed systems and ML infrastructure

  • Experience with PyTorch

  • Proficiency in cloud platforms (AWS, GCP, Azure)

  • Experience with containerization and orchestration (Docker, Kubernetes)

  • Knowledge of distributed training techniques (data parallelism, model parallelism, FSDP)

Preferred Qualifications:



  • Master's or PhD in Computer Science or related field

  • Experience training large language models or multimodal AI systems

  • Experience with ML workflow orchestration tools

  • Background in optimizing high-performance distributed computing systems

  • Familiarity with ML DevOps practices

  • Contributions to open-source ML infrastructure or related projects

Compensation is determined by various factors including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range for this role is a guideline and may be modified.

Redwood City Pay Range

$175,000 - $220,000 USD

Why Fireworks AI?



  • Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.

  • Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.

  • Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results.

  • Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.

Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators.

Vacancy posted 12 hours ago
