Software Engineer, ML Infrastructure

Realm Labs LLC

Role Overview

We are hiring a Founding ML Infrastructure Engineer to own the end-to-end deployment, optimization, and operation of our suite of models in production.

This is a core founding role focused on building and operating production-grade LLM systems. You will apply deep knowledge of model internals to deploy, optimize, and run modern LLMs at scale, owning performance end-to-end across latency, throughput, and reliability.

You will design and operate the full ML serving stack from model artifacts to GPU execution, and work closely with Product and ML teams to ensure our models can support high QPS, strict SLAs, and production correctness.

This role is ideal for someone who deeply understands how LLMs work internally but chooses to specialize in making them fast, stable, and production-ready.

About Realm Labs

Realm Labs is an AI trust and security startup. We help enterprises detect, debug, and prevent AI misbehavior in production. We are backed by top VCs and serve some of the most iconic global enterprises.

Key Responsibilities
  • Own the end-to-end LLM inference stack, including:
    • Model loading and execution
    • GPU utilization and memory efficiency
    • Runtime performance tuning
    • Production deployment and scaling
  • Design and operate high-performance LLM serving systems using technologies such as:
    • vLLM, TensorRT / TensorRT-LLM, Triton Inference Server, SGLang
  • Optimize inference across:
    • Latency
    • Throughput (QPS)
    • GPU memory footprint
    • Cost efficiency
  • Work hands-on with PyTorch and TensorFlow models, including:
    • Model graph understanding
    • Attention mechanisms, KV cache behavior, batching strategies
    • Precision tradeoffs (FP16, BF16, INT8, etc.)
  • Build and maintain production-grade GPU services:
    • Multi-model serving
    • Autoscaling strategies
    • Fault isolation and graceful degradation
  • Collaborate with application and platform teams to:
    • Define serving APIs
    • Ensure correctness and safety of outputs
    • Debug production issues end-to-end
  • Build a reproducible model training and versioning system for customer deployments
  • Establish best practices for:
    • Model versioning
    • Rollouts and rollbacks
    • Performance benchmarking
    • Production validation
Expected Qualifications
  • 5+ years of professional experience in ML infrastructure, systems engineering, or production ML roles.
  • Strong software engineering fundamentals; ability to write robust, maintainable production code.
  • Deep hands-on experience with LLM inference infrastructure, including:
    • PyTorch (required)
    • TensorFlow (working knowledge)
  • Proven experience with GPU inference optimization, including:
    • TensorRT / TensorRT-LLM
    • vLLM
    • Triton Inference Server
    • SGLang or similar serving runtimes
  • Strong understanding of LLM internals, such as:
    • Transformer architectures
    • Attention and KV caching
    • Batching, streaming, and token-level generation
  • Experience running ML systems in production with high traffic and SLAs.
  • Comfortable working in Linux-based, cloud production environments.
Preferred Qualifications
  • Experience deploying LLMs on Kubernetes and GPU clusters.
  • Familiarity with CUDA, NCCL, or low-level GPU performance concepts.
  • Experience with:
    • Model sharding and parallelism strategies
    • Multi-GPU inference
    • Streaming inference systems
  • Knowledge of observability for ML systems (metrics, latency breakdowns, GPU monitoring).
  • Experience working at startups or owning systems with minimal abstraction layers.
Additional Information
  • This is a founding, high-ownership role with direct impact on core product capabilities.
  • You will be expected to build, run, and own systems end-to-end.
  • The role may include limited on-call responsibilities aligned with production ownership.
Compensation & Benefits
  • Market-aligned compensation and benefits.
  • Founding engineer equity (equity is a significant component of this role and will be discussed).
  • Medical, dental, vision, and life insurance, 401(k), in-office lunch, etc.
Visa sponsorship: We do sponsor visas! However, we aren't able to sponsor visas successfully for every role and candidate. If we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.
Vacancy posted 2 days ago