Member of Technical Staff, Inference & Serving

$200k - $350k

Inception LLC

The Role

We're looking for engineers and scientists to design, optimize, and scale the systems that power our diffusion LLMs in production. Your work will make inference faster, more cost-effective, and more reliable.

Key Responsibilities

  • Build and optimize high-performance model serving systems for low-latency inference of diffusion LLMs.
  • Extend orchestration frameworks (Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.
  • Implement and manage load balancing, autoscaling, and traffic routing for model endpoints.
  • Build systems for model versioning, canary deployments, and zero-downtime rollouts.
  • Develop monitoring, alerting, and observability tooling to ensure SLA compliance and rapid incident response.
  • Collaborate with ML researchers to translate model advances (new architectures, quantization techniques, batching strategies) into production-ready serving improvements.

Qualifications

  • BS/MS/PhD in Computer Science, Engineering, or a related field (or equivalent experience).
  • Knowledge of ML serving frameworks (SGLang, vLLM, Triton Inference Server, TensorRT-LLM).
  • Understanding of ML frameworks (PyTorch, TensorFlow) from a systems perspective.
  • Familiarity with high-performance computing and GPU programming (CUDA).
  • Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines.
  • Background in performance optimization and profiling of ML systems.

Preferred Skills

  • Experience building and maintaining large-scale language models with tens of billions of parameters or more.
  • Experience with distributed systems and cloud computing platforms (AWS/GCP/Azure).
  • Experience with ML workflow orchestration tools (Kubeflow, Airflow).
  • Experience with model optimization techniques (quantization, distillation, speculative decoding, continuous batching).
  • Knowledge of ML-specific infrastructure challenges (checkpointing, resource scheduling, etc.).

Compensation

The annual base salary range for this role is $200,000 - $350,000 USD. Final compensation is determined based on experience, skills, and qualifications. Equity and benefits are included in the total package.

Why Join Inception

  • Work with World-Class Talent: Collaborate with the inventors of diffusion models and leading AI researchers.
  • Shape Foundational Technology: Your decisions will influence how the next generation of AI products is built and used.
  • Immediate Impact: Join at the ground floor, where your contributions directly shape product direction and company trajectory.

Perks & Benefits

  • Competitive salary and equity in a rapidly growing startup
  • Flexible vacation and paid time off (PTO)
  • Health, dental, and vision insurance
  • 401k match
  • Catered meals (breakfast, lunch, & dinner)
  • Commuter subsidies
  • A collaborative and inclusive culture

About Us

Inception creates the world's fastest, most efficient AI models. Today's autoregressive LLMs generate tokens sequentially, which makes them painfully slow and expensive. Inception's diffusion-based LLMs (dLLMs) generate answers in parallel. They are 5x faster and more efficient, while delivering best-in-class quality.

Inception was co-founded by Stanford professor Stefano Ermon, who co-invented such breakthrough AI technologies as diffusion models, flash attention, and DPO; UCLA professor Aditya Grover, who co-invented node2vec, decision transformers, and d1 reasoning; and Cornell professor and Afresh co-founder Volodymyr Kuleshov, who co-invented MDLM and Block Diffusion.

We pioneered the application of diffusion to language with Mercury, the world's first (and only) commercially available dLLM. We are currently deploying our large-scale diffusion LLMs at Fortune 500 companies. Diffusion is the technology behind today's image and video AI, and we're making it the standard for LLMs as well.

Our team includes engineers from AWS, Google DeepMind, Meta AI, Microsoft, HashiCorp, and OpenAI. Based in Palo Alto, CA, we are backed by top-tier venture capitalists, including Menlo Ventures, Mayfield, M12 (Microsoft's venture fund), Snowflake Ventures, Databricks, and Innovation Endeavors, and by tech luminaries such as Andrew Ng, Andrej Karpathy, and Eric Schmidt.

If you are talented, innovative, and ambitious, come help us invent the future of AI.

We are an equal opportunity employer and encourage candidates of all backgrounds to apply.
Vacancy posted 4 days ago