Member of Technical Staff, Inference & Serving
Inception LLC
The Role
We're looking for engineers and scientists to design, optimize, and scale the systems that power our diffusion LLMs in production. Your work will make inference faster, more cost-effective, and more reliable.
Key Responsibilities
- Build and optimize high-performance model serving systems for low-latency inference of diffusion LLMs.
- Extend orchestration frameworks (Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.
- Implement and manage load balancing, autoscaling, and traffic routing for model endpoints.
- Build systems for model versioning, canary deployments, and zero-downtime rollouts.
- Develop monitoring, alerting, and observability tooling to ensure SLA compliance and rapid incident response.
- Collaborate with ML researchers to translate model advances (new architectures, quantization techniques, batching strategies) into production-ready serving improvements.
- BS/MS/PhD in Computer Science, Engineering, or a related field (or equivalent experience).
- Knowledge of ML serving frameworks (SGLang, vLLM, Triton Inference Server, TensorRT-LLM).
- Understanding of ML frameworks (PyTorch, TensorFlow) from a systems perspective.
- Familiarity with high-performance computing and GPU programming (CUDA).
- Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines.
- Background in performance optimization and profiling of ML systems.
- Experience building and maintaining large-scale language models with tens of billions of parameters or more.
- Experience with distributed systems and cloud computing platforms (AWS/GCP/Azure).
- Experience with ML workflow orchestration tools (Kubeflow, Airflow).
- Experience with model optimization techniques (quantization, distillation, speculative decoding, continuous batching).
- Knowledge of ML-specific infrastructure challenges (checkpointing, resource scheduling, etc.).
- Work with World-Class Talent: Collaborate with the inventors of diffusion models and leading AI researchers
- Shape Foundational Technology: Your decisions will influence how the next generation of AI products is built and used
- Immediate Impact: Join at the ground floor where your contributions directly shape product direction and company trajectory
- Competitive salary and equity in a rapidly growing startup
- Flexible vacation and paid time off (PTO)
- Health, dental, and vision insurance
- Catered meals (breakfast, lunch, & dinner)
- Commuter subsidies
- A collaborative and inclusive culture
Vacancy posted 3 days ago


