Member of Technical Staff, Performance Optimization

$175k - $220k

Fireworks AI

San Mateo, CA

About Us:
At Fireworks, we’re building the future of generative AI infrastructure. Our platform delivers the highest‑quality models with the fastest and most scalable inference in the industry. We’ve been independently benchmarked as the leader in LLM inference speed and are driving cutting‑edge innovation through projects like our own function calling and multimodal models. Fireworks is a Series C company valued at $4 billion and backed by top investors including Benchmark, Sequoia, Lightspeed, Index, and Evantic. We’re an ambitious, collaborative team of builders, founded by veterans of Meta PyTorch and Google Vertex AI.

The Role:
We're looking for a Software Engineer focused on Performance Optimization to help push the boundaries of speed and efficiency across our AI infrastructure. In this role, you'll take ownership of optimizing performance at every layer of the stack, from low‑level GPU kernels to large‑scale distributed systems. A key focus will be maximizing the performance of our most demanding workloads, including large language models (LLMs), vision‑language models (VLMs), and next‑generation video models.

You’ll work closely with teams across research, infrastructure, and systems to identify performance bottlenecks, implement cutting‑edge optimizations, and scale our AI systems to meet the demands of real‑world production use cases. Your work will directly impact the speed, scalability, and cost‑effectiveness of some of the most advanced generative AI models in the world.

Key Responsibilities:
Optimize system and GPU performance for high‑throughput AI workloads across training and inference
Analyze and improve latency, throughput, memory usage, and compute efficiency
Profile system performance to detect and resolve GPU‑ and kernel‑level bottlenecks
Implement low‑level optimizations using CUDA, Triton, and other performance tooling
Drive improvements in execution speed and resource utilization for large‑scale model workloads (LLMs, VLMs, and video models)
Collaborate with ML researchers to co‑design and tune model architectures for hardware efficiency
Improve support for mixed precision, quantization, and model graph optimization
Build and maintain performance benchmarking and monitoring infrastructure
Scale inference and training systems across multi‑GPU, multi‑node environments
Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes

Minimum Qualifications:
Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
5+ years of experience working on performance optimization or high‑performance computing systems
Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
Familiarity with PyTorch and performance‑critical model execution
Experience with distributed system debugging and optimization in multi‑GPU environments
Deep understanding of GPU architecture, parallel programming models, and compute kernels

Preferred Qualifications:
Master’s or PhD in Computer Science, Electrical Engineering, or a related field
Experience optimizing large models for training and inference (LLMs, VLMs, or video models)
Knowledge of compiler stacks or ML compilers (e.g., torch.compile, Triton, XLA)
Contributions to open‑source ML or HPC infrastructure
Familiarity with cloud‑scale AI infrastructure and orchestration tools (e.g., Kubernetes)
Background in ML systems engineering or hardware‑aware model design

Implement fully asynchronous low‑latency sampling for large language models integrated with structured outputs
Implement GPU kernels for the new low‑precision scheme and run experiments to find the optimal speed‑quality tradeoff
Build a distributed router with a custom load‑balancing algorithm to optimize LLM cache efficiency
Define metrics and build a harness for finding the optimal performance configuration (e.g., sharding, precision) for a given class of model
Determine and implement in PyTorch an optimal sharding scheme for a novel attention variant
Optimize communication patterns in RDMA networks (InfiniBand, RoCE)
Debug numerical instabilities that affect a small portion of requests for a given model when deployed at scale

Total compensation for this role also includes meaningful equity in a fast‑growing startup, along with a competitive salary and comprehensive benefits package. Base salary is determined by a range of factors including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range is intended as a guideline and may be adjusted.

$175,000 - $220,000 USD

Why Fireworks AI?
Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low‑latency inference to scalable model serving.
Build What’s Next: Work with bleeding‑edge technology that impacts how businesses and developers harness AI globally.
Ownership & Impact: Join a fast‑growing, passionate team where your work directly shapes the future of AI: no bureaucracy, just results.
Learn from the Best: Collaborate with world‑class engineers and AI researchers who thrive on curiosity and innovation.

Fireworks AI is an equal‑opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators. As set forth in Fireworks AI’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.

Vacancy posted 1 day ago
