
Member of Technical Staff, Performance Optimization

$175k - $220k

Fireworks AI

San Mateo, CA

About Us:
At Fireworks, we're building the future of generative AI infrastructure. Our platform delivers the highest-quality models with the fastest and most scalable inference in the industry. We've been independently benchmarked as the leader in LLM inference speed and are driving cutting-edge innovation through projects like our own function calling and multimodal models. Fireworks is a Series C company valued at $4 billion and backed by top investors including Benchmark, Sequoia, Lightspeed, Index, and Evantic. We're an ambitious, collaborative team of builders, founded by veterans of Meta PyTorch and Google Vertex AI.

The Role:
We're looking for a Software Engineer focused on Performance Optimization to help push the boundaries of speed and efficiency across our AI infrastructure. In this role, you'll take ownership of optimizing performance at every layer of the stack, from low-level GPU kernels to large-scale distributed systems. A key focus will be maximizing the performance of our most demanding workloads, including large language models (LLMs), vision-language models (VLMs), and next-generation video models.

You'll work closely with teams across research, infrastructure, and systems to identify performance bottlenecks, implement cutting-edge optimizations, and scale our AI systems to meet the demands of real-world production use cases. Your work will directly impact the speed, scalability, and cost-effectiveness of some of the most advanced generative AI models in the world.
Key Responsibilities:
• Optimize system and GPU performance for high-throughput AI workloads across training and inference
• Analyze and improve latency, throughput, memory usage, and compute efficiency
• Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
• Implement low-level optimizations using CUDA, Triton, and other performance tooling
• Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
• Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
• Improve support for mixed precision, quantization, and model graph optimization
• Build and maintain performance benchmarking and monitoring infrastructure
• Scale inference and training systems across multi-GPU, multi-node environments
• Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes

Minimum Qualifications:
• Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
• 5+ years of experience working on performance optimization or high-performance computing systems
• Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
• Familiarity with PyTorch and performance-critical model execution
• Experience with distributed system debugging and optimization in multi-GPU environments
• Deep understanding of GPU architecture, parallel programming models, and compute kernels

Preferred Qualifications:
• Master's or PhD in Computer Science, Electrical Engineering, or a related field
• Experience optimizing large models for training and inference (LLMs, VLMs, or video models)
• Knowledge of compiler stacks or ML compilers (e.g., torch.compile, Triton, XLA)
• Contributions to open-source ML or HPC infrastructure
• Familiarity with cloud-scale AI infrastructure and orchestration tools (e.g., Kubernetes)
• Background in ML systems engineering or hardware-aware model design
Example projects:
• Implement fully asynchronous, low-latency sampling for large language models, integrated with structured outputs
• Implement GPU kernels for a new low-precision scheme and run experiments to find the optimal speed-quality tradeoff
• Build a distributed router with a custom load-balancing algorithm to optimize LLM cache efficiency
• Define metrics and build a harness for finding the optimal performance configuration (e.g., sharding, precision) for a given class of model
• Determine and implement in PyTorch an optimal sharding scheme for a novel attention variant
• Optimize communication patterns in RDMA networks (InfiniBand, RoCE)
• Debug numerical instabilities that affect a small portion of a model's requests when it is deployed at scale

Total compensation for this role also includes meaningful equity in a fast-growing startup, along with a competitive salary and a comprehensive benefits package. Base salary is determined by a range of factors, including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range is intended as a guideline and may be adjusted.

$175,000 - $220,000 USD

Why Fireworks AI?
• Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.
• Build What's Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.
• Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI. No bureaucracy, just results.
• Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.

Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators. As set forth in Fireworks AI's Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you, based on the Member of Technical Staff, Performance Optimization in San Mateo, CA vacancy
  •  ...Member of Technical Staff, Vision / Language Frontier labs are racing to build general-purpose robots, and the bottleneck isn't compute. It's...  ...close the loop between data quality and downstream policy performance Stay current on the research frontier (VLAs, video foundation... 
    Performance

    xdof.ai

    San Mateo, CA
    5 days ago
  •  ...token and diffusion-based control loops in robotics Design and optimize distributed inference systems on GPU clusters, pushing...  ...experience in distributed systems, ML infrastructure, or high-performance serving (8+ years) Production-grade expertise in Python, with... 
    Performance

    GenesisAI

    San Carlos, CA
    5 days ago
  • $175k - $240k

     ...Member of Technical Staff, Research San Mateo, CA About Us At Fireworks, we're building...  ...scalability, directly shaping our high-performance AI infrastructure. You'll...  ...learning, distributed systems, and optimization to bring cutting-edge research into... 
    Performance
    Work experience placement
    Internship

    Fireworks AI

    Redwood City, CA
    3 days ago
  • $200k - $350k

     ...Role We're looking for engineers and scientists to design, optimize, and scale the systems that power our diffusion LLMs in...  ...reliable. Key Responsibilities Build and optimize high-performance model serving systems for low-latency inference of diffusion LLMs... 
    Performance
    Immediate start
    Flexible hours

    Inception LLC

    San Mateo, CA
    4 days ago
  • $200k - $350k

     ...Role We're looking for engineers and scientists to design, optimize, and maintain the compute foundations that power large-scale language model training and inference. You will develop high-performance ML kernels, enable efficient low-precision arithmetic, and... 
    Performance
    Immediate start
    Flexible hours

    Inception LLC

    San Mateo, CA
    4 days ago
  • $175k - $220k

     ...Member of Technical Staff, Software Engineer San Mateo, CA About Us At Fireworks, we're building the future of generative AI infrastructure...  ...from architecture to production Improve reliability, performance, and developer experience Work directly with customers... 
    Performance

    Fireworks AI

    San Mateo, CA
    4 days ago
  •  ...throughput, latency, and cost - deploying our models 2–10× faster and cheaper without quality regressions. Scope of Work - GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs. - Serving stack: TensorRT-LLM/Triton Inference... 
    Performance

    Embedding VC

    San Mateo, CA
    4 days ago
  •  ...productivity and strive to be a small and talent-dense team. No formal performance reviews. If you're here, you're a high-performer. Our...  ...design) Are invigorated by high-performing peers and doing high-quality work Love technically challenging problems... 
    Performance
    Work at office
    Local area

    Twenty Labs

    San Mateo, CA
    3 days ago
  •  ...feature flags, and experiment toggles. Tech signals: Portfolio of polished AI demos in production Built design systems and shipped them; cares about performance budgets. We are committed to being an on-site, in-person team currently based in San Mateo... 
    Performance

    Embedding VC

    San Mateo, CA
    3 days ago
  • $200k - $300k

     ...integrations that power them. The other part is evaluating agent performance by designing evaluation pipelines and benchmarks that measure...  ...customers and translating their scientific needs into technical requirements. Ability to move quickly in a fast-paced research... 
    Performance
    Work at office

    Phylo

    South San Francisco, CA
    5 days ago
  • $200k - $350k

     ...novel training techniques and pushing the boundaries of parallel token generation. Key Responsibilities Design, develop, and optimize architectures for diffusion-based language models. Implement innovative training objectives and loss functions for discrete... 
    Immediate start
    Flexible hours

    Inception LLC

    San Mateo, CA
    4 days ago
  • $200k - $350k

     ...strategies, and build the algorithms that align model behavior with human intent at scale. Key Responsibilities Design, develop, and optimize RL training pipelines (PPO, DPO, RLHF, and novel approaches) for diffusion-based LLMs. Build and iterate on reward models,... 
    Immediate start
    Flexible hours

    Inception LLC

    San Mateo, CA
    4 days ago
  •  ...Job Title What You'll Do Develop and optimize a learning-based robotic manipulation control stack Design and maintain a teleoperation system with smooth, precise motion and low latency Train robotic policies for manipulation and locomotion with reinforcement... 

    GenesisAI

    San Carlos, CA
    1 day ago
  •  ...frameworks (e.g., gVisor, Kata Containers, Firecracker). Familiarity with distributed storage, observability systems, or high-performance compute environments. Why Join Us? ~ Competitive salary and equity share in building the future of biomedical discovery... 
    Performance
    Work at office

    Phylo

    South San Francisco, CA
    4 days ago
  •  ...product features. Ship quickly, iterate based on feedback, and continuously raise the bar on product quality, reliability, and performance. Requirements 2+ years of industry experience as a product, full-stack, or frontend-leaning software engineer. Strong experience... 
    Performance
    Work at office

    Phylo, Inc.

    South San Francisco, CA
    3 days ago
  • What You’ll Do Design, build, and maintain large-scale data pipelines (batch and streaming) for robotics foundation model training and evaluation at petabyte scale Own core data infrastructure: data model, storage systems, ingestion pipelines, transformation frameworks...
    Remote work

    AI Chopping Block, Inc.

    San Carlos, CA
    3 days ago
  • What You'll Do Develop a high-throughput rendering pipeline for training robotics foundation models Design protocols and interfaces between the rendering pipeline, physics engine, and 3D generative models Build an efficient platform for large-scale robotics training...

    GenesisAI

    San Carlos, CA
    4 days ago
  • Introducing Moonlake, AI for creating real-time interactive content Mission : As an applied AI Research Engineer: Code agents (post training + systems) Scope of Work Agentic systems design: Tool catalogs, function calling, program synthesis/repair loops, ReAct/Reflexion...

    Embedding VC

    San Mateo, CA
    3 days ago
  • Security Infrastructure Engineer What You'll Do Design, build, and scale security infrastructure from the ground up across our systems, networks, endpoints, and products Own and evolve security architecture across endpoint security, network security, application...
    Interim role

    GenesisAI

    San Carlos, CA
    4 days ago
  • $99.6k - $223.4k

     ...management operations of databases. It also performs operations autonomously based on...  ...applications, tools, networks etc. As a member of the software engineering division, you...  ...applications or operating systems. Provide technical leadership to other software developers.... 
    Temporary work
    Flexible hours

    Oracle

    Redwood City, CA
    4 days ago
  • Job Title Develop a high-throughput, GPU-based simulation pipeline (primarily rigid body simulation for robots) to train robotics foundation models Implement essential robotics features, including actuators, sensors, and controllers, in collaboration with the robotics...

    GenesisAI

    San Carlos, CA
    5 days ago
  •  ...generation paradigm of physical data synthesis— combining simulation, generative models, and autonomous agents Deep curiosity and strong technical ownership, with a track record of driving complex, open-ended projects from concept to implementation Experience with (multimodal)... 
    Remote work

    GenesisAI

    San Carlos, CA
    1 day ago
  • $85k - $145.3k

     ...experiences to enhance our collective expertise Technical Specialist Responsibilities: Coordinate...  ...Veeva Vault. Ability to work with team members, vendors, suppliers, and contract...  ...-focused culture Competitive pay plus performance-based incentive programs Company-paid... 
    Performance
    Contract work
    Temporary work
    Work experience placement

    Verista, Inc.

    Foster, CA
    4 days ago
  •  ...with a warm and sincere culture that puts the welfare of team members at the forefront." Maryna Agaibi Counsel | Legal &...  ...Manager Data Center Operations Burlington, TX Principal Member of Technical Staff, Agent Workflow Systems and Evaluation Operational Excellence... 
    Internship
    Remote work
    Night shift

    SB Energy

    Redwood City, CA
    1 day ago
  •  ...relentless drive to make a difference. Every member of Gilead's team plays a critical role...  ...on AWS S3, ensuring high availability, performance, and scalability Partner with MDM,...  ..., replication, archival, and cost optimization Work with the MSP team to ensure... 
    Performance

    GILEAD

    San Mateo, CA
    2 days ago
  •  ...the first party retail team, internal technical partners, and other operations teams. Responsibilities...  ...Monitor retail technology performance dashboards, proactively consolidating...  ...engagement and uptime metrics, and drive optimization efforts to improve customer experience... 
    Performance
    Contract work

    Tailored Management

    Burlingame, CA
    4 days ago
  •  ...Commission License Reimbursement Simple IRA Bonus based on performance Competitive salary Health insurance Opportunity for...  ...market appropriate products and services. As an Agent Team Member, you will receive... Simple IRA Hourly pay plus... 
    Performance
    Hourly pay
    For contractors
    Flexible hours

    Wilson Ku - State Farm Agent

    Belmont, CA
    19 days ago
  •  ...Description Benefits: Simple IRA Hiring bonus Bonus based on performance Competitive salary Flexible schedule Health insurance...  ...is laid-back and supportive, with a focus on giving team members ownership without micromanaging. We're looking for someone who... 
    Performance
    Work at office
    Flexible hours

    Brandon Yim - State Farm Agent

    Burlingame, CA
    16 days ago
  • $200k - $350k

     ...The Role We're hiring a hands-on Staff Security Engineer to build the security foundation for a frontier AI platform serving...  ..., privacy, compliance, and infrastructure risk as we scale - a technical leader, not a friction point for the engineering team. What... 
    Immediate start
    Flexible hours

    Inception LLC

    San Mateo, CA
    3 days ago
  •  ...Merchandising and Technical Specialist - Best Buy Are you detail-oriented, tech-savvy, and love working independently? As a Merchandising...  ...an impact. Your work helps shape retail strategies and brand performance. What will you do? Visit stores as a professional... 
    Performance
    Flexible hours

    Acosta

    San Carlos, CA
    5 days ago
