Member of Technical Staff, Performance Optimization
Fireworks AI
$175k - $220k
San Mateo, CA

About Us:
At Fireworks, we’re building the future of generative AI infrastructure. Our platform delivers the highest-quality models with the fastest and most scalable inference in the industry. We’ve been independently benchmarked as the leader in LLM inference speed and are driving cutting-edge innovation through projects like our own function calling and multimodal models. Fireworks is a Series C company valued at $4 billion and backed by top investors including Benchmark, Sequoia, Lightspeed, Index, and Evantic. We’re an ambitious, collaborative team of builders, founded by veterans of Meta PyTorch and Google Vertex AI.

The Role:
We're looking for a Software Engineer focused on Performance Optimization to help push the boundaries of speed and efficiency across our AI infrastructure. In this role, you'll take ownership of optimizing performance at every layer of the stack, from low-level GPU kernels to large-scale distributed systems. A key focus will be maximizing the performance of our most demanding workloads, including large language models (LLMs), vision-language models (VLMs), and next-generation video models.

You’ll work closely with teams across research, infrastructure, and systems to identify performance bottlenecks, implement cutting-edge optimizations, and scale our AI systems to meet the demands of real-world production use cases. Your work will directly impact the speed, scalability, and cost-effectiveness of some of the most advanced generative AI models in the world.

Key Responsibilities:
- Optimize system and GPU performance for high-throughput AI workloads across training and inference
- Analyze and improve latency, throughput, memory usage, and compute efficiency
- Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
- Implement low-level optimizations using CUDA, Triton, and other performance tooling
- Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
- Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
- Improve support for mixed precision, quantization, and model graph optimization
- Build and maintain performance benchmarking and monitoring infrastructure
- Scale inference and training systems across multi-GPU, multi-node environments
- Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes

Minimum Qualifications:
- Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
- 5+ years of experience working on performance optimization or high-performance computing systems
- Proficiency in CUDA or ROCm and experience with GPU profiling tools (e.g., Nsight, nvprof, CUPTI)
- Familiarity with PyTorch and performance-critical model execution
- Experience with distributed system debugging and optimization in multi-GPU environments
- Deep understanding of GPU architecture, parallel programming models, and compute kernels

Preferred Qualifications:
- Master’s or PhD in Computer Science, Electrical Engineering, or a related field
- Experience optimizing large models for training and inference (LLMs, VLMs, or video models)
- Knowledge of compiler stacks or ML compilers (e.g., torch.compile, Triton, XLA)
- Contributions to open-source ML or HPC infrastructure
- Familiarity with cloud-scale AI infrastructure and orchestration tools (e.g., Kubernetes)
- Background in ML systems engineering or hardware-aware model design

Example Projects:
- Implement fully asynchronous, low-latency sampling for large language models, integrated with structured outputs
- Implement GPU kernels for a new low-precision scheme and run experiments to find the optimal speed-quality tradeoff
- Build a distributed router with a custom load-balancing algorithm to optimize LLM cache efficiency
- Define metrics and build a harness for finding the optimal performance configuration (e.g., sharding, precision) for a given class of model
- Determine and implement in PyTorch an optimal sharding scheme for a novel attention variant
- Optimize communication patterns in RDMA networks (InfiniBand, RoCE)
- Debug numerical instabilities that affect a small portion of requests for a given model when deployed at scale

Compensation:
Total compensation for this role also includes meaningful equity in a fast-growing startup, along with a competitive salary and comprehensive benefits package. Base salary is determined by a range of factors including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range is intended as a guideline and may be adjusted.
$175,000 - $220,000 USD

Why Fireworks AI?
- Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.
- Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.
- Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI. No bureaucracy, just results.
- Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.

Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators. As set forth in Fireworks AI’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.