Member of Technical Staff, Model Efficiency

Cohere

Who are we?

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

Why this role?

Our team is a fast-growing group of researchers and engineers focused on building reliable ML systems and pushing the boundaries of LLM inference efficiency. We develop techniques that improve how models execute in production, driving lower latency, higher throughput, and consistent quality across diverse workloads.

As an engineer on this team, you’ll work across the inference stack to improve core performance metrics by diving deep into model execution, identifying bottlenecks, and developing innovative optimizations. You’ll collaborate closely with modeling and systems teams to experiment, measure, and ship improvements that meaningfully accelerate inference.

As the team evolves, you’ll have opportunities to build expertise in advanced performance techniques, including GPU/CUDA optimizations, kernel-level improvements, and model execution strategies for MoE and large‑scale architectures.

We have offices in Toronto, Montreal, San Francisco, New York, Paris, Seoul, and London.
Remote‑friendly environment, with preferred locations in EST and PST time zones.

You may be a good fit for the Model Efficiency team if you have:

5+ years of experience writing high‑performance, production‑quality code
Strong programming skills in C++ or Python (Rust/Go also welcome)
Experience working with large language models and familiarity with the LLM inference ecosystem (e.g., vLLM, SGLang, etc.)
Ability to diagnose and resolve performance bottlenecks across the model execution stack
A strong bias for action: you ship fast, measure impact, and iterate

It’s a big plus if you have experience with:

GPU programming, CUDA, or low‑level systems optimization
Language modeling with transformers (MoE, speculative decoding, KV‑cache optimizations)
Scaling performance‑critical distributed systems (e.g., computation, search, storage)

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply!

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.

Full‑time employees at Cohere enjoy these perks:

An open and inclusive culture and work environment
Work closely with a team on the cutting edge of AI research
Weekly lunch stipend, in‑office lunches & snacks
Full health and dental benefits, including a separate budget to take care of your mental health
100% parental leave top‑up for up to 6 months
Personal enrichment benefits towards arts and culture, fitness and well‑being, quality time, and workspace improvement
Remote‑flexible, offices in Toronto, New York, San Francisco, London, and Paris, as well as a co‑working stipend
6 weeks of vacation (30 working days!)
Seniority level: Mid‑Senior level
Employment type: Full‑time
Job function: Engineering and Information Technology
Industries: Software Development


Vacancy posted 4 days ago

