Member of Technical Staff, Model Efficiency

Cohere

Who are we?

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, retrieval-augmented generation (RAG), and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each of us is responsible for helping increase the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

Why this role?

Our team is a fast-growing group of researchers and engineers focused on building reliable ML systems and pushing the boundaries of LLM inference efficiency. We develop techniques that improve how models execute in production, driving lower latency, higher throughput, and consistent quality across diverse workloads.

As an engineer on this team, you’ll work across the inference stack to improve core performance metrics by diving deep into model execution, identifying bottlenecks, and developing innovative optimizations. You’ll collaborate closely with modeling and systems teams to experiment, measure, and ship improvements that meaningfully accelerate inference.

As the team evolves, you’ll have opportunities to build expertise in advanced performance techniques, including GPU/CUDA optimizations, kernel-level improvements, and model execution strategies for mixture-of-experts (MoE) and large-scale architectures.

We have offices in Toronto, Montreal, San Francisco, New York, Paris, Seoul, and London.
We offer a remote-friendly environment, with preferred locations in the EST and PST time zones.

You may be a good fit for the Model Efficiency team if you have:
  • 5+ years of experience writing high-performance, production-quality code
  • Strong programming skills in C++ or Python (Rust/Go also welcome)
  • Experience working with large language models and familiarity with the LLM inference ecosystem (e.g., vLLM, SGLang)
  • The ability to diagnose and resolve performance bottlenecks across the model execution stack
  • A strong bias for action: you ship fast, measure impact, and iterate

It’s a big plus if you have experience with:
  • GPU programming, CUDA, or low-level systems optimization
  • Language modeling with transformers (MoE, speculative decoding, KV-cache optimizations)
  • Scaling performance-critical distributed systems (e.g., computation, search, storage)

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply!

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.

Full-time employees at Cohere enjoy these perks:
  • An open and inclusive culture and work environment
  • Close collaboration with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% parental leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible work, with offices in Toronto, New York, San Francisco, London, and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Software Development

Vacancy posted 3 days ago
Similar jobs that could be interesting for you (based on the Member of Technical Staff, Model Efficiency vacancy in San Francisco, CA)
  • Member of Technical Staff, Frontier model Development Mirendil Mirendil is a tech-first company focused on solving core bottlenecks that unlock step-change acceleration across science and technology. Our first goal is to democratize frontier AI R&D across scientific disciplines... 
    Suggested

    Mirendil

    San Francisco, CA
    1 day ago
  • We're partnering with a frontier AI research company on a search for a Member of Technical Staff focused on AI Safety. The company is building next-generation open-weight foundation models with a mission to make advanced AI broadly accessible. Their team includes researchers... 
    Suggested

    Xcede

    San Francisco, CA
    2 days ago
  • Model Engineer San Francisco | Onsite | Full-time Build the AI infrastructure layer of the physical world. As a founding member of our engineering team, you will develop models that predict network...  ...stacks. Want to build the technical DNA of a new applied research org... 
    Suggested
    Full time

    53 Stations

    San Francisco, CA
    23 hours ago
  • Introducing Moonlake, AI for creating world simulations. Scope of Work Training efficiency Dataloaders, fusion, activation remat, gradient checkpointing. FSDP/ZeRO/tensor+pipeline parallel; NCCL tuning. GPU + kernel performance Nsight profiling, Triton/CUDA kernels... 
    Suggested

    Embedding VC

    San Francisco, CA
    2 days ago
  •  ...Introducing Moonlake, AI for creating world simulations. Modeling & architecture Build and iterate on 2D/3D/image/video/audio diffusion architectures Work on conditioning: text/image/pose/layout/control signals, multi-modal encoders, guidance strategies.... 
    Suggested

    Embedding VC

    San Francisco, CA
    3 days ago
  •  ...teammates You believe truth-seeking AI is the most important and challenging problem You are obsessed about building incredibly useful models You are a power user of AI models If you previously trained models used by millions of people it’s a big plus, but modeling... 

    xAI

    San Francisco, CA
    3 days ago
  • $180k - $260k

    Perplexity is looking for a Model Behavior Architect to help shape our answer engine. This role collaborates with our research, design, and engineering teams to align model behavior with product goals. Responsibilities Design, refine, and implement context engineering strategies... 

    Perplexity

    San Francisco, CA
    1 day ago
  •  ...assembling a founding core engineering team to build and train models that understand these systems, optimize operations, anticipate...  ...collaborate across the hardware and software stack. Want to build the technical DNA of a new applied research org from the ground up.

    Meter

    San Francisco, CA
    3 days ago
  • $130k - $240k

     ...frontier technology. The Role Being a Member of Technical Staff at SketchPro means the problem in...  ...designing how an agent represents a Revit model in context, then shift to building the...  ...problems and craft reliable and efficient solutions. Thrives in early-stage environments... 
    Work at office
    Shift work

    SketchPro AI

    San Francisco, CA
    2 days ago
  •  ...function improvements in performance and efficiency. Customers deploy through production-...  ...role Gimlet Labs is seeking a Member of Technical Staff focused on AI research. As...  ...optimizations across the latest AI models. The research team is responsible for... 

    Gimlet Labs

    San Francisco, CA
    4 days ago
  •  ...function improvements in performance and efficiency. Customers deploy through production-...  ...role Gimlet Labs is seeking a Member of Technical Staff (Intern) to help develop Gimlet's...  ...research Researching ways to improve model accuracy, performance and efficiency... 
    Internship

    Gimlet Labs

    San Francisco, CA
    4 days ago
  • $200k - $350k

     ...Member Of Technical Staff, Inference & Serving Inception creates the world's fastest, most efficient AI models. Our Mercury model is the world's fastest reasoning LLM and first commercially available diffusion LLM, delivering 5x greater speed and efficiency than today... 
    Immediate start
    Flexible hours

    Inception LLC

    San Francisco, CA
    3 days ago
  • $200k - $350k

     ...Member Of Technical Staff, Training Infra Bay Area Ai Systems Inception creates the world's fastest, most efficient AI models. Our Mercury model is the world's fastest reasoning LLM and first commercially available diffusion LLM, delivering 5x greater speed and efficiency... 
    Immediate start
    Flexible hours

    Inception LLC

    San Francisco, CA
    23 hours ago
  • $180k

     ...Member Of Technical Staff - Pre-Training Palo Alto, CA About XAI XAI's mission is to create...  ...as a variety of smaller specialized models. Rapidly implementing the latest state...  ...experience on optimizing ML training efficiency. Annual Salary Range $180,000... 
    Temporary work

    Xai

    San Francisco, CA
    23 hours ago
  •  ...hardware, enabling step-function improvements in performance and efficiency. Customers deploy through production-grade APIs without...  ...of AI. About the role Gimlet Labs is seeking a Member of Technical Staff focused on distributed systems. In this role, you will... 

    Gimlet Labs

    San Francisco, CA
    1 day ago
  • $180k

     ...Member Of Technical Staff - RL Infrastructure Palo Alto, CA About XAI XAI's mission is to create AI...  ...the following: # We have a new agentic model capability that we'd like to improve. How do we design an efficient and robust environment for the agent to perform... 
    Temporary work

    Xai

    San Francisco, CA
    23 hours ago
  •  ...AI applications. We are looking for team members who love building enabling systems that...  ...infrastructure solutions for various deployment models, including SaaS, single-tenant, and...  ...and deployment processes to enhance efficiency and reliability. Ensure compliance with... 
    Work at office

    LlamaIndex

    San Francisco, CA
    3 days ago
  •  ...function improvements in performance and efficiency. Customers deploy through production-...  ...role Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference....  ...inference systems that execute full models end-to-end under real production constraints... 

    Gimlet Labs

    San Francisco, CA
    1 day ago
  •  ...function improvements in performance and efficiency. Customers deploy through production-...  ...role Gimlet Labs is seeking a Member of Technical Staff focused on kernels and GPU...  ...CUTLASS, or other accelerator programming models Deep understanding of GPU execution... 

    Gimlet Labs

    San Francisco, CA
    1 day ago
  •  ...Member Of Technical Staff We're looking for a member of technical staff to build and deploy production...  .... In this role, you'll work across modeling, systems, and product to take ideas...  ...Improve latency, throughput, cost efficiency, and reliability of systems Work with... 

    ERAGON

    San Francisco, CA
    2 days ago
  • A leading AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have over... 
    Remote work

    Cohere

    San Francisco, CA
    3 days ago
  •  ...will be the primary driver of the system architecture, technical direction and each team member’s technical skill development At Anchorage Digital,...  ...actively participating in product development. Foster an efficient deterministic testing culture, with an emphasis on... 

    Anchorage Lending CA, LLC

    San Francisco, CA
    23 hours ago
  •  ...run benchmarks reliably and efficiently. At Vals, we believe in autonomy...  ...and research labs measuring model performance. We work with all...  ...reviews for other members of the team Help establish engineering...  ...meets their needs Requirements Technical 2+ YOE: 2+ years of full-time... 
    Full time
    Work experience placement
    Relocation
    Relocation package
    Shift work

    PetsApp

    San Francisco, CA
    23 hours ago
  • $150k - $300k

     ...stack - from frontier agentic models to the infra that enables...  ...infrastructure to serve LLMs efficiently at scale. Optimization and integration...  ...our RL training stack. Core Technical Responsibilities LLM Serving...  ...and encourage team members to contribute to the broader... 
    Work at office
    Remote work
    Visa sponsorship
    Relocation package
    Flexible hours
    Shift work

    Prime-Intellect

    San Francisco, CA
    2 days ago
  • Member of Technical Staff, Infrastructure and Training Systems Location: SF Bay Area or Tokyo, Japan Type...  ...the rigor of distributed systems, model architecture, and numerics research to...  ...that makes large-scale experimentation efficient, reproducible, and robust enough to support... 
    Full time

    Radical Numerics

    San Francisco, CA
    1 day ago
  •  ...We’re training and deploying frontier models for developers and enterprises who...  ...that matter, and join the team. As a Member of Technical Staff with a focus on Multimodal AI, you will...  .... Bonus: Experience in writing efficient GPU kernels using CUDA, optimising performance... 
    Full time
    Work at office
    Remote work
    Flexible hours

    Cohere

    San Francisco, CA
    2 days ago
  • $200k - $330k

    Perplexity is seeking an intrepid, polymathic Member of Technical Staff to take on one of the AI industry's most unique engineering roles. You...  ...federal court. Engineer privacy and compliance systems that efficiently scale with our growing portfolio of frontier AI products (... 

    Perplexity

    San Francisco, CA
    2 days ago
  • Pixeltable Inc. Member of Technical Staff San Francisco, CA·Full time Apply for Member of Technical...  ...engineers to focus on building impactful models, not wrangling with complex data...  ...heterogeneous compute resources (CPU and GPU) efficiently? What data model will enable us to... 
    Full time
    Part time
    Work at office
    Work from home
    Flexible hours
    2 days per week

    Pixeltable, Inc.

    San Francisco, CA
    1 day ago
  •  ...component to hardware that best fits its performance and efficiency needs. This approach enables heterogeneous systems across...  ...to gigawatt-class AI datacenters. Gimlet Labs is seeking a Member of Technical Staff focused on distributed systems. In this role, you will build... 

    Gimlet Labs

    San Francisco, CA
    1 day ago
  •  ...training signal needed to make capable models. Today, only a handful of players...  ...workspaces, replayable rollouts, storage-efficient forks, or recursive debugging loops....  ...feel like one seamless system. As a Member of Technical Staff, Infrastructure / DevOps, you will... 

    Plato

    San Francisco, CA
    3 days ago
