Staff Research Engineer, Model Efficiency

Cohere

Who are we?

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

Why this role?

Large Language Models (LLMs) continue to push the boundaries of what AI systems can do, but inference is still the bottleneck. The Model Efficiency team is responsible for pushing the limits of LLM inference efficiency across our foundation models. We explore and ship breakthroughs across the model execution stack, including:
  • model architecture and MoE routing optimization
  • decoding and inference-time algorithm improvements
  • software/hardware co-design for GPU acceleration
  • performance optimization without compromising model quality

We have offices in Toronto, Montreal, San Francisco, New York, Paris, Seoul and London. We embrace a remote-friendly environment, and as part of this approach, we strategically distribute teams based on interests, expertise, and time zones to promote collaboration and flexibility. You'll find the Model Efficiency team concentrated in the EST and PST time zones; these are our preferred locations.

As a Staff Research Engineer, you will develop, prototype, and deploy techniques that materially improve how fast and efficiently our models run in production.

You may be a good fit for the Model Efficiency team if you:
  • Have a PhD in Machine Learning or a related field
  • Understand LLM architecture and how to optimize LLM inference given resource constraints
  • Have significant experience with one or more techniques that enhance model efficiency
  • Have strong software engineering skills
  • Have an appetite to work in a fast-paced, high-ambiguity start-up environment
  • Have publications at top-tier conferences and venues (ICLR, ACL, NeurIPS)
  • Have a passion for mentoring others

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply!

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.

Full-Time Employees At Cohere Enjoy These Perks
  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • ✈️ 6 weeks of vacation (30 working days!)

Vacancy posted 3 days ago
