Member of Technical Staff, Model Efficiency
Cohere
Who are we?

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

Why this role?

Our team is a fast-growing group of researchers and engineers focused on building reliable ML systems and pushing the boundaries of LLM inference efficiency. We develop techniques that improve how models execute in production, driving lower latency, higher throughput, and consistent quality across diverse workloads.

As an engineer on this team, you’ll work across the inference stack to improve core performance metrics by diving deep into model execution, identifying bottlenecks, and developing innovative optimizations. You’ll collaborate closely with modeling and systems teams to experiment, measure, and ship improvements that meaningfully accelerate inference.

As the team evolves, you’ll have opportunities to build expertise in advanced performance techniques, including GPU/CUDA optimizations, kernel-level improvements, and model execution strategies for MoE and large‑scale architectures.

We have offices in Toronto, Montreal, San Francisco, New York, Paris, Seoul, and London.
Remote‑friendly environment, with preferred locations in EST and PST time zones.

You may be a good fit for the Model Efficiency team if you have:
- 5+ years of experience writing high‑performance, production‑quality code
- Strong programming skills in C++ or Python (Rust/Go also welcome)
- Experience working with large language models and familiarity with the LLM inference ecosystem (e.g., vLLM, SGLang)
- Ability to diagnose and resolve performance bottlenecks across the model execution stack
- A strong bias for action: you ship fast, measure impact, and iterate

It’s a big plus if you have experience with:
- GPU programming, CUDA, or low‑level systems optimization
- Language modeling with transformers (MoE, speculative decoding, KV‑cache optimizations)
- Scaling performance‑critical distributed systems (e.g., computation, search, storage)

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply!

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.

Full‑time employees at Cohere enjoy these perks:
- An open and inclusive culture and work environment
- Work closely with a team on the cutting edge of AI research
- Weekly lunch stipend, in‑office lunches & snacks
- Full health and dental benefits, including a separate budget to take care of your mental health
- 100% parental leave top‑up for up to 6 months
- Personal enrichment benefits towards arts and culture, fitness and well‑being, quality time, and workspace improvement
- Remote‑flexible, offices in Toronto, New York, San Francisco, London, and Paris, as well as a co‑working stipend
- 6 weeks of vacation (30 working days!)

Seniority level: Mid‑Senior level
Employment type: Full‑time
Job function: Engineering and Information Technology
Industries: Software Development