Principal AI/ML Engineer, Reliability

$295.25k - $345.04k

Roblox

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators.


At Roblox, we're building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We're on a mission to connect a billion people with optimism and civility, and we're looking for amazing talent to help us get there.


A career at Roblox means you'll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.

Why Reliability?

Roblox serves over 100 million people every day across a platform that is constantly evolving, and behind every experience is infrastructure that has to work, every time, at massive scale. The Reliability team at Roblox operates across the full depth and breadth of the Roblox stack. Availability of the platform is a key company goal. We are hiring the first Principal Machine Learning Engineer on our team.


As a Principal Machine Learning Engineer within Reliability, you will set the 3-5 year technical strategy and architectural blueprint for how machine learning systems and practices can be leveraged to improve the reliability of the overall Roblox platform. You will own the architectural and execution roadmap for leveraging massive data across logs, traces, metrics, and production changes to proactively detect issues before they become real problems (improving MTTD) and/or reduce the time to resolve incidents (MTTR). You will have the opportunity to collaborate cross-functionally with similar teams at Roblox to define best practices and software.

You will:
  • Define the strategy of leveraging Machine Learning Engineering to improve Production Systems Reliability at Roblox.
  • Improve real-time anomaly detection capabilities by leveraging state-of-the-art ML techniques, thereby directly improving Mean Time to Detect for production issues.
  • Develop methods to build pipelines that consume various streams of data (metrics, logs, traces, change management systems, etc.).
  • Build a reasoning layer that interacts with the streams of data to find possible root causes of problems happening in production.
  • Build time-series models to predict capacity exhaustion and seasonal traffic spikes to drive automated scaling.
You have:
  • Beyond off-the-shelf: We are looking for an expert who has knowledge of various modeling techniques and the ability to go deep and fine-tune models to fit our use cases.
  • Ability to propose and architect the infrastructure that allows us to implement systems that learn from user and/or automated feedback.
  • Good distributed systems fundamentals and an understanding of large-scale, high-throughput systems.
You are:
  • Comfortable with Ambiguity: You thrive in undefined or open-ended problem spaces, providing structure, clarity, and decisive direction to your teams.
  • A Pragmatic Builder: You are scrappy and impact-oriented. You view undefined data and messy systems as opportunities to build structure rather than blockers to progress.
  • An Inspiring Leader: Passionate about developing the next generation of technical leaders, managers, and engineers.
  • An Executive Communicator: Highly effective at communicating complex technical concepts to both engineering teams and non-technical executive leadership.
  • Data & System Oriented: You understand that robust data and systems are the foundation of any production application, and you design infrastructure for scale, correctness, and reliability.
  • Curious & Creative: You enjoy tackling hard problems, exploring new technologies, and driving continuous improvements in both systems and workflows.

For roles that are based at our headquarters in San Mateo, CA: The starting base pay for this position is as shown below. The actual base pay is dependent upon a variety of job-related factors such as professional background, training, work experience, location, business needs and market demand. Therefore, in some circumstances, the actual salary could fall outside of this expected range. This pay range is subject to change and may be modified in the future. All full-time employees are also eligible for equity compensation and for benefits as described on this page.

Annual Salary Range

$295,250-$345,040 USD

Roles that are based in an office are onsite Tuesday, Wednesday, and Thursday, with optional presence on Monday and Friday (unless otherwise noted).

Roblox provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. Roblox also provides reasonable accommodations to candidates with qualifying disabilities or religious beliefs during the recruiting process.

For US-based roles only, please note the Company may not be able to employ candidates for this role who have United States work authorization related to certain U.S. visa categories, or support future H-1B sponsorship at this time.
Vacancy posted 2 days ago
