Staff+ Data Engineer (ML Infrastructure)
Sanas
Sanas is pioneering the future of human communication. Founded by a team of Stanford researchers and entrepreneurs with deep industry experience, Sanas has developed the world's first real-time speech AI platform capable of accent translation, noise cancellation, speech enhancement, cross-language communication, and more.
Sanas makes conversations clearer, more inclusive, and more effective, removing the barriers that prevent people from being understood, regardless of accent, background noise, or native language. Sanas is currently one of the fastest-growing startups in Silicon Valley, having grown from $16M to $50M ARR in 2025. The company's core business is profitable and is on track to end 2026 with more than $120M ARR. Our team combines deep expertise in model innovation and systems engineering with a design-minded product engineering culture to build and ship cutting-edge AI models and experiences, entirely in-house.
Sanas was established in 2020 and is now a 180-person team. In that short span, we've secured over $100 million in funding from the industry's leading investors, including Insight Partners, Google Ventures, Quadrille Capital, General Catalyst, and Quiet Capital, among others. Our reputation is further solidified by collaborations with numerous Fortune 100 companies. With Sanas, you're not just adopting a product; you're investing in the future of communication.
If you're looking to play a significant role in roadmapping and setting technical direction, to deploy big, challenging ideas without much overhead or slowness, and to leave your mark on an ambitious, generational mission to change how the world thinks about speech + AI, then Sanas is the right place for you.
About the Role
Our models are only as good as the data that trains them. As a Staff Data Engineer, you'll own the infrastructure that takes raw audio (millions of hours across accents, languages, noise conditions, and recording environments) and turns it into clean, reproducible, training-ready data at scale. You'll work directly with AI research scientists and ML engineers to design systems that move fast without breaking the data quality guarantees our models depend on.
Job Description
Data pipeline & lakehouse architecture
- Design and implement large-scale data pipelines that ingest, transform, validate, and serve high-quality audio and metadata for AI model training, evaluation, and product telemetry.
- Own the lakehouse architecture - table format choices (Iceberg vs. Delta Lake), partitioning strategies, metadata management, and schema evolution - with a bias toward reproducibility and auditability.
- Build and maintain batch and streaming pipelines using Spark, Flink, and orchestration tooling (Airflow or Dagster), with a clear-eyed view of when each is the right tool.
- Extend and maintain feature store infrastructure to serve low-latency, versioned features for both training and real-time inference.
- Develop and maintain pipelines purpose-built for the unique challenges of audio data: large file volumes, time-series feature extraction, speaker and language metadata, and annotation versioning.
- Build tooling that supports the full audio data lifecycle - from raw ingestion and quality filtering through augmentation, segmentation, and training split generation - with reproducibility guarantees at every stage.
- Partner with ML engineers and research scientists to design data schemas, sampling strategies, and evaluation datasets that accurately reflect production conditions.
- Own data pipelines that feed human-in-the-loop annotation workflows - ensuring clean round-trips between raw data, labeling platforms, and training-ready outputs.
- Instrument pipelines with observability, data quality checks, lineage tracking, and alerting - so failures surface fast and root causes are traceable.
- Drive build vs. buy decisions for data quality, observability, and cataloging tooling with a clear framework grounded in Sanas's scale and roadmap.
- Own disaster recovery design for critical data assets - training datasets, evaluation benchmarks, and model checkpoints.
- Set the technical bar for the data engineering team - review designs and code, establish patterns, and document decisions in a way that raises the floor for everyone.
- Work cross-functionally with AI research, infrastructure, product, and legal to align data architecture with business needs and regulatory requirements.
- Contribute to hiring - identify strong candidates, conduct technical interviews, and help define what great looks like for data engineering at Sanas.
Requirements
- 5+ years of experience in data engineering, ML infrastructure, or data platform roles.
- Deep expertise building distributed batch and streaming data systems in production.
- Strong command of data processing frameworks (Spark, Flink, and Ray) and orchestrators (Airflow or Dagster).
- Hands-on experience with cloud data platforms - Snowflake, Databricks, or ClickHouse - and object storage (S3, GCS) on AWS or GCP.
- Solid understanding of data lifecycle management: privacy, security, compliance, and reproducibility from ingestion through model training.
- Proven ability to work directly with ML researchers and engineers to translate model requirements into data infrastructure decisions.
- Direct experience with audio data pipelines - file handling at scale, time-series features, speaker metadata, or audio annotation tooling.
- Familiarity with ASR, TTS, or speech enhancement model training workflows and the data requirements specific to each.
- Experience with MLOps tooling - experiment tracking, dataset versioning (DVC, LakeFS), and training pipeline orchestration.