Data Engineer, Scientific Data Ingestion

Mithrl

ABOUT MITHRL

We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives. Mithrl is building the world’s first commercially available AI Co-Scientist: a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports.

Our traction speaks for itself:
  • 12X year-over-year revenue growth
  • Trusted by leading biotechs and big pharma across three continents
  • Driving real breakthroughs from target discovery to patient outcomes

WHAT YOU WILL DO
  • Build and own an AI-powered ingestion & normalization pipeline to import data from a wide variety of sources: unprocessed Excel/CSV uploads, lab and instrument exports, and processed data from internal pipelines.
  • Develop robust schema mapping, coercion, and conversion logic (think: units normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.).
  • Use LLM-driven and classical data-engineering tools to structure “semi-structured” or messy tabular data: extracting metadata, inferring column roles/types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets.
  • Ensure all transformations that should only happen once (normalization, coercion, batch-correction) execute during ingestion, so downstream analytics / the AI “Co-Scientist” always works with clean, canonical data.
  • Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform.
  • Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.

WHAT YOU BRING

Must-have
  • 5+ years of experience in data engineering / data wrangling with real-world tabular or semi-structured data.
  • Strong fluency in Python and data processing tools (Pandas, Polars, PyArrow, or similar).
  • Extensive experience dealing with messy Excel / CSV / spreadsheet-style data (inconsistent headers, multiple sheets, mixed formats, free-text fields) and normalizing it into clean structures.
  • Comfort designing and maintaining robust ETL/ELT pipelines, ideally for scientific or lab-derived data.
  • Ability to combine classical data engineering with LLM-powered data normalization / metadata extraction / cleaning.
  • Strong desire and ability to own the ingestion & normalization layer end-to-end, from raw upload → final clean dataset, with an eye for maintainability, reproducibility, and scalability.
  • Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions.

Nice-to-have
  • Familiarity with scientific data types and “modalities” (e.g. plate-readers, genomics metadata, time-series, batch-info, instrumentation outputs).
  • Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.
  • Experience with cloud infrastructure and data storage (AWS S3, data lakes/warehouses, database schemas) to support multi-tenant ingestion.
  • Past exposure to LLM-based data transformation or cleansing agents: building or integrating tools that clean or structure messy data automatically.
  • Any background in computational biology / lab-data / bioinformatics is a bonus, though not required.

WHAT YOU WILL LOVE AT MITHRL
  • Mission-driven impact: you’ll be the gatekeeper of data quality, ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You’ll have outsized influence over the reliability and trustworthiness of our entire data + AI stack.
  • High ownership & autonomy: this role is yours to shape. You decide how ingestion works, define the standards, and build the pipelines. You’ll work closely with our product, data science, and infrastructure teams, shaping how data is ingested, stored, and exposed to end users or AI agents.
  • Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders.
  • Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution.
  • Speed: We ship fast (2x/week) and improve continuously based on real user feedback.
  • Location: Beautiful SF office with a high-energy, in-person culture.
  • Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work.

We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Vacancy posted 7 hours ago