Data Engineer, Scientific Data Ingestion
Mithrl
ABOUT MITHRL

We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives. Mithrl is building the world’s first commercially available AI Co-Scientist, a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports.

Our traction speaks for itself:
- 12X year-over-year revenue growth
- Trusted by leading biotechs and big pharma across three continents
- Driving real breakthroughs from target discovery to patient outcomes

WHAT YOU WILL DO

- Build and own an AI-powered ingestion and normalization pipeline to import data from a wide variety of sources: unprocessed Excel/CSV uploads, lab and instrument exports, and processed data from internal pipelines.
- Develop robust schema mapping, coercion, and conversion logic (units normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.).
- Use LLM-driven and classical data-engineering tools to structure “semi-structured” or messy tabular data: extracting metadata, inferring column roles and types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets.
- Ensure all transformations that should happen only once (normalization, coercion, batch correction) execute during ingestion, so that downstream analytics and the AI “Co-Scientist” always work with clean, canonical data.
- Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform.
- Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.

WHAT YOU BRING

Must-have
- 5+ years of experience in data engineering / data wrangling with real-world tabular or semi-structured data.
- Strong fluency in Python and data processing tools (Pandas, Polars, PyArrow, or similar).
- Extensive experience with messy Excel / CSV / spreadsheet-style data (inconsistent headers, multiple sheets, mixed formats, free-text fields) and with normalizing it into clean structures.
- Comfort designing and maintaining robust ETL/ELT pipelines, ideally for scientific or lab-derived data.
- Ability to combine classical data engineering with LLM-powered data normalization, metadata extraction, and cleaning.
- Strong desire and ability to own the ingestion and normalization layer end-to-end, from raw upload to final clean dataset, with an eye for maintainability, reproducibility, and scalability.
- Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions.

Nice-to-have
- Familiarity with scientific data types and “modalities” (e.g. plate readers, genomics metadata, time series, batch info, instrumentation outputs).
- Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.
- Experience with cloud infrastructure and data storage (AWS S3, data lakes/warehouses, database schemas) to support multi-tenant ingestion.
- Past exposure to LLM-based data transformation or cleansing agents: building or integrating tools that clean or structure messy data automatically.
- Any background in computational biology, lab data, or bioinformatics is a bonus, though not required.

WHAT YOU WILL LOVE AT MITHRL

- Mission-driven impact: You’ll be the gatekeeper of data quality, ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You’ll have outsized influence over the reliability and trustworthiness of our entire data + AI stack.
- High ownership and autonomy: This role is yours to shape. You decide how ingestion works, define the standards, and build the pipelines. You’ll work closely with our product, data science, and infrastructure teams, shaping how data is ingested, stored, and exposed to end users or AI agents.
- Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders.
- Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution.
- Speed: We ship fast (2x/week) and improve continuously based on real user feedback.
- Location: Beautiful SF office with a high-energy, in-person culture.
- Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.


