Scientific Lead - Scientific Data Engineer

Eli Lilly & Co

## Advisor - Scientific Data Engineer

Locations: US, San Francisco, CA
Time type: Full time
Posted on: Today
Job requisition ID: R-103720

At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.

# The Opportunity

We are building something unprecedented: an AI foundation that will push the frontier of what is possible today across drug discovery research, from target identification and disease biology through translational science.

The Applied Intelligence for Discovery (AI4D) team is a newly formed group within Lilly Research Laboratories that operates at the intersection of scientific delivery and core platform development. AI4D’s mission is to connect scientists to petabyte-scale data through natural language interfaces, automated analysis workflows, and intelligent search, and to convert early deployments into repeatable system standards and evaluation practices that scale across therapeutic areas.

As a Scientific Data Engineer, you will close the gap between how data is stored and how AI systems need to consume it. You will build the semantic layer, data harmonization infrastructure, AI-ready data products, and lakehouse architecture that bridge the two.
You will work at the intersection of the data infrastructure team and the generative AI engineers who build the systems scientists interact with.

# Responsibilities

## Data Harmonization and Lakehouse Architecture

* Design and build the data architecture that transforms raw and processed omics data into harmonized, AI-consumable layers
* Build and optimize ETL/ELT pipelines that produce denormalized views, pre-computed aggregations, embedding-ready text representations, and feature stores optimized for AI system consumption
* Implement data quality monitoring, automated profiling, and validation checks across harmonization layers
* Create versioned, reproducible data snapshots that support model training, evaluation, and audit requirements in a regulated environment
* Partner with the teams to extend harmonization patterns as data modalities expand beyond genomics and proteomics into spatial transcriptomics, perturbational data (Perturb-Seq), single-cell, and digital pathology

## Semantic Layer and Schema Engineering

* Design and maintain a semantic layer over Lilly’s multi-omics databases that enables AI systems
* Create comprehensive schema documentation: table descriptions, column-level annotations, relationship mappings, business logic rules, and domain-specific constraints (e.g., statistical thresholds, unit conventions, experimental design metadata)
* Develop gold-standard question/SQL pairs for each major database, in collaboration with computational biologists and generative AI engineers, to serve as training data, few-shot examples, and evaluation benchmarks
* Build and maintain a data dictionary and ontology mapping layer that translates how scientists think and speak about data (gene names, pathway terms, assay types) into how the data is physically stored

## AI-Ready Data Products

* Build and manage vector embedding pipelines for scientific documents, study metadata, and structured data descriptions to power RAG-based retrieval
* Build integration pipelines that connect heterogeneous data sources (omics databases, internal publications, electronic lab notebooks, assay results, and clinical annotations) into a unified, queryable layer
* Develop and enforce metadata standards that ensure new data sources are AI-accessible from the point of ingestion, not retroactively
* Design data products that serve multiple consumption patterns: direct SQL access for computational biologists, structured feeds for ML training pipelines, and semantic interfaces for LLM-powered tools

# Qualifications

* Bachelor’s degree in Computer Science, Data Engineering, Bioinformatics, or a related field and 8 years of data engineering experience, OR a Master’s degree and 5 years of data engineering experience
* Demonstrated expertise in building data pipelines, ETL/ELT workflows, and data products that serve downstream AI/ML systems

# Additional Skills/Preferences

* PhD in a data-related field
* Strong SQL skills and experience with complex relational database schemas (hundreds of tables, multi-level joins, domain-specific conventions)
* Experience with modern data platform technologies, including at least one of: Databricks, Snowflake, or equivalent lakehouse platforms
* Experience with modern data engineering tools: dbt, Spark, Airflow, or similar orchestration and transformation frameworks
* Proficiency in Python for data processing, scripting, and pipeline development
* Experience with cloud data platforms (AWS preferred: Redshift, Athena, Glue, S3, or similar)
* Familiarity with at least one of: vector databases, embedding pipelines, or semantic layer tooling
* Strong communication skills: you can work effectively with both engineers who think in schemas and scientists who think in biology
* Experience with biomedical or scientific data: omics datasets (RNA-seq, proteomics, GWAS), clinical data, or laboratory information management systems
* Experience in pharmaceutical, biotech, or life sciences environments
* Familiarity with biomedical ontologies and controlled vocabularies (Gene Ontology, MeSH, ChEBI, HGNC) and their application to data integration
* Experience building data products that serve AI/ML systems: feature stores, training datasets, evaluation benchmarks, or semantic annotations for text-to-SQL
* Knowledge of data governance practices in regulated industries: data lineage, access controls, versioning, and auditability
* Experience with knowledge graph technologies (Neo4j, Amazon Neptune, RDF/SPARQL) or graph-based data modeling
* Deep experience with the Databricks ecosystem: Unity Catalog for data governance, Delta Lake for ACID transactions, MLflow integration, and Databricks SQL for analytics workloads
* Experience designing data architectures that bridge traditional bioinformatics workflows (Nextflow, R/Bioconductor) with modern lakehouse consumption patterns

Lilly is dedicated to helping individuals with disabilities actively engage in the workforce, ensuring equal opportunities when applying for positions. If you require an accommodation to submit a resume for a position at Lilly, please complete the accommodation request form for further assistance. Please note that this form is only for requesting an accommodation as part of the application process; any other correspondence will not receive a response.

Lilly is proud to be an EEO Employer and does not discriminate on the basis of age, race, color, religion, gender identity, sex, gender expression, sexual orientation, genetic information, ancestry, national origin, protected veteran status, disability, or any other legally protected status.

Our employee resource groups (ERGs) offer strong support networks for their members and are open to all employees.
Our current groups include: Africa, Middle East, Central Asia Network, Black Employees at Lilly, Chinese Culture Network, Japanese International Leadership Network (JILN), Lilly India Network, Organization of Latinx at Lilly (OLA), PRIDE (LGBTQ+ Allies), Veterans Leadership Network (VLN), Women’s Initiative for Leading at Lilly (WILL), and enAble (for people with disabilities). Learn more about all of our groups.

Actual compensation will depend on a candidate’s education, experience, skills, and geographic location.

Eli Lilly and Company

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the Scientific Lead - Scientific Data Engineer in San Francisco, CA vacancy
  • $158.81k - $198.49k

     ...Lead Scientific Data Engineer Berkeley Lab's (LBNL) Joint Genome Institute (JGI) has an opening for a Lead Scientific Data Engineer to join the Advanced Analysis Team! JGI has a long history of generating world-class genomic data to address pressing national energy... 
    Scientific
    Full time
    Work at office
    Remote work
    Relocation package

    Berkeley Lab

    San Francisco, CA
    3 days ago
  • $166.5k - $266.2k

     ...operates at the intersection of scientific delivery and core platform...  ...scientists to petabyte-scale data through natural language interfaces...  .... As a Scientific Data Engineer, you will close that gap. You...  ...VLN), Women's Initiative for Leading at Lilly (WILL), enAble (for... 
    Scientific
    Full time
    Flexible hours

    Eli Lilly

    San Francisco, CA
    4 days ago
  • Lawrence Berkeley National Laboratory is seeking a Lead Scientific Data Engineer to join the Advanced Analysis Team. This role involves providing technical leadership for core scientific data platforms and developing data system architectures for AI-enabled scientific discovery... 
    Scientific

    Lawrence Berkeley National Laboratory

    Berkeley, CA
    1 day ago
  •  ...Lead Data Engineer With MarTech We are seeking an experienced Lead Data Engineer with strong MarTech expertise to lead the design and development of scalable marketing data platforms and real-time customer engagement systems. The ideal candidate will have deep experience... 
    Suggested
    Contract work
    2 days per week

    Staffing the Universe

    San Francisco, CA
    3 days ago
  •  ...based on user interactions. Visualize data for business teams. Develop and...  ...Spark streaming. Leadership Duties: Lead the measurement processes from requirements...  ...product managers. Balance between hands-on engineering (50%) and team leadership (50%).... 
    Suggested
    Local area

    My3Tech Inc

    San Francisco, CA
    5 days ago
  •  ...Lead Data Engineer With MarTech Location: SFO, CA (Hybrid 2 days a week) Key Responsibilities Lead end-to-end MarTech engineering initiatives across orchestration, data processing, and activation pipelines. Architect scalable, event-driven systems that... 
    2 days per week

    Georgia IT Inc

    San Francisco, CA
    4 days ago
  • $191.52k - $212.8k

     ...Lead Data Engineer, Enterprise Reporting & Analytics. Posted on 05.05.2026. Sephora Tech, Reference: 288055. Location: San Francisco, United States. Contract type:... 
    Permanent employment
    Full time

    LVMH

    San Francisco, CA
    4 days ago
  • $215.2k - $245.6k

     ...Lead Data Engineer Do you love building and pioneering in the technology space? Do you enjoy solving complex business problems in a fast-paced, collaborative, inclusive, and iterative delivery environment? At Capital One, you'll be part of a big group of makers, breakers... 
    Full time
    Part time
    Internship
    H1b
    Local area

    Capital One Financial Corp

    San Francisco, CA
    1 day ago
  • $208k - $260k

     ...principles: a rigorous understanding of data, modern technology, and most importantly,...  ...actuarial science, and research. The Data Engineer team is a core part of the broader Data...  ...of industry experience with technical lead experience of running a data platform for... 
    Shift work

    Nuna Inc

    San Francisco, CA
    4 days ago
  •  ...Lead Data Engineer The Office of Information Technology (IT) is responsible for enabling State Bar's internal and external stakeholders by the management, implementation, and maintenance of an organization's technology to support State Bar's mission and goals. The... 
    Work at office

    GovernmentJobs.com

    San Francisco, CA
    3 days ago
  •  ...Lead Data Engineer RADIUMONE IS A GLOBAL PROGRAMMATIC AD BUYING PLATFORM RadiumOne is the 6th largest web property in the U.S. according to comScore We build intelligent software that automates media buying, making big data actionable for marketers and connects... 

    Stepping Up Solutions

    San Francisco, CA
    1 day ago
  •  ...Hello, I am Mohammed Dastagir with Saxon Global Inc. I wanted to let you know about the job opportunity for a Lead Data Engineer position. If interested, please share your updated resume along with your expected rate. Title: Lead Data Engineer Location: 806... 

    Saxon Global

    San Francisco, CA
    1 day ago
  •  ...Job Title Mandatory Skills: (Oracle or PostgreSQL) and ETL Pipelines and Big Data and AWS Responsibilities · Uses structured tools for analysis and presentation of concepts and models to enhance the BRD · Develop, maintain and deliver training materials to the... 
    Work experience placement

    Omega Solutions Inc

    San Francisco, CA
    4 days ago
  •  ...Description POSITION DESCRIPTION We are seeking a Lead Data Engineer to architect, build, and lead the development of scalable, cloud-based data platforms that support enterprise analytics, operational reporting, and advanced data use cases. This role provides... 

    Q-Cells

    San Francisco, CA
    1 day ago
  •  ...tools • Write SQL for processing raw data, Kafka ingestion, ADF pipelines, data validation...  ...protect sensitive information. Lead, design and implement innovative...  ...technologies Work with product and engineering team to understand requirements, evaluate... 

    BayOne Solutions

    San Francisco, CA
    1 day ago
  •  ...Lead Data Engineer The Office of Information Technology (IT) is responsible for enabling State Bar's internal and external stakeholders by the management, implementation, and maintenance of an organization's technology to support State Bar's mission and goals. The... 
    Work at office

    State Bar CA

    San Francisco, CA
    1 day ago
  •  ...Seeking Founding Data Scientists and Machine Learning Engineers Imagine Multiplying Your Impact You've unlocked major wins in your career - you'...  ...mindset. 6+ years in production ML/DS; you balance scientific rigor with "it ships today, iteration on the way" pragmatism... 
    Scientific
    Remote work

    Palladio AI, Inc

    San Francisco, CA
    2 days ago
  • A dynamic technology company is seeking a Lead Data Engineer for a hybrid role in San Francisco, emphasizing expertise in Databricks and Datastage. The ideal candidate will lead an offshore engineering team and drive migration from SQL to Databricks while developing cloud... 

    Insight Global

    San Francisco, CA
    1 day ago
  • $191.52k - $212.8k

    Sephora USA, Inc is seeking a Lead Engineer based in San Francisco, CA. The successful candidate will design and implement analytical solutions...  ...8 years of experience in software development, strong SQL and data warehousing skills, and experience with AI/ML tools. A... 

    Sephora USA, Inc

    San Francisco, CA
    4 days ago
  • Hebbia, Inc. is seeking its first Data Engineer to refine data infrastructure and drive best practices for data pipelines in San Francisco. The ideal candidate has over 5 years of software development experience focused on data engineering, alongside a Bachelor's or Master... 

    Hebbia, Inc.

    San Francisco, CA
    2 days ago
  • $215.2k - $245.6k

    Lead Data Engineer Do you love building and pioneering in the technology space? Do you enjoy solving complex business problems in a fast‑paced, collaborative, inclusive, and iterative delivery environment? At Capital One, you’ll be part of a big group of makers, breakers... 
    Internship
    Local area

    Capital One National Association

    San Francisco, CA
    2 days ago
  • ABOUT THE ROLE We are looking for a Snowflake Data Engineer to join an insurance-focused delivery team supporting a major client programme. This is a hands‑on engineering role for someone who is strong in Snowflake and Python, and who can help build reliable data solutions... 

    Komforce

    San Francisco, CA
    1 day ago
  •  ...and disruptors, who solve real problems and meet real customer needs. We are seeking Data Engineers who are passionate about marrying data with emerging technologies. As a Capital One Lead Data Engineer, you’ll have the opportunity to be on the forefront of driving a... 
    Internship
    H1b
    Local area

    Capital One

    San Francisco, CA
    5 days ago
  • A leading sleep technology company is looking for a Data Engineer to drive the construction of data infrastructure that supports millions of users. This role requires 6+ years of experience building data platforms and mastery of tools like SQL and Python. You will work... 

    Eight Sleep

    San Francisco, CA
    2 days ago
  • A consumer fintech startup in San Francisco is seeking a Lead Data Engineer to build and optimize data engineering functions. You will establish data architecture, mentor junior engineers, and collaborate with cross-functional teams. The position requires extensive data... 
    Full time

    Cerebras

    San Francisco, CA
    2 days ago
  • Inside Lvmh is seeking a Lead Engineer based in San Francisco, CA. This full-time position involves leading the design and implementation...  ...collaborating closely with teams across the organization to drive data quality and adopting new AI/ML technologies. Candidates should... 
    Full time

    Inside Lvmh

    San Francisco, CA
    5 days ago
  • $172.5k - $260.1k

     ...heart of it all. Ready to level up your career at the company leading workforce transformation in the agentic era? You’re in the right...  ...you are the future of Salesforce. Salesforce is looking for a Data Engineer to join the Data & Analytics organization and help power the future... 

    Salesforce

    San Francisco, CA
    3 days ago
  • Job Description Title: Lead Data Engineer Location: Hybrid in SF (Tuesdays onsite) Openings: 1 Work Schedule: (available until 12 am PT time to overlap with onshore team) Follow-Up Meeting: After each interview is scheduled. Contract Type: 12 months contract extensions... 
    Contract work

    Insight Global

    San Francisco, CA
    1 day ago
  • A cutting-edge biotech company in San Francisco is seeking a Data Engineer to build AI-powered data ingestion pipelines from various sources. The role demands strong expertise in data engineering and Python, with a focus on data normalization and quality control. As part... 
    Scientific

    Mithrl

    San Francisco, CA
    4 days ago
  •  ...patients in months, not years, and where scientific breakthroughs happen at the speed...  ...Co-Scientist. It is a discovery engine that transforms messy biological data into insights in minutes....  ...year revenue growth ~ Trusted by leading biotechs and big pharma across three... 
    Scientific
    Work at office

    Mithrl

    San Francisco, CA
    4 days ago
