Data Architect, Data Foundry
Eli Lilly
$132k - $193.6k
At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We're looking for people who are determined to make life better for people around the world.
Lilly Small Molecule Discovery is purpose-built to create molecules that make life better for people. Discovery Technology and Platforms (DTP) accelerates molecule discovery by building optimized foundational platforms, streamlining lab operations through advanced technologies and data connectivity, and investing in novel capabilities.
Data Foundry is a multidisciplinary team within DTP that enables AI-native drug discovery through four integrated pillars: Architecture4Insight (data infrastructure and scientific software), Methods4Insight (analytical and computational methods), Automation & Scale4Insight (lab automation and agentic workflows), and Preparedness4Insight (data governance and readiness). These pillars empower every Lilly scientist to make optimal decisions by providing seamless access to data, insights, and AI-driven capabilities—serving both human scientists and autonomous AI agents.
We are seeking Data Architects at multiple levels to design and build the data infrastructure that makes AI-native drug discovery possible. You will create the schemas, ontologies, data models, knowledge graphs, and platform architectures that transform raw scientific data into machine-actionable, FAIR-compliant, insight-ready assets—serving both discovery scientists and autonomous AI agents.
This role is the foundation of Architecture4Insight. Everything the software engineering team builds (pipelines, APIs, prototypes) depends on the data models and platform architecture this team designs. You will apply deep knowledge of scientific data (chemical, biological, HTE, and automation-generated) to create custom-fit solutions, then partner with enterprise technology teams to scale and maintain them. Depending on expertise, the role spans three focus areas: data modeling & ontologies, data platform & lakehouse architecture, and knowledge graph & specialized data systems.
Data Modeling & Ontologies
- Design and implement data models, schemas, and ontologies for chemical, biological, and automation-generated data that serve discovery workflows across the portfolio.
- Define and maintain controlled vocabularies, metadata standards, and FAIR-compliant data frameworks in partnership with Preparedness4Insight.
- Implement semantic data standards (RDF, OWL, SPARQL) and ontology engineering practices to create interoperable, machine-readable scientific data.
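As a rough illustration of the triple-based modeling described above, the sketch below stores facts as subject-predicate-object triples, rejects predicates that are not in a controlled vocabulary, and answers a single triple pattern the way one pattern in a SPARQL `WHERE` clause would. It is plain Python, not RDF tooling; the `ex:` identifiers, the predicate list, and the helper names are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical controlled vocabulary of allowed predicates (illustrative only).
ALLOWED_PREDICATES = {"ex:inhibits", "ex:measuredIn", "ex:hasIC50_nM"}


def add_triple(store: set[Triple], s: str, p: str, o: str) -> None:
    """Add a triple only if its predicate is in the controlled vocabulary."""
    if p not in ALLOWED_PREDICATES:
        raise ValueError(f"predicate {p!r} is not in the controlled vocabulary")
    store.add(Triple(s, p, o))


def match(store: set[Triple], s=None, p=None, o=None) -> list[Triple]:
    """Return triples matching one pattern; None acts as a wildcard,
    similar in spirit to a single triple pattern in a SPARQL query."""
    return sorted(
        t for t in store
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )


# Hypothetical example data: two compounds tested against one kinase.
store: set[Triple] = set()
add_triple(store, "ex:cmpd-001", "ex:inhibits", "ex:KinaseA")
add_triple(store, "ex:cmpd-002", "ex:inhibits", "ex:KinaseA")
add_triple(store, "ex:cmpd-001", "ex:measuredIn", "ex:assay-42")

# "Which compounds inhibit KinaseA?"
inhibitors = [t.subject for t in match(store, p="ex:inhibits", o="ex:KinaseA")]
```

In practice an RDF library or triple store would handle IRIs, typed literals, and full SPARQL; the point here is only the shape of the data and the vocabulary check.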
Data Platform & Lakehouse Architecture
- Design and implement data lakehouse architecture using modern platforms (Databricks, Snowflake, or equivalent), including data storage patterns, partitioning strategies, and query optimization.
- Build and optimize ETL/ELT pipelines using Spark, dbt, or similar tools to transform raw scientific data into analytical and ML-ready formats.
- Implement real-time and streaming data integration (Kafka, Kinesis, event-driven patterns) connecting LIMS, instruments, and lab automation systems to the data infrastructure.
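Partitioning strategies are easier to reason about with a concrete layout. This is a minimal sketch, assuming Hive-style `key=value` directories (the layout Spark's `DataFrameWriter.partitionBy` produces); the table root, column names, and helper functions are hypothetical, and real query engines prune partitions from table metadata rather than by parsing paths.

```python
def partition_path(table_root: str, record: dict, keys: list[str]) -> str:
    """Build a Hive-style partition directory such as
    root/assay_type=HTE/run_date=2024-05-01 from a record's key columns."""
    segments = [f"{key}={record[key]}" for key in keys]
    return "/".join([table_root.rstrip("/"), *segments])


def prune(paths: list[str], **filters: str) -> list[str]:
    """Partition pruning: keep only directories whose key=value segments
    match every filter, so a query never opens files in the others."""
    kept = []
    for path in paths:
        values = dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)
        if all(values.get(k) == str(v) for k, v in filters.items()):
            kept.append(path)
    return kept


# Hypothetical assay records keyed by assay type and run date.
records = [
    {"assay_type": "HTE", "run_date": "2024-05-01"},
    {"assay_type": "HTE", "run_date": "2024-05-02"},
    {"assay_type": "SPR", "run_date": "2024-05-01"},
]
paths = sorted(
    {partition_path("lake/assay_results", r, ["assay_type", "run_date"]) for r in records}
)
hte_paths = prune(paths, assay_type="HTE")
```

Choosing partition keys that match the most common query filters (here, assay type and run date) is what makes pruning effective; keys with very many distinct values tend to produce many small files instead.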
Knowledge Graph & Specialized Data Systems
- Design and implement knowledge graphs (Neo4j, Amazon Neptune, TigerGraph) that capture molecular, target, pathway, and experimental relationships across the discovery landscape.
- Architect specialized data solutions: array databases (TileDB) for genomics/imaging, document stores (MongoDB) for experimental records, and vector databases for embedding-based retrieval supporting ML and RAG workflows.
- Build query and traversal patterns that enable scientists and AI agents to ask relational questions across the entire data landscape.
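The traversal patterns above can be illustrated on a toy graph. The sketch below, assuming a small in-memory adjacency list with hypothetical compound, kinase, and pathway nodes, shows two common query shapes: following a fixed sequence of relationship types (roughly what a Cypher pattern such as `(c)-[:INHIBITS]->()-[:PARTICIPATES_IN]->(p)` expresses in Neo4j) and a breadth-first search for the shortest chain linking two entities. The data, relation names, and functions are illustrative, not an actual schema.

```python
from collections import deque

# Hypothetical toy knowledge graph: node -> list of (relation, neighbor) edges.
GRAPH: dict[str, list[tuple[str, str]]] = {
    "cmpd-001": [("inhibits", "KinaseA")],
    "cmpd-002": [("inhibits", "KinaseB")],
    "KinaseA": [("participates_in", "PathwayX")],
    "KinaseB": [("participates_in", "PathwayX"), ("participates_in", "PathwayY")],
}


def reachable(graph, start: str, relations: list[str]) -> set[str]:
    """Follow a fixed sequence of relation types from `start` and return
    the nodes reached after the last hop (a path-pattern query)."""
    frontier = {start}
    for rel in relations:
        frontier = {nbr for node in frontier for r, nbr in graph.get(node, []) if r == rel}
    return frontier


def shortest_chain(graph, start: str, goal: str):
    """Breadth-first search for the shortest chain of (node, relation, node)
    edges from start to goal; returns None if goal is unreachable."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, path + [(node, rel, nbr)]))
    return None


# "Which pathways could cmpd-002 affect?" and "How is cmpd-001 linked to PathwayX?"
pathways = reachable(GRAPH, "cmpd-002", ["inhibits", "participates_in"])
chain = shortest_chain(GRAPH, "cmpd-001", "PathwayX")
```

A graph database executes the same kinds of queries with indexes and a declarative language, but the two shapes (pattern following and shortest-path search) are the ones scientists and agents most often need.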
Cross-Functional Partnership
- Partner with scientific software engineers to ensure data architectures are implementable, performant, and well-documented.
- Collaborate with Methods4Insight to design data structures that support analytical model training, deployment, and evaluation.
- Work with enterprise technology teams to define scaling strategies, ensure enterprise compliance, and transition data architectures to production-grade management.
- Contribute to build-versus-buy-versus-adopt decisions by evaluating commercial and open-source data platforms against Data Foundry requirements.
Basic Requirements
- B.S. or M.S. in Computer Science, Data Science, Bioinformatics, Computational Biology, Information Science, or related STEM field; Ph.D. valued for ontology and knowledge graph roles.
- B.S. with 7+ years, or M.S. with 5+ years, of experience in data architecture, data engineering, or scientific informatics.
- Proficiency in SQL and experience with multiple database paradigms (relational, graph, document, columnar, key-value).
- Qualified applicants must be authorized to work in the United States on a full-time basis. Lilly will not provide support for or sponsor work authorization or visas for this role, including but not limited to F-1 CPT, F-1 OPT, F-1 STEM OPT, J-1, H-1B, TN, O-1, E-3, H-1B1, or L-1.
Preferred Qualifications
- Expertise in at least one of: data modeling/ontologies, data platform engineering (Databricks, Snowflake, Spark), or graph/specialized databases (Neo4j, Neptune, MongoDB).
- Familiarity with cloud platforms (AWS, Azure, or GCP) and modern data integration patterns.
- Understanding of scientific data types and experimental workflows in life sciences or pharma (chemical, biological, HTE data).
- Strong communication skills with ability to translate data architecture concepts for both technical and scientific audiences.
- Pharmaceutical or biotech research industry experience, particularly in discovery data management or research informatics.
- Experience with semantic web technologies: RDF, OWL, SPARQL, Protégé, or equivalent ontology engineering tools.
- Hands-on experience with graph databases (Neo4j, Neptune, TigerGraph) and knowledge graph design patterns for scientific data.
- Data lakehouse architecture experience: Databricks (Delta Lake, Unity Catalog), Snowflake, or equivalent; ETL/ELT with Spark, dbt.
- Experience with streaming/real-time data platforms (Kafka, Kinesis, Flink) and event-driven architectures.
- Familiarity with LIMS, ELN systems (e.g., Benchling), and laboratory instrument data integration.
- Experience with vector databases (Pinecone, Weaviate, pgvector) and embedding-based retrieval for ML/RAG applications.
- Array database experience (TileDB, Zarr) for genomics, imaging, or high-dimensional scientific data.
- Experience with bioinformatics data formats (FASTA, BAM/CRAM, VCF) and biological sequence databases; familiarity with NGS data pipelines and proteomics data management.
- Experience implementing FAIR data principles and Data Readiness Level frameworks.
- Familiarity with scientific data standards and controlled vocabularies in chemistry (InChI, SMILES) or biology (Gene Ontology, UniProt, pathway databases such as Reactome or KEGG).
Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.
Lilly is proud to be an EEO Employer and does not discriminate on the basis of age, race, color, religion, gender identity, sex, gender expression, sexual orientation, genetic information, ancestry, national origin, protected veteran status, disability, or any other legally protected status.
Our employee resource groups (ERGs) offer strong support networks for their members and are open to all employees. Our current groups include: Africa, Middle East, Central Asia Network, Black Employees at Lilly, Chinese Culture Network, Japanese International Leadership Network (JILN), Lilly India Network, Organization of Latinx at Lilly (OLA), PRIDE (LGBTQ+ Allies), Veterans Leadership Network (VLN), Women's Initiative for Leading at Lilly (WILL), enAble (for people with disabilities).
Actual compensation will depend on a candidate's education, experience, skills, and geographic location. The anticipated wage for this position is $132,000 - $193,600. Full-time equivalent employees also will be eligible for a company bonus (depending, in part, on company and