Data Engineer - Training Pipelines & Inference

$86.18k

HHMI

Data Engineer Position

Primary Work Address: 19700 Helix Drive, Ashburn, VA, 20147

TLDR: Build the data backbone for the next era of AI-powered spatial biology.

About the Role:

HHMI is investing $500 million over the next 10 years to support AI-driven projects and to embed AI systems throughout every stage of the scientific process in labs across HHMI. The Foundational Microscopy Image Analysis (MIA) project sits at the heart of this initiative. Our ambition is big: to create one of the world's most comprehensive, multimodal 3D/4D microscopy datasets and use it to power a vision foundation model capable of accelerating discovery across the life sciences.

We're seeking a skilled Data Engineer to drive scientific innovation through robust data infrastructure, model training, and inference systems. You'll design, develop, and optimize scalable data pipelines and build multi-node GPU training and inference pipelines for foundational models. You'll also develop tools for ingesting, transforming, and integrating large, heterogeneous microscopy image datasets—including writing production-quality Python code to parse, validate, and transform microscopy data from published research papers, public databases, and internal repositories.

This role requires technical excellence in data engineering and the ability to understand biological research contexts to ensure data integrity and scientific validity. Your work will directly support computational research initiatives, including machine learning and AI applications.

You'll collaborate closely with multidisciplinary teams of computational and experimental scientists to define and implement best practices in data engineering, ensuring data quality, accessibility, and reproducibility. You'll maintain detailed documentation, potentially mentor junior engineers, and automate workflows to streamline the path from raw data to scientific insight.

What We Provide:

  • A competitive compensation package, with comprehensive health and welfare benefits.
  • A supportive team environment that promotes collaboration and knowledge sharing.
  • The opportunity to engage with world-class researchers, software engineers and AI/ML experts, contribute to impactful science, and be part of a dynamic community committed to advancing humanity's understanding of fundamental scientific questions.
  • Amenities that enhance work-life balance such as on-site childcare, free gyms, available on-campus housing, social and dining spaces, and convenient shuttle bus service to Janelia from the Washington D.C. metro area.
  • Opportunity to partner with frontier AI labs on scientific applications of AI.

What You'll Do:

  • Design and implement scalable, robust data, model training and inference pipelines for foundational microscopy datasets & vision foundation models. Deploy such pipelines on multi-node GPU environments and make data & trained models publicly available.
  • Stay up to date with scientific literature to understand data context and processing requirements.
  • Document data provenance and transformation steps comprehensively.
  • Apply statistical tools and programming languages (e.g., Python, R) to analyze large datasets, develop custom functions, and extract actionable insights through effective visualization.
  • Establish and maintain data standards, formats, workflows, and documentation to ensure data quality, accessibility, and reproducibility across projects.
  • Collaborate with interdisciplinary teams, potentially mentor junior engineers, and direct or assist in directing the work of others to meet project goals while advising stakeholders on data strategies and best practices.

What You Bring:

  • Bachelor's degree in Computer Science, Data Science, Statistics, Applied Mathematics, or a related field with 3+ years of experience applying and customizing data mining, model training & inference methods and techniques. An equivalent combination of education and relevant experience will be considered.
  • Experience with data formats such as Zarr, Parquet, and HDF5, and with efficient IO (e.g., WebDataset).
  • Experience with volumetric 3D/4D microscopy data analysis tools.
  • Experience with high-performance compute environments (cloud-based and Slurm/LSF clusters) and model deployment platforms (e.g., Kubernetes, AWS SageMaker, Google Vertex AI, HF Inference).
  • Experience with distributed data processing, multi-node GPU processing, and ML development frameworks such as PyTorch and/or JAX.
  • Excellent technical documentation and communication skills.
  • Experience in building scalable data solutions, working with big data technologies, and ensuring data quality and accessibility.
  • Expertise in utilizing data visualization libraries and software (e.g., Matplotlib, R, Jupyter notebooks).
  • Detail-oriented, creative, and organized team player with strong communication skills and a collaborative mindset.
  • Able to effectively manage time, prioritize tasks, and clearly convey complex data concepts to technical and non-technical audiences.

Physical Requirements:

Remaining in a normal seated or standing position for extended periods of time; reaching and grasping by extending hand(s) or arm(s); dexterity to manipulate objects with fingers, for example using a keyboard; communication skills using the spoken word; ability to see and hear within normal parameters; ability to move about workspace. The position requires mobility, including the ability to move materials weighing up to several pounds (such as a laptop computer or tablet).

Persons with disabilities may be able to perform the essential duties of this position with reasonable accommodation. Requests for reasonable accommodation will be evaluated on an individual basis.

Compensation Range

Data Engineer I: $86,181.60 (minimum) - $107,727.00 (midpoint) - $140,045.10 (maximum)

Data Engineer II: $98,039.20 (minimum) - $122,549.00 (midpoint) - $159,313.70 (maximum)

Data Engineer III: $112,629.60 (minimum) - $140,787.00 (midpoint) - $183,023.10 (maximum)

Pay Type: Salary

HHMI's salary structure is developed based on relevant job market data. HHMI considers a candidate's education, previous experiences, knowledge, skills and abilities, as well as internal consistency when making job offers. Typically, a new hire for this position in this location is compensated between the minimum and the midpoint of the salary range.

Vacancy posted 17 hours ago