Staff Data Engineer - Emerald

$170k - $190k

H1

At H1, we believe access to the best healthcare information is a basic human right. Our mission is to provide a platform that can optimally inform every doctor interaction globally. This promotes health equity and builds needed trust in healthcare systems. To accomplish this, our teams harness the power of data and AI technology to unlock groundbreaking medical insights and convert those insights into actions that result in optimal patient outcomes and accelerate an equitable and inclusive drug development lifecycle. Visit h1.co to learn more about us.


Data Engineering is responsible for the development and delivery of our most important asset - our data. Looking across thousands of data sources from across the globe, the data engineering team is responsible for making sense out of that data to create the world's most extensive and comprehensive knowledge base of healthcare stakeholders and the ecosystem they influence. It is our job to ensure that only accurate, normalized data flows through to our customers, and at a velocity that keeps up with the changes in the real world. As we rapidly expand the markets we serve and the breadth and depth of data we want to collect for our customers, the team must grow and scale to meet that demand.

WHAT YOU'LL DO AT H1

As a Staff Data Engineer on the Emerald team, you will play a critical role in shaping the architecture, scalability, and technical direction of H1's healthcare entity resolution platform. EMERALD is responsible for linking large-scale external healthcare datasets, including PubMed, clinical trials, conferences, ct.gov, and web-collected data, to H1's canonical physician and organization profiles.

This role sits at the intersection of distributed data engineering, entity matching, identity resolution, and large-scale healthcare data processing. You will lead a small team of engineers while remaining deeply hands-on technically, owning the systems and pipelines powering automatching, grouping logic, identity mapping, deduplication, and enrichment workflows processing tens of millions of records.

You will partner closely with Product, AI/ML, Analytics, and Engineering teams to improve platform accuracy, scalability, reliability, and operational efficiency across one of H1's most critical data platforms.

You will:
- Lead the design, optimization, and scalability of distributed Spark/PySpark pipelines powering entity resolution and large-scale healthcare data processing.
- Own systems supporting automatching, identity mapping, grouping logic, deduplication, enrichment, and auto-approval workflows across healthcare provider and organization datasets.
- Build and maintain scalable processing frameworks for PubMed, clinical trial, ct.gov, conference, and other healthcare data sources.
- Drive infrastructure optimization initiatives focused on improving throughput, runtime, observability, and cloud compute cost efficiency.
- Partner closely with AI/ML teams to integrate matching and resolution models into EMERALD and improve matching precision and recall.
- Lead complex technical initiatives from architecture and design through deployment, monitoring, and long-term production support.
- Serve as a technical leader and mentor across the team through code reviews, technical guidance, and engineering best practices.
- Collaborate directly with Product and business stakeholders to align technical solutions with operational and customer needs.
- Support production operations, incident response, troubleshooting, and ongoing platform reliability.

ABOUT YOU

You are an experienced data engineer with deep expertise in building and optimizing distributed data systems in cloud-native environments. You thrive on solving complex scalability and performance challenges across high-volume data processing systems and enjoy operating in highly technical, fast-paced engineering environments.

You bring strong hands-on engineering expertise across distributed computing, large-scale data processing, and infrastructure optimization while also helping guide technical direction and mentor engineers across the organization.

- Deep expertise with distributed data processing frameworks such as Apache Spark and Hadoop, particularly within AWS environments.
- Strong proficiency in Python (PySpark), Scala, Java, or other modern programming languages used for large-scale distributed processing.
- Experience building scalable ETL/ELT frameworks across both batch and streaming architectures.
- Experience with entity resolution, identity mapping, automatching, deduplication, or large-scale matching systems is strongly preferred.
- Strong understanding of distributed file formats including Apache Parquet and Apache Avro.
- Experience with streaming technologies such as Kafka, Spark Streaming, or KSQL.
- Strong grasp of software engineering fundamentals including distributed systems, data structures, concurrency, and system design.
- Experience performing root cause analysis across large-scale distributed systems and complex data pipelines.
- Ability to write clean, maintainable, modular, and production-grade code.
- Experience improving performance, scalability, observability, and infrastructure efficiency within distributed systems.
- Strong communication and collaboration skills across both technical and non-technical stakeholders.
- Familiarity with modern development and infrastructure tooling including Git, CI/CD pipelines, Docker, Kubernetes, Terraform, Argo, Hudi, and JIRA.

REQUIREMENTS

- 8+ years of experience building and maintaining large-scale distributed data systems and pipelines.
- Demonstrated technical leadership experience mentoring engineers and driving complex technical initiatives.
- Extensive experience with Apache Spark and AWS-based big data technologies including EMR, S3, and distributed compute environments.
- Strong coding experience in Python (PySpark), Scala, Java, or equivalent languages used for distributed processing systems.
- Experience optimizing large-scale Spark workloads for performance, scalability, and infrastructure cost efficiency.
- Experience with streaming and event-driven architectures using technologies such as Kafka or Spark Streaming.
- Experience with orchestration and lakehouse technologies such as Argo and Hudi or comparable platforms.
- Experience with containerization and infrastructure technologies such as Docker, Kubernetes, and Terraform.
- Experience working with relational or distributed databases such as PostgreSQL or Redshift.
- Proven ability to operate effectively within highly scalable, production-grade distributed systems.
- Experience working with healthcare, life sciences, Real World Evidence (RWE), or large-scale healthcare datasets is strongly preferred.

COMPENSATION

This role pays $170,000 to $190,000 per year, based on experience, in addition to stock options.

Anticipated role close date: 8/1/2026

H1 OFFERS

- Full suite of health insurance options, in addition to generous paid time off
- Pre-planned company-wide wellness holidays
- Retirement options
- Health & charitable donation stipends
- Impactful Business Resource Groups
- Flexible work hours & the opportunity to work from anywhere
- The opportunity to work with leading biotech and life sciences companies in an innovative industry with a mission to improve healthcare around the globe

H1 is proud to be an equal opportunity employer that celebrates diversity and is committed to creating an inclusive workplace with equal opportunity for all applicants and teammates. Our goal is to recruit the most talented people from a diverse candidate pool regardless of race, color, ancestry, national origin, religion, disability, sex (including pregnancy), age, gender, gender identity, sexual orientation, marital status, veteran status, or any other characteristic protected by law.

H1 is committed to working with and providing access and reasonable accommodation to applicants with mental and/or physical disabilities. If you require an accommodation, please reach out to your recruiter once you've begun the interview process. All requests for accommodations are treated discreetly and confidentially, as practical and permitted by law.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
Vacancy posted 2 days ago