Data Infrastructure Engineer

$135.3k - $178.35k

Glyphic Biotechnologies

Berkeley, CA

About Glyphic:

At Glyphic Biotechnologies, we plan to create the protein revolution for which scientists and researchers have been waiting. We are developing a massively parallel, single-molecule proteome sequencing platform that will transform life science discovery and usher in a new era of insights into human biology and disease. To date, we have raised >$80M from venture partners and non-dilutive grant funding to achieve our vision of next generation proteome sequencing.

What we are looking for in you

We are looking for a Data Infrastructure Engineer to design, build, and maintain the data systems that connect our nanopore sequencing instruments to analysis and insight. Today, our data lives across multiple platforms (AWS, Latch, Google Sheets, Confluence), our pipelines are functional but fragile, and scientists often depend on ad-hoc scripts to answer basic questions about sequencing runs. You will change that.

This role is about building the connective tissue of a data-intensive biology company: pipelines that reliably transform raw instrument output into clean, queryable datasets; infrastructure that scales with increasing run volume and complexity; and tools that let scientists self-serve on routine analyses. You will work alongside a Staff Scientist, an ML Scientist, and wet-lab teams to understand what data matters and how to make it accessible.

This is a hybrid role, with the expectation that you spend up to ~20% of your time on average on-site with the team in Berkeley, CA, to build a more complete understanding of Glyphic's technology and stay calibrated with the on-site research team. The role will require some flexibility for additional on-site collaboration as projects demand.

What you'll do

Data Pipelines & Automation

  • Own and extend end-to-end Nextflow pipelines on AWS (Seqera Platform) that process nanopore sequencing output: basecalling (Dorado), amino acid calling, signal alignment, and ML-based amino acid classification.
  • Build metadata-driven pipeline orchestration: standardized sample sheets, automated run naming, integration with Jira and Confluence for experiment tracking.
  • Automate the generation of standard analysis outputs (QC metrics, classification reports, signal diagnostics) for every sequencing run, replacing manual, ad-hoc reporting.
  • Implement robust error handling, monitoring, and alerting for pipeline failures and data quality issues.

Data Modeling & Storage

  • Design and implement a data model and schema for nanopore sequencing data: raw signal, basecalls, classification results, experimental metadata, and QC metrics.
  • Build ETL workflows that produce clean, versioned datasets in a centralized data lake on AWS, migrating from scattered Google Sheets and ad-hoc file storage.
  • Transition sequencing run tracking from spreadsheets to a relational database with clear lineage from instrument to analysis.
  • Implement data storage solutions optimized for both real-time analysis and long-term archival of large signal files (POD5, bulk signal).

Visualization & Self-Serve Analytics

  • Deploy and maintain data visualization tools (dashboards, interactive browsers) that allow scientists to independently explore sequencing metrics: yields, classification accuracy, plate-level comparisons, signal quality trends.
  • Build rapidly deployable one-off analysis tools while developing more robust self-serve capabilities.
  • Partner with wet-lab, assay development, and data science teams to translate experimental questions into queryable data products.
  • Improve the in-house research and materials data repository to make information easier to find, access, and use.

AI-Augmented Development

  • Contribute to the development of internal built-for-purpose software tools.
  • Leverage AI coding tools (Claude Code, Copilot, etc.) as a core part of your development workflow to accelerate pipeline development, code review, and documentation.
  • Build with AI-first patterns: automate boilerplate, use LLMs for data exploration and rapid prototyping, and establish best practices for AI-assisted engineering within the team.
  • Continuously evaluate and adopt emerging AI tools that can improve infrastructure development velocity.

What You Need

Required:

  • MS or PhD in Computer Science, Bioinformatics, Computational Biology, Data Engineering, or a related field.
  • 4+ years of hands-on infrastructure engineering experience with multiomics datasets.
  • Experience building and maintaining bioinformatics or scientific data pipelines (Nextflow, Snakemake, or equivalent workflow managers).
  • Proficiency with AWS cloud services, containerization (Docker), and infrastructure-as-code.
  • Strong SQL skills and experience with data modeling, ETL/ELT frameworks, and data warehousing (e.g., PostgreSQL, DuckDB, BigQuery, or Snowflake).
  • Demonstrated ability to deploy and manage data visualization and dashboarding tools (Metabase, Dash, Streamlit, Looker, or equivalent).
  • Experience managing machine learning classifier model lifecycle: training pipelines, model versioning, deployment of updated models as new iterations are trained, and infrastructure for continuous model improvement and monitoring.
  • Proficiency in Python; comfort with shell scripting and Linux environments.

Nice to have:

  • Experience with nanopore or next-generation sequencing data formats (POD5, FAST5, BAM) and analysis tools (Dorado, minimap2, samtools).
  • Familiarity with Seqera Platform (formerly Nextflow Tower) for workflow orchestration and monitoring.
  • Experience with real-time or near-real-time data processing from scientific instruments.
  • Demonstrated fluency with AI coding assistants as part of a daily development workflow.
  • Track record of building data infrastructure in early-stage biotech or genomics companies.

We're looking for a teammate who:

  • Navigates complex team dynamics, partnerships, and challenges with creativity and logic.
  • Operates with adaptability, urgency, and flexibility in evolving environments, thriving in ambiguity.
  • Drives work forward without needing to be asked, taking responsibility for outcomes rather than tasks.
  • Treats obstacles as problems to be creatively solved, not reasons something can't be done.
  • Applies sound judgment to the best available information, testing, learning, and iterating.
  • Shares early and directly when assumptions change, results are unclear, or timelines are at risk.

What you can expect from this role

Work environment:

  • Collaborative culture where your ideas and expertise are valued
  • Direct impact on product development and company direction

Professional growth:

  • Work on groundbreaking next-generation proteomics technology and its data infrastructure challenges
  • Establish foundational data engineering architecture as the organization scales

Compensation

Estimated Base Salary $135,300-$178,350

This is the pay range we reasonably expect to pay for this position. Individual compensation is based on various factors, including experience, education, skill set, and geographic location. This range applies to the SF Bay Area, California location and may be adjusted to the labor market in other geographic areas.

Benefits and Perks:
  • Employee Stock Option Plan
  • 100% Health Plan Coverage for Employees & Dependents (Medical, Dental, & Vision)
  • Employer Retirement Contributions to 401(k)
  • Generous Paid Time Off
  • Paid Maternity and Paternity Leave
  • Health & Wellbeing Program
  • Office Snacks and Beverages
  • Regular Team Bonding Activities

We are an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. Individuals seeking employment at Glyphic Biotechnologies are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation.

Vacancy posted 20 hours ago