Data Infrastructure Engineer
$135.3k - $178.35k
Glyphic Biotechnologies
Berkeley, CA
About Glyphic:
At Glyphic Biotechnologies, we plan to create the protein revolution for which scientists and researchers have been waiting. We are developing a massively parallel, single-molecule proteome sequencing platform that will transform life science discovery and usher in a new era of insights into human biology and disease. To date, we have raised >$80M from venture partners and non-dilutive grant funding to achieve our vision of next generation proteome sequencing.
What we are looking for in you
We are looking for a Data Infrastructure Engineer to design, build, and maintain the data systems that connect our nanopore sequencing instruments to analysis and insight. Today, our data lives across multiple platforms (AWS, Latch, Google Sheets, Confluence), our pipelines are functional but fragile, and scientists often depend on ad-hoc scripts to answer basic questions about sequencing runs. You will change that.
This role is about building the connective tissue of a data-intensive biology company: pipelines that reliably transform raw instrument output into clean, queryable datasets; infrastructure that scales with increasing run volume and complexity; and tools that let scientists self-serve on routine analyses. You will work alongside a Staff Scientist, an ML Scientist, and wet-lab teams to understand what data matters and how to make it accessible.
This is a hybrid role, with the expectation that you spend as much as ~20% of your time (on average) on-site with the team in Berkeley, CA, to build a more complete understanding of Glyphic's technology and to calibrate with the on-site research team. The role will require some flexibility for additional on-site collaboration as projects require.
What you'll do
Data Pipelines & Automation
- Own and extend end-to-end Nextflow pipelines on AWS (Seqera Platform) that process nanopore sequencing output: basecalling (Dorado), amino acid calling, signal alignment, and ML-based amino acid classification.
- Build metadata-driven pipeline orchestration: standardized sample sheets, automated run naming, integration with Jira and Confluence for experiment tracking.
- Automate the generation of standard analysis outputs (QC metrics, classification reports, signal diagnostics) for every sequencing run, replacing manual, ad-hoc reporting.
- Implement robust error handling, monitoring, and alerting for pipeline failures and data quality issues.
Data Modeling & Storage
- Design and implement a data model and schema for nanopore sequencing data: raw signal, basecalls, classification results, experimental metadata, and QC metrics.
- Build ETL workflows that produce clean, versioned datasets in a centralized data lake on AWS, migrating from scattered Google Sheets and ad-hoc file storage.
- Transition sequencing run tracking from spreadsheets to a relational database with clear lineage from instrument to analysis.
- Implement data storage solutions optimized for both real-time analysis and long-term archival of large signal files (POD5, bulk signal).
Visualization & Self-Serve Analytics
- Deploy and maintain data visualization tools (dashboards, interactive browsers) that allow scientists to independently explore sequencing metrics: yields, classification accuracy, plate-level comparisons, signal quality trends.
- Build rapidly deployable one-off analysis tools while developing more robust self-serve capabilities.
- Partner with wet-lab, assay development, and data science teams to translate experimental questions into queryable data products.
- Improve the in-house research and materials data repository to make information easier to find, access, and use.
AI-Augmented Development
- Contribute to the development of internal built-for-purpose software tools.
- Leverage AI coding tools (Claude Code, Copilot, etc.) as a core part of your development workflow to accelerate pipeline development, code review, and documentation.
- Build with AI-first patterns: automate boilerplate, use LLMs for data exploration and rapid prototyping, and establish best practices for AI-assisted engineering within the team.
- Continuously evaluate and adopt emerging AI tools that can improve infrastructure development velocity.
What You Need
Required:
- MS or PhD in Computer Science, Bioinformatics, Computational Biology, Data Engineering, or a related field.
- 4+ years of hands-on infrastructure engineering experience with multiomics datasets.
- Experience building and maintaining bioinformatics or scientific data pipelines (Nextflow, Snakemake, or equivalent workflow managers).
- Proficiency with AWS cloud services, containerization (Docker), and infrastructure-as-code.
- Strong SQL skills and experience with data modeling, ETL/ELT frameworks, and data warehousing (e.g., PostgreSQL, DuckDB, BigQuery, or Snowflake).
- Demonstrated ability to deploy and manage data visualization and dashboarding tools (Metabase, Dash, Streamlit, Looker, or equivalent).
- Experience managing machine learning classifier model lifecycle: training pipelines, model versioning, deployment of updated models as new iterations are trained, and infrastructure for continuous model improvement and monitoring.
- Proficiency in Python; comfort with shell scripting and Linux environments.
Nice to have:
- Experience with nanopore or next-generation sequencing data formats (POD5, FAST5, BAM) and analysis tools (Dorado, minimap2, samtools).
- Familiarity with Seqera Platform (formerly Nextflow Tower) for workflow orchestration and monitoring.
- Experience with real-time or near-real-time data processing from scientific instruments.
- Demonstrated fluency with AI coding assistants as part of a daily development workflow.
- Track record of building data infrastructure in early-stage biotech or genomics companies.
We're looking for a teammate who:
- Navigates complex team dynamics, partnerships, and challenges with creativity and logic.
- Operates with adaptability, urgency, and flexibility in evolving environments, thriving in ambiguity.
- Drives work forward without needing to be asked, taking responsibility for outcomes rather than tasks.
- Treats obstacles as problems to be creatively solved, not reasons something can't be done.
- Applies sound judgment to the best available information, testing, learning, and iterating.
- Shares early and directly when assumptions change, results are unclear, or timelines are at risk.
What you can expect from this role
Work environment:
- Collaborative culture where your ideas and expertise are valued
- Direct impact on product development and company direction
Professional growth:
- Work on groundbreaking next-generation proteomics technology and its data infrastructure challenges
- Establish foundational data engineering architecture as the organization scales
Compensation
Estimated Base Salary $135,300-$178,350
This is the pay range that we reasonably expect to pay for this position. Individual compensation is based on various factors, including experience, education, skillset, and geographic location. This range is for the SF Bay Area, California location and may be adjusted to the labor market in other geographic areas.
Benefits and Perks:
- Employee Stock Option Plan
- 100% Health Plan Coverage for Employees & Dependents (Medical, Dental, & Vision)
- Employer Retirement Contributions to 401(k)
- Generous Paid Time Off
- Paid Maternity and Paternity Leave
- Health & Wellbeing Program
- Office Snacks and Beverages
- Regular Team Bonding Activities
We are an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. Individuals seeking employment at Glyphic Biotechnologies are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation.

