Data Engineer, Platform

Basis Research

About Basis

Basis is a nonprofit applied AI research organization with two mutually reinforcing goals.

The first is to understand and build intelligence. This means establishing the mathematical principles of what it means to reason, to learn, to make decisions, to understand, and to explain, and constructing software that implements these principles.

The second is to advance society's ability to solve intractable problems. This means expanding the scale, complexity, and breadth of problems that we can solve today, and even more importantly, accelerating our ability to solve problems in the future.

To achieve these goals, we're building both a new technological foundation that draws inspiration from how humans reason, and a new kind of collaborative organization that puts human values first.

About the Role

Data Engineers on the Platform team at Basis build trustworthy data pipelines with comprehensive provenance and quality gates, curate documented datasets for training and evaluation, and ensure data infrastructure scales reliably. You will work on both platform-specific data needs and cross-project data coordination, preventing duplicate work and facilitating shared datasets.

We are looking for people who are technically excellent and treat data quality as a first-class concern. The ideal Data Engineer has experience with ML data pipelines, understands the full lifecycle from raw data through model training and evaluation, and brings rigor to data provenance, lineage tracking, and quality assurance. You combine software engineering discipline with deep understanding of data systems and ML requirements.

This role is embedded across Platform and Research teams, working on infrastructure that supports both commercial offerings and internal research. You will help Basis scale data operations to support medium-scale models, ensure data governance as we serve external customers, and build systems that researchers can trust for reproducible experiments.

We seek individuals who aspire to do rigorous, high-quality, robust data engineering, but are not afraid to iterate, learn from real usage, and explore different approaches to achieve excellence.

Basis is a collaborative effort, both internally and with our external partners; we are looking for people who enjoy building data foundations for problems larger than ones they can tackle alone.

We expect you to:
  • Have demonstrated significant achievements in data engineering for ML/AI systems. Examples include:
    • Building data pipelines for model training or evaluation at scale
    • Developing feature stores or data platforms serving multiple teams
    • Creating data quality frameworks and implementing governance systems
    • Designing data architectures that enabled new ML capabilities
  • Possess strong proficiency in data technologies including SQL (expert level), Python for data processing, distributed computing frameworks (Spark, Dask), and workflow orchestration tools (Airflow, Dagster, Prefect).
  • Have experience with cloud data platforms including data warehouses (Snowflake, BigQuery, Redshift), data lakes, object storage (S3), and streaming systems (Kafka, Kinesis, Flink) for both batch and real-time processing.
  • Understand ML data requirements including feature engineering, training/validation/test splits, data versioning, experiment reproducibility, and the specific data needs of different model types and training procedures.
  • Be skilled at data quality and governance including implementing validation frameworks, anomaly detection, data lineage tracking, metadata management, and ensuring compliance with privacy and security policies.
  • Have knowledge of data modeling principles for both relational and NoSQL systems, understanding of schema design, normalization/denormalization tradeoffs, and performance optimization.
  • Value data provenance and documentation. You ensure data pipelines are transparent, decisions are documented, and others can understand and trust the data you deliver.
  • Progress with autonomy on complex data challenges. You can scope data projects, make sound architectural decisions, and deliver complete solutions from ingestion through consumption.
  • Be excited about enabling rigorous research through trustworthy data infrastructure that advances our ability to solve intractable problems.
In addition, the following would be an advantage:
  • Experience with feature stores (Tecton, Feast) or building feature platforms.
  • Background in ML research or research engineering, providing an understanding of data needs across the experiment lifecycle.
  • Experience with data lineage tools (Apache Atlas, DataHub, Monte Carlo) and metadata management.
  • Knowledge of vector databases and embedding pipelines for modern AI applications.
  • Contributions to data engineering open-source projects (Airflow, dbt, Great Expectations).
  • Understanding of responsible AI and data governance practices.
Responsibilities:
  • Design and build data pipelines for training and evaluation across Basis research projects and platform offerings, ensuring reliability, performance, and scalability.
  • Implement data quality frameworks including validation rules, quality gates, anomaly detection, and monitoring that catch data issues before they impact research or production systems.
  • Develop and maintain feature stores or equivalent systems that enable consistent feature access across training and serving environments, preventing train-serve skew.
  • Ensure data provenance and lineage tracking so researchers and engineers can understand data origins, transformations applied, and dependencies, enabling reproducible experiments and debugging.
  • Curate documented datasets for model training and evaluation, including dataset versioning, comprehensive documentation, quality metrics, and metadata that enables appropriate usage.
  • Coordinate cross-project data initiatives to prevent duplicate data work, facilitate shared datasets, and ensure consistent data practices across Basis as the organization scales.
  • Optimize data infrastructure for scale as compute grows, including cost optimization, performance tuning, caching strategies, and efficient data access patterns.
  • Collaborate with research and engineering teams to understand data needs, translate requirements into technical solutions, and provide consultation on data architecture and best practices.
  • Implement data governance policies ensuring compliance with privacy regulations, security requirements, and responsible AI practices as Basis serves external customers.
  • Contribute to the culture and direction of Basis by modeling data quality rigor, documentation excellence, and focus on trustworthy data infrastructure.
Role Details

Exceptional candidates who may not meet all of the following criteria are still encouraged to apply.
  • FT/PT: Full-time.
  • In-person Policy: We are in the office four days a week. Be prepared to attend multi-day Basis-wide in-person events.
  • Location: New York City.
  • Salary range: Competitive salary.

Non-Discrimination Notice
Basis Research Institute provides equal employment opportunities without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, or genetics and prohibits discrimination based on all protected characteristics.

Privacy Notice

By submitting your application, you grant Basis permission to use your materials for both hiring evaluation and recruitment-related research and development purposes. Your information may be processed in different countries, including the US. You retain copyright while providing Basis a license to use these materials for the stated purposes.

Read our full Global Data Privacy Notice here.
Vacancy posted 5 days ago