AI Data Infrastructure Engineer

Bright Vision Technologies

Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications. As we continue to grow, we're looking for a skilled AI Data Infrastructure Engineer to join our dynamic team and contribute to our mission of transforming business processes through technology. This is a fantastic opportunity to join an established and well-respected organization offering tremendous career growth potential.

  • Job Title: AI Data Infrastructure Engineer
  • Location: 100% Remote (Continental United States)
  • Position Type: In-house Bright Vision Technologies SOW engagement (no third-party client or vendor)
  • Experience: 6+ years
  • Salary: 100K – 150K
  • Sponsorship: No new H1B sponsorship available. H1B transfers welcomed for qualified candidates.
  • Employment Type: Full-time, direct W2 with Bright Vision Technologies (no C2C, no 1099, no third-party)
  • Engagement: Long-term, multi-year, aligned to the Bright Vision SOW delivery roadmap
  • Compensation: Competitive base salary commensurate with experience, plus benefits.

This is a 100% remote, full-time, direct W2 position with Bright Vision Technologies and is part of our in-house Statement of Work (SOW) engagement. The client, end customer, and employer for this position is Bright Vision Technologies; there is no third-party client, vendor, or implementation partner involved. All of our roles are W2: we strictly do not engage in C2C, 1099, or third-party arrangements, and we do not accept submissions from third-party brokers. Candidates must be willing to work directly as full-time W2 employees of Bright Vision Technologies and contribute to our in-house SOW deliverables.

No new H1B sponsorship is available for this role. However, candidates who currently hold a valid H1B visa and require a transfer are welcome to apply; we will support H1B transfers for qualified candidates.

A technical coding assessment is mandatory for every role. Please apply only if you are confident in your technical abilities and hands-on experience.

Job Summary

We are seeking an AI Data Infrastructure Engineer to build and operate the large-scale data systems that power modern AI training and evaluation pipelines. The role combines deep data engineering expertise with a strong understanding of AI workloads, focusing on ingestion, transformation, quality assurance, lineage, and high-throughput delivery of data to training jobs across diverse modalities. The ideal candidate has experience operating petabyte-scale data systems, strong software engineering fundamentals, and a clear understanding of how data infrastructure choices affect model quality and training efficiency.

Responsibilities

  • Design and operate large-scale data pipelines supporting AI training, evaluation, and continual improvement workflows.
  • Build ingestion systems for diverse modalities including text, image, audio, video, and structured signals.
  • Implement data cleaning, deduplication, filtering, and quality assurance at petabyte scale.
  • Develop dataset versioning, lineage, and provenance tracking systems suitable for reproducible training.
  • Build high-throughput data loading systems that maximize GPU utilization during training.
  • Implement labeling workflows, active learning pipelines, and human-in-the-loop data improvement systems.
  • Design storage architectures balancing cost, throughput, and latency across data tiers.
  • Build evaluation dataset construction pipelines with strict integrity and contamination controls.
  • Implement data privacy, redaction, and consent enforcement throughout the pipeline.
  • Collaborate with ML researchers and engineers to align data systems with model development needs.
  • Drive observability of data quality, drift, and pipeline health across the AI data estate.
  • Optimize cost and performance through compression, format selection, and caching strategies.
  • Document data systems, schemas, and operational procedures for broad internal use.
  • Stay current with AI data infrastructure research and emerging open-source tools.

Required Qualifications

  • Bachelor's or Master's degree in Computer Science or a related field.
  • Six or more years of data engineering experience, with significant work supporting ML or AI workloads.
  • Strong proficiency in Python and at least one JVM or systems language.
  • Deep experience with modern data processing frameworks such as Spark, Ray, or Beam.
  • Hands-on experience operating petabyte-scale storage and pipeline systems.
  • Strong understanding of distributed systems, data modeling, and storage formats.
  • Experience with dataset versioning, lineage, and reproducibility for ML workflows.
  • Familiarity with high-throughput data loading for accelerator-based training.
  • Strong software engineering practices including testing, CI/CD, and code review.
  • Excellent communication and cross-functional collaboration skills.

Preferred Qualifications

  • Experience with multimodal datasets at large scale.
  • Familiarity with data quality tooling and dataset evaluation methodology.
  • Exposure to privacy-preserving data systems and regulated data handling.
  • Open-source contributions to data infrastructure projects.
  • Experience supporting frontier model training pipelines.

We recognize that our people are our strength, and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Bright Vision Technologies is an Equal Opportunity Employer, including Disability/Veterans.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the AI Data Infrastructure Engineer in United States vacancy
  • $140k - $200k

     ...Clutch Canada is seeking a Software Engineer to join the AI team at Speechify in Fort Worth, Texas. This full-time role focuses on data collection and infrastructure for model training, offering a salary between $140,000-$200,000 plus bonus and equity based on experience... 
    Suggested
    Full time
    Remote work

    Clutch Canada

    Fort Worth, TX
    5 days ago
  • $140k - $200k

     ...Speechify is seeking a Software Engineer to handle data collection for model training. Responsibilities include identifying audio data sources, managing cloud infrastructure on GCP, and working with the AI team on data strategies. Ideal candidates have a BS/MS/PhD in Computer... 
    Suggested
    Remote work

    Clutch Canada

    Cambridge, MA
    5 days ago
  • $140k - $200k

     ...Clutch Canada is seeking a skilled Software Engineer to join Speechify's AI team. This role focuses on data collection to enhance model training, requiring 5+ years of software development experience. Ideal candidates should be proficient in bash/Python and Docker, with... 
    Suggested
    Remote work

    Clutch Canada

    San Francisco, CA
    5 days ago
  • $140k - $200k

     ...Speechify is looking for a Software Engineer to join its AI team in Memphis, Tennessee. The position involves supporting data collection and enhancing ingestion pipelines on GCP. Ideal candidates will have a BS/MS/PhD in Computer Science, 5+ years of software development... 
    Suggested
    Remote work

    Clutch Canada

    Memphis, TN
    5 days ago
  •  ...Speechify is seeking a Data-focused Software Engineer to enhance our AI model training operations. This role involves sourcing audio data, extending cloud infrastructure on GCP, and collaborating closely with scientists. Ideal candidates have a BS/MS/PhD in Computer Science... 
    Suggested
    Remote work

    Clutch Canada

    Dallas, TX
    5 days ago
  • $140k - $200k

     ...Speechify is looking for a skilled Software Engineer to join their AI team in Seattle, Washington. The ideal candidate will manage data collection processes for model training and operate the cloud infrastructure on GCP. Required qualifications include a BS/MS/PhD in... 
    Remote work

    Clutch Canada

    Seattle, WA
    1 day ago
  •  ...Speechify is seeking a Software Engineer to join their AI team in Burlington, Vermont. This remote role focuses on data collection to support model training operations for...  ...skills in Python, Docker, and cloud infrastructure. Speechify fosters a fast-growing environment... 
    Remote work

    Clutch Canada

    Burlington, VT
    5 days ago
  • $140k - $200k

     ...Clutch Canada, operating through Speechify, is seeking a Software Engineer to enhance their AI team focused on data collection. Responsibilities include developing the ingestion pipeline on GCP, collaborating with scientists to optimize data quality, and sourcing new audio... 
    Remote work

    Clutch Canada

    Harahan, LA
    5 days ago
  • $140k - $200k

     ...Clutch Canada is hiring a Software Engineer to join the Data team at Speechify. This position involves the collection and processing of audio data...  ...bonuses and equity, based on experience. Join us in making a meaningful impact in the AI and audio sectors.
    Remote work

    Clutch Canada

    Albuquerque, NM
    5 days ago
  • $140k - $200k

     ...Clutch Canada seeks a Software Engineer to join their AI team at Speechify, responsible for data collection and cloud infrastructure. The ideal candidate will have a BS/MS/PhD in Computer Science, 5+ years of software development experience, and proficiency with bash/Python... 
    Full time
    Remote work

    Clutch Canada

    Arlington, VA
    5 days ago
  •  ...A pioneering AI company in California is seeking a Data Infrastructure Engineer to build and operate large-scale data systems. The role involves architecting multi-cluster systems for optimized performance and maintaining modern storage solutions. Ideal candidates have... 

    Mistral AI

    Palo Alto, CA
    5 days ago
  • $140k - $200k

     ...Canada, based in the Town of Ithaca, is looking for a skilled Software Engineer to join their AI team's Data side. Responsibilities include sourcing and integrating audio data, managing cloud infrastructure, and enhancing data for model training. The ideal candidate holds a... 
    Remote work

    Clutch Canada

    Ithaca, NY
    1 day ago
  • $122.43k - $183.64k

     ...Lyric is an AI-first, platform-based healthcare technology company, committed to simplifying the business of care by preventing...  ...support are not available for this position. The Senior Data Infrastructure Engineer designs, builds, and scales reliable data platforms that... 
    Full time
    Visa sponsorship

    Lyric

    New York, NY
    4 days ago
  •  ...Data Infrastructure Engineer Los Angeles, Palo Alto, San Francisco, Toronto About HeyGen At HeyGen, our mission is to make visual storytelling...  ...of developing applications powered by our cutting-edge AI research. As a Data Infrastructure Engineer, you will lead... 

    HeyGen

    Palo Alto, CA
    3 days ago
  •  ...Clutch Canada is seeking a skilled Software Engineer to join Speechify's AI team responsible for data collection and model training. You will be operating the cloud infrastructure, collaborating closely with scientists, and innovating on the dataset roadmap. Candidates... 
    Remote work

    Clutch Canada

    Cleveland, OH
    5 days ago
  • $140k - $200k

     ...Speechify is seeking a Software Engineer for its AI team. The role involves finding new audio data sources, operating cloud infrastructure on GCP, and collaborating on data models. The ideal candidate has a BS/MS/PhD in Computer Science and over 5 years of software development... 
    Remote work

    Clutch Canada

    Washington DC
    1 day ago
  •  ...Clutch Canada is seeking a skilled Software Engineer to join their AI team, focusing on data collection to enhance model training. In this fully remote...  ...work to find new audio data sources, manage the cloud infrastructure, and collaborate with team members to drive product... 
    Remote work

    Clutch Canada

    Oklahoma City, OK
    1 day ago
  • $129k - $209k

     ...The Elevator Pitch Join Evolv as Senior Data Infrastructure Engineer in the Machine Learning & Sensors organization, responsible for building and...  ..., secure, and reliable data pipelines that power our AI/ML research and production systems. In this role, you will... 
    Full time
    Work at office
    Flexible hours
    3 days per week

    Evolv Technology

    Watertown, MA
    21 days ago
  • $140k - $200k

     ...Speechify is seeking a Software Engineer for its Data team to enhance data collection for AI model training. The role includes finding audio data sources, managing cloud infrastructure on GCP, and collaborating with scientists for data quality improvements. Ideal candidates... 
    Remote work

    Clutch Canada

    Anaheim, CA
    1 day ago
  • $140k - $200k

     ...Clutch Canada seeks a Software Engineer to support data collection for AI model training. The role requires a BS/MS/PhD in Computer Science and over...  ...collaborating with the AI team, and utilizing cloud infrastructure on GCP. The position offers a competitive salary between... 
    Remote work

    Clutch Canada

    College Park, MD
    5 days ago
  •  ...Clutch Canada is seeking a Software Engineer to join its AI team. The role focuses on data collection for model training and involves finding new audio data sources and operating cloud infrastructure on GCP. Candidates should have a BS/MS/PhD in Computer Science and over... 
    Remote work

    Clutch Canada

    Houston, TX
    5 days ago
  •  ...Speechify is seeking a Software Engineer for its AI team to enhance data processes for model training. The role emphasizes gathering audio data, managing cloud infrastructure, and collaborating with scientists. Ideal candidates should have a strong background in software... 
    Remote work

    Clutch Canada

    Saint Paul, MN
    1 day ago
  • $140k - $200k

    Clutch Canada is hiring a Software Engineer to join the data team at Speechify. The role involves finding audio data sources, managing the ingestion...  ...on GCP, and collaborating to enhance data quality for AI models. Ideal candidates will have a BS/MS/PhD in Computer Science... 

    Clutch Canada

    Kansas City, MO
    1 day ago
  • $100k - $200k

     ...software company that is transforming how AI and cloud infrastructure teams manage modern networks. The...  ...the business is helping design modern data center architectures and build networks...  ...hands‑on Data Center & IT Infrastructure Engineer to manage and support physical... 
    Work at office
    Local area
    Remote work

    Hamilton Barnes Associates Limited

    California, MO
    4 days ago
  • $140k - $200k

    TryApplyNow is seeking a data-focused candidate for their AI team in Redmond, Washington. The role involves collecting and processing large audio datasets, operating cloud infrastructure on GCP, and collaborating with scientists to enhance data quality and efficiency. Ideal... 

    TryApplyNow

    Redmond, WA
    2 days ago
  • Clutch Canada is seeking a Software Engineer to enhance the data collection for our AI team at Speechify. You will play a vital role in integrating audio...  ...into our ingestion pipeline and operating our cloud infrastructure. The ideal candidate will possess a relevant degree... 

    Clutch Canada

    Louisville, KY
    5 days ago
  • $140k - $180k

     ...Healthcare Insurance Are you a Senior Data Engineer with experience designing and building...  ...platforms, looking to join a high-growth AI start-up operating in the E-commerce and...  ...skillset and take ownership of a data infrastructure that powers large-scale product catalogs... 
    Remote work

    Rise Technical

    New York, NY
    4 days ago
  • A high-growth AI start-up is seeking a Senior Data Engineer to design and build reliable data pipelines and warehousing systems. This remote role involves...  ...like GCP. An exciting opportunity to shape the data infrastructure for enterprise-level analytics and insights in the E... 
    Remote job

    Rise Technical

    New York, NY
    4 days ago
  • $100k - $180k

     ...Data Center Network Engineer Location: Multiple locations (Wayne, PA; Plano, TX; Scottsdale, AZ; Charlotte...  ...execution, supporting critical infrastructure including Cisco ACI fabrics and VPN...  ...this job, you agree to receive calls, AI-generated calls, text messages, or emails... 
    Permanent employment
    Contract work
    Remote work
    3 days per week

    Apex Systems

    Wayne, PA
    1 day ago
  • Decagon AI, Inc. is looking for a Senior Data Infrastructure Engineer to design and operate the data systems that power its AI products. The successful candidate will own critical data pipelines and storage layers, improving reliability and creating clear data pathways... 

    Decagon AI, Inc.

    San Francisco, CA
    4 days ago
