AI Data Infrastructure Engineer
Bright Vision Technologies
Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications. As we continue to grow, we're looking for a skilled AI Data Infrastructure Engineer to join our dynamic team and contribute to our mission of transforming business processes through technology. This is a fantastic opportunity to join an established and well-respected organization offering tremendous career growth potential.
Job Title: AI Data Infrastructure Engineer
Location: 100% Remote (Continental United States)
Position Type: In-house Bright Vision Technologies SOW engagement (no third-party client or vendor)
Experience: 6+ years
Salary: 100K – 150K
Sponsorship: No new H1B sponsorship available. H1B transfers welcomed for qualified candidates.
Employment Type: Full-time, direct W2 with Bright Vision Technologies (no C2C, no 1099, no third-party)
Engagement: Long-term, multi-year, aligned to the Bright Vision SOW delivery roadmap
Compensation: Competitive base salary commensurate with experience, plus benefits.
This is a 100% remote, full-time, direct W2 position with Bright Vision Technologies. The role is part of Bright Vision Technologies' in-house Statement of Work (SOW) engagement: Bright Vision Technologies is the client, end customer, and employer, and no third-party client, vendor, or implementation partner is involved. We do not engage in C2C, 1099, or third-party arrangements for this role, and we do not accept third-party brokering. Candidates must be willing to work directly as full-time W2 employees of Bright Vision Technologies and contribute to our in-house SOW deliverables.
No new H1B sponsorship is available for this role. However, candidates who are currently on a valid H1B visa and require a transfer are welcome to apply, and we will support H1B transfers for qualified candidates.
A technical coding assessment is mandatory for every role. Please apply only if you are confident in your technical abilities and hands-on experience.
Job Summary
We are seeking an AI Data Infrastructure Engineer to build and operate the large-scale data systems that power modern AI training and evaluation pipelines. The role combines deep data engineering expertise with a strong understanding of AI workloads, focusing on ingestion, transformation, quality assurance, lineage, and high-throughput delivery of data to training jobs across diverse modalities. The ideal candidate has experience operating petabyte-scale data systems, strong software engineering fundamentals, and a clear understanding of how data infrastructure choices propagate into model quality and training efficiency.
Responsibilities
- Design and operate large-scale data pipelines supporting AI training, evaluation, and continual improvement workflows.
- Build ingestion systems for diverse modalities including text, image, audio, video, and structured signals.
- Implement data cleaning, deduplication, filtering, and quality assurance at petabyte scale.
- Develop dataset versioning, lineage, and provenance tracking systems suitable for reproducible training.
- Build high-throughput data loading systems that maximize GPU utilization during training.
- Implement labeling workflows, active learning pipelines, and human-in-the-loop data improvement systems.
- Design storage architectures balancing cost, throughput, and latency across data tiers.
- Build evaluation dataset construction pipelines with strict integrity and contamination controls.
- Implement data privacy, redaction, and consent enforcement throughout the pipeline.
- Collaborate with ML researchers and engineers to align data systems with model development needs.
- Drive observability of data quality, drift, and pipeline health across the AI data estate.
- Optimize cost and performance through compression, format selection, and caching strategies.
- Document data systems, schemas, and operational procedures for broad internal use.
- Stay current with AI data infrastructure research and emerging open-source tools.
Required Qualifications
- Bachelor's or Master's degree in Computer Science or a related field.
- Six or more years of data engineering experience, with significant work supporting ML or AI workloads.
- Strong proficiency in Python and at least one JVM or systems language.
- Deep experience with modern data processing frameworks such as Spark, Ray, or Beam.
- Hands-on experience operating petabyte-scale storage and pipeline systems.
- Strong understanding of distributed systems, data modeling, and storage formats.
- Experience with dataset versioning, lineage, and reproducibility for ML workflows.
- Familiarity with high-throughput data loading for accelerator-based training.
- Strong software engineering practices including testing, CI/CD, and code review.
- Excellent communication and cross-functional collaboration skills.
Preferred Qualifications
- Experience with multimodal datasets at large scale.
- Familiarity with data quality tooling and dataset evaluation methodology.
- Exposure to privacy-preserving data systems and regulated data handling.
- Open-source contributions to data infrastructure projects.
- Experience supporting frontier model training pipelines.
We recognize that our people are our strength, and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Bright Vision Technologies is an Equal Opportunity Employer, including Disability/Veterans.

