
Distributed Web Crawling Engineer for Large-Scale AI Data

Reflection

Reflection in San Francisco is seeking an engineer to build and operate web-scale systems for data collection. The role involves working closely with AI researchers to optimize crawling infrastructure and improve data quality. The ideal candidate has a strong background in distributed systems and web data processing. The position offers competitive compensation and benefits.

Vacancy posted 14 hours ago
Similar jobs that could be interesting for you
Based on the Distributed Web Crawling Engineer for Large-Scale AI Data in San Francisco, CA vacancy
  •  ...states. Our team of AI researchers and...  ...the Role The web is one of the most...  ...diversity of web data directly influence...  ...build and operate large-scale web crawling systems that continuously...  ...scheduling to distributed crawling, content...  ...is ideal for engineers who love building... 
    Data
    Relocation package

    Reflection

    San Francisco, CA
    2 days ago
  •  ...retrieve and process all our data? Our founding team...  ...that made generative AI accessible to millions....  ...building infrastructure at a scale where billion-image...  ...—the kind where "large" means billions of assets...  ...Slurm/HPC environments for distributed data processing Have... 
    Data
    Worldwide

    Black Forest Labs

    San Francisco, CA
    2 days ago
  •  ...ARR this year. Why top engineers are joining: Hyper‑growth...  ...ship The platform is scaling at a pace few startups...  ...DynamoDB, EC2, ELB), and AI‑driven workflows. High‑...  ..., and solve complex distributed systems problems at scale...  ...‑powered workflows and data intelligence features Maintain... 
    Data

    Emeraldadvantageconcepts

    San Francisco, CA
    4 days ago
  • $180k - $215k

    As a Backend Engineer on our application team at...  ...our customer data. It is the “brains”...  ...and build a scalable distributed system capable of supporting...  ...Plan for scale in building solutions...  ...languages - Java, Kotlin Web framework - Spring...  ...intelligence and AI company that gives... 
    Data

    Windfall Data Inc

    San Francisco, CA
    14 hours ago
  • A cutting-edge AI research firm in San Francisco is seeking talent to build and optimize GPU infrastructure for large-scale model inference and training workloads. The ideal candidate will...  ..., actively contributing to synthetic data generation and reinforcement learning pipelines... 
    Data

    Reflection

    San Francisco, CA
    4 days ago
  •  ...We are seeking a Lead Data Engineer to architect, build, and...  ...experience with distributed data platforms, and strong...  ...Proven experience leading large‑scale data platform...  ...compliance frameworks. Use of AI Tools As a technology...  ..., Climbing Never, Crawling Never, Kneeling Never.... 
    Data

    Q-Cells

    San Francisco, CA
    1 day ago
  •  ...that every software engineer begins and ends...  ...Our customers are large engineering teams who...  ...to code review to data analysis. Indent is...  ...the Swift Compiler, distributed data orchestration software, and scaled video conferencing...  ...and dislikes about AI coding tools you have... 
    Data
    Immediate start

    INDENT USA LLC

    San Francisco, CA
    14 hours ago
  • $350k

     ...training researcher, responsible for curating and analyzing large-scale datasets that support AI model development. The ideal candidate will demonstrate...  ...in relevant fields. This role blends research and engineering, requiring both theoretical knowledge and practical skills... 
    Data

    Thinking Machines Lab

    San Francisco, CA
    3 days ago
  •  ...Backend Software Engineer Location: San Francisco...  ...SF) About Billables AI At Billables AI, we’...  ...engineer to help build and scale the core systems...  ...ingest and process large volumes of activity data Improve reliability,...  ...and scaling distributed systems or microservice... 
    Data
    Work at office
    3 days per week

    Billables Incorporated

    San Francisco, CA
    14 hours ago
  • $140k - $220k

     ...David AI provided pay range This range...  ...0/yr About Our Engineering Team: At David AI...  ...into high‑signal data for leading AI...  ...and performance of distributed systems that ingest...  ...serve audio data at scale. Define and...  ...async processing, or large‑scale data pipelines... 
    Data
    Full time
    Work at office
    Flexible hours

    David AI

    San Francisco, CA
    14 hours ago
  •  ...the role We’re looking for a top‑tier Backend Engineer to join our tech team and work hand‑in‑hand...  ...responsibilities will be to: Build and operate large-scale, distributed systems powering AI-ready web search Develop APIs, data pipelines, and microservices for indexing,... 
    Data
    Remote work
    Flexible hours

    Linkup Inc

    San Francisco, CA
    14 hours ago
  • $160k - $300k

     ...pioneering foundational AI company for...  ...unstructured technical data into real-time,...  ...revolutionize how engineering decisions are made...  ..., build, and scale the core systems that...  ...designing and building distributed backend services...  ...RESTful APIs for large-scale applications... 
    Data
    Work at office
    Visa sponsorship
    Flexible hours

    APIphany

    San Francisco, CA
    14 hours ago
  • Scribe is seeking a Sr. Software Engineer, Backend to aid in building...  ...their product. The role focuses on large-scale data ingestion, workflow processing, and AI capabilities using Python, Typescript...  ...development, particularly with distributed systems. The company offers... 
    Data
    Flexible hours

    scribehow.com

    San Francisco, CA
    2 days ago
  •  ...bringing advanced AI capabilities to...  ..., mobile engineers, frontend engineers...  ...generation at global scale. Whether users...  ...orchestration systems, and data infrastructure....  ...with modern web technologies...  ..., APIs, and distributed systems. Enjoy owning...  ..., and large-scale user experiences... 
    Data
    Worldwide

    AI Chopping Block, Inc.

    San Francisco, CA
    2 days ago
  • $140k - $260k

     ...marketing platform for the AI era. As people...  ...us. As a Backend Engineer, you will build and...  ...APIs, process large datasets efficiently...  ...-time insights and data retrieval Optimize...  ...pipelines to handle large-scale structured and...  ...pipelines and working with distributed systems Familiarity... 
    Data
    Work at office
    Visa sponsorship
    Shift work

    Slope

    San Francisco, CA
    3 days ago
  • Requirements 2+ years of web development experience...  ...role, working on a large scale consumer app , (Desirable...  ...exceptional Fullstack / Web Engineers that can work across...  ..., frameworks, and AI tooling , Responsible for...  ...deployment, and secure data handling , Drive technical... 
    Data
    Worldwide

    Xai

    San Francisco, CA
    4 days ago
  •  ...by bringing advanced AI capabilities to hundreds...  ...experienced Backend Engineer to join the Image Generation...  ..., and cost across large-scale distributed systems. Partner with...  ...closely with Android, iOS, web, and full-stack...  ...platform systems. Use data and experimentation to... 
    Data
    Worldwide

    The Consulting Solutions

    San Francisco, CA
    1 day ago
  • Founding Engineer, Backend & Infrastructure About...  ...the clinical AI layer for healthcare...  ...infrastructure needed to scale from thousands of...  ...and scale the distributed systems that power...  ...services across chart data, claims data, authorization...  ...data pipelines, or large-scale backend... 
    Data

    Backbone Systems

    San Francisco, CA
    3 days ago
  •  ...The team’s mission is to distribute OpenAI’s API broadly...  ...platforms, model behavior, and large-scale infrastructure. About...  ...looking for a backend engineer who can quickly...  ...developer tools, especially AI-powered tools,...  ...keeping workloads and data within cloud environments... 
    Data
    Internship

    OpenAI

    San Francisco, CA
    4 days ago
  • $300k

     ...Us We are a leading AI startup pushing the...  ...AI researchers and engineers from top...  ...architect, build, and scale the infrastructure...  ...workloads. Architect distributed systems for high throughput...  ...Build robust APIs, data pipelines, and...  ...-efficiency across large-scale compute environments... 
    Data
    Remote work

    Stealth AI Startup

    San Francisco, CA
    14 hours ago
  • $150k - $300k

     ...Company A Seed-stage AI company (founded 202...  ...into LLM-ready data for AI agents and large language models. The...  ...infrastructure-heavy engineering role on a small, talent...  ...Architect, build, and scale core backend systems...  ...); C++/Rust (plus); distributed systems (... 
    Data
    Full time
    H1b
    Work at office
    Visa sponsorship

    David Joseph & Company

    San Francisco, CA
    2 days ago
  •  ...client is building the AI backbone for the...  ...for AI models”—not data or raw compute, but...  ...a Backend Software Engineer (ML Infrastructure)...  ...design, build, and scale the core systems that power large-scale model training...  ...candidate will work on distributed training pipelines,... 
    Data
    Remote work

    Rockstar

    San Francisco, CA
    14 hours ago
  • $325k

     ...Team Full Stack engineers within the Fleet...  ...efficiently manage AI workloads across...  ..., and operate web-based systems...  ...designing tools that scale to exascale...  ...monitor, and manage large-scale AI...  ...backends. Build data visualization tools...  ...across globally distributed supercomputing infrastructure... 
    Data
    Work at office
    Relocation package

    Slope

    San Francisco, CA
    4 days ago
  •  ...capture, process, and scale knowledge. As a Sr. Software Engineer, Backend, you will...  ...systems behind our data ingestion, workflow processing, and AI inference capabilities...  ...deeply about distributed systems, scalability...  ...architect and evolve large‑scale data ingestion... 
    Data
    Full time
    Home office
    Flexible hours

    scribehow.com

    San Francisco, CA
    2 days ago
  •  ...using technology and AI. We empower...  ...more at Life as an Engineer at EvenUp Location...  ...dedicated to making large language models (LLMs...  ...multidisciplinary team of Data Scientists,...  ...and efficiently at scale. This is an excellent...  ...frameworks, libraries, and distributed systems with a... 
    Data
    Full time
    Temporary work
    Work at office
    Local area
    Home office
    Flexible hours
    3 days per week

    EvenUp Inc.

    San Francisco, CA
    4 days ago
  •  ...leading 3D generative AI company on a...  ...with team members distributed across North America...  ...interns to join our engineering team and help...  ...systems to training large-scale machine learning models...  ...understanding of data structures,...  ...GCP, Kubernetes) Web development (React... 
    Data
    Internship
    Work at office
    Remote work
    Worldwide
    Flexible hours
    1 day per week

    Meshy LLC.

    San Francisco, CA
    4 days ago
  • $192k - $260k

     ...obsessed with enabling data teams to solve the...  ...’s best data and AI infrastructure...  ...in the world. Our engineering teams build highly...  ...resilience, security and scale that is critical...  ...learning and distributed systems. Our technology...  ..., and operating large scale distributed... 
    Data
    Work at office
    Local area
    Worldwide
    Flexible hours

    Menlo Ventures

    San Francisco, CA
    1 day ago
  • A leading data and AI company is seeking a Staff Engineer to design and implement core systems for their Foundation Model Serving. The position focuses on large-scale distributed systems, optimizing GPU workloads, and collaborating across teams. Applicants should have... 
    Data

    Menlo Ventures

    San Francisco, CA
    4 days ago
  •  ...commerce with ATHIA, our AI-powered orchestration...  ...platform that helps large enterprises boost approval...  ...optimization, and data orchestration in one powerful...  ...a Backend Staff Engineer to act as a deep...  ...engineering challenges, scaling distributed systems, and setting backend... 
    Data
    Remote work

    DEUNA

    San Francisco, CA
    3 days ago
  • Granica is looking for a skilled engineer specializing in distributed systems to design self-optimizing data infrastructure. This role...  ...success in delivering impactful large-scale data systems. Join us to pioneer advancements in AI infrastructure while enjoying competitive... 
    Data

    Granica

    San Francisco, CA
    4 hours ago
