Staff Software Engineer - Search Platform, Ingestion & Indexing
$136k - $253k
Thomson Reuters
This posting is for proactive recruitment purposes and may be used to fill current openings or future vacancies within our organization.
Overview of the Role
Advanced Content Engineering (ACE) is seeking a Staff Software Engineer to serve as the technical anchor for the search platform's ingestion and indexing systems. The platform processes millions of documents across TR's legal, tax, and professional content corpora - parsing, chunking, enriching, embedding, and indexing them into a hybrid search engine that powers both human-facing search interfaces and autonomous AI agents. Getting this pipeline right, at scale, with zero-downtime operations and increasingly agentic retrieval patterns, is one of the platform's most consequential engineering challenges.
This role owns the design, implementation, and operational health of the document ingestion pipeline and search index management systems - from the Kafka-based streaming infrastructure that moves documents through processing stages, to the Vespa application architecture that stores and serves them. Staff Engineers on this team define, build, test, deploy, scale, and operate what they ship - full-stack ownership is not a principle we aspire to, it is the daily reality. AI-assisted development is the team norm, not the exception, and constant delivery to production is the expectation. This is a role for someone who sets architectural boundaries, not just executes within them.
About the Role
In this position, you will focus on:
Ingestion Pipeline Architecture & Engineering
* Plan, design, develop, and own the end-to-end document ingestion pipeline - a Kafka-based stream processing architecture that moves documents through parsing, chunking, enrichment (entity extraction, embedding generation, metadata enrichment), and indexing stages - including all fault tolerance, version ordering, and at-least-once delivery guarantees
* Architect and implement pluggable, configurable pipeline components (parsers, chunkers, enrichers, indexers) that client teams can assemble into custom topologies via the platform's self-service APIs, while maintaining reliable, observable, and performant execution
* Own the platform's Protobuf-based document schema and schema registry integration - establishing schema governance standards, enforcing backward-compatible evolution, and ensuring reliable serialization across all pipeline stages
* Design and implement dual-flow ingestion: a high-throughput batch path for full reindexing and a low-latency incremental path for real-time document updates, with strong guarantees around document version ordering and idempotent processing
* Lead the migration of ingestion infrastructure from OpenSearch to Vespa, including design of Vespa document processors, custom Kafka feeders, and application package architecture - resolving complex technical challenges that have little or no precedent within the team
Custom Model Operationalization
* Own the end-to-end lifecycle for custom models integrated into the ingestion pipeline - re-ranking models, embedding models, and enrichment components - including inference serving behind a stable API surface, latency SLO management, hardware and runtime configuration (batching, quantization), and scaling
* Build and operate the model promotion pipeline: the CI/CD workflow that moves a model artifact from the fine-tuning team through staging to production, including versioning, canary rollouts, and rollback mechanisms - ensuring the platform team can operate model updates independently without depending on the research team for production changes
* Define and maintain integration contracts between custom models and downstream pipeline components - governing input/output schemas, compatibility requirements, and the governance process for model updates that ensures search pipeline consumers are not broken by changes upstream
* Instrument model serving for production observability: latency distributions, throughput, error rates, and quality signals such as re-ranking score distributions - enabling the team to detect regressions or model drift without requiring the fine-tuning team's involvement
Search Engine & Index Management
* Own the search engine layer end-to-end: design and operate Vespa (and OpenSearch during transition) index configurations, ranking profiles, schema definitions, and application package lifecycle management - applying architectural principles that scale to the platform's long-term content and tenancy goals
* Build and operate zero-downtime index management: shadow indexing, blue/green index promotion, and rolling reindex workflows that keep the platform available during major infrastructure changes
* Implement and maintain the Component Registry and Index Registry - the platform's catalog of reusable processing components and active index configurations - with a focus on correctness, observability, and safe concurrent modification
* Develop the full-reindex and incremental-update orchestration logic, including change detection, document tracking, Kafka topic management, and DynamoDB-backed state management
Agentic Search Infrastructure
* Design ingestion and indexing infrastructure with agentic retrieval patterns as a first-class concern - including explicit latency budgets per retrieval hop, chunking and result compression strategies optimized for token economy in context windows, and index boundary definitions that give agents clean, predictable tool contracts
* Build trace-level observability into the retrieval stack that captures which tools were called, in what order, and with what inputs - enabling reliable diagnosis and reproduction of failures in non-deterministic agentic retrieval paths
* Design session state and cache invalidation patterns for multi-turn agentic search: reasoning carefully about cache validity windows, session state scope (per-user, per-session, per-query), and mechanisms to prevent stale context from corrupting downstream agent responses
Evaluation & Search Quality
* Build and own the integration between the ingestion pipeline and the platform's offline evaluation framework - ensuring that experiment runs produce query/result outputs that feed seamlessly into the search grading tool, supporting gold test set maintenance, LLM-as-judge evaluation, and side-by-side ranking comparison across pipeline versions
* Instrument the query and retrieval stack for online analytics: real-time query latency and throughput monitoring, query log collection for session analysis, and the infrastructure to support A/B and interleaved ranking experiments in production - generating the signals that connect low-level search metrics to downstream product KPIs
* Partner with TR Labs and research scientists to ensure that new search components can be evaluated in isolation - with automated offline evaluation on every build and a clear path from evaluation results to production promotion decisions
Reliability & Operational Ownership
* Take full operational responsibility for ingestion and indexing infrastructure: define SLOs, set measurable goals and meet them, build and maintain CloudWatch dashboards and alarms, and participate in on-call rotations - you built it, you own it, you run it
* Treat delivery friction as the enemy: identify and remove obstacles that slow the team's ability to ship ingestion and indexing changes to production safely and frequently - improving CI/CD pipelines, deployment automation, and local development workflows as a standing priority
* Instrument pipeline components with distributed tracing, structured logging, and rich metrics - establishing documentation standards and knowledge management practices so that the team and platform consumers can understand system behavior at all times
* Design and implement resilient fault tolerance mechanisms - dead-letter queues, retry strategies with exponential backoff, circuit breakers, consumer lag monitoring - that make the pipeline robust to downstream failures and transient errors
* Drive system-level performance architecture: profiling ingestion throughput and indexing latency, identifying bottlenecks, and implementing optimizations that meet platform SLOs under peak load
Technical Leadership
* Serve as the team's deepest technical authority on document processing pipelines and search engine internals - guiding architectural decisions, resolving technical ambiguity, and establishing cross-system design patterns that raise the quality bar across the team
* Lead significant projects and initiatives that span multiple engineers and interact with other teams; determine work priorities based on strategic direction; recommend modifications to team operations and make needed adjustments to short-term priorities while maintaining strategic focus
* Mentor and develop Senior and mid-level engineers - providing coaching, technical direction, and educational opportunities in modern distributed systems, stream processing, search infrastructure, and AI-assisted development practices
* Collaborate closely with TR Labs and research scientists to integrate new chunking strategies, embedding models, and enrichment techniques into the pipeline in a safe, well-instrumented, and ethically responsible way
* Deliver effective presentations on complex technical concepts to both technical and non-technical stakeholders; develop strategic plans for technology implementation that align with business objectives
About You
You're an ideal fit if you have:
Required Experience -
* Bachelor's or Master's degree in Computer Science, Engineering, or a related field
* 8+ years of software engineering experience, with demonstrated progression to staff-level or equivalent technical leadership - including ownership of a functional area and leadership of significant cross-functional projects
* Deep expertise in distributed stream processing: designing, building, and operating high-throughput, fault-tolerant event-driven pipelines using Kafka or equivalent technologies at production scale
* Production experience with Vespa, OpenSearch, or Elasticsearch - including schema design, ranking profile configuration, and end-to-end application lifecycle management
* Mastery of Python with strategic awareness of language and framework selection; strong software engineering fundamentals including test strategy, performance architecture, and system design
* Proficiency with AWS cloud services used in data pipeline and search infrastructure (MSK, ECS, Lambda, DynamoDB, Step Functions, CloudWatch), with infrastructure-as-code experience (Terraform or AWS CDK)
* Demonstrated ability to take full operational responsibility end-to-end - defining SLOs, building observability, running on-call, and driving systematic improvements from incident retrospectives - with a track record of shipping to production frequently and removing delivery friction proactively
* Comfort and fluency with AI-assisted development tools; you use them to move faster and produce higher-quality work, not as a novelty
* Track record of establishing architectural principles, cross-system design patterns, and documentation standards that improve the broader team's engineering quality
Preferred Experience -
* Experience operationalizing ML models in production: inference serving, model promotion pipelines, canary rollouts, and production observability for model quality signals
* Familiarity with agentic retrieval patterns - multi-hop retrieval, latency budget management across retrieval hops, context window optimization, and stateful session design
* Experience with online search analytics: instrumenting systems for query performance monitoring, A/B or interleaved ranking experiments, and query log analysis to surface relevance gaps
* Experience with embedding pipelines, vector indexing, and hybrid (dense + sparse) retrieval architectures in a production context
* Familiarity with Protobuf schema design and schema registry governance patterns (Confluent Schema Registry or equivalent)
* Experience building self-service or multi-tenant platform infrastructure where reliability and correctness directly affect multiple downstream teams
* Background in AI ethics frameworks and responsible deployment of machine learning components in production pipelines
What Success Looks Like
In the first 90 days:
* Develop a thorough understanding of the platform's current ingestion and indexing architecture, active technical debt, known reliability gaps, and the roadmap for Vespa adoption
* Establish strong working relationships with the search platform team, TR Labs, and key client teams consuming the ingestion pipeline
* Take on-call ownership for your functional area and deliver at least one meaningful improvement to pipeline reliability, observability, or delivery automation
In the first year:
* Lead the architectural design and delivery of a major phase of the Vespa migration - including ingestion pipeline changes, schema migration, and zero-downtime index promotion - resolving novel technical challenges with minimal precedent
* Establish robust SLO coverage and observability across ingestion components, with on-call playbooks, documented architectural decision records, and demonstrated improvement in incident response quality
* Deliver a production-ready custom model operationalization framework: inference serving, promotion pipeline, and observability for at least one custom model integrated into the ingestion or query stack
* Become the recognized technical authority for ingestion and indexing - the person the team and partner organizations turn to for architectural direction in this domain - with demonstrated influence on platform strategy.
What's in it For You?
Hybrid Work Model: We've adopted a flexible hybrid working environment (2-3 days a week in the office depending on the role) for our office-based roles while delivering a seamless experience that is digitally and physically connected.
Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year, empowering employees to achieve a better work-life balance.
Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow's challenges and deliver real-world solutions. Our Grow My Way programming and skills-first approach ensures you have the tools and knowledge to grow, lead, and thrive in an AI-enabled future.
Industry Competitive Benefits: We offer comprehensive benefit plans to include flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing.
Culture: Globally recognized, award-winning reputation for inclusion and belonging, flexibility, work-life balance, and more. We live by our values: Obsess over our Customers, Compete to Win, Challenge (Y)our Thinking, Act Fast / Learn Fast, and Stronger Together.
Social Impact: Make an impact in your community with our Social Impact Institute. We offer employees two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives.
Making a Real-World Impact: We are one of the few companies globally that helps its customers pursue justice, truth, and transparency. Together, with the professionals and institutions we serve, we help uphold the rule of law, turn the wheels of commerce, catch bad actors, report the facts, and provide trusted, unbiased information to people all over the world.
About Us
Thomson Reuters informs the way forward by bringing together the trusted content and technology that people and organizations need to make the right decisions. We serve professionals across legal, tax, accounting, compliance, government, and media. Our products combine highly specialized software and insights to empower professionals with the data, intelligence, and solutions needed to make informed decisions, and to help institutions in their pursuit of justice, truth, and transparency. Reuters, part of Thomson Reuters, is a world leading provider of trusted journalism and news.
We are powered by the talents of 26,000 employees across more than 70 countries, where everyone has a chance to contribute and grow professionally in flexible work environments. At a time when objectivity, accuracy, fairness, and transparency are under attack, we consider it our duty to pursue them. Sound exciting? Join us and help shape the industries that move society forward.
As a global business, we rely on the unique backgrounds, perspectives, and experiences of all employees to deliver on our business goals. To ensure we can do that, we seek talented, qualified employees in all our operations around the world regardless of race, color, sex/gender, including pregnancy, gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification under applicable law. Thomson Reuters is proud to be an Equal Employment Opportunity Employer providing a drug-free workplace.
Thomson Reuters makes reasonable accommodations for applicants with disabilities, including veterans with disabilities, and for sincerely held religious beliefs in accordance with applicable law. If you reside in the United States and require an accommodation in the recruiting process, you may contact our Human Resources Department by email. Disability accommodations in the recruiting process may include things like a sign language interpreter, making interview rooms accessible, providing assistive technology, or other relevant accommodations. Please note this email is not intended for general recruitment questions and we will promptly respond to inquiries regarding accommodations. More information on requesting an accommodation here.
Learn more on how to protect yourself from fraudulent job postings here.
More information about Thomson Reuters can be found on thomsonreuters.com

