AI Systems & Platform Internals - Technical Architect

Accellor

Job Description

Accellor is an AI-native services firm purpose-built for the post-ChatGPT era. Free from legacy constraints, we focus on delivering measurable business outcomes through advanced AI, data, and engineering capabilities. Our mission is to operationalize AI at scale and unlock sustained enterprise value.

Our offerings span AI solutions, data services, enterprise applications, and product engineering, tailored to industry-specific needs across healthcare, life sciences, telecom, retail, financial services, and technology. By leveraging design thinking and technology-agnostic architectures, we ensure faster time-to-value and seamless interoperability.

With a proven track record of enabling Fortune 100 enterprises and global innovators, Accellor stands as a trusted partner for organizations seeking to harness the full potential of AI. Our vision is clear: to build intelligent, connected ecosystems that deliver measurable outcomes and redefine the future of enterprise transformation.

Technical Architect — AI Systems & Platform Internals

Experience: 10–12 Years
Role Type: Technical Architect / Staff-Level Systems Architect

Role Summary

Accellor is looking for a Technical Architect — AI Systems, Inference & Platform Internals to help design, scale, and optimize the systems that power ChatGPT, OpenAI API, Codex, agentic systems, multimodal experiences, and internal research workloads.

This role is focused on the internal AI systems stack, including inference runtime, model serving, GPU infrastructure, distributed systems, context engineering, cost optimization, evaluation gates, observability, release safety, and production reliability.

The ideal candidate is a senior hands-on architect who can reason across the full AI platform — from GPU-level performance and distributed inference to product-scale reliability, model deployment, safety, and cost-efficient operations.

Key Responsibilities:

1. AI Systems Architecture

Design and evolve large-scale AI systems that support ChatGPT, OpenAI API, Codex, agentic workflows, multimodal models, and research workloads.

Define architecture across inference runtime, model serving, request routing, batching, KV-cache handling, GPU scheduling, distributed execution, observability, release gates, and production rollout.

Own technical trade-offs across latency, throughput, reliability, correctness, safety, scalability, cost, and infrastructure efficiency.

2. Inference Runtime & Model Serving

Architect high-throughput, low-latency inference systems across large-scale GPU clusters.

Work across inference engines, serving layers, scheduling systems, caching, streaming, deployment pipelines, and runtime optimization.

Partner with engineering teams to improve model-serving efficiency, tail latency, GPU utilization, memory efficiency, correctness under load, and cost per request.

Guide architecture decisions involving PyTorch, JAX, Triton, vLLM-style serving, CUDA/Triton kernels, distributed inference, tensor parallelism, pipeline parallelism, model sharding, and long-context serving.

3. GPU, Kernel & Distributed Performance

Analyze and improve performance across GPU kernels, memory movement, collective communication, orchestration, and runtime scheduling.

Guide engineering decisions involving CUDA, Triton, NCCL/RCCL, GPU profiling, memory pressure, compute utilization, tensor layouts, interconnect behavior, and distributed execution.

Identify system-level bottlenecks across compute, memory, networking, scheduling, model execution, and data movement.

4. Context Engineering

Design and guide context engineering frameworks that determine what information should be passed to the model, how it should be structured, how much context should be used, and how context quality should be measured.

Own architecture patterns for prompt structure, dynamic context assembly, retrieval-augmented generation, long-context management, conversation memory, tool context, agent state, multimodal context, source grounding, permission-aware retrieval, context compression, and context auditability.

Ensure AI systems use the right context, from the right source, with the right permissions, at the right cost, and with measurable quality.

5. Cost Optimization Frameworks

Design and build cost optimization frameworks for large-scale LLM and GenAI workloads.

Create architecture patterns that reduce unnecessary token usage, redundant retrieval, repeated model calls, inefficient inference paths, and avoidable infrastructure spend.

Drive model routing, token budgeting, prompt compression, context pruning, semantic caching, response caching, batch inference, async execution, fallback strategies, and cost telemetry across AI workflows.

Ensure cost optimization does not compromise quality, safety, grounding, reliability, or user experience.

6. Training & Research Infrastructure

Collaborate with research and training infrastructure teams to support large-scale model training and post-training workflows.

Contribute to architecture around distributed training, checkpointing, orchestration, fault tolerance, observability, data movement, evaluation infrastructure, and experiment velocity.

Support frontier model workflows across pre-training, post-training, reinforcement learning, agent training, evaluation harnesses, and large-scale experiment execution.

7. Release Safety, Validation & Evaluation Gates

Architect validation and release systems that ensure model updates, inference engine changes, runtime images, prompt changes, context changes, and platform releases are correct, safe, performant, and regression-free.

Define release gates across correctness, numerical stability, latency, throughput, token usage, cost regression, context quality, retrieval quality, safety behavior, reliability, and model output quality.

Ensure platform optimizations do not reduce safety, grounding, quality, or user trust.

8. Reliability, Observability & Production Operations

Design systems that make AI infrastructure observable, debuggable, reliable, and operationally safe.

Define telemetry, tracing, dashboards, alerts, logs, profiling views, runbooks, SLOs, and post-incident learning loops.

Provide visibility into prompts, context payloads, retrieved sources, token consumption, model selection, cache behavior, inference latency, GPU utilization, evaluation scores, safety events, cost, and failures.

Turn production issues into stronger platform abstractions, safer rollout mechanisms, better automation, and more reliable infrastructure.

9. Agentic & Multimodal Platform Internals

Support architecture for AI agents, tool use, memory, function calling, multimodal interaction, long-running workflows, and internal or external agent deployment.

Work across agent harnesses, evaluation pipelines, workflow orchestration, safety controls, state management, tool execution, memory systems, and product-facing runtime constraints.

Ensure agentic and multimodal systems are reliable, observable, secure, cost-aware, and safe under real workloads.

10. Technical Leadership

Work closely with Research, Inference, Runtime, Infrastructure, Product, Safety, Security, Technical Success, and Deployment teams.

Act as a senior technical authority who can cut across layers, resolve ambiguity, identify systemic risks, and drive architecture decisions.

Mentor engineers and technical leads on distributed systems, performance engineering, context engineering, cost optimization, production readiness, AI platform design, and architecture trade-offs.

Represent architecture decisions through design docs, RFCs, diagrams, technical reviews, operational plans, and leadership-level summaries.

Requirements

Required Qualifications:

  • 10–12 years of experience in software engineering, systems architecture, ML infrastructure, distributed systems, platform engineering, inference systems, cloud infrastructure, or large-scale backend engineering.
  • Strong hands-on engineering experience with Python and at least one systems/backend language such as C++, Go, Rust, Java, or TypeScript.
  • Deep understanding of distributed systems, production infrastructure, reliability engineering, scalability, observability, and fault-tolerant architecture.
  • Experience designing or operating large-scale systems involving APIs, microservices, distributed compute, orchestration, job scheduling, caching, high-availability infrastructure, and production monitoring.
  • Strong understanding of AI/ML systems, especially model serving, inference workflows, context engineering, retrieval systems, evaluation pipelines, and production model deployment.
  • Practical understanding of GPU systems, accelerator-based workloads, CUDA/Triton-style programming, distributed inference, GPU profiling, memory optimization, and communication libraries such as NCCL or RCCL.
  • Experience with ML frameworks and serving stacks such as PyTorch, JAX, TensorFlow, Triton, vLLM-style serving, Apache Ray, Kubernetes-based serving, or internal model-serving systems.
  • Ability to debug complex problems across model behavior, runtime systems, distributed infrastructure, networking, GPU execution, context quality, retrieval quality, evaluation harnesses, and production services.
  • Strong communication skills with the ability to write clear architecture documents, evaluate trade-offs, review implementation quality, and align teams around technically sound decisions.

Preferred Qualifications:

  • Experience working on LLM inference, multimodal inference, agent infrastructure, AI assistants, coding agents, or frontier-model serving platforms.
  • Experience with tensor parallelism, pipeline parallelism, model sharding, KV-cache optimization, batching, speculative decoding, streaming inference, and long-context serving.
  • Experience designing context engineering platforms, prompt/version management systems, model-routing frameworks, semantic caching layers, token-budgeting systems, or LLM cost dashboards.
  • Experience profiling GPU workloads using Nsight Systems, Nsight Compute, rocprof, perf, Prometheus, Grafana, OpenTelemetry, or custom profiling systems.
  • Experience with large-scale distributed training, RL infrastructure, checkpointing, ML compiler optimizations, model graph transformations, or training runtime systems.
  • Experience designing release gates, regression detection systems, canary systems, CI/CD validation frameworks, and production safety controls for performance-sensitive infrastructure.
  • Experience with evals, model quality measurement, hallucination detection, grounding evaluation, safety testing, and model behavior monitoring.

Technical Skill Areas:

AI Systems: LLM serving, inference runtime, training infrastructure, post-training workflows, agent systems, multimodal models

Inference: batching, routing, KV-cache, streaming, latency optimization, model serving, tensor parallelism, pipeline parallelism

Performance Engineering: CUDA, Triton, GPU profiling, kernel optimization, memory bandwidth, communication libraries, distributed execution

Context Engineering: prompt architecture, dynamic context assembly, RAG, memory, context compression, context ranking, source grounding, permission-aware retrieval

Cost Optimization: token budgeting, caching, model routing, fallback strategies, cost telemetry, batching, async workflows, cost-quality trade-offs

Distributed Systems: scheduling, orchestration, reliability, fault tolerance, observability, scalability, service design

ML Frameworks: PyTorch, JAX, TensorFlow, Triton, vLLM-style serving, Ray

Infrastructure: Kubernetes, Docker, Terraform, CI/CD, cloud platforms, Linux systems, networking, storage

Safety & Validation: evals, release gates, canaries, regression testing, model behavior validation, rollout safety

Candidate Profile:

The ideal candidate is a senior hands-on architect who can operate across the full AI systems stack.

They should be able to discuss GPU memory bottlenecks, distributed inference, model-serving reliability, context quality, cost optimization, release validation, eval pipelines, observability, and production rollout with engineering teams, while also explaining architecture decisions clearly to senior leadership.

The candidate should not be limited to architecture diagrams. They must be capable of reviewing implementation quality, identifying bottlenecks, debugging production issues, challenging weak assumptions, and converting repeated failures into stronger platform abstractions.

This role requires the judgment of a senior architect, the debugging mindset of a systems engineer, and the ownership mindset required for production AI infrastructure.

Vacancy posted 7 days ago