Software Engineer, Inference

Trypulse

Overview

Pulse is tackling one of the most persistent challenges in data infrastructure: extracting accurate, structured information from complex documents at scale. We have a breakthrough approach to document understanding that combines intelligent schema mapping with fine-tuned extraction models where legacy OCR and other parsing tools consistently fail.

We are a small, fast-growing team of engineers in San Francisco powering Fortune 100 enterprises, YC startups, public investment firms, and growth-stage companies. We are backed by tier 1 investors and growing quickly.

What makes our tech special is our multi-stage architecture:
  • Layout understanding with specialized component detection models
  • Low-latency OCR models for targeted extraction
  • Advanced reading-order algorithms for complex structures
  • Proprietary table structure recognition and parsing
  • Fine-tuned vision-language models for charts, tables, and figures

If you are passionate about the intersection of computer vision, NLP, and data infrastructure, your work at Pulse will directly impact customers and shape the future of document intelligence.

What we are looking for
  • 5 days in-office at our San Francisco office
  • Eager to learn and adapt quickly
  • Prior startup or founding experience is a plus

About the Role

Specialize in low-latency, high-throughput inference for OCR and multimodal models. Own profiling, batching, and autoscaling across single-tenant and multi-tenant environments.

Responsibilities
  • Build inference services with smart batching and caching
  • Optimize kernels, tokenization, and model graphs
  • Evaluate vLLM, TensorRT LLM, and Triton tradeoffs
  • Implement autoscaling and admission control with clear SLOs
  • Own performance dashboards and capacity planning

Requirements
  • 3+ years in performance engineering or ML systems
  • Strong Python, plus C++ or CUDA exposure
  • Experience with GPU profiling and model serving

Nice to have
  • Experience reducing p95 and cost in production ML systems

Sponsorship

Sponsorship available.

Compensation and benefits

Competitive base salary plus equity, performance-based bonus, relocation assistance for Bay Area moves, daily meal stipend, medical, vision, and dental coverage.

Vacancy posted 3 days ago