Software Engineer, Inference
Trypulse
Overview
Pulse is tackling one of the most persistent challenges in data infrastructure: extracting accurate, structured information from complex documents at scale. We have a breakthrough approach to document understanding that combines intelligent schema mapping with fine-tuned extraction models, and it succeeds where legacy OCR and other parsing tools consistently fail.

We are a small, fast-growing team of engineers in San Francisco powering Fortune 100 enterprises, YC startups, public investment firms, and growth-stage companies, and we are backed by tier-1 investors.

What makes our technology special is its multi-stage architecture:
- Layout understanding with specialized component-detection models
- Low-latency OCR models for targeted extraction
- Advanced reading-order algorithms for complex structures
- Proprietary table-structure recognition and parsing
- Fine-tuned vision-language models for charts, tables, and figures

If you are passionate about the intersection of computer vision, NLP, and data infrastructure, your work at Pulse will directly impact customers and shape the future of document intelligence.

What we are looking for
- Working in person 5 days a week at our San Francisco office
- Eagerness to learn and adapt quickly
- Prior startup or founding experience is a plus

About the Role
You will specialize in low-latency, high-throughput inference for OCR and multimodal models, and you will own profiling, batching, and autoscaling across single-tenant and multi-tenant environments.

Responsibilities
- Build inference services with smart batching and caching
- Optimize kernels, tokenization, and model graphs
- Evaluate trade-offs among vLLM, TensorRT-LLM, and Triton
- Implement autoscaling and admission control against clear SLOs
- Own performance dashboards and capacity planning

Requirements
- 3+ years in performance engineering or ML systems
- Strong Python, plus exposure to C++ or CUDA
- Experience with GPU profiling and model serving

Nice to have
- Experience reducing p95 latency and cost in production ML systems

Sponsorship
Sponsorship is available.
Compensation and benefits
Competitive base salary plus equity, a performance-based bonus, relocation assistance for moves to the Bay Area, a daily meal stipend, and medical, vision, and dental coverage.