Staff Software Engineer - GenAI Performance and Kernel
$190.9k - $232.8k
Cacheflow
About This Role

As a staff software engineer for GenAI Performance and Kernel, you will own the design, implementation, optimization, and correctness of the high-performance GPU kernels powering our GenAI inference stack. You will lead development of highly tuned, low-level compute paths, manage trade-offs between hardware efficiency and generality, and mentor others in kernel-level performance engineering. You will work closely with ML researchers, systems engineers, and product teams to push the state of the art in inference performance at scale.

What You Will Do

- Lead the design, implementation, benchmarking, and maintenance of core compute kernels (e.g. attention, MLP, softmax, layernorm, memory management) optimized for various hardware backends (GPU, accelerators)
- Drive the performance roadmap for kernel-level improvements: vectorization, tensorization, tiling, fusion, mixed precision, sparsity, quantization, memory reuse, scheduling, auto-tuning, etc.
- Integrate kernel optimizations with higher-level ML systems
- Build and maintain profiling, instrumentation, and verification tooling to detect correctness issues, performance regressions, numerical issues, and hardware utilization gaps
- Lead performance investigations and root-cause analysis on inference bottlenecks, e.g. memory bandwidth, cache contention, kernel launch overhead, tensor fragmentation
- Establish coding patterns, abstractions, and frameworks to modularize kernels for reuse, cross-backend portability, and maintainability
- Influence system architecture decisions to make kernel improvements more effective (e.g. memory layout, dataflow scheduling, kernel fusion boundaries)
- Mentor and guide other engineers working on lower-level performance, provide code reviews, and help set best practices
- Collaborate with infrastructure, tooling, and ML teams to roll out kernel-level optimizations into production, and monitor their impact

What We Look For

- BS/MS/PhD in Computer Science or a related field
- Deep hands-on experience writing and tuning compute kernels (CUDA, Triton, OpenCL, LLVM IR, assembly, or similar) for ML workloads
- Strong knowledge of GPU/accelerator architecture: warp structure, memory hierarchy (global, shared, register, L1/L2 caches), tensor cores, scheduling, SM occupancy, etc.
- Experience with advanced optimization techniques: tiling, blocking, software pipelining, vectorization, fusion, loop transformations, auto-tuning
- Familiarity with ML-specific kernel libraries (cuBLAS, cuDNN, CUTLASS, oneDNN, etc.) or open kernels
- Strong debugging and profiling skills (Nsight, nvprof, perf, VTune, custom instrumentation)
- Experience reasoning about numerical stability, mixed precision, quantization, and error propagation
- Experience integrating optimized kernels into real-world ML inference systems; exposure to distributed inference pipelines, memory management, and runtime systems
- Experience building high-performance products leveraging GPU acceleration
- Excellent communication and leadership skills: able to drive design discussions, mentor colleagues, and make trade-offs visible
- A track record of shipping performance-critical, high-quality production software
- Bonus: publications in systems/ML performance venues (e.g. MLSys, ASPLOS, ISCA, PPoPP), experience with custom accelerators or FPGAs, experience with sparsity or model compression techniques

Pay Range Transparency

Databricks is committed to fair and equitable compensation practices. The pay range(s) for this role is listed below and represents the expected salary range for non-commissionable roles or on-target earnings for commissionable roles. Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, relevant certifications and training, and specific work location. Based on the factors above, Databricks anticipates utilizing the full width of the range. The total compensation package for this position may also include eligibility for annual performance bonus, equity, and the benefits listed above. For more information regarding which range your location is in visit our page here.

Local Pay Range
$190,900 — $232,800 USD
About Databricks

Databricks is the data and AI company. More than 10,000 organizations worldwide (including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500) rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook.

Benefits

At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, please visit

Our Commitment to Diversity and Inclusion

At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics.

Compliance

If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.


