Member of Technical Staff, Inference (Bay Area)
GenesisAI
What You’ll Do
- Build low-latency inference pipelines for on-device deployment, enabling real-time next-token and diffusion-based control loops in robotics
- Design and optimize distributed inference systems on GPU clusters, pushing throughput with large-batch serving and efficient resource utilization
- Implement efficient low-level code (CUDA, Triton, custom kernels) and integrate it seamlessly into high-level frameworks
- Optimize workloads for both throughput (batching, scheduling, quantization) and latency (caching, memory management, graph compilation)
- Develop monitoring and debugging tools to guarantee reliability, determinism, and rapid diagnosis of regressions across both stacks

What You’ll Bring
- Deep experience in distributed systems, ML infrastructure, or high-performance serving (8+ years)
- Production-grade expertise in Python, with strong background in systems languages (C++/Rust/Go)
- Low-level performance mastery: CUDA, Triton, kernel optimization, quantization, memory and compute scheduling
- Proven track record scaling inference workloads in both throughput-oriented cluster environments and latency-critical on-device deployments
- System-level mindset with a history of tuning hardware–software interactions for maximum efficiency, throughput, and responsiveness
