Software Engineer - Performance Profiling

$2,000 per month

ETCHED LLC

About Etched

Etched is building the world's first AI inference system purpose-built for transformers - delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents. Backed by hundreds of millions from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest growing industry in history.

Job Summary

Join our team as a Software Engineer - Performance Tools and take the lead in illuminating the performance landscape of our cutting-edge ML accelerator. We are seeking a highly skilled engineer to design and develop a sophisticated performance analysis tool tailored specifically for Sohu. You will be instrumental in creating the essential tooling that enables our ML engineers and customers to understand workload behavior, identify performance bottlenecks, and unlock the full potential of Sohu in accelerating the most demanding ML applications in the world. This is a unique opportunity to shape performance analysis for novel hardware from the ground up.

Key responsibilities
  • Lead the design and architecture of a comprehensive performance analysis suite, including data collection mechanisms, data processing pipelines, analysis engines, and user interfaces (CLI and/or GUI).
  • Develop robust methods to capture performance data directly from our custom ML accelerator hardware (e.g., hardware performance counters, execution unit status, memory access patterns) via driver interfaces or other mechanisms.
  • Implement tracing for host-side API calls (runtime libraries, driver interactions) and system-level events (CPU activity, PCIe traffic, memory usage, network contention) related to Sohu workloads.
  • Design and implement techniques to accurately correlate performance events across the host CPU, device driver, PCIe bus, multiple accelerators, and multiple hosts, ensuring precise time synchronization.
  • Build analysis modules to automatically interpret collected trace and counter data, identifying key performance limiters (e.g., compute-bound, memory bandwidth-bound, latency-bound, PCIe-bound, specific hardware bottlenecks).
  • Develop intuitive visualizations (timelines, dependency graphs, resource utilization charts, statistical summaries) to clearly communicate performance characteristics and bottlenecks to users.
  • Work closely with hardware architects, firmware engineers, driver developers, compiler engineers, and ML application engineers to understand their needs, define tool requirements, and provide expert guidance on performance analysis and optimization using the tool.

Representative projects
  • Architect and implement the core data collection framework for hardware performance counters on a custom PCIe-based accelerator.
  • Develop a kernel driver module or user-space service for low-overhead tracing of accelerator activity.
  • Design and build a correlated timeline view visualizing CPU API calls, driver submissions, PCIe transfers, and accelerator execution units.
  • Create an analysis pass to detect and quantify memory access inefficiencies or PCIe bandwidth saturation while transacting on a PCIe-attached accelerator.

You may be a good fit if you have
  • Strong proficiency in C++ or Rust
  • Proficiency in Python is a plus
  • Deep understanding of computer architecture (CPU, GPU, accelerators), memory hierarchies (caches, DRAM), and interconnects (especially PCIe).
  • Proven experience in low-level performance analysis, profiling, and bottleneck identification on complex hardware systems (GPUs, CPUs, FPGAs, or custom ASICs).
  • Experience with performance analysis tools (e.g., NVIDIA Nsight, AMD uProf, Intel VTune, perf, Tracy, ETW).
  • Experience working close to hardware, potentially reading performance counters or interacting directly with device drivers.

Strong candidates may also have (nice-to-have qualifications)
  • Direct experience developing performance analysis or debugging tools.
  • Experience with ML accelerator architectures (GPUs, TPUs, etc.).
  • Experience with kernel-mode driver development (Linux or Windows).
  • Understanding of compiler internals, code generation, and optimization.
  • In-depth knowledge of the PCIe protocol and analysis tools (PCIe analyzers).
  • Experience with multi-chip or multi-host accelerator systems (e.g., TPU pods or NVIDIA DGX clusters).
  • Experience with firmware or embedded systems development.
  • Experience with hardware description languages (Verilog, VHDL) or hardware verification.

Benefits
  • Medical, dental, and vision packages with generous premium coverage
    • $500 per month credit for waiving medical benefits
  • Housing subsidy of $2k per month for those living within walking distance of the office
  • Relocation support for those moving to San Jose (Santana Row)
  • Various wellness benefits covering fitness, mental health, and more
  • Daily lunch + dinner in our office
  • Unlimited compute budget subject to ROI justification

How we're different

Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.

We are a fully in-person team in San Jose and Taipei, and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.
Vacancy posted 2 days ago