Senior HPC and LSF Operations Engineer

NVIDIA

Hardware Infrastructure EDA Compute Team Member

As a member of the Hardware Infrastructure EDA Compute team, you will optimize, scale, and support workload scheduling systems that directly impact design velocity and infrastructure efficiency. Success in this role requires both operational precision and the ability to develop and support forward-looking resource management solutions that address evolving compute demands. Beyond day-to-day operations, the role drives improvements in observability, service reliability, and automation, ensuring the EDA compute environment remains resilient, measurable, and aligned with long-term engineering demands.

What you'll be doing:

  • Manage, scale, and optimize job scheduling systems (LSF, Slurm, etc.) in a large-scale, multi-site environment supporting EDA and other compute-intensive workloads
  • Analyze scheduler and infrastructure performance data to identify systemic bottlenecks and drive measurable improvements in utilization, throughput, and turnaround time
  • Lead problem solving across scheduler, OS, and workload layers, ensuring timely resolution of service-impacting issues
  • Identify recurring operational challenges and implement targeted automation or process improvements to reduce manual effort and prevent repeat incidents
  • Help define and track reliable metrics and SLOs for service performance and reliability, partnering with customers to ensure expectations are realistic and measurable
  • Contribute to operational standards, documentation, and best practices to improve consistency across sites
  • Partner directly with customer teams to clarify requirements, translate technical tradeoffs, and drive issues to closure

What we need to see:

  • Bachelor's degree in Computer Science or related field, or equivalent experience
  • 5+ years of experience operating and supporting large-scale Linux-based compute infrastructure
  • Strong hands-on experience supporting and tuning job scheduling systems (LSF, Slurm, etc.) in HPC or silicon design environments
  • Proficiency in Linux systems administration (CentOS/RHEL)
  • Strong problem-solving skills and the ability to independently analyze complex system behavior under load
  • Clear and effective communication skills, including the ability to articulate technical tradeoffs and reliability metrics to engineering stakeholders

Ways to stand out from the crowd:

  • Experience implementing reliability engineering practices within HPC scheduling environments
  • Deep knowledge of job scheduling systems (LSF, Slurm, etc.), including configuration tuning, scheduler internals, and advanced troubleshooting techniques
  • Experience building or enhancing observability systems, including metrics collection, monitoring pipelines, alerting strategies, and performance dashboards
  • Background with container technologies such as Docker, Singularity, or Podman in HPC environments
  • Experience influencing adoption of new infrastructure standards across multiple teams or sites

NVIDIA offers highly competitive salaries and a comprehensive benefits package. We have some of the most forward-thinking and hardworking people in the world on our team and our collaborative talent continues to drive NVIDIA's growth. We are seeking creative and independent engineers with real passion for technology!

Vacancy posted 2 days ago