AI Infrastructure Operations Engineer
Dormont Manufacturing Co
Cerebras Systems builds the world’s largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.
Cerebras’ current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra high-speed inference.
Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.
About The Role
The AI Infrastructure Operations Engineer (SiteOps) is an entry-level individual contributor role focused on the deployment, bring-up, monitoring, and first-line troubleshooting of Cerebras AI infrastructure in data center environments. The role supports CS systems, cluster server hardware, cluster networking hardware, and hardware telemetry and monitoring tools.
Support reliable operation and scale-out of Cerebras AI clusters by executing defined hardware bring-up and validation procedures, monitoring telemetry, performing first-line troubleshooting, and escalating issues using established workflows.
Responsibilities
- Assist with deployment and bring-up of CS-X systems, cluster servers, and networking hardware.
- Execute power-on sequencing, readiness checks, and validation tests.
- Monitor hardware telemetry, alerts, and dashboards.
- Perform first-line troubleshooting and structured escalation.
- Collect logs, telemetry, and observations during incidents.
- Use existing monitoring, telemetry, and incident tracking tools.
- Provide feedback on tooling and process gaps.
- Learn cluster hardware and networking fundamentals.
- Shadow senior engineers during complex debugging.
- Progress toward independent ownership of defined workflows.
- No final escalation authority.
- No ownership of cluster architecture, hardware design, or tooling architecture.
$90k - $110k