OMS Platform Reliability Lead
Aquent
Position Summary

The OMS RUN Platform Reliability Lead is a highly technical leadership role responsible for ensuring the reliability, stability, scalability, and continuous improvement of the Fluent Commerce Order Management (OMS) platform. This role sits at the intersection of Software Engineering, Site Reliability Engineering (SRE), and Technical Operations, leading the RUN support organization while driving automation-first operational excellence.

Unlike a traditional application support role, this position requires deep engineering expertise to troubleshoot complex production issues, develop self-healing automation, optimize platform performance, and partner closely with engineering teams to improve the resilience of the Order Management ecosystem.

The ideal candidate has experience working within high-volume, event-driven SaaS platforms and possesses strong technical knowledge of Fluent Commerce, GraphQL APIs, Java extensions, SQL, Python automation, and cloud-based observability tools.

Platform Reliability & Self-Healing Automation

- Design and implement automated remediation solutions that reduce manual operational effort and improve platform resiliency.
- Build automated Order Replay capabilities to recover synchronization failures across event-driven integrations.
- Develop utilities and automation using Python, Fluent Commerce APIs, and SDKs for bulk updates, data remediation, and operational clean-up activities.
- Create predictive monitoring and proactive alerting that identifies issues before they impact customers.
- Continuously identify opportunities to eliminate operational toil through automation.

Observability & Platform Monitoring

- Design and maintain advanced monitoring dashboards using Datadog, Splunk, New Relic, or similar observability platforms.
- Monitor GraphQL performance, API latency, webhook processing, order throughput, and platform health.
- Configure intelligent alerting for:
  - Stuck Orders
  - Inventory synchronization failures
  - API degradation
  - Event processing delays
  - Integration failures
- Analyze production trends to proactively improve platform stability and performance.

Technical Incident Management

- Serve as the highest level technical escalation point for complex production incidents.
- Perform deep code-level troubleshooting involving:
  - Java custom extensions
  - Fluent Commerce workflows
  - GraphQL mutations
  - REST integrations
  - Event processing
- Lead technical Root Cause Analysis (RCA) and develop permanent corrective actions.
- Document technical findings, workarounds, automation opportunities, and platform improvements.
- Drive continuous improvement of operational processes through lessons learned from production incidents.

Performance Engineering

- Analyze platform performance, API response times, database interactions, and integration bottlenecks.
- Recommend architectural improvements that improve scalability and system performance.
- Partner with engineering teams to optimize application performance and reduce operational risk.
- Identify opportunities to improve platform efficiency across high-volume transactional environments.

Engineering Collaboration

- Act as the primary technical liaison between:
  - Software Engineering
  - Enterprise Architecture
  - E-Commerce Product Teams
  - Infrastructure & Platform Operations
  - Fluent Commerce Engineering
- Ensure operational considerations are incorporated into product design and development roadmaps.
- Collaborate with Fluent Commerce product teams on:
  - Platform upgrades
  - API versioning
  - New platform capabilities
  - Production issue resolution

Team Leadership

- Lead and mentor the OMS RUN support engineering team.
- Develop technical capabilities across the organization through coaching and knowledge sharing.
- Establish operational best practices for:
  - GraphQL optimization
  - Java troubleshooting
  - API diagnostics
  - Incident response
  - Automation development
- Foster a culture focused on engineering excellence, reliability, and continuous improvement.

Change Management & Release Integrity

- Review technical configurations and platform extensions before production deployments.
- Validate production readiness and deployment integrity.
- Support CI/CD processes and operational release governance.
- Manage operational configuration changes using Git version control.
- Ensure proper branching strategies for hotfixes, emergency changes, and production support.

Required Qualifications

Education

- Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related technical discipline.

Experience

- 5-7 years of experience supporting enterprise Order Management Systems, Platform Engineering, Site Reliability Engineering, or Technical Operations.
- Experience supporting high-volume, mission-critical SaaS applications.
- Experience leading technical production support or reliability engineering teams.
- Demonstrated success implementing operational automation and reducing manual support activities.

Technical Qualifications

Preferred OMS Experience

- Advanced experience with Fluent Commerce including GraphQL API, Webhooks, Essential Rules, Event Processing, Order Lifecycle Management, Inventory Management.

Programming & Development

- Strong SQL skills for operational analysis and complex transactional querying.
- Proficiency in Python for automation, scripting, and API integrations.
- Ability to read, troubleshoot, and debug Java applications and custom extensions.
- Experience developing against REST APIs and GraphQL APIs.
- Strong understanding of JSON schemas and API payload structures.

Integration & Event-Driven Architecture

- Experience with modern distributed systems including RESTful services, Event-driven architectures, Pub/Sub messaging, Kafka, Azure Event Grid, Webhooks, Asynchronous processing.

Observability & Monitoring

- Experience with Datadog, Splunk, ELK Stack, New Relic, Grafana, or Prometheus.

DevOps & Source Control

- Strong Git experience including branching strategies and release management.
- Familiarity with CI/CD deployment pipelines.
- Experience supporting production releases within Agile environments.

Professional Competencies

- Strong analytical and problem-solving skills with the ability to diagnose complex production issues across multiple technology layers.
- SRE mindset focused on automation, scalability, reliability, and reducing operational toil.
- Deep understanding of ITIL principles applied within modern cloud-native environments.
- Excellent communication skills with the ability to translate technical concepts into business impact.
- Strong collaboration and leadership skills with experience working across engineering, infrastructure, and business teams.
- Ability to thrive in fast-paced, high-availability production environments supporting mission-critical commerce platforms.

Preferred Qualifications

- Fluent Commerce implementation or platform engineering experience.
- Experience supporting enterprise retail or eCommerce platforms.
- Experience with cloud-native SaaS architectures.
- Knowledge of microservices architecture and distributed systems.
- Experience building automation frameworks and self-healing operational tooling.
- Familiarity with Azure, AWS, or Google Cloud Platform.
- Experience working within Agile and DevOps environments.

Success Measures

- Increased platform availability and reliability.
- Reduced Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).
- Increased operational automation and self-healing capabilities.
- Reduced manual production support activities.
- Improved incident prevention through proactive monitoring.
- Enhanced system performance and scalability.
- Strong cross-functional engineering partnerships.
- Development of a high-performing, technically proficient RUN support team.

Aquent is an equal-opportunity employer. We evaluate qualified applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, and other legally protected characteristics. We’re about creating an inclusive environment, one where different backgrounds, experiences, and perspectives are valued, and everyone can contribute, grow their careers, and thrive.
Aquent is looking for an OMS RUN Platform Reliability Lead in Berkeley Heights, NJ.



