[Remote] Order Management System (OMS) Staff Engineer

Remote · USA · Full-time

Note: This is a remote role open to candidates in the USA. Levi Strauss & Co. is a company that values individuality and making a positive impact. It is seeking a Staff Engineer for its Order Management System (OMS) team, responsible for leading the design and architecture of complex systems, ensuring engineering excellence, and promoting operational reliability.

Responsibilities

  • Lead the design and domain modeling of complex, distributed systems within the OMS ecosystem, producing clear, well-reasoned service boundaries, data contracts, and event-driven interaction patterns that stand up to scrutiny and scale
  • Champion domain-driven design (DDD) principles, working with product and engineering peers to identify bounded contexts, eliminate implicit coupling, and surface shared language across teams
  • Guide decomposition of monolithic or tightly-coupled components into well-defined, independently deployable services—reducing blast radius, improving team autonomy, and promoting faster iteration
  • Author architecture decision records (ADRs) and technical design documents that communicate the "why" alongside the "what," helping teams make decisions over time
  • Write, review, and guide production-quality code with an emphasis on clarity, testability, and long-term maintainability—setting the bar for engineering craft on the team
  • Apply modern software engineering practices: CI/CD pipelines, automated testing strategies, feature flagging, progressive delivery, and trunk-based development
  • Identify and eliminate technical debt systematically, balancing short-term velocity with long-term system health through well-argued, incremental improvement plans
  • Establish and promote coding standards, patterns, and best practices across the OMS team that are practical, enforceable, and grounded in production experience
  • Operate with full production ownership: you design with failure in mind, participate in on-call rotations, and take accountability for the health and reliability of the systems you ship
  • Embed reliability engineering into the development lifecycle—defining SLOs, error budgets, and reliability targets upfront rather than as an afterthought
  • Treat runbooks, strategies, and operational documentation as first-class engineering artifacts, keeping them accurate, applicable, and tightly coupled to the systems they describe
  • Design and implement comprehensive observability strategies—structured logging, distributed tracing, and metrics—so that you can localize any failure mode in production
  • Develop dashboards that give engineers, on-call responders, and partners genuine operational insight into system health: not just uptime pings, but meaningful golden signals and business-relevant goals
  • Define and tune alerting strategies that are signal-rich and noise-poor—ensuring you wake on-call engineers for relevant events, not symptoms of unrelated upstream noise
  • Champion observability as a design constraint, ensuring that new services are instrumented and that telemetry quality is part of every code review and launch checklist
  • Design systems that can sustain peak commercial volumes—seasonal traffic spikes, flash sales, and global expansion—without degraded experience or unplanned downtime
  • Apply scalability patterns: asynchronous messaging, event sourcing, CQRS, caching strategies, database sharding, and graceful degradation, selecting the right tool for each problem
  • Conduct and lead capacity planning exercises, load testing, and performance profiling—translating production data into informed infrastructure and architectural decisions
  • Be the senior technical resource during complex production incidents—methodically narrowing hypotheses, leading war rooms, and restoring service while preserving forensic evidence for root cause analysis
  • Facilitate blameless post-incident reviews (PIRs) that produce durable improvements—not just immediate fixes, but systemic changes that reduce the likelihood or impact of recurrence
  • Develop institutional troubleshooting knowledge: document failure modes, known issues, and diagnostic techniques so the entire team grows more capable with each incident
  • Partner with product managers, architects, and other engineers to translate our requirements into clear, achievable technical roadmaps—bridging the gap between strategy and implementation
  • Mentor and level up mid-level engineers through hands-on code review, design feedback, pairing sessions, and direct coaching—building engineering depth across the OMS team
  • Stay current with industry trends in distributed systems, event-driven architecture, and operational tooling, bringing informed perspectives on when to adopt new approaches versus doubling down on existing patterns

Skills

  • 10+ years of experience in software engineering with a focus on backend systems, distributed architectures, and platform/product engineering at scale
  • Deep, practical experience designing and modeling complex distributed systems—you articulate trade-offs and make well-reasoned architectural choices under constraints
  • You have experience operating in a 'you build it, you run it' engineering culture. You've been on-call for systems you've built, responded to incidents, and used that experience to make better engineering decisions
  • Build for scale and run at scale—you've handled high-throughput, high-availability systems and have the scars and lessons to show for it
  • Expert-level understanding of observability: you can instrument a system from scratch, build meaningful dashboards, tune alerting, and use telemetry data as a primary tool for engineering decisions
  • Troubleshoot with a systematic, data-driven approach to diagnosing production issues—you stay calm and lead others when systems are on fire
  • Demonstrated experience decoupling tightly-coupled systems—whether migrating a monolith, extracting a shared service, or replacing implicit temporal dependencies with well-defined async contracts
  • Experience with event-driven architecture, domain-driven design, and modern API design patterns; you know where these patterns add value and where they add unnecessary complexity
  • Mastery of CI/CD, automated testing, and DevOps practices; you view them as engineering fundamentals, not optional add-ons
  • You can translate technical complexity for non-technical partners and write for engineering audiences—design docs, ADRs, incident reports, and code reviews all reflect your thinking
  • Experience working with geographically distributed teams and navigating the complexities of multi-time zone collaboration
  • Experience with Order Management Systems (OMS), fulfillment pipelines, or commerce platforms is a meaningful plus—familiarity with the domain accelerates your impact, but is not a prerequisite for the right engineer

Benefits

  • Base pay
  • Incentive plans
  • 401(k) matching
  • Paid leave
  • Health insurance
  • Product discounts

Company Overview

  • Levi Strauss & Co. is a brand-name apparel company that designs, markets, and sells jeans, casual and dress pants, jackets, skirts, and more. It was founded in 1853 and is headquartered in San Francisco, California, USA, with a workforce of more than 10,000 employees. Its website is http://levistrauss.com/.

Company H1B Sponsorship

  • Levi Strauss & Co. has a track record of offering H1B sponsorships, with 8 in 2026, 37 in 2025, 42 in 2024, 49 in 2023, 76 in 2022, 59 in 2021, 39 in 2020. Please note that this does not guarantee sponsorship for this specific role.