[Remote] Senior Site Reliability Engineer

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. i4DM provides Federal agencies with access to skilled professionals for complex mission challenges. The company is seeking a Senior Site Reliability Engineer to strengthen site reliability engineering, cloud operations, and resilient service delivery for VA enterprise healthcare platforms.

Responsibilities

  • Partner with the Technical Director to implement and mature Site Reliability Engineering (SRE) practices across platform services and hosted applications
  • Improve the full service lifecycle from design and deployment through operation and continuous refinement, with a focus on availability, latency, performance, efficiency, and capacity
  • Define, track, and report service level indicators (SLIs), service level objectives (SLOs), and error budgets to guide engineering decisions and service improvements
  • Build, enhance, and maintain CI/CD pipelines that enable secure, automated, and repeatable application and infrastructure delivery
  • Develop and support Infrastructure as Code (IaC) and configuration automation using tools such as Terraform and Ansible to improve consistency, speed, and auditability
  • Integrate automated testing, validation, and security checks into delivery workflows to improve release quality and reduce change-related risk
  • Design and improve monitoring, logging, tracing, alerting, and dashboards to strengthen observability and accelerate issue detection and response
  • Analyze system behavior and performance trends to improve reliability, scalability, and operational efficiency across distributed and cloud-native environments
  • Reduce operational toil by automating repetitive tasks, improving runbooks, and engineering sustainable solutions for recurring operational issues
  • Support cloud infrastructure and platform services in AWS and containerized environments such as Kubernetes, ensuring systems are resilient, scalable, and secure
  • Contribute to platform modernization efforts by improving deployment patterns, environment consistency, and operational readiness for cloud-native services
  • Assist with capacity planning, reliability reviews, and architectural improvements to support growth, resilience, and mission continuity
  • Implement reliability engineering practices that align with Federal security requirements, including secure configuration, least privilege, vulnerability remediation, and policy-based controls
  • Partner with cybersecurity and engineering teams to support secure-by-design infrastructure and application delivery practices
  • Help ensure operational processes and automation align with compliance expectations for Federal and VA environments
  • Collaborate with development, platform, operations, monitoring, incident management, and architecture teams to improve service reliability and deployment outcomes
  • Work closely with the Technical Director and team leads to translate technical direction into actionable engineering improvements and operational standards
  • Support Agile and SAFe delivery practices by helping teams adopt reliable release processes, operational readiness checks, and continuous improvement measures
  • Participate in incident response, service restoration, root cause analysis, and post-incident reviews for critical systems and services
  • Identify recurring issues, reliability gaps, and failure patterns, and drive corrective actions through automation, architectural improvements, and process refinement
  • Contribute to on-call readiness, operational documentation, and blameless continuous improvement practices that improve resilience and reduce mean time to recovery

Skills

  • Bachelor's degree in Computer Science, Engineering, Information Technology, or a related technical field, or equivalent practical experience
  • 5+ years of experience in Site Reliability Engineering, DevOps, platform engineering, cloud operations, or related roles supporting enterprise or mission-critical environments
  • Hands-on experience supporting cloud platforms (AWS preferred), Linux-based environments, and distributed systems at scale
  • Strong experience with Infrastructure as Code and automation tools such as Terraform, Ansible, or comparable technologies
  • Experience with containers and orchestration platforms such as Kubernetes, EKS, ECS, or Docker in production environments
  • Experience building or maintaining CI/CD pipelines and deployment automation in support of secure, reliable software delivery
  • Strong understanding of monitoring, observability, incident response, root cause analysis, and performance optimization principles
  • Proficiency with one or more scripting or programming languages such as Python, Go, Bash, or PowerShell
  • Demonstrated ability to troubleshoot complex systems, automate operational tasks, and collaborate effectively across engineering and operations teams
  • Candidates must be eligible to obtain and maintain a Public Trust clearance
  • Experience supporting VA, Federal Government, or other regulated environments with strong security and compliance requirements
  • Experience defining and operationalizing SLIs, SLOs, error budgets, and service health metrics for production systems
  • Familiarity with observability platforms and tools such as Prometheus, Grafana, CloudWatch, ELK, Splunk, or OpenTelemetry
  • Experience with FedRAMP, NIST, Zero Trust, or other Federal security frameworks relevant to cloud and platform operations
  • Experience supporting healthcare platforms, high-availability enterprise services, or large-scale modernization initiatives
  • Relevant certifications such as AWS Certified DevOps Engineer, AWS Certified Solutions Architect, Certified Kubernetes Administrator (CKA), HashiCorp Terraform Associate, or SRE/DevOps certifications

Company Overview

  • i4DM provides a full range of information technology consulting services to government and commercial clients. Founded in 2002, it is headquartered in Millersville, Maryland, USA, and has a workforce of 51-200 employees. Its website is https://www.i4dm.com.