[Remote] Senior Site Reliability Engineer

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. i4DM provides Federal agencies with access to skilled professionals for complex mission challenges. The company is seeking a Senior Site Reliability Engineer to strengthen site reliability engineering and cloud operations for VA enterprise healthcare platforms, with a focus on automation and resilient service delivery.

Responsibilities

  • Partner with the Technical Director to implement and mature Site Reliability Engineering (SRE) practices across platform services and hosted applications
  • Improve the full service lifecycle from design and deployment through operation and continuous refinement, with a focus on availability, latency, performance, efficiency, and capacity
  • Define, track, and report service level indicators (SLIs), service level objectives (SLOs), and error budgets to guide engineering decisions and service improvements
  • Build, enhance, and maintain CI/CD pipelines that enable secure, automated, and repeatable application and infrastructure delivery
  • Develop and support Infrastructure as Code (IaC) and configuration automation using tools such as Terraform and Ansible to improve consistency, speed, and auditability
  • Integrate automated testing, validation, and security checks into delivery workflows to improve release quality and reduce change-related risk
  • Design and improve monitoring, logging, tracing, alerting, and dashboards to strengthen observability and accelerate issue detection and response
  • Analyze system behavior and performance trends to improve reliability, scalability, and operational efficiency across distributed and cloud-native environments
  • Reduce operational toil by automating repetitive tasks, improving runbooks, and engineering sustainable solutions for recurring operational issues
  • Support cloud infrastructure and platform services in AWS and containerized environments such as Kubernetes, ensuring systems are resilient, scalable, and secure
  • Contribute to platform modernization efforts by improving deployment patterns, environment consistency, and operational readiness for cloud-native services
  • Assist with capacity planning, reliability reviews, and architectural improvements to support growth, resilience, and mission continuity
  • Implement reliability engineering practices that align with Federal security requirements, including secure configuration, least privilege, vulnerability remediation, and policy-based controls
  • Partner with cybersecurity and engineering teams to support secure-by-design infrastructure and application delivery practices
  • Help ensure operational processes and automation align with compliance expectations for Federal and VA environments
  • Collaborate with development, platform, operations, monitoring, incident management, and architecture teams to improve service reliability and deployment outcomes
  • Work closely with the Technical Director and team leads to translate technical direction into actionable engineering improvements and operational standards
  • Support Agile and SAFe delivery practices by helping teams adopt reliable release processes, operational readiness checks, and continuous improvement measures
  • Participate in incident response, service restoration, root cause analysis, and post-incident reviews for critical systems and services
  • Identify recurring issues, reliability gaps, and failure patterns, and drive corrective actions through automation, architectural improvements, and process refinement
  • Contribute to on-call readiness, operational documentation, and blameless continuous improvement practices that improve resilience and reduce mean time to recovery

Skills

  • Bachelor's degree in Computer Science, Engineering, Information Technology, or a related technical field, or equivalent practical experience
  • 5+ years of experience in Site Reliability Engineering, DevOps, platform engineering, cloud operations, or related roles supporting enterprise or mission-critical environments
  • Hands-on experience supporting cloud platforms (AWS preferred), Linux-based environments, and distributed systems at scale
  • Strong experience with Infrastructure as Code and automation tools such as Terraform, Ansible, or comparable technologies
  • Experience with containers and orchestration platforms such as Kubernetes, EKS, ECS, or Docker in production environments
  • Experience building or maintaining CI/CD pipelines and deployment automation in support of secure, reliable software delivery
  • Strong understanding of monitoring, observability, incident response, root cause analysis, and performance optimization principles
  • Proficiency with one or more scripting or programming languages such as Python, Go, Bash, or PowerShell
  • Demonstrated ability to troubleshoot complex systems, automate operational tasks, and collaborate effectively across engineering and operations teams
  • Candidates must be eligible to obtain and maintain a Public Trust clearance
  • Experience supporting VA, Federal Government, or other regulated environments with strong security and compliance requirements
  • Experience defining and operationalizing SLIs, SLOs, error budgets, and service health metrics for production systems
  • Familiarity with observability platforms and tools such as Prometheus, Grafana, CloudWatch, ELK, Splunk, or OpenTelemetry
  • Experience with FedRAMP, NIST, Zero Trust, or other Federal security frameworks relevant to cloud and platform operations
  • Experience supporting healthcare platforms, high-availability enterprise services, or large-scale modernization initiatives
  • Relevant certifications such as AWS Certified DevOps Engineer, AWS Certified Solutions Architect, Certified Kubernetes Administrator (CKA), HashiCorp Terraform Associate, or SRE/DevOps certifications

Company Overview

  • i4DM provides a full range of information technology consulting services to government and commercial clients. It was founded in 2002 and is headquartered in Millersville, Maryland, USA, with a workforce of 51-200 employees. Its website is https://www.i4dm.com.