[Remote] Senior Platform Engineer

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. Tango Analytics is focused on empowering businesses with innovative technology and insightful data. They are seeking a Senior Platform Engineer to help build a platform engineering function: rearchitecting their cloud infrastructure on AWS and Azure, establishing an SRE practice, and creating an Internal Developer Platform.

Responsibilities

  • Migrate all existing AWS and Azure infrastructure to OpenTofu/Terraform and Ansible; establish module standards, remote state, and GitOps-based plan/apply pipelines — no unmanaged resources
  • Audit the cloud estate against the AWS and Azure Well-Architected Frameworks; produce a remediation backlog and drive it to completion across networking, IAM, landing zones, account structure, and cost governance
  • Implement policy-as-code (OPA/Conftest, AWS SCPs, Azure Policy) to enforce security, tagging, and compliance guardrails at the platform layer — governance embedded, not bolted on
  • Build and maintain reusable Terraform modules for compute (EKS, AKS, EC2), networking, storage, databases, and identity as shared building blocks for all engineering teams
  • Define FinOps standards: tagging taxonomy, cost allocation dashboards, rightsizing recommendations, and reserved capacity planning across both clouds
  • Design and implement the full observability stack: metrics (Prometheus/Datadog), logs (Loki/OpenSearch), traces (Tempo/Datadog APM), and dashboards (Grafana) — instrumented end-to-end via OpenTelemetry
  • Define SLIs and SLOs for all platform shared services and critical applications; build error budget dashboards and burn-rate alerting — alert on symptoms, not raw metrics
  • Establish the SRE practice from scratch: incident runbooks, post-incident review templates, and at least one chaos engineering exercise (AWS FIS or equivalent)
  • Partner with engineering teams to instrument their services, define meaningful alerts, and build operational dashboards — reliability is a shared responsibility, not a platform team tax
  • Build capacity planning models for compute and storage so engineering leadership can make data-driven scaling decisions
  • Deploy and operate a developer portal (Backstage, GitHub or equivalent) as the single front door: service catalog, scaffolding templates, runbooks, API docs, and on-call ownership all in one place
  • Build and maintain golden paths for the highest-frequency developer workflows: new service creation, Kubernetes deployment, database provisioning, secrets management, and CI/CD pipeline setup, with opinionated defaults and escape hatches for legitimate edge cases
  • Own the CI/CD platform layer: standardized pipeline templates (GitHub Actions, GitLab CI), reusable workflow libraries, container image build and scan pipelines, and environment promotion workflows with security scanning (SAST, Snyk) built in by default
  • Own Kubernetes platform operations: EKS and/or AKS cluster lifecycle, Helm chart standards, admission controllers, RBAC, network policies, and service mesh (Istio or Linkerd)
  • Build the self-service provisioning layer — Backstage scaffolder actions and Terraform automation so developers can provision approved resources without raising a ticket
  • Measure adoption and run regular feedback sessions with engineering teams; iterate on golden paths based on real friction, not assumptions
  • Partner with peer managers and teams to plan and support migration of existing workloads onto the platform; provide hands-on migration support, not just documentation
  • Embed security by default across all platform work: IaC scanning (Checkov, tfsec), secrets management (Vault, AWS Secrets Manager, Azure Key Vault), RBAC, and container image hardening
  • Write clear technical documentation, architecture decision records (ADRs), and runbooks; raise the documentation bar for the whole team
  • Mentor and support more junior platform engineers; contribute to architecture reviews and build-vs-buy decisions alongside the Platform Engineering Manager

Skills

  • Applicants must be authorized to work in the U.S. for any employer
  • We cannot sponsor employment-based visas at this time
  • 5+ years in platform, infrastructure, or DevOps engineering with direct production ownership on AWS and/or Azure
  • Deep OpenTofu/Terraform proficiency: module authoring, state management, workspace strategy, remote backends, and CI/CD integration; Terramate a plus
  • Strong Kubernetes operations: EKS and/or AKS cluster lifecycle, Helm, admission controllers, RBAC, network policies, and autoscaling
  • Hands-on observability experience with two or more of: Prometheus, Grafana, Loki, Tempo, Datadog, or OpenTelemetry — including SLI/SLO definition and alert engineering
  • CI/CD platform experience: GitHub Actions pipeline authoring, reusable workflow design, and container build/scan pipeline ownership
  • GitOps: ArgoCD or Flux for Kubernetes continuous delivery; progressive delivery patterns (canary, blue-green) a strong plus
  • IDP experience: Backstage or equivalent developer portal, GitHub, scaffolding templates, service catalog design, or self-service provisioning tooling
  • Security-first mindset: policy-as-code, IaC scanning, secrets management, container hardening, and shift-left security practices
  • Strong communication and documentation skills; comfortable presenting architecture decisions to engineering peers and leadership
  • SRE background: chaos engineering (AWS FIS, Chaos Monkey), error budget management, incident command, and capacity planning
  • Service mesh depth: Istio or Linkerd — mTLS, traffic management, and observability integration
  • FinOps tooling (Kubecost, CloudHealth) and reserved capacity planning experience
  • Familiarity with AI/ML infrastructure basics: LLM API integration or model serving, as the platform will need to support these workloads
  • Certifications: AWS Solutions Architect Associate/Professional, CKA/CKAD, Azure Administrator/Solutions Architect, HashiCorp Terraform Associate
  • Python or Go for platform tooling and CLI development

Benefits

  • Health, dental, and vision insurance
  • A 401(k) plan with company match
  • Generous paid time off
  • Flexible work environment: whether remote, hybrid, or in-office, we support work arrangements that promote productivity and balance

Company Overview

  • Tango builds software that unites real estate, lease accounting, and facilities management into a single platform. It was founded in 2008 and is headquartered in Dallas, Texas, USA, with a workforce of 201-500 employees. Its website is https://tangoanalytics.com/.