[Remote] Senior Platform Engineer
Note: This is a remote position open to candidates in the USA. Tango Analytics is focused on empowering businesses with innovative technology and insightful data. The company is seeking a Senior Platform Engineer to help build its platform engineering function: rearchitecting its cloud infrastructure on AWS and Azure, establishing an SRE practice, and creating an Internal Developer Platform.
Responsibilities
- Migrate all existing AWS and Azure infrastructure to OpenTofu/Terraform and Ansible; establish module standards, remote state, and GitOps-based plan/apply pipelines — no unmanaged resources
- Audit the cloud estate against the AWS and Azure Well-Architected Frameworks; produce a remediation backlog and drive it to completion across networking, IAM, landing zones, account structure, and cost governance
- Implement policy-as-code (OPA/Conftest, AWS SCPs, Azure Policy) to enforce security, tagging, and compliance guardrails at the platform layer — governance embedded, not bolted on
- Build and maintain reusable Terraform modules for compute (EKS, AKS, EC2), networking, storage, databases, and identity as shared building blocks for all engineering teams
- Define FinOps standards: tagging taxonomy, cost allocation dashboards, rightsizing recommendations, and reserved capacity planning across both clouds
- Design and implement the full observability stack: metrics (Prometheus/Datadog), logs (Loki/OpenSearch), traces (Tempo/Datadog APM), and dashboards (Grafana) — instrumented end-to-end via OpenTelemetry
- Define SLIs and SLOs for all platform shared services and critical applications; build error budget dashboards and burn-rate alerting — alert on symptoms, not raw metrics
- Establish the SRE practice from scratch: incident runbooks, post-incident review templates, and at least one chaos engineering exercise (AWS FIS or equivalent)
- Partner with engineering teams to instrument their services, define meaningful alerts, and build operational dashboards — reliability is a shared responsibility, not a platform team tax
- Build capacity planning models for compute and storage so engineering leadership can make data-driven scaling decisions
- Deploy and operate a developer portal (Backstage, GitHub, or equivalent) as the single front door: service catalog, scaffolding templates, runbooks, API docs, and on-call ownership all in one place
- Build and maintain golden paths for the highest-frequency developer workflows (new service creation, Kubernetes deployment, database provisioning, secrets management, and CI/CD pipeline setup), with opinionated defaults and escape hatches for legitimate edge cases
- Own the CI/CD platform layer: standardized pipeline templates (GitHub Actions, GitLab CI), reusable workflow libraries, container image build and scan pipelines, and environment promotion workflows with security scanning (SAST, Snyk) built in by default
- Own Kubernetes platform operations: EKS and/or AKS cluster lifecycle, Helm chart standards, admission controllers, RBAC, network policies, and service mesh (Istio or Linkerd)
- Build the self-service provisioning layer — Backstage scaffolder actions and Terraform automation so developers can provision approved resources without raising a ticket
- Measure adoption and run regular feedback sessions with engineering teams; iterate on golden paths based on real friction, not assumptions
- Partner with peer managers and teams to plan and support migration of existing workloads onto the platform; provide hands-on migration support, not just documentation
- Embed security by default across all platform work: IaC scanning (Checkov, tfsec), secrets management (Vault, AWS Secrets Manager, Azure Key Vault), RBAC, and container image hardening
- Write clear technical documentation, architecture decision records (ADRs), and runbooks; raise the documentation bar for the whole team
- Mentor and support more junior platform engineers; contribute to architecture reviews and build-vs-buy decisions alongside the Platform Engineering Manager
Skills
- Applicants must be authorized to work in the U.S. for any employer
- We cannot sponsor employment-based visas at this time
- 5+ years in platform, infrastructure, or DevOps engineering with direct production ownership on AWS and/or Azure
- Deep OpenTofu/Terraform proficiency: module authoring, state management, workspace strategy, remote backends, and CI/CD integration; Terramate a plus
- Strong Kubernetes operations: EKS and/or AKS cluster lifecycle, Helm, admission controllers, RBAC, network policies, and autoscaling
- Hands-on observability experience with two or more of: Prometheus, Grafana, Loki, Tempo, Datadog, or OpenTelemetry — including SLI/SLO definition and alert engineering
- CI/CD platform experience: GitHub Actions pipeline authoring, reusable workflow design, and container build/scan pipeline ownership
- GitOps: ArgoCD or Flux for Kubernetes continuous delivery; progressive delivery patterns (canary, blue-green) a strong plus
- IDP experience: Backstage or equivalent developer portal, GitHub, scaffolding templates, service catalog design, or self-service provisioning tooling
- Security-first mindset: policy-as-code, IaC scanning, secrets management, container hardening, and shift-left security practices
- Strong communication and documentation skills; comfortable presenting architecture decisions to engineering peers and leadership
- SRE background: chaos engineering (AWS FIS, Chaos Monkey), error budget management, incident command, and capacity planning
- Service mesh depth: Istio or Linkerd — mTLS, traffic management, and observability integration
- FinOps tooling (Kubecost, CloudHealth) and reserved capacity planning experience
- Familiarity with AI/ML infrastructure basics: LLM API integration or model serving, as the platform will need to support these workloads
- Certifications: AWS Solutions Architect Associate/Professional, CKA/CKAD, Azure Administrator/Solutions Architect, HashiCorp Terraform Associate
- Python or Go for platform tooling and CLI development
Benefits
- Health, dental, and vision insurance
- A 401(k) plan with company match
- Generous paid time off
- Flexible work environment: whether remote, hybrid, or in-office, we support work arrangements that promote productivity and balance
Company Overview