[Remote] Senior/Staff/Principal Full-stack, Infra, Platform, Backend, Storage SWEs
Note: This is a remote role open to candidates in the USA. BRK Tech, a Berkshire Hathaway Group Company, is seeking Senior, Staff, and Principal Software Engineers across several domains. The roles focus on designing, building, and operating technology platforms that support large-scale systems across multiple industries.
Responsibilities
- Design, build, and operate modern technology platforms that run at massive scale and support long‑term, real‑world impact
- Build and operate enterprise messaging platforms
- Operate and scale large distributed production systems
- Build and operate enterprise observability platforms
Skills
- Bachelor's degree, or equivalent practical experience (an additional 4+ years)
- Senior Engineers: typically 6–8+ years of relevant experience with strong hands‑on ownership
- Principal Engineers: typically 10+ years of experience with deep technical leadership and architecture ownership
- 6+ years of professional software development experience
- Strong proficiency in at least one major language (Java, Go, Python, C#, or similar)
- Hands‑on experience building distributed systems and platform services
- Backend and platform development: microservices, APIs, service‑to‑service integrations
- Messaging and event‑driven systems: Kafka, RabbitMQ, or similar
- Data platforms: SQL, NoSQL, and graph databases; caching systems
- CI/CD pipelines, Git workflows, and modern DevOps practices
- Containers, Kubernetes, and/or serverless runtimes
- Familiarity with observability stacks (metrics, logging, tracing, alerting)
- Hands‑on experience with Ceph‑based software‑defined storage (SDS): object, block, and file
- Storage protocols: NVMe‑oF (TCP), S3, NFS, SMB/CIFS
- Designing and scaling high‑availability, multi‑tenant storage platforms
- Data resiliency and recovery models (N+2 / N+3, erasure coding, DR)
- Storage integration with Kubernetes, virtualization, and bare metal
- Strong Linux systems expertise
- Automation using Python, Go, Bash, or C/C++
- Experience with OpenStack, Nutanix, VMware, or KubeVirt
- Architecting high‑performance storage fabrics
- Expertise in NVMe‑oF, RDMA, and QoS
- Deep routing knowledge: BGP, OSPF
- Experience with Cisco, Palo Alto, NVIDIA/Mellanox
- Designing and validating N+2 / N+3 resilient architectures
- Linux networking and open‑source ecosystems
- Strong grounding in network security
- Performance tuning for distributed storage (Ceph/Swift) at scale
- Kubernetes and virtualization optimization (KubeVirt a plus)
- Advanced Linux tuning: kernel parameters, eBPF, I/O profiling
- High‑performance hardware and networking (PCIe Gen5, NVMe‑oF/TCP)
- NUMA, CPU pinning, cgroups, and resource isolation
- Quantitative modeling of latency and throughput
- Operating and scaling large distributed production systems
- Strong foundation in SRE principles
- Deep experience with Kubernetes, Linux internals, and automation
- Infrastructure tooling using Go, Python, or Java
- Observability across metrics, logging, tracing, and alerting
- High‑availability and data resiliency architectures
- Leadership during high‑severity production incidents
- Ownership of platforms with 24/7 operational responsibility
- Building and operating enterprise messaging platforms
- Deep expertise with Kafka and/or RabbitMQ, including multi‑datacenter deployments
- Messaging architecture: topic/queue design, durability, HA, scaling
- Event‑driven and integration‑heavy environments
- Kubernetes‑based platforms (on‑prem and/or cloud)
- API‑driven integrations and distributed systems fundamentals
- Strong production ownership mindset
- Building and operating enterprise observability platforms
- Strong open‑source background
- Expertise across logging, metrics, distributed tracing, and alerting
- Kubernetes telemetry pipelines and centralized ingestion
- Code‑level visibility into latency, errors, and service dependencies
- Alerting driven by SLOs, SLAs, and latency thresholds
- Public cloud exposure (AWS, Azure, GCP)
Company Overview