[Remote] Principal Network Architect - AI Infrastructure

Remote · USA Full-time New today

Note: This is a remote role open to candidates in the USA. Nscale is a GPU cloud company built for AI, providing high-performance infrastructure for AI startups and enterprises. It is seeking a Principal Network Architect to lead the development and operational excellence of its global AI networking infrastructure, with a focus on RDMA and InfiniBand technologies to improve AI training outcomes.

Responsibilities

  • Own the technical direction and operational lifecycle management of Nscale’s high-performance RDMA network fabrics
  • Define long-term architecture, reliability strategy, and operational standards for AI interconnect networks
  • Lead availability and performance improvement initiatives across globally distributed GPU clusters
  • Act as a technical authority (SME) across networking, influencing platform-wide decisions
  • Support the design, build, and evolution of large-scale InfiniBand and RoCE fabrics
  • Drive deep debugging and resolution of complex cross-layer issues (hardware, firmware, kernel, distributed workloads)
  • Lead incident response and postmortems, ensuring systemic fixes and long-term improvements
  • Define and enforce standards across congestion control and traffic engineering; routing (BGP, ECMP, fabric-level routing strategies); firmware lifecycle and change management; and network observability and telemetry
  • Develop and scale automation frameworks for network provisioning, validation, and operations
  • Build tooling to support high-reliability, low-touch network operations at scale
  • Improve operational efficiency across hundreds of thousands of endpoints and high-throughput links
  • Lead complex technical initiatives across Network, SRE, Compute, and Platform teams
  • Serve as technical lead on critical programs, coordinating engineers and stakeholders
  • Influence product and infrastructure roadmaps based on operational insights and customer needs
  • Mentor senior engineers and raise the bar for technical rigor and execution

Skills

  • 10+ years of experience in network engineering in hyperscale, AI, or HPC environments
  • Deep expertise in RDMA, InfiniBand, and/or large-scale RoCE fabrics
  • Strong understanding of RDMA internals and performance tuning
  • Strong understanding of congestion control and fabric failure modes
  • Strong understanding of distributed system communication patterns
  • Expert-level knowledge of data center networking protocols (BGP, OSPF, ECMP)
  • Proven ability to debug multi-layer issues across network, system, and application layers
  • Strong programming/scripting skills for automation (Python, Go, etc.)
  • Experience designing high-scale, highly available network systems
  • Demonstrated ability to lead complex technical programs without direct authority
  • Experience acting as a senior escalation point for critical production issues
  • Strong ability to drive cross-team alignment and execution
  • Systems-level thinking balancing performance, reliability, scalability, and cost
  • Experience with NVIDIA / Mellanox networking platforms
  • Familiarity with distributed AI training frameworks and GPU communication patterns
  • Experience building network observability systems at scale
  • Background influencing infrastructure strategy in high-growth environments

Benefits

  • Highly competitive package (base + equity) with reviews every 12 months.
  • Join the fastest-growing tech startup: your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI.
  • Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
  • Human-First Flexibility: We treat you as a human first. Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.
  • Join our thriving remote-first team. Geography is no barrier to impact or connection. We build seamless virtual collaboration, empowering you, wherever you work.

Company Overview

  • Nscale builds AI data centers and provides GPU cloud infrastructure that companies use to train, run, and scale large AI models. Founded in 2024 and headquartered in London, England, the company has 201-500 employees. Its website is https://www.nscale.com.