[Remote] Member of Engineering (Pre-training / Data Acquisition)

Remote · USA · Full-time

Note: This is a remote role open to candidates in the USA. Poolside is a company focused on building a world where AI drives economically valuable work and scientific progress. The role involves working with the pre-training data team to acquire high-quality pre-training data for frontier models, keeping acquisition aligned with training needs, and building systems for efficient data acquisition.

Responsibilities

  • Design, build, and operate a large-scale web crawler responsible for acquiring all openly accessible data on the internet
  • Develop specialized deep crawlers targeting high-value sources to improve recall and coverage
  • In collaboration with data researchers, own a long-term roadmap for data acquisition
  • Build observability, monitoring, and debugging tooling to ensure reliability and transparency across crawl infrastructure
  • Collaborate with pre-training, post-training, and evaluations teams to align data acquisition priorities with model training needs
  • Build high-throughput ingestion pipelines for rapidly onboarding partner data and evaluating it for quality

Skills

  • Strong distributed systems background with proven experience building and operating large-scale infrastructure such as data pipelines, web crawlers, or similar systems
  • Proficiency in Python and comfort optimizing performance and debugging complex systems under production conditions
  • Hands-on experience with web crawling or large-scale data extraction: understanding of HTTP protocols, distributed job queues, and data parsing at scale
  • Familiarity with cloud platforms (AWS) and container orchestration (Kubernetes, Docker) for deploying and managing high-throughput workloads
  • Awareness of the non-technical dimensions of internet-scale crawling: data privacy, robots.txt adherence, and responsible crawl practices
  • Prior experience pre-training LLMs
  • Experience in building trillion-scale SOTA pre-training datasets
  • Experience translating research to production at scale

Benefits

  • Fully remote work & flexible hours
  • 37 days/year of vacation & holidays
  • 16 weeks of flexible, full-pay parental leave
  • Health insurance allowance for you & dependents
  • Company-provided equipment
  • Well-being, always-be-learning & home office allowances
  • Frequent team get-togethers
  • Diverse & inclusive people-first culture

Company Overview

  • Poolside is an artificial intelligence platform that offers foundation models and infrastructure for writing software code. It was founded in 2023 and is headquartered in San Francisco, California, USA, with a workforce of 51-200 employees. Its website is http://www.poolside.ai.
Company H1B Sponsorship

  • Poolside has a track record of offering H1B sponsorship, with 1 sponsorship in 2025. Please note that this does not guarantee sponsorship for this specific role.