[Remote] Tech - Staff AI Engineer, Agent Platform
Note: This is a remote position open to candidates in the USA. Onpoint Healthcare Partners builds Iris, a Medical Agent AI platform that enhances administrative efficiency for healthcare providers. They are seeking a Staff AI Engineer to design, build, and scale the systems behind Iris, ensuring that AI solutions deliver measurable business value and improve patient care.
Responsibilities
- Design and develop AI applications and agent workflows
- Build scalable backend services, APIs, and integrations supporting AI solutions
- Evolve and steward the agent platform architecture, including orchestration, runtime safety, and prompt governance
- Treat prompts and tool schemas as versioned code with staged rollouts and rollback
- Develop and maintain evaluation frameworks for AI agents and models, with CI gates that block bad changes before release
- Design retrieval, memory, and context management strategies
- Build observability, monitoring, and debugging capabilities, including full trace and replay for incident resolution
- Build the MLOps foundation the team is currently missing: training and retraining pipelines, model versioning, model registry, and feature stores, so model work stops being reactive
- Stand up A/B testing infrastructure and automated drift detection that triggers retraining, so model quality is monitored and maintained without manual firefighting
- Track token and tool costs per run, workflow, and tenant, and keep costs predictable as usage scales
- Improve reliability, performance, safety, and cost efficiency of production AI systems
- Partner with Product, Clinical, Data Science, and Engineering teams to deliver AI capabilities
- Own AI deployment, monitoring, and operational excellence
Skills
- 8+ years of software engineering experience
- 3+ years building AI and ML applications
- Strong Python and/or C# development experience
- Experience deploying AI systems into production environments
- Experience with LLMs, RAG architectures, agent frameworks, and AI evaluation
- Experience with AWS and distributed systems
- Hands-on experience building MLOps infrastructure such as training pipelines, model registries, feature stores, A/B testing, and drift detection
- Strong debugging, observability, and operational skills
- Healthcare industry experience
- Experience with HIPAA, PHI, and regulated environments
- Experience with vector databases, embeddings, and knowledge graphs
- Experience building AI systems at large scale
Company Overview