[Remote] Mid-Level Data Scientist
Note: This is a remote position open to candidates in the USA. Simple Technology Solutions is committed to prioritizing its people while delivering exceptional solutions to Federal Government clients. The company is seeking a Mid-Level Data Scientist to join its federal data engineering team, where the role involves building AI/ML capabilities and delivering analytical products that support critical government decision-making.
Responsibilities
- Build and maintain knowledge bases, vector stores, and Retrieval-Augmented Generation (RAG) pipelines using Amazon Bedrock and Amazon OpenSearch Service to make financial and regulatory datasets AI-ready for advanced analytics and machine learning consumption
- Support the development, validation, and operationalization of statistical outputs and derived data products; coordinate with the agency data science team and SME data scientists to implement Airflow DAGs and AWS Glue jobs that ensure automated, recurring updates
- Support transition of data science outputs into production by validating accuracy, completeness, and reporting readiness; ensure all production data products are incorporated into the agency's ETL load and gap reporting infrastructure
- Develop and validate machine learning models and analytical pipelines using large-scale financial and regulatory datasets in the data lake
- Leverage AI-assisted development tools for code generation, debugging, and performance tuning; adhere to agency security standards and applicable federal AI governance requirements
- Write Python 3.10 code conforming to PEP 8; integrate analytical pipelines with the agency's ETL metadata infrastructure and produce required load and gap reporting outputs
- Support entity resolution work to ensure consistent identification and linkage of records across high-volume financial datasets
- Produce required documentation for all analytical models and pipelines: methodology, data lineage, model assumptions, refresh schedules, and IV&V Questionnaires
- Write automated tests achieving the 90% minimum code coverage threshold; complete security scans at least once per sprint as part of the Definition of Done per OWASP ASVS Level 2
- Participate in 2-week sprint ceremonies, quarterly PI planning, backlog refinement, and agile delivery using Jira and GitHub
Skills
- US Citizenship is required
- Bachelor's Degree is required
- A minimum of 3-5 years of position-related experience is required
- Bachelor's degree or higher in Data Science, Statistics, Computer Science, Mathematics, or a related quantitative field
- 3-5 years of experience in data science, machine learning engineering, or quantitative analytics
- Proficiency in Python 3.10 (PEP 8) including pandas, NumPy, scikit-learn, and related libraries
- Hands-on experience with Amazon Bedrock, knowledge bases, vector stores, and RAG pipeline design on AWS
- Experience with Amazon OpenSearch Service or equivalent vector/search infrastructure
- Experience with Apache Airflow (MWAA) for DAG-based pipeline orchestration
- Familiarity with AWS Glue, S3, and Apache Spark for large-scale data processing
- Experience with SQL and query tools such as Trino, Athena, or Redshift
- Must be able to work 8am-5pm Eastern Time regardless of home location
- An active federal public trust suitability determination, or the ability to obtain one, is required
- Experience working with large-scale financial or regulatory datasets is strongly preferred
- Knowledge of federal AI governance requirements and responsible AI practices in a government setting
- Experience with agile development, CI/CD pipelines, GitHub, and sprint-based delivery
- Familiarity with FISMA, NIST 800-53, and Zero Trust principles
Benefits
- Flexibility to help team members thrive personally and professionally
- Special incentives for team members living in qualified HUBZones
Company Overview