[Remote] Data Scientist / Data Analytics Engineer
Note: This is a remote position open to candidates in the USA. Transflo is seeking a Data Scientist / Data Analytics Engineer to design, build, and operationalize advanced analytics solutions for its transportation and logistics operations. This role involves delivering predictive and point-in-time analytics, building robust data pipelines on AWS, and collaborating with stakeholders to translate complex data into actionable insights.
Responsibilities
- Design, train, validate, and deploy predictive models (regression, classification, time-series forecasting, survival analysis, clustering, anomaly detection, and gradient-boosted / deep learning approaches as appropriate to the problem)
- Lead model selection, hyperparameter tuning, cross-validation, and rigorous performance evaluation using metrics aligned to business objectives (precision/recall trade-offs, MAPE, RMSE, lift, calibration, etc.)
- Develop data products in areas relevant to transportation, including operational metrics, fraud signals, pricing analytics, industry trends, etc.
- Establish model monitoring, drift detection, retraining cadence, and explainability practices (SHAP, feature importance, partial dependence) to keep production models trustworthy and operationally self-sustaining
- Produce point-in-time analytics, KPI scorecards, and exception reporting to support daily operational decisions across dispatch, fleet, customer success, finance, and product teams
- Partner with business stakeholders to translate questions into well-scoped analyses; deliver clear, defensible insights with documented assumptions and data lineage
- Build and maintain reusable analytical datasets, semantic layers, and certified metrics so the organization works from a consistent source of truth
- Build and maintain data pipelines (batch and streaming) on AWS using services such as Redshift, S3, Glue, Lambda, Step Functions, Kinesis / MSK, EMR, Athena, and SageMaker
- Implement medallion (bronze / silver / gold) architecture patterns to progressively refine raw operational data into analytics-ready and ML-ready datasets
- Apply STARR (Star schema / dimensional) modeling and related techniques to build performant, business-friendly data models in Redshift and the broader warehouse layer
- Drive data selection, curation, profiling, and quality enforcement: define source-of-truth datasets, document lineage, and codify data contracts and validation tests
- Collaborate with data engineering and platform teams on CI/CD for data and ML assets, infrastructure-as-code (e.g., Terraform / CloudFormation), and cost-aware design on AWS
- Take customer-facing analytics features and products from idea to implementation — partnering with product management, design, and engineering to turn ambiguous business questions into shipped capabilities embedded in customer-facing applications
- Contribute to product discovery: customer interviews, opportunity sizing, prototyping, and rapid iteration on analytical concepts before committing to full build-out
- Own the analytical correctness of customer-facing metrics, models, and visualizations — including definitions, edge cases, performance under real-world data conditions, and how results are explained to non-technical end users
- Define and instrument success metrics for shipped analytics features (adoption, engagement, accuracy in production, customer outcomes) and drive iterative improvements post-launch
- Translate complex analytical results into clear narratives, visualizations, and recommendations for both technical and non-technical audiences, including executive leadership and customers
- Partner cross-functionally with product, engineering, operations, and commercial teams to embed analytics into workflows, applications, and customer-facing products
- Mentor analysts and engineers on statistical rigor, modeling best practices, and modern data architecture
Skills
- Bachelor's degree in Statistics, Mathematics, or Supply Chain Management; a degree in Computer Science is also acceptable. Master's degree preferred but not required
- Demonstrated professional experience in the transportation, trucking, freight, logistics, or broader supply chain industry, with working knowledge of the underlying operational data (loads, stops, shipments, ELD/telematics, TMS, dispatch, billing, etc.)
- Proven track record of taking customer-facing analytics products or features from idea through implementation and launch — including product discovery, scoping, model and metric design, partnering with product/engineering, and supporting the feature in production with real customers. Candidates should be prepared to walk through at least one concrete example end-to-end
- Strong applied experience building advanced analytical models end-to-end, including problem framing, data selection and curation, feature engineering, model training and validation, and deployment
- Hands-on experience with AWS PaaS / analytics tooling, including Amazon Redshift and other relevant services such as S3, Glue, Lambda, Step Functions, Athena, Kinesis, EMR, and SageMaker
- Proficiency in SQL (advanced window functions, performance tuning on Redshift or comparable MPP warehouses) and at least one analytics-grade programming language — Python strongly preferred — with libraries such as pandas, scikit-learn, statsmodels, XGBoost/LightGBM, and PyTorch or TensorFlow as appropriate
- Experience designing and operating production data pipelines, with a clear understanding of orchestration, idempotency, observability, and data quality
- Solid grounding in statistical methods: hypothesis testing, experimental design, regression, time-series, and uncertainty quantification
- Master's degree in Statistics, Mathematics, Operations Research, Supply Chain, Computer Science, or a closely related quantitative field
- Experience implementing medallion architecture (bronze / silver / gold) in a cloud data lakehouse or warehouse environment
- Experience designing STARR / star-schema dimensional models for analytics consumption
- Experience with streaming and event-driven data (Kinesis, Kafka/MSK) for near-real-time analytics on transportation events
- Experience deploying and monitoring ML models in production using SageMaker, MLflow, or equivalent MLOps tooling
- Familiarity with BI / visualization tools (e.g., QuickSight, Power BI, Looker) and semantic layer / metrics layer concepts
- Exposure to optimization and operations research techniques (linear / mixed-integer programming, routing, network flow) applied to transportation problems
- Experience working with ELD/HOS data, telematics feeds, geospatial data, TMS / dispatch system data, or brokerage data, along with a general understanding of transportation back-office operations and business processes
Company Overview