Projects archive

More data science and analytics projects.

This page collects some of my coursework and projects.

Featured

Core portfolio projects.

Real-Time Pitch Type Classification for MLB Broadcast

Built a proof-of-concept real-time pitch-type classifier for MLB broadcast overlays, trained on Statcast pitch-tracking data as a per-pitcher model for Kevin Gausman.

Problem

Broadcast overlays need fast, pitcher-specific pitch type predictions that can support real-time on-air usage.

Approach

Implemented a scikit-learn pipeline with imputation, scaling, and one-hot encoding, tuned a KNN model with grid search over stratified cross-validation folds, and evaluated it on a time-based 2025 holdout set.
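A minimal sketch of how such a pipeline might be wired together. It runs on synthetic data, and the column names, the parameter grid, and the season-based split are all assumptions for illustration, not the project's actual features or settings.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for Statcast-style pitch data (all columns hypothetical).
rng = np.random.default_rng(0)
n = 600
pitch_type = rng.choice(["FF", "FS", "SL"], size=n)
speed_by_type = {"FF": 95.0, "FS": 86.0, "SL": 84.0}
spin_by_type = {"FF": 2300.0, "FS": 1400.0, "SL": 2500.0}
df = pd.DataFrame({
    "season": np.where(np.arange(n) < 450, 2024, 2025),
    "release_speed": np.array([speed_by_type[p] for p in pitch_type]) + rng.normal(0, 1.0, n),
    "spin_rate": np.array([spin_by_type[p] for p in pitch_type]) + rng.normal(0, 80.0, n),
    "batter_stand": rng.choice(["L", "R"], size=n),
    "pitch_type": pitch_type,
})
# Knock out a few values so the imputation step has something to do.
df.loc[rng.choice(n, 20, replace=False), "spin_rate"] = np.nan

numeric = ["release_speed", "spin_rate"]
categorical = ["batter_stand"]

preprocess = ColumnTransformer([
    # Numeric features: median imputation, then standardization (KNN is scale-sensitive).
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    # Categorical features: one-hot encoding, ignoring categories unseen in training.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([("preprocess", preprocess), ("knn", KNeighborsClassifier())])

# Time-based split: earlier seasons for tuning, 2025 held out for evaluation.
train = df[df["season"] < 2025]
holdout = df[df["season"] == 2025]

search = GridSearchCV(
    model,
    param_grid={"knn__n_neighbors": [3, 5, 11], "knn__weights": ["uniform", "distance"]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(train[numeric + categorical], train["pitch_type"])
holdout_accuracy = search.score(holdout[numeric + categorical], holdout["pitch_type"])
print(search.best_params_, round(holdout_accuracy, 3))
```

Keeping the preprocessing inside the pipeline means the imputer and scaler are refit on each cross-validation fold, so no statistics leak from validation folds or the holdout set into tuning.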

Outcome

Achieved 0.985 accuracy and used a normalized confusion matrix to identify class-specific errors and define conservative deployment rules for broadcast use.
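To illustrate the idea, here is a small sketch of a row-normalized confusion matrix and one possible conservative rule. The labels are toy data, and the `MIN_RECALL` threshold and the "only show trusted classes" rule are hypothetical examples, not the project's actual deployment rules.

```python
from collections import Counter

def normalized_confusion(y_true, y_pred, labels):
    """Row-normalize counts so each row gives the share of a true class
    that was predicted as each label."""
    counts = Counter(zip(y_true, y_pred))
    matrix = {}
    for t in labels:
        row_total = sum(counts[(t, p)] for p in labels)
        matrix[t] = {
            p: (counts[(t, p)] / row_total if row_total else 0.0) for p in labels
        }
    return matrix

# Toy labels for illustration only, not results from the project.
labels = ["FF", "FS", "SL"]
y_true = ["FF", "FF", "FF", "FF", "FS", "FS", "SL", "SL", "SL", "SL"]
y_pred = ["FF", "FF", "FF", "FF", "FS", "SL", "SL", "SL", "SL", "FS"]

cm = normalized_confusion(y_true, y_pred, labels)

# Hypothetical rule: treat a class as safe to display only when its
# per-class recall (the diagonal entry) clears a chosen threshold.
MIN_RECALL = 0.7
trusted = [label for label in labels if cm[label][label] >= MIN_RECALL]

for label in labels:
    print(label, {p: round(v, 2) for p, v in cm[label].items()})
print("trusted classes:", trusted)
```

Normalizing by row makes rare pitch types comparable with common ones, which overall accuracy alone would hide.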

Data Science, Python, Scikit-learn, KNN

Credit Overheating Early-Warning Model

Built an end-to-end early-warning classifier on a 25-year BIS credit dataset to flag above-trend credit growth and produce country-level risk scoring.

Problem

Credit risk monitoring needed a structured way to flag overheating markets and prioritize deeper review at the country level.

Approach

Conducted rigorous EDA and data QA by removing aggregate regions, resolving valuation-method inconsistencies, and validating long-run credit dynamics across sectors before model development.
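The QA steps described above might look roughly like the following pandas sketch. The column names, the list of aggregate regions, and the choice of a single preferred valuation method are assumptions made for illustration; the toy rows are not BIS data.

```python
import pandas as pd

# Toy long-format rows; column names and category labels are assumptions.
raw = pd.DataFrame({
    "country": ["Canada", "Canada", "Canada", "Euro area", "Japan", "Japan"],
    "sector": ["Households"] * 6,
    "valuation": ["Market value", "Market value", "Nominal value",
                  "Market value", "Market value", "Market value"],
    "date": pd.to_datetime(["2020-03-31", "2020-06-30", "2020-03-31",
                            "2020-03-31", "2020-03-31", "2020-03-31"]),
    "credit_to_gdp": [100.1, 104.9, 99.8, 60.2, 58.0, 58.0],
})

# Hypothetical list of aggregate regions to exclude from country-level scoring.
AGGREGATES = {"Euro area", "G20 (aggregate)", "All reporting economies"}
# Assumed resolution of the valuation inconsistency: keep one method throughout.
PREFERRED_VALUATION = "Market value"

is_country = ~raw["country"].isin(AGGREGATES)
is_preferred = raw["valuation"] == PREFERRED_VALUATION

clean = (
    raw[is_country & is_preferred]
    .drop_duplicates(subset=["country", "sector", "date"])
    .sort_values(["country", "sector", "date"])
    .reset_index(drop=True)
)

# QA check: at most one observation per country, sector, and date.
assert not clean.duplicated(subset=["country", "sector", "date"]).any()
print(clean)
```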

Outcome

Produced country-level risk scoring and stakeholder-friendly visual outputs, including a country risk map, with model performance reaching an AUC of 0.69.

Python, Pandas, Scikit-learn, Data Cleaning, EDA, Data Visualization

U.S. MSA House Price Index Dashboard

Developed an interactive R Shiny dashboard to explore quarterly House Price Index trends across more than 400 U.S. metro areas from 2000 to the present.

Problem

Housing market analysis needed a faster way to compare price trends across states and metro areas without manual filtering.

Approach

Built a clean analysis dataset with tidyverse, engineered time features, standardized state and metro identifiers, and implemented reactive UI filters for state, metro area, and year range.

Outcome

Enabled reliable market trend comparisons with linked time-series plots and a drilldown table for validation in a single dashboard.

R Shiny, Dashboard, tidyverse

More work

Additional projects to explore.

2026 Datathon Intraday Contact Center Demand Forecasting

Developed a Python-based forecasting pipeline in AWS SageMaker AI to predict 30-minute call volume, abandon rate, and customer care time across four portfolios.

Problem

The datathon problem required interval-level forecasts that could support staffing, scheduling, and contact center operations planning.

Approach

Cleaned and merged multi-sheet Excel and CSV operational data, engineered calendar, lag, and rolling-window features, and built regularized scikit-learn regression pipelines with imputation, scaling, and target transformation.
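As a rough sketch of the feature engineering and modeling pattern described above, the example below builds calendar, lag, and rolling-window features on synthetic 30-minute data and fits a ridge regression with a log target transformation. The feature names, lag choices, `alpha`, and the `log1p` transform are assumptions, not the settings used in the datathon.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 30-minute call volumes for one portfolio (not the datathon data).
rng = np.random.default_rng(1)
idx = pd.date_range("2026-01-05", periods=48 * 28, freq="30min")
hours = idx.hour.to_numpy() + idx.minute.to_numpy() / 60
calls = rng.poisson(50 + 40 * np.sin(2 * np.pi * hours / 24))
df = pd.DataFrame({"calls": calls}, index=idx)

# Calendar features.
df["hour"] = hours
df["dayofweek"] = idx.dayofweek.to_numpy()
# Lag features: same interval one day and one week earlier (48 intervals per day).
df["lag_1d"] = df["calls"].shift(48)
df["lag_7d"] = df["calls"].shift(48 * 7)
# Rolling mean over the previous day; shift(1) keeps the current interval out.
df["roll_mean_1d"] = df["calls"].shift(1).rolling(48).mean()

features = ["hour", "dayofweek", "lag_1d", "lag_7d", "roll_mean_1d"]

# Chronological split: the final week is held out.
split = len(df) - 48 * 7
train, test = df.iloc[:split], df.iloc[split:]

model = TransformedTargetRegressor(
    regressor=Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fills early rows lacking lags
        ("scale", StandardScaler()),
        ("ridge", Ridge(alpha=1.0)),                   # L2-regularized regression
    ]),
    func=np.log1p,         # fit on log(1 + calls) to damp the skew in counts
    inverse_func=np.expm1,  # map predictions back to the original scale
)
model.fit(train[features], train["calls"])
pred = model.predict(test[features])
mae = float(np.mean(np.abs(pred - test["calls"].to_numpy())))
print(round(mae, 2))
```

Shifting before rolling ensures every feature for an interval depends only on earlier intervals, which matters when the forecasts feed staffing decisions made ahead of time.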

Outcome

Generated submission-ready interval forecasts and summarized results for operations planning in a 36-hour datathon setting.

Data Science, Python, AWS SageMaker AI, Forecasting