Real-Time Pitch Type Classification for MLB Broadcast
Built a proof-of-concept real-time pitch-type classifier for MLB broadcast overlays: a per-pitcher model for Kevin Gausman trained on Statcast pitch-tracking data.
Broadcast overlays need pitcher-specific pitch-type predictions that are fast enough for real-time on-air use.
Implemented a scikit-learn pipeline with imputation, scaling, and one-hot encoding, tuned a KNN model using grid search with stratified cross-validation, and evaluated it on a time-based 2025 holdout set.
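A minimal sketch of how such a pipeline could be wired together, not the project's actual code. It assumes the 2025 season is held out purely for final evaluation and that tuning happens only on earlier seasons. The data is synthetic, and the column names (release_speed, release_spin_rate, stand, game_year, pitch_type), the pitch labels, and the parameter grid are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in rows (NOT real Statcast data); column names are assumed.
rng = np.random.default_rng(0)
centers = {"FF": (95.0, 2300.0), "SL": (85.0, 2500.0), "FS": (86.0, 1500.0)}
rows = []
for season in (2023, 2024, 2025):
    for pitch, (speed, spin) in centers.items():
        for _ in range(60):
            rows.append({
                "game_year": season,
                "release_speed": float(rng.normal(speed, 1.0)),
                "release_spin_rate": float(rng.normal(spin, 80.0)),
                "stand": str(rng.choice(["L", "R"])),
                "pitch_type": pitch,
            })
df = pd.DataFrame(rows)
# Blank out some spin readings so the imputation step has work to do.
missing_idx = df.sample(frac=0.05, random_state=0).index
df.loc[missing_idx, "release_spin_rate"] = np.nan

numeric_cols = ["release_speed", "release_spin_rate"]
categorical_cols = ["stand"]

# Time-based split: tune on seasons before 2025, hold out 2025 for evaluation.
train = df[df["game_year"] < 2025]
holdout = df[df["game_year"] == 2025]
X_train = train[numeric_cols + categorical_cols]
y_train = train["pitch_type"]
X_holdout = holdout[numeric_cols + categorical_cols]
y_holdout = holdout["pitch_type"]

# Impute and scale numeric features; impute and one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])
model = Pipeline([("prep", preprocess), ("knn", KNeighborsClassifier())])

# Grid search with stratified folds, fitted on the training seasons only.
search = GridSearchCV(
    model,
    param_grid={
        "knn__n_neighbors": [3, 5, 9],
        "knn__weights": ["uniform", "distance"],
    },
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_train, y_train)

holdout_accuracy = search.score(X_holdout, y_holdout)
print(search.best_params_, round(holdout_accuracy, 3))
```

Keeping the preprocessing inside the pipeline means the imputer and scaler are refit within each cross-validation fold, so fold statistics do not leak into the validation split.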
Achieved 0.985 accuracy and used a normalized confusion matrix to identify class-specific errors and define conservative deployment rules for broadcast use.
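The confusion-matrix step and one possible conservative rule can be sketched as follows. The labels and predictions are toy values for illustration, and `overlay_label` with its `min_confidence` threshold is a hypothetical rule, not the one the project used: it suppresses the overlay whenever the top class probability falls below a cutoff.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["FF", "FS", "SL"]

# Toy true/predicted labels for illustration only (not project results).
y_true = np.array(["FF", "FF", "FF", "FF", "FS", "FS", "FS", "FS", "SL", "SL"])
y_pred = np.array(["FF", "FF", "FF", "FF", "FS", "FS", "FS", "SL", "SL", "SL"])

# normalize="true" divides each row by its true-class count, so every row
# sums to 1 and the diagonal holds per-class recall.
cm = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
per_class_recall = dict(zip(labels, cm.diagonal()))


def overlay_label(probabilities, class_names, min_confidence=0.9):
    """Hypothetical rule: return a label only when the top probability
    clears min_confidence; otherwise return None (show no overlay)."""
    probabilities = np.asarray(probabilities, dtype=float)
    best = int(np.argmax(probabilities))
    if probabilities[best] >= min_confidence:
        return class_names[best]
    return None


print(per_class_recall)
print(overlay_label([0.95, 0.03, 0.02], labels))
print(overlay_label([0.50, 0.30, 0.20], labels))
```

In practice the probabilities would come from the fitted classifier's `predict_proba`, and the threshold could be set per class using the off-diagonal cells of the normalized matrix.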