Yujian Tan
Aspiring Data Scientist and Data Analyst
Portfolio

Uncovering Insights from Data

Aspiring Data Scientist | Data Analyst

Transforming complex datasets into clear, actionable stories that support better business decisions and measurable performance improvement.

About

Turning Data Into Direction.

My name is Yujian Tan, and I am a Statistics student at the University of Illinois Urbana-Champaign focused on analysis, modeling, and clear communication.

My work is centered on data analysis, statistical modeling, machine learning, and data visualization.

I am good at turning analytical results into explanations that non-technical stakeholders can understand.

Education

Academic foundation in statistics and data science.

Expected May 2027

University of Illinois Urbana-Champaign

Bachelor of Science in Statistics

Data Science minor

Relevant Coursework
Statistical Modeling, Machine Learning, Data Visualization, Data Science, Probability, Statistical Computing
Experience

Work Experience.

March, 2026 - Present

Champaign, IL | Remote

Data Analyst Intern

IDX Exchange

  • Working with real estate transaction data to identify market trends, build Tableau dashboards, and generate insights that support business decisions. Cleaning and analyzing large datasets using Python and SQL, while developing reports on pricing trends, inventory levels, and regional market performance.
June, 2025 - August, 2025

Chicago, IL | On-site

Program Assistant Intern

Coalition for a Better Chinese American Community (CBCAC)

  • Cleaned and standardized client data in Excel using Power Query, ensuring accuracy across formats for numbers, dates, and text.
  • Merged and transformed multiple Excel files into a unified worksheet by removing duplicates and unwanted entries to support cross-program communication.
  • Collected and digitized 30 demographic surveys, then submitted them to the Asian Health Coalition, enabling health data tracking for elderly Chinatown residents.
Projects

Recent Projects.

Real-Time Pitch Type Classification for MLB Broadcast

Built a proof-of-concept real-time pitch-type classifier for MLB broadcast overlays using Statcast pitch tracking data for a per-pitcher model focused on Kevin Gausman.

Problem

Broadcast overlays need fast, pitcher-specific pitch type predictions that can support real-time on-air usage.

Approach

Implemented a scikit-learn pipeline with imputation, scaling, and one-hot encoding, tuned a KNN model with stratified cross-validation and grid search, and evaluated it on a time-based 2025 holdout set.
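A minimal sketch of this kind of pipeline is shown here for illustration. It is not the project's actual code: the column names, pitch-type codes, parameter grid, and synthetic data are all assumptions standing in for the Statcast features, and the second synthetic sample only stands in for a later-season holdout.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature names; the real project used Statcast tracking columns.
NUMERIC = ["release_speed", "spin_rate", "pfx_x", "pfx_z"]
CATEGORICAL = ["stand"]


def make_synthetic_pitches(n, seed):
    """Generate fake pitches around three made-up pitch-type centers."""
    rng = np.random.default_rng(seed)
    centers = {
        "FF": (95.0, 2300.0, -0.6, 1.4),
        "FS": (86.0, 1500.0, -1.1, 0.3),
        "SL": (84.0, 2400.0, 0.4, 0.1),
    }
    pitch_type = rng.choice(list(centers), size=n)
    base = np.array([centers[p] for p in pitch_type])
    noise = rng.normal(0.0, [1.0, 80.0, 0.15, 0.15], size=(n, 4))
    df = pd.DataFrame(base + noise, columns=NUMERIC)
    df["stand"] = rng.choice(["L", "R"], size=n)
    # Blank out a few values so the imputation step has something to do.
    missing_rows = rng.choice(n, size=n // 20, replace=False)
    df.loc[missing_rows, "spin_rate"] = np.nan
    df["pitch_type"] = pitch_type
    return df


# Imputation + scaling for numeric columns, imputation + one-hot for categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), NUMERIC),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), CATEGORICAL),
])
model = Pipeline([("preprocess", preprocess), ("knn", KNeighborsClassifier())])

train = make_synthetic_pitches(600, seed=0)    # stands in for earlier data
holdout = make_synthetic_pitches(200, seed=1)  # stands in for the later holdout

# Grid search over KNN settings with stratified folds on the training data only.
search = GridSearchCV(
    model,
    param_grid={
        "knn__n_neighbors": [3, 7, 15],
        "knn__weights": ["uniform", "distance"],
    },
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(train[NUMERIC + CATEGORICAL], train["pitch_type"])

# The holdout is scored once, after tuning, so it plays no part in model selection.
holdout_accuracy = search.score(holdout[NUMERIC + CATEGORICAL], holdout["pitch_type"])
```

Keeping the preprocessing inside the pipeline means each cross-validation fold fits its own imputer and scaler, which avoids leaking holdout statistics into tuning.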

Outcome

Achieved 0.985 accuracy and used a normalized confusion matrix to identify class-specific errors and define conservative deployment rules for broadcast use.
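As a rough illustration of how a row-normalized confusion matrix can feed a deployment rule, the sketch below uses made-up labels and predictions. The pitch-type codes, the `MIN_RECALL` threshold, and the rule itself are hypothetical and are not the rules defined in the project.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["FF", "FS", "SL"]  # hypothetical pitch-type codes
y_true = ["FF", "FF", "FF", "FF", "FS", "FS", "FS", "FS", "SL", "SL"]
y_pred = ["FF", "FF", "FF", "FF", "FS", "FS", "FS", "SL", "SL", "SL"]

# normalize="true" divides each row by its total, so the diagonal is per-class recall.
cm = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
per_class_recall = dict(zip(labels, np.diag(cm)))

# Hypothetical conservative rule: only display a pitch type on air
# when its recall clears a fixed threshold.
MIN_RECALL = 0.9
safe_to_display = [label for label, recall in per_class_recall.items() if recall >= MIN_RECALL]
```

With these toy values, one of the four true `FS` pitches is predicted as `SL`, so `FS` recall falls to 0.75 and the class is held back under the assumed threshold.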

Data Science, Python, Scikit-learn, KNN

Credit Overheating Early-Warning Model

Built an end-to-end early-warning classifier on a 25-year BIS credit dataset to flag above-trend credit growth and produce country-level risk scoring.

Problem

Credit risk monitoring needed a structured way to flag overheating markets and prioritize deeper review at the country level.

Approach

Conducted rigorous EDA and data QA by removing aggregate regions, resolving valuation-method inconsistencies, and validating long-run credit dynamics across sectors before model development.

Outcome

Produced country-level risk scoring and stakeholder-friendly visual outputs, including a country risk map, with model performance reaching an AUC of 0.69.
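For readers unfamiliar with the metric, here is a small sketch of how an AUC and a country-level ranking could be computed from model probabilities. The data, the column names, and the choice to average predicted probabilities per country are assumptions made for illustration; the project's actual scoring method is not described here.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Toy country-period rows with a hypothetical overheating label and model output.
scores = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "overheating": [1, 0, 0, 0, 1, 1],
    "predicted_prob": [0.7, 0.4, 0.5, 0.2, 0.6, 0.3],
})

# AUC: the share of (positive, negative) pairs where the positive row scores higher.
auc = roc_auc_score(scores["overheating"], scores["predicted_prob"])

# Assumed aggregation: mean predicted probability per country, highest risk first.
country_risk = (
    scores.groupby("country")["predicted_prob"]
    .mean()
    .sort_values(ascending=False)
)
```

In this toy data, 7 of the 9 positive-negative pairs are ordered correctly, giving an AUC of about 0.78. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking.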

Python, Pandas, Scikit-learn, Data Cleaning, EDA, Data Visualization

U.S. MSA House Price Index Dashboard

Developed an interactive R Shiny dashboard to explore quarterly House Price Index trends across more than 400 U.S. metro areas from 2000 to the present.

Problem

Housing market analysis needed a faster way to compare price trends across states and metro areas without manual filtering.

Approach

Built a clean analysis dataset with tidyverse, engineered time features, standardized state and metro identifiers, and implemented reactive UI filters for state, metro area, and year range.

Outcome

Enabled reliable market trend comparisons with linked time-series plots and a drilldown table for validation in a single dashboard.

R Shiny, Dashboard, tidyverse
Activities

Activities and Community Involvement.

August, 2025 - Present

Data Scientist Member

Data Science Club

Led a Data Dive team in a credit risk analytics engagement focused on identifying country-level credit overheating risk in a global macro-risk monitoring context. Club involvement focused on machine learning, Scikit-learn, pandas, and Python.

March 28-29, 2026

Team024 Data Cruncher

Illinois Statistics Datathon 2026

Participated in a 36-hour datathon using AWS SageMaker JupyterLab and Python to predict 30-minute interval metrics for August 2025 call volume, customer care time, and abandoned rate.

November, 2025 - Present

Community Volunteer

Coalition for a Better Chinese American Community (CBCAC)

Assist with digital literacy and community service initiatives.

Contact

Open to analyst and data science roles.

Feel free to reach out about internships, projects, or collaboration in data science and analytics.