Data Mining in R

This set of learning materials for undergraduate and graduate data mining classes is currently maintained by Xiaorui Zhu. Many of the materials come from Dr. Yan Yu’s previous class notes. Thanks to the previous Ph.D. students in the Lindner College of Business for their contributions, and to Dr. Brittany Green for recording the videos.

Lecture and Lab Notes

Introduction to Data Mining and R

Lab Notes
1.A Introduction to Data Mining
1.B Introduction to R
1.C Advanced techniques: function and loop
1.D Introduction to RMarkdown (optional)

Exploratory Data Analysis

Lab Notes
2.A Explore and describe dataset
2.B Exploratory data analysis by visualization
2.C tidyverse: R packages for EDA (optional)

Linear Regression, Prediction, and Variable Selection

Lab Notes
3.A Linear regression and prediction
3.B Subset variable selection
3.C LASSO variable selection
3.D Monte Carlo simulation (optional)

Logistic Regression

Lab Notes
4.A Logistic regression and prediction
4.B Logistic regression and variable selection
4.C Logistic Regression for binary classification
4.D Logistic regression and ROC

Cross Validation

Lab Notes
5.A Cross validation
5.B Cross validation (Logit model)

Tree Models

Lab Notes
6.A Regression Trees
6.B Classification Trees

Advanced Tree Models: Bagging, Random Forests, and Boosting Trees

Lab Notes
7.A Bagging trees
7.B Random forests
7.C Boosting trees

Nonlinearity, Generalized Additive Models (GAM), and Nonparametric Smoothing

Lab Notes
8.A Univariate Nonparametric Smoothing
8.B Generalized additive model (GAM)

Neural Network, LDA, and SVM

Lab Notes
9.A Neural network models
9.B Discriminant analysis (optional)
9.C Support vector machine (SVM) (optional)

Unsupervised Learning: Clustering

Lab Notes
10 Clustering

Unsupervised Learning: Association Rules

Lab Notes
11 Association Rules

Other Topics 1: Basic Text Mining

Basic Text Mining

Contributors: