501 Project Presentation

Presentation Transcript

Comparison of red & white wine quality decision tree classifiers:

Roberto Kao, DSCI 501 Project Presentation, 5/5/2017

Goals:

Compare classifier accuracy among red wine, white wine, and both wines combined. Analyze how the accuracy rate changes as the classifiers are fine-tuned. Explore whether a red wine classifier can predict white wine quality scores, and vice versa.

The Data:

The data relate to red and white variants of the Portuguese “Vinho Verde” wine. The predictors are physicochemical properties of the wine, and the class variable “quality” is an ordinal score between 0 and 10.

Methodology:

Set the random seed to 3 for reproducibility and consistency. Split each data set into training (70%) and testing (30%) sets. The models examined are: a baseline model, feature selection, maximum tree depth, class weights, and an aggregate model.
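
The slides include no code, so the following is a minimal sketch of this setup in Python with pandas and scikit-learn. The file names and the semicolon delimiter assume the UCI Wine Quality CSVs; they are not stated in the presentation.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

SEED = 3  # fixed random seed for reproducibility, as stated in the methodology

# Assumed file names and delimiter (the UCI Wine Quality CSVs are semicolon-separated).
red = pd.read_csv("winequality-red.csv", sep=";")
white = pd.read_csv("winequality-white.csv", sep=";")
both = pd.concat([red, white], ignore_index=True)

def split_xy(df):
    """70% training / 30% testing split on the physicochemical predictors."""
    X = df.drop(columns="quality")
    y = df["quality"]
    return train_test_split(X, y, test_size=0.30, random_state=SEED)
```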

Methodology 2:

Use the red wine data set as the training set and the white wine data set as the testing set, and vice versa.
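
A sketch of this cross-prediction setup, reusing the data frames loaded above; any preprocessing the original applied before the swap is not shown in the slides and is omitted here.

```python
from sklearn.tree import DecisionTreeClassifier

def cross_accuracy(train_df, test_df):
    """Fit a tree on one wine's full data set and score it on the other's."""
    X_tr, y_tr = train_df.drop(columns="quality"), train_df["quality"]
    X_te, y_te = test_df.drop(columns="quality"), test_df["quality"]
    clf = DecisionTreeClassifier(random_state=SEED).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

red_on_white = cross_accuracy(red, white)   # CASE Y: red predicting white
white_on_red = cross_accuracy(white, red)   # CASE Z: white predicting red
```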

Class Imbalance Issue:

There is much more “normal” wine than bad or great wine. Pruning is not supported in the scikit-learn Python package, so the workaround is to remove the classes with the fewest observations. Classes 3 and 9 have the fewest members: 30 and 5, respectively.
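
A minimal sketch of that workaround, dropping the two rarest quality classes before splitting; the class labels (3 and 9) come from the slide, the helper itself is an illustration.

```python
def drop_rare_classes(df, rare=(3, 9)):
    """Remove the quality classes with the fewest observations."""
    return df[~df["quality"].isin(rare)].copy()

red_t = drop_rare_classes(red)
white_t = drop_rare_classes(white)
both_t = drop_rare_classes(both)
```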

CASE 0: Baseline Models:

Red accuracy: 64.15%. White accuracy: 58.48%. Both accuracy: 59.93%.
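
One plausible reading of the baseline: a default decision tree fit on the trimmed data sets from the earlier sketches. Everything other than the seed and the 70/30 split is an assumption.

```python
from sklearn.tree import DecisionTreeClassifier

def baseline_accuracy(df):
    X_train, X_test, y_train, y_test = split_xy(df)
    clf = DecisionTreeClassifier(random_state=SEED)  # default hyper-parameters
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)  # accuracy on the held-out 30%

for name, df in [("red", red_t), ("white", white_t), ("both", both_t)]:
    print(name, f"{baseline_accuracy(df):.2%}")
```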

CASE 1: Feature Selection:

Red accuracy: 58.07%. White accuracy: 58.00%. Both accuracy: 58.43%. Starting from the baseline model, each feature's importance weight is compared against the mean of all features’ importance weights: if it is greater than or equal to the mean, the feature is kept; otherwise it is discarded.
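
One way to reproduce that keep-if-at-or-above-the-mean rule is scikit-learn's SelectFromModel with threshold="mean"; the original implementation may have done the comparison by hand, so treat this as a sketch.

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.tree import DecisionTreeClassifier

def feature_selected_accuracy(df):
    X_train, X_test, y_train, y_test = split_xy(df)
    base = DecisionTreeClassifier(random_state=SEED).fit(X_train, y_train)
    # Keep features whose importance weight is >= the mean importance weight.
    selector = SelectFromModel(base, threshold="mean", prefit=True)
    clf = DecisionTreeClassifier(random_state=SEED)
    clf.fit(selector.transform(X_train), y_train)
    return clf.score(selector.transform(X_test), y_test)
```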

CASE 2: Max Depth:

Red accuracy: 54.30%. White accuracy: 56.84%. Both accuracy: 60.03%. A grid search was run over the maximum tree depth (values 3 to 20).
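
A sketch of that depth search with GridSearchCV; the slide only gives the depth range, so the 5-fold cross-validation is an assumption.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

def max_depth_accuracy(df):
    X_train, X_test, y_train, y_test = split_xy(df)
    grid = GridSearchCV(
        DecisionTreeClassifier(random_state=SEED),
        param_grid={"max_depth": list(range(3, 21))},  # depths 3 through 20
        cv=5,  # assumed fold count; not stated in the slides
    )
    grid.fit(X_train, y_train)
    return grid.score(X_test, y_test)  # accuracy of the best-depth tree
```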

CASE 3: Class Weights:

Red accuracy: 59.54%. White accuracy: 56.70%. Both accuracy: 58.48%. Weights are assigned to the members of each class to address the class imbalance.
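
The slide does not say how the weights were computed. scikit-learn's class_weight="balanced", which weights each class inversely to its frequency, is one common choice and is used here purely as an assumption.

```python
from sklearn.tree import DecisionTreeClassifier

def class_weight_accuracy(df):
    X_train, X_test, y_train, y_test = split_xy(df)
    # "balanced" gives each class a weight inversely proportional to its frequency.
    clf = DecisionTreeClassifier(class_weight="balanced", random_state=SEED)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)
```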

CASE X: Aggregate of previous cases:

Red accuracy: 58.28%. White accuracy: 58.07%. Both accuracy: 55.80%. This model combines the previous three adjustments.
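
A sketch of one way to chain the three adjustments (feature selection, then a depth search with class weights), reusing the helpers above; how the original aggregated them is not shown in the slides.

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

def aggregate_accuracy(df):
    X_train, X_test, y_train, y_test = split_xy(df)
    # Feature selection against the mean importance weight.
    base = DecisionTreeClassifier(random_state=SEED).fit(X_train, y_train)
    selector = SelectFromModel(base, threshold="mean", prefit=True)
    X_tr, X_te = selector.transform(X_train), selector.transform(X_test)
    # Depth search with balanced class weights (both assumptions, as above).
    grid = GridSearchCV(
        DecisionTreeClassifier(class_weight="balanced", random_state=SEED),
        param_grid={"max_depth": list(range(3, 21))},
        cv=5,
    )
    grid.fit(X_tr, y_train)
    return grid.score(X_te, y_test)
```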

CASE Y: Red Predicting White:

Accuracy of the red wine classifier predicting white wine quality: 46.15%.

CASE Z: White Predicting Red:

Accuracy of the white wine classifier predicting red wine quality: 23.22%.

Highlight of Results:

Tuning the decision tree classifiers has only a marginal effect on test performance. The red wine classifier performs better than the white wine classifier; in fact, the red wine classifier predicting white wine is approximately twice as accurate as the white wine classifier predicting red wine.

Future Directions:

Consider other hyper-parameter adjustments; try other classifier types (support vector machines, random forests, naïve Bayes); use a balanced data set; include wines from all over the world.

References:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, Elsevier, 47(4):547-553, 2009.
