Practical Machine Learning - Day 2

VMware Bangalore
June 18-20, 2018

Amit Kapoor, Anand Chitipothu, Bargava Subramanian

Notes of this workshop are available online at:
https://bit.ly/vmware-ml

The iris prediction problem

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
url = "https://notes.pipal.in/2018/vmware-ml/iris.csv"
df = pd.read_csv(url)
In [3]:
df.head()
Out[3]:
SepalLength SepalWidth PetalLength PetalWidth Name
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa

Problem: Write a Python function that takes PetalLength and PetalWidth as arguments and predicts the type of the flower.

In [4]:
def predict0(PetalLength, PetalWidth):
    return 'Iris-setosa'
In [6]:
def test(dataset, predict_function):
    predicted = np.array([predict_function(x1, x2) for x1, x2 in 
                          zip(dataset.PetalLength, dataset.PetalWidth)])
    actual = dataset.Name
    matched = np.sum(predicted == actual)
    return matched / len(dataset)
In [7]:
test(df, predict0)
Out[7]:
0.3333333333333333
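predict0 always answers Iris-setosa, so it is right for exactly one third of the rows. As a rough sketch of the next step, a hand-written rule with a single threshold might look like the following. The name `predict1` and the 2.5 cut-off on PetalLength are assumptions made for illustration; they are hand-picked guesses, not values taken from the workshop.

```python
def predict1(PetalLength, PetalWidth):
    """Hand-written rule with one threshold (illustrative only)."""
    # Assumption: short petals indicate setosa; 2.5 is a hand-picked guess.
    if PetalLength < 2.5:
        return 'Iris-setosa'
    # Everything else is lumped into one class, so virginica is never predicted.
    return 'Iris-versicolor'

# First row shown by df.head(): PetalLength=1.4, PetalWidth=0.2
print(predict1(1.4, 0.2))   # prints Iris-setosa
```

Since `predict1` takes the same two arguments as `predict0`, `test(df, predict1)` scores it in the same way.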

Building a Machine Learning Model

In [8]:
from sklearn.tree import DecisionTreeClassifier
In [19]:
model = DecisionTreeClassifier(max_depth=2)
In [20]:
X = df[['PetalLength', 'PetalWidth']]
y = df.Name

model.fit(X, y)
Out[20]:
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
In [21]:
model.predict([[3, 4]])
Out[21]:
array(['Iris-virginica'], dtype=object)
In [33]:
def test_model(model):
    def model_predict(PetalLength, PetalWidth):
        row = [PetalLength, PetalWidth]
        return model.predict([row])[0]
    return test(df, model_predict)
In [35]:
test_model(model)
Out[35]:
0.96
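Note that the 0.96 above is measured on the same 150 rows the model was trained on, so it may overstate how well the model does on flowers it has not seen. A minimal sketch of a held-out check follows. It assumes scikit-learn's bundled copy of the iris data (`load_iris`) in place of the workshop CSV so that it runs offline; the 30% test size and `random_state=0` are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled iris data stands in for iris.csv.
iris = load_iris()
X_petal = iris.data[:, 2:4]   # columns 2 and 3: petal length, petal width
y_all = iris.target           # classes encoded as 0, 1, 2

# Hold out 30% of the rows; stratify keeps the class mix equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_petal, y_all, test_size=0.3, random_state=0, stratify=y_all)

holdout_model = DecisionTreeClassifier(max_depth=2)
holdout_model.fit(X_train, y_train)

# score() returns the fraction of correctly classified rows.
train_accuracy = holdout_model.score(X_train, y_train)
test_accuracy = holdout_model.score(X_test, y_test)
print("train accuracy:", train_accuracy)
print("test accuracy: ", test_accuracy)
```

The number to watch is the test accuracy, since those rows played no part in fitting the tree.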

Visualizing the Model

Install the modelvis library.

In [25]:
!pip install modelvis
In [26]:
import modelvis
In [27]:
modelvis.__version__
Out[27]:
'0.1.5'
In [28]:
modelvis.render_tree(model, 
                     feature_names=["PetalLength", "PetalWidth"],
                     class_names=["setosa", "versicolor", "virginica"])
Out[28]:
Tree
0: PetalWidth ≤ 0.8, gini = 0.667, samples = 150, value = [50, 50, 50], class = setosa
    0->1 (True):  gini = 0.0, samples = 50, value = [50, 0, 0], class = setosa
    0->2 (False): PetalWidth ≤ 1.75, gini = 0.5, samples = 100, value = [0, 50, 50], class = versicolor
        2->3: gini = 0.168, samples = 54, value = [0, 49, 5], class = versicolor
        2->4: gini = 0.043, samples = 46, value = [0, 1, 45], class = virginica
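Each node of the tree above reports a `gini` value. With `criterion='gini'` this is the Gini impurity, 1 minus the sum of the squared class proportions, so the numbers can be reproduced from the `value` counts alone. The helper name `gini` is made up for this sketch.

```python
def gini(counts):
    """Gini impurity of a node, given its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Class counts copied from the nodes of the tree above
for counts in ([50, 50, 50], [50, 0, 0], [0, 50, 50], [0, 49, 5], [0, 1, 45]):
    print(counts, round(gini(counts), 3))
```

The rounded results (0.667, 0.0, 0.5, 0.168, 0.043) match the values shown in the tree. A pure node, such as the setosa leaf, has impurity 0.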
In [29]:
model3 = DecisionTreeClassifier(max_depth=3)
model3.fit(X, y);
In [30]:
modelvis.render_tree(model3, 
                     feature_names=["PetalLength", "PetalWidth"],
                     class_names=["setosa", "versicolor", "virginica"])
Out[30]:
Tree
0: PetalLength ≤ 2.45, gini = 0.667, samples = 150, value = [50, 50, 50], class = setosa
    0->1 (True):  gini = 0.0, samples = 50, value = [50, 0, 0], class = setosa
    0->2 (False): PetalWidth ≤ 1.75, gini = 0.5, samples = 100, value = [0, 50, 50], class = versicolor
        2->3: PetalLength ≤ 4.95, gini = 0.168, samples = 54, value = [0, 49, 5], class = versicolor
            3->4: gini = 0.041, samples = 48, value = [0, 47, 1], class = versicolor
            3->5: gini = 0.444, samples = 6, value = [0, 2, 4], class = virginica
        2->6: PetalLength ≤ 4.85, gini = 0.043, samples = 46, value = [0, 1, 45], class = virginica
            6->7: gini = 0.444, samples = 3, value = [0, 1, 2], class = virginica
            6->8: gini = 0.0, samples = 43, value = [0, 0, 43], class = virginica
In [36]:
test_model(model3)
Out[36]:
0.9733333333333334
In [40]:
model5 = DecisionTreeClassifier(max_depth=5)
model5.fit(X, y);
test_model(model5)
Out[40]:
0.9933333333333333
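The accuracy rises with depth (0.96, 0.973, 0.993), but all three numbers are measured on the training rows, so part of the gain may come from deeper trees fitting quirks of those rows. One hedged way to compare depths on rows the model has not seen is k-fold cross-validation. The sketch below assumes scikit-learn's bundled iris data in place of the CSV; 5 folds and `random_state=0` are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled iris data stands in for iris.csv.
iris = load_iris()
X_petal = iris.data[:, 2:4]   # petal length, petal width
y_all = iris.target

cv_scores = {}
for depth in (2, 3, 5):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # Each of the 5 folds is scored by a tree trained on the other 4.
    scores = cross_val_score(tree, X_petal, y_all, cv=5)
    cv_scores[depth] = scores.mean()
    print(depth, round(cv_scores[depth], 3))
```

If the cross-validated score stops improving as depth grows, the extra depth is probably not buying real predictive power.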
In [41]:
modelvis.render_tree(model5, 
                     feature_names=["PetalLength", "PetalWidth"],
                     class_names=["setosa", "versicolor", "virginica"])
Out[41]:
Tree
0: PetalWidth ≤ 0.8, gini = 0.667, samples = 150, value = [50, 50, 50], class = setosa
    0->1 (True):  gini = 0.0, samples = 50, value = [50, 0, 0], class = setosa
    0->2 (False): PetalWidth ≤ 1.75, gini = 0.5, samples = 100, value = [0, 50, 50], class = versicolor
        2->3: PetalLength ≤ 4.95, gini = 0.168, samples = 54, value = [0, 49, 5], class = versicolor
            3->4: PetalWidth ≤ 1.65, gini = 0.041, samples = 48, value = [0, 47, 1], class = versicolor
                4->5: gini = 0.0, samples = 47, value = [0, 47, 0], class = versicolor
                4->6: gini = 0.0, samples = 1, value = [0, 0, 1], class = virginica
            3->7: PetalWidth ≤ 1.55, gini = 0.444, samples = 6, value = [0, 2, 4], class = virginica
                7->8: gini = 0.0, samples = 3, value = [0, 0, 3], class = virginica
                7->9: PetalLength ≤ 5.45, gini = 0.444, samples = 3, value = [0, 2, 1], class = versicolor
                    9->10: gini = 0.0, samples = 2, value = [0, 2, 0], class = versicolor
                    9->11: gini = 0.0, samples = 1, value = [0, 0, 1], class = virginica
        2->12: PetalLength ≤ 4.85, gini = 0.043, samples = 46, value = [0, 1, 45], class = virginica
            12->13: gini = 0.444, samples = 3, value = [0, 1, 2], class = virginica
            12->14: gini = 0.0, samples = 43, value = [0, 0, 43], class = virginica
In [43]:
print(modelvis.render_tree_as_code(model))
def predict(row):
    """Your decision-tree model wrote this code."""
    # 150 samples; value=[50, 50, 50]; class=0
    if row[1] < 0.800000011920929:
        # 50 samples; value=[50, 0, 0]; class=0
        return 0
    else:
        # 100 samples; value=[0, 50, 50]; class=1
        if row[1] < 1.75:
            # 54 samples; value=[0, 49, 5]; class=1
            return 1
        else:
            # 46 samples; value=[0, 1, 45]; class=2
            return 2
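The generated function returns class indices (0, 1, 2) rather than flower names. A small sketch of turning them back into labels follows. It copies the generated `predict` from the output above and assumes the indices follow the sorted order of the names, which is how scikit-learn orders `model.classes_`. The names `CLASS_NAMES` and `predict_name` are hypothetical.

```python
# Assumption: index order matches the sorted class names (model.classes_).
CLASS_NAMES = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']

def predict(row):
    """Copy of the generated code for the max_depth=2 model."""
    if row[1] < 0.800000011920929:
        return 0
    else:
        if row[1] < 1.75:
            return 1
        else:
            return 2

def predict_name(PetalLength, PetalWidth):
    """Wrap predict() so it returns a name instead of an index."""
    return CLASS_NAMES[predict([PetalLength, PetalWidth])]

print(predict_name(1.4, 0.2))   # prints Iris-setosa
```

Because `predict_name` takes the same two arguments as `predict0`, it can be passed to `test(df, predict_name)`.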

In [44]:
print(modelvis.render_tree_as_code(model3))
def predict(row):
    """Your decision-tree model wrote this code."""
    # 150 samples; value=[50, 50, 50]; class=0
    if row[0] < 2.450000047683716:
        # 50 samples; value=[50, 0, 0]; class=0
        return 0
    else:
        # 100 samples; value=[0, 50, 50]; class=1
        if row[1] < 1.75:
            # 54 samples; value=[0, 49, 5]; class=1
            if row[0] < 4.949999809265137:
                # 48 samples; value=[0, 47, 1]; class=1
                return 1
            else:
                # 6 samples; value=[0, 2, 4]; class=2
                return 2
        else:
            # 46 samples; value=[0, 1, 45]; class=2
            if row[0] < 4.850000381469727:
                # 3 samples; value=[0, 1, 2]; class=2
                return 2
            else:
                # 43 samples; value=[0, 0, 43]; class=2
                return 2

In [46]:
names = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
df['iname'] = df.Name.map(names.get)
In [56]:
y = df.iname
model = DecisionTreeClassifier(max_depth=3)
model.fit(X, y)
modelvis.plot_decision_boundaries(model, X, y, 
                                  show_input=True)
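`plot_decision_boundaries` draws the region of the feature space that the model assigns to each class. Without relying on how modelvis implements it, the same idea can be sketched by predicting the class at every point of a dense grid and colouring the grid by the result. The sketch uses scikit-learn's bundled iris data and plain matplotlib; the helper name `grid_predict`, the 0.02 grid step and the 0.5 margin are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled iris data stands in for iris.csv.
iris = load_iris()
X_petal = iris.data[:, 2:4]   # petal length, petal width
y_all = iris.target

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_petal, y_all)

def grid_predict(model, X, step=0.02, margin=0.5):
    """Predict the class at every point of a grid covering the data."""
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    grid = np.c_[xx.ravel(), yy.ravel()]          # one row per grid point
    zz = model.predict(grid).reshape(xx.shape)    # back to the grid shape
    return xx, yy, zz

xx, yy, zz = grid_predict(tree, X_petal)

fig, ax = plt.subplots()
ax.contourf(xx, yy, zz, alpha=0.3)                        # class regions
ax.scatter(X_petal[:, 0], X_petal[:, 1], c=y_all, s=15)   # training points
ax.set_xlabel("PetalLength")
ax.set_ylabel("PetalWidth")
```

Because a decision tree splits on one feature at a time, the regions come out as axis-aligned rectangles.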
In [55]:
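# This cell ran before the cell above (In [55] vs In [56]), so the output is for a max_depth=2 tree.
print(modelvis.render_tree_as_code(model))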
def predict(row):
    """Your decision-tree model wrote this code."""
    # 150 samples; value=[50, 50, 50]; class=0
    if row[1] < 0.800000011920929:
        # 50 samples; value=[50, 0, 0]; class=0
        return 0
    else:
        # 100 samples; value=[0, 50, 50]; class=1
        if row[1] < 1.75:
            # 54 samples; value=[0, 49, 5]; class=1
            return 1
        else:
            # 46 samples; value=[0, 1, 45]; class=2
            return 2

In [57]:
model = DecisionTreeClassifier(max_depth=5)
model.fit(X, y)
modelvis.plot_decision_boundaries(model, X, y, 
                                  show_input=True)