Introduction to Deep Learning

We will start with a simple example that introduces the core ingredients of Deep Learning:

  • Input (Train & Test)
  • Output
  • Model: Architecture, Layers, Weights & Activations
  • Loss Function
  • Optimization

Learn a Noisy Function

Saddle Function => $$ Z = 2X^2 - 3Y^2 + 5 + \epsilon $$

where $\epsilon$ is a small random noise term.
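
A quick check that the noiseless surface really is a saddle: the curvatures along the two axes have opposite signs,

$$ \frac{\partial^2 Z}{\partial X^2} = 4 > 0, \qquad \frac{\partial^2 Z}{\partial Y^2} = -6 < 0, $$

so the surface bends upward along $X$ and downward along $Y$ around the critical point $(0, 0)$.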

In [7]:
import numpy as np              # Numerical arrays
import pandas as pd             # DataFrames
import keras                    # Deep Learning library
import matplotlib.pyplot as plt # Visualisation Library
In [8]:
%matplotlib inline
In [9]:
x = np.arange(-1, 1, 0.01)  # 200 evenly spaced points in [-1, 1)
y = np.arange(-1, 1, 0.01)
In [11]:
X, Y = np.meshgrid(x, y)          # 200x200 grid of (x, y) coordinates
c = np.ones([200, 200])           # constant term of the surface
e = np.random.rand(200, 200)*0.1  # uniform noise in [0, 0.1)
In [12]:
Z = 2*X*X - 3*Y*Y + 5*c + e  # the noisy saddle surface
In [13]:
import sys
sys.path.append("../")
In [20]:
from mpl_toolkits.mplot3d import Axes3D
In [21]:
def plot3d(X, Y, Z):
    """Plot Z as a 3D surface over the (X, Y) grid."""
    fig = plt.figure(figsize=(8, 8))
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(X, Y, Z, color='y')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('Z')
    plt.show()
In [23]:
plot3d(X, Y, Z)
  • Input => X, Y
  • Output => Z

Using Keras to learn a "Non-Linear" Function

Step 0: Import the Keras Model & Layer Classes

In [24]:
from keras.models import Sequential
from keras.layers import Dense

Step 1: Create the input and output data

In [30]:
input_xy = np.c_[X.reshape(-1), Y.reshape(-1)]  ### X input in ML
output_z = Z.reshape(-1)                        ### y output in ML
In [31]:
input_xy.shape, output_z.shape
Out[31]:
((40000, 2), (40000,))
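
The reshaping step deserves a note: `reshape(-1)` flattens each 200×200 grid into a vector of 40000 values, and `np.c_` stacks two such vectors as columns, giving one (x, y) row per grid point. A minimal sketch on a toy 2×2 grid:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# reshape(-1) flattens row by row; np.c_ pairs the flattened arrays column-wise
np.c_[a.reshape(-1), b.reshape(-1)]
# array([[1, 5],
#        [2, 6],
#        [3, 7],
#        [4, 8]])
```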

Step 2: Create the Model (Transformation + Regression)

In [132]:
model = Sequential()
model.add(Dense(4, input_dim=2, activation="linear"))  # hidden layer 1: 2 -> 4
model.add(Dense(2, activation="linear"))  # hidden layer 2: 4 -> 2 (input_dim is only needed on the first layer)
model.add(Dense(1))                       # output layer: 2 -> 1
In [133]:
model.summary()
Model: "sequential_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_29 (Dense)             (None, 4)                 12        
_________________________________________________________________
dense_30 (Dense)             (None, 2)                 10        
_________________________________________________________________
dense_31 (Dense)             (None, 1)                 3         
=================================================================
Total params: 25
Trainable params: 25
Non-trainable params: 0
_________________________________________________________________
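
The parameter counts in the summary follow from the Dense operation `output = activation(dot(input, kernel) + bias)`: each layer has `(inputs + 1) * units` parameters, the `+ 1` accounting for the bias. A quick check against the summary above:

```python
# (inputs + 1) * units parameters per Dense layer; the +1 is the bias
for inputs, units in [(2, 4), (4, 2), (2, 1)]:
    print(inputs, "->", units, ":", (inputs + 1) * units, "params")
# 2 -> 4 : 12 params
# 4 -> 2 : 10 params
# 2 -> 1 : 3 params   (total 25, matching model.summary())
```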
In [134]:
from keras.utils import plot_model
In [135]:
plot_model(model, show_layer_names=True, show_shapes=True)
Out[135]:
[model architecture diagram rendered by plot_model]

Step 3: Compile the Model (Loss & Optimizer) and Fit It

In [136]:
model.compile(loss="mean_squared_error", optimizer="sgd", metrics=["mse"])
In [137]:
%%time
output = model.fit(input_xy, output_z, epochs=10, validation_split=0.2, shuffle=True, verbose=1)
Train on 32000 samples, validate on 8000 samples
Epoch 1/10
32000/32000 [==============================] - 1s 34us/step - loss: 0.7956 - mean_squared_error: 0.7956 - val_loss: 6.2738 - val_mean_squared_error: 6.2738
Epoch 2/10
32000/32000 [==============================] - 1s 26us/step - loss: 0.6907 - mean_squared_error: 0.6907 - val_loss: 6.0389 - val_mean_squared_error: 6.0389
Epoch 3/10
32000/32000 [==============================] - 1s 26us/step - loss: 0.6913 - mean_squared_error: 0.6913 - val_loss: 6.7260 - val_mean_squared_error: 6.7260
Epoch 4/10
32000/32000 [==============================] - 1s 26us/step - loss: 0.6895 - mean_squared_error: 0.6895 - val_loss: 6.3373 - val_mean_squared_error: 6.3373
Epoch 5/10
32000/32000 [==============================] - 1s 26us/step - loss: 0.6909 - mean_squared_error: 0.6909 - val_loss: 6.1773 - val_mean_squared_error: 6.1773
Epoch 6/10
32000/32000 [==============================] - 1s 26us/step - loss: 0.6896 - mean_squared_error: 0.6896 - val_loss: 6.3050 - val_mean_squared_error: 6.3050
Epoch 7/10
32000/32000 [==============================] - 1s 26us/step - loss: 0.6904 - mean_squared_error: 0.6904 - val_loss: 6.9137 - val_mean_squared_error: 6.9137
Epoch 8/10
32000/32000 [==============================] - 1s 26us/step - loss: 0.6903 - mean_squared_error: 0.6903 - val_loss: 6.9827 - val_mean_squared_error: 6.9827
Epoch 9/10
32000/32000 [==============================] - 1s 26us/step - loss: 0.6896 - mean_squared_error: 0.6896 - val_loss: 6.7918 - val_mean_squared_error: 6.7918
Epoch 10/10
32000/32000 [==============================] - 1s 26us/step - loss: 0.6898 - mean_squared_error: 0.6898 - val_loss: 6.3870 - val_mean_squared_error: 6.3870
CPU times: user 11.7 s, sys: 992 ms, total: 12.7 s
Wall time: 8.66 s
In [138]:
output_df = pd.DataFrame(output.history)
output_df.head()
Out[138]:
   val_loss  val_mean_squared_error      loss  mean_squared_error
0  6.273764                6.273764  0.795559            0.795559
1  6.038878                6.038878  0.690723            0.690723
2  6.725989                6.725989  0.691273            0.691273
3  6.337259                6.337259  0.689513            0.689513
4  6.177315                6.177315  0.690925            0.690925
In [139]:
output_df.plot.line(y=["val_loss", "loss"]);

Step 4: Make Predictions from the Model

In [140]:
Z_pred = model.predict(input_xy).reshape(200,200)
In [141]:
plot3d(X,Y, Z_pred)
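
The predicted surface comes out as a plane rather than a saddle, which matches the training loss plateauing near 0.69: with purely linear activations, a stack of Dense layers collapses into a single affine map, and no affine map can fit the quadratic saddle. (The much larger val_loss is also expected: `validation_split` takes the last 20% of the samples before shuffling, i.e. the strip of the grid with Y closest to 1.) A sketch of the collapse, assuming the three-layer model trained above:

```python
# With linear activations, the three layers compose into one affine map:
# z = ((x @ W1 + b1) @ W2 + b2) @ W3 + b3 = x @ W + b
W1, b1, W2, b2, W3, b3 = model.get_weights()
W = W1 @ W2 @ W3               # effective weight matrix, shape (2, 1)
b = (b1 @ W2 + b2) @ W3 + b3   # effective bias, shape (1,)
collapsed = input_xy @ W + b   # agrees with model.predict(input_xy)
```

Up to floating-point precision, `collapsed` matches `model.predict(input_xy)`, which is why the experiments below suggest a non-linear activation.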

Experiments

  • Change the number of layers
  • Change the number of units in a layer
  • Change the activation from "linear" to "relu" (see the sketch after this list)
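
A hedged sketch of the third experiment: the same architecture with relu hidden activations (layer sizes and epoch count are kept from above, not tuned):

```python
# Same stack as before, but with non-linear hidden activations
model_relu = Sequential()
model_relu.add(Dense(4, input_dim=2, activation="relu"))
model_relu.add(Dense(2, activation="relu"))
model_relu.add(Dense(1))
model_relu.compile(loss="mean_squared_error", optimizer="sgd", metrics=["mse"])
model_relu.fit(input_xy, output_z, epochs=10, validation_split=0.2,
               shuffle=True, verbose=0)
plot3d(X, Y, model_relu.predict(input_xy).reshape(200, 200))
```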
In [126]:
Dense?
Init signature:
Dense(
    units,
    activation=None,
    use_bias=True,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    **kwargs,
)
Docstring:     
Just your regular densely-connected NN layer.

`Dense` implements the operation:
`output = activation(dot(input, kernel) + bias)`
where `activation` is the element-wise activation function
passed as the `activation` argument, `kernel` is a weights matrix
created by the layer, and `bias` is a bias vector created by the layer
(only applicable if `use_bias` is `True`).

Note: if the input to the layer has a rank greater than 2, then
it is flattened prior to the initial dot product with `kernel`.

# Example

```python
    # as first layer in a sequential model:
    model = Sequential()
    model.add(Dense(32, input_shape=(16,)))
    # now the model will take as input arrays of shape (*, 16)
    # and output arrays of shape (*, 32)

    # after the first layer, you don't need to specify
    # the size of the input anymore:
    model.add(Dense(32))
```

# Arguments
    units: Positive integer, dimensionality of the output space.
    activation: Activation function to use
        (see [activations](../activations.md)).
        If you don't specify anything, no activation is applied
        (ie. "linear" activation: `a(x) = x`).
    use_bias: Boolean, whether the layer uses a bias vector.
    kernel_initializer: Initializer for the `kernel` weights matrix
        (see [initializers](../initializers.md)).
    bias_initializer: Initializer for the bias vector
        (see [initializers](../initializers.md)).
    kernel_regularizer: Regularizer function applied to
        the `kernel` weights matrix
        (see [regularizer](../regularizers.md)).
    bias_regularizer: Regularizer function applied to the bias vector
        (see [regularizer](../regularizers.md)).
    activity_regularizer: Regularizer function applied to
        the output of the layer (its "activation").
        (see [regularizer](../regularizers.md)).
    kernel_constraint: Constraint function applied to
        the `kernel` weights matrix
        (see [constraints](../constraints.md)).
    bias_constraint: Constraint function applied to the bias vector
        (see [constraints](../constraints.md)).

# Input shape
    nD tensor with shape: `(batch_size, ..., input_dim)`.
    The most common situation would be
    a 2D input with shape `(batch_size, input_dim)`.

# Output shape
    nD tensor with shape: `(batch_size, ..., units)`.
    For instance, for a 2D input with shape `(batch_size, input_dim)`,
    the output would have shape `(batch_size, units)`.
File:           /usr/local/lib/python3.6/dist-packages/keras/layers/core.py
Type:           type
Subclasses:     