Movies Clustering¶

Frame the Problem¶

Can we cluster movies that go together, based on the following attributes?

Name
Genre
Director
Actor
Year of Release
Running Length
Language
Production House
Ratings

Acquire the Data¶

IMDB for Western Movies

import pandas as pd
import numpy as np

url = "https://notes.pipal.in/2018/airwatch-ml/IMDB-Movie-Data.csv"

df = pd.read_csv(url)

df.columns

Index(['Rank', 'Title', 'Genre', 'Description', 'Director', 'Actors', 'Year',
       'Runtime (Minutes)', 'Rating', 'Votes', 'Revenue (Millions)',
       'Metascore'],
      dtype='object')

df.shape

(1000, 12)

colnames = ['Rank', 'Title', 'Genre', 'Description', 'Director', 'Actors', 'Year',
       'Runtime', 'Rating', 'Votes', 'Revenue', 'Metascore']

df.columns = colnames

df.head()

Refine the Data¶

Encoding¶

Genre (One Hot Encoding)
Description => Text Analytics => Similar Themes, Keywords
Director => ??
Actors => ??
Year
Runtime, Rating, Votes, Revenue, Metascore: Bucket them or leave them as they are
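
The last item leaves open whether to bucket the continuous columns. As a rough sketch of what bucketing Runtime could look like with `pd.cut`, consider the snippet below. The bin edges and labels are illustrative assumptions, not values used anywhere in this analysis.

```python
import pandas as pd

# Runtimes (in minutes) of the first five movies in the dataset.
runtime = pd.Series([121, 124, 117, 108, 123], name="Runtime")

# Hypothetical bin edges and labels, chosen only for illustration.
bins = [0, 90, 120, 1000]
labels = ["short", "medium", "long"]

# pd.cut assigns each value to a right-closed interval:
# (0, 90], (90, 120], (120, 1000].
runtime_bucket = pd.cut(runtime, bins=bins, labels=labels)
print(runtime_bucket.tolist())
```

Bucketing trades precision for robustness to outliers. Leaving the columns numeric, as the rest of this notebook does, keeps the distances between movies fine-grained.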

Check for Missing Values¶

df.isnull().sum()

Rank             0
Title            0
Genre            0
Description      0
Director         0
Actors           0
Year             0
Runtime          0
Rating           0
Votes            0
Revenue        128
Metascore       64
dtype: int64

df.Metascore.mean()

58.98504273504273

import matplotlib.pyplot as plt
%matplotlib inline

# numeric_only=True skips the text columns (required on recent pandas versions)
df.corr(numeric_only=True)

plt.scatter(df.Metascore, df.Rating)

<matplotlib.collections.PathCollection at 0x11fce3f28>

df_clean = df[['Rating', 'Metascore']].dropna()

from sklearn.linear_model import LinearRegression
metascore_model = LinearRegression()
metascore_model.fit(df_clean[['Rating']], df_clean.Metascore)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

metascore_model.coef_, metascore_model.intercept_

(array([11.61785393]), -19.193432676867538)
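
To make the fitted line concrete, the coefficient and intercept above can be applied by hand. This is only a worked example of the arithmetic; the helper name `predict_metascore` is hypothetical.

```python
# Coefficient and intercept as printed above.
slope = 11.61785393
intercept = -19.193432676867538

def predict_metascore(rating):
    """Linear estimate of Metascore from an IMDB rating."""
    return slope * rating + intercept

# A movie rated 7.0 gets an estimated Metascore of about 62.13.
print(round(predict_metascore(7.0), 2))
```

Each extra rating point adds roughly 11.6 Metascore points to the estimate.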

df2 = df.copy(deep=True)

missing_indices = df2.Metascore.isnull()
# Assign through a single .loc call so pandas writes into df2 itself, not a temporary copy
df2.loc[missing_indices, 'Metascore'] = metascore_model.predict(df2.loc[missing_indices, ['Rating']])
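
The imputation step writes predictions only into the rows where Metascore is missing. A small sketch on a toy frame shows the same idea with one `.loc` assignment, which selects rows and column in a single indexing call and therefore never goes through an intermediate copy. The toy values are illustrative, and the linear formula stands in for `metascore_model.predict`.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing Metascore (values are illustrative).
toy = pd.DataFrame({"Rating": [8.1, 7.0, 6.2],
                    "Metascore": [76.0, np.nan, 40.0]})

# Slope and intercept as fitted earlier in the notebook.
slope, intercept = 11.61785393, -19.193432676867538

missing = toy["Metascore"].isnull()

# Rows and column are chosen together, so the write lands in `toy`.
toy.loc[missing, "Metascore"] = slope * toy.loc[missing, "Rating"] + intercept
print(toy)
```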

df2.isnull().sum()

Rank             0
Title            0
Genre            0
Description      0
Director         0
Actors           0
Year             0
Runtime          0
Rating           0
Votes            0
Revenue        128
Metascore        0
dtype: int64

df.isnull().sum()

Rank             0
Title            0
Genre            0
Description      0
Director         0
Actors           0
Year             0
Runtime          0
Rating           0
Votes            0
Revenue        128
Metascore       64
dtype: int64

df2.head()

Transform the Data¶

# Number of Genres
df.Genre.str.split(",", expand=True).stack().unique().shape

(20,)

# One-Hot Encoding for Genres
genre = (df.Genre
   .str
   .split(",", expand=True)
   .stack()
   .str
   .get_dummies()
   .groupby(level=0)
   .sum()
)
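
The chain above yields one 0/1 column per genre, with several ones per row when a movie has several genres. As a possibly simpler alternative, `Series.str.get_dummies` accepts a `sep` argument and does the split and the encoding in one call. A sketch on the first three Genre strings of the dataset:

```python
import pandas as pd

# Genre strings for the first three movies in the dataset.
genre_raw = pd.Series(["Action,Adventure,Sci-Fi",
                       "Adventure,Mystery,Sci-Fi",
                       "Horror,Thriller"])

# Split on commas and emit one 0/1 column per distinct genre.
genre_toy = genre_raw.str.get_dummies(sep=",")
print(genre_toy)
```

The columns come out in alphabetical order, and each row sums to the number of genres listed for that movie.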

# Scale the Continuous Values
from sklearn.preprocessing import StandardScaler

df_cont = df2[["Runtime", "Rating", "Votes", "Metascore"]]

df_cont.head()

SS = StandardScaler()
SS.fit(df_cont)

StandardScaler(copy=True, with_mean=True, with_std=True)

df_cont_scaled = SS.transform(df_cont)

df_cont_scaled

array([[ 0.41634975,  1.45699912,  3.11268996,  1.00918314],
       [ 0.57591149,  0.29292371,  1.67495992,  0.35940355],
       [ 0.20360077,  0.61039882, -0.06467572,  0.18219094],
       ...,
       [-0.80695688, -0.5536766 , -0.52530968, -0.52665952],
       [-1.0728931 , -1.18862683, -0.87416543, -2.18064393],
       [-1.39201657, -1.50610194, -0.83412689, -2.83042352]])
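
For reference, the transformation applied by `StandardScaler` is a column-wise z-score: subtract the column mean and divide by the column standard deviation. A minimal NumPy sketch on a toy subset (the first five movies, so the numbers differ from the full-data output above):

```python
import numpy as np

# Runtime and Rating for the first five movies in the dataset.
values = np.array([[121, 8.1],
                   [124, 7.0],
                   [117, 7.3],
                   [108, 7.2],
                   [123, 6.2]])

# Column-wise mean and population standard deviation (ddof=0),
# which is the convention StandardScaler uses.
mean = values.mean(axis=0)
std = values.std(axis=0)

scaled = (values - mean) / std
print(scaled)
```

After scaling, every column has mean 0 and standard deviation 1, so Votes (in the hundreds of thousands) no longer dominates the distance computation over Rating (on a 1 to 10 scale).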

Model the Data¶

Similar clusters of movies based on:

One-Hot Genres (20)
Runtime
Rating
Votes
Metascore

X = np.c_[np.array(genre), df_cont_scaled]

X.shape

(1000, 24)

from sklearn.cluster import KMeans

model_km = KMeans()

model_km.fit(X)

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=8, n_init=10, n_jobs=1, precompute_distances='auto',
    random_state=None, tol=0.0001, verbose=0)
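
`KMeans()` is called with its defaults here, so the model uses the 8 clusters shown in the repr, and with `random_state=None` the cluster numbering can change from run to run. To illustrate what fitting does, here is a simplified sketch of the underlying Lloyd iteration in NumPy. The function name `kmeans_sketch` is hypothetical, and it uses plain random initialisation rather than the k-means++ seeding that scikit-learn applies.

```python
import numpy as np

def kmeans_sketch(X, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Simplification: start from randomly chosen data points.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every row.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its members,
        # keeping the old center if a cluster ends up empty.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated toy blobs.
toy_X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
toy_centers, toy_labels = kmeans_sketch(toy_X, n_clusters=2)
print(toy_labels)
```

The two blobs end up in different clusters, but which one is called 0 and which 1 depends on the initialisation. That is why a cluster id is only meaningful relative to one fitted model.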

model_km.labels_

array([4, 6, 5, 3, 6, 1, 2, 3, 2, 1, 6, 5, 6, 0, 3, 3, 2, 6, 5, 0, 1, 2,
       5, 3, 1, 3, 2, 7, 3, 1, 1, 5, 6, 0, 1, 2, 4, 6, 6, 5, 3, 5, 7, 5,
       3, 6, 5, 7, 6, 7, 4, 1, 1, 0, 4, 2, 3, 5, 5, 5, 6, 3, 1, 7, 4, 6,
       5, 0, 5, 5, 1, 5, 5, 1, 0, 6, 4, 4, 6, 1, 4, 5, 4, 2, 6, 6, 1, 4,
       2, 6, 2, 6, 2, 5, 6, 5, 5, 3, 5, 4, 5, 6, 2, 5, 6, 5, 1, 1, 1, 0,
       3, 2, 3, 6, 0, 1, 1, 2, 5, 0, 5, 6, 3, 3, 4, 5, 6, 3, 3, 2, 2, 5,
       3, 0, 6, 2, 0, 2, 4, 6, 0, 3, 2, 0, 4, 2, 5, 2, 0, 1, 3, 4, 3, 5,
       5, 7, 6, 0, 0, 0, 1, 3, 0, 1, 2, 1, 6, 5, 2, 6, 0, 5, 7, 0, 0, 6,
       2, 0, 5, 6, 3, 3, 5, 1, 5, 1, 1, 5, 0, 5, 5, 7, 2, 7, 2, 6, 3, 2,
       0, 1, 0, 1, 5, 4, 1, 6, 5, 5, 5, 5, 1, 3, 6, 1, 6, 1, 6, 1, 3, 0,
       3, 0, 3, 2, 1, 2, 5, 6, 6, 7, 0, 7, 2, 6, 1, 0, 5, 7, 0, 1, 2, 0,
       3, 6, 1, 2, 0, 5, 5, 0, 7, 5, 2, 6, 0, 0, 6, 0, 3, 5, 0, 0, 3, 5,
       1, 7, 1, 2, 6, 7, 2, 4, 0, 5, 3, 6, 0, 2, 5, 6, 6, 6, 5, 5, 1, 1,
       1, 1, 1, 3, 5, 5, 3, 5, 6, 1, 7, 5, 5, 2, 6, 5, 0, 3, 1, 2, 1, 1,
       1, 1, 5, 2, 5, 6, 2, 5, 1, 7, 3, 6, 1, 3, 6, 6, 0, 1, 1, 5, 3, 1,
       3, 2, 0, 2, 0, 3, 0, 3, 2, 1, 1, 0, 6, 3, 6, 1, 6, 2, 7, 0, 5, 5,
       1, 3, 1, 1, 5, 5, 3, 5, 3, 7, 6, 0, 3, 0, 1, 5, 6, 1, 3, 6, 1, 6,
       5, 1, 5, 2, 3, 1, 3, 5, 3, 2, 0, 6, 1, 6, 6, 6, 3, 1, 5, 7, 7, 5,
       6, 1, 7, 3, 3, 7, 3, 0, 1, 5, 2, 0, 6, 1, 0, 3, 0, 5, 1, 3, 5, 5,
       2, 5, 5, 7, 1, 1, 1, 2, 1, 0, 7, 1, 2, 7, 6, 6, 3, 3, 1, 5, 1, 3,
       6, 5, 0, 5, 7, 6, 3, 5, 0, 3, 6, 3, 1, 1, 1, 5, 1, 1, 1, 5, 2, 7,
       5, 5, 3, 6, 3, 3, 0, 3, 2, 5, 6, 2, 0, 5, 2, 7, 5, 5, 5, 1, 1, 3,
       3, 0, 5, 1, 3, 0, 3, 6, 3, 1, 5, 0, 6, 0, 1, 0, 6, 5, 5, 5, 2, 1,
       5, 0, 7, 0, 5, 6, 7, 5, 3, 0, 6, 1, 2, 1, 5, 1, 1, 5, 5, 7, 6, 7,
       6, 6, 1, 7, 6, 3, 5, 1, 6, 1, 3, 1, 5, 0, 1, 5, 1, 5, 3, 6, 5, 5,
       1, 5, 7, 5, 1, 7, 5, 1, 6, 3, 6, 3, 3, 0, 6, 1, 6, 0, 3, 3, 1, 3,
       5, 0, 5, 1, 5, 6, 2, 3, 7, 7, 7, 1, 2, 1, 5, 1, 6, 0, 2, 1, 5, 1,
       1, 5, 1, 3, 0, 3, 1, 3, 1, 3, 3, 5, 5, 3, 2, 6, 1, 2, 1, 2, 3, 1,
       7, 5, 3, 5, 5, 1, 1, 7, 1, 6, 6, 3, 3, 3, 3, 3, 5, 3, 0, 0, 0, 5,
       3, 1, 0, 5, 7, 5, 3, 0, 1, 7, 7, 1, 0, 1, 3, 3, 3, 2, 2, 1, 5, 3,
       6, 3, 5, 3, 5, 5, 1, 1, 1, 2, 1, 1, 2, 1, 3, 6, 1, 3, 3, 2, 6, 5,
       3, 1, 1, 0, 5, 1, 0, 5, 7, 2, 3, 1, 0, 5, 3, 5, 6, 7, 3, 0, 6, 2,
       1, 3, 5, 3, 2, 3, 6, 3, 3, 2, 1, 3, 1, 3, 6, 3, 5, 5, 1, 7, 6, 3,
       3, 0, 6, 3, 1, 3, 3, 6, 1, 2, 3, 6, 1, 1, 7, 1, 2, 0, 1, 1, 7, 1,
       3, 1, 5, 3, 0, 1, 3, 5, 7, 1, 1, 0, 1, 5, 5, 1, 7, 2, 6, 6, 3, 3,
       6, 7, 0, 1, 3, 3, 7, 7, 7, 5, 6, 1, 5, 3, 1, 2, 1, 7, 3, 1, 3, 1,
       3, 3, 1, 1, 5, 0, 5, 0, 7, 5, 6, 3, 3, 1, 6, 5, 5, 1, 3, 3, 1, 3,
       0, 2, 7, 0, 7, 7, 5, 6, 1, 5, 7, 7, 7, 3, 2, 7, 5, 3, 5, 1, 3, 1,
       5, 1, 7, 5, 1, 5, 2, 3, 3, 2, 3, 5, 3, 5, 1, 5, 5, 5, 0, 3, 6, 5,
       3, 2, 1, 5, 3, 5, 1, 1, 5, 5, 3, 6, 5, 7, 7, 1, 1, 5, 3, 5, 1, 1,
       5, 5, 1, 3, 1, 2, 2, 1, 7, 0, 7, 1, 6, 1, 3, 3, 5, 5, 1, 7, 3, 1,
       3, 0, 1, 1, 3, 6, 3, 5, 5, 7, 5, 3, 5, 2, 6, 1, 2, 1, 3, 1, 3, 1,
       6, 1, 5, 1, 3, 2, 3, 1, 7, 3, 3, 3, 7, 3, 5, 5, 3, 1, 3, 1, 1, 3,
       5, 7, 7, 3, 5, 5, 1, 3, 1, 1, 7, 5, 1, 5, 5, 3, 5, 1, 5, 2, 5, 5,
       7, 6, 7, 5, 1, 1, 5, 1, 1, 7, 3, 5, 1, 1, 5, 1, 1, 1, 7, 1, 5, 5,
       1, 2, 3, 1, 3, 1, 7, 3, 7, 7], dtype=int32)

df2[model_km.labels_ == 1].sort_values(by="Rating", ascending=False).head(10).Title

684           Seven Pounds
734             Mr. Church
760            August Rush
498    Law Abiding Citizen
860            Remember Me
436       The Longest Ride
352         The Dressmaker
554                Colonia
9               Passengers
224      We're the Millers
Name: Title, dtype: object

def get_top_movies(cluster_id):
    movies = (df2[model_km.labels_ == cluster_id]
            .sort_values(by="Rating", ascending=False)
            .head(10))
    return [{"id": i, "title": movies.loc[i].Title, "director": movies.loc[i].Director} 
            for i in movies.index]

x = np.array([1, 2, 3, 4])

y = np.array(["A", "B", "B", "A"])

x[[1, 2]]

array([2, 3])

x[[True, False, False, True]]

array([1, 4])

x[y == "A"]

array([1, 4])

get_top_movies(3)

793          Ma vie de Courgette
483         Perfetti sconosciuti
711              La tortue rouge
380    What We Do in the Shadows
123         Boyka: Undisputed IV
712             The Book of Life
774                 The Fountain
599                     Megamind
538                  True Crimes
677                  Love, Rosie
Name: Title, dtype: object

Q: How to get the cluster id?

X[:5]

array([[ 1.        ,  1.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.41634975,  1.45699912,  3.11268996,  1.00918314],
       [ 0.        ,  1.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  1.        ,  0.        ,
         1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.57591149,  0.29292371,  1.67495992,  0.35940355],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  1.        ,  0.        ,  0.        ,
         0.20360077,  0.61039882, -0.06467572,  0.18219094],
       [ 0.        ,  0.        ,  1.        ,  0.        ,  1.        ,
         0.        ,  0.        ,  1.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        -0.27508443,  0.50457378, -0.57912902,  0.00497832],
       [ 1.        ,  1.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  1.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.52272424, -0.5536766 ,  1.18683856, -1.11736824]])

model_km.predict(X[:5])

array([4, 6, 5, 3, 6], dtype=int32)
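
`predict` assigns each row to the cluster whose center is closest, which is why the result here matches the first five entries of `labels_`. A sketch of that nearest-center rule follows. The 2-D centers are hypothetical stand-ins for `model_km.cluster_centers_`, and `nearest_center` is an illustrative helper, not a scikit-learn function.

```python
import numpy as np

# Hypothetical 2-D cluster centers.
centers = np.array([[0.0, 0.0],
                    [5.0, 5.0],
                    [0.0, 5.0]])

def nearest_center(points, centers):
    """Index of the closest center (Euclidean distance) for each point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

new_points = np.array([[0.5, 0.2], [4.0, 4.5], [1.0, 4.0]])
print(nearest_center(new_points, centers))
```

For a movie that was not in the training data, the row passed to `predict` would need the same 20 genre columns and the same fitted scaler that were used to build `X`.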

df2.head()

get_top_movies(3)

[[793, 'Ma vie de Courgette'],
 [483, 'Perfetti sconosciuti'],
 [711, 'La tortue rouge'],
 [380, 'What We Do in the Shadows'],
 [123, 'Boyka: Undisputed IV'],
 [712, 'The Book of Life'],
 [774, 'The Fountain'],
 [599, 'Megamind'],
 [538, 'True Crimes'],
 [677, 'Love, Rosie']]

How to find the related movies given a movie?

def get_related_movies(movie_id):
    cluster_id = model_km.labels_[movie_id]
    return get_top_movies(cluster_id)
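
Note that `get_related_movies` returns the ten highest-rated movies of the cluster, so every movie in a cluster gets the same list, and the list can include the query movie itself. One possible variation, sketched here on toy data, ranks the cluster members by distance to the query row instead. The name `closest_in_cluster` and the toy arrays are hypothetical and not part of this notebook.

```python
import numpy as np

def closest_in_cluster(X, labels, movie_id, n=10):
    """Row indices of the n same-cluster rows nearest to movie_id."""
    # Candidate rows: same cluster as the query, excluding the query itself.
    same = np.flatnonzero(labels == labels[movie_id])
    same = same[same != movie_id]
    # Euclidean distance from each candidate to the query row.
    dists = np.linalg.norm(X[same] - X[movie_id], axis=1)
    return same[np.argsort(dists)][:n].tolist()

toy_X = np.array([[0.0, 0.0], [0.3, 0.0], [0.1, 0.0], [9.0, 9.0]])
toy_labels = np.array([0, 0, 0, 1])
print(closest_in_cluster(toy_X, toy_labels, movie_id=0))
```

In the notebook, the same call would take `X` and `model_km.labels_`, and the returned indices could be looked up in `df2` to get titles and directors.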

df2.head()

get_related_movies(0)

[{'director': 'Christopher Nolan', 'id': 54, 'title': 'The Dark Knight'},
 {'director': 'Christopher Nolan', 'id': 80, 'title': 'Inception'},
 {'director': 'Christopher Nolan', 'id': 36, 'title': 'Interstellar'},
 {'director': 'Martin Scorsese', 'id': 99, 'title': 'The Departed'},
 {'director': 'Christopher Nolan', 'id': 64, 'title': 'The Prestige'},
 {'director': 'Christopher Nolan',
  'id': 124,
  'title': 'The Dark Knight Rises'},
 {'director': 'Quentin Tarantino', 'id': 144, 'title': 'Django Unchained'},
 {'director': 'Quentin Tarantino', 'id': 77, 'title': 'Inglourious Basterds'},
 {'director': 'Martin Scorsese', 'id': 82, 'title': 'The Wolf of Wall Street'},
 {'director': 'Martin Scorsese', 'id': 138, 'title': 'Shutter Island'}]

Save model¶

Save the data frame first.

df2.to_csv("movies.csv", index=False)

How to save the model?

import joblib

joblib.dump(model_km, "model_km.model")

['model_km.model']

Let us move the required code to a Python file.

%%file movies.py
import pandas as pd
import joblib
import sys

df = pd.read_csv("movies.csv")
model_km = joblib.load("model_km.model")

def get_top_movies(cluster_id):
    movies = (df[model_km.labels_ == cluster_id]
            .sort_values(by="Rating", ascending=False)
            .head(10))
    return [{"id": i, "title": movies.loc[i].Title, "director": movies.loc[i].Director} 
            for i in movies.index]

def get_related_movies(movie_id):
    cluster_id = model_km.labels_[movie_id]
    return get_top_movies(cluster_id)

def main():
    movie_id = int(sys.argv[1])
    print(get_related_movies(movie_id))
    
if __name__ == "__main__":
    main()

Overwriting movies.py

!python movies.py 2

[{'id': 96, 'title': 'Kimi no na wa', 'director': 'Makoto Shinkai'}, {'id': 861, 'title': 'Koe no katachi', 'director': 'Naoko Yamada'}, {'id': 478, 'title': 'Paint It Black', 'director': 'Amber Tamblyn'}, {'id': 455, 'title': 'Jagten', 'director': 'Thomas Vinterberg'}, {'id': 641, 'title': 'Relatos salvajes', 'director': 'Damián Szifron'}, {'id': 154, 'title': 'Twin Peaks: The Missing Pieces', 'director': 'David Lynch'}, {'id': 695, 'title': "Hachi: A Dog's Tale", 'director': 'Lasse Hallström'}, {'id': 18, 'title': 'Lion', 'director': 'Garth Davis'}, {'id': 273, 'title': 'Sing Street', 'director': 'John Carney'}, {'id': 184, 'title': 'Forushande', 'director': 'Asghar Farhadi'}]
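
The script reads `sys.argv[1]` without checking it, so a missing or non-numeric argument ends in a traceback, and a negative number silently indexes `labels_` from the end. A hedged sketch of a small validation helper that `main()` could call is shown here. `parse_movie_id` is a hypothetical name, and `n_movies` would be `len(df)` in the script.

```python
import sys

def parse_movie_id(argv, n_movies):
    """Return a valid row index from argv, or exit with a message."""
    if len(argv) != 2:
        sys.exit("usage: python movies.py MOVIE_ID")
    try:
        movie_id = int(argv[1])
    except ValueError:
        sys.exit(f"MOVIE_ID must be an integer, got {argv[1]!r}")
    if not 0 <= movie_id < n_movies:
        sys.exit(f"MOVIE_ID must be between 0 and {n_movies - 1}")
    return movie_id

print(parse_movie_id(["movies.py", "2"], n_movies=1000))
```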

Running as a Service¶

Install firefly.

pip install firefly-python

And run the following in your terminal.

firefly movies.get_related_movies

This starts the function as an HTTP API.

Once it is running, you can call it from Python with the firefly client.

import firefly

api = firefly.Client("http://127.0.0.1:8000/")

api.get_related_movies(movie_id=5)

[{'director': 'Gabriele Muccino', 'id': 684, 'title': 'Seven Pounds'},
 {'director': 'Bruce Beresford', 'id': 734, 'title': 'Mr. Church'},
 {'director': 'Kirsten Sheridan', 'id': 760, 'title': 'August Rush'},
 {'director': 'F. Gary Gray', 'id': 498, 'title': 'Law Abiding Citizen'},
 {'director': 'Allen Coulter', 'id': 860, 'title': 'Remember Me'},
 {'director': 'George Tillman Jr.', 'id': 436, 'title': 'The Longest Ride'},
 {'director': 'Jocelyn Moorhouse', 'id': 352, 'title': 'The Dressmaker'},
 {'director': 'Florian Gallenberger', 'id': 554, 'title': 'Colonia'},
 {'director': 'Morten Tyldum', 'id': 9, 'title': 'Passengers'},
 {'director': 'Rawson Marshall Thurber',
  'id': 224,
  'title': "We're the Millers"}]
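
The client above hides the HTTP details. As far as I understand firefly's convention, each function is exposed at a path named after it and takes its keyword arguments as a JSON body in a POST request; treat that as an assumption to verify against the firefly documentation. Under that assumption, the same call could be built with the standard library as sketched here (`build_request` is a hypothetical helper):

```python
import json
import urllib.request

def build_request(base_url, func_name, **kwargs):
    """Build a POST request carrying the keyword arguments as JSON."""
    body = json.dumps(kwargs).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{func_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("http://127.0.0.1:8000/", "get_related_movies", movie_id=5)
print(req.full_url, req.data)

# With the firefly server running, the call itself would be:
# with urllib.request.urlopen(req) as resp:
#     related = json.loads(resp.read())
```

This makes the service usable from any language that can send JSON over HTTP, not only from Python.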

	Rank	Title	Genre	Description	Director	Actors	Year	Runtime	Rating	Votes	Revenue	Metascore
0	1	Guardians of the Galaxy	Action,Adventure,Sci-Fi	A group of intergalactic criminals are forced ...	James Gunn	Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...	2014	121	8.1	757074	333.13	76.0
1	2	Prometheus	Adventure,Mystery,Sci-Fi	Following clues to the origin of mankind, a te...	Ridley Scott	Noomi Rapace, Logan Marshall-Green, Michael Fa...	2012	124	7.0	485820	126.46	65.0
2	3	Split	Horror,Thriller	Three girls are kidnapped by a man with a diag...	M. Night Shyamalan	James McAvoy, Anya Taylor-Joy, Haley Lu Richar...	2016	117	7.3	157606	138.12	62.0
3	4	Sing	Animation,Comedy,Family	In a city of humanoid animals, a hustling thea...	Christophe Lourdelet	Matthew McConaughey,Reese Witherspoon, Seth Ma...	2016	108	7.2	60545	270.32	59.0
4	5	Suicide Squad	Action,Adventure,Fantasy	A secret government agency recruits some of th...	David Ayer	Will Smith, Jared Leto, Margot Robbie, Viola D...	2016	123	6.2	393727	325.02	40.0

	Rank	Year	Runtime	Rating	Votes	Revenue	Metascore
Rank	1.000000	-0.261605	-0.221739	-0.219555	-0.283876	-0.271592	-0.191869
Year	-0.261605	1.000000	-0.164900	-0.211219	-0.411904	-0.126790	-0.079305
Runtime	-0.221739	-0.164900	1.000000	0.392214	0.407062	0.267953	0.211978
Rating	-0.219555	-0.211219	0.392214	1.000000	0.511537	0.217654	0.631897
Votes	-0.283876	-0.411904	0.407062	0.511537	1.000000	0.639661	0.325684
Revenue	-0.271592	-0.126790	0.267953	0.217654	0.639661	1.000000	0.142397
Metascore	-0.191869	-0.079305	0.211978	0.631897	0.325684	0.142397	1.000000


	Runtime	Rating	Votes	Metascore
0	121	8.1	757074	76.0
1	124	7.0	485820	65.0
2	117	7.3	157606	62.0
3	108	7.2	60545	59.0
4	123	6.2	393727	40.0
