Movies Clustering

Frame the Problem

Can we cluster movies that go together based on:

  • Name
  • Genre
  • Director
  • Actor
  • Year of Release
  • Running Time
  • Language
  • Production House
  • Ratings

Acquire the Data

IMDB for Western Movies

In [80]:
import pandas as pd
import numpy as np
In [54]:
url = "https://notes.pipal.in/2018/airwatch-ml/IMDB-Movie-Data.csv"
In [55]:
df = pd.read_csv(url)
In [56]:
df.columns
Out[56]:
Index(['Rank', 'Title', 'Genre', 'Description', 'Director', 'Actors', 'Year',
       'Runtime (Minutes)', 'Rating', 'Votes', 'Revenue (Millions)',
       'Metascore'],
      dtype='object')
In [57]:
df.shape
Out[57]:
(1000, 12)
In [58]:
colnames = ['Rank', 'Title', 'Genre', 'Description', 'Director', 'Actors', 'Year',
       'Runtime', 'Rating', 'Votes', 'Revenue', 'Metascore']
In [59]:
df.columns = colnames
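Reassigning the full `colnames` list works, but it silently depends on the column order in the CSV. Below is a sketch of an alternative that renames only the two columns with units in their names, using `DataFrame.rename` with a mapping. The tiny frame is a made-up stand-in for the IMDB data.

```python
import pandas as pd

# Toy frame with the same awkward column names as the IMDB CSV
df = pd.DataFrame({
    "Title": ["Guardians of the Galaxy", "Prometheus"],
    "Runtime (Minutes)": [121, 124],
    "Revenue (Millions)": [333.13, 126.46],
})

# Rename by name rather than by position, so column order does not matter
df = df.rename(columns={"Runtime (Minutes)": "Runtime",
                        "Revenue (Millions)": "Revenue"})
print(df.columns.tolist())  # ['Title', 'Runtime', 'Revenue']
```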
In [60]:
df.head()
Out[60]:
Rank Title Genre Description Director Actors Year Runtime Rating Votes Revenue Metascore
0 1 Guardians of the Galaxy Action,Adventure,Sci-Fi A group of intergalactic criminals are forced ... James Gunn Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... 2014 121 8.1 757074 333.13 76.0
1 2 Prometheus Adventure,Mystery,Sci-Fi Following clues to the origin of mankind, a te... Ridley Scott Noomi Rapace, Logan Marshall-Green, Michael Fa... 2012 124 7.0 485820 126.46 65.0
2 3 Split Horror,Thriller Three girls are kidnapped by a man with a diag... M. Night Shyamalan James McAvoy, Anya Taylor-Joy, Haley Lu Richar... 2016 117 7.3 157606 138.12 62.0
3 4 Sing Animation,Comedy,Family In a city of humanoid animals, a hustling thea... Christophe Lourdelet Matthew McConaughey,Reese Witherspoon, Seth Ma... 2016 108 7.2 60545 270.32 59.0
4 5 Suicide Squad Action,Adventure,Fantasy A secret government agency recruits some of th... David Ayer Will Smith, Jared Leto, Margot Robbie, Viola D... 2016 123 6.2 393727 325.02 40.0

Refine the Data

Encoding

  • Genre (One Hot Encoding)
  • Description => Text Analytics => Similar themes, keywords
  • Director => ??
  • Actors => ??
  • Year
  • Runtime, Rating, Votes, Revenue, Metascore: bucket them or leave them as they are
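The list leaves Actors open. One possible option, sketched here as an assumption and not as the notebook's method, is to treat Actors like Genre and build a multi-hot encoding. The Actors strings in this dataset are comma-separated with inconsistent spacing (e.g. `Matthew McConaughey,Reese Witherspoon`), so each name is stripped first. The second half shows the "bucket them" option for a continuous column with `pd.qcut`. All values are made up for illustration.

```python
import pandas as pd

actors = pd.Series([
    "Chris Pratt, Vin Diesel, Bradley Cooper",
    "Matthew McConaughey,Reese Witherspoon",
    "Chris Pratt,Reese Witherspoon",
])

# Split on commas, strip stray spaces, then build one 0/1 column per actor
actors_onehot = (actors.str.split(",")
                 .explode()            # one row per (movie, actor) pair
                 .str.strip()
                 .str.get_dummies()
                 .groupby(level=0)     # back to one row per movie
                 .max())
print(actors_onehot.loc[2, "Chris Pratt"])  # 1

# Bucketing option for a continuous column: 2 equal-frequency bins
runtime = pd.Series([121, 124, 117, 108])
runtime_bucket = pd.qcut(runtime, q=2, labels=["short", "long"])
print(runtime_bucket.tolist())  # ['long', 'long', 'short', 'short']
```

With many distinct actors this produces a wide, sparse matrix, so in practice one might keep only actors who appear in several movies.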

Check for Missing Values

In [61]:
df.isnull().sum()
Out[61]:
Rank             0
Title            0
Genre            0
Description      0
Director         0
Actors           0
Year             0
Runtime          0
Rating           0
Votes            0
Revenue        128
Metascore       64
dtype: int64
In [62]:
df.Metascore.mean()
Out[62]:
58.98504273504273
In [63]:
import matplotlib.pyplot as plt
%matplotlib inline
In [64]:
df.corr(numeric_only=True)  # skip the text columns; required in pandas >= 2.0
Out[64]:
              Rank      Year   Runtime    Rating     Votes   Revenue  Metascore
Rank      1.000000 -0.261605 -0.221739 -0.219555 -0.283876 -0.271592  -0.191869
Year     -0.261605  1.000000 -0.164900 -0.211219 -0.411904 -0.126790  -0.079305
Runtime  -0.221739 -0.164900  1.000000  0.392214  0.407062  0.267953   0.211978
Rating   -0.219555 -0.211219  0.392214  1.000000  0.511537  0.217654   0.631897
Votes    -0.283876 -0.411904  0.407062  0.511537  1.000000  0.639661   0.325684
Revenue  -0.271592 -0.126790  0.267953  0.217654  0.639661  1.000000   0.142397
Metascore -0.191869 -0.079305  0.211978  0.631897  0.325684  0.142397   1.000000
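In the table above, Rating is the column most strongly correlated with Metascore (0.63), which is what motivates regressing Metascore on Rating below. The same choice can be made programmatically. Here is a sketch on a small made-up frame; `numeric_only=True` keeps the text column out of the correlation.

```python
import pandas as pd

# Made-up values, for illustration only
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E"],
    "Rating": [8.1, 7.0, 7.3, 7.2, 6.2],
    "Votes": [757074, 485820, 157606, 60545, 393727],
    "Metascore": [76.0, 65.0, 62.0, 59.0, 40.0],
})

# Correlation of every numeric column with Metascore, excluding itself
corr = df.corr(numeric_only=True)["Metascore"].drop("Metascore")

# Pick the column with the largest absolute correlation
best = corr.abs().idxmax()
print(best)  # Rating
```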
In [65]:
plt.scatter(df.Metascore, df.Rating)
plt.xlabel("Metascore")
plt.ylabel("Rating")
plt.show()
In [66]:
df_clean = df[['Rating', 'Metascore']].dropna()
In [67]:
from sklearn.linear_model import LinearRegression
metascore_model = LinearRegression()
metascore_model.fit(df_clean[['Rating']], df_clean.Metascore)
Out[67]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [68]:
metascore_model.coef_, metascore_model.intercept_
Out[68]:
(array([11.61785393]), -19.193432676867538)
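The fitted line is Metascore = 11.618 * Rating - 19.193. As a quick check of what the imputation will produce, a movie rated 7.0 gets a predicted Metascore of about 62.1. The arithmetic below uses the coefficient and intercept printed above.

```python
# Coefficient and intercept copied from the fitted model output above
coef = 11.61785393
intercept = -19.193432676867538

def predict_metascore(rating):
    """Apply the fitted line: Metascore = coef * Rating + intercept."""
    return coef * rating + intercept

print(round(predict_metascore(7.0), 2))  # 62.13
```

A straight line is not bounded, so very low ratings map outside the 0 to 100 Metascore range (a rating of 1.0 gives about -7.6). Clipping the predictions may be worth considering if such ratings occur among the missing rows.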
In [69]:
df2 = df.copy(deep=True)
In [70]:
missing_indices = df2.Metascore.isnull()
# Assign through .loc on df2 itself; chained indexing (df2['Metascore'][mask] = ...)
# may write to a temporary copy and triggers a SettingWithCopyWarning
df2.loc[missing_indices, 'Metascore'] = metascore_model.predict(df2.loc[missing_indices, ['Rating']])
In [71]:
df2.isnull().sum()
Out[71]:
Rank             0
Title            0
Genre            0
Description      0
Director         0
Actors           0
Year             0
Runtime          0
Rating           0
Votes            0
Revenue        128
Metascore        0
dtype: int64
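The imputation can also be written without boolean-mask assignment. You predict only for the missing rows, wrap the predictions in a Series indexed like those rows, and let `fillna` align on the index. Below is a self-contained sketch on a made-up frame. As in the notebook, the original frame is left untouched.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "Rating": [8.1, 7.0, 7.3, 7.2, 6.2, 6.8],
    "Metascore": [76.0, 65.0, np.nan, 59.0, 40.0, np.nan],
})

# Fit Metascore ~ Rating on the complete rows only
complete = df.dropna(subset=["Metascore"])
model = LinearRegression().fit(complete[["Rating"]], complete["Metascore"])

# Predict for the missing rows; the Series index ties each prediction to its row
missing = df["Metascore"].isna()
preds = pd.Series(model.predict(df.loc[missing, ["Rating"]]),
                  index=df.index[missing])

# fillna with a Series fills each NaN from the prediction with the same index
df2 = df.copy()
df2["Metascore"] = df2["Metascore"].fillna(preds)
print(df2["Metascore"].isna().sum())  # 0
```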
In [72]:
df.isnull().sum()
Out[72]:
Rank             0
Title            0
Genre            0
Description      0
Director         0
Actors           0
Year             0
Runtime          0
Rating           0
Votes            0
Revenue        128
Metascore       64
dtype: int64
In [73]:
df2.head()
Out[73]:
Rank Title Genre Description Director Actors Year Runtime Rating Votes Revenue Metascore
0 1 Guardians of the Galaxy Action,Adventure,Sci-Fi A group of intergalactic criminals are forced ... James Gunn Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... 2014 121 8.1 757074 333.13 76.0
1 2 Prometheus Adventure,Mystery,Sci-Fi Following clues to the origin of mankind, a te... Ridley Scott Noomi Rapace, Logan Marshall-Green, Michael Fa... 2012 124 7.0 485820 126.46 65.0
2 3 Split Horror,Thriller Three girls are kidnapped by a man with a diag... M. Night Shyamalan James McAvoy, Anya Taylor-Joy, Haley Lu Richar... 2016 117 7.3 157606 138.12 62.0
3 4 Sing Animation,Comedy,Family In a city of humanoid animals, a hustling thea... Christophe Lourdelet Matthew McConaughey,Reese Witherspoon, Seth Ma... 2016 108 7.2 60545 270.32 59.0
4 5 Suicide Squad Action,Adventure,Fantasy A secret government agency recruits some of th... David Ayer Will Smith, Jared Leto, Margot Robbie, Viola D... 2016 123 6.2 393727 325.02 40.0

Transform the Data

In [100]:
# Number of Genres
df.Genre.str.split(",", expand=True).stack().unique().shape
Out[100]:
(20,)
In [112]:
# One-Hot Encoding for Genres
genre = (df.Genre
   .str
   .split(",", expand=True)
   .stack()
   .str
   .get_dummies()
   .groupby(level=0)   # collapse back to one row per movie; .sum(level=0) was removed in pandas 2.0
   .sum()
)
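Each Genre value is a single comma-separated string, so pandas' `Series.str.get_dummies(sep=",")` can build the same 0/1 genre columns in one call, without the split, stack and regroup steps. Here is a sketch on three of the genre strings shown earlier.

```python
import pandas as pd

genres = pd.Series(["Action,Adventure,Sci-Fi",
                    "Adventure,Mystery,Sci-Fi",
                    "Horror,Thriller"])

# One column per distinct genre, 1 where the movie lists that genre
genre = genres.str.get_dummies(sep=",")
print(genre.columns.tolist())
# ['Action', 'Adventure', 'Horror', 'Mystery', 'Sci-Fi', 'Thriller']
print(genre.sum(axis=1).tolist())  # [3, 3, 2]
```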
In [117]:
# Scale the Continuous Values
from sklearn.preprocessing import StandardScaler
In [118]:
df_cont = df2[["Runtime", "Rating", "Votes", "Metascore"]]
In [119]:
df_cont.head()
Out[119]:
   Runtime  Rating   Votes  Metascore
0      121     8.1  757074       76.0
1      124     7.0  485820       65.0
2      117     7.3  157606       62.0
3      108     7.2   60545       59.0
4      123     6.2  393727       40.0
In [121]:
SS = StandardScaler()
SS.fit(df_cont)
Out[121]:
StandardScaler(copy=True, with_mean=True, with_std=True)
In [122]:
df_cont_scaled = SS.transform(df_cont)
In [124]:
df_cont_scaled
Out[124]:
array([[ 0.41634975,  1.45699912,  3.11268996,  1.00918314],
       [ 0.57591149,  0.29292371,  1.67495992,  0.35940355],
       [ 0.20360077,  0.61039882, -0.06467572,  0.18219094],
       ...,
       [-0.80695688, -0.5536766 , -0.52530968, -0.52665952],
       [-1.0728931 , -1.18862683, -0.87416543, -2.18064393],
       [-1.39201657, -1.50610194, -0.83412689, -2.83042352]])
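StandardScaler standardises each column independently as z = (x - mean) / std, where std is the population standard deviation (ddof=0). The check below compares that formula against scikit-learn on the Runtime and Rating values of the first five movies.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Runtime and Rating for the first five movies
X = np.array([[121, 8.1],
              [124, 7.0],
              [117, 7.3],
              [108, 7.2],
              [123, 6.2]])

scaled = StandardScaler().fit_transform(X)

# Manual z-score per column; np.std uses the population std (ddof=0) by default
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))  # True
```

Because the mean and std are learned in `fit`, any new movie has to go through the same fitted `SS.transform`, not a freshly fitted scaler.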

Model the Data

Find clusters of similar movies based on:

  • One-Hot Genres (20)
  • Runtime
  • Rating
  • Votes
  • Metascore
In [126]:
X = np.c_[np.array(genre), df_cont_scaled]
In [128]:
X.shape
Out[128]:
(1000, 24)
In [129]:
from sklearn.cluster import KMeans
In [130]:
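# Make the defaults used here explicit; newer scikit-learn defaults to n_init='auto'
model_km = KMeans(n_clusters=8, n_init=10)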
In [131]:
model_km.fit(X)
Out[131]:
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=8, n_init=10, n_jobs=1, precompute_distances='auto',
    random_state=None, tol=0.0001, verbose=0)
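`KMeans()` falls back to its default of 8 clusters. A common way to sanity-check k is to compare the silhouette score over a range of k. This is a suggestion, not what the notebook did. On synthetic data with three well-separated groups, the score should peak at k = 3.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, far-apart blobs (a stand-in for the movie features)
X, _ = make_blobs(n_samples=300,
                  centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5,
                  random_state=0)

scores = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Silhouette lies in [-1, 1]; higher means tighter, better-separated clusters
    scores[k] = silhouette_score(X, km.labels_)

best_k = max(scores, key=scores.get)
print(best_k)  # 3
```

The same loop can be run unchanged on the notebook's 24-column `X`.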
In [135]:
model_km.labels_
Out[135]:
array([4, 6, 5, 3, 6, 1, 2, 3, 2, 1, 6, 5, 6, 0, 3, 3, 2, 6, 5, 0, 1, 2,
       5, 3, 1, 3, 2, 7, 3, 1, 1, 5, 6, 0, 1, 2, 4, 6, 6, 5, 3, 5, 7, 5,
       3, 6, 5, 7, 6, 7, 4, 1, 1, 0, 4, 2, 3, 5, 5, 5, 6, 3, 1, 7, 4, 6,
       5, 0, 5, 5, 1, 5, 5, 1, 0, 6, 4, 4, 6, 1, 4, 5, 4, 2, 6, 6, 1, 4,
       2, 6, 2, 6, 2, 5, 6, 5, 5, 3, 5, 4, 5, 6, 2, 5, 6, 5, 1, 1, 1, 0,
       3, 2, 3, 6, 0, 1, 1, 2, 5, 0, 5, 6, 3, 3, 4, 5, 6, 3, 3, 2, 2, 5,
       3, 0, 6, 2, 0, 2, 4, 6, 0, 3, 2, 0, 4, 2, 5, 2, 0, 1, 3, 4, 3, 5,
       5, 7, 6, 0, 0, 0, 1, 3, 0, 1, 2, 1, 6, 5, 2, 6, 0, 5, 7, 0, 0, 6,
       2, 0, 5, 6, 3, 3, 5, 1, 5, 1, 1, 5, 0, 5, 5, 7, 2, 7, 2, 6, 3, 2,
       0, 1, 0, 1, 5, 4, 1, 6, 5, 5, 5, 5, 1, 3, 6, 1, 6, 1, 6, 1, 3, 0,
       3, 0, 3, 2, 1, 2, 5, 6, 6, 7, 0, 7, 2, 6, 1, 0, 5, 7, 0, 1, 2, 0,
       3, 6, 1, 2, 0, 5, 5, 0, 7, 5, 2, 6, 0, 0, 6, 0, 3, 5, 0, 0, 3, 5,
       1, 7, 1, 2, 6, 7, 2, 4, 0, 5, 3, 6, 0, 2, 5, 6, 6, 6, 5, 5, 1, 1,
       1, 1, 1, 3, 5, 5, 3, 5, 6, 1, 7, 5, 5, 2, 6, 5, 0, 3, 1, 2, 1, 1,
       1, 1, 5, 2, 5, 6, 2, 5, 1, 7, 3, 6, 1, 3, 6, 6, 0, 1, 1, 5, 3, 1,
       3, 2, 0, 2, 0, 3, 0, 3, 2, 1, 1, 0, 6, 3, 6, 1, 6, 2, 7, 0, 5, 5,
       1, 3, 1, 1, 5, 5, 3, 5, 3, 7, 6, 0, 3, 0, 1, 5, 6, 1, 3, 6, 1, 6,
       5, 1, 5, 2, 3, 1, 3, 5, 3, 2, 0, 6, 1, 6, 6, 6, 3, 1, 5, 7, 7, 5,
       6, 1, 7, 3, 3, 7, 3, 0, 1, 5, 2, 0, 6, 1, 0, 3, 0, 5, 1, 3, 5, 5,
       2, 5, 5, 7, 1, 1, 1, 2, 1, 0, 7, 1, 2, 7, 6, 6, 3, 3, 1, 5, 1, 3,
       6, 5, 0, 5, 7, 6, 3, 5, 0, 3, 6, 3, 1, 1, 1, 5, 1, 1, 1, 5, 2, 7,
       5, 5, 3, 6, 3, 3, 0, 3, 2, 5, 6, 2, 0, 5, 2, 7, 5, 5, 5, 1, 1, 3,
       3, 0, 5, 1, 3, 0, 3, 6, 3, 1, 5, 0, 6, 0, 1, 0, 6, 5, 5, 5, 2, 1,
       5, 0, 7, 0, 5, 6, 7, 5, 3, 0, 6, 1, 2, 1, 5, 1, 1, 5, 5, 7, 6, 7,
       6, 6, 1, 7, 6, 3, 5, 1, 6, 1, 3, 1, 5, 0, 1, 5, 1, 5, 3, 6, 5, 5,
       1, 5, 7, 5, 1, 7, 5, 1, 6, 3, 6, 3, 3, 0, 6, 1, 6, 0, 3, 3, 1, 3,
       5, 0, 5, 1, 5, 6, 2, 3, 7, 7, 7, 1, 2, 1, 5, 1, 6, 0, 2, 1, 5, 1,
       1, 5, 1, 3, 0, 3, 1, 3, 1, 3, 3, 5, 5, 3, 2, 6, 1, 2, 1, 2, 3, 1,
       7, 5, 3, 5, 5, 1, 1, 7, 1, 6, 6, 3, 3, 3, 3, 3, 5, 3, 0, 0, 0, 5,
       3, 1, 0, 5, 7, 5, 3, 0, 1, 7, 7, 1, 0, 1, 3, 3, 3, 2, 2, 1, 5, 3,
       6, 3, 5, 3, 5, 5, 1, 1, 1, 2, 1, 1, 2, 1, 3, 6, 1, 3, 3, 2, 6, 5,
       3, 1, 1, 0, 5, 1, 0, 5, 7, 2, 3, 1, 0, 5, 3, 5, 6, 7, 3, 0, 6, 2,
       1, 3, 5, 3, 2, 3, 6, 3, 3, 2, 1, 3, 1, 3, 6, 3, 5, 5, 1, 7, 6, 3,
       3, 0, 6, 3, 1, 3, 3, 6, 1, 2, 3, 6, 1, 1, 7, 1, 2, 0, 1, 1, 7, 1,
       3, 1, 5, 3, 0, 1, 3, 5, 7, 1, 1, 0, 1, 5, 5, 1, 7, 2, 6, 6, 3, 3,
       6, 7, 0, 1, 3, 3, 7, 7, 7, 5, 6, 1, 5, 3, 1, 2, 1, 7, 3, 1, 3, 1,
       3, 3, 1, 1, 5, 0, 5, 0, 7, 5, 6, 3, 3, 1, 6, 5, 5, 1, 3, 3, 1, 3,
       0, 2, 7, 0, 7, 7, 5, 6, 1, 5, 7, 7, 7, 3, 2, 7, 5, 3, 5, 1, 3, 1,
       5, 1, 7, 5, 1, 5, 2, 3, 3, 2, 3, 5, 3, 5, 1, 5, 5, 5, 0, 3, 6, 5,
       3, 2, 1, 5, 3, 5, 1, 1, 5, 5, 3, 6, 5, 7, 7, 1, 1, 5, 3, 5, 1, 1,
       5, 5, 1, 3, 1, 2, 2, 1, 7, 0, 7, 1, 6, 1, 3, 3, 5, 5, 1, 7, 3, 1,
       3, 0, 1, 1, 3, 6, 3, 5, 5, 7, 5, 3, 5, 2, 6, 1, 2, 1, 3, 1, 3, 1,
       6, 1, 5, 1, 3, 2, 3, 1, 7, 3, 3, 3, 7, 3, 5, 5, 3, 1, 3, 1, 1, 3,
       5, 7, 7, 3, 5, 5, 1, 3, 1, 1, 7, 5, 1, 5, 5, 3, 5, 1, 5, 2, 5, 5,
       7, 6, 7, 5, 1, 1, 5, 1, 1, 7, 3, 5, 1, 1, 5, 1, 1, 1, 7, 1, 5, 5,
       1, 2, 3, 1, 3, 1, 7, 3, 7, 7], dtype=int32)
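The raw label array is hard to read. Counting how many movies fall into each cluster gives a quicker overview, and `np.bincount` does this for non-negative integer labels. Here is a sketch on the first ten labels above. On the full model the call would be `np.bincount(model_km.labels_)`.

```python
import numpy as np

labels = np.array([4, 6, 5, 3, 6, 1, 2, 3, 2, 1])  # first ten labels from above

# counts[i] is the number of movies assigned to cluster i
counts = np.bincount(labels, minlength=8)
print(counts.tolist())  # [0, 2, 2, 2, 1, 1, 2, 0]
```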
In [139]:
df2[model_km.labels_ == 1].sort_values(by="Rating", ascending=False).head(10).Title
Out[139]:
684           Seven Pounds
734             Mr. Church
760            August Rush
498    Law Abiding Citizen
860            Remember Me
436       The Longest Ride
352         The Dressmaker
554                Colonia
9               Passengers
224      We're the Millers
Name: Title, dtype: object
In [189]:
def get_top_movies(cluster_id):
    movies = (df2[model_km.labels_ == cluster_id]
            .sort_values(by="Rating", ascending=False)
            .head(10))
    return [{"id": i, "title": movies.loc[i].Title, "director": movies.loc[i].Director} 
            for i in movies.index]
In [140]:
x = np.array([1, 2, 3, 4])
In [144]:
y = np.array(["A", "B", "B", "A"])
In [141]:
x[[1, 2]]
Out[141]:
array([2, 3])
In [142]:
x[[True, False, False, True]]
Out[142]:
array([1, 4])
In [145]:
x[y == "A"]
Out[145]:
array([1, 4])
In [147]:
get_top_movies(3)
Out[147]:
793          Ma vie de Courgette
483         Perfetti sconosciuti
711              La tortue rouge
380    What We Do in the Shadows
123         Boyka: Undisputed IV
712             The Book of Life
774                 The Fountain
599                     Megamind
538                  True Crimes
677                  Love, Rosie
Name: Title, dtype: object

Q: How do we get the cluster id for a movie?

In [152]:
X[:5]
Out[152]:
array([[ 1.        ,  1.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.41634975,  1.45699912,  3.11268996,  1.00918314],
       [ 0.        ,  1.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  1.        ,  0.        ,
         1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.57591149,  0.29292371,  1.67495992,  0.35940355],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  1.        ,  0.        ,  0.        ,
         0.20360077,  0.61039882, -0.06467572,  0.18219094],
       [ 0.        ,  0.        ,  1.        ,  0.        ,  1.        ,
         0.        ,  0.        ,  1.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        -0.27508443,  0.50457378, -0.57912902,  0.00497832],
       [ 1.        ,  1.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  1.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.52272424, -0.5536766 ,  1.18683856, -1.11736824]])
In [153]:
model_km.predict(X[:5])
Out[153]:
array([4, 6, 5, 3, 6], dtype=int32)
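`predict` only works on rows built exactly like `X`. That means the same genre columns in the same order, followed by the four continuous values passed through the already-fitted scaler (`SS.transform`, not a fresh fit). Below is a self-contained sketch of that preprocessing for a movie that is not in the dataset. The toy training data, the helper name `movie_features` and the new movie's values are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy training data with the same layout as the notebook (values made up)
train = pd.DataFrame({
    "Genre": ["Action,Sci-Fi", "Drama", "Action,Adventure", "Drama,Romance"],
    "Runtime": [121, 110, 123, 105],
    "Rating": [8.1, 7.0, 6.2, 7.2],
    "Votes": [757074, 60545, 393727, 85000],
    "Metascore": [76.0, 65.0, 40.0, 59.0],
})
cont_cols = ["Runtime", "Rating", "Votes", "Metascore"]

genre = train.Genre.str.get_dummies(sep=",")
SS = StandardScaler().fit(train[cont_cols])
X = np.c_[genre.to_numpy(), SS.transform(train[cont_cols])]
model_km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

def movie_features(genres, runtime, rating, votes, metascore):
    """Hypothetical helper: build one feature row laid out like X."""
    # Genre flags in the training column order; genres unseen in training are ignored
    flags = genre.columns.isin(genres.split(",")).astype(float)
    cont = pd.DataFrame([[runtime, rating, votes, metascore]], columns=cont_cols)
    # Reuse the fitted scaler so the new row is standardised like the training rows
    return np.r_[flags, SS.transform(cont)[0]].reshape(1, -1)

new_movie = movie_features("Action,Sci-Fi", 130, 7.5, 500000, 70.0)
cluster_id = model_km.predict(new_movie)[0]
print(cluster_id)
```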
In [161]:
get_top_movies(3)
Out[161]:
[[793, 'Ma vie de Courgette'],
 [483, 'Perfetti sconosciuti'],
 [711, 'La tortue rouge'],
 [380, 'What We Do in the Shadows'],
 [123, 'Boyka: Undisputed IV'],
 [712, 'The Book of Life'],
 [774, 'The Fountain'],
 [599, 'Megamind'],
 [538, 'True Crimes'],
 [677, 'Love, Rosie']]

How do we find related movies for a given movie?

In [162]:
def get_related_movies(movie_id):
    cluster_id = model_km.labels_[movie_id]
    return get_top_movies(cluster_id)
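`get_related_movies` returns the same ten top-rated movies for every movie in a cluster, and the query movie itself can appear in that list. One possible refinement, swapped in here and not part of the original notebook, is to stay within the cluster but rank its members by Euclidean distance to the query's feature row, skipping the query itself. Below is a sketch with a hypothetical `get_nearest_in_cluster` on made-up 2-D features.

```python
import numpy as np

def get_nearest_in_cluster(movie_id, X, labels, n=10):
    """Hypothetical helper: row positions of the n same-cluster movies closest to movie_id."""
    same = np.flatnonzero(labels == labels[movie_id])
    same = same[same != movie_id]                         # drop the query movie itself
    dists = np.linalg.norm(X[same] - X[movie_id], axis=1)  # Euclidean distance to the query
    return same[np.argsort(dists)][:n].tolist()

# Made-up features and cluster labels
X = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [9.0, 9.0], [0.5, 0.0]])
labels = np.array([0, 0, 0, 1, 0])

print(get_nearest_in_cluster(0, X, labels, n=2))  # [1, 4]
```

In the notebook this would be called as `get_nearest_in_cluster(0, X, model_km.labels_)`, with the returned positions looked up via `df2.iloc`.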
In [193]:
get_related_movies(0)
Out[193]:
[{'director': 'Christopher Nolan', 'id': 54, 'title': 'The Dark Knight'},
 {'director': 'Christopher Nolan', 'id': 80, 'title': 'Inception'},
 {'director': 'Christopher Nolan', 'id': 36, 'title': 'Interstellar'},
 {'director': 'Martin Scorsese', 'id': 99, 'title': 'The Departed'},
 {'director': 'Christopher Nolan', 'id': 64, 'title': 'The Prestige'},
 {'director': 'Christopher Nolan',
  'id': 124,
  'title': 'The Dark Knight Rises'},
 {'director': 'Quentin Tarantino', 'id': 144, 'title': 'Django Unchained'},
 {'director': 'Quentin Tarantino', 'id': 77, 'title': 'Inglourious Basterds'},
 {'director': 'Martin Scorsese', 'id': 82, 'title': 'The Wolf of Wall Street'},
 {'director': 'Martin Scorsese', 'id': 138, 'title': 'Shutter Island'}]

Save the Model

Save the data frame first.

In [194]:
df2.to_csv("movies.csv", index=False)

How do we save the model?

In [195]:
import joblib
In [196]:
joblib.dump(model_km, "model_km.model")
Out[196]:
['model_km.model']
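A quick way to confirm the saved file is usable is to load it back and check that the cluster assignments survive the round trip. Below is a self-contained sketch that writes to a temporary directory instead of the working directory. The toy data is made up.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_km.model")
    joblib.dump(model, path)
    loaded = joblib.load(path)  # load before the temporary directory is deleted

# The reloaded model carries the same fitted labels and predicts identically
print(np.array_equal(loaded.labels_, model.labels_))        # True
print(np.array_equal(loaded.predict(X), model.predict(X)))  # True
```

joblib files are pickles, so loading them with a different scikit-learn version than the one that saved them may emit a warning or fail.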

Let us move the required code into a Python file.

In [201]:
%%file movies.py
import pandas as pd
import joblib
import sys

df = pd.read_csv("movies.csv")
model_km = joblib.load("model_km.model")

def get_top_movies(cluster_id):
    movies = (df[model_km.labels_ == cluster_id]
            .sort_values(by="Rating", ascending=False)
            .head(10))
    return [{"id": i, "title": movies.loc[i].Title, "director": movies.loc[i].Director} 
            for i in movies.index]

def get_related_movies(movie_id):
    cluster_id = model_km.labels_[movie_id]
    return get_top_movies(cluster_id)

def main():
    movie_id = int(sys.argv[1])
    print(get_related_movies(movie_id))
    
if __name__ == "__main__":
    main()
Overwriting movies.py
In [202]:
!python movies.py 2
[{'id': 96, 'title': 'Kimi no na wa', 'director': 'Makoto Shinkai'}, {'id': 861, 'title': 'Koe no katachi', 'director': 'Naoko Yamada'}, {'id': 478, 'title': 'Paint It Black', 'director': 'Amber Tamblyn'}, {'id': 455, 'title': 'Jagten', 'director': 'Thomas Vinterberg'}, {'id': 641, 'title': 'Relatos salvajes', 'director': 'Damián Szifron'}, {'id': 154, 'title': 'Twin Peaks: The Missing Pieces', 'director': 'David Lynch'}, {'id': 695, 'title': "Hachi: A Dog's Tale", 'director': 'Lasse Hallström'}, {'id': 18, 'title': 'Lion', 'director': 'Garth Davis'}, {'id': 273, 'title': 'Sing Street', 'director': 'John Carney'}, {'id': 184, 'title': 'Forushande', 'director': 'Asghar Farhadi'}]

Running as a Service

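Install firefly.

pip install firefly-python

Then run the following in your terminal, from the directory that contains movies.py, movies.csv and model_km.model (movies.py loads the last two by relative path).

firefly movies.get_related_movies

This starts a local web server that exposes get_related_movies as an API.

Once it is running, you can call it from Python with the firefly client, which connects to it at http://127.0.0.1:8000/.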

In [203]:
import firefly
In [204]:
api = firefly.Client("http://127.0.0.1:8000/")
In [205]:
api.get_related_movies(movie_id=5)
Out[205]:
[{'director': 'Gabriele Muccino', 'id': 684, 'title': 'Seven Pounds'},
 {'director': 'Bruce Beresford', 'id': 734, 'title': 'Mr. Church'},
 {'director': 'Kirsten Sheridan', 'id': 760, 'title': 'August Rush'},
 {'director': 'F. Gary Gray', 'id': 498, 'title': 'Law Abiding Citizen'},
 {'director': 'Allen Coulter', 'id': 860, 'title': 'Remember Me'},
 {'director': 'George Tillman Jr.', 'id': 436, 'title': 'The Longest Ride'},
 {'director': 'Jocelyn Moorhouse', 'id': 352, 'title': 'The Dressmaker'},
 {'director': 'Florian Gallenberger', 'id': 554, 'title': 'Colonia'},
 {'director': 'Morten Tyldum', 'id': 9, 'title': 'Passengers'},
 {'director': 'Rawson Marshall Thurber',
  'id': 224,
  'title': "We're the Millers"}]