Freight Optimization¶

Notes: https://notes.pipal.in/2018/vmware-ml2/

import numpy as np
import pandas as pd

Datasets¶

url = "https://notes.pipal.in/2018/vmware-ml2/10-Secondary-Freight-Data.csv"
matrix_url = "https://notes.pipal.in/2018/vmware-ml2/13-location-distance-matrix.csv"

freight = pd.read_csv(url)

freight.head()

freight.dtypes

Sales Order No. New        int64
Date                      object
Supplying DC Location     object
Customer Code New          int64
Customer Town             object
Qty (in cases)           float64
Freight/Cartage           object
Remarks                   object
Amount                    object
dtype: object

freight.columns

Index(['Sales Order No. New', 'Date', 'Supplying DC Location',
       'Customer Code New', 'Customer Town', 'Qty (in cases)',
       'Freight/Cartage', 'Remarks', 'Amount'],
      dtype='object')

matrix = pd.read_csv(matrix_url)

matrix.head()

Refine the Data¶

Basic Cleaning:

fix column names
drop unused/redundant columns

Basic Cleaning¶

columns = {
    'Sales Order No. New': 'orderno',
    'Date': 'date', 
    'Supplying DC Location': 'source',
    'Customer Code New': 'custcode',
    'Customer Town': 'dest', 
    'Qty (in cases)': 'qty',
    'Freight/Cartage': 'cartage', 
    'Remarks': 'remarks', 
    'Amount': 'amount'
}

freight.rename(columns=columns, inplace=True)

freight.head()

freight.drop(["orderno", "custcode"], axis=1, inplace=True)

freight.head()

freight.amount.dtype

dtype('O')

freight.amount.unique()[:20]

array(['75', '540', '120', '104', '1,000', '84', '255', '138', '48', '39',
       '380', '54', '42', '65', '57', '38', '113', '125', '119', '26'], dtype=object)

freight.dtypes

date        object
source      object
dest        object
qty        float64
cartage     object
remarks     object
amount      object
dtype: object

# remove commas in the numbers
freight.amount.head()

0       75
1      540
2      120
3      104
4    1,000
Name: amount, dtype: object

freight.amount.head().str.replace(",", "")

0      75
1     540
2     120
3     104
4    1000
Name: amount, dtype: object

freight.amount = freight.amount.str.replace(",", "")

freight.amount.head()

0      75
1     540
2     120
3     104
4    1000
Name: amount, dtype: object

Problem: Remove the commas from the cartage column as well.

freight.cartage = freight.cartage.str.replace(",", "")

# fix column types
freight.amount = pd.to_numeric(freight.amount)
freight.cartage = pd.to_numeric(freight.cartage)
freight.date = pd.to_datetime(freight.date)

freight.head()

# dtypes
freight.dtypes

date       datetime64[ns]
source             object
dest               object
qty               float64
cartage             int64
remarks            object
amount              int64
dtype: object
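
The comma clean-up above can also be tried on a few hand-written values. The sketch below is an illustration on toy data, not the freight file; `errors="coerce"` and the `thousands` option of `read_csv` are alternatives that the notebook itself does not use.

```python
import io
import pandas as pd

# Toy values for illustration only (not rows from the freight file)
raw = pd.Series(["75", "1,000", "n/a"])

# Strip the thousands separator, then convert; unparseable text becomes NaN
cleaned = pd.to_numeric(raw.str.replace(",", "", regex=False), errors="coerce")
print(cleaned.tolist())

# Alternative: let read_csv handle the separator while loading
csv_text = 'qty,amount\n1,"1,000"\n2,75\n'
parsed = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(parsed.amount.tolist())
```

Coercing hides bad values as NaN, so it is worth following up with `isnull().sum()` to see how many were affected.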

Q: Is the remarks column required?

Let's see.

freight.remarks.unique()

array(['Charges - Per case', 'Charges - Per Consignment'], dtype=object)

freight[freight.remarks=='Charges - Per case'].head()

freight[freight.remarks=='Charges - Per Consignment'].head()

Fix Missing Values¶

len(freight)

60976

freight.isnull().head()

freight.isnull().sum()

date       0
source     6
dest       0
qty        0
cartage    0
remarks    0
amount     0
dtype: int64

freight[freight.source.isnull()]

# remove NaN values
freight.dropna(inplace=True)
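
`dropna()` with no arguments drops a row if any column is missing. Here only `source` has missing values, so the result is the same, but restricting the check to the relevant column makes the intent explicit. A minimal sketch on toy data (the rows are made up for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "source": ["Pune", np.nan, "Jaipur"],
    "dest": ["PUNE", "LEH", "BAGRU"],
    "qty": [15.0, 120.0, 6.0],
})

# Count missing values per column, then drop only rows without a source
missing = toy.isnull().sum()
kept = toy.dropna(subset=["source"])
print(missing["source"], len(toy), len(kept))
```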

Standardize text fields¶

shortunits = {
    "Charges - Per case": "case",
    "Charges - Per Consignment": "consignment"
}
freight.remarks = freight.remarks.replace(shortunits)

freight.remarks.value_counts()

case           57776
consignment     3194
Name: remarks, dtype: int64

freight.dest.value_counts()

Chennai             1138
BANGALORE           1134
MUMBAI              1014
AHMEDABAD            799
CHENNAI              779
KOLKATA              748
NEW DELHI            721
HYDERABAD            620
PUNE                 601
Hyderabad            563
THANE                506
New Delhi            469
RANCHI               462
Mumbai               424
LUCKNOW              414
SALEM                382
MADURAI              345
PATNA                342
SURAT                292
JAIPUR               264
RAJKOT               252
Bangalore            245
VARANASI             241
TIRUNELVELI          236
JAMSHEDPUR           226
KANPUR               211
TRIVANDRUM           209
JABALPUR             178
THRISSUR             174
AGRA                 168
                    ... 
BHUPALPALLY            1
Bachepalli             1
Dalsingsarai           1
Bellampalle            1
Thuvarankurichy        1
Distt:Ahmedabad        1
Ghumarwin              1
Puttur                 1
Banga                  1
LALSOT                 1
BANTHARA BAZAAR,       1
KARANPRAYAG            1
DAMANDIU               1
DEBAI                  1
TAMKUHI ROAD           1
TIRODA                 1
Ramanagaram            1
DUNGANJ                1
BARSHI                 1
ANPARA BAZAR           1
Namakkal               1
SONARPUR               1
JAMUI                  1
Bahadurgarh            1
ANDOLE                 1
SHAJAPUR               1
 RAMGANJ MANDI         1
NAWABGANJ              1
Lanka                  1
CHINTPURNI             1
Name: dest, Length: 1923, dtype: int64

def fix_names(c):
    c = c.str.strip()
    c = c.str.title()
    return c

freight.source = fix_names(freight.source)
freight.dest = fix_names(freight.dest)

freight.head()
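
To see what `fix_names` buys, compare the number of distinct spellings before and after. A small sketch on made-up values modelled on the counts above:

```python
import pandas as pd

def fix_names(c):
    # same steps as above: trim whitespace, then title-case
    return c.str.strip().str.title()

towns = pd.Series(["CHENNAI", "Chennai", " RAMGANJ MANDI", "New Delhi", "NEW DELHI"])
fixed = fix_names(towns)
print(towns.nunique(), "->", fixed.nunique())
print(fixed.value_counts().to_dict())
```

Case and whitespace variants collapse, but genuine misspellings survive this step and need a separate mapping.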

Q: Which source has the maximum quantity of transfer?

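# select the numeric columns explicitly; recent pandas versions refuse to sum date/text columns
freight.groupby('source')[['qty', 'cartage', 'amount']].sum().head()

freight.groupby('source').qty.sum().sort_values(ascending=False).head()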

source
Bangalore    221221.0
Chennai      140902.0
Madurai      133278.0
Hyderabad    122391.0
Cochin       103790.0
Name: qty, dtype: float64

(freight.groupby('source')
        .qty
        .sum()
        .sort_values(ascending=False)
        .head()
)

source
Bangalore    221221.0
Chennai      140902.0
Madurai      133278.0
Hyderabad    122391.0
Cochin       103790.0
Name: qty, dtype: float64

Q: What is the source/destination pair that has the maximum quantity of transfer?

(freight.groupby(['source', 'dest'])
        .qty
        .sum()
        .sort_values(ascending=False)
        .head()
)

source     dest     
Bangalore  Bangalore    111120.0
Chennai    Chennai       80251.0
Bhiwandi   Mumbai        58666.0
Hyderabad  Hyderabad     54372.0
Delhi      New Delhi     51669.0
Name: qty, dtype: float64
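
Selecting the column before aggregating keeps the result a Series and avoids summing unrelated columns. `idxmax` then returns the top group directly. A sketch on toy rows (the numbers are made up):

```python
import pandas as pd

toy = pd.DataFrame({
    "source": ["Bangalore", "Bangalore", "Chennai", "Chennai"],
    "dest": ["Bangalore", "Hubli", "Chennai", "Chennai"],
    "qty": [10.0, 4.0, 3.0, 5.0],
})

# Total quantity per (source, dest) lane
by_lane = toy.groupby(["source", "dest"])["qty"].sum()
print(by_lane.sort_values(ascending=False).head())

# (source, dest) tuple of the busiest lane
print(by_lane.idxmax())
```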

Q: Show data for Bangalore -> Bangalore.

freight[(freight.source == 'Bangalore') & (freight.dest == 'Bangalore')].head()

freight['computed_amount'] = freight.qty * freight.cartage

freight.head()
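
`computed_amount` treats every row as charged per case. For `consignment` rows the sample output suggests that the amount is a flat charge instead, so a unit-aware check could look like the sketch below. The rule is an assumption read off a few displayed rows, not something the source data documents; the two toy rows copy values from that sample output.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "qty": [38.0, 17.0],
    "cartage": [10, 250],
    "remarks": ["case", "consignment"],
    "amount": [380, 250],
})

# Assumed rule: per-case rows scale with qty, per-consignment rows are flat
toy["expected"] = np.where(toy.remarks == "case", toy.qty * toy.cartage, toy.cartage)
toy["diff"] = toy.amount - toy.expected
print(toy)
```

Rows with a non-zero `diff` would then be the ones worth inspecting by hand.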

Problem: Refine the location-distance-matrix data and merge it with the freight dataframe.

Hint: pd.merge?

matrix = pd.read_csv(matrix_url)

matrix.head()

# fix column names
columns = {
    "Source": "source",
    "S. Longitude": "slon",
    "S. Latitude": "slat",
    "Destination": "dest",
    "D. Longitude": "dlon",
    "D. Latitude": "dlat",
    "Lane": "lane",
    "Distance": "distance"
}    
matrix.rename(columns=columns, inplace=True)

matrix.head()

# missing values
matrix.isnull().sum()

source      0
slon        0
slat        0
dest        0
dlon        0
dlat        0
lane        0
distance    0
dtype: int64

# standardize names
matrix.source = fix_names(matrix.source)
matrix.dest = fix_names(matrix.dest)
matrix.head()

(matrix.source + "--" + matrix.dest).nunique(), len(matrix)

(65890, 65896)

# drop the duplicates
matrix.drop_duplicates(['source', 'dest'], inplace=True)

len(matrix)

65890

df = pd.merge(freight, matrix, how='left', on=['source', 'dest'])

len(freight), len(matrix), len(df)

(60970, 65890, 60970)

df.head()

df.isnull().sum()

date                   0
source                 0
dest                   0
qty                    0
cartage                0
remarks                0
amount                 0
computed_amount        0
slon               15782
slat               15782
dlon               15782
dlat               15782
lane               15782
distance           15782
dtype: int64
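
Counting NaNs works, but `pd.merge` can also report the match status directly through `indicator=True`, and `validate` guards against duplicate keys in the lookup table. A sketch with toy frames (names and distances are made up):

```python
import pandas as pd

orders = pd.DataFrame({
    "source": ["Pune", "Cochin", "Pune"],
    "dest": ["Pune", "Mattanchery", "Thane"],
    "amount": [75, 380, 120],
})
lanes = pd.DataFrame({
    "source": ["Pune", "Pune"],
    "dest": ["Pune", "Thane"],
    "distance": [0.0, 100.0],
})

# indicator adds a _merge column; validate raises if lanes has duplicate keys
merged = pd.merge(orders, lanes, how="left", on=["source", "dest"],
                  indicator=True, validate="many_to_one")
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))
print(unmatched.amount.sum() / merged.amount.sum())
```

With `validate="many_to_one"` the merge fails loudly if the distance table still contains duplicate source/destination pairs, which is a useful check after `drop_duplicates`.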

len(df)

60970

df[df.isnull().sum(axis=1)>0].head()

Q: What is the total amount where distance is missing?

df[df.isnull().sum(axis=1)>0].amount.sum()

11070970

df[df.isnull().sum(axis=1)>0].amount.sum() / df.amount.sum()

0.271028672517011

freight.source.nunique(), matrix.source.nunique(), df.source.nunique()

(31, 54, 31)

freight.source.unique()

array(['Pune', 'Ahemadabad', 'Jaipur', 'Raipur', 'Zirakhpur', 'Cochin',
       'Rohtak', 'Chennai', 'Delhi', 'Ghaziabad', 'Kolkata', 'Bangalore',
       'Madurai', 'Vijaywada', 'Nagpur', 'Lucknow', 'Coimbatore', 'Goa',
       'Hyderabad', 'Ranchi', 'Bhiwandi', 'Gauwhati', 'Bbsr', 'Varanasi',
       'Patna', 'Indore', 'Rishikesh', 'Jammu', 'Parwanoo', 'Chandigarh',
       'Haldwani'], dtype=object)

matrix.source.unique()

array(['Bangalore', 'Chennai', 'Mumbai', 'Hyderabad', 'New Delhi', 'Salem',
       'Coimbatore', 'Pune', 'Srinagar', 'Madurai', 'Kolkata', 'Raipur',
       'Lucknow', 'Imphal', 'Tirunelveli', 'Secunderabad', 'Ahmedabad',
       'Jaipur', 'Mangalore', 'Dimapur', 'Thane', 'Ranchi', 'Kanpur',
       'Varanasi', 'Kollam', 'Vishakapatnam', 'Mysore', 'Trichy', 'Hubli',
       'Madanapalle', 'Bareilly', 'Pudukottai', 'Barhampur', 'Tirupur',
       'Ludhiana', 'Jamshedpur', 'Bhubaneshwar', 'Bhiwandi', 'Chandigarh',
       'Cochin', 'Delhi', 'Guwahati', 'Ghaziabad', 'Goa', 'Haldwani',
       'Indore', 'Jammu', 'Nagpur', 'Parwanoo', 'Patna', 'Rishikesh',
       'Rohtak', 'Vijaywada', 'Zirakpur'], dtype=object)

set(freight.source.unique()) - set(matrix.source.unique())

{'Ahemadabad', 'Bbsr', 'Gauwhati', 'Zirakhpur'}

set(freight.dest.unique()) - set(matrix.dest.unique())

set()

cities = {
    'Ahemadabad': 'Ahmedabad',
    'Bbsr': 'Bhubaneshwar',
    'Gauwhati': 'Guwahati',
    'Zirakhpur': 'Zirakpur'
}
# freight.replace({"source": cities}).head()
freight.replace({"source": cities}, inplace=True)

freight.head()
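
The `cities` mapping was written by hand. For longer lists, candidate corrections can be suggested with the standard library's `difflib`. This is only a sketch: the `known` list is a hand-picked subset of the matrix names, and the suggestions still need a manual review.

```python
import difflib

# Names from the set difference above and a few spellings from the matrix
unmatched = ["Ahemadabad", "Bbsr", "Gauwhati", "Zirakhpur"]
known = ["Ahmedabad", "Bhubaneshwar", "Guwahati", "Zirakpur", "Bangalore"]

suggestions = {}
for name in unmatched:
    # best candidate above difflib's default similarity cutoff, if any
    close = difflib.get_close_matches(name, known, n=1)
    suggestions[name] = close[0] if close else None
print(suggestions)
```

Abbreviations such as `Bbsr` fall below the default cutoff and get no suggestion, so they still have to be mapped manually.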

df = pd.merge(freight, matrix, how='left', on=['source', 'dest'])

df.isnull().sum()

date                  0
source                0
dest                  0
qty                   0
cartage               0
remarks               0
amount                0
computed_amount       0
slon               7136
slat               7136
dlon               7136
dlat               7136
lane               7136
distance           7136
dtype: int64

df[df.isnull().sum(axis=1) > 0].head()

df[df.isnull().sum(axis=1) > 0].amount.sum() / df.amount.sum()

0.12835823942861685

data = df.dropna()

data = data[data.remarks == "case"].copy()

Explore¶

#!pip install altair

import altair as alt
import matplotlib.pyplot as plt
%matplotlib inline

plt.style.use("ggplot")

data.distance.hist(bins=40)

data.cartage.hist(bins=40);

data[data.cartage > 100]

data.plot(kind="scatter", x="distance", y="cartage", alpha=0.05);

data.plot(kind="scatter", x="distance", y="cartage", alpha=0.05, logy=True);

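# a distance of 0 (same-city lanes) has no logarithm; substitute a small
# positive value so the log-scaled plots and log10 features below stay finite
data.loc[data.distance==0, "distance"] = 5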

data.plot(kind="scatter", x="distance", y="cartage", alpha=0.05, logy=True, logx=True);
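
Zero distances have to be replaced before using a logarithmic x-axis because the logarithm of 0 is not finite. A quick check on a toy array (the value 5 is the notebook's own choice; nothing here justifies that particular number):

```python
import numpy as np

distance = np.array([0.0, 5.0, 294.47])

# log10(0) is -inf, so such points cannot be drawn on a log axis
with np.errstate(divide="ignore"):
    logs = np.log10(distance)
print(logs)

# Same substitution as above, applied to the toy array
patched = np.where(distance == 0, 5.0, distance)
print(np.log10(patched))
```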

data.plot(kind="scatter", 
          x="distance", 
          y="cartage", 
          c="amount",
          alpha=0.05, 
          logy=True, 
          logx=True,
          cmap='viridis');

data.to_csv("data.csv")

Model¶

data.head()

Do we need to include the date column in the model?

Let us see if date has any effect on the cartage.

data[(data.source=='Bangalore') & (data.dest == 'Hubli')].describe()

data.groupby(['source', 'dest']).cartage.std().value_counts()

0.000000     1410
2.462961        1
0.773565        1
2.386378        1
3.829708        1
4.712121        1
3.601470        1
1.054093        1
0.450225        1
18.961218       1
2.905092        1
4.284857        1
1.388730        1
7.807881        1
6.966903        1
4.569226        1
2.745873        1
0.472742        1
1.190891        1
0.876162        1
1.426785        1
0.551364        1
Name: cartage, dtype: int64

data.groupby(['source', 'dest']).cartage.std().hist();
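
The value counts show that almost every lane has a constant rate, which is why the date can be ignored. The same check can be condensed into a single share. Note that lanes with only one shipment have an undefined (NaN) standard deviation and are skipped by `value_counts`. A sketch on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "source": ["A", "A", "A", "B", "B", "C"],
    "dest":   ["X", "X", "X", "Y", "Y", "Z"],
    "cartage": [7, 7, 7, 10, 12, 5],
})

# Standard deviation of the rate within each lane
lane_std = toy.groupby(["source", "dest"])["cartage"].std()

# Share of lanes with a constant rate, among lanes with at least two rows
constant_share = (lane_std == 0).sum() / lane_std.notna().sum()
print(lane_std)
print(constant_share)
```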

# roll up

agg_func = {
    "cartage": ["mean"], 
    "distance": ["mean"], 
    "qty": ["sum"], 
    "amount": ["sum", "count"]
}

newdata = (data
    .groupby(["source", "dest", "slat", "slon", "dlat", "dlon"])
    .agg(agg_func)
    .reset_index())

newdata.columns = newdata.columns.droplevel(1)
newdata.columns = list(newdata.columns)[:-1] + ['count']
newdata.head()
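
The two column-renaming lines are needed because a dict of lists produces two-level column names. Named aggregation (available in pandas 0.25 and later) yields flat names directly. A sketch on toy rows, grouping only by source and destination for brevity:

```python
import pandas as pd

toy = pd.DataFrame({
    "source": ["A", "A", "B"],
    "dest": ["X", "X", "Y"],
    "cartage": [7.0, 7.0, 10.0],
    "distance": [5.0, 5.0, 42.4],
    "qty": [3.0, 4.0, 1.0],
    "amount": [21, 28, 10],
})

# Each keyword is an output column: (input column, aggregation)
rolled = (toy
    .groupby(["source", "dest"])
    .agg(cartage=("cartage", "mean"),
         distance=("distance", "mean"),
         qty=("qty", "sum"),
         amount=("amount", "sum"),
         count=("amount", "count"))
    .reset_index())
print(rolled)
```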

newdata.plot(kind='scatter', x='distance', y='cartage', alpha=0.3);

newdata.plot(kind='scatter', x='distance', y='cartage', alpha=0.3, logy=True, logx=True);

newdata['distance_log'] = np.log10(newdata.distance)
newdata['cartage_log'] = np.log10(newdata.cartage)
X = newdata[['distance_log']]
y = newdata.cartage_log

X.head()

y.head()

0    1.255273
1    1.255273
2    1.255273
3    1.397940
4    1.342423
Name: cartage_log, dtype: float64

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

model.intercept_, model.coef_

(1.1983538178610935, array([ 0.06244738]))
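
Because both variables were log-transformed, the fitted line is a power law in the original units: cartage ≈ 10^intercept × distance^coef. The sketch below checks this reading on synthetic data. `np.polyfit` stands in for `LinearRegression` to keep the example dependency-free, and the constants 16 and 0.06 are made up.

```python
import numpy as np

# Synthetic power law: cartage = 16 * distance ** 0.06 (made-up constants)
distance = np.array([5.0, 50.0, 150.0, 300.0, 600.0])
cartage = 16 * distance ** 0.06

# Straight-line fit in log-log space
slope, intercept = np.polyfit(np.log10(distance), np.log10(cartage), 1)
print(slope, 10 ** intercept)

# Back-transform a prediction to the original units
predicted = 10 ** (intercept + slope * np.log10(100.0))
print(predicted)
```

With the coefficients printed above, the same reading gives roughly cartage ≈ 15.8 × distance^0.062, so the per-case rate grows only very slowly with distance.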

def show_predicitons(X, y, y_pred):
    plt.scatter(X, y)
    plt.plot(X, y_pred, color='b')

y_pred = model.predict(X)
show_predicitons(X, y, y_pred)

from sklearn.model_selection import cross_val_score

model = LinearRegression()
score = cross_val_score(model, 
                        X, y, 
                        scoring='neg_mean_squared_error', 
                        cv=5, 
                        n_jobs=-1,)
np.mean(score)

-0.083758488565365427

score

array([-0.06201186, -0.04173291, -0.3876596 , -0.02725678, -0.08762837])
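
The third fold is far worse than the others. With `cv=5`, `cross_val_score` splits the rows of a regression problem into consecutive blocks without shuffling, and `newdata` is ordered by source, so each fold holds out a different group of source cities. Whether that explains the outlier is an assumption worth checking; one way is to shuffle the folds. A sketch on synthetic data (the line and noise level are made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for newdata: a noisy line, sorted by x
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0.5, 3.0, size=100)).reshape(-1, 1)
y = 1.2 + 0.06 * X[:, 0] + rng.normal(scale=0.1, size=100)

# Shuffled folds with a fixed seed for reproducibility
folds = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_squared_error", cv=folds)
print(scores, scores.mean())
```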

Let's build another model with two features.

X2 = newdata[['distance_log', 'count']]

model = LinearRegression()
score = cross_val_score(model, 
                        X2, y, 
                        scoring='neg_mean_squared_error', 
                        cv=5, 
                        n_jobs=-1,)
np.mean(score)

-0.084442034144584455

newdata['cartage_per_distance'] = newdata['cartage'] / newdata['distance']
newdata['cartage_per_distance_log'] = np.log10(newdata['cartage_per_distance'])

newdata.plot(kind='scatter', 
             x='distance',
             y='cartage_per_distance',
             logx=True,
             logy=True);

model = LinearRegression()

X = newdata[['distance_log']]
y = newdata['cartage_per_distance_log']

score = cross_val_score(model, 
                        X, y, 
                        scoring='neg_mean_squared_error', 
                        cv=5, 
                        n_jobs=-1,)
np.mean(score)

-0.083758488565365469
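
The score matches the first model to many decimal places, and that is not a coincidence. The new target is the old one shifted by the feature itself. Minimising the squared error of y − x against a line in x is the same problem as before with the slope reduced by one, so every residual is unchanged. In the notation below, x is `distance_log`, y is `cartage_log`, and a, b are the fitted intercept and slope of the first model:

```latex
\begin{aligned}
y' &= \log_{10}\frac{\text{cartage}}{\text{distance}} = y - x,\\
\hat{y}' &= a + (b - 1)\,x = \hat{y} - x,\\
y' - \hat{y}' &= (y - x) - (\hat{y} - x) = y - \hat{y}.
\end{aligned}
```

The same holds within each cross-validation fold, so the mean squared errors agree up to floating-point rounding.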

model.fit(X, y)
y_pred = model.predict(X)
show_predicitons(X, y, y_pred)

Tree-based models¶

from sklearn.tree import DecisionTreeRegressor

tree = DecisionTreeRegressor(max_depth=2)

X = newdata[['distance_log']]
y = newdata['cartage_per_distance_log']

tree.fit(X, y)

DecisionTreeRegressor(criterion='mse', max_depth=2, max_features=None,
           max_leaf_nodes=None, min_impurity_split=1e-07,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, presort=False, random_state=None,
           splitter='best')

def show_predicitons(X, y, y_pred):
    plt.scatter(X, y, color='r')
    plt.scatter(X, y_pred, color='b')

y_pred = tree.predict(X)
show_predicitons(X, y, y_pred)

#!pip install modelvis

import modelvis

modelvis.render_tree(tree, feature_names=['distance_log'])

# note: `model` is still the linear regression fitted above, not the tree
model.predict([[1.4]])

array([-0.11421984])

Grid Search¶

from sklearn.model_selection import GridSearchCV

tree = DecisionTreeRegressor()
param_grid = {
    "max_depth": [2, 3, 4, 5, 6, 7, 8, 9, 10]
}
grid = GridSearchCV(tree, 
                    param_grid=param_grid, 
                    cv=10, 
                    scoring="neg_mean_squared_error")
grid.fit(X, y)
grid.best_estimator_

DecisionTreeRegressor(criterion='mse', max_depth=4, max_features=None,
           max_leaf_nodes=None, min_impurity_split=1e-07,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, presort=False, random_state=None,
           splitter='best')

tree = grid.best_estimator_
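
Besides `best_estimator_`, the fitted search object exposes the chosen parameters and the cross-validated scores, which makes it easier to judge how much the depth matters. A sketch on synthetic data (the step-shaped target is made up):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0.5, 3.0, size=(200, 1))
y = np.where(X[:, 0] < 1.5, 1.0, 0.2) + rng.normal(scale=0.05, size=200)

grid = GridSearchCV(DecisionTreeRegressor(random_state=0),
                    param_grid={"max_depth": [1, 2, 3, 4, 5]},
                    cv=5,
                    scoring="neg_mean_squared_error")
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)

# Mean cross-validated score for every candidate depth
for depth, score in zip(grid.cv_results_["param_max_depth"],
                        grid.cv_results_["mean_test_score"]):
    print(depth, round(score, 4))
```

If neighbouring depths score almost the same, the shallower tree is usually the easier one to explain.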

df2 = newdata.sort_values(by='distance_log')
X = df2[['distance_log']]
y = df2.cartage_per_distance_log

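def show_predicitons2(model, X, y):
    # evaluate the model on an evenly spaced grid over the observed range
    X = X.to_numpy()
    X1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 1000).reshape(-1, 1)
    y1 = model.predict(X1)
    plt.scatter(X[:, 0], y, color='r')
    plt.plot(X1[:, 0], y1, color='b')
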
def show_predicitons(X, y, y_pred):    
    plt.scatter(X, y, color='r')
    plt.plot(X, y_pred, color='b')

y_pred = tree.predict(X)
show_predicitons(X, y, y_pred)

Cars dataset¶

Download the cars dataset and build a model to predict the price of a car.

features: brand, kmpl, bhp, type
target: price

	Source	S. Longitude	S. Latitude	Destination	D. Longitude	D. Latitude	Lane	Distance
0	BANGALORE	77.59	12.98	BANGALORE	77.59	12.98	BANGALORE to BANGALORE	0.00
1	BANGALORE	77.59	12.98	CHENNAI	80.24	13.07	BANGALORE to CHENNAI	294.47
2	BANGALORE	77.59	12.98	MUMBAI	72.84	18.98	BANGALORE to MUMBAI	554.79
3	BANGALORE	77.59	12.98	HYDERABAD	78.49	17.39	BANGALORE to HYDERABAD	142.09
4	BANGALORE	77.59	12.98	NEW DELHI	77.17	28.62	BANGALORE to NEW DELHI	381.46

	date	source	dest	qty	remarks
28572	2008-12-29	NaN	LEH	120.0	Charges - Per case
34377	2009-01-31	NaN	LEH	109.0	Charges - Per case
45204	2009-03-31	NaN	LEH	15.0	Charges - Per case
52852	2009-05-18	NaN	LEH	135.0	Charges - Per case
53438	2009-05-22	NaN	LEH	160.0	Charges - Per case
60903	2009-06-30	NaN	LEH	154.0	Charges - Per case

	qty	cartage	amount
source
Ahemadabad	67593.0	285560	1692459
Bangalore	221221.0	294249	3254982
Bbsr	44318.0	54520	1099986
Bhiwandi	103646.0	252468	1670543
Chandigarh	4247.0	545	21235

	source	slon	slat	dest	dlon	dlat	lane	distance
0	BANGALORE	77.59	12.98	BANGALORE	77.59	12.98	BANGALORE to BANGALORE	0.00
1	BANGALORE	77.59	12.98	CHENNAI	80.24	13.07	BANGALORE to CHENNAI	294.47
2	BANGALORE	77.59	12.98	MUMBAI	72.84	18.98	BANGALORE to MUMBAI	554.79
3	BANGALORE	77.59	12.98	HYDERABAD	78.49	17.39	BANGALORE to HYDERABAD	142.09
4	BANGALORE	77.59	12.98	NEW DELHI	77.17	28.62	BANGALORE to NEW DELHI	381.46

	Sales Order No. New	Date	Supplying DC Location	Customer Code New	Customer Town	Qty (in cases)	Freight/Cartage	Remarks	Amount
0	2004912014	1-Jul-08	Pune	190886	PUNE	15.0	5	Charges - Per case	75
1	2004912846	1-Jul-08	Ahemadabad	190406	JAMNAGAR	30.0	18	Charges - Per case	540
2	2004913418	1-Jul-08	Jaipur	188582	BAGRU	6.0	20	Charges - Per case	120
3	2004916450	2-Jul-08	Raipur	191024	RAIPUR	23.0	5	Charges - Per case	104
4	2004916806	2-Jul-08	Zirakhpur	207786	Banur	1.0	1,000	Charges - Per Consignment	1,000

	date	source	dest	qty	cartage	remarks	amount
0	2008-07-01	Pune	PUNE	15.0	5	Charges - Per case	75
1	2008-07-01	Ahemadabad	JAMNAGAR	30.0	18	Charges - Per case	540
2	2008-07-01	Jaipur	BAGRU	6.0	20	Charges - Per case	120
3	2008-07-02	Raipur	RAIPUR	23.0	5	Charges - Per case	104
4	2008-07-02	Zirakhpur	Banur	1.0	1000	Charges - Per Consignment	1000

	date	source	dest	qty	cartage	remarks	amount
0	False	False	False	False	False	False	False
1	False	False	False	False	False	False	False
2	False	False	False	False	False	False	False
3	False	False	False	False	False	False	False
4	False	False	False	False	False	False	False

	date	source	dest	qty	cartage	remarks	amount
36	2008-07-04	Bangalore	Bangalore	345.0	7	case	2415
51	2008-07-04	Bangalore	Bangalore	54.0	7	case	378
121	2008-07-04	Bangalore	Bangalore	126.0	7	case	882
125	2008-07-04	Bangalore	Bangalore	40.0	7	case	280
150	2008-07-05	Bangalore	Bangalore	76.0	7	case	532

	date	source	dest	qty	cartage	remarks	amount	computed_amount	slon	slat	dlon	dlat	lane	distance
4	2008-07-02	Zirakpur	Banur	1.0	1000	consignment	1000	1000.0	NaN	NaN	NaN	NaN	NaN	NaN
11	2008-07-02	Cochin	Mattanchery	38.0	10	case	380	380.0	NaN	NaN	NaN	NaN	NaN	NaN
31	2008-07-03	Ghaziabad	Noida	17.0	250	consignment	250	4250.0	NaN	NaN	NaN	NaN	NaN	NaN
73	2008-07-04	Cochin	Tripunithura	6.0	10	case	60	60.0	NaN	NaN	NaN	NaN	NaN	NaN
90	2008-07-04	Goa	Talegao	11.0	21	case	231	231.0	NaN	NaN	NaN	NaN	NaN	NaN

	date	source	dest	qty	cartage	remarks	amount	computed_amount	slon	slat	dlon	dlat	lane	distance
2270	2008-07-21	Kolkata	Andaman	63.0	204	case	12821	12852.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
2861	2008-07-25	Kolkata	Andaman	6.0	204	case	1221	1224.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
6336	2008-08-18	Kolkata	Andaman	35.0	204	case	7123	7140.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
6339	2008-08-18	Kolkata	Andaman	3.0	204	case	611	612.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
11737	2008-09-20	Kolkata	Andaman	60.0	204	case	12210	12240.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
17669	2008-10-30	Kolkata	Andaman	43.0	204	case	8751	8772.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
24520	2008-12-08	Kolkata	Andaman	56.0	204	case	11396	11424.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
30650	2009-01-19	Kolkata	Andaman	59.0	204	case	12007	12036.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
35159	2009-02-09	Kolkata	Andaman	51.0	204	case	10379	10404.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
42529	2009-03-23	Kolkata	Andaman	59.0	204	case	12007	12036.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
46240	2009-04-14	Kolkata	Andaman	0.0	204	case	0	0.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
46448	2009-04-14	Kolkata	Andaman	32.0	204	case	6512	6528.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
50928	2009-05-07	Kolkata	Andaman	48.0	204	case	9768	9792.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
58038	2009-06-18	Kolkata	Andaman	28.0	204	case	5698	5712.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33
58993	2009-06-24	Kolkata	Andaman	28.0	204	case	5698	5712.0	88.33	22.63	92.69	11.68	KOLKATA to ANDAMAN	483.33

	qty	cartage	amount	computed_amount	slon	slat	dlon	dlat	distance
count	166.000000	166.0	166.000000	166.000000	1.660000e+02	1.660000e+02	1.660000e+02	1.660000e+02	1.660000e+02
mean	49.608434	26.0	1314.024096	1289.819277	7.759000e+01	1.298000e+01	7.514000e+01	1.535000e+01	2.800100e+02
std	35.485170	0.0	939.946884	922.614410	1.140308e-13	1.425385e-14	9.977697e-14	4.632502e-14	1.140308e-13
min	1.000000	26.0	26.000000	26.000000	7.759000e+01	1.298000e+01	7.514000e+01	1.535000e+01	2.800100e+02
25%	23.250000	26.0	615.750000	604.500000	7.759000e+01	1.298000e+01	7.514000e+01	1.535000e+01	2.800100e+02
50%	42.000000	26.0	1112.500000	1092.000000	7.759000e+01	1.298000e+01	7.514000e+01	1.535000e+01	2.800100e+02
75%	71.750000	26.0	1900.500000	1865.500000	7.759000e+01	1.298000e+01	7.514000e+01	1.535000e+01	2.800100e+02
max	179.000000	26.0	4741.000000	4654.000000	7.759000e+01	1.298000e+01	7.514000e+01	1.535000e+01	2.800100e+02

	source	dest	slat	slon	dlat	dlon	cartage	distance	qty	amount	count
0	Ahmedabad	Ahmadabad	23.03	72.6	23.03	72.60	18.0	5.00	26.0	468	3
1	Ahmedabad	Ahmedabad	23.03	72.6	23.03	72.60	18.0	5.00	12236.0	220248	797
2	Ahmedabad	Ahmedabd	23.03	72.6	23.03	72.60	18.0	5.00	33.0	594	4
3	Ahmedabad	Amreli	23.03	72.6	21.60	71.22	25.0	161.25	111.0	2775	15
4	Ahmedabad	Anand	23.03	72.6	22.55	72.95	22.0	42.40	155.0	3410	21

	distance_log
0	0.698970
1	0.698970
2	0.698970
3	2.207500
4	1.627366