Tutorial using Kobe Bryant Dataset – Part 4

This is Part 4 of a tutorial using the Kobe Bryant dataset. You can get the data from https://www.kaggle.com/c/kobe-bryant-shot-selection . What excites me about this dataset is that it is excellent for practicing classification basics, feature engineering, and time series analysis. This part continues from the previous one.

Exploring the data

In [215]:
```
#Shot accuracy
```
Out[215]:
`<matplotlib.axes._subplots.AxesSubplot at 0x270898aa780>`

In [216]:
```
#Share of missed (0.0) and made (1.0) shots; normalize=True divides by the number of non-null values
data['shot_made_flag'].value_counts(normalize=True)
#He scores around 45% of his shots.
```
Out[216]:
```
0.0    0.553839
1.0    0.446161
```
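Because `shot_made_flag` is a 0/1 column, its mean is the shooting accuracy, and `value_counts(normalize=True)` gives the same split as dividing by the number of non-missing rows. The sketch below uses a small made-up Series (not the Kobe data) containing one missing value, to show why dividing by the full length understates both shares if any NaN rows remain (for example, rows left unlabelled for the Kaggle submission).

```python
import numpy as np
import pandas as pd

# Toy 0/1 outcomes standing in for data['shot_made_flag'] (illustrative values only)
flags = pd.Series([1.0, 0.0, 0.0, 1.0, 0.0, np.nan], name='shot_made_flag')

# value_counts drops NaN by default, so the shares sum to 1
shares = flags.value_counts(normalize=True)   # 0.0 -> 0.6, 1.0 -> 0.4

# mean() also skips NaN; for a 0/1 column it is the share of 1s (the accuracy)
accuracy = flags.mean()                       # 0.4

# Dividing by the full length counts the NaN row in the denominator
naive = flags.value_counts() / len(flags)     # 0.0 -> 0.5, 1.0 -> 0.333...

print(shares.sort_index())
print(accuracy)
print(naive.sort_index())
```

On the real data, `data['shot_made_flag'].mean()` would return the made share directly.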
In [218]:
```
# Let's see his attempts depending on the seconds to the end of a period:
# xlim=(720, 0) reverses the x-axis so time runs toward the end of the period
data['timeRemaining'].plot(kind='hist', bins=24, xlim=(720, 0), figsize=(12,6),
                           title='Attempts made over time\n(seconds to the end of period)')
```
Out[218]:
`<matplotlib.axes._subplots.AxesSubplot at 0x2709186a710>`

In [219]:
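```
# Accuracy of those shots, in 30-second bins:
time_bins = np.arange(0, 721, 30)
attempts_in_time = pd.cut(data['timeRemaining'], time_bins, right=False)
grouped = data.groupby(attempts_in_time)
# Mean of the 0/1 flag in each bin = share of shots made
prec = grouped['shot_made_flag'].mean()

# Reverse the bins so the plot runs from 720 s down to 0 s
prec[::-1].plot(kind='bar', figsize=(12, 6), ylim=(0.2, 0.5),
                title='Shot accuracy over time\n(seconds to the end of period)')
```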
Out[219]:
`<matplotlib.axes._subplots.AxesSubplot at 0x27089b2ca90>`
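The bar chart shows accuracy per bin but not how many attempts each bar rests on. A minimal sketch of the same `pd.cut` + `groupby` pattern that returns both the mean and the count per 30-second bin; the DataFrame is made up, and `observed=True` (an assumption of this sketch, not in the original cell) hides empty bins.

```python
import numpy as np
import pandas as pd

# Toy data: seconds left in the period and whether the shot went in (made-up values)
shots = pd.DataFrame({
    'timeRemaining': [5, 12, 29, 45, 300, 310, 700],
    'shot_made_flag': [0, 0, 1, 1, 1, 0, 1],
})

# 30-second bins [0, 30), [30, 60), ..., [690, 720); right=False keeps the left edge
time_bins = np.arange(0, 721, 30)
bin_labels = pd.cut(shots['timeRemaining'], time_bins, right=False)

# Mean of the 0/1 flag per bin = accuracy; count = number of attempts in that bin
per_bin = shots.groupby(bin_labels, observed=True)['shot_made_flag'].agg(['mean', 'count'])
print(per_bin)
```

Bars backed by only a handful of shots deserve less weight when reading the accuracy plot.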

In [220]:
```
#Lots of attempts in the last 30 seconds, and much worse accuracy than usual. Let's explore that more.
#Shots in the last seconds of a period
last_30 = data[data['timeRemaining'] < 30]
last_30['shot_made_flag'].value_counts(normalize=True)
#In the last 30 seconds he scores only about 33% of his shots. Pressure?
```
Out[220]:
```
0.0    0.666305
1.0    0.333695
```
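Comparing a time window against everything else is a pattern repeated throughout this part. A small sketch of a helper for it; `accuracy_split` is a hypothetical name, not something from pandas or the original notebook, and the data is made up.

```python
import pandas as pd

def accuracy_split(df, mask, flag='shot_made_flag'):
    """Hypothetical helper: accuracy (mean of the 0/1 flag) and number of
    attempts for rows where mask is True versus rows where it is False."""
    inside = df.loc[mask, flag]
    outside = df.loc[~mask, flag]
    return {
        'inside_acc': inside.mean(), 'inside_n': int(inside.count()),
        'outside_acc': outside.mean(), 'outside_n': int(outside.count()),
    }

# Made-up shots: three in the last 30 seconds, three earlier
shots = pd.DataFrame({
    'timeRemaining': [3, 10, 25, 100, 200, 400],
    'shot_made_flag': [0, 0, 1, 1, 0, 1],
})
result = accuracy_split(shots, shots['timeRemaining'] < 30)
print(result)
```

On the real data this would be called as `accuracy_split(data, data['timeRemaining'] < 30)`.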
In [221]:
```
#Let's explore what happens in the last two minutes of each period.
last_2min = data[data['timeRemaining'] <= 120]

last_2min['timeRemaining'].plot(kind='hist', bins=30, xlim=(120, 0), figsize=(12,6),
                                title='Attempts made over time\n(seconds to the end of period)')
#OK, this explains things a bit: plenty of last-second desperation shots.
```
Out[221]:
`<matplotlib.axes._subplots.AxesSubplot at 0x270934a8780>`
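To read exact counts instead of eyeballing bar heights, `np.histogram` can be called with a fixed range. Note that `plot(kind='hist', bins=30)` spreads its bins between the data's own minimum and maximum, whereas the sketch below assumes `range=(0, 120)` so each bin is exactly 4 seconds wide. The values are made up.

```python
import numpy as np

# Made-up seconds-remaining values for shots in the last 2 minutes of a period
seconds_left = np.array([0, 1, 1, 2, 3, 15, 40, 90, 118])

# 30 equal-width bins over [0, 120]: each bin covers 4 seconds, the first is [0, 4)
counts, edges = np.histogram(seconds_left, bins=30, range=(0, 120))

print(edges[:2])
print(counts[0])  # 5 shots in the final 4 seconds
```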

In [222]:
```
#Let's return to the last 30 seconds.
last_30['timeRemaining'].plot(kind='hist', bins=10, xlim=(30, 0), figsize=(12,6),
                              title='Attempts made over time\n(seconds to the end of period)')
```
Out[222]:
`<matplotlib.axes._subplots.AxesSubplot at 0x2709ce4c208>`

In [224]:
```
#Shot locations in the last 5 seconds of a period: misses (left) vs. makes (right)
last_5sec_misses = data[(data['timeRemaining'] <= 5) & (data['shot_made_flag'] == 0)]
last_5sec_scores = data[(data['timeRemaining'] <= 5) & (data['shot_made_flag'] == 1)]

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(12,7))
# Inverted y-axis (shared by both panels) so the basket sits near the top
ax1.set_ylim(800, -50)
ax1.set_title('Misses')
ax2.set_title('Makes')

sns.regplot(x='loc_x', y='loc_y', data=last_5sec_misses, fit_reg=False, ax=ax1, color='r')
sns.regplot(x='loc_x', y='loc_y', data=last_5sec_scores, fit_reg=False, ax=ax2, color='b')
#In the last 5 seconds, there are some desperation shots from far away and plenty of misses from the 3pt line,
#but he misses a lot even from close range.
```
Out[224]:
`<matplotlib.axes._subplots.AxesSubplot at 0x2709d2c34e0>`
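The next cells filter on `shotDistance`, a column built in an earlier part. One way such a distance could be derived from the court coordinates is sketched below; this assumes `loc_x`/`loc_y` are measured in tenths of a foot from the basket, and it is not necessarily how `shotDistance` was actually built. `dist_ft` is a hypothetical column name and the coordinates are made up.

```python
import numpy as np
import pandas as pd

# Made-up court coordinates, assumed to be tenths of a foot from the basket
shots = pd.DataFrame({'loc_x': [0, 30, -150, 0], 'loc_y': [0, 40, 200, 250]})

# Euclidean distance from the basket, converted to feet (assumed units)
shots['dist_ft'] = np.hypot(shots['loc_x'], shots['loc_y']) / 10

# Keep only "close" shots, using the same 20 threshold as the cells that follow
close = shots[shots['dist_ft'] <= 20]
print(shots)
```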

In [226]:
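```
#Accuracy from close distance (shotDistance <= 20) in the last 5 seconds:
last_5sec_close = data[(data['timeRemaining'] <= 5) & (data['shotDistance'] <= 20)]
last_5sec_close['shot_made_flag'].value_counts(normalize=True)
```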
Out[226]:
```
0.0    0.604317
1.0    0.395683
```
In [227]:
```
#For comparison, accuracy from close distance when there are more than 5 seconds to go:
close_shots = data[(data['timeRemaining'] > 5) & (data['shotDistance'] <= 20)]
close_shots['shot_made_flag'].value_counts(normalize=True)
```
Out[227]:
```
0.0    0.512264
1.0    0.487736
```
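Whether a gap like roughly 40% vs. 49% is more than noise depends on how many shots sit behind each percentage, which the normalized counts above hide. A rough check is a two-proportion z-test with the pooled normal approximation, sketched here with only the standard library. The function name is hypothetical, and the test treats shots as independent, which last-second heaves probably are not, so read it as a sanity check only.

```python
from math import erf, sqrt

def two_proportion_z(made1, n1, made2, n2):
    """Two-sided z-test for a difference between two proportions
    (pooled normal approximation). Returns (z, p_value)."""
    p1, p2 = made1 / n1, made2 / n2
    pooled = (made1 + made2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value

# Illustrative counts only, not taken from the Kobe data
z, p = two_proportion_z(40, 100, 50, 100)
print(round(z, 2), round(p, 3))
```

With the real data, the inputs would be `last_5sec_close['shot_made_flag'].sum()` and `.count()`, and the same for `close_shots`.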

Period accuracy

In [228]:
```
#Number of shots taken in each period
plt.figure(figsize=(12,6))
sns.countplot(x='period', hue='shot_made_flag', data=data)
```
Out[228]:
`<matplotlib.axes._subplots.AxesSubplot at 0x2709d320b00>`

In [229]:
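```
#Accuracy in each period (mean of the 0/1 flag)
period_acc = data['shot_made_flag'].groupby(data['period']).mean()
period_acc.plot(kind='barh', figsize=(12, 6))

#The period of the game doesn't seem to influence his accuracy much.
```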
Out[229]:
`<matplotlib.axes._subplots.AxesSubplot at 0x2709d6d2c18>`
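Periods after the 4th are overtime, so they contain far fewer shots and their bars are noisier. A sketch that puts accuracy and attempt count side by side with `agg`; the DataFrame is made up.

```python
import pandas as pd

# Made-up shots; period 5 stands for the first overtime
shots = pd.DataFrame({
    'period': [1, 1, 2, 2, 4, 5],
    'shot_made_flag': [1, 0, 1, 1, 0, 1],
})

# Accuracy (mean of the 0/1 flag) and number of attempts per period in one table
period_stats = shots.groupby('period')['shot_made_flag'].agg(['mean', 'count'])
print(period_stats)
```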

Accuracy depending on shot type

In [231]:
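```
#Combined shot type
#Number of shots of each kind:
plt.figure(figsize=(12,6))
# Assumed: the plotting call was lost in export; this uses the Kaggle column combined_shot_type
sns.countplot(y='combined_shot_type', data=data)
```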
Out[231]:
`<matplotlib.axes._subplots.AxesSubplot at 0x2709d8772e8>`

In [232]:
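```
#Accuracy for each combined shot type
# Assumed: defined like period_acc, using the Kaggle column combined_shot_type
shot_type_acc = data['shot_made_flag'].groupby(data['combined_shot_type']).mean()
shot_type_acc.plot(kind='barh', figsize=(12, 6))
```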
Out[232]:
`<matplotlib.axes._subplots.AxesSubplot at 0x2709d89f4e0>`
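An alternative to a separate count plot and accuracy plot is a row-normalized `pd.crosstab`, where each row gives the miss/make shares for one shot type. A sketch assuming the Kaggle column name `combined_shot_type`, with made-up rows.

```python
import pandas as pd

# Made-up shots; the column name follows the Kaggle file (assumption)
shots = pd.DataFrame({
    'combined_shot_type': ['Dunk', 'Dunk', 'Jump Shot', 'Jump Shot', 'Jump Shot', 'Layup'],
    'shot_made_flag': [1, 1, 0, 1, 0, 1],
})

# normalize='index' divides each row by its total, so rows sum to 1
table = pd.crosstab(shots['combined_shot_type'], shots['shot_made_flag'], normalize='index')
print(table)
```

The column for `1` is the accuracy per shot type, the same quantity as `shot_type_acc`.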

Action type

In [233]:
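```
#Number of shots of each action type
plt.figure(figsize=(12,18))
# Assumed: the plotting call was lost in export; this uses the Kaggle column action_type
sns.countplot(y='action_type', data=data)
```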
Out[233]:
`<matplotlib.axes._subplots.AxesSubplot at 0x2709d9ec4a8>`

In [237]:
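```
#Accuracy for each action type, sorted
# Assumed: defined like period_acc, using the Kaggle column action_type
action_type = data['shot_made_flag'].groupby(data['action_type']).mean()
action_type.sort_values().plot(kind='barh', figsize=(12, 18))
```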
Out[237]:
`<matplotlib.axes._subplots.AxesSubplot at 0x2709e5cca58>`
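When some action types have only a handful of attempts, the ends of a sorted accuracy chart tend to be filled by small samples. A sketch that drops categories below a minimum number of attempts before sorting; the threshold and the rows are made up for illustration.

```python
import pandas as pd

# Made-up shots: 4 jump shots, 3 dunks, 1 hook shot
shots = pd.DataFrame({
    'action_type': ['Jump Shot'] * 4 + ['Dunk Shot'] * 3 + ['Hook Shot'],
    'shot_made_flag': [0, 1, 0, 0, 1, 1, 0, 1],
})

MIN_ATTEMPTS = 3  # arbitrary threshold for this sketch

stats = shots.groupby('action_type')['shot_made_flag'].agg(['mean', 'count'])
# Keep only action types with enough attempts, then sort by accuracy
reliable = stats[stats['count'] >= MIN_ATTEMPTS].sort_values('mean')
print(reliable)
```

The threshold is a judgment call; raising it trades coverage for more stable estimates.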

Career accuracy

In [235]:
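```
#Number of shots over seasons:
plt.figure(figsize=(12,6))
# Assumed: the plotting call was lost in export
sns.countplot(x='season', data=data)
```
Out[235]:
`<matplotlib.axes._subplots.AxesSubplot at 0x270899022e8>`

```
#Accuracy over seasons:
season_acc = data['shot_made_flag'].groupby(data['season']).mean()
# Assumed: the plotting call was lost in export
season_acc.plot(figsize=(12, 6))
```
`<matplotlib.axes._subplots.AxesSubplot at 0x2709ed62f28>`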
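Season labels are strings; assuming they follow the `'YYYY-YY'` format of the Kaggle file, they already sort chronologically, but converting them to an integer start year gives a numeric axis on which gaps between seasons show up. A sketch with made-up rows.

```python
import pandas as pd

# Made-up shots; season labels assumed to look like '1996-97'
shots = pd.DataFrame({
    'season': ['1996-97', '1996-97', '1997-98', '1999-00', '1999-00'],
    'shot_made_flag': [0, 1, 1, 1, 0],
})

season_acc = shots.groupby('season')['shot_made_flag'].mean()

# Replace the labels with the integer start year (first four characters)
season_acc.index = season_acc.index.str[:4].astype(int)
print(season_acc)
```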