Kaggle Tutorial using Kobe Bryant Dataset – Part 4

Exploring the data

In [215]:
#Shot accuracy
sns.countplot(x='shot_made_flag', data=data)
Out[215]:
<matplotlib.axes._subplots.AxesSubplot at 0x270898aa780>
In [216]:
data['shot_made_flag'].value_counts(normalize=True)
#He makes around 45% of his shots.
Out[216]:
0.0    0.553839
1.0    0.446161
Name: shot_made_flag, dtype: float64
In [218]:
# Let's see his attempts depending on the seconds to the end of a period:
data['timeRemaining'].plot(kind='hist', bins=24, xlim=(720, 0), figsize=(12,6),
                            title='Attempts made over time\n(seconds to the end of period)')
Out[218]:
<matplotlib.axes._subplots.AxesSubplot at 0x2709186a710>
In [219]:
# Accuracy of those shots:
time_bins = np.arange(0, 721, 30)
attempts_in_time = pd.cut(data['timeRemaining'], time_bins, right=False)
grouped = data.groupby(attempts_in_time)
prec = grouped['shot_made_flag'].mean()

prec[::-1].plot(kind='bar', figsize=(12, 6), ylim=(0.2, 0.5), 
                title='Shot accuracy over time\n(seconds to the end of period)')
Out[219]:
<matplotlib.axes._subplots.AxesSubplot at 0x27089b2ca90>
In [220]:
#Lots of attempts in last 30 seconds, and much worse accuracy than usual. Let's explore that more.
#Shots in the last seconds of a period

last_30 = data[data['timeRemaining'] < 30]
last_30['shot_made_flag'].value_counts(normalize=True)
#In the last 30 seconds he makes only about 33% of his shots. Pressure?
Out[220]:
0.0    0.666305
1.0    0.333695
Name: shot_made_flag, dtype: float64
In [221]:
#Let's explore what happens in the last two minutes of a period.
last_2min = data[data['timeRemaining'] <= 120]

last_2min['timeRemaining'].plot(kind='hist', bins=30, xlim=(120, 0), figsize=(12,6),
                            title='Attempts made over time\n(seconds to the end of period)')
#OK, this explains things a bit: plenty of desperate last-second shots.
Out[221]:
<matplotlib.axes._subplots.AxesSubplot at 0x270934a8780>
In [222]:
#Let's return to last 30 seconds.
last_30['timeRemaining'].plot(kind='hist', bins=10, xlim=(30, 0), figsize=(12,6),
                            title='Attempts made over time\n(seconds to the end of period)')
Out[222]:
<matplotlib.axes._subplots.AxesSubplot at 0x2709ce4c208>
In [224]:
last_5sec_misses = data[(data['timeRemaining'] <= 5) & (data['shot_made_flag'] == 0)]
last_5sec_scores = data[(data['timeRemaining'] <= 5) & (data['shot_made_flag'] == 1)]


fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(12,7))
ax1.set_ylim(800, -50)

sns.regplot(x='loc_x', y='loc_y', data=last_5sec_misses, fit_reg=False, ax=ax1, color='r')
sns.regplot(x='loc_x', y='loc_y', data=last_5sec_scores, fit_reg=False, ax=ax2, color='b')
#In the last 5 seconds there are some desperate shots from far away and plenty of misses from the 3pt line,
#but he misses a lot even from close range.
Out[224]:
<matplotlib.axes._subplots.AxesSubplot at 0x2709d2c34e0>
In [226]:
last_5sec_close = data[(data['timeRemaining'] <= 5) & (data['shotDistance'] <= 20)]

last_5sec_close['shot_made_flag'].value_counts(normalize=True)
Out[226]:
0.0    0.604317
1.0    0.395683
Name: shot_made_flag, dtype: float64
In [227]:
#For comparison, accuracy from close distance when there are more than 5 seconds to go:
close_shots = data[(data['timeRemaining'] > 5) & (data['shotDistance'] <= 20)]

close_shots['shot_made_flag'].value_counts(normalize=True)
Out[227]:
0.0    0.512264
1.0    0.487736
Name: shot_made_flag, dtype: float64
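
The gap between roughly 40% and 49% looks large, but the last-5-seconds group is much smaller, so it is worth checking whether the difference could be noise. Below is a minimal sketch of a two-proportion z-test using only the standard library. The function name `two_proportion_z` is my own, and the counts in the example call are made up for illustration, not taken from the dataset.

```python
import math


def two_proportion_z(made_a, n_a, made_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one accuracy."""
    p_a = made_a / n_a
    p_b = made_b / n_b
    # Pooled accuracy under the null hypothesis that the groups do not differ.
    pooled = (made_a + made_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value of the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts only: 40 of 100 made versus 50 of 100 made.
z, p = two_proportion_z(40, 100, 50, 100)
print(round(z, 3), round(p, 3))
```

With the notebook's frames, the inputs would be something like `last_5sec_close['shot_made_flag'].sum()` and `len(last_5sec_close)` for the first group, and the same two values from `close_shots` for the second.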

Period accuracy

In [228]:
#Number of shots taken in each period
plt.figure(figsize=(12, 6))
sns.countplot(x='period', hue='shot_made_flag', data=data)
Out[228]:
<matplotlib.axes._subplots.AxesSubplot at 0x2709d320b00>
In [229]:
#Accuracy
period_acc = data['shot_made_flag'].groupby(data['period']).mean()
period_acc.plot(kind='barh', figsize=(12, 6))

#The period of the game doesn't seem to influence his accuracy much.
Out[229]:
<matplotlib.axes._subplots.AxesSubplot at 0x2709d6d2c18>

Accuracy depending on shot type

In [231]:
#Combined shot type
#Number of different kinds of shots:
plt.figure(figsize=(12,6))
sns.countplot(x="combined_shot_type", hue="shot_made_flag", data=data)    
Out[231]:
<matplotlib.axes._subplots.AxesSubplot at 0x2709d8772e8>
In [232]:
#Accuracy
shot_type_acc = data['shot_made_flag'].groupby(data['combined_shot_type']).mean()
shot_type_acc.plot(kind='barh', figsize=(12, 6))
Out[232]:
<matplotlib.axes._subplots.AxesSubplot at 0x2709d89f4e0>

Action type

In [233]:
#Number of Shots
plt.figure(figsize=(12,18))
sns.countplot(y="action_type", hue="shot_made_flag", data=data)
Out[233]:
<matplotlib.axes._subplots.AxesSubplot at 0x2709d9ec4a8>
In [237]:
#Accuracy:
action_type = data['shot_made_flag'].groupby(data['action_type']).mean()
action_type.sort_values().plot(kind='barh', figsize=(12, 18))
Out[237]:
<matplotlib.axes._subplots.AxesSubplot at 0x2709e5cca58>

Career accuracy

In [235]:
#Number of shots over seasons:
plt.figure(figsize=(12,6))
sns.countplot(x="season", hue="shot_made_flag", data=data)
Out[235]:
<matplotlib.axes._subplots.AxesSubplot at 0x270899022e8>
In [238]:
season_acc = data['shot_made_flag'].groupby(data['season']).mean()
season_acc.plot(figsize=(12, 6), title='Accuracy over seasons')
Out[238]:
<matplotlib.axes._subplots.AxesSubplot at 0x2709ed62f28>

Wikipedia offers some insight into what happened in the 2013-14 season, and a possible explanation for the big decline in his last seasons:

On April 12 [2013], Bryant suffered a torn Achilles tendon against the Golden State Warriors, ending his [2012-13] season. (…) Bryant resumed practicing starting in November, after the start of the 2013–14 season. (…) Bryant resumed playing on December 8 [2013] after missing the season’s first 19 games. On December 17, Bryant matched his season high of 21 points in a 96–92 win over Memphis, but he suffered a lateral tibial plateau fracture in his left knee that was expected to sideline him for six weeks. (…) On March 12, 2014, the Lakers ruled Bryant out for the remainder of the season, citing his need for more rehab and the limited time remaining in the season.

So I guess he never fully recovered from his injuries, at least when it comes to shot accuracy.

Season freshness

In [239]:
#Number of shots each month:
plt.figure(figsize=(12,6))
sns.countplot(x="game_month", hue="shot_made_flag", data=data)
Out[239]:
<matplotlib.axes._subplots.AxesSubplot at 0x2709ecae320>
In [241]:
#Accuracy:
game_month = data['shot_made_flag'].groupby(data['game_month']).mean()
game_month.plot(kind='barh', figsize=(12, 6))
Out[241]:
<matplotlib.axes._subplots.AxesSubplot at 0x2709edbfd68>

Almost the same performance throughout the season: accuracy is just slightly worse at the start (month 10) and at the end (month 6), but those months have far fewer games than the others.
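
Since the quieter months carry less weight, it helps to look at accuracy and the number of attempts side by side instead of in two separate plots. Here is a small sketch of that idea. The `toy` frame and its values are made up; it only stands in for the notebook's `data`, and only the column names are taken from the cells above.

```python
import pandas as pd

# Toy stand-in for the notebook's `data` frame; the values are made up.
toy = pd.DataFrame({
    "game_month": [10, 10, 11, 11, 11, 11, 6, 6],
    "shot_made_flag": [0, 1, 1, 0, 1, 1, 0, 0],
})

# One row per month: mean accuracy next to the number of attempts behind it.
month_summary = toy.groupby("game_month")["shot_made_flag"].agg(["mean", "count"])
print(month_summary)
```

A month with a low `count` can swing a lot from a handful of shots, so its `mean` deserves less trust than that of a busy month.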

Weekday

In [242]:
plt.figure(figsize=(12,6))
sns.countplot(x="game_day", hue="shot_made_flag", data=data)
Out[242]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a0579a20>
In [243]:
#Accuracy:
game_day = data['shot_made_flag'].groupby(data['game_day']).mean()
game_day.plot(kind='barh', figsize=(12, 6))
Out[243]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a071e630>

Again, no noticeable difference.

Regular season vs playoffs

In [244]:
#Number of shots:
plt.figure(figsize=(12,6))
sns.countplot(x="playoffs", hue="shot_made_flag", data=data)
Out[244]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a08a4358>
In [245]:
#Accuracy:
playoffs = data['shot_made_flag'].groupby(data['playoffs']).mean()
playoffs.plot(kind='barh', figsize=(12, 2), xlim=(0, 0.50))
Out[245]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a09e79b0>

No difference between regular season and playoffs.

Shot distance

In [246]:
#First let's create categories of distances, each 3ft long.
distance_bins = np.append(np.arange(0, 31, 3), 300) 
distance_cat = pd.cut(data['shotDistance'], distance_bins, right=False)

dist_data = data.loc[:, ['shotDistance', 'shot_made_flag']]
dist_data['distance_cat'] = distance_cat

distance_cat.value_counts(sort=False)
Out[246]:
[0, 3)       5613
[3, 6)       1080
[6, 9)       1728
[9, 12)      1718
[12, 15)     2324
[15, 18)     3362
[18, 21)     3319
[21, 24)     1735
[24, 27)     3954
[27, 30)      683
[30, 300)     181
dtype: int64
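
To make the bin edges above less mysterious, here is a short sketch of how `pd.cut` treats values that fall exactly on an edge when `right=False`. The `sample` distances are made up and picked to sit on or next to the edges.

```python
import numpy as np
import pandas as pd

# Same bin edges as in the cell above: 0, 3, ..., 30, then one catch-all edge at 300.
distance_bins = np.append(np.arange(0, 31, 3), 300)

# Made-up distances chosen to sit on or next to bin edges.
sample = pd.Series([0, 2, 3, 29, 30, 45])
binned = pd.cut(sample, distance_bins, right=False)

# With right=False each bin is closed on the left,
# so 3 lands in [3, 6) and 30 lands in [30, 300).
print(binned.cat.codes.tolist())  # -> [0, 0, 1, 9, 10, 10]
```

With the default `right=True`, the bins would be closed on the right instead, and a distance of exactly 0 would fall outside the first bin and come back as NaN unless `include_lowest=True` is also passed.
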
In [247]:
#Number of shots in each distance category:
plt.figure(figsize=(12,6))
sns.countplot(x="distance_cat", hue="shot_made_flag", data=dist_data)
Out[247]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a08aab00>

There are few shots in [21, 24) because that range is just inside the 3pt line, so it is better to step outside and go for three points.
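
The arithmetic behind stepping outside the line can be sketched in a few lines: a shot is worth its accuracy times its point value, so a three-pointer pays off as soon as its accuracy exceeds two thirds of the accuracy of the long two. Both helper names are my own, and the accuracies in the example are illustrative numbers, not values measured from the dataset.

```python
def expected_points(accuracy, points):
    """Average points per attempt for a shot worth `points`, made at `accuracy`."""
    return accuracy * points


def break_even_three_pt_accuracy(two_pt_accuracy):
    """Three-point accuracy at which a 3 is worth as much as a 2 on average."""
    return two_pt_accuracy * 2 / 3


# Illustrative accuracies only (not taken from the data).
long_two = expected_points(0.40, 2)
three = expected_points(0.33, 3)
print(long_two, three, break_even_three_pt_accuracy(0.40))
```

Under these made-up numbers the three-pointer comes out ahead even though it goes in less often.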

In [248]:
#Accuracy by distance category:
dist_prec = dist_data['shot_made_flag'].groupby(dist_data['distance_cat']).mean()
dist_prec.plot(kind='bar', figsize=(12, 6))
Out[248]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a0c398d0>

Shot zones

In [249]:
#Shot zone area
#Number of shots:
plt.figure(figsize=(12,6))
sns.countplot(x="shot_zone_area", hue="shot_made_flag", data=data)
Out[249]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a0c6ca58>
In [251]:
#Accuracy:
shot_area = data['shot_made_flag'].groupby(data['shot_zone_area']).mean()
shot_area.plot(kind='barh', figsize=(12, 6))
Out[251]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a0e16c50>

He’s most accurate from the center, but what’s interesting is that he’s slightly more accurate from the right side.

In [253]:
#Shot zone basic
#Number of shots:
plt.figure(figsize=(12,6))
sns.countplot(x="shot_zone_basic", hue="shot_made_flag", data=data)
Out[253]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a11c3160>
In [254]:
#Accuracy:
shot_basic = data['shot_made_flag'].groupby(data['shot_zone_basic']).mean()
shot_basic.plot(kind='barh', figsize=(12, 6))
Out[254]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a132f908>

We have seen that he’s more accurate from the right-hand side, but when it comes to the corners, the left corner suits him slightly better.

Home game vs away

In [256]:
#Number of shots:
plt.figure(figsize=(12,6))
sns.countplot(x="homeGame", hue="shot_made_flag", data=data)
Out[256]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a0e776a0>
In [258]:
#Accuracy:
home_acc = data['shot_made_flag'].groupby(data['homeGame']).mean()
home_acc.plot(kind='barh', figsize=(12, 2))
Out[258]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a14b8630>

Slightly more accurate in front of his home crowd.

Opponents

In [260]:
#Number of shots:
plt.figure(figsize=(12,16))
sns.countplot(y="opponent", hue="shot_made_flag", data=data)
Out[260]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a14af6a0>
In [261]:
#Accuracy:
opponent = data['shot_made_flag'].groupby(data['opponent']).mean()
opponent.sort_values().plot(kind='barh', figsize=(12,10))
Out[261]:
<matplotlib.axes._subplots.AxesSubplot at 0x270a1b3ef60>
