Submission for Predicting Red Hat Business Value

Posted in Kaggle, Predictive Analysis

In this competition, the goal is to build a classification algorithm that accurately identifies which customers have the most potential business value for Red Hat, based on their characteristics and activities. For more information, please visit: https://www.kaggle.com/c/predicting-red-hat-business-value

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings […]

Facebook Data Analysis

Posted in Data Analysis Resources, Kaggle, Machine Learning

In [20]:
import pandas as pd
import numpy as np

In [ ]:
# Take a few samples for the visualization
sample_fbcheckin_train_tbl = fbcheckin_train_tbl[:10000].copy()

In [21]:
df = pd.read_csv('train.csv', index_col='row_id')

In [22]:
df.head()

Out[22]:
row_id  x       y       accuracy  time    place_id
0       0.7941  9.0809  54        470702  8523065625
1       5.9567  4.7968  13        186555  1757726713
2       8.3078  7.0407  74        322648  1137537235
3       7.3665  2.5165  […]

Time Series Forecast using Kobe Bryant Dataset

Posted in Data Analysis Resources, Kaggle

This script is my attempt at time series analysis. Pandas has dedicated functionality for handling time series objects, particularly the datetime64[ns] dtype, which stores time information and allows us to perform some operations really fast.

In [40]:
import pandas as pd
import numpy as np
#import matplotlib.pylab as plt
#%matplotlib inline
import seaborn as sns
#from matplotlib.pylab […]
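As a minimal sketch of what a datetime dtype makes possible, the snippet below parses date strings and computes a per-month average. The column names `game_date` and `shot_made_flag` and all values are made up for illustration; they are assumptions, not taken from the post's notebook.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
df = pd.DataFrame({
    "game_date": ["2000-10-31", "2000-10-31", "2000-11-01", "2000-12-05"],
    "shot_made_flag": [1, 0, 1, 1],
})

# Parsing the strings converts the column to a datetime64 dtype
# (datetime64[ns] in the pandas versions this post appears to target).
df["game_date"] = pd.to_datetime(df["game_date"])

# With a datetime index, date-based grouping works without string handling.
ts = df.set_index("game_date")["shot_made_flag"]
monthly_accuracy = ts.groupby(ts.index.to_period("M")).mean()
print(monthly_accuracy)
```

Grouping by `to_period("M")` is just one option; `resample` would also work on a datetime index, though its frequency aliases differ between pandas versions.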

Kaggle Tutorial using Kobe Bryant Dataset – Part 4

Posted in Kaggle

Exploring the data

In [215]:
# Shot accuracy
sns.countplot(x='shot_made_flag', data=data)

Out[215]:
<matplotlib.axes._subplots.AxesSubplot at 0x270898aa780>

In [216]:
data['shot_made_flag'].value_counts() / data['shot_made_flag'].shape[0]
# He scores around 45% of his shots.

Out[216]:
0.0    0.553839
1.0    0.446161
Name: shot_made_flag, dtype: float64

In [218]:
# Let's see his attempts depending on the seconds to the end of a period:
data['timeRemaining'].plot(kind='hist', bins=24, xlim=(720, 0), figsize=(12,6), title='Attempts […]
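The class shares computed above can also be obtained in one call. This is a small sketch on a hypothetical toy series standing in for the `shot_made_flag` column; the values are invented, and only `value_counts(normalize=True)` is the point.

```python
import pandas as pd

# Hypothetical toy labels standing in for the shot_made_flag column.
shots = pd.Series([0.0, 1.0, 0.0, 1.0, 0.0], name="shot_made_flag")

# normalize=True returns fractions of non-missing values instead of raw counts.
shares = shots.value_counts(normalize=True)
print(shares)
```

Note that `value_counts` skips missing values by default, so the fractions are relative to the labelled rows only.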

Kaggle Tutorial using Kobe Bryant Dataset – Part 3

Posted in Kaggle

# Columns not needed
notNeeded = []

In [183]:
# Action type column
print(df['action_type'].unique())

['Jump Shot' 'Driving Dunk Shot' 'Layup Shot' 'Running Jump Shot'
 'Driving Layup Shot' 'Reverse Layup Shot' 'Reverse Dunk Shot'
 'Slam Dunk Shot' 'Turnaround Jump Shot' 'Tip Shot' 'Running Hook Shot'
 'Alley Oop Dunk Shot' 'Dunk Shot' 'Alley Oop Layup shot'
 'Running Dunk Shot' 'Driving […]

Kaggle Tutorial using Kobe Bryant Dataset – Part 2

Posted in Kaggle

The following presents the thought process behind creating and debugging an ML algorithm for predicting whether a shot is successful or missed (a binary classification problem).

Top 20 most important features according to RandomForestClassifier

In [22]:
model = RandomForestClassifier()
model.fit(X, Y)
feature_imp = pd.DataFrame(model.feature_importances_, index=X.columns, columns=["importance"])
feat_imp_20 = feature_imp.sort_values("importance", ascending=False).head(20).index
feat_imp_20

Out[22]:
Index(['shot_id', 'shot_distance', 'action_type#Jump Shot', 'home_play', 'action_type#Layup […]
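The feature-ranking step can be tried on its own with synthetic data. The sketch below is self-contained and assumes scikit-learn is installed; the feature names `signal`, `noise_a`, and `noise_b` and the generated data are hypothetical and unrelated to the Kobe Bryant dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical synthetic data: only "signal" determines the label.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "signal": rng.normal(size=500),
    "noise_a": rng.normal(size=500),
    "noise_b": rng.normal(size=500),
})
Y = (X["signal"] > 0).astype(int)

# Fixed random_state so the ranking is reproducible.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, Y)

# Same pattern as the post: importances in a DataFrame, sorted descending.
feature_imp = pd.DataFrame(
    model.feature_importances_, index=X.columns, columns=["importance"]
)
top_features = feature_imp.sort_values("importance", ascending=False).head(2).index
print(list(top_features))
```

Impurity-based importances like these are relative scores that sum to 1, so they rank features against each other rather than measuring predictive power in absolute terms.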

Kaggle Tutorial using Kobe Bryant Dataset – Part 1

Posted in Kaggle

This is a Kaggle tutorial. You can get the data from https://www.kaggle.com/c/kobe-bryant-shot-selection. What excited me is that this dataset is excellent for practicing classification basics, feature engineering, and time series analysis.

Importing Data

Let us start by importing the basic libraries we need and the data set.

In [1]:
import numpy as np
import pandas […]