Categories
Data Analysis Resources Machine Learning scikit-learn

Installing XGBoost for Windows – walk-through

I have the following specification on my computer: Windows 10, 64-bit, Python 3.5, and Anaconda3. I tried many times to install XGBoost, but somehow it never worked for me. Today I decided to make it happen, and I am sharing this post to help anyone else who is struggling to install XGBoost on Windows. XGBoost is short for […]
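After installing, a quick way to confirm that Python can actually see the package is to ask the import system for it. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, and `"xgboost"` is assumed to be the import name of the XGBoost Python package.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be found by the current interpreter."""
    # find_spec returns None when the top-level module cannot be located
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # "xgboost" is the import name assumed here for the XGBoost package
    print("xgboost importable:", is_installed("xgboost"))
```

If this prints `False` from inside the Anaconda environment you installed into, the package has most likely gone into a different interpreter.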

Categories
Data Analysis Resources

Exploratory Data Analysis with pandas – 1

This post is the first part of exploratory data analysis with pandas. Clear plots that explain the relationships between variables can lead to new and better features with more predictive power than the existing ones. Exploratory data analysis is effective when it has the following characteristics: • It should be fast, allowing […]
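A first pass at exploratory data analysis in pandas usually starts with summary statistics, missing-value counts, and pairwise correlations. This is a minimal sketch on a small made-up DataFrame; the column names `a`, `b`, `c` and the helper `quick_eda` are hypothetical, not from the post.

```python
import numpy as np
import pandas as pd


def quick_eda(df: pd.DataFrame) -> dict:
    """Return a few fast, first-look summaries of a DataFrame."""
    return {
        "shape": df.shape,
        # count/mean/std/min/quartiles/max for numeric columns
        "summary": df.describe(),
        # number of missing values per column
        "missing": df.isna().sum(),
        # pairwise Pearson correlation between numeric columns
        "corr": df.corr(numeric_only=True),
    }


# Illustrative toy data only
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, np.nan],
    "c": [4.0, 3.0, 2.0, 1.0],
})
report = quick_eda(toy)
print(report["missing"])
print(report["corr"])
```

Correlations are computed pairwise, so the missing value in `b` only removes that row from the pairs involving `b`.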

Categories
Data Analysis Resources Kaggle Machine Learning

Facebook Data Analysis

In [20]:
import pandas as pd
import numpy as np
In [ ]:
# Take a few samples for the visualization
sample_fbcheckin_train_tbl = fbcheckin_train_tbl[:10000].copy()
In [21]:
df = pd.read_csv('train.csv', index_col='row_id')
In [22]:
df.head()
Out[22]:
             x       y  accuracy    time    place_id
row_id
0       0.7941  9.0809        54  470702  8523065625
1       5.9567  4.7968        13  186555  1757726713
2       8.3078  7.0407        74  322648  1137537235
3       7.3665  2.5165 […]
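In the excerpt above, the sampling cell refers to `fbcheckin_train_tbl`, which is not defined in any of the visible cells. A minimal sketch of the same load-then-sample step, assuming the sample is meant to come from the DataFrame read from `train.csv`; here an in-memory CSV built from the three complete rows shown in `Out[22]` stands in for the file, and `sample_df` is a hypothetical name.

```python
import io

import pandas as pd

# Stand-in for train.csv: the three complete rows shown in Out[22]
csv_text = """row_id,x,y,accuracy,time,place_id
0,0.7941,9.0809,54,470702,8523065625
1,5.9567,4.7968,13,186555,1757726713
2,8.3078,7.0407,74,322648,1137537235
"""

# Use row_id as the index, as in the post
df = pd.read_csv(io.StringIO(csv_text), index_col="row_id")

# Take (up to) the first 10,000 rows for plotting; .copy() makes the
# sample independent, so later edits do not trigger SettingWithCopyWarning
sample_df = df.iloc[:10000].copy()
print(sample_df.head())
```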

Categories
Machine Learning

Evaluating Machine Learning Algorithms

This post contains my notes on how to evaluate machine learning algorithms. I want to see how models compare and contrast with each other. It is based on the following web page: Your First Machine Learning Project in Python Step-By-Step. I am evaluating six different algorithms in this post: Logistic Regression (LR) […]
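A common way to compare several classifiers fairly is to score each one on the same k-fold cross-validation splits and then compare the mean and spread of the scores. This is a minimal scikit-learn sketch; it uses the bundled iris dataset and three models chosen only for illustration, which are not necessarily the six algorithms evaluated in the post.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Illustrative model choice; max_iter raised so LR converges
models = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=1),
}

# One fixed, shuffled 10-fold split shared by every model
kfold = KFold(n_splits=10, shuffle=True, random_state=1)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

Passing the same `kfold` object to every call is what makes the scores comparable: each model is trained and tested on identical folds.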

Categories
Data Analysis Resources Kaggle

Time Series Forecast using Kobe Bryant Dataset

This script is my attempt at time series analysis. Pandas has dedicated functionality for handling time series (TS) objects, particularly the datetime64[ns] dtype, which stores time information and allows us to perform some operations really fast.
In [40]:
import pandas as pd
import numpy as np
#import matplotlib.pylab as plt
#%matplotlib inline
import seaborn as sns
#from matplotlib.pylab […]
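Converting a date column to a `datetime64` dtype is what unlocks pandas' fast time-based operations such as resampling. A minimal sketch, assuming a `game_date` column and a 0/1 `shot_made_flag` column as in the Kobe Bryant shot-selection data; the four rows below are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only
shots = pd.DataFrame({
    "game_date": ["2000-01-05", "2000-01-20", "2000-02-03", "2000-02-15"],
    "shot_made_flag": [1, 0, 1, 1],
})

# Parse the strings into a datetime64 column and index by it
shots["game_date"] = pd.to_datetime(shots["game_date"])
ts = shots.set_index("game_date")["shot_made_flag"]

# Monthly shooting accuracy; "MS" labels each bin by its month start
monthly = ts.resample("MS").mean()
print(monthly)
```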

Categories
Kaggle

Tutorial using Kobe Bryant Dataset – Part 4

This is Part 4 of the tutorial using the Kobe Bryant dataset. You can get the data from https://www.kaggle.com/c/kobe-bryant-shot-selection. What excited me was that this dataset is excellent for practicing classification basics, feature engineering, and time series analysis. This is continued from here.
Exploring the data
In [215]:
# Shot accuracy
sns.countplot(x='shot_made_flag', data=data)
Out[215]: <matplotlib.axes._subplots.AxesSubplot […]
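The count plot of `shot_made_flag` shows how many shots were made versus missed; the same numbers, and the overall shot accuracy, can also be read off directly with pandas. A minimal sketch on made-up values; the `NaN` entry models a row whose label is withheld, which is assumed here to be how the competition marks the rows to predict.

```python
import numpy as np
import pandas as pd

# Made-up labels; NaN stands for a row with no known outcome
data = pd.DataFrame({"shot_made_flag": [1, 0, 0, 1, 0, np.nan]})

# value_counts drops NaN by default
counts = data["shot_made_flag"].value_counts()

# Mean of a 0/1 column = fraction of shots made (NaN is skipped)
accuracy = data["shot_made_flag"].mean()
print(counts)
print(f"shot accuracy: {accuracy:.2f}")
```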

Categories
Kaggle

Tutorial using Kobe Bryant Dataset – Part 3

This is Part 3 of the Kaggle tutorial using the Kobe Bryant dataset. You can get the data from https://www.kaggle.com/c/kobe-bryant-shot-selection. What excited me was that this dataset is excellent for practicing classification basics, feature engineering, and time series analysis. This is continued from here.
# Columns not needed
notNeeded = []
In [183]:
# Action type column […]
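The `notNeeded = []` list suggests a pattern of collecting column names as each feature is processed and dropping them all in one step at the end. A minimal sketch of that pattern; as an assumed example treatment, it one-hot encodes `action_type` before marking the raw column for removal (the post's actual handling of that column is not shown), and the toy rows and `shot_id` column are illustrative.

```python
import pandas as pd

# Illustrative toy rows
data = pd.DataFrame({
    "shot_id": [1, 2, 3],
    "action_type": ["Jump Shot", "Layup Shot", "Jump Shot"],
})

# Columns to drop once they have been turned into features
notNeeded = []

# Assumed example treatment: one-hot encode action_type,
# then mark the raw text column for removal
dummies = pd.get_dummies(data["action_type"], prefix="action_type")
data = pd.concat([data, dummies], axis=1)
notNeeded.append("action_type")

# Drop every collected column in one step
data = data.drop(columns=notNeeded)
print(data.columns.tolist())
```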

Categories
Kaggle

Kaggle Tutorial using Kobe Bryant Dataset – Part 1

This is Part 1 of the Kaggle tutorial using the Kobe Bryant dataset. You can get the data from https://www.kaggle.com/c/kobe-bryant-shot-selection. What excited me was that this dataset is excellent for practicing classification basics, feature engineering, and time series analysis.
Importing Data
Let us start by importing the basic libraries we need and the data […]
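Importing the data typically means loading the CSV with pandas and, for a competition like this, separating rows with a known `shot_made_flag` (training rows) from rows where it is missing (rows to predict). A minimal sketch; `load_and_split` is a hypothetical helper, the missing-label convention is an assumption, and an in-memory CSV with made-up rows stands in for the downloaded file.

```python
import io

import pandas as pd


def load_and_split(source):
    """Read the shot data and split it on whether shot_made_flag is known."""
    data = pd.read_csv(source)
    known = data["shot_made_flag"].notna()
    return data[known].copy(), data[~known].copy()


# Made-up rows; an empty shot_made_flag field is read as NaN
csv_text = """shot_id,shot_distance,shot_made_flag
1,18,0
2,0,1
3,12,
"""
train, test = load_and_split(io.StringIO(csv_text))
print(len(train), "labelled rows;", len(test), "rows to predict")
```

With the real file, `load_and_split` would be called with its path instead of the `StringIO` object.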