This notebook contains my notes for Predictive Analysis on Binary Classification. It acts as a cookbook. It continues the previous post on visualization and discusses Pearson's correlation.

**Pearson's Correlation Calculation for Attributes 2 versus 3 and 2 versus 21**

    from math import sqrt
    # calculate correlations between real-valued attributes
    dataRow2 = rocksVMines.iloc[1,0:60]
    dataRow3 = rocksVMines.iloc[2,0:60]
    dataRow21…
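
As a rough illustration of the calculation this post covers, here is a minimal sketch of Pearson's correlation written from scratch with `sqrt`. The function name `pearson` and the two short input lists are assumptions made for this example; they are not the notebook's own code or values from the sonar data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of squared deviations and of cross-products of deviations.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov_xy / sqrt(var_x * var_y)


# Hypothetical stand-ins for two vectors of readings (not real sonar values).
readings_a = [0.02, 0.04, 0.05, 0.09, 0.12]
readings_b = [0.03, 0.05, 0.07, 0.10, 0.15]
print(round(pearson(readings_a, readings_b), 4))
```

A value near 1 or -1 indicates a strong linear relationship between the two sequences, while a value near 0 indicates little linear relationship.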

# Category: Data Analysis Resources

## Predictive Analysis, Binary Classification (Cookbook) – 4

This notebook contains my notes for Predictive Analysis on Binary Classification. It acts as a cookbook. It continues the previous post on using pandas.

**Visualizing Parallel Coordinates Plots**

    for i in range(208):
        # assign color based on "M" or "R" labels
        if rocksVMines.iat[i,60] == "M":
            pcolor = "red"
        else:
            pcolor = "blue"
        # plot rows…

## Predictive Analysis, Binary Classification (Cookbook) – 3

This notebook contains my notes for Predictive Analysis on Binary Classification. It acts as a cookbook. It continues the previous post on summary statistics.

**Using Python Pandas to Read Data**

    import pandas as pd
    from pandas import DataFrame
    import matplotlib.pyplot as plot
    %matplotlib inline
    target_url = ("https://archive.ics.uci.edu/ml/machine-learning-"
                  "databases/undocumented/connectionist-bench/sonar/sonar.all-data")
    # read rocks versus mines data into pandas…

## Predictive Analysis, Binary Classification (Cookbook) – 2

This notebook contains my notes for Predictive Analysis on Binary Classification. It acts as a cookbook. It continues the previous post.

**Summary Statistics for Numeric and Categorical Attributes**

    import numpy as np
    # generate summary statistics for column 3 (e.g.)
    col = 3
    colData = []
    for row in xList:
        colData.append(float(row[col]))
    colArray = np.array(colData)
    colMean = np.mean(colArray)
    …
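
The idea in this excerpt can be sketched end to end on a tiny made-up table. This is only an illustration: the rows in `x_list` are hypothetical (not sonar data), and the standard-library `statistics` module is swapped in for NumPy so the sketch has no third-party dependencies.

```python
import statistics
from collections import Counter

# Hypothetical rows standing in for xList: each row is a list of strings,
# as produced by splitting a comma-delimited line. Not real sonar data.
x_list = [
    ["0.02", "0.37", "0.43", "0.21", "R"],
    ["0.05", "0.12", "0.31", "0.35", "M"],
    ["0.03", "0.26", "0.11", "0.16", "M"],
    ["0.01", "0.09", "0.06", "0.28", "R"],
]

# Numeric attribute: convert one column to floats and summarize it.
col = 3
col_data = [float(row[col]) for row in x_list]
col_mean = statistics.mean(col_data)
# Population standard deviation, the same convention as np.std's default (ddof=0).
col_sd = statistics.pstdev(col_data)

# Categorical attribute: count how often each label appears in the last column.
label_counts = Counter(row[-1] for row in x_list)

print("mean:", col_mean, "sd:", col_sd)
print("label counts:", dict(label_counts))
```

For a categorical column, counts per distinct value play the role that mean and standard deviation play for a numeric column.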

## Predictive Analysis, Binary Classification (Cookbook) – 1

This notebook contains my notes for Predictive Analysis on Binary Classification. It acts as a cookbook.

**Importing and Sizing Up a New Data Set**

The file is comma delimited, with the data for one experiment occupying one line of text. This makes it a simple matter to read a line, split it on the comma delimiters, and stack the resulting…
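
A minimal sketch of the read, split, and stack step described here, assuming the truncated sentence ends with stacking the rows into a list of lists. The three-line sample text is hypothetical and stands in for the real file so the example runs without a download.

```python
import io

# Hypothetical three-line sample in the same comma-delimited shape; not real data.
sample = io.StringIO(
    "0.02,0.37,0.43,R\n"
    "0.05,0.12,0.31,M\n"
    "0.03,0.26,0.11,M\n"
)

x_list = []
for line in sample:
    # One experiment per line: strip the newline, split on commas, stack the row.
    row = line.strip().split(",")
    x_list.append(row)

# Size up the data set: number of rows and number of columns.
n_rows = len(x_list)
n_cols = len(x_list[0])
print("rows:", n_rows, "columns:", n_cols)
```

Every field is still a string at this point, so numeric columns need an explicit `float(...)` conversion before any statistics are computed.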

## Running Your First Notebook – Apache Spark

This notebook will show you how to install the course libraries, create your first Spark cluster, and test basic notebook functionality. To move through the notebook, just run each of the cells. You will not need to solve any problems to complete this lab. You can run a cell by pressing Shift-Enter, which will compute the current cell and advance…

## Facebook Data Analysis

    import pandas as pd
    import numpy as np

    # Take few samples for the visualization
    sample_fbcheckin_train_tbl = fbcheckin_train_tbl[:10000].copy()

    df = pd.read_csv('train.csv', index_col='row_id')

    df.head()

| row_id | x | y | accuracy | time | place_id |
|---|---|---|---|---|---|
| 0 | 0.7941 | 9.0809 | 54 | 470702 | 8523065625 |
| 1 | 5.9567 | 4.7968 | 13 | 186555 | 1757726713 |
| 2 | 8.3078 | 7.0407 | 74 | 322648 | 1137537235 |
| 3 | 7.3665 | 2.5165 | 65 | 704587 | 6567393236 |
| 4 | 4.0961… | | | | |

## Time Series Forecast using Kobe Bryant Dataset

This script is my attempt at time series analysis. pandas has dedicated support for handling time series (TS) objects, particularly the `datetime64[ns]` dtype, which stores time information and allows us to perform some operations really fast.

    import pandas as pd
    import numpy as np
    #import matplotlib.pylab as plt
    #%matplotlib inline
    import seaborn as sns
    #from matplotlib.pylab import rcParams
    #rcParams['figure.figsize'] = 15,…
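
To make the point about datetime handling concrete, here is a small sketch of a Series indexed by timestamps. The dates and point totals are invented for illustration and are not taken from the Kobe Bryant dataset.

```python
import pandas as pd

# Hypothetical game dates and point totals, for illustration only.
dates = pd.to_datetime(["2016-01-01", "2016-01-03", "2016-02-10"])
points = pd.Series([20, 31, 18], index=dates)

# Partial-string indexing: select every row that falls in January 2016.
jan_total = points.loc["2016-01"].sum()

# Timestamp arithmetic: time elapsed between the first and last game.
span = points.index.max() - points.index.min()

# Vectorized access to a date component across the whole index.
months = points.index.month

print(jan_total, span, list(months))
```

Parsing the dates once with `pd.to_datetime` is what makes the later selection and arithmetic possible without any manual string handling.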