Continued from part 1.

Classification

KNeighborsClassifier

In [16]:

    from sklearn.neighbors import KNeighborsClassifier
    alg_ngbh = KNeighborsClassifier(n_neighbors=3)
    scores = cross_validation.cross_val_score(alg_ngbh, train_data_scaled, train_data_munged["Survived"], cv=cv, n_jobs=-1)
    print("Accuracy (k-neighbors): {}/{}".format(scores.mean(), scores.std()))

    Accuracy (k-neighbors): 0.7957351290684623/0.011110544261068086

SGDClassifier

In [17]:

    from sklearn.linear_model.stochastic_gradient import SGDClassifier
    alg_sgd = SGDClassifier(random_state=1)
    scores = cross_validation.cross_val_score(alg_sgd, train_data_scaled, train_data_munged["Survived"], cv=cv, n_jobs=-1)
    print("Accuracy (sgd): {}/{}".format(scores.mean(), scores.std()))

    Accuracy (sgd): 0.7239057239057239/0.015306601231185043

SVC

In [18]:

    from sklearn.svm import SVC
    alg_svm = SVC(C=1.0)

…
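To make the "mean/std" accuracy figures in the excerpt above concrete, here is a minimal pure-Python sketch of cross-validated k-nearest-neighbors scoring. It is not scikit-learn's implementation: the helper names `knn_predict` and `cross_val_accuracy`, the round-robin fold assignment, and the toy data are all assumptions made for illustration (requires Python 3.8+ for `math.dist`).

```python
from collections import Counter
from math import dist
from statistics import mean, pstdev


def knn_predict(train_x, train_y, point, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], point))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_val_accuracy(xs, ys, n_folds=3, k=3):
    """Return one accuracy score per fold (folds assigned round-robin by index)."""
    scores = []
    for fold in range(n_folds):
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        test_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical toy data: two well-separated clusters, labels alternate 0, 1, 0, 1, ...
xs = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.3), (4.8, 5.0),
      (0.3, 0.2), (5.1, 5.3), (0.0, 0.4), (4.9, 4.7), (0.4, 0.1), (5.3, 5.0)]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

scores = cross_val_accuracy(xs, ys, n_folds=3, k=3)
# Population standard deviation, matching what NumPy's .std() computes by default
print("Accuracy (k-neighbors): {}/{}".format(mean(scores), pstdev(scores)))
```

The two numbers printed are the mean and the spread of the per-fold accuracies, which is how the `scores.mean()` and `scores.std()` pair in the excerpt should be read.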

# Category: Predictive Analysis

## Submission for Kaggle’s Titanic Competition

Following is my submission for Kaggle’s Titanic Competition.

In [361]:

    import pandas as pd
    import numpy as np

In [362]:

    df_train = pd.read_csv(r'C:\Users\piush\Desktop\Dataset\Titanic\train.csv')

In [363]:

    df_train.head(2)

Out[363]:

       PassengerId  Survived  Pclass  Name                     Sex   Age   SibSp  Parch  Ticket     Fare    Cabin  Embarked
    0  1            0         3       Braund, Mr. Owen Harris  male  22.0  1      0      A/5 21171  7.2500  NaN    S
    1  2            1         1       Cumings, Mrs. John…

## Submission for Predicting Red Hat Business Value

In this competition, a classification algorithm is supposed to accurately identify which customers have the most potential business value for Red Hat based on their characteristics and activities. For more information, please visit: https://www.kaggle.com/c/predicting-red-hat-business-value

In [2]:

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    %matplotlib inline
    import warnings
    warnings.filterwarnings('ignore')

Loading the dataset

In [14]:…

## Predictive Analysis, Binary Classification (Cookbook) – 7

This notebook contains my notes for Predictive Analysis on Binary Classification and acts as a cookbook. It continues from the previous post on assessing the performance of predictive models.

For Deployment

Retrain the model on the full data set and pull out the coefficients corresponding to the best alpha, the one determined to minimize out-of-sample error, which is estimated in…
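The deployment step described above (choose the alpha with the lowest estimated out-of-sample error, then retrain on all the data) can be sketched as follows. This is only an illustration under stated assumptions: it uses a one-feature ridge regression with no intercept, round-robin cross-validation folds, and made-up data, none of which come from the original notebook. The names `ridge_coefficient` and `cv_error` are hypothetical helpers.

```python
def ridge_coefficient(xs, ys, alpha):
    """Closed-form ridge coefficient for a single feature with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def cv_error(xs, ys, alpha, n_folds):
    """Estimate out-of-sample mean squared error with n-fold cross-validation."""
    total = 0.0
    for fold in range(n_folds):
        # Train on every example whose index does not fall in this fold
        train_x = [x for i, x in enumerate(xs) if i % n_folds != fold]
        train_y = [y for i, y in enumerate(ys) if i % n_folds != fold]
        beta = ridge_coefficient(train_x, train_y, alpha)
        # Score on the held-out examples only
        for i, (x, y) in enumerate(zip(xs, ys)):
            if i % n_folds == fold:
                total += (y - beta * x) ** 2
    return total / len(xs)


# Hypothetical data and candidate penalties (assumptions for illustration)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
alphas = [0.0, 0.1, 1.0, 10.0]

# Step 1: estimate out-of-sample error for each alpha
errors = {alpha: cv_error(xs, ys, alpha, n_folds=3) for alpha in alphas}

# Step 2: pick the alpha that minimizes the estimated error
best_alpha = min(errors, key=errors.get)

# Step 3: retrain on the full data set and keep that coefficient for deployment
final_coefficient = ridge_coefficient(xs, ys, best_alpha)
print(best_alpha, final_coefficient)
```

Cross-validation is used only to choose alpha; the coefficient that gets deployed is fitted on every available example.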

## Predictive Analysis, Binary Classification (Cookbook) – 6

This notebook contains my notes for Predictive Analysis on Binary Classification and acts as a cookbook. It continues from the previous post on Pearson’s Correlation and discusses assessing the performance of predictive models. One of the most used measures is the misclassification error, that is, the fraction of examples that the function pred() predicts incorrectly.

Reading and Arranging data

In [29]:…
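The misclassification error mentioned above is simple enough to write out directly. The sketch below assumes predictions and true labels are available as two equal-length lists; the function name `misclassification_error` and the sample labels are hypothetical.

```python
def misclassification_error(predicted, actual):
    """Return the fraction of examples whose predicted label differs from the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


# Hypothetical labels in the style of the rocks-versus-mines data ("M" or "R")
actual = ["M", "R", "M", "M", "R"]
predicted = ["M", "M", "M", "R", "R"]

error = misclassification_error(predicted, actual)
print(error)  # 2 of 5 predictions are wrong -> 0.4
```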

## Predictive Analysis, Binary Classification (Cookbook) – 5

This notebook contains my notes for Predictive Analysis on Binary Classification and acts as a cookbook. It continues from the previous post on visualizing and discusses Pearson’s Correlation.

Pearson’s Correlation Calculation for Attributes 2 versus 3 and 2 versus 21

In [21]:

    from math import sqrt
    #calculate correlations between real-valued attributes
    dataRow2 = rocksVMines.iloc[1,0:60]
    dataRow3 = rocksVMines.iloc[2,0:60]
    dataRow21…
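Since the excerpt above is cut off before the calculation itself, here is a self-contained sketch of Pearson's correlation for two equal-length sequences of real values. It is not the notebook's code: the function name `pearson_correlation` and the sample values are assumptions for illustration.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson's correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors cancel between numerator and denominator, so plain sums suffice
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical attribute values (not taken from the sonar data set)
row_a = [1.0, 2.0, 3.0, 4.0]
row_b = [2.0, 4.0, 6.0, 8.0]
row_c = [8.0, 6.0, 4.0, 2.0]

corr_ab = pearson_correlation(row_a, row_b)  # perfectly linear, increasing
corr_ac = pearson_correlation(row_a, row_c)  # perfectly linear, decreasing
print(corr_ab, corr_ac)
```

Values close to 1 or -1 indicate a strong linear relationship between the two attributes, while values near 0 indicate little linear relationship.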

## Predictive Analysis, Binary Classification (Cookbook) – 4

This notebook contains my notes for Predictive Analysis on Binary Classification and acts as a cookbook. It continues from the previous post on using pandas.

Visualizing Parallel Coordinates Plots

In [15]:

    for i in range(208):
        #assign color based on "M" or "R" labels
        if rocksVMines.iat[i,60] == "M":
            pcolor = "red"
        else:
            pcolor = "blue"
        #plot rows…

## Predictive Analysis, Binary Classification (Cookbook) – 3

This notebook contains my notes for Predictive Analysis on Binary Classification and acts as a cookbook. It continues from the previous post on the summary statistics.

Using Python Pandas to Read Data

In [12]:

    import pandas as pd
    from pandas import DataFrame
    import matplotlib.pyplot as plot
    %matplotlib inline
    target_url = ("https://archive.ics.uci.edu/ml/machine-learning-"
                  "databases/undocumented/connectionist-bench/sonar/sonar.all-data")
    #read rocks versus mines data into pandas…