Features selection for determining House Prices ?

Posted on Leave a commentPosted in Kaggle, Predictive Analysis, scikit-learn

Home values are influenced by many factors. Basically, there are two major aspects: The environmental information, including location, local economy, school district, air quality, etc. The characteristics information of the property, such as lot size, house size and age, the number of rooms, heating / AC systems, garage, and so on. When people consider buying […]



visualization of house prices

Visualisation of House Prices

Posted on 1 CommentPosted in Data Analysis Resources, Kaggle

Visualisation is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns. This visualisation of house prices is for the Kaggle dataset. With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, […]



4 different ways to predict survival on Titanic - part 4

4 different ways to predict survival on Titanic – part 4

Posted on Leave a commentPosted in Data Analysis Resources, Kaggle

continued from part 3 4. Way to predict survival on Titianic These notes are taken from this link In [2]: import matplotlib.pyplot as plt %matplotlib inline import numpy as np import pandas as pd import statsmodels.api as sm from statsmodels.nonparametric.kde import KDEUnivariate from statsmodels.nonparametric import smoothers_lowess from pandas import Series, DataFrame from patsy import dmatrices from […]



4 different ways to predict survival on Titanic – part 3

Posted on Leave a commentPosted in Data Analysis Resources, Kaggle

continued from part 2 3. Way to predict survival on Titianic These notes are from this link I – Exploratory data analysis We tweak the style of this notebook a little bit to have centered plots. In [1]: from IPython.core.display import HTML HTML(“”” <style> .output_png { display: table-cell; text-align: center; vertical-align: middle; } </style> “””) Out[1]: […]



4 different ways to predict survival on Titanic - part 4

4 different ways to predict survival on Titanic – part 2

Posted on Leave a commentPosted in Data Analysis Resources, Kaggle, Predictive Analysis

continued from part 1 Classification KNeighborsClassifier In [16]: from sklearn.neighbors import KNeighborsClassifi alg_ngbh = KNeighborsClassifier(n_neighbors=3) scores = cross_validation.cross_val_score(alg_ngbh, train_data_scaled, train_data_munged[“Survived”], cv=cv, n_jobs=-1) print(“Accuracy (k-neighbors): {}/{}”.format(scores.mean(), scores.std())) Accuracy (k-neighbors): 0.7957351290684623/0.011110544261068086 SGDClassifier In [17]: from sklearn.linear_model.stochastic_gradient import SGDClassifier alg_sgd = SGDClassifier(random_state=1) scores = cross_validation.cross_val_score(alg_sgd, train_data_scaled, train_data_munged[“Survived”], cv=cv, n_jobs=-1) print(“Accuracy (sgd): {}/{}”.format(scores.mean(), scores.std())) Accuracy (sgd): 0.7239057239057239/0.015306601231185043 SVC In [18]: from sklearn.svm […]



Evaluating Algorithms using Kaggle’s Digit Recognizer Data

Posted on Leave a commentPosted in Kaggle, Machine Learning, scikit-learn

In [1]: import numpy as np import pandas as pd import seaborn as sns import matplotlib.pyplot as plt %matplotlib inline import warnings warnings.filterwarnings(‘ignore’) In [2]: # importing the train dataset train = pd.read_csv(r’C:\Users\piush\Desktop\Dataset\DigitRecognizer\train.csv’) train.head(10) Out[2]: label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 … pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783 0 […]



Submission for Kaggle's Titanic Competition

Submission for Kaggle’s Titanic Competition

Posted on 1 CommentPosted in Kaggle, Machine Learning, Predictive Analysis

Following is my submission for Kaggle’s Titanic Competition In [361]: import pandas as pd import numpy as np In [362]: df_train = pd.read_csv(r’C:\Users\piush\Desktop\Dataset\Titanic\train.csv’) In [363]: df_train.head(2) Out[363]: PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S 1 2 […]