## 4 different ways to predict survival on Titanic – part 1

These are my notes, gathered from various blogs, on different ways to predict survival on the Titanic using the Python stack. I am interested in comparing how different people have approached the Kaggle competition, and I will compare and contrast their analyses to find similarities and differences in how they predict survival on the Titanic. This notebook will […]

## 4 different ways to predict survival on Titanic – part 2

continued from part 1

Classification

KNeighborsClassifier

In [16]:

```python
from sklearn.neighbors import KNeighborsClassifier

alg_ngbh = KNeighborsClassifier(n_neighbors=3)
scores = cross_validation.cross_val_score(alg_ngbh, train_data_scaled, train_data_munged["Survived"], cv=cv, n_jobs=-1)
print("Accuracy (k-neighbors): {}/{}".format(scores.mean(), scores.std()))
```

Accuracy (k-neighbors): 0.7957351290684623/0.011110544261068086

SGDClassifier

In [17]:

```python
from sklearn.linear_model.stochastic_gradient import SGDClassifier

alg_sgd = SGDClassifier(random_state=1)
scores = cross_validation.cross_val_score(alg_sgd, train_data_scaled, train_data_munged["Survived"], cv=cv, n_jobs=-1)
print("Accuracy (sgd): {}/{}".format(scores.mean(), scores.std()))
```

Accuracy (sgd): 0.7239057239057239/0.015306601231185043

SVC

In [18]: from sklearn.svm […]
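The excerpt above uses the old `sklearn.cross_validation` module, which was later removed from scikit-learn in favour of `sklearn.model_selection`. A minimal self-contained sketch of the same pattern with the modern API is below; the synthetic `X`/`y` arrays are stand-ins for `train_data_scaled` and the `Survived` column, which are built in part 1.

```python
# Same cross-validation pattern with the modern sklearn.model_selection API.
# X and y below are synthetic stand-ins for the Titanic training data.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(1)
X = rng.rand(200, 5)                      # stand-in for train_data_scaled
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # stand-in for the "Survived" column

cv = KFold(n_splits=3, shuffle=True, random_state=1)
alg_ngbh = KNeighborsClassifier(n_neighbors=3)
scores = cross_val_score(alg_ngbh, X, y, cv=cv, n_jobs=-1)
print("Accuracy (k-neighbors): {}/{}".format(scores.mean(), scores.std()))
```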

## Dublin R El Dorado Competition

Quick Summary:

- 5,000 data points with pseudo-geological information (including proven gold reserves).
- 50 (or more) new sites up for auction with limited data.
- Each team starts with a \$50,000,000.00 budget to bid with.
- Blind, sealed-bid auctions for the rights to mine each parcel of land (minimum bid \$100,000.00).
- Auctions happen in order by parcel_id.
- Extraction costs are […]
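The auction rules above can be sketched as a simple simulation loop. Everything here beyond the stated rules (the `strategy` callables, the parcel dicts, the tie-breaking by `max`) is an illustrative assumption, not part of the competition.

```python
# Minimal sketch of the sealed-bid auction loop described above.
# Budget, minimum bid, and parcel ordering follow the stated rules;
# the team strategies and parcel data are illustrative assumptions.
STARTING_BUDGET = 50_000_000.00
MIN_BID = 100_000.00

def run_auctions(parcels, teams):
    """parcels: list of dicts with a 'parcel_id'; teams: name -> state dict.
    Bids are sealed: each team submits one bid per parcel without seeing
    the others'. Highest valid bid (>= MIN_BID, within budget) wins."""
    results = {}
    for parcel in sorted(parcels, key=lambda p: p["parcel_id"]):
        bids = {}
        for name, state in teams.items():
            bid = state["strategy"](parcel, state["budget"])
            if MIN_BID <= bid <= state["budget"]:
                bids[name] = bid
        if not bids:
            continue  # no valid bids: parcel goes unsold
        winner = max(bids, key=bids.get)
        teams[winner]["budget"] -= bids[winner]
        results[parcel["parcel_id"]] = (winner, bids[winner])
    return results

# Usage with two toy fixed-bid strategies:
teams = {
    "A": {"budget": STARTING_BUDGET, "strategy": lambda p, b: 200_000.00},
    "B": {"budget": STARTING_BUDGET, "strategy": lambda p, b: 150_000.00},
}
parcels = [{"parcel_id": i} for i in range(3)]
results = run_auctions(parcels, teams)
print(results)  # team A outbids B on every parcel at 200,000
```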

## Web Server Log Analysis with Spark

This lab will demonstrate how easy it is to perform web server log analysis with Apache Spark. Server log analysis is an ideal use case for Spark. It’s a very large, common data source and contains a rich set of information. Spark allows you to store your logs in […]
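The heart of this kind of lab is parsing Apache Common Log Format lines; in Spark the same parser is applied inside a `map()` over an RDD of raw lines. A minimal regex-based sketch that runs without a cluster is below; the sample line and field names are illustrative.

```python
# Sketch of an Apache Common Log Format parser; in the Spark lab this
# function would be mapped over an RDD of raw log lines.
import re

LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+)\s*(\S*)" (\d{3}) (\S+)'
)

def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    host, ident, user, ts, method, path, proto, status, size = m.groups()
    return {
        "host": host,
        "timestamp": ts,
        "method": method,
        "path": path,
        "status": int(status),
        "size": 0 if size == "-" else int(size),  # "-" means no body sent
    }

sample = '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1839'
record = parse_log_line(sample)
print(record)
```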

## Building a word count application in Spark

These are my solutions for Apache Spark. This lab will build on the techniques covered in the Spark tutorial to develop a simple word count application. The volume of unstructured text in existence is growing dramatically, and Spark is an excellent tool for analyzing this type of data. […]
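In Spark, word count is the `flatMap -> map -> reduceByKey` pipeline over an RDD of lines. The same shape can be sketched in plain Python so it runs without a cluster; the corpus and the tokenizing regex here are illustrative assumptions.

```python
# Plain-Python sketch of the Spark word-count shape:
# flatMap(tokenize) -> map(word -> (word, 1)) -> reduceByKey(add)
from itertools import chain
from collections import Counter
import re

lines = ["Spark is fast", "spark counts words", "words words words"]

def tokenize(line):
    # normalize case and strip punctuation (a common first step)
    return re.findall(r"[a-z']+", line.lower())

# flatMap + map: one (word, 1) pair per token across all lines
pairs = ((word, 1) for word in chain.from_iterable(map(tokenize, lines)))

# reduceByKey: sum the counts per word
counts = Counter()
for word, n in pairs:
    counts[word] += n

print(counts.most_common(3))  # [('words', 4), ('spark', 2), ('is', 1)]
```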