Scikit-learn – A Data Analyst

Discover AWS Serverless way to Deploy Food Recipe Application in 7 Easy Steps

This post shows AWS serverless architecture to deploy a machine learning application. I will deploy the food recipe application as discussed here using AWS Fargate. Fargate is a service provided by AWS Serverless computing that removes the need to provision and manage servers. You can specify and pay for resources per application and improves security […]

Modeling Bitcoin’s Market Capitalization

Bitcoin has been in news quite a bit lately with the price soaring. It was named the top performing currency four of the last five year. And it’ price has the potential to hit over $100,000 in 10 years, which would mark a 3,483 percent rise from its recent record high. In this post, we are […]

Countvectorizer sklearn example

This countvectorizer sklearn example is from Pycon Dublin 2016. For further information please visit this link. The dataset is from UCI. In [2]: messages = [line.rstrip() for line in open(‘smsspamcollection/SMSSpamCollection’)] In [3]: print (len(messages)) 5574 In [5]: for num,message in enumerate(messages[:10]): print(num,message) print (‘n’) 0 ham Go until jurong point, crazy.. Available only in bugis n great world la e […]

Features Selection for determining House Prices

Home values are influenced by many factors. Basically, there are two major aspects: The environmental information, including location, local economy, school district, air quality, etc. The characteristics information of the property, such as lot size, house size and age, the number of rooms, heating / AC systems, garage, and so on. When people consider buying […]

Installing XGBoost for Windows – walk-through

I have the following specification on my computer: Windows10, 64 bit,Python 3.5 and Anaconda3.I tried many times to install XGBoost but somehow it never worked for me. Today I decided to make it happen and am sharing this post to help anyone else who is struggling with installing XGBoost for Windows. XGBoost is short for […]

Evaluating Algorithms using MNIST

This post is evaluating Aagorithms using MNIST In [1]: import numpy as np import pandas as pd import seaborn as sns import matplotlib.pyplot as plt %matplotlib inline import warnings warnings.filterwarnings(‘ignore’) In [2]: # importing the train dataset train = pd.read_csv(r’C:UserspiushDesktopDatasetDigitRecognizertrain.csv’) train.head(10) Out[2]: label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 … pixel774 pixel775 pixel776 pixel777 […]

Principal Component Analysis in scikit-learn

Principal Component Analysis (PCA) is an orthogonal linear transformation that turns a set of possibly correlated variables into a new set of variables that are as uncorrelated as possible. The new variables lie in a new coordinate system such that the greatest variance is obtained by projecting the data in the first coordinate, the second […]

Naiive Bayes in scikit-learn

Naïve Bayes is a simple but powerful classifier based on a probabilistic model derived from the Bayes’ theorem. Basically it determines the probability that an instance belongs to a class based on each of the feature value probabilities. One of the most successful applications of Naïve Bayes has been within the field of Natural Language […]

Decision Trees in scikit-learn

Decision trees are very simple yet powerful supervised learning methods, which constructs a decision tree model, which will be used to make predictions. The main advantage of this model is that a human being can easily understand and reproduce the sequence of decisions (especially if the number of attributes is small) taken to predict the […]

Regression in scikit-learn

We will compare several regression methods by using the same dataset. We will try to predict the price of a house as a function of its attributes. In [6]: import numpy as np import matplotlib.pyplot as plt %pylab inline Populating the interactive namespace from numpy and matplotlib Import the Boston House Pricing Dataset In [9]: from sklearn.datasets […]

Category: Scikit-learn