Scikit-learn – A Data Analyst

Modeling Bitcoin’s Market Capitalization

Bitcoin has been in the news quite a bit lately, with its price soaring. It was named the top-performing currency in four of the last five years, and its price has the potential to exceed $100,000 in 10 years, which would mark a 3,483 percent rise from its recent record high. In this post, we are […]

Countvectorizer sklearn example

This CountVectorizer sklearn example is from PyCon Dublin 2016. For further information please visit this link. The dataset is from UCI.

In [2]: messages = [line.rstrip() for line in open('smsspamcollection/SMSSpamCollection')]
In [3]: print(len(messages))
5574
In [5]: for num, message in enumerate(messages[:10]):
            print(num, message)
            print('\n')
0 ham Go until jurong point, crazy.. Available only in bugis n great world la e […]

Principal Component Analysis in scikit-learn

Principal Component Analysis (PCA) is an orthogonal linear transformation that turns a set of possibly correlated variables into a new set of variables that are as uncorrelated as possible. The new variables lie in a new coordinate system such that the greatest variance is obtained by projecting the data in the first coordinate, the second […]
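
The two properties described in this excerpt (uncorrelated new variables, with the greatest variance along the first coordinate) can be checked on a toy example. What follows is a minimal from-scratch sketch for two-dimensional data only, not scikit-learn's implementation; the helper names `pca_2d` and `covariance` and the sample points are made up for illustration. It relies on the closed-form rotation angle that diagonalizes a 2x2 covariance matrix.

```python
import math


def covariance(us, vs):
    """Sample covariance of two equally long sequences."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / (n - 1)


def pca_2d(points):
    """Rotate 2-D points onto their principal axes (hypothetical helper)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the covariance matrix [[a, b], [b, c]].
    a = covariance(xs, xs)
    b = covariance(xs, ys)
    c = covariance(ys, ys)

    # Angle of the direction along which the variance is greatest.
    theta = 0.5 * math.atan2(2 * b, a - c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Coordinates of each centered point on the first and second axes.
    return [(x * cos_t + y * sin_t, -x * sin_t + y * cos_t) for x, y in centered]


# Illustrative, strongly correlated sample data (an assumption, not from the post).
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0), (2.3, 2.7)]
projected = pca_2d(points)
pc1 = [p for p, _ in projected]
pc2 = [q for _, q in projected]

print("variance along first axis: ", covariance(pc1, pc1))
print("variance along second axis:", covariance(pc2, pc2))
print("covariance between axes:   ", covariance(pc1, pc2))
```

For real work with more than two features, scikit-learn's `sklearn.decomposition.PCA` would be the natural choice; the sketch only makes the geometry visible.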

Naïve Bayes in scikit-learn

Naïve Bayes is a simple but powerful classifier based on a probabilistic model derived from Bayes’ theorem. Basically, it determines the probability that an instance belongs to a class based on each of the feature value probabilities. One of the most successful applications of Naïve Bayes has been within the field of Natural Language […]
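
To make "the probability that an instance belongs to a class based on each of the feature value probabilities" concrete, here is a small from-scratch sketch of a word-count Naïve Bayes text classifier with add-one smoothing. It is an illustration under stated assumptions, not the code from the post and not scikit-learn's implementation: the function names `train_naive_bayes` and `predict_label` and the four toy messages are invented for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count documents per class and words per class (hypothetical helper)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict_label(model, text):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, doc_count in class_counts.items():
        # Start from the log prior, log P(class).
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add log P(word | class), using add-one (Laplace) smoothing.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, made up for illustration.
documents = [
    "win a free prize now",
    "free cash call now",
    "are we meeting for lunch",
    "see you at home tonight",
]
labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(documents, labels)
print(predict_label(model, "free prize call"))
print(predict_label(model, "see you for lunch"))
```

The "naïve" part is the independence assumption: each word contributes its own factor to the score, regardless of the other words in the message.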

Decision Trees in scikit-learn

Decision trees are very simple yet powerful supervised learning methods that construct a decision tree model, which is then used to make predictions. The main advantage of this model is that a human being can easily understand and reproduce the sequence of decisions (especially if the number of attributes is small) taken to predict the […]
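
As a rough illustration of why such a model is easy for a human to follow, the sketch below learns a single decision (a one-level tree, often called a stump) on one numeric feature by minimizing Gini impurity, then prints the resulting rule. This is a simplified stand-in written for this excerpt, not scikit-learn's tree-building code; `gini`, `best_split`, and the toy data are assumptions.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold on one feature with the lowest weighted impurity."""
    best_threshold, best_impurity = None, float("inf")
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Toy data, made up for illustration: one numeric attribute and a class label.
sizes = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
classes = ["small", "small", "small", "large", "large", "large"]

threshold, impurity = best_split(sizes, classes)
print(f"if size <= {threshold}: predict 'small', else predict 'large'")
print(f"weighted Gini impurity after the split: {impurity}")
```

A full decision tree repeats this search recursively on each side of the split and across all attributes, which is why the final model can still be read as a sequence of simple if/else decisions.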

Regression in scikit-learn

We will compare several regression methods by using the same dataset. We will try to predict the price of a house as a function of its attributes.

In [6]: import numpy as np
        import matplotlib.pyplot as plt
        %pylab inline
Populating the interactive namespace from numpy and matplotlib

Import the Boston House Pricing Dataset

In [9]: from sklearn.datasets […]

Linear Classification method with ScikitLearn

This post is based on the book and is intended as learning material for myself only. The linear classification method implements regularized linear models with stochastic gradient descent (SGD) learning. The gradient of the loss is estimated one sample at a time, and the model is updated along the way with a decreasing strength schedule (aka learning rate). SGD allows […]
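
The per-sample update described above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's `SGDClassifier`: it assumes a hinge loss, an L2 penalty of strength `alpha`, and a simple decaying learning rate `eta0 / (1 + decay * step)`. These parameter names, their default values, and the toy data are all choices made for this example.

```python
import random


def sgd_linear_classifier(samples, labels, epochs=50, eta0=0.1,
                          alpha=0.001, decay=0.01, seed=0):
    """Fit weights and bias with per-sample hinge-loss updates (labels are -1 or +1)."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    step = 0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            step += 1
            eta = eta0 / (1 + decay * step)  # decreasing learning rate (assumed schedule)
            x, y = samples[i], labels[i]
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # The L2 penalty shrinks the weights a little on every step.
            weights = [w * (1 - eta * alpha) for w in weights]
            # The hinge loss has a nonzero gradient only when the margin is below 1.
            if margin < 1:
                weights = [w + eta * y * xi for w, xi in zip(weights, x)]
                bias += eta * y
    return weights, bias


def predict(weights, bias, x):
    """Classify a point by the sign of the linear score."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Toy, linearly separable data centered on the origin (made up for illustration).
samples = [(-2.0, -1.0), (-1.0, -2.0), (-1.5, -1.5),
           (2.0, 1.0), (1.0, 2.0), (1.5, 1.5)]
labels = [-1, -1, -1, 1, 1, 1]

weights, bias = sgd_linear_classifier(samples, labels)
print("weights:", weights, "bias:", bias)
print("prediction for (3.0, 3.0):", predict(weights, bias, (3.0, 3.0)))
```

Because each update looks at one sample only, the cost of a step does not depend on the size of the dataset.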