Categories

## Predicting NBA winners with Decision Trees and Random Forests in Scikit-learn

In this post, we will predict NBA winners with decision trees and random forests in scikit-learn. The National Basketball Association (NBA) is the major men’s professional basketball league in North America and is widely considered the premier men’s professional basketball league in the world. It has 30 teams (29 in the United States and […]

## Modeling Bitcoin’s Market Capitalization

Bitcoin has been in the news quite a bit lately, with its price soaring. It was named the top-performing currency in four of the last five years. Its price has the potential to exceed \$100,000 in 10 years, which would mark a 3,483 percent rise from its recent record high. In this post, we are […]

## CountVectorizer sklearn example

This CountVectorizer sklearn example is from PyCon Dublin 2016. For further information please visit this link. The dataset is from UCI. In [2]: messages = [line.rstrip() for line in open('smsspamcollection/SMSSpamCollection')] In [3]: print(len(messages)) 5574 In [5]: for num, message in enumerate(messages[:10]): print(num, message) print('\n') 0 ham Go until jurong point, crazy.. Available only in bugis n great world la e […]

## Principal Component Analysis in scikit-learn

Principal Component Analysis (PCA) is an orthogonal linear transformation that turns a set of possibly correlated variables into a new set of linearly uncorrelated variables. The new variables lie in a new coordinate system such that the greatest variance is obtained by projecting the data onto the first coordinate, the second […]

## Naïve Bayes in scikit-learn

Naïve Bayes is a simple but powerful classifier based on a probabilistic model derived from Bayes’ theorem. It determines the probability that an instance belongs to a class based on the probabilities of each of its feature values. One of the most successful applications of Naïve Bayes has been within the field of Natural Language […]

## Decision Trees in scikit-learn

Decision trees are simple yet powerful supervised learning methods that construct a tree model, which is then used to make predictions. The main advantage of this model is that a human being can easily understand and reproduce the sequence of decisions (especially if the number of attributes is small) taken to predict the […]

## Support Vector Machine in scikit-learn – part 2

Continued from part 1. In [8]: print_faces(faces.images, faces.target, 400) Training a Support Vector Machine. A Support Vector Classifier (SVC) will be used for classification. The SVC implementation has several important parameters; probably the most relevant is kernel, which defines the kernel function to be used in our classifier. In [10]: from sklearn.svm import SVC svc_1 = SVC(kernel='linear') print […]

## Support Vector Machine in scikit-learn – part 1

Support Vector Machines have become one of the state-of-the-art machine learning models for many tasks, with excellent results in many practical applications. One of the greatest advantages of Support Vector Machines is that they are very effective when working in high-dimensional spaces, that is, on problems that have a lot of features to learn from. […]