In this article, I have grouped some of the popular machine learning algorithms by learning style or problem type. Each group includes a brief description of how the algorithms work and their potential use cases.
Regression
How it works
A regression model uses the historical relationship between an independent and a dependent variable to predict future values of the dependent variable. The model is refined iteratively based on the error it generates.
Some example regression algorithms are:
- Ordinary least squares regression (OLSR)
- Linear Regression
- Logistic regression
- Multivariate adaptive regression splines (MARS)
- Stepwise regression
- Locally Estimated Scatterplot Smoothing (LOESS)
- Jackknife Regression
Regression provides a good way to study relationships between more than one variable. However, it comes with certain restrictions and assumptions, so the data should be analysed carefully and some preliminary tests performed before fitting a model.
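As a simple illustration, here is a minimal sketch of fitting a linear regression with scikit-learn; the toy data, its slope and intercept, and the noise level are illustrative assumptions only.

```python
# Minimal linear regression sketch (assumes scikit-learn and NumPy are installed).
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: the dependent variable y is roughly 3*x + 2 plus noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))                # independent variable
y = 3.0 * X.ravel() + 2.0 + rng.normal(0, 1, 100)    # dependent variable

model = LinearRegression()
model.fit(X, y)                                      # coefficients chosen to minimise squared error

print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("prediction at x = 5:", model.predict([[5.0]])[0])
```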
Regularization
How it works
Regularization adds a penalty term to the objective function to explicitly control model complexity.
Some example algorithms using regularization are:
- Ridge Regression
- Least Absolute Shrinkage and Selection Operator (LASSO)
- Elastic Net
- Least-Angle Regression (LARS)
Regularization helps solve ill-posed problems and prevents overfitting.
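For example, ridge regression adds an L2 penalty and LASSO an L1 penalty to the least-squares objective. A minimal sketch with scikit-learn follows; the synthetic data and the alpha values are illustrative assumptions.

```python
# Ridge (L2 penalty) and LASSO (L1 penalty) on the same toy data (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                        # five predictors (illustrative)
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(0, 0.1, 100)

ridge = Ridge(alpha=1.0).fit(X, y)                   # alpha controls the strength of the penalty
lasso = Lasso(alpha=0.1).fit(X, y)                   # L1 penalty tends to drive some coefficients to zero

print("ridge coefficients:", ridge.coef_)
print("lasso coefficients:", lasso.coef_)            # note the (near-)zero coefficients
```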
Clustering methods
How these work
Clustering methods are typically organized by modelling approach, such as centroid-based or hierarchical. A centroid is a data point (imaginary or real) at the centre of a cluster, while hierarchical clustering seeks to build a hierarchy of clusters. These methods organize data into groups by assessing the similarity in the structure of the input data.
Some example algorithms in clustering are:
- K-means
- Single-Linkage clustering
- K-medians
- Hierarchical Clustering
- Fuzzy Clustering
- DBSCAN
- Expectation maximization (EM)
- Gaussian mixture models (GMM)
- OPTICS algorithm
- Non-negative Matrix Factorization
- Latent Dirichlet allocation (LDA)
Clustering is useful when the data contains many natural groups and a similarity measure is available. It partitions objects into groups such that objects in the same group are more similar to each other than to objects in other groups.
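A minimal K-means sketch with scikit-learn is shown below; the synthetic blob data and the choice of three clusters are illustrative assumptions.

```python
# K-means clustering on synthetic blobs (assumes scikit-learn); k = 3 is an illustrative choice.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)                           # cluster assignment for each point

print("cluster centroids:\n", km.cluster_centers_)
print("first ten labels:", labels[:10])
```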
Dimensionality reduction
How it works
Dimensionality reduction methods work iteratively on the structure of the data in an unsupervised manner to reduce the number of dimensions and bring the most relevant dimensions forward. They can be divided into feature selection and feature extraction.
Some example dimensionality reduction methods are:
- Principal component analysis (PCA)
- Principal Component Regression (PCR)
- Projection pursuit (PP)
- Partial least squares (PLS) regression
- Sammon mapping
- Multidimensional scaling (MDS)
- Discriminant Analysis (MDA, QDA, FDA)
These techniques are usually used to simplify high-dimensional data before applying a supervised learning technique. They reduce the time and storage space required and make it easier to visualize the data when reduced to very low dimensions such as 2D or 3D. Removing multicollinearity also improves the machine learning model's performance.
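As an example of feature extraction, here is a minimal PCA sketch with scikit-learn; the iris dataset and the choice of two components are illustrative assumptions.

```python
# PCA as feature extraction: project the 4-dimensional iris data down to 2 dimensions
# (assumes scikit-learn; n_components=2 is an illustrative choice).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                                 # 150 samples, 4 features

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)                          # coordinates along the top-2 principal components

print("reduced shape:", X_2d.shape)
print("variance explained:", pca.explained_variance_ratio_)
```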
Decision tree
How it works
Decision tree based algorithms define models that are constructed iteratively or recursively from conditional splits on the data provided. They generate rules. A rule is a conditional statement that can easily be understood by humans and easily used within a database to identify a set of records. The goal is to predict the value of a target variable given a set of input variables.
Some examples of decision tree based algorithms are:
- Random forest
- Conditional Decision Trees
- Classification and Regression Tree (CART)
- C4.5 and C5.0
- Iterative Dichotomizer 3 (ID3)
- Gradient boosting machines (GBM)
- Chi-Squared Automatic Interaction Detection (CHAID)
- Decision stump
- Multivariate adaptive regression splines (MARS)
The Decision Tree algorithm produces accurate and interpretable models with relatively little user intervention. The algorithm can be used for both classification and regression problems.
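The sketch below fits a small decision tree with scikit-learn and prints its rules as human-readable conditional statements; the iris dataset and the depth limit are illustrative assumptions.

```python
# Fit a shallow decision tree classifier and print its rules
# (assumes scikit-learn; max_depth=2 is an illustrative choice to keep the tree small).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Each branch is a conditional statement over an input variable.
print(export_text(clf, feature_names=list(iris.feature_names)))
```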
Bayesian methods
How these work
Bayesian methods are those that explicitly apply Bayes' theorem and use conditional probability in modelling. Bayes' theorem provides a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data. It gives the probability of an event occurring given that another event has already occurred.
Bayes' theorem: P(A|B) = P(B|A) · P(A) / P(B), where P(A|B) is the probability of event A given that event B has occurred.
Some examples of Bayesian method based algorithms are:
- Naïve Bayes
- Gaussian Naïve Bayes
- Multinomial Naïve Bayes
- Averaged one-dependence estimators (AODE)
- Bayesian belief network (BBN)
- Hidden Markov Models
- Conditional Random fields (CRFs)
The Bayesian method based algorithms afford fast, highly scalable model building and scoring. These solve both classification and regression problems and scale linearly with the number of predictors and rows.
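As a concrete example, here is a minimal Gaussian Naive Bayes sketch with scikit-learn; the iris dataset and the train/test split are illustrative assumptions.

```python
# Gaussian Naive Bayes classifier (assumes scikit-learn); the 70/30 split is illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

gnb = GaussianNB()
gnb.fit(X_train, y_train)                            # class priors and per-feature likelihoods from the data

print("test accuracy:", gnb.score(X_test, y_test))
```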
Kernel methods
How these work
Kernel methods are concerned with pattern analysis which includes various mapping techniques. These methods require only a user-specified kernel, i.e., a similarity function over pairs of data points in raw representation.
Some examples of kernel method based algorithms are:
- Support Vector Machines
- Linear discriminant analysis (LDA)
Kernel method based models have a well-founded theoretical approach to regularization. They can model complex, real-world problems such as text and image classification, handwriting recognition and biosequence analysis. They also perform well on data sets that have many attributes, even if there are very few cases on which to train the model.
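A minimal support vector machine sketch with an RBF kernel follows; the synthetic dataset and the C and gamma settings are illustrative assumptions.

```python
# Support vector machine with an RBF kernel (assumes scikit-learn); C and gamma are illustrative.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

svm = SVC(kernel="rbf", C=1.0, gamma="scale")        # the kernel acts as a similarity function over pairs of points
svm.fit(X, y)

print("training accuracy:", svm.score(X, y))
print("support vectors per class:", svm.n_support_)
```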
Artificial neural networks (ANN)
How they work
Artificial neural networks are a class of pattern matching techniques inspired by the structure of biological neural networks. They are typically organized in layers made up of many interconnected 'nodes', each containing an 'activation function'. Patterns are presented to the network via the 'input layer', which communicates to one or more 'hidden layers' where the actual processing is done via a system of weighted 'connections'. The hidden layers then link to an 'output layer'.
Some examples of Artificial neural networks are:
- Learning vector quantization (LVQ)
- Self-organizing maps (SOM)
- Hopfield network
- Perceptron
- Backpropagation
- Radial Basis Function Network (RBFN)
- Autoencoders
- Boltzmann Machines
- Spiking Neural Networks
These are again used to solve classification and regression problems, especially where the relationships may be quite dynamic or non-linear. They provide an analytical alternative to conventional techniques, which are often limited by strict assumptions of normality, linearity, variable independence, etc.
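Here is a minimal feed-forward network sketch with scikit-learn; the two-moons dataset, the single hidden layer of 16 nodes and the iteration limit are illustrative assumptions.

```python
# A small feed-forward neural network: one hidden layer of 16 nodes with a ReLU activation
# (assumes scikit-learn; the layer size and max_iter are illustrative choices).
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)   # non-linear decision boundary

mlp = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                    max_iter=2000, random_state=0)
mlp.fit(X, y)                                        # connection weights learned by backpropagation

print("training accuracy:", mlp.score(X, y))
```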
Instance based
These are also called memory-based learning algorithms.
How it works
Instances are simply subsets of the dataset, and instance based methods do not build an explicit model from the training data at all. They work directly on an identified instance or group of instances that are critical to the problem. New data is compared to the stored instances using a similarity measure to find the best match and make a prediction.
Some examples of the instance based learning algorithms are:
- k-Nearest Neighbour (k-NN)
- Locally Weighted Learning (LWL)
- Learning vector quantization (LVQ)
- Self-organizing maps (SOM)
Instance based learning is a lazy learning method because it waits to see a test case before it does any computation at all. Such methods estimate using only the training examples that are relevant to the query. This is useful when a target function is very complex but can be expressed by breaking it down into less complex generalizations.
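The k-nearest neighbours sketch below illustrates this lazy behaviour with scikit-learn; the iris dataset, k = 3 and the query point are illustrative assumptions.

```python
# k-nearest neighbours: no model is built at training time; prediction looks up the
# k most similar stored instances (assumes scikit-learn; n_neighbors=3 is illustrative).
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=3)            # "fit" essentially just stores the training instances
knn.fit(X, y)

# The query is compared to the stored instances at prediction time.
print("predicted class:", knn.predict([[5.0, 3.5, 1.5, 0.2]]))
```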
Association rule based
How it works
These algorithms extract and define rules that best explain observed relationships between variables in a dataset. The rules represent experience-based learning.
Some examples of Association rule based algorithms are:
- The Apriori algorithm
- The Eclat algorithm
- FP-Growth
These rules can be used to discover hidden relationships in large multidimensional datasets.
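To make the idea concrete, here is a simplified, brute-force frequent-itemset count in the spirit of Apriori, written in pure Python; the toy transactions and the 0.5 support threshold are made-up examples, and the candidate-pruning step of the real Apriori algorithm is omitted.

```python
# Count frequent itemsets (up to pairs) by brute force; real Apriori additionally prunes
# candidates whose subsets are infrequent.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
min_support = 0.5                                    # itemset must appear in at least half the transactions

def frequent_itemsets(transactions, min_support, max_size=2):
    n = len(transactions)
    frequent = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for basket in transactions:
            for itemset in combinations(sorted(basket), size):
                counts[itemset] += 1
        for itemset, count in counts.items():
            if count / n >= min_support:
                frequent[itemset] = count / n        # store the itemset's support
    return frequent

print(frequent_itemsets(transactions, min_support))
```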
Ensemble methods
How these work
These methods encompass multiple models that are built independently and then combined to produce improved results. The models being combined are sometimes referred to as weak learners, since their individual results need not fully achieve the expected outcome in isolation.
Some examples of the ensemble method algorithms are:
- Random forest
- Bagging (Bootstrapped Aggregation)
- AdaBoost
- Boosting
- Stacked generalization (blending)
- Gradient boosting machines (GBM)
- Gradient boosted Regression (GBR)
This is a very powerful and widely adopted class of techniques. Ensemble methods usually produce more accurate solutions than a single model. It is important to identify which independent models should be combined, and how their results should be combined, to achieve the required outcome.
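The sketch below compares a single decision tree with a random forest (a bagging-style ensemble of many independently grown trees) using scikit-learn; the synthetic dataset, the 100-tree forest and 5-fold cross-validation are illustrative assumptions.

```python
# Compare a single tree with a random forest ensemble (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# The combined model is usually more accurate than any single weak learner.
print("single tree CV accuracy:", cross_val_score(single_tree, X, y, cv=5).mean())
print("random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```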