Machine-learning-inspired systems are now used in a wide range of fields to solve a variety of problems. Machine learning makes it possible to use information to work smarter rather than harder, and the history of the field is still being written as it continues to grow.

Described below are some of the important events and research results in artificial intelligence that map out the history of machine learning.

In 1943, Warren McCulloch and Walter Pitts presented the first widely recognised work in the field of artificial intelligence. McCulloch had previously done research on the central nervous system, and he brought this knowledge to a proposed model of artificial neural networks in which each neuron is in a binary state, either “on” or “off”. Through both theoretical and experimental work they modelled brain-like networks in the laboratory and showed that even a simple network can learn. Later experiments, however, showed that this binary model did not accurately describe real neurons.
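
To make the binary “on/off” neuron concrete, here is a minimal sketch in Python of a McCulloch-Pitts style threshold unit; the weights, threshold and the AND example are illustrative assumptions, not notation from the 1943 paper.

    # A McCulloch-Pitts style binary threshold unit. The weights,
    # threshold and AND example below are illustrative assumptions.

    def mcculloch_pitts_neuron(inputs, weights, threshold):
        """Fire ('on' = 1) when the weighted input sum reaches the threshold."""
        total = sum(x * w for x, w in zip(inputs, weights))
        return 1 if total >= threshold else 0

    # Example: a unit that behaves like logical AND of two binary inputs.
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, mcculloch_pitts_neuron([a, b], weights=[1, 1], threshold=2))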

In 1950, Alan Turing wrote one of the earliest and most significant papers on machine intelligence, “Computing Machinery and Intelligence”. In it he considered the question “Can a machine pass a behavioural test for intelligence?”. According to Turing, the physical appearance of a system is irrelevant to its intelligence; he defined intelligent behaviour as the ability to achieve human-level performance in cognitive tasks.

In 1958, John McCarthy presented a paper called “Programs with Common Sense” describing the Advice Taker, the first complete knowledge-based system. The Advice Taker was to search for solutions to general problems of the world and could demonstrate a plan generated from a few simple axioms. It could also accept new knowledge without being reprogrammed, allowing it to gain competence in new areas, and it combined knowledge representation with reasoning.

In 1959, Allen Newell and Herbert Simon developed the General Problem Solver (GPS). GPS attempted to solve problems in a human-like way using a technique referred to as means-ends analysis: it determined the difference between the current state and the desired (goal) state, then chose and applied operators to reduce that difference, the sequence of operators applied to reach the goal state forming the solution plan. However, GPS failed when applied to real-world problems.
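
As a rough illustration of means-ends analysis, the Python sketch below repeatedly picks an operator that reduces the difference between the current state and the goal, treating the operator’s unmet preconditions as subgoals; the set-of-facts state encoding and the toy operators are assumptions made for the example, not taken from GPS itself.

    # A minimal sketch of means-ends analysis. States are sets of facts and
    # each operator lists preconditions, added facts and deleted facts; this
    # encoding and the toy operators below are assumptions for the example.

    def apply_op(state, op):
        """Apply one operator to a state of facts."""
        return (state - op["deletes"]) | op["adds"]

    def means_ends(state, goal, operators, depth=5):
        """Return a plan (list of operator names) achieving goal from state."""
        if goal <= state:
            return []
        if depth == 0:
            return None
        for op in operators:
            if not op["adds"] & (goal - state):
                continue  # operator does not reduce the current difference
            pre = means_ends(state, op["needs"], operators, depth - 1)
            if pre is None:
                continue  # cannot satisfy the operator's preconditions
            mid = state
            for name in pre:
                mid = apply_op(mid, next(o for o in operators if o["name"] == name))
            rest = means_ends(apply_op(mid, op), goal, operators, depth - 1)
            if rest is not None:
                return pre + [op["name"]] + rest
        return None

    # Toy example: getting from home to work.
    operators = [
        {"name": "walk to car", "needs": {"at home"},
         "adds": {"at car"}, "deletes": {"at home"}},
        {"name": "drive to work", "needs": {"at car"},
         "adds": {"at work"}, "deletes": {"at car"}},
    ]
    print(means_ends({"at home"}, {"at work"}, operators))
    # -> ['walk to car', 'drive to work']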

In 1962, Frank Rosenblatt proved the perceptron convergence theorem and showed that his learning algorithm could classify inputs into one of two groups according to a linear classification rule.
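
A minimal Python sketch of the perceptron learning rule for such a two-group classification follows; the learning rate, epoch limit and toy data are illustrative assumptions. The convergence theorem guarantees that, for linearly separable data, this update procedure terminates with a separating hyperplane.

    # Perceptron learning for two classes labelled -1 and +1. The
    # learning rate, epoch count and toy data are illustrative choices.

    def train_perceptron(samples, epochs=20, lr=1.0):
        """samples: list of (inputs, label) pairs with label in {-1, +1}."""
        n = len(samples[0][0])
        w, b = [0.0] * n, 0.0
        for _ in range(epochs):
            for x, y in samples:
                activation = sum(wi * xi for wi, xi in zip(w, x)) + b
                if y * activation <= 0:  # misclassified: nudge the boundary
                    w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                    b += lr * y
        return w, b

    # Linearly separable toy data (the AND function with -1/+1 labels).
    data = [([0, 0], -1), ([1, 0], -1), ([0, 1], -1), ([1, 1], 1)]
    w, b = train_perceptron(data)
    print(w, b)  # weights and bias of a separating hyperplane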

In 1965, Lotfi A. Zadeh published his famous paper “Fuzzy Sets”. He proposed fuzzy sets as an extension of classical sets in which each element has a degree of membership between 0 and 1, rather than simply belonging or not belonging.
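
The idea of a degree of membership can be illustrated with a small sketch; the triangular membership function and its breakpoints are arbitrary choices for the example, not part of Zadeh’s paper.

    # A fuzzy set 'about 20' with a triangular membership function.
    # The shape and the breakpoints (15, 20, 25) are illustrative.

    def about_20(x):
        """Degree of membership in the fuzzy set 'about 20', in [0, 1]."""
        if 15 <= x <= 20:
            return (x - 15) / 5
        if 20 < x <= 25:
            return (25 - x) / 5
        return 0.0

    for value in (10, 17, 20, 23, 30):
        print(value, about_20(value))
    # A classical ('crisp') set would return only 0 or 1 for each value.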

However, by the late 1960s there was a growing realisation that earlier claims had been unrealistic, and the problem domains tackled by intelligent machines had to be sharply narrowed.

Starting in 1965 and over the next few years, Buchanan, Feigenbaum and Lederberg developed Dendral, the first expert system, to analyse chemical compounds.

There was a paradigm shift from general-purpose, knowledge-sparse methods for solving broad classes of problems to domain-specific, knowledge-intensive technologies. This increased emphasis on the logical, knowledge-based approach pushed machine learning out of favour in the early 1970s, and funding for the computers and workstations needed to model and experiment with artificial neural networks was scarce.

However, expert systems at that time were narrow in their domain of expertise, rigid and brittle, difficult to validate and verify, and had little ability to learn. What was needed was more than a reasoning system or an expert-system shell filled with rules. Hence, in the 1980s there was a revival of brain-like information processing.

In 1980, Grossberg established a principle of self-organisation, adaptive resonance theory, which provided the basis for a new class of neural networks.

In 1982, Hopfield introduced Hopfield networks, neural networks with feedback connections.

In 1982, Kohonen published his work on self-organising maps.

In 1983, Barto, Sutton and Anderson published work on reinforcement learning and its application in control.

In 1986, Rumelhart and McClelland re-invented back-propagation learning, which became the most important technique for training multilayer perceptrons.

In 1987, Schank wrote about the core issues of artificial intelligence and the traits that computers would have to be programmed for in order to exhibit intelligence.

In 1988, Broomhead and Lowe found a procedure for designing layered feedforward networks using radial basis functions, an alternative to the multilayer perceptron.

With these advances, technology could interact more naturally with the real world. Systems could learn, adapt to changes in a problem’s environment, discover patterns in situations where the rules were not known, and deal with incomplete information. Still, these systems lacked explanation facilities, retraining caused serious difficulties, training was slow, and there was no way to handle knowledge or data that was vague, imprecise or uncertain.

Various fields of artificial intelligence use fuzzy logic to deal with this issue. Fuzzy logic is built on linguistic variables, variables whose values are words rather than numbers, which lets it capture the meaning of words, human reasoning and decision-making. Fuzzy-logic models offered several advantages, such as improved computational power, cognitive modelling and the ability to represent the knowledge of multiple experts. Still, the models depend on rules extracted from experts.
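
As a rough illustration of a linguistic variable, the sketch below defines “temperature” with the word values “cold”, “warm” and “hot” as overlapping membership functions; the trapezoid shapes and breakpoints are illustrative assumptions, not a standard.

    # A linguistic variable 'temperature' whose values are words.
    # The trapezoid breakpoints below are illustrative assumptions.

    def trapezoid(x, a, b, c, d):
        """Membership rising over a..b, flat over b..c, falling over c..d."""
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)

    temperature = {
        "cold": lambda t: trapezoid(t, -40, -40, 5, 15),
        "warm": lambda t: trapezoid(t, 10, 18, 24, 30),
        "hot":  lambda t: trapezoid(t, 25, 32, 60, 60),
    }

    t = 27  # one reading can belong partly to 'warm' and partly to 'hot'
    print({word: round(mu(t), 2) for word, mu in temperature.items()})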

In 2003, Russell and Norvig observed that many innovations that originated in artificial intelligence, such as graphical user interfaces, time-sharing, rapid development environments, object-oriented programming and the computer mouse, had been adopted by mainstream computer science and were no longer considered part of artificial intelligence. This is known as the AI effect: once a problem is solved successfully by artificial intelligence, it is regarded as not requiring “real” intelligence.

In 2005, Geoffrey Hinton and Yann LeCun demonstrated advances in neural networks that became known as deep learning, and showed the effectiveness of unsupervised learning in multilayer neural networks. Better algorithms, more available data and faster computers made these older ideas work far better.

Nowadays, therefore, expert, neural and fuzzy systems complement each other and are applied together to solve a broad range of problems. Combining expert systems with fuzzy logic and neural networks improves adaptability, stability, fault tolerance and speed, and makes computing more human.