How computers learn
This article is intended for non-experts who want a basic understanding of how machine learning works.
All the programs we run on our computers are made up of algorithms. An algorithm is a sequence of instructions that the computer executes to accomplish a specific task, ranging from simple tasks such as addition to more complex ones such as learning.
Some history
Learning algorithms are based on mathematical methods that have been developed since the 1930s within a branch of applied mathematics called operations research.
The birth of neural networks, the algorithms on which modern learning techniques are based, can be traced back to 1958, when Frank Rosenblatt introduced the first neural network scheme, called the Perceptron, in the journal Psychological Review.
Neural networks subsequently evolved to perform very complex tasks such as image recognition, natural language processing, and autonomous driving.
[Figure: Example of a neural network. Image by en:User:Cburnett.]
The image above shows a simple neural network architecture, made up of:
- an input layer, which feeds the network the information to be processed; if we are performing image recognition, the input layer is fed the images to be recognized;
- one or more hidden layers, where processing takes place to find the coefficients that allow the network to recognize the image as accurately as possible;
- an output layer, which contains the result of the recognition. A network with multiple output nodes can distinguish multiple image classes; with 2 output nodes, for example, it might recognize whether an image shows a car or a bus.
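The architecture above can be sketched in a few lines of Python with NumPy. The layer sizes, the random weights, and the input vector here are all illustrative assumptions, not a real image-recognition model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: 4 input values, 3 hidden nodes, 2 output nodes
# (one per class, e.g. "car" and "bus").
n_inputs, n_hidden, n_outputs = 4, 3, 2

# Coefficients (weights) connecting the layers, initialized randomly.
W1 = rng.normal(size=(n_inputs, n_hidden))
W2 = rng.normal(size=(n_hidden, n_outputs))

def forward(x):
    """Propagate an input vector through the hidden layer to the output layer."""
    hidden = np.tanh(x @ W1)   # nonlinear function computed by the hidden layer
    return hidden @ W2         # one raw score per output node

x = rng.normal(size=n_inputs)  # stand-in for the features of one image
scores = forward(x)
print(scores.shape)            # one score per class
```

Whichever output node ends up with the highest score would be taken as the network's answer.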
Training a Neural Network
In order to recognize images, the neural network must first be “trained”. Training a neural network for image recognition means feeding it a large number of images that someone has manually classified, so that the network can find the right coefficients that will later allow it to recognize new images containing the same types of objects.
Going back to the example of cars and buses: to train our neural network to recognize whether an image contains a car or a bus, you need a large number of images labeled with 0 (zero) where there is neither a car nor a bus, with 1 where there is a car, and with 2 where there is a bus.
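In code, such a manually labeled training set might look like the following sketch. The file names and label values are invented for illustration; in practice each entry would point to a real image classified by hand:

```python
import numpy as np

# 0 = neither a car nor a bus, 1 = a car, 2 = a bus.
labels = {
    "street_01.jpg": 0,
    "street_02.jpg": 1,
    "street_03.jpg": 2,
}

# The expected outputs the network will be compared against during training.
y = np.array(list(labels.values()))
print(y)  # [0 1 2]
```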
The neural network automatically modifies its coefficients to minimize the recognition error. In simplified form, these are the steps a neural network performs to minimize the error during training:
- Randomly initialize the weights associated with the input variables
- Propagate the input variables and weights through the layers of the neural network, traversing all hidden layers
- In each layer, compute a (nonlinear) function of the input
- Compare the output obtained from the neural network with the expected output (the manual classification), and minimize the error using differential calculus (derivatives), propagating the error from the output layer back to the input layer and adjusting the weights along the way
- Repeat from step 2 to step 4 until you are satisfied with the result.
Neural network challenges
Repeating training cycles does not improve a neural network's performance indefinitely: at some point the network becomes very accurate on the training data but loses the ability to accurately recognize the new images it is given as input. This phenomenon is called overfitting.
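A common way to detect overfitting in practice is to hold back part of the labeled data as a "validation" set that the network never trains on, and watch its error during training. The sketch below illustrates the idea on an invented dataset with a simple linear model rather than a full neural network (an assumption made to keep the example short):

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented data: 40 examples, 4 features, a linear target plus a little noise.
X = rng.normal(size=(40, 4))
true_w = np.array([[1.0], [0.0], [0.0], [0.0]])
y = X @ true_w + 0.1 * rng.normal(size=(40, 1))

X_train, y_train = X[:30], y[:30]  # used to adjust the weights
X_val, y_val = X[30:], y[30:]      # held out: only used to measure how well
                                   # the model generalizes to unseen data

W = 0.1 * rng.normal(size=(4, 1))
best_val = np.inf
for epoch in range(100):
    err = X_train @ W - y_train
    W -= 0.05 * X_train.T @ err / len(X_train)

    val_err = float(np.mean((X_val @ W - y_val) ** 2))
    if val_err < best_val:
        best_val = val_err  # validation error still improving: keep training
    # If val_err kept rising for many epochs while the training error kept
    # falling, that would be overfitting, and training would be stopped here.

print(best_val)
```

Stopping when the validation error stops improving, rather than when the training error does, is one standard defense against overfitting.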
In addition to overfitting, there are other challenges in improving the performance of a neural network, such as a lack of sufficient training data, or the choice of an architecture unsuited to the problem at hand (number of hidden layers, number of nodes per layer, activation functions).
Nowadays neural networks can be used for most supervised machine learning tasks, but many other algorithms exist that can serve a variety of purposes.