Multi-layer Perceptron Tutorial

This post continues the neural network tutorials series and is a direct continuation of the perceptron tutorial. We will see what a multi-layer perceptron neural network is, why it is so powerful and how we can implement one. Source code of an implementation is also provided, along with a small toy application example.

This tutorial continues from the last neural network tutorial, the Perceptron tutorial. We will now introduce the multi-layer perceptron, without doubt the most popular neural network structure to date, and the back-propagation algorithm used to train it. If you are in a hurry and just want to mess with the code you can get it from here, but I would recommend reading on to see how the network functions.

Tutorial Prerequisites

  • The reader should be familiar with the perceptron neural network
  • The reader should have a basic understanding of C/C++
  • The reader should know how to compile and run a program from the command line in Windows or Linux

Tutorial goals

  • The reader will understand the structure of the Multi-layer Perceptron neural network
  • The reader will understand the back-propagation algorithm
  • The reader will know about the wide array of applications this network is used in
  • The reader will learn all the above via an actual practical application in optical character recognition

Tutorial Body

This network was introduced around 1986 with the advent of the back-propagation algorithm. Until then there was no widely used rule via which we could train neural networks with more than one layer. As the name implies, a Multi-layer Perceptron is just that, a network that is composed of many neurons, divided in layers. These layers are divided as follows:

Structure of a Multilayer perceptron
  • The input layer, where the input of the network goes. The number of neurons here depends on the number of inputs we want our network to get
  • One or more hidden layers. These layers come between the input and the output and their number can vary. The function that the hidden layer serves is to encode the input and map it to the output. It has been proven that a multi-layer perceptron with only one hidden layer can approximate any continuous function from its inputs to its outputs to arbitrary accuracy, provided the hidden layer has enough neurons.
  • The output layer, where the outcome of the network can be seen. The number of neurons here depends on the problem we want the neural net to learn

The Multi-layer perceptron differs from the simple perceptron in many ways. What stays the same is the weight randomization: all weights are given random values within a certain range, usually [-0.5,0.5]. That aside though, for each pattern that is fed to the network three passes over the net are made. Let’s see them one by one in detail.

Calculating the output

In this phase we calculate the output of the network. For each layer, we calculate the firing value of each neuron by summing, over all the neurons of the previous layer that are connected to it, the product of that neuron's value and the weight of the connection. That is a mouthful, so here it is in pseudocode:

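    //start from zero, then accumulate the weighted inputs
    value[neuron,layer] = 0;
    for(int i = 0; i < previousLayerNeurons; i++)
    {
        value[neuron,layer] += weight(i,neuron) * value[i,layer-1];
    }

    //squash the weighted sum with the activation function
    value[neuron,layer] = activationFunction(value[neuron,layer]);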

As can be seen from the pseudocode, here too we have activation functions. They are used to normalize the output of each neuron, and the functions that are most commonly used in the perceptron apply here too. So, we gradually propagate forward in the network until we reach the output layer and create some output values. Just like in the perceptron, these values are initially completely random and have nothing to do with our goal values. But it is here that the back-propagation learning algorithm kicks in.
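The forward pass for a single layer can be sketched in C++ as below. This is a minimal illustration and not the tutorial's actual code: the function names `sigmoid` and `forwardLayer`, and the flat "from, to" weight layout, are assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid, squashing any real value into the range (0, 1).
float sigmoid(float x)
{
    return 1.0f / (1.0f + std::exp(-x));
}

// Computes the outputs of one layer from the outputs of the previous layer.
// Assumed layout: weights[i * nOut + j] connects previous neuron i to neuron j.
std::vector<float> forwardLayer(const std::vector<float>& prev,
                                const std::vector<float>& weights,
                                std::size_t nOut)
{
    std::vector<float> out(nOut, 0.0f);
    for (std::size_t j = 0; j < nOut; ++j)
    {
        // weighted sum of all the neurons of the previous layer
        for (std::size_t i = 0; i < prev.size(); ++i)
        {
            out[j] += weights[i * nOut + j] * prev[i];
        }
        // pass the sum through the activation function
        out[j] = sigmoid(out[j]);
    }
    return out;
}
```

Calling `forwardLayer` once per layer, feeding each result into the next call, propagates an input pattern all the way to the output layer.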

Back propagation

The back propagation learning algorithm uses the delta rule. It computes the deltas (local gradients) of each neuron, starting from the output neurons and going backwards through the hidden layers towards the input layer. To compute the deltas of the output neurons though we first have to get the error of each output neuron. That’s pretty simple: since the multi-layer perceptron is a supervised training network, the error is the difference between the desired output and the network’s output.

$$e_j(n) = d_j(n) - o_j(n)$$

where $e(n)$ is the error vector, $d(n)$ is the desired output vector and $o(n)$ is the actual output vector. Now to compute the deltas:

$$\delta_j^{(L)}(n) = e_j^{(L)}(n) \, f'(u_j^{(L)}(n))$$ , for neuron j in the output layer L

where $f'(u_j^{(L)}(n))$ is the derivative of the activation function, evaluated at the value $u_j^{(L)}(n)$ of the jth neuron of layer L

$$\delta_j^{(l)}(n) = f'(u_j^{(l)}(n)) \sum_k \delta_k^{(l+1)}(n) \, w_{kj}^{(l+1)}(n)$$ , for neuron j in hidden layer l

where $f'(u_j^{(l)}(n))$ is the derivative of the activation function, evaluated at the value of the jth neuron in layer l, and inside the sum we have the deltas of all the neurons of the next layer, each multiplied by the weight that connects it to neuron j.

This part is a very important part of the delta rule and the whole essence of back propagation. Why, you might ask? Because as high school math teaches us, a derivative is how much a function changes as its input changes. By propagating the derivatives backwards, we are informing all the neurons in the previous layers of the change that is needed in our weights to match the desired output. And all that starts from the initial error calculation at the output layer. Just like magic!

Weight adjustment

Having calculated the deltas for all the neurons we are now ready for the third and final pass of the network, this time to adjust the weights according to the generalized delta rule:


$$w_{ji}^{(l)}(n+1) = w_{ji}^{(l)}(n) + \alpha \, [w_{ji}^{(l)}(n) - w_{ji}^{(l)}(n-1)] + \eta \, \delta_j^{(l)}(n) \, y_i^{(l-1)}(n)$$

Do not be discouraged by lots of mathematical mumbo jumbo. It is actually quite simple. What the above says is:

The new weights for layer l are calculated by adding two things to the current weights. The first is the difference between the current weights and the previous weights multiplied by the coefficient we symbolize with α. This coefficient is called the momentum coefficient, and true to its name it adds speed to the training of any multi-layer perceptron by adding part of the already occurred weight changes to the current weight change. This is a double-edged sword though, since if your momentum constant is too large the network may fail to converge, or may end up stuck in a local minimum.

The other thing that adds to the weight change is the delta of the layer whose weights we change (l) multiplied by the outputs of the neurons of the previous layer (l-1) and all that multiplied by the constant η which we know to be the teaching step from the previous tutorial about the perceptron. And that is basically it! That’s what the multi-layer perceptron is all about. It is no doubt a very powerful neural network and a very powerful tool in statistical analysis.

Practical Example

It would not be a tutorial if we just explained how it works and gave you the equations. As was already mentioned the Multi-layer perceptron has many applications. Statistical analysis, pattern recognition and optical character recognition are just some of them. Our example will focus on just a simple instance of optical character recognition. Specifically the final program will be able to use an MLP to differentiate between a number of .bmp monochrome bitmap files and tell us which number each image depicts. I used a resolution of 8×8 pixels for the images, but readers are free to use their own resolutions and monochrome images since the program will read the size from the bitmap itself. Below you can see an example of such bitmaps.

bitmaps

They are ugly, right? Differentiating between them should be hard for a computer? This ugliness could be considered noise. And MLPs are really good at differentiating between noise and the actual data that helps them reach a conclusion. But let’s go on and see some code to understand how it is done.

    class MLP
    {
    private:
        std::vector<float> inputNeurons;
        std::vector<float> hiddenNeurons;
        std::vector<float> outputNeurons;
        std::vector<float> weights;

        FileReader* reader;
        int inputN,outputN,hiddenN,hiddenL;
    public:
        MLP(int hiddenL,int hiddenN);
        ~MLP();

        //assigns values to the input neurons
        bool populateInput(int fileNum);
        //calculates the whole network, from input to output
        void calculateNetwork();
        //trains the network according to our parameters
        bool trainNetwork(float teachingStep,float lmse,float momentum,int trainingFiles);
        //recalls the network for a given bitmap file
        void recallNetwork(int fileNum);
    };

The above is our multi-layer perceptron class. As you can see it has vectors for all the neurons and their connection weights. It also contains a FileReader object. As we will see below this FileReader is a class we will make to read the bitmap files to populate our input. The functions the MLP has are similar to the perceptron. It populates its input by reading the bitmap images, calculates an output for the network and trains the network. Moreover you can recall the network for a given ‘fileNum’ image to see what number the network thinks the image represents.

    //Multi-layer perceptron constructor
    MLP::MLP(int hL,int hN)
    {
    //initialize the filereader (only once, a second 'new' here would leak the first object)
    reader = new FileReader();
    outputN = 10; //one output for each digit, 0 to 9
    hiddenL = hL;
    hiddenN = hN;
 
    //read the first image to see what kind of input will our net have
    inputN = reader->getBitmapDimensions();
    if(inputN == -1)
    {
    printf("There was an error detecting img0.bmp\n\r");
    return ;
    }
 
    //let's allocate the memory for the weights
    weights.reserve(inputN*hiddenN+(hiddenN*hiddenN*(hiddenL-1))+hiddenN*outputN);
 
    //also let's set the size for the neurons vector
    inputNeurons.resize(inputN);
    hiddenNeurons.resize(hiddenN*hiddenL);
    outputNeurons.resize(outputN);
 
    //randomize weights for inputs to 1st hidden layer
    for(int i = 0; i < inputN*hiddenN; i++)
    {
    weights.push_back( (( (float)rand() / ((float)(RAND_MAX)+(float)(1)) )) - 0.5 );//[-0.5,0.5]
    }
 
    //if there are more than 1 hidden layers, randomize their weights
    for(int i=1; i < hiddenL; i++)
    {
    for(int j = 0; j < hiddenN*hiddenN; j++)
    {
    weights.push_back( (( (float)rand() / ((float)(RAND_MAX)+(float)(1)) )) - 0.5 );//[-0.5,0.5]
    }
    }
    //and finally randomize the weights for the output layer
    for(int i = 0; i < hiddenN*outputN; i ++)
    {
    weights.push_back( (( (float)rand() / ((float)(RAND_MAX)+(float)(1)) )) - 0.5 );//[-0.5,0.5]
    }
    }

The network takes the number of hidden neurons and hidden layers as parameters so it can know how to initialize its neurons and weights vectors. Moreover it reads the first bitmap, ‘img0.bmp’ to take the dimensions that all the images will have as can be seen from this line:

inputN = reader->getBitmapDimensions();

That is a requirement our tutorial’s program will have. You are free to provide any bitmap size you want for the first image ‘img0.bmp’ but you are required to have all the following images be of the same size. As in most neural networks the weights are initialized in the range between [-0.5,0.5].
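The size of the weights vector and its random initialization can be sketched on their own as follows. This is only an illustration: `weightCount` and `randomWeights` are hypothetical helper names, and the sketch swaps `rand()` for the C++11 `<random>` facilities, which is not what the tutorial's constructor does.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Total number of connection weights, matching the expression used in the
// constructor's reserve() call: input-to-hidden, hidden-to-hidden for every
// extra hidden layer, and hidden-to-output.
std::size_t weightCount(std::size_t inputN, std::size_t hiddenN,
                        std::size_t hiddenL, std::size_t outputN)
{
    return inputN * hiddenN
         + hiddenN * hiddenN * (hiddenL - 1)
         + hiddenN * outputN;
}

// Returns 'count' weights drawn uniformly from the range -0.5 to 0.5.
// A fixed seed makes the result reproducible between runs.
std::vector<float> randomWeights(std::size_t count, unsigned seed)
{
    std::mt19937 generator(seed);
    std::uniform_real_distribution<float> distribution(-0.5f, 0.5f);
    std::vector<float> weights(count);
    for (float& w : weights)
    {
        w = distribution(generator);
    }
    return weights;
}
```

For 8×8 images (64 inputs), two hidden layers of 16 neurons and 10 outputs, `weightCount(64, 16, 2, 10)` gives 64·16 + 16·16 + 16·10 = 1440 weights.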

    void MLP::calculateNetwork()
    {
    //let's propagate towards the hidden layer
    for(int hidden = 0; hidden < hiddenN; hidden++)
    {
    hiddenAt(1,hidden) = 0;
    for(int input = 0 ; input < inputN; input ++)
    {
    hiddenAt(1,hidden) += inputNeurons.at(input)*inputToHidden(input,hidden);
    }
    //and finally pass it through the activation function
    hiddenAt(1,hidden) = sigmoid(hiddenAt(1,hidden));
    }
 
    //now if we got more than one hidden layers
    for(int i = 2; i <= hiddenL; i ++)
    {
    //for each one of these extra layers calculate their values
    for(int j = 0; j < hiddenN; j++)//to
    {
    hiddenAt(i,j) = 0;
    for(int k = 0; k < hiddenN; k++)//from
    {
    hiddenAt(i,j) += hiddenAt(i-1,k)*hiddenToHidden(i,k,j);
    }
    //and finally pass it through the activation function
    hiddenAt(i,j) = sigmoid(hiddenAt(i,j));
    }
    }
 
    int i;
    //and now hidden to output
    for(i =0; i < outputN; i ++)
    {
    outputNeurons.at(i) = 0;
    for(int j = 0; j < hiddenN; j++)
    {
    outputNeurons.at(i) += hiddenAt(hiddenL,j) * hiddenToOutput(j,i);
    }
    //and finally pass it through the activation function
    outputNeurons.at(i) = sigmoid( outputNeurons.at(i) );
    }
    }

The calculate network function just finds the output of the network that corresponds to the currently given input. It just propagates the input signals through each layer until they reach the output layer. Nothing really special with the above code, it is just an implementation of the equations that were presented above. The neural network of our tutorial, as we saw in the constructor, has 10 different outputs. Each of these outputs represents how likely the network considers it that the input pattern is a certain number. So, output 1 being close to 1.0 would mean that the input pattern is most certainly 1 and so on…
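Turning the 10 outputs into a single answer amounts to picking the output neuron with the largest value. A minimal sketch, with the hypothetical function name `recognizedDigit` (the tutorial's own recall code may do this differently):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Returns the index of the strongest output neuron, which is the digit the
// network considers the best match. Returns -1 if there are no outputs.
int recognizedDigit(const std::vector<float>& outputs)
{
    if (outputs.empty())
    {
        return -1;
    }
    auto strongest = std::max_element(outputs.begin(), outputs.end());
    return static_cast<int>(std::distance(outputs.begin(), strongest));
}
```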

The training function is too big to just post it all in here, so I recommend you take a look at the .zip with the source code to see it in full. We will just focus on the implementation of the back-propagation algorithm.

    for(int i = 0; i < outputN; i ++)
    {
    //let's get the delta of the output layer
    //and the accumulated error
    if(i != target)
    {
    outputDeltaAt(i) = (0.0 - outputNeurons[i])*dersigmoid(outputNeurons[i]);
    error += (0.0 - outputNeurons[i])*(0.0-outputNeurons[i]);
    }
    else
    {
    outputDeltaAt(i) = (1.0 - outputNeurons[i])*dersigmoid(outputNeurons[i]);
    error += (1.0 - outputNeurons[i])*(1.0-outputNeurons[i]);
    }
    }
    //we start propagating backwards now, to get the error of each neuron
    //in every layer
    //let's get the delta of the last hidden layer first
    for(int i = 0; i < hiddenN; i++)
    {
    hiddenDeltaAt(hiddenL,i) = 0;//zero the values from the previous iteration
    //add to the delta for each connection with an output neuron
    for(int j = 0; j < outputN; j ++)
    {
    hiddenDeltaAt(hiddenL,i) += outputDeltaAt(j) * hiddenToOutput(i,j) ;
    }
    //The derivative here is only because of the
    //delta rule weight adjustment about to follow
    hiddenDeltaAt(hiddenL,i) *= dersigmoid(hiddenAt(hiddenL,i));
    }
 
    //now for each additional hidden layer, provided they exist
    for(int i = hiddenL-1; i >0; i--)
    {
    //add to each neuron's hidden delta
    for(int j = 0; j < hiddenN; j ++)//from
    {
    hiddenDeltaAt(i,j) = 0;//zero the values from the previous iteration
    for(int k = 0; k < hiddenN; k++)//to
    {
    //the previous hidden layers delta multiplied by the weights
    //for each neuron
    hiddenDeltaAt(i,j) += hiddenDeltaAt(i+1,k) * hiddenToHidden(i+1,j,k);
    }
    //The derivative here is only because of the
    //delta rule weight adjustment about to follow
    hiddenDeltaAt(i,j) *= dersigmoid(hiddenAt(i,j));
    }
    }

As you can see above this is the second pass over the network, the so called back-propagation as we presented it above, since we are going backwards this time. Having calculated the output and knowing the desired output (called target, in the above code) we start the delta calculation according to the equations that we saw at the start of the tutorial. If you don’t like math, then here it is for you in code. As you can see many helper macros are used to differentiate between weights of different layers and deltas.
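The helper macros themselves are not listed in this tutorial. One plausible way to write equivalent helpers is sketched below as plain functions that return an index into the flat vectors. Everything here is an assumption made for illustration: the `Layout` struct, the function names, the "from, to" ordering inside each block, and the block order (taken to be the order in which the constructor pushes the weights). Hidden layers are numbered from 1, as in the calls above.

```cpp
// Sizes of the network, mirroring the members of the MLP class.
struct Layout
{
    int inputN;
    int hiddenN;
    int hiddenL;
    int outputN;
};

// Index into hiddenNeurons for neuron 'neuron' of hidden layer 'layer' (1-based).
int hiddenAtIndex(const Layout& n, int layer, int neuron)
{
    return (layer - 1) * n.hiddenN + neuron;
}

// Weight from input 'from' to neuron 'to' of the first hidden layer.
int inputToHiddenIndex(const Layout& n, int from, int to)
{
    return from * n.hiddenN + to;
}

// Weight from neuron 'from' of hidden layer (layer - 1) to neuron 'to' of
// hidden layer 'layer', for layer >= 2.
int hiddenToHiddenIndex(const Layout& n, int layer, int from, int to)
{
    return n.inputN * n.hiddenN
         + (layer - 2) * n.hiddenN * n.hiddenN
         + from * n.hiddenN + to;
}

// Weight from neuron 'from' of the last hidden layer to output neuron 'to'.
int hiddenToOutputIndex(const Layout& n, int from, int to)
{
    return n.inputN * n.hiddenN
         + (n.hiddenL - 1) * n.hiddenN * n.hiddenN
         + from * n.outputN + to;
}
```

With this layout every weight gets a unique slot, and the largest index is one less than the total number of weights reserved in the constructor.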

    //Weights modification
    tempWeights = weights;//keep the previous weights somewhere, we will need them
 
    //input to hidden weights
    for(int i = 0; i < inputN; i ++)
    {
    for(int j = 0; j < hiddenN; j ++)
    {
    inputToHidden(i,j) += momentum*(inputToHidden(i,j) - _prev_inputToHidden(i,j)) +
    teachingStep* hiddenDeltaAt(1,j) * inputNeurons[i];
    }
    }
 
    //hidden to hidden weights, provided more than 1 layer exists
    for(int i = 2; i <=hiddenL; i++)
    {
    for(int j = 0; j < hiddenN; j ++)//from
    {
    for(int k =0; k < hiddenN; k ++)//to
    {
    hiddenToHidden(i,j,k) += momentum*(hiddenToHidden(i,j,k) - _prev_hiddenToHidden(i,j,k)) +
    teachingStep * hiddenDeltaAt(i,k) * hiddenAt(i-1,j);
    }
    }
    }
 
    //last hidden layer to output weights
    for(int i = 0; i < outputN; i++)
    {
    for(int j = 0; j < hiddenN; j ++)
    {
    hiddenToOutput(j,i) += momentum*(hiddenToOutput(j,i) - _prev_hiddenToOutput(j,i)) +
    teachingStep * outputDeltaAt(i) * hiddenAt(hiddenL,j);
    }
    }
 
    prWeights = tempWeights;

And finally this is the third and final pass over the network (for each image of course), this time moving forward again from the input layer to the output layer. Here we use the previously calculated deltas to adjust the weights of the network, to make up for the error we found at the initial calculation. This is just an implementation in code of the weight adjustment equations we saw in the theoretical part of the tutorial.

We can see the teaching step at work here. Moreover the careful reader will have noticed that we keep the previous weight vector values in a temporary vector. That is because of the momentum. If you recall, we mentioned that the momentum adds a percentage of the already applied weight change to each subsequent weight change, achieving faster training speeds. Hence the term momentum.

Well that’s actually all there is to know about the back-propagation training algorithm and the Multi-layer perceptron. Let’s take a look at the FileReader class.

    class FileReader
    {
    private:
    char* imgBuffer;
    //a DWORD
    char* check;
    bool firstImageRead;
    //the input filestream used to read
    ifstream fs;
 
    //image stuff
    int width;
    int height;
    public:
    FileReader();
    ~FileReader();
 
    bool readBitmap(int fileNum);
 
    //reads the first bitmap file, the one designated with a '0'
    //and gets the dimensions. All other .bmp are assumed with
    //equal and identical dimensions
    int getBitmapDimensions();
 
    //returns a pointer to integers with all the goals
    //that each bitmap should have. Reads it from a file
    int* getImgGoals();
 
    //returns a pointer to the currently read data
    char* getImgData();
 
    //helper function converting bytes to an int
    int bytesToInt(char* bytes,int number);
    };

This is the FileReader class. It contains the imgBuffer, which holds the data of the currently read bitmap, and the input file stream used to read the bitmaps, and it also keeps the width and height of the initializer image. Seeing how the functions are implemented is out of the scope of this tutorial but you can check the code in the .zip file to see how it is done. What you need to know is that this class will read the image designated as ‘img0.bmp’ and assume that all the other images are monochrome bitmaps with the same dimensions, located in the same path as the executable.
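To give an idea of what a helper like `bytesToInt` has to do, here is one possible sketch. It is not the implementation from the .zip file, which is not shown here; it assumes the bytes are in little-endian order, which is how the BMP format stores its header fields such as the image width and height.

```cpp
#include <cstdint>

// Combines up to four bytes, least significant byte first (little-endian),
// into one integer. 'number' is how many bytes to read from 'bytes'.
int bytesToInt(const char* bytes, int number)
{
    std::uint32_t value = 0;
    for (int i = 0; i < number && i < 4; ++i)
    {
        // go through unsigned char so bytes above 127 are not sign-extended
        const std::uint32_t byte = static_cast<unsigned char>(bytes[i]);
        value |= byte << (8 * i);
    }
    return static_cast<int>(value);
}
```

For an 8×8 image the four width bytes read from the header would be 8, 0, 0, 0, which this function turns into the integer 8.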

Using any image editing program, even MS Windows Paint, you can produce monochrome bitmaps. You can create your own bitmap images and save them like that, but remember to use incrementing numbers to name the files and to update goals.txt accordingly. Moreover all images should have the same dimensions.

How to use the executable

Assuming you have the image bitmaps AND the goals.txt file in the same directory as the executable, you can run the tutorial as shown in the above image. It is using the cmd command line in Windows, but it should work fine in Linux too. If you call it incorrectly you will be prompted with the correct usage.

Recalling the mlp

In Windows you can stop at any time during training and start recalling images. In Linux you can do so every 1000 epochs (for now; using the pdCurses library for this is on the TODO list). You are just prompted for the image number, the one coming after ‘img’ in the file name, and the network recalls that image and tells you what it thinks that image represents. Afterwards, as you can see from the image above, you also get some percentages showing how closely the network thinks the image matches each of the numbers from 0 to 9.

Well this was it. I hope you enjoyed this tutorial and managed to comprehend the workings of the multi-layer perceptron neural network. You can find the source code and the images I used to train the network in the tutorial’s source code. I used really small dimensions, 8×8, just so it can get trained fast. If you stick with the parameters I used above the network should converge. Since this network has many outputs, and some of the numbers look almost the same (especially the way I painted them), the mean square error can not go really low. Specifically 7 looks like 4, and 0 looks like 8. Still, as far as picking the best matching pattern goes, the network performs brilliantly. As for the least mean square error, feel free to stop training when it goes below 0.45 or so.

As always if you have any comments about the tutorial, constructive criticism or found any bugs in the code please email me at lefteris *at* refu *dot* co

Perceptron Neural Network Tutorial

This is the first neural network tutorial, focusing on the perceptron. A sample implementation along with an application on a toy example is provided. Then we proceed to analyze the weaknesses of the perceptron, and in the next tutorial we will see how the multi-layer perceptron makes up for them.

This tutorial introduces the reader to the concept of neural networks by presenting one of the earliest neural network structures, the perceptron neural network. It was proposed by Frank Rosenblatt in the late 1950s and compared with more recent networks it has a lot of drawbacks, but it is the perfect starting point for someone wanting to learn about the field. If you want you can just get the code for this small tutorial, which is found in here, but it would be wise to read on to understand how the perceptron works and grasp the theory behind it.

Tutorial Prerequisites

  • The reader should have a basic understanding of C/C++
  • The reader should know how to compile and run a program using any of the popular C compilers

Tutorial Goals

  • The reader will understand the concept of neural networks
  • The reader will understand how the perceptron works
  • The reader will apply the perceptron in a small toy problem application of differentiating between RGB colors.

Tutorial Body

This tutorial is written in the hope that it can be of use to people trying to learn more about artificial neural networks. A little bit of history, a little bit of what relation they have to biological neural networks and a lot of C++ source code examples of neural networks is what you can expect. As always I can not promise that the code contained in the tutorial is the best implementation of a perceptron but it is enough to make our point and to show to someone interested in neural networks a simple perceptron implementation.

A neuron

First of all let’s see what exactly a neural network is! Neural networks are abstract mathematical models based on the way the brain works. Take for example our brain. It has roughly $10^{11}$ neurons inside it, all together forming a powerful, massively parallel computing machine. These neurons all communicate with each other via links called synapses, as can be seen in the picture above. Each neuron has many dendrites; these are the receptors, the parts of the neuron that form synapses with other neurons and receive their incoming signals. Moreover each neuron has one axon, which is the part of the neuron sending out electrical signals to other neurons. As can be seen in the picture this is done in an electrochemical way and further explanation is beyond the scope of this tutorial. For anyone really interested in the inner workings of the brain I would recommend the book Neuroscience: Exploring the Brain by Mark F. Bear, Barry W. Connors and Michael A. Paradiso. It is a very well written book and explains everything in a way that even non-medical students, like myself, can understand.

The perceptron

The correlation between artificial neural networks and the brain stops at neurons and their connections. From there on they are two quite different machines. Artificial neural networks (from here on abbreviated as ANN) trace back to the artificial neuron model introduced by McCulloch and Pitts in 1943, on which Rosenblatt later built the perceptron. The perceptron has quite a simple structure. In its basic form it is composed of one neuron, as can be seen in the picture above. It is given many inputs, and they are all connected to the neuron by synapses which each have a corresponding weight (w1 to w4). These weights define the strength of each connection, that is, how much the particular connection will contribute to the final result that the neuron will produce. So what a neuron does is produce a weighted sum of its inputs. Once that is done the perceptron neuron passes this result through an activation function in order to get a more normalized and smooth result.

Frequently used activation functions are:

  • The Threshold function, $f(x) = 1$ if $x \geq 0$ and $0$ otherwise
  • The Sigmoid function, $f(x) = 1/(1+e^{-x})$
  • The hyperbolic tangent function, $f(x) = (e^{2x}-1)/(e^{2x}+1)$
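In C++ the three activation functions listed above can be written as small free functions. The function names are chosen for this example only. The hyperbolic tangent formula above is exactly what the standard library's `std::tanh` computes, so the sketch uses that.

```cpp
#include <cmath>

// Threshold (step) function: 1 for non-negative input, 0 otherwise.
float threshold(float x)
{
    return x >= 0.0f ? 1.0f : 0.0f;
}

// Sigmoid function: squashes any real value into the range (0, 1).
float sigmoid(float x)
{
    return 1.0f / (1.0f + std::exp(-x));
}

// Hyperbolic tangent: squashes any real value into the range (-1, 1).
float hyperbolicTangent(float x)
{
    return std::tanh(x);
}
```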

The perceptron as we already said computes a weighted sum of its inputs, but how does it learn? How does it know what each input pattern corresponds to? The answer is that it does not! You, or someone else who will act as a teacher, a supervisor (hence the name supervised learning), will teach it. The way this is done is that each input pattern (each collection of Xi in the above diagram) is associated with a target. The function that connects the input and the target output is what the perceptron must find. The way it accomplishes this is by this very simple rule:
$$W(n+1) = W(n) + \eta \, (d(n) - y(n)) \, x(n)$$
Above, $W(n)$ is the old weights vector, $W(n+1)$ is the new weights vector, $\eta$ is a user-defined constant called the teaching step, $d(n)$ is the target, $y(n)$ is the actual output of the network and $x(n)$ is well … you guessed it, the corresponding input!

That was the theory behind the perceptron. But who likes theories? What I want to see is some code, right? Well here we go then. We will try to solve a simple problem.

    int ourInput[] = {
    //RED GREEN BLUE CLASS
    0, 0, 255, CLASS_BLUE,
    0, 0, 192, CLASS_BLUE,
    243, 80, 59, CLASS_RED,
    255, 0, 77, CLASS_RED,
    77, 93, 190, CLASS_BLUE,
    255, 98, 89, CLASS_RED,
    208, 0, 49, CLASS_RED,
    67, 15, 210, CLASS_BLUE,
    82, 117, 174, CLASS_BLUE,
    168, 42, 89, CLASS_RED,
    248, 80, 68, CLASS_RED,
    128, 80, 255, CLASS_BLUE,
    228, 105, 116, CLASS_RED
    };

This is an array with our example’s inputs. They are RGB color values and a corresponding class. The classes are just two, CLASS_RED if the color is predominantly RED and CLASS_BLUE if the color is predominantly BLUE. Pretty simple huh? Now let’s head on to create a perceptron which will be able to differentiate between these two classes. Below you can see our perceptron class.

    enum activationFuncs {THRESHOLD = 1, SIGMOID, HYPERBOLIC_TANGENT};
    class Perceptron
    {
    private:
        std::vector<float> inputVector; //a vector holding the perceptron's inputs
        std::vector<float> weightsVector;//a vector holding the corresponding inputs weights.
        int activationFunction;//the activation function type
    public:
        Perceptron(int inputNumber,int function);//the constructor
        void inputAt(int inputPos,float inputValue);//the input population function
        float calculateNet();//computes the weighted sum and passes it through the activation function
        void adjustWeights(float teachingStep, float output, float target);
        float recall(float red,float green,float blue);//a recall for our example program
    };

It has inputs, the weights we mentioned and an activation function. The network is initialized with random weights between -0.5 and 0.5. Since our inputs have RGB values, which range from 0 to 255, it is a good idea to normalize them, which means to give them a corresponding value between 0 and 1.0. Let’s take a look at how to do this in code. This is a snippet from the main function of the program:

 
 
    //let's create a perceptron with 3 inputs,
    //using the sigmoid as the activation function
    Perceptron ann(3,SIGMOID);
    float mse = 999;
    int epochs = 0;
    //The training of the neural network:
    //keep going while the error is above the accepted threshold
    while(mse > LEASTMEANSQUAREERROR)
    {
        mse = 0;
        float error = 0;
        inputCounter = 0;
        //Run through all 13 input patterns, what we call an EPOCH
        for(int j = 0; j < inputPatterns; j++)
        {
            for(int k = 0; k < 3; k++)//give the 3 RGB values to the network
            {
                ann.inputAt(k,normalize(ourInput[inputCounter]));
                inputCounter++;
            }
            //let's get the output of this particular RGB pattern
            output = ann.calculateNet();
            //let's add the absolute error for this pattern to the total error
            error += fabs(ourInput[inputCounter]-output);
            //and let's adjust the weights according to that error
            ann.adjustWeights(TEACHINGSTEP,output,ourInput[inputCounter]);
            inputCounter++;//next pattern
        }

        //Average the error over this epoch (what the text calls the mean square error)
        mse = error/inputPatterns;
        printf("The mean square error of %d epoch is %.4f \r\n",epochs,mse);
        epochs++;
    }

What can we see here? This is the training of the perceptron. While the mean square error (mse) is greater than the defined least mean square error we keep iterating through all the input patterns. For each input pattern we calculate the output of the neural network with the current weights assigned to it. Then we compute the absolute difference of that output and the actual desired output. Subsequently we adjust the weights according to the rule we showed above and proceed to the next input pattern. As we already said this goes on until the mean square error reaches the desired magnitude.

When that happens our network is considered sufficiently trained. Since our toy problem has little input and it is an easy problem to solve the chosen least mean square error is 0.0001. The smaller mean square error your network gets to, the better it knows how to solve your problem for the data you trained it with. Be aware though that this does not mean that it’s better at solving that particular problem. By giving a very small mean square error you run the risk of over-training your network and as a result leading it to recognize only the patterns you give as input and making mistakes at all other patterns. If that happens then the network can not generalize over the wide array of all your input patterns. Which means your network has not learned the problem correctly.

Enough with that, now let’s head on to recalling the network with various values input by the user.

 
 
    int R,G, B;
    char reply = ' ';
    while(reply != 'N')
    {
    printf("Give a RED value (0-255)\n\r");
    cin>>R;
    printf("Give a GREEN value (0-255)\n\r");
    cin>>G;
    printf("Give a BLUE value (0-255)\n\r");
    cin>>B;
    result = ann.recall(normalize(R),normalize(G),normalize(B));
    if(result > 0.5)
    printf("The value you entered belongs to the BLUE CLASS\n\r");
    else
    printf("The value you entered belongs to the RED CLASS\n\r");
 
    printf("Do you want to continue with trying to recall values from the perceptron?");
    printf("\n\r Press any key for YES and 'N' for no, to exit the program\n\r");
    cin>>reply;
    }

Well here you can easily see that the user can enter values continuously and get a reply from the neural network. If sufficiently trained, it will correctly classify all values EXCEPT for those which are very close to the boundary between blue and red, even if it has been trained on them. That is a very important deficiency of the perceptron. It can only solve linearly separable problems, that is, problems whose classes can be divided by a straight line, as can be seen in the picture below. If a problem can be so nicely and linearly classified all is well and the perceptron can do the job for us. If not then bad things will happen.

Linear separation

This was shown by Marvin Minsky and Seymour Papert in their book Perceptrons (1969): a perceptron can not even solve a problem as simple as the XOR problem, since it is not linearly separable. The book contributed to the so called AI winter, which led AI research away from neural networks, considered useless after the bashing of perceptrons. Fortunately that lasted only until 1986, when neural networks came back into mainstream AI with the introduction of Multi-Layer Perceptrons and the back-propagation learning rule, which make up for the deficiency of the simple perceptron. You can read about them in the multi-layer perceptron tutorial.
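The XOR limitation is easy to observe with a small experiment. The sketch below is separate from the tutorial's RGB program and all of its names are made up for this example. It trains a single threshold perceptron with two inputs and a bias, using a teaching step of 1, and reports whether an epoch with no mistakes is ever reached. For the AND function it succeeds after a handful of epochs. For XOR it never does, no matter how many epochs are allowed, because no straight line separates the two classes.

```cpp
#include <vector>

// One training pattern: two binary inputs and the expected binary output.
struct Sample
{
    int x0;
    int x1;
    int target;
};

std::vector<Sample> andData()
{
    return { {0, 0, 0}, {0, 1, 0}, {1, 0, 0}, {1, 1, 1} };
}

std::vector<Sample> xorData()
{
    return { {0, 0, 0}, {0, 1, 1}, {1, 0, 1}, {1, 1, 0} };
}

// Trains a threshold perceptron (two weights plus a bias) with the perceptron
// learning rule. Returns true as soon as a whole epoch passes without a single
// misclassified pattern, false if that never happens within maxEpochs.
bool learns(const std::vector<Sample>& data, int maxEpochs)
{
    float w0 = 0.0f, w1 = 0.0f, bias = 0.0f;
    for (int epoch = 0; epoch < maxEpochs; ++epoch)
    {
        int errors = 0;
        for (const Sample& s : data)
        {
            const float sum = w0 * s.x0 + w1 * s.x1 + bias;
            const int output = (sum >= 0.0f) ? 1 : 0;
            const int diff = s.target - output;
            if (diff != 0)
            {
                ++errors;
                // teaching step of 1: W(n+1) = W(n) + (d - y) * x
                w0 += static_cast<float>(diff * s.x0);
                w1 += static_cast<float>(diff * s.x1);
                bias += static_cast<float>(diff);
            }
        }
        if (errors == 0)
        {
            return true;
        }
    }
    return false;
}
```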

The source code of the perceptron tutorial can be downloaded from here. All it needs is compiling and you can watch the perceptron in action or play around with your own parameters by tweaking the various defines in main.c. As always the usual disclaimer of me stating that this might not be the best and optimal way to implement this applies. I would be delighted if people actually got intrigued about neural networks from this tutorial and were inspired to delve deeper into AI.

Please do feel free to email me with any comments, advice or constructive criticism at: lefteris *at* refu *dot* co and stay tuned for a multi-layer perceptron tutorial which will be coming soon.