Perceptron Neural Network Tutorial

This is the first neural network tutorial, focusing on the perceptron. It provides a sample implementation and applies it to a toy example. It then analyzes the weaknesses of the perceptron, and the next tutorial shows how the multi-layer perceptron makes up for them.

This tutorial introduces the reader to the concept of neural networks by presenting one of the earliest neural network structures, the perceptron. It was proposed by Frank Rosenblatt in the late 1950s and, compared with more recent networks, has a lot of drawbacks, but it is the perfect starting point for someone wanting to learn about the field. If you want you can just get the code for this small tutorial, which is found in here, but it would be wise to read on to understand how the perceptron works and grasp the theory behind it.

Tutorial Prerequisites

  • The reader should have a basic understanding of C/C++
  • The reader should know how to compile and run a program using any of the popular C compilers

Tutorial Goals

  • The reader will understand the concept of neural networks
  • The reader will understand how the perceptron works
  • The reader will apply the perceptron to a small toy problem: differentiating between RGB colors.

Tutorial Body

This tutorial is written in the hope that it can be of use to people trying to learn more about artificial neural networks. A little bit of history, a little bit of what relation they have to biological neural networks and a lot of C++ source code examples of neural networks is what you can expect. As always I can not promise that the code contained in the tutorial is the best implementation of a perceptron but it is enough to make our point and to show to someone interested in neural networks a simple perceptron implementation.

A neuron

First of all, let’s see what exactly a neural network is! Neural networks are abstract mathematical models based on the way the brain works. Take for example our brain. It has roughly 10^11 neurons inside it, all together forming a powerful, massively parallel computing machine. These neurons communicate with each other via links called synapses, as can be seen in the picture above. Each neuron has many dendrites. These are the receptors, the parts of the neuron that form synapses with other neurons and receive their incoming signals. Moreover, each neuron has one axon, which is the part of the neuron that sends out electrical signals to other neurons. As can be seen in the picture this is done in an electrochemical way, and further explanation is beyond the scope of this tutorial. For anyone really interested in the inner workings of the brain I would recommend the book Neuroscience: Exploring the Brain by Mark F. Bear, Barry W. Connors and Michael A. Paradiso. It is a very well written book and explains everything in a way that even non-medical students, like myself, can understand.

The perceptron

The correlation between artificial neural networks and the brain stops at neurons and their connections. From there on they are two quite different machines. The artificial neuron model goes back to McCulloch and Pitts (1943), and the first trainable artificial neural network (from here on abbreviated as ANN), the perceptron, was introduced by Frank Rosenblatt in the late 1950s. The perceptron has quite a simple structure. In its basic form it consists of one neuron, as can be seen in the picture above. It is given many inputs, each connected to the neuron by a synapse with a corresponding weight (w1 to w4). These weights define the strength of each connection, that is, how much that particular connection contributes to the final result the neuron produces. So what a neuron does is produce a weighted sum of its inputs. Once that is done the perceptron neuron passes this result through an activation function, which maps the sum to a bounded output (and, for the sigmoid and hyperbolic tangent, a smooth one).
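
The weighted sum described above can be sketched in a few lines of C++. This is only an illustration: weightedSum is a hypothetical helper name and is not taken from the tutorial's source code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted sum ("net" value) of one neuron: the sum over i of w[i] * x[i].
// weightedSum is a hypothetical helper, not part of the tutorial's code.
float weightedSum(const std::vector<float>& inputs,
                  const std::vector<float>& weights)
{
    assert(inputs.size() == weights.size()); // one weight per input
    float net = 0.0f;
    for (std::size_t i = 0; i < inputs.size(); ++i)
        net += inputs[i] * weights[i];
    return net;
}
```

For instance, inputs (1, 0, 0.5) with weights (0.5, -0.25, 1) give 0.5 + 0 + 0.5 = 1.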

Frequently used activation functions are:

  • The Threshold function, f(x) = 1 if x >= 0 and 0 otherwise
  • The Sigmoid function, f(x) = 1/(1 + e^(-x))
  • The hyperbolic tangent function, f(x) = (e^(2x) - 1)/(e^(2x) + 1)
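
A minimal sketch of these three functions in C++ could look like the following. The names ActivationKind and activate are assumptions made for this example. Note that (e^(2x) - 1)/(e^(2x) + 1) is exactly what the standard library's std::tanh computes.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical names for this sketch; they mirror the three functions listed above.
enum ActivationKind { THRESHOLD = 1, SIGMOID, HYPERBOLIC_TANGENT };

// Applies the chosen activation function to the weighted sum x.
float activate(int kind, float x)
{
    assert(kind >= THRESHOLD && kind <= HYPERBOLIC_TANGENT);
    switch (kind) {
    case THRESHOLD:
        return x >= 0.0f ? 1.0f : 0.0f;         // hard step at zero
    case SIGMOID:
        return 1.0f / (1.0f + std::exp(-x));    // output in (0, 1)
    default:
        return std::tanh(x);                    // output in (-1, 1)
    }
}
```

The threshold function gives a hard 0/1 decision, while the sigmoid and the hyperbolic tangent squash the sum smoothly into (0, 1) and (-1, 1) respectively.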

The perceptron, as we already said, computes a weighted sum of its inputs, but how does it learn? How does it know what each input pattern corresponds to? The answer is that it does not! You, or someone else acting as a teacher or supervisor (hence the name supervised learning), will teach it. The way this is done is that each input pattern (each collection of Xi in the above diagram) is associated with a target. The function that connects the input and the target output is what the perceptron must find. It accomplishes this with a very simple rule:
W(n+1) = W(n) + η(d(n)-y(n))*x(n)
Above, W(n) is the old weights vector, W(n+1) is the new weights vector, η is a user-defined constant called the teaching step, d(n) is the target output, y(n) is the actual output of the network and x(n) is, well … you guessed it, the corresponding input! In other words, each weight is nudged in proportion to the output error and to the input that fed it, so a pattern that already produces its target (d(n) = y(n)) leaves the weights unchanged.
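
To make the rule concrete, here is a small sketch of a single update step, written in its usual form W(n+1) = W(n) + η(d(n)-y(n))*x(n). The function name updateWeights and its parameter order are assumptions for this example, not the tutorial's adjustWeights.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One application of W(n+1) = W(n) + eta * (d(n) - y(n)) * x(n).
// updateWeights is a hypothetical helper written for illustration.
void updateWeights(std::vector<float>& weights,
                   const std::vector<float>& inputs,
                   float eta, float target, float output)
{
    assert(weights.size() == inputs.size());
    const float delta = eta * (target - output); // scaled output error
    for (std::size_t i = 0; i < weights.size(); ++i)
        weights[i] += delta * inputs[i];         // inputs that are 0 change nothing
}
```

With weights (0.5, -0.5, 0), input (1, 0, 0.5), η = 0.5, target 1 and output 0.5, the scaled error is 0.25 and the new weights are (0.75, -0.5, 0.125).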

That was the theory behind the perceptron. But who likes theories? What I want to see is some code, right? Well here we go then. We will try to solve a simple problem.

    int ourInput[] = {
    0, 0, 255, CLASS_BLUE,
    0, 0, 192, CLASS_BLUE,
    243, 80, 59, CLASS_RED,
    255, 0, 77, CLASS_RED,
    77, 93, 190, CLASS_BLUE,
    255, 98, 89, CLASS_RED,
    208, 0, 49, CLASS_RED,
    67, 15, 210, CLASS_BLUE,
    82, 117, 174, CLASS_BLUE,
    168, 42, 89, CLASS_RED,
    248, 80, 68, CLASS_RED,
    128, 80, 255, CLASS_BLUE,
    228, 105, 116, CLASS_RED
    };

This is an array with our example’s inputs. Each row holds three RGB color values followed by the corresponding class. The classes are just two, CLASS_RED if the color is predominantly RED and CLASS_BLUE if the color is predominantly BLUE. Pretty simple huh? Now let’s head on to create a perceptron which will be able to differentiate between these two classes. Below you can see our perceptron class.

    #include <vector>

    enum activationFuncs {THRESHOLD = 1, SIGMOID, HYPERBOLIC_TANGENT};

    class Perceptron
    {
    private:
        std::vector<float> inputVector;   //a vector holding the perceptron's inputs
        std::vector<float> weightsVector; //a vector holding the corresponding input weights
        int activationFunction;           //the activation function type
    public:
        Perceptron(int inputNumber,int function);    //the constructor
        void inputAt(int inputPos,float inputValue); //the input population function
        float calculateNet(); //weighted sum of the inputs passed through the activation function
        void adjustWeights(float teachingStep, float output, float target); //applies the learning rule
        float recall(float red,float green,float blue); //a recall for our example program
    };

It has inputs, the weights we mentioned and an activation function. The network is initialized with random weights between -0.5 and 0.5. Since our inputs are RGB values, which range from 0 to 255, it is a good idea to normalize them, which means mapping each one to a corresponding value between 0 and 1.0 (for example by dividing by 255). Let’s take a look at how to do these in code. This is a snippet from the main function of the program:
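
As a rough sketch of those two preparation steps, the helpers below normalize a color channel and draw random starting weights. The tutorial's program calls a function named normalize, but its body is not shown, so the division by 255 here is an assumption. randomWeights and its seed parameter are hypothetical names.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Map an 8-bit color channel (0-255) onto the range 0.0-1.0.
// Assumed implementation: the tutorial does not show its own normalize().
float normalize(int channel)
{
    assert(channel >= 0 && channel <= 255);
    return static_cast<float>(channel) / 255.0f;
}

// Draw `count` starting weights uniformly between -0.5 and 0.5.
// randomWeights is a hypothetical helper; the seed makes a run repeatable.
std::vector<float> randomWeights(int count, unsigned seed)
{
    assert(count > 0);
    std::mt19937 generator(seed);
    std::uniform_real_distribution<float> distribution(-0.5f, 0.5f);
    std::vector<float> weights(count);
    for (float& w : weights)
        w = distribution(generator);
    return weights;
}
```

Passing a fixed seed is a design choice that makes debugging easier. Seeding from std::random_device instead would give different weights on every run.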

    //let's create a perceptron with 3 inputs,
    //using the sigmoid as the activation function
    Perceptron ann(3,SIGMOID);
    float mse = 999;
    float output;
    int epochs = 0;
    int inputCounter;
    //The training of the neural network
    //(LEAST_MEAN_SQUARE_ERROR and TEACHING_STEP are assumed names for the program's defines)
    while(mse > LEAST_MEAN_SQUARE_ERROR)
    {
        mse = 0;
        float error = 0;
        inputCounter = 0;
        //Run through all 13 input patterns, what we call an EPOCH
        for(int j= 0; j < inputPatterns; j++)
        {
            for(int k=0; k< 3; k++)//give the 3 RGB values to the network
            {
                ann.inputAt(k,normalize(ourInput[inputCounter]));
                inputCounter++;
            }
            //let's get the output of this particular RGB pattern
            output = ann.calculateNet();
            //let's add the error for this iteration to the total error
            //(ourInput[inputCounter] now points at the pattern's class)
            error += fabs(ourInput[inputCounter]-output);
            //and let's adjust the weights according to that error
            ann.adjustWeights(TEACHING_STEP,output,ourInput[inputCounter]);
            inputCounter++;//next pattern
        }
        mse = error/inputPatterns; //the mean of the absolute errors for this epoch
        printf("The mean square error of %d epoch is %.4f \r\n",epochs,mse);
        epochs++;
    }

What can we see here? This is the training of the perceptron. While the error of the last epoch (mse) is greater than the defined least mean square error, we keep iterating through all the input patterns. For each input pattern we calculate the output of the neural network with the weights currently assigned to it. Then we compute the absolute difference between that output and the actual desired output. (Note that the code averages these absolute differences, so strictly speaking mse holds the mean absolute error rather than a mean of squares.) Subsequently we adjust the weights according to the rule we showed above and proceed to the next input pattern. As we already said, this goes on until the error reaches the desired magnitude.
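
For readers who want something they can compile on its own, here is a self-contained sketch of the same training procedure on the 13 patterns. It is not the tutorial's Perceptron class. TinyPerceptron, train, the 0/1 class encoding, the zero starting weights and the epoch cap are all assumptions made to keep the example short and reproducible.

```cpp
#include <cassert>
#include <cmath>

// Assumed class encoding: the sigmoid output should approach 0 for red and
// 1 for blue, matching the "result > 0.5 means BLUE" test used at recall time.
const int CLASS_RED = 0;
const int CLASS_BLUE = 1;
const int PATTERN_COUNT = 13;

// The 13 training patterns from the tutorial: R, G, B, class.
const int patterns[PATTERN_COUNT][4] = {
    {0, 0, 255, CLASS_BLUE},    {0, 0, 192, CLASS_BLUE},
    {243, 80, 59, CLASS_RED},   {255, 0, 77, CLASS_RED},
    {77, 93, 190, CLASS_BLUE},  {255, 98, 89, CLASS_RED},
    {208, 0, 49, CLASS_RED},    {67, 15, 210, CLASS_BLUE},
    {82, 117, 174, CLASS_BLUE}, {168, 42, 89, CLASS_RED},
    {248, 80, 68, CLASS_RED},   {128, 80, 255, CLASS_BLUE},
    {228, 105, 116, CLASS_RED}
};

// Hypothetical stand-in for the tutorial's Perceptron class.
struct TinyPerceptron {
    // Zero start keeps runs reproducible; the tutorial uses random weights.
    float weights[3] = {0.0f, 0.0f, 0.0f};

    // Sigmoid of the weighted sum of the three normalized inputs.
    float recall(float r, float g, float b) const {
        const float net = weights[0] * r + weights[1] * g + weights[2] * b;
        return 1.0f / (1.0f + std::exp(-net));
    }

    // W(n+1) = W(n) + eta * (target - output) * x(n)
    void adjust(float eta, float output, float target,
                float r, float g, float b) {
        const float delta = eta * (target - output);
        weights[0] += delta * r;
        weights[1] += delta * g;
        weights[2] += delta * b;
    }
};

// Trains until the mean absolute error of an epoch drops to maxError or
// maxEpochs is reached, and returns the number of epochs that were run.
int train(TinyPerceptron& perceptron, float eta, float maxError, int maxEpochs)
{
    assert(eta > 0.0f && maxEpochs > 0);
    int epochs = 0;
    float meanError = maxError + 1.0f; // forces at least one epoch
    while (meanError > maxError && epochs < maxEpochs) {
        float error = 0.0f;
        for (const auto& row : patterns) {
            const float r = row[0] / 255.0f;
            const float g = row[1] / 255.0f;
            const float b = row[2] / 255.0f;
            const float target = static_cast<float>(row[3]);
            const float output = perceptron.recall(r, g, b);
            error += std::fabs(target - output);
            perceptron.adjust(eta, output, target, r, g, b);
        }
        meanError = error / PATTERN_COUNT;
        ++epochs;
    }
    return epochs;
}
```

A call such as train(p, 0.5f, 0.05f, 5000) on a fresh TinyPerceptron p is one way to try it. The teaching step 0.5 and the error target 0.05 are arbitrary choices for the sketch, and the epoch cap simply guards against a target that is reached only very slowly.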

When that happens our network is considered sufficiently trained. Since our toy problem has few inputs and is an easy problem to solve, the chosen least mean square error is 0.0001. The smaller the mean square error your network reaches, the better it reproduces the targets for the data you trained it with. Be aware though that this does not mean it is better at solving the problem in general. By demanding a very small mean square error you run the risk of over-training your network, leading it to recognize only the patterns you gave as input and to make mistakes on all other patterns. If that happens the network cannot generalize over the wide array of all possible input patterns, which means it has not learned the problem correctly.

Enough with that, now let’s head on to recalling the network with various values input by the user.

    int R,G, B;
    float result;
    char reply = ' ';
    while(reply != 'N')
    {
        //the scanf calls below are assumed; the original listing omits the input reads
        printf("Give a RED value (0-255)\n\r");
        scanf("%d",&R);
        printf("Give a GREEN value (0-255)\n\r");
        scanf("%d",&G);
        printf("Give a BLUE value (0-255)\n\r");
        scanf("%d",&B);
        result = ann.recall(normalize(R),normalize(G),normalize(B));
        if(result > 0.5)
            printf("The value you entered belongs to the BLUE CLASS\n\r");
        else
            printf("The value you entered belongs to the RED CLASS\n\r");
        printf("Do you want to continue with trying to recall values from the perceptron?");
        printf("\n\r Press any key for YES and 'N' for no, to exit the program\n\r");
        scanf(" %c",&reply); //read the reply, skipping leftover whitespace
    }

Well, here you can easily see that the user can enter values continuously and get a reply from the neural network. If sufficiently trained it will correctly assign all values EXCEPT for those which are very close to the edge between blue and red, even if it has been trained on them. That is a very important deficiency of the perceptron. It can only solve linearly separable problems, that is, problems whose classes can be divided by a straight line (or, with more than two inputs such as our three RGB values, by a plane), as can be seen in the picture below. If a problem can be so nicely and linearly classified all is well and the perceptron can do the job for us. If not, then no choice of weights will classify every pattern correctly.

Linear separation

This was shown by Marvin Minsky and Seymour Papert in their book Perceptrons (1969): a perceptron cannot even solve a problem as simple as the XOR problem, since it is not linearly separable. The book contributed to the so-called AI winter, which led AI research away from neural networks, considered useless after the bashing of perceptrons. Fortunately that lasted only until 1986, when neural networks came back into mainstream AI with the introduction of multi-layer perceptrons and the back-propagation learning rule, which make up for the deficiency of the simple perceptron. You can read about them in the multi-layer perceptron tutorial.
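
The XOR limitation is easy to observe with a small experiment. The sketch below trains a two-input perceptron with the threshold activation on a truth table and counts how many rows it still gets wrong afterwards. The function trainGate, the bias weight (which the tutorial's perceptron does not mention) and the teaching step of 0.25 are assumptions made for this example.

```cpp
#include <cassert>

// Trains a two-input threshold perceptron with a bias weight on a 4-row
// truth table and returns how many rows are still misclassified afterwards.
// trainGate and the bias term are assumptions added for this sketch.
int trainGate(const int targets[4], int maxEpochs)
{
    assert(maxEpochs > 0);
    const float inputs[4][2] = {
        {0.0f, 0.0f}, {0.0f, 1.0f}, {1.0f, 0.0f}, {1.0f, 1.0f}
    };
    const float eta = 0.25f; // teaching step
    float w0 = 0.0f, w1 = 0.0f, bias = 0.0f;

    // Threshold activation: 1 if the weighted sum is >= 0, otherwise 0.
    auto predict = [&](const float* x) {
        return (w0 * x[0] + w1 * x[1] + bias) >= 0.0f ? 1 : 0;
    };

    for (int epoch = 0; epoch < maxEpochs; ++epoch) {
        int errors = 0;
        for (int i = 0; i < 4; ++i) {
            const int diff = targets[i] - predict(inputs[i]);
            if (diff != 0) {
                ++errors;
                w0 += eta * diff * inputs[i][0];
                w1 += eta * diff * inputs[i][1];
                bias += eta * diff; // the bias acts as a weight on a constant input of 1
            }
        }
        if (errors == 0)
            break; // every row is classified correctly
    }

    int wrong = 0;
    for (int i = 0; i < 4; ++i)
        if (predict(inputs[i]) != targets[i])
            ++wrong;
    return wrong;
}
```

Called with the AND table {0, 0, 0, 1} the function returns 0, because a straight line can separate the single 1 from the three 0s. Called with the XOR table {0, 1, 1, 0} it always leaves at least one row misclassified, however many epochs are allowed.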

The source code of the perceptron tutorial can be downloaded from here. All it needs is compiling, and then you can watch the perceptron in action or play around with your own parameters by tweaking the various defines in main.c. As always, the usual disclaimer applies: this might not be the best or most optimal way to implement a perceptron. I would be delighted if people actually got intrigued about neural networks from this tutorial and were inspired to delve deeper into AI.

Please do feel free to email me with any comments, advice or constructive criticism at: lefteris *at* refu *dot* co, and stay tuned for the multi-layer perceptron tutorial, which will be coming soon.