# Neural Networks: A Brief Introduction and Intuition

The fundamental idea behind all neural networks is this: Each neuron in a neural network makes a *decision. *Once you understand how they do that, everything else will make sense. Let’s walk through a simple situation which will help us arrive at that understanding.

Let’s say you are trying to decide whether or not to wear a hat today. There are a number of factors which will affect your decision, and perhaps the most important ones are:

- Is it sunny?
- Do I have a hat to wear?
- Would a hat suit my outfit?

For simplicity, we’ll assume these are the only three factors that you’re weighing up during this decision. Forgetting about neural networks for a second, let’s just try to build a ‘decision maker’ to help us answer this question.

First, we can see each question has a certain level of importance, and so we’ll need to use this relative importance of each question, along with the corresponding answer to each question, to make our decision.

Secondly, we’ll need to have some component which interprets each (yes or no) answer along with its importance to produce the final answer. This sounds simple enough to put into an equation, right? Let’s do it. We simply decide how important each factor is and multiply that importance (or ‘weight’) by the answer to the question (which can be 0 or 1):

The numbers 3, 5 and 2 are the ‘weights’ of question *a*, *b* and *c*, respectively. *a*, *b* and *c*, themselves can be either zero (the answer to the question was ‘no’), or one (the answer to the question was ‘yes’). If the above equation is true, then the decision is to wear a hat, and if it is false, the decision is to *not* wear a hat. The equation says that we’ll only wear a hat if the sum of our *weights* multiplied by our *factors* is greater than some *threshold value*. Above, I chose a threshold value of 6. If you think about it, this means that if I don’t have a hat to wear (b=0), no matter what the other answers are, I won’t be wearing a hat today. That is,

is *never* true, since a and c are only either 0 or 1. This makes sense – our simple decision model tells us not to wear a hat if we don’t have one! So the weights of 3, 5 and 2, and the threshold value of 6 seem like a good choices for our simple “should I wear a hat” decision-maker. It also means that, as long as I have a hat to wear, the sun shining (a=1) OR the hat suiting my outfit (c=1) is enough to make me wear a hat today. That is,

and

are both true. Good! You can see that by adjusting the weighting of each factor and the threshold, and by adding more factors, we can adjust our ‘decision maker’ to approximately model any decision-making process. What we have just demonstrated is the functionality of a simple neuron (a decision-maker!). Let’s put the above equation into ‘neuron-form’:

**Fig 1:** A neuron which processes 3 factors: a, b, c, with corresponding importance weightings of 3, 5, 2, and with a decision threshold of 6.

The neuron has 3 input connections (the factors) and 1 output connection (the decision). Each input connection has a weighting which encodes the importance of that connection. If the weighting of that connection is low (relative to the other weights), then it won’t have much effect on the decision. If it’s high, the decision will heavily depend on it.

This is great, we’ve got a fully working neuron that weights inputs and makes decisions. So here’s the next though: What if the output (our decision) was fed into the input of another neuron? That neuron would be using our decision about our hat to make a *more abstract* decision. And what if the inputs *a*, *b* and *c* are themselves the outputs of other neurons which compute lower-level decisions? We can see that neural networks can be interpreted as networks which compute *decisions about decisions*, leading from simple input data to more and more complex ‘meta-decisions’. This, to me, is an incredible concept. All the complexity of even the human brain can be modelled using these principles. From the level of photons interacting with our cone-cells right up to our pondering of the meaning of life, it’s just simple little decision-making neurons.

Below is a diagram of a simple neural network which essentially has 3 layers of abstraction:

**Fig 2:** A simple neural network with 2 inputs and 2 outputs.

As an example, the above inputs could be 2 infrared distance sensors, and the outputs might control control the on/off switch for 2 motors which drive the wheels of a robot.

In our simple hat example, we could pick the weights and the threshold quite easily, but how do we pick the weights and thresholds in this example so that, say, the robot can follow things that move? And how do we know how many neurons we need to solve this problem? Could we solve it with just 1 neuron, maybe 2? Or do we need 20? And how do we organise them? In layers? Modules? These questions are *the* questions in the field of neural networks. Techniques such as ‘backpropagation’ and (more recently) ‘neuroevolution’ are used effectively to answer some of these troubling questions, but these are outside the scope of this introduction – Wikipedia and Google Scholar and free online textbooks like “Neural Networks and Deep Learning” by M. A. Nielsen are great places to start learning about these concepts.

Hopefully you now have some intuition for how neural networks work, but if you’re interested in actually implementing a neural network there are a few optimisations and extensions to our concept of a neuron which.will make our neural nets more efficient and effective.

Firstly, notice that if we set the threshold value of the neuron to zero, we can always adjust the weightings of the inputs to account for this – only, we’ll also need to allow negative values for our weights. This is great since it removes one variable from our neuron. So we’ll allow negative weights and from now on we won’t need to worry about setting a threshold – it’ll always be zero.

Next, we’ll notice that the weights of the input connections are all relative to one-another, so we can actually normalise these to a value between -1 and 1. Cool. That simplifies things a little.

We can make a further, more substantial improvement to our decision-maker by realising that the inputs themselves (a, b and c in the above example) need not just be 0 or 1. For example, what if today is *really* sunny? Or maybe there’s scattered clouds, do it’s intermittently sunny? We can see that by allowing values between 0 and 1, our neuron gets more information and can therefore make a better decision – and the good news is, we don’t need to change anything in our neuron model!

So far, we’ve allowed the neuron to accept inputs between 0 and 1, and we’ve normalised the weights between -1 and 1 for convenience.

The next question is: why do we need such certainty in our final decision (i.e. the output of the neuron)? Why can’t it, like the inputs, also be a value between 0 and 1? If we did allow this, the decision of whether or not to wear a hat would become a *level of certainty* that wearing a hat is the right choice. But if this *is* a good idea, why did I introduce a threshold at all? Why not just directly pass on the sum of the weighted inputs to the output connection? Well, because, for reasons beyond the scope of this simple introduction to neural networks, it turns out that a neural network works better if the neurons are allowed to make something like an ‘educated guess’, rather than just presenting a raw probability. A threshold gives the neurons a slight bias toward certainty and allows them to be more ‘assertive’, and doing so makes neural networks more efficient. So in that sense, a threshold is good. But the problem with a threshold is that it doesn’t let us know when the neuron is uncertain about its decision – that is, if the sum of the weighted inputs is very close to the threshold, the neuron makes a definite yes/no answer where a definite yes/no answer is not ideal.

So how can we overcome this problem? Well it turns out that if we replace our “greater than zero” condition with a continuous *function* (called an ‘activation function’), then we can choose non-binary and *non-linear* reactions to the neuron’s weighted inputs. Let’s first look at our original “greater than zero” condition as a function:

**Fig 3: **‘Step’ function representing the original neuron’s ‘activation function’.

In the above activation function, the x-axis represents the sum of the weighted inputs and the y-axis represents the neuron’s output. Notice that even if the inputs sum to 0.01, the output is a very certain 1. This is not ideal, as we’ve explained earlier. So we need another activation function that only has a *bias* towards certainty. Here’s where we welcome the ‘sigmoid’ function:

**Fig 4: **The ‘sigmoid’ function; a more effective activation function for our artificial neural networks.

Notice how it looks like a halfway point between a step function (which we established as too *certain*) and a linear x=y line that we’d expect from a neuron which just outputs the raw probability that some some decision is correct. The equation for this sigmoid function is:

where x is the sum of the weighted inputs.

And that’s it! Our new-and-improved neuron does the following:

- Takes multiple inputs between 0 and 1.
- Weights each one by a value between -1 and 1.
- Sums them all together.
- Puts that sum into the sigmoid function.
- Outputs the result!

It’s deceptively simple, but by combining these simple decision-makers together and finding ideal connection weights, we can make *arbitrarily complex* decisions and calculations which stretch far beyond our biological brains allow.

June 19, 2015