Neural networks are great models for finding patterns in data. They’re incredibly simple, but powerful, and have been used for virtually every domain in machine learning. This will be a simple guide to neural networks, focusing on building up an intuition for how they work, rather than a deep understanding of the math. I’ll be focusing solely on how neural networks that are already built work. How these nets are built is significantly more complicated, but if you wish to learn about it, this is a great post on the subject.
A Little Math
We do need a tiny bit of math before any of the rest of this makes sense. Luckily, it’s just one (short) formula:
f(x) = tanh(x)
This is just the hyperbolic tangent function, and all you need to know is that it looks like this:
The purpose of this function is to squash the output between -1 and 1, and make neurons appear more confident than they actually are (there are a few reasons for this, but they’re beyond the scope of this post). This ensures that no output can be greater than 1 or less than -1.
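You can see the squashing with Python’s built-in `math.tanh`: large inputs saturate near ±1, and moderate inputs get pushed toward the extremes.

```python
import math

# tanh maps any real input into the open interval (-1, 1):
# large inputs saturate near +/-1, and tanh(0) is exactly 0.
for x in [-10, -1, 0, 1, 10]:
    print(f"tanh({x}) = {math.tanh(x):.2f}")
```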
Exclusive or (xor for short) is a logical operation, much like *and* and *or*.
The truth table for xor looks like this:

| a | b | a xor b |
|---|---|---------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
Basically, if either one, but not both, of the inputs is 1, the output is 1; otherwise the output is 0. The significance of this function for our purposes is that it’s one of the most basic functions that is not linearly separable. This means that, unlike *and*, you cannot determine the output of this function based solely off one of its arguments.
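In Python, xor on 0/1 inputs is a one-liner (the `^` operator), which makes the truth table easy to verify:

```python
# xor returns 1 when exactly one input is 1 -- no single input
# determines the output on its own.
def xor(a: int, b: int) -> int:
    return a ^ b  # Python's bitwise xor operator

for a in (0, 1):
    for b in (0, 1):
        print(f"{a} xor {b} = {xor(a, b)}")
```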
Here’s a neural network that has learned xor:
Here’s a breakdown of the image above:
- Input nodes: The squares on the bottom. They’re the inputs to the neural network.
- Hidden nodes: The circles in the middle. They’re the ‘brains’ of the network.
- Output node: The square on the top. It’s the output of the neural network. In this case there is only one, but neural networks can have an arbitrary number of output nodes.
- Blue lines: Positive weights. Numbers that get passed along a blue line retain their signs.
- Orange lines: Negative weights. Numbers that get passed along an orange line have their signs flipped.
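Before walking through the examples, here is a minimal sketch of this forward pass in Python. The weight values are hypothetical (all ±3, as the walkthrough below assumes, with the middle hidden node’s output connection negative); these made-up weights reproduce the two examples worked through below, but they are not the weights of a fully trained xor net.

```python
import math

# Sketch of the forward pass through the pictured net.
# All weights are hypothetical: +/-3, as the walkthrough assumes.
W_HIDDEN = [(3, 3), (3, 3), (3, 3)]  # (left, right) input weights per hidden node
W_OUTPUT = [3, -3, 3]                # hidden-to-output weights (middle is orange/negative)

def forward(a, b):
    left, right = math.tanh(a), math.tanh(b)  # input nodes apply tanh
    hidden = [math.tanh(wl * left + wr * right) for wl, wr in W_HIDDEN]
    return math.tanh(sum(w * h for w, h in zip(W_OUTPUT, hidden)))

print(forward(0, 0))            # 0.0
print(round(forward(0, 1), 2))  # 0.99
```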
So let’s go through the neural network for a couple of example inputs. We’ll start with 0 xor 0.
The first input is 0, and the second input is 0. We’ll look at the input node on the left first. This node will take in 0 as its input, apply the activation function (tanh), and output 0 (tanh(0) is 0). This 0 will be passed to the left, middle, and right hidden nodes. The same process will happen for the right input node, since its input is also 0. The three hidden nodes now have values of 0. The activation function will be applied to those 0s again, and they will pass their outputs to the output node. The output node will add up its three inputs and get 0 (0+0+0=0). So our input is (0,0), and our output is 0, perfect!
Let’s try a slightly more complicated example. This time we’ll use 0 xor 1.
The first input is 0, and the second input is 1. The input node on the left side will apply the tanh function to its input, and get 0. It will then pass that 0 to the left, middle, and right hidden nodes. After this the values of the hidden nodes are (still) 0.
The second input node in this example is more interesting. It will take an input of 1, run it through the activation function, and then, for each connection, multiply its output by the weight of that connection and add the result to the current value of the hidden node that connection is attached to. The first node did all this too, but since its value was 0 it was a lot less interesting. Let’s assume all weights are either -3 or 3 to make the math a bit nicer (the reason for this will become apparent later). tanh(1) is .76, so the first hidden node will have a value of .76*3 = 2.28. The second and third hidden nodes will also have a value of 2.28 by the same logic.
Now we have to calculate the value of the output node. As with every other node, this will be tanh(sum of its inputs). So let’s go through the hidden nodes one by one, from left to right, to see what the inputs will be. Remember, all hidden nodes currently have a value of 2.28:

- Left hidden node: tanh(2.28) = .98, times its output weight of 3, gives 2.94.
- Middle hidden node: tanh(2.28) = .98, times its output weight of -3, gives -2.94.
- Right hidden node: tanh(2.28) = .98, times its output weight of 3, gives 2.94.

So now all that’s left to do is take the sum of these values and run it through the activation function one last time. tanh(2.94-2.94+2.94)=tanh(2.94)=.99, which is close enough to 1 that we’ll call it a success (because the range of tanh is not inclusive of -1 and 1, a neural network using tanh as its activation function cannot return a -1 or 1).
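The arithmetic above can be double-checked in a few lines (again assuming the ±3 weights):

```python
import math

# Verifying the walkthrough's arithmetic, assuming weights of +/-3
h_in = math.tanh(1) * 3      # each hidden node's input: 0.76 * 3 = 2.28
h_out = math.tanh(h_in)      # each hidden node's output: ~0.98
out = math.tanh(3 * h_out - 3 * h_out + 3 * h_out)  # tanh(2.94 - 2.94 + 2.94)
print(round(h_in, 2), round(h_out, 2), round(out, 2))  # 2.28 0.98 0.99
```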
A (slightly) more useful example
Okay, so you understand how a neural network can output the correct value for a simple function like xor. But neural networks are much more powerful than that. So let’s look at a net that does something a bit more complicated. We’ll be looking at a net that recognizes patterns that look like 3s in an array of 0s and 1s. Here’s what the data looks like:
Line breaks have been added, and 1s have been highlighted. Really, these are just arrays of 25 numbers, all 0s or 1s. Here’s a network that recognizes these patterns.
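As a rough sketch, one such 25-element array might look like this in Python. This particular “3” shape is made up for illustration, not lifted from the post’s data:

```python
# A hypothetical "3" pattern as a flat array of 25 zeros and ones
three = [
    1, 1, 1, 1, 1,
    0, 0, 0, 0, 1,
    0, 1, 1, 1, 1,
    0, 0, 0, 0, 1,
    1, 1, 1, 1, 1,
]

# Re-inserting the line breaks makes the shape visible
for row in range(5):
    print("".join(str(v) for v in three[row * 5:(row + 1) * 5]))
```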
Note: the above is a simplification; there are weights from every node to every other node in the next layer. I’ve hidden the connections with lower weights for clarity.
As you can see, there are 25 input nodes, and one output node. The one output node will tell us whether the input is a 3 or not, and the inputs are the 25-length arrays of 1s and 0s.
Rather than going through every path like we did with the xor function, we’ll be looking at this in a more intuitive way. We’ll start by looking at the hidden node on the right. It has a strong positive connection to the output node, so any input with a positive connection to it is a positive indicator that the input forms a three pattern. For example, you can see that the right hidden node has a positive connection to the rightmost, middle input node. This means that having a 1 in that position is a positive indicator. You can verify this for yourself by looking at the data; all threes have a 1 in that position, but only one of the other patterns does.
On the other hand, the left hidden node has a negative weight to the output node. By the same logic as above, we can deduce that every positive connection to it is a negative indicator that the input forms a three pattern. Conversely, every negative connection to it is a positive indicator that the input forms a three pattern.
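Here’s a hedged sketch of that two-hidden-node structure. Every weight is invented for illustration; only the single connection mentioned above (the rightmost, middle input, index 14 in row-major order) is given a nonzero weight:

```python
import math

# Hypothetical sketch: the right hidden node gathers evidence *for* a
# three, the left node gathers evidence *against* one, and the output
# weighs them with opposite signs. All weights here are made up.
FOR_WEIGHTS = [0.0] * 25
AGAINST_WEIGHTS = [0.0] * 25
FOR_WEIGHTS[14] = 2.0       # rightmost, middle cell: a positive indicator
AGAINST_WEIGHTS[14] = -2.0  # the other node mirrors it with the opposite sign

def is_three_score(pixels):
    right = math.tanh(sum(w * p for w, p in zip(FOR_WEIGHTS, pixels)))
    left = math.tanh(sum(w * p for w, p in zip(AGAINST_WEIGHTS, pixels)))
    # the left node's connection to the output is negative (orange)
    return math.tanh(3 * right - 3 * left)
```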
Exercise for the reader : why do the two hidden nodes tend to point to the same inputs, but with opposite weights?
Neural Networks are pretty cool.