What is a neural network?
A neural network takes a set of input signals (here, binary 0/1 values), combines them into a smaller set of hidden units, and uses those hidden units to work out the probability of something happening.
An example of a neural network?
The example used by Andrew Ng in the Stanford machine learning course:
- Converts the image of a handwritten digit into 20×20 = 400 pixels, i.e. a row of 400 1’s or 0’s
- The designer chooses a hidden layer of 25 units (which are ‘hidden’ because the end user doesn’t need to know about them), and the backward propagator works out the weights that map the 400 columns of pixels onto those 25 units
- Then the backward propagator does its magic again, working out the weights that match combinations of the 25 hidden units onto the ten possibilities for a digit (0–9)
The forward propagator takes a handwritten digit – or rather the row of 400 1’s and 0’s representing its 20×20 pixels – and runs the calculation forwards. The 400 values are multiplied by the weights linking each pixel column to the 25 hidden units, and 25 new columns are generated. Each image, represented by a row, now has 25 numbers.
The process is repeated with the weights from the 25 hidden units to the 10 digits, and each image now has a probability for each of the 10 digits. The biggest probability wins! We have taken a list of pixels and stated what digit a human was writing down!
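The forward pass described above can be sketched in a few lines of numpy. This is a minimal illustration, not the course's code: the weights here are random placeholders (in practice the backward propagator learns them), the bias units Ng's version carries are omitted, and the five "images" are invented rows of 1’s and 0’s.

```python
import numpy as np

def sigmoid(z):
    # squash each weighted sum into a number between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Five invented "images": each is a row of 400 1's and 0's.
X = rng.integers(0, 2, size=(5, 400)).astype(float)

# Placeholder weights; in reality backward propagation learns these.
W1 = rng.normal(0, 0.1, size=(400, 25))  # pixel columns -> 25 hidden units
W2 = rng.normal(0, 0.1, size=(25, 10))   # hidden units -> 10 digits

hidden = sigmoid(X @ W1)       # each image now has 25 numbers
scores = sigmoid(hidden @ W2)  # and then 10 numbers, one per digit
prediction = scores.argmax(axis=1)  # the biggest probability wins
print(prediction)              # one predicted digit per image
```

The two matrix multiplications are the whole of the forward pass: 400 numbers become 25, then 25 become 10.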
And of course, if we know what the digit really was – as we do in a ‘training set’ of data – then we can compare the real digit with the one the machine worked out from the set of pixels. The program run for Stanford students is 97.5% accurate.
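Scoring against a training set is just a comparison of two lists – the digits below are invented for illustration.

```python
import numpy as np

# Hypothetical labels and predictions for eight training images.
true_digits      = np.array([3, 1, 4, 1, 5, 9, 2, 6])
predicted_digits = np.array([3, 1, 4, 7, 5, 9, 2, 6])

accuracy = (true_digits == predicted_digits).mean()
print(f"{accuracy:.1%}")  # 7 of 8 correct -> 87.5%
```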
Waiting for backward propagation
The real interest is in the backward propagator, of course. Just how do they work out that there should be 25 units in the hidden layer, and how do they work out the weights between the input layer, the hidden layer and the output layer?
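In rough outline, backward propagation answers the second question by repeatedly running the forward pass, measuring the error, and nudging every weight downhill along its gradient. The sketch below shows one version of that loop on an invented four-pixel, two-digit toy problem; it uses squared error and omits the bias units and regularisation of the course's actual cost function, so treat it as a cartoon of the idea rather than the Stanford implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Invented toy problem: 4 "pixels", 3 hidden units, 2 possible digits.
# The correct class happens to depend on the first pixel.
X = np.array([[0, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 1, 0, 1]], dtype=float)
Y = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]], dtype=float)  # one-hot targets

W1 = rng.normal(0, 0.5, size=(4, 3))  # input -> hidden weights
W2 = rng.normal(0, 0.5, size=(3, 2))  # hidden -> output weights
lr = 0.5                              # learning rate

def forward(X):
    H = sigmoid(X @ W1)
    return H, sigmoid(H @ W2)

_, out = forward(X)
err_before = ((out - Y) ** 2).mean()

for _ in range(1000):
    # forward pass
    H, out = forward(X)
    # backward pass: push the output error back through each layer
    delta_out = (out - Y) * out * (1 - out)       # error signal at the output
    delta_hid = (delta_out @ W2.T) * H * (1 - H)  # error signal at the hidden layer
    # nudge every weight a little way downhill
    W2 -= lr * H.T @ delta_out
    W1 -= lr * X.T @ delta_hid

_, out = forward(X)
err_after = ((out - Y) ** 2).mean()
print(err_before, err_after)  # the error shrinks as the weights are learned
```

Note that the loop only learns the weights – the choice of 25 (here, 3) hidden units is made by the designer beforehand, which is exactly the first question left open above.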
Machine learning vs psychology
In psychology, we have traditionally found the hidden layer with factor analysis or principal components analysis. We take your scores on an intelligence test, for example – item by item, that is simply a row of 1’s and 0’s! We factor analyse the 1’s and 0’s (for you and hundreds of other people) and arrive at a hidden layer. And from there we predict an outer layer.
We usually tighten up the inputs to reflect the hidden layer as closely as possible – that is, we improve our tests so that a score like 30/50 is meaningful. And our outer layer is often continuous – that is, we predict a range of outcomes which we later carve up into classes. We might predict your A level exam results as a percentage and then break them into A, B, C, etc.
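The psychometric route to a hidden layer can be sketched the same way. Below, the 100 people, 6 right/wrong items and two underlying "abilities" are all made up for illustration; the principal components are then just the eigenvectors of the covariance matrix of the item scores.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up data: 100 people answering 6 right/wrong (1/0) test items.
# Two underlying abilities drive the answers, so two components
# should capture most of the variance.
ability = rng.normal(size=(100, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
items = (ability @ loadings + rng.normal(0, 0.3, (100, 6)) > 0).astype(float)

# Principal components: eigenvectors of the items' covariance matrix.
cov = np.cov(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # ascending order
explained = eigvals[::-1] / eigvals.sum()     # variance explained, descending
print(explained[:2])  # the first two components carry most of the variance
```

The first two eigenvectors play the role of the hidden layer: each person's 6 answers are summarised by 2 component scores, from which an outer layer can be predicted.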
So it is with great interest that I await backward propagation. I am also more interested in unsupervised machine learning, which I suspect reflects the real-world conditions of shifting sands a lot more.