
flowingmotion Posts

Why be bothered with a university education?

The distinction between university education and other post-school education can be hard to grasp. Many emotional arguments are advanced. “We only need a handful of people who speak Latin” is one argument, for example, that came up in my Twitter stream.  Often, our argument expresses no more than the emotions we are experiencing as the world shifts about and what we do or have done seems more or less highly valued.

In an earlier post, I tried to list three features of university life which make university education worthwhile though hard to understand from the outside, and hard to get used to when you first arrive as a first year fresh from high school.

I am doing a uni course right now coming from the other direction.  I have a lot of hands-on experience in a field and I wanted to work backwards – so to speak – and formalize my knowledge.  I find the course fairly frustrating because I cannot always relate what I am hearing to a practical situation and some of  the practical exercises are simply better done with a combination of a crib sheet and some trial and error.

So I have to ask myself: why am I still there?  Why haven’t I transferred to a polytechnic-type college which would be better organized (from the students’ point of view), and where the lecturing would frankly be more coherent and the exercises better thought out?

So I’ve had to write down my thoughts (to get them out of my head) and they may be useful to you.

#1 Professors tell the story of abstractions and the failure of abstraction

The job of a professor is to look out on the world and to describe what is common across a whole set of similar situations.

When they do a good job, we can use their generalization like a formula.  I can convert Celsius to Fahrenheit, for example.

Or, in the case of my course, I can understand how to set up some tables and store them in a database in the most efficient way possible.

The difficulty comes when the generalization or abstraction

a) is already known in real life (there have been people making almanacs and look-up tables for generations), or

b) turns out to solve some problems but not all (or to create a few side-effects).

The professors then go back to the ‘drawing board’ and try to solve the new problems that their own abstraction has just created!

This drives students crazy, particularly the more practically minded.  They don’t really want to know this long story of

  • Make this formula
  • Oh. Oooops!
  • Well make this formula.  It is better.
  • Oh. Ooops!

This is particularly annoying to students when they have the spoiler and know the current state of best practice (and are perhaps of slightly impatient temperament).

But this is what professors know about, and after all, there is not much point in asking them about things they don’t know about!

So the question becomes – shall I keep asking them, or shall I ask someone else?

#2 Professors prepare you to manage the interface of new knowledge and reality

Well, let’s fast forward a bit to 10 or 20 years’ time when knowledge has advanced.  Of course, you can just go on another course.  And you probably will  go on another course to find out the new way of doing things.

But let’s imagine you are pretty important now and it is your job to decide whether to spend money on this new knowledge, to spend money and time on the courses, and whether or not to change working practices to use the new ideas.

Of course you can find out the new method in a course.  Of course, you can hire consultants to give you the best guess of whether your competitors will use the new knowledge and how much better than you they will be when they put it into play.

There is another question you must ask and answer even if you answer it partly by gut-feel.  You must anticipate what the professors have not answered.  What will be their Oh. Ooops!  Your judgement of the Oh. Ooops! tells you the hidden costs.  The company that judges those correctly is the company that wins.

Everyone will pick up the new knowledge.  That’s out there. Everyone and his dog will take the course and read the book.  What we will compete on is the sense of the side-effects. That thalidomide will be a disaster.  That going to war will increase attacks on us.  To take two well-known examples.

The professors won’t be articulate about the side-effects of their new solution. Not because they are irresponsible but because their heads are fully taken up figuring it out.

It is the leaders in charge of the interface between new knowledge and the real world who must take a reasonable view of the risks.  Just as in the banks, it is the Directors who are responsible for using technology that had unexpected side-effects.

When you are the Director, you want a good sense of the unknown unknowns and you develop that sense by listening to Professors. They tell you the story of how we found the general idea and then went Oh. Ooops!  The story can be irritating because it is mainly the story of cleaning up their own mess, and sometimes the whole story is nothing more than Oh. Ooops! that ends with “Let’s give this up and start on another story”.

But as future leaders, students, practise listening to experts at the edge of knowledge, relating the solutions to real-world problems, and getting a good sense of the Oh. Ooops! that is about to come next!

#3 Uni education can feel complicated and annoying

That’s uni education.  Don’t expect it to be a movie that charms you and tickles your ego.  It is irritating.

But better to be irritated there than to create a medical disaster, a ship that sinks or a financial system that collapses because you were in charge and jumped into things naively.

See you in class!


Back propagation for the seriously hands-on

I have just finished the Stanford back propagation exercise, and to put it mildly, it was a ****.

So is back propagation complicated?  And indeed what is it?

These are my notes so that I don’t have to go through all the pain when I do this again.  I am not an expert and the agreement with Stanford is that we don’t give away the answer particularly at the level of code.  So use with care and understand that this can’t tell you everything.  You need to follow some lecture notes too.

Starting from the top: What is back propagation?

Back propagation is a numerical algorithm that allows us to calculate an economical formula for predicting something.

I am going to stick to the example that Stanford uses because the world of robotics seems infinitely more useful than my customary field of psychology. Professor Ng uses an example of handwriting recognition much as the Royal Mail must use for reading postal codes.

We scan a whole lot of digits and save each digit as a row of 1’s and 0’s representing ink being present on any one of 400 (20×20) pixels.  Can you imagine it?

Other problems will always start the same way – with many cases or training examples, one to each row; and each example described by an extraordinarily large number of features. Here we have 400 features or columns of X.

The second set of necessary input data is one last column labeling the row.  If we are reading digits, this column will be made up of digits 0-9 (though 0 is written down as 10 for computing reasons).  The digit is still 0 in reality and if we reconstructed the digit by arranging the 400 pixels, it will still be seen to the human eye as 0.

The task is to learn a shorthand way for a computer to see a similar scan of 400 pixels and say, aha, that’s a 1, or that’s a 2 and so on.

Of course the computer will not be 100% accurate but it will get well over 95% correct as we will see.

So that is the input data: a big matrix with one training example per row, the features along the columns, and the last column holding the correct value – the digit (1-9, with 0 written as 10) in this case.
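
To make the shapes concrete, here is a tiny Octave sketch of the layout (the variable names and the stand-in values are mine; I will keep the pixels in X and the labels in a separate column y):

X = double(rand(5000, 400) > 0.5);  % stand-in pixels: 5000 rows, each a string of 400 1's and 0's
y = randi(10, 5000, 1);             % stand-in labels: the correct digit for each row, with 0 stored as 10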

How does back propagation work?

Back propagation programs work iteratively, without the statistical assumptions that we are used to in psych.

The computer boffins start by taking a wild guess of the importance of each pixel for a digit, and see what the computer would predict with those weights.  That is called the forward pass.

Then based on what the computer got right or wrong, they work backwards to adjust the weights or importance of each pixel for each digit.

And remembering that computers are pretty fast, the computer can buzz back and forth asking “how’s this?”.

After a set number of trials, it stops improving itself and tells us how well it can read the digits, i.e., compares its answers to the right answers in the last column of our input data.

What is a hidden layer?

Back propagation also has another neat trick.  Instead of using pixels to predict digits, it works with an intermediate or hidden layer.  So the pixels predict some units in the hidden layer and the hidden layer predicts the digits.  Choosing the number of units in the hidden layer is done by trying lots of versions (10 hidden units, 50 hidden units, etc) but I guess computer scientists can pick the range of the right answer as they get experienced with real world problems.

In this example, the solution worked with 25 hidden units.  That is, 400 pixels were used to make predictions about 25 hidden units, which in turn predict which of the 10 digits made the data.

The task of the computing scientist is to calculate the weights from the pixels to the hidden units and from the hidden units to the digits, and then report the answer with a % of “training accuracy” – over 95%, for example.

Steps in back propagation

We have already covered the first four  steps

Step 1: Training data

Get lots of training data with one example on each row and lots of features for each example in the columns.

Make sure the row is labeled correctly in the last column.

Step 2:  Decide on the number of units in the hidden layer

Find out what other people have tried for similar problems and start there (that’s the limit of my knowledge so far).

Step 3: Initialize some weights

As I said before, we start with a wild guess.  Actually we start with some tiny numbers, but the numbers are random.

We need one set of weights linking each pixel to each hidden unit (25 x 400)* and another set linking each hidden unit to each digit (10 x 25)*.

The asterisk means that a bias factor might be added in, raising one or the other dimension by 1.  To keep things simple, I am not going to discuss the bias factor. I’ll just flag where it comes up.  Be careful with them though because I am tired and they might be wrong.
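
As a rough Octave illustration of that kind of initialization (my own sketch, not the course code; epsilon and the matrix names are my choices, and the bias columns are left out as promised):

epsilon = 0.12;                                  % keeps the starting weights tiny
Theta1 = rand(25, 400) * 2 * epsilon - epsilon;  % random weights from the 400 pixels to the 25 hidden units
Theta2 = rand(10, 25) * 2 * epsilon - epsilon;   % random weights from the 25 hidden units to the 10 digits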

Step 4: Calculate the first wildly inaccurate prediction of the digits

Use the input data and the weights to calculate initial values for the hidden layer.

Our input data of training examples and features (5000 examples by 400 pixels) is crossed with the appropriate initial random weights (25 x 400) to get a new matrix of hidden layer values.  Each training example will have 25 new values (5000 x 25)*.

Then repeat again from the hidden layer to the layer of digits or output layer making another matrix of 5000 x 10.

In the very last step, the calculated value is converted into a probability with the well-known sigmoid function, g(z) = 1 / (1 + e^(-z)).  It would be familiar if you saw it.

The values calculated at the hidden layer are converted into these probability-type values and they are used for the next step and the final answer is converted in the same way.

Now we have a probability type figure for each of 10 digits for each training example (5000 x 10)*.
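
Continuing the sketch, the forward pass for all 5000 examples at once looks roughly like this (bias columns still left out):

sigmoid = @(z) 1 ./ (1 + exp(-z));   % squashes any number into a 0-1 probability-type value
hidden = sigmoid(X * Theta1');       % (5000 x 400) times (400 x 25) gives 5000 x 25
output = sigmoid(hidden * Theta2');  % (5000 x 25) times (25 x 10) gives 5000 x 10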

Step 5: Find out how well we are doing

In this step, we first convert the correct answer (which was a 1, or 5, or 7 or whatever the digit was) into 1’s and 0’s – so we have another matrix (5000 x 10).

We compare this with the one we calculated in Step 4 using simple subtraction and make yet another matrix (5000 x 10).
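
In the sketch, that conversion and subtraction look like this:

Y = zeros(5000, 10);      % the correct answers spelled out as 1's and 0's
for i = 1:5000
  Y(i, y(i)) = 1;         % put a single 1 in the column for the right digit
end
errors = output - Y;      % 5000 x 10 matrix of prediction errors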

Step 6:  The backward pass begins

So far so good.  All pretty commonsensical. The fun starts when we have to find a way to adjust those guessed weights that we used at the start.

Staying at a commonsensical level, we will take the error that we have in that big 5000 x 10 matrix calculated in Step 5 and partition it up so we can ‘track’ the error back to training examples and hidden layers and then from hidden layers to pixels. And this is what the computing scientists do.

They take one training example at a time (one of the 5000 rows), pick out the error for digit 1, and break it up.  And do it again for digit 2 up to digit 0 (which we input as 10).

Step 7: Working with one training example at a time

It might seem odd to work with one training example at a time, and I suspect that is just a convenience for noobes, but stick with the program.  If you don’t, life gets so complicated, you will feel like giving up.

So take example one, which is row 1, and do the stuff. Then repeat for row 2, and so on until you are done.

In computing this is done with a loop: for t = 1:m, where m is the number of training examples or rows (5000 in our case).  The machine is happy doing the same thing 5000 times.

So we do everything we did before this step but we start by extracting our row of features: our X or training data now has 1 row and 400 features (1 x 400)*.

And we still have one label, or correct answer, but remember we will turn that into a row of 1’s and 0’s.  So if the right answer is 5, the row will be 0000100000 (1 x 10).

And we can recalculate our error, or lift the right row out of the matrix of errors that we calculated in Step 5.  The errors at the ‘output_layer’ will be a row of ten numbers (1 x 10).  They can be positive or negative and the number part will be less than 1.

Step 8: Now we have to figure out the error in the hidden layer

So we know our starting point of pixels (those never get changed), the correct label (never gets changed) and the error that we calculated for this particular forward pass or iteration.  After we adjust the weights and make another forward pass, our errors change of course and hopefully get smaller.

We now want to work on the hidden layer, which of course is hidden. Actually it doesn’t exist.  It is a mathematical convenience to set up this temporary “tab”.  Nonetheless, we want to partition the errors we saw at the output layer back to the units in the hidden layer (25 in our case)*.

Just like we had at the output layer, where we had one row of errors (1 x 10), we now want a row or column of errors for the hidden layer (1 x 25 or 25 x 1)*.

We work out this error by taking the weights we used in the forward pass and multiplying by the observed error and weighting again by another probabilistic value.  This wasn’t explained all that well. I’ve seen other explanations and it makes intuitive sense.  I suspect our version is something to do with computing.

So here goes.  To take the error for hidden layer unit 1, we take the ten weights that we had linking that hidden unit to each digit.  Or we can take the matrix of weights (10 x 25)* and match them against the row of observed errors (1 x 10).  To do this with matrix algebra, we turn the first matrix on its side (25 x 10) and the second on its side (10 x 1), and the computer will not only multiply, it will add up as well, giving us one column of errors (25 x 1)*.  Actually we must weight each of these by the probabilistic-type function that we called sigmoidGradient.

We put into sigmoidGradient a row for the training example that was calculated earlier on as the original data times the weights between the pixels and the hidden layer ((5000 x 400*) times (25 x 400*)) – the latter is tipped on its side to perform the matrix algebra and produce a matrix of 25* values for each training example (5000 x 25*).

Picking up the column of data that we calculated one paragraph up, we now have two columns (25* x 1) which we multiply element by element (in matrix algebra, .*, so we can do multiplication of columns like we do in Excel).

Now we have a column of errors for the hidden layer for this one particular training example (25* x 1).  (Our errors at the output layer for this training example were in a row (1 x 10).)
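
In my sketch, that calculation for one training example t looks like this (sigmoidGradient is the probabilistic-type weighting mentioned above; X, Theta1, Theta2, sigmoid and errors come from the earlier sketches):

sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));  % the slope of the sigmoid, another 0-1 type value
t = 1;                                % use the first training example for illustration
x_t = X(t, :);                        % 1 x 400, the pixels for this example
z2 = x_t * Theta1';                   % 1 x 25, the hidden layer before squashing
delta3 = errors(t, :)';               % 10 x 1, this example's errors at the output layer
delta2 = (Theta2' * delta3) .* sigmoidGradient(z2)';  % 25 x 1, the errors for the hidden layer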

Step 9: Figure out how much to adjust the weights

Now we know how much error is in the output layer and the hidden layer, we can work on adjusting the weights.

Remember we have two sets of weights.  Between the output and hidden layer we had (10 x 25*) and between the input layer and the hidden layer, we had (25 x 400*). We deal with each set of weights separately.

Taking the smaller one first (for no particular reason but that we start somewhere), we weight the values of the hidden layer with the amount of error in the output layer.  Disoriented?  I was.  Let’s look again at what we did before.  Before, we used the errors in the output layer to weight the weights between the output and hidden layers, and we weighted that with a probabilistic version of the input data times the weights between the input and hidden layers.  That seemingly complicated calculation produced a set of errors – one for each hidden unit – just for this training example, because we are still working with just one row of data (see Step 8).

Now we are doing something similar but not the same at all. We take the same differences from the output layer (1 x 10) and use them to weight the values of the hidden layer that we calculated on the forward pass (1 x 25*).  This produces (and this is important) a matrix that will have the same proportions as the weights between the hidden and output layer.  So if we have 10 output possibilities (as we do) and 25* units in the hidden layer, then at this stage we are calculating a 10 x 25* matrix.

So for each training example (original row), we have 250 little error scores, one for each combination of output and hidden units (in this case 10 x 25*).

Eventually we want to find the average of these little errors over all our training examples (all 5000), so we whisk this data out of the for loop into another matrix.  As good programmers, we set this up beforehand and filled it with zeros (before the for loop started).  As we loop over training examples, we just add in the numbers and we get a total of errors over all training examples (5000) for each of the combos of hidden unit and output unit (10 x 25*).

And doing it again

We have a set of errors now for the connections between hidden and output layers. We need to do this again for the connections between the input layer and the hidden layer.

We already have the errors for the hidden layer (25* x 1) (see Step 8).  We use these to weight the input values (or maybe we should think of that the other way round – we use the input values to weight the differences).

We take the errors for the hidden layer (25 x 1) and multiply by the row of original data (1 x 400*) and we will get a matrix of (25 x 400*) – just like our table of weights!  You might notice I did not put an asterisk on the 25 x 1 matrix.  This is deliberate.  At this point, we take out the bias factor that we put in before.

We do the same trick of storing the matrix of error codes (25 x 400*) in a blank matrix that we set up earlier and then adding the scores for the next training example, and then the next as we loop through all 5000.
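
Putting Steps 7 to 9 together, the loop in my sketch accumulates both sets of errors roughly like this (again an illustration, not the course solution):

Delta1 = zeros(25, 400);   % running total of errors for the pixel-to-hidden weights
Delta2 = zeros(10, 25);    % running total of errors for the hidden-to-output weights
for t = 1:5000
  x_t = X(t, :);                                      % 1 x 400
  hidden_t = sigmoid(x_t * Theta1');                  % 1 x 25
  output_t = sigmoid(hidden_t * Theta2');             % 1 x 10
  delta3 = (output_t - Y(t, :))';                     % 10 x 1, output-layer errors for this row
  delta2 = (Theta2' * delta3) .* (hidden_t .* (1 - hidden_t))';  % 25 x 1, hidden-layer errors
  Delta2 = Delta2 + delta3 * hidden_t;                % (10 x 1)(1 x 25) added into the 10 x 25 total
  Delta1 = Delta1 + delta2 * x_t;                     % (25 x 1)(1 x 400) added into the 25 x 400 total
end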

Step 10: Moving on

Now we have got what we want: two matrices, exactly the same size as the matrices for the weights ( 25 x 400* and 10 x 25*).  Inside these matrices are the errors added up over all training examples (5000).

To get the average, we just have to divide by the number of training examples (5000 in this case). In matrix algebra we just say – see that matrix? Divide every cell by m (the number of training examples). Done.
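
In the sketch that is just:

m = 5000;                   % number of training examples
Theta1_grad = Delta1 / m;   % average error for each pixel-to-hidden weight
Theta2_grad = Delta2 / m;   % average error for each hidden-to-output weight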

These matrices – one 25 x 400* and the other 10 x 25* are then used to calculate new tables of weights.  And we rinse and repeat.

  1. Forward pass : make a new set of predictions
  2. Back propagation as I described above.
  3. Get two matrices of errors: yay!
  4. Recalculate weights.
  5. Stop when we have done enough.

The next questions are how are the weights recalculated and how do we know if we have done enough?

Recalculating weights

The code for the back propagation algorithm is contained within a function that has two purposes:

  • To calculate the cost of a set of weights (average error in predictions if you like)
  • And to return the matrices that we calculated for changing the weights (also called gradients).

The program works in this order

  • Some random weights
  • Set up the step-size for learning (little or big guesses up or down) and the number of iterations (forward/backward passes)
  • Call a specialized function for ‘advanced optimization’ – we could write a kludgy one ourselves but this is the one we are using
  • The advanced optimizer calls our function.
  • And then performs its own magic to update the weights.
  • We get called again, do our thing, rinse and repeat.

How do we know we have done enough?

Mainly the program will stop at the number of iterations we have set.  Then it works out the error rate at that point – how many digits are we getting right and how many not.

Oddly, we don’t want 100% because that would probably just mean we are picking up something quirky about our data.  Mine eventually ran at around 98% meaning there is still human work and management of error to do if we are machine reading postal codes.  At least that is what I am assuming.

 

There you have it.  The outline of the back propagation.  I haven’t taken into account the bias factor but I have stressed the size of the matrices all the way through, because if there is one thing I have learned, that’s how the computing guys make sure they aren’t getting muddled up.  So we should too.

So now I will go through and add an * where the bias factor would come into play.

Hope this helps.  I hope it helps me when I try to do this again.  Good luck!

The regularization parameter

Ah, nearly forgot – the regularization parameter.  Those values – those little bits of error in the two matrices that are the same size as the weights – (25 x 400*) and (10 x 25*)?

Each cell in the matrix, except for the first column in each (which represents the bias factor), must be adjusted slightly by a regularization parameter before we are done and hand the matrices over to the bigger program.

The formula is pretty simple.  It is just the theta value for that cell times the regularization parameter (set in the main program), divided by the number of training cases, and added on to the cell.  Each of the two matrices is adjusted separately.  A relatively trivial bit of arithmetic.
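
In the full version, where the bias columns are included (so the matrices are 25 x 401 and 10 x 26), the adjustment in my sketch would look something like this, with lambda standing for the regularization parameter set in the main program:

lambda = 1;   % example value; the real one is passed in from the main program
Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + (lambda / m) * Theta1(:, 2:end);  % skip column 1, the bias column
Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) + (lambda / m) * Theta2(:, 2:end);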


Need to practice first order logic?

I found this first order logic exercise on Wolfram.

#1 Download Wolfram’s CDF player

-1 The download on their site did not work for me, so I downloaded it from Softpedia instead.

-2 You will download an .exe file. When it arrives on your personal computer, simply click on the link and it will install as a Program.  It takes a little time to install.  Big beastie to allow you to view interactive documents.

#2 Now read whatever you want on Wolfram’s Demonstrations

-1 Find the demonstration that interests you.  In this case, try this demo for practicing first order logic, also known as predicate calculus.

-2 Click on “Download Demonstration as CDF” at top right and it should open.  If not, try firing up Wolfram first from your Start/Programs.

#3 Practice your first order logic

-1 Choose how many objects to play with,

-2 Start at equation number 1.

-3 Move objects around to change the truth value from true to false and vice versa.

 

It won’t do your homework for you but it might take the edge off the confusion.


Check your propositional logic with a truth table generator

The Southwestern Adventist University website has a truth table generator for checking propositional logic.  Instructions for inputting propositional logic symbols are on its page.

My host’s WordPress is borked, so here is the link:

http://turner.faculty.swau.edu/mathematics/materialslibrary/truth/

#1 Check you understand each part of the assertion

Basically, you can check that you are using the basic truth table for simple assertions like (A and B).

#2 Generate a truth table for multiple assertions

And you can combine simple assertions and generate a truth table for the whole bundle, as in the example below.
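
For example, a truth table for the combined assertion (A and B) or (not A) comes out like this (my own worked example):

A      B      (A and B) or (not A)
true   true   true
true   false  false
false  true   true
false  false  true

Because the last column is a mix of true and false, this particular bundle is neither always true nor always false.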

Caveat

I am not an expert in this, but I am assuming that if a bundle of assertions comes out true whatever the starting values we put into the bundle, then the bundle resolves to true (a tautology).

Correspondingly, if the assertions come out as false, no matter what the starting values are, the bundle resolves to false (a contradiction).

And if the last column contains a mix of true and false, the bundle is contingent: whether it is true depends on the starting values.

Any thoughts?


9 things to think about when you choose a university

Uni fees in the UK have gone up – a lot

Next year, domestic undergraduate fees at UK universities will be £9,000 per year.  Some universities, though not many, will charge less, and the fee is the same whether you do an expensive subject like Chemistry or a cheap subject like English, a subject taught in small groups like Drama or one like Management where students sign up in their hundreds and barely need to meet a member of staff.

On top of the £9000, students also have to pay for books, computers, stationery, accommodation, food, clothing, heating, clubs and extra activities and not least transport to get to uni and to move from where they live to lecture rooms and clubs and so on.  It’s not cheap going to uni in the UK. It costs a lot more than many people earn per year.

Many people say they cannot afford to go. As I understand it, this is not true.  They might not be able to afford to do A levels, but if they get good A levels and are accepted into a university, they can get a student loan which they only start to pay back once their earnings are well above what a minimum-wage job would pay – the sort of job they will get if they don’t go to uni or find a rarer-than-hen’s-teeth apprenticeship.

Look at what is being sold before you look at the price tag

Having taught in universities on three continents for three decades, I can tell you there are universities and universities; and students find it hard to tell them apart.  There are the well-known ones to be sure – the Harvards, the Oxfords. There are those well known to students because their friends have gone there and they anticipate the party scene with relish.  Students pick their university by word-of-mouth.  That’s what we all do when quality is hard to assess from the outside.

But why should fees change our attitude?  Practically, nothing much has changed. Yes, students will have to pay back their loans, but 60K or whatever is not a lot for a lifetime’s investment. They will spend that much getting married and unmarried in their time.  It is the price of a new kitchen, if the shops on my high street are correct.

I thought it would be useful to write down three deep misunderstandings about universities.  If you are serious about going, and equally miffed at the price, then think about these three points and see if they help you choose the university that you want to go to.

As an accountant once wisely said to me, never look at the price ticket until you know what is for sale.  What do you get from going to a university?

#1 University lecturers and professors spend more than half their time on research

A university teaches something quite different from school or a training course at work.  The teachers in a university are “research active”.  What that means is that they are in the business of making knowledge. That is a highly competitive business and they are only deemed to have made knowledge if there is ringing applause world-wide and gasps of “wish I had thought of that”!

It follows that lecturers and professors watch world knowledge, and the way it changes, like proverbial hawks. They are aware of the history of knowledge in their field and they are watching developments, hoping to pounce and work out the key bits to achieve glory in their subject area.

When you listen to them talk, you hear people who see and think about your subject as something that changes.  And from listening to them, you learn not what is great today, but what is changing all the time.

It is true that you want to know what is great today, but what is great today will probably not be great even by the time you graduate.  It certainly won’t be great for the 50 years of your working life. So learning about the way your subject morphs and develops is what is valuable.

Lesson 1a:  Make sure you will be taught by people who are active in your subject matter.  Turn down universities that use juniors or part-timers to teach you.  Check, and ruthlessly discard those that do.  The teachers will not have the appreciation of change which is what you have come to learn.

Lesson 1b: Stop expecting someone to lead you by the hand.  The lecturers’ job is to watch the world and to show you the world through their eyes. If the lecturers are playing coach and tutor to you, they are not doing their jobs and you will have nothing to learn from them. Do the work to catch up.  Every student is in the same position. It is the very reason why you came to uni.  To catch up with people who see world-knowledge as something on the march.

Lesson 1c: Choose a uni partly for the other students. Do the other students care about where the subject is going over the next 50 years and where it came from over the last 500 years?  If not, move on. You will depend on the buzz of other students to catch up with the lecturers and if the other students don’t care, you will find it very difficult to master the steep learning curve.  Look at what students talk about on the chat boards.  If they don’t care, move on. 60K is a lot of money to pay for 3 years of bad parties.

#2 Universities are strict about referencing and plagiarism

The second thing you will notice when you get to university is that lecturers bang on about referencing your essays.  And referencing is a pain to learn. It is bitty and fiddly and lecturers fail you outright when you get it wrong. What is all that about?

I said under the previous point that lecturers are not telling you about a subject, they are telling you about how a subject changes.  So they are telling you about where an idea came from and who is talking about it.  They think geographically with layers of history.  Watch them read and you’ll see them spot a reference and then flick to the back to see the details of the publication.

In their minds they are thinking, hmm Harvard 1921  . . . who else was at Harvard then, what else did this person write, who over here in Europe would not have known because communications across the Atlantic were still slow then.  When they read, they are mapping the changing of the idea so they can pounce and put in the missing step or the next step in the evolution of ideas.

And they are teaching you to do the same.  Why? Because you have 50 years in the game and you want to be on top for 50 years, not on top for 1 and increasingly behind for the next 49.

Provenance, provenance, provenance.  That’s what it is all about.

But you learn something more from this aspect of uni education that is not a prominent part of school or workplace training.  When you track who thought of what, you understand that ideas develop because of the self-interest of the people involved. People at Harvard in 1921 would think up different things from people at Oxford in 1921 because they are surrounded by different people and face different issues.

From watching the provenance of ideas, we begin to appreciate diversity.  We begin to understand the value of other people’s ideas.  Their specific circumstances are different from ours and lead to different thought processes.  The mark of a university-trained man or woman is that their ears prick up when they realise someone comes from a different walk of life, because they know that person is likely to think quite differently and that their thoughts could be very valuable.

Lesson 2a:  Spend your first year learning the reference system, though it is a pain, and get it right. It is the essential mechanic for building the geographical and historical map of change that you need to consistently be on top form for 50 years.

Lesson 2b: Start to appreciate how the circumstances of a person contribute to their thinking (and how your circumstances contribute to your thinking)

Lesson 2c:  Go to a uni where the students differ a lot from each other. It is difficult when you first arrive because you don’t know how to get along socially. But you’ll learn and appreciating the value of differences will give you the grounding to become a world-class negotiator and keep you on top of your game for 50 years.

#3 University is hard

Yes, university is hard, meaning – it is very difficult to know if you are doing well or not.  You can’t just ‘knock off the homework’. You can put 20 hours into something and find you are off the point.  The feedback doesn’t always make sense.  That’s what hard is.  The task isn’t hard when you know how to do it; it’s bloody hard when you don’t and you can’t figure out what you are supposed to do or what matters.

Well, that is also part of the design of university education.  When we know how to do a piece of work, we can delegate it to someone who can’t untangle problems – a high school graduate in other words, or a computer, or an outsourcing company in outer space.

It is jobs where the goals aren’t even clear, let alone the steps, that require well trained minds.  All good universities give you work that seems to have several layers of confusion and your job is to work out the layers and turn the confused mess into something orderly.

If you have been listening to how ideas change and where they come from, then you are more than half-way there because you start placing bits and pieces on your mental map and you can work out the story and see where it is going.

But there is another skill that you learn and that is keeping your temper.  When things are hard, our blood pressure goes up and thinking goes down. You have to get on top of your temper to think straight. And you will learn to get good at not reacting badly to the feeling of being confused.

You will also stop blaming people. At first, your thought is ‘bad teacher – teacher confused me.’  Soon you will realise you should be saying ‘thank you teacher, you got me there, I had to think a bit.’

Above all, a university man or woman can untangle the mess of all the different ideas that a crowd of people put on the table.  And because you are calmly gathering all these emotionally charged ideas and sorting them out, even if you aren’t the smartest and most knowledgeable player in the room, you are welcome in the room for not just 50 years but probably your remaining 70.

A university should challenge you emotionally; if not, your money and most importantly your valuable three years of young adulthood are being wasted.

Lesson 3a:  Expect instructions to be confused and untangle them calmly.

Lesson 3b:  React to your own indignation by realising your alarm is a signal that you haven’t finished the task.

Lesson 3c:  Look at student chat boards. Avoid universities where they are full of whining and complaining.  Uni isn’t a cut-price airline.  Find a university where students solve problems and take pride in their emotional aplomb.

 Look at what is being sold before you look at the price tag

I hope these points help you think through why you might want to go to university and how to choose one that is worth 60K and, more importantly, worth 3 of the most valuable years of your life. Choose well, and then get on the steepest learning curve you will ever face.  You will be glad; but if you don’t want any of these things, then by all means go to the nearest and cheapest.  Or consider whether you want to go to uni at all.  You can get the information you need off the internet and sometimes you can bypass uni and go straight to a Masters.  Some people have to do it that way round because, for one reason or another, they can’t go to uni when they are young.

But if you are going, remember to think like that accountant. Work out exactly what is for sale before you worry if the price tag is right.

This is what a good university (or good university department) provides.

  • A sense of how the world has changed and is changing and will continue changing for the next 50 years of your working life
  • A sense of how the circumstances in which people live affect their thinking and how their perspectives enrich yours
  • An ability to sort out confusion, including lots of emotionally-charged arguments, without getting upset yourself or blaming others.


A general algorithm for implementing a neural network

What is a neural network?

A neural network takes a set of binary (0/1) signals, groups them into a smaller set of hidden units, which are used to work out the probability of something happening.

An example of a neural network?

The example used by Andrew Ng in the Stanford course

  • Converts the image of a handwritten number into 20×20 = 400 pixels, i.e. a row of 400 1’s or 0’s
  • The backward propagator works out how to group the 400 columns of pixels into 25 units (which are ‘hidden’ because the end user doesn’t need to know about them)
  • And then the backward propagator does its magic again to work out weights to match combinations of 25 units onto the ten possibilities of a digit (0-9).

Forward propagation

The forward propagator takes a handwritten number, or rather the row of 400 1’s and 0’s representing the 20×20 pixels for a number, and runs the calculation forwards.  The 400 1’s and 0’s are multiplied by the weights that map the 400 pixel columns onto the 25 hidden units, and 25 new columns are generated.  Each image, represented by a row, now has 25 numbers.

The process is repeated with the weights from the 25 columns to the 10 digits and each image now has a probability for each of the 10 digits.   The biggest probability wins!  We have taken a list of pixels and stated what a human thought they were writing down!

Training accuracy

And of course, if we knew what the digit really was, as we do in a ‘training set’ of data, then we can compare the real number with the one that the machine worked out from the set of pixels.  The program run for Stanford students is 97.5% accurate.
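
A rough Octave sketch of that check, assuming the pixels sit in X (5000 x 400), the correct labels in y (1-10, one per row) and the learned weights in matrices I will call Theta1 (25 x 400) and Theta2 (10 x 25), with the bias terms left out:

sigmoid = @(z) 1 ./ (1 + exp(-z));
hidden = sigmoid(X * Theta1');       % 5000 x 25 hidden-unit values
output = sigmoid(hidden * Theta2');  % 5000 x 10, one probability-type value per digit per image
[~, guess] = max(output, [], 2);     % pick the digit (column) with the biggest value
accuracy = mean(guess == y) * 100;   % percentage of images where the guess matches the label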

Waiting for backward propagation

The real interest is in the backward propagator, of course.  Just how do they work out that there should be 25 units in the hidden layer, and how do they work out the weights between the hidden layer and the output layer?

Machine learning vs psychology

In psychology, we have traditionally found the hidden layer with factor analysis or principal components analysis.  We take your scores on an intelligence test, for example.  That is simply a row of 1’s and 0’s!  We factor analyse the 1’s and 0’s (for you and hundreds of other people) and arrive at a hidden layer.  And from there we predict an outer layer.

We usually tighten up the inputs to reflect the hidden layer as closely as possible – that is we improve our tests so that 30/50 is meaningful.  And our outer layer is often continuous – that is we predict a range of outcomes which we later carve up into classes.  We might predict your A level exam results by % and then break them into A, B, C, etc.

So it is with great interest that I await the backward propagation.  I am also more interested in unsupervised machine learning which I suspect reflects real world conditions of shifting sands a lot more.


Functional dependence in databases: plain language & how-to

Computer scientists speak a language of their own with long ‘noun phrases’ and complex sentences which are often grammatically incorrect or very difficult to parse.

This post is a short description of FUNCTIONAL DEPENDENCE IN DATABASES.  This is a common-sense view and an AMATEURISH view – so read it to take the edge off the unintelligibility of explanations around the web, but remember that when you want to solve very hard problems, this explanation will probably not be sufficient or even accurate.

#1 What is a database?

A database is a set of tables.  A table is just a set of rows and columns like we have in Excel.  A database has lots of tables like the sheets in Excel.

And just as we can in Excel, we can tell the program to toddle off to another table to pick up a value, bring it back and put it in another table.  We call that a Look Up.

#2 Looking up something by tracking from table to table

In a complicated set of tables, the value we want might be in Table 6, for example.  We might also have some information that doesn’t allow us to look up what we want in Table 6, but we can use the information we have to look up something in Table 4 and something else in Table 3 and use those two facts to look up what we need in Table 5 and then go to Table 6 to finish the job.

To take a practical example, if you want to look up someone’s telephone number, you need to know their family name and the town where they live.  If their family name is very common, you might need to know their first name and street name as well, but let’s stick to a simple example.

So if we know our friend lives in Timbuktu, we look up the volume number of the directory for Timbuktu, then we go to the Timbuktu directory/table and we use our friend’s name to lookup their telephone number.

Alternatively, the list might have been laid out in one table and we look in the first two columns for Timbuktu AND our friend’s name.  When we have found both together, the correct telephone number will be in the next column.

#3 What is functional dependence?

In plain language, functional dependence just describes how we look up information.

Because we need our friend’s town to look up their telephone number, telephone number is functionally dependent on town, and the computer scientists write that down as Town → Telephone Number.

Equally, as we need a name to look up a telephone number, telephone number is functionally dependent on name and that is written as Name → Telephone Number.

What’s more, as we need town and name to look up a telephone number, Telephone Number is functionally dependent on Town AND Name. Computer Scientists write that as Town, Name → Telephone Number.

#4 So why do we care about functional dependence?

Every day we ask questions about functional dependence without being conscious that we are using this exalted concept!

We use functional dependence every day

Whenever we look up something like a telephone number, we are asking what information we need to know to look up the number.

When you Googled “functional dependence” and landed up here, you used a look up – or rather you trusted Google to know what look-ups to use!

Tough search problems require us to track from one lookup table to another

At work, we might also say: if I have information A and information B, can I find out Z, and how do I find it out?

How can I step from table to table to get the information that I seek?

When we design a database or set of spreadsheets, we want to do the least work possible!

When we design a database, or set of Excel spreadsheets, we also want to make as few tables as possible!

We want to make sure we only ever have to type a piece of information into one table, once!

We want to make sure that data that hardly ever changes or we hardly ever look at is still accessible but in tables that we can put out of the way.

That’s it.  That’s the what and why of functional dependence.  So let’s turn to the how and specifically the ‘how’ for students.

#5 So how do you work out functional dependence questions?

With the Stanford experiment in online classes going on, and other computer science students doing homework, you might really want to know how do I do these ***** problems?

This is my way – it is not the official way but it works for me.

When I have a table (R) with columns (A, B, C, D etc), I think of the columns as all the columns in a set of spreadsheets.

Then I turn the functional dependencies (FDs) (written A → B) into tables.  A → B is a table with two columns, A and B. In plain language, I use column A to look up column B.

Problem 1

Then, when I am asked whether ABC → D holds in that relation and set of FDs, all I do is ask myself: given the set of tables, if I already know A, B and C, can I find the value of D? I have to be careful and methodical, but hey, it works.
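
A made-up example of that reasoning: suppose the only FDs are A → B and B → D, so I have a table that lets me look up B from A and a table that lets me look up D from B.  Asked whether ABC → D holds, I say: I already know A, B and C; B lets me look up D; so yes, it holds.  Asked whether C → D holds, I cannot get anywhere with C alone, so it does not.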

Problem 2

Then when I am asked if BC, say, is a key, I write down BC and I write down the columns that are left – say A and D.  Then I ask if I can look up A and D with B and C.  If I can, then BC is a key. If not, BC is not a key.

Problem 3

When I am asked if two sets of FDs are the same, then all I am being asked is whether the two sets of tables let me look up the same information.  This is more tricky and I found it easiest to draw a matrix with a column for each attribute (A, B, C, D, etc.) and a row for each of them.  I scratch out the diagonal (A=A) and see if, knowing A (the row), I can look up B, C, D, etc.  This only works for very simple problems though.

So, this is an amateur’s take on functional dependence. Use it if it works for you and not if it doesn’t.  And remember it is an amateur’s version.  Once problems become more complicated, all that maths is probably useful shorthand and my account is probably concealing some misunderstanding or other!

Good luck.


12 steps to running gradient descent in Octave

This post provides a bird’s-eye view of how to calculate linear regression using the numerical programming used by machine-learning people.  It is, of course, easier to do linear regression in a statistics program, but it is good to know, and the overall structure probably provides a foundation for other machine-learning programs.

The algorithm works in Octave, which is like a free version of MATLAB.

I’ve also only given the bird’s-eye view because the code is part of the Machine Learning course at Stanford and writing it is part of the homework – if we release it, students won’t have to write it themselves and they won’t learn anything.

Example of linear regression

Imagine you want to buy a second-hand car and you collect information on prices for a particular model, along with the age and mileage of each vehicle.

The following steps compute an equation to predict the price of the car from either the age of the vehicle or the mileage.  Using both age and mileage together is a multivariate problem, which has its own algorithms.

#1 Prepare data

~1 Normally, we would input the data into a table in Excel with the first column being age (or mileage) of the vehicle and the second column being price.

~2 Then we would save the file as a csv (comma separated values) text file which we will call for now mydata.txt

#2 Load the data into Octave

~1 Start Octave from your list of Start/Programs

~2 Tell Octave where your data is stored.  For example, cd 'c:\users\me\data'

~3 Confirm you are in the correct directory by typing ‘pwd’

~4 Read in the data using these three commands

data = csvread('mydata.txt');  % reads the data file

X = data(:,1);  % reads all the rows of the first column of the data (age in our example) into matrix X

y = data(:,2);  % reads all the rows of the second column of the data (prices in our example) into vector y

m = length(y);  % counts the number of training examples, or rows (age, price pairs in our example)

#3 Visualize the data using plot

~1 Find out what the data looks like (inspect the data) by using the plot function

~2 This was part of the homework, so let’s say that the commands for the plot were put in a separate file which is called with a simple command, plot(X, y)

~3 Calls to a function can be made from the command line within Octave or from another function.

#4 Pick out some data to act as a test you have everything correct

~1 Look at the graph and pick two values of X (age of the vehicle in our example)

~2 Estimate by sight the predicted value of Y (the price of the vehicle in our example)

Hang on to these. You will need them at the end!

 #5 Set the ‘settings’ for the gradient descent

~1 Set the number of iterations and type these in at the command line or put them in another function.  For example, iterations=1500

~2 Set the learning rate.  For example, alpha = 0.01

~3 Set up the matrix of theta values (that is, the y intercept and the gradient of the graph. If price = a + b(age), the two theta values are a and b).  Type in theta = [0;0]. That sets the initial values of both parameters as 0.  A line like this predicts every price as 0 no matter the age of the vehicle.  But it is just our starting point!

#6 Calculate the initial cost function

~1 Calculate the errors in prediction if we treat theta as [0;0], or that is if we treat price as a straight line and always 0.  The formula is in essence the sum of the square of the prediction errors divided by twice the number of cases.  I can’t derive this and I am not going to type it in because finding the right formula was probably part of the homework.

~2 Put the code into a function costCompute (X, y, theta) and save as a text file with extension .m (costCompute.m)

~3 This function is called repeatedly later because every time we improve our guess of the parameters (theta or a & b in the regression line), then our prediction errors will decrease.  Basically, we will stop trying to improve our parameters when we can’t reduce our prediction errors any further.
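
For the record, the squared-error cost is textbook material rather than anything course-specific, so here is a generic Octave sketch of such a function (it assumes a column of ones has been stuck onto the front of X so that both theta values can be used; it is not necessarily identical to what the homework expects):

function J = costCompute(X, y, theta)
  % X is m x 2 (a column of ones plus the ages), y is m x 1 (the prices)
  m = length(y);                               % number of training examples
  predictions = X * theta;                     % m x 1 predicted prices
  J = sum((predictions - y) .^ 2) / (2 * m);   % squared prediction errors over twice the number of cases
end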

#7 Repeatedly calculate new parameters

~1 The goal now is to iteratively improve our guesses of the parameters (theta – i.e., a & b in the regression line). The machine learning specialists call this ‘learning’ the parameters.  So, we are starting with [0,0] and we will slowly improve them.

~2 The formulas for changing the parameters amount to calculating a minute change and taking it away from the last value.

~3 There is a different formula for the two parameters (a & b), largely because they start off differently – as a constant, a, and as bx. (It’s maths…)

~4 The constant, a, decreases by the alpha (set as low as 0.01 – see above) times the average error in prediction.  Again the formula is part of the homework so I won’t write it down here.

~5 The slope, b, decreases by the alpha times the average error in the prediction itself multiplied by the original x value.  I vaguely intuit this.  Again there is a formula.

~6  A new cost is calculated with the two new parameters and of course, the cost (think of it as the average prediction error) should have gone down.
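
Again in generic textbook form (not the course file), and with iterations, alpha, theta, X and y set up as in the earlier steps, the loop that does this looks something like:

J_history = zeros(iterations, 1);   % keep the cost at each pass to check it keeps falling
for iter = 1:iterations
  predictions = X * theta;                                  % current guesses of the prices
  theta = theta - (alpha / m) * (X' * (predictions - y));   % nudge both parameters at once
  J_history(iter) = costCompute(X, y, theta);               % the new, hopefully lower, cost
end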

#8 Iterate lots!

~1 The iteration in this example was set at 1500!

~2 I don’t see how else the program ended. Presumably it could also be coded to end when the improvement in the cost (improvement in prediction errors) falls below a pre-stated, acceptable level.

#9 Print out a linear fit

~1 Overlay a straight line graph on the original plot of the data

~2 Use Octave’s ‘hold on’ command to keep the old plot as the base

~3 To draw our prediction line, either calculate predicted values or simply calculate the predicted values within the plot command: plot(X, X*theta, '-')

#10 Check the answer is reasonable

~1 Find the test data you set up in step 4.

~2 Calculate predicted values using the parameters we have calculated for the two test levels of X (i.e., what prices do we predict for our two ages of vehicle?).

~3 Do they make sense?  My primary school maths teacher told me ALWAYS to write the answer in plain English!

#11 Visualize theta values in a 3D plot

~1 Use Octave’s surf command to visualize the cost (average prediction errors) of each combination of theta values (a & b).

~2 The 3D plot is bowl-shaped and the best combination of (a & b) sits at the lowest point of the bowl (my rough understanding).

#12 Visualize the contour plot

~1 Use Octave’s contour command to visualize the theta-theta graph (all those a’s plotted against all those b’s).

~2 Our final version of a and b should sit in the innermost circle, like the summit on the contour map of a mountain – except that here the innermost circle marks the lowest cost rather than the highest point.

This is my record of the 12 steps in using gradient descent to do linear regression for a problem such as predicting the price of a car from its age.  We need a large data set of recent data and we repeatedly put in values for the y intercept (value of a car when it is brand new) and the slope (the rate the value decreases). (Yeah, I know… a straight line is not necessarily the best model unless we start after the car is a year old).

The program nippily calculates how bad the model is and when it stops getting better, it stops and delivers the parameters for the line.  We can check the parameters graphically (because like all good data scientists we inspected our data at the beginning).

We can also use the contour and surf functions of Octave to plot the improvements in our estimations so that we can see what the program actually did.

I’ve written this to make sure I can do it again. I hope it is useful but the code itself is embargoed because otherwise future students of Stanford will not have to do any work and will not learn anything!


Install Relational Algebra Interpreter on your Windows machine in less than 30 minutes

This is a quick blog post on how to install and use the Relational Algebra Interpreter on a Windows machine.

#1 What is the Relational Algebra Interpreter?

You only need the Relational Algebra Interpreter if you have a database that you set up through SQLite or MySQL or other database software (not Microsoft Access) and you want to practise relational algebra operators like SELECT and PROJECT without writing out the equivalent SQL yourself.

The Relational Algebra Interpreter was written by Jun Yang, now of Duke Uni.  It is used by many universities, including Stanford, to teach database programming.

#2 Install the Relational Algebra Interpreter

These notes describe my installation.  I am not an expert and I am jotting down what I did mainly so that I can do it again another day.  If they help you, good; if not, sorry.

#2.1 Check whether you have Java installed on your machine. I am using Windows 7 on a relatively new machine and I have a Java JRE (runtime environment) installed but not a JDK (development kit).  I didn’t want to rush a notoriously convoluted Java install and make a mess of a new machine, so I fired up my old Windows XP machine where a JDK is already installed.

#2.2 Decide where to put your RA interpreter.

The Interpreter comes with everything you need to run on SQLITE databases ( I am not sure on what else) and as I am only using the RA interpreter for a few weeks, I didn’t want to change my PATH statement – so I made a working folder, e.g. C:/A_RA_demo.

I went to Dr Yang’s page and downloaded the RA Interpreter and unzipped it.  I also transferred over from my other machine the test SQLITE3 database that I made earlier (see another post).

#3 Use the Relational Algebra Interpreter

#3.1  The RA Interpreter runs from your command prompt (Go to Start; type cmd in the box and hit enter; then change into your working directory, e.g., cd C:/A_RA_demo).

#3.2  The RA Interpreter comes with a sample.db and a sample.properties file.  So to prove everything works, type “java -jar ra.jar” at your command line and observe the changes on your screen.

#3.3  See what is in the sample.db by typing at the ra> prompt “list;”.

#3.4  Pick any of the “relations” (tables in plain English) and type at the ra> prompt “relationname;”.

#3.5 Type at the ra> prompt “quit;”

#4 Use the Relational Algebra with your own database

#4.1  To use the Relational Algebra with your own database, you must make a corresponding .properties file.

#4.2  Copy the sample.properties file and save it as yourdatabasename.properties.

#4.3  Find the right command line to edit. Most of the file is commentary.

#4.4  I was using a database made in SQLITE3 so I picked the SQLITE command.  You don’t have to add a 3.  You must change sample.db to yourdatabasename.extension.  I got held up here for a while because my database was just a file without an extension.  When I typed in the database name only, it worked.

#4.5  Go back to the command line and make sure you are working in the working directory.  If not, look at step #2.2

#4.6 On the command line, type “java -jar ra.jar yourdatabasename.properties”

#4.7 Confirm you have read your database by typing at the ra> prompt “list;”

#4.8  All working?  So quit for now by typing at the ra> prompt “quit;”

#5 Use the Relational Algebra Interpreter with query commands in a file.

#5.1  Put the relational algebra interpreter commands in a text file and save in the working directory (e.g., query1.txt).  [You could test “list;” for now.]

#5.2  Go back to the command prompt and confirm you are in your working directory. Type “java -jar ra.jar yourdatabasename.properties -i query1.txt”

#5.3  You will get an answer, or more likely an error message.  Debug your query, resave query1.txt and rerun the java line.  Conveniently, recall the java line with the up key and enter.

Done!  Now all you have to do is learn the Relational Algebra syntax, which you can also find on Professor Yang’s site.  It is a little mind-blowing and I found tracking all the brackets quite hard, but you get the hang of it with practice.
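
As a taster, a query in that syntax looks roughly like this – I am quoting the general shape from memory against a made-up PhoneBook relation, so check the exact operator names and brackets against Professor Yang’s documentation:

\select_{town = 'Timbuktu'} PhoneBook;
\project_{name, phone} (\select_{town = 'Timbuktu'} PhoneBook);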

Make sure you start with a little database so you can check your queries by hand.  And search the Stanford website for class notes to help you on your way.
