12 steps to running gradient descent in Octave
Sunday 16 Oct 2011
This post provides a bird’s-eye view of how to calculate a linear regression using the kind of numerical programming that machine-learning people use. It is, of course, easier to do a linear regression in a statistics program, but the method is good to know and the overall structure probably provides a foundation for other machine-learning programs.
The algorithm runs in Octave, which is a free program much like MATLAB.
I’ve only given the bird’s-eye view because the code is part of the Machine Learning course at Stanford and writing it is part of the homework. If we release it, students won’t have to write it themselves and they won’t learn anything.
Example of linear regression
Imagine you want to buy a second-hand car and you collect information on prices for a particular model, along with the age and mileage of each vehicle.
The steps below compute an equation to predict the price of the car from either the age of the vehicle or the mileage. Using both age and mileage together is a multivariate problem, which needs a different setup and is not covered here.
#1 Prepare data
~1 Normally, we would input the data into a table in Excel with the first column being age (or mileage) of the vehicle and the second column being price.
~2 Then we would save the file as a CSV (comma-separated values) text file, which we will call mydata.txt for now.
#2 Load the data into Octave
~1 Start Octave from your list of Start/Programs
~2 Tell Octave where your data is stored. For example, cd 'c:\users\me\data'
~3 Confirm you are in the correct directory by typing ‘pwd’
~4 Read in the data using these four commands
data = csvread('mydata.txt'); % reads the data file
X = data(:, 1); % reads all the rows of the first column of the data (age in our example) into vector X
y = data(:, 2); % reads all the rows of the second column of the data (prices in our example) into vector y
m = length(y); % counts the number of training examples, or rows (age, price pairs in our example)
#3 Visualize the data using plot
~1 Find out what the data looks like (inspect the data) by using the plot function
~2 This was part of the homework, so let’s say that the commands for a plot were put in a separate file, which is called with a simple command such as plot(X, y)
~3 Calls to a function can be made from the command line within Octave or from another function.
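A plotting helper of the kind described above might look like the sketch below. This is a generic version, not the course's code: the name plotData, the marker style and the axis labels are all assumptions, and the name was chosen so that it does not shadow Octave's built-in plot.

```octave
% Hypothetical helper, saved as plotData.m in the working directory.
% The function name, marker style and labels are assumptions.
function plotData(x, y)
  plot(x, y, 'rx', 'MarkerSize', 10);   % one red cross per car
  xlabel('Age of vehicle (years)');
  ylabel('Price');
end
```

Once the file is saved, typing plotData(X, y) at the Octave prompt draws the scatter plot.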
#4 Pick out some data to act as a test that you have everything correct
~1 Look at the graph and pick two values of X (age of the vehicle in our example)
~2 Estimate by sight the predicted value of y (the price of the vehicle in our example)
Hang on to these. You will need them at the end!
#5 Set the ‘settings’ for the gradient descent
~1 Set the number of iterations, either by typing it in at the command line or by putting it in another function. For example, iterations = 1500
~2 Set the learning rate. For example, alpha = 0.01
~3 Set up the vector of theta values (that is, the y intercept and the slope of the line: if price = a + b(age), the two theta values are a and b). Type in theta = [0;0]. That sets the initial values of both parameters to 0. A line like this predicts every price as 0 no matter the age of the vehicle. But it is just our starting point!
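The settings above can be sketched as follows. The five data points are invented for illustration (ages in years, prices in thousands), and the extra column of ones is an assumption about the setup: it is the usual trick that lets a single matrix product compute a + b(age) for every car at once.

```octave
% Made-up data for illustration only: ages in years, prices in thousands.
x = [1; 2; 3; 4; 5];
y = [9; 8; 7; 6; 5];
m = length(y);

iterations = 1500;       % number of gradient descent steps
alpha = 0.01;            % learning rate
theta = zeros(2, 1);     % [a; b], both starting at 0

% Assumption: a leading column of ones, so that X * theta gives a + b*age.
X = [ones(m, 1), x];
predictions = X * theta; % every prediction is 0 while theta is [0; 0]
```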
#6 Calculate the initial cost function
~1 Calculate the errors in prediction if we treat theta as [0;0], that is, if we treat price as a flat line that is always 0. The formula is in essence the sum of the squares of the prediction errors divided by twice the number of cases. I can’t derive this and I am not going to type it in because finding the right formula was probably part of the homework.
~2 Put the code into a function costCompute(X, y, theta) and save it as a text file with the extension .m (costCompute.m)
~3 This function is called repeatedly later because every time we improve our guess of the parameters (theta or a & b in the regression line), then our prediction errors will decrease. Basically, we will stop trying to improve our parameters when we can’t reduce our prediction errors any further.
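The description above (sum of squared prediction errors over twice the number of cases) is the standard textbook squared-error cost, so a generic version can be sketched without reference to the course materials. It assumes that X already carries a leading column of ones, which is an assumption about the setup rather than something stated in these steps.

```octave
% Sketch of costCompute.m. Assumes X has a leading column of ones,
% so that X * theta gives a + b*age for each row.
function J = costCompute(X, y, theta)
  m = length(y);                     % number of training examples
  errors = X * theta - y;            % predicted price minus actual price
  J = sum(errors .^ 2) / (2 * m);    % squared errors over twice m
end
```

As a quick check with invented numbers, X = [1 1; 1 2; 1 3] and y = [1; 2; 3] give a cost of 14/6 (about 2.33) for theta = [0;0] and a cost of 0 for theta = [0;1], which fits those three points exactly.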
#7 Repeatedly calculate new parameters
~1 The goal now is to iteratively improve our guesses of the parameters (theta, i.e., a & b in the regression line). The machine learning specialists call this ‘learning’ the parameters. So, we are starting with [0;0] and we will slowly improve them.
~2 The formulas for changing the parameters amount to calculating a minute change and taking it away from the last value.
~3 There is a different formula for each of the two parameters (a & b), largely because they enter the line differently: a as a constant and b as bx. (It’s maths…)
~4 The constant, a, decreases by alpha (set as low as 0.01, see above) times the average error in prediction. Again the formula is part of the homework so I won’t write it down here.
~5 The slope, b, decreases by alpha times the average of each prediction error multiplied by its original x value. I vaguely intuit this. Again there is a formula.
~6 A new cost is calculated with the two new parameters and of course, the cost (think of it as the average prediction error) should have gone down.
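A single round of the update described above can be sketched like this. It is the textbook gradient descent step for a straight line, not the course solution, and the data are invented (ages in years, prices in thousands). Both changes are computed from the same set of errors before either parameter is overwritten.

```octave
% Made-up data for illustration only.
x = [1; 2; 3; 4; 5];
y = [9; 8; 7; 6; 5];
m = length(y);
alpha = 0.01;
a = 0;  b = 0;                        % current guesses

errors = (a + b * x) - y;             % prediction errors with the current a and b
cost_before = sum(errors .^ 2) / (2 * m);

% Work out both new values from the same errors, then update together.
a_new = a - alpha * sum(errors) / m;        % alpha times the average error
b_new = b - alpha * sum(errors .* x) / m;   % alpha times the average of error times x
a = a_new;
b = b_new;

cost_after = sum(((a + b * x) - y) .^ 2) / (2 * m);
fprintf('cost before: %g, cost after: %g\n', cost_before, cost_after);
```

With these numbers the first step moves a from 0 to 0.07 and b from 0 to 0.19, and the cost falls from 25.5 to about 21.64. The starting predictions are all too low, so the errors are negative and taking them away pushes both parameters up.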
#8 Iterate lots!
~1 The number of iterations in this example was set at 1500!
~2 I don’t see how else the program ended. Presumably it could also be coded to end when the improvement in the cost (improvement in prediction errors) falls below a pre-stated, acceptable level.
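Both ways of stopping can live in the same loop. The sketch below uses invented data, and the tolerance value is a hypothetical choice, not something from the course. The loop ends either when the iteration cap is reached or when one step improves the cost by less than the tolerance, whichever comes first.

```octave
% Made-up data for illustration only: ages in years, prices in thousands.
x = [1; 2; 3; 4; 5];
y = [9; 8; 7; 6; 5];
m = length(y);
X = [ones(m, 1), x];      % column of ones for the intercept a
theta = zeros(2, 1);
alpha = 0.01;
iterations = 1500;
tolerance = 1e-9;         % hypothetical threshold for "no longer improving"

cost = @(t) sum((X * t - y) .^ 2) / (2 * m);
J_start = cost(theta);
J_old = J_start;
for iter = 1:iterations
  gradient = X' * (X * theta - y) / m;   % both averages from step 7 in one product
  theta = theta - alpha * gradient;      % update a and b together
  J_new = cost(theta);
  if J_old - J_new < tolerance           % improvement too small to matter
    break;
  end
  J_old = J_new;
end
fprintf('stopped after %d iterations, a = %g, b = %g\n', iter, theta(1), theta(2));
```

A small learning rate such as 0.01 can leave the parameters still creeping towards their best values when the cap is reached, which is one reason to watch the cost rather than trust the iteration count alone.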
#9 Print out a linear fit
~1 Overlay a straight line graph on the original plot of the data
~2 Use Octave’s ‘hold on’ command to keep the old plot as the base
~3 To draw our prediction line, either calculate the predicted values first or simply calculate them within the plot command: plot(X, [ones(m,1) X]*theta, '-'). The column of ones is there so that the first theta value (a) is added to every prediction.
#10 Check the answer is reasonable
~1 Find the test data you set up in step 4.
~2 Calculate predicted values using the parameters we have calculated for the two test levels of X (i.e., what prices do we predict for our two ages of vehicle?).
~3 Do they make sense? My primary school maths teacher told me ALWAYS to write the answer in plain English!
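The check can be sketched as below. The theta values and the two test ages are hypothetical stand-ins: use whatever gradient descent actually returned and the two ages you picked in step 4.

```octave
% Hypothetical learned parameters: a = 10 (thousands), b = -1 (thousands per year).
theta = [10; -1];
test_ages = [2.5; 4.5];                        % the two ages picked by eye in step 4
predicted = [ones(2, 1), test_ages] * theta;   % a + b*age for each test age
for i = 1:2
  fprintf('A %.1f-year-old car is predicted to cost %.1f thousand\n', ...
          test_ages(i), predicted(i));
end
```

If the printed prices are far from the values you estimated by sight, something upstream (the data, the learning rate or the number of iterations) deserves a second look.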
#11 Visualize theta values in a 3D plot
~1 Use Octave’s surf command to visualize the cost (average prediction errors) of each combination of theta values (a & b).
~2 The 3D plot is bowl-shaped and the best combination of (a & b) is at the lowest point of the bowl (my rough understanding).
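One way to build the surface is to compute the cost for a grid of candidate (a, b) pairs and hand the grid to surf. The data and the grid ranges below are invented for illustration, and the ranges would need adjusting for real prices.

```octave
% Made-up data for illustration only.
x = [1; 2; 3; 4; 5];
y = [9; 8; 7; 6; 5];
m = length(y);
X = [ones(m, 1), x];

a_vals = linspace(0, 20, 50);    % candidate intercepts (assumed range)
b_vals = linspace(-3, 1, 50);    % candidate slopes (assumed range)
J_vals = zeros(length(a_vals), length(b_vals));
for i = 1:length(a_vals)
  for j = 1:length(b_vals)
    t = [a_vals(i); b_vals(j)];
    J_vals(i, j) = sum((X * t - y) .^ 2) / (2 * m);   % cost for this (a, b) pair
  end
end

% surf expects rows to run along the second axis (b), hence the transpose.
surf(a_vals, b_vals, J_vals');
xlabel('a (intercept)');
ylabel('b (slope)');
zlabel('cost');
```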
#12 Visualize the contour plot
~1 Use Octave’s contour command to visualize the theta-theta graph (all those a’s plotted against all those b’s).
~2 Our final version of a and b should sit in the innermost ring, as if it were the lowest point on the contour map of a valley.
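A contour version can be sketched with meshgrid, again on invented data and with assumed grid ranges. The contour levels are spaced logarithmically (an assumption, chosen so that the rings near the bottom of the bowl stay visible), and the red cross marks the lowest cost found on the grid.

```octave
% Made-up data for illustration only.
x = [1; 2; 3; 4; 5];
y = [9; 8; 7; 6; 5];
m = length(y);

a_vals = linspace(0, 20, 50);          % candidate intercepts (assumed range)
b_vals = linspace(-3, 1, 50);          % candidate slopes (assumed range)
[A, B] = meshgrid(a_vals, b_vals);     % rows follow b, columns follow a

J = zeros(size(A));
for k = 1:m
  J = J + (A + B * x(k) - y(k)) .^ 2;  % squared error of car k at every (a, b)
end
J = J / (2 * m);

contour(a_vals, b_vals, J, logspace(-2, 3, 20));
xlabel('a (intercept)');
ylabel('b (slope)');

[J_min, idx] = min(J(:));              % lowest cost on the grid
hold on;
plot(A(idx), B(idx), 'rx', 'MarkerSize', 10);
hold off;
```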
This is my record of the 12 steps in using gradient descent to do linear regression for a problem such as predicting the price of a car from its age. We need a large data set of recent data and we repeatedly put in values for the y intercept (value of a car when it is brand new) and the slope (the rate at which the value decreases). (Yeah, I know… a straight line is not necessarily the best model unless we start after the car is a year old).
The program nippily calculates how bad the model is and when it stops getting better, it stops and delivers the parameters for the line. We can check the parameters graphically (because like all good data scientists we inspected our data at the beginning).
We can also use the contour and surf functions of Octave to plot the improvements in our estimations so that we can see what the program actually did.
I’ve written this to make sure I can do it again. I hope it is useful but the code itself is embargoed because otherwise future students of Stanford will not have to do any work and will not learn anything!