<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>flowing motion &#187; SOCIAL MEDIA &amp; IT</title>
	<atom:link href="http://flowingmotion.jojordan.org/category/social-media-it/feed/" rel="self" type="application/rss+xml" />
	<link>http://flowingmotion.jojordan.org</link>
	<description>blog of jo jordan</description>
	<lastBuildDate>Fri, 27 Jan 2012 09:49:08 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>2 useful resources for finding data</title>
		<link>http://flowingmotion.jojordan.org/2012/01/21/2-useful-resources-for-finding-data/</link>
		<comments>http://flowingmotion.jojordan.org/2012/01/21/2-useful-resources-for-finding-data/#comments</comments>
		<pubDate>Sat, 21 Jan 2012 16:19:30 +0000</pubDate>
		<dc:creator>Jo Jordan</dc:creator>
				<category><![CDATA[SOCIAL MEDIA & IT]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data journalism]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[open data]]></category>
		<category><![CDATA[scraping]]></category>

		<guid isPermaLink="false">http://flowingmotion.jojordan.org/?p=5111</guid>
		<description><![CDATA[#1  A flowchart for finding the right data source A flowchart made by data journalist, Paul Bradshaw #2 A community&#8230;]]></description>
			<content:encoded><![CDATA[<h2>#1  A flowchart for finding the right data source</h2>
<p>A <a title="Flowchart data journalism" href="http://www.flickr.com/photos/onlinejournalismblog/6078887277/in/photostream">flowchart made by data journalist Paul Bradshaw</a></p>
<h2>#2 A community listing data sources and extraction tools</h2>
<p>A question &amp; answer community for <a title="gethedata" href="http://getthedata.org/questions/">data sources and extraction tools</a> (new in January 2012)</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingmotion.jojordan.org/2012/01/21/2-useful-resources-for-finding-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Rebuild eStore on WordPress in half-an-hour</title>
		<link>http://flowingmotion.jojordan.org/2011/12/06/rebuild-estore-on-wordpress-in-half-an-hour/</link>
		<comments>http://flowingmotion.jojordan.org/2011/12/06/rebuild-estore-on-wordpress-in-half-an-hour/#comments</comments>
		<pubDate>Tue, 06 Dec 2011 19:40:39 +0000</pubDate>
		<dc:creator>Jo Jordan</dc:creator>
				<category><![CDATA[SOCIAL MEDIA & IT]]></category>
		<category><![CDATA[eStore]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://flowingmotion.jojordan.org/?p=5104</guid>
		<description><![CDATA[In my current mood, I do not recommend the eStore plugin but if you must, pay your money and do&#8230;]]></description>
			<content:encoded><![CDATA[<p>In my current mood, I do not recommend the eStore plugin, but if you must, pay your money and do this. I had to rebuild my store. Now that I know how, it should take half an hour. It took me over 9 hours, with back and forth to the vendors. Here are some tips to help you understand what is involved.</p>
<p>So calm down, read carefully, and work carefully. 30 minutes should be ample especially with some preparation long before you get into trouble. So I will write as if you are building the eStore from the very beginning.</p>
<h2>#1 License and commercial details</h2>
<p>• Keep two pieces of paper – the email with the download link and the Paypal transaction.</p>
<p>• Keep the email address you used and the Paypal transaction number – they work as your permanent license. You need them to communicate with eStore.</p>
<p>• I have stored them in various places like my email, on the CD where a backup of the plugin is stored, in my diary in case I am away from my desk when it crashes – which it will it seems.</p>
<h2>#2 Help and forum</h2>
<p>Now sign up to the forum and change your password. Again you don’t want to do this when your shop is collapsing and you are under pressure.</p>
<h2>#3 Supercache &#8211; not</h2>
<p>Then download the plugin and use it. Don’t use Supercache even if your hosting service says to. And even though eStore says things like “if you are using Supercache”. They mean “please never use Supercache”.</p>
<p>If you are in Europe, don’t use Supercache. It stops people changing their details in your cart and shows the details to other customers. Without stopping to worry too much about it, it seems that this breaches Privacy laws horribly.</p>
<p>And don’t delete Supercache if you find you have it running with eStore, because eStore will then break your WordPress dashboard.</p>
<p>You must deactivate, if not delete eStore, take down Supercache, and rebuild eStore. Horrible, huh?</p>
<h2>#4 Rebuild eStore</h2>
<p>Rebuilding eStore is much easier than eStore makes out. This is what you need to know.</p>
<p>Your WordPress site has two parts: the code is loaded up in one part, which you can see using FTP. The content of your posts is loaded into a MySQL database, which is accessed through phpMyAdmin.</p>
<p>When you back up your WordPress site, only the MySQL database is getting backed up. If you want to restore your database, you still need a skeleton WordPress site to house it. You can always rebuild a WordPress site from scratch and put back the themes, modifications and plugins. So remember to back up any themes that you have bought, any child theme you have written, and to list the plugins you use and any licenses that you have, like the one for your spam catcher. This is not stuff to leave till tomorrow. Always do it immediately.</p>
<p>So let’s assume you do have your MySQL backed up, and you do have the modifications to your WordPress theme backed up and notes of what is where on your website.</p>
<p>To rebuild eStore, you also need a good copy of their code. If you bought it recently, you have one. If it is a few months old, get the commercial details together and go to their forum to hunt, and I mean hunt, for a link for automatic updates. Use the commercial details to get updated copies of the plugin.</p>
<p>Save the up-to-date plugins somewhere on your C drive (remembering to back them up on CD later). Now what you are going to do is wipe out the offending plugins and write back the code for the plugins.</p>
<p>Use an FTP client such as FileZilla to look at the WordPress PHP for your website and navigate to wp-content/plugins. Delete the wp-cart-for-digital-products folder and any offending caches. You can do that because only the code hangs in those folders. The details of your shop have been stored in your MySQL database (which is backed up anyway, right?).</p>
<p>Now you can transfer the new plugin from your hard drive to the folder where the old eStore hung out. And all should be good.</p>
<p>The key is to be clear where everything lives and that the details of your shop are in MySQL and the code for the plugin is in what you see in FTP (your theme is also there). The only shop assets you can see in FTP are under wp-content/uploads. If you are selling digital goods, that’s where they are. All the tetchy little details of the shop and who bought what are in the MySQL database with your posts, comments, users etc.</p>
<p>I hope this helps. Rebuilding eStore should take about half-an-hour. It took me 9 hours. It needn’t.</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingmotion.jojordan.org/2011/12/06/rebuild-estore-on-wordpress-in-half-an-hour/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Down-to-earth principal components analysis in 5 steps</title>
		<link>http://flowingmotion.jojordan.org/2011/12/04/down-to-earth-principal-components-analysis-in-5-steps/</link>
		<comments>http://flowingmotion.jojordan.org/2011/12/04/down-to-earth-principal-components-analysis-in-5-steps/#comments</comments>
		<pubDate>Sun, 04 Dec 2011 21:43:22 +0000</pubDate>
		<dc:creator>Jo Jordan</dc:creator>
				<category><![CDATA[SOCIAL MEDIA & IT]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Octave]]></category>
		<category><![CDATA[principal components analysis]]></category>
		<category><![CDATA[SVD]]></category>

		<guid isPermaLink="false">http://flowingmotion.jojordan.org/?p=5100</guid>
		<description><![CDATA[This post is a step-by-step, practical, guide to principal components analysis.  It’s very hands-on and “common sensical”.  If any experts&#8230;]]></description>
			<content:encoded><![CDATA[<p>This post is a step-by-step, practical, guide to principal components analysis.  It’s very hands-on and “common sensical”.  If any experts out there spot an egregious error that would horribly mislead a beginner, please do let me know.</p>
<p>I’ll simply work through 4 steps and then sum up as 5 steps in a slightly different order.</p>
<h3>#1 Data</h3>
<p>I always like to start any data problem by thinking about my data rather concretely.</p>
<p>In a PCA problem, imagine a spreadsheet which has hundreds or even thousands of columns – far too many for comfort.</p>
<p>On each row of the spreadsheet is a case, or ‘training example’ in machine learning parlance.</p>
<p>What we want to do is to find the columns that matter.  Alternatively, we ask “which columns could we bundle together into computed columns so that we have a more manageable number?”</p>
<p>In short, this procedure will tell us which columns we can leave out and which ones we should bundle together.  The bundles will be the principal components.  And the final answer will tell us how much precision we have lost by bundling scores rather than using original raw data.</p>
<p>So, we begin with data which is laid out in matrix X with m rows and n columns (m x n).</p>
<h3>#2 Normalize the data</h3>
<p>The data comes in all sizes – little numbers and big numbers, very spread out and bunched together.  First we smooth the data in the same way that tests are normed at college.  Simply, we convert each column to a mean of zero and a standard deviation of one.</p>
<p>To be very clear how this works, we take each cell and adjust the number in the cell depending on the other numbers in the column. If we were working with spreadsheets, we would open another spreadsheet of exactly the same number of rows and columns and add this formula to each cell. So for cell A1 we would have:</p>
<p>= (Sheet1!A1 – mean(ColumnA)) / StdDev(ColumnA)</p>
<p>When we calculate the mean and stddev of the columns in the new spreadsheet, they will all be 0 and 1 respectively.</p>
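<p>The same spreadsheet formula can be sketched in a few lines of plain Python (not Octave). This is only an illustration: the function name <code>normalize_columns</code> and the toy data are mine, and I have assumed the population standard deviation (divide by m), which the post does not specify. It also assumes no column is constant, otherwise the standard deviation would be zero.</p>

```python
from statistics import mean, pstdev


def normalize_columns(rows):
    """Return a copy of rows with every column rescaled to mean 0 and SD 1."""
    columns = list(zip(*rows))               # transpose: one tuple per column
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]   # assumption: population SD (divide by m)
    return [
        [(value - mu) / sd for value, mu, sd in zip(row, means, sds)]
        for row in rows
    ]


# Toy data: 3 cases (rows) and 2 features (columns) on very different scales.
X = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
X_norm = normalize_columns(X)
```

<p>After this step both columns of <code>X_norm</code> hold the same three values, even though the raw columns differed by a factor of 200. That is the point of normalizing.</p>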
<h3>#3 Principal Component Analysis</h3>
<p>Now we find the ‘bundles’ of columns.</p>
<p>In my old days of statistics packages, the program would return a table which listed all the columns down the page and then produced factor loadings or weights for a whole heap of factors (or bundles) in more columns.  The number and the sign would tell you the weight of the original data column in the new ‘bundle’.  The weights varied from -1 through 0 to +1.</p>
<p>In Octave, a free program that is largely compatible with Matlab, there is a facility to do PCA in two steps:</p>
<h4>Step #3 Part One</h4>
<ul>
<li>Compute what is called the covariance matrix.  Simply imagine taking a copy of the spreadsheet (the second one), multiplying it cell to cell (A1 to A1) and taking the sum of those squares in the new column A, then A1 to B1 and taking the sum of the product in column B, then A1 to C1.. etc until we have a new row with N columns, each got by multiplying two columns and adding up the product. You’ll have to try it yourself.  I’ll have to get out pen and paper when I read this a year from now.</li>
<li>Then we do the same starting with Col B and Col A (that’s a repeat, I know.. stick it in), B to B, B to C  etc.</li>
<li>Until we have a new matrix with N columns and N rows.  (Yes – this is what computers are for).</li>
</ul>
<ul>
<li>And one more sub- step – divide every cell by the original number of cases or training examples (i.e., rows in the very first spreadsheet).</li>
</ul>
<p>That’s the covariance matrix.  In Octave, which uses linear algebra, it is much easier.  You just tell the machine to multiply the transpose of the normalized data by the normalized data and divide by m – one line of code.</p>
<p>CovarianceMatrix = (X' * X) / m</p>
<p>(That’s what computers are for!.. the explanation was just so you have a concrete idea of where the data came from and what happened to it).</p>
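<p>For anyone who wants to see the cell-by-cell version spelled out, here is a rough plain-Python equivalent of that one line of Octave. The function name <code>covariance_matrix</code> and the tiny data set are made up for illustration, and the input is assumed to be normalized already.</p>

```python
def covariance_matrix(X_norm):
    """Compute (X' * X) / m for a list-of-rows matrix with m rows and n columns."""
    m = len(X_norm)
    columns = list(zip(*X_norm))  # one tuple per original column
    # Entry (i, j) is the sum of column i times column j, divided by m.
    return [
        [sum(a * b for a, b in zip(col_i, col_j)) / m for col_j in columns]
        for col_i in columns
    ]


# Hypothetical normalized data: two columns that move in opposite directions.
X_norm = [[-1.0, 1.0], [1.0, -1.0]]
sigma = covariance_matrix(X_norm)
```

<p>The result is square, with as many rows and columns as the data had columns, and it is symmetric, which is the "repeat" mentioned in the bullets above.</p>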
<h4>Step #3 Part Two</h4>
<p>The second step in PCA is to find the bundles using a function that is built into Octave called the ‘singular value decomposition’ or SVD.</p>
<p>All you do is ask for it and it ‘returns’ three matrices, U, S and V and we are going to use U and S.</p>
<p>U gives us a matrix exactly the same size as the covariance matrix.  Each column now refers to a ‘bundle’. The rows still refer as before to the features (that is the original columns in the data matrix and the normalized data matrix. Have a quick check.  I’ll wait!).</p>
<p>Note we have as many bundles as we had columns right at the start, but we don’t want to use all the bundles (columns in the U matrix), otherwise we will have exactly the same number of columns as when we started – no point, hey?</p>
<p>So we will only use as many, starting from the left, as we need.  We will decide how many to use on the basis of the S matrix.</p>
<p>Let’s do that later and  consider further what U actually tells us.  As we read down column one, cell A1 tells us the loading of original column A, or the 1<sup>st</sup> feature, on the new bundle 1.  Cell A2 tells us the loading of original column B or the 2<sup>nd</sup> feature, on new bundle 1. And so on.  Got it?  Try it out.</p>
<p>So what can we do with this?  Two things -</p>
<ul>
<li>We can see which of our original columns were the most important.  They are the ones with the biggest numbers (ignoring the sign) in the column on the left and subsequent columns as you move right.   A positive number means the higher the original number the higher would be the bundle score. A negative number in this new table means the higher the number in the original table, the lower would be the bundle score.  Good to know if two of our original columns pull in opposite directions. So that is the first use – to understand the original columns and how they hang together.</li>
<li>The second use is to create a simplified data set.  Ok, we hate it when bureaucrats create scores for us – like a credit rating. But imagine the rows are just pictures and the columns were the pixels or colors at 10 000 points on a page – collapsing 10 000 columns into 1000 columns or 100 columns could be handy for data compression.  (We’ll work out later how much data is lost – or how much blur is added!)  So how do we recreate the scores?  We will come back to this – let’s stick with understanding what those numbers in the U matrix mean. All we have to do to get a score for the bundle is take the number in the U matrix for original column A (now in row 1) and multiply it by the score for the first case in column A (do it on a bit of paper).  Do that for the whole row for the case times the whole column in U (row of original data times column in the U matrix), add it up, and we get a ‘bundle’ score for the first case.  That will go in cell A1 in a new table. The cell is the score for case 1 on bundle 1.</li>
<li>Then we can do the same for the second case, then the third.  And we will make a new column of bundled scores for the whole data set.</li>
<li>Then we do the same for the second bundle (it’s OK – that’s what computers are for).</li>
<li>Finally we have a matrix with as many rows as we have cases or training examples and as many columns as we have new bundles.  This can be useful when we need compressed data and we don’t mind a bit of blur in the data.</li>
</ul>
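<p>The "row of data times column of U" arithmetic in the list above can be sketched in plain Python like this. The U matrix below is hypothetical (I typed it in; it was not produced by an SVD of real data), and <code>project</code> is just a name I chose.</p>

```python
def project(X_norm, U, k):
    """Score every case on the first k bundles: row of data times column of U."""
    bundles = list(zip(*U))[:k]  # the first k columns of U, counting from the left
    return [
        [sum(x * u for x, u in zip(row, bundle)) for bundle in bundles]
        for row in X_norm
    ]


# Hypothetical U for two features; rows are features, columns are bundles.
U = [[0.6, -0.8],
     [0.8, 0.6]]
# Hypothetical normalized data: two cases, two features.
X_norm = [[1.0, 2.0], [-1.0, 0.5]]

# Keep only the first bundle, so each case collapses to a single score.
Z = project(X_norm, U, k=1)
```

<p>Here <code>Z</code> has one row per case and one column per bundle kept, which is the compressed table described in the last bullet.</p>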
<h3>#4 How many bundles do we need?</h3>
<p>So now we come back to how many bundles do we need?  Well firstly, a lot fewer than the number of columns that we started with. That’s the whole idea of this exercise – to get that original spreadsheet a lot, lot smaller.</p>
<p>I mentioned before that we use the data for the second matrix, S, that is churned out by the SVD function in Octave, to work out how many bundles to keep.</p>
<p>This S matrix is also exactly the same size as the covariance matrix which was square with the same number of rows and columns as we had columns in the first, first, first data table.</p>
<p>This time, though, we only have data in the diagonal from top left to bottom right.  Every other cell is zero.  So that means there is a number for original row 1 and column A; row 2 and column B; etc.  Gee, couldn’t we have a column?  Yes we could. It’s laid out this way to do with the way machines do arithmetic. It is easier for the machine to pull out the matching diagonal from the U matrix for example.  But that’s not our problem right now.  We just want to know how to use these numbers to work out how many bundles to keep.</p>
<p>Well, these numbers represent how much variance is explained by each bundle.  The very first number (top left) tells us how much variance in the whole original data set is explained by the first bundle.  To work out what % of variance is accounted for by this bundle, we take all the numbers on the diagonal and add them up to give us a number representing all the variance in the whole data set.  Then we take the number for the first bundle (top left) and work it out as a percentage of the whole lot. If the percentage is less than 99% (.99), then we add another bundle (well, we add the percentage of another bundle, or we add the numbers of the two bundles and divide by the sum of all the numbers).  We just keep going until we have enough bundles to account for 99% of the original variance.  (So in plain terms, we have allowed for 1% of blurring).</p>
<p>Oddly, only 1% of blurring might allow us to lose a lot of columns.  Or more precisely, when we compute new scores, one for each bundle in the final solution, we might have far fewer bundles than original number of columns, but still account for 99% of the original amount of detail.</p>
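<p>Here is a small plain-Python sketch of that "keep adding bundles until we reach 99%" rule. The diagonal values are invented for the example, and <code>bundles_needed</code> is my own name for the helper.</p>

```python
def bundles_needed(s_diagonal, target=0.99):
    """Return the smallest k whose first k diagonal values reach the target share."""
    total = sum(s_diagonal)
    running = 0.0
    for k, value in enumerate(s_diagonal, start=1):
        running += value
        if running / total >= target:
            return k
    return len(s_diagonal)


# Hypothetical diagonal of S, largest value first.
s_diagonal = [6.0, 3.0, 0.95, 0.04, 0.01]
k = bundles_needed(s_diagonal)
```

<p>With these made-up numbers the first two bundles cover 90% and the first three cover 99.5%, so three of the five columns are enough.</p>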
<h3> That’s it…that’s PCA.</h3>
<p>#1 Get your data in a spread sheet (cases on rows, features in columns)</p>
<p>#2 Normalize the data so each column has a mean of 0 and an SD of 1</p>
<p>#3 Use a built-in function to return a matrix of eigenvectors (U) and variance (S)</p>
<p>#4 Decide how many ‘bundles’ of features to keep in (account for 99% of variance)</p>
<p>#5 Compute new scores – one score for each case for each bundle (now the columns)</p>
<p>And what do we do with this?</p>
<p>#1 We understand how columns hang together and what we can drop and only lose 1% of detail (or add 1% of blur)</p>
<p>#2 We can use the new scores to do other things like feed them into a prediction program or supervised learning program. The advantage is not to improve on prediction, btw,  but to cut down on computing costs.</p>
<p>That’s all!</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingmotion.jojordan.org/2011/12/04/down-to-earth-principal-components-analysis-in-5-steps/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>10 steps to build a spam catcher</title>
		<link>http://flowingmotion.jojordan.org/2011/11/24/10-steps-to-build-a-spam-catcher/</link>
		<comments>http://flowingmotion.jojordan.org/2011/11/24/10-steps-to-build-a-spam-catcher/#comments</comments>
		<pubDate>Thu, 24 Nov 2011 19:00:47 +0000</pubDate>
		<dc:creator>Jo Jordan</dc:creator>
				<category><![CDATA[SOCIAL MEDIA & IT]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[spam]]></category>
		<category><![CDATA[SVM]]></category>

		<guid isPermaLink="false">http://flowingmotion.jojordan.org/?p=5095</guid>
		<description><![CDATA[Here are the ten broad steps to build a spam catcher Get a sample of emails that are known to&#8230;]]></description>
			<content:encoded><![CDATA[<p>Here are the ten broad steps to build a spam catcher</p>
<ol>
<li>Get a sample of emails that are known to be spam or not spam. Split the sample 60:20:20 to provide a “training” set, a “cross-validation” set and a “test” set.</li>
<li>Turn each email into a list of words by
<ul>
<li>Stripping out headers (if not part of the spam test) and other redundancies</li>
</ul>
<ul>
<li>Running NLP software to record the stem of a word only (for example, the Porter stemmer records both city and cities as citi)</li>
</ul>
</li>
<li>Count the number of times each unique word appears in the sample and order the list so that we can use the top 100 or 10 000 or 50 000 (whatever) to check for spam.  Remember to use stemmed words!</li>
<li>Convert the list of words in each email into a list of look-up numbers by substituting the row number of the word from the dictionary we made in Step 3.</li>
<li>For each email, make another list where row 1 is 1 if the first word in the dictionary is present in the email, and row 2 is 1 if the second word in the dictionary is present in the email. If the word is not present, leave the value for that row as zero. You should now have as many lists as you have emails, each with as many rows as you have words in your spam dictionary.</li>
<li>Run a SVM algorithm to predict whether each email is spam (1) or not spam (0).  The input is the list of 1s and 0s indicating which words are present in the email.</li>
<li>Compare the predictions with the known values and compute the percentage correct.</li>
<li>Compute the predictions on the cross-validation set and tweak the algorithm depending on whether the cross-validation accuracy is too similar to the training accuracy (suggesting the model could be stronger) or too dissimilar (suggesting the model is too strong).</li>
<li>Find the words most associated with spam.</li>
<li>Repeat as required.</li>
</ol>
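<p>Steps 3 and 5 are the easiest to get muddled, so here is a minimal plain-Python sketch of them. It assumes the emails have already been stripped and stemmed (Step 2); the three toy emails and the function names <code>build_dictionary</code> and <code>to_feature_vector</code> are invented for illustration, and the SVM itself is not shown.</p>

```python
from collections import Counter


def build_dictionary(emails, size):
    """Step 3: return the `size` most frequent stemmed words across all emails."""
    counts = Counter(word for email in emails for word in email)
    return [word for word, _ in counts.most_common(size)]


def to_feature_vector(email, dictionary):
    """Step 5: one 0/1 entry per dictionary word, 1 if the word is in the email."""
    present = set(email)
    return [1 if word in present else 0 for word in dictionary]


# Hypothetical emails, already tokenised and stemmed.
emails = [
    ["win", "cash", "now"],
    ["meet", "now"],
    ["win", "win", "prize"],
]
dictionary = build_dictionary(emails, size=3)
features = [to_feature_vector(email, dictionary) for email in emails]
```

<p>Each entry of <code>features</code> is the list of 1s and 0s that Step 6 feeds to the SVM, and every list has the same length as the dictionary.</p>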
]]></content:encoded>
			<wfw:commentRss>http://flowingmotion.jojordan.org/2011/11/24/10-steps-to-build-a-spam-catcher/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Learning curves and modelling in machine learning</title>
		<link>http://flowingmotion.jojordan.org/2011/11/15/learning-curves-and-modelling-in-machine-learning/</link>
		<comments>http://flowingmotion.jojordan.org/2011/11/15/learning-curves-and-modelling-in-machine-learning/#comments</comments>
		<pubDate>Tue, 15 Nov 2011 16:32:45 +0000</pubDate>
		<dc:creator>Jo Jordan</dc:creator>
				<category><![CDATA[SOCIAL MEDIA & IT]]></category>
		<category><![CDATA[learning curves]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[psychology]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://flowingmotion.jojordan.org/?p=5088</guid>
		<description><![CDATA[In this post, I am going to describe what I have just learned from Andrew Ng at Stanford about “learning&#8230;]]></description>
			<content:encoded><![CDATA[<p>In this post, I am going to describe what I have just learned from Andrew Ng at Stanford about “learning curves”.  To a computer scientist, a learning curve is not what you might expect: it describes how well data has been modeled.</p>
<p>I write this as a classically trained psychologist and it is clear that if we are to understand machine learning, we have to watch out for where the thinking of computer scientists differs radically from our own.  This is my commonsensical comparison of the two approaches.  I am writing it down to make sure I have followed what I heard.  It is rough and ready but may help you understand the differences between the two disciplines.</p>
<h3>A learning curve in CS</h3>
<p>Simply, the computer scientists take random samples of data where the first sample is very small, let’s say 1 because that is helpful to understanding the logic, and the last sample is large, let’s say a few thousand.  These are random samples from the same large data set.</p>
<p>Generally, with a sample of 1 up to 3, we can model perfectly.  However, when we try the same model with another sample of the same size, the model will not predict well at all. The amounts of error for the experimental sample and the comparison sample will be hugely different.  So far so good. That’s what we all learned at uni.  Modelling on a small sample is the equivalent of an ‘anecdote’.  Whatever we observed may or may not transfer to other situations.</p>
<p>As we increase our sample size, paradoxically the amount of error in our model increases but the amount of error in our comparison situation decreases.  And ultimately, the error we are making in the two situations converges.  We also know this from uni.</p>
<p>Much of our training goes into getting us to do this and to increasing the sample size so that the error in the hypothetical model goes up, and the error in the comparison model goes down.  Plot this on a piece of paper with error on the y axis and sample size on the x axis.</p>
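<p>To make the plot concrete, here is a rough plain-Python sketch that produces the numbers for such a curve. Everything in it is an assumption of mine for illustration: the data are simulated (a straight line plus noise), the model is a simple least-squares line, and the sample sizes are arbitrary.</p>

```python
import random


def fit_line(points):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx if sxx else 0.0  # a single point gives a flat line through it
    return mean_y - b * mean_x, b


def mean_squared_error(model, points):
    """Average squared gap between the line's prediction and the actual y."""
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in points) / len(points)


random.seed(0)
# Simulated data: y = 2x plus noise with a standard deviation of 1.
data = []
for _ in range(400):
    x = random.uniform(0, 10)
    data.append((x, 2.0 * x + random.gauss(0, 1)))
train, comparison = data[:300], data[300:]

# One row per sample size: (size, error on the sample, error on the comparison set).
curve = []
for size in (1, 2, 5, 20, 100, 300):
    model = fit_line(train[:size])
    curve.append((size,
                  mean_squared_error(model, train[:size]),
                  mean_squared_error(model, comparison)))

for row in curve:
    print(row)
```

<p>With a sample of 1 the error on the sample itself is exactly zero, while the error on the comparison set is typically large. As the sample grows, the two error columns should drift towards each other, which is the convergence described above.</p>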
<p>When the two error rates converge, that is we can explain the future as well as we can explain the present, then we stop and say, “Hey, I have found a scientific law!”</p>
<p>I would say that our willingness to tolerate a more general description of a particular situation so that we can generalize at the same level of accuracy (and inaccuracy) to another situation is one of the hallmarks of uni training. This is so counter-intuitive that many people resist so it takes uni training to get us to do it.</p>
<p>What the computer scientists implicitly point out is that the converse is also true. We are now able to explain the future as badly as we explain the present!  They call this underfitting and suggest that we try another model to see if we can do a better job of explaining the present.  So we will stop increasing the sample size and start playing with the model. We can vary the form of the model, typically moving from a linear to a non-linear model (that is adding more features) and increasing the weights of the parameters (go from a loose floppy kind of model to a stiffer model, if you like).</p>
<p>They do this until the model overfits. That is, until our explanation of the present is very good but the same explanation produces errors in comparison situations.  When they reach this point, they backtrack to a less complicated model (fewer non-linear terms) and decrease the weights of the parameters (take note of a feature but not put too much emphasis on it.)</p>
<p>Once they have found this happy middle ground with a more complicated model, but without the expense of collecting more data, they will try it out on a completely new set of data.</p>
<h3>Break with common practice in psychology</h3>
<p>For any psychologists reading this</p>
<ul>
<li>This kind of thinking provides us with a possibility of getting away from models that have been stagnant for decades.  Many of these models predict the present so-so and the future so-so.  Here is the opportunity to break away.</li>
<li>Note that machine learning specialists use procedures that look like statistics but abandon the central idea of statistics.  They aren’t promising that their original sample was randomly chosen and they aren’t directly interested in the assertion that “if and only if our original sample was random, then what we found in the sample generalizes to other samples that have also been chosen randomly”.  Though they do something similar (taking lots of randomly chosen slices of data from the data they have), they aren’t in the business of asserting the world will never change again.  They have high speed computers to crunch more data when it becomes clear that the world has changed (or that our model of the world is slightly off).</li>
<li>Many of the rules-of-thumb that we were once taught fall away. Specifically, get a large sample, keep the number of features below the size of the sample, keep the model simple – these prescriptions are not relevant once we change our starting point.  All we want to find is the model that can generalize from one situation to another with the least error, and high speed computers allow us both to use more complicated models and to recompute them when the world they describe changes.</li>
</ul>
<p>I am still to see good working examples outside marketing on the one hand and robotics on the other, but it seemed worthwhile trying to describe the mental shift that a classically trained psychologist will go through.  Hope this helps.</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingmotion.jojordan.org/2011/11/15/learning-curves-and-modelling-in-machine-learning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Back propagation for the seriously hands-on</title>
		<link>http://flowingmotion.jojordan.org/2011/11/10/back-propagation-for-the-seriously-hands-on/</link>
		<comments>http://flowingmotion.jojordan.org/2011/11/10/back-propagation-for-the-seriously-hands-on/#comments</comments>
		<pubDate>Thu, 10 Nov 2011 22:05:10 +0000</pubDate>
		<dc:creator>Jo Jordan</dc:creator>
				<category><![CDATA[SOCIAL MEDIA & IT]]></category>
		<category><![CDATA[back propagation]]></category>
		<category><![CDATA[neural network]]></category>

		<guid isPermaLink="false">http://flowingmotion.jojordan.org/?p=5082</guid>
		<description><![CDATA[I have just finished the Stanford back propagation exercise, and to put it mildly, it was a ****. So is&#8230;]]></description>
			<content:encoded><![CDATA[<p>I have just finished the Stanford back propagation exercise, and to put it mildly, it was a ****.</p>
<p>So is back propagation complicated?  And indeed what is it?</p>
<p>These are my notes so that I don’t have to go through all the pain when I do this again.  I am not an expert and the agreement with Stanford is that we don’t give away the answer particularly at the level of code.  So use with care and understand that this can’t tell you everything.  You need to follow some lecture notes too.</p>
<h3>Starting from the top: What is back propagation?</h3>
<p>Back propagation is a numerical algorithm that allows us to calculate an economical formula for predicting something.</p>
<p>I am going to stick to the example that Stanford uses because the world of robotics seems infinitely more useful than my customary field of psychology. Professor Ng uses an example of handwriting recognition much as the Royal Mail must use for reading postal codes.</p>
<p>We scan a whole lot of digits and save each digit as a row of 1’s and 0’s representing ink being present on any one of 400 (20&#215;20) pixels.  Can you imagine it?</p>
<p>Other problems will always start the same way &#8211; with many cases or training examples, one to each row; and each example described by an extraordinarily large number of features. Here we have 400 features or columns of X.</p>
<p>The second set of necessary input data is one last column labeling the row.  If we are reading digits, this column will be made up of digits 0-9 (though 0 is written down as 10 for computing reasons).  The digit is still 0 in reality and if we reconstructed the digit by arranging the 400 pixels, it will still be seen to the human eye as 0.</p>
<p>The task is to learn a shorthand way for a computer to see a similar scan of 400 pixels and say, aha, that’s a 1, or that’s a 2 and so on.</p>
<p>Of course the computer will not be 100% accurate but it will get well over 95% correct as we will see.</p>
<p>So that is the input data: a big matrix with one example on each row and one feature in each column, and with the last column being the correct value – the digit (10 for 0, then 1-9) in this case.</p>
<h3>How does back propagation work?</h3>
<p>Back propagation programs work iteratively without any assumptions about statistics that we are used to in psych.</p>
<p>The computer boffins start by taking a wild guess of the importance of each pixel for a digit, and see what the computer would predict with those weights.  That is called the forward pass.</p>
<p>Then based on what the computer got right or wrong, they work backwards to adjust the weights or importance of each pixel for each digit.</p>
<p>And remembering that computers are pretty fast, the computer can buzz back and forth asking “how’s this?”.</p>
<p>After a set number of trials, it stops improving itself and tells us how well it can read the digits, i.e., compares its answers to the right answers in the last column of our input data.</p>
<h3>What is a hidden layer?</h3>
<p>Back propagation also has another neat trick.  Instead of using pixels to predict digits directly, it works with an intermediate or hidden layer.  So the pixels predict some units in the hidden layer and the hidden layer predicts the digits.  Choosing the number of units in the hidden layer is done by trying lots of versions (10 hidden units, 50 hidden units, etc) but I guess computer scientists can pick the range of the right answer as they get experienced with real world problems.</p>
<p>In this example, the solution worked with 25 hidden units in a single hidden layer.  That is, 400 pixels were used to make predictions about 25 units, which in turn predict which of 10 digits made the data.</p>
<p>The task of the computing scientist is to calculate the weights from the pixels to the hidden units and from the hidden units to the digits and then report the answer with a % of “training accuracy” – over 95%, for example.</p>
<h3>Steps in back propagation</h3>
<p>We have already covered the first four steps.</p>
<h3>Step 1: Training data</h3>
<p>Get lots of training data with one example on each row and lots of features for each example in the columns.</p>
<p>Make sure the row is labeled correctly in the last column.</p>
<h3>Step 2:  Decide on the number of units in the hidden layer</h3>
<p>Find out what other people have tried for similar problems and start there (that’s the limit of my knowledge so far).</p>
<h3>Step 3: Initialize some weights</h3>
<p>I said before, we start with a wild guess.  Actually we start with some tiny numbers but the numbers are random.</p>
<p>We need one set of weights linking each pixel to each hidden unit (25 x 400)* and another set linking each hidden unit to each digit (10 x 25)*.</p>
<p>The asterisk means that a bias factor might be added in raising one or the other number by 1.  To keep things simple, I am not going to discuss the bias factor. I’ll just flag where it comes up.  Be careful with them though because I am tired and they might be wrong.</p>
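<p>A minimal sketch of this step, with the bias column included so the tables come out as 25 x 401 and 10 x 26. The function name random_weights and the range of 0.12 either side of zero are my own choices for illustration, not something these notes prescribe.</p>

```python
import random

def random_weights(rows, cols, epsilon=0.12):
    """Return a rows x cols table of small random numbers in [-epsilon, epsilon]."""
    return [[random.uniform(-epsilon, epsilon) for _ in range(cols)]
            for _ in range(rows)]

random.seed(0)  # fixed seed so the sketch is repeatable
theta1 = random_weights(25, 401)  # pixels (400 + 1 bias) to 25 hidden units
theta2 = random_weights(10, 26)   # hidden units (25 + 1 bias) to 10 digits
```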
<h3>Step 4: Calculate the first wildly inaccurate prediction of the digits</h3>
<p>Use the input data and the weights to calculate initial values for the hidden layer.</p>
<p>Our input data of training examples and features (5000 examples by 400 pixels) is crossed with the appropriate initial random weights (25 x 400) to get a new matrix of hidden layer values.  Each training example will have 25 new values (5000 x 25)*.</p>
<p>Then repeat again from the hidden layer to the layer of digits or output layer making another matrix of 5000 x 10.</p>
<p>In the very last step, the calculated value is converted into a probability with the well-known sigmoid function.  It would be familiar if you saw it.  I’ll try to patch it in.</p>
<p>The values calculated at the hidden layer are converted into these probability-type values and they are used for the next step and the final answer is converted in the same way.</p>
<p>Now we have a probability type figure for each of 10 digits for each training example (5000 x 10)*.</p>
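<p>Here is a rough sketch of the forward pass for a single example, on a network small enough to read: 3 pixels, 2 hidden units and 2 outputs instead of 400, 25 and 10. The weights are made up, the helper name layer is mine, and the bias unit is handled by sticking a 1 on the front of each row of inputs. The sigmoid function is the standard 1 / (1 + e^-z).</p>

```python
import math

def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights):
    """One layer: prepend the bias unit (1.0), weight, sum, then apply sigmoid."""
    with_bias = [1.0] + inputs
    return [sigmoid(sum(w * a for w, a in zip(row, with_bias))) for row in weights]

# Tiny made-up network: 3 pixels, 2 hidden units, 2 outputs.
pixels = [1.0, 0.0, 1.0]
theta1 = [[0.1, -0.2, 0.3, 0.05], [-0.1, 0.2, -0.3, 0.1]]   # 2 x (3 + 1)
theta2 = [[0.2, -0.1, 0.4], [-0.3, 0.25, 0.1]]             # 2 x (2 + 1)

hidden = layer(pixels, theta1)   # 2 values between 0 and 1
output = layer(hidden, theta2)   # one probability-type value per output
```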
<h3>Step 5: Find out how well we are doing</h3>
<p>In this step, we first convert the correct answer (which was a 1, or 5, or 7 or whatever the digit was) into 1’s and 0’s – so we have another matrix (5000 x10).</p>
<p>We compare this with the one we calculated in Step 4 using simple subtraction and make yet another matrix (5000 x 10).</p>
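<p>For one row, the conversion and the subtraction might look like the sketch below. The prediction row is invented, and the function names one_hot and errors are mine. Labels run from 1 to 10, with 10 standing for the digit 0.</p>

```python
def one_hot(label, num_labels=10):
    """Turn a label 1..10 into a row of 0s with a single 1 (10 stands for digit 0)."""
    return [1 if k == label else 0 for k in range(1, num_labels + 1)]

def errors(predicted, label):
    """Subtract the correct row of 1s and 0s from the predicted row."""
    return [p - t for p, t in zip(predicted, one_hot(label))]

# A made-up prediction row for one example whose correct label is 5.
predicted = [0.1, 0.0, 0.2, 0.1, 0.7, 0.0, 0.1, 0.0, 0.0, 0.1]
row_error = errors(predicted, 5)
```

<p>Doing this for all 5000 rows gives the 5000 x 10 matrix of errors.</p>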
<h3>Step 6:  The backward pass begins</h3>
<p>So far so good.  All pretty commonsensical. The fun starts when we have to find a way to adjust those guessed weights that we used at the start.</p>
<p>Staying at a commonsensical level, we will take the error that we have in that big 5000 x 10 matrix calculated in Step 5 and partition it up so we can ‘track’ the error back to training examples and hidden units and then from hidden units to pixels. And this is what the computing scientists do.</p>
<p>They take one training example at a time (one of the 5000 rows), pick out the error for digit 1, and break it up.  And do it again for digit 2 up to digit 0 (which we input as 10).</p>
<h3>Step 7: Working with one training example at a time</h3>
<p>It might seem odd to work with one training example at a time, and I suspect that is just a convenience for noobs, but stick with the program.  If you don’t, life gets so complicated, you will feel like giving up.</p>
<p>So take example one, which is row 1, and do the stuff. And repeat for row 2, and so on until you are done.</p>
<p>In computing this is done with a loop: for i = 1:m, where m is the number of training examples or rows (5000 in our case).  The machine is happy doing the same thing 5000 times.</p>
<p>So we do everything we did before this step but we start by extracting our row of features:  our X or training data now has 1 row and 400 features (1 x 400)*.</p>
<p>And we still have one label, or correct answer but remember we will turn that into a row of 1’s and 0’s.  So if the right answer is 5, the row will be 0000100000 (1 x10).</p>
<p>And we can recalculate our error, or uplift the right row from the matrix of errors that we calculated in Step 5.  The errors at the ‘output_layer’ will be a row of ten numbers (1 x 10).  They can be positive or negative and the number bit will be less than 1.</p>
<h3>Step 8: Now we have to figure out the error in the hidden layer</h3>
<p>So we know our starting point of pixels (those never get changed), the correct label (never gets changed) and the error that we calculated for <em>this particular forward pass or iteration.  </em>After we adjust the weights and make another forward pass, our errors change of course and hopefully get smaller.</p>
<p>We now want to work on the hidden layer, which of course is hidden. Actually it doesn’t exist.  It is a mathematical convenience to set up this temporary “tab”.  Nonetheless, we want to partition the errors we saw at the output layer back to the units in the hidden layer (25 in our case)*.</p>
<p>Just like we had at the output layer, where we had one row of errors (1 x 10), we now want a row or column of errors for the hidden layer (1 x25  or 25 x 1)*.</p>
<p>We work out this error by taking the weights we used in the forward pass and multiplying by the observed error and weighting again by another probabilistic value.  This wasn’t explained all that well. I’ve seen other explanations and it makes intuitive sense.  I suspect our version is something to do with computing.</p>
<p>So here goes.  To take the error for hidden layer unit 1, we take the ten weights that we had linking that hidden unit to each digit.  Or we can take the matrix of weights (10 x 25)* and match them against the row of observed errors (1 x 10).  To do this with matrix algebra, we turn the first matrix on its side (25 x 10)* and the second on its side (10 x 1), and the computer will not only multiply, it will add up as well, giving us one column of errors (25 x 1)*.  Actually we must weight each of these by the probabilistic type function that we called sigmoidGradient.</p>
<p>We put into sigmoidGradient a row for the training example that was calculated earlier on as the original data times the weights between the pixels and the hidden layer ((5000 x 400*)  times  (25 x 400*))– the latter is tipped on its side to perform matrix algebra and produce a matrix of 25* values for each training example (5000 x 25*).</p>
<p>Picking up the column of data that we calculated one paragraph up, we now have two columns (25* x1) which we multiply element by element (the .* operator, so we can do multiplication of columns like we do in Excel).</p>
<p>Now we have a column of errors for the hidden layer for this one particular training example (25* x1).  (Our errors at the output layer for this example were in a row (1 x 10).)</p>
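<p>To make Step 8 concrete, here is a sketch for one training example on a made-up network with 2 hidden units and 3 output units. I am assuming that sigmoidGradient is the slope of the sigmoid, s * (1 - s), and I skip the bias column of the weights rather than putting it in and taking it out again. The names hidden_error and z_hidden (the hidden layer values before the sigmoid is applied) are mine.</p>

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_gradient(z):
    """Slope of the sigmoid at z; largest (0.25) at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def hidden_error(theta2, output_error, z_hidden):
    """Share the output errors back across the hidden units.

    theta2 has one row per output unit; its first column belongs to the
    bias unit and is skipped, so one error per real hidden unit comes back.
    """
    result = []
    for j, z in enumerate(z_hidden):
        # Add up this hidden unit's weight to each output times that output's error.
        back = sum(row[j + 1] * err for row, err in zip(theta2, output_error))
        result.append(back * sigmoid_gradient(z))
    return result

# Made-up sizes: 2 hidden units, 3 output units.
theta2 = [[0.1, 0.4, -0.2], [0.0, -0.3, 0.5], [0.2, 0.1, 0.1]]
output_error = [0.5, -0.25, 0.1]
z_hidden = [0.0, 1.0]
delta_hidden = hidden_error(theta2, output_error, z_hidden)
```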
<h3>Step 9: Figure out how much to adjust the weights</h3>
<p>Now we know how much error is in the output layer and the hidden layer, we can work on adjusting the weights.</p>
<p>Remember we have two sets of weights.  Between the output and hidden layer we had (10 x 25*) and between the input layer and the hidden layer, we had (25 x 400*). We deal with each set of weights separately.</p>
<p>Taking the smaller one first (for no particular reason but that we start somewhere), we weight the values of the hidden layer with the amount of error in the output layer.  Disoriented?  I was.  Let’s look again what we did before.  Before we used the errors in the output layer to weight the weights between output and hidden layer and we weighted that with a probabilistic version of input data times the weights coming between input and hidden layers.  That seemingly complicated calculation produced a set of errors – one for each hidden unit – just for this training example because we are still working with just one row of data (see Step 8).</p>
<p>Now we are doing something similar but not the same at all. We take the same differences from the output layer (1 x10) and use them to weight the values of the hidden layer that we calculated on the forward pass (1&#215;25*).  This produces (and this is important) a matrix that will have the same proportions as the weights between the hidden and output layer.  So if we have 10 output possibilities (as we do) and 25* units in the hidden layer, then at this stage we are calculating a 10 x 25* matrix.</p>
<p>So for each training example (original row), we have 250 little error scores, one for each combination of output and hidden units (in this case 10&#215;25*).</p>
<p>Eventually we want to find the average of these little errors over all our training examples (all 5000), so we whisk this data out of the for loop into another matrix.  As good programmers, we set this up before and filled it up with zeros (before the for loop started).  As we loop over training examples, we just add in the numbers and we get a total of errors over all training examples (5000) for each of the combos of hidden unit and output unit (10 x25*).</p>
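<p>The "every output error times every hidden value, then add into a running total" idea can be sketched like this, with 2 output units and 3 hidden values (bias included) instead of 10 and 26. The numbers are invented and the helper names outer and add_into are mine.</p>

```python
def outer(column, row):
    """Every entry of column times every entry of row: a len(column) x len(row) table."""
    return [[c * r for r in row] for c in column]

def add_into(total, table):
    """Add table into the running total, cell by cell."""
    for i, table_row in enumerate(table):
        for j, value in enumerate(table_row):
            total[i][j] += value

# Made-up sizes: 2 output units, 3 hidden values (bias unit included).
examples = [
    ([0.5, -0.5], [1.0, 0.2, 0.8]),   # (output errors, hidden values)
    ([0.1, 0.3], [1.0, 0.6, 0.4]),
]
total = [[0.0] * 3 for _ in range(2)]  # zeros, set up before the loop
for output_error, hidden_values in examples:
    add_into(total, outer(output_error, hidden_values))
```

<p>The total has the same shape as the table of weights between the hidden and output layers, which is the point of the exercise.</p>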
<h4>And doing it again</h4>
<p>We have a set of errors now for the connections between hidden and output layers. We need to do this again for the connections between the input layer and the hidden layer.</p>
<p>We already have the errors for the hidden layer (25* x1) (see Step 8).  We use these to weight the input values (or maybe we should think of that the other way round – we use the input values to weight the differences).</p>
<p>We take the errors for the hidden layer (25 x1) and multiply by the row of original data ( 1 x 400*) and we will get a matrix of (25 x 400*) – just like our table of weights!  You might notice I did not put an asterisk on the 25 x1 matrix.  This is deliberate.  At this point, we take out the bias factor that we put in before.</p>
<p>We do the same trick of storing the matrix of error scores (25 x 400*) in a blank matrix that we set up earlier and then adding the scores for the next training example, and then the next as we loop through all 5000.</p>
<h3>Step 10: Moving on</h3>
<p>Now we have got what we want: two matrices, exactly the same size as the matrices for the weights ( 25 x 400* and 10 x 25*).  Inside these matrices are the errors added up over all training examples (5000).</p>
<p>To get the average, we just have to divide by the number of training examples (5000 in this case). In matrix algebra we just say – see that matrix? Divide every cell by m (the number of training examples). Done.</p>
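<p>In a language without matrix algebra built in, "divide every cell by m" is a one-liner too. A tiny sketch with made-up totals and m = 5:</p>

```python
def average(total, m):
    """Divide every cell of the table by m, the number of training examples."""
    return [[value / m for value in row] for row in total]

total = [[10.0, -5.0], [2.5, 0.0]]   # made-up summed errors
gradient = average(total, 5)
```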
<p>These matrices – one 25 x 400* and the other 10 x 25* are then used to calculate new tables of weights.  And we rinse and repeat.</p>
<ol start="1">
<li>Forward pass : make a new set of predictions</li>
<li>Back propagation as I described above.</li>
<li>Get two matrices of errors: yay!</li>
<li>Recalculate weights.</li>
<li>Stop when we have done enough.</li>
</ol>
<p>The next questions are how are the weights recalculated and how do we know if we have done enough?</p>
<h3>Recalculating weights</h3>
<p>The code for the back propagation algorithm is contained within a function that has two purposes:</p>
<ul>
<li>To calculate the cost of a set of weights (average error in predictions if you like)</li>
<li>To calculate the matrices that we use to change the weights (also called gradients).</li>
</ul>
<p>The program works in this order</p>
<ul>
<li>Some random weights</li>
<li>Set up the step-size for learning (little or big guesses up or down) and number of iterations (forward/backward passes)</li>
<li>Call a specialized function for ‘advanced optimization’ – we could write a kludgy one but this is the one we are using</li>
<li>The advanced optimizer calls our function.</li>
<li>And then performs its own magic to update the weights.</li>
<li>We get called again, do our thing, rinse and repeat.</li>
</ul>
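<p>The order of play in the list above can be sketched with the kludgy kind of optimizer: plain gradient descent. This is not the advanced optimizer, and the cost function is a stand-in of my own (a simple bowl with its bottom at 3) rather than a neural network, so that the back-and-forth between the optimizer and "our function" is easy to see. The names cost_and_gradient and optimize are mine.</p>

```python
def cost_and_gradient(weights):
    """Stand-in for our function: returns the cost and the gradients.

    The cost here is a simple bowl, (w - 3)^2 summed over the weights,
    not a neural network, so the sketch stays short.
    """
    cost = sum((w - 3.0) ** 2 for w in weights)
    gradient = [2.0 * (w - 3.0) for w in weights]
    return cost, gradient

def optimize(weights, step_size, iterations):
    """Plain gradient descent: call the function, nudge the weights, repeat."""
    for _ in range(iterations):
        cost, gradient = cost_and_gradient(weights)
        weights = [w - step_size * g for w, g in zip(weights, gradient)]
    return weights

start = [0.0, 10.0]                      # arbitrary starting weights
final = optimize(start, step_size=0.1, iterations=100)
```

<p>After enough passes the weights settle near the bottom of the bowl. A fancier optimizer does the same job with cleverer choices of step.</p>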
<h3>How do we know we have done enough?</h3>
<p>Mainly the program will stop at the number of iterations we have set.  Then it works out the error rate at that point – how many digits are we getting right and how many not.</p>
<p>Oddly, we don’t want 100% because that would probably just mean we are picking up something quirky about our data.  Mine eventually ran at around 98% meaning there is still human work and management of error to do if we are machine reading postal codes.  At least that is what I am assuming.</p>
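<p>Working out the error rate is just counting matches. A small sketch with five invented answers, where the function name accuracy is mine:</p>

```python
def accuracy(predicted, actual):
    """Percentage of examples where the predicted label matches the correct one."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * hits / len(actual)

predicted = [5, 10, 3, 7, 1]   # made-up machine answers (10 stands for digit 0)
actual    = [5, 10, 3, 9, 1]   # made-up correct labels
score = accuracy(predicted, actual)
```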
<p>&nbsp;</p>
<p>There you have it.  The outline of the back propagation.  I haven’t taken into account the bias factor but I have stressed the size of the matrices all the way through, because if there is one thing I have learned, that’s how the computing guys make sure they aren’t getting muddled up.  So we should too.</p>
<p>So now I will go through and add an * where the bias factor would come into play.</p>
<p>Hope this helps.  I hope it helps me when I try to do this again.  Good luck!</p>
<h3>The regularization parameter</h3>
<p>Ah, nearly forgot – the regularization parameter.  Those values – those little bits of error in the two matrices that are the same size as the weights – (25&#215;400*) and (10&#215;25*)?</p>
<p>Each cell in the matrix, except for the first column in each which represents the bias factor, must be adjusted slightly by a regularization parameter before we are done and hand the matrices over to the bigger program.</p>
<p>The formula is pretty simple.  It is just the theta value for that cell times the regularization parameter (often written lambda, and set in the main program) and divided by the number of training cases; the result is added to that cell.  Each of the two matrices is adjusted separately.  A relatively trivial bit of arithmetic.</p>
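<p>A sketch of that arithmetic, assuming the multiplier is the regularization parameter (called lam here) and that the adjustment is added to the averaged errors. The numbers are invented, the function name regularize is mine, and the first column of each row is left alone because it belongs to the bias factor.</p>

```python
def regularize(gradient, theta, lam, m):
    """Add (lam / m) * theta to every cell except the first (bias) column."""
    adjusted = []
    for grad_row, theta_row in zip(gradient, theta):
        new_row = [grad_row[0]]  # bias column is left alone
        for g, t in zip(grad_row[1:], theta_row[1:]):
            new_row.append(g + (lam / m) * t)
        adjusted.append(new_row)
    return adjusted

gradient = [[0.5, 0.1, -0.2], [0.0, 0.3, 0.4]]   # made-up averaged errors
theta = [[1.0, 2.0, -4.0], [0.5, 0.0, 8.0]]      # made-up weights
adjusted = regularize(gradient, theta, lam=1.0, m=4)
```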
]]></content:encoded>
			<wfw:commentRss>http://flowingmotion.jojordan.org/2011/11/10/back-propagation-for-the-seriously-hands-on/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Need to practice first order logic?</title>
		<link>http://flowingmotion.jojordan.org/2011/11/05/need-to-practice-first-order-logic/</link>
		<comments>http://flowingmotion.jojordan.org/2011/11/05/need-to-practice-first-order-logic/#comments</comments>
		<pubDate>Sat, 05 Nov 2011 19:12:55 +0000</pubDate>
		<dc:creator>Jo Jordan</dc:creator>
				<category><![CDATA[SOCIAL MEDIA & IT]]></category>
		<category><![CDATA[first order logic]]></category>
		<category><![CDATA[predicate calculus]]></category>
		<category><![CDATA[Wolfram]]></category>

		<guid isPermaLink="false">http://flowingmotion.jojordan.org/?p=5081</guid>
		<description><![CDATA[I found this first order logic exercise on Wolfram. #1 Download Wolfram’s CDF player -1 The download on their site&#8230;]]></description>
			<content:encoded><![CDATA[<p>I found this first order logic exercise on Wolfram.</p>
<h3>#1 Download Wolfram’s CDF player</h3>
<p>-1 The download on their site did not work for me, so I downloaded here from <a href="http://www.softpedia.com/dyn-postdownload.php?p=160276&amp;t=4&amp;i=1">Softpedia.</a></p>
<p>-2 You will download an .exe file. When it arrives on your personal computer, simply click on the link and it will install as a Program.  It takes a little time to install.  Big beastie to allow you to view interactive documents.</p>
<h3>#2 Now read whatever you want on Wolfram’s Demonstrations</h3>
<p>-1 Find the demonstration that interests you.  In this case, try this demo for practicing <a href="http://demonstrations.wolfram.com/TypicalPredicateCalculusStatements/">first order logic</a>, also known as predicate calculus.</p>
<p>-2 Click on “Download Demonstration as CDF” at top right and it should open.  If not, try firing up Wolfram first from your Start/Programs.</p>
<h3>#3 Practice your first order logic</h3>
<p>-1 Choose how many objects to play with.</p>
<p>-2 Start at equation number 1.</p>
<p>-3 Move objects around to change the truth value from true to false and vice versa.</p>
<p>&nbsp;</p>
<p>It won’t do your homework for you but it might take the edge off the confusion.</p>
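<p>If you want to poke at the same idea without the player, a small world of objects can be checked in a few lines. The world below (three objects with a shape and a size) is invented by me and is not taken from the Wolfram demonstration. "For all" becomes all() and "there exists" becomes any().</p>

```python
# A made-up world of three objects, each with a shape and a size.
objects = [
    {"shape": "cube", "size": "large"},
    {"shape": "cube", "size": "small"},
    {"shape": "sphere", "size": "small"},
]

def is_cube(obj):
    return obj["shape"] == "cube"

def is_small(obj):
    return obj["size"] == "small"

# "For all x, Cube(x) implies Small(x)": every cube is small.
every_cube_is_small = all((not is_cube(o)) or is_small(o) for o in objects)

# "There exists x such that Cube(x) and Small(x)": some cube is small.
some_cube_is_small = any(is_cube(o) and is_small(o) for o in objects)

# Change the world, much as the demonstration lets you do by moving objects,
# and the truth value of the first statement flips.
objects[0]["size"] = "small"
every_cube_is_small_now = all((not is_cube(o)) or is_small(o) for o in objects)
```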
]]></content:encoded>
			<wfw:commentRss>http://flowingmotion.jojordan.org/2011/11/05/need-to-practice-first-order-logic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Check your propositional logic with a truth table generator</title>
		<link>http://flowingmotion.jojordan.org/2011/11/05/check-your-propositional-logic-with-a-truth-table-generator/</link>
		<comments>http://flowingmotion.jojordan.org/2011/11/05/check-your-propositional-logic-with-a-truth-table-generator/#comments</comments>
		<pubDate>Sat, 05 Nov 2011 17:55:06 +0000</pubDate>
		<dc:creator>Jo Jordan</dc:creator>
				<category><![CDATA[SOCIAL MEDIA & IT]]></category>
		<category><![CDATA[propositional logic]]></category>
		<category><![CDATA[truth table generator]]></category>

		<guid isPermaLink="false">http://flowingmotion.jojordan.org/?p=5080</guid>
		<description><![CDATA[The Southwestern Adventist University website has a truth table generator for checking propositional logic.  Instructions for inputting propositional logic&#8230;]]></description>
			<content:encoded><![CDATA[<p>The Southwestern Adventist University website has a truth table generator for checking propositional logic.  Instructions for inputting propositional logic symbols are on its page.</p>
<p>My host&#8217;s wordpress is borked: so here is the link</p>
<p>http://turner.faculty.swau.edu/mathematics/materialslibrary/truth/</p>
<h3>#1 Check you understand each part of the assertion</h3>
<p>Basically, you can check that you are using the basic truth table for simple assertions like (A and B).</p>
<h3>#2 Generate a truth table for multiple assertions</h3>
<p>And you can combine simple assertions to generate a truth table.</p>
<h3>Caveat</h3>
<p>I am not an expert in this, but I am assuming that if a bundle of assertions is always true, whatever the starting values that we put into the bundle, then the bundle resolves to true (logicians call this a tautology).</p>
<p>Correspondingly, if the assertions come out as false, no matter what the starting values are, the bundle resolves to false (a contradiction).</p>
<p>And if the bundle contains a mix of true and false, we are left uncertain what will happen: the result depends on the starting values.</p>
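<p>The three cases in the caveat can be checked by brute force. This is a minimal sketch of my own, not the code behind the generator linked above: it tries every combination of starting values and reports whether the assertion came out always true, always false, or a mix. The function names truth_table and classify are mine.</p>

```python
from itertools import product

def truth_table(assertion, names):
    """List every combination of starting values with the assertion's result."""
    rows = []
    for values in product([True, False], repeat=len(names)):
        rows.append((values, assertion(*values)))
    return rows

def classify(assertion, names):
    """Always true, always false, or a mix depending on the starting values."""
    results = [result for _, result in truth_table(assertion, names)]
    if all(results):
        return "always true (tautology)"
    if not any(results):
        return "always false (contradiction)"
    return "mixed (depends on the starting values)"

a_or_not_a = classify(lambda a: a or not a, ["A"])
a_and_not_a = classify(lambda a: a and not a, ["A"])
a_and_b = classify(lambda a, b: a and b, ["A", "B"])
```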
<p>Any thoughts?</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingmotion.jojordan.org/2011/11/05/check-your-propositional-logic-with-a-truth-table-generator/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A general algorithm for implementing a neural network</title>
		<link>http://flowingmotion.jojordan.org/2011/11/03/a-general-algorithm-for-implementing-a-neural-network/</link>
		<comments>http://flowingmotion.jojordan.org/2011/11/03/a-general-algorithm-for-implementing-a-neural-network/#comments</comments>
		<pubDate>Thu, 03 Nov 2011 13:10:10 +0000</pubDate>
		<dc:creator>Jo Jordan</dc:creator>
				<category><![CDATA[SOCIAL MEDIA & IT]]></category>
		<category><![CDATA[backward propagation]]></category>
		<category><![CDATA[forward propagation]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[neural network]]></category>

		<guid isPermaLink="false">http://flowingmotion.jojordan.org/?p=5076</guid>
		<description><![CDATA[What is a neural network? A neural network takes a set of binary (0/1) signals, groups them into a smaller&#8230;]]></description>
			<content:encoded><![CDATA[<h3>What is a neural network?</h3>
<p>A neural network takes a set of binary (0/1) signals and groups them into a smaller set of hidden units, which are then used to work out the probability of something happening.</p>
<h3>An example of a neural network</h3>
<p>The example used by Andrew Ng in the Stanford course:</p>
<ul>
<li>Converts the image of a handwritten number into 20&#215;20 = 400 pixels, i.e. a row of 400 1’s or 0’s</li>
<li>The backward propagator works out how to group the 400 columns of pixels into 25 units (which are ‘hidden’ because the end user doesn’t need to know about them)</li>
<li>And then the backward propagator does its magic again to work out weights to match combinations of 25 units onto the ten possibilities of a digit (0-9).</li>
</ul>
<h4>Forward propagation</h4>
<p>The forward propagator takes a handwritten number, or rather the row of 400 1’s and 0’s representing the 20&#215;20 pixels for a number, and runs the calculation forwards.  The 400 1’s and 0’s are multiplied by the weights that connect each pixel column to the 25 hidden units, and 25 new columns are generated.  Each image, represented by a row, now has 25 numbers.</p>
<p>The process is repeated with the weights from the 25 columns to the 10 digits and each image now has a probability for each of the 10 digits.   The biggest probability wins!  We have taken a list of pixels and stated what a human thought they were writing down!</p>
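<p>Here is a minimal sketch of that forward pass in plain Python. It is not the course code. The toy sizes (4 pixels, 2 hidden units, 3 classes instead of 400, 25 and 10), the made-up weights, the extra bias weight on each unit and the sigmoid squashing function are all assumptions made to keep the example small.</p>

```python
import math

def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights):
    """One step forwards. Each row of weights is [bias, w1, w2, ...] for one unit."""
    return [sigmoid(row[0] + sum(w * x for w, x in zip(row[1:], inputs)))
            for row in weights]

def predict(pixels, hidden_weights, output_weights):
    """Pixels -> hidden units -> one score per class; the biggest score wins."""
    hidden = layer(pixels, hidden_weights)
    scores = layer(hidden, output_weights)
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best, scores

# Made-up weights for a toy network: 4 pixels -> 2 hidden units -> 3 classes.
hidden_weights = [[0.0, 1.0, -1.0, 1.0, -1.0],
                  [0.0, -1.0, 1.0, -1.0, 1.0]]
output_weights = [[-1.0, 2.0, 0.0],
                  [-1.0, 0.0, 2.0],
                  [0.0, -1.0, -1.0]]

digit, scores = predict([1, 0, 1, 0], hidden_weights, output_weights)
print(digit, scores)
```

<p>Each score lies between 0 and 1, but in this sketch the scores are worked out separately for each class, so they need not add up to 1.</p>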
<h4>Training accuracy</h4>
<p>And of course, if we know what the digit really is, as we do in a ‘training set’ of data, then we can compare the real number with the one that the machine worked out from the set of pixels.  The program run for Stanford students is 97.5% accurate.</p>
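<p>Training accuracy is just the percentage of images where the machine’s answer matches the real digit. A small sketch, with made-up predictions and labels (eight images only, so the result is nothing like the 97.5% above):</p>

```python
def training_accuracy(predicted, actual):
    """Percentage of examples where the predicted digit equals the real one."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * matches / len(actual)

# Hypothetical results for eight training images; the last one is wrong.
predicted = [3, 1, 4, 1, 5, 9, 2, 6]
actual = [3, 1, 4, 1, 5, 9, 2, 7]
print(training_accuracy(predicted, actual))  # 7 out of 8 correct
```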
<h4>Waiting for backward propagation</h4>
<p>The real interest is in the backward propagator, of course.  Just how do they work out that there should be 25 units in the hidden layer, and how do they work out the weights between the hidden layer and the output layer?</p>
<h3>Machine learning vs psychology</h3>
<p>In psychology, we have traditionally found the hidden layer with factor analysis or principal components analysis.  We take your scores on an intelligence test, for example.  That is simply a row of 1’s and 0’s!  We factor analyse the 1’s and 0’s (for you and hundreds of other people) and arrive at a hidden layer.  And from there we predict an outer layer.</p>
<p>We usually tighten up the inputs to reflect the hidden layer as closely as possible – that is we improve our tests so that 30/50 is meaningful.  And our outer layer is often continuous – that is we predict a range of outcomes which we later carve up into classes.  We might predict your A level exam results by % and then break them into A, B, C, etc.</p>
<p>So it is with great interest that I await the backward propagation.  I am also more interested in unsupervised machine learning which I suspect reflects real world conditions of shifting sands a lot more.</p>
<h3>CHECK OUT SIMILAR POSTS</h3>
<ul>
<li><a href="http://flowingmotion.jojordan.org/2011/11/10/back-propagation-for-the-seriously-hands-on/" rel="bookmark" title="November 10, 2011">Back propagation for the seriously hands-on</a></li>
<li><a href="http://flowingmotion.jojordan.org/2011/12/04/down-to-earth-principal-components-analysis-in-5-steps/" rel="bookmark" title="December 4, 2011">Down-to-earth principal components analysis in 5 steps</a></li>
<li><a href="http://flowingmotion.jojordan.org/2011/11/15/learning-curves-and-modelling-in-machine-learning/" rel="bookmark" title="November 15, 2011">Learning curves and modelling in machine learning</a></li>
<li><a href="http://flowingmotion.jojordan.org/2011/10/16/12-steps-to-running-gradient-descent-in-octave/" rel="bookmark" title="October 16, 2011">12 steps to running gradient descent in Octave</a></li>
<li><a href="http://flowingmotion.jojordan.org/2011/11/24/10-steps-to-build-a-spam-catcher/" rel="bookmark" title="November 24, 2011">10 steps to build a spam catcher</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://flowingmotion.jojordan.org/2011/11/03/a-general-algorithm-for-implementing-a-neural-network/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Functional dependence in databases: plain language &amp; how-to</title>
		<link>http://flowingmotion.jojordan.org/2011/10/30/functional-dependence-in-databases-plain-language-how-to/</link>
		<comments>http://flowingmotion.jojordan.org/2011/10/30/functional-dependence-in-databases-plain-language-how-to/#comments</comments>
		<pubDate>Sun, 30 Oct 2011 21:29:02 +0000</pubDate>
		<dc:creator>Jo Jordan</dc:creator>
				<category><![CDATA[SOCIAL MEDIA & IT]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[functional dependence]]></category>
		<category><![CDATA[relational design]]></category>

		<guid isPermaLink="false">http://flowingmotion.jojordan.org/?p=5072</guid>
		<description><![CDATA[Computer scientists speak a language of their own with long ‘noun phrases’ and complex sentences which are often grammatically incorrect&#8230;]]></description>
			<content:encoded><![CDATA[<p>Computer scientists speak a language of their own with long ‘noun phrases’ and complex sentences which are often grammatically incorrect or very difficult to parse.</p>
<p>This post is a short description of FUNCTIONAL DEPENDENCE IN DATABASES.  This is a common sense view and an AMATEURISH view – so read it to take the edge off the unintelligibility of explanations around the web, but remember that when you want to solve very hard problems, this explanation will probably not be sufficient or even accurate.</p>
<h3>#1 What is a database?</h3>
<p>A database is a set of tables.  A table is just a set of rows and columns like we have in Excel.  A database has lots of tables like the sheets in Excel.</p>
<p>And just as we can in Excel, we can tell the program to toddle off to another table to pick up a value, bring it back and put it in another table.  We call that a Look Up.</p>
<h3>#2 Looking up something by tracking from table to table</h3>
<p>In a complicated set of tables, the value we want might be in Table 6, for example.  We might also have some information that doesn’t allow us to look up what we want in Table 6, but we can use the information we have to look up something in Table 4 and something else in Table 3 and use those two facts to look up what we need in Table 5 and then go to Table 6 to finish the job.</p>
<p>To take a practical example, if you want to look up someone’s telephone number, you need to know their family name and the town where they live.  If their family name is very common, you might need to know their first name and street name as well, but let’s stick to a simple example.</p>
<p>So if we know our friend lives in Timbuktu, we look up the volume number of the directory for Timbuktu, then we go to the Timbuktu directory/table and we use our friend’s name to look up their telephone number.</p>
<p>Alternatively, the list might have been laid out in one table and we look in the first two columns for Timbuktu AND our friend’s name.  When we have found both together, the correct telephone number will be in the next column.</p>
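<p>Both layouts can be sketched with Python dictionaries. The names and numbers are invented for illustration, and a real database would use tables rather than dictionaries, but the two styles of look-up are the same.</p>

```python
# Layout 1: one directory (table) per town, then look up the name inside it.
directories = {
    "Timbuktu": {"Traore": "555-0101", "Keita": "555-0102"},
    "Bamako": {"Traore": "555-0201"},
}

def lookup_two_step(town, name):
    """First find the town's directory, then find the name in that directory."""
    return directories[town][name]

# Layout 2: a single table where town AND name together identify the row.
single_table = {(town, name): number
                for town, names in directories.items()
                for name, number in names.items()}

def lookup_single(town, name):
    """Find the one row that matches both the town and the name."""
    return single_table[(town, name)]

print(lookup_two_step("Timbuktu", "Traore"))
print(lookup_single("Timbuktu", "Traore"))
```

<p>Notice that the name alone is not enough here: there is a Traore in both towns, with different numbers.</p>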
<h3>#3 What is functional dependence?</h3>
<p>In plain language, functional dependence just describes how we look up information.</p>
<p>Because we need our friend’s town to look up their telephone number, the telephone number depends, in part, on the town. Taken on its own, the computer scientists would write that down as Town &#8594; Telephone Number.</p>
<p>Equally, as we need a name to look up a telephone number, the telephone number depends, in part, on the name, and that would be written as Name &#8594; Telephone Number.</p>
<p>What’s more, as we need town and name together to look up a telephone number, Telephone Number is functionally dependent on Town AND Name. Computer Scientists write that as Town, Name &#8594; Telephone Number. Strictly speaking, only this last version holds in our example, because a functional dependence says that the left-hand side is enough to pin down exactly one value on the right, and neither the town nor the name does that alone.</p>
<h3>#4 So why do we care about functional dependence?</h3>
<p>Every day we ask questions about functional dependence without being conscious that we are using this exalted concept!</p>
<h4>We use functional dependence every day</h4>
<p>Whenever we look up something like a telephone number, we are asking what information we need to know to look up the number.</p>
<p>When you Googled “functional dependence” and landed up here, you used a look up – or rather you trusted Google to know what look-ups to use!</p>
<h4>Tough search problems require us to track from one lookup table to another</h4>
<p>At work, we might also ask: if I have information A and information B, can I find out Z, and how do I find it out?</p>
<p>How can I step from table to table to get the information that I seek?</p>
<h4>When we design a database or set of spreadsheets, we want to do the least work possible!</h4>
<p>When we design a database, or set of Excel spreadsheets, we also want to make as few tables as possible!</p>
<p>We want to make sure we only ever have to type a piece of information once, into one table!</p>
<p>We want to make sure that data that hardly ever changes or we hardly ever look at is still accessible but in tables that we can put out of the way.</p>
<p>That’s it.  That’s the what and why of functional dependence.  So let’s turn to the how and specifically the ‘how’ for students.</p>
<h3>#5 So how do you work out functional dependence questions?</h3>
<p>With the Stanford experiment in online classes going on, and other computer science students doing homework, you might really want to know how do I do these ***** problems?</p>
<p>This is my way – it is not the official way but it works for me.</p>
<p>When I have a table (R) with columns (A, B, C, D etc), I think of the columns as all the columns in a set of spreadsheets.</p>
<p>Then I turn the functional dependencies (FD) (written A&#8594;B) into tables.  A&#8594;B is a table with two columns, A and B. In plain language, I use column A to look up column B.</p>
<h4>Problem 1</h4>
<p>Then, when I am asked, does ABC&#8594;D work in that relation and set of FDs, all I do is ask myself: given the set of tables, if I already know ABC, can I find the value of D? I have to be careful and methodical, but hey, it works.</p>
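<p>The ‘careful and methodical’ part can be handed to a few lines of Python. This sketch keeps applying the look-up tables until nothing new turns up, which textbooks call computing the attribute closure. The function names and the example FDs are made up for illustration, and each column is assumed to be a single letter.</p>

```python
def closure(known, fds):
    """Every column we can reach by repeated look-ups, starting from `known`."""
    found = set(known)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            # A table is usable once we know all of its left-hand columns.
            if set(left) <= found and not set(right) <= found:
                found |= set(right)
                changed = True
    return found

def follows(left, right, fds):
    """Does left -> right work, given this set of FDs?"""
    return set(right) <= closure(left, fds)

# Hypothetical FDs: A -> B, BC -> E, E -> D.
fds = [("A", "B"), ("BC", "E"), ("E", "D")]
print(follows("ABC", "D", fds))  # B and C give E, and E gives D
print(follows("A", "D", fds))    # A only gives B, so D stays out of reach
```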
<h4>Problem 2</h4>
<p>Then when I am asked if BC, say, is a key, I write down BC and I write down the columns that are left – say A, D.  Then I ask if I can look up A and D with B and C.  If I can, then BC is a key. If not, BC is not a key.</p>
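<p>The same look-up idea gives a key test. One refinement, which is my addition rather than part of the method above: textbooks call a set of columns that reaches everything a superkey, and reserve ‘key’ for a superkey that stops working if you drop any one of its columns. The helper names and example FDs below are made up, and columns are again single letters.</p>

```python
def closure(known, fds):
    """Every column we can reach by repeated look-ups, starting from `known`."""
    found = set(known)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if set(left) <= found and not set(right) <= found:
                found |= set(right)
                changed = True
    return found

def is_superkey(candidate, all_columns, fds):
    """Can we look up every column starting from the candidate columns?"""
    return closure(candidate, fds) >= set(all_columns)

def is_key(candidate, all_columns, fds):
    """A superkey with nothing to spare: dropping any one column breaks it."""
    if not is_superkey(candidate, all_columns, fds):
        return False
    return not any(is_superkey(set(candidate) - {c}, all_columns, fds)
                   for c in candidate)

# Hypothetical table R(A, B, C, D) with FDs BC -> A and A -> D.
fds = [("BC", "A"), ("A", "D")]
print(is_key("BC", "ABCD", fds))        # BC reaches A, then D
print(is_superkey("ABC", "ABCD", fds))  # works, but A is surplus
print(is_key("ABC", "ABCD", fds))       # so ABC is not a minimal key
```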
<h4>Problem 3</h4>
<p>When I am asked if two sets of FDs are the same, then all I am being asked is whether the two sets of tables allow me to look up the same information.  This is more tricky and I found it easiest to draw a matrix with a column for each column (A, B, C, D, etc) and a row for each of those columns.  I scratch out the diagonal (A=A) and see if knowing A (row), I can look up B, C, D etc.  This only works for very simple problems though.</p>
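<p>For problems too big for the matrix, here is a sketch of the usual textbook check, which is not the matrix method above: two sets of FDs are the same if every FD in one set can be worked out from the other set, and vice versa. The names and example FDs are invented for illustration.</p>

```python
def closure(known, fds):
    """Every column we can reach by repeated look-ups, starting from `known`."""
    found = set(known)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if set(left) <= found and not set(right) <= found:
                found |= set(right)
                changed = True
    return found

def covers(fds_a, fds_b):
    """True if every look-up promised by fds_b can be done with fds_a alone."""
    return all(set(right) <= closure(left, fds_a) for left, right in fds_b)

def equivalent(fds_a, fds_b):
    """Same information both ways round."""
    return covers(fds_a, fds_b) and covers(fds_b, fds_a)

# Hypothetical sets: A -> C is redundant once we have A -> B and B -> C.
first = [("A", "B"), ("B", "C"), ("A", "C")]
second = [("A", "B"), ("B", "C")]
third = [("A", "B")]
print(equivalent(first, second))  # the extra A -> C adds nothing new
print(equivalent(first, third))   # third cannot get from B to C
```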
<p>So, this is an amateur’s take on functional dependence. Use it if it works for you and not if it doesn’t.  And remember it is an amateur’s version.  Once problems become more complicated, all that maths is probably useful shorthand and my account is probably concealing some misunderstanding or other!</p>
<p>Good luck.</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingmotion.jojordan.org/2011/10/30/functional-dependence-in-databases-plain-language-how-to/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

