Blog too big? Tidy up and make eBooks?

Blog too big to manage? Does WordPress have plugins to help?

I have almost a thousand posts on this blog and I would dearly love to tidy them up; but this is a horribly big job.  Posts for one year alone come to about 500 printed pages and there is almost a decade of writing on here.

My search for eBook plugins for WordPress begins

MPL plugin for WordPress

I have just now downloaded and activated my first eBook plugin: MPL.

Over the coming days I will test it and then try others too.

What I hope to do to tidy my blog

Goal 1:  Dead posts

When I spot a post that I really do not want to keep, I would like to delete it.

The post might be important to other people though.  Before I delete it, I will check the statistics on Google Analytics, the analytics through Jetpack, and if it is an old post, the statistics on the original blog that I had and that redirects here.

How do I know if someone links to it?  And does it matter, if the link delivers no traffic?  It would be good to find out though.

My statistics will include

  1. The number of posts considered for deleting
  2. The number retained anyway
  3. Why I retained them

Goal 2: Interesting posts that might maintain a sense of my interests changing over the years

A post may be pretty much dead, but it might be a marker of my interests over the years.  Nostalgic perhaps, but I would like to keep those.

Maybe, I should simply make one of my eBooks – The first decade.

Indeed, this is the value of writing. I will do this first to give myself an overview of the blog.

Can I download a list of all 1000 posts, with their dates, tags and comments and look for a pattern?

Can I find out their current traffic too?

Task 1:  Get overall statistics for my blog

Alternative corporate tax regimes

We all know by now that the likes of Google, Amazon and Starbucks pay very little corporate tax in the UK. We know that our corporate taxes tend to be low, though they are not as low as Ireland’s and Sweden’s. We might also have an intuition that corporate taxes are a significant part of GDP. We would miss them if we didn’t have them but they are only about 3% of GDP. Surprised? What is even more surprising is that though our corporate taxes are low, they are a larger part of our GDP than in Germany, say. My first thought was, ah good, we set individual corporate taxes low but gain on the aggregate. But of course earning more from corporate taxes than Germany is just a reminder that we earn less from more productive sources. So perhaps not hurrah for us.
So what are the issues surrounding corporate taxation? The issue that energises us at the moment is the crafty use of tax law and corporate structures to lower tax liabilities. Academic accountants are proposing various different ways of taxing companies and here are my notes below.
Cash-flow taxes
Instigator: Meade Report
Core idea: Tax net cash flow rather than profits.
• Abolish deductions for depreciation and interest payments.
• Deduct investment expenditure when it occurs.
Spelled out:
• Investment becomes a current cost.
• Sales of capital assets are treated as any other cash inflow
• Both loans and interest payments are not subject to tax (similarly to equity injections and dividend payments)
• Trading transactions are taxed; not financing operations
• Does not apply well to banking trading operations
• Other countries would still be making distinctions between debt and equity
Variant: “R+F”
• Tax borrowings as cash inflows and treat payments of principal and interest as deductibles
• Would apply to trading operations of banks
Follow-up question:
• VAT on financial services
Allowance for Corporate Equity (ACE)

Instigator: IFS Capital Taxes Group (1991)
Core idea: Provide explicit tax relief for the (imputed) opportunity cost of using shareholder funds to finance the operations of a company
Comparison with other methods:
• Allows for 100% allowance for equity-financed investment not provided for in cash-flow methods
• Equates to allowing the interest cost of debt-finance in ‘standard’ corporate taxation
• The normal return on equity-finance investment is removed from the corporate tax base
• Calculate the closing stock of shareholder funds at the end of the previous period
• Opening stock + Equity issued – Equity repurchased + Retained profits
• PV of a stream of tax payments will not depend on details of the depreciation schedule
• Opportunity cost is the risk-free (nominal) interest rate
Comparisons with personal taxation:
• ACE compares with RRA (rate of return allowance)
• Cash-flow proposals compare with EET treatment of savings
• Reduce rebates and carry-forward provisions by aligning timing of tax payments with actual returns
• Retains much of existing tax structure
• Taxes the excess over nominal rates of return
• Debt and equity financing equivalent in PV terms and in relation to tax timing provided depreciation schedules are realistic
• Taxes only become liable when returns exceed a normal rate of return
• But what if taxing rents also taxes effort and risk?
• To implement, only need to specify how the equity base evolves over time and the nominal interest rate
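The ACE mechanics above can be sketched with a few lines of R. All figures here are invented for illustration (the 4% risk-free rate and 20% tax rate are assumptions, not from any report): the equity base evolves as opening stock + equity issued − equity repurchased + retained profits, the allowance is the risk-free rate applied to shareholder funds, and only profit above that normal return is taxed.

```r
# Illustrative ACE calculation -- every number is invented for the example.
opening_stock   <- 1000   # shareholder funds at the end of the previous period
equity_issued   <- 100
equity_bought   <- 20     # equity repurchased
retained_profit <- 50

# Closing stock of shareholder funds, per the formula in the notes.
closing_stock <- opening_stock + equity_issued - equity_bought + retained_profit

riskfree      <- 0.04                     # assumed nominal risk-free rate
ace_allowance <- riskfree * opening_stock # imputed normal return on equity

profit  <- 90
taxable <- max(profit - ace_allowance, 0) # only the excess over the normal return
tax     <- 0.2 * taxable                  # assumed 20% corporate rate
```

With these numbers the equity base grows to 1130, the allowance is 40, and tax falls only on the 50 of profit above the normal return.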
Comprehensive Business Income Tax (CBIT)

Essential idea: Tax the interest on debt financing in the hands of the debtor, i.e., tax corporate profits after depreciation but before interest.
• Raises required RoR for both debt and equity financing
• Would hugely increase tax in banking and financial services, which currently deduct their interest payments
• CBIT effectively taxes returns to debt and equity at the corporate level
• Increasingly, interest deductibility is limited across borders to stop subsidiaries in high-tax regimes borrowing from parent companies in low-tax regimes
• Increasingly, deductions are limited for financing foreign operations that are exempt from local tax

Comparisons with other systems
• ACE and cash-flow taxes change the tax base and exclude normal returns on savings (which are taxed at a personal level)
• Cash-flow taxes are close to the expenditure treatment of personal savings
• ACE is similar to RoR Allowance on personal savings.

WordPress to Drupal : First steps

I plan to use Drupal to re-organize this large blog:  This post lists the beginning steps.  Each step is quite large and if you are completely new to web development, you will need to look up and complete each step as a mini-project.

  1. Make a development environment on my laptop.  Set up WAMP.
  2. Set up Drush so that I download Drupal and its modules more easily.
  3. Set up a clean database in WAMP / PHPMyAdmin.
  4. Download the latest version of Drupal 7 and unpack it into a folder in c:/wamp/www/myblog
  5. Go to my browser and install Drupal: “localhost/myblog”
  6. Use the command line to use Drush : “cd c:/wamp/www/myblog”
  7. Use Drush to download and enable the modules for “pretty url”
    1. drush dl token pathauto
    2. drush en -y token pathauto
  8. Choose a theme and use Drush to download and enable it.  I used Stanford’s Open Framework and downloaded it manually to /myblog/sites/all/themes.
  9. Log in, set the time zone to UK and set up the date format.
  10. Under Appearance, set Open Framework as the default
  11. Use Drush to install the modules needed for WordPress_Migrate (see above for the commands): migrate ctools features media media_youtube migrate_extras
  12. As of today (though this will change), do not download WordPress_Migrate with Drush. Go to its webpage and install the latest development version
  13. Go to Add Content and select Migrate. Look for the WordPress link
  14. Follow the instructions – all of them!
  15. I tried to retain my WordPress URLs but that did not work and the aliases are borked too.  As it was not straightforward finding this workflow, I will leave this for now and worry about sorting my comments later.

These are the basic steps for importing WordPress content into Drupal. It is not perfect but as this is a one-off import, it is satisfactory.

WordPress to Drupal

Here begins my migration from WordPress to Drupal.

Why move from WordPress to Drupal?

I don’t advise anyone to move from WordPress to Drupal unless they have a strong reason.  WordPress is friendly and you will be functional and confident in a few months.

Drupal is overly complicated, bloated and very badly documented.

So why move to Drupal?

Drupal is designed for complicated sites.  Its Views facility, that is, its query function, is very powerful.

This blog has grown very big.  I don’t even remember everything I have written.  And I would like to find out.  I would like to delete what is no longer of any use. I would like to organize what is still valuable. And I would like to rewrite what could benefit from rewriting.

The benefits of migrating from WordPress to Drupal

I intend to download the content of this WordPress blog onto a development site on my laptop and reorganize its content.

I will use a Drupal module to do a rough-and-ready migration.   Then I will start allocating posts to “Books”.   That is, I will set up “Books” and cross-reference them to existing posts.  Once I have done that, I will be able to make a plan for each Book or collection.

I have also chosen a simple base theme to think ahead to a “responsive” theme. I want people to be able to read my “Books” on their mobiles. When I rewrite the posts, I will take the smaller screen real estate into account and chunk and structure the posts so that people can follow a complicated idea more easily.

Uniform Server unexpectedly throws a php5ts.dll error

Uniform Server is a portable version of WAMP. Apologies if you are an expert developer but some people arriving here may not be so let me explain a little.

What is WAMP?

WAMP means Windows Apache MySQL PHP. It is a bundle of web server, database server and PHP language that we download onto our Windows laptops so that we can develop new websites locally.

What is Uniform Server?

What does portable mean in practice?

Uniform Server is a portable version of WAMP. You can install it on a USB stick and use it without installing it on your laptop. This is very useful when you are working in situations such as universities where you may not want to lug your laptop around with you. You can take your USB stick from office to library to lecture hall and plug it in to whatever computer is there. When you pull it out, you have left no “registry dust”, i.e., you haven’t changed the configuration of the computer at all. But you may have changed the content on the USB. The equivalent for Word would be that Word has been installed on the USB rather than on the computer.

What happens when I buy a new laptop before my project has ended?

A portable server is also useful when we have a long project because over the life of a long project, a laptop is likely to be replaced. By having everything to do with a project on a standalone portable server, we simply plug the USB into the new laptop. We don’t have to stop and think – what software is on here? What needs to be installed? Will the versions be compatible, etc.?

How do I make backups?

Having server, software and data in one place is also handy for backups. Though this makes for a big file, we usually handle backups in three levels. The server is backed-up. The program files are backed up. And the data files are backed up. Backing up at three levels is good practice and makes sense when the server is shared, when the program files are common and when data is specific. But for many researchers, a failure is a nightmare. We aren’t thinking in terms of three levels and panic ensues when something fails. Sorting the mess out can take days because we are unfamiliar with the processes. A bundle on a USB obviates that panic. Back up nightly. It is a bit slow but if the work is important, back up onto another USB or even onto your laptop and backup again via a tunnel to the university’s servers. Then the worst case scenario is a 24 hour loss. Just pick up yesterday’s USB and begin again there.

The flaw in Uniform Server

But I have discovered a flaw in the system. I had an old version of Uniform Server running on my laptop: 8.6.4. I started it up and it threw an error.


1. C:\my_uniform_server_directory_name\my_website_folder\usr\local\php\php5ts.dll was present.
2. I had other versions of Uniform Server on my laptop. Shutting down the first and starting up the others threw the same error.
3. Some time had passed between the last time I started up Uniform Server and the error. It seemed unlikely that multiple copies of php5ts.dll had been corrupted, and much more likely that something downloaded to my computer since I last used Uniform Server was clashing.
4. And here is the flaw. Though Uniform Server is fully portable, the path set up within Uniform Server does not explicitly reference php5ts.dll. This means that when this .dll is called, Windows looks at its generic path statement and accesses the php5ts.dll in other versions of PHP on your computer (or fails to find it).

The fix for the php5ts.dll error

1. For each and every version of Uniform Server on your computer or USB stick, edit C:\my_uniform_server_directory\my_website_folder\usr\local\apache2\conf\httpd.conf to include a direct reference to the php5ts.dll file that came with Uniform Server. Simply add the last line to the list of code shown below.

Loadfile “C:/portables/”
Loadfile “C:/portables/”
Loadfile “C:/portables/”
Loadfile “C:/portables/”
Loadfile “C:/my_uniform_server_directory/my_website_folder/usr/local/php/php5ts.dll”

2. Save, stop Uniform Server if it is running, and restart. All should be well.

In short, php5ts.dll and Uniform Server

1. If you have an error message about php5ts.dll, then check the file is present.
2. Think if there is any reason that it could have been corrupted.
3. Check the httpd.conf file and add the direct reference.
4. Test.
In all likelihood, Uniform Server was not picking up its own version of php5ts.dll.


Pondering participation in MOOCs

I have just listened to a THES podcast on the University of London MOOCs offered on Coursera during 2013 and scanned their report.

  • University of London offered 4 MOOCs
  • They had initial interest (equivalent to click through and registration) of 241 075
  • They had initial active registration during the first week of 93 468 (44%)

This translated to 14K to 36K per course, with a median of around 22K
And initial rates from 32% to 50% with a median of 46%

  • Activity rates were roughly as follows

Three quarters of starters watched a video
Between 2% and a third took a quiz (median around 22%)
Between 4% and 7% posted in a forum (median 5%)

  • Participation drops week-by-week to around 27-28K (about a quarter of starters and just over 10% of all registered users)
  • Activity rates show the same overall pattern but drop

Videos – two-thirds with a low of 55%
Assessments – 61% with a low of 27% and median around 40%
Forum – 2% to 3%

  • Certificates of accomplishment were issued to 8.8K (less than 9% of starters and less than 5% of those who indicated interest)

Per course: 1.5K to 2.5K, with a median of about 2.4K
% of starters completing ranged from 6% to 18%
Completers as a % of people active in the sixth week ranged from 25% to 40% with a median of 34%

  • Repeat sales of a cash nature

The podcast mentioned 45 enrolments on full degree courses with revenue of 250K
I believe the marginal cost of the MOOCs (not counting prior development costs of courses and full time staff) was 4x20K
Much of the material is re-usable
ROI was around 300%
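The arithmetic above can be checked quickly in R, using the figures already quoted (250K revenue, marginal cost of 4 × 20K); note the headline number depends on whether ROI means revenue over cost or net return over cost.

```r
# Figures from the podcast notes above; the two ROI framings are my own.
revenue <- 250e3            # revenue from 45 full-degree enrolments
cost    <- 4 * 20e3         # marginal cost of running four MOOCs

revenue / cost              # revenue is roughly 3x cost
(revenue - cost) / cost     # net ROI: a bit over 200%
revenue / 45                # revenue per enrolment: about 5.6K
```

"Around 300%" matches the revenue-over-cost framing; as a net return it is nearer 200%.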


As a marketing cost

45 large sales of around 5K from initial interest of 241K – a ballpark figure of 5,000 contacts per sale
This is very low – Google expects a click through rate of 200 per ad
The initial registration of around 250K, in marketing terms, is a list of ‘qualified targets’ and the conversion rate should be higher?
As a ROI, the university is saying that marketing costs 80K/45 or 1K to 2K per person out of total fees of 5K is acceptable (normal?)

With an established world-wide brand, shouldn’t the university be driving its marketing and admin costs right down?

As an educational exercise

The podcast mentioned Coursera as being limited and adding other interactional material
But the report did not report the students’ use of this material
Or indeed any sophisticated analytics or responsiveness of the course on the basis of analytics

Isn’t the learning that when students enrol on a course, they want something clear and well organised? Are students not much more economical and organised than universities think?

Isn’t the service offered to help students start and finish, at times that suit them, and more economically (time wise) than heading over to Amazon and buying the book?

How to store vegetables and fruit at home

This is my blog where I keep notes  – and so we have fruit and vegetables among psychology, poetry and code. So be it.

I haven’t tested everything here.  I have simply re-organised information I have found around the web to make it more memorable.

Store outside of the fridge in clean water

Think flowers.


Beetroot with leaves still on

Cabbages & similar greens

Carrots with their leaves on (for a day or so)




Store at room temperature

Avocado (be careful not to bruise)

Brussels sprouts on the stalk






Summer squash

Sweet potatoes

Winter squash


Sealed containers

Artichokes (with a little moisture)

Basil (with absorbent paper to keep dry)

Carrots (with plenty of moisture and give them a drink every few days)

Fava beans

Fennel (with a little water)

Green Garlic

Okra (eat fast)

Open container in the fridge with a dry towel

Arugula (Rocket) (wash and dry first)

Beans (shelled)

Beets (without leaves)


Brussels sprouts (loose, off the stalk)




Radishes (without leaves)

Snap peas



Sealed container in the fridge


Greens (with a damp cloth)

Green beans (with a damp cloth)





Leeks (wrapped in damp cloth)



Spring onions

Sweet peppers

Read data files into R

Make what you will of this. It was written during a long day trying to make sense of the JHU course on R with liberal helpings of the simpler course from Princeton.  I haven’t checked the text for accuracy or typos and have put it here just so I can find it when I need it. If it helps you to make sense of R, good.

Get your data and save it in a .csv file

Your first task when using R to do statistical analysis is to collect the data.

Layout your data

Normally, you lay out your data in a table. Observations, cases, instances or people are in the rows. Variables, or things you observed, are in the columns.

Usually there is one column which is a unique but anonymous identifier for each case/row.

You will also have columns for ‘factor’ information such as gender (male/female/other), age (0-125), and so on.

You can put the column names in Row 1 – or you can leave out the column names and put the first case in Row 1.

Save your data

It is quite normal to load data into an Excel table and then to save it in .csv (comma separated value) format. 

.xls format can be read but .csv is more common and should be your standard practice.

You can also capture your data in Excel and save it as a .txt file. I will show you how to read both .csv and .txt files.

Read your data into R

Your second task is to read a data file into R so that you can use it and analyse it.

You have three primary ways to read in your file.  By the time you have worked through these methods, you will have mastered several basic R commands.

First method: Read the file from an online source

1.       Online urls tend to be long and cumbersome. So put the url into a variable called fileurl.

2.       We use <- to equate the variable and the url

3.       We surround the url with “”

4.       fileurltxt<-“” [name taken from one of the Coursera courses]

5.       For avoidance of confusion: notice this is a text file and even though we have made a variable with this name, the file we read in remains a .txt file.

6.       Alternatively: fileurlcsv <- “”

7.       Now you can see why my variable was named – file-url-csv or file-url-txt – I don’t want to forget what I have done.

8.       To read a .txt file, simply:

mydata <- read.table(fileurltxt, header = TRUE) or mydata <- read.table(fileurltxt, header = T)

9.       To read a .csv file, simply:

mydata <- read.csv(fileurlcsv, header = TRUE) or as above, abbreviate TRUE to T

10.    If you leave off the header=TRUE, the first line of the file is treated as data. Let’s spell this out: if you leave off the information about the header and the first line is a set of column headings, these will be erroneously treated as the first case. If you do not have headers, then of course, replace TRUE with FALSE. I recommend writing header = FALSE explicitly rather than just leaving it out, so that when you return to your program after many months, you are immediately clear about what you did.
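The header distinction in steps 8–10 can be seen end-to-end with a throwaway file (the file name and contents here are invented for the demonstration):

```r
# Write a tiny .csv to a temporary location, then read it both ways.
f <- file.path(tempdir(), "demo.csv")
writeLines(c("id,score", "1,10", "2,20"), f)

withhdr <- read.csv(f, header = TRUE)    # first line becomes the column names
nohdr   <- read.csv(f, header = FALSE)   # first line is treated as data

nrow(withhdr)   # 2 cases, columns named id and score
nrow(nohdr)     # 3 "cases" -- the heading row has been swallowed as data
```

This is exactly the mistake the header argument guards against: with header = FALSE on a file that has headings, the heading row turns up as a bogus first case.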

What you have learned to do so far

·        To set up a variable to store a long url

·        The <- command to “put data into a variable or matrix or table or dataframe”

·        To put “” around strings (including urls)

·        To check whether your data is in a .txt or .csv file

·        A different command to read .txt and .csv files (read.table and read.csv, respectively)

·        To add header=TRUE or header=FALSE to say whether the top line of the file contains column headings or data.

·        To read your data into a data frame called mydata.  You can call mydata what you like (though you might discover some reserved names). Use a name that is short, descriptive and memorable.

At this point, you can easily read data from a location on the web and you have data in mydata ready to use.  If you want to see your data, simply type mydata followed by enter and your data will be listed.  If you have a very long file, don’t list all of it.  Type head(mydata) instead to get the top few lines and tail(mydata) to get the bottom few lines.

So you have learned two more things:

11.    To read a data frame, simply type its name

12.    To read the first or last lines of a large dataframe, type head(dataframename) or tail(dataframename).

Read the file from a directory or folder on your laptop

If you have downloaded or saved a file onto your laptop, then you are going to follow exactly the same procedure as above. But you will have to replace the url within “” with a path and filename.

Begin by learning how to set your working directory.

1.       To find out your current working directory, use: getwd()

2.       To set a working directory, use: setwd(“C:/users/yourusername/documents/R”)

3.       Check that with getwd()

4.       Of course, if the R directory does not exist, make it and put your datafile there

5.       The reason I used R is that when you load packages and libraries, R automatically makes this folder

6.       Note you can also put your data in a sub-directory “C:/users/yourusername/documents/R/datafiles”

7.       Of course, datafiles can be anything you choose to name it

8.       Also note that you must use / not \

9.       Assuming your data file is in /R/datafiles and that your working directory is R, then your path is “datafiles”

10.    Here are your new commands for reading a .txt file

filenametxt = “your file name including .txt”

path = “datafiles”

mydata <- read.table(file=file.path(path, filenametxt), header = TRUE)

11.    And for .csv

filenamecsv <- “your filename including .csv”

path <- “datafiles”

mydata <- read.csv(file=file.path(path, filenamecsv), header=TRUE)
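Here is the same path and file.path pattern as a self-contained sketch, using a temporary directory in place of C:/users/yourusername/documents/R (the file name and contents are invented):

```r
# Stand up a throwaway "datafiles" directory and a small .csv inside it.
base <- file.path(tempdir(), "R", "datafiles")
dir.create(base, recursive = TRUE, showWarnings = FALSE)

filenamecsv <- "scores.csv"
writeLines(c("id,score", "1,10"), file.path(base, filenamecsv))

# The same commands as in the post, with base standing in for "datafiles".
path   <- base
mydata <- read.csv(file = file.path(path, filenamecsv), header = TRUE)
```

file.path glues the directory and file name together with the right separator, which is why the post insists on / rather than \ in paths.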

Read any one of many data files in any one of many directories

Now let us imagine you are working on a large project and you have several data sub-directories (part1, part2, etc.).  And each directory contains many datafiles. The datafiles might be named with numbers (001.csv to 999.csv).  Everything in this section applies to .txt files too, but you must use read.table not read.csv and substitute .txt for .csv

1.       To be able to tell R to find the file you want, we set the file name as an argument in a function or script.

2.       Set up the barebones of the function:

getdatafile <- function(id, path){
}

3.       Type getdatafile to see the script

4.       Edit the getdatafile function by using fix(getdatafile) .  Note that you will edit in a little popup and when you save, you should see the corrected function on the original console.  I have had endless trouble with this, so work carefully to eliminate muddle. 

getdatafile <- function(id, path) {

read.csv(file=file.path(path, id), header=TRUE)

}


Now run getdatafile(“filename.csv”, ”datafiles”), replacing the arguments with your own file and path names.

The full datafile should come up on the console.

5.       Explore that further by editing getdatafile again and reading the file to mydata:

mydata <- read.csv(file=file.path(path, id), header=TRUE)

6.       Now the data will not read out. Moreover when you type mydata, you will get NULL, or a message that it does not exist, or values of mydata that you put there earlier.  So rerun this command but clear mydata first with mydata<-0.

7.       To have access to the data, you have to set the value of the whole function as the value of mydata.  To do this, type return(mydata) just before the }

8.       Now when you run  the getdatafile(“put the file name here”, “put the path name here”), the data prints to the console again.

9.       To store this data for later use, you have to type mydata<-getdatafile(“filename”,”pathname”).  The reason for this is that mydata only existed within the function and the ‘logic’ of a function is that you return the value of a function, not a value of things inside the function.  To tease this out further, you can have mydata=3 set before you call the function.  Unless you call the function “onto” mydata, mydata will continue to be 3, despite having read a whole file into another mydata within the function.

So what has been learned here?

·        A function has a specific form with

a.       The name of the function

b.      <-

c.       the keyword function

d.      () containing arguments

e.      {}

f.       Code within the {}

·        The function exists to arrive at a value, which might be one number or a large table.  This is what is returned.

·        The value of the function is vomited up to the console unless we direct it to a variable such as valueofgetdatafile<-getdatafile(“filename”,”pathname”)

·        If we redirect the output into a variable within the function, such as mydata<- read.csv(filepathcsv), then we must remember to make the output the value of the function again by typing return(mydata) just before the last }

·        Once again, if we don’t want mydata vomited up over the console we must direct it into valueofgetdatafile

All this seems unnecessarily convoluted but, as I understand it, it is a function of the modularity of object oriented programming.  Don’t fight it, just master it.

As a final frill, we are going to make it easier to type in the file name.  At present, we have to type in “24.csv” or “345.txt” etc.

Can we make it easier and let people simply type (24,”path name here”)?

To do this, we will use a command called sprintf and two other commands called paste and as.numeric.

sprintf allows us to add leading zeros to a number.   For 1 to become 001, 10 to become 010 and 100 to stay 100 –

filename<-sprintf(“%03d”, id)

Notice that this scheme only covers files numbered 001 to 999; it does not extend to a file called 1000.csv.

To add on the .csv, use the paste command.  The paste command concatenates its arguments and adds spaces by default. sep=”” removes the spaces.

filename<-paste(sprintf(“%03d”, id), ”.csv”, sep=””)

Notice that you might have thought the items to be concatenated would be in their own (), but they are not.

Also notice, if you try this, that it still will not work because the id has been turned into text or character. To keep the number as the number, we will use as.numeric

filename<-paste(sprintf(“%03d”, as.numeric(id)), “.csv”, sep=””)

Replace id in the read.csv (or read.table) commands with filename and the user is at liberty to put just the number of the file, without the extension, into getdatafile(id, path).

Note also that this little routine does not help if some of the files are .txt and some are .csv. They must all be of the same type, and the read.csv or read.table command must match.
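The filename-building line can be checked on its own (make_name here is a hypothetical wrapper, not part of the course code):

```r
# sprintf pads with leading zeros; paste glues on the extension; sep="" removes the space.
make_name <- function(id) paste(sprintf("%03d", as.numeric(id)), ".csv", sep = "")

make_name(1)      # "001.csv"
make_name("24")   # "024.csv" -- as.numeric copes with an id that arrives as text
make_name(100)    # "100.csv"
```

This is the whole point of as.numeric in the chain: if id arrives as the character string "24", sprintf’s %03d would choke on it, so it is converted back to a number first.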

You should now have a script that reads

getdatafile <- function(id, path){

filename <- paste(sprintf(“%03d”, as.numeric(id)), “.csv”, sep=””)

mydata <- read.csv(file=file.path(path, filename), header=TRUE)

return(mydata)

}


And you use the script –

getthecontentsofdatafile<-getdatafile(23, “datafiles”)

Where you put the number of the file that you want and datafiles is the directory /R/datafiles where the file can be found.

To retrieve the file, you type



This post covered how to read files into R from three sources – an external url, a folder on your machine, and from one or more directories containing many files.

It is important to check whether you are using .txt or .csv files and to change the scripts to match.  Also change the command – read.table is for .txt and read.csv for .csv.

It is also necessary to have the labelling of files in some coherent pattern.  This script goes up to 999 files and does not handle files with names like abc.txt.

We also covered the basics of a function – and the very confusing return function. It is best to play around with this until it becomes more intuitive.

Finally, we used three more commands – sprintf, which is used for printing and has a useful feature for adding leading zeros and can make 1 into 001.

as.numeric makes sure the id is treated as a number, so that sprintf can still format it as three digits.

And the paste command allows you to add .csv on the end and its feature sep=”” allows us to remove a space so 001 .csv becomes 001.csv.

I hope you find this helpful.

Using data.frame in R

Data frames in R

A useful feature of R is the data.frame.

What is a data.frame?

Without being an expert in R: a data.frame is like a table.  So when I read a table from a .csv file, R reads it into a data.frame.

mydata<-read.csv(“myfile.csv”, header=TRUE)

Reshape data with a data.frame on the fly

A very useful feature of a data.frame is that I can construct it, on the fly, so to speak.

Let’s say I make a vector, col1, that could correspond to a column, in ordinary English.


And now imagine that I concatenate ten letters into a row:

row1 = c(“a”,”b”,”c”,”d”,”e”,”f”,”g”,”h”,”i”,”j”)

I can make a data.frame of two columns, with col1 as is and row1 turned into a column.

data.frame(col1, row1)

This is a very handy piece of data reshaping and I can do this with any combination of numbers.

I can also make this a bit neater and add headings to my columns by extending the command

data.frame(vname1 = col1, vname2 = row1)

If I need to return this from a function then of course I place the command on the line above return(x), assigning its result to x.

If I need to put the data.frame into an object, then myobjectname<-data.frame(vname1 = col1, vname2 = row1)
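A minimal runnable version of the reshaping above; col1 is invented here as the numbers 1 to 10 to pair with the ten letters:

```r
col1 <- 1:10
row1 <- c("a","b","c","d","e","f","g","h","i","j")

# Stand the row up as a second column, with headings.
# stringsAsFactors = FALSE keeps the letters as plain characters on older R versions.
df <- data.frame(vname1 = col1, vname2 = row1, stringsAsFactors = FALSE)

dim(df)        # 10 rows, 2 columns
df$vname2[3]   # "c"
```

The handy part is that data.frame recycles nothing here: both inputs are length 10, so they simply line up side by side as columns.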


Getting started in R

I am currently doing the Johns Hopkins course on R that is offered through Coursera.   There is likely to be a gap between taking the course and using R, and these are my notes on how to get started.

Software and setup

  1. Google R, download the version to match your operating system, and install it on your machine using the defaults.  Set up a data folder in My Documents.
  2. Store your data in your folder in a .csv file.
  3. Use the first row to label your columns.
  4. Use NA to mark missing data.

Read your datafile

  1. Open R with the shortcut
  2. Read your datafile using this command, substituting your filename for datafile.csv:  data <- read.csv(“datafile.csv”)
  3. List your data to screen with this command: data
  4. Note that you can use any name you like instead of “data” [though I imagine there are some unallowable names]

Find out the number of rows/cases and variables/columns

  1. To find out the number of columns where data is the name of your data as above : ncol(data)
  2. To find out the number of rows where data is the name of your data as above : nrow(data)

Print out the first line to inspect the names of your variables/columns

  1. Use this command where data is the name of your data as above : data[1, ]

Take a subset of your data

  1. For sake of the example, let the name of your first variable be VAR1 and your third variable be VAR3
  2. Make a new dataframe containing all rows where the values of VAR1 and VAR3 are as shown: newdata <- subset(data, VAR1 > 31 & VAR3 > 90)

Take a subset of one variable excluding missing data

  1. Set up a new variable/vector containing a LOGICAL variable which is TRUE when a value is missing: VAR1BAD <-[,1])
  2. Set up a new variable/vector that copies the values from the original vector, providing they are not “bad”: VAR1GOOD <- data[,1][!VAR1BAD]

Do some basic statistics on your newly created variable/vector

  1. mean(VAR1GOOD)
  2. max(VAR1GOOD)
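The subset and missing-data steps above can be tried on a toy data frame (all values invented for the example):

```r
# Toy data with one missing value in the first column.
data <- data.frame(VAR1 = c(35, NA, 40), VAR3 = c(95, 91, 80))

VAR1BAD  <-[, 1])     # TRUE where VAR1 is missing
VAR1GOOD <- data[, 1][!VAR1BAD]  # copy only the non-missing values

mean(VAR1GOOD)                   # 37.5 -- the NA has been excluded

# Rows where both conditions hold; rows with NA in VAR1 are dropped too.
newdata <- subset(data, VAR1 > 31 & VAR3 > 90)
```

Note that subset quietly drops the NA row as well as the rows that fail the conditions, which is usually what you want but worth knowing.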

Issues covered during the first week not listed above

  1. Vectors must contain data of the same type i.e., numeric, character, or logical
  2. A list can contain a mix of types
  3. When a vector, as opposed to a list, has mixed types, the type is “coerced” to the most flexible type present – logical mixed with numeric becomes numeric (1, 0), and anything mixed with character becomes character
  4. R uses factors – which in essence are labels such as “male” and “female” where other statistics programmes used numerals. Note that the underlying value might actually be numerical.
  5. Data is read in as a dataframe rather than a matrix, i.e., as a table that can contain columns of different types. Dataframes can be converted to matrices.
  6. There are various tricks for using massive data sets not covered in this post.