Read data files into R

Make what you will of this. It was written during a long day trying to make sense of the JHU course on R with liberal helpings of the simpler course from Princeton.  I haven’t checked the text for accuracy or typos and have put it here just so I can find it when I need it. If it helps you to make sense of R, good.

Get your data and save it in a .csv file

Your first task when using R to do statistical analysis is to collect the data.

Layout your data

Normally, you lay out your data in a table. Observations, cases, instances or people are in the rows. Variables, or things you observed, are in the columns.

Usually there is one column which is a unique but anonymous identifier for each case/row.

You will also have columns for ‘factor’ information such as gender (male/female/other), age (0-125), and so on.

You can put the column names in Row 1 – or you can leave out the column names and put the first case in Row 1.

Save your data

It is quite normal to load data into an Excel table and then to save it in .csv (comma separated value) format. 

.xls format can be read but .csv is more common and should be your standard practice.

You can also capture your data in Excel and save it as a .txt file. I will show you how to read both .csv and .txt files.

Read your data into R

Your second task is to read a data file into R so that you can use it and analyse it.

You have three primary ways to read in your file.  By the time you have worked through these methods, you will have mastered several basic R commands.

First method: Read the file from an online source

1.       Online URLs tend to be long and cumbersome, so put the URL into a variable called fileurl.

2.       We use <- to assign the URL to the variable

3.       We surround the URL with ""

4.       fileurltxt <- "http://spark-public.s3.amazonaws.com/stats1/datafiles/Stats1.13.HW.02.txt" [name taken from one of the Coursera courses]

5.       For avoidance of confusion: notice this is a text file and, even though we have put its URL into a variable, the file we read in remains a .txt file.

6.       Alternatively: fileurlcsv <- "http://spark-public.s3.amazonaws.com/stats1/datafiles/Stats1.13.HW.02.csv"

7.       Now you can see why my variables are named fileurlcsv and fileurltxt – I don't want to forget what I have done.

8.       To read a .txt file, simply:

mydata <- read.table(fileurltxt, header = TRUE) or mydata <- read.table(fileurltxt, header = T)

9.       To read a .csv file, simply:

mydata <- read.csv(fileurlcsv, header = TRUE) or, as above, abbreviate TRUE to T

10.    If you leave off header = TRUE, the first line of the file is treated as data. Let's spell this out: if you leave off the information about the header and the first line is a set of column headings, those headings will be erroneously treated as the first case. If you do not have headers, replace TRUE with FALSE or simply leave out header = TRUE altogether. I recommend writing header = FALSE explicitly so that when you return to your program after many months, you are immediately clear about what you did.
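
Putting the first method together, here is a minimal sketch (it assumes the example Coursera files above are still available at those URLs):

fileurltxt <- "http://spark-public.s3.amazonaws.com/stats1/datafiles/Stats1.13.HW.02.txt"
mydata <- read.table(fileurltxt, header = TRUE)    # the first line of the file is treated as column names
fileurlcsv <- "http://spark-public.s3.amazonaws.com/stats1/datafiles/Stats1.13.HW.02.csv"
mydata <- read.csv(fileurlcsv, header = TRUE)      # same idea for the .csv version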

What you have learned to do so far

·        To set up a variable to store a long url

·        The <- operator, which "puts data into" a variable, matrix, table or data frame

·        To put "" around strings (including URLs)

·        To check whether your data is in a .txt or .csv file

·        A different command to read .txt and .csv files (read.table and read.csv, respectively)

·        To add header=TRUE or header=FALSE to say whether the top line of the file contains column headings or data.

·        To read your data into a data frame called mydata.  You can call mydata anything you like (though you might discover some reserved names). Use a name that is short, descriptive and memorable.

At this point, you can easily read data from a location on the web and you have data in mydata ready to use.  If you want to see your data, simply type mydata followed by Enter and your data will be listed.  If you have a very long file, don't list all of it: type head(mydata) instead to get the top few lines and tail(mydata) to get the bottom few lines.

So you have learned two more things:

11.    To print a data frame to the console, simply type its name

12.    To read the first or last lines of a large dataframe, type head(dataframename) or tail(dataframename).
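
As a quick sketch of these two commands (assuming mydata has already been read in as above; the second argument to head is optional):

mydata              # prints the whole data frame
head(mydata)        # first 6 rows by default
head(mydata, 10)    # first 10 rows
tail(mydata)        # last 6 rows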

Read the file from a directory or folder on your laptop

If you have downloaded or saved a file onto your laptop, then you are going to follow exactly the same procedure as above, but you will have to replace the URL within "" with a path and filename.

Begin by learning how to set your working directory.

1.       To find out your current working directory, use: getwd()

2.       To set a working directory, use: setwd("C:/users/yourusername/documents/R")

3.       Check that with getwd()

4.       Of course, if the R directory does not exist, make it and put your datafile there

5.       The reason I used a folder called R is that when you load packages and libraries, R automatically makes this folder

6.       Note you can also put your data in a sub-directory "C:/users/yourusername/documents/R/datafiles"

7.       Of course, datafiles can be anything you choose to name it

8.       Also note that you must use / not \

9.       Assuming your data file is in /R/datafiles and that your working directory is R, then your path is “datafiles”

10.    Here are your new commands for reading a .txt file

filenametxt <- "your file name including .txt"

path <- "datafiles"

mydata <- read.table(file = file.path(path, filenametxt), header = TRUE)

11.    And for .csv

filenamecsv <- "your filename including .csv"

path <- "datafiles"

mydata <- read.csv(file = file.path(path, filenamecsv), header = TRUE)
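
Putting the local-file steps together, here is a minimal sketch; the working directory and the file name are only examples, so adjust them to your own machine:

setwd("C:/users/yourusername/documents/R")    # remember / not \
getwd()                                       # check it worked
path <- "datafiles"                           # sub-directory under the working directory
filenamecsv <- "Stats1.13.HW.02.csv"          # an example file name
mydata <- read.csv(file = file.path(path, filenamecsv), header = TRUE)
head(mydata)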

Read any one of many data files in any one of many directories

Now let us imagine you are working on a large project and you have several data sub-directories (part1, part2, etc.).  And each directory contains many datafiles. The datafiles might be named with numbers (001.csv to 999.csv).  Everything in this section applies to .txt files too, but you must use read.table not read.csv and substitute .txt for .csv

1.       To be able to tell R to find the file you want, we set the file name as an argument in a function or script.

2.       Set up the barebones of the function:

getdatafile <- function(id, path){

}

3.       Type getdatafile to see the script

4.       Edit the getdatafile function by using fix(getdatafile).  Note that you will edit in a little popup and, when you save, you should see the corrected function on the original console.  I have had endless trouble with this, so work carefully to eliminate muddle.

getdatafile <-function(id, path) {

read.csv(file = file.path(path, id), header = TRUE)

}

Now run getdatafile("filename.csv", "datafiles"), replacing the arguments with your own file name and path.

The full datafile should come up on the console.

5.       Explore that further by editing getdatafile again and reading the file to mydata:

mydata <- read.csv(file = file.path(path, id), header = TRUE)

6.       Now the data will not print to the console. Moreover, when you type mydata, you will get NULL, or a message that it does not exist, or values of mydata that you put there earlier.  So rerun this command, but clear mydata first with mydata <- 0.

7.       To have access to the data, you have to make the value of the whole function the value of mydata.  To do this, type return(mydata) just before the }

8.       Now when you run getdatafile("put the file name here", "put the path name here"), the data prints to the console again.

9.       To store this data for later use, you have to type mydata <- getdatafile("filename", "pathname").  The reason for this is that mydata only existed within the function, and the 'logic' of a function is that you return the value of the function, not the value of things inside the function.  To tease this out further, you can set mydata = 3 before you call the function.  Unless you assign the function's result to mydata, mydata will continue to be 3, despite a whole file having been read into another mydata within the function.
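
Here is a sketch of that scoping point, assuming a file called 001.csv exists in the datafiles sub-directory:

getdatafile <- function(id, path) {
  mydata <- read.csv(file = file.path(path, id), header = TRUE)
  return(mydata)                  # the data frame becomes the value of the function
}

mydata <- 3                       # a mydata outside the function
result <- getdatafile("001.csv", "datafiles")
mydata                            # still 3: the mydata inside the function was separate
head(result)                      # the data you actually read in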

So what has been learned here?

·        A function has a specific form with

a.       The name of the function

b.      <-

c.       The keyword function

d.      () containing arguments

e.      {}

f.       Code within the {}

·        The function exists to arrive at a value, which might be one number or a large table.  This is what is returned.

·        The value of the function is vomited up to the console unless we direct it into a variable such as valueofgetdatafile <- getdatafile("filename", "pathname")

·        If we redirect the output into a variable within the function, such as mydata <- read.csv(filepathcsv), then we must remember to make the output the value of the function again by typing return(mydata) just before the last }

·        Once again, if we don't want mydata vomited up over the console, we must direct it into valueofgetdatafile

All this seems unnecessarily convoluted but, as I understand it, it is a function of the modularity of object oriented programming.  Don’t fight it, just master it.

As a final frill, we are going to make it easier to type in the file name.  At present, we have to type in “24.csv” or “345.txt” etc.

Can we make it easier and let people simply type (24,”path name here”)?

To do this, we will use a command called sprintf and two other commands called paste and as.numeric.

sprintf allows us to pad a number with leading zeros, so that 1 becomes 001, 10 becomes 010, and 100 stays 100:

filename <- sprintf("%03d", id)

Notice that this pattern only pads to three digits; a number like 1000 is printed as-is, so the scheme assumes files numbered 001 to 999.

To add on the .csv, use the paste command.  The paste command joins pieces of text together and puts a space between them by default; sep="" removes the space.

filename <- paste(sprintf("%03d", id), ".csv", sep = "")

Notice that you might have thought the items to be concatenated would be in their own (), but they are not.

Also notice, if you try this, that it will still not work if the id has been turned into text or character somewhere along the way. To keep the number as a number, we will use as.numeric

filename <- paste(sprintf("%03d", as.numeric(id)), ".csv", sep = "")
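
As a quick sketch of what this line produces, whether the id arrives as a number or as text:

paste(sprintf("%03d", as.numeric(1)), ".csv", sep = "")      # "001.csv"
paste(sprintf("%03d", as.numeric("24")), ".csv", sep = "")   # "024.csv"
paste(sprintf("%03d", as.numeric(100)), ".csv", sep = "")    # "100.csv"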

Replace id with filename in the read.csv (or read.table) command, and the user is then at liberty to put just the number of the file, without the extension, into getdatafile(id, path).

Note also that this little routine does not help if some of the files are .txt and some are .csv. They must all be of the same type, the code must reflect that type, and the read.csv or read.table must match.

You should now have a script that reads:

getdatafile <- function(id, path){

filename <- paste(sprintf("%03d", as.numeric(id)), ".csv", sep = "")

mydata <- read.csv(file = file.path(path, filename), header = TRUE)

return(mydata)

}

And you use the script –

getthecontentsofdatafile <- getdatafile(23, "datafiles")

where you put the number of the file that you want, and datafiles is the directory /R/datafiles where the file can be found.

To retrieve the file, you type

getthecontentsofdatafile

Summary

This post covered how to read files into R from three sources – an external URL, a folder on your machine, and one or more directories containing many files.

It is important to check whether you are using .txt or .csv files and to change the scripts to match.  Also change the command – read.table is for .txt and read.csv for .csv.

It is also necessary to name your files in some coherent pattern.  This script goes up to 999 files and does not handle files with names like abc.txt.

We also covered the basics of a function – and the very confusing return function. It is best to play around with this until it becomes more intuitive.

Finally, we used three more commands – sprintf, which is used for formatted printing and has a useful feature for adding leading zeros, so it can make 1 into 001.

as.numeric makes sure the id stays a number so that sprintf can pad it out to three digits such as 001.

And the paste command allows you to add .csv on the end; its sep="" argument removes the space so that 001 .csv becomes 001.csv.

I hope you find this helpful.

Using data.frame in R

Data frames in R

A useful feature of R is the data.frame.

What is a data.frame?

Without being an expert in R: a data.frame is like a table.  So when I read a table from a .csv file, it is read into a data.frame.

mydata <- read.csv("myfile.csv", header = TRUE)

Reshape data with a data.frame on the fly

A very useful feature of a data.frame is that I can construct it, on the fly, so to speak.

Let's say I have a list of ten numbers, col1, which I can arrange as a column – in ordinary English, a column of a table:

matrix(col1, 10, 1)

And now imagine that I concatenate ten letters into a row:

row1 = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")

I can make a data.frame of two columns with col1, as is, and row1 turned into a column.

data.frame(col1, row1)

This is a very handy piece of data reshaping and I can do this with any combination of numbers.

I can also make this a bit neater and add headings to my columns by extending the command:

data.frame(vname1 = col1, vname2 = row1)

If I need to return this from a function then, of course, I assign the data.frame to x on the line above and finish with return(x)

If I need to put the data.frame into an object, then: myobjectname <- data.frame(vname1 = col1, vname2 = row1)
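
Here is a minimal worked sketch of the whole idea; the numbers 1 to 10, the letters, and the column names are all made up for illustration:

col1 <- matrix(1:10, 10, 1)                                   # ten numbers arranged as a column
row1 <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")   # ten letters in a row
mydf <- data.frame(vname1 = col1, vname2 = row1)              # two named columns
head(mydf)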

 

Getting started in R

I am currently doing the Johns Hopkins course on R that is offered through Coursera.   There is likely to be a gap between taking the course and using R, and these are my notes on how to get started.

Software and setup

  1. Google R, download the version to match your operating system, and install it on your machine using the defaults.  Set up a data folder in My Documents.

Data

  1. Store your data in your folder in a .csv file.
  2. Use the first row to label your columns.
  3. Use NA to mark missing data.

Read your datafile

  1. Open R with the shortcut
  2. Read your datafile using this command, substituting your filename for datafile:  data <- read.csv("datafile.csv")
  3. List your data to screen with this command: data
  4. Note that you can use any name you like instead of “data” [though I imagine there are some unallowable names]

Find out the number of rows/cases and variables/columns

  1. To find out the number of columns where data is the name of your data as above : ncol(data)
  2. To find out the number of rows where data is the name of your data as above : nrow(data)

Print out the first line to inspect the names of your variables/columns

  1. Use this command where data is the name of your data as above : data[1, ]

Take a subset of your data

  1. For sake of the example, let the name of your first variable be VAR1 and your third variable be VAR3
  2. Make a new dataframe containing all rows where the values of VAR1 and VAR3 are as shown: newdata <- subset(data, VAR1 > 31 & VAR3 > 90)

Take a subset of one variable excluding missing data

  1. Set up a new variable/vector containing a LOGICAL variable which is TRUE when a value is missing: VAR1BAD <- is.na(data[,1])
  2. Set up a new variable/vector that copies the values from the original vector, providing they are not “bad”: VAR1GOOD <- data[,1][!VAR1BAD]

Do some basic statistics on your newly created variable/vector

  1. mean(VAR1GOOD)
  2. max(VAR1GOOD)
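
As a short sketch tying the last two steps together (it assumes datafile.csv has a numeric first column):

data <- read.csv("datafile.csv")
VAR1BAD <- is.na(data[, 1])         # TRUE wherever the first column is missing
VAR1GOOD <- data[, 1][!VAR1BAD]     # keep only the non-missing values
mean(VAR1GOOD)
max(VAR1GOOD)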

Issues covered during the first week not listed above

  1. Vectors must contain data of the same type i.e., numeric, character, or logical
  2. A list can contain a mix of types
  3. When a vector, as opposed to a list, has mixed types, the type is "coerced" to the LCD, so to speak – logical becomes numeric (1, 0), and numeric or logical mixed with character becomes character (see the short illustration after this list)
  4. R uses factors – which in essence are labels such as “male” and “female” where other statistics programmes used numerals. Note that the underlying value might actually be numerical.
  5. Data is read in as a dataframe rather than a matrix, i.e. as a table that can contain columns of different types. Dataframes can be converted to matrices.
  6. There are various tricks for using massive data sets not covered in this post.
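
A couple of one-line illustrations of that coercion, typed straight at the console:

c(TRUE, 2, 3)                                           # logical coerced to numeric: 1 2 3
c(1, "a")                                               # everything coerced to character: "1" "a"
as.matrix(data.frame(x = 1:3, y = c("a", "b", "c")))    # a mixed data frame becomes a character matrix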

 

A simple ‘to do’ list for Drupal 7 using Date & Calendar

This post describes a very simple ‘to do’ list that I made in a Drupal 7 website.

The website already had a Calendar and Date function.  I can add a Date, and all its details, which is recorded as a node of content-type Date and displayed on a Calendar View.

Already installed and enabled : Drupal 7, Token, Pathauto, CTools, Views and Views UI, Date, Calendar, Computed Field

To add items and to put them on and off the ‘to do’ list

To make this simple to do list, I decided –

  1. To add the things I want 'to do' as a Date and, if they have no date, I record an arbitrary start date – say the end of the quarter or the end of the half.
  2. When I want to move an item to my active 'to do' list, I simply edit the item and change the start date to today or, if it is on this week's list, to Monday.
  3. When I have finished the task, I edit the item again to reflect the finish date.
  4. As I will describe below, an odd feature of the filters in Views required an extra Boolean field, so I check a box when the task is complete.

Permanent record

With this simple set up, I have a permanent record, in one place, of all my projects that are pending, and of all my projects in the past with the dates when I started them and finished them.

Adding focus

To keep me thoroughly focused, I made two taxonomies – one for work and one for leisure. Both of these were added to Date so that later I can retrieve work and leisure activities into separate Views.

Additional information

As I have both start and end information, I added another field to Date to calculate the Elapsed Time between the date I started and ended a project. There is code available on the internet for this, which I summarise here: we retrieve the entity values for the two times, make DateObjects with the values, and then choose our granularity ('days').

I also added another field to Date, Days in Queue, and added similar code to calculate the time between the day the Date was added and the start date.

To display my ‘to do’ list

To display my current 'to do' list, I made a View in a Block, in table format, and display the taxonomy tag and Date title.

I also added two filters. First, using a Relative filter, I selected Dates with a start date equal to or later than now.  Second, I filtered on the Complete check box.  It is important to manually put in 1 and 0 as the default values when you set up the field initially, otherwise the filter will not work in the View. The reason I added what seems to be a superfluous checkbox is that the default finish date for a new task is not NULL, as one might expect, but the start date. The only way to select tasks that have been started and not finished is to add the extra box.

As I have made a Block, I can put the Block in a side bar.

Note also that we can tell the Block View to order the items in the order of the Taxonomy.  So when you set up the Taxonomy, ensure you put the tags for work into the order of your daily routine.

Note also that I did try using Editable Views but, firstly, it did not work outside of preview and, secondly, the preview showed a massive date edit screen that would not be user friendly.  So to record a project as finished, I follow the link, edit the Date and check the complete box.

To set up my ‘to do’ list

To review my pending list of items in the queue, I made another View, this time as a page. This time I use all four fields – Date, title, Elapsed Time, Days in Queue – in a table.

I can also filter by taxonomy.

Here I can pick my items, follow their link and change the start date to today or Monday.

To review what I have achieved

For those days when I have been buried in a task and feel that I have achieved nothing, I have an ordinary View of all four fields – Date, title, Elapsed Time, Days in Queue – in a table. This time, I have two exposed filters.  The first is headed, ‘Finish on or after’ and the second is headed, ‘Finished before’.

By setting my time range, I can immediately see what I have finished during the period in question.

Leisure

I can repeat the entire setup for leisure items.

Features to be developed

My next tasks will be to track my Elapsed Time and Days in Queue, particularly for work items.

To understand my own planning behaviour better, I will group all my Dates by week and count the number of items inserted and the number of items that are finished. Now I will have the basic information to understand my queue – how fast am I adding things and how fast am I finishing things.  Ideally, I’ll plot these numbers on a graph.

To understand the chunking of my tasks, I’ll group them again by week and calculate the minimum, average and maximum Elapsed Time.   Hopefully, both the spread of times and the average time will decrease below 5 days – meaning, I have designed tasks that can be finished within a week.

And I will calculate the minimum, average and maximum Elapsed Time and Days in Queue for a moving period of quarters, halves and years so that I can be aware of how long I take to complete something I have decided to do and how my behaviour changes in time.

Summary

There it is.  A simple backlog manager.  Add Dates to a Calendar.  Use Computed Fields to monitor both the Elapsed Time and the Days in Queue. Use a check box to show a task is complete, because filtering by a blank end date does not work.  And add two taxonomies so that you can split your lists and display tasks in order.

British Science that hasn’t gone to TED

We have TED.  And we have people who have not got to TED yet.

On Wednesday, I went to Birmingham for the first time in 25 years.  When I was there last, I was using a handful of mainframe computers at Birmingham University and their linguistic computing programs to compile a corpus of Zimbabwean English (which I duly took home on computer tapes about 30cm wide).

On Wednesday, my digital interests took me back for a workshop on publishing Kindle (more later).

A co-creator of the workshop was Kate Cooper of The New Optimists. Kate has rallied the scientists we probably haven't heard of – the men and women who are doggedly working in laboratories solving riddles with science and who haven't popped up at TED.

The New Optimists is a compendium of 80 short essays about the scientists “view [of] tomorrow’s world & what it means to us.”

I’ve just got started and of course read the psychology first.  The piece by Michael West on positive organizational scholarship is spot on and that has encouraged me to read what scientists think in fields where I know very little about their frontiers.

Kate and co will be bringing it out on Kindle – so if you only read on holiday, look out for it.  But if you want to find out what is happening in our laboratories, grab a paper copy. The merit is that you can use the margins to make notes and draw diagrams of anything not familiar to you.

Here’s the link.

UK economy watch: Plastic Electronics

I thought I would start recording all the exciting bits of the economy.

Cambridge has produced some of the exciting developments in organic electronics, which have led to a new e-reader, by the way.

But my mood changed when I tried to find the details of the government awards and tried to subscribe by RSS feed. Hmm. . .

Industry: Plastic or Organic Electronics

Market Size

2010 . . . USD2bn (GBP1.3bn)

2020 . . . USD120bn (GBP80.2bn)

Growth = 60x or 6000% in ten years

Government Subsidies

#1 8 projects to build the supply chain and “overcome barriers to UK exploitation of plastics electronics technology”

GBP7.4m including GBP800K from the Engineering and Physical Sciences Research Council (EPSRC)

#2 5 projects to develop commercial prototypes

GBP1.0m by Technology Strategy Board

Two of the projects

Announcement from  COI

Products

Circuits are printed cheaply onto rigid or flexible surfaces and rival silicon-based electronics in lighting and solar panels.

People

As at July 2010

David Willetts, Universities and Science Minister

Iain Gray, CE of Technology Strategy Board

Websites

PRW.com

Plusplasticelectronics

Technology Strategy Board

Announcement from January 2007

British Interests

Plastic Logic (with Merck in Dresden and management in Mountain View)

Noteworthy nugget: Eye to brain to computer screen

As we approach the end of 2008, yes, Japanese neuroscientists are able to recreate what we see from activity in our brain.

Here is  a link to what people were looking at and what the computer recreated.

If you are into neuroscience, you might also enjoy this TED lecture on learning from watching our own brain scans.

Were your grand-parents Luddites? And do you take after them?

Are you like your grandparents?  Or are you very different?

I'm quite excited by all the new science that is going on: biological engineering, nanotechnology, the particle collider, and so on.  We seem to be on the cusp of a new age of technology.

Many people are very disapproving, of course.  And they probably think the internet is dangerous as well!  I wondered today: do you think their parents were Luddites?  Do you think their grandparents were Luddites?

I am not saying that twins separated at birth are likely both to be Luddites or both not to be Luddites!   But I did wonder whether families have a tradition of welcoming technology, or of treating it with raucous disdain.

Is your approach to new science and development similar to your grandparents'?

I’d love to know!  Luddite or not? And does Ludditism run in your family?

Scientists ARE using social media

Parallel Session II: Making science public: data-sharing, dissemination and public engagement with science

Panellists:

Ben Goldacre, Bad Science blog

Cameron Neylon, Science in the Open blog & Oxford University

Maxine Clarke, Nature

Chair: Felix Reed-Tsochas, Oxford University

Journals and peer-reviewed publications are still the most widely used channels through which research is disseminated within the scientific community and to a broader audience. However, social media are increasingly challenging the supremacy of editors, reviewers and science communicators. Blogging about science has become a new way of engaging ‘the public’ directly with researchers whilst researchers are increasingly using blogs within their own academic communities for peer-review purposes. Panellists will give their perspective on how social media have changed the nature of the scientific debate among scientists, and how they have impacted on engagement with the public understanding of science.

1. (Observation last night.) Two of the panelists list their blog as well as their academic affiliation. But are they academics too? Or borrowed for the occasion?

2. Missed opening remarks as struggled with weak internet connections here.

3. Now Cameron Neylon. Scientist – using social media as his lab notebook. No peer review. Ppl could steal data. But could [crowd-source] review. Then discovered other scientists using social media to "do science". Maxine Clarke of Nature said few scientists use social media but it is a rapidly growing community.

Exp – publicize details – ask people to take mmts.

Describing typical 7 year cycle of a research project.

Who funded the prizes (journal subs) for students. Completed project in 6 mo with invited paper and publication. Much more efficient.

Questions

FRT: How much has interaction changed?

Ben Goldacre. Journos often get issues wrong and dumb down issues. Does journo science news inform people with science degrees who work in a variety of roles? Blogs can be niche (mindhacks on neuroscience and psychology). Imagine 2000 science blogs with 500 readers each talking to 1m people.

Royal Society Prizes for science books recently – 20K in prizes and more in admin – books selling 3000 copies only. Science Minister [google the spat] – committees have no new media experience.

Blogs encourage us to be clearer and sounder about what we write. Link culture. Journos don’t want you to know they’ve copied and pasted from a press release. Cited an example of not checking primary sources. We link to primary sources.

FRT: [Will blogs kill science journalism?]

BG: Old science journalism is dumbed down for us. We need a patchwork with better stuff for people who are informed.

FRT: Dangers of sloppy journalism. But issue of quality and trust.

BG: Journos say internet is undistributed mush. Need to learn to use internet. Easy to tell when something is [rubbish]. Lots of dodgy stuff everywhere. Want more and let the street [filter].

Maxine Clarke: As editor, don’t equate blog in that way. But likes blogs and interaction. Nerdish quality – correct – find niche. Look for Open Lab.

BG: Disintermediation – 70% of science words on BBC Radio 4 are spoken by scientists themselves. Shepherded and coached to be clear – but speaking. Look at Radio 4 for examples.

Cameron Neylon. Abandon term public – don’t distinguish between public and scientists. Engage people with the scientists. Let people contribute to science – even be authors.

BG: Interdisciplinary communication. Semi-professional communication promotes . . . Need a place between newspapers and journals.

FRT: Will social media allow us to differentiate the public?

Cameron Neylon: Arrogant and lazy toward non-scientists. Need not to be [snobbish]. Get support for funding.

Questions

Düsseldorf: What keeps scientists from using Web2.0?

BG: Younger people use Web2.0? Get RAE to reward unmediated engagement with “public”. And pay or allow people to split jobs.

Maxine Clarke: Generational issues for journals like Nature. Friendfeed heated discussions about science.

Cameron Neylon: Only just starting to explore social media for public and for science (see Friendfeed). New things are high risk strategies and they keep high risk behaviour for science. Won't be taken seriously if you are out on a limb. People who are using Web2.0 are trying to get a tenured position. Some senior ppl involved. But 10 years in – more cautious.

BG: Use blogs as [scribble-pad] – low cost threshold.

Question

❓ Time to read academic reports. Likes Nature for summary. Few Twitter users using the service. How are institutions like Nature making money out of it?

Maxine Clarke. Highlights from Nature very popular. Making money online isn't a serious concern – still experimental. Lack of time – Nature Network – some blogs to work out problems but also just about lab life. Social, not about scientific work itself. Scientists are cerebral – therefore enjoy blogs.

OII: Fighting against moral panics? Rapidity of moral panics in journo. How does peer review play into process? Blogging about something published is out of step with production of work – time gap huge.

Cameron Neylon: 6.5bn spent on science. 80% of cost is peer review – count peer review ideas by 95%. Small proportion of important ideas – use traditional methods. Straight out of instrument and blogged if need for instrument.

Ben Goldacre. Peer review is best of a bad lot. What is a scientific publication? Document of record. Methods and results to be published. Different types of publications. Need to recognise two types.

Maxine Clarke. Peer review increases quality. 95% of biological papers are rejected and some passed on to other journals. Cited a journal that publishes online with peer reports – need tagging system.

FRT – audience separating production and differentiation. [lost question]

Maxine Clarke. More journals publishing peer reviews and opening up articles for comment. People tagged by subject. People don't comment. Scientists conservative – assessed by publications. Power issues inhibit comment.

FRT – can social media change scientific debates?

Maxine Clarke. Widgets in newspapers to follow conversations – find hard to follow. Nature also makes text accessible in "accessible" format. Conversation too fragmented.

Cameron Neylon. Publication is too high risk to be the place to innovate . . . online material not indexed by Medline. Conversations in different part of research cycle.

BG: Structural issues. Draw strands together about topic – can it be open. Wiki-professionals – micro-credits for helping on something.

Maxine Clarke: Micro-attribution is a growing topic. Average number of authors is 6. Some consortia are 100 or so.

Can contributions be attributed to you – technical issue.

FRT: open source modes of science. Triggers of open source science.

CN: Science is the great open source endeavour. What can we do that is useful? If cannot be replicated and cannot check details, not science.

Bill Dutton: Peer review publications – wrong place to look. Other phases of research process – lot going on. Less collaboration at publication stage – high status, older people.

Maxine Clarke. [Internet playing up]

Question: Radio 4. Book only sold 3000 copies. Wonderful to have well written science blogs. Few people capable of writing good science blogs. Problem is not quality but problem of selling stuff to consumers.

Ben Goldacre. That's why good

Lost a bit here – Said Business School's internet connection is struggling.

Recession opportunities: green our offices

The seriousness of the recession is exaggerated and underplayed!

All around us, we hear the doom and gloom of the recession and I think this talk is both exaggerated and underplayed.  Indeed, it is exaggerated because it is underplayed.

The economy needs structural change

The economy has not been strained like the plant on my desk that will bounce back with a little water.  The economy has been strained like the continuous salad on the window sill that needs to be replaced.

Britain has a long tradition of science

Such stress in the economy would be a disaster if there were no way of replacing it.  But we only have to watch TED talks to know we are on the cusp of major technological changes and, though Britain does not contribute as much to the R&D efforts of the world as the US, we are up there and have a long tradition of serious science.

How will technological change open up jobs for you and me?

I am making it my business to look out for the job opportunities of the future and TED once again obliges with a future opportunity that does not require a PhD in science, though it is certainly based on science.

Green offices!

We are going to green our offices to jungle proportions.  Yep, you will work in a thicket and the last thing you will do every night before you go home is wipe the leaves of 10 bushes very carefully!   Once a quarter, you will pop your plants outside and bring in another set!

And for greening your office, you will

  • Save 15% of power, and this is pretty important because 40% of the world's energy is put into air conditioning.
  • You will feel heaps better and be ill less often
  • You will have a 42% chance of an increase of 1% oxygen in your blood.
  • You will be 20% more productive.  That’s a lot.

So where is the opportunity?

In plant growing and tending of course!

I wonder how many people who run nurseries have been scribbling figures on the backs of envelopes.

  • How many airconditioned buildings are there in UK?
  • What is the capital cost of equipping the buildings with a new set of plants?
  • What will be the knock-on effect on air-conditioning businesses and power companies?
  • What would be the projected power decrease and how would it be offset by increased fumes as we ship plants across the UK on our inefficient road networks?
  • Who else is affected?  Well, HR and productivity specialists are put squarely in their place by a 20% productivity increase!

What other side effects can you think of that I haven’t thought of?

And here are the details for the greening of your office from Kamal Meattle speaking at TED

Areca Palm

  • CO2 to oxygen
  • 4 Shoulder high plants per person
  • Hydroponics
  • Wipe the leaves daily in Delhi or weekly in a less congested place like Milton Keynes
  • Outdoors every 3 to 4 months

Mother-in-law’s Tongue

  • CO2 to oxygen at night
  • 6-8 waist high plants per person

Money Plant

  • Hydroponics
  • Removes volatile chemicals like formaldehydes

Evidence of the benefits of green offices

  • Tried this green formula in Delhi office
    • 50 000 square feet
    • 20 years old
    • 1200 plants for 300 occupants
  • 42% probability that your blood oxygen goes up 1% when you spend 10 hours in the building
  • Reduced incidence of
    • eye irritation by 52%
    • headaches by 24%
    • respiratory illnesses by 34%
    • lung impairment by 12%
    • asthma by 9%
  • Human productivity increased by 20%
  • Reduction of energy requirements in the building by 15% because of reduced air conditioning
  • Replicating with a 1.75 million square foot building with 60 000 plants

Importance of greening offices

  • Demand for energy will grow by 30% in the next 10 years
  • 40% of energy is used by buildings
  • 60% of people will live in cities with a population of more than 1 million people

I must get this together before next winter!
