
Month: April 2011

Strip whitespace from a document in elementary Python scraping

Tsingy 010 by Olivier Lejade via Flickr

Simple python scraper

I’ve been using Python to extract text from text files that I created from PDF documents using the open-source PDFToText.

Strip() does not strip all whitespace

There is a nifty Python string method which strips off leading and trailing whitespace.  For example, to clean up a line containing the phrase that we seek, we simply write

line = line.strip()

or

line = line.rstrip()

line = line.lstrip()

Easy peasy ~ except that my phrases were being returned with a long run of whitespace to the right.

The whitespace is not visible to the naked eye, of course; I deduced its presence from output that read word~long blank~closing quotes.
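
One way to make such invisible characters show up is to print the repr() of the line rather than the line itself. The sketch below assumes Python 3; the sample line and the helper name invisible_chars are made up for illustration and are not from my scraper.

```python
# Hypothetical sample: a word followed by two non-breaking spaces.
line = "word\xa0\xa0"

print(line)        # looks like "word" followed by blanks
print(repr(line))  # the escapes become visible in the repr


def invisible_chars(text):
    """Return (index, code point) pairs for control and non-ASCII characters."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(text)
        if ord(ch) < 32 or ord(ch) > 126
    ]


found = invisible_chars(line)
print(found)
```

Anything the helper reports at the end of a line is a candidate for the "long blank" that strip() appeared to leave behind.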

NBSP will not be stripped by strip()

My own diagnosis took me up some blind alleys.  In a thoroughly confused state, I sought help on StackOverflow.  While I slept, a helpful chap in Australia read me the riot act on confusion and told me what was likely to be my problem.

The whitespace wasn’t an ordinary space after all. It is an NBSP, a non-breaking space.  That is, a marker that is the opposite of forcing a line break: it prevents a line break at that point.

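Knowing this, all I had to do now was search for NBSP using its hex escape “\xA0” (and that 0 is zero). Strictly, the value 0xA0 lies outside the ASCII range; it is the character’s Unicode code point, U+00A0.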
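
A quick check of what this character is and how strip() treats it, as a sketch that assumes Python 3. There, strip() on a text string does remove NBSP, while strip() on a byte string leaves it alone. My guess (an assumption, not something I verified) is that the behaviour I ran into came from working with byte strings.

```python
import unicodedata

nbsp = "\xa0"

name = unicodedata.name(nbsp)        # official Unicode name of the character
is_ascii = ord(nbsp) < 128           # 0xA0 is 160, so this is False

text_result = "word\xa0".strip()     # str.strip() treats NBSP as whitespace
bytes_result = b"word\xa0".strip()   # bytes.strip() only strips ASCII whitespace

print(name, is_ascii)
print(repr(text_result), repr(bytes_result))
```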

Simple python code to find and strip NBSP

So this is what I did:

I compiled a search term as

    import re

    snap = re.compile(r"""
        (\xA0)    # NBSP: shows up as white space but doesn't leave with strip()
    """, re.X)    # re.X allows this verbose layout with comments

    matchObj = snap.search(line)

    if matchObj:
        # Discard the line from the point where matchObj starts
        line = line[:matchObj.start()]

Clean NBSP from textfiles using Python

Hey presto – my line is cleaned up and the offending NBSP have gone.
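
Cutting the line at the first NBSP also throws away anything that follows it. An alternative sketch, assuming Python 3 text strings, swaps each NBSP for an ordinary space and then strips as usual. The names clean_line and clean_lines and the sample data are hypothetical.

```python
NBSP = "\xa0"


def clean_line(line):
    """Replace every NBSP with an ordinary space, then strip both ends."""
    return line.replace(NBSP, " ").strip()


def clean_lines(lines):
    """Apply clean_line to each line of an iterable (e.g. an open text file)."""
    return [clean_line(line) for line in lines]


# Hypothetical sample lines standing in for the contents of a text file.
sample = ["  phrase one\xa0\xa0\xa0\n", "price:\xa0100\n"]
cleaned = clean_lines(sample)
print(cleaned)
```

This keeps text that sits after an NBSP in the middle of a line, which the truncation approach would lose.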

 


Commuters agree to be deprived of life for eternal slumber

Frustration by greencandy8888 via Flickr

Yesterday evening, the M1 motorway heading north out of London was closed for 24 hours.  Thousands upon thousands of commuters going home and people heading north for the weekend were stranded.

Staying in London overnight is a large expense for a commuter.  Outgoings will be at least 100 pounds.  Your dogs and cats back home remain unwalked and unfed.  And I put that first because I am British.  Your partner and children might be ill amused too.

There is no insurance for commuter travel.   And no liability for the operators or the utility providers.  The commuter bears the risk as an Act of God.

Yet we don’t treat our commuter travel as part of the reason why we travel.

@documentally was grumbling.  I don’t blame him because I would have been worn out with frustration too.  And if I am honest, I’ve cut down my use of public transport to the minimum.

But the irrelevant frustration, and the signs that we are going along with a senseless commodification of our lives that only hurts us, led me to wax lyrical.

My tweet of the morning that infuriated @Documentally even further

@Documentally We’ve been duped into believing that several hours travel isn’t an adventure – deprived of life for eternal slumber?


Don’t brag and don’t whimper!

Delight by Nagesh Kamath via Flickr

Adventure, surprise and discovery

I am quite an adventurous person.  I like experiencing new things and the biggest prize is discovering a new method or way to get something new.

Tales of exploration

I also love listening to stories of people who think the same way.  But there is a paradox in our fascination with surprise and novelty.

Don’t brag and don’t whimper!

Stories where we succeed are not actually all that interesting.  It is the stories where we fail that are funny and interesting.

That’s life.

  1. We get fun doing something surprising and new.
  2. People laugh uproariously at our stories of ill-judged hope.

We don’t get both!

  • People don’t really want to know what you did well – they want to do it themselves.
  • People don’t give a jot about your misery but they will appreciate a laugh at your expense.

The fun is in doing well or in recounting flops.  Don’t brag and don’t whimper!

I get the impression that this ethos is not widely shared.  What do you think?


Is the universe capable of having your city at its center?

View from the Rockfeller Center - Top of the rock - 51 by caccamo via Flickr

Standing where you are – what do you see?

Psychologists angst quite a bit over whether there is an essential us or whether we are creatures of circumstance.

Of course we are both and neither.

Without a deep respect for the place where we find ourselves, how can we see the world?  Irish Yorkshireman poet David Whyte calls the place we stand “hallowed ground”.

Birmingham poet Roy Fisher is as functional as any Brummy should be.

 

The universe, we define

As a place capable of having

A place like this for its centre.

 

There’s no shame/ in letting the world pivot

On your own patch.  That’s all a centre is for.  (p.13).

Roy Fisher

 

(I must buy his book but I haven’t discovered the title yet.)


Is your soul in your city?

365.107 dancing on ruins by aaron.bihari via Flickr

Is your city long past its prime?

I can understand the argument that many British cities, like Liverpool and Birmingham, have

  • “outlived the lifespan of their own economic base or infrastructure and must now live primarily by their ‘superstructure’”

and

  • “such institutions as museums-of-local-life or tourist-related service industries which recycle and re-package the industrial past assume a primary role in the local economy”

(Peter Barry in Contemporary British Poetry and the City)

Are we hankering after times long gone?

There is also nothing wrong in selling history, geography and a variety of temporary, low-grade experiences.  Though not from a holidaying culture, I too have been on ‘holiday’ in my time.

But it makes no sense to

  • Think we can roll the clock back and re-assert the raison d’etre a place had in the past.
  • Deny that the old  raison d’etre has gone out with the tide of history.

Is there not a place which speaks to our soul?

If we aren’t selling history (and enjoying selling history) maybe we should move to a city which has a raison d’etre that speaks to our soul.

I know we don’t all have a choice but I am sure clear thinking will give us more choices.  I know from past experience that  it is utterly deadening to live in a place that has lost touch with why it exists.

Like a traditional farmer in winter, a city might be enjoying the fallow winter and living off stored harvests.  That is OK too.

It’s the self-delusion, or alternatively the cynicism, that makes us feel zombish.

Why does our city exist?  Do we empathize with its soul?

Why does our city exist?  And do we empathize with its soul?

What is the resonance between us and the city where we live?
