Thursday, Jan 19, 2012

Quote of the day

I should declare that I am an extensive user of freedom of information legislation, particularly as regards universities, which I have found unutterably tiresome and difficult to deal with. One of their more tiresome habits is to refuse to provide information in anything other than PDF format. They get it in Excel, or whatever form, and translate it into PDF to provide it to me, merely to cause me extra work. I have to buy a program to suck it out of the PDF again. PDF is not a transmissible format, as it were, and they are merely trying to make life difficult by putting it in that format. So I would like to be sure that when data are provided they are provided in a properly reusable format. I have never come across a data set that cannot be reduced to tabbed, delimited text. Maybe that happens in a collection of tables, but data are essentially a simple thing. Although the data may be held in an immensely complex form in the program that the scientists are using, in any program that I have come across it should be easy, if only for the purposes of sharing with other people, to drop out at least the base data into relatively simple form.

Lord Lucas, speaking in the debate on the Protection of Freedoms Bill, has clearly experienced some of the same frustrations as others who have tried to get information from universities.


Reader Comments (14)

Climate-XML
Simple. Problem fixed

Jan 19, 2012 at 8:20 AM | Unregistered Commenter andy scrase

I wonder how many of his fellow lords understood any of that?

Jan 19, 2012 at 9:02 AM | Unregistered Commenter James P

The real problem is scanned information whether in pdf or not. If one wishes to quote from the document it is hard work and if long underlined URLs with underscores are used it is a real pain.

In fairness to the Met Office, when I complained and asked for a character based copy they apologised, remedied it quickly and said they understood the issue. So ask for an electronically readable copy any time people do it.

Jan 19, 2012 at 10:37 AM | Unregistered Commenter David Holland

I must get my eyes tested again. For a moment I thought it was Lord Lucan.

Jan 19, 2012 at 11:05 AM | Unregistered Commenter Phillip Bratby

Lord Lucas studied Physics at Oxford University. And he is a Chartered Accountant.

And he is a hereditary peer, not answerable to those who ennobled him.

http://en.wikipedia.org/wiki/Ralph_Palmer,_12th_Baron_Lucas

Would that there were more independently-minded, intelligent and thoughtful peers like him, active in the business of the House of Lords.

Jan 19, 2012 at 11:39 AM | Unregistered Commenter Cassio

Could it be that one reason for wrapping up the data as a PDF is to make it relatively tamperproof?

However, there's no reason why the data shouldn't be provided both as a spreadsheet file and as a PDF.

Jan 19, 2012 at 11:53 AM | Unregistered Commenter Joe Public

Cassio, that is precisely why the last Government, in particular, reduced the hereditary peerage representation as much as possible.

They can't be trusted to toe the Party line.

Jan 19, 2012 at 12:18 PM | Unregistered Commenter Don Keiller

I sympathize with data distribution by pdf. I once wasted several days debugging a problem in data conveyed by Excel file, only to discover that the problem was introduced by the receiver and that the file I had sent him had been used directly, then modified so as to introduce (innocently) the problem. He had modified the file I sent him, had not kept the original and much time was wasted discovering the problem - and all with no ill intent.
Likely the best course is to send a passworded pdf as the archive version, and ALSO whatever it is in whatever the most usable native format might be, CSV for example.

Jan 19, 2012 at 12:55 PM | Unregistered Commenter j ferguson

I've been lurking here for a while but this is the first time this software geek has felt competent enough to comment on something.

As others have said, it's pretty straightforward - for tabular data, CSV is ideal. For datasets where you wish to convey relationships, XML is king, or if verbosity is a problem then JSON is possibly acceptable.

There are parsers for any of those three for virtually any app you may wish to use within those contexts.
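To illustrate the point about parsers, here is a minimal sketch in Python using only the standard library `csv` and `json` modules. The sample records (station names, years, temperatures) and the helper names `to_csv` and `from_csv` are made up for illustration; they are not taken from any real dataset or from the post.

```python
import csv
import io
import json

# Hypothetical sample records; the station names and values are invented.
rows = [
    {"station": "A", "year": 2010, "temp_c": 9.8},
    {"station": "B", "year": 2011, "temp_c": 10.1},
]


def to_csv(records):
    """Serialise a list of flat dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dicts (every value is a string)."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv(rows)
json_text = json.dumps(rows)

print(csv_text)
print(from_csv(csv_text))
print(json.loads(json_text) == rows)
```

One design difference worth noting: CSV carries no type information, so numbers come back as strings, whereas JSON round-trips numbers and nested structure.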

Jan 19, 2012 at 2:23 PM | Unregistered Commenter Throg

Sorry, I forgot to add - the added benefit of using text rather than binary formats such as PDF is that they tend to compress much better, so generally you'll get more bang for your byte from your zip files.
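A rough way to check how well plain text compresses is to run it through Python's standard `zlib` module. This is only a sketch: the tabular text below is invented and deliberately repetitive, and it does not measure PDF at all, so it shows only the text side of the comparison.

```python
import zlib

# Made-up, repetitive tabular text standing in for a real data file.
lines = ["station,year,temp_c"] + [
    f"A,{1900 + i},{9.0 + (i % 10) / 10:.1f}" for i in range(500)
]
csv_bytes = "\n".join(lines).encode("utf-8")

# Compress at the highest zlib level and compare sizes.
compressed = zlib.compress(csv_bytes, 9)
ratio = len(compressed) / len(csv_bytes)

print(f"original: {len(csv_bytes)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio: {ratio:.2f}")
```

Real data will give different ratios; the more regular the text, the better it tends to compress.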

Jan 19, 2012 at 2:25 PM | Unregistered Commenter Throg

@Cassio:
"Would that there were more independently-minded, intelligent and thoughtful peers like him, active in the business of the House of Lords."

I'm afraid he's now top of the list for the cull in the next constitutional shake-up.

Jan 19, 2012 at 9:35 PM | Unregistered Commenter Peter Dunford

Tabbed, delimited text? CSV is so much easier to handle.

Jan 20, 2012 at 11:32 PM | Unregistered Commenter diogenes

diogenes - CSV = Comma Separated Values = fields delimited with commas, no? I think Excel likes it just the same as tabs.
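For what it's worth, the same holds outside Excel. A small sketch with Python's standard `csv` module (the sample rows and the helper name `parse_delimited` are invented for illustration) shows that comma and tab delimited text differ only in the delimiter passed to the reader:

```python
import csv
import io


def parse_delimited(text, delimiter):
    """Parse delimited text into a list of rows, each a list of strings."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# The same made-up table, once with commas and once with tabs.
comma_text = "station,year,temp_c\nA,2010,9.8\n"
tab_text = "station\tyear\ttemp_c\nA\t2010\t9.8\n"

comma_rows = parse_delimited(comma_text, ",")
tab_rows = parse_delimited(tab_text, "\t")

print(comma_rows)
print(comma_rows == tab_rows)
```

The practical difference is mostly about which character is less likely to appear inside the data itself; fields containing the delimiter need quoting either way.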

Jan 21, 2012 at 12:09 AM | Unregistered Commenter not banned yet

General solution: slash government funding by £10,000 each time an FOI is dealt with obstructively.

Jan 23, 2012 at 8:45 AM | Unregistered Commenter Punksta
