Friday, Oct 11, 2013

Cook’s consensus: standing on its last legs

This is a guest post by Shub Niggurath.

A bird reserve hires a fresh enthusiast and puts him to do a census. The amateur knows there are 3 kinds of birds in the park. He accompanies an experienced watcher. The watcher counts 6 magpies, 4 ravens and 2 starlings. The new hire gets 6 magpies, 3 ravens and 3 starlings. Great job, right?

No, and here’s why. The new person was not good at identification; he routinely mistook one species for another. His totals came out close to the expert’s, but largely by chance.

If one looks only at the aggregates, one can be fooled into thinking the agreement between the birders is an impressive 92%. In truth, the per-bird match is abysmal: 25%. This will not come out unless the raw data are examined.
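The arithmetic behind the two figures can be checked directly. A minimal sketch in Python; the per-bird assignments are one hypothetical arrangement consistent with the counts in the example:

```python
from collections import Counter

# One hypothetical per-bird arrangement consistent with the example:
# the expert counts 6 magpies (M), 4 ravens (R), 2 starlings (S);
# the new hire's totals are 6 M, 3 R, 3 S, but only 3 birds match.
expert = ["M"] * 6 + ["R"] * 4 + ["S"] * 2
hire   = ["M", "M", "M", "R", "S", "S", "M", "M", "M", "S", "R", "R"]

# Aggregate agreement: compare only the species totals.
ce, ch = Counter(expert), Counter(hire)
aggregate = sum(min(ce[s], ch[s]) for s in ce) / len(expert)

# Per-observation agreement: compare bird by bird.
per_bird = sum(e == h for e, h in zip(expert, hire)) / len(expert)

print(f"aggregate agreement: {aggregate:.0%}")  # 92%
print(f"per-bird agreement:  {per_bird:.0%}")   # 25%
```

The totals overlap on 11 of 12 birds (92%), yet only 3 of 12 individual identifications match (25%).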

Suppose that, instead of three kinds of birds, there were seven, and that there were a thousand birds instead of twelve. This is exactly the situation with the Cook consensus paper.

The Cook paper attempts validation by comparing its own ratings with ratings obtained from the papers’ authors (see table 4 in the paper). In characteristic fashion, Cook’s group reports only that the authors found the same 97% that they did. But this agreement is of the totals alone, an entirely meaningless figure.

Turn back to the bird example. The new person is wrong often enough (9 of 12 instances) that one cannot be sure even his matches with the expert (3 of 12) are not by chance. You can get every single bird wrong and yet match the expert’s totals 100%. The per-observation concordance rate is what determines validity.

The implications of such error for inter-observer agreement and reliability can be quantified with the kappa statistic [1]. In the Cook group data, kappa is 0.08 (p <<< 0.05): the Cook rating method is essentially completely unreliable. The paper authors’ ratings matched Cook’s for only 38% of abstracts. A kappa score of 0.8 is considered ‘excellent’; a score below 0.2 indicates worthless output.
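For readers unfamiliar with it, Cohen's kappa rescales the observed per-item agreement by the agreement two raters would reach by chance given their marginal totals alone. A minimal stdlib sketch, run on a hypothetical arrangement of the bird data (the label lists are illustrative, not Cook's data):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two raters labelling the same n items."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n                # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[l] * cb[l] for l in set(a) | set(b)) / n**2   # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical bird data: totals nearly agree, individual birds mostly don't.
expert = ["M"] * 6 + ["R"] * 4 + ["S"] * 2
hire   = ["M", "M", "M", "R", "S", "S", "M", "M", "M", "S", "R", "R"]

print(round(cohens_kappa(expert, hire), 2))  # -0.2: worse than chance
print(cohens_kappa(expert, expert))          # 1.0: perfect agreement
```

For this arrangement kappa comes out at -0.2: the raters agree less often than chance would predict, even though their totals nearly match.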

Facing sustained questions about his paper, Cook has increasingly fallen back on the claim that its findings are validated by the author ratings (see here, for example). Reviewers of Richard Tol’s second submission to Environmental Research Letters adopt the same line:

This paper does not mention or discuss the author self-ratings presented in the Cook et al paper whatsoever. These self-ratings, in fact, are among the strongest set of data presented in the paper and almost exactly mirror the reported ratings from the Cook author team.

The Cook authors do in fact present self-ratings by paper authors, and arrive at a 97.2% consensus from them.

In reality, the author ratings are the weakest link: they invalidate the conclusions of the paper. It is evident the reviewers have not looked at the data themselves; had they done so, they would have seen through the trickery employed.

[1] Sim J, Wright C. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Physical Therapy 2005; 85(3): 257-268. A well-cited review that provides a good general description.


Reader Comments (137)

And has Cook released the raw data yet..?

Thought not.

Oct 11, 2013 at 2:12 PM | Registered Commenterjamesp

Please, please can we stop talking about this piece of dreck?

Oct 11, 2013 at 2:24 PM | Unregistered CommenterJack Savage

JS

But convincing Cook's followers (and Nuccitelli's) is more difficult. Worth demonstrating that SkS's critical faculties only point one way, though, IMO.

Oct 11, 2013 at 2:29 PM | Registered Commenterjamesp

@jamesp
A lot more data has been released, but only after severe pressure by Dan Kammen, the journal editor.

That said, two crucial pieces of data have not been released: rater ID, and response time.

I asked Cook to run two tests (in lieu of releasing that data):

1. Abstract ratings by rater; and chi-squared test results on whether individual ratings are indistinguishable from the aggregate ratings.

2. Histogram of time between ratings: less than 1 second, less than 10 seconds, less than 1 minute, less than 10 minutes, more than 10 minutes.
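The second of these tests is simple to run once the time stamps are in hand. A sketch of how the binning might look; `gap_histogram` and the sample timestamps are invented for illustration:

```python
import bisect

BOUNDS = [1, 10, 60, 600]  # bin edges in seconds
LABELS = ["<1 s", "<10 s", "<1 min", "<10 min", ">10 min"]

def gap_histogram(timestamps):
    """Histogram of time between consecutive ratings, in Tol's five bins."""
    ts = sorted(timestamps)
    hist = dict.fromkeys(LABELS, 0)
    for earlier, later in zip(ts, ts[1:]):
        # bisect_right picks the first bin whose upper edge exceeds the gap
        hist[LABELS[bisect.bisect_right(BOUNDS, later - earlier)]] += 1
    return hist

# Invented rating times (seconds since the start of a session).
print(gap_histogram([0, 0.4, 5, 120, 1200]))
```

A pile-up in the sub-second or sub-ten-second bins would suggest abstracts were being rated faster than they could be read.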

Needless to say, Cook refused to release the data and refused to perform these tests.

The University of Queensland and the Institute of Physics (the publisher) told me to get lost.

Oct 11, 2013 at 2:43 PM | Unregistered CommenterRichard Tol

O/T but I've just noticed that all 300 pages of the Bish's posts are available. I didn't realise that back in the day this wasn't an anti-fascist blog at all.

Here's a good one from 2007.

http://www.bishop-hill.net/blog/2007/2/10/more-brainwashing.html

Children were being told 2007 was the warmest year on record!

Oct 11, 2013 at 2:54 PM | Unregistered CommenterJustice4Rinka

The damage has been done. This is not about the methodology it is about the message and it has been successful because we have seen the dolts that inhabit the Westminster bubble spouting and re-spouting the 97% figure. It is the same trick that the IPCC have used with the 95% plucked out of the air confidence level in the SPM. Once the toothpaste is out of the tube it is a devil of a job to get it back in. Public enquiry in about 2018 by my reckoning. Until then seguimos luchando.

Oct 11, 2013 at 3:07 PM | Unregistered CommenterDolphinhlegs

Richard

The University of Quackland, perhaps?

Oct 11, 2013 at 3:14 PM | Registered CommenterBishop Hill

RT

Thanks for the feedback. It seems to me you've done enough - the fact that he still refuses to release key information speaks volumes.

Oct 11, 2013 at 3:17 PM | Registered Commenterjamesp

J4R, huh?

Oct 11, 2013 at 3:18 PM | Unregistered Commentersteveta_uk

@ Richard Tol

Does Jo Nova have the details of how you were told to "get lost?"

She would probably be interested in writing a post for local consumption in Australia.

Oct 11, 2013 at 3:36 PM | Unregistered CommenterPav Penna

@Pav
Jo was bcc'ed on most (all?) correspondence, but ran only one story: my open letter.

Oct 11, 2013 at 3:38 PM | Unregistered CommenterRichard Tol

@ Dolphinlegs

Exactly. Just last night in a Newsnight piece about the Russian treatment of the Arctic offshore oil platform protesters, we had a Greenpeace Obergruppenführer insisting (twice) that "the newly-released report from the IPCC says that all fossil fuels should stay in the ground".

It's probably too much to hope that - in the not too distant future - the presenter would call 'Foul' on the lying little scrote. For now, though, the message remains unchallenged.

Seguimos luchando indeed.

Oct 11, 2013 at 3:39 PM | Unregistered CommenterJerryM

So sorry, that was @ Dolphinhlegs

Oct 11, 2013 at 3:42 PM | Unregistered CommenterJerryM

Lots of repeat 97% claims in Dana's latest rant..

Oct 11, 2013 at 4:00 PM | Unregistered Commentersteveta_uk

Well as a 'communications' strategy it is effective and very difficult to counter. It is very easy for Bryony Braindead to mention 97% of scientists believe in global warming but it would take an hour and a half to explain to her and anyone else that is interested that what Bryony says is meaningless drivel.

Same with the 95% bollox - it cannot be coincidence that on Mock the Week last week the 'if this is the answer what is the question' section resulted in one of the panellists landing by chance on '95%'.

The so-called scientists produce the crap and the BBC ram it down our throats.

Public enquiry 2018.

Oct 11, 2013 at 4:16 PM | Unregistered CommenterDolphinhlegs

Exactly. Just last night in a Newsnight piece about the Russian treatment of the Arctic offshore oil platform protesters, we had a Greenpeace Obergruppenführer insisting (twice) that "the newly-released report from the IPCC says that all fossil fuels should stay in the ground".

It's probably too much to hope that - in the not too distant future - the presenter would call 'Foul' on the lying little scrote. For now, though, the message remains unchallenged.
Oct 11, 2013 at 3:39 PM | Unregistered CommenterJerryM

Don't worry. Neither Russia, nor China, nor a lot of other nations, feel the need to challenge Greenpeace's "message". They will simply ignore it.

Oct 11, 2013 at 4:29 PM | Unregistered Commentermichael hart

If I had a gun that held 100 bullets and was told that it only had 5 bullets in it, I still wouldn’t point it at my head and pull the trigger no matter what the incentive.

Oct 11, 2013 at 4:54 PM | Unregistered CommenterTaxanban

Cook? Nuccitelli?
No reaching either of those two. Complete barn pots and totally lost causes.

Have you seen Nutty's latest offering today? Truly astonishing garbage. And they pass for 'scientists'? Risible.

Oct 11, 2013 at 5:07 PM | Unregistered CommenterCheshirered

Interestingly, or worryingly, when challenged about this paper, Dana Nuccitelli cited a paper to try and provide support that the consensus was real - THE SAME PAPER!!! Would you believe it? Talk about self certification "My paper is correct because my paper is correct"! Self delusion more like. Utter junk!

Oct 11, 2013 at 5:29 PM | Unregistered CommenterSimon

taxanban

Nor me. I'd save them for Deben, Huhne, Davey, Milibean and Nutty Jelly, probably. :-)

(Smiley for trolls who will otherwise accuse His Grace of harbouring psychopaths.)

Oct 11, 2013 at 5:34 PM | Unregistered CommenterJames P

Why do we keep going on about this? It's all total nonsense - and, in any case, it's achieved its object. Not that it did much real damage. Forget about it.

Oct 11, 2013 at 5:36 PM | Registered CommenterRobin Guenier

The sooner the children in Brownwar (probably a more apt name) realise that boarding a vessel on the high seas without the express permission of the Master of the vessel is an act of piracy, the sooner they will cease such antics… erm… which means they will continue, and hope that appealing to our bleeding hearts will get them out of the mire they insert themselves into. The only difference between these Brownwar numpties and armed Somalis is that the Somalis have a rational reason to be there, and fully appreciate what they are doing!

“…all fossil fuels should stay in the ground.” That so, eh? Well, odd to note that the boats they were using were not row-boats; all had big, powerful engines. They deserve 15 years for that hypocrisy, alone! There is little difference between the Brownwar actifarts and Islamic jihadists; both are being brainwashed into utterly irrational activities for the nefarious gains of their masters.

Oct 11, 2013 at 5:36 PM | Unregistered CommenterRadical Rodent

Shub: "kappa is 0.08 (p <<< 0.05)"
While I'm unfamiliar with kappa, you state that this kappa value denotes poor agreement. Yet a small p-value implies that this is unlikely to be a chance occurrence. This seems a contradiction.

Perhaps the "p" was intended to be a "rho" (correlation coefficient)?

Oct 11, 2013 at 5:44 PM | Registered CommenterHaroldW

"...Please, please can we stop talking about this piece of dreck?..."

My first thought as well.

This is, of course, politics, not science. I can't see any point in trying to counter it with science. You need to counter it with politics.

Stress that prices are going to go through the roof to buy a crippled energy infrastructure, and ask if the 97% of scientists have agreed to that....

Oct 11, 2013 at 5:55 PM | Unregistered CommenterDodgy Geezer

I don't see the point of the test referred to in this post. We have no reason to expect author ratings to match the SkS team's ratings. They rated different things. That means the test is invalid.

As for the data that hasn't been released, there's never been any indication time stamps were recorded. Without knowing it was recorded, you can't say it's being hidden.

Oct 11, 2013 at 7:42 PM | Unregistered CommenterBrandon Shollenberger

All this reminds me of the Magpie Rhyme

One for sorrow,
Two for joy,
Three for a girl,
Four for a boy,
Five for silver,
Six for gold,
Seven for a secret never to be told.

Personally I like the Yorkshire version

One for Sorrow
Two for Joy
Three for a Girl
Four for a Boy
Five for Silver
Six for Gold
Seven for a tale never to be told
Eight you Live
Nine you Die
Ten you eat a bogey pie!

What we need is a climate scientist rhyme

any takers?

Oct 11, 2013 at 8:57 PM | Unregistered CommenterAnoneumouse

@brandon
For the Nth time, time stamps were recorded.

Oct 11, 2013 at 9:05 PM | Unregistered CommenterRichard Tol

"I don't see the point of the test referred to in this post. We have no reason to expect author ratings to match the SkS team's ratings."

Why then does the paper and Cook use the author ratings to validate their results?

Abstract rating in the study is a proxy for the content of the paper.

Oct 11, 2013 at 9:29 PM | Registered Commentershub

If I had a gun that held 100 bullets and was told that it only had 5 bullets in it, I still wouldn’t point it at my head and pull the trigger no matter what the incentive.

What is about the Greens and the "precautionary principle"? Why can they not understand that doing something also has costs as well as not doing something.

Taxanban: if you had pancreatic cancer (death rate in the 90%s, very short and painful death) would you take a drug that had a 5% chance of killing you more quickly but a decent chance of saving you? Most people would.

I agree that the chance of dangerous warming is about 5%. However the chance that we ruin our economies by fully decarbonising is more or less 100%. In order to reduce a fever you are willing to chop off a leg.

Oct 11, 2013 at 10:13 PM | Unregistered CommenterMooloo

Taxanban - no, of course not. But that's not analogous to the situation we are in. Imagine, instead, that some sadist (or psychology professor) was forcing you to point the gun at your head and pull the trigger, or else he would amputate your right arm. In that case, knowing whether there are 5 or 95 bullets versus blanks in the gun might well be the deciding factor.

That's the situation we are in - we are being asked to impoverish ourselves and all future citizens (because growth foregone cannot be recovered - humanity will be forever less wealthy if we adopt the low 'carbon' future the IPCC demands) to avoid a possibly imaginary danger. If we had a high degree of confidence that:
1) The planet was warming due to anthropogenic causes;
2) The effects will be adverse;
3) The adverse effects cannot be cost-effectively mitigated, and;
4) Restricting energy use and choice will reduce the warming to a safe level.

Then the low-carbon policy prescription would be justified. But note, we must believe ALL of these things with a high degree of confidence; if any one of them is not true then other policies are preferable, from mitigation to simply ignoring the issue. I think most people on the skeptic side of the discussion would argue that there are significant doubts about all of those links in the logic.

Oct 11, 2013 at 10:13 PM | Unregistered Commenterdcardno

Richard Tol, you claim that was for the "nth time," but you have never told me timestamps were recorded. You merely argued they were. If you've gained some new information since the last time we discussed the issue, you should say so. If you haven't, you're still just going off suppositions. Either way, your response to me is incorrect.

Shub, both sets of ratings are proxies. The quality of those proxies is different, especially with regard to precision. If you don't account for this when testing them, your test is meaningless. Saying the self-ratings invalidate the paper's conclusions is making a huge claim. You need to do a lot more to support it.

Oct 11, 2013 at 10:51 PM | Unregistered CommenterBrandon Shollenberger

@brandon
Time stamps were recorded and saved, at least that is what John Cook told me.

Oct 11, 2013 at 10:56 PM | Unregistered CommenterRichard Tol

Richard Tol, when did this happen? Previously, I called your comments about time stamps into question. Your response was to say:

It was a distributed, computerized survey. Hard to imagine that time stamps were never recorded. Besides, Cook has never told me he could not give me time stamps. Only that he would not.

This directly indicated you were not told time stamps were recorded. It clearly showed their existence was a supposition of yours. I pointed this out, and not twelve hours later you gave a different story:

I suspected time stamps were recorded, so I asked for them. Cook’s response confirmed that they have them.

I highlighted this inconsistency, pointing out that unless John Cook had told you they collected timestamps in the twelve hours between your comments, there was an unexplained change in what you were saying. You promptly ignored the point.

Now, a month and a half later, you're repeating the second story without explaining why it is different from the first. Given nobody can see the communication between Cook and you, that's unacceptable. If we're to take you at your word, you have to at least provide a coherent and consistent story.

Oct 11, 2013 at 11:16 PM | Unregistered CommenterBrandon Shollenberger

Thanks. I've been unsure about what could be told from the stats on the raters, and this is a total wake-up call for me on what the possibilities are.

My initial stance on Cook was just seeing through the vagueness and lack of specificity in the paper. This gave me an excuse to read the tree-hut files, and after reading them I had a sort of folk explanation for why these guys are not saying anything interesting: they were whipping themselves up into a frenzy of cramming their way through thousands of papers like, er, well, like kids in a tree-hut.

One thing to notice in the THF is that Cook is quite clear in telling everyone: if in doubt, consign it to the "no position" bin. This always struck me as a pose of rigorous scientific rectitude, as if he was doing sceptics a favour. It isn't. Anyone should know that any "no position" papers would just disappear into a hole.

They are not seen on either side and have no effect on the final magic percentage.

Would they also know there would still be thousands left to wade through, and that each individual can keep an internal clock ticking in their head of where they are in their personal percentage? ;)

However, as Shub says here, the idea that the numbers do not overlap like this now comes alive as a key way of seeing the flaw from the individual rater stats. I don't see how Cook could "manage" that, even subconsciously.

Oct 11, 2013 at 11:33 PM | Registered CommenterThe Leopard In The Basement

I've decided I should expand upon my response to Shub Niggurath's question. The point is a basic one, but for people not familiar with things like inter-correlation measures, it may be useful.

Inter-correlation measures how well two data sets measure the exact same thing. The key is it has to be the exact same thing. If two different things are measured, differences in the data sets can reflect differences in what were measured rather than differences in the results.

I'll use the bird example to demonstrate. Suppose the new person was not totally inept. Suppose instead of misidentifying every bird, they simply failed to identify half of them. For that half, they wrote "Unknown." We'd get a kappa score indicating a lot of disagreement in this case even though the new person was right on every bird they took a guess at.

That is, the new person could have been right on every answer he gave, and every answer he gave could have matched the experienced watcher's. There'd still be a lot of disagreement, as measured by kappa scores, because he didn't give an answer for many of the birds.
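This penalty is easy to demonstrate numerically. In the sketch below (data invented for illustration), the new hire is never wrong, only undecided, yet kappa comes out around 0.38, far short of the 0.8 'excellent' threshold:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two raters labelling the same n items."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n                # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[l] * cb[l] for l in set(a) | set(b)) / n**2   # chance agreement
    return (p_o - p_e) / (1 - p_e)

expert = ["M"] * 6 + ["R"] * 4 + ["S"] * 2
# The new hire identifies half the birds correctly and marks the rest
# Unknown (U): zero misidentifications, but only 50% observed agreement.
hire = ["M", "M", "M", "U", "U", "U", "R", "R", "U", "U", "S", "U"]

print(round(cohens_kappa(expert, hire), 2))  # 0.38, well below 'excellent'
```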

The same is true for Cook et al's data sets. The self-ratings for Cook et al were done over entire papers. A whole paper has far more information in it than an abstract will. We'd expect a number of abstracts not to give enough information to classify them while that information does exist in the full paper.

It gets worse. A paper could explicitly endorse a position within its body while only implicitly endorsing that position within its abstract. In that case, an accurate self-rating would necessarily "disagree" with an accurate SkS rating. That's two examples of ways in which the kappa score would indicate disagreement whether or not the ratings were accurate.

In the bird example, the level of knowledge was different for each data set. In the Cook et al example, the level of information was different for each data set. These differences (and possibly others) ensure there will be disagreement in the data sets. That makes the test in this post completely meaningless.

Now then, I've tried to maintain a neutral tone, but I want to engage in a brief diversion. This post bases a claim:

In reality, the author ratings are the weakest link: they invalidate the conclusions of the paper. It is evident the reviewers have not looked at the data themselves: they would have seen through the trickery employed.

Entirely upon a test applied in a nonsensical way. If it is okay to say it "is evident the reviewers have not looked at the data" on such ridiculous grounds, what's it okay to say about the author of this post? What language should I use when pointing out this post is nothing but a test applied in obviously wrong way?

Oct 12, 2013 at 12:44 AM | Unregistered CommenterBrandon Shollenberger

what's it okay to say about the author of this post?

I really feel like answering to your comment now.

Oct 12, 2013 at 1:10 AM | Registered Commentershub

Let's be clear. Cook et al doesn't show that 97% of the abstracts in their sample display agreement that humanity has caused more than half of the global warming since 1950.

Oct 12, 2013 at 1:12 AM | Registered CommenterThe Leopard In The Basement

I am a real layman waiting to be amazed sometimes but while I wait I stick by what I see ;)

For me, the fact that Cook and Nuccitelli are trying to become leaders in the climate field on very little skill is actually a more interesting meta question than anything else I see here.

Anyone can see that, for whatever reason, Cook et al promote their paper as showing more than it does, using rhetorical techniques much of the time when asked to comment in the media.

Example:

John Cook:

http://opinion.financialpost.com/2013/09/25/counterpoint-consensus-of-evidence/


For example, one definition of consensus specified that humans are causing more than half of global warming, with rejection of consensus specifying that humans are causing less than half. Looking at the scientists’ ratings of their own papers with this definition, we found 96.2% consensus

Trust me, you can say Cook didn't find that 96.2% of scientists rating their own papers said "humans are causing more than half of global warming".

Nuccitelli, interestingly, subtly changed the emphasis on the 97% from his previous CiF post to his latest, and I think I may have had an effect after I commented on it.

First:

The 2013 Intergovernmental Panel on Climate Change (IPCC) report states with 95 percent confidence that humans have caused most, and probably all of the rapid global warming over the past 60 years. Approximately 97 percent of climate experts and peer-reviewed climate science studies agree.

Second:

There's a 97 percent consensus on human-caused global warming in the peer-reviewed climate science literature and among climate experts.


I will be clear from my POV on Cook and the SkS crew here that I don't see them as being fraudulent.

In fact, btw, I find it annoying when people who have read the tree-hut files still say the SkS crew are deliberate frauds; those people are missing the obvious. You can see the SkS crew are not frauds - but they are clearly the sort to fall into group-think under delusional (charismatic?) leaders.

I remember an interview with Derren Brown after he did his Investigates series, in which he went to the USA and posed as various things like a religious healer and psychic. He said he left one episode on the cutting room floor: an episode in which he engaged with some amateur housewife psychics who got together to read each other's palms. He dropped it when he realised they were quite harmless and rather pitiful - just deceiving themselves in a circular way. This reminds me why that tree-hut epithet is very applicable.

Oct 12, 2013 at 2:00 AM | Registered CommenterThe Leopard In The Basement

@TLITB
I've pored over the data time and again. There is no sign of fraud, plenty of incompetence.

Oct 12, 2013 at 6:49 AM | Unregistered CommenterRichard Tol

Leopard: a lie is not necessarily required for fraud. Once evidence of a mistake is brought to the attention of those making the mistake, failure to correct the mistake, or simply refusing to acknowledge the point is just as fraudulent as a lie. Cook is an ignoramus incapable of understanding his flaws, and unwilling to admit as much.

Mark

Oct 12, 2013 at 6:55 AM | Unregistered CommenterMark T

It didn't become fraud, Richard, until he refused to correct his obvious errors.

Mark

Oct 12, 2013 at 7:01 AM | Unregistered CommenterMark T

@Oct 12, 2013 at 6:49 AM | Richard Tol

I wasn't pointing any fingers at you or anyone else in my rant :)

@Oct 12, 2013 at 7:01 AM | Mark T

Cook is an ignoramus incapable of understanding his flaws, and unwilling to admit as much.

I know a lot here hold the scientific method dear but what Cook et al do for me is just support my nihilistic side about the rubbishness of the peer review system.

The real irony for me is that Cook gets quite excited about using his psychoanalysing toy skills he learned from Lewandowsky and hopes to be the one who eventually psychoanalyses all climate sceptics into a peer reviewed box and then throws it into the sea. ;)

The fact that the SkS crew themselves are so overtly psychologically bent like this, and apparently uncritically accepted by the mainstream educational/climate community, is quite astonishing to me. I don't think I will ever be able to come up with a peer-reviewed reason why this is laughable. To me as a layman, Cook et al's work is just weak and dull. Cook has simply worked his way into a niche that can be leveraged into greater public attention than it deserves.


I'm one of the nihilistic, tired-of-this-crap people who think it will be a long time before there is any change in the "evidence based" bullshit nannying by egotistical inadequates that we get today. So I am left with the hope that this stuff *doesn't* get thrown out of the peer-reviewed literature right now; I hope it stays there festering for some future generation to marvel at ;)

I think one day it will be there, just ignored, and then some child will find it and ask. "Did you call yourself a scientist in the days when this crap was held up as the acme of science?"

Sweet. ;)

Oct 12, 2013 at 7:53 AM | Registered CommenterThe Leopard In The Basement

There is no sign of fraud, plenty of incompetence.
Oct 12, 2013 at 6:49 AM Richard Tol

Interestingly - that depends on the jurisdiction.

In the corporate world, fraud has to be a willful act under English Law.

Under US Law, it's quite easy and common to commit fraud by lack of diligence - eg in being careless in recording material facts when selling a business or shares. I have some personal experience of this.

The US based "climate science" community would do well to bear this in mind sometimes.

Oct 12, 2013 at 9:51 AM | Registered CommenterFoxgoose

Brandon:
All points about how an abstract may contain less and/or different information than the full paper apply foremost to Cook's methodology. The Cook paper's starting assumption is that an abstract is a reasonable proxy for the full paper. If one agrees, the position is: abstract = paper. From there, when you calculate a kappa statistic, you are comparing what can be compared, namely the abstract (= paper) as rated by Cook vs the abstract as rated by the author.

The kappa itself is computed from a simple contingency table; what it is used for, and what inferences you draw, is left to the user. A kappa calculated in this context can tell you about the validity of the ratings generated by the Cook group. When calculated for two (or more) independent observers scoring the exact same thing (say, an abstract), as a basis for inferential statistics, it can tell you whether the scoring system is robust with respect to reproducibility.

If it is argued that papers and abstracts can be dramatically different, then, firstly, it is Cook's starting assumption that breaks down. Consequently, he ought not to have performed his study the way he did at all. Secondly, he should not be performing any comparison between his ratings and the author ratings, which are presumed to have been given after reading the full paper.

Objections of this kind have been raised about kappa statistics wherever they are applied, and there have been numerous modifications to accommodate them. The larger point, however, is that aggregate statistics provide no information about performance in a per-point rating system.

Oct 12, 2013 at 12:19 PM | Unregistered Commentershub

I really didn't want to go and look at Cook et al. again but I found the following quote in section 3.2.

Among self-rated papers not expressing a position on AGW in the abstract, 53.8% were self-rated as endorsing the consensus. Among respondents who authored a paper expressing a view on AGW, 96.4% endorsed the consensus.

That is, as Brandon Shollenberger discussed above, in some cases there was no explicit endorsement in the abstract but the paper as a whole was quite legitimately rated by the author as endorsing the consensus (indeed, this could have been the case whoever had rated the paper). So I'm afraid I don't think Shub's comparison is the right one to make. I'd be more interested to see some proper statistics on inter-rater reliability (e.g. kappa, as Shub uses, or Krippendorff's alpha), which (as far as I can see) the paper does not provide.

But is anyone really surprised that most papers (or abstracts) that have been found with the search terms ‘global warming’ or ‘global climate change’ endorse AGW?

I find it far more problematic that the 'consensus' that the paper measured is a 'consensus without an object' as Ben Pile puts it, or a 'shallow consensus' as Andrew says.

As Andrew said in his report, the paper only measured the consensus "that carbon dioxide (CO2) is a greenhouse gas [and] that human activities have warmed the planet to some unspecified extent" but this has somehow been spun into far stronger statements.

Oct 12, 2013 at 12:26 PM | Registered CommenterRuth Dixon

Shub, you just demonstrated my point:

I really feel like answering to your comment now.

You made derogatory remarks about the people you criticized while raising what you felt were substantial criticisms. I felt what you said was inappropriate and would effectively sabotage any meaningful discussion this post could generate. When I responded in kind, your immediate reaction was to ignore my substantial criticisms of your post because of it.

If the goal of your post is to have a meaningful discussion, you've shown the approach you used is bad for it. The only thing this approach is good for is petty point-scoring.

Regardless, nothing about my tone or writing style changes the fact the test you used in this post was completely inappropriate.

Oct 12, 2013 at 12:36 PM | Unregistered CommenterBrandon Shollenberger

shub:

All points about how an abstract may contain less and/or different information than the full paper apply foremost to Cook's methodology.

Which is why they took steps to address it. For example, they filtered out abstracts that weren't rated as adopting any position. That accommodates the fact that the different ratings use different amounts of information. In a case where the information presented was completely accurate and raters did a perfect job interpreting it, this filter would completely resolve the differences between the data sets. Despite this, your test would show a massive level of disagreement.

"The Cook paper's starting assumption is that an abstract is a reasonable proxy for the full paper. If one agrees, the position is: abstract = paper."

No. This isn't even close to true. If I say a set of tree rings are a proxy for temperature, that doesn't mean I expect every tree ring in the set to automatically reflect temperature. There are many other effects that matter.

Your formulation here requires us to assume a proxy should have a 100% correlation with what it proxies. Anything short of that invalidates your argument.

"Secondly, he should not be performing any comparison between his ratings and author ratings who are presumed to have given them after reading their own full paper."

Nonsense. Cook et al used two different tests. Those tests showed the same general result. That's perfectly appropriate. Even if one test doesn't inherently demonstrate the validity of the other test, comparing the two tests is still completely appropriate. That's how we look at evidence. We look at the tests performed and see what their results tend to show.

Oct 12, 2013 at 12:50 PM | Unregistered CommenterBrandon Shollenberger

Ruth Dixon, thanks for finding that quote. The fact the paper discusses a significant difference in the data sets certainly makes using a test that finds that difference strange. It's difficult to argue a difference the authors highlighted invalidates the authors' results.

By the way, I agree with what you call "far more problematic." Not only does it render the results of the paper practically meaningless, it demonstrates a massive deception on the part of the authors. They have intentionally created a situation in which their results are massively overstated.

Now for a moment of shameless self-promotion:

I was the first person to point out that problem!

Oct 12, 2013 at 12:59 PM | Unregistered CommenterBrandon Shollenberger

Ruth
We cross-posted. My comment above answers your point.

Inter-rater reliability between ratings, for abstracts, by the Cook group can be calculated. The data is available (http://t.co/yeUThmQoEY). The kappa figures are no better.

Whenever kappa statistics are worked out, the question of equal conditions arises. In the bird example, the newcomer might argue that the wind was blowing too hard, or that the birds hid behind leaves just as he turned to look for them.

Think of identifying a bird by looking only at its head and right wing, versus looking at it whole: that is the abstract versus the full paper. If someone claims the former provides enough information for correct identification, the kappa should not turn out so poor. And if they claim that head-plus-right-wing is a good proxy for the whole bird, and classify a thousand birds on that basis, kappa is a good test, among others, of that claim.
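For anyone who wants to check the arithmetic, Cohen's kappa is simple enough to compute by hand. Below is a minimal sketch in plain Python: the bird totals are the ones from the post (expert: 6 magpies, 4 ravens, 2 starlings; newcomer: 6, 3, 3), and the per-bird sequences are one illustrative arrangement consistent with those totals and a 3-of-12 match rate.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters rating the same items."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: fraction of items where the raters match.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement, from each rater's marginal totals.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Expert: 6 magpies, 4 ravens, 2 starlings.
expert = ["magpie"] * 6 + ["raven"] * 4 + ["starling"] * 2
# Newcomer: totals of 6, 3, 3 but only 3 of 12 birds actually match.
novice = ["magpie"] * 3 + ["raven"] * 3 + ["starling"] * 3 + ["magpie"] * 3

print(round(cohens_kappa(expert, novice), 2))  # -0.2
```

With these numbers, raw agreement is 25% but kappa comes out at -0.2, i.e. worse than chance, even though the aggregate totals look nearly identical. That is exactly the gap between comparing totals and comparing per-observation ratings.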

Oct 12, 2013 at 1:21 PM | Registered Commentershub

Thanks for your reply, Shub. I think I misunderstood your argument.

I now believe that you are saying that 'the abstract is not a good proxy for the paper' - and I think your kappa value and the statement in the paper show that you are right on that.

That is different to saying that the Cook et al. ratings of the abstracts were 'completely unreliable' (as I understood you to say in your post). I would say that abstracts could be invalid proxies for the papers, while at the same time rated completely reliably by Cook and his team (I have a paper that discusses reliability versus validity of indicators!). Of course I'm not making a judgement on how reliably the abstracts were rated, just that comparing the abstract and the paper ratings doesn't tell us that.

Anyway, there is so much wrong with the concept and execution of the Cook paper and I think your post was valuable in highlighting another way in which it has been spun. Mike Hulme's comments on the article by Ben Pile that I linked above are well worth re-reading.

Oct 12, 2013 at 2:07 PM | Registered CommenterRuth Dixon
