- Bishop Hill blog - Where there is harmony, let us create discord

Thursday

Jul032014

Bishop Hill

Where there is harmony, let us create discord

Jul 3, 2014

Climate: Statistics

Climate: Surface

My recent posts touching on statistical significance in the surface temperature records have prompted some interesting responses from upholders of the climate consensus, with the general theme being that Doug Keenan and I don't know what we are talking about.

This is odd, because as far as I can tell, everyone is in complete agreement.

To recap, Doug has put forward the position that claims that surface temperatures are doing something out of the ordinary are not supportable because the temperature records are too short to define what "the ordinary" is. In more technical language, he suggests that a statistically significant rise in temperatures cannot be demonstrated because we can't define a suitable statistical model at the present time. He points out that the statistical model that is sometimes used to make such claims (let's call it the standard model) is not supportable, showing that an alternative model can provide a much, much better approximation of the real world data. This is not to say that he thinks that his alternative model is the right one - merely that because it is so much better than the standard one, it is safe to conclude that the latter is failing to capture a great deal of the variation in the data. He thinks that defining a suitable model is tough, if not impossible, and the only alternative is therefore to use a physical model.

As I have also pointed out, the Met Office does not dispute any of this.

So, what has the reaction been? Well, avid twitterer "There's Physics", who I believe is called Anders and is associated with Skeptical Science, tweeted this:

Can @MetOffice clarify their position wrt statistical models - in a way that @aDissentient might understand?

A response from John Kennedy appeared shortly afterwards, which pointed to this statement, which addresses Doug Keenan's claims, noting that there are other models that give better results and suggesting that the analysis is therefore inconclusive. Kennedy drew particular attention to the following paragraph:

These results have no bearing on our understanding of the climate system or of its response to human influences such as greenhouse gas emissions and so the Met Office does not base its assessment of climate change over the instrumental record on the use of these statistical models.

I think I'm right in saying that Doug Keenan would agree with all of this.

Anders has followed this up with a blog post, in which he says I don't understand the Met Office's position. It's a somewhat snide piece, but I think it does illuminate some of the issues. Take this for example:

Essentially – as I understand it – the Met Office’s statistical models is indeed, in some sense, inadequate.

Right. So we agree on that.

This, however, does not mean that there is a statistical model that is adequate.

We seem to agree on that too.

It means that there are no statistical models that are adequate.

Possibly. Certainly I think it's true to say that we haven't got one at the moment, which amounts to the same thing.

Then there's this:

[Statistical models] cannot – by themselves – tell you why a dataset has [certain] properties. For that you need to use the appropriate physics or chemistry. So, for the surface temperature dataset, we can ask the question are the temperatures higher today then they were in 1880? The answer, using a statistical model, is yes. However, if we want an answer to the question why are the temperatures higher today than they were in 1880, then there is no statistical model that – alone – can answer this question. You need to consider the physical processes that could drive this warming. The answer is that a dominant factor is anthropogenic forcings that are due to increased atmospheric greenhouse gas concentrations; a direct consequence of our own emissions.

Again, there is much to agree with here. If you want to understand why temperature has changed, you will indeed need a physical model, although whether current GCMs are up to the job is a moot point to say the least. (I'm not sure about Anders' idea of needing a statistical model to tell whether temperatures are higher today than in 1880 - as Matt Briggs is fond of pointing out, the way forward here is to subtract the measurement for 1880 from that for today - but that's beside the point).

All this harmony aside, I hope you will be able to see what is at the root of Anders's seeming need to disagree: he is asking different questions to the one posed at the top of this post. He wants to know why temperatures are changing, while I want to know if they are doing something out of the ordinary. I would posit that defining "the ordinary" for temperature records is not something that can be done using a GCM.

I think Anders' mistake is to assume that Doug is going down a "global warming isn't happening" path. In fact the thrust of his work has been to determine what the empirical evidence for global warming is - when people like Mark Walport say that it is clear that climate change is happening and that its impacts are evident, what scientific evidence is backing those statements up? I would suggest that anyone hearing Walport's words would assume that we had detected something out of "the ordinary" going on. But as we have seen, this is a question that we cannot answer at the present time. And if such statements are supported only by comparisons of observations to GCMs then I think words like "clear" and "evident" should not be used.

Update on Jul 3, 2014 by

Bishop Hill

In my post above I said:

If you want to understand why temperature has changed, you will indeed need a physical model.

As I put it in a tweet to Anders, he and I are in glorious harmony.

He has just replied:

No, I really don't think we are. If you want to understand GW you need a physical model.

I laughed so much I got cramp in an intercostal.

307 comments

View Printer Friendly Version

References (1)

References allow you to track sources for this article, as well as articles that were written in response to this article.

Response: Comment

by That would be telling :-) at And Then There's Physics on Jul 3, 2014 at 11:26 AM

This is an attempt at a response to Andrew’s post Andrew, you say, I think Anders’ mistake is to assume that Doug is going down a “global warming isn’t happening” path. In fact the thrust of his work has been to determine what the empirical evidence for global warming is Well, ...

Reader Comments (307)

Nullius,

That's what Doug did to generate his ARIMA(3,1,0) model.

If Doug understood the basics of climate science, I might have some confidence that his analysis was better/superior (whatever word you want to use) than that of the Met Office. Given that he clearly does not, the chance that his preferred method is more appropriate than that used by actual experts, is vanishingly small. I know you might regard this as an appeal to authority, but an appeal to authority is clearly superior to an appeal to ignorance.

Plus, you still haven't actually told me what Doug's ARIMA(3,1,0) model tells us about the data. As I understand it, it tells us that it is consistent with a random walk. Given that it can't be a random walk, this would seem to be telling us something rather irrelevant.

Jul 12, 2014 at 10:12 PM |

And Then There's Physics

"If Doug understood the basics of climate science, I might have some confidence that his analysis was better/superior (whatever word you want to use) than that of the Met Office."

Well, if you insist on using argument from authority, a statistician is more of an expert on statistics than a climate scientist, and more likely to be right about it.

Personally, I'd always check the mathematics, or say "I don't know". 'Science' is the belief in the ignorance of experts.

"Plus, you still haven't actually told me what Doug's ARIMA(3,1,0) model tells us about the data."

I answered the question at 10:19 PM on July 7th. See my previous comment for more speculations about possible physics from which it could arise.

"As I understand it, it tells us that it is consistent with a random walk."

No, it's a short-term approximation, in the same way that a linear trend is. What it tells us is that the data set we've got is too short to be able to distinguish the actual behaviour from a random walk, and using the random walk in its place gives more accurate results.

The AR(1) class of models used by the IPCC also borders on the random walk, and can approximate one arbitrarily closely.

" Given that it can't be a random walk, this would seem to be telling us something rather irrelevant."

And as I explained previously, it can't be a non-zero linear trend either. A non-zero linear trend extrapolated one way or the other eventually yields negative absolute temperatures (below zero Kelvin) which is impossible.

So exactly the same objection applies to all attempts to fit linear trends to temperatures. Why are these not irrelevant, too?

Jul 12, 2014 at 10:34 PM |

Nullius in Verba

Well, if you insist on using argument from authority, a statistician is more of an expert on statistics than a climate scientist, and more likely to be right about it.

Self-taught. A self-professed expert on statistic isn't necessarily a better statistician than a climate scientist. Additionally, there is no guarantee that a professional, fully-qualified statistician is a better statistician than a specific climate scientist. Plus, climate scientists can do statistics and physics.

No, it's a short-term approximation, in the same way that a linear trend is. What it tells us is that the data set we've got is too short to be able to distinguish the actual behaviour from a random walk, and using the random walk in its place gives more accurate results.

What does this mean? That a random walk can reproduce the data better than a linear trend. Sure, there will be a random walk that fits the data better than a linear trend. So what? Not only can one not easily quantify the best-fit random walk; just because some random distribution of points fits some data set tells you nothing.

So exactly the same objection applies to all attempts to fit linear trends to temperatures. Why are these not irrelevant, too?

Both have a start and end point, so the extrapolated values are not that relevant. However, what you seem to be missing is that a best fit linear-trend produces a number that can then be quoted when someone ask about the data (i.e., the answer to the question "how fast are we warming" can be "x degrees per decade, assuming a linear trend"). What can the random walk that happens to fit the data tells us? "We get the best fit when the random number seed is 73". What does that actually tell us about the data? Nothing, I would say.

Jul 12, 2014 at 11:27 PM |

And Then There's Physics

> Validation trumps goodness of fit.

Yes, but MattStat's point applies to validated theories too. It's always possible to find another validated theory with a better fit than the validated one we prefer. This follows from the indeterminacy of theories, just like Ye Old Statisician recalled. And this point is good whatever notion of validity is used here, a notion that would deserve due diligence.

***

> With epicycles, you can't validate it because you can't falsify it.

Appealing to falsifiability only helps make the point shorter. Falsifiability is simply the disposition to find falsifiers. A falsifier is a statement that expresses the possibility of falsification. From this consideration alone follows that a theory is scientific only if we could refute it, be it zillion-times validated like Douglas would like climate models to be.

In a nutshell, Douglas seems to appeal to a conception of science that stops being empirical in a Popperian sense.

The story is a bit longer if we prefer to speak of internal validity. But the end is the same.

***

> [A] statistician is more of an expert on statistics than a climate scientist, and more likely to be right about it.

Being an expert on statistical inference does not imply one is an expert on every kind of scientific inferences, as Nullius exemplifies.

Perhaps statisticians ought to keep under their "pure maths" thimble.

Jul 13, 2014 at 12:19 AM |

willard

"However, what you seem to be missing is that a best fit linear-trend produces a number that can then be quoted when someone ask about the data"

So does an ARIMA model fit.

Let's start simple. First generate a sequence of random numbers r(t) for t = 0,1,2,..., with mean zero and distribution iid Gaussian. Now construct a sequence x(t) such that x(0) = 0, and x(t+1) = k*x(t)+r(t). Set k = 0.999. Now generate 1000 values for the sequence x(t), and draw a trend line through it.

The mean for x(0) is obviously zero. The mean for x(t) is k times the mean for x(t-1) plus the mean of r(t), which is zero. So by induction, the mean of the distribution at every time is zero. There is no trend. And yet most of the time the trend calculation will report that there is one.

Talking about the trend for this series isn't very useful. But we can certainly quote a number to characterise the sequence: the constant k is one such number. The variance of the r(t) sequence is another.

It's an easy example to play with. Vary the constant k from 0 to 1, and see how the visual appearance of the series changes. k = 0 is what the OLS method is designed to cope with. k = 1 is a random walk. Which does the data look more like?

This is the simplest model after the iid one, and the next one taught. A more complicated example is x(t+4) = x(t+3) + p*(x(t+3) -x(t+2)) + q*(x(t+2) - x(t+1)) + r*(x(t+1) - x(t)) = (1+p)*x(t+3) + (q-p)*x(t+2) + (r-q)*x(t+1) - r*x(t) for some set of constants p, q, r. The three constants characterise the sequence. It's easy to check that the mean is again zero at every time step, so again there's no trend. (If you want one, you can just add another constant s to the polynomial.) But it does have numbers you can report to characterise it.

For these models, the rise or fall of any particular instance is just random noise. You can, of course, report how much it rose or fell in any given interval, but it doesn't tell you very much about the rise or fall in any other interval. And that's as it should be, because the trend you get from looking at the data is accidental and meaningless. The true underlying trend is zero. You can't assume that any observed trend will continue beyond the data you see. But for these models, you *can* assume that the same values of p, q, and r will continue to apply, and fit the relationship between the data points.

There are still quotable numbers you can report. They're just described by a different set of parameters.

Jul 13, 2014 at 12:35 AM |

Nullius in Verba

Sorry, missed out the r(t) on the last formula.

Should be:
x(t+3) + p*(x(t+3) -x(t+2)) + q*(x(t+2) - x(t+1)) + r*(x(t+1) - x(t)) + r(t) = (1+p)*x(t+3) + (q-p)*x(t+2) + (r-q)*x(t+1) - r*x(t) + r(t)

Jul 13, 2014 at 12:39 AM |

Nullius in Verba

Expansion coefficient is 1.39*10^-3/C.

Jul 10, 2014 at 1:53 PM | Unregistered CommenterEntropic man

EM - if you are there - could you check or give a reference for the expansion coefficient of seawater. I know it depends on temperature and pressure but the values I found eg in

http://publishing.cdlib.org/ucpressebooks/view?docId=kt167nb66r&chunk.id=d3_4_ch03&toc.id=ch03&brand=eschol

are very different from the value you used.

Jul 13, 2014 at 10:54 PM |

Martin A

Post a New Comment

Enter your information below to add a new comment.

My response is on my own website »

Author:

Author Email (optional):

Author URL (optional):

Post:

↓ | ↑

Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>