Keenan's response to the BEST paper

Oct 21, 2011

Bishop Hill in Climate: Statistics, Climate: Surface

*The Economist* asked me to comment on four research papers from the Berkeley Earth Surface Temperature (BEST) project. The four papers were as follows.

- Decadal variations in the global atmospheric land temperatures
- Influence of urban heating on the global temperature land average using rural sites identified from MODIS classifications
- Berkeley Earth temperature averaging process
- Earth atmospheric land surface temperature and station quality

Below is some of the correspondence that we had. (Note: my comments were written under time pressure, and are unpolished.)

From: D.J. Keenan

To: Richard Muller [BEST Scientific Director]; Charlotte Wickham [BEST Statistical Scientist]

Cc: James Astill; Elizabeth Muller

Sent: 17 October 2011, 17:16

Subject: BEST papers

Attach: Roe_FeedbacksRev_08.pdf; Cowpertwait & Metcalfe, 2009, sect 2-6-3.pdf; EmailtoDKeenan12Aug2011.pdf

Charlotte and Richard,

James Astill, Energy & Environment Editor of *The Economist*, asked Liz Muller if it would be okay to show me your BEST papers, and Liz agreed. Thus far, I have looked at two of the papers.

- Decadal Variations in the Global Atmospheric Land Temperatures
- Influence of Urban Heating on the Global Temperature Land Average Using Rural Sites Identified from MODIS Classifications

Following are some comments on those.

In the first paper, various series are compared and analyzed. The series, however, have sometimes been smoothed via a moving average. Smoothed time series cannot be used in most statistical analyses. For some comments on this, which require only a little statistical background, see these blog posts by Matt Briggs (who is a statistician):

Do not smooth times series, you hockey puck!

Do NOT smooth time series before computing forecast skill

Here is a quote from those (formatting in original).

Unless the data is measured with error, you never, ever, for no reason, under no threat, SMOOTH the series! And if for some bizarre reason you do smooth it, you absolutely on pain of death do NOT use the smoothed series as input for other analyses! If the data is measured with error, you might attempt to model it (which means smooth it) in an attempt to estimate the measurement error, but even in these rare cases you have to have an *outside* (the learned word is “exogenous”) estimate of that error, that is, one not based on your current data.

If, in a moment of insanity, you *do* smooth time series data and you *do* use it as input to other analyses, **you dramatically increase the probability of fooling yourself**! This is because smoothing induces spurious signals—signals that look real to other analytical methods.

This problem seems to invalidate much of the statistical analysis in your paper.

There is another, larger, problem with your papers. In statistical analyses, an inference is not drawn directly from data. Rather, a statistical model is fit to the data, and inferences are drawn from the model. We sometimes see statements such as “the data are significantly increasing”, but this is loose phrasing. Strictly, data cannot be significantly increasing, only the trend in a statistical model can be.

A statistical model should be plausible on both statistical and scientific grounds. Statistical grounds typically involve comparing the model with other plausible models or comparing the observed values with the corresponding values that are predicted from the model. Discussion of scientific grounds is largely omitted from texts in statistics (because the texts are instructing in statistics), but it is nonetheless crucial that a model be scientifically plausible. If statistical and scientific grounds for a model are not given in an analysis and are not clear from the context, then inferences drawn from the model should be regarded as unfounded.

The statistical model adopted in most analyses of climatic time series is a straight line (usually trending upward) with noise (i.e. residuals) that are AR(1). AR(1) is short for “first-order autoregressive”, which means, roughly, that this year (only) has a direct effect on next year; for example, if this year is extremely cold, then next year will have a tendency to be cooler than average.
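As a rough illustration of the model described above, the following sketch simulates a straight line with AR(1) noise and checks that the detrended values show the expected year-to-year dependence. The function names, the slope, the AR(1) coefficient `phi`, and the noise scale are all hypothetical choices made for illustration; none of them is taken from the IPCC, BEST, or any other analysis.

```python
import random


def simulate_trend_ar1(n, slope, phi, sigma, seed=0):
    """Simulate y_t = slope * t + u_t, where u_t = phi * u_{t-1} + e_t.

    e_t is Gaussian white noise with standard deviation sigma.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    noise = 0.0
    series = []
    for t in range(n):
        # AR(1): this step's noise depends directly on the previous step only.
        noise = phi * noise + rng.gauss(0, sigma)
        series.append(slope * t + noise)
    return series


def lag1_autocorrelation(xs):
    """Sample correlation between each value and the one that follows it."""
    mean = sum(xs) / len(xs)
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(len(xs) - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    slope, phi = 0.01, 0.6
    series = simulate_trend_ar1(5000, slope, phi, sigma=1.0)
    # Remove the (known) straight line, leaving only the AR(1) noise.
    residuals = [y - slope * t for t, y in enumerate(series)]
    print("lag-1 autocorrelation of residuals:", lag1_autocorrelation(residuals))
```

With a positive `phi`, a cold value tends to be followed by another cooler-than-average value, which is the behaviour described in the paragraph above. Setting `phi` to 0 gives noise in which one year has no effect on the next.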

That model—a straight line with AR(1) noise—is the model adopted by the IPCC (see AR4: §I.3.A). It is also the model that was adopted by the U.S. Climate Change Science Program (which reports to Congress) in its analysis of “Statistical Issues Regarding Trends”. Etc. An AR(1)-based model has additionally been adopted for several climatic time series other than global surface temperatures. For instance, it has been adopted for the Pacific Decadal Oscillation, studied in your work: see the review paper by Roe [2008], attached.

Although an AR(1)-based model has been widely adopted, it nonetheless has serious problems. The problems are actually so basic that they are discussed in some recent introductory (undergraduate) texts on time series—for example, in *Time Series Analysis and Its Applications* (third edition, 2011) by R.H. Shumway & D.S. Stoffer (see Example 2.5; set exercises 3.33 and 5.3 elaborate).

In Australia, the government commissioned the Garnaut Review to report on climate change. The Garnaut Review asked specialists in the analysis of time series to analyze the global temperature series. The report from those specialists considered and, like Shumway & Stoffer, effectively rejected the AR(1)-based statistical model. Statistical analysis shows that the model is too simplistic to cope with the complexity in the series of global temperatures.

Additionally, some leading climatologists have strongly argued on scientific grounds that the AR(1)-based model is unrealistic and too simplistic [Foster et al., *GRL*, 2008].

To summarize, most research on global warming relies on a statistical model that should not be used. This invalidates much of the analysis done on global warming. I published an op-ed piece in the *Wall Street Journal* to explain these issues, in plain English, this year.

The largest center for global-warming research in the UK is the Hadley Centre. The Hadley Centre employs a statistician, Doug McNeall. After my op-ed piece appeared, Doug McNeall and I had an e-mail discussion about it. A copy of one of his messages is attached. In the message, he states that the statistical model—a straight line with AR(1) noise—is “simply inadequate”. (He still believes that the world is warming, primarily due to computer simulations of the global climate system.)

Although the AR(1)-based model is known to be inadequate, no one knows what statistical model should be used. There have been various papers in the peer-reviewed literature that suggest possible resolutions, but so far no alternative model has found much acceptance.

When I heard about the Berkeley Earth Surface Temperature project, I got the impression that it was going to address the statistical issues. So I was extremely curious to see what statistical model would be adopted. I assumed that strong statistical expertise would be brought to the project, and I was trusting that, at a minimum, there would be a big improvement on the AR(1)-based model. Indeed, I said this in an interview with *The Register* last June.

BEST did not adopt the AR(1)-based model; nor, however, did it adopt a model that deals with some of the complexity that AR(1) fails to capture. Instead, BEST chose a model that is much more simplistic than even AR(1), a model which allows essentially no structure in the time series. In particular, the model that BEST adopted assumes that this year has no effect on next year. That assumption is clearly invalid on climatological grounds. It is also easily seen to be invalid on statistical grounds. Hence the conclusions of the statistical analysis done by BEST are unfounded.

All this occurred even though understanding the crucial question—what statistical model should be used?—requires only an introductory level of understanding in time series. The question is so basic that it is discussed by the introductory text of Shumway & Stoffer, cited above. Another text that does similarly is *Introductory Time Series with R* by P.S.P. Cowpertwait & A.V. Metcalfe (2009); a section from that text is attached. (The section argues that, from a statistical perspective, a pure AR(4) model is appropriate for global temperatures.) Neither Shumway & Stoffer nor Cowpertwait & Metcalfe have an agenda on global warming, to my knowledge. Rather, they are just writing introductory texts on time series and giving students practical examples; each text includes the series of global temperatures as one of those examples.

There are also textbooks that are devoted to the statistical analysis of climatic data and that discuss time-series modeling in detail. My bookshelf includes the following.

*Climate Time Series Analysis* (Mudelsee, 2010)

*Statistical Analysis in Climate Research* (von Storch & Zwiers, 2003)

*Statistical Methods in the Atmospheric Sciences* (Wilks, 2005)

*Univariate Time Series in Geosciences* (Gilgen, 2006)

Considering the second paper, on Urban Heat Islands, the conclusion there is that there has been some urban *cooling*. That conclusion contradicts over a century of research as well as common experience. It is almost certainly incorrect. And if such an unexpected conclusion is correct, then every feasible effort should be made to show the reader that it must be correct.

I suggest an alternative explanation. First note that the stations that your analysis describes as “very rural” are in fact simply “places that are not dominated by the built environment”. In other words, there might well be, and probably is, substantial urbanization at those stations. Second, note that Roy Spencer has presented evidence that the effects of urbanization on temperature grow logarithmically with population size.

The Global Average Urban Heat Island Effect in 2000 Estimated from Station Temperatures and Population Density Data

Putting those two notes together, we might expect that the UHI effect will be larger at the sites classified as “very rural” than at the sites classified as urban. And that is indeed what your analysis shows. Of course, if this alternative explanation is correct, then we cannot draw any inferences about the size of UHI effects on the average temperature measurements, using the approach taken in your paper.

There are other, smaller, problems with your paper. In particular, the Discussion section states the following.

We observe the opposite of an urban heating effect over the period 1950 to 2010, with a slope of -0.19 ± 0.19 °C/100yr. This is not statistically consistent with prior estimates, but it does verify that the effect is very small....

If the two estimates are not consistent, then they contradict each other. In other words, at least one of them must be wrong. Hence one estimate cannot be used to “verify” an inference drawn from the other. This has nothing to do with statistics. It is logic.

Sincerely, Doug

* * * * * * * * * * * *

Douglas J. Keenan

http://www.informath.org

From: Richard Muller

To: James Astill

Cc: Elizabeth Muller

Sent: 17 October 2011, 23:33

Subject: Re: BEST papers

Dear James,

You've received a copy of an email that DJ Keenan wrote to me and Charlotte. He raises lots of issues that require addressing, some that reflect misunderstanding, and some of which just reflect disagreements among experts in the field of statistics. Since these issues are bound to arise again and again, we are preparing an FAQ that we will put on our web site.

Keenan states that he had not yet read our long paper on statistical methods. I think if he reads this he is more likely to appreciate the sophistication and care that we took in the analysis. David Brillinger, our chief advisor on statistics, warned us that by avoiding the jargon of statistics, we would mislead statisticians to think we had a naive approach. But we decided to write in a more casual style, specifically to be able to reach the wider world of geophysicists and climate scientists who don't understand the jargon. Again, if Keenan reads the methods paper, he will have a deeper appreciation of what we have done.

It is also important to recognize that we are not creating a new field of science, but are adding to one that has a long history. In the past I've discovered that if you avoid using the methods of the past, the key scientists in the field don't understand what you have done. As my favorite example, I cite a paper I wrote in which the data were unevenly spaced in time, so I did a Lomb periodogram; the paper was rejected by referees who argued that I was using an "obscure" approach and should have simply done the traditional interpolation followed by Blackman-Tukey analysis. In the future I did it their way, always being careful however to also do a Lomb analysis to make sure there were no differences.

His initial comment is on the smoothing of data. There are certainly statisticians who vigorously oppose this approach, but there have been top statisticians who support it. Included in that list are David Brillinger, and his mentor, the great John Tukey. Tukey revolutionized the field of data analysis for science and his methods dominate many fields of physical science.

Tukey argued that smoothing was a version of "pre-whitening", a valuable way to remove from the data behavior that was real but not of primary interest. Another of his methods was sequential analysis, in which the low frequency variations were identified, fit using a maximum likelihood method, and then subtracted from the data using a filter prior to the analysis of the frequencies of interest. He showed that this pre-whitening would lead to a more robust result. This is effectively what we did in the Decadal variations paper. The long time scale changes were not the focus of our study, so we did a maximum-likelihood fit, removed them, and examined the residuals.

Keenan quotes: "If, in a moment of insanity, you *do* smooth time series data and you *do* use it as input to other analyses, **you dramatically increase the probability of fooling yourself**! This is because smoothing induces spurious signals—signals that look real to other analytical methods." Then he draws a conclusion that does not follow from this quote; he says: "This problem seems to invalidate much of the statistical analysis in your paper."

He is, of course, being illogical. Just because smoothing can increase the probability of our fooling ourselves doesn't mean that we did. There is real value to smoothing data, and yes, you have to beware of the traps, but if you are careful, then there is a real advantage to doing that. I wrote about this in detail in my technical book on the subject, "Ice Ages and Astronomical Causes." Much of this book is devoted to pointing out the traps and pitfalls that others in the field fell into.

Keenan goes on to say, "In statistical analyses, an inference is not drawn directly from data. Rather, a statistical model is fit to the data, and inferences are drawn from the model." I agree wholeheartedly! He may be confused because we adopted the language of physics and geophysics rather than that of statistics. He goes on to say that "This invalidates much of the analysis done on global warming." If we are to move ahead, it does no good simply to denigrate most of the previous work. So we do our work with more care, using valid statistical methods, but write our papers in such a way that the prior workers in the field will understand what we say. Our hope, in part, is to advance the methods of the field.

Unfortunately, Keenan's conclusion is that there has been virtually no valid work in the climate field, that what is needed is a better model, and he does not know what that model should be. He says, "To summarize, most research on global warming relies on a statistical model that should not be used. This invalidates much of the analysis done on global warming. I published an op-ed piece in the *Wall Street Journal* to explain these issues, in plain English, this year."

Here is his quote basically concluding that no analysis of global warming is valid under his statistical standards: "Although the AR(1)-based model is known to be inadequate, no one knows what statistical model should be used. There have been various papers in the peer-reviewed literature that suggest possible resolutions, but so far no alternative model has found much acceptance."

What he is saying is that statistical methods are unable to be used to show that there is global warming or cooling or anything else. That is a very strong conclusion, and it reflects, in my mind, his exaggerated pedantry for statistical methods. He can and will criticize every paper published in the past and the future on the same grounds. We might as well give up in our attempts to evaluate global warming until we find a "model" that Keenan will approve -- but he offers no help in doing that.

In fact, a quick survey of his website shows that his list of publications consists almost exclusively of analysis that shows other papers are wrong. I strongly suspect that Keenan would have rejected any model we had used.

He gives some specific complaints. He quotes our paper, where we say, "We observe the opposite of an urban heating effect over the period 1950 to 2010, with a slope of -0.19 ± 0.19 °C/100yr. This is not statistically consistent with prior estimates, but it does verify that the effect is very small...."

He then complains,

If the two estimates are not consistent, then they contradict each other. In other words, at least one of them must be wrong. Hence one estimate cannot be used to “verify” an inference drawn from the other. This has nothing to do with statistics. It is logic.

He is misinterpreting our statement. Our conclusion is based on our analysis. We believe it is correct. The fact that it is inconsistent with prior estimates does imply that one is wrong. Of course, we think it is the prior estimates. We do not believe that the prior estimates were more than back-of-the-envelope "guestimates", and so there is no "statistical" contradiction.

He complains,

Considering the second paper, on Urban Heat Islands, the conclusion there is that there has been some urban *cooling*. That conclusion contradicts over a century of research as well as common experience. It is almost certainly incorrect. And if such an unexpected conclusion is correct, then every feasible effort should be made to show the reader that it must be correct.

He is drawing a strong a conclusion for an effect that is only significant to one standard deviation! He never would have let us claim that -0.19 ± 0.19 °C/100yr indicates urban cooling. I am surprised that a statistician would argue that such a statistically insignificant effect indicates cooling.
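As a rough illustration of what "significant to one standard deviation" amounts to, the sketch below converts an estimate and its uncertainty into a two-sided p-value. It assumes that the quoted ± 0.19 is one standard error and that the estimate is approximately Gaussian; both are assumptions made here for illustration, and the function name is hypothetical.

```python
import math


def two_sided_p_value(estimate, standard_error):
    """Two-sided p-value for the null hypothesis of a zero effect.

    Assumes the estimate is Gaussian with the given standard error
    (an illustrative assumption, not a statement about the BEST analysis).
    """
    z = estimate / standard_error
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Slope of -0.19 with an assumed one-sigma uncertainty of 0.19 (deg C per 100 yr).
    print(two_sided_p_value(-0.19, 0.19))
```

Under those assumptions the p-value is roughly 0.32, far from the conventional 0.05 threshold, which is why an estimate of this kind is usually described as statistically indistinguishable from zero.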

Please be careful whom you share this email with. We are truly interested in winning over the other analysts in the field, and I worry that if they were to read portions of this email out of context that they might interpret it the wrong way.

Rich

From: D.J. Keenan

To: James Astill

Sent: 18 October 2011, 17:53

Subject: Re: BEST papers

James,

On the most crucial point, it seems that Rich and I are in agreement. Here is a quote from his reply.

Keenan goes on to say, "In statistical analyses, an inference is not drawn directly from data. Rather, a statistical model is fit to the data, and inferences are drawn from the model." I agree wholeheartedly!

And so the question is this: was the statistical model that was adopted for their analysis a reasonable choice? If not, then--since their conclusions are based upon that model--their conclusions must be unfounded.

In fact, the statistical model that they adopted has been rejected by essentially everyone. In particular, it has been rejected by both the IPCC and the CCSP, as cited in my previous message. I know of no work that presents argumentation to support their choice of model: they have just adopted the model without any attempt at justification, which is clearly wrong.

(It has been known for decades that the statistical model that they adopted should not be used. Although the statistical problems with the model were clear, for a long time, no one knew the physical reason. Then in 1976, Klaus Hasselmann published a paper that explained the reason. The paper is famous and has since been cited more than 1000 times.)

We could have a discussion about what statistical model should be adopted. It is certain, though, that the model BEST adopted should be rejected. Ergo, their conclusions are unfounded.

Regarding smoothing, the situation here requires only a little statistics to understand. Consider the example given by Matt Briggs at

Do NOT smooth time series before computing forecast skill

We take two series, each entirely random. We compute the correlation of the two series: that will tend to be around 0. Then we smooth each series, and we compute the correlation of the two smoothed series: that will tend to be greater than before. The more we smooth the two series, the greater the correlation. Yet we started out with purely random series. This is not a matter of opinion; it is factual. Yet the BEST work computes the correlation of smoothed series.
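The effect described above can be reproduced with a small simulation. The following is a minimal sketch, not the code used by Briggs or by BEST: it generates pairs of independent Gaussian white-noise series, smooths each with a simple moving average, and reports the average absolute correlation over many trials. The function names, series length, window sizes, and trial count are all hypothetical choices.

```python
import random
import statistics


def moving_average(xs, window):
    """Simple moving average; a window of 1 returns the series unchanged."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mean_abs_correlation(window, n=200, trials=300, seed=0):
    """Average |correlation| between two independent series after smoothing."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Two series that are unrelated by construction.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        total += abs(pearson(moving_average(a, window), moving_average(b, window)))
    return total / trials


if __name__ == "__main__":
    for window in (1, 10, 40):
        print(window, round(mean_abs_correlation(window), 3))
```

The absolute value is used because the sign of a spurious correlation is arbitrary: smoothing makes the correlation larger in magnitude, whether positive or negative. Wider windows leave fewer effectively independent points, so the typical magnitude grows with the amount of smoothing.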

The reply uses rhetorical techniques to avoid that, stating "Just because smoothing can increase the probability of our fooling ourselves doesn't mean that we did". The statement is true, but it does not rebut the above point.

Considering the UHI paper, my message included the following.

There are other, smaller, problems with your paper. In particular, the Discussion section states the following.

We observe the opposite of an urban heating effect over the period 1950 to 2010, with a slope of -0.19 ± 0.19 °C/100yr. This is not statistically consistent with prior estimates, but it does verify that the effect is very small....

If the two estimates are not consistent, then they contradict each other. In other words, at least one of them must be wrong. Hence one estimate cannot be used to “verify” an inference drawn from the other. This has nothing to do with statistics. It is logic.

The reply claims "The fact that [their paper's conclusion] is inconsistent with prior estimates does imply that one is wrong". The claim is obviously absurd.

The reply also criticizes me for "drawing a strong a conclusion for an effect that is only significant to one standard deviation". I did not draw that conclusion; their paper suggested it, saying that the effect was "opposite in sign to that expected if the urban heat island effect was adding anomalous warming" and that "natural explanations might require some recent form of “urban cooling”", and then describing possible causes, such as "For example, if an asphalt surface is replaced by concrete, we might expect the solar absorption to decrease, leading to a net cooling effect".

Note that the reply does not address the alternative explanation that my message proposed for their UHI results. That explanation, which is based on the analysis of Roy Spencer (cited in my message), implies that we cannot draw any inferences about the size of UHI effects on the average temperature measurements, using the approach taken in their paper.

I had a quick look at their Methods paper. It affects none of my criticisms.

Rich also cites his book on the causes of the ice ages. Kindly read my op-ed piece in the *Wall Street Journal*, and especially consider the discussion of Figures 6 and 7. His book claims to analyze the data in Figure 6: the book's purpose is to propose a mechanism to explain why the similarity of the two lines is so weak. In fact, to understand the mechanism, it is only necessary to do a simple subtraction--as my piece explains. In short, the analysis in his book is extraordinarily incompetent--and it takes only an understanding of subtraction to see this.

The person who did the data analysis in that book is the person in charge of data analysis at BEST. The data analysis in the BEST papers would not pass in a third-year undergraduate course in statistical time series.

Lastly, a general comment on the surface temperature records might be appropriate. We have satellite records for the last few decades, and they closely agree with the surface records. We also have good evidence that the world was cooler 100-150 years ago than it is today. Primarily for those reasons, I think that the surface temperature records--from NASA, NOAA, Hadley/CRU, and now BEST--are probably roughly right.

Cheers, Doug

From: James Astill

To: D.J. Keenan

Sent: 18 October 2011, 17:57

Subject: Re: BEST papers

Dear Doug

Many thanks. Are you saying that, though you mistrust the BEST methodology to a great degree, you agree with their most important conclusion, re the surface temperature record?

best

James

James Astill

Energy & Environment Editor

From: D.J. Keenan

To: James Astill

Sent: 18 October 2011, 18:41

Subject: Re: BEST papers

James,

Yes, I agree that the BEST surface temperature record is very probably roughly right, at least over the last 120 years or so. This is for the general shape of their curve, not their estimates of uncertainties.

Cheers, Doug

From: D.J. Keenan

To: James Astill

Sent: 20 October 2011, 13:11

Subject: Re: BEST papers

James,

Someone just sent me the BEST press release, and asked for my comments on it. The press release begins with the following statement.

Global warming is real, according to a major study released today. Despite issues raised by climate change skeptics, the Berkeley Earth Surface Temperature study finds reliable evidence of a rise in the average world land temperature of approximately 1°C since the mid-1950s.

The second sentence may be true. The first sentence, however, is not implied by the second sentence, nor does it follow from the analyses in the research papers.

Demonstrating that "global warming is real" requires much more than demonstrating that average world land temperature rose by 1°C since the mid-1950s. As an illustration, the temperature in 2010 was higher than the temperature in 2009, but that on its own does not provide evidence for global warming: the increase in temperatures could obviously be due to random fluctuations. Similarly, the increase in temperatures since the mid 1950s could be due to random fluctuations.

In order to demonstrate that the increase in temperatures since the mid 1950s is not due to random fluctuations, it is necessary to do valid statistical analysis of the temperatures. The BEST team has not done such.

I want to emphasize something. Suppose someone says "2+2=5". Then it is not merely my opinion that what they have said is wrong; rather, what they have said is wrong. Similarly, it is not merely my opinion that the BEST statistical analysis is seriously invalid; rather, the BEST statistical analysis is seriously invalid.

Cheers, Doug

From: James Astill

To: D.J. Keenan

Sent: 20 October 2011, 13:19

Subject: Re: BEST papers

Dear Doug

Many thanks for all your thoughts on this. It'll be interesting to see how the BEST papers fare in the review process. Please keep in touch.

best

james

James Astill

Energy & Environment Editor

A story about BEST was published in the October 22nd edition of *The Economist*.

Article originally appeared on (http://www.bishop-hill.net/).
