Last week, I posted about a comment Nic Lewis had written at RealClimate. In that comment, Lewis had spent some time discussing a study by Aldrin et al, and noted that its findings were distorted by the use of a uniform (or "flat" prior). Although Gavin Schmidt did not respond directly to this point, one commenter pushed the question of the validity of the uniform prior approach a little further.
I thought James Annan had demonstrated that using a uniform prior was bad practise. That would tend to spread the tails of the distribution such that the mean is higher than the other measures of central tendency. So is it justified in this paper?
This elicited a response from a statistician called Steve Jewson (a glance at whose website suggests he is just the man you'd want to give you advice in this area):
Following on from the comments by Nic Lewis and Graeme,
Yes, using a flat prior for climate sensitivity doesn’t make sense at all. Subjective and objective Bayesians disagree on many things, but they would agree on that. The reasons why are repeated in most text books that discuss Bayesian statistics, and have been known for several decades. The impact of using a flat prior will be to shift the distribution to higher values, and increase the mean, median and mode. So quantitative results from any studies that use the flat prior should just be disregarded, and journals should stop publishing any results based on flat priors. Let’s hope the IPCC authors understand all that.
Nic (or anyone else)…would you be able to list all the studies that have used flat priors to estimate climate sensitivity, so that people know to avoid them?
RC regular Ray Ladbury then chimed in with this:
The problem is that the studies that do not use a flat prior wind up biasing the result via the choice of prior. This is a real problem given that some of the actors in the debate are not “honest brokers”. It has seemed to me that at some level an Empirical Bayes approach might be the best one here–either that or simply use the likelihood and the statistics thereof.
To which Steve Jewson replied:
I agree that no-one should be able to bias the results by their choice of prior: there needs to be a sensible convention for how people choose the prior, and everyone should follow it to put all studies on the same footing and to make them comparable.
And there is already a very good option for such a convention…it’s Jeffreys’ Prior (JP).
JP is not 100% accepted by everybody in statistics, and it doesn’t have perfect statistical properties (there is no framework that has perfect statistical properties anywhere in statistics) but it’s by far the most widely accepted option for a conventional prior, it has various nice properties, and basically it’s the only chance we have for resolving this issue (the alternative is that we spend the next 30 years bickering about priors instead of discussing the real issues). Wrt the nice properties, in particular the results are independent of the choice of coordinates (e.g. you can use climate sensitivity, or inverse climate sensitivity, and it makes no difference).
Using a flat prior is not the same as using Jeffreys’ prior, and the results are not independent of the choice of coordinates (e.g. a flat prior on climate sensitivity does not give the same results as a flat prior on inverse climate sensitivity).
Using likelihood alone isn’t a good idea because again the results are dependent on the parameterisation chosen…you could bias your results just by making a coordinate transformation. Plus you don’t get a probabilistic prediction.
When Nic Lewis referred to objective Bayesian statistics in post 66 above, I’d guess he meant the Jeffreys’ prior.
ps: I’m talking about the *second* version of JP, the 1946 version not the 1939 version, which resolves the famous issue that the 1939 version had related to the mean and variance of the normal distribution.
Nic Lewis was happy to concur and to provide a list of flat-prior studies.
First, when I refer to an objective Bayesian method with a noninformative prior, that means using what would be the original Jeffreys’ prior for inferring a joint posterior distribution for all parameters, appropriately modified if necessary to give as accurate inference (marginal posteriors) for individual parameters as possible. In general, that would mean using Bernardo and Berger “reference priors”, one targeted at each parameter of interest. In the case of independent scale and location parameters, doing so would equate to the second version of the Jeffreys’ prior that Steve refers to. In practice, when estimating S and Kv, marginal parameter inference may be little different between using the original Jeffreys’ prior and targeted reference priors.
Secondly, here is a list of climate sensitivity studies that used a uniform prior for main results when for estimating climate sensitivity on its own, or when estimating climate sensitivity S jointly with effective ocean vertical diffusivity Kv (or any other parameter like those two in which observations are strongly nonlinear) used uniform priors for S and/or Kv.
Forest et al (2002)
Knutti et at (2002)
Frame et al (2005)
Forest et al (2006)
Forster and Gregory (2006) – results as presented in IPCC AR4 WG1 report (the study itself used 1/S prior, which is the Jeffreys’ prior in this case, where S is the only parameter being estimated)
Hegerl et al (2006)
Forest et al (2008)
Sanso, Forest and Zantedeschi (2008)
Libardoni and Forest (2011) [unform for Kv, expert for S]
Olson et al (2012)
Aldrin et al (2012)
This includes a large majority of the Bayesian climate studies that I could find.
Some of these papers also used other priors for climate sensitivity as alternatives, typically either informative “expert” priors, priors uniform in the climate feedback parameter (1/S) or in one case a uniform in TCR prior. Some also used as alternative nonuniform priors for Kv or other parameters being estimated.
Steve Jewson again:
Sorry to go on about it, but this prior thing this is an important issue. So here are my 7 reasons for why climate scientists should *never* use uniform priors for climate sensitivity, and why the IPCC report shouldn’t cite studies that use them.
It pains me a little to be so critical, especially as I know some of authors listed in Nic Lewis’s post, but better to say this now, and give the IPCC authors some opportunity to think about it, than after the IPCC report is published.
1) *The results from uniform priors are arbitrary and hence non-scientific*
If the authors that Nic Lewis lists above had chosen different coordinate systems, they would have got different results. For instance, if they had used 1/S, or log S, as their coordinates, instead of S, the climate sensitivity distributions would change. Scientific results should not depend on the choice of coordinate system.
2) *If you use a uniform prior for S, someone might accuse you of choosing the prior to give high rates of climate change*
It just so happens that using S gives higher values for climate sensitivity than using 1/S or log S.
3) *The results may well be nonsense mathematically*
When you apply a statistical method to a complex model, you’d want to first check that the method gives sensible results on simple models. But flat priors often given nonsense when applied to simple models. A good example is if you try and fit a normal distribution to 10 data values using a flat prior for the variance…the final variance estimate you get is higher than anything that any of the standard methods will give you, and is really just nonsense: it’s extremely biased, and the resulting predictions of the normal are much too wide. If flat priors fail on such a simple example, we can’t trust them on more complex examples.
4) *You risk criticism from more or less the entire statistics community*
The problems with flat priors have been well understood by statisticians for decades. I don’t think there is a single statistician in the world who would argue that flat priors are a good way to represent lack of knowledge, or who would say that they should be used as a convention (except for location parameters…but climate sensitivity isn’t a location parameter).
5) *You risk criticism from scientists in many other disciplines too*
In many other scientific disciplines these issues are well understood, and in many disciplines it would be impossible to publish a paper using a flat prior. (Even worse, pensioners from the UK and mathematicians from the insurance industry may criticize you too :)).
6) *If your paper is cited in the IPCC report, IPCC may end up losing credibility*
These are much worse problems than getting the date of melting glaciers wrong. Uniform priors are a fundamentally unjustifiable methodology that gives invalid quantitative results. If these papers are cited in the IPCC, the risk is that critics will (quite rightly) heap criticism on the IPCC for relying on such stuff, and the credibility of IPCC and climate science will suffer as a result.
7) *There is a perfectly good alternative, that solves all these problems*
Harold Jeffreys grappled with the problem of uniform priors in the 1930s, came up with the Jeffreys’ prior (well, I guess he didn’t call it that), and wrote a book about it. It fixes all the above problems: it gives results which are coordinate independent and so not arbitrary in that sense, it gives sensible results that agree with other methods when applied to simple models, and it’s used in statistics and many other fields.
In Nic Lewis’s email (number 89 above), Nic describes a further refinement of the Jeffreys’ Prior, known as reference priors. Whether the 1946 version of Jeffreys’ Prior, or a reference prior, is the better choice, is a good topic for debate (although it’s a pretty technical question). But that debate does muddy the waters of this current discussion a little: the main point is that both of them are vastly preferable to uniform priors (and they are very similar anyway). If reference priors are too confusing, just use Jeffreys’ 1946 Prior. If you want to use the fanciest statistical technology, use reference priors.
ps: if you go to your local statistics department, 50% of the statisticians will agree with what I’ve written above. The other 50% will agree that uniform priors are rubbish, but will say that JP is rubbish too, and that you should give up trying to use any kind of noninformative prior. This second 50% are the subjective Bayesians, who say that probability is just a measure of personal beliefs. They will tell you to make up your own prior according to your prior beliefs. To my mind this is a non-starter in climate research, and maybe in science in general, since it removes all objectivity. That’s another debate that climate scientists need to get ready to be having over the next few years.
I wonder how many of the flat prior studies will make it to the final draft of AR5? All of them?