February 17, 2018
|The Italian, original version of this paper has been published at ClimateMonitor.it as Part I, Part II and Part III. The present version includes new applications and a “proof” of the goodness of my choices. fz.|
I replace the original dataset with its first differences (or with its numerical derivatives when the time step is variable) and check experimentally whether the Hurst exponent H, i.e. the level of persistence, moves closer to 0.5 while the transformed dataset keeps the spectral information of the original one, which I verify by comparing their spectra. The procedure is applied to the yearly NOAA global temperature anomaly, to the latest available monthly data from the NOAA GHCN cag site, to the levels of lake Victoria (sampled at a variable time step) and to seven other datasets of various kinds.
|After this paper had been written, I found the site https://terpconnect.umd.edu/~toh/spectrum/Differentiation.html, part of A Pragmatic Introduction to Signal Processing by Prof. Tom O’Haver, Professor Emeritus, Department of Chemistry and Biochemistry, The University of Maryland at College Park.
From this site I quote the phrase: “The derivative of a periodic signal containing several sine components of different frequency will still contain those same frequencies, but with altered amplitudes and phases.”
So I take this statement as a proof of the conservation of spectral information in the transition from observed to difference/derivative series, which is widely discussed in the rest of the paper, although I have not checked it in a signal theory textbook.
It also justifies the changed ratios between the heights of observed and difference spectral peaks.|
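For the simplest case, a single sinusoid sampled at unit step and its first difference, the quoted statement can be checked with a standard trigonometric identity (a worked derivation added here for illustration; ω is the angular frequency in radians per step, A the amplitude, φ the phase):

$$
\begin{aligned}
x_t &= A\sin(\omega t + \varphi)\\
d_t = x_{t+1} - x_t &= A\left[\sin(\omega t + \omega + \varphi) - \sin(\omega t + \varphi)\right] = 2A\sin\!\left(\frac{\omega}{2}\right)\cos\!\left(\omega t + \varphi + \frac{\omega}{2}\right)
\end{aligned}
$$

The difference keeps the frequency ω, its phase is shifted by (π+ω)/2 and its amplitude is multiplied by 2 sin(ω/2). In power, a peak at a period of P sampling steps (ω=2π/P) is scaled by 4 sin²(π/P), roughly 0.011 for P=60 and 0.47 for P=9: long-period peaks are damped relative to short-period ones, so the ratios between peak heights change while the peak positions do not.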
We have noted many times that persistence affects several datasets we normally use in climatology (and not only there). Persistence concerns measurements that tend to reproduce previous values: they show autocorrelation and probably dependence. In such cases the autocorrelation function at lag 1, acf(1), takes values greater than 0.5, which means that “normal” statistics, being based on independent data, can no longer be used.
|I always remind myself that independent data are uncorrelated, while the converse is untrue: uncorrelated data are not necessarily independent. Likewise, correlation alone does not demonstrate a “physical” dependence: if data are correlated, their physical dependence must be proven by another method or in another way.|
As an example, the standard deviation of the mean of a sample of size n,

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \qquad (1)$$

σ being the (common) standard deviation of the sample elements, becomes, in the presence of persistence (Koutsoyiannis, 2003):

$$\sigma_{\bar{x}}^{(H)} = \frac{\sigma}{n^{1-H}} \qquad (2)$$
where H is the Hurst exponent (or coefficient). If H=0.5, equations (1) and (2) coincide, so we can say that a series with H=0.5 does not show persistence and, to some extent and not entirely correctly, that the random variables (rv) whose values make up the series are independent of each other (independence could be proved, e.g., by verifying that the joint probability density of the two rv is the product of the individual densities,

$$f(x,y) = g(x)\,h(y)$$

where x, y are the rv and f(), g(), h() the probability densities).
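To give a feeling for the size of the effect, equations (1) and (2) can be evaluated directly. A minimal Python sketch; the function name and the sample size n=100 are hypothetical choices for illustration:

```python
def sd_of_mean(sigma: float, n: int, hurst: float = 0.5) -> float:
    """Standard deviation of the mean of n values with common standard
    deviation sigma, following equation (2): sigma / n**(1 - H).
    With H = 0.5 this reduces to equation (1), sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size must be positive")
    return sigma / n ** (1.0 - hurst)


if __name__ == "__main__":
    n = 100  # hypothetical sample size, for illustration only
    for h in (0.5, 0.8, 0.975):
        print(f"H = {h:<5}  sd of mean = {sd_of_mean(1.0, n, h):.3f}")
```

With σ=1 and n=100 this gives 0.100 for H=0.5, about 0.398 for H=0.8 and about 0.891 for H=0.975: with H=0.975 the mean is almost nine times more uncertain than the independent-data formula suggests.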
The Hurst exponent is usually estimated through a procedure which, although presented as simplified (e.g. in Koutsoyiannis, 2002, 2003), is not so simple for me. So, as an estimate of H, I use equation (5) of Koutsoyiannis (2003) or, equivalently, equation (17) of Koutsoyiannis (2002):

$$\rho_j^{(k)} = \rho_j = H(2H-1)\,j^{2H-2} \qquad (3)$$

(it can be proved that this equation is independent of k), where ρj is the autocorrelation function at lag j, or acf(j), j>0.
So, if I fix lag 1, equation (3) becomes

$$\mathrm{acf}(1) = 2H^2 - H \quad \text{or} \quad 2H^2 - H - \mathrm{acf}(1) = 0,$$

from which H can be derived as the positive root:

$$H = \frac{1 + \sqrt{1 + 8\,\mathrm{acf}(1)}}{4} \qquad (4)$$
In this way I get an estimate of H from acf(1) (which of course requires computing the acf). Note that for a persistent series the acf is positive and lies between 0 and 1; if a numerical procedure produces a negative acf(1), equation (4) gives H<0.5 or, when acf(1)<-1/8, an undefined result (NaN, not a number), because the argument of the square root becomes negative. We can regard negative values as fluctuations around zero and assign them an average value of zero, so that equation (4) gives H=0.5 (i.e. uncorrelated data).
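A minimal Python sketch of this estimate, assuming the common biased sample estimator for the acf (the exact acf routine used for the figures is not stated); the function names are hypothetical:

```python
import math

import numpy as np


def acf(x, lag: int) -> float:
    """Sample autocorrelation at `lag` (lag >= 1), using the common
    biased estimator normalised by the lag-0 sum of squares."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-lag], d[lag:]) / np.dot(d, d))


def hurst_from_acf1(r1: float, clip_negative: bool = True) -> float:
    """Estimate H from acf(1) with equation (4), i.e. the positive root
    of 2H^2 - H - acf(1) = 0."""
    if clip_negative and r1 < 0.0:
        r1 = 0.0  # negative acf(1) treated as a fluctuation around zero
    disc = 1.0 + 8.0 * r1
    if disc < 0.0:
        return float("nan")  # acf(1) < -1/8: equation (4) is undefined
    return (1.0 + math.sqrt(disc)) / 4.0
```

For example, hurst_from_acf1(0.92625) returns 0.975 (since 2·0.975²-0.975=0.92625), while with clip_negative=False an acf(1) of -0.03 gives H≈0.468, slightly below 0.5.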
In what follows, I define a procedure that, hopefully, reduces or cancels the persistence of a climate variable, and I apply spectral analysis to the “corrected” dataset (i.e. the one with reduced or null persistence). This last step requires that the “corrected” data conserve the information content of the original data (or, at least, their spectral information). I am not able to prove this hypothesis in general, so, after the first example, I apply the same procedure to several datasets in order to verify, case by case, this so-called conservation of the (spectral) information content under the transformation.
Annual NOAA-NCDC Temperature Anomaly
The first example to which I apply the above procedure is the annual average anomaly of the NOAA global (land+ocean) temperature. I have the releases of this dataset from 2011 through 2017 and show here the 2017 one.
Computing the acf and the Hurst exponent via equation (4) gives Hobs=0.975, a large value implying a strong persistence (autocorrelation) in the data.
An article by Roman Mureika at WUWT shows that taking the differences between successive values of a dataset can reduce dependence (in his words: “… it might not be unreasonable to assume that the annual changes are independent of each other and of the initial temperature”), and I would add that the difference dataset looks “more random” or “less structured” (whatever such terms may mean) than the original data.
The method used by Mureika (applied, incidentally, to the same annual anomaly but in another context) looked interesting to me and the procedure itself easy to implement, so I decided to apply it to the 2017 annual NOAA anomaly (hereafter noaa-17). The result is in figure 1.
Fig.1. Upper panel: 2017 annual average anomaly, NOAA. Central panel: differences d(i)=t(i+1)-t(i). Bottom panel: detrended values, computed from the line in the upper panel, compared with a fixed sine wave.
The comparison in figure 2 between the autocorrelation functions (original vs. differences) shows an impressive improvement (reduction) of the persistence.
Fig.2. Comparison of observed and difference acfs. Hobs=0.975; Hdiff=0.5.
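The effect of d(i)=t(i+1)-t(i) on the acf can be reproduced on synthetic data. A minimal sketch, assuming a random walk (a cumulative sum of white noise) as a stand-in for a strongly persistent series; it is not the noaa-17 data and the helper names are hypothetical:

```python
import numpy as np


def acf1(x) -> float:
    """Lag-1 sample autocorrelation (biased estimator)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))


def first_differences(y):
    """d(i) = y(i+1) - y(i); the result is one element shorter than y."""
    return np.diff(np.asarray(y, dtype=float))


# Hypothetical persistent series: a random walk, NOT the NOAA anomaly.
rng = np.random.default_rng(42)
walk = np.cumsum(rng.normal(size=1000))
steps = first_differences(walk)

print(f"acf(1) of the series:      {acf1(walk):.3f}")
print(f"acf(1) of its differences: {acf1(steps):.3f}")
```

For a random walk the differences are exactly the independent increments, so acf(1) drops from close to 1 to close to 0; for real climate series the size of the drop is an empirical matter, as the following sections show.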
Now I can apply spectral analysis (e.g. the MEM [Childers, 1978; Press et al., 2009] or LOMB [Lomb, 1976; Scargle, 1982] methods, which is my final goal) to the differences, in order to obtain a more reliable spectral structure for the annual anomaly, if and only if the differences conserve the (at least spectral) information content of the original data. As stated above, I am not able to prove this hypothesis, so I compare the spectra of both datasets, shown in the next two plots (figures 3 and 4).
Fig.3. Original NOAA global anomaly and its MEM spectrum.
Fig.4. As in figure 3, for the differences of anomaly.
A direct comparison shows that the two spectra are very similar in the positions (periods) of their peaks, the only differences being the ratios among peak heights and the better definition of the long-period maxima in the difference spectrum. This is not a proof, but it is a strong suggestion that the differences dataset conserves the information content. The plots also indicate that persistence does not affect the spectrum, at least in this case of the annual global anomaly.
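A minimal sketch of a MEM spectrum, assuming Burg's method for the autoregressive coefficients (the approach of the memcof routine in Press et al.); the model order, the frequency grid and the test series, a sinusoid with a 20-step period plus a little noise, are arbitrary choices for illustration and not the settings behind figures 3 and 4:

```python
import numpy as np


def burg_ar(x, order):
    """Burg estimate of the AR polynomial a (a[0] = 1) and of the final
    prediction-error power, the ingredients of a MEM spectrum."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    f = x.copy()                    # forward prediction errors
    b = x.copy()                    # backward prediction errors
    a = np.array([1.0])
    power = np.dot(x, x) / x.size
    for m in range(order):
        ff, bb = f[m + 1:], b[m:-1]
        # reflection coefficient minimising forward + backward error power
        k = -2.0 * np.dot(ff, bb) / (np.dot(ff, ff) + np.dot(bb, bb))
        a = np.append(a, 0.0)
        a = a + k * a[::-1]         # Levinson-type update of the polynomial
        f_new, b_new = ff + k * bb, bb + k * ff
        f[m + 1:], b[m + 1:] = f_new, b_new
        power *= 1.0 - k * k
    return a, power


def mem_spectrum(x, order, freqs):
    """MEM spectrum power / |sum_j a_j exp(-2 pi i f j)|^2, f in cycles/step."""
    a, power = burg_ar(x, order)
    lags = np.arange(a.size)
    response = np.exp(-2j * np.pi * np.outer(freqs, lags)) @ a
    return power / np.abs(response) ** 2


# Hypothetical test series: a 20-step cycle plus a little white noise.
rng = np.random.default_rng(1)
t = np.arange(400)
y = np.sin(2 * np.pi * t / 20) + 0.1 * rng.normal(size=t.size)
freqs = np.linspace(0.005, 0.5, 5000)

peak_period = {}
for name, series in (("observed", y), ("differences", np.diff(y))):
    spectrum = mem_spectrum(series, order=10, freqs=freqs)
    peak_period[name] = 1.0 / freqs[np.argmax(spectrum)]
print(peak_period)
```

Both the observed series and its differences give their highest peak at a period close to 20 steps. In real applications the order strongly affects the sharpness of MEM peaks, so it is best kept the same when comparing the two spectra.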
However, this supposed conservation of information must be confirmed experimentally for each dataset before any conclusion can be drawn.
A brief summary of this section:
- I can extract from an autocorrelated dataset a new series without persistence, from which
- I can derive a more reliable spectral analysis, possibly not distorted by a high Hurst exponent.
- For a given dataset I must demonstrate the conservation of the information by comparing both (original and differences) spectra.
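The comparison of peak positions can be made slightly more systematic. A sketch, assuming both spectra are sampled on a common frequency grid; the helper names, the 5% tolerance and the periods in the example are hypothetical, not values read from the figures:

```python
import numpy as np


def peak_periods(freqs, power):
    """Periods (1/f) of the local maxima of a spectrum sampled on freqs."""
    p = np.asarray(power, dtype=float)
    inner = (p[1:-1] > p[:-2]) & (p[1:-1] > p[2:])
    return 1.0 / np.asarray(freqs, dtype=float)[1:-1][inner]


def match_peaks(periods_a, periods_b, rel_tol=0.05):
    """Pair each period in periods_a with the closest one in periods_b,
    keeping only pairs whose relative difference is within rel_tol."""
    if len(periods_b) == 0:
        return []
    pairs = []
    for pa in periods_a:
        pb = min(periods_b, key=lambda q: abs(q - pa))
        if abs(pb - pa) / pa <= rel_tol:
            pairs.append((float(pa), float(pb)))
    return pairs


obs_peaks = [60.2, 21.0, 9.1]    # hypothetical periods (years)
diff_peaks = [59.5, 9.0, 3.6]
print(match_peaks(obs_peaks, diff_peaks))
```

Here the 60.2- and 9.1-year peaks find a partner in the difference spectrum, while the 21.0-year one does not; a peak without a partner is exactly the kind of case that has to be discussed individually.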
After this first example, I can apply the procedure to a variety of climate data in order to verify its reliability (and also whether the statements listed in the first section of this paper can be trusted).
Lake Victoria level
The lake Victoria series has a Hurst exponent H=0.962, so it is a good candidate for the present procedure, the main difference from the NOAA data being the variable time step. This implies that the differences must be computed per unit of the abscissa, i.e. as the ratio Δy/Δx, which is the numerical derivative of the dataset (the differences computed above are the special case Δx=1).
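A minimal sketch of such a numerical derivative for an uneven abscissa; assigning each ratio to the midpoint of its interval is an assumption of this sketch, and the years and levels in the example are hypothetical:

```python
import numpy as np


def numerical_derivative(x, y):
    """Forward-difference derivative dy/dx for a possibly uneven abscissa.

    Returns the midpoints of consecutive x values and the ratios
    (y[i+1] - y[i]) / (x[i+1] - x[i]); with x[i+1] - x[i] = 1 this is
    just the plain difference d(i) = y(i+1) - y(i)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = np.diff(x)
    if np.any(dx <= 0):
        raise ValueError("abscissa must be strictly increasing")
    return 0.5 * (x[:-1] + x[1:]), np.diff(y) / dx


years = [1900, 1901, 1903, 1904, 1910]   # hypothetical, uneven step
level = [10.0, 10.5, 11.5, 11.0, 14.0]   # hypothetical levels
x_mid, slope = numerical_derivative(years, level)
print(x_mid)
print(slope)
```

The slopes are 0.5, 0.5, -0.5 and 0.5, whereas plain differences would give 0.5, 1.0, -0.5 and 3.0: ignoring the step would inflate the changes across the longer gaps.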
Since this is the first time I use the procedure with both derivatives and differences, I include both transformations of the lake Victoria series, so that they can be compared with each other; the outputs are labelled “obs”, “deriv” (or “der”) and “diff”.
Meanwhile, I note that Hderiv=0.781 and Hdiff=0.638, values large enough to suggest that the method is not very effective here in reducing or nullifying persistence. Nevertheless, the spectral analysis could still benefit from the lower Hurst exponents.
Figure 5 shows the comparison among the acfs of the series.
Fig.5. acfs of lake Victoria. Black: original data; blue: absolute differences; red: numerical derivatives. Both transformations clearly reduce the original autocorrelation. Note that at lag 1 the acf of the derivatives is more than double that of the differences. Here Hobs=0.962; Hderiv=0.781; Hdiff=0.638.
Fig.6. Original series of lake Victoria levels and its LOMB spectrum. As the following figures 7 and 8 show, the main spectral feature at about 78 years appears to be a large spurious shape due to the persistence, while the shortest periods remain in the “transformed” spectra too.
Fig.8. Absolute differences (i.e. not referred to a time base) between successive lake Victoria levels. The peak at 34.4 years does not appear in figure 7, and some minor differences are visible around 3-4 years.
In lake Victoria, too, (strong) differences appear in the power ratios of peaks at nearby periods.
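Because the lake Victoria data are unevenly spaced, the LOMB periodogram is the natural tool. A minimal NumPy sketch of the classical normalised Lomb-Scargle periodogram (in the formulation given by Press et al.; not the code used for the figures), applied to a hypothetical unevenly sampled series with two cycles and to its numerical derivative:

```python
import numpy as np


def lomb_scargle(t, y, freqs):
    """Normalised Lomb-Scargle periodogram for samples y taken at possibly
    uneven times t; freqs are in cycles per unit of t."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    w = 2.0 * np.pi * np.asarray(freqs, dtype=float)[:, None]
    # time offset tau that makes the sine and cosine terms orthogonal
    tau = np.arctan2(np.sin(2 * w * t).sum(axis=1),
                     np.cos(2 * w * t).sum(axis=1))[:, None] / (2 * w)
    arg = w * (t - tau)
    c, s = np.cos(arg), np.sin(arg)
    power = (c @ y) ** 2 / (c * c).sum(axis=1) + (s @ y) ** 2 / (s * s).sum(axis=1)
    return power / (2.0 * y.var(ddof=1))


# Hypothetical unevenly sampled series with 50- and 11-unit cycles.
rng = np.random.default_rng(7)
grid = np.arange(0.0, 500.0, 1.5)
t = grid + rng.uniform(-0.5, 0.5, size=grid.size)    # jittered, increasing
y = np.sin(2 * np.pi * t / 50) + 0.5 * np.sin(2 * np.pi * t / 11)
t_mid = 0.5 * (t[:-1] + t[1:])
dydt = np.diff(y) / np.diff(t)                       # numerical derivative

freqs = np.linspace(1 / 200, 1 / 4, 2000)
p_obs = lomb_scargle(t, y, freqs)
p_der = lomb_scargle(t_mid, dydt, freqs)
print("observed peak period:  ", round(1 / freqs[p_obs.argmax()], 1))
print("derivative peak period:", round(1 / freqs[p_der.argmax()], 1))
```

Both periods appear in both spectra, but the highest peak moves from the 50-unit cycle in the observed series to the 11-unit cycle in the derivative, because differentiation multiplies each amplitude by its angular frequency; this is comparable to the changes in power ratios noted for lake Victoria.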
Monthly NOAA-NCDC Temperature Anomaly
We can expect a resemblance between annual and monthly NOAA data, but it is better to verify such common behaviour directly. So I use here the latest available monthly dataset from the NOAA cag (Climate at a Glance) site: the series through December 2017 (here named 1712t.dat), from which I computed the differences and, from both, the acfs of figure 9.
Fig.9. acf of the observed (black) and difference (blue) December 2017 NOAA monthly anomaly. The reduction of the persistence is evident. Here Hobs=0.983 and Hdiff=0.5.
The persistence has been completely removed by the transformation, and the spectra in figures 10 and 11 clearly show that the spectral information has been maintained through the transformation procedure. In short, we read here the same story as above for the annual data: the same spectral structure and a sharpening of the ~60-year peak.
Fig.10. Global monthly anomaly through December 2017. A comparison with the black line of figure 9 shows how strong the persistence is here, much stronger than in the annual data. In the central frame, note the weak identification of the ~60-year peak.
Fig.11. Differences of the 1712t.dat monthly anomaly and their MEM spectrum. The peak at ~60 years is clearly visible here.
From the spectral analysis of the monthly data we derive the same spectral structure as for the annual data, and the confirmation that, for the generic NOAA dataset, persistence has little (if any) effect on the spectrum: correcting the autocorrelation only improves the definition of the ~60-year spectral maximum. Again, the differences work well in nullifying the persistence.
In a way, the above three tests define a fixed point within the present work, and so allow some general considerations.
While the above statements for NOAA data also hold in general, I must point out that the reduction of persistence is not the same for every dataset and every climate variable. The lake Victoria levels show that differences and derivatives did not cancel the persistence completely, but in any case they produced a noticeable restoration of the spectral structure, with a very good resemblance between the spectra and no significant differences in the periods.
At the same time, the transformation removes spectral peaks (like the ~40- and ~78-year ones) whose significance has been discussed without their origin being understood.
It should also be noted that, although the longer periods of the “observed” lake Victoria spectrum differ from those of the other two spectra, the shortest periods are common to all spectra, perhaps suggesting that persistence acts differently along the spectrum.
I think the present procedure, i.e. the use of differences/derivatives, generates uncorrelated data that contain the information (at least the spectral information) of the original data, allowing a more reliable spectral analysis. At the same time, I think we need to verify the improvement in each individual case.
Nile: annual minimum level, 622-1469 CE
The Nile river series is linked to the lake Victoria levels because the lake is the source of the White Nile, while the Blue Nile originates in the Ethiopian Highlands. The two rivers converge near Khartoum, Sudan, and together become “the Nile”.
The series of the Nile annual minimum level (site visited 20 November, 2017) has a Hurst exponent Hobs=0.833, high enough to justify the use of the differences. Figure 12 shows how the transformation can nullify the persistence.
Fig.12. Observed and difference acfs of the Nile minimum level, 622-1469 CE. Hobs=0.833; Hdiff=0.5.
Figures 13 and 14 compare the “observed” and “difference” spectra.
Fig.13. Observed annual minimum level of the Nile river and its MEM spectrum.
Fig.14. Differences between successive values of the Nile series and their spectrum.
A comparison between the spectra shows that:
- An almost invisible “wave” at 320-340 years in the “observed” spectrum is enhanced in the differences as a peak at 308 years.
- An “observed” maximum at 242 years appears in the differences split into a main maximum at 199 years and two minor ones at about 225 and 260 years.
- The “observed” maxima at 82.6 and 99.6 years become 89.1 and 96.7 years (the latter not marked in figure 14). The main 89.1-year maximum is better defined than the “observed” one.
- The shortest-period maxima are the same in both spectra, but I note the lack of the 37.2-year “observed” period, apparently replaced by a 40.3-year peak in the differences.
- The ratios between the powers of “observed” and “difference” peaks appear larger than in the previous cases.
TPW: Total Precipitable Water
The climate variable TPW is strongly linked to temperature and reaches its largest values along the equatorial Pacific belt, mainly around the Indonesian seas, in the so-called “warm pool”, where the Pacific water pushed westward by the Trade Winds accumulates before spreading eastward during El Niño events. The above-mentioned strong link is shown in figure 15.
Fig.15. The HadCRUT4 global temperature anomaly compared with a scaled TPW. Quite evidently, the plots refer to the same
TPW data is available for two latitude belts: ±20° and ±60°. Here, only the wider belt has been used.
The shapes in figure 15 and the NOAA anomaly in figure 1 suggest persistence and the use of differences, so figure 16 is a natural next step.
Fig.16. TPW observed and difference acfs. No need to highlight the reduction of the persistence. Here Hobs=0.968 and
Spectral analysis of the observed and difference data confirms again the small or null influence of persistence on the spectral periods of this kind of data. As usual, the ratios among the powers (heights) of the observed and difference spectral peaks change. Some doubts may arise here because of the short time span of the dataset (30 years), but I think they are best resolved by comparison with the NOAA analysis above.
Fig.17. TPW observed data for the ±60° belt and its MEM spectrum.
Fig.18. TPW differences and their MEM spectrum.
OHC: Ocean Heat Content (0-700m)
I use here only the data relative to the global ocean (0-700m). The constant data step is 1 year and the range 1955-2015 is covered.
Hurst exponent for observed data is Hobs=0.970 and becomes Hdiff=0.468 after the transformation, as in figure 19.
Fig.19. acfs of OHC and its differences. Hobs=0.970 and Hdiff=0.468.
This is another case where the persistence is high in the observed data and null after differencing. The two spectra, figures 20 and 21, have a similar structure, with some minor differences: a 4.1-year peak is not present in the observed spectrum, and the shallow 30.5-year peak of the observed spectrum is a “desaparecido” in the differences.
Fig.20. Observed OHC (0-700m) and its MEM spectrum
Fig.21. Differences of OHC and their MEM spectrum
Dendrology: tree rings, russ243, 1540-2004 CE
Here the so-called “observed data” are the average over the 45 available chronologies measured on Sakhalin Island (Russia).
Its acf, along with that of the differences is in figure 22.
Fig.22. acf of the dendrological series russ243mm and of its differences. The observed acf(1) indicates a weak persistence (Hobs=0.809), which is nevertheless totally cancelled by the differences (Hdiff=0.5).
Here the climate variable is the ring width (in microns) and NOT the temperature, because of the large and well-known problems of the calibration process: ring growth does not depend only on temperature but on many meteo-climatic and geological factors.
A comparison of the observed (figure 23) and difference (figure 24) spectra shows that persistence can be corrected over a 450-year time range too and that, again, the information content does not change after the transformation.
Fig.23. Average dendrology of russ243 and its MEM spectrum.
Fig.24. Differences d(i)=w(i+1)-w(i) of the average dendrology russ243 and its MEM spectrum.
The present spectra show the largest differences of all the transformations examined so far, even though some spectral maxima appear in both spectra and some of the period differences do not seem significant.
Kinderlinskaya Cave, Russia, Southern Urals
This is a δ18O series spanning the whole Holocene, about 11 ka (ka = kiloyears = 1000 years). The data are available from the NOAA paleo site and the reference paper is Baker et al., 2017. The time axis is referenced as BP2k (i.e. the present is the year 2000) and the series is the longest one I have applied the difference method to. Its Hurst exponent, Hobs=0.995, is also the largest among those computed so far. The observed and difference acfs are compared in figure 25, where a striking reduction of the persistence clearly appears.
Fig.25. acf of δ18O, Kinderlinskaya Cave, and its differences. We have a very high persistence in the observed data, totally nullified by the differences. Hobs=0.995 and Hdiff=0.484.
From the spectra in figures 26 and 27 the following information can be derived:
- The differences show structures corresponding to the “jumps” in δ18O values, so they do not look uncorrelated, even though the acf says otherwise.
- The differences between spectral maxima are real, but the time span of the data (almost 12000 years) makes the difference in the 500-year maximum of little significance.
- Several spectral peaks are present in both spectra, thus confirming that the information propagates from observed to difference data, even over such a long time range.
Fig.26. Observed δ18O data and its LOMB spectrum.
Fig.27. Difference δ18O data and its LOMB spectrum. Several maxima are common to both obs and diff spectra.
Stockholm tide gauge 1774-2000 CE
The Stockholm tide gauge is one of the longest series in the world; the data, available at the PSMSL (Permanent Service for Mean Sea Level) site, include monthly values and the annual means I use here. The series has some gaps at its beginning, so derivatives have been used as the transformation. The respective acfs are in figure 28.
Fig.28. Observed and derivative acfs. The acf(1) of “deriv” is about zero: persistence has been cancelled by the derivatives, although with large oscillations. Hobs=0.950 and Hdiff=0.523.
The spectral comparison (figures 29 and 30) again shows the above-mentioned change in the power ratios among peaks, although the spectra are very similar. An exception is the maximum at 94.2 years, present only in the derivatives, without any signal in the observed data.
Fig.29. Sea level at Stockholm, annual means, and its LOMB spectrum.
Fig.30. Numerical derivatives of the Stockholm tide gauge and their LOMB spectrum. The longest periods (94 and 140-160 years) show some differences, while the shortest ones, mainly the “El Niño”-like ones, are maintained in both spectra.
At present, mainly for lack of evidence to the contrary, the use of differences/derivatives to eliminate persistence and bring the Hurst exponent to H=0.5 (i.e. uncorrelated data) appears to be a really effective method.
I am not able to prove the general statement that differences conserve the information content of the original data, so I tried at least to verify it in a variety of concrete situations, covering different time steps, time spans, climate variables and degrees of persistence.
The reduction of persistence leads me to think that the present procedure overcomes the problem of autocorrelation, at least as far as spectral analysis is concerned.
After the necessary tests, I suggest that the spectrum of the differences/derivatives gives the best available results.
|All plots and data relative to this article can be found at the support site here.|
- Alexander W.J.R., Bailey F., Bredenkamp D.B., van der Merwe A. and Willemse N., 2007. Linkages between solar activity, climate predictability and water resource development. Journal of the South African Institution of Civil Engineering, 49(2), 32-44. Full text available at https://saice.org.za/downloads/journal/vol49-2-2007/vol49_n2_e.pdf
- Baker J.L., Lachniet M.S., Chervyatsova O., Asmerom Y. and Polyak V.J., 2017. Holocene warming in western continental Eurasia driven by glacial retreat and greenhouse forcing. Nature Geoscience, published online 22 May 2017. doi:10.1038/NGEO2953
- Childers D.G. (Ed.), 1978. Modern Spectrum Analysis. IEEE Press, New York (chapter II).
- Koutsoyiannis D., 2002. The Hurst phenomenon and fractional Gaussian noise made easy. Hydrological Sciences Journal/Journal des Sciences Hydrologiques, 47(4), 573-595. doi:10.1080/02626660209492961
- Koutsoyiannis D., 2003. Climate change, the Hurst phenomenon, and hydrological statistics. Hydrological Sciences Journal/Journal des Sciences Hydrologiques, 48(1), 3-24. doi:10.1623/hysj.48.1.3.43481
- Koutsoyiannis D., 2006. Nonstationarity versus scaling in hydrology. Journal of Hydrology, 324, 239-254. doi:10.1016/j.jhydrol.2005.09.022
- Lomb N.R., 1976. Least-squares frequency analysis of unequally spaced data. Astrophysics and Space Science, 39, 447-462.
- Press W.H., Teukolsky S.A., Vetterling W.T. and Flannery B.P., 2009. Numerical Recipes in Fortran. Cambridge University Press India Pvt. Ltd., New Delhi.
- Scargle J.D., 1982. Statistical aspects of spectral analysis of unevenly spaced data. Astrophysical Journal, 263, 835-853.