For global surface temperature projections with the CSALT model, I find it critical not to over-fit when the training period is short. Over-fitting on short intervals can create factors with oppositely compensating signs, and these become sensitive to amplification when projected. The recommendation is therefore to rank the factors (or principal components) by the strength of their contribution to the fit, as measured by the correlation coefficient. See **Fig. 1**
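A minimal sketch of this ranking idea is a greedy forward selection: at each step, admit whichever remaining factor lifts the correlation coefficient the most. The factor names and series below are illustrative stand-ins against synthetic data, not the actual CSALT inputs or fitting machinery:

```python
import numpy as np

def _fit_corr(X, target):
    """Correlation coefficient between a least-squares fit and the target."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.corrcoef(X @ coef, target)[0, 1]

def rank_factors(factors, target):
    """Greedily rank candidate factors by the gain in correlation
    coefficient each one contributes to the fit."""
    remaining = dict(factors)
    X = np.ones((len(target), 1))  # start from an intercept-only model
    ranking = []
    while remaining:
        # Pick the factor that lifts the correlation the most.
        best = max(remaining, key=lambda name: _fit_corr(
            np.column_stack([X, remaining[name]]), target))
        X = np.column_stack([X, remaining.pop(best)])
        ranking.append((best, _fit_corr(X, target)))
    return ranking

# Synthetic illustration: one strong secular driver and one weak cycle.
rng = np.random.default_rng(0)
t = np.arange(200.0)
strong = t / 200.0                       # CO2-like trend (hypothetical)
weak = 0.1 * np.sin(2 * np.pi * t / 60)  # SOI-like oscillation (hypothetical)
temps = strong + weak + 0.01 * rng.standard_normal(t.size)
ranking = rank_factors({"co2": strong, "soi": weak}, temps)
```

In this synthetic run the trend-like factor is selected first, mirroring how the dominant terms surface before the subtle ones in the real ranking.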

With the original handful of CSALT factors, we reach a good correlation rather quickly. Beyond this point, the forcing factors from solar, lunar, and orbital cycles become increasingly subtle, providing progressively less thermal forcing as we work down the list of periods suggested by previous researchers. From the clear asymptotic trend, we would likely require several times as many factors to push the correlation coefficient arbitrarily close to 1. Noise does not seem to be an issue, as the vast majority of the temperature fluctuations appear to come from real forcing terms; the noise residual in this case is at the 0.002 level, or 0.2% of the measured signal.

For projecting trends, the objective is to use as few factors as possible on the training interval, as shown in **Fig. 2**. This will pull the salient trends out at the expense of losing detail.

The following figure uses only the first 6 cyclical terms after the TSI factor of **Fig. 1** to train on the data up to the year 1950, and then projects the trend as the blue line.
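The train-then-project step can be sketched as an ordinary least-squares fit of a trend plus phase-locked sinusoids on the data before the cutoff year, with the fitted model then evaluated over the whole record. The periods below are hypothetical placeholders, not the actual six cyclical terms, and the record is synthetic:

```python
import numpy as np

# Hypothetical cycle periods in years (placeholders, not the CSALT set).
PERIODS = [2.33, 3.75, 4.425, 6.4, 8.85, 18.6]

def design(years, periods):
    """Linear basis: intercept, trend, and a sin/cos pair per period."""
    cols = [np.ones_like(years), years - years[0]]
    for p in periods:
        w = 2 * np.pi / p
        cols += [np.sin(w * years), np.cos(w * years)]
    return np.column_stack(cols)

def train_and_project(years, temps, cutoff, periods=PERIODS):
    """Fit on years <= cutoff, then evaluate the model everywhere,
    so the post-cutoff segment is a pure projection."""
    mask = years <= cutoff
    coef, *_ = np.linalg.lstsq(design(years[mask], periods),
                               temps[mask], rcond=None)
    return design(years, periods) @ coef

# Synthetic record: a secular trend plus one calendar-locked cycle.
years = np.arange(1880.0, 2014.0, 1 / 12)
truth = 0.005 * (years - 1880) + 0.1 * np.sin(2 * np.pi * years / 4.425)
proj = train_and_project(years, truth, cutoff=1950.0)
```

Because the sin/cos arguments use absolute calendar years, the fitted phases stay locked to the calendar when the model is extrapolated past the cutoff.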

If we look at the result of applying all the potential forcing factors, we can get very high correlation coefficients as shown in **Fig. 4**.

The small deviations of the model from the data, shown by the arrows, correspond to actual named climatic events -- the *Heat Wave of 1977* and the *Cold Sunday* polar vortex conditions of January 1982. One can see the deviation more clearly in the following residual plot.

Extremes will always be hard to model, as they usually arise from low-probability combinations of conditions.

It may look as if, with enough sinusoidal factors, we can fit any waveform. In fact, many of these natural oscillations are locked in terms of frequency *and* phase. So when a thermal signal emerges due to the effects of a semi-diurnal tide with an 8.85/2 = 4.425 year period (see here), its peaks are fixed with respect to the calendar. Using the CSALT model, we can accurately discriminate the frequency as well as the phase. The upshot is that the strength of the match will show a maximum at 4.425 years and will drop quickly should we choose 4.40 or 4.50 years. And if we pick the wrong phase, we will be shifted in the calendar and no longer line up with the actual maximum gravitational pull.
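The period discrimination can be illustrated by scanning trial periods against a synthetic phase-locked signal: the correlation peaks at the true 4.425-year period and falls off at 4.40 or 4.50 years. The plain least-squares sinusoid fit here is a stand-in for the CSALT machinery, and the signal is synthetic:

```python
import numpy as np

def phase_locked_corr(t, signal, period):
    """Correlation between the signal and the best-fit sinusoid of a
    fixed period; the sin/cos pair lets least squares find the phase."""
    w = 2 * np.pi / period
    X = np.column_stack([np.ones_like(t), np.sin(w * t), np.cos(w * t)])
    coef, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return np.corrcoef(X @ coef, signal)[0, 1]

# Synthetic thermal signal phase-locked to a 4.425-year tidal period.
rng = np.random.default_rng(1)
t = np.arange(0.0, 130.0, 1 / 12)   # ~130 years of monthly samples
signal = (np.sin(2 * np.pi * t / 4.425 + 0.7)
          + 0.2 * rng.standard_normal(t.size))

corrs = {p: phase_locked_corr(t, signal, p) for p in (4.40, 4.425, 4.50)}
```

Over a long record, even a few hundredths of a year of period error accumulates into a large calendar phase drift, which is why the correlation degrades so quickly away from the true period.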

This isn't regular signal processing, where we can assume stationarity in the origin of the signal. The origin and phase are fixed ... fixed in the stars, so to speak.

Picked this up from the CE blog:

Mosh is right. Plain vanilla "natural variation" has to be decomposed into other factors before one can even begin to explain anything with it. And when one starts doing that, the likely causative agents become better characterized and understood.

Consider Fig. 1, which ranks the possible contributing factors to the warming signal from left to right via the CSALT model.

Obviously CO2 contributes over 90% of the variability, and then LOD and SOI produce much of the rest. Then it is a matter of understanding how LOD and SOI manifest their variability, and this is a smaller decomposition unit to understand.


This is very interesting stuff. Also, I wanted to view/download your "stochastic analysis" PDF, but the operation timed out. I gave it two tries. Advice?

Sorry, I had to adjust my router settings. Should be OK now.
