For global surface temperature projections with the CSALT model, I find it critical not to over-fit when the training period is short. Over-fitting on short intervals can produce factors with opposite, mutually compensating signs, and these become prone to amplification when projected. The recommendation is then to rank the factors (or principal components) by how much each contributes to a good fit, as measured by the correlation coefficient. See Fig. 1.
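A minimal sketch of such a ranking, assuming a greedy forward selection (my choice for illustration, not necessarily how the CSALT fits were ordered). The data are synthetic, and the periods, amplitudes, and the names `fit_corr` and `rank_factors` are all made up for this example.

```python
import numpy as np

def fit_corr(X, y):
    """Least-squares fit of y on the columns of X plus an intercept;
    returns the correlation coefficient between the fit and y."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.corrcoef(A @ coef, y)[0, 1]

def rank_factors(X, y):
    """Greedy forward selection: at each step, add the factor that
    raises the correlation coefficient the most."""
    remaining = list(range(X.shape[1]))
    chosen, corrs = [], []
    while remaining:
        best = max(remaining, key=lambda j: fit_corr(X[:, chosen + [j]], y))
        chosen.append(best)
        remaining.remove(best)
        corrs.append(fit_corr(X[:, chosen], y))
    return chosen, corrs

# Synthetic stand-in data: 120 years of monthly samples.
# Periods and amplitudes are hypothetical, not actual CSALT factors.
rng = np.random.default_rng(0)
t = np.arange(120 * 12) / 12.0
periods = (60.0, 12.0, 4.0, 1.0)
amplitudes = np.array([0.2, 1.0, 0.05, 0.5])
X = np.column_stack([np.sin(2 * np.pi * t / p) for p in periods])
y = X @ amplitudes + 0.01 * rng.standard_normal(t.size)

order, corrs = rank_factors(X, y)
for rank, (j, c) in enumerate(zip(order, corrs), start=1):
    print(f"{rank}. period {periods[j]:5.1f} yr -> cumulative correlation {c:.4f}")
```

The cumulative correlation climbs quickly with the strongest factors and then flattens, which is the shape one would use to decide where to cut the list for a short training interval.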
With the original handful of CSALT factors, we can reach a good correlation rather quickly. Beyond this point, the solar, lunar, and orbital forcing factors become increasingly subtle, providing progressively less thermal forcing as we run down the list of periods suggested by previous researchers. Judging from the clear asymptotic trend, we would likely need several times as many factors to bring the correlation coefficient arbitrarily close to 1. Noise does not seem to be an issue, as the vast majority of the temperature fluctuations appear to come from real forcing terms. The noise residual in this case is at the 0.002 level, or 0.2% of the measured signal.
For projecting trends, the objective is to use as few factors as possible on the training interval, as shown in Fig. 2. This will pull the salient trends out at the expense of losing detail.
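As a rough sketch of the train-then-project procedure: fit a small set of terms on the early part of the record only, then evaluate the same coefficients over the whole record. Everything here is an assumption for illustration: the data are synthetic, the single 9-year cycle and the linear trend are made up, and `design` is a hypothetical helper, not part of CSALT.

```python
import numpy as np

def design(t, periods, t0=1880.0):
    """Design matrix: intercept, linear trend, and a cos/sin pair per period."""
    cols = [np.ones_like(t), t - t0]
    for p in periods:
        cols += [np.cos(2 * np.pi * t / p), np.sin(2 * np.pi * t / p)]
    return np.column_stack(cols)

# Synthetic monthly "temperature" record, 1880 to 2013 (hypothetical values).
rng = np.random.default_rng(2)
t = 1880 + np.arange(133 * 12) / 12.0
y = (0.005 * (t - 1880)
     + 0.1 * np.sin(2 * np.pi * t / 9.0)
     + 0.02 * rng.standard_normal(t.size))

periods = [9.0]                 # deliberately few factors
train = t < 1950                # training interval ends at 1950

# Fit on the training interval only.
coef, *_ = np.linalg.lstsq(design(t[train], periods), y[train], rcond=None)

# Apply the same coefficients to the full record; the part after 1950 is the projection.
projection = design(t, periods) @ coef
rmse_train = np.sqrt(np.mean((projection[train] - y[train]) ** 2))
rmse_test = np.sqrt(np.mean((projection[~train] - y[~train]) ** 2))
print(f"RMSE on training interval: {rmse_train:.3f}")
print(f"RMSE on projected interval: {rmse_test:.3f}")
```

Comparing the two errors as more periods are added to `periods` is one way to see where extra factors stop helping the projection and start hurting it.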
The following figure uses only the first 6 cyclical terms after the TSI factor of Fig. 1 to train on the data up to the year 1950, and then projects the trend as the blue line.
If we look at the result of applying all the potential forcing factors, we can get very high correlation coefficients as shown in Fig. 4.
The small deviations of the model from the data, shown by the arrows, correspond to actual named climatic periods: the Heat Wave of 1977 and the Cold Sunday polar vortex conditions of January 1982. These deviations show up more clearly in the following residual plot.
Extremes will always be hard to model, as they usually arise from low-probability combinations of conditions.
It may look as if, with enough sinusoidal factors, we can fit any waveform. In fact, we do not have that freedom, because many of these natural oscillations are locked in both frequency and phase. So when a thermal signal emerges from the effects of a semi-diurnal tide with a period of 8.85/2 = 4.425 years (see here), its peaks are fixed with respect to the calendar. Using the CSALT model, we can accurately discriminate the frequency as well as the phase. The upshot is that the strength of the match shows a maximum at 4.425 years and drops quickly should we choose 4.40 or 4.50 years. And if we pick the wrong phase, the fit is shifted in the calendar and no longer lines up with the actual maximum gravitational pull.
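The frequency and phase sensitivity described above can be illustrated on synthetic data. This is a sketch under stated assumptions, not the CSALT fit itself: the signal is a pure 4.425-year cosine plus a little noise, the 1900.0 peak date is an arbitrary hypothetical, and `sinusoid_fit` is a name invented for this example.

```python
import numpy as np

def sinusoid_fit(t, y, period):
    """Least-squares fit of intercept + cos + sin at one trial period.
    Returns (correlation of fit with y, calendar time of one fitted maximum)."""
    w = 2 * np.pi / period
    A = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    corr = np.corrcoef(A @ coef, y)[0, 1]
    # a*cos(wt) + b*sin(wt) has a maximum where wt = atan2(b, a)
    peak_time = np.arctan2(coef[2], coef[1]) / w
    return corr, peak_time

TRUE_PERIOD = 4.425   # years, the period quoted in the text
TRUE_PEAK = 1900.0    # hypothetical calendar date of one maximum

# Synthetic monthly record covering 130 years.
rng = np.random.default_rng(1)
t = 1880 + np.arange(130 * 12) / 12.0
y = (np.cos(2 * np.pi * (t - TRUE_PEAK) / TRUE_PERIOD)
     + 0.1 * rng.standard_normal(t.size))

# Correlation at the true period and at the two nearby trial periods.
corr_at = {p: sinusoid_fit(t, y, p)[0] for p in (4.40, 4.425, 4.50)}
for p, c in corr_at.items():
    print(f"period {p:.3f} yr: correlation {c:.3f}")

# Recover the calendar phase at the true period; fold the error into [-P/2, P/2).
_, peak = sinusoid_fit(t, y, TRUE_PERIOD)
phase_error = (peak - TRUE_PEAK + TRUE_PERIOD / 2) % TRUE_PERIOD - TRUE_PERIOD / 2
print(f"fitted peak is off by {phase_error:+.3f} yr from the true calendar date")
```

How sharply the correlation falls away from the true period depends on the record length: the longer the record, the more cycles a small period error has to drift out of step.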
This isn't regular signal processing, where we can assume stationarity in the origin of the signal. The origin, or phase, is fixed ... fixed in the stars, so to speak.