ENSO Split Training for Cross-Validation

If we split the modern ENSO data into two training intervals — one from 1880 to 1950 and one from 1950 to 2016, we get roughly equal-length time series for model evaluation.

As Figure 1 shows, a forcing stimulus due to monthly-range LOD variations calibrated to the interval between 2000 to 2003 (lower panel) is used to train the ENSO model in the interval from 1880 to 1950. The extrapolated model fit in RED does a good job in capturing the ENSO data in the period beyond 1950.

Fig. 1: Training 1880 to 1950

Next, we reverse the training and verification fit, using the period from 1950 to 2016 as the training interval and then back extrapolating. Figure 2 shows this works about as well.

Fig. 2: Training interval 1950 to 2016

The possibility of over-fitting is always a concern.  There are only two primary lunar long-periods, the Anomalistic lunar month (A) and the Draconic (D) lunar month.  There is one yearly impulse with a fixed phase (IP).  A table of the parameters for the lunisolar behavior is shown below.   And for the differential equation model, there are an additional 5 parameters, as described in Figure 4.

Param Training
1880-1950
Training
1950-now
Description of parameter
IP 0.1234 0.1228 Yearly impulse phase (rad)
pD 0.1308 0.3610 Draconic monthly phase (rad)
D 0.9370 0.5880 Draconic month amplitude
D2 2.530 2.201 Draconic month squared amplitude
D3 -3.634 -1.596 Draconic month cubed amplitude
D4 -17.41 -17.01 Draconic month squared-squared proamplitude
pA -0.6370 -1.261 Anomalistic monthly phase (rad)
A -0.5918 -0.5433 Anomalistic month amplitude
A3 4.258 3.579 Anomalistic month cubed amplitude
ampA 0.9326 1.464 Anomalistic frequency modulation amplitude (Fig 4)
phaA 0.2297 -0.02378 Anomalistic frequency modulation phase (Fig 4)
DA 0.1418 0.0602 Draconic monthly times Anomalistic monthly
DA2 -2.685 -2.615 Draconic monthly times Anomalistic fortnightly
LOD CC 0.7407 0.7416 LOD correlation coefficient for chosen lunar params

Fig 3 :  Procedure for simultaneously fitting ENSO and LOD using lunar tidal coefficients

deriv 0.8816 0.8325  Wave Differential Equation coeff (1 mo derivative)
delay -0.1467 -0.1634  Wave Delay Differential Equation coeff (1 yr delay)
mA 0.1712 0.2874 Mathieu biennial modulation amplitude
mP 0.2782 0.3074 Mathieu biennial modulation phase
sina 20.76 14.62 Laplace tidal equation sin(A sin) scaling

Fig 4:  Anomalistic period varies at this time-scale

For each row, compare the values of the coefficients for the pre-1950 and post-1950 model fitting results. They do change, but not enough to imply a non-stationary underlying process.

So there appears to be plenty of room to over-fit the model to the data, yet the constraints imposed by the fixed lunar periods (with associated nonlinear products — squared, cubed, etc) and by the simultaneous fit to LOD prevents the curves from diverging too wildly in the extrapolated regions. That is the well-known behavior of poor cross-validation.

There is always the possibility that the model is missing another long-period tidal forcing that could incrementally improve the model. It may be more product terms such as the DA and DA2 used currently, so that perhaps D2A and D2A2 could be evaluated as well. From a Fourier analysis focusing on the residuals, we can get some other hints but this is where avoiding over-fitting is critical.

7 thoughts on “ENSO Split Training for Cross-Validation

  1. Pingback: Annual and Biennial Modulations | context/Earth

  2. Pingback: Variation in the Length of the Anomalistic Month | context/Earth

  3. Pingback: Search for El Nino | context/Earth

  4. Pingback: Using Solar Eclipses to calibrate the ENSO Model | context/Earth

  5. Pingback: Limits to Goodness of Fit | context/Earth

  6. Pingback: Earthquakes, tides, and tsunami prediction | context/Earth

Leave a Reply