Limits to Goodness of Fit

Based on a comparison of local interval correlations between the NINO34 and SOI indices, there probably is a limit to how well a model can be fit to ENSO.  The lower chart displays a 4-year-windowed correlation coefficient (in RED) between the two indices (shown in upper chart):

Note that in the interval starting at 1930, the correlation is poor for about 7 years.
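A windowed correlation of this kind can be sketched in a few lines. This is only a minimal illustration, assuming monthly sampling and a 48-sample (4-year) window; the two series below are synthetic stand-ins rather than the actual NINO34 and SOI data, and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant window
    return sxy / math.sqrt(sxx * syy)

def windowed_correlation(x, y, window):
    """Correlation of x and y inside each sliding window of `window` samples."""
    return [pearson(x[i:i + window], y[i:i + window])
            for i in range(len(x) - window + 1)]

# Synthetic stand-ins: series_b tracks series_a except during months 96..155,
# where it follows an unrelated, faster fluctuation.
months = range(240)
series_a = [math.sin(2 * math.pi * m / 45.0) for m in months]
series_b = [math.sin(2 * math.pi * m / 7.0) if 96 <= m < 156 else v
            for m, v in zip(months, series_a)]

cc = windowed_correlation(series_a, series_b, window=48)
```

Windows that lie entirely inside the mismatched stretch give a correlation near zero, while windows outside it give a correlation of one, which is the kind of localized dip the RED curve shows.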

Next note that the ENSO model fit shows a poor correlation to the NINO34 data in nearly the same intervals (shown as dotted GREEN). This is an odd situation but potentially revealing. The fact that both the ENSO model and SOI don't match the NINO34 index over the same intervals suggests that the model may match SOI better than it does NINO34.  Yet, because of the excessive noise in SOI, this is difficult to verify.

But more fundamentally, why would NINO34 not match SOI in these particular intervals? These regions do seem to be ENSO-neutral, not close to El Nino or La Nina episodes.  Some also seem to occupy regions of faster, noisy fluctuations in the index.

It could be that the ENSO lunar tidal model is revealing the true nature of the ENSO dynamics, and these noisier, neutral regions are reflecting some other behavior — but since they also appear to be obscured by noise, it makes it difficult to unearth.


The paper by Zajączkowska[1] also applies a local correlation to compare the lunar tidal cycles to plant growth dynamics. There's a treasure trove of recent research on this topic.


Second-Order Effects in the ENSO Model

For ocean tidal predictions, once an agreement is reached on the essential lunisolar terms, then the second-order terms are refined. Early in the last century Doodson catalogued most of these terms:

"Since the mid-twentieth century further analysis has generated many more terms than Doodson's 388. About 62 constituents are of sufficient size to be considered for possible use in marine tide prediction, but sometimes many fewer can predict tides to useful accuracy."

That's possibly the stage we have reached in the ENSO model.  There are two primary terms for lunar forcing (the Draconic and Anomalistic cycles) that, when mixed with the annual and biennial cycles, reproduce the essential ENSO behavior.  The second-order effects are the modulation of these two lunar cycles with the Tropical/Synodic cycle.  This is most apparent in the modification of the Anomalistic cycle. Although not as important as in the calculation of the Total Solar Eclipse times, the perturbation is critical to validating the ENSO model and to eventually using it to make predictions.

The variation in the Anomalistic period is described at the NASA Goddard eclipse page. They provide two views of the variation, a time-domain view and a histogram view.

Time domain view
Histogram view

Since NASA Goddard doesn't provide an analytical form for this variation, we can see if the ENSO Model solver can effectively match it via a best-fit search to the ENSO data. This is truly an indirect method.

First we start with a parametric approximation to the variation, described by a pair of successive frequency-modulated (and full-wave rectified) terms that incorporate the Tropical-modified term, \omega_m. The Anomalistic term is \omega_a.


\cos(\omega_a t+\phi_a+c_1 \cdot |\sin(\omega_m t+k_1 \cdot |\sin(\omega_m t+k_2)|+c_2)|)

This can generate the cusped behavior observed, but the terms \phi_a, c_1, c_2, k_1, and k_2 need to be adjusted to align to the NASA model. The solver will try to do this indirectly by fitting to the 1880-1950 ENSO interval.
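To see how the expression above produces a varying Anomalistic period, here is a minimal sketch that evaluates the modulated phase and takes its numerical derivative. The parameter values are placeholders, not fitted results, and the modulation frequency is assumed to be π/205.9 rad/day (borrowing the 205.9-day interval quoted by NASA); the function names are hypothetical.

```python
import math

W_A = 2 * math.pi / 27.55455   # mean Anomalistic frequency, rad/day
W_M = math.pi / 205.9          # assumed modulation frequency, rad/day

def modulated_phase(t, phi_a, c1, c2, k1, k2, w_a=W_A, w_m=W_M):
    """Argument of the cosine in the expression above, at day t."""
    inner = w_m * t + k1 * abs(math.sin(w_m * t + k2)) + c2
    return w_a * t + phi_a + c1 * abs(math.sin(inner))

def anomalistic_term(t, **params):
    """Frequency-modulated, full-wave rectified Anomalistic term."""
    return math.cos(modulated_phase(t, **params))

def effective_period(t, dt=0.01, **params):
    """Instantaneous period in days, from a central difference of the phase."""
    dphase = modulated_phase(t + dt, **params) - modulated_phase(t - dt, **params)
    return 2 * math.pi * (2 * dt) / dphase

# Placeholder parameter values for illustration only.
params = dict(phi_a=0.3, c1=0.5, c2=0.1, k1=0.4, k2=0.2)
periods = [effective_period(float(day), **params) for day in range(1096)]
```

A histogram of `periods` over a long interval corresponds to the histogram view, and a plot of `periods` against time to the time-domain view.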

Plotting in RED the Anomalistic time series and the histogram of frequencies embedded in the ENSO waveform, we get:

Time domain view of model
Histogram view of model

This captures the histogram view quite well, and the time-domain view roughly (in other cases it gives a better cusped fit).  The histogram view is arguably more important as it describes the frequency variation over a much wider interval than the 3-year interval shown.

What would be even more effective is to find the correct analytical representation of the Anomalistic frequency variation and then plug that directly into the ENSO model. That would provide another constraint to the solver, as it wouldn't need to spend time optimizing for a known effect.

Yet as a validation step, the fact that the solver detects the shape required to match the variation is remarkable in itself. The solver is obviously searching for the forcing needed to produce the ENSO waveform observed, and happens to use the precise parameters that also describe the second-order Anomalistic behavior.  That could happen by accident but in that case there have been too many happy accidents already, i.e. period match, LOD match, Eclipse match, QBO match, etc.

Using Solar Eclipses to calibrate the ENSO Model

This is the forcing for the ENSO model, focusing on the non-mixed Draconic and Anomalistic cycles:

Note that the maximum excursions (perigee and declination excursion) align with the occurrence of total solar eclipses. These are the first three that I looked at, including the latest August 21 eclipse in the center chart.

There are about 90 more of these stretching back to 1880. The best way to fit the calibration is to take the negative excursions of the two lunar forcings and multiply these together, i.e. use the effective Draconic*Anomalistic amplitudes (also only take the fortnightly cycle of the Draconic, as eclipses occur during both the ascending and descending node crossings). The main fitting factors are the phases of the two lunar months.  To get the maximum alignment from the search solver, we maximize the sum of the effective amplitudes across the entire interval. This results in a phase difference between the two of about 0.74 radians based at the starting year of 1880 (i.e. year 0).
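The calibration idea can be sketched as a phase search. This is a rough illustration under stated assumptions, not the actual solver: the forcings are taken as plain cosines, the event list is a synthetic stand-in generated from known phases rather than a real eclipse catalogue, and a coarse grid search replaces the search solver. All function names are hypothetical.

```python
import math

W_D2 = 2 * (2 * math.pi / 27.212221)  # fortnightly Draconic frequency, rad/day
W_A = 2 * math.pi / 27.55455          # Anomalistic frequency, rad/day

def negative_excursion(angle):
    """Magnitude of the negative half of a cosine forcing, zero elsewhere."""
    return max(0.0, -math.cos(angle))

def effective_amplitude(t, phi_d, phi_a):
    """Product of the Draconic and Anomalistic negative excursions at day t."""
    return (negative_excursion(W_D2 * t + phi_d)
            * negative_excursion(W_A * t + phi_a))

def alignment_score(event_days, phi_d, phi_a):
    """Sum of effective amplitudes over all event dates."""
    return sum(effective_amplitude(t, phi_d, phi_a) for t in event_days)

def best_phases(event_days, steps=36):
    """Coarse grid search for the phase pair that maximizes the score."""
    grid = [2 * math.pi * i / steps for i in range(steps)]
    return max(((pd, pa) for pd in grid for pa in grid),
               key=lambda p: alignment_score(event_days, *p))

# Synthetic events: days (counted from year 0) on which both forcings are
# close to their extreme negative excursion for a known pair of phases.
TRUE_PHI_D = 2 * math.pi * 7 / 36
TRUE_PHI_A = 2 * math.pi * 20 / 36
days = [0.5 * i for i in range(int(70 * 365.25 / 0.5))]
events = [t for t in days
          if effective_amplitude(t, TRUE_PHI_D, TRUE_PHI_A) > 0.99]

phi_d, phi_a = best_phases(events)
```

With real eclipse dates in place of `events`, the recovered pair `(phi_d, phi_a)` would play the role of the two lunar-month phases described above.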


Search for El Nino

The model for ENSO includes a nonlinear search feature that finds the best-fit tidal forcing parameters.  This is similar to what a conventional ocean tidal analysis program performs — finding the best-fitting lunar tidal parameters based on a measured historic interval of hundreds of cycles. Since tidal cycles are abundant — occurring at least once per day — it doesn't take much data collected over a course of time to do an analysis.  In contrast, the ENSO model cycles over the course of years, so we have to use as much data as we can, yet still allow test intervals.

What follows is the recipe (more involved than the short recipe) that will guarantee a deterministic best-fit from a clean slate each time. Very little initial condition information is needed to start with, so that the final result can be confidently recovered each time, independent of training interval.


Variation in the Length of the Anomalistic Month

For the ENSO model, we use two constraints for the fitting process. One of the constraints is to maximize the correlation coefficient for the model over the ENSO training interval selected. The other constraint is to maximize the correlation of the selected lunar tidal forces over a measured Length-of-Day (LOD) interval. The latter constrains the lunar tidal forcing to known values that will actually change the angular momentum of the earth's rotation. This in turn drives the sloshing of the Pacific ocean's thermocline leading to the ENSO cycle. The two constraints are simultaneously met by heuristically maximizing the average of the correlation coefficients.
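As a minimal sketch of that combined objective, assuming an equal-weight average of the two correlation coefficients (the function and argument names here are hypothetical, and the toy series are not real data):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def combined_score(enso_model, enso_data, tidal_forcing, lod_data):
    """Average of the ENSO-fit correlation and the LOD-fit correlation."""
    cc_enso = pearson(enso_model, enso_data)    # model vs. ENSO training data
    cc_lod = pearson(tidal_forcing, lod_data)   # forcing vs. measured LOD
    return 0.5 * (cc_enso + cc_lod)

# Toy check: a perfect ENSO fit paired with a perfectly inverted LOD fit
# averages out to zero, so neither constraint can be ignored.
series = [math.sin(0.3 * i) for i in range(200)]
perfect = combined_score(series, series, series, series)
mixed = combined_score(series, series, series, [-v for v in series])
```

A search solver would then adjust the forcing amplitudes and phases to push this single score as high as possible.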

In addition, there are the fixed constraints of the primary lunar periods corresponding to the Draconic/nodal cycle and the Anomalistic cycle.

This combination gives a fairly effective fit over the entire training cycle, but there is an important additional constraint that needs to be applied to the Anomalistic cycle. The NASA eclipse and moon's orbit page describes the situation:

"The anomalistic month is defined as the revolution of the Moon around its elliptical orbit as measured from perigee to perigee. The length of this period can vary by several days from its mean value of 27.55455 days (27d 13h 18m 33s). Figure 4-4 plots the difference of the anomalistic month from the mean value for the 3-year interval 2008 through 2010. ... the eccentricity reaches a maximum when the major axis of the lunar orbit is pointed directly towards or directly away from the Sun (angles of 0° and 180°, respectively). This occurs at a mean interval of 205.9 days, which is somewhat longer than half a year because of the eastward shift of the major axis. "

This is a significant variation in the anomalistic cycle over the course of a year. We don't use this variation as a constraint, but we can use it as a fitting parameter and then compare the variation obtained against that shown above.

Using the 205.9 day value, ~365.25/(2-2/8.85), where 8.85 years is the precession period of the major axis, we break this into Fourier components of half this value and twice this value. The mean Anomalistic period of 27.5545 days is then frequency modulated by the slower periods by the standard engineering procedure. We then allow the amplitude and phase of each factor to vary during the training to obtain the best fit (this is slightly different from the concise form used previously).
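The arithmetic and the frequency modulation can be checked with a short sketch. It assumes 365.25 days per year, reads 8.85 as the precession period of the major axis in years, and assumes the fundamental 205.9-day component is included alongside the half and double periods (giving three amplitudes and three phases); the function name and any amplitude or phase values passed in are placeholders.

```python
import math

DAYS_PER_YEAR = 365.25
APSIDAL_PRECESSION_YEARS = 8.85   # precession period of the major axis
ANOMALISTIC_DAYS = 27.5545        # mean Anomalistic period

# Mean interval between alignments of the major axis with the Sun.
alignment_period = DAYS_PER_YEAR / (2 - 2 / APSIDAL_PRECESSION_YEARS)

# Fourier components at half, one, and two times that interval.
modulation_periods = [alignment_period / 2, alignment_period, 2 * alignment_period]

def fm_anomalistic(t, amplitudes, phases):
    """Anomalistic cosine whose phase is modulated by the slower periods."""
    phase = 2 * math.pi * t / ANOMALISTIC_DAYS
    for period, amp, phi in zip(modulation_periods, amplitudes, phases):
        phase += amp * math.sin(2 * math.pi * t / period + phi)
    return math.cos(phase)

print(round(alignment_period, 2))  # 205.89
```

During training, the three amplitudes and three phases passed to `fm_anomalistic` would be the quantities left free to vary.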

If we zoom in on the anomalistic period variation, we get this match to the NASA Goddard model:

There is no reason to believe that this match would spontaneously occur given that there are 3 amplitudes and 3 phase factors involved. Yet it matches precisely to the (1) peak positions, (2) relative amplitudes, and to the (3) cusped shape via the Fourier series summation. An even better fit is obtained if we use abs(sin(π time/205.9+Φ)) as the fitting function as it naturally creates more of a cusp shape due to the full-wave rectification of the sine wave.

Conventional tidal analysis is renowned for being an exacting procedure [1], where the known tidal periods are broken down into similar sets of harmonic factors, yet applied on a diurnal or semidiurnal basis. The only difference here is that ENSO responds to the monthly and fortnightly long-period tides and not the short-period ones.

→ This model fit gives further validation to the lunar tidal mechanism for forcing ENSO.  The exacting process of generating the correct lunar tidal variations (along with the subtle biennial modulation and the tricky aliasing) has likely contributed to the fact that the pattern has remained hidden for so long.  This is actually not so different a situation from the long-hidden connection between the triggering of earthquakes and the dynamic moon-sun-earth alignment. That pattern was also hidden and only exposed recently. Alas, not everything can be quite as obvious as the pattern matching of ocean surface tides to the lunisolar cycles.


[1] S. Consoli, D. R. Recupero, and V. Zavarella, “A survey on tidal analysis and forecasting methods for Tsunami detection,” arXiv preprint arXiv:1403.0135, 2014.


The peculiar shape of the Anomalistic frequency variations is due to a different slope (i.e. velocity) on one lobe of the elliptical orbit than on the other. You can get this by generating another sinusoidal modulation on top of the average elliptical sinusoid. This generates an asymmetric sawtooth in the phase angle (see blue line below) and the characteristic spiked or cusped profile in the effective frequency, i.e. the derivative of this value (see red line below).
I am starting to use this formulation in the ENSO model as it is quite concise.

The Hawkmoth Effect

In contrast to the well-known Butterfly Effect, there is another scientific modeling limitation known as the Hawkmoth Effect.  Instead of simulation results being sensitive to initial conditions, which is the Butterfly Effect, the Hawkmoth Effect describes results being sensitive to model structure.  It's a more subtle argument for explaining why climate behavioral modeling is difficult to get right, and it is named after the hawkmoth because hawkmoths are "better camouflaged and less photogenic than butterflies".

Not everyone agrees that this is a real effect; it may just reveal shortcomings in being able to correctly model the behavior under study. If you have the wrong model or the wrong parameters for the model, of course it may diverge from the data rather sharply.

In the context of the ENSO model, we already provided parameters for two orthogonal intervals of the data.  Since there is some noise in the ENSO data — perfectly illustrated by the fact that SOI and NINO34 only have a correlation coefficient of 0.79 — it is difficult to determine how much of the parameter differences are due to over-fitting of that noise.

In the figure below, the middle panel shows the difference between the SOI and NINO34 data, with yellow showing where the main discrepancies or uncertainties in the true ENSO value lie. Above and below are the model fits for the earlier (1880-1950 shaded in a yellow background) and later (1950-2016) training intervals. In certain cases, a poorer model fit may be ascribed to uncertainty in the ENSO measurement, such as near ~1909, ~1932, and ~1948, where the dotted red lines align with trained and/or tested model regions. The question mark at 1985 is a curiosity, as the SOI remains neutral, while the model fits to more La Nina conditions of NINO34.

There is certainly nothing related to the Butterfly Effect in any of this, since the ENSO model is not forced by initial conditions, but by the guiding influence of the lunisolar cycles. So we are left to determine how much of the slight divergence we see is due to non-stationary variation of the model parameters over time, or whether it is due to missing some other vital structural model parameters. In other words, the Hawkmoth Effect is our only concern.

In the model shown below, we employ significant over-fitting of the model parameters. The ENSO model only has two forcing parameters, the Draconic (D) and Anomalistic (A) lunar periods, but as in conventional ocean tidal analysis, many more of the nonlinear harmonics need to be considered to make accurate predictions [see Footnote 1]. So we start with A and D, and then create all combinations up to order 5, resulting in the set [ A, D, AD, A2, D2, A2D, AD2, A3, D3, A2D2, A3D, AD3, A4, D4, A2D3, A3D2, A4D, AD4, A5, D5 ].
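The set of 20 combinations can be enumerated mechanically. The sketch below assumes that a label such as A2D stands for the second power of A multiplied by D, which is an interpretation rather than something stated explicitly; the function names are hypothetical.

```python
def harmonic_terms(max_order=5):
    """All (a_pow, d_pow) pairs with 1 <= a_pow + d_pow <= max_order."""
    terms = []
    for order in range(1, max_order + 1):
        for a_pow in range(order, -1, -1):
            terms.append((a_pow, order - a_pow))
    return terms

def label(a_pow, d_pow):
    """Text label in the style used above, e.g. (2, 1) -> 'A2D'."""
    parts = []
    for name, power in (("A", a_pow), ("D", d_pow)):
        if power == 1:
            parts.append(name)
        elif power > 1:
            parts.append(f"{name}{power}")
    return "".join(parts)

def term_value(a_value, d_value, a_pow, d_pow):
    """Value of one combination, given the A and D forcing values at an instant."""
    return (a_value ** a_pow) * (d_value ** d_pow)

terms = harmonic_terms(5)
labels = [label(a, d) for a, d in terms]
print(len(labels))  # 20
```

Each order n contributes n + 1 combinations, so orders 1 through 5 give 2 + 3 + 4 + 5 + 6 = 20 terms, matching the size of the set listed above.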

This looks like it has the potential for all the negative consequences of massive over-fitting, such as fast divergence in amplitude outside the training interval, yet the results don't show this at all.  Harmonics in general will not cause a divergence, because they remain in phase with the fundamental frequencies both inside and outside the training interval. Besides that, the higher order harmonics start having a diminished impact, so this set is apparently about right to create an excellent correlation outside the training interval.  The two other important constraints in the fit are (1) the characteristic frequency modulation of the anomalistic period due to the synodic period (shown in the middle left inset) and (2) the calibrated lunar forcing based on LOD measurements (shown in the lower panel).

The resulting correlation of model to data is 0.75 inside the training interval (1880-1980) and 0.69 in the test interval (1980-2016).  So this gets close to the best agreement we can expect given that the correlation between SOI and NINO34 only reaches 0.79.  Read this post for the structural model parameter variations for a reduced harmonic set to order 3 only.

Welcome to the stage of ENSO analysis where getting the rest of the details correct will provide only marginal benefits;  yet these are still important, since as with tidal analysis and eclipse models, the details are important for fine-tuning predictions.


  1. For conventional tidal analysis, hundreds of resulting terms are the norm, so that commercial tidal prediction programs allow an unlimited number of components.




Switching between two models

Recipe for ENSO model in one tweet

and for QBO

The common feature of the two is the application of Laplace's tidal equation and its closed-form solution.

Should you trust climate science? Maybe the eclipse is a clue

An example of a prediction:

"Looks like we're heading for La Nina going into Winter. That means I expect 2018 will not average much different from 2017, both close to 2015 level. Then a probable new record in 2019."

How does anyone know which way the ENSO behavior is heading if there is not a clear understanding of the underlying mechanism? [1]

For the prediction quoted above, the closer one gets to a peak or valley, the safer it is to make a dead reckoning guess. For example, I can say a low tide is coming if it is coming off a high tide, even if I have no idea what causes tides.

Yet, if we understand the mechanism behind ocean tides — that it is due to the gravitational pull of the sun and the moon  —  we can do a much better job of prediction.

The New York Times climate change reporter Justin Gillis suggests that climate science can make predictions as well as geophysicists can predict eclipses:  And there is this:

Yet, if climate scientists can't figure out the mechanism behind a behavior such as ENSO, everyone is essentially in the same boat, fishing for a basic understanding.

So what happens if we can formulate the messy ENSO behavior into a basic geophysics problem, something on the complexity of tides?  We are nowhere near that according to the current research literature, unless this finding — which has been a frequent topic here — turns out to be true.

In this case, the recent solar eclipse is in fact a clue. The precise orbit of the moon is vital to determining the cycles of ENSO. If this assertion is true, one day we will likely be able to predict when the next El Nino occurs, with the accuracy of predicting the next eclipse.


[1] Consider one common explanation invoking winds. In fact, a shift in the prevailing winds is not a mechanism, because any shift or reversal requires a mechanism itself; see for example the QBO.


ENSO model for predicting El Nino and La Nina events

Applying the ENSO model to predict El Nino and La Nina events is automatic. There are no adjustable parameters apart from the calibrated tidal forcing amplitudes and phases used in the process of fitting over the training interval. Therefore the cross-validated interval from 1950 to present is untainted during the fitting process and so can be used as a completely independent and unbiased test.
