In contrast to the well-known Butterfly Effect, there is another scientific modeling limitation known as the Hawkmoth Effect. Whereas the Butterfly Effect describes simulation results that are sensitive to initial conditions, the Hawkmoth Effect describes results that are sensitive to model structure. It is a more subtle argument for why climate behavior is difficult to model correctly, and it is named after the hawkmoth because hawkmoths are "better camouflaged and less photogenic than butterflies".

Not everyone agrees that this is a real effect; some argue it merely reveals shortcomings in our ability to correctly model the behavior under study. After all, if you have the wrong model, or the wrong parameters for the model, it may of course diverge from the data rather sharply.

In the context of the ENSO model, we have already provided parameters for two orthogonal intervals of the data. Since there is some noise in the ENSO data (illustrated by the fact that SOI and NINO34 have a correlation coefficient of only 0.79), it is difficult to determine how much of the parameter differences are due to over-fitting of that noise.
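The 0.79 figure is an ordinary Pearson correlation coefficient between the two index series. A minimal sketch of the computation, assuming the indices are already loaded as equal-length, time-aligned arrays (`soi` and `nino34` are placeholder names):

```python
import numpy as np

def pearson_cc(x, y):
    """Pearson correlation coefficient between two time-aligned series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd, yd = x - x.mean(), y - y.mean()
    return float(xd @ yd / np.sqrt((xd @ xd) * (yd @ yd)))

# pearson_cc(soi, nino34) gives the figure quoted above for the monthly indices
```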

In the figure below, the middle panel shows the difference between the SOI and NINO34 data, with yellow indicating where the main discrepancies or uncertainties in the true ENSO value lie. Above and below it are the model fits for the earlier (1880-1950, shaded with a yellow background) and later (1950-2016) training intervals. In certain cases, a poorer model fit may be ascribable to uncertainty in the ENSO measurement, such as near ~1909, ~1932, and ~1948, where the dotted red lines align with trained and/or tested model regions. The question mark at 1985 is a curiosity: the SOI remains neutral, while the model fits the more La Nina-like conditions of NINO34.

There is certainly nothing related to the Butterfly Effect in any of this, since the ENSO model is not driven by initial conditions but by the guiding influence of the lunisolar cycles. So we are left to determine how much of the slight divergence is due to non-stationary variation of the model parameters over time, and how much is due to missing some other vital element of the model structure. In other words, the Hawkmoth Effect is our only concern.

In the model shown below, we deliberately over-fit the model parameters. The ENSO model has only two fundamental forcing periods, the Draconic (D) and Anomalistic (A) lunar periods, but as in conventional ocean tidal analysis, accurate predictions require many more of the nonlinear harmonics to be considered [see Footnote 1]. So we start with A and D and then create all cross terms up to order 5, resulting in the set [A, D, AD, A^2, D^2, A^2D, AD^2, A^3, D^3, A^2D^2, A^3D, AD^3, A^4, D^4, A^2D^3, A^3D^2, A^4D, AD^4, A^5, D^5].
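Enumerating that set mechanically is straightforward. A sketch in Python, with the tuple `(i, j)` standing for the term A^i D^j (`harmonic_terms` and `label` are hypothetical helper names, not part of the actual fitting code):

```python
from itertools import product

def harmonic_terms(max_order=5):
    """All cross terms A^i * D^j with 1 <= i + j <= max_order."""
    return [(i, j) for i, j in product(range(max_order + 1), repeat=2)
            if 1 <= i + j <= max_order]

def label(i, j):
    """Render (i, j) as a term name like 'A^2D^3'."""
    parts = []
    if i:
        parts.append("A" if i == 1 else f"A^{i}")
    if j:
        parts.append("D" if j == 1 else f"D^{j}")
    return "".join(parts)
```

For `max_order=5` this generates exactly the 20 terms listed above.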

This looks like it has the potential for all the negative consequences of massive over-fitting, such as a rapid divergence in amplitude outside the training interval, yet the results show none of this. Harmonics in general will not cause a divergence, because they remain in phase with the fundamental frequencies both inside and outside the training interval. Moreover, the higher-order harmonics have a progressively diminishing impact, so this set is apparently about right for creating an excellent correlation outside the training interval. The two other important constraints in the fit are (1) the characteristic frequency modulation of the anomalistic period due to the synodic period (shown in the middle-left inset) and (2) the calibrated lunar forcing based on LOD measurements (shown in the lower panel).
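The phase-locking argument can be demonstrated with synthetic data: when the basis consists solely of harmonics of a fixed fundamental, a least-squares fit stays bounded in amplitude outside the training window, unlike a polynomial over-fit. A minimal sketch, where the fundamental and signal are arbitrary stand-ins, not the ENSO forcing:

```python
import numpy as np

# Fit harmonics of a known fundamental inside a training window, then
# extrapolate. Because every basis term is periodic and phase-locked
# to the fundamental, the extrapolation cannot diverge in amplitude.
rng = np.random.default_rng(0)
t = np.linspace(0, 40, 2000)
w = 2 * np.pi / 3.7  # assumed fundamental frequency (arbitrary here)
target = np.sin(w * t) + 0.3 * np.sin(2 * w * t + 0.5) \
         + 0.05 * rng.standard_normal(t.size)

train = t < 20  # first half serves as the training interval
basis = np.column_stack([f(k * w * t)
                         for k in range(1, 6) for f in (np.sin, np.cos)])
coef, *_ = np.linalg.lstsq(basis[train], target[train], rcond=None)
model = basis @ coef  # evaluated over the full record, not just training
```

The extrapolated `model` beyond `t = 20` keeps roughly the same amplitude as inside the training window.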

The resulting correlation of model to data is 0.75 inside the training interval (1880-1980) and 0.69 in the test interval (1980-2016). This is close to the best agreement we can expect, given that the correlation between SOI and NINO34 themselves only reaches 0.79. Read this post for the structural model parameter variations for a reduced harmonic set, taken to order 3 only.
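The train/test comparison amounts to splitting the record at the cutoff year and correlating model against data separately on each side. A sketch, with `split_cc` as a hypothetical helper name:

```python
import numpy as np

def split_cc(model, data, t, t_split=1980.0):
    """Correlation of model vs. data inside and outside a training cutoff."""
    model, data, t = (np.asarray(a, dtype=float) for a in (model, data, t))

    def cc(a, b):
        a, b = a - a.mean(), b - b.mean()
        return float(a @ b / np.sqrt((a @ a) * (b @ b)))

    train = t < t_split
    return cc(model[train], data[train]), cc(model[~train], data[~train])
```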

Welcome to the stage of ENSO analysis where getting the remaining details correct provides only marginal benefits. Yet these details still matter: as with tidal analysis and eclipse models, they are essential for fine-tuning predictions.

**Footnote:**

- For conventional tidal analysis, hundreds of resulting terms are the norm, which is why commercial tidal prediction programs allow an unlimited number of components.

The example above is for a training fit from 1880 to 1980, and testing beyond that interval.

For a training interval from 1880 to 1950, these are the lunar components for a correlation coefficient of 0.85 (i.e., likely an over-fit):

Now if we take the orthogonal training interval from 1950 to 2016, this gives a CC of 0.80:

The relative amplitude and phases of the lunar tidal cycles are for all intents and purposes the same over the two non-overlapping intervals.

The hawkmoth parameters describing the differential equation are also nearly equivalent over the two intervals. The only departure is that the biennial Mathieu modulation is 0.2 in the 1880-1950 interval, while it is 0.38 in the 1950-2016 interval. Yet there is nothing fixed about that value, as it is a dynamic characteristic of what is a metastable behavior.

This is the spreadsheet look of the split cross-validation, animated to flip from one interval to another:

The training over-fits on each orthogonal interval, yet the parameters do not diverge much, so the most likely parameter values lie somewhere within the mean range of the two training runs.
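If one accepts that reading, a simple way to consolidate the two fits is to take the midpoint of each parameter. A sketch with `blend_parameters` as a hypothetical helper (the only value quoted in this post is the Mathieu modulation):

```python
def blend_parameters(fit_a, fit_b):
    """Midpoint of parameter values from two orthogonal training fits.

    Keys are term names and values are fitted coefficients; this is an
    illustrative helper, not part of the actual fitting code.
    """
    assert fit_a.keys() == fit_b.keys()
    return {k: 0.5 * (fit_a[k] + fit_b[k]) for k in fit_a}

# the biennial Mathieu modulation from the two intervals, midpoint 0.29:
blend_parameters({"mathieu": 0.20}, {"mathieu": 0.38})
```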

I've just read another paper that Brian Arbic co-authored, "Frequency content of sea surface height variability from internal gravity waves to mesoscale eddies", DOI: 10.1002/2016JC012331. It would be interesting to see their model results for Darwin and Tahiti.

HYCOM may not be that far away from being able to utilize the forcings you've identified.

Are they capturing more of the diurnal and semidiurnal tidal factors?

I looked at their spectrum, and the power at frequencies higher than 1 cycle per day appears to be mainly red noise.