I've been applying equal doses of machine learning (and knowledge based artificial intelligence in general) and physics in my climate research since day one. Next month on December 12, I will be presenting Knowledge-Based Environmental Context Modeling at the AGU meeting which will cover these topics within the earth sciences realm :

Table 1: Technical approach to knowledge-based model building for the earth sciences

In my opinion, machine learning likely will eventually find all the patterns that appear in climate time-series but with various degrees of human assistance.

"Vipin Kumar, a computer scientist at the University of Minnesota in Minneapolis, has used machine learning to create algorithms for monitoring forest fires and assessing deforestation. When his team tasked a computer with learning to identify air-pressure patterns called teleconnections, such as the El Niño weather pattern, the algorithm found a previously unrecognized example over the Tasman Sea."

In terms of the ENSO pattern, I believe that machine learning through tools such as Eureqa could have found the underlying lunisolar forcing pattern, but would have struggled mightily to break through the complexity barrier. In this case, the complexity barrier is in (1) discovering a biennial modulation which splits all the spectral components and (2) discovering the modifications to the lunar cycles from a strictly sinusoidal pattern.

The way that Eureqa would have found this pattern would be through it's *symbolic regression* algorithm (which falls under the first row in **Table 1** shown above). It essentially would start it's machine learning search by testing various combinations of sines and cosines and capturing the most highly correlated combinations for further expansion. As it expands the combinations, the algorithm would try to reduce complexity by applying trigonometric identities such as this

After a while, the algorithm will slow down under the weight of the combinatorial complexity of the search, and then the analyst would need to choose promising candidates from the *complexity versus best-fit* Pareto front. At this point one would need to apply knowledge of physical laws or mathematical heuristics which would lead to a potentially valid model.

So, in the case of the ENSO model, Eureqa *could have* discovered the (1) biennial modulation by reducing sets of trigonometric identities, and perhaps by applying a *sin(A sin())* frequency modulation (which it is capable of) to discover the (2) second-order modifications to the sinusoidal functions, or (3) it could have been fed a differential equation structure to provide a hint to a solution .... but, a human got there first by applying prior knowledge of signal processing and of the details in the orbital lunar cycles.

Yet as the Scientific America article suggests, that will likely not be the case in the future when the algorithms continue to improve and update their knowledge base with laws of physics.

This more sophisticated kind of reasoning involves the refined use of the other elements of **Table 1**. For example, a more elaborate algorithm could have lifted an entire abstraction level out of a symbolic grouping and thus reduced its complexity. Or it could try to determine whether a behavior was stochastic or deterministic. The next generation of these tools will be linked to knowledge-bases filled with physics patterns that are organized for searching and reasoning tasks. These will relate the problem under study to potential solutions automatically.