Big data is in fashion these days, especially for modelling ecosystems in ocean science. But how much data do you really need? And does adding more data, and building more complex models, actually give you better results? To find out, the authors used the case of kelp forests on the west coast of Vancouver Island, British Columbia. Their results show how, depending on the objective, simpler models can actually be better.
Historically in the study area, canopy kelp was subject to heavy grazing by sea urchins, which can eat a lot of kelp. Sea otters, a major predator of invertebrates like sea urchins, have since been re-introduced, reducing grazing pressure on the kelp. As the sea otter population grows, there is interest in knowing how much the kelp forests will recover. Kelp forests provide habitat and food for a wide range of species, making them a far more biodiverse and productive habitat than an area dominated by sea urchins.
Approaches to predicting the potential distribution of a species range from relatively simple models, like the habitat suitability index in use since the 1980s, to machine learning-based models that "learn" from patterns found in the data. Starting with a habitat suitability index model, the authors cranked up the complexity in a stepwise fashion so that the fit and forecasting skill of the increasingly complex models could be compared.
The authors found that as they increased model complexity, the models were better at explaining the patterns in the data used to build them, but not as good at predicting patterns from different years. Increasing model complexity, therefore, does not necessarily result in more accurate model predictions. So, managers: do not fret! Your data-limited situation may not be all that limited after all.
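The trade-off the authors observed is the classic overfitting pattern. As a purely hypothetical sketch (not the authors' models or data), fitting polynomials of increasing degree to one noisy "year" of observations and then scoring them against an independent second "year" shows the same effect: training fit keeps improving with complexity, while forecast skill does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: a noisy underlying habitat relationship,
# sampled in two different "years". Names and numbers are invented.
def sample_year(n=60):
    x = np.linspace(0.0, 1.0, n)
    y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y

x_fit, y_fit = sample_year()   # data used to build the models
x_new, y_new = sample_year()   # independent data from another "year"

errors = {}
for degree in (1, 3, 10):      # increasing model complexity
    coeffs = np.polyfit(x_fit, y_fit, degree)
    fit_mse = float(np.mean((np.polyval(coeffs, x_fit) - y_fit) ** 2))
    new_mse = float(np.mean((np.polyval(coeffs, x_new) - y_new) ** 2))
    errors[degree] = (fit_mse, new_mse)
    print(f"degree {degree:2d}: fit MSE {fit_mse:.3f}, forecast MSE {new_mse:.3f}")
```

The degree-10 model always fits its own data best, but its error on the independent year is larger than its training error, which is the gap between explaining and forecasting that the paper highlights.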
Source: Gregr, Edward, et al. (2018) Why less complexity produces better forecasts: An independent data evaluation of kelp habitat models. doi:10.1111/ecog.03470. Available in MarXiv at https://marxiv.org/u76hf.