Bringing the power of Deep Learning to the climate community via open datasets and architectures
The ClimateNet Project seeks to address a major open challenge in bringing the power of Deep Learning to the climate community: creating community-sourced, open-access, expert-labeled datasets and architectures for improved accuracy and performance on a range of supervised learning problems.
Pattern recognition tasks such as classification, localization, object detection and segmentation have remained challenging problems in the weather and climate sciences. While many heuristics and algorithms exist for detecting weather patterns or extreme events in a dataset, the disparities between the outputs of these different methods, even for a single class of event, are large and often impossible to reconcile. Given the pressing need to address this problem, we propose a Deep Learning-based solution.
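To make the notion of disparity between detection methods concrete, the following is a minimal sketch, not part of the ClimateNet tooling, that scores the agreement between two binary detection masks with intersection over union (IoU). The function name mask_iou and the two toy masks are hypothetical and stand in for the outputs of two different heuristics applied to the same field.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks.

    Each mask is a list of rows, and each row is a list of 0/1 values.
    """
    same_shape = len(mask_a) == len(mask_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    )
    if not same_shape:
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1

    # Convention assumed here: two empty masks agree perfectly.
    return 1.0 if union == 0 else intersection / union


# Hypothetical outputs of two heuristics on the same 4 x 6 grid.
heuristic_a = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
heuristic_b = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

agreement = mask_iou(heuristic_a, heuristic_b)
print(agreement)  # 4 shared cells out of 8 in the union -> 0.5
```

In this toy case the two methods flag regions of identical size, yet agree on only half of the combined area, which illustrates how two plausible detections of the same event can be hard to reconcile.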
The Vision and Proposed Approach
The first figure below captures our overall vision for a unified Deep Learning workflow, as relevant to climate science. The workflow can be split into two pieces: the Training phase and the Inference phase. The goal of the Training phase is to produce a single, unified Deep Network that is trained on examples drawn either from heuristics and algorithms applied to training datasets or from examples hand-labeled by human experts. In the Inference phase, the unified Deep Network is applied to archives of multi-resolution, multi-modal datasets from climate model output, reanalyses or observations.
A fundamental challenge in training Deep Networks is the availability of reliable, suitably labeled training data. We propose creating a "ClimateNet" dataset, with an accompanying schema that can capture information pertaining to class or pattern labels, bounding boxes and segmentation masks. Ideally, various domain experts will contribute hand-labeled information to the ClimateNet dataset. (The proposal was first presented by Prabhat, Karthik Kashinath et al. at AGU 2017, and recent progress will be presented at AGU 2018.) We have developed a web interface and modified the "Label-Me" tool developed at MIT to crowdsource the hand-labeling task. See here. Examples have also been created using the Toolkit for Extreme Climate Analysis (TECA) software, which implements expert-specified heuristics to generate label information. Note that the overall spirit of the Deep Learning methodology is to avoid prescribing heuristics for defining weather patterns; the computer vision community has established that Deep Learning is extremely effective at learning the relevant features for pattern classification tasks without requiring application-specific tuning. The second figure below shows a snapshot of the labeling tool on the web interface mentioned earlier.
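As an illustration of what a schema covering class labels, bounding boxes and segmentation masks could look like, here is a minimal sketch using Python dataclasses. It is not the actual ClimateNet schema: every class name, field name and example value below (BoundingBox, EventLabel, LabeledSnapshot, "tropical_cyclone", "example_model_run") is an assumption made purely for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class BoundingBox:
    # Inclusive grid-index extents (an assumed convention).
    row_min: int
    col_min: int
    row_max: int
    col_max: int


@dataclass
class EventLabel:
    event_class: str  # hypothetical class name, e.g. "tropical_cyclone"
    source: str       # "expert" for hand labels, "heuristic" for tool output
    bounding_box: Optional[BoundingBox] = None
    mask: Optional[List[List[int]]] = None  # binary segmentation mask


@dataclass
class LabeledSnapshot:
    dataset: str    # identifier of the model run, reanalysis or observation set
    timestamp: str  # ISO 8601 time of the labeled field
    labels: List[EventLabel] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses to plain dicts.
        return json.dumps(asdict(self), sort_keys=True)


snapshot = LabeledSnapshot(
    dataset="example_model_run",
    timestamp="2000-01-01T00:00:00Z",
)
snapshot.labels.append(
    EventLabel(
        event_class="tropical_cyclone",
        source="heuristic",
        bounding_box=BoundingBox(10, 20, 30, 45),
    )
)

record = json.loads(snapshot.to_json())
print(record["labels"][0]["bounding_box"])
```

Recording the source of each label alongside the label itself would allow heuristic-generated and expert-contributed examples to be stored together and filtered later, although how the real dataset handles provenance is not specified here.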
Once the ClimateNet dataset is available, we propose developing a unified convolutional architecture to learn representations for various weather patterns. We have already demonstrated that this task is feasible; see (1) Liu et al. (2016), Application of deep convolutional neural networks for detecting extreme weather in climate datasets, arXiv preprint arXiv:1605.01156; (2) Mudigonda et al. (2017), Segmenting and tracking extreme climate events using neural networks, Deep Learning for Physical Sciences, NIPS workshop; and (3) Racah et al. (2017), ExtremeWeather: A large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events, 31st Conference on Neural Information Processing Systems.
We will then apply the unified network to archives of multi-resolution, multi-modal datasets from climate model output, reanalyses and observations to seamlessly extract class labels, bounding boxes and segmentation masks.
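The three output types are related: given a segmentation mask, a bounding box follows directly. The sketch below shows that derivation under the simplifying assumption that a mask contains a single event; the function name mask_to_bounding_box and the toy mask are hypothetical, and a real pipeline would first need to separate distinct events (for example by connected components).

```python
def mask_to_bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) enclosing all nonzero cells.

    The extents are inclusive grid indices. Returns None for an empty mask.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, value in enumerate(row) if value]
    return (rows[0], min(cols), rows[-1], max(cols))


# Hypothetical segmentation mask for one event on a 4 x 5 grid.
event_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

box = mask_to_bounding_box(event_mask)
print(box)  # (1, 1, 2, 3)
```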