RFC 75: Multidimensional arrays
Author: |
Even Rouault |
Contact: |
even.rouault @ spatialys.com |
Started: |
2019-May-24 |
Last updated: |
2019-Jul-22 |
Status: |
Implemented in GDAL 3.1 |
Summary
This document describes the addition of read/write support for multidimensional arrays, in particular of dimension 3 or above, in GDAL core and a few select drivers.
Motivation
Multidimensional arrays (also known as hypercubes) are a way of mоdelling spatio-temporal (time series of 2D raster) or spatio-vertical-temporal (2D + Z dimension + time dimension) data which are becoming increasingly more available. GDAL current raster model is however strongly 2D oriented. A number of drivers, such as netCDF, HDF4, HDF5, work around that limitation by using raster bands or subdatasets to expose multiple 2D slices of what is intrinsically a N>2 Multidimensional dataset. It is desirable to have a proper API, and driver support, to be able to expose those multidimensional arrays as such, and be able to perform slice and trim operations on them.
That topic had already been discussed in the past, in particular in this mailing list thread
Proposed changes
A lot of existing GDAL raster API are strongly 2D oriented. Rather than upgrading all those API, and driver code, to be ready for N arbitrary dimensions, which would be a enormous effort for the benefit of only a small fraction of drivers, we propose to add a new dedicated API to support multidimensional arrays. We also want to support hierarchical structure of data as found in the HDF5 format and data model. This model can encompass the needs of other formats/drivers that have multidimensional capabilities such as HDF4, netCDF, GRIB, WCS. Therefore the proposed API will be strongly inspired by the API of the HDF5 library itself.
Data model
The data model is described in: https://github.com/rouault/gdal/blob/rfc75/gdal/doc/source/user/multidim_raster_data_model.rst
C++ API
New classes and methods will be added. See https://github.com/rouault/gdal/blob/rfc75/gdal/gcore/gdal_priv.h#L1715
A new driver capability will be added for drivers supporting multidimensional rasters:
#define GDAL_DCAP_MULTIDIM_RASTER "DCAP_MULTIDIM_RASTER"
A new open flag, GDAL_OF_MULTIDIM_RASTER
, for GDALOpenEx()
will be added. When this is specified, drivers supporting multidimensional
raster will return a root GDALGroup. Otherwise their current traditional 2D
mode will still be used.
New creation options metadata items are added to documents multidimensional dataset, group, dimension, array and attribute creation options.
/** XML snippet with multidimensional dataset creation options.
* @since GDAL 3.1
*/
#define GDAL_DMD_MULTIDIM_DATASET_CREATIONOPTIONLIST "DMD_MULTIDIM_DATASET_CREATIONOPTIONLIST"
/** XML snippet with multidimensional group creation options.
* @since GDAL 3.1
*/
#define GDAL_DMD_MULTIDIM_GROUP_CREATIONOPTIONLIST "DMD_MULTIDIM_GROUP_CREATIONOPTIONLIST"
/** XML snippet with multidimensional dimension creation options.
* @since GDAL 3.1
*/
#define GDAL_DMD_MULTIDIM_DIMENSION_CREATIONOPTIONLIST "DMD_MULTIDIM_DIMENSION_CREATIONOPTIONLIST"
/** XML snippet with multidimensional array creation options.
* @since GDAL 3.1
*/
#define GDAL_DMD_MULTIDIM_ARRAY_CREATIONOPTIONLIST "DMD_MULTIDIM_ARRAY_CREATIONOPTIONLIST"
/** XML snippet with multidimensional attribute creation options.
* @since GDAL 3.1
*/
#define GDAL_DMD_MULTIDIM_ATTRIBUTE_CREATIONOPTIONLIST "DMD_MULTIDIM_ATTRIBUTE_CREATIONOPTIONLIST"
Examples with the netCDF driver:
<MultiDimDatasetCreationOptionList>
<Option name="FORMAT" type="string-select" default="NC4">
<Value>NC</Value>
<Value>NC2</Value>
<Value>NC4</Value>
<Value>NC4C</Value>
</Option>
<Option name="CONVENTIONS" type="string" default="CF-1.6" description="Value of the Conventions attribute" />
</MultiDimDatasetCreationOptionList>
<MultiDimDimensionCreationOptionList>
<Option name="UNLIMITED" type="boolean" description="Whether the dimension should be unlimited" default="false" />
</MultiDimDimensionCreationOptionList>
<MultiDimArrayCreationOptionList>
<Option name="BLOCKSIZE" type="int" description="Block size in pixels" />
<Option name="COMPRESS" type="string-select" default="NONE">
<Value>NONE</Value>
<Value>DEFLATE</Value>
</Option>
<Option name="ZLEVEL" type="int" description="DEFLATE compression level 1-9" default="1" />
<Option name="NC_TYPE" type="string-select" default="netCDF data type">
<Value>AUTO</Value>
<Value>NC_BYTE</Value>
<Value>NC_INT64</Value>
<Value>NC_UINT64</Value>
</Option>
</MultiDimArrayCreationOptionList>
<MultiDimAttributeCreationOptionList>
<Option name="NC_TYPE" type="string-select" default="netCDF data type">
<Value>AUTO</Value>
<Value>NC_BYTE</Value>
<Value>NC_CHAR</Value>
<Value>NC_INT64</Value>
<Value>NC_UINT64</Value>
</Option>
</MultiDimAttributeCreationOptionList>
C API
All C++ methods are mapped to the C API. See https://github.com/rouault/gdal/blob/rfc75/gdal/gcore/gdal.h#L1397
Driver changes
The MEM driver will implement read and write support.
The VRT driver will allow extraction of 2D slices from multidimensional drivers to 2D/classic drivers, as well as multidimensional->multidimensional slicing/trimming
The netCDF driver will implement read and write support.
The HDF4 and HDF5 drivers will implement read support.
The GRIB driver will implement read support (exposing X,Y,Time arrays for GRIB messages only differing by timestamp)
New Utilities
A new gdalmdiminfo utility is added to report the hierarchical structure and content. Its output format is JSON. See https://github.com/rouault/gdal/blob/rfc75/gdal/doc/source/programs/gdalmdiminfo.rst for its documentation.
A new gdalmdimtranslate utility is added to convert multidimensional raster between different formats, and/or can perform selective conversion of specific arrays and groups, and/or subsetting operations. It can also do extraction of 2D slices from multidimensional drivers to 2D/classic drivers. See https://github.com/rouault/gdal/blob/rfc75/gdal/doc/source/programs/gdalmdimtranslate.rst for its documentation.
SWIG binding changes
The C API is mapped to the SWIG bindings. The scope is complete for the Python bindings. Other languages would need to add missing typemaps, but this is not in the scope of the work of this RFC. For Python bindings, NumPy integration is done.
Limitations
This is intended to be a preliminary work on that topic. While the aim is for it to be be usable for the defined scope, it will probably require future enhancements to fill functional and/or performance gaps.
No block cache mechanism (not sure this is needed)
No sub-pixel requests, or non-nearest subsampling
Upgrade of WCS driver or other drivers with potential multidimensional capabilities are not part of this RFC.
SWIG bindings: full scope only for Python bindings.
Backward compatibility
No backward incompatibility. Only API and utility additions.
Documentation
Data model: https://github.com/rouault/gdal/blob/rfc75/gdal/doc/source/user/multidim_raster_data_model.rst
API tutorial: https://github.com/rouault/gdal/blob/rfc75/gdal/doc/source/tutorials/multidimensional_api_tut.rst
gdalmdiminfo: https://github.com/rouault/gdal/blob/rfc75/gdal/doc/source/programs/gdalmdiminfo.rst
gdalmdimtranslate: https://github.com/rouault/gdal/blob/rfc75/gdal/doc/source/programs/gdalmdimtranslate.rst
VRT driver: https://github.com/rouault/gdal/blob/rfc75/gdal/doc/source/drivers/raster/vrt_multidimensional.rst
Testing
The gdalautotest suite is extended to test the modified drivers and the new utilities.
Implementation
The implementation will be done by Even Rouault. A preliminary implementation is available at https://github.com/OSGeo/gdal/pull/1704
Voting history
+1 from HowardB, NormanB and EvenR