RFC 57: 64-bit bucket counts for histograms
Author: Even Rouault
Contact: even dot rouault at spatialys dot com
Status: Adopted, implemented
Version: 2.0
Summary
This RFC modifies the GDALRasterBand GetHistogram(), GetDefaultHistogram() and SetDefaultHistogram() methods to accept arrays of 64-bit integers instead of the current arrays of 32-bit integers for bucket counts. It also changes GetRasterSampleOverview() to take a 64-bit integer. This fixes issues when operating on large rasters that have more than 2 billion pixels, where a single bucket count can exceed the range of a signed 32-bit integer.
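To make the size limit concrete, here is a minimal sketch that does not depend on GDAL. The GUIntBig typedef is a local stand-in for GDAL's unsigned 64-bit type, and PixelCount() and ExceedsInt32() are hypothetical helper names used only for illustration.

```cpp
#include <cstdint>
#include <limits>

// Local stand-in for GDAL's GUIntBig (unsigned 64-bit integer), declared here
// so the sketch compiles without GDAL headers.
typedef std::uint64_t GUIntBig;

// Hypothetical helper: total pixel count of a raster, computed in 64 bits
// so that the multiplication itself cannot overflow.
GUIntBig PixelCount(int nXSize, int nYSize)
{
    return static_cast<GUIntBig>(nXSize) * static_cast<GUIntBig>(nYSize);
}

// Hypothetical helper: true when a count cannot be stored in a signed
// 32-bit integer, i.e. when a 32-bit histogram bucket would overflow.
bool ExceedsInt32(GUIntBig nCount)
{
    return nCount >
           static_cast<GUIntBig>(std::numeric_limits<std::int32_t>::max());
}
```

For example, a 50000 x 50000 raster has 2,500,000,000 pixels. If they all fall into one bucket, that count no longer fits in a 32-bit int, whereas a 40000 x 40000 raster (1,600,000,000 pixels) still does.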
Core changes
The following methods of the GDALRasterBand class are modified: GetHistogram() and SetDefaultHistogram() now take a GUIntBig* argument, GetDefaultHistogram() takes a GUIntBig**, and GetRasterSampleOverview() takes a GUIntBig.
    virtual CPLErr GetHistogram( double dfMin, double dfMax,
                                 int nBuckets, GUIntBig * panHistogram,
                                 int bIncludeOutOfRange, int bApproxOK,
                                 GDALProgressFunc, void *pProgressData );

    virtual CPLErr GetDefaultHistogram( double *pdfMin, double *pdfMax,
                                        int *pnBuckets, GUIntBig ** ppanHistogram,
                                        int bForce,
                                        GDALProgressFunc, void *pProgressData);

    virtual CPLErr SetDefaultHistogram( double dfMin, double dfMax,
                                        int nBuckets, GUIntBig *panHistogram );

    virtual GDALRasterBand *GetRasterSampleOverview( GUIntBig );
PAM serialization/deserialization is also updated.
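The following is a simplified, GDAL-independent sketch of how a caller-supplied array of 64-bit buckets might be filled using parameters shaped like those of GetHistogram(). ComputeHistogram64() is a hypothetical name. The bucket-index formula and the folding of out-of-range values into the first and last bucket are assumptions made for illustration, not a copy of the actual implementation.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Local stand-in for GDAL's GUIntBig (unsigned 64-bit integer).
typedef std::uint64_t GUIntBig;

// Hypothetical helper: fills panHistogram (nBuckets entries) with counts of
// the values falling in [dfMin, dfMax). Assumes nBuckets > 0 and dfMax > dfMin.
// When bIncludeOutOfRange is true, values below dfMin are counted in the
// first bucket and values at or above dfMax in the last one; otherwise
// they are ignored.
void ComputeHistogram64(const std::vector<double>& adfValues,
                        double dfMin, double dfMax,
                        int nBuckets, GUIntBig* panHistogram,
                        bool bIncludeOutOfRange)
{
    for (int i = 0; i < nBuckets; ++i)
        panHistogram[i] = 0;

    const double dfScale = nBuckets / (dfMax - dfMin);
    for (double dfValue : adfValues)
    {
        // Compare in floating point first so the cast to int is always safe.
        const double dfIndex = std::floor((dfValue - dfMin) * dfScale);
        int nIndex;
        if (dfIndex < 0)
        {
            if (!bIncludeOutOfRange)
                continue;
            nIndex = 0;
        }
        else if (dfIndex >= nBuckets)
        {
            if (!bIncludeOutOfRange)
                continue;
            nIndex = nBuckets - 1;
        }
        else
        {
            nIndex = static_cast<int>(dfIndex);
        }
        ++panHistogram[nIndex];  // 64-bit counter: no wrap at 2^31 - 1
    }
}
```

Because each bucket is a GUIntBig, the increment cannot wrap at 2^31 - 1 as it would with an int array.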
C API changes
Only additions:
    CPLErr CPL_DLL CPL_STDCALL GDALGetRasterHistogramEx( GDALRasterBandH hBand,
                                           double dfMin, double dfMax,
                                           int nBuckets, GUIntBig *panHistogram,
                                           int bIncludeOutOfRange, int bApproxOK,
                                           GDALProgressFunc pfnProgress,
                                           void * pProgressData );

    CPLErr CPL_DLL CPL_STDCALL GDALGetDefaultHistogramEx( GDALRasterBandH hBand,
                                           double *pdfMin, double *pdfMax,
                                           int *pnBuckets, GUIntBig **ppanHistogram,
                                           int bForce,
                                           GDALProgressFunc pfnProgress,
                                           void * pProgressData );

    CPLErr CPL_DLL CPL_STDCALL GDALSetDefaultHistogramEx( GDALRasterBandH hBand,
                                           double dfMin, double dfMax,
                                           int nBuckets, GUIntBig *panHistogram );

    GDALRasterBandH CPL_DLL CPL_STDCALL
                               GDALGetRasterSampleOverviewEx( GDALRasterBandH, GUIntBig );
The existing functions GDALGetRasterHistogram(), GDALGetDefaultHistogram() and GDALSetDefaultHistogram() are marked deprecated. They internally call the 64-bit functions. In addition, GDALGetRasterHistogram() and GDALGetDefaultHistogram() warn if a 32-bit overflow would occur, in which case the affected bucket count is set to INT_MAX.
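The clamping behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the actual GDAL code: ClampHistogramTo32Bit() is a hypothetical helper, GUIntBig is a local stand-in typedef, and the warning is represented only by the returned count of clamped buckets.

```cpp
#include <climits>
#include <cstdint>

// Local stand-in for GDAL's GUIntBig (unsigned 64-bit integer).
typedef std::uint64_t GUIntBig;

// Hypothetical helper: copies nBuckets 64-bit counts from panSrc into the
// 32-bit array panDst. Any count larger than INT_MAX is stored as INT_MAX.
// Returns the number of buckets that had to be clamped, so that a caller
// could emit a warning when the result is non-zero.
int ClampHistogramTo32Bit(const GUIntBig* panSrc, int nBuckets, int* panDst)
{
    int nClamped = 0;
    for (int i = 0; i < nBuckets; ++i)
    {
        if (panSrc[i] > static_cast<GUIntBig>(INT_MAX))
        {
            panDst[i] = INT_MAX;  // saturate instead of wrapping around
            ++nClamped;
        }
        else
        {
            panDst[i] = static_cast<int>(panSrc[i]);
        }
    }
    return nClamped;
}
```

Saturating at INT_MAX keeps the 32-bit result monotonic with respect to the true count, whereas a plain cast would wrap and could yield a small or negative value.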
Changes in drivers
All in-tree drivers that use/implement the C++ histogram methods are modified: ECW, VRT, MEM and HFA.
Changes in utilities
gdalinfo and gdalenhance are modified to use the modified methods.
Changes in SWIG bindings
For Python bindings only, RasterBand.GetHistogram(), GetDefaultHistogram() and SetDefaultHistogram() use the new 64-bit C functions.
Other bindings could be updated, but would likely need new typemaps for (int, GUIntBig*). In the meantime, they still use the 32-bit C functions.
Compatibility
This modifies the C++ API and ABI.
Out-of-tree drivers must make sure to take into account the updated C++ API if they implement some of the 4 modified virtual methods.
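As an illustration of what an out-of-tree driver has to adapt, the sketch below uses minimal stand-in classes rather than the real GDAL headers. BandBase, MyDriverBand and the int return codes (0 for success, 1 for failure) are hypothetical. The point is that the overriding method must use GUIntBig* to match the new virtual signature, and that the C++11 override keyword turns a stale int* signature into a compile error instead of a silently unused method.

```cpp
#include <cstdint>
#include <vector>

// Local stand-in for GDAL's GUIntBig (unsigned 64-bit integer).
typedef std::uint64_t GUIntBig;

// Hypothetical stand-in for the relevant part of GDALRasterBand.
class BandBase
{
public:
    virtual ~BandBase() {}

    // Default implementation reports failure (1), as a base class might do
    // when a driver does not support storing a default histogram.
    virtual int SetDefaultHistogram(double, double, int, GUIntBig*)
    {
        return 1;
    }
};

// Hypothetical out-of-tree driver band updated for the 64-bit signature.
class MyDriverBand : public BandBase
{
public:
    double dfStoredMin = 0.0;
    double dfStoredMax = 0.0;
    std::vector<GUIntBig> anStoredCounts;

    // 'override' makes the compiler reject this method if its parameter
    // types no longer match the base class (for example int* vs GUIntBig*).
    int SetDefaultHistogram(double dfMin, double dfMax,
                            int nBuckets, GUIntBig* panHistogram) override
    {
        dfStoredMin = dfMin;
        dfStoredMax = dfMax;
        anStoredCounts.assign(panHistogram, panHistogram + nBuckets);
        return 0;  // success
    }
};
```

Calling SetDefaultHistogram() through a BandBase pointer then dispatches to the driver's implementation, which would not happen if the driver still declared the method with an int* parameter.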
Documentation
All new/modified methods/functions are documented. MIGRATION_GUIDE.TXT is updated with a new section for this RFC.
Testing
Setting/getting 64-bit values is tested in gcore/pam.py and gdrivers/mem.py.
Implementation
Implementation will be done by Even Rouault (Spatialys).
The proposed implementation lies in the "histogram_64bit_count" branch: https://github.com/rouault/gdal2/tree/histogram_64bit_count.
The list of changes: https://github.com/rouault/gdal2/compare/histogram_64bit_count
Voting history
+1 from DanielM, JukkaR and EvenR