RFC 94: Numeric fields width/precision metadata

Author:	Alessandro Pasotti
Contact:	elpaso @ itopen.it
Started:	2023-Feb-17
Status:	Adopted, implemented
Target:	GDAL 3.7

Summary

The document proposes and describes the introduction of a couple of new vector driver metadata that return information for numeric real fields about how the precision and width have been calculated by the driver and have to be interpreted.

Motivation

Applications (for example OGR/GDAL utils and QGIS) may require information about the field width and precision in order to convert between different formats or to validate user input on editing.

Different drivers may calculate the width and precision differently by including or not the decimal separator and/or the minus sign into the calculated length.

There is currently no way for an application to access this information and this can lead to loss of width or precision while converting between formats or when the application decides to stay on the safe side and reduce the width reported by GDAL.

Additionally, attention must be paid to the meaning of "width" and "precision": OGR "width" corresponds to SQL "precision" and OGR "precision" corresponds to SQL "scale".

For reference:

Current drivers behavior

Here is a list of the drivers and how they behave with respect to the width and precision, (for databases we refer to the NUMERIC data type).

Driver	Width Includes Decimal Separator	Width Includes Sign
Shapefile	YES	YES
MapInfo	YES	YES
PostgreSQL	NO	NO
MySQL	NO	NO
MSSQL	NO	NO
OCI	NO	NO
GPKG	N/A	N/A
CSV (from .csvt)	YES	YES
HANA	NO	NO
FlatGeoBuf	NO	NO
FileGDB	N/A	N/A
GML	NO	NO
MEM	N/A	N/A

Notes about specific drivers

GPKG: SQLite column affinity storage is 8-byte IEEE floating point number
GML: xsd:decimal with totalDigits and fractionDigits, xs:totalDigits defines the maximum number of digits of decimal and derived datatypes (both after and before the decimal point, not counting the decimal point itself). xs:fractionDigits` defines the maximum number of fractional digits (i.e., digits that are after the decimal point) of an xs:decimal datatype.
FlatGeoBuf: for Float fields, OGR_width = flatgeobuf_precision and OGR_precision = flatgeobuf_scale (if flatgeobuf_scale != -1, or 0 if flatgeobuf_scale == -1)
FileGDB: Scale is the number of digits to the right of the decimal point in a number. For example, the number 56.78 has a scale of 2. Scale applies only to fields that are double. Scale is always returned as 0 from personal or File geodatabase fields. Precision is the number of digits in a number. For example, the number 56.78 has a precision of 4. Precision is only valid for fields that are numeric. Precision is always returned as 0 from personal or File geodatabase fields

Technical details

The change will be done at the driver metadata level with a couple of flags that report the capabilities.

The vector drivers will expose a metadata entry to define if the width of the fields includes the decimal separator and/or the sign.

If the metadata entry is undefined the feature is not supported (there is no client-accessible width or precision constraint for numeric fields).

The feature will be exposed to the ogr2ogr application with a switch to enable/disable the width reduction when converting from a format where width includes the minus sign to a format where it doesn't. This is useful when the user knows that the input data are all negative values because in that that case it won't be necessary to add the extra width to store it.

To clarify, when converting from DBF to SQL the width is decreased by 1, when converting from SQL to DBF the width is increased by 2.

"-1.23" for .DBF needs width=5 and precision=2. In SQL, it needs precision=3 and scale=2 "12.34" for DBF needs width=5 and precision=2. In SQL, it needs precision=4 and scale=2

This means that when converting from DBF to SQL we can safely reduce the width by 1 but we cannot safely reduce it by 2 unless we are sure all values are negative.

Example API:

/** Capability set by a vector driver that supports field width and precision.
*
* This capability reflects that a vector driver includes the decimal separator
* in the field width.
*
* See GDAL_DMD_NUMERIC_FIELD_WIDTH_INCLUDES_SIGN for a related capability flag.
* @since GDAL 3.7
*/
#define GDAL_DMD_NUMERIC_FIELD_WIDTH_INCLUDES_DECIMAL_SEPARATOR "DMD_NUMERIC_FIELD_WIDTH_INCLUDES_DECIMAL_SEPARATOR"

/** Capability set by a vector driver that supports field width and precision.
*
* This capability reflects that a vector driver includes the (minus) sign
* in the field width.
*
* See GDAL_DMD_NUMERIC_FIELD_WIDTH_INCLUDES__DECIMAL_SEPARATOR for a related capability flag.
* @since GDAL 3.7
*/
#define GDAL_DMD_NUMERIC_FIELD_WIDTH_INCLUDES_SIGN "DMD_NUMERIC_FIELD_WIDTH_INCLUDES_SIGN"

Efficiency considerations

None.

Backward compatibility

None.

SWIG Bindings

This implementation will be exposed to bindings.

Testing

A C++ test will be added to the test suite.

Voting history

+1 from PSC members EvenR and JukkaR