RFC 41 : Support for multiple geometry fields in OGR

Summary

Add read/write support in the OGR data model for features with multiple geometry fields.

Motivation

The OGR data model is currently tied to a single geometry field per feature, feature definition and layer. But a number of data formats support multiple geometry fields. The OGC Simple Feature Specifications also do not limit to one geometry field per layer (e.g. §7.1.4 of OGC 06-104r4 "OpenGIS® Implementation Standard for Geographic information - Simple feature access -Part 2: SQL option).

There are workarounds : using geometries of type GEOMETRYCOLLECTION, or advertizing as many layers as there are geometry columns in the layer (like currently done in the PostGIS or SQLite drivers). All those approach are at best workarounds that suffer from limitations :

GEOMETRYCOLLECTION approach : no way to know the name/semantics of each sub-geometry. All sub-geometries must be expressed in the same SRS. No way of guaranteeing that the GEOMETRYCOLLECTION has always the same number of sub-geometries or that there are of a consistent geometry type.
one layer per geometry column approach : only appropriate for read-only scenarios. Cannot work in write scenarios.

The purpose of this RFC is to make support for multiple geometry fields per feature to be properly taken into account in the OGR data model.

Proposed solution

(Note: alternative solutions have also been studied. They are explained in a following section of this RFC.)

To sum it up, geometry fields will be treated similarly as attribute fields are handled at the OGRFeatureDefn and OGRFeature levels, but they will be kept separate. Attribute fields and geometry fields will have their own separate indexing in the feature definition.

This choice has been mainly made to maximize backward compatibility, while offering new capabilities.

Its involves creating a OGRGeomFieldDefn class, and changes in OGRFieldDefn, OGRFeatureDefn, OGRFeature and OGRLayer classes.

OGRGeomFieldDefn class

The OGRGeomFieldDefn is a new class. Its structure is directly inspired from the OGRFieldDefn class.

class CPL_DLL OGRGeomFieldDefn
{
protected:
        char                *pszName;
        OGRwkbGeometryType   eGeomType; /* all values possible except wkbNone */
        OGRSpatialReference* poSRS;

        int                 bIgnore;

public:
                            OGRGeomFieldDefn(char *pszName,
                                             OGRwkbGeometryType eGeomType);
        virtual            ~OGRGeomFieldDefn();

        void                SetName( const char * );
        const char         *GetNameRef();

        OGRwkbGeometryType  GetType();
        void                SetType( OGRwkbGeometryType eTypeIn );

        virtual OGRSpatialReference* GetSpatialRef();
        void                 SetSpatialRef(OGRSpatialReference* poSRS);

        int                 IsIgnored();
        void                SetIgnored( int bIgnoreIn );
};

One can notice that the member variables were to be found at OGRLayer level previously.

The SRS object is ref-counted. The reference count is increased in the constructor and in SetSpatialRef(), and decreased in the destructor.

GetSpatialRef() is deliberately set virtual, so that lazy evaluation can be implemented (getting SRS can have a noticeable cost in some driver implementations, like reading an extra file, or issuing a SQL request).

OGRFeatureDefn class

The OGRFeatureDefn class will be extended as the following :

class CPL_DLL OGRFeatureDefn
{
  protected:
        // Remove OGRwkbGeometryType eGeomType and bIgnoreGeometry and
        // add instead the following :

        int nGeomFieldCount;
        OGRGeomFieldDefn* papoGeomFieldDefn;
  public:
        virtual int         GetGeomFieldCount();
        virtual OGRGeomFieldDefn *GetGeomFieldDefn( int i );
        virtual int         GetGeomFieldIndex( const char * );

        virtual void        AddGeomFieldDefn( OGRGeomFieldDefn * );
        virtual OGRErr      DeleteGeomFieldDefn( int iGeomField );

        // Route OGRwkbGeometryType GetGeomType() and void SetGeomType()
        // on the first geometry field definition.

        // Same for IsGeometryIgnored() and SetGeometryIgnored()
}

At instantiation, OGRFeatureDefn would create a default geometry field definition of name "" and type wkbUnknown. If SetGeomType() is called, this will be routed on papoGeomFieldDefn[0]. If only one geometry field definition exists, SetGeomType(wkbNone) will remove it.

GetGeomType() will be routed on papoGeomFieldDefn[0] if it exists. Otherwise it will return wkbNone.

It is strongly advised that there is name uniqueness among the combined set of regular field names and the geometry field names. Failing to do so will result in unspecified behavior in SQL queries. This advice will not be checked by the code (it is currently not done for regular fields).

Another change is to make all the existing methods of OGRFeatureDefn virtual (and change private visibility to protected), so this class can be subclassed if needed. This will enable lazy creation of the object. Justification: establishing the full feature definition can be costly. But applications may want to list all the layers of a datasource, and only present some information that is important, but cheap to establish. In the past, OGRLayer::GetName() and OGRLayer::GetGeomType() have been introduced in order to workaround for that.

Note also that ReorderGeomFieldDefns() is not foreseen for the moment. It could be added in a later step, should the need arises. DeleteGeomFieldDefn() is mostly there for the own benefit of OGRFeatureDefn itself when calling SetGeomType(wkbNone).

OGRFeature class

The OGRFeature class will be extended as following :

class CPL_DLL OGRFeature
{
  private:
        // Remove poGeometry field and add instead
        OGRGeometry** papoGeometries; /* size is given by poFDefn->GetGeomFieldCount() */

  public:

        int                 GetGeomFieldCount();
        OGRGeomFieldDefn   *GetGeomFieldDefnRef( int iField );
        int                 GetGeomFieldIndex( const char * pszName);

        OGRGeometry*        GetGeomFieldRef(int iField);
        OGRErr              SetGeomFieldDirectly( int iField, OGRGeometry * );
        OGRErr              SetGeomField( int iField, OGRGeometry * );

        // Route SetGeometryDirectly(), SetGeometry(), GetGeometryRef(),
        // StealGeometry() on the first geometry field in the array

        // Modify implementation of SetFrom() to replicate all geometries
}

Note: before RFC41, SetGeometry() or SetGeometryDirectly() could work on a feature whose feature definition had a GetGeomType() == wkbNone (which was inconsistent). This will be no longer the case since the size of the papoGeometries array is now based on GetGeomFieldCount(), and when GetGeomType() == wkbNone, the geometry field count is 0. The VRT and CSV drivers will be fixed to declare their geometry type consistently.

OGRLayer class

Impact on OGRLayer class :

Spatial filter: the option considered is to only allow one spatial filter at the time.
- the need for spatial filters applied simultaneously on several geometry fields is not obvious.
- the m_poFilterGeom protected member is used more than 250 times in the OGR code base, so turning it into an array would be a tedious task...
Additions:

protected:
    int m_iGeomFieldFilter // specify the index on which the spatial
                           // filter is active.

public:
    virtual void        SetSpatialFilter( int iGeomField, OGRGeometry * );
    virtual void        SetSpatialFilterRect( int iGeomField,
                                            double dfMinX, double dfMinY,
                                            double dfMaxX, double dfMaxY );

GetNextFeature() implementation must check the m_iGeomFieldFilter index
in order to select the appropriate geometry field.

GetGeomType() : unchanged. For other fields, use GetLayerDefn()->GetGeomField(i)->GetType()
GetSpatialRef(): Currently the default implementation returns NULL. It will be changed to return GetLayerDefn()->GetGeomField(0)->GetSpatialRef() (if there is at least one geometry field). New drivers are encouraged not to specialize GetSpatialRef() anymore, but to appropriately set the SRS of their first geometry field. For other fields, use GetLayerDefn()->GetGeomField(i)->GetSpatialRef().

Caveat: as SRS wasn't previously stored at the OGRFeatureDefn level, all existing drivers, if not updated, will have GetGeomField(0)->GetSpatialRef() returning NULL. The test_ogrsf utility will check and warn about this. Update of existing drivers will be made progressively. In the mean time, using OGRLayer::GetSpatialRef() will be advized to get the SRS of the first geometry field in a reliable way.
add :

virtual OGRErr GetExtent(int iGeomField, OGREnvelope *psExtent,
                         int bForce = TRUE);

Default implementation would call GetExtent() if iGeomField == 0

add :

virtual OGRErr CreateGeomField(OGRGeomFieldDefn *poField);

no DeleteGeomField(), ReorderGeomFields() or AlterGeomFieldDefn() for now. Could be added later if the need arises.
GetGeometryColumn() : unchanged. Routed onto the first geometry field. For other fields, use GetLayerDefn()->GetGeomField(i)->GetNameRef()
SetIgnoredFields() : iterate over the geometry fields in addition to regular fields. The special "OGR_GEOMETRY" value will only apply to the first geometry field.
Intersection(), Union(), etc... : unchanged. Later improvements could use the papszOptions parameter to specify an alternate geometry field
TestCapability(): add a OLCCreateGeomField capability to inform if CreateGeomField() is implemented.

OGRDataSource class

Impact on OGRDataSource class :

CreateLayer() : signature will be unchanged. If more than one geometry fields are needed, OGRLayer::CreateGeomField() must be used. If the name of the first geometry field must be specified, for datasources supporting ODsCCreateGeomFieldAfterCreateLayer, using code should call CreateLayer() with eGType = wkbNone and then add all geometry fields with OGRLayer::CreateGeomField().
CopyLayer() : adapted to replicate all geometry fields (if supported by target layer)
ExecuteSQL() : takes a spatial filter. In the case of the generic OGR SQL implementation, this filter is a facility. It could also as well be applied on the returned layer object. So there is no real need for adding a way of specifying the geometry field at the ExecuteSQL() API level.
TestCapability(): add a ODsCCreateGeomFieldAfterCreateLayer capability to inform if CreateGeomField() is implemented after layer creation and that CreateLayer() can be safely called with eGType = wkbNone.

Explored alternative solutions

( This paragraph can be skipped if you are totally convinced by the proposed approach detailed above :-) )

A possible alternative solution would have been to extend the existing OGRFieldDefn object with information related to the geometry. That would have involved adding a OFTGeometry value in the OGRFieldType enumeration, and adding the OGRwkbGeometryType eGeomType and OGRSpatialReference* poSRS members to OGRFieldDefn. At OGRFeature class level, the OGRField union could have been extended with a OGRGeometry* field. Similarly at OGRLayer level, CreateField() could have been used to create new geometry fields.

The main drawback of this approach, which seems the most natural way, is backward compatibility. This would have affected all places in OGR own code or external code where fields are retrieved and geometry is not expected. For example, in code like the following (very common in the CreateFeature() of most drivers, or in user code consuming features returned by GetNextFeature()) :

switch( poFieldDefn->GetType() )
{
        case OFTInteger: something1(poField->GetFieldAsInteger()); break;
        case OFTReal: something2(poField->GetFieldAsDouble()): break;
        default: something3(poField->GetFieldAsString()); break;
}

This would lead, for legacy code, to geometry being handled as regular field. We could imagine that GetFieldAsString() converts the geometry as WKT, but it is doubtfull that this would really be desired. Fundamentally, the handling of attribute and geometry fields is different in most use cases.

(On the other side, if we introduce 64bit integer as a OGR type (this is an RFC that is waiting for implementation...), the above code would still produce a meaningful result. The string representation of a 64bit integer is not that bad as a default behavior.)

GetFieldCount() would also take into account geometry fields, but in most cases, you would need to subtract them.

A possible way of avoiding the above compatibility issue would be to have 2 sets of API at OGRFeatureDefn and OGRFeature level. The current one, that would ignore the geometry fields, and an "extended" one that would take them into account. For example, OGRFeatureDefn::GetFieldCountEx(), OGRFeatureDefn::GetFieldIndexEx(), OGRFeatureDefn::GetFieldDefnEx(), OGRFeature::GetFieldEx(), OGRFeature::SetFieldAsXXXEx() would take into account both attribute and geometry fields. The annoying thing with that approach is the duplication of the ~ 20 methods GetField() and SetFieldXXX() in OGRFeature.

C API

The following functions are added to the C API :

/* OGRGeomFieldDefnH */

typedef struct OGRGeomFieldDefnHS *OGRGeomFieldDefnH;

OGRGeomFieldDefnH    CPL_DLL OGR_GFld_Create( const char *, OGRwkbGeometryType ) CPL_WARN_UNUSED_RESULT;
void                 CPL_DLL OGR_GFld_Destroy( OGRGeomFieldDefnH );

void                 CPL_DLL OGR_GFld_SetName( OGRGeomFieldDefnH, const char * );
const char           CPL_DLL *OGR_GFld_GetNameRef( OGRGeomFieldDefnH );

OGRwkbGeometryType   CPL_DLL OGR_GFld_GetType( OGRGeomFieldDefnH );
void                 CPL_DLL OGR_GFld_SetType( OGRGeomFieldDefnH, OGRwkbGeometryType );

OGRSpatialReferenceH CPL_DLL OGR_GFld_GetSpatialRef( OGRGeomFieldDefnH );
void                 CPL_DLL OGR_GFld_SetSpatialRef( OGRGeomFieldDefnH,
                                                     OGRSpatialReferenceH hSRS );

int                  CPL_DLL OGR_GFld_IsIgnored( OGRGeomFieldDefnH hDefn );
void                 CPL_DLL OGR_GFld_SetIgnored( OGRGeomFieldDefnH hDefn, int );

/* OGRFeatureDefnH */

int               CPL_DLL OGR_FD_GetGeomFieldCount( OGRFeatureDefnH hFDefn );
OGRGeomFieldDefnH CPL_DLL OGR_FD_GetGeomFieldDefn( OGRFeatureDefnH hFDefn, int i );
int               CPL_DLL OGR_FD_GetGeomFieldIndex( OGRFeatureDefnH hFDefn, const char * );

void              CPL_DLL OGR_FD_AddGeomFieldDefn( OGRFeatureDefnH hFDefn, OGRGeomFieldDefnH );
OGRErr            CPL_DLL OGR_FD_DeleteGeomFieldDefn( OGRFeatureDefnH hFDefn, int iGeomField );

/* OGRFeatureH */

int               CPL_DLL OGR_F_GetGeomFieldCount( OGRFeatureH hFeat );
OGRGeomFieldDefnH CPL_DLL OGR_F_GetGeomFieldDefnRef( OGRFeatureH hFeat, int iField );
int               CPL_DLL OGR_F_GetGeomFieldIndex( OGRFeatureH hFeat, const char * pszName);

OGRGeometryH      CPL_DLL OGR_F_GetGeomFieldRef( OGRFeatureH hFeat, int iField );
OGRErr            CPL_DLL OGR_F_SetGeomFieldDirectly( OGRFeatureH hFeat, int iField, OGRGeometryH );
OGRErr            CPL_DLL OGR_F_SetGeomField( OGRFeatureH hFeat, int iField, OGRGeometryH );

/* OGRLayerH */

void     CPL_DLL OGR_L_SetSpatialFilterEx( OGRLayerH, int iGeomField, OGRGeometryH );
void     CPL_DLL OGR_L_SetSpatialFilterRectEx( OGRLayerH, int iGeomField,
                                               double dfMinX, double dfMinY,
                                               double dfMaxX, double dfMaxY );
OGRErr   CPL_DLL OGR_L_GetExtentEx( OGRLayerH, int iGeomField,
                                    OGREnvelope *psExtent, int bForce );
OGRErr   CPL_DLL OGR_L_CreateGeomField( OGRLayerH, OGRGeomFieldDefnH hFieldDefn );

OGR SQL engine

Currently, "SELECT fieldname1[, ...fieldnameN] FROM layername" returns the specified fields, as well as the associated geometry. This behavior is clearly not following the behavior of spatial RDBMS where the geometry field must be explicitly specified.

The following compromise between backward compatibility and the new capabilities of this RFC is adopted :

if no geometry field is explicitly specified in the SELECT clause, and there is only one geometry fields associated with the layer, then return it implicitly
otherwise, only return the explicitly mentioned geometry fields (or all geometry fields if "*" is used).

Limitations

Geometries from joined layers will not be fetched, as currently.
UNION ALL will only handle the default geometry, as currently. (could be extended in later work.)
The special fields OGR_GEOMETRY, OGR_GEOM_WKT and OGR_GEOM_AREA will operate on the first geometry field. It does not seem wise to extend this ad-hoc syntax. A better alternative will be the OGR SQLite dialect (with Spatialite support), once it is updated to support multi-geometry (not in the scope of this RFC)

Drivers

Updated drivers in the context of this RFC

PostGIS:
- a ad-hoc form of support already exists. Tables with multiple geometries are reported currently as layers called "table_name(geometry_col_name)" (as many layers as geometry columns). This behavior will be changed so that the table is reported only once as a OGR layer.
PGDump:
- add write support for multi-geometry tables.
Memory:
- updated as a simple illustration of the new capabilities.
Interlis:
- updated to support multiple geometry fields (as well as other changes unrelated to this RFC)

Other candidate drivers (upgrade not originally covered by this RFC)

GML driver : currently, only one geometry per feature reported. Possibility of changing this by hand-editing of the .gfs file --> implemented post RFC in GDAL 1.11
SQLite driver :
- currently, same behavior as current PostGIS driver.
- both the driver and the SQLite dialect could be updated to support multi-geometry layers. --> implemented post RFC in GDAL 2.0
Google Fusion Tables driver : currently, only the first found geometry column used. Possibility of specifying "table_name(geometry_column_name)" as the layer name passed to GetLayerByName().
VRT : some thoughts needed to find the syntax to support multiple geometries. Impacted XML syntax : . at OGRVRTLayer element level : GeometryType, LayerSRS, GeomField, SrcRegion, ExtentXMin/YMin/XMax/YMax, . at OGRVRTWarpedLayer element level : add new element to select the geometry field . at OGRVRTUnionLayer element level : GeometryType, LayerSRS, ExtentXMin/YMin/XMax/YMax --> implemented post RFC in GDAL 1.11
CSV : currently, take geometries from column named "WKT". To be extended to support multiple geometry columns. Not sure worth the effort. Could be done with the extended VRT driver. --> implemented post RFC in GDAL 1.11
WFS : currently, only single-geometry layers supported. The standard allows multi-geometry. Would require GML driver support first.
Other RDBMS based drivers: MySQL ?, MSSQLSpatial ? Oracle Spatial ?

Utilities

ogrinfo

ogrinfo will be updated to report information related to multi-geometry support. Output is expected to be unchanged w.r.t current output in the case of single-geometry datasource.

Expected output for multi-geometry datasource:

$ ogrinfo PG:dbname=mydb
INFO: Open of `PG:dbname=mydb'
      using driver `PostgreSQL' successful.
1: test_multi_geom (Polygon, Point)

$ ogrinfo PG:dbname=mydb -al
INFO: Open of `PG:dbname=mydb'
      using driver `PostgreSQL' successful.

Layer name: test_multi_geom
Geometry (polygon_geometry): Polygon
Geometry (centroid_geometry): Point
Feature Count: 10
Extent (polygon_geometry): (400000,4500000) - (500000, 5000000)
Extent (centroid_geometry): (2,48) - (3,49)
Layer SRS WKT (polygon_geometry):
PROJCS["WGS 84 / UTM zone 31N",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",3],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AXIS["Easting",EAST],
    AXIS["Northing",NORTH],
    AUTHORITY["EPSG","32631"]]
Layer SRS WKT (centroid_geometry):
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4326"]]
FID Column = ogc_fid
Geometry Column 1 = polygon_geometry
Geometry Column 2 = centroid_geometry
area: Real
OGRFeature(test_multi_geom):1
  area (Real) = 500
  polygon_geometry = POLYGON ((400000 4500000,400000 5000000,500000 5000000,500000 4500000,400000 4500000))
  centroid_geometry = POINT(2.5 48.5)

A "-geomfield" option will be added to specify on which field the -spat option applies.

ogr2ogr

Enhancements :

will translate multi-geometry layers into multi-geometry layers if supported by output layer (OLCCreateGeomField capability). In case it is not supported, only translates the first geometry.
"-select" option. If only attribute field names are specified, all input geometries will be implicitly selected (backward compatible behavior). If one or several geometry field names are specified, only those ones will be selected.
add a "-geomfield" option to specify on which field the -spat option applies
the various geometry transformations (reprojection, clipping, etc.) will be applied on all geometry fields.

test_ogrsf

Will be enhanced with a few consistency checks :

OGRLayer::GetSpatialRef() == OGRFeatureDefn::GetGeomField(0)->GetSpatialRef()
OGRLayer::GetGeomType() == OGRFeatureDefn::GetGeomField(0)->GetGeomType()
OGRLayer::GetGeometryColumn() == OGRFeatureDefn::GetGeomField(0)->GetNameRef()

Spatial filtering tests will loop over all geometry fields.

Documentation

In addition to function level documentation, the new capability will be documented in the ベクターデーターモデル and ベクターAPIチュートリアル documents.

Python and other language bindings

The new C API will be mapped to SWIG bindings. It will be only tested with the Python bindings. No new typemaps are expected, so this should work with other languages in a straightforward way.

Compatibility

Changes are only additions to the existing API, and existing behavior should be preserved, so this will be backwards compatible.
C++ ABI changes
Change of behavior in PostGIS driver w.r.t GDAL 1.10 for tables with multiple geometries.

Implementation

Even Rouault will implement the above described changes for GDAL 1.11 release, except the upgrade of the Interlis driver that will be done by Pirmin Kalberer.

Funding

This work is funded by the Federal Office of Topography (swisstopo), COGIS

Voting history

+1 from EvenR, FrankW, HowardB, DanielM and TamasS