RFC 102: Embedding resource files into libgdal
Author: Even Rouault
Contact: even.rouault @ spatialys.com
Started: 2024-Oct-01
Status: Adopted, implemented
Target: GDAL 3.11
Summary
This RFC uses the C23 #embed pre-processor directive, when available, to optionally embed GDAL resource files directly into libgdal.
A similar RFC, PROJ RFC-8, has been submitted for PROJ to embed its proj.db and proj.ini files.
Motivation
Some parts of GDAL core, mostly drivers, require external resource files located in the filesystem. Locating these resource files is difficult for use cases where the GDAL binaries are relocated at installation time.
One such case is the GDAL embedded in Rasterio or Fiona binary wheels, where GDAL_DATA must be set to the directory of the resource files.
WebAssembly (WASM) use cases also come to mind as potential users of GDAL builds where resources are directly included in libgdal.
Technical solution
The C23 standard includes a #embed "filename" pre-processor directive that ingests the specified file and returns its content as tokens that can be stored in an unsigned char or char array.
Getting the content of a file into a variable is as simple as the following (which also demonstrates adding a nul-terminating character when this is needed):
static const char szPDS4Template[] = {
#embed "data/pds4_template.xml"
, '\0'};
Compiler support
Support for this directive is still very new. clang 19.1 is the first compiler release to include it, with an efficient implementation able to embed very large files with minimal RAM and CPU usage.
The development version of GCC 15 also supports it, but in a non-optimized way for now: including large files, of several tens of megabytes, can significantly increase compilation time, although without any impact at runtime. This is not an issue for GDAL use cases, and GCC developers intend to improve this in the future.
Embedding PROJ's proj.db, of size 9.1 MB, with GCC 15dev at time of writing takes 18 seconds and 1.7 GB of RAM, compared to 0.4 second and 400 MB of RAM for clang 19, which is still reasonable (generating proj.db itself from its source .sql files takes one minute on the same system).
There is no timeline for Visual Studio C/C++ at time of writing (it has been requested by users).
Note that clang 19.1 currently only supports #embed in .c files, not in C++ ones (the C++ standard has not yet adopted this feature). Embedding resources must therefore be done in a .c file, which is not a problem since symbols and functions can easily be exported from a .c file to be made available to C++.
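The split between a .c file holding the embedded data and C++ callers can be sketched as follows. This is a minimal illustration, not GDAL's actual code: the file name embedded_resources.c, the accessor GetEmbeddedPDS4Template() and the use of an EMBED_RESOURCE_FILES compile definition are assumptions, and a few stand-in bytes replace the real file when that definition is absent so that the sketch compiles anywhere.

```c
/* embedded_resources.c (hypothetical file name): compiled as C so that
 * #embed can be used, while remaining callable from C++ code. */
#include <assert.h>
#include <stddef.h>

/* In a shared header, this declaration would be wrapped in
 * extern "C" { ... } under #ifdef __cplusplus, so that C++ callers
 * link against the unmangled C symbol. */
const char *GetEmbeddedPDS4Template(size_t *pnSize);

static const char szPDS4Template[] = {
#ifdef EMBED_RESOURCE_FILES
/* Assumption: the build system defines this macro when embedding is
 * enabled and data/pds4_template.xml is reachable by the compiler. */
#embed "data/pds4_template.xml"
    ,
#else
    /* Stand-in content so that the sketch compiles without the data file. */
    '<', 'x', '/', '>',
#endif
    '\0'};

/* Returns the embedded, nul-terminated content and stores its length,
 * excluding the trailing nul, into *pnSize. */
const char *GetEmbeddedPDS4Template(size_t *pnSize)
{
    assert(pnSize != NULL);
    *pnSize = sizeof(szPDS4Template) - 1;
    return szPDS4Template;
}
```

A C++ driver would then only need the declaration of the accessor, and never sees the #embed directive itself.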
New CMake options
Resources will only be embedded if the new EMBED_RESOURCE_FILES CMake option is set to ON. This option will default to ON for static library builds, provided that C23 #embed is detected to be available. Users might also turn it to ON for shared library builds. A CMake error is emitted if the option is turned on but the compiler lacks support for it.
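This RFC does not spell out how availability of #embed is detected. One possible probe, given here purely as an assumption, relies on the C23 rule that __has_embed is reported as defined by conforming pre-processors. The names PROBE_EMBED_AVAILABLE and ProbeEmbedAvailable() are hypothetical.

```c
/* embed_probe.c (hypothetical): reports whether the pre-processor
 * advertises support for the C23 #embed directive. */
#if defined(__has_embed)
#define PROBE_EMBED_AVAILABLE 1
#else
#define PROBE_EMBED_AVAILABLE 0
#endif

/* Returns 1 when the compiler advertises #embed support, 0 otherwise. */
int ProbeEmbedAvailable(void)
{
    return PROBE_EMBED_AVAILABLE;
}
```

A build system could compile such a probe at configuration time and turn a negative result into an error when embedding has been explicitly requested.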
A complementary CMake option, USE_ONLY_EMBEDDED_RESOURCE_FILES, will also be added. It will default to OFF. When set to ON, GDAL will not try to locate resource files in the GDAL_DATA directory burnt at build time into libgdal (${install_prefix}/share/gdal), or in the directory pointed to by the GDAL_DATA configuration option.
Said otherwise, if EMBED_RESOURCE_FILES=ON but USE_ONLY_EMBEDDED_RESOURCE_FILES=OFF, GDAL will first try to locate resource files from the file system, and fall back to the embedded version if they are not found.
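The lookup order described above can be sketched as follows. LoadResource(), its parameters and the stand-in szEmbeddedConf array are hypothetical names introduced for illustration. GDAL's actual implementation is not described in this RFC and may differ, in particular because USE_ONLY_EMBEDDED_RESOURCE_FILES is a build-time option, modelled here as a runtime flag for simplicity.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for content that #embed would provide (hypothetical data). */
static const char szEmbeddedConf[] = "embedded-default\n";

/* Reads a whole file into a malloc()'ed, nul-terminated buffer. */
static char *ReadAll(FILE *fp)
{
    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    const long nSize = ftell(fp);
    if (nSize < 0 || fseek(fp, 0, SEEK_SET) != 0)
        return NULL;
    char *pszBuf = (char *)malloc((size_t)nSize + 1);
    if (pszBuf == NULL)
        return NULL;
    if (fread(pszBuf, 1, (size_t)nSize, fp) != (size_t)nSize)
    {
        free(pszBuf);
        return NULL;
    }
    pszBuf[nSize] = '\0';
    return pszBuf;
}

/* Returns a malloc()'ed copy of the resource, to be free()'d by the caller.
 * Lookup order: file system first (unless bUseOnlyEmbedded is set),
 * then the embedded copy. pszDataDir may be NULL. */
char *LoadResource(const char *pszDataDir, const char *pszName,
                   int bUseOnlyEmbedded)
{
    assert(pszName != NULL);
    if (!bUseOnlyEmbedded && pszDataDir != NULL)
    {
        char szPath[1024];
        const int nLen =
            snprintf(szPath, sizeof(szPath), "%s/%s", pszDataDir, pszName);
        FILE *fp = (nLen > 0 && (size_t)nLen < sizeof(szPath))
                       ? fopen(szPath, "rb")
                       : NULL;
        if (fp != NULL)
        {
            char *pszContent = ReadAll(fp);
            fclose(fp);
            if (pszContent != NULL)
                return pszContent;
        }
    }
    /* Not found (or not allowed) on the file system: use embedded copy. */
    char *pszCopy = (char *)malloc(sizeof(szEmbeddedConf));
    if (pszCopy != NULL)
        memcpy(pszCopy, szEmbeddedConf, sizeof(szEmbeddedConf));
    return pszCopy;
}
```

With this ordering, a file placed in the data directory overrides the embedded copy, which preserves the ability to customize resource files without rebuilding.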
The resource files will still be installed in ${install_prefix}/share/gdal, unless USE_ONLY_EMBEDDED_RESOURCE_FILES is set to ON.
Impacted code
gcore: embedding LICENSE.TXT, and tms_*.json files
frmts/grib: embedding GRIB2 CSV files
frmts/hdf5: embedding bag_template.xml
frmts/nitf: embedding nitf_spec.xml
frmts/pdf: embedding pdf_composition.xml
frmts/pds: embedding pds4_template.xml and vicar.json
ogr/ogrsf_frmts/dgn: embedding seed_2d.dgn and seed_3d.dgn
ogr/ogrsf_frmts/dxf: embedding header.dxf and leader.dxf
ogr/ogrsf_frmts/gml: embedding .gfs files and gml_registry.xml
ogr/ogrsf_frmts/gmlas: embedding gmlasconf.xml
ogr/ogrsf_frmts/miramon: embedding MM_m_idofic.csv
ogr/ogrsf_frmts/osm: embedding osm_conf.ini
ogr/ogrsf_frmts/plscenes: embedding plscenesconf.json
ogr/ogrsf_frmts/s57: embedding s57*.csv files
ogr/ogrsf_frmts/sxf: embedding default.rsc
ogr/ogrsf_frmts/vdv: embedding vdv452.xml
Considered alternatives
Including resource files into libraries has been a long-wished-for feature of C/C++. Different workarounds have emerged over the years, such as the use of the od -x utility, the GNU ld linker -b mode, or CMake-based solutions such as https://jonathanhamberg.com/post/cmake-file-embedding/
We could potentially use the latter to address non-C23 capable compilers, but we have chosen not to do that, for the sake of implementation simplicity. And, if considering using the CMake trick as the only solution, we should note that C23 #embed has the potential for better compile time, as demonstrated by the clang implementation.
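To illustrate what such pre-C23 workarounds amount to, here is a minimal sketch of a generator that turns bytes into the text of a C array definition, similar in spirit to what xxd -i produces. The function name FormatAsCArray() and its signature are hypothetical; it is not part of GDAL nor of the CMake solution linked above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes into pszOut (capacity nOutSize) a C definition of an array named
 * pszSymbol holding the nLen bytes of pabyData, as hexadecimal literals.
 * Returns the number of characters written, excluding the trailing nul,
 * or -1 if the output buffer is too small. */
int FormatAsCArray(const unsigned char *pabyData, size_t nLen,
                   const char *pszSymbol, char *pszOut, size_t nOutSize)
{
    assert(pszSymbol != NULL && strlen(pszSymbol) > 0);
    size_t nUsed = 0;
    int n = snprintf(pszOut, nOutSize,
                     "static const unsigned char %s[] = {", pszSymbol);
    if (n < 0 || (size_t)n >= nOutSize)
        return -1;
    nUsed = (size_t)n;
    for (size_t i = 0; i < nLen; ++i)
    {
        /* Every byte becomes one token for the compiler to parse, which is
         * why this approach scales poorly for large files. */
        n = snprintf(pszOut + nUsed, nOutSize - nUsed, "%s0x%02x",
                     i > 0 ? ", " : "", (unsigned)pabyData[i]);
        if (n < 0 || (size_t)n >= nOutSize - nUsed)
            return -1;
        nUsed += (size_t)n;
    }
    n = snprintf(pszOut + nUsed, nOutSize - nUsed, "};\n");
    if (n < 0 || (size_t)n >= nOutSize - nUsed)
        return -1;
    nUsed += (size_t)n;
    return (int)nUsed;
}
```

The generated text must then be compiled like any other source file, whereas #embed lets the compiler ingest the bytes directly.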
Backward compatibility
Fully backwards compatible.
C23 is not required, unless EMBED_RESOURCE_FILES is enabled in GDAL.
Documentation
The 2 new CMake variables will be documented.
Testing
The existing fedora:rawhide continuous integration target, which now has clang 19.1 available, will be modified to test the effect of the new variables.
Local builds using the GCC 15dev builds from https://jwakely.github.io/pkg-gcc-latest/ have also been successfully done during the development of the candidate implementation.
Voting history
+1 from PSC members JukkaR, JavierJS, KurtS, HowardB and EvenR