RFC 12: Improved File Management

Author: Frank Warmerdam

Contact: warmerdam@pobox.com

Status: Adopted / Implemented

Summary

Some applications using GDAL need to provide file management operations through the GUI. This includes deleting, renaming, moving and packaging up datasets, which often requires operations on several associated files. This RFC introduces an operation on a GDALDataset to identify all the files that make up the dataset, and operations to move (rename) or copy them.

GetFileList()

The following new virtual method is added on the GDALDataset class, with an analogous C function.

virtual char   **GDALDataset::GetFileList(void);

The method is intended to return a list of files associated with this open dataset. The return value is a NULL-terminated string list which becomes owned by the caller and should be deallocated with CSLDestroy().
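To make the ownership contract concrete, here is a minimal C sketch of a NULL-terminated string list. It does not use GDAL: list_from(), list_count() and list_destroy() are hypothetical stand-ins that only mimic how CSLCount() and CSLDestroy() treat such a list, so the caller's obligation (free every string, then the array itself) is visible.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CSLDestroy(): free each string, then the array.
 * Mirrors the ownership contract only; not GDAL's implementation. */
static void list_destroy(char **papszList)
{
    if (papszList == NULL)
        return;
    for (char **p = papszList; *p != NULL; p++)
        free(*p);
    free(papszList);
}

/* Hypothetical helper that builds a NULL-terminated list from n names,
 * standing in for a GetFileList() result. */
static char **list_from(const char *const *names, size_t n)
{
    char **list = calloc(n + 1, sizeof(char *)); /* +1 slot for the NULL terminator */
    if (list == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        list[i] = malloc(strlen(names[i]) + 1);
        if (list[i] == NULL) {      /* remaining slots are still NULL from calloc */
            list_destroy(list);
            return NULL;
        }
        strcpy(list[i], names[i]);
    }
    return list;
}

/* Count entries by walking to the NULL terminator, as CSLCount() does. */
static int list_count(char **papszList)
{
    int n = 0;
    while (papszList != NULL && papszList[n] != NULL)
        n++;
    return n;
}
```

A caller walks the list until it reaches the NULL entry and then hands the whole list back to the destroy function exactly once.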

The default implementation tests the name of the datasource to see if it is a file; if so, that file is returned, otherwise an empty list is returned. If the default overview manager is active and has overviews, those will also be included in the file list. The default implementation also checks for world files, but only those with extensions based on the original file's extension (i.e. .tfw or .tifw for .tif); it does not search for .wld since that extension is not very specific.
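The world-file naming rule above can be sketched as follows. world_file_candidates() is a hypothetical helper, not a GDAL function; it assumes '/' path separators and ignores upper/lower-case variants, which a real implementation would probably also need to try.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a GDAL API): derive the two world-file names for
 * a raster file. For "scene.tif" it produces "scene.tfw" (first and last
 * characters of the extension plus 'w') and "scene.tifw" (the whole
 * extension plus 'w'). Assumes '/' separators; case variants are ignored.
 * Returns 0 on success, -1 if there is no extension or a buffer is too small. */
static int world_file_candidates(const char *filename, char *short_out,
                                 char *long_out, size_t out_size)
{
    const char *dot = strrchr(filename, '.');
    const char *slash = strrchr(filename, '/');

    /* A dot inside a directory name is not an extension. */
    if (dot == NULL || (slash != NULL && dot < slash) || dot[1] == '\0')
        return -1;

    int stem_len = (int)(dot - filename);   /* includes any directory part */
    const char *ext = dot + 1;
    size_t ext_len = strlen(ext);

    /* Short form, e.g. ".tif" -> ".tfw". */
    int n1 = snprintf(short_out, out_size, "%.*s.%c%cw",
                      stem_len, filename, ext[0], ext[ext_len - 1]);
    /* Long form, e.g. ".tif" -> ".tifw": append 'w' to the whole name. */
    int n2 = snprintf(long_out, out_size, "%sw", filename);

    if (n1 < 0 || n2 < 0 || (size_t)n1 >= out_size || (size_t)n2 >= out_size)
        return -1;
    return 0;
}
```

For "data/scene.tif" this yields "data/scene.tfw" and "data/scene.tifw"; a file list builder would then keep only the candidates that actually exist.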

The GDALPamDataset::GetFileList() method will extend the core default behavior with the ability to find the .aux and .aux.xml files associated with a dataset.

pfnRename()

The following new function pointer is added to the GDALDriver class.

CPLErr       (*pfnRename)( const char *pszNewName, const char *pszOldName );

Also a corresponding function is added to the C API.

CPLErr        GDALRenameDataset( GDALDriverH hDriver, const char *pszNewName, const char *pszOldName );

Note that renaming is done by the driver, but the dataset to be operated on should not be open at the time. GDALRenameDataset() will invoke pfnRename if it is non-NULL.

If pfnRename is NULL, the default implementation will be used, which will open the dataset, fetch the file list, close the dataset, and then try to rename all the files (based on shared basenames). The default rename operation will fail if it is unable to establish a relationship between the files (i.e. a common basename or stem) that indicates how the group of files should be renamed to the new pattern.
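A minimal sketch of the shared-stem rule, assuming the stem is the dataset name with its final extension removed. map_to_new_name() and stem_length() are hypothetical helpers that only compute names and never touch the file system. Like the default operation, the sketch fails when a file does not share the old stem.

```c
#include <stdio.h>
#include <string.h>

/* Length of the stem: the path minus its final extension (if any).
 * A dot that appears before the last '/' belongs to a directory name. */
static size_t stem_length(const char *path)
{
    const char *dot = strrchr(path, '.');
    const char *slash = strrchr(path, '/');
    if (dot == NULL || (slash != NULL && dot < slash))
        return strlen(path);
    return (size_t)(dot - path);
}

/* Hypothetical helper: compute where `file` (one entry of the dataset's
 * file list) should go when `old_dataset` is renamed or copied to
 * `new_dataset`. Returns 0 on success, -1 if `file` does not share the old
 * stem or if `out` is too small. */
static int map_to_new_name(const char *old_dataset, const char *new_dataset,
                           const char *file, char *out, size_t out_size)
{
    size_t old_stem = stem_length(old_dataset);
    size_t new_stem = stem_length(new_dataset);

    /* The file must start with the old stem, followed by '.' or nothing,
     * so that "foo" does not accidentally match "foobar.tif". */
    if (strncmp(file, old_dataset, old_stem) != 0)
        return -1;
    if (file[old_stem] != '.' && file[old_stem] != '\0')
        return -1;

    /* Swap the stem and keep the remainder (".tfw", ".tif.aux.xml", ...). */
    int n = snprintf(out, out_size, "%.*s%s",
                     (int)new_stem, new_dataset, file + old_stem);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For example, renaming /data/foo.tif to /out/bar.tif maps /data/foo.tfw to /out/bar.tfw and /data/foo.tif.aux.xml to /out/bar.tif.aux.xml, while /data/other.prj is rejected because it has no common stem.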

Optionally a NULL hDriver argument may be passed in, in which case the appropriate driver will be selected by first opening the datasource.

CPLMoveFile()

The POSIX rename() function, on which VSIRename() is usually based, does not normally allow renaming files between file systems or between different kinds of file systems (e.g. /vsimem to C:/abc). In order to implement GDALRenameDataset() such that it works efficiently within a file system but still works between file systems, a new operation will be added to gdal/port: the CPLMoveFile() function, which will first try VSIRename(). If that fails, it will use CPLCopyFile() to copy the whole file and then VSIUnlink() to remove the old file.

int CPLMoveFile( const char *pszNewFilename, const char *pszOldFilename );

The return value will be zero on success, otherwise an errno style value.
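The move-with-fallback strategy can be sketched in standard C. This is an illustration under assumptions, not CPLMoveFile() itself: it uses the C library's rename(), fopen() and remove() in place of VSIRename(), CPLCopyFile() and VSIUnlink(), so it will not handle GDAL virtual file systems, and move_file() is a hypothetical name. It keeps the same return convention: zero on success, otherwise an errno-style value.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical sketch of the "rename, else copy + unlink" strategy.
 * Returns 0 on success, otherwise an errno-style value. */
static int move_file(const char *new_name, const char *old_name)
{
    if (rename(old_name, new_name) == 0)
        return 0;                      /* fast path: same file system */

    /* Fallback: copy the bytes, then delete the original. */
    FILE *in = fopen(old_name, "rb");
    if (in == NULL)
        return errno ? errno : EIO;
    FILE *out = fopen(new_name, "wb");
    if (out == NULL) {
        int err = errno ? errno : EIO;
        fclose(in);
        return err;
    }

    char buf[8192];
    size_t n;
    int err = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {   /* e.g. destination out of space */
            err = errno ? errno : EIO;
            break;
        }
    }
    if (err == 0 && ferror(in))
        err = EIO;
    fclose(in);
    if (fclose(out) != 0 && err == 0)    /* buffered data may fail on flush */
        err = errno ? errno : EIO;

    if (err != 0) {
        remove(new_name);                /* do not leave a partial copy */
        return err;
    }
    /* Only delete the source once the copy is complete. */
    if (remove(old_name) != 0)
        return errno ? errno : EIO;
    return 0;
}
```

Deleting the original only after the copy has been fully written and closed means a failed copy leaves the source file intact.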

It should be noted that in some error conditions, such as the destination file system running out of space during a copy, some files of a dataset may get renamed while others do not, leaving things in an inconsistent state.

pfnCopyFiles()

The following new function pointer is added to the GDALDriver class.

CPLErr       (*pfnCopyFiles)( const char *pszNewName, const char *pszOldName );

Also a corresponding function is added to the C API.

CPLErr        GDALCopyDatasetFiles( GDALDriverH hDriver, const char *pszNewName, const char *pszOldName );

Note that copying is done by the driver. The dataset may be open, but if it is open in update mode it may be prudent to flush it first so that the in-process state is synchronized with what is on disk. GDALCopyDatasetFiles() will invoke pfnCopyFiles if it is non-NULL.

If pfnCopyFiles is NULL, the default implementation will be used, which will open the dataset, fetch the file list, close the dataset, and then try to copy all the files (based on shared basenames). The default copy operation will fail if it is unable to establish a relationship between the files (i.e. a common basename or stem) that indicates how the group of files should be named under the new pattern.
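The dispatch described above (use the driver's hook when it is non-NULL, otherwise fall back to the generic path) is ordinary C function-pointer dispatch. The sketch below uses hypothetical names (Err, Driver, copy_dataset_files(), default_copy_files(), custom_copy_files()) rather than GDAL's declarations; the default path only counts calls instead of copying files, and driver auto-selection for a NULL driver is left out and reported as a failure. The same pattern applies to pfnRename.

```c
#include <stddef.h>

/* Hypothetical miniature of the driver hook pattern (not GDAL's types). */
typedef enum { ERR_NONE = 0, ERR_FAILURE = 1 } Err;

typedef struct {
    const char *name;
    /* Optional driver-specific copy; NULL means "use the default". */
    Err (*pfnCopyFiles)(const char *new_name, const char *old_name);
} Driver;

/* Stand-in for the generic path (open, fetch file list, close, copy each
 * file under the new stem). Here it only records that it was called. */
static int default_calls = 0;
static Err default_copy_files(const char *new_name, const char *old_name)
{
    (void)new_name; (void)old_name;
    default_calls++;
    return ERR_NONE;
}

/* Example override a driver for an unusual layout might install. */
static int custom_calls = 0;
static Err custom_copy_files(const char *new_name, const char *old_name)
{
    (void)new_name; (void)old_name;
    custom_calls++;
    return ERR_NONE;
}

static Err copy_dataset_files(const Driver *drv, const char *new_name,
                              const char *old_name)
{
    if (drv == NULL)
        return ERR_FAILURE;   /* auto-selecting a driver is omitted here */
    if (drv->pfnCopyFiles != NULL)
        return drv->pfnCopyFiles(new_name, old_name);  /* driver override */
    return default_copy_files(new_name, old_name);     /* generic fallback */
}
```

A driver such as one for a directory-oriented format would set the pointer to its own routine; every other driver leaves it NULL and gets the shared-basename behavior for free.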

Optionally a NULL hDriver argument may be passed in, in which case the appropriate driver will be selected by first opening the datasource.

Copy is essentially the same as Rename, but the original files are unaltered. Note that this form of copy is distinct from CreateCopy() in that it preserves the exact binary files on disk in the new location while CreateCopy() just attempts to reproduce a new dataset with essentially the same data as modelled and carried through GDAL.

pfnDelete()

The delete operation's default implementation will be extended to use the GetFileList() results.
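A minimal sketch of a delete driven by a file list, assuming the list has already been fetched and the dataset closed. delete_file_list() is hypothetical; continuing after a failed remove() and treating an empty list as "unmanageable" (returning -1) are assumptions of this sketch, not documented GDAL behavior.

```c
#include <stdio.h>

/* Hypothetical sketch: delete every file in a NULL-terminated list such as
 * the one returned by GetFileList(). Returns -1 for a NULL or empty list
 * (nothing to manage), otherwise the number of files that could not be
 * removed (0 means complete success). */
static int delete_file_list(char **file_list)
{
    if (file_list == NULL || file_list[0] == NULL)
        return -1;

    int failures = 0;
    for (char **p = file_list; *p != NULL; p++) {
        /* Keep going after a failure so as much as possible is cleaned up. */
        if (remove(*p) != 0) {
            fprintf(stderr, "could not delete %s\n", *p);
            failures++;
        }
    }
    return failures;
}
```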

Supporting Functions

Supporting functions should be provided to make it easy to identify the world files, .aux files and .prj files associated with a file.
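One possible shape for such a supporting function, sketched in standard C. find_sidecars(), file_exists() and the candidate list (<stem>.prj, <stem>.aux and <filename>.aux.xml) are illustrative assumptions, not GDAL's actual search rules; existence is tested simply by trying fopen(), and '/' path separators are assumed.

```c
#include <stdio.h>
#include <string.h>

#define SIDECAR_PATH_LEN 256

/* Returns nonzero if the file can be opened for reading. */
static int file_exists(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}

/* Hypothetical helper: copy into found[] (up to max_found entries) those of
 * a few sidecar candidates that exist next to `filename`. Returns the count. */
static int find_sidecars(const char *filename,
                         char found[][SIDECAR_PATH_LEN], int max_found)
{
    /* Stem = filename minus its final extension (a dot before the last '/'
     * belongs to a directory name, not an extension). */
    const char *dot = strrchr(filename, '.');
    const char *slash = strrchr(filename, '/');
    int stem_len = (dot != NULL && (slash == NULL || dot > slash))
                   ? (int)(dot - filename) : (int)strlen(filename);

    char candidates[3][SIDECAR_PATH_LEN];
    snprintf(candidates[0], SIDECAR_PATH_LEN, "%.*s.prj", stem_len, filename);
    snprintf(candidates[1], SIDECAR_PATH_LEN, "%.*s.aux", stem_len, filename);
    snprintf(candidates[2], SIDECAR_PATH_LEN, "%s.aux.xml", filename);

    int count = 0;
    for (int i = 0; i < 3 && count < max_found; i++) {
        if (file_exists(candidates[i])) {
            strcpy(found[count], candidates[i]);  /* same buffer size */
            count++;
        }
    }
    return count;
}
```

Only candidates that actually exist are reported, so the result can be appended directly to a GetFileList()-style list.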

Drivers Updated

It is anticipated that a majority of the commonly used drivers will be updated with custom GetFileList() methods that account for world files and other idiosyncratic files. Particular emphasis will be placed on handling the various formats in gdal/frmts/raw that consist of a header file and a raw binary file.

Drivers for "one file formats" that are not updated will still use the default logic which should work fairly well, but might neglect auxiliary world files.

  • VRT: I do not anticipate updating the VRT driver at this time since it gets quite complicated to collect a file list for some kinds of virtual files. It is also not exactly clear whether related files should be considered "owned" by the virtual dataset or not.

  • AIGRID: I will implement a custom rename operation in an attempt to handle this directory oriented format gracefully.

Additional Notes

  • Subdatasets will generally return an empty file list from GetFileList(), and will not be manageable via Rename or Delete, though a sufficiently sophisticated driver could implement these operations.

  • There is no mechanism anticipated to ensure that files are closed before they are removed. If an application does not ensure this, rename/move operations may fail on Win32, since Windows does not allow rename or delete operations on open files. Things could easily be left in an inconsistent state.

  • Datasets without associated files in the file system will return an empty file list. This essentially identifies them as "unmanageable".

Implementation Plan

This change will be implemented by Frank Warmerdam in trunk in time for the 1.5.0 release.

SWIG Implications

The GDALRenameDataset() and GDALCopyDatasetFiles() operations on the driver, and the GetFileList() operation on the dataset, will need to be exposed through SWIG.

Testing

Rename and CopyFiles testing will be added to the regression tests for a few representative formats. These rename operations will be from one directory to another and will not exercise cross-file-system copying, which will have to be tested manually.

A small gdalmanage utility will be implemented, allowing the identify, rename, copy and delete operations to be used and tested conveniently from the command line.