The Relevance of the XML Data Format for Access to Historical Datasets and a Strategy for Digital Preservation (20 Feb 2005)
As for the applicability of XML for long-term storage, our conclusions are not yet definitive. However, from what we have seen while implementing the system, simply converting the dataset to XML will not be sufficient, not even when ample documentation is provided on how to regenerate the dataset and its specific functionality. If XML is to be used for long-term storage of datasets, it should be applied within the framework of a larger solution. XML can be used for the data, but the actual dataset should be represented by something 'larger'. Functionality that is implicitly included within complex datasets should be captured in a generic implementation in the framework, in order to make an optimal semantic mapping between an original dataset and an archived dataset possible.

Long-term preservation of binary information is another major problem we face when storing datasets. The general question is how to store the data in a way that will allow us to interpret the information many years from now. For now, storing the bit stream in a text-encoded form such as Base64 and accompanying it with metadata seems to be the best option. The metadata would have to describe what kind of file or object is represented by the bit stream, indicate which kind of text encoding is used, indicate which program is needed to open it, contain a reference to documentation about the format (if possible), and contain a datestamp.
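The Base64-plus-metadata approach described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the element names (`archivedObject`, `bitstream`, and the metadata fields) are hypothetical, chosen only to mirror the five metadata requirements listed in the text.

```python
import base64
import datetime
import xml.etree.ElementTree as ET


def wrap_binary(data: bytes, object_type: str, program: str, doc_ref: str) -> str:
    """Wrap a binary bit stream in an XML envelope with preservation metadata.

    Element names are illustrative; a real archive would follow an agreed schema.
    """
    root = ET.Element("archivedObject")
    meta = ET.SubElement(root, "metadata")
    # What kind of file or object the bit stream represents
    ET.SubElement(meta, "objectType").text = object_type
    # Which text encoding is used for the bit stream
    ET.SubElement(meta, "textEncoding").text = "base64"
    # Which program is needed to open the object
    ET.SubElement(meta, "renderingProgram").text = program
    # Reference to documentation about the format (if available)
    ET.SubElement(meta, "formatDocumentation").text = doc_ref
    # Datestamp of archiving
    ET.SubElement(meta, "datestamp").text = datetime.date.today().isoformat()
    # The bit stream itself, text-encoded so it survives in an XML document
    ET.SubElement(root, "bitstream").text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")
```

Because Base64 is a stable, well-documented encoding, the original bit stream can be recovered losslessly from the archived XML as long as the `textEncoding` field is honoured when decoding.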
Article URL: http://www.dlib.org/dlib/february05/vannispen/02vannispen.html