Figure 1: Proposed Technical Strategies for Digital Preservation
However, preserving bits is only a small part of the problem. This problem is overshadowed by much larger problems involving organization, policy, and roles and responsibilities. The International Council for Scientific and Technical Information (ICSTI) sponsored a study in March 1999 aimed at identifying emerging models and best practices for digital archiving, wherein technology was considered of secondary interest to the understanding of policy and practice.
Figure 2: Archive Environment - A Simple Model
1) Attempt to predict data sets which are endangered;
2) Identify strategically important data sets;
3) Fill gaps (e.g. missing parts of a series, out-of-print materials);
4) Build specialist collections;
5) Widen holdings.
1) Identify the name of the work, who created it, who reformatted it, and other descriptive information;
2) Provide unique identification and links to organizations, files, or databases which have more extensive descriptive metadata about this work (this is particularly important in the event that the digital file and its metadata become separated);
3) Explain the technical environment needed to view the work, including applications and version numbers needed, decompression schemes, other files that need to be linked to it, etc.
Preservation metadata, therefore, may be used to store all this technical information that supports preservation decisions and actions. In contrast to descriptive metadata schemas (e.g. MARC, Dublin Core), which are used in the discovery and identification of digital objects, preservation metadata largely falls into the category of administrative metadata, assisting in the management of information.
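The three functions listed above can be pictured as fields on a single record. The following is only a minimal sketch: the class names, field names, and sample values are hypothetical and are not drawn from any of the metadata sets discussed in this study.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TechnicalEnvironment:
    """What is needed to view the work (third function in the list above)."""
    application: str
    application_version: str
    decompression_scheme: str = ""
    linked_files: list[str] = field(default_factory=list)


@dataclass
class PreservationRecord:
    # First function: descriptive identification of the work.
    title: str
    creator: str
    reformatted_by: str
    # Second function: unique identifier plus links to fuller descriptive
    # metadata, so the file can be re-associated if the two are separated.
    identifier: str
    related_metadata_links: list[str]
    # Third function: the technical environment needed to view the work.
    environment: TechnicalEnvironment

    def to_json(self) -> str:
        """Serialize the whole record, including the nested environment."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# All values below are invented for illustration.
record = PreservationRecord(
    title="Example Annual Report",
    creator="Example Agency",
    reformatted_by="Example Digitization Unit",
    identifier="urn:example:0001",
    related_metadata_links=["https://catalogue.example.org/record/0001"],
    environment=TechnicalEnvironment(
        application="ExampleViewer",
        application_version="2.1",
        decompression_scheme="zip",
        linked_files=["styles.css"],
    ),
)
print(record.to_json())
```

Keeping the identifier and the links inside the same record as the technical details is one way to reflect the concern, noted above, that a digital file and its metadata may become separated.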
The Open Archival Information System (OAIS) Reference Model has been utilized by many initiatives developing preservation metadata sets. It provides a useful reference point to ensure all relevant information required for preservation has been included. National libraries included in this study were selected based on this common feature. All are using the OAIS model as their framework in developing their own preservation metadata sets.
The work being undertaken by the Research Libraries Group (RLG) and OCLC is complementary to this study, and keeping watch on the developments of their Working Group would be beneficial. At the CURL/JISC/RLG/OCLC-sponsored Digital Preservation Conference held in the U.K. in December 2000, Brian Lavoie of OCLC described the work of the joint OCLC/RLG Working Group on Metadata for Digital Preservation (RLG DigiNews, December 2000). He mentioned that this metadata effort is using a consensus-building approach to identify a comprehensive metadata framework to support a broad range of digital preservation activities. At the same conference, Lavoie described a white paper written to launch the work. The white paper, to be made publicly available in 2001, describes current thinking and practice on the use of metadata to support digital preservation. Also highlighting the OAIS reference model as a common starting point, the paper reviews existing metadata element sets from projects and institutions that were guided by the OAIS model during their work: CEDARS, the National Library of Australia, and NEDLIB. The methods and objectives adopted by the OCLC/RLG Working Group are, incidentally, very similar to those of this research.
Figure 3: Key Concepts and Relationships
The second issue deals with access. Considering that it is widely recognized, at both the national and international levels, that a copyright owner has an exclusive right to communicate a protected work to the public and that most electronic publications need to be "communicated to the public" in order to be seen and read, the deposit copy of such electronic publications might require a specific exception allowing access to the clientele of the national legal deposit institution.
Another legal issue is the question of ownership. The law should clearly state that the collection is an integral part of the country's cultural heritage and that the sole owner is the national institution responsible for maintaining and preserving it. A good example of such a clear statement of ownership is contained in Canada's National Library Act. However, it should be made clear to both the national legal deposit institution and the publishers that ownership of the collection does not mean ownership of intellectual property rights. A related property issue is the right of the depository to dispose of certain categories of material under certain conditions, e.g. the resource has limited enduring value or potential for re-use. The legislation should include a commitment by the depository that all possible and reasonable efforts will be made to permanently keep all materials deposited, but the legislation should also include a right of disposal.
Finally, a legal issue that is important to consider when preparing legal deposit legislation is the possible conflict with other laws. The two best examples of such a problematic situation relate to pornographic material and hate literature. Even if most countries have laws forbidding the publication, production, distribution, circulation and possession of such material, any such material should be subject to legal deposit. Since both pornographic and hate material may be found on carriers subject to legal deposit (books, periodicals, videos, etc.) and are also extensively available in electronic format, it is worth considering the issue. One of the basic elements of this discussion is the fact that the issue deals with the values of society, which vary from one country to another. One of the objectives of a national legal deposit scheme is to build up a comprehensive collection of published material for preservation and research purposes, and not allowing such material to be deposited might jeopardize the historical and sociological value of the national collection as the prevailing standards of tolerance evolve. From a strictly legal point of view, unless the legal deposit legislation clearly states that such material is not subject to legal deposit, it should be deposited. But after it has been deposited, the depository will have to comply with its jurisdiction's legal requirements with respect to access to material deposited.
Some countries have specifically excluded online publications from any new legislation. The Legal Deposit Act of 1993 in Sweden covers offline electronic documents and certain other non-print media such as microforms but not online publications. A Bill submitted to Parliament in 1995 recommends that not only online databases, but also software such as operating systems, compilers and text-processing programs, be excluded. The French legal deposit legislation of 1992 applies to offline but not to online electronic materials. However, it does apply to databases, software and expert systems and specifies rules of deposit for each category. The Library of Congress receives CD-ROMs on a more or less comprehensive basis through legal deposit. However, it currently lacks clear authority to collect online publications.
Some countries include non-print publications within their legislation but their coverage is highly selective. In Italy, the existing law of 1939 (revised in 1945) covers print material and videos produced as integral parts of books. In Spain, the existing law of 1971 covers books, periodicals, sound recordings and cinematographic productions; plans for new legislation recommend much wider coverage and will include computer programs, databases, expert systems and other artificial intelligence products. Current legislation in Germany includes offline electronic publications and excludes online publications. It also excludes film works, filmed records, audio-visual displays, and individual photographs.
In Asia, only the National Diet Library (NDL) of Japan has made an effort to amend its Library Law so that, from 2000, the new legal deposit system includes packaged electronic publications (i.e. CD-ROMs, DVDs and other electronic publications that fix information in physical media). However, networked electronic publications are excluded for the time being, although those considered necessary or beneficial will be collected selectively by contract (NDL Newsletter 112). Other Asian national libraries, such as those of Malaysia, China, Indonesia, Korea, Nepal, the Philippines, Taiwan, Thailand, and Vietnam, have non-print/audio-visual materials as part of their collection and legal deposit legislation. However, it is not clear whether the law requires the deposit of non-print materials.
The Working Group of the Conference of Directors of National Libraries (CDNL) recommends including rather than excluding items whenever there is doubt about what should be included. It also advises against making a distinction between online and offline forms of electronic publications and suggests that both forms be included in countries where there is the possibility of a rapid move towards online publications. It is then up to the national repository to determine which items are required for the national collection.
Dynamic documents, i.e. frequently updated documents or those that change continually over time, pose an acquisition problem that we do not face with conventional texts. Although one would argue for selective acquisition that is frequent enough to preserve all information contained in such a publication during its lifetime, prohibitive costs may well compel a much greater selectivity aimed at only acquiring representative samples (National Library of Sweden's Kulturarw3 Project).
In the 1998 survey of RLG's members conducted by Margaret Hedstrom and Sheon Montgomery of the University of Michigan's School of Information, few institutions reported having policies or even codified practices for preserving "born-digital" and "converted-to-digital", i.e. digitized, materials. Many institutions are actively working to store and maintain access to their digital holdings, whether or not practice is documented or an institutional policy exists. As the RLG survey documents, creating digital preservation policies is a difficult task. The lack of good models for digital preservation, together with uncertainty about the most appropriate methods and approaches, appears to be a major obstacle to developing effective policies and practices.
The selective approach is represented by the PANDORA Project of the National Library of Australia and the EPPP (Electronic Publications Pilot Project) of the National Library of Canada, the policies of which are analyzed in this study. Their scope is to collect important publications that can be made accessible at once, and they collect only thousands of documents. An argument for being selective is that one should not spend limited resources on preserving large amounts of trash. Admittedly, however, intelligent selection is difficult, and researchers in the future will criticize the way choices are being made now. Moreover, computer storage is getting cheaper and cheaper, while the cost of personnel is not.
(2) Elements contained in general selection guidelines include decisions regarding the following:
(3) Adapting traditional collection levels for print-based materials to the digital realm can be the most cost-effective means of ensuring appropriate management and continued access to the most important digital resources. Assigning a collection level to digital material indicates which preservation decisions and actions apply to the resource. These levels, consolidated from three initiatives (the Berkeley Digital Library SunSITE Project, the Arts and Humanities Data Service, and the National Library of Canada), are summarized as follows:
Collection Levels & Definitions
(4) Most projects dealing with digital preservation recognized at an early stage that metadata is important. The OAIS Taxonomy of Information Object Classes, the information requirements identified for preservation used by several of these projects, was based on the concepts first described in the 1996 Task Force Report as those features that determine information integrity: content, fixity, reference, provenance, and context. Accordingly, the OAIS Taxonomy divides Preservation Description Information (PDI) into Reference Information, Context Information, Provenance Information, and Fixity Information. A comparison of the three OAIS-based preservation metadata sets shows that the common preservation metadata elements that can be considered essential to ensure long-term preservation are the following:
Preservation Description Information
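As a rough illustration of the four PDI categories named above, the sketch below holds one value per category and uses a checksum for the Fixity Information. The class and function names are hypothetical, the free-text fields are a simplification, and the choice of SHA-256 is an assumption made for this example; the OAIS model does not prescribe a particular fixity mechanism.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationDescriptionInformation:
    reference: str   # Reference Information: how the content is identified
    context: str     # Context Information: relation to its environment
    provenance: str  # Provenance Information: origin and custody history
    fixity: str      # Fixity Information: hex SHA-256 digest (assumption)


def compute_fixity(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def describe(content: bytes, reference: str, context: str,
             provenance: str) -> PreservationDescriptionInformation:
    """Build a PDI record whose fixity value is derived from the content."""
    return PreservationDescriptionInformation(
        reference=reference,
        context=context,
        provenance=provenance,
        fixity=compute_fixity(content),
    )


def verify_fixity(content: bytes,
                  pdi: PreservationDescriptionInformation) -> bool:
    """True if the content still matches the digest recorded in the PDI."""
    return compute_fixity(content) == pdi.fixity


# Invented sample values for illustration only.
original = b"deposited publication bytes"
pdi = describe(
    original,
    reference="urn:example:0002",
    context="Issue 3 of a hypothetical serial",
    provenance="Deposited by publisher; migrated once by the archive",
)
print(verify_fixity(original, pdi))            # unchanged content
print(verify_fixity(original + b"!", pdi))     # altered content
```

Recomputing the digest at intervals and comparing it with the stored value is one simple way an archive could detect undocumented alteration of a deposited object.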
(2) There is a relationship between preservation and access in both the traditional and digital environments. Institutions, like national libraries, that are charged with preserving traditional paper-based materials invest heavily in the physical infrastructure which will allow people to access the material they need both now and in the future. Similarly, there is a need to ensure that selected digital materials will also continue to be accessible when they are needed.
(3) An increasing dependence on both digitally produced and accessed information means that there is a rapidly growing body of digital material for which there are legal, ethical, economic and/or cultural imperatives to retain, at least for a defined period of time and, in some cases, forever. If active steps are not taken to protect these digital materials, they will inevitably become inaccessible within a relatively brief timeframe.
(4) Selection for collection building and preservation is mainly human-driven and involves the decision-making process for including or excluding electronic material from the deposit collection. The decision-making process is based on national deposit policies, regulations, and agreements made with publishers and other providers. The selection process is therefore highly dependent on local conditions.
(5) The OAIS Reference Model is applicable to any archive. It is specifically applicable to organizations with a responsibility to make information available for the long term such as national libraries. By applying the OAIS Model, deposit libraries can benefit from the advantages of international standardization. By using a common reference model, a common terminology and a common conceptual framework, it is much easier to share ideas and exchange experiences.
(6) In the library domain, discussion has tended to focus on so-called "item-level" metadata (i.e. descriptions of individual books, articles, etc.). The new environment has new requirements. The "information broker" needs to have access to various types of metadata to support its operation. It is recognized that metadata required for long-term digital preservation is complicated by the levels of "granularity" that can occur within a single digital object or collection of objects. Metadata may be assigned at the level of a complete digital collection, a single digital object, or even, in the case of complex digital material, at the individual file level. In part, the granularity of the metadata will be determined by the digital object itself and the level of description necessary to ensure preservation, but it will also be influenced by collection management policies in place at the archive. In addition, the granularity of the metadata may be influenced by concerns about rights management of some more complex digital objects, for example, where different parties own different components of the content and/or systems. How an archive chooses to assign metadata, and at what level of granularity, are not decisions imposed by a metadata specification. A preservation metadata specification should allow for description at any level but ultimately the decision resides with the archive. For example, both the British Library and NEDLIB, where work is focused on the deposit library situation, have chosen, for justifiable practical reasons, to assign metadata to materials as they have been delivered to the library (e.g. as produced by the publisher).
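The three levels of granularity described above (collection, object, file) can be sketched as a small hierarchy in which each level carries its own metadata. The inheritance rule used here, where a lower level overrides values set at a higher level, is an assumption made for illustration and is not required by any of the specifications discussed; all names and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    level: str  # "collection", "object", or "file"
    metadata: dict[str, str] = field(default_factory=dict)
    parent: Optional["Node"] = None

    def effective_metadata(self) -> dict[str, str]:
        """Merge metadata from the top level down; lower levels override."""
        inherited = self.parent.effective_metadata() if self.parent else {}
        return {**inherited, **self.metadata}


# Invented example: a file whose rights holder differs from the collection's.
collection = Node(
    "Annual Reports", "collection",
    {"rights_holder": "Example Agency", "retention": "permanent"},
)
report = Node(
    "Report 1999", "object",
    {"title": "Annual Report 1999"},
    parent=collection,
)
photo = Node(
    "cover.tif", "file",
    {"format": "TIFF", "rights_holder": "Example Photographer"},
    parent=report,
)

print(report.effective_metadata())
print(photo.effective_metadata())
```

The file-level override of `rights_holder` mirrors the rights-management case mentioned above, where different parties own different components of a complex digital object.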
(7) The description of collections will become increasingly important in the context of networked library services. A strong view is emerging that libraries need to complement item-based description with description at a higher level. This will complement current work in the archives community, and descriptions at this shared level of granularity will facilitate cross-domain working. However, while the value of collection-level description is recognized, there is no standardized way of doing it. UKOLN has developed a preliminary approach in describing the JISC Current Collections, and it has prepared a report that examines collection description in the library, archive, and museum domains. This area would be worth looking into further.
1) Beagrie, Neil and D. Greenstein. Managing digital collections: AHDS policies, standards and practices. Consultation draft, December 1998.
2) Beagrie, Neil and Maggie Jones. Preservation management of digital materials. Pre-publication draft, October 2000.
3) Besser, Howard. Digital longevity, 1999. http://www.gseis.ucla.edu/howard/Papers/sfs-longevity.html
4) CEDARS Project. http://www.curl.ac.uk/projects/cedars.html
5) Conference of Directors of National Libraries. The legal deposit of electronic publications: report of a CDNL Working Group. Paris: UNESCO, 1996.
6) Consultative Committee for Space Data Systems (CCSDS), CCSDS 650.0-R-1, May 1999. Reference Model for an Open Archival Information System (OAIS) Draft Recommendation for Space Data System Standards. http://www.ccsds.org/RP9905/RP9905.html, October 1999.
7) Day, Michael. "Extending metadata for digital preservation". Ariadne, No. 9, May 1997.
8) Day, Michael. Metadata for preservation. CEDARS project document AIW01. Bath: UKOLN, 1998.
9) Digital Library SunSITE Collection and Preservation Policy.
10) Graham, Peter. "Requirements for the digital research library". College and Research Libraries, Vol. 56 No. 4, July 1995.
11) Hodge, Gail. Best practices for digital archiving: an information life cycle approach. D-Lib Magazine, January 2000.
12) Hodge, Gail. Digital archiving: Bringing stakeholders and issues together - a report on the ICSTI/ICSU Press Workshop on Digital Archiving. ICSTI Forum, No. 33, March 2000.
13) Kuny, Terry. "The Digital Dark Ages? Challenges in the preservation of electronic information". International Preservation News, No. 17, May 1998.
14) Lariviere, Jules. Guidelines for legal deposit legislation: a revised, enlarged and updated edition of the 1981 publication by Dr. Jean Lunn, IFLA Committee on Cataloging. Paris: UNESCO, 2000.
15) Mannerheim, Johan. The WWW and our digital heritage - the new preservation tasks of the library community. Paper presented at the 66th IFLA Council and General Conference, Jerusalem, Israel, August 2000. http://www.ifla.org/IV/ifla66/papers/158-157e.htm
16) National Diet Library. Legal deposit system. http://www.ndl.go.jp/e/toukan/nouhon_towa.html
17) National Library of Australia. Guidelines for the Selection of Online Australian Publications Intended for Preservation. 1999. http://www.nla.gov.au/scoap/guidelines.html
18) National Library of Australia. Preservation Metadata for Digital Collections. 1999. http://www.nla.gov.au/preserve/pmeta.html
19) National Library of Canada. Electronic Publications Pilot Project (EPPP): Final Report. June 1996.
20) National Library of Canada. Networked Electronic Publications Policy and Guidelines. October 1998.
21) PADI (Preserving Access to Digital Information) subject gateway. http://www.nla.gov.au/padi/
22) Preserving Digital Information: Report of the Task Force on Archiving of Digital Information, Commission on Preservation and Access and The Research Libraries Group, Inc., May 1996. http://www.rlg.org/ArchTF/
23) Project PRISM. http://prism.cornell.edu/overview.htm
24) RLG Working Group on Preservation Issues of Metadata. Final report. Mountain View, California: Research Libraries Group, May 1998.
25) Russell, Kelly. Cedars Document ABS01: Debate and discussion at the Cedars Project Advisory Board Meeting, May 1999.
26) Stenvall, Jani. Metadata for preservation of electronic publications. Helsinki University Library, October 2000 (Translated from Finnish).
27) Stenvall, Jani. Using Dublin Core as preservation metadata. Helsinki University Library, October 2000 (Translated from Finnish).
28) Stone, A. and Day, M. Cedars preservation metadata elements. Cedars project document AIW02, 1999. http://users.ox.ac.uk/~cedars/Papers/AIW02.html