Keywords: Digital libraries, dissertations, graduate education, theses
Some of the attendees had been involved in NDLTD since early 1996. Then, funding by the Southeastern Universities Research Association (SURA) allowed regional expansion, which slightly preceded the U.S. Department of Education support of a national project in September 1996. Also attending the meeting were representatives of several Canadian universities, which began joining in 1997, reflecting the progress that moved the initiative from a national to an international enterprise (with national efforts beginning that year in Australia and Germany).
At the workshop, members decided to carry the project forward through a series of committees, each charged with addressing key concerns of the group:
Attendees shared their experiences and solutions. Many learned of new approaches to difficult problems they had faced, and all left encouraged to push forward at their home institutions. Some decided to join NDLTD and begin pilot efforts. A number decided to shift from a pilot effort to allowing all interested students to submit their theses electronically. Others decided to set a date after which all students would be required to submit an electronic thesis or dissertation (ETD). Though these successive stages demonstrate increasing levels of commitment, they simplify the situation accordingly, since it is easier to handle all theses electronically than to have all or most submissions on paper. Virginia Tech and West Virginia University, the first two institutions to require ETDs, explained the smooth procedures in place. Statistics on the rapidly growing number of accesses to the Virginia Tech collections indicated the strong demand for ETDs from around the world, encouraging other universities to make available to their own students this vehicle for disseminating scholarly findings. Finally, attending universities agreed to work together to ensure the continuing expansion of NDLTD and the enhancement of the services it provides.
5S is particularly oriented to describe information systems. We view digital libraries as high-end or super information systems that integrate a wide variety of more specialized technologies, and so benefit strongly from such a powerful framework. Advanced information systems often involve multimedia information and distributed processing, both of which require special support for Streams. Various approaches to information organization - whether in collections, using databases, supported by indices as in geographic information systems, through graphs (as in hypertext), or as complex objects - involve suitable Structures. Scientific visualization, virtual reality simulations, vector space or probabilistic or conceptual searching, and 2D or 3D graphics interfaces all make use of Spaces. Since digital libraries provide a range of services, supporting various types of information needs in tailored fashion, and are often designed based on story-like descriptions of interactions, the use of Scenarios is particularly important. Finally, since digital libraries are built to serve particular target users or user communities, it is essential that the Societies involved be carefully studied (i.e., we extend our earlier 4S model [9] by adding societies).
Building on this framework, we discuss other aspects of NDLTD in this paper. The next section explains more about NDLTD by describing representative Scenarios for various Societies involved. Section 3 broadens the focus to consider collaboration at the international level. Section 4 details some of the underlying technology developed. Section 5 explains challenges still faced, while Section 6 concludes the paper.
A graduate student user is likely to wish to find works to guide personal research. The user may have a topic well in mind, know how to search, and wish to make sure that the problem selected has not already been solved by another. This situation requires a comprehensive search, somewhat similar to those called for in legal cases. In such situations the cost of missing a relevant document can be extraordinarily high (e.g., turning years of work into a spurious effort).
Another scenario involves a graduate student hoping to identify a topic for research. One search goal may be a suitable set of highly related studies, where the detailed literature reviews and bibliographies included can serve as an introduction to the topical area. After these chapters are read, the corresponding bibliographies may lead directly to other interesting works. Alternatively, newer leads may emerge from a citation database that may be built inside NDLTD (or, perhaps, constructed jointly with the Institute for Scientific Information by extending their indexes).
Building upon the annotation server prototyped by Todd Miller early in 1999, users may decide to add notes in the form of private annotations whenever they read something in NDLTD. These will be stored on their local server, so whenever the user connects to NDLTD, all past annotations become available. Using these, an annotated bibliography could be easily produced, which might become a chapter in one's own ETD.
Research users may desire a number of other specialized services. Focus group discussions recorded by Todd Miller in 1998 suggested several. It would be helpful to have a program that analyzes an ETD and extracts or generates from it a glossary of important terms. It would be useful to take a small number of literature reviews and compile an overall summary for a subfield. This might indicate which references were cited repeatedly, and group together the varied comments around each such reference. Somewhat more difficult to develop might be a tool that would summarize the open problems mentioned repeatedly in a small collection of ETDs, to help in the search for good research topics. Many other services and scenarios might be of value for users of NDLTD. Specialized ones might address the needs of particular types of users, e.g., teachers hoping to use parts of a dissertation in a class presentation, or chemists looking for works that employ a particular type of methodology.
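The literature-review service described above could be prototyped along the following lines. This is only a minimal sketch: it assumes that citations and the comments made about them have already been extracted from each literature review, and the function names, identifiers, and input layout are hypothetical rather than part of any NDLTD software.

```python
from collections import defaultdict


def group_citation_comments(reviews):
    """Group comments by cited reference across several literature reviews.

    `reviews` maps a hypothetical ETD identifier to a list of
    (reference_key, comment) pairs taken from its literature review.
    """
    comments_by_ref = defaultdict(list)
    for etd_id, citations in reviews.items():
        for ref_key, comment in citations:
            comments_by_ref[ref_key].append((etd_id, comment))
    return dict(comments_by_ref)


def repeatedly_cited(comments_by_ref, min_etds=2):
    """Return references cited by at least `min_etds` distinct ETDs, most cited first."""
    counts = {
        ref: len({etd_id for etd_id, _ in entries})
        for ref, entries in comments_by_ref.items()
    }
    frequent = [ref for ref, n in counts.items() if n >= min_etds]
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(frequent, key=lambda ref: (-counts[ref], ref))


# Illustrative (made-up) input: three ETDs commenting on two references.
reviews = {
    "etd-1": [("lesk1997", "broad survey"), ("fox1993", "early interface work")],
    "etd-2": [("lesk1997", "good on economics")],
    "etd-3": [("lesk1997", "standard text"), ("fox1993", "user study")],
}
grouped = group_citation_comments(reviews)
frequent = repeatedly_cited(grouped)
```

The grouped comments for each frequently cited reference could then be presented side by side as the raw material for a subfield summary.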
Most students relate to NDLTD through one or more of three key scenarios. First, during their research, they may be a user of NDLTD (see Section 2.1 above), studying interesting results and perhaps finding some useful bibliographic references to support further investigation. Second, they will use the NDLTD submission software to upload their ETD, thus adding their work and related metadata to NDLTD. Third, they must connect with the rights and permissions aspect of NDLTD, filling in the Approval Form with their faculty committee and specifying terms and conditions of access to their work. Finally, it is hoped that their ETD may lead others to contact them to offer employment, ask for more details, make suggestions on future research, or suggest collaboration.
While these cases give specific instances of benefit from ETDs, it may be more useful to describe general scenarios related to collaboration. First, there are cases where one student develops a method or approach that can be applied by others. Second, there are situations where tools or data sets are developed that are appropriate for re-use. Third, some students may develop theories that others may apply or validate through experimental investigations. Fourth, one student's contribution may challenge another student to improve upon prior work by developing a faster or more efficient solution, possibly at the same time validating earlier findings by replicating them.
Collaborations often are supported by shared artifacts, such as ETDs. With Todd Miller's annotation tool, an author may allow another to attach public annotations to their own ETD, thus making the comments of collaborators available whenever the abstract of the ETD is read. In a less public situation, new collaborators may discuss an ETD by sending comments through email or by communicating using phone, facsimile, or letters. If an ETD is represented using PDF, copies of the ETD may be exchanged with notes attached by tools like Adobe's Acrobat software. In any case, having large numbers of ETDs available for essentially no cost should make it possible for scholars in remote regions to study theses that otherwise might have been out of reach, and should make it easier for them to contact the authors, engaging in mentoring or learning activities.
Another librarian may implement access restrictions requested by the student, so that part or all of the work becomes accessible only to the local campus community. That librarian may change the access situation later, perhaps shifting from campus to world access, as may happen with a chemistry work that relates to a journal article just published by the American Chemical Society (since their copyright policy supports such a change).
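The access scenario above can be illustrated with a small sketch. It assumes only two access levels (campus and world) set separately for each part of a work; the class names, field names, and part names are hypothetical and do not describe the actual Virginia Tech implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    CAMPUS = "campus"  # readable only from the local campus community
    WORLD = "world"    # readable by anyone


@dataclass
class EtdRecord:
    title: str
    # Maps a part name (e.g., a chapter file) to its access level.
    part_access: dict = field(default_factory=dict)

    def readable_parts(self, on_campus):
        """List the parts a reader may see, given whether they are on campus."""
        return [
            name
            for name, level in self.part_access.items()
            if level is Access.WORLD or on_campus
        ]

    def release(self, part_name):
        """Shift one part from campus to world access, e.g., after publication."""
        self.part_access[part_name] = Access.WORLD


# Hypothetical work with one restricted chapter.
etd = EtdRecord(
    "Example thesis",
    {"abstract": Access.WORLD, "chapter3": Access.CAMPUS},
)
before = etd.readable_parts(on_campus=False)
etd.release("chapter3")
after = etd.readable_parts(on_campus=False)
```

Keeping the access level per part, rather than per work, matches the case where only some chapters must be withheld while a related journal article is under review.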
A third librarian, charged with handling preservation, may work on the collection of ETDs filed several years ago. First, the librarian may move the documents to a newer computer or set of storage devices, so that online access continues to be supported. Second, if there are SGML files involved, a program developed at Virginia Tech may be run to generate a new collection of HTML files, using the most recent standard version of the HTML specifications, so this rendering of the ETD can benefit from the newest Web technology. Finally, some conversions may be run on multimedia files, so that old standard forms give way to newer, more widely supported versions. In such cases, the original submission may be left, so users can select from the original authored form or a new rendering that is easier to use.
Finally, other librarians may help various ETD users. Reference librarians may help students find interesting ETDs. Others may help students preparing their ETDs, perhaps when using complex devices such as an editing system for digital video. Some librarians may engage in training sessions, such as the periodic workshops run so that students can learn more about the ETD requirement. A smaller number of librarians may support advanced students who desire specialized training regarding multimedia information or markup languages (e.g., SGML, XML).
In many cases, Graduate Schools are responsible for checking theses, to be sure that all campus policies are enforced. In addition, they may want to check for plagiarism, by submitting one ETD and having a special program make sure that it is not highly similar to any other ETD.
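One simple way such a similarity check might work is to compare overlapping word sequences (shingles) using the Jaccard coefficient. The sketch below is an illustration under that assumption, not a description of any program actually used by a Graduate School; the function names, the shingle size, and the threshold are all hypothetical choices.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard coefficient of two sets: shared items over all items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_similar(new_text, collection, threshold=0.5, size=5):
    """Return (etd_id, score) pairs whose similarity meets the threshold."""
    new_shingles = shingles(new_text, size)
    flagged = []
    for etd_id, text in collection.items():
        score = jaccard(new_shingles, shingles(text, size))
        if score >= threshold:
            flagged.append((etd_id, score))
    # Highest similarity first, so a reviewer sees the closest match at the top.
    return sorted(flagged, key=lambda pair: -pair[1])


# Made-up texts standing in for two ETDs already in the collection.
collection = {
    "etd-a": "the quick brown fox jumps over the lazy dog",
    "etd-b": "digital libraries support graduate education and research worldwide",
}
matches = flag_similar(
    "the quick brown fox jumps over the lazy dog", collection, size=3
)
```

A flagged pair would only indicate that a human should look more closely; a high score alone does not establish plagiarism.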
Once an ETD is available, for cases where a student wishes to have UMI (University Microfilms International, of Ann Arbor, Michigan) archive their work and include an entry in their Dissertation Abstracts database, the Graduate School may then notify UMI. This may involve reporting that:
Other scenarios apply regarding NDLTD. Each NDLTD member is encouraged to determine which scenarios are appropriate in its situation, which others are desirable, and which ones it might develop into new services to be shared with other NDLTD members.
First, there is the sharing of research supported by UMI. According to figures reported by UMI employee William Savage, fewer than sixty thousand works are received by UMI each year. These account for almost all of the dissertations from the USA and Canada, as well as almost all of the master's theses from Canada. Figures are not available to this author regarding how many copies of Dissertation Abstracts are sold, or how many copies of theses or dissertations are sold by UMI to interested parties from the stock of about 1.5 million works in their archive. An estimated upper bound might be computed on the number of copies sold, though. If we assume that total income is about $10M per year and that a single copy costs $50, UMI might sell 200,000 copies per year. That is much less than one sale per thesis or dissertation in their collection, and no more than about 4 accesses per year to each of the new works submitted over the last year, if all sales were concentrated on those items. Larger numbers of accesses may result from new UMI services, including free viewing of parts of dissertations that since 1997 have been scanned and made available as PDF page images (300 dpi, black and white, captured from microfilm).
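The estimate above can be reproduced with a few lines of arithmetic. The income and price figures are the assumptions stated in the text, not reported data, and the function and parameter names are introduced here purely for illustration.

```python
def estimate_umi_sales(income=10_000_000, price_per_copy=50,
                       archive_size=1_500_000, new_works_per_year=60_000):
    """Rough upper-bound estimate of UMI copy sales under the stated assumptions."""
    copies_per_year = income / price_per_copy
    return {
        # Total copies sold per year if all income came from $50 copies.
        "copies_per_year": copies_per_year,
        # Average sales per work if spread over the whole archive.
        "per_archived_work": copies_per_year / archive_size,
        # Average sales per work if concentrated on one year's new works.
        "per_new_work": copies_per_year / new_works_per_year,
    }


estimate = estimate_umi_sales()
```

Under these assumptions the result is 200,000 copies per year, roughly 0.13 sales per archived work, and roughly 3.3 sales per newly submitted work, which is consistent with the bound of about 4 given above.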
Second, there is sharing based on interlibrary loan. Records at Virginia Tech indicate that this occurs at a low level. Circulation records show that in the first six years after a paper thesis was submitted, it circulated about 2 times per year. Dissertations circulated about 3 times per year in the same period. Since most use is local, we might assume that an average thesis or dissertation would be loaned out less than once a year.
Third, regarding access to international research from inside the U.S., there is little real support. The main arrangement for this process is through the agreements made by the Council of Library Resources. Their Chicago facility has about 750,000 dissertations collected from abroad, mostly from European universities. However, these are only available onsite. Further, there is no electronic or even card catalog, and the books are organized on shelves according to size of book, and then alphabetically according to author name. It appears that there is little access feasible under these circumstances.
Fourth, there are various methods for sharing works in less formal fashions. Some authors are contacted by interested parties who desire a copy. In recent years, authors have begun to post their works on Web sites or in departmental repositories. Some countries collect dissertations in a national library or other depository, which can be visited by interested parties. While industrious researchers make use of all of these mechanisms, it seems unlikely that more than, say, one access per year on average occurs across university boundaries through such mechanisms.
In summary, we note that U.S. dissertations and Canadian theses are made available through UMI, but on average are probably only sold in relatively small numbers each year. Other theses and dissertations have very low circulation, and particularly low re-use across university boundaries.
Interest in NDLTD is also reflected by the activities at universities exposed to the project. At least a few hundred universities have heard of the project. Visits and discussions involving the Virginia Tech team almost certainly bring to over one hundred the number of universities that either are members or have some clear interest in ETDs.
As is the case in other initiatives involving diffusion of innovation, spread of interest occurs in accord with a variety of factors. The Virginia Tech team has tried to manage this process using a number of approaches:
Hundreds of talks have been given. Additional presentations are scheduled. Other members of NDLTD are also engaged in this dissemination process, which we hope will lead to a number of sites becoming leaders in their nation or region.
Other support could provide suitable infrastructure for NDLTD. One aspect involves the Web sites developed at Virginia Tech, which have been adapted at other locations. It would be helpful to streamline this process, so new sites can more quickly come online with their Web pages. In particular, it would help to have translations of the site into major languages for the various countries involved, with suitable support for character sets.
Part of the activity at an NDLTD site involves workshops, online training, and one-on-one training and assistance for students who will prepare an ETD. Since students use a broad diversity of software while preparing an ETD, and since new versions of that software arrive continuously, it is important to continually add to and update the training resources developed for NDLTD. This can easily be undertaken in a distributed fashion, and will be coordinated by the new committee being formed to focus on training.
The other committees formed at the May 1999 Workshop also provide points of focus for joint work. Thus there is need to identify suitable standards, assist with training about them, locate tools that facilitate conversion from proprietary formats to standard forms, and find mechanisms to render (e.g., format and then either display or print) files stored using those standards.
Regarding software, there is need to identify software that can be used at NDLTD sites, and to assist with the application of it to support NDLTD objectives. Regarding statistics and reporting, there is need to develop versions of surveys and other data-collection instruments that will work in various nations, and will allow fusion of findings across sites. Regarding publisher relations, there is ample opportunity for a distributed approach to explain NDLTD to publishers and to enlist their support. In Germany this has proceeded well, with five professional societies (and publishers) serving as partners in their Dissertations Online effort. If each university were to involve all faculty who are editors of journals to obtain support from publishers for NDLTD, it would be relatively easy to change the current situation, in which many students are afraid to allow worldwide access because of concern over publisher reactions.
Other opportunities for joint work involve collaboration on developing supporting technology and on undertaking applied research to support the initiative.
Significant improvements can be made to the federated search system. One key issue is how to integrate this with the Dienst software from Cornell and the NCSTRL project that makes use of Dienst [24].
Many other sites have downloaded this software and adapted it for local use. Some detailed effort is needed to make it work on other platforms and with other database management systems. That can be carried out in distributed fashion.
Planned enhancements to be undertaken at Virginia Tech include:
A broad range of additional software has been developed, mostly in the form of specialized tools and scripts. New members of NDLTD download this set as part of the process of developing a local support infrastructure.
Extensive additional work is required to construct the most effective interfaces, and to tailor interfaces to the various tasks carried out by various user groups. A variety of interfaces have been developed and tested [4], but more remains to be done. The most effective long-term solution appears to be extending MARIAN further to support all desired functions.
It will become much easier to convince others to move toward requiring (only) electronic submission once there are solid results regarding the challenging problem of preservation.
We hope to collect and relate findings from similar data from other members of NDLTD. Even harder may be to determine how NDLTD is used for each Society involved. We need to determine if the collection leads to classroom use by instructors and/or student access. We need to ascertain the effect of ETDs on other theses and dissertations. We need to collect data on how often ETDs are cited, relative to other genres. Ultimately we seek to determine who learns as a result of NDLTD, how, and what can be done to enhance learning.
As NDLTD grows, some changes will be needed. The metadata standards, training resources, and federated search systems all need improvement to support more members and more users. The emerging committee structure will extend the reach of the Steering Committee to manage such growth. The annual workshop will serve as a vehicle for promoting sharing. Other meetings in connection with international digital library efforts will support this at the global level. Ultimately we hope that NDLTD will broadly support graduate education and research, extend collaboration among universities, and prepare the next generation of scholars for the Information Age.
[2] E. A. Fox, J. L. Eaton, G. McMillan, N. Kipp, P. Mather, T. McGonigle, W. Schweiker, and B. DeVane, Networked Digital Library of Theses and Dissertations: An International Effort Unlocking University Resources, D-Lib Magazine, vol. 3, 1997. http://www.dlib.org/dlib/september97/theses/09fox.html
[3] E. A. Fox, R. Hall, N. A. Kipp, J. L. Eaton, G. McMillan, and P. Mather, NDLTD: Encouraging International Collaboration in the Academy, Special Issue on Digital Libraries of DESIDOC Bulletin of Information Technology (DBIT), vol. 17, pp. 45-56, 1997.
[4] C. Phanouriou, N. Kipp, O. Sornil, P. Mather, and E. A. Fox, A Digital Library for Authors: Recent Progress of the Networked Digital Library of Theses and Dissertations, presented at The Fourth ACM Conference on Digital Libraries, DL '99, Berkeley, CA, 1999.
[5] M. Lesk, Practical Digital Libraries: Books, Bytes and Bucks. San Francisco: Morgan Kaufmann Publishers, 1997.
[6] E. A. Fox and G. Marchionini, Toward a Worldwide Digital Library; Guest Editors' Introduction to Special Section on Digital Libraries: Global Scope, Unlimited Access, Comm. ACM, vol. 41, pp. 28-32, 1998. http://purl.lib.vt.edu/dlib/pubs/CACM199804
[7] ARL, Electronic Theses and Dissertations, vol. 7, Spec Kit 236 ed. Washington, D.C.: Association of Research Libraries, 1998. http://www.arl.org/transform/
[8] E. A. Fox, G. McMillan, and J. Eaton, The Evolving Genre of Electronic Theses and Dissertations, presented at Digital Documents Track of HICSS-32, Thirty-second Annual Hawaii International Conference on Systems Sciences (HICSS), Maui, HI, 1999. http://scholar.lib.vt.edu/theses/presentations/Hawaii/ETDgenreALL.pdf
[9] E. A. Fox, N. Kipp, and P. Mather, How Digital Libraries Will Save Civilization, Database Programming & Design, vol. 11, pp. 60-65, 1998. http://www.dbpd.com/foxweb.html
[10] E. Fox, R. France, E. Sahle, A. Daoud, and B. Cline, Development of a Modern OPAC: From REVTOLC to MARIAN, in Proc. 16th Annual Int'l ACM SIGIR Conf. on R&D in Information Retrieval, SIGIR '93. Pittsburgh: ACM Press, 1993, pp. 248-259.
[11] E. A. Fox, D. Hix, L. Nowell, D. Brueni, W. Wake, L. Heath, and D. Rao, Users, User Interfaces, and Objects: Envision, a Digital Library, J. American Society Information Science, vol. 44, pp. 480-491, 1993.
[12] E. A. Fox, N. D. Barnette, C. Shaffer, L. Heath, W. Wake, L. Nowell, J. Lee, D. Hix, and H. R. Hartson, Progress in Interactive Learning with a Digital Library in Computer Science, in ED-MEDIA 95, World Conference on Educational Multimedia and Hypermedia. Graz, Austria, 1995, pp. 7-12.
[13] L. Heath, D. Hix, L. Nowell, W. Wake, G. Averboch, and E. A. Fox, Envision: A User-Centered Database from the Computer Science Literature, Communications of the ACM, vol. 38, pp. 52-53, 1995.
[14] L. Nowell and D. Hix, User interface design for the project Envision database of computer science literature, in Twenty-second Annual Virginia Computer Users Conference. Blacksburg, VA, 1992, pp. 29-33.
[15] L. Nowell and D. Hix, Visualizing search results: User interface development for the project Envision database of computer science literature, in Advances in Human Factors/Ergonomics, Proceedings of HCI International '93, 5th International Conference on Human Computer Interaction, vol. 19B, Human-Computer Interaction: Software and Hardware Interfaces: Elsevier, 1993, pp. 56-61.
[16] L. Nowell and D. Hix, Query composition: Why does it have to be so hard?, in East-West International Conference on Human-Computer Interaction, vol. I. Moscow, Russia, 1993, pp. 226-241.
[17] L. Nowell, E. A. Fox, L. Heath, D. Hix, W. Wake, and E. Labow, Seeing Things Your Way: Information Visualization for a User-Centered Database of Computer Science Literature, Virginia Tech Dept. of Computer Science, Blacksburg, VA Technical Report TR-94-06, January, 1994.
[18] L. T. Nowell and E. A. Fox, Envision: Information Visualization in a Digital Library. Demonstration. Seattle, WA: ACM SIGIR'95, July 10, 1995.
[19] L. Nowell, D. Hix, R. France, L. Heath, and E. A. Fox, Visualizing Search Results: Some Alternatives to Query-Document Similarity, in SIGIR '96. Zurich, Switzerland, 1996, pp. 67-75.
[20] L. T. Nowell, R. K. France, and E. A. Fox, Visualizing search results with Envision. Demonstration. Zurich, Switzerland: ACM SIGIR'96, Aug. 19, 1996.
[21] L. Nowell, Graphical Encoding for Information Visualization: Using Icon Color, Shape and Size to Convey Nominal and Quantitative Data, Virginia Tech Dept. of Computer Science, Blacksburg, VA, Ph.D. Dissertation, 1997.
[22] J. Zhao, Making Digital Libraries Flexible, Scalable, and Reliable: Reengineering the MARIAN System in JAVA, Virginia Tech Department of Computer Science, Blacksburg, VA, Master of Science, 1999.
[23] J. Powell and E. Fox, Multilingual Federated Searching Across Heterogeneous Collections, D-Lib Magazine, vol. 4, 1998. http://www.dlib.org/dlib/september98/powell/09powell.html
[24] C. W. Sharrets and J. C. French, Electronic Theses and Dissertations at the University of Virginia Library, presented at The Fourth ACM Conference on Digital Libraries, DL '99, Berkeley, CA, 1999.