Prototype Development:
Groundwork for a Digital Audio Project Utilizing Primary Humanities Sources

Asako Yoshida
Chester Fritz Library
University of North Dakota
Phone: 1-701-777-4491
Fax: 1-701-777-3319
E-mail: yoshida@prairie.nodak.edu

Abstract:

Digital projects dealing with humanities primary sources often pose developmental challenges. The author presents preliminary findings and lessons learned during the development of an envisioned online reference book. The paper discusses the evolution of the author's thinking about the nature and role of prototype development for humanities digital projects. Future developmental scenarios for the reference book are also discussed.

Keywords: Prototype Development; Humanities Primary Sources; Digital Reference Book.

Introduction

Does a "digital library" have to be a large-scale infrastructure project?[1] The answer is a resounding "No." We have seen an amazing proliferation of World Wide Web (web) technology over the last couple of years. More recently, a wide variety of multimedia digital systems and authoring tools for web development have become available, along with much improved HTML markup capability and other interactive features made possible by Java and CGI scripts. In this new environment, small to medium-size academic libraries can start their own digital projects focused on valuable research collections available on site, and make them widely accessible on the Internet for the benefit of national and international researchers and the general cyber-audience. Chester Fritz Library, the main library at the University of North Dakota (UND), has such a collection. Project team members are developing a prototype research tool, with audio, text, and graphic elements in the web environment, based on a special video collection held by the library's Elwyn B. Robinson Department of Special Collections. This paper focuses on preliminary findings and lessons learned at the initial prototype development stage, prior to making it a full-fledged grant project.

Rationale of a Digital Project: The Case of the UND Writer's Conference Video Collection

Is it worthwhile to develop this collection into a digital project? This is a most important question to ask right at the beginning, especially if the library has not yet moved towards planning a digital library and thus has no obvious technical support. The library first needs to assess the value and uniqueness of the collection, and then consider whether the collection is worth the extra effort required to make it available on the Internet. If so, it needs to seek a grant to support the project. Another important consideration is whether the library can recruit experts from campus and develop a collaborative working relationship for the project. Collaboration with subject experts is an important component of a project dealing with primary sources in the humanities if it aims to develop a successful "content-based user interface".[2]

Initial interest in the digital project at the Chester Fritz Library emerged from a discussion with Prof. Jim McKenzie, the current Director of the University of North Dakota Writer's Conference, who pointed out the significance and value of the Writer's Conference video collection.[3] The collection includes readings, talks, and panel discussions by many notable contemporary American and international novelists, poets, playwrights, essayists and filmmakers who have participated in the annual UND Writer's Conference over the past 20 years.[4]

In the past two decades, the annual UND Writer's Conference has established itself as a rare public forum for the University and its surrounding community. In more recent years, the Conference has drawn a diverse, educated public from the region and beyond to the campus. Each year, the week-long Writer's Conference is organized around a particular theme, issue, or literary movement, and features six to ten writers. During the Conference week, each writer gives a reading and participates in several panel discussions. These sessions have become most interesting forums, capturing each participating writer's perspectives on his or her own work and the wider personal and cultural contexts from which that work arises. The Writer's Conference at the University of North Dakota has created a number of rare occasions to observe memorable interactions among participating writers, reminiscent of the eighteenth- and nineteenth-century literary cafe culture.[5] The Writer's Conference archival video collection captures many such unique moments that may never be recreated.

The collection is a valuable primary resource for literary students and researchers, as well as humanists and social scientists whose interests range from American intellectual history to American social and cultural studies. It is worth the effort to make the materials available on the Internet and thus to a wider audience. Some highlights of the archival collection include a historic 1974 gathering of the Beat poets, entitled "City Light in North Dakota"; 1976's "New Journalism and the Novel," which brought, among others, Tom Wolfe and Truman Capote to the campus; and several international Writer's Conferences in the 1980s that featured such authors as Alain Robbe-Grillet, Joseph Brodsky, Derek Walcott, and Czeslaw Milosz. Over the years, the Conference's panel discussions have referred to a wide variety of significant social and cultural events in American history: Watergate, the Vietnam War, and the nuclear arms race, to name a few. The collection has captured the evolution of various cultural discourses on gender, politics, and ethnicity, especially as they relate to language and cultural representations in literary works. The connection between a writer's work and the social and cultural milieu from which the work arises is one of the important contextual features of the archival collection.

Content-Driven Organizational and Design Issues for Humanities Resources

Aigrain's paper, presented at the 1995 International Symposium on Digital Libraries, has theoretical relevance to the UND Writer's Conference project.[6] Aigrain outlined key theoretical and technical issues in the practical development of image and sound digital libraries. He emphasized the importance of "content-based processing and interaction" for image and sound digital projects, and directly addressed the key issues as they relate to technical solutions, such as better digital compression methods and the programming of standardized content-description codes. Aigrain thus delineated needed technical improvements that in turn constitute important considerations for the delivery of "content-based processing and interaction," as well as a "content-based user interface". Many of the technical issues, however, can be overcome by the increasingly user-friendly multimedia development tools appearing in the marketplace, such as Progressive Networks, Inc.'s "RealAudio" and Silicon Graphics, Inc.'s comprehensive WebFORCE MediaBase.[7] These pre-packaged multimedia development tools would make the audio delivery of primary humanities sources on the Internet much easier, and increase access by researchers and students to resources otherwise not easily available.[8] The UND Writer's Conference project selected RealAudio as its primary audio delivery mechanism, based on its real-time, on-demand audio streaming technology, its availability on a wide variety of computer platforms, its established popularity on the Internet, and its indexing capability.

The marriage of intellectually dynamic audio content and current web multimedia technology requires the project development team to pay extra attention to, and put significant effort into, the overall design and organization of the research tool. The issues surrounding content-based interaction or user interface have particular relevance to a project dealing with humanities primary sources. The issues of design and organization have to be addressed in the process of working together with subject experts who will evaluate and analyze the contents of the primary sources.

The Project Vision, and the Role of Prototype Development

The vision for the UND Writer's Conference Project is to create a well-organized electronic reference tool that lets users browse and skim the contents by means of abstracts or annotated representations. The envisioned project will not be simply a collection of digitized archival materials, but rather an online reference book, with abstracts, annotations, and explanatory notes, that makes use of the audio portion of selected reading and panel session video archives.

At the outset, the project team selected a conference known to be rich in content as its logical starting point to develop the project's "prototype". The project team, however, started with a very vague understanding of what prototype development was supposed to do. It was rather overwhelming to deal with a large portion of the video collection all at once. Starting with one conference seemed reasonable. The project team thus simply called it "prototype development".[9] In the process of developing the first prototype, the project team gained a new understanding of prototype development and its role. The project team now sees prototype development as a useful framework which guides each development stage and within which design and organizational issues are considered and tested. In other words, prototype development will involve testing and creating new organizational and design features by considering the project elements, such as audio clips, abstracts, annotations, and explanatory notes, and their relationships to each other.

The project team learned some practical lessons while designing the first prototype. The initial and fundamental lesson was that a digital project requires appropriate facilities and technology on site. At the outset, the project team assumed that developing a prototype using one sample conference would be a relatively easy task. It became immediately obvious, however, that the team did not have the appropriate equipment on site to produce even one entire sample conference. A minimum of 1.5 GB of computer disk space was estimated for processing and storing the entire conference sample.[10] Despite the initial readjustment, the project team managed to design the first prototype with support from the National Research Council Canada, Institute for Biodiagnostics.[11] The prototype consists of five web pages: the main project page, a panel session page, a reading session page, a writer's page, and a conference page. These pages are hyperlinked to each other and constitute the basic building block of the project.[12]
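
The disk space estimate can be reproduced from the figures given in notes 9 and 10: roughly 15 hour-long videotapes per conference, about 100 megabytes per hour of uncompressed audio, and about 7 megabytes per hour after RealAudio compression. The following Python sketch is illustrative arithmetic only. The function and constant names are hypothetical, and it assumes that 1 gigabyte equals 1,000 megabytes and that the uncompressed and compressed copies are stored side by side.

```python
# Rough storage estimate for one sample conference.
# The three figures below come from the paper's notes; everything else
# (names, 1 GB = 1000 MB, keeping both copies on disk) is an assumption.

TAPES_PER_CONFERENCE = 15      # hour-long videotapes, on average
RAW_MB_PER_HOUR = 100          # audio digitized at 11.025 kHz
REALAUDIO_MB_PER_HOUR = 7      # same hour compressed to a .ra file
MB_PER_GB = 1000               # assumed decimal convention


def storage_mb(hours: int, mb_per_hour: int) -> int:
    """Disk space in megabytes for a number of one-hour recordings."""
    return hours * mb_per_hour


raw_mb = storage_mb(TAPES_PER_CONFERENCE, RAW_MB_PER_HOUR)
compressed_mb = storage_mb(TAPES_PER_CONFERENCE, REALAUDIO_MB_PER_HOUR)
raw_gb = raw_mb / MB_PER_GB
total_gb = (raw_mb + compressed_mb) / MB_PER_GB

if __name__ == "__main__":
    print(f"Uncompressed audio: {raw_mb} MB ({raw_gb:.1f} GB)")
    print(f"RealAudio files:    {compressed_mb} MB")
    print(f"Both together:      {total_gb:.3f} GB")
```

Under these assumptions the uncompressed audio alone accounts for the estimated minimum, before any working space for the server or for editing is counted.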

With the experience of developing the first prototype, the project team learned to be realistic about each developmental step. For example, its initial objective of digitizing one sample conference was revised, and instead, the project team first established a much smaller unit by digitizing a sample tape from one reading and one panel session. Once the first prototype was established, the project team reassessed the next realistic project development step and now plans to continue using content of much smaller scale--probably the archives of half a dozen or so authors--to test the envisioned electronic reference book. The project team foresees the following future prototype development. In the next stage, the project team will use the current prototype as a template to process other selected reading and panel sessions. With the template, the team can further edit accompanying abstracts and explanatory notes, and organize indexes for audio files. After establishing this basic ground, the project team will be ready to incorporate additional annotations, explanatory notes, and appropriate hyperlinks to mold the prototype into the envisioned reference book. The project team will be able to revisit any changes to the core organization (if there are any) and experiment with additional interface features at this time.

Outline of Organizational and Design Elements in the Core Building Block

Three web pages--the writer's, reading, and panel session pages--constitute the core building block of the project. A writer's page will be created for each author, containing a concise biographical note, hyperlinks to reading and panel session pages, a selected bibliography, and selected Internet resources. This page is designed to include basic background information on the author. The reading session page includes an abstract, some graphic images providing the visual flavor of the session, and indexed audio clips. The panel session page is organized similarly.

RealAudio files can be indexed by the starting and ending time of each segment. For the reading page, the project team chronologically segmented each item read and the remarks made by the reader. Each reading item was indexed by title. After analyzing the content, the team also segmented the panel session audio clips and annotated each segment, using free-form sequential and topical highlighting. Many web projects that utilize RealAudio audio clips organize them sequentially, as in the first prototype for the UND Writer's Conference project. For example, the National Public Radio site is a typical news site organizing radio clips sequentially by date of broadcast and by program within the same date.[13] Another typical sequentially organized audio project is the 1992 Presidential Debate at Michigan State University.[14] The Presidential Debate and the NPR sites are essentially archival in nature; that is, they faithfully reproduce the material of the actual events: the debates or, in the case of the latter, the newscasts themselves and other NPR programs. In contrast, the UND Writer's Conference project is not strictly a collection of archival materials, but rather an online reference book. The book will contain contextual items to highlight and support audio clips. The project team currently envisions thematic pages where annotated audio clips will be organized by selected themes. For example, if the theme is "feminism," the project team will gather all relevant comments made on "feminism" by participating authors over the years. A RealAudio metafile (.ram) containing a series of audio clips with comments on "feminism" can be created and set up to play the collected comments one after the other. An appropriate annotation to each audio clip can be added to guide the user. This is an example of the kind of interesting feature the project can include by utilizing the UND Writer's Conference materials and the RealAudio indexing capability.
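
The thematic page idea can be sketched in a few lines of Python. The sketch below is not the project's actual tooling: the clip records, server name, file names, and time offsets are all hypothetical, and the "?start=...&end=..." suffix is only an assumed way of passing segment times to the server that should be checked against the RealAudio documentation. It shows how segments indexed by start and end time and tagged with themes could be gathered into the text of a .ram file, one clip address per line, so that the clips play one after the other.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clip:
    """One indexed audio segment (all sample values are hypothetical)."""
    url: str                  # address of the RealAudio file
    start: str                # segment start time within the file
    end: str                  # segment end time within the file
    themes: Tuple[str, ...]   # topical tags assigned by subject experts
    note: str                 # short annotation shown to the user


CLIPS = [
    Clip("pnm://audio.example.edu/wc/panel_a.ra", "12:30", "15:05",
         ("feminism", "politics"), "Panelist on women writers and publishing"),
    Clip("pnm://audio.example.edu/wc/reading_b.ra", "03:00", "09:40",
         ("ethnicity",), "Reading of a short poem sequence"),
    Clip("pnm://audio.example.edu/wc/panel_c.ra", "41:10", "44:00",
         ("feminism",), "Exchange on language and gender"),
]


def select_clips(clips: List[Clip], theme: str) -> List[Clip]:
    """Keep only the clips tagged with the given theme, in original order."""
    return [clip for clip in clips if theme in clip.themes]


def build_ram(clips: List[Clip], theme: str) -> str:
    """Return metafile text: one clip address per line, played in sequence.

    The query-string form of the time offsets is an assumption.
    """
    return "".join(
        f"{clip.url}?start={clip.start}&end={clip.end}\n"
        for clip in select_clips(clips, theme)
    )


def build_annotations(clips: List[Clip], theme: str) -> List[str]:
    """Return one guiding annotation line per selected clip."""
    return [f"{clip.start}-{clip.end}: {clip.note}"
            for clip in select_clips(clips, theme)]


if __name__ == "__main__":
    print(build_ram(CLIPS, "feminism"), end="")
    for line in build_annotations(CLIPS, "feminism"):
        print(line)
```

Keeping the theme tags and annotations in the clip records, rather than in the metafile itself, means the same indexed segments can be reused on several thematic pages.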

Prototype development, with its successive developmental stages, will continue to address general web layout design and interactive features, possibly using CGI scripts, Java applications, a search engine, and other technical mechanisms for the research tool interface. For example, software that allows on-screen user annotation would, as pointed out by Aigrain, significantly increase the value of the research tool and allow users flexible browsing according to their research needs, regardless of the pre-arranged organization.[15] In addition, metadata indices will be HTML-encoded in order to ensure wide accessibility of the research tool.[16]
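
One way to picture HTML-encoded metadata is as a set of meta tags in the head of each session page, using the Dublin Core element names mentioned in note 16. The Python sketch below is an illustration under stated assumptions, not the project's encoding scheme: the "DC." name prefix follows a common convention for embedding Dublin Core in HTML, the function name and the sample record are hypothetical, and only a subset of the 15 elements is shown.

```python
from html import escape

# A subset of the Dublin Core elements, in the order they will be emitted.
DC_ELEMENTS = ("title", "creator", "subject", "description",
               "date", "type", "format", "identifier")


def dc_meta_tags(record: dict) -> str:
    """Render a metadata record as HTML meta tags, one per line.

    Elements missing from the record (or empty) are skipped, and values
    are escaped so that quotation marks cannot break the attribute.
    """
    tags = []
    for element in DC_ELEMENTS:
        value = record.get(element)
        if value:
            content = escape(value, quote=True)
            tags.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(tags)


# Hypothetical record for one panel session page.
SAMPLE = {
    "title": 'Panel session: "New Journalism and the Novel"',
    "subject": "journalism; the novel",
    "date": "1976",
    "type": "Sound",
    "format": "audio/x-pn-realaudio",
}

if __name__ == "__main__":
    print(dc_meta_tags(SAMPLE))
```

Because the tags are plain HTML, they remain readable by any browser or indexing program regardless of whether it understands the metadata scheme.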

In order to proceed to the next developmental stage, the project team needs to have appropriate equipment and an on-site facility. With the project facility on site, the project team will address quality control methods, applied to the digitized audio files using a sound editor. Also at that time, the project team will design some tools, in the form of simple computer programs, that facilitate the content development and editing process.

Conclusion

Recent increased availability of multimedia development tools and systems in the market will make it more feasible for small to medium academic libraries to start their own digital library projects utilizing unique and valuable humanities collections held at their libraries. This paper summarized some lessons learned by developing the first prototype for the UND Writer's Conference Project and outlined possible future steps with content-based user interface in mind.

References:

[1] Most notable digital library projects are large in scale. The American Memory Project at the Library of Congress and the NSF/ARPA/NASA Digital Library Initiative are examples from publicly funded organizations; the digital archival system set up at Simon and Schuster's Higher Education Group is one from a privately funded organization. A short list of notable digital library projects is available from the web site: .

[2] Aigrain, Philippe, "Image and Sound Digital Libraries Need More Than Storage and Networked Access" p.113. In the paper, Aigrain covered key theoretical and technical issues in developing image and sound digital projects. He pointed out that "Content-based user interface for interactive listening and viewing are maybe the most important components of digital libraries of time-based media".

[3] I would like to thank Prof. McKenzie for sharing with me the detailed history of the Writer's Conference at the University of North Dakota and also for providing timely insights and his knowledge of the intellectual contents of the archival collection. He has been the organizer of a number of past UND Writer's Conference series and also a member of the national writer's conference organization, Writer's Conference and Festivals. Prof. McKenzie is a collaborator and an active participant in the UND Writer's Conference Project.

[4] The UND Writer's Conference was inaugurated in 1970, and the video collection begins with the 8th Conference in 1977. The list of writers and poets who attended past conferences is available from the author.

[5] This analysis of the panel sessions emerged in the discussion with Prof. McKenzie.

[6] See note 2, pp. 112-117.

[7] Progressive Networks, Inc. develops both sound and video multimedia systems and players. For more details, see . Silicon Graphics, Inc.'s WebFORCE MediaBase currently provides what are probably the most comprehensive multimedia development tools in the marketplace. See for more information.

[8] Many primary sources in humanities are held as archival collections in libraries and unless they are deemed to be highly valuable and to have some local or national significance, many are not even catalogued, nor listed on their on-line catalogues.

[9] The Writer's Conference video archival collection consists of a series of panel and reading sessions, covering on average 15 hour-long videotapes for each conference. The whole video archival collection covers over 20 conferences. The video archival collection contains close to 200 reading and 100 panel sessions.

[10] An audio file digitized at a sampling rate of 11.025 kHz from a one-hour videotape requires approximately 100 MB of disk space. When the same file is compressed to a RealAudio file (.ra), the size is reduced to 7 MB. The disk space required for a RealAudio server and digital processing would accumulate substantially.

[11] The project team especially acknowledges Mr. Walter Roberson, System Administrator at the National Research Council Canada, Institute for Biodiagnostics, for his technical support and for making the Institute's Unix facilities available for the project.

[12] Creating these basic building elements also involves establishing a basic file and directory organization on the Unix machine and a naming scheme for each file type.

[13] See http://www.npr.org.

[14] See http://web.msu.edu/debate/debate.html.

[15] For example, Paul Lansky developed software called "marksnd". See the Princeton University Department of Music web site:
The author has not, however, found similar annotation software that is usable with RealAudio files.

[16] For example, the Dublin Core element set has been developed over the last couple of years; it consists of 15 metadata elements. More detailed information is available from the Dublin Core home page:
.

Acknowledgements: The author is grateful to a number of people who helped her finalize the paper. Prof. McKenzie, Mrs. Betty Gard, Mrs. Sandy Slater, Mr. Bob Garret, and Mr. Walter Roberson all provided helpful editing comments on the paper. The author especially acknowledges Mr. Roberson's extensive technical support on the first prototype development.