RLG
DIGITAL IMAGE ACCESS
PROJECT
Proceedings from an RLG symposium held
March 31 and April 1, 1995
Palo Alto, California
Edited by Patricia A. McClung
The Research Libraries Group, Inc.
Cover photographs courtesy of: the Avery Library, Columbia
University; the Bancroft Library, University of California at
Berkeley; Duke University; the Frances Loeb Library,
Graduate School of Design, Harvard University; The Getty
Center for the History of Art and the Humanities (from the
Max Hutzel survey of regional Italian architecture collection);
the Karl Struss Collection of the Amon Carter Museum;
Northwestern University; and the Ramona Javitz Collection,
Miriam & Ira D. Wallach Division of Art, Prints and
Photographs, The New York Public Library, New York City.
© The Research Libraries Group, Inc.
All rights reserved
First published in August 1995
Printed in the United States of America
The Research Libraries Group, Inc.
Mountain View, California 94041 U.S.A.
TABLE OF CONTENTS
Introduction v
Patricia A. McClung
The RLG Digital Image Access Project: Overview and Summary of Objectives 1
Anne R. Kenney
Why Make Images Available Online: User Perspectives 11
Hinda F. Sklar
Audience Discussion 19
Anthony T. Troncale
Options for Digitizing Visual Materials 21
Ricky L. Erway
Audience Discussion 30
Richard Frieder
Description and Access for Digitized Photo Archives 33
Jackie M. Dooley
Audience Discussion 43
Steven L. Hensen
Digital Image Collections: Cataloging Data Model and Network Access 45
Stephen Paul Davis
Audience Discussion 60
Sherman Clarke
UC Berkeley's Participation in the RLG Digital Image Access Project 63
Jack von Euw
Access to Digital Representations of Archival Materials: The Berkeley Finding Aid Project 73
Daniel V. Pitti
Audience Discussion 82
Helena Zinkham
Technical Choices in Digital Imaging: The Technical Images Test Project in Review 85
James M. Reilly
Final Thoughts 95
Anne R. Kenney
Abbreviations and Acronyms 98
About the Speakers 99
Symposium Attendees 102
INTRODUCTION
All of us in the information business are facing enormous challenges as we
make the inevitable, but very uncertain, transition from a paper-based
analog past to an electronic digital future. Thus far, this digital revolution
represents a mixed blessing. It offers enormous opportunities to make information
readily available at the desktop, while it also raises general expectations
beyond what either the technology or our budgets can deliver, at least
in the near term. We can envision a seamlessly integrated online information
environment somewhere down the road, but the fact is that a lot of
work needs to be done to ensure that beauty and order will eventually
emerge from the chaos that currently characterizes many Internet information
searches.
Over the past 18 months RLG has sponsored two pioneering projects focused
on the use of new technology to improve access to and preservation of
photographic images. The first, the Digital Image Access Project, involved
nine RLG institutions: the Amon Carter Museum, Columbia University,
Duke University, the Getty Center for the History of Art and the
Humanities, the Frances Loeb Library of the Harvard University Graduate
School of Design, the New York Public Library, Northwestern University,
the University of California at Berkeley, and the Library of Congress. In
partnership with Stokes Imaging Services of Austin, Texas, these institutions
experimented with an online image management system to try to find economical
ways to catalog and index large photographic collections as well as
to make them more accessible using electronic technology.
The project was motivated by the proliferation of independently conceived and executed,
locally oriented projects to digitize collections at various institutions.
It set out to see what would happen when many collections from different
institutions were merged and to take a broader look at the issues involved
in making integrated, networked access possible. In all, approximately
9,000 images related to the project theme of "The Urban Landscape"
were digitized and included in the project database along with related index
records.
The creation of the database allowed the participants to explore together
many of the very tough issues involved in creating shared "virtual" collections
in an electronic environment. While lots of people are waving their
arms about what this new information infrastructure will be, the project
managers rolled up their sleeves and tried to create a prototype environment
for merged image collections and in the process engaged in some groundbreaking
work. They also experienced some of the frustrations and impediments
that are endemic to any such pioneering effort.
Together with the Image Permanence Institute at the Rochester Institute of
Technology, Stokes Imaging and RLG also undertook a second, parallel
project focused on some of the technical aspects of digitization, including
image capture, compression, display, and output. The project set out to
demonstrate the choices and trade-offs inherent in developing online image
systems. At the heart of the project was the role of photographic and digital
imaging in collections: can surrogate images of original photographic
materials be used both for preservation purposes and for enhanced access
by the users of collections?
In March 1995 RLG wrapped up both projects with a small invitational
symposium in Palo Alto. That event convened project managers from the
nine participating institutions along with experts in the digital-imaging field.
The agenda featured reports on various aspects of the project, as well as discussions
of some of the challenging questions the project generated. The
event also was intended to officially mark the end of a project that could
otherwise go on forever, as well as to create an opportunity to share findings
and think about potential follow-on projects.
This publication brings together the papers from that event along with summaries
of the discussions that occurred. It highlights the accomplishments of
the work these institutions undertook collectively as well as the frustrations
they encountered.
Some of the accomplishments discussed include:
• A prototype cataloging model for networked digital image
access that will be studied and tested further at Columbia
University.
• A side-by-side comparison of two different approaches to
online access for the same collections: SGML encoding and
MARC.
• A test of the feasibility of using very-low-resolution scans in
an online search-only system (which found that this practical
and economical approach to digital capture is inadequate
for the uses that will likely be made of the images).
• Requirements for scanning and display of images online,
which spawned a follow-on research project that will be
pursued by the Image Permanence Institute.
• A testbed environment for RLG and others to further explore
making image collections available via the Internet,
World Wide Web, and other networking tools.
More than anything, the Digital Image Access Project provided a forum for
grappling with some of the toughest problems that must be solved before a
national or international digital library service can be successful. In particular,
it allowed for practical experimentation with what happens when disparate
collections with different histories, provenance, and methods (as well
as levels) of access are merged online. It also provided a testbed for offering
both item-level and collection-level access to images, in hopes of finding
more economical approaches to bringing large groups of images under intellectual
control.
These projects did not solve any of these problems. We do not yet know how
to organize and retrieve millions of images in an online environment both
efficiently and cheaply. We do, however, have a better understanding of the
issues, and hope that future projects will build on what we learned and report
on in this publication. And we believe more firmly than ever that success
will come about only with close collaboration among research institutions,
their faculty, libraries, and computer centers, as well as museums, historical
societies, archival repositories, and other content providers.
Patricia McClung
June 19, 1995
THE RLG DIGITAL IMAGE
ACCESS PROJECT
Overview and Summary of Objectives
by Anne R. Kenney
Welcome to the symposium for the RLG Digital Image Access Project
(DIAP). My job is to provide an overview of DIAP and to place in context
the issues that will be discussed by some of those who did the real work at
the nine participating institutions. I served as a liaison to DIAP from the
RLG Task Force on Photographic Preservation, so my role was more observer
than doer, and I have watched with interest this 18-month effort to
collaboratively work out some of the major issues affecting image access. I
am greatly indebted to James Reilly and Jackie Dooley for sharing copies of
papers they presented on DIAP at the SAA meeting last September. Much
of the background I present here is taken directly from their work.
DIAP was conceived over three years ago when RLG organized its Task
Force on Photographic Preservation to identify and address significant
preservation issues facing photographic collections in research libraries. The
task force was coordinated by Patti McClung, chaired by Jeffrey Horrell,
and consisted of Debbie Hess Norris, Barclay Ogden, James Reilly, Jackie
Dooley, and me. One of the central issues investigated was the application
of digital imaging to photographic materials for preservation and access.
Eventually two projects emerged. The first, and the focus of much of this
symposium, was DIAP. It was designed to investigate issues related to description
and access for digitized photograph collections. As part of the project,
9,000 photographs would be converted to digital images. DIAP would
use low resolution (640 x 480 pixels) to control costs and expedite the retrieval
of images.
The second project, the Technical Images Test Project, would investigate issues
of image quality in digital conversion. It involved only 14 images, but
would explore how image quality was affected by choices made in capture,
display, compression, and output, with a focus on what level of quality is
possible, practicable, and affordable given today's materials and equipment.
Stokes Imaging Services of Austin, Texas, working with Jim Reilly of the
Image Permanence Institute of the Rochester Institute of Technology, generated
a body of data, which Jim Reilly will present in his talk.
By contrast, DIAP, involving many more participants and many more images,
was designed to be an interinstitutional investigation. The response to
RLG's competitive request for proposals to its member institutions in early
1993 was overwhelming. Each participant was asked to select 1,000 archival
photographs for digitization in order to create a testbed of images for
demonstration purposes. No restrictions were placed on the nature of the
photographic originals or the type of cataloging data used to describe them;
in fact, in selecting the participants, a mix of both media and cataloging approaches
was consciously sought. Just two conditions were set: that the
institution hold the right to freely reproduce the images and that the
photographs fit within the broad theme of "The Urban Landscape," so that
the resulting image database would have some subject coherence. Network
exchange of the digitized images was beyond the scope of this project but
figured prominently in our thoughts for future endeavors. The following institutions
were selected to participate:
• The Amon Carter Museum selected images from the Karl
Struss archive. Struss was one of the key figures in the pictorial
photography movement of the early 20th century, and
the project images focus on his work in New York City between
1908 and 1917.
• Columbia University chose photographs from the Avery
Library's Empire State Building archive, as well as from the
collections of three faculty members, including sociologist
Camilo Vergara's "New American Ghetto" and images of
the cathedral at Amiens. Twenty-five architectural drawings
from the AVIADOR project also were digitized for comparison
with AVIADOR videodisc images.
• Duke University selected from seven different collections,
including several that focus on cities and towns of the
Southeast and a collection of architectural picture postcards.
• The Getty Center for the History of Art and the
Humanities digitized a portion of the Max Hutzel survey of
regional Italian architecture.
• The Frances Loeb Library of Harvard's Graduate School of
Design selected 1,000 images from the city planning collections.
• The New York Public Library chose master photographs
from the Ramona Javitz collection, including urban subjects
taken by Dorothea Lange, Berenice Abbott, Walker Evans,
and Lewis Hine.
• Northwestern University digitized a collection documenting
what has been called the first great urban riot, the destruction
of Paris during the siege and Commune of 1870-1871.
• The University of California at Berkeley used images of the
San Francisco area taken from a variety of collections and
depicting the city from the 1850s through the 1920s.
• Due to the extensive imaging experience of the Library of
Congress's Prints and Photographs Division, LC also was
invited to participate. It contributed images from its Civil
War and Jack Delano collections.
Each repository's images were digitized by Stokes Imaging Services of
Austin, Texas, during summer 1993, and the participants met for the first
time in October 1993 and then in May 1994 to discuss approaches to cataloging
and indexing and their relationship to the development of image collections.
Stokes Imaging responded to members' requests by further developing
its retrieval software, Visual Photologue, while the repositories focused
on cataloging and ensuring quality control of the digital images. There were
some delays in completing all data production and software revision, and
just two weeks ago we received the final combined version of the DIAP uniform
database structure and the merged data for all nine institutions. The
CD-ROM contains the image and thumbnail files for 9,000 images, and the
accompanying diskettes contain the program files for the Visual Photologue
software as well as the database files and template.
The RLG Task Force on Photographic Preservation conceived DIAP in very
broad strokes. We envisioned a project to "explore the capabilities of digital
image technology for providing effective access to photographic materials."
We did not get much more specific when it came to our three main objectives:
(1) to explore access and description issues related to making photographic
collections available in digital format; (2) to explore intellectual control
issues, as well as related collection and resource management issues,
faced by institutions in the context of sharing image collections; and (3) to
develop guidelines and models that will assist in decision-making at research
institutions developing online image access systems.
Carving out a feasible agenda for this cooperative project proved a real
challenge, since the entire field of digital imaging is changing so rapidly and
many of the most interesting and important problems require considerable
research and development.
Although any number of issues could have been addressed at this symposium,
we chose four topics and two case studies, believing these to be the
most significant issues faced in DIAP. They also underscore the real limitations
and hard choices that had to be made given the present state of technological
development and of institutional capabilities and commitments.
Our hope is that the presentations will provide the necessary fodder for the
joint exploration of next steps: What can and should we, as a consortium, consider
undertaking as we begin to develop and implement distributed digital
image collections? To help our exploration, each presentation will be followed
by a half-hour audience discussion led by a DIAP participant.
WHAT DO USERS
WANT? CAN WE
MEET THEIR NEEDS?
The first issue to be addressed is the most cosmic: What is our purpose?
Why make images available online? Hinda Sklar will assume the user perspective
in her presentation as she considers the ways people use images,
their needs, and the ramifications digitized images hold for various collection
management and access policies.
I can hardly conceive of a digital project that does not have as its premise
that we will make images widely available online and not just locally. There
are obvious advantages that derive from this premise. As librarians and
archivists, we could focus on the benefits to collection management first, but
all collection management decisions should be informed by the benefits to
be gained for the end user. Can we improve on a service that we already
provide? Can we do what we have always done, but cheaper or faster? Can
we provide researchers with something or some service that we have not been
able to provide in the past? Will we really be meeting our researchers'
needs? Do we know what they are? And what are the ramifications for
how we collect, care for, and make available historical and cultural materials
over time?
We know from earlier studies that users' needs can vary greatly from discipline
to discipline, user to user, or with time and circumstance; that needs
and perceptions are not necessarily the same; and that efficiencies in collection
management do not always benefit the user. There is a whole range of
interrelated issues surrounding user satisfaction that are affected by system
capabilities, connectivity, and institutional objectives. A researcher may patiently
await the arrival of a box of photographs in the reading room, but
drum his/her fingers at the time it takes a large image to come across a network.
Something happens to all of us seated at the terminal. We expect
faster, cheaper, better, and greater control than with physical objects and
traditional means of access. At the screen, timeliness and relevance become
highly correlated.
Although we do need to consider users' needs and how they are affected by
technology, this should not be the sole determinant for how we convert,
manage, and make accessible digital information. User needs will change
with a greater understanding of the capabilities present in making disembodied
information available any time, any place. I am reminded of John
Stuart Mill's comments on freed slaves who do not know what to ask for all
at once. They might be satisfied at first with just getting the shackles off
their wrists and ankles. Later on, freedom might encompass having a say in
where and when they travel or what happens to their children. It will be in
the discovery of the possibilities wrought by this technology that new historical
trends and insights could develop.
Some people claim that once information is freed from the limitation of
geographical location and its physical carrier, scholars will be open to new
ways of conceptualizing not just what they study but how they go about formulating
positions and comparing and contrasting data. In its promotional
literature, for instance, one imaging company contends that its approach to
digital collections will "encourage visual thinking."
There are those who even claim that a new field of research and development
is emerging. It would draw on a number of disciplines to explore the
world in which digital documents are defined as much by their structure and
contextual use as by the nature of the source material itself. Time will tell
how significant the shift from the physical to digital is to the concept of a
document, but speculation and hyperbole are rife.
Some people are beginning to believe that we have oversold the value and
utility of digital libraries. One is the author of a new book entitled Silicon
Snake Oil. Although I advocate digital library development, I share some of
the concerns raised by these people: are we in danger of throwing a party
that no one comes to?
We know, for instance, that, with notable exceptions, humanists lag behind
their colleagues in the sciences and the professions in computer use and network
access to resources. Partly this is because most sources for humanist research
are not available online; partly it stems from the reluctance of humanists
to change tried and true scholarly habits. We need to make a sufficiently
compelling case for doing things in new ways, while providing some
comfort in electronically mimicking the old. We must involve users in deciding
what and how to convert material they will use and in providing ways to
locate it, while also giving them the means for annotating, comparing, contrasting,
synthesizing, collecting, storing, and adapting: all the things they do
with physical documents today. Digital imaging begins with the conversion
process itself, but it does not end there; it extends to the host of related processes involved in
imbuing the digital images with "intelligence" and to the requirements for associated
indexing of structure and content. Presentation and functionality are
also critical to successful adaptation. This is an iterative process, with access
decisions made and changed, made and changed, and refined yet again.
One of my favorite quotes on the future of digital technology appeared last
fall in The New York Times. The paper asked a number of observers of the
brave new world to comment on the future. James Gleick's quote is particularly
compelling: "I have seen the future, and it is still in the future."
WHAT IS POSSIBLE
NOW? ARE WE
READY FOR THE
FUTURE?
Our second presentation will be given by Ricky Erway, formerly of the
Library of Congress and now of RLG. She will present insights into what is
possible digitally given existing technological capabilities and will speculate
on what is coming in the foreseeable future.
It is elemental, but worth repeating, that this is a fast-paced technology.
What is true today will not be true in six months, and maybe not even tomorrow.
I recall making this point to Steve Hensen last year at RLG's
Digital Imaging Technology for Preservation Symposium held at Cornell.
We were talking at dinner and I said, "Steve, it's important to understand
the basic concepts but not get bogged down in the nitty-gritty of the technology
because it changes so fast. It's like owning a car. I know a car has a
carburetor, but I don't need to know what it does as long as my car runs
every time I turn the ignition and it takes me where I want to go." Steve
corrected me: "Anne, cars don't have carburetors anymore." And I replied,
"That's just my point: the technology changes so fast, it's hard to keep up."
Given the state of change, we must define requirements for digital image
systems that are based on immediate and future applications, not on current
technological capabilities. I suggest that no one true path is emerging and that
the current means for data exchange all have serious limitations in terms of
flexibility, functionality, capability, and capacity. This may tempt one to
home-grow a system, but that should be avoided. System support over time
becomes increasingly burdensome, and we should avoid stepping on that
treadmill if possible. Instead we should capitalize on what is being used,
while positioning ourselves for the inevitable changes. As with other issues,
establishing system requirements will not be just, or perhaps even essentially,
a technological issue, but one in which user demands and institutional tolerance
must be carefully weighed.
Which brings me to conversion: there is still so much to understand about
the nature of photographic materials. As will be seen in Jim Reilly's presentation,
full informational capture is difficult with photographs because they
are incredibly dense with information. Although we can approach digital
resolutions that are close to the granularity of the film or print, that informational
capture comes at a price. I believe that we may be close to defining
requirements for preservation reformatting of some text-based materials,
but we have a long way to go in defining digital preservation requirements
for photographic materials.
Fortunately, it appears that access requirements for photographs are not
identical to conversion requirements. The vast majority of on-screen use
may well be satisfied by lower-quality images derived from a fully functional
digital access master, which in turn has been derived from a photographic
preservation master. It also seems, ironically, that we may be closer to satisfying
basic access requirements for on-screen display of photographs than
we are for textual documents. With text-based images, resolution is the
dominant indicator of image quality, and today's monitors cannot effectively
display complete high-resolution images. Users also expect to be able to manipulate
text-based information. I am often asked, what good is a digital
image if I can't search across it? It does not seem to matter that this
question is analogous to asking, what good is a book if I can't conduct a
keyword search across its pages? With photographs, the image is the
image (the desired end product), and resolution is but one of a number of
factors, including tonal reproduction, color representation, and fidelity,
that govern image quality. Currently available monitors can do a good
job in providing recognition quality for photographs that will satisfy many
users' needs.
We need further research to help define conversion requirements. We also
need to study on-screen image quality requirements that take into consideration
not just the characteristics of the source documents, but the whole access
chain, from conversion, to transmission, to display and retrievability,
leavened by user expectations. Some of the work done at the University of
Southern California and elsewhere on progressive transmission of images is
quite promising.
WHAT IS ADEQUATE
ACCESS?
The third presentation, by Jackie Dooley, will focus on issues of retrieval. It
deals with the heart of DIAP: exploring intellectual access and control issues.
DIAP assumed from the outset that there was a tremendous variety
not only in the types of materials to be digitized but in the quality and detail
of the metadata (the cataloging, indexing, and finding aids) associated
with those images. Particularly important was the insistence on preserving
hierarchical order, the collection context, rather than merely assuming item-level
cataloging for each image. Also important was recognizing that not all
items would receive the same level of description and that the retrieval system
had to accommodate those variations. This led to the development of
multiple levels of records. The emphasis on collection context explicitly acknowledged
the importance of economies of scale in cataloging and description
and emphasized the natural connections that exist among groups of related
images.
HOW DO WE
PROVIDE EFFECTIVE
ACCESS IN A
DIGITAL WORLD?
The fourth speaker, Stephen Davis, will examine how the terrain explored
by Dooley is affected when one enters cyberspace. He will discuss linkages:
the options for making images and their corresponding indexing data available
in a networked environment. His presentation will consider the pros
and cons of existing standards, emerging choices, and hybrids of the two.
The notion of linkages set me to thinking about extending the idea of a
hierarchical approach in other directions (other dimensions, if you will).
One dimension, and the one receiving the most attention in this project,
deals with the level of cataloging information and how connections are
made to existing metadata. How well can we provide differentiated access in
a single system, accommodating item-level entries along with collection-level
entries while taking advantage of the digital images in a one-for-one context
as well as through representative examples? For example, can one image really
stand in for 50 cow pictures? Or will it indeed be the case, as one USC
project concluded, that the "success of a digital image database will be dependent
on easy-to-use, yet content-rich indexing"?
Another dimension, perhaps corresponding to width, was considered but
not truly realized in DIAP. It correlates to the number of institutions included
and the connections made among them. This dimension raises concerns
about how well a hierarchical approach can scale: beyond one institution,
beyond nine institutions, and beyond 9,000 images.
A third dimension involves the source documents themselves. What types
and formats of materials shall we include? Photographs sit side by side in
the minds of researchers with monographs, manuscripts, and, increasingly,
multimedia products. We must go further than just accommodating format
variety, however, and recognize the importance of the relationships among
these formats. How do we preserve the relational and functional aspects
of documentation in a digital world? A photograph may represent an artifact,
it may be an artifact, or it may merely illustrate a notion or concept.
How do we convey those relationships and purposes as we link digital
data together?
When one dimension in the hierarchy is increased, when we go beyond one
institution, for example, or beyond one format, the problems become compounded.
When we increase two dimensions, there is an exponential variance
from the norm. The more comprehensive in any and all dimensions we
strive to become, the less effective may be the results at anything but the
highest levels. The notion, for instance, that the digital library initiatives funded
by the National Science Foundation (NSF) will result in the organization
of vast quantities of globally networked data into something approaching "a
single useful information resource" just boggles my mind.
WHAT ABOUT MARC?
SGML? ARE THEY
GOOD CHOICES
FOR CYBERSPACE
ACCESS?
Stephen Davis's discussion of linkage will lead to the last two presentations,
representing two current approaches for providing access, both to information
and to images, via SGML (Standard Generalized Markup Language)
and MARC (Machine-Readable Cataloging). Jack von Euw will present a
hierarchical access model using Visual Photologue and Berkeley's adaptation
of a collection- level access approach. Dan Pitti will follow with a presentation
on the Berkeley Finding Aid Project and the use of SGML.
The use of SGML is intriguing. Initially I was struck by the idea of taking a
text-based approach and applying it to graphic material (in reality it is being
applied to the textual information accompanying the graphic material in
this project), but it begins to get at the heart of how pictures and words interact.
As an aside, one wonders about embedding SGML-tagged data in
the binary code of the image itself as distinct from or in addition to marking
up the finding aid. The work that the Electronic Text Center at the
University of Virginia has been doing in this regard with book illustrations
and other book- related images is tantalizing.
What is appealing about the Berkeley approach, and the approach of
DIAP, is the recognition that different levels of access are provided as they
are represented in finding aids. Both projects challenge the assumption that
effective access is conditional on providing extensive content-enriched indexing.
Both projects assumed pragmatically that current finding aids represent
cost- effective descriptive methodologies and looked to capitalize on that
work in a networked environment. What is disturbing about the approach is
that it may not sufficiently question the status quo. The status quo should be
challenged in the three-dimensional hierarchy wrought by digital technology
and network access. We need to consider how to move beyond the indexing
of distinct collections to providing intellectual links regardless of where the
material is located and who owns it. Our finding aids in the future must be
interactive, not collection bound or repository bound. This assumption raises
a myriad of questions about standard operating procedures:
• What impact could the use of digital imaging technology
and SGML have on the way archival collections are
arranged and described?
• How will the availability of an image database of digital
files affect the need for and extent of textual description
and cataloging?
• What are alternative means for pinpointing access while
maintaining internal context and structure?
• How does the flexibility associated with digital technology
for providing random, multiple, and intercollection access
affect such archival principles as provenance, original order,
sequence, and context, principles that have been defined in
part because of the physical limitations of one place, one
time, one collection, and one creator?
• Can archivists and curators really develop interactive finding
aids to provide intellectual links regardless of where the
original material is located?
• How do we rethink arrangement and description so as not
to merely duplicate actions suited to a paper environment in
the digital world?
• And, finally, what technical and administrative impediments
to network access and collection sharing exist, including but
not limited to network bandwidth, hardware/software requirements,
speed of retrieval, proprietary institutional
mindsets, the need to protect privacy, copyright, and donor
wishes?
So, as will be evident in the presentations, we will raise many more issues
than we can effectively address, which is the sign of a vital project. And we will
discover what we have learned from DIAP: what worked, what did not,
what was the most challenging, and where we can best direct our efforts
as a consortium.
Kenney is associate director of the Department of Preservation and Conservation, Olin
Library, Cornell University.
WHY MAKE IMAGES
AVAILABLE ONLINE
User Perspectives
by Hinda F. Sklar
THE USERS' POINT OF VIEW
Imagine you are an architectural historian located in New York City doing
research on the Empire State Building. You are particularly interested in
finding pictures of the building and its immediate surroundings in the early
part of the 20th century. You decide to visit your favorite research institution,
the New York Public Library, to investigate possible sources of information.
A librarian sends you to the Division of Art, Prints and Photographs,
where you begin your search for images. Your research indicates
that the Avery Library at Columbia University has an Empire State Building
archive and that the photographer Karl Struss, whose collection is at the
Amon Carter Museum, also photographed New York City between 1908
and 1917, and therefore might be a resource for pertinent images for your
study.
Discussion with the division librarian leads you to an online resource located
in the department, the Digital Image Access Project online database, containing
the images and records from photographic collections from nine institutions
across the country. You sit down at the designated terminal and,
after searching simultaneously across all nine databases using the term
"Empire State Building," you cull a series of small images which display together
on your screen. You select six of these images to enlarge and further
study on the screen before you decide to request photographic copies of
three images: two from the Avery Library collection and one from the Struss
collection. You decide to download the three images to some floppy disks so
that you can study them further at home, noting as you do the copyright restrictions
on the images. You then speak to the librarian about contacting
the two institutions to arrange for 8" x 10" photographs to be made. Your
work complete, you gather up your things and prepare to leave. The entire
process has taken you about an hour and a half.
The Research Libraries Group Digital Image Access Project (DIAP), which
has been the focus of our efforts over the past year and a half, had as its
central goal the exploration of "the capabilities of digital image technology
for providing effective access to photographic materials."¹ Photographic materials,
which are a subset of a broader class of visual materials that includes
slides, drawings, paintings, maps, plans, posters, and all manner of two-dimensional
visual works, can be scattered within any given collection or
across collections, may be voluminous in size and fragile in form, and are generally
difficult to manage and handle. Digitization of these unique visual resources
can provide users of our collections unprecedented access to these
materials in an effective, efficient manner while preserving the artifactual integrity
of the individual items. I will talk today about the user perspective on
digitizing visual materials: what are the benefits for the users of our collections,
and how does digitization enrich the scholarly and nonscholarly
community of users?
THE USE OF IMAGES
Images (photographs, paintings, drawings, and such) are unique resources.
Why do people want images? Images serve a variety of functions in
a research or information environment. They can visually illustrate an idea
or concept, such as the concept of perspective or the way light
refracts into a range of colors when passed through a prism. Images can be
used to develop a theory or idea further. For example, Wassily Kandinsky, a
prominent member of the Bauhaus, developed creative theories about color
which formed one of the core principles of the Bauhaus approach to color;
his students used a variety of images to illustrate his color theories. Images
also allow people to understand the visual environment that might serve as
the context of a philosophical idea, a built environment, or a period in social
history. For example, what does the area look like around the National
Gallery in London, and how did that affect or influence the architects' design
of the new wing of that museum? Or, what were the social conditions
in 1919 that gave rise to the Bauhaus movement in Weimar, Germany? And
finally, images can be used simply to provide a visual hook upon which to
hang a hat: an image of the Empire State Building used to represent New
York City, for example, or the famous photograph of Jackson Pollock painting
in his studio used to exemplify modern artists and techniques.
Once people determine what images they might need, they then are faced
with the task of locating them, which is no easy feat these days given the variety of
periodicals, books, and special collections in our public libraries, museums,
and academic institutions around the country and the world. Most users
find themselves consulting card catalogs, printed finding aids, or other research
tools to locate the images they need. Using a variety of controlled vocabularies,
familiar and unfamiliar, and a host of different systems, local and
national, they comb these resources to locate appropriate images. Often,
however, they find the easiest way to locate images is to search through
books on their particular topic to look for visual clues.
When users have located images, the images are then physically handled in
a variety of ways. Users look at the images to ensure their suitability for the
intended purpose, often comparing several images with each other and then
sorting and arranging them by personal criteria. Once researchers have
sorted them into categories for use in a particular context, they may choose
a specific image, or two or three, for which they want copies. The copies
may take the form of photocopies, or users may ask for photographs.
Depending on the intended use for the image, users might, during the
course of their work, need to modify a specific image or manipulate it and
change it in some way. Images, then, are used in different ways by researchers,
historians, publishers, or designers; the intended use of the
images helps determine the nature of the particular chosen image or
series of images.
IMAGES IN DESIGN EDUCATION
I will illustrate some of what I have been discussing by describing how images
play a role in the educational process at the Graduate School of
Design (GSD) at Harvard. The GSD focuses on teaching architecture, landscape
architecture, and urban design to future design professionals. Our
users are primarily visually oriented; they read, of course, but the printed
word ranks far down their list of primary resources! Certainly their first
frame of reference is visual, and they use a healthy proportion of the library's
resources almost exclusively for their visual content.
What do they look for when they go through our materials? For architects,
landscape architects, and urban designers, the products of spatial design are
buildings, landscapes, or urban contexts. The designer's art relies in large
part on the ability to represent the abstract spatial concepts which form the
basis of the designed environment. The design process involves the statement
of a problem, the subsequent definition of a series of possible solutions,
and the selection of a final design solution that, in the eyes of the designer
or design team, best solves the stated problem and also meets a set of
criteria that emerged during the process of defining potential solutions. The
designer may arrive at design solutions using an array of design methodologies
ranging from the study of historical and contemporary precedents in
building or landscape types to the analysis of work produced over a specific
time period by a single designer or by a design movement.
The design process itself draws on a wide range of material: site plans, soil
surveys, census data, zoning codes and other regulatory statutes, maps,
drawings, and sketches. Of necessity the designer amasses a large amount of
data which is consulted throughout the design process. The design student
or professional architect also depends heavily on images (for example, photographs
and three-dimensional models) which are copied, disassembled and
reassembled, or reconstructed in other ways to stimulate his/her design
thinking. The result, regardless of the designer's methodology, is a design
that fulfills both the spatial needs of the problem and the social and contextual
criteria that are developed during the design process.
At the GSD, the search for visual materials leads our users to a variety of resources:
periodicals, books, videos, planning reports, maps, drawings, plans,
and sketches. Our students go through quantities of material very rapidly,
often using them where they find them: on the floor, between stack ranges,
on shelving trucks in the photocopying room. They photocopy materials
heavily, sometimes to actually read something, but more often to play
around with, manipulate, and analyze images. And they scan images from
books and periodicals as well, using a flatbed scanner that produces digital
images that can be manipulated or deconstructed, using a computer, to understand
the underlying design principles.
ONLINE IMAGES: ADVANTAGES TO USERS
As you can see, at the GSD images are paramount to our students, who work
in an environment where constant pressure is the rule and time is a precious
commodity. How then can an online environment benefit those who need
illustrative materials, such as GSD students or users of the New York
Public Library, who range from publishing industry professionals to academic and private
researchers?
First, users can perform a range of searches in a variety of contexts in one
location through one tool: the computer. In addition, computerized access
allows users enormous flexibility in how they formulate and manage searches.
Users can search using specific subject terms through controlled vocabularies,
using keywords, or using form and genre terms. Computers also
provide browsing options to users, for example, from the "top down" (the
traditional archival system of cataloging) or in other logical ways across hierarchies,
subject headings, and indexing terms.
Second, computerized access also allows searching many collections in
many different locations, either in one search or in multiple searches in one
system. This means that users do not have to travel to far-flung sites or
search individually through a multiplicity of finding aids to get information,
including visual information, on widespread collections and resources.
Third, when computerized searching provides access to the images themselves,
users have instant direct access to surrogates of the material. And
electronic access provides new ways of using visual resources. A number of
images can display on the screen at once and be compared easily and rapidly.
Movement between images is simple, and selection of possible images for
use can be done in a relatively short period of time. Studies have shown that
users can make decisions based on electronic visual stimuli much more
quickly than when examining slides individually on a light table, for example.
The images that have been selected can be enlarged and studied in detail
with a simple click of the mouse, and images can be downloaded for
further study and use (provided existing copyright restrictions are followed).
Downloading onto a floppy or the hard drive offers users the ability to import
images into photo-rendering software and then to manipulate and edit
the images and create new images. Downloading also offers the possibility
of transferring images directly onto slides or of projecting the images from
the computer through an overhead digital projector onto a large
screen, thus allowing presentations to be done easily and directly from the
computer.
Fourth, as more and more of the images in our collections are digitized, users
can find materials that span collections and institutions that might otherwise
not be known or available. Broad access can be offered to a unique set of
resources, fostering new intellectual depth and breadth in research.
Fifth, in today's networked environment, resources can be found and examined
from almost anywhere without time constraints, as long as one has access
to a computer with a modem and communications software. The students
at the GSD, for example, are interested in any form of access to resources
that will free up time and allow them to work whenever they want
from wherever they are, which is often at their studio desk at 3 a.m.!
Online access to a database of digital images will enable them to
do what is currently considered library research whenever they want,
whether the library is open or not.
And, finally, in our familiar library/museum context, there is a "one
user/one image" limitation on access to materials. Each piece of visual material
is available to only one user at a time; that is, there are not necessarily
multiple copies of books or images, so that when one slide or book or
photograph is in use by one user, generally no one else can use it. This limitation
constricts users' ability to use all the resources that might exist on a
topic. Digital networked images can be used simultaneously by a number of
users, thus broadening the range of resources available at a given time.
ONLINE IMAGES: ADVANTAGES FOR SERVICE PROVIDERS
What are the advantages of online access to images for those who provide
access to users, the librarians and archivists of our world? If we look at the
list of advantages for users, we see that the same advantages hold true for
service providers.
First, computers offer access via a variety of vocabularies to resources within
and outside our immediate collections. Users can look for things using familiar
terms or, with a bit of guidance, they can string together creative searches
to come up with a narrower or broader range of possible solutions.
The fact that multiple vocabularies and searching techniques are possible
allows users to use terminology and methods with which they feel comfortable
with a minimum of guidance. As the caretakers of images and collections,
it is to our benefit to lead our users to as rich an array of resources
as possible, and giving our users the capability to do so on their own is an
important benefit.
Second, through the effective online cataloging and digitization of related
images, we can provide broad access across subjects and collections to a disparate
set of materials without having to locate images through a multitude
of separate finding aids or catalogs, and then retrieve each image. For collection
managers, this means that we will no longer need to keep at hand the variety of
physical finding aids that we currently do, or to work with our users
to understand how to use that sometimes bewildering array of material.
Neither will we need to pull out and refile boxes of photographs or dozens
of slides for users who are searching for the perfect illustration, or indeed,
provide so many large, flat work spaces for users to spread out those numbers
of photographs and drawings to study.
Third, by providing direct access to digital surrogates of images (which can
be used in so many ways), we can reduce the wear and tear on our originals,
thus preserving those originals for more intensive scholarship and study
needs. The preservation community has long recommended the creation of
surrogates as a preservation tool, and microfilm has been the recommended
format. However, as standards are established for digital reproductions, I believe
that digital formats will become an accepted standard, particularly
since, for certain kinds of material, especially visual materials, the quality of the
digital image is far superior to that on microfilm. Additionally, because digital
imagery offers the possibility of saving a single image in a number of different
resolutions (that is, in a variety of dots per inch), which affect the quality
of the image as well as the amount of storage per image, we can give users
lower-resolution images for study purposes, while higher-resolution images
can be reserved for approved, licensed, copyright-acknowledged-and-paid-for
reproductions. The Getty Art History Information Program (AHIP)
Imaging Initiative is currently pursuing a project that is investigating the acceptable
quality levels for images needed by users for particular classes of
use. As collection managers, we benefit greatly from being able to provide
access to large amounts of material without having to deal directly with the
physical images themselves.
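The link between resolution and storage per image is simple arithmetic, and a short sketch makes it concrete. The Python below is illustrative only: the function names are hypothetical, and it assumes uncompressed files at 1 byte per pixel (8-bit grayscale) or 3 bytes per pixel (24-bit color), ignoring file headers and compression. The 8" x 10" original size is borrowed from the print requests in the opening scenario.

```python
# Hypothetical helpers (not part of DIAP or any named system): estimate the
# uncompressed storage needed for one scanned image at several resolutions.
# Assumes 1 byte per pixel (8-bit grayscale) by default; pass 3 for 24-bit
# color. Real files add headers and are often compressed.


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a scan of a width_in x height_in original."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bytes_per_pixel: int = 1) -> int:
    """Raw storage: total pixels times bytes per pixel, no compression."""
    w, h = pixel_dimensions(width_in, height_in, dpi)
    return w * h * bytes_per_pixel


if __name__ == "__main__":
    # An 8" x 10" photograph scanned at a range of resolutions.
    for dpi in (72, 150, 300, 600):
        gray = uncompressed_bytes(8, 10, dpi)
        color = uncompressed_bytes(8, 10, dpi, bytes_per_pixel=3)
        print(f"{dpi:>4} dpi: {gray / 1_000_000:6.2f} MB grayscale, "
              f"{color / 1_000_000:6.2f} MB 24-bit color")
```

Because the pixel count grows with the square of the resolution, doubling the dots per inch quadruples the storage for each image (under these assumptions, 7.2 MB of grayscale data at 300 dpi becomes 28.8 MB at 600 dpi). That is the practical reason for offering lower-resolution study copies while reserving higher-resolution files for licensed reproduction.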
Fourth, much as computerization provides access to information that might
be spread out across collections, institutions, or countries, digital images can
be distributed among a variety of locations and accessed over a network
from any properly equipped terminal. This capability for distributed location
and networked access will allow librarians and archivists to manage
their digital resources locally while providing widespread access to these resources
to as broad a community of users as they wish.
Fifth, access to electronically delivered resources can occur
without regard to date or time of day: research can take place whenever
the researcher needs information. This can only benefit us as service providers at
a time when fiscal constraints have already shortened open hours in many
collections.
And, finally, as I said earlier, multiple users can access the same material,
thus providing a broader accessibility to limited resources. This network access
capability is the first step in the development of the true library without
walls: open to all, at any time, from anywhere, meeting the information
needs of a community of users.
COPYRIGHT ISSUES
The question of copyright is always raised when people talk about a digital
environment. How do we handle copyright? Are these image databases violating
copyright? What are we doing to protect the copyright owners?
Copyright is a major issue in the development of any widely available networked
resource. Indeed, it has become one of the biggest stumbling blocks
in the move towards the notion of a digital library of the future. In the
United States, the copyright laws protect material from being widely copied
and distributed without permission from the copyright holder except as defined
under the "fair use" provision, which allows one-time copying for a
class or for personal use, following defined guidelines.² How do these laws
translate to the electronic world, where digital technology empowers people
to manipulate and modify information they receive from other sources?
Although no definitive legal statement has been made about image databases,
there is a growing body of research and writing in that area.
Perhaps the clearest positive statement on copyright related to electronic resources
that I have read comes from an article by Mary Kay Duggan, a library
science faculty member at the University of California at Berkeley.
She explains that the fair use principle allows single copies of pieces of information
(sound, text, image) to be used for educational database creation
by direct downloading, OCR scanning, optical scanning, or sound
recording.³
However, copyright is being hotly debated by publishers and vendors even as
we meet here, and the concept of fair use as it applies to digital environments
is coming under close scrutiny. The publishers believe that digitization
offers unheard-of (and irresistible) opportunities for violation of copyright,
and are currently pursuing a path that would disallow the concept of fair
use for the digital world. In the meantime, what can we do?
We can build in features specifically to manage copyright and reproduction
issues, for example, by providing fields in each record associated with the
image that cite the source of that image and the copyright owner. We can
provide a copyright "stamp" below each image. As I stated earlier, by providing
lower-resolution images for certain levels of study we can prevent the
downloading or copying of images that would be usable for other purposes,
particularly publication. We can also restrict downloading or copying of images
based on user ID or IP address; in other words, where we know we own
the copyright and wish to provide our own users with copying capabilities,
we can do so more easily in an online environment. We can also track image
use and copying and build in charging mechanisms as commercial services
do; at the same time, we can keep far better records than we currently
do in an age where everything gets tossed onto a photocopier by users.
For the RLG Digital Image Access Project, the nine institutions involved
used images for which they owned the copyright or for which there were no
copyright issues. At the GSD, where we are currently developing an image
database called DOORS (Design Oriented Online Resource System), we
have been careful to include only images that we know are either out of
copyright or from known, relatively "safe" sources. We are not scanning slide
sets or illustrations from books or periodicals. When students begin to put in
their own material, we will provide copyright guidelines and request that they indicate
the source of their images. But we cannot guarantee that something
copyrighted will not get into the database, and so we have said clearly that
this is an educational tool to be used in the study and teaching of design and
is not a commercial product. We also will not provide access to outside users
until we have in place the security needed to allow look-but-can't-touch
access to users outside the GSD community. In this way we feel we are protecting
our interests and those of possible copyright owners while developing
an important schoolwide resource.
OBSTACLES, CHALLENGES, AND NEW PARADIGMS
Any new technology brings with it certain obstacles that must be overcome.
Copyright is and will remain a thorny issue for some time. There is still a
certain amount of computer phobia in libraries, although it is rapidly disappearing
as younger generations seem to become computer-literate in kindergarten.
The process of creating and maintaining digital online resources is
certainly expensive, not only in terms of equipment and technological resources
but also in manpower: the human resources needed to design, develop,
enrich, and maintain these networked resources. And the technology
is constantly changing; you can be guaranteed that nothing will be fixed for
more than a few years at best.
However, digital technology can open our collections and inspire our users
in ways we may never have imagined. At the same time, it can preserve and
protect our most precious resources for generations of users to come. It is a
difficult challenge, but well worth the effort. New ways of looking (literally)
for or at images may create new paradigms for using images. And, after all,
isn't that what libraries and museums are for: helping users find new ways
of thinking about and looking at the world around them?
Sklar is associate dean for information services and librarian at the Frances Loeb
Library, Harvard Graduate School of Design.
1. Patricia McClung, "RFP for RLG Digital Image Access Project," letter addressed to
RLG member representatives, dated March 23, 1993.
2. T. MJ. Hemnes and A. H. Pyle, A Guide to Copyright Issues in Higher Education. Washington,
DC: National Association of College and University Attorneys, 1992.
3. Mary Kay Duggan, "Copyright of Electronic Information: Issues and Questions,"
Online (1991) 15 (3), 21.
Audience Discussion.
The discussion focused on the changes in reference service, material selection,
and access mechanisms that will be required as more and more resources
become digitized and are made accessible online to local and remote
users.
Participants noted that libraries may not see reading room traffic significantly
increase as the result of online image databases, but online and telephone
reference inquiries most certainly will. The nature of the users will change,
from a known public (namely local students, faculty, and scholars, the traditional
users of academic and research libraries) to an unknown public that
is out there in cyberspace. This proliferation and diversification of the user
population may cause librarians to rethink the way they select and present
visual materials, for instance, so that they strive for a broader, global approach.
Fully one-quarter of the audience indicated that their institutions
have already mounted images on the World Wide Web (WWW).
Reference service. The problem of providing reference service to online
searchers was widely discussed. It was suggested that with online searching,
many users can quickly and easily determine what is of use to them without
the aid of a reference librarian or finding aids, and that digital image
databases will give users even more of an idea of what is available. Future
technological developments, such as automatic document delivery of high-resolution
files, may further lighten the load on reference librarians.
On the other side, however, were those participants who felt that online
users need more than images in a database to find the data they need.
Metadata needs to be included and even expanded upon, for example, to facilitate
access, to provide copyright information, to promulgate policies relating
to electronic and physical access, and, very importantly, to make plain
the association between the library's holdings and its collection policies.
Without the proper finding aids, images and any other digitized information
could prove inaccessible and therefore useless. In essence, all of the types of
information one would expect to receive at the reference desk will still be required
for cyberspace patrons. One participant likened the current era to
the early days of converting the card catalog to electronic form. Just as librarians
and archivists could not anticipate what would happen when the
card catalog was converted, we cannot know now what will happen when
holdings are digitized. Events over time will reveal more.
Questions of selection and access. The difficulties in knowing how to select materials
for digitization in this rapidly changing world of electronic information
access were examined. One strategy at the Library of Congress is to select
public-domain materials (Americana, Civil War photos, for example) and
high-demand materials. Another idea is to select backlogged materials that are not
currently accessible.
Several participants stressed the need to develop models of how users access
materials and what materials they access before devising a selection strategy.
For example, the GSD user and the America Online user are entirely different
and their needs are different, but they may sometimes seek the same information.
It is necessary to understand these different users and the ways they hope to
use the information they discover online. Librarians and archivists have to
figure out how to provide access from different points of view. One participant
pointed to the Getty AHIP Points of View project which, in trying to
discover what types of information art researchers want and how they go
about finding it, studied the subscribers to Smithsonian Online and the
types of queries they posted.
Developing a selection strategy does not mean censorship. Several participants
pointed to C-SPAN as a model, in that the material presented is selected
but unfiltered. The audience agreed that our perceptions of what is
important should not get in the way of what the public may or may not
want.
Retrieval systems were discussed. Suggestions included synonym retrieval
software and the inclusion of conceptual relationships, and emotional,
esthetic, philosophical, and mathematical terms in authority files. The
desirability of multilingual graphical user interfaces to overcome language
barriers was also discussed.
The problem of covering costs. Participants agreed that the resources in our libraries
and archives are invaluable and that digital imaging will give users
increased access to these materials. As demand for information increases
and the costs of providing it remain high, are there ways institutions can recoup
costs? One participant cited findings from the University of Michigan
Library, which stores some materials offsite. These materials are delivered
free of charge to the central library, but faculty can have them delivered to
their door for a fee of $4, an option many faculty choose, according to the
study. It was speculated that many libraries and archives eventually may be
forced to charge users for more and more services just to cover costs.
Anthony Troncale, Prints & Photographs, New York Public Library
OPTIONS FOR DIGITIZING
VISUAL MATERIALS
by Ricky L. Erway
Digital-imaging projects are expensive and should not at this time be considered
a cost-saving endeavor. They can, however, be an efficient way to reach
new audiences, expand access to collections, provide surrogates or preserve
originals, or make previously unavailable materials available.
Digitizing visual materials is a complex process that involves selecting collections
for conversion, conserving individual items to optimize capture and
minimize damage, organizing the collection, cataloging or preparing a finding
aid, choosing quality levels and standards, archiving the files, and finding
the best ways to make them accessible. In this time of increasing demand for
library services and decreasing financial support, it is especially important
that we are aware of all the options and make the right decisions to optimize
our efforts.
Typically, there are two approaches to embarking on a digitizing project.
The more rational approach is the identification of a need or problem and
the consideration of digitizing as a solution. More prevalent today, however,
is the desire to create a digitizing project followed by the decisions of what
to digitize and why. This cart-before-the-horse approach can at least be done
rationally.
WHY ARE WE DIGITIZING?
The first thing to understand is why digitizing is being considered for a particular project. Is it for preservation, for reference access, or to be able to reach broader audiences? The answer will guide many subsequent decisions, such as whether a film intermediate will be created, which standards will be chosen, and what the level of image quality will be. The answer will also affect storage needs, transmission speeds, rights issues, and cost factors.
Preservation. There are two primary ways to realize preservation objectives through digitization: creation of a surrogate to preserve the original or creation of a high-quality reproduction to replace the original. The first is easier to achieve through digitizing than the second.
Reference Access. There are two possible reference access gains: Provision
of new access or provision of improved access. New access can mean access
to materials so fragile they cannot be used in their original form. Improved
access can mean providing more immediate access and more efficient browsing
of large numbers of materials. Either way, this improved access saves
time that researchers can instead put to use on thought and discovery.
As an example, a few years ago at the Library of Congress, a regular out-of-town researcher sized up his task based on his past experience using a particular collection of photographs and made his airline and hotel reservations to accommodate a week of research. When he arrived, he found that the collection had been made electronically available. No longer did he have to have material fetched in small batches from storage and handle it very carefully with white gloves and under curatorial supervision. This time he was able to search for specific things and view them immediately on the screen, and for the first time he could actually browse through thousands of photographs without using any finding aid at all. He reported that he accomplished more in a single day than he had hoped to in a week.
Remote Access. Remote access can mean reaching researchers beyond the walls of the institution. They can now do the research, find what they need, make reference copies, and order reproductions right from their own desks. And remote access can mean providing internal users with access to collections beyond the walls of your institution.
It is difficult and expensive, though not impossible, to achieve all these goals. However, if one is digitizing to achieve any one of these goals, it is wise to consider whether any of the others can be accomplished at the same time. With all the effort and expense, we want to maximize the benefits received.
Some combinations of goals can be more easily achieved. While improved reference access obviously helps the researcher, it also helps the library preserve its collections and, through decreased handling, secure them from theft or damage. Providing remote access can mean that the overcrowded reading room has more readers but fewer bodies.
IS DIGITIZING THE RIGHT SOLUTION?
After determining what our motivations are, we should also consider other means of achieving our ends. Could improved cataloging or finding aids alone allow us to realize the improved access or decreased handling we desire? Might capturing the images on film, on analog videodisc, or in some other medium be easier, cheaper, higher in capacity, quicker to prepare, quicker to access, and easier to distribute to the intended audience?
It is wise to consider whether, when the primary objective is achieved, other objectives might emerge. If it is certain that the images will be accessed only internally, many different approaches can be considered. But it is far better to know ahead of time if we intend to allow other institutions to access the digitized resources. Digital images prepared for access purposes may increase the demand for access to the originals, thereby generating preservation concerns.
WHAT ARE WE DIGITIZING?
After we know why we are digitizing, the next decision is what to digitize.
Again there are a number of approaches.
We can select a collection that is in need of preservation or is in high demand by researchers. We can review past use of our materials and select the most accessed images: the greatest hits approach. We can digitize individual items on demand and add them to our archive. As materials are selected for other reasons, such as conservation, photographic reproduction for researchers, or publication in traditional forms, we can also digitize them. We can identify a new audience and select items or a collection to meet their needs. We can start with items or collections in which there is private-sector interest; private-sector companies may be willing to do or fund the digitizing, or even pay the institution royalties based on usage or sales.
Other factors will influence our selection. What are the rights issues? Do the
materials lend themselves to the technology? Is there uniformity in size, type,
condition, and readiness? Do we have finding aids to access the digital files?
WHAT ARE THE REQUIREMENTS?
Once the materials are identified, we can assess the quality and formatting requirements.
The level of quality will affect cost, storage requirements, and transmission
time. One approach is to provide lower-resolution reference images and continue
to use photographic reproductions to satisfy user needs not met by the
reference images. The intended use of the images or the need to capture
text or other fine details in the originals will affect the degree of resolution
required.
Resolution. Very-high-resolution images will typically necessitate decisions about how they will be displayed. It can be disconcerting for the user to display an image and see only a piece of sky in a landscape scene, or a nostril in a portrait. Will the images be scaled at display time? Are lower-resolution images needed for quicker transmission and meaningful display?
We typically talk about resolution for document scanning in terms of dots per inch (as in 300 dpi). But for photographic materials, we more often refer to resolution in terms of the total number of dots vertically and horizontally (as in 480 x 640, which for an 8" x 10" original photograph means about 60 dpi).
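The arithmetic behind the "about 60 dpi" figure can be sketched in a few lines. This is a minimal illustration, assuming the pixel grid spans the whole original; because 480 x 640 (3:4) and 8" x 10" (4:5) have different proportions, the axis with fewer pixels per inch sets the limit. The function names are hypothetical.

```python
def effective_dpi(pixels_across, pixels_down, inches_across, inches_down):
    """Dots per inch actually delivered when a pixel grid covers an original.

    The lower of the two per-axis values is returned, because that axis
    limits the detail that can be recorded.
    """
    return min(pixels_across / inches_across, pixels_down / inches_down)


def pixels_for(dpi, inches_across, inches_down):
    """Pixel dimensions needed to capture an original at a given dpi."""
    return round(dpi * inches_across), round(dpi * inches_down)


print(effective_dpi(480, 640, 8, 10))   # 480/8 = 60 and 640/10 = 64, so 60.0
print(pixels_for(300, 8, 10))           # (2400, 3000)
```

Working in the other direction, as `pixels_for` does, shows why document-style dpi figures are rarely quoted for photographs: 300 dpi on an 8" x 10" print already means a 2400 x 3000 grid.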
In addition to determining spatial resolution, we need to identify the required
pixel depth. Line art is typically captured as 300 dpi bitonal images.
Pixel depth can increase the apparent resolution of the image, allowing for
lower spatial resolution. Grayscale photos can usually be adequately represented
by 8 bits per pixel, allowing for up to 256 shades of gray. We should
consider if we wish to retain sepia or some other nongray value. If photos
are in color, they are often captured at 24 bits per pixel, allowing for millions
of colors. For color graphic originals, such as posters or watercolors, 8 bits
per pixel may suffice. Sixteen bits per pixel, allowing for 65,000 colors, is a
middle ground. Thirty-two bits per pixel is typically the highest pixel depth
captured. Increases in pixel depth will affect file size more than increases in
spatial resolution.
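The relationship between pixel depth and the number of available colors is a simple power of two: each added bit doubles the count, so 8 bits give 256 values, 16 bits give 65,536 (the "65,000 colors" above), and 24 bits give about 16.7 million. A minimal sketch; the labels in the loop paraphrase the paragraph above and the function name is hypothetical.

```python
def distinct_values(bits_per_pixel):
    """How many different colors or gray levels one pixel can hold."""
    return 2 ** bits_per_pixel


# Depths discussed above. 32-bit pixels often carry four 8-bit channels
# (for example RGB plus alpha, or CMYK), so they are left out of this table.
for bits, typical_use in [(1, "bitonal line art"),
                          (8, "grayscale, or color with a palette"),
                          (16, "middle-ground color"),
                          (24, "full color")]:
    print(f"{bits:>2} bits: {distinct_values(bits):>10,} values; "
          f"{bits / 8:g} bytes per pixel uncompressed ({typical_use})")
```

The bytes-per-pixel column is what drives uncompressed file size: for the same pixel dimensions, a 24-bit file is three times the size of an 8-bit one.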
Low-resolution thumbnail images are often captured at 8 bits per pixel to minimize file size. While 8 bits per pixel is usually adequate for a single image, if the same color palette is to be used for a vast range of subject matter, the colors become quite limited. An alternative is to use adaptive optimized palettes (each image selects its own 256 colors). But if the thumbnail images with adaptive palettes are to be displayed proofsheet style on Super VGA displays, the active image will temporarily impose its palette on the other images.
Another consideration is the effect of the colors used in the end-user environment. For example, in Windows™, a few colors are reserved for menu bars, highlighting, and so forth; these colors can affect the display of the thumbnail colors. This can even affect black-and-white images, for instance, by adding a moldy-green cast. A solution is to capture black-and-white originals as 8-bit color images with adaptive palettes that reserve eight colors for local use.
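An adaptive palette that leaves a few entries free can be sketched as follows. This is an illustration under stated assumptions: it uses the simple "popularity" method (keep the most frequent colors, map every other color to its nearest neighbor), whereas production tools usually use median-cut or similar quantizers; the text does not say how palettes were built. Which eight entries are reserved is also an assumption; here they are indices 0 through 7.

```python
from collections import Counter


def adaptive_palette(pixels, palette_size=256, reserved=8):
    """Build a per-image ("adaptive") palette with some entries left free.

    pixels is a list of (r, g, b) tuples. The palette holds the
    palette_size - reserved most frequent colors; the returned indices
    start at `reserved`, so entries 0..reserved-1 stay free for the
    display environment.
    """
    palette = [color for color, _ in Counter(pixels).most_common(palette_size - reserved)]

    def nearest(color):
        # Index of the palette color with the smallest squared RGB distance.
        return min(range(len(palette)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(palette[i], color)))

    index_of = {}
    indices = []
    for color in pixels:
        if color not in index_of:
            index_of[color] = reserved + nearest(color)
        indices.append(index_of[color])
    return palette, indices


# A black-and-white original stored as 8-bit color: 256 grays, 248 free slots.
ramp = [(v, v, v) for v in range(256)]
palette, indices = adaptive_palette(ramp)
print(len(palette), min(indices), max(indices))
```

Because the image never uses the reserved indices, system colors loaded into those slots cannot tint the grays.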
High spatial resolution and increased pixel depth make compression necessary
rather than optional. For each type of original, we should experiment
with varying degrees of resolution, depth, and compression. Lossless and
lossy compression should be evaluated. If an uncompressed version is to be
retained, we can take advantage of significant compression to minimize file
size.
A 3K x 3K image at 24 bits per pixel results in a 27 MB file; only 24 of them will fit on a CD-ROM. It becomes impractical to consider retaining uncompressed images for large collections. If an average of 30:1 compression is applied, the resulting files will average under 1 MB, and over 700 will fit on a CD-ROM. The risk, however, is that when improved compression algorithms are developed, recompressing a decompressed image may result in deleterious artifacts.
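The storage figures above can be checked with a little arithmetic. The sketch assumes that "3K" means 3,072 pixels, that 1 MB is 1,048,576 bytes, and that a CD-ROM holds 650 MB; the text states none of these, but with them the 27 MB, 24-image, and over-700 figures all come out as given.

```python
MB = 1024 * 1024
CD_ROM_CAPACITY = 650 * MB  # assumption: a 650 MB CD-ROM


def uncompressed_size(width, height, bits_per_pixel):
    """Bytes needed to store every pixel with no compression."""
    return width * height * bits_per_pixel // 8


def images_per_disc(bytes_per_image, capacity=CD_ROM_CAPACITY):
    """Whole images of a given size that fit on one disc."""
    return int(capacity // bytes_per_image)


raw = uncompressed_size(3 * 1024, 3 * 1024, 24)   # "3K x 3K" read as 3072 x 3072
print(raw / MB, images_per_disc(raw))             # 27.0 24
avg = raw / 30                                    # average 30:1 compression
print(round(avg / MB, 2), images_per_disc(avg))   # 0.9 722
```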
Other decisions to be made include whether edges of the original are to be
included in the image, whether masking is wanted, whether enhancement,
such as sharpening, is desired, and whether color bars should be captured to
allow users to adjust the color on their monitors to get an accurate rendition
of the original colors.
Formats and Standards. Decisions on formats will be largely determined by intended use but should take into account longevity issues. Even if the images will be made available only in a local closed system, possible future systems should be considered. The planned-for hardware and software may not always be supported or maintainable. Choosing standard data formats and media will allow for future reformatting so that the digital resources will always be accessible.
Collaboration on standards among libraries, educational institutions, government, and the private sector will be critical to ensure that information digitized by one can be accessed by another. It will also allow for cross-collection retrieval across collections from multiple institutions.
The standards and tools for photographic images are perhaps the best-defined and accepted of all those involved in conversion of archival materials. There are very few large-scale photo-imaging projects I know of that do not use the TIFF image format and JPEG compression.
TIFF is in such common use that it has become a de facto industry standard. It is possible for an image to become separated from its cataloging or other descriptive information; the TIFF format allows storage of a significant amount of information in the TIFF header, such as capture resolution and date, source of the image, file name, version information, or even copyright restrictions. JPEG images can have JFIF (JPEG File Interchange Format) headers that can include similar data.
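The kind of header information described above maps onto standard TIFF tags: ImageDescription, DateTime, Artist, Copyright, and DocumentName for descriptive text, and XResolution/YResolution for capture resolution. The sketch below, using only Python's standard library, writes a tiny uncompressed grayscale TIFF carrying such tags and reads them back. It assumes a single strip, little-endian byte order, and 8-bit gray pixels; it illustrates the file structure rather than any DIAP participant's software, and real projects would rely on an established imaging library.

```python
import struct

# Tag numbers from the TIFF 6.0 specification.
TEXT_TAGS = {"DocumentName": 269, "ImageDescription": 270,
             "DateTime": 306, "Artist": 315, "Copyright": 33432}
ASCII, SHORT, LONG, RATIONAL = 2, 3, 4, 5  # TIFF field types


def write_tiff(width, height, pixels, dpi, text):
    """Return an uncompressed 8-bit grayscale little-endian TIFF as bytes."""
    assert len(pixels) == width * height
    pad = b"\0" * (len(pixels) % 2)
    ifd_offset = 8 + len(pixels) + len(pad)          # pixels follow the header
    fields = {
        256: (LONG, 1, struct.pack("<I", width)),        # ImageWidth
        257: (LONG, 1, struct.pack("<I", height)),       # ImageLength
        258: (SHORT, 1, struct.pack("<H", 8)),           # BitsPerSample
        259: (SHORT, 1, struct.pack("<H", 1)),           # Compression: none
        262: (SHORT, 1, struct.pack("<H", 1)),           # BlackIsZero
        273: (LONG, 1, struct.pack("<I", 8)),            # StripOffsets
        278: (LONG, 1, struct.pack("<I", height)),       # RowsPerStrip
        279: (LONG, 1, struct.pack("<I", len(pixels))),  # StripByteCounts
        282: (RATIONAL, 1, struct.pack("<II", dpi, 1)),  # XResolution
        283: (RATIONAL, 1, struct.pack("<II", dpi, 1)),  # YResolution
        296: (SHORT, 1, struct.pack("<H", 2)),           # ResolutionUnit: inch
    }
    for name, value in text.items():
        raw = value.encode("ascii") + b"\0"              # ASCII fields end in NUL
        fields[TEXT_TAGS[name]] = (ASCII, len(raw), raw)

    tags = sorted(fields)                                 # entries must be sorted
    extra_offset = ifd_offset + 2 + 12 * len(tags) + 4
    ifd, extra = struct.pack("<H", len(tags)), b""
    for tag in tags:
        kind, count, payload = fields[tag]
        if len(payload) <= 4:                             # small values stored inline
            value_field = payload.ljust(4, b"\0")
        else:                                             # larger ones after the IFD
            value_field = struct.pack("<I", extra_offset + len(extra))
            extra += payload + b"\0" * (len(payload) % 2)  # keep word alignment
        ifd += struct.pack("<HHI", tag, kind, count) + value_field
    ifd += struct.pack("<I", 0)                           # no further IFDs
    return b"II" + struct.pack("<HI", 42, ifd_offset) + pixels + pad + ifd + extra


def read_text_tags(data):
    """Read back the descriptive ASCII tags from a file written above."""
    ifd_offset = struct.unpack_from("<I", data, 4)[0]
    count = struct.unpack_from("<H", data, ifd_offset)[0]
    names = {number: name for name, number in TEXT_TAGS.items()}
    found = {}
    for i in range(count):
        tag, kind, n, value_field = struct.unpack_from("<HHI4s", data, ifd_offset + 2 + 12 * i)
        if kind == ASCII and tag in names:
            if n > 4:
                start = struct.unpack("<I", value_field)[0]
                raw = data[start:start + n]
            else:
                raw = value_field[:n]
            found[names[tag]] = raw.rstrip(b"\0").decode("ascii")
    return found


meta = {"DocumentName": "cv03017u.tif",                   # hypothetical values
        "ImageDescription": "Street view, scanned from an 8x10 print",
        "DateTime": "1995:03:31 10:00:00",
        "Copyright": "Reproduction requires permission"}
tiff = write_tiff(3, 2, bytes(range(6)), 300, meta)
print(read_text_tags(tiff) == meta)
```

Because the text travels inside the file, an image that becomes separated from its catalog record still carries its source, date, and rights statement.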
The most common alternative to the TIFF format and JPEG compression is Kodak's PhotoCD system. This system allows for a range of resolutions, from 200 x 100 dots to 3K x 2K. Compatible hardware and software are now widespread. Disadvantages include the use of a proprietary format, imposed file-naming schemes, lack of header information, and the need to digitize from negatives or slides, which, for archival collections, often requires an intermediate capture step. Each set of five derivative-resolution images amounts to about 6 MB of data, so about 100 sets fit on a CD-ROM. The Pro PhotoCD allows for more flexibility.
Accepted standards should also be used for descriptive information. MARC
and SGML are obvious choices for catalog records and finding aids.
Sometimes we are tempted to choose solutions based on today's software tools, but these are not long-term solutions. It is better to adopt accepted standards (for example, TIFF and SGML) and convert from them to meet short-term needs (for example, GIF [Graphics Interchange Format] and HTML [Hypertext Markup Language] for World Wide Web servers).
A third area where standards should be considered is retrieval systems. A Z39.50 server may be the best choice if access is likely to be made available to multiple libraries. For stand-alone systems, the operating system, retrieval engine, storage medium, and user interface will be chosen to best suit local needs.
In determining data formats and quality levels, we should consider end-user requirements. Will the users have readily available playback equipment? Can they use the files in readily available software? High-resolution, 24-bit-per-pixel images will best be viewed on computers with fast processors and high-resolution, large monitors with special video display cards and additional video memory. The speed of modems, network connections, and disk drives will affect access times. Printers should have extra memory, and unless our beautiful 24-bit images are to be halftoned on a standard laser printer, a special printer may be desirable.
File Naming. Another requirement to be considered is how the files are to be named. If unique numbers are already associated with the images, it makes sense to incorporate them into the file names. It is advantageous if, even in a UNIX environment, the file names conform to DOS file-naming requirements (a name of up to eight characters plus a three-character extension). The three-character extension should identify the format of the image. If there are multiple versions of each image, a character should be reserved to differentiate among a thumbnail, a compressed image, a higher-resolution image, and an uncompressed image. If a group of images relates to a single record or a hierarchical level in a finding aid, the file names should left-match to that degree; that is, they should share a common leading portion. Sometimes a file-naming system is worked out ahead of time; in other cases, the file names are created at the time of scanning. In either case, there must be a way to maintain the relationship from the image to its descriptive information.
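A naming scheme along these lines can be sketched briefly. The sketch assumes a hypothetical item number made of a two-letter collection code, a two-digit series, and a three-digit item (for example cv03017), followed by one version letter; the version letters, the extension table, and the function names are illustrative only, since the text prescribes none of them.

```python
import re
from collections import defaultdict

# Hypothetical conventions: one reserved version letter, three-letter extension.
VERSION_CODES = {"thumbnail": "t", "compressed": "c",
                 "higher_res": "h", "uncompressed": "u"}
EXTENSIONS = {"jpeg": "jpg", "tiff": "tif", "gif": "gif"}
# DOS "8.3" names, restricted here to lowercase letters, digits, and "_".
DOS_NAME = re.compile(r"[a-z0-9_]{1,8}\.[a-z0-9_]{3}")


def image_file_name(item_number, version, image_format):
    """Item number + one version letter + a format extension, checked against 8.3."""
    name = f"{item_number}{VERSION_CODES[version]}.{EXTENSIONS[image_format]}"
    if not DOS_NAME.fullmatch(name):
        raise ValueError(f"{name!r} is not a valid 8.3 file name")
    return name


def group_by_prefix(names, prefix_length):
    """Left-match names: those sharing the first prefix_length characters go together."""
    groups = defaultdict(list)
    for name in sorted(names):
        groups[name[:prefix_length]].append(name)
    return dict(groups)


names = [image_file_name(n, "thumbnail", "jpeg") for n in ("cv03017", "cv03018", "cv04001")]
print(names)                      # ['cv03017t.jpg', 'cv03018t.jpg', 'cv04001t.jpg']
print(group_by_prefix(names, 4))  # series cv03 holds two images, cv04 one
```

Grouping on a four-character prefix recovers the series level; a two-character prefix would recover the collection level.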
Directories and Links. Each collection of images should be in its own directory. A system for placing images in subdirectories will also need to be determined; a thousand images in one directory is unmanageable. If the directory-naming scheme is derived from the file names, it will not be necessary to provide the entire path name in the links. If the link provides both the collection name and the file name, a locator file can map the collection name to its location on a storage device, and the file can then be located automatically. In this way, when a storage device is replaced or reorganized, only the locator file need be updated, not each link. At a metalevel, the same approach can be used for institutions and their collections, using a handle server to maintain those links. The evolving use of uniform resource names (URNs), uniform resource locators (URLs), and uniform resource identifiers (URIs) may provide these accommodations.
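The locator-file idea can be sketched as follows. The sketch assumes, hypothetically, a plain-text locator file with one "collection root" pair per line and subdirectories named after the first four characters of each file name; the collection code, paths, and function names are illustrative. Only the locator changes when storage moves; the links stay the same.

```python
from pathlib import PurePosixPath


def load_locator(text):
    """Parse a locator file: one 'collection  storage_root' pair per line."""
    locator = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):       # skip blanks and comments
            collection, root = line.split(None, 1)
            locator[collection] = root.strip()
    return locator


def resolve(link, locator, subdir_chars=4):
    """Turn a 'collection/file_name' link into a full storage path.

    The subdirectory is derived from the file name itself, so links never
    carry a path; only the locator knows where each collection lives.
    """
    collection, file_name = link.split("/", 1)
    root = PurePosixPath(locator[collection])
    return str(root / file_name[:subdir_chars] / file_name)


link = "cv/cv03017t.jpg"
before = load_locator("# collection  root\ncv  /cdrom/disc07\n")
after = load_locator("cv  /raid2/images/cv\n")   # the storage device was replaced
print(resolve(link, before))   # /cdrom/disc07/cv03/cv03017t.jpg
print(resolve(link, after))    # /raid2/images/cv/cv03/cv03017t.jpg
```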
At first, a relatively small number of images linked to a large existing catalog may get lost among its records; as more images are made available, however, it may make sense to maintain a single system that best serves the researcher.
WHO WILL DO THE DIGITIZING?
Once the why, the what, and the how have been addressed, the next question is: will the digital capture be done in-house or on contract?
Since most libraries are finding it difficult to get funding to hire enough staff
for traditional acquisitions, cataloging, and reference work, it is unlikely they
will be able to develop the technical skills to handle the digitizing in-house.
Especially if many digitizing projects are planned, the types and sizes of
materials to be digitized may vary tremendously, requiring multiple specialty
firms with the range of equipment and skills to handle, for instance, posters,
architectural drawings, and an array of different photographic formats.
Contracting requires significant resources, both in contracting dollars and in
staff time to prepare and monitor the contracts. Archival collections may require
onsite scanning under curatorial supervision. The materials challenge
a production-line approach.
En route to digital images, it may be prudent to create a film intermediate. The originals can be captured on film onsite, creating a high-quality preservation copy that can be digitized offsite. Later, if higher-resolution digital images are desired, the film can be rescanned without further handling of the originals.
Contracting out still places significant demands on library staff to prepare
the materials for scanning and to perform quality verification on the reproductions.
The quality review process requires adequate equipment and significant staff time to assess the level of quality and track any rescanning that may need to be done. The effect of wrong, poor, or missing images should be evaluated to determine the appropriate degree of quality control.
Another decision is which activities are best done before digitizing and which after. One approach is to organize and catalog the photos before scanning. In this way, targets can be prepared to help the scanning proceed smoothly.
Another approach is to use the digital versions to sequence and catalog the images. This is especially helpful when the originals are large, fragile, or in poor condition. A by-product of this approach is 100% quality control of the images, although missing images may not be identified.
HOW SHALL WE MANAGE THE DIGITAL FILES?
Will the images be delivered in the form in which they will be accessed? Will the delivery medium be the medium from which the images will be accessed? Or will the images be transferred to another medium? How will they be backed up? Will they need to be refreshed, reformatted, or migrated?
Archiving these digitized materials will require storage devices, staff, and maintenance and security software systems beyond those already in place in most libraries. We need to determine which data should reside online, near-line, or offline. Can we keep lower-resolution thumbnail images online and higher-quality images near-line? Should uncompressed versions of the files also be retained? The movement or renaming of files must be tracked so that users do not reach data dead ends.
High-demand digital resources may be replicated at other locations. There must be a way to update those sites when defective images are replaced or errors in records are corrected. We may need to be able to detect alteration or identify an official version to serve as the authority for the integrity of the original.
NOW THAT WE HAVE THEM, HOW WILL USERS ACCESS THEM?
While providing intellectual access to digitized materials raises many of the
same issues as providing intellectual access to materials in their original
form, the stakes are higher. Uncataloged photographs in a pile or in a drawer
can still be viewed. Digital images without any reference to them may be
lost forever.
Will the images be made available online or on disk, in-house or beyond? Is
appropriate retrieval and display software available? What text should be indexed?
How should the text and images be displayed? What sorts of accompanying
information will be provided: photographer biographies, time lines,
bibliographies?
Searching at a single workstation across multiple collections from multiple libraries
requires the use of standards, both in the digital formats and in the
means of describing and linking to the digital resources. The recent adoption
of the 856 MARC field is a significant step in that direction.
World Wide Web servers have rapidly been adopted as an appropriate means of making collections available to others. The user software is easily and inexpensively available. Powerful searching tools are becoming available. Web browsers can effectively accommodate multimedia content.
SHOULD WE COLLABORATE WITH THE PRIVATE SECTOR?
As the Internet approaches ubiquity, more and more companies are seeking more and more content. This often brings them to libraries, museums, and archives, with a variety of propositions.
We must be somewhat wary of these offers. Many corporations appear eager to help, but we must evaluate the help they want to give. Often they offer to lend or donate equipment, but we know that labor has a more significant impact on our budgets than the acquisition of equipment. Some companies will do the digitizing if they can get an exclusive right to distribute the results. Or they will offer to do the digitizing, but in their proprietary format, compelling the library to use their hardware or software system.
Where there is commercial collaboration, libraries should attempt to retain the right to make the information freely available. The commercial interests will have to add value to the information or to the means of accessing it to make their profit. Those with the skills, talent, and knowledge of the markets can customize products for various audiences. In this way, the libraries, the companies, and the researchers all benefit.
WHAT ADVANCES CAN WE EXPECT?
While there are many incremental improvements that will make our lot easier (faster capture devices, higher resolution, improved compression, greater storage capacities, and improved archive maintenance software), significant advances at the user end, such as faster transmission speeds and improved display and printing devices, are likely to arrive more quickly.
If the World Wide Web continues to be a dominant means of access, Web
software will need to more fully accommodate the needs of libraries. We
need software that understands MARC records, can search and display
SGML-encoded texts, and can display bitonal images. We need software
that accepts current standard file formats rather than, for instance, having to
convert our SGML texts to HTML or convert our TIFF images to the GIF
format. These changes are likely to come quickly.
We need to render the computer a more hospitable and forthcoming host to
the riches within. Navigation to and among the myriad of image databases
that will soon be available on the Internet is an area ripe for refinement.
Improved tools for access are being developed, though there have been delays
in achieving retrieval interoperability. Users and librarians cannot be
expected to learn a new search interface for each new resource.
Implementation of standardized approaches will better enable cross-collection
and cross-institution searching.
The most certain advancement is that costs will decrease as quality and capacity
increase. Although that statement may not mark me as a great prognosticator,
it is a rather encouraging forecast.
Erway is member services officer for digital initiatives at the Research Libraries Group;
at the time of the symposium, she was associate coordinator of the Library of Congress's
American Memory Program.
Audience Discussion.
Collaboration with publishers. The advantages and risks of collaboration with
publishers interested in digitizing library or archival collections were discussed.
A number of participants described their experiences. A key issue is
how to arrive at an agreement with a publisher regarding what will be
scanned and what the institution will receive in return. Ideally, publishers
will provide institutions with copies of the scanned texts in standard formats.
However, actual experiences vary widely. Some start-up publishers offer only
a small fee. Other publishers are willing to provide copies but are unable to
do so in standard formats, thus leaving the institution with an unusable
product.
The desirability of guidelines for negotiating these agreements was discussed.
It was suggested that ALA or another appropriate organization produce
such guidelines to help institutions navigate relationships with publishers.
It was mentioned that the American Association of Museums has developed
a public series of license agreements that will be made widely available
this spring (1995).
Scanning options. Participants also discussed the pros and cons of doing scanning in-house versus contracting it out. It was generally agreed that for most institutions it makes sense to contract for these services rather than try to maintain an in-house operation, which would require frequent upgrades in equipment and staff training. Many institutions underestimate the complexities and rapid rate of change of scanning technology and therefore initially assume it would be best to establish an in-house operation. Although outsourcing may appear to avoid the need to develop in-house expertise, participants pointed out that this is a fallacy: a fairly high level of expertise is needed among in-house staff to make efficient use of a vendor's services.
Whether working with publishers or outsourcing scanning, there was broad agreement that the institution must be an active partner in these collaborations and therefore must develop a significant level of in-house expertise. Collaborating with the private sector does not mean turning all responsibility over to another party. To assure high quality and a product that maximizes advantages to users, the institution has to be knowledgeable and fully engaged when working with the private sector.
Collaboration with faculty and students was also discussed. Some participants
have entered into collaborative relationships with faculty and students
to develop products, digitize materials, or transcribe texts. Faculty and students
can add value to materials that are put online by adding their interpretations
and notes. It was pointed out that, in contrast to the more traditional
definition of cataloging and access, which involves the one-time publishing
of authoritative lists and descriptions, this work may be accomplished
more incrementally over time in the electronic environment, with a variety
of parties contributing to it.
Standards. The issue of standards for scanning projects was discussed at
length. It was pointed out that as institutions look increasingly to funding organizations
for support in scanning projects, these organizations are beginning
to look for standards by which to measure funding requests. There is also clearly a major concern among institutions that their scanning projects be carried out in ways that are compatible with similar activities of other institutions. To address these issues, standards are needed.
The establishment of standards, however, is very difficult in a rapidly changing
environment. We really know very little about what image quality means
and what it means for different types of users. In the digital environment,
we are being asked to do new things for different types of users whose needs
we do not yet understand. In addition, establishing firm standards in such an
unstable environment could have significant drawbacks. Any standards that
are created will have to include a certain amount of flexibility and a mechanism
for frequent change.
Richard Frieder, Preservation Department, Northwestern University Library
DESCRIPTION AND ACCESS FOR
DIGITIZED PHOTO ARCHIVES
by Jackie M. Dooley
As Anne Kenney described in her opening remarks, the RLG Task Force on Photograph Preservation designed the Digital Image Access Project to investigate issues related to description and access for digitized archival photograph collections. As she also mentioned, we consciously sought participants whose submissions would represent a variety of typical approaches to archival cataloging and description. We wanted to confront reality, in which the typical "one image, one record" scenario is utterly unfeasible for most large-scale collections. To do so, we felt we would have to stretch the limits of what the USMARC formats currently offer in terms of linked records, including the inflexible requirement that each record stand independently, and so we could not work directly in a MARC environment. We wanted to examine how our usual assumptions regarding both collection-level and item-level cataloging might have to change when these records become part of an integrated online system that includes images. We also wanted to take a broad view of access and consider aspects of image quality, given that a digitized image's capacity for delivering information might differ greatly from that of an original.
In jointly planning both DIAP and the technical project coordinated by Jim Reilly, the task force also very much wanted a demonstration project that would deliver a tangible product, since visual issues are of such central importance. We sat down to develop the project more than two years ago, and it takes little imagination to consider what a different project we might have planned had we undertaken the task today. Two years ago there was no World Wide Web (for all intents and purposes), there was no Berkeley Finding Aid Project, no Digital Libraries Initiative, no burgeoning National Digital Library Project at the Library of Congress. I think most DIAP participants have felt somewhat stymied at various points, wondering whether we planned the right project or took the best approach. In fact, the best approach has probably changed several times over the intervening two years and, worse, is likely to go right on changing after today.
In this presentation I will trace some of the thinking that went into the project, describe some of our experiences as participants, and look at what we accomplished.
PRESERVING ARCHIVAL CONTEXT IN THE DIGITAL ENVIRONMENT
Archivists are well acquainted with the structured physical arrangement of many archival collections and the finding aids that traditionally describe them. In surveying existing digital-imaging projects, however, we did not at the time see systems that preserve collection context by providing summary descriptions or presenting images in the context of archival series and logical arrangement. Imaging systems typically assume that random presentation of images is acceptable and that an item-level description will accompany each image. In digitizing archival photograph collections, this not only destroys any relevant context but also requires costly item-level cataloging.
The project team therefore focused on demonstration of a hierarchical system of linked records that would allow entry at the "top," followed by a level-by-level descent to the images themselves and any accompanying item-level descriptions. Within such an environment, there would be no need to repeat at lower levels any data elements that are common to an entire collection or series. The concept of inherited data common in archival hierarchical registers could result in economies of data entry, storage, and presentation to users. The system also would enable testing the intelligibility of collection- or group-level cataloging within the imaging context. In the absence of item-level records, we wanted to see how capable individual images are of speaking for themselves when a collection-level catalog record is readily available.
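The inherited-data idea can be sketched with a small tree of records. This is a hypothetical illustration, not the Visual Photologue data model: element names follow Figure 1, the values are invented placeholders, and the rule that a value recorded at a lower level overrides an inherited one is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """One level in a collection > series > item hierarchy."""
    level: str
    data: dict
    parent: Optional["Record"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list, repr=False, compare=False)

    def __post_init__(self):
        if self.parent is not None:
            self.parent.children.append(self)   # allows descent from the top

    def effective_data(self):
        """Inherited values plus this level's own; lower levels override."""
        inherited = self.parent.effective_data() if self.parent else {}
        return {**inherited, **self.data}


collection = Record("collection", {"Creator": "Photographer, A.",
                                   "Collection": "Example photograph archive",
                                   "Genre/form": "Photographic prints"})
series = Record("series", {"Series": "Street views"}, parent=collection)
item = Record("item", {"Title": "Corner, looking east"}, parent=series)

print(item.effective_data())
```

Only the title is stored at the item level; creator, collection, genre/form, and series are entered once and presented with every image below them.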
Participants took arrangement of their images very seriously in planning for
digital presentation. For example, Columbia University staff worked closely
with Camilo Vergara to arrange his documentation of New York City urban
sites following a logical street arrangement, and the Getty Center preserved
Max Hutzel's logical progression of views and details throughout each particular
architectural site. Stokes Imaging assigned file names to the images in
a structured sequence that preserves the desired presentation order when a
set of images is retrieved.
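Why a structured sequence of file names can preserve presentation order is easy to show: plain alphabetical sorting puts site10 before site2 unless the numbers are zero-padded. The prefix, padding width, and function name below are hypothetical; the text does not give the scheme Stokes Imaging actually used.

```python
def sequence_names(prefix, count, digits=3, extension="tif"):
    """File names numbered 1..count with zero padding, e.g. site001.tif."""
    return [f"{prefix}{n:0{digits}d}.{extension}" for n in range(1, count + 1)]


unpadded = [f"site{n}.tif" for n in range(1, 12)]
padded = sequence_names("site", 11)
print(sorted(unpadded)[:3])   # ['site1.tif', 'site10.tif', 'site11.tif']
print(sorted(padded)[:3])     # ['site001.tif', 'site002.tif', 'site003.tif']
```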
THE DEMONSTRATION MODEL
To demonstrate our desired contextual approach, Stokes Imaging agreed to adapt its existing image-retrieval software, Visual Photologue, as an element of its participation in DIAP, with the resulting software to be made available at no cost to all project participants and for sale to others. A uniform data structure, comprising a simplified subset of USMARC data elements, was agreed on with surprising ease for use by all participants; the elements include some access points of particular importance for photographs, such as genre/form (USMARC 655 field) and hierarchical place name (USMARC 752 field).
It is important to note that it was never the project's intention to promulgate
Visual Photologue either as a new set of standard data elements for cataloging
photographs or as a standard software package, although certainly
we hoped it would prove a useful and flexible tool. Rather, DIAP participated
in Visual Photologue development in order to have an appropriate
demonstration platform for exploring description and access issues in the
digital environment, since, as mentioned earlier, it would not have been feasible
to do so using RLIN or any existing MARC-based commercial software
package.
Figure 1. Data elements used in Visual Photologue
Field                               Repeatable?   Maximum no. of characters
RLIN record number                  no            30
Accession/collection number         yes           80
Creator                             yes           80
Collection                          no            128
Uniform title                       no            80
Title                               no            320
Publication date                    yes           40
Dates                               yes           80
Physical description of original    yes           128
Physical description of digital     yes           128
Series                              yes           80
Public note                         yes           2048
Private note                        yes           2048
Proper name subject                 yes           128
Generic subject                     yes           1024
Genre/form                          yes           320
Added creator                       yes           256
Hierarchical geographical place     yes           80
Linking data                        yes           128
Level code                          no            8
CATALOGING METHODOLOGIES
Each institution was given complete flexibility in the level of cataloging to be used. This was done to approximate the typical archival environment, in which the level of description varies enormously from one collection to the next. We felt we could not assume that every digital image requires a full item-level record, since the prohibitive cost of such cataloging could help doom the proliferation of large-scale archival digitization projects. By investigating ways to approximate the cost-effective descriptive methodologies that we already use, we hoped to help institutions focus on the potential that digitization holds for improved preservation and access rather than on the impossibility of traditional full cataloging.
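The two constraints listed in Figure 1 for each data element are whether it may repeat and its maximum length. Together they form a simple schema. The sketch below shows how a record could be checked against them before loading. It is an illustration only, and the paper does not say how Visual Photologue enforced these limits. `FIELD_RULES` and `check_record` are hypothetical names, and a record is assumed to map each field label to a list of values.

```python
from typing import Dict, List, Tuple

# Field rules transcribed from Figure 1: name -> (repeatable, maximum characters).
FIELD_RULES: Dict[str, Tuple[bool, int]] = {
    "RLIN record number": (False, 30),
    "Accession/collection number": (True, 80),
    "Creator": (True, 80),
    "Collection": (False, 128),
    "Uniform title": (False, 80),
    "Title": (False, 320),
    "Publication date": (True, 40),
    "Dates": (True, 80),
    "Physical description of original": (True, 128),
    "Physical description of digital": (True, 128),
    "Series": (True, 80),
    "Public note": (True, 2048),
    "Private note": (True, 2048),
    "Proper name subject": (True, 128),
    "Generic subject": (True, 1024),
    "Genre/form": (True, 320),
    "Added creator": (True, 256),
    "Hierarchical geographical place": (True, 80),
    "Linking data": (True, 128),
    "Level code": (False, 8),
}


def check_record(record: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems; an empty list means the record fits Figure 1."""
    problems: List[str] = []
    for name, values in record.items():
        rule = FIELD_RULES.get(name)
        if rule is None:
            problems.append(f"unknown field: {name}")
            continue
        repeatable, max_chars = rule
        if len(values) > 1 and not repeatable:
            problems.append(f"{name}: not repeatable")
        for value in values:
            if len(value) > max_chars:
                problems.append(f"{name}: longer than {max_chars} characters")
    return problems


# Hypothetical record: two titles break the "not repeatable" rule.
record = {
    "Title": ["Survey of a town", "Alternate title"],
    "Genre/form": ["Photographic prints"],
}
print(check_record(record))  # ['Title: not repeatable']
```

Because each rule is a single table entry, an institution that needs a longer note field or a repeatable title could adjust a limit without touching the checking logic.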
As a result, the DIAP database contains cataloging at varying levels of detail. The completed system's descending hierarchy of records should include a record for each institution and a collection-level record for each contributed collection; the latter would ideally have been cloned from full USMARC RLIN records. Lower-level records vary considerably in every possible respect.
Columbia University, for example, has about 50 collection-level records, each with as many as 200 item-level records and images attached; the extent of item-level data varies depending on both context and availability of information. Columbia University's methodology is that used in the AVIADOR videodisc project at the Avery Library; their published cataloging manual documents the approach in detail. Columbia University and RLG have proposed that a key element of this method be added to the USMARC format, namely the use of a multiple-occurring 789 field (component item entry) within a collection-level AMC (archival and manuscripts control) or VIM (visual materials) record for description of each item in the collection.
The Getty Center contributed images from a single collection; these are described in about 70 subunit-level records, each of which describes the photographs of a particular architectural site. Each site-level record is attached to between five and 125 images; at the item level, each record contains only a negative number. The New York Public Library (NYPL) and the Amon Carter Museum both contributed museum-quality images with preexisting full item-level cataloging; they planned to make decisions regarding the use of the hierarchical structure of Visual Photologue for more economical presentation of data that is common to all the items in a collection. Duke University and the University of California at Berkeley (UC Berkeley) contributed collections that previously were described only in unsystematic paper inventories; they planned to catalog straight into the hierarchical database from the finding aids or the items themselves, and both institutions expected the level of detail to vary enormously depending on available information.
PRESERVING CHARACTERISTICS OF THE ORIGINAL
The participants' experiences in deciding what they would digitize illustrate some of the ways in which these decisions affect digital access to the archival record. For example, the Amon Carter Museum worked from high-quality contact prints made for reference purposes from Karl Struss's original negatives. These prints are made with an eye to faithful duplication of the negative without interpretation, and the museum recognizes that Struss himself would have printed them very differently.
The Getty Center also had a choice between negatives and prints from which to digitize, but took a different approach. It was inclined to work from Hutzel's original negatives in order to use the earliest possible generation of image and thereby best capture the complete pictorial record of Hutzel's photographic survey. There were problems, however: many of the negatives have very poor contrast and have always been notoriously difficult to print. Also, when Hutzel printed his own negatives, he often cropped them severely in order to improve image composition and focus on particular architectural features. Ultimately, the Getty Center chose to make 35 mm intermediates from the original negatives but to crop the digital images to match the photographer's prints in order to approximate Hutzel's creative intent.
Northwestern University digitized both the recto and verso of some images
so as not to lose access to handwritten information, often in French, that
would have been too difficult or costly to key into the catalog records. The
Library of Congress plans to do the same in an upcoming project that will
digitize a large file of presidential portraits that are heavily annotated on the
versos of their mounts. Such data is not searchable and so provides limited
access, but it does enhance the online description of an image and presents
a viable alternative to costly verification and keying of such annotations.
UC Berkeley chose its collections based on perceived high use; hence, images were selected for which viewing prints were already available, and so the originals were not involved in the digitization process at all.
One might argue that some of us spent more time than necessary debating which image format to digitize from and the like, given that all we were going to get was a set of reference-grade, low-resolution image files. These are, however, the issues that any institution would have to grapple with in a "real" project to create higher-quality images, so I feel our energy was well spent.
IMAGE QUALITY AS AN ELEMENT OF ACCESS
Because low-resolution digital images (480 x 640 pixels) were made for DIAP, we all came to realize how significant image quality is as an access element. Because of the variety of original media that was digitized, the anticipated uses of the digital copies, and previous experience, the participants have differing views of the reference value of the images. In several cases, the original photographs contain words (such as a storefront sign) that are illegible in the digital image; higher resolution is necessary to convey basic informational value.
The Amon Carter Museum has found the project images sufficient as a recognition tool. In its museum context, it assumes that most users must eventually consult the original photographic prints, so low-resolution digital copies serve principally to facilitate rapid browsing to narrow the field of originals that must ultimately be handled. Some institutions expected the digital copy to be the service copy and lament the low-resolution images as vastly inferior to black-and-white glossies, particularly for researchers who wish to magnify details, since the low-resolution images pixelate with only minimal magnification. One repository has found its videodisc images to be of higher quality than the DIAP images; another has found the opposite (variations in the hardware used for display may explain the difference, or it may be just a matter of perception).
In general, expectations for digital image quality have risen sharply in the past two years while the cost of disk storage has continued to drop and the average speed and storage capacity of microcomputers have continued to rise. Were the project being designed today, higher-resolution images almost certainly would have been made.
LINKING TO THE RLIN DATABASE
Each DIAP collection is to have a USMARC record in RLIN. This means a single RLIN record for participants such as the Getty Center and the Amon Carter Museum, fewer than ten RLIN records for most participants, and about 50 for Columbia University.
In the past couple of years, the idea has become widespread that finding aids and other resource files should be linked to online MARC catalog records; this has been stated in papers presented at this symposium as a basic assumption of needs. In such a scenario, a researcher who locates a collection-level record in an online catalog could jump directly to a full-text finding aid or a separate image database rather than looking cross-eyed at a note such as "Unpublished finding aid available in the repository."
The existence of research projects such as the British Library's Catriona project offers hope for quick progress. The Catriona proposal states, "There must be a means of enhancing catalogue records so that they can include the information needed to allow the electronic item to be retrieved from a remote networked source. And there must also be client software capable of utilizing this information to retrieve the item. . . . The basis of the means for enhancing a MARC record to allow this kind of network retrieval already exists in the shape of the universal resource locators (URLs) utilized by World Wide Web clients and servers . . . and a development of this, a universal resource name (URN), which is under discussion. Moreover, work is going on to allow this sort of enhancement to become an integral part of accepted cataloguing standards."
In the USMARC community, work is proceeding apace to agree on final details for implementation of the 856 field (electronic location and access), which is the key to linking MARC records to other resources, and commercial vendors of online systems are moving forward to effect the links.
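The following sketch shows what such a link might look like when displayed. It follows the 856 field as currently defined in MARC 21, where first indicator 4 means access by HTTP, second indicator 1 means the link leads to a version of the described resource, subfield $u holds the URI, and $z holds a public note. The 1995 draft details may have differed. The URL and note are hypothetical, and `format_856` is an illustrative helper, not part of any cataloging system.

```python
from typing import Optional

SUBFIELD = "$"  # common display convention for the MARC subfield delimiter


def format_856(uri: str, public_note: Optional[str] = None,
               ind1: str = "4", ind2: str = "1") -> str:
    """Render an 856 (electronic location and access) field for display.

    ind1 "4" = HTTP access; ind2 "1" = the link points to a version of the
    described resource (here, a set of digitized images).
    """
    parts = [f"856 {ind1}{ind2}", f"{SUBFIELD}u {uri}"]
    if public_note:
        parts.append(f"{SUBFIELD}z {public_note}")
    return " ".join(parts)


print(format_856("http://images.example.org/diap/collection-001/",
                 "Digitized images from this collection"))
```

Added to a collection-level RLIN record, a field like this would let a catalog user move directly from the description to the image database.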
PROJECT GOALS AND ACHIEVEMENTS
DIAP was intended to stimulate thinking about a variety of description and access issues. How did we do? The following is a brief discussion of five key issues and the progress I think we made.
1. Preservation of Archival Context. We sought to demonstrate the importance of preserving archival context in the digital environment by developing systems that maintain links between collection-level descriptions and the specific images they describe, and that allow users to navigate the contents of a collection from the top down, to browse in a logical, predetermined sequence, and to query for particular pieces of data. In asking Stokes Imaging to adapt its existing "one image, one record" software to such a model, we gave it a tall order to fill within a limited time. The difficulties of designing an interface that would be simultaneously complex enough to accommodate our hierarchies, flexible enough to allow for a variety of differences among institutions, powerful enough to satisfy the searching sophistication and speed to which we have all become accustomed, and intuitive enough for our users to learn easily were all brought home to us in spades. We ultimately opted not to conduct user evaluations because participants generally felt that the hierarchical Visual Photologue needed enhancements to allow easier retrieval by nonstaff users.
At the same time that we were working on Visual Photologue, however, other approaches came to our attention that also appear capable of maintaining archival context by allowing top-down navigation, browsing in a logical sequence, and focused queries.
UC Berkeley undertook its Finding Aid Project, and Jack von Euw will report on their experience working with both Visual Photologue and the Finding Aid Project, which uses DynaText software for navigating, browsing, and querying SGML-tagged finding aids. UC Berkeley's experimentation with both approaches has been fascinating, given the head-on comparison made possible by entering the finding aid for a photograph collection into both Visual Photologue and the SGML environment. I see the greatest strength of the SGML approach in its ability to navigate the text in finding aids; inline images can be inserted within the text and enlarged for better viewing, but browsing of the images per se is inelegant.
An extraordinary development of the past two years, known to all but Rip Van Winkle, has been the emergence of NCSA Mosaic, the first widely used graphical hypertext interface to the World Wide Web. Columbia University, never one to sit idly and wait for others to work out the kinks, immediately mapped the RLIN records for a selection of their DIAP images to HTML and mounted both the records and accompanying images on a Web server; higher-resolution images from Kodak Photo CDs were made available for dramatic comparison with DIAP low-resolution images. These same RLIN records also were mapped into Visual Photologue, offering a different comparison from that described for UC Berkeley. The Library of Congress's World Wide Web implementation also was constructed from records mapped into HTML from item-level MARC records and includes useful introductory material from narrative finding aids and other structural devices to preserve collection context. As with Berkeley's DynaText environment for browsing finding aids, however, browsing of text is primary; images are somewhat secondary.
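Columbia University's actual mapping from RLIN records to HTML is not described in this paper. The following is only a minimal sketch of the general technique under stated assumptions. A record is taken to be a mapping from field labels (as in Figure 1) to lists of values, and each record becomes one static page showing its image and a definition list of fields. The function name `record_to_html`, the page layout, and the sample URL are hypothetical. Text is HTML-escaped so that characters such as "&" in titles display correctly.

```python
from html import escape
from typing import Dict, List


def record_to_html(record: Dict[str, List[str]], image_url: str) -> str:
    """Render one catalog record and its image as a simple static HTML page."""
    title = escape((record.get("Title") or ["Untitled"])[0])
    rows = "\n".join(
        f"<dt>{escape(label)}</dt><dd>{escape(value)}</dd>"
        for label, values in record.items()
        for value in values  # repeatable fields become repeated rows
    )
    return (
        f"<html><head><title>{title}</title></head><body>\n"
        f"<h1>{title}</h1>\n"
        f'<img src="{escape(image_url)}" alt="{title}">\n'
        f"<dl>\n{rows}\n</dl>\n"
        "</body></html>"
    )


page = record_to_html(
    {"Title": ["Street view & storefront"], "Genre/form": ["Photographic prints"]},
    "http://images.example.org/0001.jpg",
)
print(page)
```

A page like this puts the description first and the image second, which is the text-centered browsing pattern noted above for Web and DynaText implementations.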
One final thought on preservation of archival context: Anne Kenney noted in a private communication last fall (and again in her remarks at this symposium) that in the digital environment we have an opportunity to go beyond our current definitions of archival context, which assume a variety of limitations imposed by the physical carrier and geographic location of a collection. She cautioned against developing systems that will preclude new links, discoveries, and connections. Her point is well taken; I would encourage us, however, neither to passively ignore nor to actively reject past principles without careful thought.
2. Affordable Methodologies. We sought to demonstrate the value of developing affordable methodologies for describing and accessing digital images in groups, when appropriate or necessary, rather than assuming that item-level description is the only acceptable approach. The extent of item-level descriptive information provided by project participants varied widely, from the complete item-level cataloging employed by NYPL and the Amon Carter Museum to the Getty Center's lack of any item-level data other than negative numbers. In looking at the combined database, therefore, we have an opportunity to see the effects of such variance.
The Getty Center simply took existing site-level catalog records and imported them into Visual Photologue in order to judge whether the level of cataloging for the nondigital Hutzel collection would suffice in an imaging system. Doing so left behind secondary data that normally is used with the collection, such as captions pencilled on photo versos and textual documentation located in reference files. For a true digital counterpart, more of this data would have to be entered into the online system.
The need for this will vary enormously from one collection to the next, depending on the heterogeneity of the individual images represented by a single collection-level record. Unless the system's navigation makes the hierarchical relationship between collection-level record and individual image very clear, however, users are confused by the lack of item-level description. In this regard, Mosaic implementations generally seem more effective than Visual Photologue, since Mosaic users so very consciously move between levels, as in the American Memory implementation.
3. Low-Resolution Images and Access. We sought to demonstrate the importance of determining under what circumstances, and for what purposes, low-resolution image quality will suffice for access. As mentioned, project participants varied widely in their opinions as to whether the DIAP images were useful surrogates, but we can carry away the knowledge that low-resolution images do have some very real virtues. They remain substantially cheaper to make and consume vastly less disk storage, and, as other speakers have pointed out, the use of "reference grade" images evades many of the thorny copyright issues that confront us. Low-resolution images also offer at least one great advantage for access, particularly in a networked environment: speed. The lower the resolution, the more quickly an image pops onto the screen, and if quick browsing through many images is desired (and this, after all, is one of the great access benefits of online images), low-resolution copies are very useful.
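A little arithmetic makes the storage and speed advantages concrete. The sketch below makes assumptions that the project did not specify: images are uncompressed 24-bit color, and they travel over a 28.8 kbit/s modem with no protocol overhead. Real files would normally be compressed, and grayscale images would be one-third the size. For comparison, it uses 3072 x 2048 pixels, the largest scan size stored on a Kodak Photo CD.

```python
def uncompressed_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Size of an uncompressed raster image in bytes."""
    return width * height * bits_per_pixel // 8


def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Time to send a file over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


diap = uncompressed_bytes(640, 480)        # DIAP reference-grade image
photo_cd = uncompressed_bytes(3072, 2048)  # largest Kodak Photo CD scan
MODEM = 28_800                             # assumed dial-up link, bits per second

for label, size in [("640 x 480", diap), ("3072 x 2048", photo_cd)]:
    print(f"{label}: {size:,} bytes, about "
          f"{transfer_seconds(size, MODEM):,.0f} s at 28.8 kbit/s")
print(f"size ratio: {photo_cd / diap:.2f}")
```

Under these assumptions, a DIAP-sized image is 921,600 bytes and arrives in about four minutes. A full Photo CD scan is about 20 times larger and takes roughly an hour and a half. Compression shrinks both figures but leaves the proportion between them broadly the same.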
We also know the arguments against low resolution: there can be loss of basic information; the images lack the tonality and detail of photographic prints (as do even very-high-resolution digital images); substantially higher resolution is required for magnification of details; and it will be necessary to return to the original negatives to make prints unless very-high-resolution archival digital files are stored. We will feel very silly some day returning to the originals to create higher-resolution digital files when we can afford them. Unfortunately, we also know that unless costs of every stripe drop precipitously, none of us are likely to be able to afford high-resolution copies of more than a small percentage of our collections. As usual, we each have to decide what is most important and make decisions accordingly. Is it low-resolution networked access to massive collections? Is it high-resolution storage of high-spot items? Needless to say, there is no one answer, but I doubt we will be able to have it both ways.
4. Data Creation, Image Access, and Affordable Image
Databases. We sought to demonstrate the value of using an affordable
image database system for data creation and image access. All but three
participants migrated existing data from another system, so most of us did
not test the data creation aspects of Visual Photologue. This was not true
for UC Berkeley and Duke University, which began from printed finding
aids, or for Northwestern University, which began with no cataloging data
whatsoever. Jack von Euw will report on UC Berkeley's experience entering
data into Visual Photologue and designing a set of hierarchical schemata
tailored to their DIAP collections.
All of us were able to evaluate Visual Photologue's ability to provide image
access. As an environment for quickly and flexibly delivering images to users,
the system proved elegant. Rapid delivery of many images, flexibility in
image sizing, and immediate access to information describing the digital file
are all features of Visual Photologue not yet easily replicated in an environment
such as Mosaic. In our zeal to have networked access to collections, let us remember the very real virtues of a system like Visual Photologue, which are worth replicating.
5. Image and Data Migration from Local to Networked Systems.
We sought to demonstrate the challenges involved in migrating images and
data from local to networkable systems, and the feasibility of linking image
databases with descriptive records such as the collection-level records residing
in RLIN and in local online catalogs. At the recent RLG Primary
Sources Forum, I am told, participants strongly recommended that RLG actively
work to enable linking of databases containing finding aids, images,
and other source materials with RLIN catalog records; clearly, such connectivity
is now a widely shared goal. The RLG Task Force on Photograph
Preservation also recognized distributed network access as a critical ultimate
goal of any imaging system, but we realized it was premature to hope this
could be an element in our demonstration project. The ability to link to networks,
or at least to easily export data to networks, is a design specification
we must transmit loud and clear to all software designers with whom we do
business.
WHERE DO WE GO FROM HERE?
DIAP per se was a baby step, but it stimulated much productive thinking. The participants remain committed to investigating the feasibility of group-level or collection-level cataloging within the context of an imaging system, even though the DIAP prototyping effort demonstrated what a challenge it can be to implement such an approach. Comparison between Visual Photologue's structured database environment and the more flexible hypertext approach afforded by Mosaic/Netscape and the World Wide Web indicated frustrating trade-offs: Visual Photologue was the more effective delivery mechanism for images, while Mosaic/Netscape won hands down for flexible treatment of cataloging data.
Most participating institutions did not, in the course of this project, consciously question their traditional approaches to cataloging photographs. The majority loaded existing cataloging data roughly as is, which was both practical and desirable in the context of representing "reality," and we assumed the evaluation would come later. However, the mixed success of the prototype system (hence our subsequent decision to eliminate formal end-user evaluation), coupled with endless technical difficulties, prevented this evaluation from occurring before formal completion of the project.
Participants have continued to collaborate and exchange ideas since the project's end, and additional DIAP data will inevitably be made available via the World Wide Web and other, more up-to-date software environments. It is heartening to know that, stimulated by projects such as DIAP, the archival community's penchant for collaborative analysis and cooperation will press us forward.
Dooley is head of special collections and university archives at the University of
California at Irvine.
Audience Discussion.
The discussion focused on some of the practical aspects of using Visual Photologue as a platform for exploring the issues that DIAP had set out to address, while also touching on some of the more theoretical aspects of modern information system architecture and the implications for network access to archival image collections. The discussion thus served to highlight the failure of the project to fully meet its original objectives, while underlining that the issues the project did address were probably ultimately more important.
The question of network access. Because of rapid developments in network access to information outside the project, it quickly became clear that Visual Photologue was not the answer to the questions the project was posing, especially because the question of establishing dynamic links between the database and RLIN was never effectively answered. Although most agreed that the database did provide a useful level of stand-alone functionality, it was clear that more dynamic network environments like the World Wide Web offered a more promising approach to answering the access questions originally posed.
Description and access standards. Although the project originally intended to address description and access standards seriously, this was never done. In part, this was due to the way Visual Photologue evolved: participants had to provide descriptive data to accompany their images, which put a premium on using existing descriptions. In part, it was also a tacit recognition that individual repositories were not going to create an entirely new approach to cataloging for the purposes of this project, or that if they were, this was neither the time nor the place to do it.
The diversity of the project group, while certainly one of its strengths, was also problematic in this regard. Each institution saw its needs and mission as being different, and these differences were reflected in their respective approaches to the description of and access to their collections, much of which translated into differing approaches to questions of item-level versus collection-level control and to structural and descriptive hierarchies. Ultimately, participants agreed that the diversity of approaches, insofar as it reflected the "real world," made both the project and the Visual Photologue design stronger.
Some abstract considerations. Related to the discussion of description and access standards were more abstract considerations: confusion between the physical and the logical when discussing item records and hierarchies, and the importance of recording what is known at any given level, regardless of how that level might be defined locally. A related question was raised about the possibility of substituting images for descriptive data or, alternatively, using one image to represent a group of images. Such an approach, however, would require capturing a significantly higher level of image detail than was possible in this project.
Levels of descriptive information. It was noted that libraries and archives need to understand that it is simply no longer possible to supply complete, authoritative descriptive information on all holdings that will satisfy the almost unlimited number of audiences for which the material might have utility or relevance. The best an institution can do is supply basic core-level data with some degree of authoritativeness. Other value-added information can then be provided by vendors and scholars as material is used. However, on a practical note, with respect to DIAP and cataloging records with attached digital images, the question of what is basic was never addressed. Does such an approach require less description or more?
Choices brought into sharper focus. The discussion finished with a more practical
focus on the development and features of Visual Photologue, including
questions relating to the (over)abundance of fields and levels and their impact on searching; the difficulty (for both sides) of working with a systems staff from which the participants were both physically and temporally removed;
and the inadequacy and disparity of the various approaches to
screen design. Nonetheless, the system's various strengths were noted,
chiefly its ability to bring into sharper focus the choices faced by archives
and libraries in making digital representatio