Archimedes Palimpsest Digital Release README Document
Date: October 29, 2008
1 Rights and Conditions of Use
The Archimedes Palimpsest data is released with license for use under Creative Commons Attribution 3.0 Unported Access Rights. It is requested that copies of any published articles based on the information in this data set be sent to The Curator of Manuscripts, The Walters Art Museum, 600 North Charles Street, Baltimore MD 21201.
2 Intended Audience and Consumers
The Archimedes Palimpsest Digital Product is intended to serve any interested user or party. However, its content is focused on serving the following groups.
Scholars of Greek and mathematics
Application providers
Libraries and archives
Image scientists, and scientists in other disciplines interested in the production of the images
3 Digital Project Data Set Purpose
The Archimedes Palimpsest Digital Product provides all the digital information available on the Archimedes Palimpsest in a single digital data set, with a standard structure. Its purposes are threefold:
Serve as the authoritative digital data set of images in a standardized format that meets the needs of users, information providers, archives and libraries.
Provide derived information (i.e. transcriptions, processing information) in the context of digital images of the original manuscript in a single integrated package.
Offer a standard product sustainable by users to which current or future contributors can add additional standardized information (e.g. alternate texts, image analyses or conservation information).
4 Data Set Contents
This data set consists of:
a core content set of digital images and transcriptions of the Archimedes Palimpsest, each with accompanying metadata and checksums
project-generated and third-party documentation of all included components
supporting functional files, including XML schemas and cascading style sheet files
supplemental versions of the transcriptions by treatise and work
a directory for researcher contributed content files, not a part of the core data set
4.1 Core Data Content
The core content of images and supporting transcriptions is the focus of the Digital Product. For each folio, a comprehensive set of registered images of the palimpsest is provided. Available transcriptions are provided to support use of the images.
For this release, the core data includes:
Image data consisting of large 8-bit image files, including requantized raw images, processed pseudo-color images, registered Heiberg images and registered XRF images. All these files include embedded metadata and metadata files. (Note: The original 16-bit Archimedes Palimpsest images are available at http://mirrors.rit.edu/archie/post-2007/HTML_TIFF/ thanks to Imaging Scientist Roger Easton.)
A set of TEI (Text Encoding Initiative) conformant XML-tagged Unicode transcriptions of all Archimedes and Hyperides texts, including embedded metadata and associated metadata files.
Spatially mapped transcriptions for each palimpsest folio of the Archimedes and Hyperides texts.
For each folio in the palimpsest, the data set provides:
All eight-bit raw and processed registered TIFF images for the directory’s folio, including XRF images or images of prints of Heiberg’s 1906 photographs, when they exist, images of photographs of an unfoliated palimpsest leaf from Cambridge University, and an image of a negative of folio 57v from the University of Chicago.
For all of the Archimedes and available Hyperides texts, an XML encoded transcription of the directory’s folio spatially mapped to all the registered images in the directory
An XML metadata file for each of the TIFF files in the directory [forthcoming]
An MD5 checksum file for each of the TIFF and XML content files
All file names follow strict naming conventions to facilitate easy identification of file type and content.
The core content set contains folio-by-folio versions of the Netz-Wilson transcriptions of the Archimedes texts and of the Hyperides transcriptions, together with line-by-line text-to-image spatial mappings, in integrated files. These files collect in one place the transcription mapping data for all images of a single undertext folio.
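This ReadMe does not reproduce the element names used in the integrated transcription files; those are defined by the TEI and project schemas. As a first orientation step, a user might survey which elements a given file contains. The following is a minimal sketch using only the Python standard library. The function names local_name and summarize_elements are illustrative and are not part of the data set or its tools.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_elements(xml_source):
    """Count elements by local name in an XML file (path or file object).

    The file is parsed incrementally so that large transcription
    files do not need to be held in memory as a full tree.
    """
    counts = Counter()
    for _event, element in ET.iterparse(xml_source, events=("end",)):
        counts[local_name(element.tag)] += 1
        element.clear()  # release the element's children once counted
    return counts
```

Calling summarize_elements on the path of a transcription file returns a Counter keyed by element name, which can then be compared against the TEI documentation supplied with the data set. The sketch only checks that the file is well-formed XML; it does not validate against any schema.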
In addition to its images and transcriptions, each content directory provides preservation information in the form of:
Metadata embedded in image files
XML metadata files for each image [forthcoming]
Metadata embedded in the mapped transcription file
MD5 checksum data for all TIFF and XML files to ensure their fixity
The metadata for images and transcriptions complies with the Archimedes Palimpsest project metadata standards, which are provided with this set as documentation. The metadata provides investigative, data sharing and scientific information on the images and transcriptions.
Metadata are data elements about the content, quality, condition, and other characteristics of the data sets that make up the digital holdings. Metadata records are produced according to rules and definitions governing several subtypes:
Identification Information
Spatial Data Reference Information (images and spatial indexes, only)
Imaging and Spectral Data Reference Information (images only)
Data Type Information
Data Content Information
Metadata Reference Information
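To inspect metadata embedded in an image file without special software, the tag directory of a TIFF file can be read directly. The sketch below follows the TIFF 6.0 file layout (byte-order mark, magic number 42, and 12-byte directory entries) and decodes only BYTE, ASCII, SHORT and LONG fields from the first image file directory. It is an illustration under assumptions: this ReadMe does not state which TIFF tags carry the project metadata (the project metadata standard documents are the authority), and the function name read_tiff_tags is hypothetical.

```python
import struct

# TIFF 6.0 field types handled by this sketch: size in bytes and struct code.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4}   # BYTE, ASCII, SHORT, LONG
TYPE_CODES = {1: "B", 3: "H", 4: "I"}


def read_tiff_tags(data):
    """Return {tag_number: value} for the first IFD of a TIFF byte string."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")

    (entry_count,) = struct.unpack_from(order + "H", data, ifd_offset)
    tags = {}
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i
        tag, field_type, count = struct.unpack_from(order + "HHI", data, entry)
        size = TYPE_SIZES.get(field_type)
        if size is None:
            continue  # field type not decoded by this sketch
        total = size * count
        # Values of 4 bytes or fewer are stored inline in the entry;
        # larger values are stored elsewhere and referenced by offset.
        if total <= 4:
            start = entry + 8
        else:
            (start,) = struct.unpack_from(order + "I", data, entry + 8)
        raw = data[start:start + total]
        if field_type == 2:  # ASCII, NUL-terminated
            tags[tag] = raw.rstrip(b"\x00").decode("ascii", errors="replace")
        else:
            values = struct.unpack(order + str(count) + TYPE_CODES[field_type], raw)
            tags[tag] = values[0] if count == 1 else values
    return tags
```

For example, after reading a file's bytes with open(path, "rb").read(), the entry tags.get(270) holds the standard ImageDescription field if one is present, and tags.get(256) holds ImageWidth. Because the image files are large, a production tool would read only the header and directory instead of the whole file.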
4.2 Documentation
Documents are provided to fully describe the contents of the data set and facilitate their use. There are both external and internal documents. External documents detail data standards, file specifications, and technologies used by the project, such as the TIFF specification, MD5 checksum algorithm, and various XML-related technologies. Internal documents detail project data standards and practices, image processing algorithms, and information required to use the data set not detailed in the external documentation.
4.2.1 External Documentation
External documentation includes:
ASCII specification [forthcoming]
CSS 2.0
Dublin Core [forthcoming]
GNU TAR file archive algorithm [TBD]
GZIP file compression algorithm [TBD]
HTML 4.0
MD5 hash - rfc1321.txt
PDF 1.7
RELAX NG
TIFF 6.0
XML 1.0
XML Schema
XSL 1.0
Unicode - Unicode Code charts - Unicode specifications and technical reports
ZIP file format specification 6.3.2
4.2.2 Internal Documentation
Internal documentation includes:
Archie Image Manipulation software documentation - Manual [to be updated] - Algorithms employed [forthcoming] - C code [TBD]
File Naming Conventions
Folio Index
MD5 How-To
Metadata Data Dictionary [forthcoming]
Metadata How-To [TBD]
Metadata Standard
Transcription Integration Plan
Transcription Metadata Standard
Scientific documents describing: - Spectral image capture techniques [forthcoming] - XRF image capture techniques [forthcoming] - Image processing [forthcoming]
XRF Metadata Extensions
4.3 Supporting Functional Files
The data set provides supporting files needed to share or work with the Digital Product content data. Primarily these files are XML schema documents used to validate and process transcription, spatial index, and metadata files in XML format. The following supporting file collections are included.
Archimedes-Palimpsest: Custom XML schema files for working with project metadata XML files and custom mapped transcription formats [forthcoming]
TEI: Documentation and XML schema files for the TEI guidelines
Dublin-Core: XML schema files for the Dublin Core metadata elements
CHS: RELAX NG schemas for Center for Hellenic Studies spatial indexing XML files
4.4 Supplemental Files
The purpose of the Supplemental material is to provide alternate presentations of XML-encoded data for scholars, application developers, and other interested parties who may want to use them.
It contains “master” files created for the transcription and spatial mapping efforts. For each work there may be:
TEI XML-encoded transcripts
All Archimedes works have a Netz-Wilson transcription
Heiberg transcriptions are provided for the Archimedes texts On Floating Bodies, On Spiral Lines, and Sphere and Cylinder
Hyperides transcriptions are included
XML-encoded line-by-line mappings of transcriptions to images
The combined folio-by-folio spatially mapped transcription files included in the core data set have been derived via XSL transformation from the transcription and mapping files.
4.5 Contributed Research Files
The Contributed Research data is intended initially to include useful and specialized images contributed to the project by image scientists. These are images useful to scholars, but not integrated into the core data set because, for example, they are not registered to core image dimensions or they are not accompanied by complete metadata. Over the life of the data set, this directory may be used to include carefully vetted contributions that add critical information to the data set, such as conservation, codicological, and other information.
This component includes experimental diagrams, and may later contain close-up images of special regions of interest and images captured or processed using experimental techniques.
5 How to Use This Data Set
This data set contains supporting documentation to enable discovery of the data and available access tools. The files named below may be located by using the file 1_FileList.txt which accompanies this ReadMe file.
5.1 General Orientation
For General Orientation to the data set, see
0_ReadMe.txt: this file
1_FileIndex.txt: list of files in the data set
FileNamingConventions.txt: a description of naming conventions for image, XML, and MD5 files
FolioIndex.txt: a list of the Archimedes Palimpsest folios by work, undertext folio, and Euchologion folio
MD5_README.txt: a brief how-to on using MD5 files to confirm the integrity of content files
TBD: A lay description of the image types.
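As an illustration of using MD5 files to confirm the integrity of content files, the following minimal sketch recomputes a file's MD5 digest and compares it with the value stored in its checksum file. It assumes that the first whitespace-separated token in the checksum file is the hexadecimal digest, as in the output of common md5sum tools; MD5_README.txt and FileNamingConventions.txt are the authority on the actual layout and naming. The function names are illustrative and not part of the data set.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_expected_md5(md5_path):
    """Read the expected digest from a checksum file.

    Assumption: the first whitespace-separated token is the hex digest.
    """
    text = Path(md5_path).read_text(encoding="ascii", errors="replace")
    tokens = text.split()
    if not tokens:
        raise ValueError(f"no digest found in {md5_path}")
    return tokens[0].lower()


def verify_file(content_path, md5_path):
    """Return True if the content file matches its stored MD5 digest."""
    return md5_of_file(content_path) == read_expected_md5(md5_path)
```

A call such as verify_file(content_path, md5_path), given the path of a TIFF or XML content file and the path of its checksum file, returns False when the file has been altered or corrupted since the checksum was generated. Reading in chunks keeps memory use low for the large image files.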
5.2 Metadata
Metadata information for the images and transcriptions is described in several supporting documents.
Image_Metadata_Standard.pdf: The project's imaging metadata standard document.
Image_Metadata_Standard_XRF_Extensions.pdf: Extensions to the metadata standard to support XRF imaging
Transcription_Metadata_Standard.pdf: Metadata elements for transcriptions and spatial mappings of transcriptions to images
Transcription_Metadata_Mapping.txt: A mapping between project-selected Dublin Core identification elements and TEI header elements used for metadata in the transcription files
MetadataDataDictionary.txt: A complete dictionary of the metadata elements used in all contexts
TEI documentation: Documentation of the TEI guidelines used for the transcriptions
rfc5013.txt: Dublin Core metadata elements
DCMI_Metadata_Terms: Dublin Core metadata term specification
ArchimedesPalimpsestXML.xml: Documentation of Archimedes Palimpsest custom metadata schemas for metadata and content management
5.3 Computer Access Tools
For machine access to the files in this data set the following files can be used.
ArchimedesPalimpsestXML.txt: Documentation of Archimedes Palimpsest custom metadata schemas for metadata and content management [forthcoming]
Content.xml: a machine readable table of contents for the data set, connecting content files to their unique identifiers, metadata records, and folios [forthcoming]
FolioIndex.xml: a machine readable list of the Archimedes Palimpsest folios, by work, prayer book folio, and undertext folio [forthcoming]
XML schemas and DTDs for working with content XML files, including TEI, DublinCore, and custom schemas created for the data set
TEI documentation: Documentation of the TEI guidelines used for the transcriptions
5.4 Scientific Information
The included scientific texts provide descriptions of image capture and processing techniques used to create the data set.
ImageCapture.txt: Documentation of techniques used to capture spectral images used in the data set [TBD]
ImageProcessing.txt: Documentation of techniques and algorithms used to create the processed images used in the data set [TBD]
XRFCaputre.txt: Documentation of XRF imaging used to capture XRF images used in the data set [TBD]
Archie_1.0.pdf: Documentation of the Archie 1.0 image manipulation software suite [to be updated]