HMML Frag 32 Multispectral Imaging ReadMe

Prepared by R.B. Toth Associates

Multispectral Imaging for the Hill Museum & Manuscript Library

Authors: Michael B. Toth, William A. Christens-Barry, Cerys Jones

Date: 15 May 2020

1 HMML Multispectral Imaging

The Hill Museum and Manuscript Library (HMML) data set includes captured and processed image data from the multispectral imaging of a HMML Palimpsest in a Washington DC area imaging laboratory on 11 and 20 September, 2018 by R.B. Toth Associates, in partnership with Equipoise Imaging and Phase One A/S.

The narrowband multispectral imaging system used for this project includes commercial-off-the-shelf hardware and software for digital spectral image capture and viewing with the integrated system. It also includes customized image processing software for processing and exploitation of the spectral images, utilizing techniques from other cultural heritage studies. The medium-format, high-pixel-count camera takes a series of high-quality digital images, each illuminated by a specific wavelength of light. The resulting image set is then digitally processed and combined to reveal residues and features in the manuscript or book (or artwork) that are not visible to the eye in natural light. These processed images, which are generated from the captured images, clarify and offer new insights to support research into the objects.

1.1 Camera System

A Phase One iXG Camera System with a 100 Megapixel Achromatic CMOS sensor and a 72 mm lens produced images of over 700 ppi. The higher-resolution CMOS sensor and greater dynamic range allow greater resolution and autofocus, for increased efficiency and improved results.

1.2 Illumination System

The imaging system provides narrowband illumination with light in specific wavelengths from low-heat, low-maintenance, long-lifetime light-emitting diodes (LEDs). It includes two integrated illuminators, each with multiple LEDs, providing illumination for imaging in distinct ultraviolet, visible and infrared narrow spectral bands (see wavelengths in “General File Conventions” below). It is integrated with software to allow simplified system operation and unified metadata capture.

1.3 Filter System

To capture fluorescence from an object, the system uses a 6-position motorized filter wheel holding five 2-inch square optical glass filters, with control software and a computer interface. Filtered images can increase the range of captured information to include both fluorescence emissions and UV reflectance. This allows the characteristic spectra of substrate, colorant, and contaminant materials to be more completely determined and analyzed. The filter wheel is driven by computer control and has a removable carousel containing a selection of filters (UV bandpass; visible bandpass and longpass filters).

1.4 Image Capture Integration

The Spectral XV integrated image capture operating software developed by Equipoise Imaging LLC provides integrated control of the digital camera back, filters and illumination as a single system. This software – based on the CaptureCore application engine developed by Phase One A/S to control camera capture operations and processing workflow – allows streamlined operation and metadata capture from a single interface with simple setup and imaging.

1.5 Spectral Imaging Processing

Images are initially processed with ImageJ open-source image processing software and a customized Paleo Toolbox – a spectral imaging toolkit created by Equipoise Imaging LLC, for applications in cultural heritage imaging. The Paleo toolkit comprises plugin modules that integrate into ImageJ, an open source image processing tool originally developed at the US National Institutes of Health. ImageJ has been widely adopted and extended by scientists working in remote sensing, biological science, and cultural heritage world-wide. It offers a wide range of digital operations for the enhancement and reproduction of non-visible features from the manuscripts and books based on their spectral response in images captured with the full set of illumination wavelengths and emission bands.

2 Rights

These images are licensed for free use under the Creative Commons Attribution 4.0 International License (CC BY 4.0). Users are free to copy and redistribute the material in any medium or format, and to remix, transform, and build upon the material for any purpose, even commercially, with appropriate credit to HMML. Since Michael B. Toth and Bill Christens-Barry conducted this multispectral imaging and digital processing for HMML on a pro bono basis, we request that published images be credited to “HMML, R.B. Toth Associates and Equipoise Imaging”.

3 HMML Data Set Contents

This data set comprises a core content set of digital images of St. John’s University Manuscript Fragment 32 (listed as SJUMsFrag32). The data set contains the following folders:

README.txt file: This description of the data set in txt form providing an orientation to the data and rights management.

SJUMs_[Filename]: Data captured from multispectral imaging of the HMML manuscripts and books and converted into TIFF images. The filename is not intended to substitute for the descriptive metadata in the json file.

Flattened (if applicable): Converted images that have been processed with reference “flats” images to even out illumination and correct other imaging artifacts.

Processed: Digitally processed images from the captured multispectral images of the HMML Palimpsests taken with the 100 MP camera and integrated illumination system. These may be in output folders or in folders with the prefix PROC-.

The directory structure, starting from the root is as follows:

  |-- Data
  |   |-- Frag32r
  |   |   |-- Frag32rXRF
  |   |   `-- Processed
  |   `-- Frag32v
  |       |-- Frag32vXRF
  |       `-- Processed
  |-- manifest-sha1.txt
  |-- ReadMe_Multispectral.html
  |-- ReadMe_Multispectral.txt
  |-- ReadMe_XRF.html
  `-- ReadMe_XRF.txt
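
The manifest-sha1.txt file at the root of the data set can be used to confirm that files were copied intact. The following Python sketch shows one way to do this. It assumes that each non-empty manifest line holds a SHA-1 hex digest followed by a path relative to the root; that layout is an assumption, not something this ReadMe documents, and the function names sha1_of_file and verify_manifest are illustrative only.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root, manifest_name="manifest-sha1.txt"):
    """Compare each listed file against its recorded SHA-1 digest.

    Assumption: each non-empty line is "<hex digest> <relative path>".
    Returns a list of (relative_path, problem) tuples; empty means no
    problems were found.
    """
    root = Path(root)
    problems = []
    manifest_text = (root / manifest_name).read_text(encoding="utf-8")
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        # Tolerate the "*" binary-mode marker written by some checksum tools.
        rel_path = rel_path.strip().lstrip("*")
        target = root / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha1_of_file(target) != expected.lower():
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

Calling verify_manifest on the root folder of the data set returns an empty list when every listed file is present and matches its digest.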

3.1 Core Data

For each manuscript side, the data set provides sequences or stacks of captured and registered images converted to TIFF and JPEG thumbnail images with metadata. These images should be retained as archival images and can be read with most image viewers. Images are captured as working images in IIQ format and then converted to TIFF, because IIQ is a proprietary format that can only be viewed with Phase One’s Capture One software.

The data set includes:

  1. Multispectral images captured using Spectral XV and converted from .IIQ format to 16-bit .TIF format with Capture One software. Converted images have the string _R at the end of the rootname.
  2. Reference “flats” images used to calibrate the light levels across the image.
  3. (If applicable) Subject images flattened using reference flats images; flattened images have the string _F at the end of the rootname.
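
This ReadMe does not state the exact algorithm used to flatten the subject images. As a rough illustration of the general idea only, the Python sketch below applies a generic flat-field correction: each subject pixel is divided by the corresponding pixel of the reference flat, scaled by the mean of the flat. The function name flatten_image and the use of plain nested lists in place of real 16-bit TIFF data are assumptions made to keep the example self-contained.

```python
def flatten_image(subject, flat):
    """Divide a subject image by a reference flat, pixel by pixel.

    Both arguments are equally sized lists of rows of numbers. The flat is
    scaled by its own mean so that overall brightness is roughly preserved.
    Pixels where the flat is zero are set to 0.0 (an arbitrary choice).
    """
    same_shape = len(subject) == len(flat) and all(
        len(s_row) == len(f_row) for s_row, f_row in zip(subject, flat)
    )
    if not same_shape:
        raise ValueError("subject and flat must have the same dimensions")
    flat_values = [value for row in flat for value in row]
    flat_mean = sum(flat_values) / len(flat_values)
    if flat_mean == 0:
        raise ValueError("flat image is entirely zero")
    return [
        [s * flat_mean / f if f else 0.0 for s, f in zip(s_row, f_row)]
        for s_row, f_row in zip(subject, flat)
    ]


# Toy 2x2 example: a uniform target lit twice as brightly on one diagonal.
subject = [[50, 100], [100, 50]]
flat = [[100, 200], [200, 100]]
corrected = flatten_image(subject, flat)  # every pixel becomes 75.0
```

In this toy case the uneven lighting recorded in the flat is removed and the corrected image is uniform.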

The core data also include metadata in associated JSON files for the multispectral images. Each multispectral capture image folder is provided with descriptive metadata in a JSON file giving details of the image capture for the project, scene and sequence, and of the processing methods used to generate integrated images from the various captured images.

This includes basic Archimedes Palimpsest Metadata Standard metadata, such as:

  {
    "Project": {
      "ProjectID": "100001",
      "Name": "HMML",
      "Rights": "CC4.0-BY",
      "Publisher": "R.B. Toth Associates",
      "ProjectNickName": "HMML",
      "Creator": "M.B. Toth, W.A. Christens-Barry",
      "Contributors": "Cerys Jones, Columba Stewart, HMML, Equipoise Imaging, R.B. Toth Associates, Phase One A/S",
      "Description": "Pro Bono Multispectral imaging of St. John’s University Manuscript Fragment 32 - SJUMsFrag32",
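
The metadata files can be read with any JSON library. The Python sketch below shows one way to pull a few fields out of the Project object. The SAMPLE text is a shortened stand-in built from the excerpt above, not a complete metadata file, and the function name project_summary is illustrative only; real files contain further fields and objects that are not shown in this ReadMe.

```python
import json

# Shortened stand-in for a metadata file; real files carry more fields.
SAMPLE = """
{
  "Project": {
    "ProjectID": "100001",
    "Name": "HMML",
    "Rights": "CC4.0-BY",
    "Publisher": "R.B. Toth Associates"
  }
}
"""


def project_summary(metadata_text):
    """Return selected Project fields, using None for any that are absent."""
    project = json.loads(metadata_text).get("Project", {})
    wanted = ("ProjectID", "Name", "Rights", "Publisher")
    return {key: project.get(key) for key in wanted}


summary = project_summary(SAMPLE)
```

Using get with a default means that a file lacking one of these fields yields None for it instead of raising an error.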

4 General File Conventions

The unflattened and flattened captured image file names include six fields plus an extension. The initial three fields match the short forms of the project name, scene name, and sequence name. The first and second fields are delimited by _, and the second and third fields are delimited by -. The fourth field consists of a three-digit number, indicating the illumination wavelength (in nm), plus a single-letter identifier for the camera filter.

This file naming convention supports automated processing of the captured images using delimited information to define needed parameters used by the processing tools and algorithms.

The filename of each flattened image cites the illumination or illuminations used to produce it, which may include multiple illumination types. The illumination symbol is one of the following symbols, or a combination of symbols for processed images:

Examples for captured images are:

Unflattened:

  project_scene-sequence-<wavelength and filter>_<index number>_R.tif

Flattened:

  project_scene-sequence-<wavelength and filter>_<index number>_F.tif
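
Because the fields are delimited consistently, captured file names can be split automatically. The Python sketch below uses a regular expression for this. It assumes that the project, scene and sequence names contain no _ or - characters and that the index number is purely numeric. The function name parse_capture_name and the example file name are invented for illustration and do not refer to an actual file in the data set.

```python
import re

# Pattern for: project_scene-sequence-<wavelength and filter>_<index number>_R.tif
# Assumption: project, scene and sequence contain no "_" or "-" characters.
CAPTURE_NAME = re.compile(
    r"^(?P<project>[^_-]+)_(?P<scene>[^_-]+)-(?P<sequence>[^_-]+)"
    r"-(?P<wavelength>\d{3})(?P<filter>[A-Za-z])"
    r"_(?P<index>\d+)_(?P<state>[RF])\.tif$"
)


def parse_capture_name(filename):
    """Split a captured-image file name into its fields, or return None."""
    match = CAPTURE_NAME.match(filename)
    if match is None:
        return None
    fields = match.groupdict()
    fields["wavelength"] = int(fields["wavelength"])  # nanometres
    fields["index"] = int(fields["index"])
    fields["flattened"] = fields.pop("state") == "F"  # _F flattened, _R not
    return fields


# Invented example name, used only to exercise the pattern.
example = parse_capture_name("HMML_Frag32r-seq1-450B_003_F.tif")
```

Names that do not follow the convention return None, so a script can skip or report them without failing.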

Processed images amend this naming convention to indicate the type of processing employed. The initials of the individual who created the processed images are (optionally) given in the fourth field of the filename of processed files. Since processing operations most often utilize all of the captured images of a sequence, identification of the individual images used as inputs for processing operations is generally omitted. One or more following, underscore-delimited fields describe the processing operations and parameters that were used, appended in order of their application. Within an underscore-delimited field, single hyphens are used to delimit parameter values or image indices used during that processing operation. Usually the parameters refer to the index number of a component image.

A typical filename exemplifies the naming practices used for processed images:

  HMML_[ManuscriptName]-[Sequencename]__MBT__PCA_pc05-pc06-pc07.tif

Project name: HMML

Scene name: [ManuscriptName]

Sequence name: [Sequencename]

Creator: Michael B. Toth (this field is sometimes not used)

Processing: 1. Principal Components Analysis (PCA)

  1. PCA components 05, 06, and 07 were used in the R, G, and B channels, respectively, of the final (synthetic) RGB TIFF image
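
The breakdown above can be sketched in Python as follows. The sketch handles only the simple layout of this example, with the creator initials present and a single processing operation followed by three components. The function name parse_processed_name and the scene and sequence names in the example are assumptions for illustration; file names that omit the creator field or chain several operations would need extra handling.

```python
from pathlib import PurePath


def parse_processed_name(filename):
    """Split a processed-image file name on its double-underscore fields.

    Assumed layout: <project>_<scene>-<sequence>__<creator>__<processing>,
    where <processing> is <operation>_<component>-<component>-<component>.
    Raises ValueError if the name does not have this shape.
    """
    stem = PurePath(filename).stem
    capture_part, creator, processing = stem.split("__", 2)
    project, scene_sequence = capture_part.split("_", 1)
    scene, sequence = scene_sequence.split("-", 1)
    operation, components = processing.split("_", 1)
    # Components are listed in red, green, blue channel order.
    channels = dict(zip(("R", "G", "B"), components.split("-")))
    return {
        "project": project,
        "scene": scene,
        "sequence": sequence,
        "creator": creator,
        "operation": operation,
        "channels": channels,
    }


# Invented scene and sequence names, used only to exercise the function.
info = parse_processed_name("HMML_Frag32r-seq1__MBT__PCA_pc05-pc06-pc07.tif")
```

For this example, the channels entry maps R, G and B to pc05, pc06 and pc07 respectively.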

In some cases, two rounds of PCA processing were performed. Selected components from the first round of PCA processing were used during the second round of processing. In these cases, the string PCAx2 is used, followed by the - delimited indices of the first round components that were used in the second round. For example, the file name:

  HMML_[ManuscriptName]-[Sequencename]__WCB__PCAx2-05-08-12-15_pc02-pc03--pc06_RGB.tif

This indicates that William Christens-Barry used components 2, 5, 8, 12, and 15 from the first PCA round in the second round, and used components 02, 03, and 09 from the second round of processing in the red, green, and blue channels, respectively, of the final RGB TIFF image. Note that the use of -- as a delimiter indicates that a range of component images was used, e.g. 3--6 would indicate that components 3, 4, 5, and 6 were used. Please note that the practice of including a leading 0 is not followed consistently, and that the pc prefix in front of a principal component used in an RGB channel may be omitted to avoid excessively long filenames.
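
The delimiter rules in this section (a single - between indices, -- for a range, an optional pc prefix, and optional leading zeros) can be sketched in Python as follows. The function name expand_components is illustrative only, and the sketch assumes that a field never contains more than two hyphens in a row.

```python
def expand_components(field):
    """Expand a hyphen-delimited index field into a list of integers.

    A single "-" separates indices; "--" joins the two ends of a range.
    An optional "pc" prefix and any leading zeros are ignored.
    """
    def to_int(token):
        token = token.lower()
        return int(token[2:] if token.startswith("pc") else token)

    indices = []
    # Replace "--" first so that the two ends of a range stay together
    # when the field is split on single hyphens.
    for item in field.replace("--", ":").split("-"):
        if ":" in item:
            start, end = (to_int(part) for part in item.split(":"))
            indices.extend(range(start, end + 1))
        else:
            indices.append(to_int(item))
    return indices


range_example = expand_components("3--6")
```

Here range_example holds the components 3, 4, 5 and 6, matching the range example given in the text.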

Other strings in processed file names include:

The remainder of the file name, including the extension, indicates the file type:

  1. TIFF still image files, ending in tif
  2. JPEG still image files, ending in jpg