=========================
 File Naming Conventions
=========================

--------------------------------------
 Archimedes Palimpsest Digital Release
--------------------------------------


:Author: Doug Emery
:Date: May 23, 2008

.. contents::
..
   1 General Conventions
     1.1 Folio Designation
     1.2 File-Dependent Components
   2 Image Files
     2.1 Imaging System/Group
     2.2 Imaging and Processing Details
     2.3 Image Illumination/Detector
     2.4 Image Processing
     2.5 Image Set
     2.6 Image Tag
   3 XML Files
     3.1 Transcription files
       3.1.1 Content Type
       3.1.2 Transcription Source
     3.2 Spatial Index Files
       3.2.1 Content Type
     3.3 Supplemental XML Files
       3.3.1 Author
       3.3.2 Work
       3.3.3 Content Type
       3.3.4 Transcription Source
   4 References

1 General Conventions
=====================

1.1 Folio Designation
---------------------

All core data files are prefixed with a two-part folio designation.
There are 180 separate folio designations.  The complete list of
folios, the relations between the prayer book and under text folios,
and the under text contents are detailed in the Foliation document.

Sample folio designations are:

 * 002r-001v_Arch13r
 * 0000-100r_Arch53v
 * 039r-0000_Arch63r

Each bifolium corresponds to a single under text folio.  The first
number indicates the top folio with the undertext oriented up. In the
first sample, 002r-001v, 002r is the leaf of the prayer book
containing the top half of the undertext page, and 001v is the prayer
book leaf containing the lower half.

In those cases where only half of the undertext page is available, the
missing portion of the page is expressed by four '0' (zero)
characters: '0000', as with '0000-100r' and '039r-0000'.

Folio numbers for the undertext have been assigned by scholars working
on the palimpsest. Content is indicated by an abbreviation. The
abbreviations used are:

 * Alex -- Alexander of Aphrodisias' Commentary on Aristotle's
   Categories
 * Arch -- Archimedes
 * Hype -- Hyperides
 * Mena -- Menaion
 * Pant -- The Life of St. Pantoleon
 * UnkA -- Unknown text A
 * UnkB -- Unknown text B

Both under text and prayer book folios have the folio side specified
by an 'r' or 'v', for 'recto' or 'verso', respectively.  Folio side is
not known for the Saint's Life, the Menaion, or the unknown texts.

One leaf of the palimpsest was not rotated 90 degrees. Instead it was
trimmed and is oriented left-to-right. For this folio, the first
number indicates the left side of the undertext leaf, with the under
text oriented up:

  * 165r-168v corresponds to undertext Arch29v-30r
  * 168r-165v corresponds to undertext Arch30v-29r
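
As an illustration, the designation rules above can be checked with a
regular expression.  The following is a minimal sketch and not part of
the release: the name ``parse_folio_designation`` is hypothetical, and
the pattern assumes three-digit prayer book folio numbers and an
optional side letter on the undertext folio, as in the samples above.
It does not attempt to handle the trimmed leaf, whose undertext spans
two folio numbers.

.. code-block:: python

   import re

   # Pattern inferred from the sample designations; an assumption, not a
   # normative grammar for the data set.
   FOLIO_RE = re.compile(
       r"^(?P<top>\d{3}[rv]|0000)-(?P<bottom>\d{3}[rv]|0000)"
       r"_(?P<text>Alex|Arch|Hype|Mena|Pant|UnkA|UnkB)"
       r"(?P<number>\d+)(?P<side>[rv]?)$"
   )


   def parse_folio_designation(designation):
       """Split a two-part folio designation into its components."""
       match = FOLIO_RE.match(designation)
       if match is None:
           raise ValueError("not a folio designation: %r" % designation)
       parts = match.groupdict()
       # '0000' marks a missing half of the undertext page.
       for key in ("top", "bottom"):
           if parts[key] == "0000":
               parts[key] = None
       # The side is not known for every text; report it as None.
       parts["side"] = parts["side"] or None
       return parts

Under these assumptions, ``parse_folio_designation("0000-100r_Arch53v")``
reports the top prayer book leaf as missing.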

1.2 File-Dependent Components
-----------------------------

The remainder of the file name, including the extension, indicates the
file type.  Broadly, there are three types of file:

 1. TIFF image files, ending in 'tif',
 2. various XML files, ending in 'xml', and
 3. MD5 checksum files, ending in 'md5'.

The meaning of the remaining file name components depends on the file
type.
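
A minimal sketch of dispatching on the extension follows.  The function
name ``file_type`` and the returned labels are hypothetical, chosen only
for this illustration.

.. code-block:: python

   import os

   # Extension-to-type table taken from the list above; the labels on the
   # right are illustrative, not names used by the release.
   FILE_TYPES = {".tif": "image", ".xml": "xml", ".md5": "checksum"}


   def file_type(filename):
       """Return the broad file type, or None for an unknown extension."""
       extension = os.path.splitext(filename)[1]
       return FILE_TYPES.get(extension)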

2 Image Files
=============

Each image file has a structured name that identifies the image content,
imaging equipment type, and image capture or processing type. The basic
structure is illustrated by the following file name.

 * 103r-097v_Arch42r_Sinar_LED445_01_pack8.tif


Each segment of the filename is separated by underscore characters. The
segments of the sample file are:

 * prayerbook folio: '103r-097v'
 * undertext folio: 'Arch42r'
 * image group: 'Sinar'
 * imaging and processing details: 'LED445_01_pack8'

   - illumination: 'LED445'
   - set index: '01'
   - processing tag: 'pack8'

 * extension: always 'tif'

In other words, the named file is an image of Archimedes Palimpsest prayer
book folio 103 recto conjoined with 97 verso, otherwise known as undertext
folio Arch42 recto. It belongs to the Sinar image group and is an 8-bit
"packed" version of a raw image captured using 445 nm LED illumination. It
belongs to the exposure set of this bifolio with index '01'.
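
The decomposition above can be sketched in a few lines of Python.  This
is an illustration only: the name ``parse_image_name`` and the
dictionary keys are hypothetical.  Because the number of detail segments
varies by image type, the sketch returns them as a list rather than
assigning each a fixed meaning.

.. code-block:: python

   def parse_image_name(filename):
       """Split an image file name into its underscore-separated segments."""
       stem, _, extension = filename.rpartition(".")
       if extension != "tif":
           raise ValueError("not a TIFF image name: %r" % filename)
       segments = stem.split("_")
       # Two folio segments, the image group, and at least one detail segment.
       if len(segments) < 4:
           raise ValueError("too few segments: %r" % filename)
       return {
           "prayerbook_folio": segments[0],
           "undertext_folio": segments[1],
           "image_group": segments[2],
           "details": segments[3:],
       }

For the sample file, ``details`` holds the illumination, set index, and
processing tag, in that order.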

Each of these segments is described below.

2.1 Imaging System/Group
------------------------

Value: The imaging system used to capture the raw images, or the group
of raw images.

'Sinar' in the sample '103r-097v_Arch42r_Sinar_LED445_01_pack8.tif'.

The possible values are:

 * Sinar - optical images captured in August 2007
 * XRF - X-ray fluorescence images captured at the Stanford Linear
   Accelerator Center in March, July, and August 2006
 * Heiberg - images of Heiberg's 1906 photographs of the palimpsest

2.2 Imaging and Processing Details
----------------------------------

Value: One or more segments used to distinguish the image and ensure
unique file names.

'LED445_01_pack8' in the sample
'103r-097v_Arch42r_Sinar_LED445_01_pack8.tif'.

2.3 Image Illumination/Detector
-------------------------------

Value: For optical eight-bit 'pack8' images, the illumination used to
capture the sixteen-bit 'raw' image.

'LED445' in the sample '103r-097v_Arch42r_Sinar_LED445_01_pack8.tif'.

For spectral images, this value will be one of:

 * LED365 - UV illumination, including 365 nm LED, see metadata for
   details
 * LED445 - 445 nm LED illumination
 * LED470 - 470 nm LED illumination
 * LED505 - 505 nm LED illumination
 * LED530 - 530 nm LED illumination
 * LED570 - 570 nm LED illumination
 * LED617 - 617 nm LED illumination
 * LED625 - 625 nm LED illumination
 * LED700 - 700 nm LED illumination
 * LED735 - 735 nm LED illumination
 * LED780 - 780 nm LED illumination
 * LED870 - 870 nm LED illumination
 * RAKBLL - raking 445 nm LED illumination from the left
 * RAKBLR - raking 445 nm LED illumination from the right
 * RAKIRL - raking 910 nm infrared LED illumination from the left
 * RAKIRR - raking 910 nm infrared LED illumination from the right
 * TNGSTN - tungsten illumination


Note that illumination details including number of sources, wattage,
spectral ranges, and their azimuthal angles are provided in the
metadata.
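
For sorting or grouping spectral images, a nominal wavelength can be
read from the illumination code.  The sketch below assumes that the
three digits of an 'LED' code give the wavelength in nanometres and
takes the raking wavelengths from the list above; the name
``nominal_wavelength_nm`` is hypothetical, and the metadata remains the
authoritative source.

.. code-block:: python

   import re

   # Raking codes do not carry the wavelength in the name.
   RAKING_NM = {"RAKBLL": 445, "RAKBLR": 445, "RAKIRL": 910, "RAKIRR": 910}


   def nominal_wavelength_nm(code):
       """Return the nominal wavelength in nm, or None if it has none."""
       match = re.fullmatch(r"LED(\d{3})", code)
       if match:
           return int(match.group(1))
       # Tungsten and unrecognized codes have no single wavelength.
       return RAKING_NM.get(code)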

For Heiberg images, this value will be one of:

 * ultraviolet - image created using blue separation of images
   captured with ultraviolet illumination
 * strobe - image created from images captured with strobe
   illumination

For XRF images, the illumination was a single X-ray beam. Front and
back detectors with channels at different energies were used to scan
the palimpsest. These values indicate the channel energy by element
and detector position.  Each value is one of:

 * BaBack   - Ba channel on back detector
 * CaBack   - Ca channel on back detector
 * CaFront  - Ca channel on front detector
 * CuBack   - Cu channel on back detector
 * CuFront  - Cu channel on front detector
 * FeBack   - Fe channel on back detector
 * FeFront  - Fe channel on front detector
 * GeBack   - Ge channel on back detector
 * KBack    - K  channel on back detector
 * MnBack   - Mn channel on back detector

2.4 Image Processing
--------------------

There are four types of images generated by special processing, using
raw spectral images as sources. These are pseudo-color, true color,
ultraviolet, and ultraviolet blue images.

In the processing/illumination position, these images have:

 * pseudo - pseudo-color no-veil or sharpie image
 * true - true color image
 * ultraviolet - ultraviolet norm8 image
 * uvblue - ultraviolet blue norm8 image

2.5 Image Set
-------------

Value: For the optical eight-bit 'pack8' images, the exposure set to
which the image belongs.

'01' in the sample '103r-097v_Arch42r_Sinar_LED445_01_pack8.tif'.

Each image within a set of exposures was assigned the same index,
'01', '02', and so forth. For the pack8 images, each first-quality
image set has the index '01'. Only first-quality images are included
in this data set. Further, all processed images are derived from
first-quality '01' images.

2.6 Image Tag
-------------

Value: General purpose image type descriptor and file name
differentiator.

'pack8' in the sample '103r-097v_Arch42r_Sinar_LED445_01_pack8.tif'.

In order to distinguish otherwise identically named files and to
clarify image types, an "image tag" is added to some files. Some of
these values are used for interim files used in the processing.

Values are:

 * RAW - 16-bit "raw" captured image (not in this data set)
 * alph - pre-release only version of this image type
 * no-veil - "no-veil" version of a pseudo-color image
 * norm8 - 8-bit image processed in part using the normalize command
 * pack8 - 8-bit version of a RAW image generated using the packimage
   command
 * registered - Heiberg or XRF image registered to the LED365 version
   of the same folio
 * sharpie - "sharpie" version of a pseudo-color image
 * stitch - image created from multiple "tiles" of the same subject:
   folio or Heiberg photograph

3 XML Files
===========

There are two types of XML file included in the core data set:
transcriptions, and spatial indexes that map transcriptions to images.
As do other core data files, XML files begin with the two-part folio
designation followed by file content information.  The file names take
three forms, shown by these examples:

  * 055r-050v_Arch05r_spatial-index.xml
  * 055r-050v_Arch05r_TEI_Heiberg.xml
  * 055r-050v_Arch05r_TEI_Netz-Wilson.xml

3.1 Transcription files
-----------------------

Transcriptions included in the core data set conform to Text Encoding
Initiative (TEI) P4 release XML.  Each file has a name that identifies
the transcription subject, content type (always TEI), and
transcription source. The basic structure is illustrated by the
following file name.

  * 055r-050v_Arch05r_TEI_Netz-Wilson.xml

Each segment of the filename is divided by underscore characters. The
segments of the sample file are:
 
 * prayerbook folio: '055r-050v'
 * undertext folio: 'Arch05r' 
 * content type: 'TEI'
 * transcription source: 'Netz-Wilson' (or 'Heiberg')
 * extension: always 'xml'

In other words, the named file is a TEI conformant transcription of
Archimedes Palimpsest prayer book folio 55 recto conjoined with 50
verso, otherwise known as undertext folio Archimedes 5 recto.  It is
based on the Reviel Netz and Nigel Wilson transcription of the text.
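
A transcription file name can be validated and split along the same
lines.  The following sketch is illustrative only; the name
``parse_transcription_name`` and the dictionary keys are hypothetical.

.. code-block:: python

   TRANSCRIPTION_SOURCES = ("Netz-Wilson", "Heiberg")


   def parse_transcription_name(filename):
       """Split a folio-level TEI transcription file name into segments."""
       stem, _, extension = filename.rpartition(".")
       segments = stem.split("_")
       # Expect: prayer book folio, undertext folio, 'TEI', source.
       if extension != "xml" or len(segments) != 4 or segments[2] != "TEI":
           raise ValueError("not a transcription file name: %r" % filename)
       if segments[3] not in TRANSCRIPTION_SOURCES:
           raise ValueError("unknown transcription source: %r" % segments[3])
       return {
           "prayerbook_folio": segments[0],
           "undertext_folio": segments[1],
           "content_type": segments[2],
           "source": segments[3],
       }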

Each of the transcription-specific segments is described below.

3.1.1 Content Type
~~~~~~~~~~~~~~~~~~

Value: The type of XML file; always 'TEI' for transcriptions included
in this data set.

This value is always TEI for TEI-conformant transcriptions.  All
transcriptions in the release conform to the Text Encoding Initiative
public release 4 guidelines, or TEI P4.

3.1.2 Transcription Source
~~~~~~~~~~~~~~~~~~~~~~~~~~

Value: The source of the digital transcription, either 'Netz-Wilson'
or 'Heiberg'.

'Netz-Wilson' in the sample '055r-050v_Arch05r_TEI_Netz-Wilson.xml'.

Digitally encoded transcriptions in this data set are based on the
Reviel Netz and Nigel Wilson transcription of the palimpsest or
reconstructed from J. L. Heiberg's 1915 Teubner edition of Archimedes.

The possible values are:

 * 'Netz-Wilson' - the Netz-Wilson transcription
 * 'Heiberg' - the Heiberg reading of the palimpsest

3.2 Spatial Index Files
-----------------------

The spatial index files included in the core data set provide
line-by-line spatial mapping of the folios based on pseudocolor
'sharpie' images.  Because all core images are registered to the same
dimensions, each mapping is valid for any unaltered core data image
belonging to the same folio.  The basic structure is illustrated by the
following file name.

  * 055r-050v_Arch05r_spatial-index.xml

Each segment of the filename is divided by underscore characters. The
segments of the sample file are:
 
 * prayerbook folio: '055r-050v'
 * undertext folio: 'Arch05r' 
 * content type: always 'spatial-index'
 * extension: always 'xml'

In other words, the named file is a line-by-line spatial-index of
Archimedes Palimpsest prayer book folio 55 recto conjoined with 50
verso, otherwise known as undertext folio Archimedes 5 recto.
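
Since a spatial index shares its folio designation with every other
core file of the same folio, its name can be derived from the name of
any such file.  A minimal sketch follows; the name
``spatial_index_name`` is hypothetical, and the function does not check
that a spatial index actually exists for the folio.

.. code-block:: python

   def spatial_index_name(core_filename):
       """Derive the spatial index file name for a core data file."""
       segments = core_filename.split("_")
       # The first two segments are the two-part folio designation.
       if len(segments) < 3:
           raise ValueError("not a core data file name: %r" % core_filename)
       return "_".join(segments[:2]) + "_spatial-index.xml"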

3.2.1 Content Type
~~~~~~~~~~~~~~~~~~

Value: The type of XML file; always 'spatial-index' for spatial index
files.

'spatial-index' in the sample '055r-050v_Arch05r_spatial-index.xml'.

This value is always 'spatial-index' for spatial index files.

3.3 Supplemental XML Files
--------------------------

Supplemental, treatise-length transcriptions of the Netz-Wilson and
Heiberg readings of the palimpsest are included in this data set.
There are three:

 * Archimedes_FloatingBodies_TEI_Heiberg.xml
 * Archimedes_FloatingBodies_TEI_Netz-Wilson.xml
 * Archimedes_Method_TEI_Netz-Wilson.xml

Each treatise transcription has a structured name that identifies the
subject content, the transcription type, and the transcription source.
The basic structure is illustrated by the following file name.

 * Archimedes_FloatingBodies_TEI_Heiberg.xml

Each segment of the file name is separated by underscore
characters. The segments of the sample file are:

 * author: 'Archimedes'
 * work: 'FloatingBodies'
 * content type: 'TEI'
 * transcription source: 'Heiberg'
 * extension: always 'xml'

In other words, the named file is a TEI conformant transcription of
Archimedes' "On Floating Bodies" based on J. L. Heiberg's reading of
the Archimedes Palimpsest.
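
Because supplemental file names begin with an author rather than a
folio designation, the kinds of XML file in the data set can be told
apart from the name alone.  The sketch below is illustrative: the name
``classify_xml_name`` and the returned labels are hypothetical, and the
folio pattern assumes three-digit prayer book folio numbers, as in the
sample file names.

.. code-block:: python

   import re

   # Prayer book part of a folio designation; an assumption based on the
   # sample file names.
   FOLIO_PREFIX = re.compile(r"^(\d{3}[rv]|0000)-(\d{3}[rv]|0000)$")


   def classify_xml_name(filename):
       """Return 'transcription', 'spatial-index', or 'supplemental'."""
       stem, _, extension = filename.rpartition(".")
       if extension != "xml":
           raise ValueError("not an XML file name: %r" % filename)
       segments = stem.split("_")
       if FOLIO_PREFIX.match(segments[0]):
           if segments[-1] == "spatial-index":
               return "spatial-index"
           return "transcription"
       # No folio designation: treat as a treatise-length file.
       return "supplemental"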

Each of these segments is described below.

3.3.1 Author
~~~~~~~~~~~~

Value: The author of the transcribed work.

'Archimedes' in the sample 'Archimedes_FloatingBodies_TEI_Heiberg.xml'.

The possible values are:

 * Alexander of Aphrodisias
 * Archimedes
 * Hyperides
 * Unknown

3.3.2 Work
~~~~~~~~~~

Value: The transcribed work.

'FloatingBodies' in the sample 'Archimedes_FloatingBodies_TEI_Heiberg.xml'.

The possible values are:

 * Commentary (Alexander of Aphrodisias)
 * Diondas (Hyperides)
 * EquilibriumOfPlanes (Archimedes)
 * FloatingBodies (Archimedes)
 * MeasurementOfTheCircle (Archimedes)
 * Menaion (Unknown)
 * Method (Archimedes)
 * Pantoleon (Unknown)
 * SphereAndCylinder (Archimedes)
 * SpiralLines (Archimedes)
 * Stomachion (Archimedes)
 * Timandros (Hyperides)

3.3.3 Content Type
~~~~~~~~~~~~~~~~~~

Value: The XML file's type of content.

'TEI' in the sample 'Archimedes_FloatingBodies_TEI_Heiberg.xml'.

The possible values are:

 * spatial-index - line-by-line mapping to images of the palimpsest
   folios of this work [forthcoming]
 * TEI - TEI P4 conformant transcription

3.3.4 Transcription Source
~~~~~~~~~~~~~~~~~~~~~~~~~~

Value: The source of the XML file's encoded transcription.

The possible values are:

 * 'Netz-Wilson' - the Netz-Wilson transcription
 * 'Heiberg' - the Heiberg reading of the palimpsest


4 References
============

* Heiberg, J. L., ed., Archimedes Opera omnia cum commentariis Eutocii
  (Leipzig: Teubner, 1910-15, reprinted 1972)