=========================
 File Naming Conventions
=========================

--------------------------------------
Archimedes Palimpsest Digital Release
--------------------------------------

:Author: Doug Emery
:Date: May 23, 2008

.. contents::

..
  1  General Conventions
    1.1  Folio Designation
    1.2  File-Dependent Components
  2  Image Files
    2.1  Imaging System/Group
    2.2  Imaging and Processing Details
    2.3  Image Illumination/Detector
    2.4  Image Processing
    2.5  Image Set
    2.6  Image Tag
  3  XML Files
    3.1  Transcription files
      3.1.1  Content Type
      3.1.2  Transcription Source
    3.2  Spatial Index Files
      3.2.1  Content Type
    3.3  Supplemental XML Files
      3.3.1  Author
      3.3.2  Work
      3.3.3  Content Type
      3.3.4  Transcription Source
  4  References

1 General Conventions
=====================

1.1 Folio Designation
---------------------

All core data files are prefixed with a two-part folio designation.
There are 180 separate folio designations. The complete list of
folios, the relations between the prayer book and undertext folios,
and the undertext contents are detailed in the provided Foliation
document. Sample folio designations are:

* 002r-001v_Arch13r
* 0000-100r_Arch53v
* 039r-0000_Arch63r

Each bifolium corresponds to a single undertext folio. The first
number indicates the top folio with the undertext oriented up. In the
first sample, 002r-001v, 002r is the leaf of the prayer book
containing the top half of the undertext page, and 001v is the prayer
book leaf containing the lower half. In those cases where only half of
the undertext page is available, the missing portion of the page is
expressed by four '0' (zero) characters: '0000', as with '0000-100r'
and '039r-0000'.

Folio numbers for the undertext have been assigned by scholars working
on the palimpsest. Content is indicated by an abbreviation. The
abbreviations used are:

* Alex -- Alexander of Aphrodisias' Commentary on Aristotle's
  Categories
* Arch -- Archimedes
* Hype -- Hyperides
* Mena -- Menaion
* Pant -- The Life of St.
  Pantoleon
* UnkA -- Unknown text A
* UnkB -- Unknown text B

Both undertext and prayer book folios have the folio side specified by
an 'r' or 'v', for 'recto' or 'verso', respectively. Folio side is not
known for the Saint's Life, the Menaion, or the unknown texts.

One leaf of the palimpsest was not rotated 90 degrees. Instead it was
trimmed and is oriented left-to-right. For this folio, the first
number indicates the left side of the undertext leaf, with the
undertext oriented up:

* 165r-168v corresponds to undertext Arch29v-30r
* 168r-165v corresponds to undertext Arch30v-29r

1.2 File-Dependent Components
-----------------------------

The remainder of the file name, including the extension, indicates the
file type. Broadly, there are three types of file:

1. TIFF image files, ending in 'tif',
2. various XML files, ending in 'xml', and
3. MD5 checksum files, ending in 'md5'.

The meaning of the remaining file name components depends on the file
type.

2 Image Files
=============

Each image file has a structured name that identifies the image
content, imaging equipment type, and image capture or processing type.
The basic structure is illustrated by the following file name.

* 103r-097v_Arch42r_Sinar_LED445_01_pack8.tif

The segments of the file name are separated by underscore characters.
The segments of the sample file are:

* prayer book folio: '103r-097v'
* undertext folio: 'Arch42r'
* image group: 'Sinar'
* imaging and processing details: 'LED445_01_pack8'

  - illumination: 'LED445'
  - set index: '01'
  - processing tag: 'pack8'

* extension: always 'tif'

In other words, the named file is an image of Archimedes Palimpsest
prayer book folio 103 recto conjoined with 97 verso, otherwise known
as undertext folio Arch42 recto. It belongs to the Sinar image group
and is an 8-bit "packed" version of a raw image captured using 445 nm
LED illumination. It belongs to the exposure set of this bifolium with
index '01'. Each of these segments is described below.

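
As an illustration of the structure just described, the following is a
minimal Python sketch that splits an image file name into its
segments. The function name ``parse_image_name`` and the dictionary
keys are hypothetical and not part of the release. The sketch assumes
a name with the same six underscore-separated segments as the sample
file; names of other shapes are rejected rather than guessed at.

```python
def parse_image_name(name):
    """Split an image file name shaped like the sample into segments.

    Hypothetical helper for illustration only; it assumes the layout
    <prayer book folio>_<undertext folio>_<group>_<illumination>_<set>_<tag>.tif
    """
    stem, dot, ext = name.rpartition(".")
    if not dot or ext != "tif":
        raise ValueError("expected a name ending in '.tif': %r" % name)

    parts = stem.split("_")
    if len(parts) != 6:
        raise ValueError("expected six segments: %r" % name)
    prayer_book, undertext, group, illumination, set_index, tag = parts

    # The prayer book designation is two folio numbers joined by '-'.
    first, sep, second = prayer_book.partition("-")
    if not sep:
        raise ValueError("expected a two-part folio designation: %r" % name)

    return {
        # '0000' marks a missing half of the undertext page.
        "prayer_book_first": None if first == "0000" else first,
        "prayer_book_second": None if second == "0000" else second,
        "undertext": undertext,
        "group": group,
        "illumination": illumination,
        "set_index": set_index,
        "tag": tag,
    }
```

Called on the sample name, the sketch returns '103r' and '097v' as the
two prayer book folios, 'Arch42r' as the undertext folio, and 'Sinar',
'LED445', '01', and 'pack8' for the remaining segments.
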

2.1 Imaging System/Group
------------------------

Value: The imaging system used to capture the raw images, or the group
of raw images. 'Sinar' in the sample
'103r-097v_Arch42r_Sinar_LED445_01_pack8.tif'.

The possible values are:

* Sinar - optical images captured in August 2007
* XRF - X-ray fluorescence images captured at the Stanford Linear
  Accelerator Center in March, July, and August 2006
* Heiberg - images of Heiberg's 1906 photographs of the palimpsest

2.2 Imaging and Processing Details
----------------------------------

Value: One or more segments used to distinguish the image and ensure
unique file names. 'LED445_01_pack8' in the sample
'103r-097v_Arch42r_Sinar_LED445_01_pack8.tif'.

2.3 Image Illumination/Detector
-------------------------------

Value: For optical eight-bit 'pack8' images, the illumination used to
capture the sixteen-bit 'raw' image. 'LED445' in the sample
'103r-097v_Arch42r_Sinar_LED445_01_pack8.tif'.

For spectral images, this value will be one of:

* LED365 - UV illumination, including 365 nm LED, see metadata for
  details
* LED445 - 445 nm LED illumination
* LED470 - 470 nm LED illumination
* LED505 - 505 nm LED illumination
* LED530 - 530 nm LED illumination
* LED570 - 570 nm LED illumination
* LED617 - 617 nm LED illumination
* LED625 - 625 nm LED illumination
* LED700 - 700 nm LED illumination
* LED735 - 735 nm LED illumination
* LED780 - 780 nm LED illumination
* LED870 - 870 nm LED illumination
* RAKBLL - raking 445 nm LED illumination from the left
* RAKBLR - raking 445 nm LED illumination from the right
* RAKIRL - raking 910 nm infrared LED illumination from the left
* RAKIRR - raking 910 nm infrared LED illumination from the right
* TNGSTN - tungsten illumination

Note that illumination details, including the number of sources,
wattage, spectral ranges, and azimuthal angles, are provided in the
metadata.
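
The spectral illumination codes listed above fall into three groups:
LED codes with a three-digit label, raking codes that carry a side,
and tungsten. A minimal Python sketch of a classifier for these codes
follows. The function name ``describe_illumination`` and the returned
keys are hypothetical. For LED codes the sketch returns only the
three-digit label found in the code and makes no claim about the exact
wavelength; the list above and the metadata remain the authority. It
does not cover the Heiberg or XRF values.

```python
import re

# Raking codes and their meaning, taken from the list of spectral values.
RAKING = {
    "RAKBLL": (445, "left"),
    "RAKBLR": (445, "right"),
    "RAKIRL": (910, "left"),
    "RAKIRR": (910, "right"),
}


def describe_illumination(code):
    """Classify a spectral illumination code (hypothetical helper)."""
    if code in RAKING:
        nm, side = RAKING[code]
        return {"kind": "raking", "nm": nm, "side": side}
    if code == "TNGSTN":
        return {"kind": "tungsten"}
    match = re.fullmatch(r"LED(\d{3})", code)
    if match:
        # 'label' is the number embedded in the code, nothing more.
        return {"kind": "led", "label": int(match.group(1))}
    raise ValueError("not a spectral illumination code: %r" % code)
```

For example, ``describe_illumination("RAKIRL")`` reports raking 910 nm
illumination from the left, and ``describe_illumination("LED445")``
reports an LED code with label 445.
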

For Heiberg images, this value will be one of:

* ultraviolet - image created using blue separation of images captured
  with ultraviolet illumination
* strobe - image created from images captured with strobe illumination

For XRF images, the illumination was a single X-ray beam. Front and
back detectors with channels at different energies were used to scan
the palimpsest. These values indicate the channel energy by element
and detector position. Each value is one of:

* BaBack - Ba channel on back detector
* CaBack - Ca channel on back detector
* CaFront - Ca channel on front detector
* CuBack - Cu channel on back detector
* CuFront - Cu channel on front detector
* FeBack - Fe channel on back detector
* FeFront - Fe channel on front detector
* GeBack - Ge channel on back detector
* KBack - K channel on back detector
* MnBack - Mn channel on back detector

2.4 Image Processing
--------------------

There are four types of images generated by special processing, using
raw spectral images as sources. These are pseudo-color, true color,
ultraviolet, and ultraviolet blue images. In the
processing/illumination position, these images have one of the
following values:

* pseudo - pseudo-color no-veil or sharpie image
* true - true color image
* ultraviolet - ultraviolet norm8 image
* uvblue - ultraviolet blue norm8 image

2.5 Image Set
-------------

Value: For the optical eight-bit 'pack8' images, the exposure set to
which the image belongs. '01' in the sample
'103r-097v_Arch42r_Sinar_LED445_01_pack8.tif'.

Each image within a set of exposures was assigned the same index,
'01', '02', and so forth. For the pack8 images, each first-quality
image set has the index '01'. Only first-quality images are included
in this data set. Further, all processed images are derived from
first-quality '01' images.

2.6 Image Tag
-------------

Value: General purpose image type descriptor and file name
differentiator. 'pack8' in the sample
'103r-097v_Arch42r_Sinar_LED445_01_pack8.tif'.

In order to distinguish otherwise identically named files and to
clarify image types, an "image tag" is added to some files. Some of
these values are used for interim files created during processing.
Values are:

* RAW - 16-bit "raw" captured image (not in this data set)
* alph - pre-release only version of this image type
* no-veil - "no-veil" version of a pseudo-color image
* norm8 - 8-bit image processed in part using the normalize command
* pack8 - 8-bit version of a RAW image generated using the packimage
  command
* registered - Heiberg or XRF image registered to the LED365 version
  of the same folio
* sharpie - "sharpie" version of a pseudo-color image
* stitch - image created from multiple "tiles" of the same subject:
  folio or Heiberg photograph

3 XML Files
===========

There are two types of XML file included in the core data set:
transcriptions, and spatial indexes that map transcriptions to images.
As with other core data files, XML file names begin with the two-part
folio designation, followed by file content information. Because
transcriptions come from two sources, there are three file name
patterns, shown by these examples:

* 055r-050v_Arch05r_spatial-index.xml
* 055r-050v_Arch05r_TEI_Heiberg.xml
* 055r-050v_Arch05r_TEI_Netz-Wilson.xml

3.1 Transcription files
-----------------------

Transcriptions included in the core data set conform to Text Encoding
Initiative (TEI) P4 release XML. Each file has a name that identifies
the transcription subject, content type (always TEI), and
transcription source. The basic structure is illustrated by the
following file name.

* 055r-050v_Arch05r_TEI_Netz-Wilson.xml

The segments of the file name are separated by underscore characters.
The segments of the sample file are:

* prayer book folio: '055r-050v'
* undertext folio: 'Arch05r'
* content type: 'TEI'
* transcription source: 'Netz-Wilson' (or 'Heiberg')
* extension: always 'xml'

In other words, the named file is a TEI-conformant transcription of
Archimedes Palimpsest prayer book folio 55 recto conjoined with 50
verso, otherwise known as undertext folio Archimedes 5 recto. It is
based on the Reviel Netz and Nigel Wilson transcription of the text.
Each of the transcription-specific segments is described below.

3.1.1 Content Type
~~~~~~~~~~~~~~~~~~

Value: The type of XML file; always 'TEI' for transcriptions included
in this data set.

All transcriptions in the release conform to the Text Encoding
Initiative public release 4 guidelines, or TEI P4.

3.1.2 Transcription Source
~~~~~~~~~~~~~~~~~~~~~~~~~~

Value: The source of the digital transcription, either 'Netz-Wilson'
or 'Heiberg'. 'Netz-Wilson' in the sample
'055r-050v_Arch05r_TEI_Netz-Wilson.xml'.

Digitally encoded transcriptions in this data set are based on the
Reviel Netz and Nigel Wilson transcription of the palimpsest or
reconstructed from J. L. Heiberg's 1915 Teubner edition of Archimedes.
The possible values are:

* 'Netz-Wilson' - the Netz-Wilson transcription
* 'Heiberg' - the Heiberg reading of the palimpsest

3.2 Spatial Index Files
-----------------------

The spatial index files included in the core data set provide
line-by-line spatial mapping of the folios based on pseudo-color
'sharpie' images. Because all core images are registered to the same
dimensions, each mapping is valid for any unaltered core data image
belonging to the same folio. The basic structure is illustrated by the
following file name.

* 055r-050v_Arch05r_spatial-index.xml

The segments of the file name are separated by underscore characters.
The segments of the sample file are:

* prayer book folio: '055r-050v'
* undertext folio: 'Arch05r'
* content type: always 'spatial-index'
* extension: always 'xml'

In other words, the named file is a line-by-line spatial index of
Archimedes Palimpsest prayer book folio 55 recto conjoined with 50
verso, otherwise known as undertext folio Archimedes 5 recto.

3.2.1 Content Type
~~~~~~~~~~~~~~~~~~

Value: The type of XML file; always 'spatial-index' for spatial index
files. 'spatial-index' in the sample
'055r-050v_Arch05r_spatial-index.xml'.

3.3 Supplemental XML Files
--------------------------

Supplemental, treatise-length transcriptions of the Netz-Wilson and
Heiberg readings of the palimpsest are included in this data set.
There are three:

* Archimedes_FloatingBodies_TEI_Heiberg.xml
* Archimedes_FloatingBodies_TEI_Netz-Wilson.xml
* Archimedes_Method_TEI_Netz-Wilson.xml

Each treatise transcription has a structured name that identifies the
subject content, the transcription type, and the transcription source.
The basic structure is illustrated by the following file name.

* Archimedes_FloatingBodies_TEI_Heiberg.xml

The segments of the file name are separated by underscore characters.
The segments of the sample file are:

* author: 'Archimedes'
* work: 'FloatingBodies'
* content type: 'TEI'
* transcription source: 'Heiberg'
* extension: always 'xml'

In other words, the named file is a TEI-conformant transcription of
Archimedes' "On Floating Bodies" based on J. L. Heiberg's reading of
the Archimedes Palimpsest. Each of these segments is described below.

3.3.1 Author
~~~~~~~~~~~~

Value: The author of the transcribed work. 'Archimedes' in the sample
'Archimedes_FloatingBodies_TEI_Heiberg.xml'.

The possible values are:

* Alexander of Aphrodisias
* Archimedes
* Hyperides
* Unknown

3.3.2 Work
~~~~~~~~~~

Value: The transcribed work. 'FloatingBodies' in the sample
'Archimedes_FloatingBodies_TEI_Heiberg.xml'.

The possible values are:

* Commentary (Alexander of Aphrodisias)
* Diondas (Hyperides)
* EquilibriumOfPlanes (Archimedes)
* FloatingBodies (Archimedes)
* MeasurementOfTheCircle (Archimedes)
* Menaion (Unknown)
* Method (Archimedes)
* Pantoleon (Unknown)
* SphereAndCylinder (Archimedes)
* SpiralLines (Archimedes)
* Stomachion (Archimedes)
* Timandros (Hyperides)

3.3.3 Content Type
~~~~~~~~~~~~~~~~~~

Value: The XML file's type of content. 'TEI' in the sample
'Archimedes_FloatingBodies_TEI_Heiberg.xml'.

The possible values are:

* spatial-index - line-by-line mapping to images of the palimpsest
  folios of this work [forthcoming]
* TEI - TEI P4 conformant transcription

3.3.4 Transcription Source
~~~~~~~~~~~~~~~~~~~~~~~~~~

Value: The source of the XML file's encoded transcription.

The possible values are:

* 'Netz-Wilson' - the Netz-Wilson transcription
* 'Heiberg' - the Heiberg reading of the palimpsest

4 References
============

* Heiberg, J. L., ed., Archimedes Opera omnia cum commentariis Eutocii
  (Leipzig: Teubner, 1910-15, reprinted 1972).