This series of links is for the publications (how many, this writer does not know) released by what is today the Ohio Historical Society. Originally published under the title Ohio Archæological and Historical Publications, in a series of 113 volumes, they collect each year's quarterly issues in book form. At present the only volume on this site is the twentieth in the series, published in 1911. The same material, though not in archival format, is available from the Publications web site of the Ohio Historical Society. Through its efforts the various volumes are full-text searchable, and together they form an amazing collection that is seldom fully utilized. The full record set on its site differs in only one respect from the volume hosted here: this site displays the actual page, not the text of the page.
The other primary portal for accessing this invaluable record set is the Internet repository Archive.Org. While usable in a roughshod fashion, it leaves a lot to be desired in many respects. First, it is not a true archive in the purest sense. The images are little more than second- or third-generation photostatic copies and, in most cases, lack the corresponding master digital images behind the low-quality searchable PDF files. Usable? Yes, in a disorganized, blurry fashion. Desirable for material that should remain viable for future generations? No. The image below is one of the better ones by far.
Second, because of the low-quality imaging already alluded to, the inherent searchability of the PDF files is severely limited; in most cases it is useless. It is this writer's opinion that Archive.Org failed in its use of funding here, and that the time would be better spent on other, more esoteric material. This criticism is not directed solely at that fine organization, but more at the firms that willy-nilly host the material on their servers. Archive.Org needs to establish better vetting protocols, including stricter standards, for what it hosts in its "archive."
The Ohio Historical Society web site serves a far better purpose, aside from the aforementioned lack of the master images needed to create a true digital archive.
Now, in brief, to the pages presented on this site. A sacrificial volume, in this case that of 1911, was located and unbound to the signature level. Each signature was scanned at 400 pixels per inch in 24-bit color, and each page was saved to its own file. After the overscan was removed and a path was created around each page, this set of images was saved into a “Raw” directory. The final step for this set of truly archival images was applying metadata to the record set. In this instance metadata can be thought of as the old card catalogs that those of a certain age remember from their local library, only instead of describing a book, it describes the pages of that book.
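To give a sense of what 400 ppi at 24-bit color means for storage, the arithmetic can be sketched as below. The 6 x 9 inch page size is purely an assumption for illustration; the actual trim size of the 1911 volume is not stated here, and real files will differ depending on format and compression.

```python
# Hypothetical helper: estimate pixel dimensions and uncompressed size of a flat scan.
# The 6 x 9 inch page size used in the demo is an assumption, not a measurement.

def scan_estimate(width_in: float, height_in: float, ppi: int, bits_per_pixel: int = 24):
    """Return (width_px, height_px, uncompressed_bytes) for one scanned page."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    # 24-bit color = 8 bits each for red, green and blue, i.e. 3 bytes per pixel
    bytes_per_pixel = bits_per_pixel // 8
    return width_px, height_px, width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    w, h, size = scan_estimate(6, 9, 400)
    print(f"{w} x {h} px, about {size / 1_000_000:.1f} MB uncompressed")
```

Under that assumption each page is 2400 x 3600 pixels, roughly 26 MB before any compression, which is why a 500-plus page volume is kept as a separate master set rather than served directly on the web.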
From this master set a duplicate set was created and placed in a directory named “Enhanced”. Using the aforementioned path, which in essence isolates the page from anything not part of the page itself, each page was tonally adjusted using several techniques particular to digital imaging. With additional adjustments applied as well, this yields a series of images suitable for use on the Internet. This series of actions was stored in a script, and that script was then applied to the whole record set.
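The specific tonal techniques used are not named above. As one illustration of the kind of adjustment such a script might record, here is a minimal sketch of a classic "levels" adjustment (black point, white point, midtone gamma) built as a 256-entry lookup table. The black/white values are hypothetical and this is not presented as the author's actual method.

```python
# Minimal "levels" sketch: map 8-bit input values to adjusted output values.
# The black point, white point and gamma are hypothetical example settings.

def levels_lut(black: int, white: int, gamma: float = 1.0) -> list[int]:
    """Build a 256-entry lookup table.

    Inputs at or below `black` become 0, inputs at or above `white` become 255,
    and values in between are stretched linearly, then gamma-corrected.
    """
    if not 0 <= black < white <= 255:
        raise ValueError("need 0 <= black < white <= 255")
    lut = []
    for v in range(256):
        # Normalize into 0..1 within the chosen range, clamping values outside it
        t = min(max((v - black) / (white - black), 0.0), 1.0)
        # gamma > 1 brightens midtones, gamma < 1 darkens them
        lut.append(round(255 * t ** (1.0 / gamma)))
    return lut


def apply_lut(pixels: list[int], lut: list[int]) -> list[int]:
    """Apply the table to one channel's worth of 8-bit values."""
    return [lut[p] for p in pixels]


if __name__ == "__main__":
    lut = levels_lut(black=20, white=235)
    print(apply_lut([10, 20, 127, 235, 250], lut))
```

Because the table is computed once and reused, the same adjustment can be replayed on every page, which is the point of storing the actions in a script. With an imaging library such as Pillow, a table like this can be applied through `Image.point`, which takes 256 values per band.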
The “Enhanced” directory then becomes the input for the online images. A subtle drop shadow was added for aesthetic viewing, and the images were saved to a “PNG” directory. This, again, was performed using a set of scripts, allowing an easier digital workflow. Each image in this Internet set was then individually processed to set its total image dimensions. These are the images viewable on this web site.
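The final sizing step amounts to scaling each page to fit a display box while keeping its proportions, and the directory chain maps each master file to its web counterpart. A minimal sketch follows; the 800 x 1000 pixel display box and the `page_001.tif` naming scheme are hypothetical, since neither is given in the text.

```python
from pathlib import Path

# Hypothetical sketch of the web-image sizing and naming steps.
# The display box and file names below are illustrative assumptions.

def fit_within(width: int, height: int, max_w: int, max_h: int) -> tuple[int, int]:
    """Scale (width, height) to fit inside max_w x max_h, preserving aspect ratio.

    Images already small enough are left unchanged (never upscaled).
    """
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def png_target(raw_file: Path, root: Path) -> Path:
    """Map root/Raw/page_001.tif to root/PNG/page_001.png (hypothetical naming)."""
    relative = raw_file.relative_to(root / "Raw")
    return (root / "PNG" / relative).with_suffix(".png")


if __name__ == "__main__":
    print(fit_within(2400, 3600, 800, 1000))
    print(png_target(Path("/work/Raw/page_001.tif"), Path("/work")))
```

Computing both dimensions from a single scale factor is what keeps every page in proportion; fitting each side independently would distort pages whose shape differs slightly after cropping.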
The metadata (and this step is specific to the CMS software used on this web site, Joomla) then becomes the input to a spreadsheet. Using software made specifically for Joomla, this spreadsheet was built because it eases the creation of the more than 500 pages needed for the online record set. One section of the spreadsheet holds the text of each page: the OCRed text was copied and pasted into a text file, where errors were corrected, and the corrected text was then fed back into the spreadsheet. Why, you may ask? This OCRed text becomes the basis for the search engines, both internal and external. The final step in the presentation was uploading the spreadsheet to a program that constructs the individual pages.
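The spreadsheet stage can be pictured as one row per page, with the corrected OCR text in its own column. The sketch below writes such a sheet as CSV using only the standard library. The column names are hypothetical; the actual Joomla import tool and its expected columns are not described in the text.

```python
import csv
import io

# Hypothetical column layout for a one-row-per-page import sheet.
FIELDS = ["title", "alias", "image", "metadata_description", "ocr_text"]


def build_rows(pages):
    """pages: iterable of (page_number, image_filename, description, corrected_ocr_text)."""
    for number, image, description, text in pages:
        yield {
            "title": f"Page {number}",
            "alias": f"page-{number:03d}",
            "image": image,
            "metadata_description": description,
            # Collapse runs of whitespace, including line breaks, so each
            # page's text reads as one continuous block in a single cell
            "ocr_text": " ".join(text.split()),
        }


def write_sheet(pages, out) -> None:
    """Write a header row plus one row per page to a text stream."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(build_rows(pages))


if __name__ == "__main__":
    buf = io.StringIO()
    write_sheet([(1, "page_001.png", "Vol. 20 (1911), p. 1", "OHIO\nARCHÆOLOGICAL  text")], buf)
    print(buf.getvalue())
```

Keeping the corrected text as data in the sheet, rather than burying it in each generated page, means a later correction only has to be made once before the pages are rebuilt.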
As part of this overall process, three PDFs were generated. The first meets the standards set forth for PDF/A-1a. Why this seemingly lower standard? Because there is no text in the file; the pages are images! They are not content generated by a modern word processing program. This is a fallacy that Archive.Org, or rather the vendors crafting the online content for these invaluable old volumes, is perpetuating. This master PDF is the basis for two additional PDFs.
One is a “Report PDF” file and the other is the OCR PDF. The Report PDF contains all the particulars of the original images, such as the color space and the original metadata; in essence, every detail of the files, as well as a thumbnail of each. The OCR PDF is, as the name describes, the images with Optical Character Recognition applied internally by Adobe Acrobat. Again, this file is PDF/A-1a compliant. And again, why this lower standard? The file contains “images of antiquity” and was not created by a modern word processing or image editing program. They are the same as the photographs in the shoe box under your bed. A careful examination of the embedded fonts in the resulting OCRed file will show hundreds upon hundreds, perhaps even thousands upon thousands, of “fonts.” Likely not one of them will be a recognized, standards-compliant font of today. This is another fallacy perpetuated by the firms creating Archive.Org content when they use the higher-compliance PDF standards.
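A PDF/A file declares which part and conformance level it claims (for example PDF/A-1a) in its XMP metadata, through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below is a quick heuristic that reads only that declaration from the raw bytes, and only works when the XMP packet is stored uncompressed in the file. It does not validate conformance; a dedicated validator such as veraPDF is needed for that.

```python
import re

# Heuristic reader for the PDF/A identification in XMP metadata.
# Matches both the attribute form  pdfaid:part="1"
# and the element form            <pdfaid:part>1</pdfaid:part>.
_PART = re.compile(rb"""pdfaid:part\s*(?:=\s*["']|>)\s*(\d)""")
_CONF = re.compile(rb"""pdfaid:conformance\s*(?:=\s*["']|>)\s*([ABUabu])""")


def pdfa_claim(data: bytes):
    """Return e.g. 'PDF/A-1a' if the bytes declare PDF/A, else None."""
    part = _PART.search(data)
    if part is None:
        return None
    conf = _CONF.search(data)
    level = conf.group(1).decode().lower() if conf else "?"
    return f"PDF/A-{part.group(1).decode()}{level}"


if __name__ == "__main__":
    sample = b'<rdf:Description pdfaid:part="1" pdfaid:conformance="A"/>'
    print(pdfa_claim(sample))
    # For a real file: pdfa_claim(open("volume20.pdf", "rb").read())  (hypothetical file name)
```

The declaration is only a claim made by the software that wrote the file, which is exactly why an independent validation step matters for an archival master.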
The final step of this mini-project, and perhaps the more important one to archivists, is the creation of archival-friendly media. Yes, no media disc has yet been invented that meets the same standards as time-trusted microfilm. However, when was the last time you had access to a film reader in your present workspace? Not many people have access to one, and digital will slowly but inevitably take precedence over film. You might say that I am on the cutting edge. Though the manufacturer of the media used in this project, Taiyo Yuden, claims a life span exceeding 70 years, caution should always be exercised. Thus the standards in place for all institutional procedures should be adhered to. This is stated in the hope that whichever archive comes into possession of these records will have such protocols in place.
One last thing. Earlier the concept of metadata was discussed, along with why it matters for the record sets created here, as well as for all other record sets digitized by this writer. Below are two tables demonstrating the differences between properly generated metadata and that of what has quickly become the de facto online repository, Archive.Org:
As can be seen above and to the left, the two descriptive approaches differ drastically. One clearly gives the parameters used in creating the record set, while the other leaves the reader with a series of questions that are difficult to answer.
It matters not which Archive.Org record set one examines; this same series of descriptive phrases is used throughout. In other words, this is not an oddity. The metadata applied to every other old book is the same, a banal series of meaningless words.
To further confound an archivist, or anyone interested in preserving these volumes for posterity, once the cropped images have been compiled into a PDF, the images (and yes, they are never tonally adjusted) are tossed into the digital trash bin. Some of the PDFs are of such low quality that they appear as if a sheen of Vaseline had been smeared over them.
One final step applies, where relevant, to the books stored on this site. During the scanning process, if a photographic image is deemed worthy, it is scanned at a resolution of no less than 1,200 ppi, and more often than not at the higher 2,400 ppi, and then carefully adjusted, removing the moiré pattern at the same time. More often than not, it is only in these old volumes that such images still survive.
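The moiré-removal method used here is not described. One common, crude descreening idea is to scan well above the final resolution, then low-pass filter and downsample, so that halftone dots finer than the filter are averaged away. The sketch below shows that idea on a grayscale image held as a list of rows, using simple block averaging (a box filter combined with decimation). It is a stand-in illustration, not the author's technique, and dedicated descreen filters generally do better.

```python
# Crude descreen sketch: block-average downsampling of an 8-bit grayscale image.
# Averaging each factor x factor block acts as a box low-pass filter, smoothing
# out halftone dots smaller than the block before resolution is reduced.

def block_average(gray: list[list[int]], factor: int) -> list[list[int]]:
    """Downsample by averaging non-overlapping factor x factor blocks.

    Edge rows/columns that do not fill a whole block are dropped.
    """
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(0, h - factor + 1, factor):
        row = []
        for x in range(0, w - factor + 1, factor):
            total = sum(gray[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
            row.append(round(total / (factor * factor)))
        out.append(row)
    return out


if __name__ == "__main__":
    # A 4 x 4 checkerboard stands in for a fine halftone dot pattern
    checker = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]
    print(block_average(checker, 2))
```

With a factor of 2, a 2,400 ppi scan comes out at an effective 1,200 ppi; the alternating pattern collapses to an even mid-gray, which is the behavior wanted when the pattern is halftone screening rather than real image detail.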
A. Wayne Webb