written by Konstantina Eleftheriadi, University of Cologne
Introduction
This article investigates online tools for Handwritten Text Recognition (HTR). Specifically, it examines and compares the functionality of the “Transkribus” and “eScriptorium” platforms. It first outlines a practical application of each platform: in the case of Transkribus, a custom deep-learning HTR model for Byzantine Greek is developed and evaluated; in the case of eScriptorium, an established model for the same tradition is tested instead. The two experiments reveal the efficacy of each platform. The article then compares the platforms briefly, before highlighting prospective challenges concerning the applicability of such tools for research in the field of digital paleography.
The framework for the endeavor was provided by the “Sunoikisis Summer 2024: Digital Classics and Byzantine Studies” Consortium, convened by Martina Filosa, Monica Berti, and Gabriel Bodard, for the academic summer semester of 2024, and specifically by its fourth session, on “HTR and OCR from papyrus to codex”, and the four exercises proposed there.
Online Tools for Handwritten Text Recognition
Manuscripts constitute vital resources for the formation of historical narratives, insofar as they are direct carriers of information concerning past sociocultural circumstances (Saraswati 2013). The study of this material is often impeded by practical conditions, chief among them the fragility of the objects and their dispersion across different collections and locations. Digitization efforts have significantly increased the accessibility of documents, enabling their comprehensive study remotely (Marthot-Santaniello 2021). The development of highly specialized tools that enable innovative approaches to the new, digital material in academic contexts has also helped invigorate interest in their study (Jang 2020). “Digital Paleography” has emerged as a prolific scientific direction in its own right, garnering due attention from experts.
Transkribus: Creating a custom HTR-Model
The second and third exercises provided instructions for the development and evaluation of deep-learning models for Handwritten Text Recognition in Byzantine paleography. Excerpts from manuscripts of the Bodleian Library Collection were supplied in the form of JPEG images, along with transcriptions of their contents, to be used during the training and validation stages. The process was conducted through the “Transkribus” platform, co-developed by the University of Innsbruck (A Short History, 2023). As per the directions of the second exercise, seven pages in total were used in the endeavor. The pages selected originated from MS. Barocci 102, a Commentary on Isaiah by Basil the Great, compiled around the early 12th c. The model was trained on six pages, then validated and tested on the remainder.
In the first stage, the model used the training pages and their ground-truth transcriptions as examples, to begin learning to recognize character sequences in the images. It then evaluated and fine-tuned itself by testing its accuracy against the validation dataset, for which it also had access to transcriptions. Finally, it attempted to transcribe the text on the test page, for which it had no transcription. The entire process was conducted automatically through the platform, with no further intervention by the user. A preliminary examination of the results on the test dataset immediately revealed disparities from the original text (Fig. 1). Especially concerning was the misinterpretation of word clusters in areas where the text was densely written, that is, with insufficient spacing between glyphs.
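Such a preliminary check of the output against the ground truth can be sketched with Python’s standard difflib module. The line below is an invented example, not the actual Barocci 102 text, and difflib is used here purely for illustration; it is not part of the Transkribus workflow:

```python
import difflib

# Invented ground-truth line and a hypothetical recognized version of it,
# differing only in the final character (ν read as υ).
ground_truth = "τοῦ ἐν ἁγίοις πατρὸς ἡμῶν"
recognized = "τοῦ ἐν ἁγίοις πατρὸς ἡμῶυ"

# ndiff compares the two strings character by character; entries starting
# with "-" mark characters of the ground truth that were lost, entries
# starting with "+" mark characters the model introduced instead.
for op in difflib.ndiff(ground_truth, recognized):
    if op[0] != " ":
        print(op)  # "- ν" followed by "+ υ"
```

Run over every transcribed line, a report of this kind makes the model’s recurring confusions visible at a glance before any quantitative evaluation.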
The experiment further highlighted the difficulties cited in the literature concerning the development of functioning models for HTR implementations (Pavlopoulos et al. 2023A). Peculiarities in the handwritten text impeded the recognition and transcription processes, leading to an array of mistakes. Deviations in positioning posed a particular obstacle, as inconsistent spacing and baseline alignment led to frequent misidentifications and incorrect collation of words. Ligatures and abbreviations were equally challenging, as the model could not separate the individual letters that formed them. As shown by the comparison of common-word instances, stylistic similarities among glyphs also confused the algorithm: υ (ypsilon) and ν (ny) were often interchanged. These issues may explain the disparity in detected characters, as well as the significant Character Error Rate (CER) calculated (Fig. 2, 43.99%).
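The CER reported above is conventionally computed as the Levenshtein (edit) distance between the recognized text and the ground truth, normalized by the length of the ground truth. A minimal sketch in Python; the sample strings are invented for illustration and are not taken from the Transkribus output:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance: the minimum number of
    # insertions, deletions, and substitutions turning a into b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(recognized: str, ground_truth: str) -> float:
    # Character Error Rate: edit distance over reference length.
    return levenshtein(recognized, ground_truth) / len(ground_truth)

# Invented example: a single ν misread as υ in a three-letter word.
print(round(cer("σύυ", "σύν") * 100, 2))  # 33.33 (%CER)
```

Read this way, a CER of 43.99% means that roughly 44 edits per 100 ground-truth characters separate the recognized text from the original.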
eScriptorium: Using a pre-trained HTR-Model
The fourth and final exercise provided a pre-trained model for layout recognition and automatic transcription of Byzantine Greek handwritten manuscripts of the 10th c. The instructions of the exercise outlined a concise workflow for employing the model within the “eScriptorium” platform, developed by the École Pratique des Hautes Études (Stokes et al. 2021). Though the platform may be installed locally on Linux or Mac computers, an account on the server of the CREMMA Project was also made available for accessing the application, thereby overcoming its unavailability on Windows operating systems. The server option was favored, as it allowed for processing and storing information on the cloud, minimizing the demand on the student’s machine. For the purposes of the exercise, three pages of the Palatinus graecus 23 from the Bibliotheca Palatina were selected for examination. An examination of the produced text showed promising results in terms of fidelity to the original. Overall, the transcription was satisfactory and would require significantly less post-correction than the ones produced previously (Fig. 3).

Fig. 3 – eScriptorium: Visual comparison of recognized transcription on p. 49 of the Codex Palatinus graecus 23. On the top, the first 8 lines on the page, as segmented and transcribed automatically utilizing the provided model, on eScriptorium. On the bottom, the page as segmented on the digital Bibliotheca Palatina.
Platform Comparison
Transkribus and eScriptorium may be thought of as ultimately accomplishing similar results, as two examples of highly specialized platforms. Both were developed in an academic context and have been regularly employed for robust scientific work. Both offer a user-friendly interface that enables engagement with their tools with relative ease. The steps to be followed for each procedure were clearly communicated through labeled buttons and dedicated panes. Transkribus in particular offered extensive instructions on its “Help Page”, which further enhanced the experience. Waiting times were similar, in the range of 15-25 seconds both for segmenting and for transcribing documents.
A major disparity between the applications concerns actual accessibility. Transkribus is a licensed platform that requires a paid subscription to be utilized to its full potential. Currently, it only operates through a provided server, where operations are confined to what the running payment plan allows. The API behind the platform is only accessible to collaborating organizations. Conversely, eScriptorium is open-source and may be freely installed on Linux and Mac operating systems, or accessed through a cloud server. Its full documentation is available online, along with its API. Community support is highly encouraged, in order to consolidate the platform. A final difference is that eScriptorium also allows the export of models trained on the platform, ensuring the shareability of its products; the results of a scientific work may thus be replicated in new contexts.
Discussion and Challenges
The digital publication of produced data is steadily becoming a sine qua non for the investigated disciplines (Galleron – Idmhand 2020). Advancements in relevant software have permitted complex interactions with the material in varied environments. Cloud-based operations are becoming the norm, limiting the burden on private machines. The compounded engagement of experts with the new methodologies has opened up a new epistemological horizon for the disciplines, in the era of “Digital Humanities”. By examining the prospects and limitations of the tools presented, major challenges in the realm of Digital Paleography may also begin to be delineated.
The threat of data obsolescence remains a significant concern for pertinent implementations. Projects that are no longer retrievable or emulable are often encountered, as no measures were taken to ensure their permanence in digital form (Strange et al. 2023). Informed data-management strategies should be deliberated, to ensure that the models developed, and the archives in which they are included, are not only functional in the present but also replicable in the future. Interoperability may be set as an indispensable condition for publishing related work for posterity, to motivate its continuous refinement through the input of the scientific community. Relatedly, “digitization bias” should be taken into account when interacting with relevant tools: datasets may be limited to what has been digitized, or simply preserved in accessible digital form, thus narrowing the analytical scope of scientific research that relies heavily upon digital databases.
Additionally, the large-scale employment of digital tools such as the ones surveyed for research and analysis could ultimately result in the “black boxing” of scientific knowledge. In the field of Digital Humanities, the term denotes a process in which over-dependence on software for performing complex queries limits the user’s understanding of essential background operations (Stokes 2009). Procedures in digital environments often require sequences of operations in advanced mathematics and statistics, vital components of the deep-learning algorithms employed. At the same time, an efficient tool that produces the desired result may not invite scrutiny from the researcher, as it serves its purpose successfully without intervention. When relying primarily on applications that simply accept input and produce output with little clarification, those same procedures become obscured to the user, who cannot effectively re-apply them without risking a loss of scientific integrity.
Regarding the problems presented, a viable solution has emerged in the form of extensive collaborations. When dealing with such heterogeneous structures, an array of complications is to be expected, which can be alleviated through interdisciplinary cooperation. Indeed, the involvement of experts from varied backgrounds has been common practice in related projects. Computer scientists and data analysts have often aided the work of philologists; reciprocally, philologists have motivated pioneering approaches in those areas. Both sides have also sought to expand their knowledge and toolset by specializing in digital applications for the humanities (Galleron – Idmhand 2020).
Shareability of data and analytical processes has become another central objective in modern academia. Transparency concerning the tools utilized, the problems encountered, and the error-correcting methodology favored further strengthens relevant efforts (Joyeux-Prunel 2024). By adhering to protocols that promote open access and collaborative practices, a fertile environment for the proliferation of the disciplines may be fostered, both now and for the future. As tools become more powerful, and experts gain more knowledge through the exchange of ideas, new directions for even more creative analyses may surface. Ensuring the explainability and reproducibility of published methods motivates participatory values among the scientific community, aimed at refining the introduced epistemology. The consistent reference to the FAIR Principles (FAIR Principles, 2022) for the publication of digital data exemplifies that point.