10 years of correspSearch – Connect Scholarly Editions of Correspondence

Since 2014, correspSearch has been collecting the metadata of edited letters and providing it for cross-project research. Just in time for its anniversary, new features are now available: visualisations, full-text search and a SPARQL endpoint. Over 270,000 edited letters can be searched. A good reason not only to introduce the new functions, but also to look back and see what is yet to come.

By Stefan Dumont, Sascha Grabsch, Jonas Müller-Laackman, Ruth Sander and Steven Sobkowski

Note: This article was translated with DeepL and only lightly corrected. The original blog post was written in German and published on the DHd-Blog.


A look back

Ten years ago, on 1 September 2014 to be precise, correspSearch officially went online with an email to the TEI list. The initiative for the web service was born in February 2014 at the workshop ‘Letter editions around 1800: Finding interfaces and networking’, which was organised by Anne Baillot and Markus Schnöpf at the BBAW. There, Peter Stadler presented his thoughts on the planned TEI element correspDesc and also expressed the idea of providing correspondence metadata from letter editions via an interchange format and aggregating it across scholarly editions (Stadler 2014).

Screenshot of the search interface in the correspSearch prototype, approx. 2015.

Following the workshop, Stefan Dumont developed the prototype of a web service at the BBAW that could aggregate files and make them searchable at a basic level: correspSearch (Dumont 2018; 2023). At the same time, a task force of the TEI Correspondence SIG completed the modelling of correspDesc (Stadler, Illetschko and Seifert 2016). With the inclusion of correspDesc in the TEI guidelines (version 2.8.0) in spring 2015, the first version of the Correspondence Metadata Interchange Format (CMIF), developed by the TEI Correspondence SIG, was also finalised. The CMIF is based on a very small and restrictive set of elements (and therefore of information). It is characterised by the consistent use of URIs from authority files such as the Gemeinsame Normdatei (GND) or the Virtual International Authority File (VIAF) for persons, as well as GeoNames for places. This allows these entities to be identified unambiguously and searched for across projects.
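
To make the shape of such a record more concrete, here is a minimal sketch in Python that reads a single correspDesc entry with the standard library. The element names (correspDesc, correspAction, persName, placeName, date) are those defined by the TEI. The sample record, its example.org URIs and the helper name parse_corresp_desc are illustrative assumptions; a real CMIF file would carry GND, VIAF or GeoNames URIs in the ref attributes.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Illustrative record only: names, URIs and the date are placeholders.
SAMPLE = """<correspDesc xmlns="http://www.tei-c.org/ns/1.0" ref="https://example.org/letter/1">
  <correspAction type="sent">
    <persName ref="https://example.org/authority/person/1">Sender Name</persName>
    <placeName ref="https://example.org/authority/place/1">Berlin</placeName>
    <date when="1829-05-12"/>
  </correspAction>
  <correspAction type="received">
    <persName ref="https://example.org/authority/person/2">Addressee Name</persName>
  </correspAction>
</correspDesc>"""


def parse_corresp_desc(xml_text):
    """Return the letter URL plus the person, place and date of each action."""
    root = ET.fromstring(xml_text)
    record = {"letter": root.get("ref")}
    for action in root.findall("tei:correspAction", TEI_NS):
        pers = action.find("tei:persName", TEI_NS)
        place = action.find("tei:placeName", TEI_NS)
        date = action.find("tei:date", TEI_NS)
        # Keyed by the action type, e.g. "sent" or "received".
        record[action.get("type")] = {
            "person": pers.get("ref") if pers is not None else None,
            "place": place.get("ref") if place is not None else None,
            "date": date.get("when") if date is not None else None,
        }
    return record


record = parse_corresp_desc(SAMPLE)
print(record["sent"]["person"], record["sent"]["date"])
```

Because persons and places are referenced by URI rather than by name, two editions that spell a name differently still yield the same key, which is what makes aggregation across projects possible.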

From the very beginning, correspSearch relied on the provision of data by scholarly editions, research projects and institutions. The first data contributors included the Weber-Gesamtausgabe and Letters and Texts from Intellectual Berlin around 1800. In the following years, the database grew slowly but steadily. By summer 2016, over 17,000 edited letters had already been catalogued. This, and the fact that correspSearch was honoured with the Berlin DH Prize in 2015, provided strong backing for the application for a project funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG). The application was thankfully approved, and the project was able to start in 2017.

As part of the DFG project, the prototype was replaced by a new, modularised software architecture, which is primarily based on the search engine software Elasticsearch. This means that even very large volumes of metadata and full-text data (see below for the latter) can be searched with high performance. New applications have also been developed for harvesting, ingest and the API, which ensure secure and stable operation in production.

Map-based search in correspSearch.

The Elasticsearch software also enabled a faceted search so that search results can be further explored and filtered. Some filters only became possible by enriching the aggregated CMIF data with additional data from authority files. For example, letters can now be searched by the gender and occupations of their correspondents. To do this, correspSearch uses data from the German Integrated Authority File (GND) and Wikidata. Geo-coordinates obtained from GeoNames, for example, make the map-based search possible. Here you can search for letters based on a region that is either drawn freely or selected from a historical national territory (after 1815) stored in HistoGIS. The new search interface was implemented in Vue.js, and the website as a whole is now responsive and can therefore be used on all kinds of devices.

Entry form in the CMIF Creator.

In addition, we have created the CMIF Creator, a browser-based input form that scholars can use to create digital letter indexes for their editions without any prior technical knowledge. When entering persons and places, the GND or GeoNames can be queried directly in order to add the corresponding authority file IDs. The CMIF Check and CMIF Preview services support the checking of CMIF files. In addition, video tutorials on correspSearch and the CMIF Creator have been produced to supplement the existing documentation. The community also kindly provided tools for CMIF creation: Klaus Rettinghaus developed the Python tool CSV2CMI, which can convert CSV tables into CMIF files. The tool is also offered by the Saxon Academy of Sciences and Humanities as a web service, supplemented by the web service ba[sic]?. Julian Jarosch (Academy of Sciences and Literature Mainz) recently developed the eXistdb library CMIFerator, which can be used to implement a CMIF API in eXistdb.

The csLink widget (bottom right) used in the Weber-Gesamtausgabe.

The DFG project also developed the JavaScript widget csLink, which points from an edited letter in one's own digital edition to letters (in other editions) written by the correspondence partners at the same time (it queries the correspSearch API for this purpose). This ‘extended correspondence context’ can be of great interest, as a person may write to different correspondence partners about the same event, and sometimes with different content (Dumont 2023, 745). The csLink widget is published under a free licence and can be used by any digital edition.
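
The idea of an ‘extended correspondence context’ can be sketched in a few lines of Python. This is not how csLink itself is implemented (the widget is written in JavaScript and queries the correspSearch API); the sketch only filters a small local list. The sample records, the placeholder person identifiers, the function name extended_context and the 14-day window are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical sample records; the person identifiers are placeholders.
LETTERS = [
    {"id": "a1", "sender": "person:A", "addressee": "person:B", "date": date(1829, 5, 12)},
    {"id": "b7", "sender": "person:A", "addressee": "person:C", "date": date(1829, 5, 15)},
    {"id": "c3", "sender": "person:D", "addressee": "person:B", "date": date(1829, 5, 30)},
    {"id": "d9", "sender": "person:E", "addressee": "person:F", "date": date(1829, 5, 13)},
]


def extended_context(letters, current, window_days=14):
    """Return other letters written by either correspondent of `current`
    within `window_days` before or after its date (assumed window size)."""
    partners = {current["sender"], current["addressee"]}
    start = current["date"] - timedelta(days=window_days)
    end = current["date"] + timedelta(days=window_days)
    return [
        letter
        for letter in letters
        if letter["id"] != current["id"]      # skip the edited letter itself
        and start <= letter["date"] <= end    # same period
        and letter["sender"] in partners      # written by one of the partners
    ]


for letter in extended_context(LETTERS, LETTERS[0]):
    print(letter["id"], letter["addressee"], letter["date"])
```

In this sample, the letter from person A to person C three days later is the kind of hit that would be shown next to the edited letter.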

In 2018, a small side project was added that was initiated and realised by students: quoteSalute (Lou Klappenbach, Marvin Kullick and Louisa Philipp, supervised by Stefan Dumont, Frederike Neuber and Oliver Pohl). The quoteSalute service offers curated greetings from edited letters that can be used in your own (email) correspondence (see also the German article on the DHd blog). quoteSalute was honoured with the DARIAH-DE DH Award 2018. In the same year, the community-driven project consortium of correspDesc, CMIF & correspSearch was also honoured with the Rahtz Prize for TEI Ingenuity from the Text Encoding Initiative.

Random greetings with quoteSalute.

Over the past few years, the number of letters indexed in correspSearch has grown thanks to numerous data contributions, the majority of which came from the community, i.e. from scholarly edition projects and institutions themselves. Unfortunately, it would go beyond the scope of this blog post to list all data contributors, but some of them (in addition to those already mentioned above) should be named as examples: Alfred Escher-Briefedition; Alexander Rollett. Seine Welt in Briefen 1844-1903 (ZIM Graz); Briefe der Bach-Familie (Sächsische Akademie der Wissenschaften und Bach-Archiv Leipzig); Arthur Schnitzler - Briefwechsel mit Autorinnen und Autoren (M. A. Müller, G. Susen, L. Untner, ÖAW; not only self-edited letters, but also metadata on Schnitzler correspondence in other editions); Digitale Edition der Korrespondenz August Wilhelm Schlegels (J. Strobel & C. Bamberg); letters of Friedrich Wilhelm Joseph Schelling 1786-1802 (BAdW); Melanchthon correspondence (Heidelberger Akademie der Wissenschaften); hallerNet; various editions that have been recorded or made available as part of the Norwegian Correspondences (NorKorr) project led by Annika Rockenberger (e.g. on Camilla Collett or Edvard Munch); the correspondence of Otto Nicolai (K. Rettinghaus); Halle pastors in Pennsylvania (Francke Foundations); letters to Johann Wolfgang Goethe (Klassik Stiftung Weimar); letters from and to Theodor Fontane (Fontane-Archiv Potsdam); the correspondence of Paul d'Estournelles de Constant (Anne Baillot & Team); The Mary Hamilton Papers (D. Denison et al.); the Thomas Gray Archive (R. Eck & A. Huber); CatCor - The Correspondence of Catherine the Great ... the list goes on. A complete overview of all CMIF files and publications can be found on the correspSearch website.

Finally, correspSearch is not a one-way street for aggregated data: since its launch in 2014, the web service has also been accessible via APIs, and the data can be retrieved in machine-readable form under a free licence. TEI-XML, TEI-JSON and CSV are available as formats; see the API documentation for details. In autumn 2023, we launched the completely rewritten API 2.0, which performs well even with large responses. In addition, a BEACON interface offers the option of automatically linking to the correspondence found in correspSearch (e.g. from index entries of persons). And thanks to Klaus Rettinghaus, a template is also available in the (German and English-language) Wikipedia to link to correspondence in correspSearch using the GND ID of Wikipedia articles on individuals.
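
How an edition might use such a BEACON file for automatic linking can be sketched as follows in Python. The sketch relies only on the general conventions of the BEACON format (meta lines starting with "#", one identifier per line, a {ID} placeholder in the target pattern). The sample file content, the example.org target URL and the function names parse_beacon and link_for are assumptions for illustration, not the actual correspSearch output.

```python
# Illustrative BEACON content; the target URL and the listed IDs are placeholders.
SAMPLE_BEACON = """#FORMAT: BEACON
#PREFIX: http://d-nb.info/gnd/
#TARGET: https://example.org/letters?correspondent={ID}
118540238
100000001|12
"""


def parse_beacon(text):
    """Split a BEACON file into its meta fields and the set of listed IDs."""
    meta, ids = {}, set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Meta line such as "#TARGET: https://..."; split at the first colon.
            key, _, value = line[1:].partition(":")
            meta[key.strip().upper()] = value.strip()
        else:
            # Link line; anything after the first "|" is optional annotation.
            ids.add(line.split("|")[0].strip())
    return meta, ids


def link_for(identifier, meta, ids):
    """Return a link for an ID listed in the file, or None if it is absent."""
    if identifier not in ids:
        return None
    return meta["TARGET"].replace("{ID}", identifier)


meta, ids = parse_beacon(SAMPLE_BEACON)
print(link_for("118540238", meta, ids))
```

An edition could run such a check for every GND ID in its index of persons and display a link only where correspSearch actually lists correspondence.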

State of play: Version 3.0 with visualisations and full-text search

The DFG project was recently successfully completed and version 3.0 of the correspSearch web service was released. This also means that new functions are now available. In addition to improvements such as searchable facets, two fundamentally new functions have also been introduced.

A. v. Humboldt's (published) correspondence visualised over time.

First, search results can now also be explored in visualisations. There are three types of visualisation to choose from: a timeline of the correspondence (as a stacked bar chart), a map view of the places of writing and receipt (also over time) and a network display of the correspondence partners. All three visualisations can be accessed from the search result (i.e. after performing an initial search). Depending on the search, some visualisations are better suited for further exploration than others. While the timeline is good for visualising an entire correspondence (such as that of Constance de Salm-Salm), the map view is particularly suitable for travel correspondence (such as that of A. v. Humboldt in Russia in 1829).

Correspondence network from the data of the Weber Gesamtausgabe.

The network visualisation, on the other hand, is interesting for queries that are not person-centred or for editions that also contain letters from third parties or from a wider context, such as family and friends (e.g. the Weber Gesamtausgabe). It can also be used to explore letter networks of entire time periods (e.g. 1789-1798). The underlying metadata can be viewed in all visualisations using the zoom function and pop-ups. In many places it is also possible to switch back to the search result for more detailed research. Smooth switching from the search result to the visualisation and back was a central point in the concept of the visualisations, which were implemented with the help of D3.js.

Search results for ‘Jubiläum*’ in correspSearch. The text snippets indicate whether the hits are in the abstract, the letter text or the editor's commentary.

Second, in addition to the metadata, correspSearch can now also harvest and aggregate the full texts of the edited letters and make them available for searches (e.g. a search for the German word ‘Jubiläum*’ (“anniversary”)). The CMIF specifies only the URL of the TEI-XML full text of each letter; the text is fetched from that URL when the metadata is ingested. Digital editions that already offer their data via an API can thus easily provide the full texts for correspSearch. However, it is also technically possible to obtain individual files from data dumps (e.g. on GitHub or Zenodo). During ingest, basic TEI structures are analysed and displayed accordingly in the search result: this allows researchers to distinguish hits in the (original) letter text from those in the editor's commentary or abstract. At present, only texts from the first four digital editions are searchable, which have thankfully already extended their CMIF interface accordingly (including the Weber-Gesamtausgabe and Dehmel digital). The number of letters that can be searched in full text is displayed below the search field of the full-text search.

In addition to the full-text search, the search functions will soon be supplemented with another one: it will then be possible to search not only for correspondence partners, but also for persons mentioned in the text. This function has already been implemented and will be activated shortly. Like the full-text search, it is based on the expansion of the CMIF in version 2 (see the proposal and Dumont et al. 2019).

Version 3.0 of correspSearch also offers a new, additional interface: a SPARQL endpoint. Thankfully, this can be offered on the lod.academy platform, which is operated by the Academy of Sciences and Literature Mainz. The current RDF data model is also documented there. It should be noted that the SPARQL endpoint is currently still being operated as a beta version and that changes to the data model may still be made.
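
As a rough sketch of how such an endpoint is typically addressed, the following Python snippet prepares a query according to the standard SPARQL 1.1 Protocol (an HTTP GET request with a `query` parameter). The endpoint URL is a placeholder and the function name build_sparql_request is hypothetical. The query is deliberately generic because the actual classes and properties depend on the RDF data model documented on lod.academy, which may still change during the beta phase.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder: replace with the endpoint URL given on the lod.academy platform.
ENDPOINT = "https://example.org/sparql"

# Generic query that works against any RDF store: list ten arbitrary triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint, query):
    """Prepare a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

The prepared request could then be sent with urllib.request.urlopen(request) and the response decoded with the json module; this step is left out here because the placeholder URL does not point to a real endpoint.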

The aggregated data also reached a new level this summer. More than 270,000 versions of letters can currently be searched, mainly thanks to data contributions from the editing and research community, but also from the PDB18 cooperation project (see below).

What comes next

The DFG project correspSearch – Connect Scholarly Editions of Correspondence has now come to an end, but the web service will continue to be operated by the BBAW in the long term (BBAW 2023, 6). In addition, the DFG cooperation project Der deutsche Brief im 18. Jahrhundert (The German Letter in the 18th century - PDB18), which is being carried out together with the Interdisciplinary Centre for the Study of the European Enlightenment (IZEA) at the University of Halle and the University and State Library Darmstadt, is still ongoing. The aim of the project is to establish a database and a cooperative network for the digitisation and research of German letters during the Enlightenment. The project focuses on the retro-digitisation and metadata collection of printed, completed letter editions (Décultot et al. 2023).

Logo of the project Deutscher Brief.

In addition, in PDB18 the web service will be enhanced with some additional functions, e.g. the ‘Dataset’ feature and the filter ‘Language used’. The most important development, however, will be csRegistry. With csRegistry it will be possible to assign a unique URI for a letter (as an ‘abstract’ entity) and to link different editions of this letter to it. This will make it possible in the future to show different editions of one and the same letter in correspSearch or to filter out these ‘duplicate’ records from the data if required - for network analyses, for example. 

So the future will bring a few more innovations for correspSearch. But hopefully there will also be many more new digital letter indexes in CMIF, which will further enlarge the database. Although correspSearch already contains a considerable number of edited letters, the total number of edited letters (in German-speaking countries alone) is much larger. Even a service like correspSearch is useless without the many large and small data contributions from edition projects, scholars and institutions. We would therefore like to take this opportunity to express our sincere thanks for the numerous data donations over the last 10 years. And if you would like to provide data (or to do so again), you can find all further information under ‘Participate’ on correspSearch.net.

 

References

Berlin-Brandenburgische Akademie der Wissenschaften. 2023. “Das Leitbild Open Science der Berlin-Brandenburgischen Akademie der Wissenschaften.” urn:nbn:de:kobv:b4-opus4-37530.

Décultot, Elisabeth, Stefan Dumont, Katrin Fischer, Dario Kampkaspar, Jana Kittelmann, Ruth Sander, and Thomas Stäcker. 2023. “PDB18: The German Letter in the 18th Century.” [Poster]. Encoding Cultures – Joint MEC and TEI Conference. Paderborn 2023. https://hcommons.org/deposits/item/hc:59731/.

Dumont, Stefan. 2018. “correspSearch – Connecting Scholarly Editions of Letters.” Journal of the Text Encoding Initiative 10. https://doi.org/10.4000/jtei.1742.

Dumont, Stefan, Ingo Börner, Jonas Müller-Laackman, Dominik Leipold, and Gerlinde Schneider. 2019. “Correspondence Metadata Interchange Format (CMIF).” In Encoding Correspondence: A Manual for Encoding Letters and Postcards in TEI-XML and DTABf, edited by Stefan Dumont, Susanne Haaf, and Sabine Seifert. https://encoding-correspondence.bbaw.de/v1/CMIF.html. urn:nbn:de:kobv:b4-20200110163712891-8511250-2.

Dumont, Stefan. 2023. “Briefeditionen vernetzen.” In Digitale Literaturwissenschaft: DFG-Symposion 2017, edited by Fotis Jannidis, 729–49. Germanistische Symposien. Stuttgart: J.B. Metzler. https://doi.org/10.1007/978-3-476-05886-7_30.

Stadler, Peter. 2014. “Interoperabilität von digitalen Briefeditionen.” In Fontanes Briefe ediert, edited by Hanna Delf von Wolzogen, 278–87. Fontaneana 12. Würzburg: Königshausen & Neumann.

Stadler, Peter, Marcel Illetschko, and Sabine Seifert. 2016. “Towards a Model for Encoding Correspondence in the TEI: Developing and Implementing <correspDesc>.” Journal of the Text Encoding Initiative 9. https://doi.org/10.4000/jtei.1433.

© 2024 Berlin-Brandenburgische Akademie der Wissenschaften