Report: Beyond Mining Workshop

By Tessa Hauswedell

The ASYMENC Beyond Methods of Mining workshop sought to address how the results of digital research can be used in historical research and how historians can best evaluate the validity of digitally mined evidence. As Hermione Giffard pointed out in her opening remarks, the movement of the last decade towards digitising historical collections and developing text mining tools seems to have preceded any nuanced reflection on the ends to which these tools can be put. The workshop therefore served to initiate a sustained discussion amongst historians about their use of digital archives and the confidence they place in digital tools to answer historical research questions.

In her keynote address, Jane Winters from the Institute of Historical Research addressed the manifold ways in which the advent of the digital is lastingly changing research practices and processes. She began by pointing out that, overall, qualitative change in research practices has been slower than initially anticipated and has proceeded unevenly across disciplines. Those who have profited most from new digital tools and archives tend to be early modernists rather than twentieth-century historians, whose sources mostly remain locked away in archives restricted by copyright. Where digital methods are becoming part of the research landscape, Winters noted, digital archives do simplify access to sources, but working with them also requires new forms of collaboration at the intersection of the academic and the technical. While the spatial turn already formed part of a larger development in digital history, Professor Winters challenged historians to move away from purely text-based archives and to take up sound recordings and images as primary sources.

The presentations at this workshop were, however, very much focused on traditional textual archives, including newspapers, tax surveys, parliamentary debates and catalogues, and ranged from the late eighteenth century up to contemporary material. Together, they showcased the range of digital analyses available depending on the focus of enquiry: named entity recognition, topic modelling, n-gram visualisations, geomapping and multiple correspondence analysis, as well as methods from corpus linguistics such as collocation and cluster analysis.

Opening the first panel, James Baker from the University of Sussex gave a presentation entitled ‘Acts of being in proxies for prints: People in the British Museum catalogue of Political and Personal Satire, 1770-1830’. Taking descriptions of satirical prints from the late eighteenth and early nineteenth centuries as his source, Baker analysed macro-patterns in these descriptions in order to reveal changes over time in how men and women were described and in the forms of speech acts.

Ian Gregory (Lancaster University) demonstrated the possibilities of a spatial, rather than temporal, approach to digital history in his talk ‘Spatial Humanities: texts, GIS, places and public health in 19th-century Britain’. Geo-parsing techniques, which identify place names in the printed Census and Registrar General’s Reports from 1851 to 1911, enable the researcher to establish what is being said about a specific place and to map places in connection with particular public health themes. Combined with a detailed collocation analysis of these places and themes, the results reveal patterns and regional differences in public health reporting in nineteenth-century Britain.
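The core of a window-based collocation count can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline Gregory presented: the crude regular-expression tokenizer, the window of three words either side, the function names `tokenize` and `collocates`, and the sample sentence and theme words are all hypothetical. Real studies usually go on to apply an association statistic rather than relying on raw co-occurrence counts.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical, deliberately crude tokenizer: lowercase, alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def collocates(tokens: list[str], node: str, window: int = 3) -> Counter:
    """Count every token within `window` positions either side of each `node` occurrence."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts


# Hypothetical sample text and theme words, for illustration only.
sample = ("Cholera was reported in Liverpool. The deaths in Liverpool "
          "from fever rose. London reported no cholera.")
themes = {"cholera", "fever", "deaths"}

counts = collocates(tokenize(sample), "liverpool")
theme_hits = {word: n for word, n in counts.items() if word in themes}
print(theme_hits)  # {'deaths': 2, 'fever': 1}
```

Note that "cholera" is not counted even though it occurs in the same passage: both occurrences fall more than three tokens away from either "liverpool", which shows how strongly the chosen window shapes what counts as a collocate.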

The second panel included two papers dedicated to nineteenth-century British newspaper corpora. Amelia Joulain Jay (Lancaster University) analysed Victorian British attitudes towards France and Russia through the eyes of a nineteenth-century newspaper, The Era. France and Russia were commonly perceived as Great Britain’s two main rivals but, as Joulain Jay’s analysis demonstrated, the two countries were discussed in very different terms. By combining ‘global’ and ‘sampling’ approaches to her corpus with a linguistic analysis of its phraseology, she concluded that, overall, France was presented as the more immediate threat, while Russia’s agency was downplayed.

Tessa Hauswedell’s (University College London) talk, ‘Reporting the Empire’, dealt with the daily Pall Mall Gazette in the period 1870-1900. According to the historiography, newspapers were one of the key drivers in continuously whipping up popular support for the British Empire in the late nineteenth century. Yet what empirical indicators are there for the alleged increase in a) the quantity and b) the tone of reporting on the British Empire over this thirty-year period? Using a mix of frequency analysis of place names and collocation analysis, she concluded that there was no discernible increase in reporting about empire in the Pall Mall Gazette, but rather a change in the tone and style of its discussions about empire.

Shedding new light on an existing but ‘unsettled’ research question was also the aim of Mustafa Erdem Kabadayi’s (Istanbul Bilgi University) presentation. Applying multiple correspondence analysis to an extensive dataset extracted from the 1845 Ottoman tax surveys, Kabadayi sought to assess the validity of the historical claim that an ethno-religious division of labour prevailed in the mid-nineteenth-century Ottoman Empire. The numerical and qualitative analysis demonstrated that ethno-religious affiliation was indeed a major determinant of occupation in the Ottoman Empire at that time.

On the second day of the workshop, Paul van Trigt (Utrecht University) addressed one of the central problems facing historians who work with large corpora: how to mine for concepts, and how to show changes in the meaning of concepts and specific terms over time. His talk, ‘Microhistory and Big Data. Rewriting a History of Disability by Mixed Methods’, sought to trace the use of the term ‘disability’ in Dutch public debates throughout the twentieth century and to investigate whether the word was used mainly in contexts relating to the ‘care’ of disabled people rather than in the context of the ‘normalcy’ of living with a disability.

Maarten van den Bos (Utrecht University) also addressed the changing meanings of concepts and terms over time. In ‘Unsupervised walks. Youth, mass culture and the changing future of society in Dutch public discourse, 1945-1965’, he traced changing trends in reporting about ‘youth’ by analysing articles and reports about popular music from the Dutch post-war era. While in the 1950s ‘youth’ was frequently associated with wantonness, latent danger and moral decay, the 1960s saw a shift towards more positive portrayals of youth culture, coupled with the emergence of new categories such as ‘authenticity’ and ‘self-fulfilment’ in connection with youth.

In the final case study of the workshop, Dino Mujadzevic from the Ruhr University Bochum moved on to the contemporary history of South-eastern Europe. His presentation sought to investigate how media representations of Turkey were used to frame Bosnian public opinion on Turkish foreign policy towards Bosnia between 2002 and 2014. Drawing on a detailed collocation analysis of a Bosnian media archive, Mujadzevic demonstrated that two clear categories of sentiment, pro- and anti-Turkish, are present, and that the more dominant pro-Turkish discourse is likely to be concentrated in specific locations, such as the capital, Sarajevo.

In the closing presentation of the workshop, Hermione Giffard offered a critical view of the utility and validity of digital tools and methods and asked whether their presence in historical research means that historians are likely to adopt a digital rather than a historical epistemology. Beyond the problem of the decontextualisation of source material that occurs when mining large-scale corpora, Giffard addressed the way researchers invariably have to accept the settings and algorithms that their tools supply. One such example, she noted, is the widely used TF-IDF (term frequency-inverse document frequency) weighting scheme, commonly used to rank the results of search queries. Yet is the TF-IDF score an appropriate measure for historical research? And does the feeling of comprehensiveness that a large archive promises perhaps lull the researcher into a false sense of security? Giffard concluded with a plea not to rush towards digital positivism but to maintain a critical distance and to anchor close reading continuously in the process of doing digital history.
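To make concrete what the score rewards, here is a minimal Python sketch of one textbook TF-IDF variant: raw term count multiplied by log(N / df), where N is the number of documents and df the number of documents containing the term. The function name `tf_idf` and the three toy "documents" are hypothetical. Tools differ in the variant they implement (scikit-learn's `TfidfVectorizer`, for example, smooths the IDF and normalises the vectors by default), which is exactly the kind of built-in setting a researcher may end up accepting unexamined.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Score each term in each tokenized document as raw count * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        scores.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return scores


# Hypothetical toy corpus of three already-tokenized "articles".
docs = [
    ["empire", "india", "empire", "trade"],
    ["empire", "france", "trade"],
    ["empire", "russia", "war"],
]
scores = tf_idf(docs)
print(scores[0])
```

In the first document "empire" is the most frequent term, yet it scores exactly 0.0 because it appears in every document; under this weighting, a theme that saturates an archive becomes invisible.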