Early modern information overload: the Encyclopédie as big data
Abstract
The exponential increase in the scale of humanities datasets is altering the relationship of scholars to the objects of their research. These ‘big data’ approaches—from data mining to distant reading—promise new insights into our shared cultural record in ways that would previously have been impossible. But these same methods also risk disconnecting scholars from the raw materials of their research, as individual texts are subsumed into massive digital collections. One of the main challenges for the digital humanities is thus to develop scalable reading approaches—both distant and close—that allow scholars to move from macro- to micro-analyses and back. In this talk, I will outline some previous attempts at addressing this challenge using data mining and machine learning techniques to explore large-scale datasets drawn primarily from the French Enlightenment period, and in particular the great mid-century Encyclopédie of Diderot and d’Alembert. Preliminary analysis of these datasets demonstrates that the overwhelming sense of ‘information overload’ that characterises our modern condition is in fact much older. From the Renaissance onwards, print culture was shaped by new information technologies—such as indexing, commonplacing, and encyclopaedism—developed in order to make sense of the growing textual record. Today, as we grapple with our own data deluge, these techniques can help put our current fascination with big data into perspective, ensuring that the inherent specificity of humanistic enquiry remains viable and vibrant at any scale.