Data Analysis, Methodology and Transparency
Tom Scheinfeldt wrote a thought-provoking blog post about history's transition from a preoccupation with ideology to a growing concern with methodology. He believes that we are at a juncture where history is shaped by the organizing of events and data rather than by big ideas (http://foundhistory.org/digital-humanities/sunset-for-ideology-sunrise-for-methodology/). This has particular significance for the question of how methodology is recorded within the ever-growing field of digital humanities. Last week’s seminar raised important questions about the problematic relationship between the rapid growth in the use of digital resources on the one hand and, on the other, the relative lack of attention to how methodology is referenced. As Tim Hitchcock has demonstrated, new epistemological methods have been embraced without adequate thought about how what is read on a digital platform can be re-used in future research. Perhaps the greatest methodological problem this creates is the tension between what Hitchcock has termed immersive reading and what Franco Moretti calls distant reading. The ability to search and select from OCR’d text, together with data mining techniques, has perhaps rendered immersive reading unnecessary. However, this transformation in research methodology has not been followed by a reassessment of how digital versions of scholarly work are cited.
The main implication for academic writing concerns the value of research for future re-use. If digital research is not recognised as being distinct from analogue forms, the stated methodology will not accurately represent the research paths chosen. The problem arises when someone who wishes to assess and query the methodology of a particular piece of work is led by misleading citations to adopt immersive reading rather than the methodological techniques actually used. What needs to be acknowledged is that web-based research resources, whether in the form of digitized books or historical databases, change constantly. As such, digital forms of research need to be cited as such in all forms of scholarly research.
As Christof Schöch has demonstrated with regard to the digitization of text-based sources, we are not yet at a stage where we are able to close the gap between ‘big’ and ‘smart’ data. Consequently, sacrifices are made: big data means working with ‘raw’ or ‘messy’ data, while smart data means working with comparatively small data-sets (http://journalofdigitalhumanities.org/2-3/big-smart-clean-messy-data-in-the-humanities/). Because big, smart data has not yet been created, the limitations attached to each kind make it imperative that researchers are clear about their research paths, so that both their analysis and the limitations of their research can be highlighted and recognised.