Text-induced corpus correction and lexical assessment tool
Research into our culture is about understanding conversations and debates as forms of public discourse. New ways of studying culture are inspired by the availability of massive digital collections: growing repositories of old media (for example, centuries’ worth of newspapers, radio, and television archives) and new media (for example, blogs, Twitter streams, and discussion forums). Scholars studying culture are beginning to engage with data-intensive research methods. Digital humanities, computational humanities, e-humanities – whatever the label, we are observing a dramatic shift from data-poor to data-intensive research, a shift that generates unique challenges for today’s search and text mining technology.
How can we support a data-intensive research cycle in cultural studies? Building on existing open source tooling, the project will follow an iterative, task-based approach to creating key search, analysis, and visualization solutions for studying public discourses within a humanities context. The development will be guided by three complementary use cases. In the first, public discourse is driven by a mixture of scientific notions and notions of the good life; in the second, by differences in the experience and valuation of films; and in the third, by social issues related to law and order.
Use case 1 focuses on genetics and eugenics. This discourse involves the changing significance attached to nature over nurture, and the collective, yet varying images of “the good life”. The use case deserves analysis in terms of continuities and discontinuities. We focus on the early twentieth century, when the main polarities of this debate were articulated, and the early twenty-first century, when discussions on medical genetics are overshadowed by the “specter of eugenics,” the fear that genetics will lead to control over sexual reproduction.
Use case 2 is based on the fact that film titles can provoke different emotions in viewers depending on their preferences, past experiences, and values. In public discussions, people use the emotions they experienced while viewing to argue for a film’s value. By analyzing discussions on forums and in reviews, we expect to discern the variety of these emotions and their variability over time; relating them to valuations may result in rich and possibly new categorizations of films, genres, and discourse positions.
Use case 3 focuses on drugs, drug trafficking, and drug users in the early twentieth and early twenty-first centuries. In both eras the public view on drugs alternates between medical and social aspects. SPuDisc will allow for associations, longitudinal search, and comparisons that enable researchers to analyze the mechanisms of this swinging pendulum in public discourse.
The specific developments center on realizing technologies for normalizing expressions, detecting semantic shifts in language usage patterns, exploiting multilinguality to aid in understanding public discourse, and explicating the different perspectives in discussions around a given issue. The outcomes will be incorporated in dedicated interfaces for two key phases in humanities research: exploration and contextualization.