Nascent Thoughts on Text Analysis Across Disciplines

I had cause to reflect on this recently. The following essay is a rough draft, but a recent book review of Franco Moretti’s latest brought it to mind.

More Scale, More Questions: Observations From Sociology

Tressie McMillan Cottom

Much of the debate about the whys and what-fors of textual analysis in the age of massive data is about scale. Take for example the current gold rush known as “big data”. On its face, big data merely refers to bits and bytes of data that are too large to be stored or processed using traditional means. Discursively, big data has come to mean much more. More than just taxonomy of size, big data is notable “because of its relationality to other data” (boyd and Crawford 2011, p.2). The scale of large quantities of data portends analytical possibilities for uncovering fundamental social facts. In this context, scale is about a grand theory of human nature and human systems. For many actors, large-scale data is valuable simply because it is big and “now”[1]. Franco Moretti’s “distant reading” (Moretti 2000; Schulz 2011) is embedded in this larger political economy of what some sociologists have called the “datalogical turn” (Clough et al 2015) of quantification and digitization. This argument unfolds against a backdrop of socio-political processes that favor market relationships over social ties, individuals over collectivities. It also unfolds within the academic industrial complex, itself a microcosm of the larger socio-political context. In this context, scale is an end unto itself. Data-tizing literature at large scale becomes meaningful not because of its ontological superiority per se but because it rationalizes the hegemonic cultural imperative that all things (and beings) be data-tized. With scale come questions that are important for how we identify, understand and analyze textual data.

One of boyd and Crawford’s provocations for the big data moment (2011) is that claims of objectivity and accuracy are misleading. What does the “mistaken belief that qualitative researchers are in the business of interpreting stories and quantitative researchers are in the business of producing facts” mean for textual analysis in the humanities given the discipline’s emphasis on interpretation as opposed to production? Moretti seems excited by a digital humanities project that creates a “unified theory of plot and style” (Schulz 2011, BR 15). If plot and style can be attributed to a single unifying theory, a discernable objective pattern of words across context, plot and style risk losing their salience as analytical concepts. To put it more simply, plot and style are meaningful ways of thinking about text because they capture difference and not because they approach the singularity.

The tensions I describe are not unique to digital humanities or the humanities writ large. The social and natural sciences are casting about in the same waters of shifting university priorities, declining financial support, and academic entrepreneurship[2]. For sociology, the quantitative or datalogical turn as hegemonic has been a long, strange trip indeed. Latour wrote in 2010 that “Sociology has been obsessed by the goal of becoming a quantitative science.” In the 1970s sociologists worried that changing political economies (e.g. state patronage) were legitimizing quantitative methodologies over qualitative sensemaking, and empiricism over theory. As Karabel and Halsey put it, it “would be naive not to recognize that state patronage has contributed to promoting atheoretical forms of methodological empiricism and has given less encouragement to other approaches” (1977, p. 17). Karabel and Halsey are talking about the normative and economic power that state-funded and state-controlled access to national data sets, the “big data” of the time, exerts over the questions scholars ask and how they set out to answer them. Miriam Posner provides an analogous discourse for digital humanities when she says that “most of the data and data models we’ve inherited [from business applications] deal with structures of power, like gender and race, with a crudeness that would never pass muster in a peer-reviewed humanities publication” (2015). Business applications exert influence over the data being produced and the scale at which it is produced, and they constrain how that data can be accessed, analyzed and politicized. Like state patronage, business applications (or market actors, in my parlance) give scale the relationality that boyd and Crawford critique and the taken-for-grantedness of Moretti’s enthusiasm for distant reading. We do distant reading because we can. But that we can do it, with these data and these methods, is inherently political.

Moretti would seem to agree that textual data are political (though I’m not entirely sure he would agree that our conceptualization of literature as textual data is also political). He has taken great pains to link distant reading and quantitative textual analysis to Immanuel Wallerstein’s influential world systems theory (WST). WST is Marxist in tradition and contemporary in its focus on nation-states as the primary unit of analysis, one of its primary contributions to contemporary social theory. Moretti proposes to “borrow this initial hypothesis from the world-system school of economic history, for which international capitalism is a system that is simultaneously one, and unequal: with a core, and a periphery (and a semiperiphery) that are bound together in a relationship of growing inequality” (2000, p. 55). It is an interesting theoretical treatment with important empirical considerations. WST has a particular emphasis on how powerful core nations manipulate the terms of a global economic system to extract resources from (semi-)peripheral nations to expand profit taking in various forms of trade. The unit of analysis in WST is the nation-state, but its mechanisms are about resources, geopolitics, and capital. It is difficult to see how this translates to the methodological choice of some national texts to “uncover the true scope and nature of literature” (Schulz 2011, BR14).

This application of quantitative textual analysis brings up several questions. One question is what constitutes a data set when it is broadly defined as “literature”? For instance, Moretti has made choices about language and time period. Roberto Franzosi et al (2012) have applied quantitative methods to newspaper data on enslaved people in the U.S. south during a period of frequent public lynchings. There is a theoretical framework guiding these choices. In methodological terms from content analysis, something bounds these texts, making them a “data set”. Wallerstein is not much use here and it is where Moretti’s theoretical choice of WST to ground quantitative textual analysis becomes difficult to grasp. Another question is what constitutes a form under the current conditions of prosumption, or “situations in which consumers collaborate with companies or with other consumers to produce things of value” (Humphreys and Grayson 2008, p. 2). An example of this would be the production of long-form texts, produced over time by an individual or collective of authors on a digital publishing platform. The platform may be privately owned, as in the case of Blogger. The context produced by users of the platform can generate revenue for the private company, which operates under the logics of financialized capital. But, the platform users are also producers. They conceive and author the content and can still own legal rights to that content. The content producers can be in one nation-state and the company that owns the publishing platform can be in another. The content producers can call their content a digital magazine but platform owners can call the same content a blog while readers can call it a book. The content is searchable in another corporate-owned platform, i.e. a search engine like Google. But, it may not be classified in an academic library database using any hegemonic taxonomy for knowledge classification. 
Are these texts part of the literature on which quantitative textual analysis is refining its hypotheses and running experiments? Whether they are or are not, a set of assumptions is embedded in the data on which models about the inherent nature of literature are based.

Sociology has developed a diverse toolkit to identify, measure and analyze various forms of text with an attention to political economy. This includes content analysis (e.g. newspaper content), organizational analysis (e.g. texts produced by institutions or organizations), and quantitative narrative analysis or QNA (e.g. sociological complement to distant reading). I will provide two examples of how political economies informed the theory that guided my methodological analysis of various texts. These examples are not meant to be instructive of best practices and certainly not across disciplinary borders. Instead, I aim to show the role theory has played in what I consider “text” as illustrative of how robust a unified theory would need to be in our contemporary research milieu.

The research project had a deceptively simple question: are there more interracial couples on television today than in some unspecified past? The question emerged from debates on social media about the hit network television show, “How To Get Away With Murder”. The show has a black female protagonist who is married to a white male. Memes circulated on social media about the show seemed to focus heavily on the novelty of the pairing, if not outright claiming it as evidence of a new vanguard for the acceptability of black women as desirable sexual partners. To explore the question of the trope’s significance via its novelty, I had to translate a type of text into a data set. I needed a body of data about television programming. I ended up with a guidebook, “The Complete Directory to Prime Time Network and Cable TV Shows, 1946-Present”. The encyclopedic text “cover[s] the entire history of network TV in the United States, from its inception on a regular basis in 1944 through April 15, 2007”. I also used Wikipedia entries for television show descriptions, and the Internet Movie Database for audience interpretation. When I chose these texts I incorporated the logics embedded in the making of the text. I adopted the epistemology of an English language text. I adopted the history of network television and its concomitant market relationships with advertisers. I adopted the textual constraints of various platform architectures. I adopted the cultural logic of “race” and “gender” and “sex” and heteronormativity that would be reflected in a mass-produced genre that has been indexed for textual analysis. Even after accepting all of the constraints I inherited in the data, I had to deal with the complexity of categories that are at odds with critical theory. For example, is a character on Grey’s Anatomy “black” because I interpret him as black, or because the show’s writers write the character as black, or because the actor playing the character identifies as black?
My analysis relied in great part on ascribing race to visual data that had been captured as text without any consideration of what constitutes race. Miriam Posner explores other examples of how our classification systems are produced within a political economy and how we inherit them through our tools. She wonders, “What would maps and data visualizations look like if they were built to show us categories like race as they have been experienced, not as they have been captured and advanced by businesses and governments?” (2015). I extrapolate from Posner what it means when distant reading or quantitative textual analysis does not theorize the power relations of financial actors or the social construction of race in computational models or analytical frameworks.

Without that lens I suspect that we get a quantitative textual analysis that is very popular with powerful actors precisely because it does not theorize power relations. Given our current political economy, especially in the rapidly corporatized academy, one should expect great enthusiasm for distant reading and acritical theorizing. We should not be distracted by the appeal to WST, a critical theory of global power relations. The devil, when it comes to analysis, is in the details of mechanisms. WST as currently applied to textual analysis does not go so far as to explicate how global systems of capital, geopolitics and power define what is literature, the ways we produce various texts in new digital mediums, or to what ends we analyze them out of their given contexts. Sociologists who use QNA approach the challenge of the context of textual data by focusing on the qualitative decisions in quantitative text analysis. I could not find an analog among distant reading approaches as currently used by Moretti, but Roel Popping argues that mechanized and manual “coding is based on a qualitative decision that everybody should understand” (2012, p. 89). Coding schemas emerge from political contexts. Those political contexts have historical decisions embedded in contemporary categories. These contexts may not be of primary interest to humanities’ traditional scope of inquiry, but when one adopts computational tools (and the political and market systems embedded in them) it is a good moment to reflect on what those tools mean. Big data does not solve the human conundrum of power with which every text, every kind of every epoch of every culture, engages. Big data can obscure that power, but then any inferences based on the unexamined assumptions and theoretical mechanisms cannot be said to be true of literature writ large but only of literature bound by the market’s invisible hand.

Bady, Aaron. “The MOOC Moment and the End of Reform.” The New Inquiry (2013): Web. 29 July 2015.

Berman, Elizabeth Popp. Creating the Market University: How Academic Science Became an Economic Engine. Princeton [N.J.]: Princeton University Press, 2012. Print.

Boyd, Danah, and Kate Crawford. “Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon.” Information, Communication & Society 15.5 (2012): 662–679. Print.

Clough, Patricia Ticineto et al. “The Datalogical Turn.” Non-Representational Methodologies: Re-Envisioning Research (2015): 146. Print.

Cottom, Tressie McMillan. “When White Men Love Black Women on TV.” tressiemc. Web. 29 July 2015.

Cottom, Tressie McMillan, and Gaye Tuchman. “Rationalization of Higher Education.” Emerging Trends in the Social and Behavioral Sciences: An Interdisciplinary, Searchable, and Linkable Resource (2015): n. pag. Google Scholar. Web. 25 May 2015.

Franzosi, Roberto, Gianluca De Fazio, and Stefania Vicari. “Ways of Measuring Agency: An Application of Quantitative Narrative Analysis to Lynchings in Georgia (1875–1930).” Sociological Methodology 42.1 (2012): 1–42. Print.

Humphreys, Ashlee, and Kent Grayson. “The Intersecting Roles of Consumer and Producer: A Critical Perspective on Co-Production, Co-Creation and Prosumption.” Sociology Compass 2.3 (2008): 963–980. Print.

Karabel, Jerome, and Albert Henry Halsey. Power and Ideology in Education. Oxford University Press, 1977. Print.

Latour, Bruno. “Tarde’s Idea of Quantification.” The Social after Gabriel Tarde: Debates and Assessments 4 (2010): 145. Print.

Moretti, Franco. “Conjectures on World Literature.” New Left Review (2000): 54–68. Print.

Popping, Roel. “Qualitative Decisions in Quantitative Text Analysis Research.” Sociological Methodology 42.1 (2012): 88–90. Print.

Posner, Miriam. “What’s Next: The Radical, Unrealized Potential of Digital Humanities.” Miriam Posner’s Blog. 27 July 2015. Web. 29 July 2015.

Schulz, Kathryn. “What Is Distant Reading?” The New York Times 24 (2011): BR14. Print.

Slaughter, Sheila. Academic Capitalism and the New Economy: Markets, State, and Higher Education. Baltimore: Johns Hopkins University Press, 2004. Print.

Suchman, Mark C. “Managing Legitimacy: Strategic and Institutional Approaches.” Academy of Management Review 20.3 (1995): 571–610. Print.

Tuchman, Gaye. Wannabe U: Inside the Corporate University. Chicago: University of Chicago Press, 2009. Web. 12 Aug. 2012.

[1] See Aaron Bady (2013) on temporality and technological moments.

[2] Slaughter (2004) is an excellent work on the macro context of these changes that I cannot treat fully in this essay. I suspect every academic (and academic-in-training) is familiar with the effects of that macro context, i.e. increased competition for limited funds, etc. See also Gaye Tuchman’s “Wannabe U” (2009) and Tressie McMillan Cottom with Tuchman (2015) for an institutional-level analysis of the market’s effect on college processes, policies and various actors. Finally, Elizabeth Berman’s “Creating the Market University” (2012) provides an important historical analysis of contemporary macro and meso (institutional) realities of the academic complex.
