This project combines textual encoding and thematic analysis to examine racialized language in Wuthering Heights (1847), specifically the representation of Heathcliff. I initially planned to study Wuthering Heights alongside several adaptations, focusing on how racial language shifts across adaptations and comparing those differences to the original novel. However, working across three texts made it difficult to develop the depth of analysis I wanted, and it would have required a more complex customization that could be applied to all three. I therefore narrowed the project to Wuthering Heights alone. This allowed me to examine its racialized language closely and to create a more customized framework, driven by the TEI (Text Encoding Initiative), for examining Heathcliff’s depiction.
The project uses the 1847 edition of Wuthering Heights, with special attention to the language that shapes Heathcliff’s racial ambiguity. I began by preparing a clean digital text from a public-domain edition available on Project Gutenberg. I deleted everything in the online version that was not part of the story, which let me separate the novel into chapters. I then copied each chapter into Oxygen, the XML editor I used to complete my encoding.
I began with the "seg" element as an initial tool to highlight and isolate racialized language throughout the text. Racialized language refers to words, descriptions, or labels that assign social meaning to race and shape how certain groups are perceived or treated. In Wuthering Heights this matters for Heathcliff because the ambiguous, often derogatory terms used to describe him construct him as racially Other and influence how characters interpret his identity, status, and place within the household and community.
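As a rough illustration of this exploratory stage, a first pass with the standard TEI "seg" element might look like the fragment below. This is a sketch and not the project's actual markup: the type value "racialized" is a hypothetical label chosen for the example, and the sentence is the same one quoted later in this statement.

```xml
<!-- Exploratory markup sketch: a generic TEI <seg> isolates a phrase for later review. -->
<!-- The type value "racialized" is a hypothetical label, not taken from the project files. -->
<p xmlns="http://www.tei-c.org/ns/1.0">He is a
  <seg type="racialized">dark-skinned gipsy</seg>
  in aspect.</p>
```

A generic element like this records that a phrase is of interest but cannot say why, which is the limitation the custom tags described below address.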
It is worth noting that not all othering is racial othering, and this project draws a deliberate distinction between the two. The descriptors encoded here were selected because they draw specifically on racial and ethnic meaning—whether through direct labels like 'dark-skinned gipsy' and 'Lascar,' or through implied associations between darkness, foreignness, savagery, and social exclusion that carry racial weight in the context of nineteenth-century Britain and its imperial history. Descriptors that signal class-based or gendered othering without this racial dimension were not encoded under raceDesc, though the project acknowledges these forms of othering frequently overlap in the text.
Through these initial exploratory markings, I noticed that patterns were more consistent and analytically rich when I focused specifically on descriptions of Heathcliff. The markings showed how different characters described him, and it became clear that a set of custom tags would let me express these patterns. I conducted repeated close readings to trace connections between descriptive language, narrative voice, and historical context. Combining these readings with TEI encoding turned the project into a way of making patterns of racialized description visible, comparable, and analyzable beyond traditional close reading.
I created an ODD file (an XML file used to define and customize TEI schemas), which gave me a custom schema in which I could declare my own elements along with descriptions of them. The two custom tags I created are:
"raceDesc" with type="explicit" or type="implied" to distinguish direct from indirect racial descriptors. Explicit descriptors are direct racial labels or comments stated openly in the text, while implied descriptors rely on indirect hints, such as appearance, behavior, or social treatment, that suggest racial meaning without naming it outright.
Example 1: “He is a <tc:racedesc type="explicit">dark-skinned gipsy</tc:racedesc> in aspect.”
Example 2: “A perfect <tc:racedesc type="implied">misanthropist’s Heaven</tc:racedesc>—and Mr. Heathcliff and I are such a suitable pair to divide the desolation between us.”
"raceContext" with type="historical" to indicate the contextual frame for the racialization. Historical context covers past ideas and events connected to race that frame the passage.
Example 1: “The master tried to explain the matter; but he was really half dead with fatigue, and all that I could make out, amongst her scolding, was a tale of his seeing it starving, and houseless, and as good as dumb, in the streets of <tc:racecontext type="historical">Liverpool</tc:racecontext>, where he picked it up and inquired for its owner.”
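For readers unfamiliar with ODD, the two tags described above could be declared along the lines of the sketch below. It is an illustration under stated assumptions, not the project's actual file. The schema name "wh-race", the namespace URI http://example.org/ns/tc, the choice of modules, the class memberships, and the content model are all hypothetical. Only the element names, the type attribute, and its values come from the description above.

```xml
<!-- Sketch of an ODD customization; all names marked "assumed" are hypothetical. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="wh-race" start="TEI">
  <!-- Standard TEI modules; "linking" provides the seg element. -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="linking"/>

  <!-- Custom element in a non-TEI namespace (URI assumed). -->
  <elementSpec ident="raceDesc" ns="http://example.org/ns/tc" mode="add">
    <desc>Marks a racialized descriptor applied to Heathcliff.</desc>
    <classes>
      <!-- Assumed: allow the element wherever seg may appear. -->
      <memberOf key="att.global"/>
      <memberOf key="model.segLike"/>
    </classes>
    <content>
      <macroRef key="macro.phraseSeq"/>
    </content>
    <attList>
      <attDef ident="type" usage="req">
        <desc>Distinguishes direct from indirect descriptors.</desc>
        <datatype><dataRef key="teidata.enumerated"/></datatype>
        <valList type="closed">
          <valItem ident="explicit"><desc>A direct racial label stated openly.</desc></valItem>
          <valItem ident="implied"><desc>An indirect hint that suggests racial meaning.</desc></valItem>
        </valList>
      </attDef>
    </attList>
  </elementSpec>

  <elementSpec ident="raceContext" ns="http://example.org/ns/tc" mode="add">
    <desc>Marks a reference that frames the racialization contextually.</desc>
    <classes>
      <memberOf key="att.global"/>
      <memberOf key="model.segLike"/>
    </classes>
    <content>
      <macroRef key="macro.phraseSeq"/>
    </content>
    <attList>
      <attDef ident="type" usage="req">
        <desc>Names the contextual frame.</desc>
        <datatype><dataRef key="teidata.enumerated"/></datatype>
        <!-- Only "historical" is described in this statement. -->
        <valList type="closed">
          <valItem ident="historical"><desc>Past ideas and events connected to race.</desc></valItem>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
```

Closing the value lists means a validator would reject a misspelled or undefined type value, which keeps the encoding consistent enough to compare across chapters.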
The encoded text section consists of multiple drop-down pages that users can navigate. The entire encoded text has been uploaded so that users can view it in one place. The chapter breakdowns contain the sections I encoded with my custom tags. My hope is that this lets users see the specific language I wanted to highlight while also reading those chapters within the larger context of the novel.
This combination of close reading and customized TEI encoding allows for a nuanced exploration of the novel’s racialized language, revealing patterns that traditional analysis might overlook. By exploring the novel as a whole, users can see these patterns within the text, and I hope they come away with new insights that deepen their engagement with Heathcliff’s character. By situating these findings within historical and social contexts, the project offers a new digital framework for engaging with Heathcliff’s complex identity. The resulting encoded text and schema are publicly accessible and designed to support further scholarship, teaching, and critical discussion around race in nineteenth-century literature.