This project offers a race-centered digital reading of Emily Brontë’s Wuthering Heights (1847) through customized Text Encoding Initiative (TEI) markup. By encoding the racialized language surrounding Heathcliff, the project makes visible patterns of description, exclusion, and ambiguity that have long fueled critical conversations about the novel. The encoding lets users see that language within the larger context of the novel and think more critically about the choices Brontë made in creating a character as important as Heathcliff. Rather than treating race as a secondary or speculative concern, this project positions racialization as a structuring force within the text and asks what new insights emerge when digital methods are designed explicitly to account for it. It demonstrates how encoding can function not merely as a tool for textual organization, but as a critical practice capable of exposing how language produces racial meaning. To do this, I created a custom TEI schema whose specific categories allowed me to define the language of the text closely. Examined side by side, the resulting markings let users experience the novel’s racial language in a different way: rather than encountering examples one at a time while reading, they can see every explicitly marked passage together and analyze the passages further. My hope is that this schema can serve as a model for other texts, showing why marking racialized language matters and what insights it can provide.
Emily Brontë’s Wuthering Heights (1847) is a Gothic novel set on the Yorkshire moors and told through two narrators, Nelly and Lockwood. The story centers on Heathcliff, a foundling of unknown origin brought into the Earnshaw household, whose presence reshapes the lives of those around him. He is constantly ridiculed and bullied, and he is never truly accepted into the family. Raised alongside the Earnshaws’ daughter, Catherine, Heathcliff forms an intense bond with her that is ultimately fractured, and their toxic relationship haunts the other characters of the book. After Catherine marries Edgar Linton, Heathcliff disappears, only to return years later wealthy and determined to claim power over both families. His pursuit of revenge drives the novel’s violence, obsession, and generational conflict, extending beyond the first generation to entangle their children.
Heathcliff’s identity remains deliberately unstable: he is repeatedly described through language associated with foreignness, darkness, and animality, yet never assigned a fixed racial or national origin. This ambiguity has made Heathcliff one of the most contested figures in nineteenth-century British fiction, raising questions about race, belonging, and inheritance. Wuthering Heights challenges readers through its refusal to domesticate difference, presenting a world in which love, cruelty, and social power are inseparable.
By making these moments visible within the text itself, the project encourages readers to reconsider how race operates in Wuthering Heights, not as a secondary theme but as a shaping force within its characters, narrative structure, and emotional economy. The novel offers rich material for understanding race in the nineteenth century, and it resists a surface-level reading that overlooks the significance of race.
This project shows that racialized language in Wuthering Heights is not isolated or incidental, but patterned, cumulative, and structurally embedded in the novel. Brontë plants clues throughout the novel for readers to uncover, and these clues showcase the complexity of Heathcliff’s character. Through customized TEI encoding, I identified recurring moments in which Heathcliff is described using language that marks him as racially and socially “Other.” When viewed across the full text, these moments form clear patterns that are difficult to perceive through close reading alone; the encoding allows even the less obvious moments of racialized language to be highlighted. For instance, encoding revealed that the word “black” attaches to Heathcliff with striking consistency across the novel, marking his eyes, his temper, his face, and his countenance, and that it clusters especially at moments of social exclusion and punishment. The full weight of this pattern only becomes visible through encoding: scattered across 34 chapters, these descriptors accumulate into a sustained strategy for marking Heathcliff as racially and socially Other. Readers of the novel may know the handful of frequently cited passages that use explicit racial language; my encoding captures those passages alongside many others to give a wider picture of the entire novel.
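The chapter-by-chapter accumulation described above could be tallied with a short script. What follows is a minimal Python sketch, not the project's actual tooling. It assumes that chapters are encoded as TEI `div` elements with `type="chapter"` and an `n` attribute, and that the `tc` prefix is bound to a placeholder namespace URI (`http://example.org/ns/tc`), since neither the real URI nor the chapter markup is given here. The chapter 2 passage in the sample is placeholder text, not a quotation from the novel.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"   # standard TEI namespace
TC_NS = "http://example.org/ns/tc"       # hypothetical URI for the project's "tc" prefix

# Tiny sample: chapter 1 holds the two examples discussed in this essay,
# chapter 2 holds a placeholder span (not a real quotation).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:tc="http://example.org/ns/tc">
  <text><body>
    <div type="chapter" n="1">
      <p>He is a <tc:racedesc type="explicit">dark-skinned gipsy</tc:racedesc> in aspect.</p>
      <p>A perfect <tc:racedesc type="implied">misanthropist's Heaven</tc:racedesc></p>
    </div>
    <div type="chapter" n="2">
      <p>Placeholder passage with a <tc:racedesc type="implied">placeholder phrase</tc:racedesc>.</p>
    </div>
  </body></text>
</TEI>"""


def spans_per_chapter(xml_text, term=None):
    """Count encoded spans per chapter; if `term` is given, count only spans containing it."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for chapter in root.iter(f"{{{TEI_NS}}}div"):
        if chapter.get("type") != "chapter":
            continue
        for span in chapter.iter(f"{{{TC_NS}}}racedesc"):
            text = "".join(span.itertext()).lower()
            if term is None or term.lower() in text:
                counts[chapter.get("n")] += 1
    return dict(counts)


print(spans_per_chapter(SAMPLE))                # all encoded spans, by chapter
print(spans_per_chapter(SAMPLE, term="gipsy"))  # only spans containing one term
```

Run against the full encoded novel, a tally like this would show where a descriptor such as “black” clusters, which is the kind of distribution that is hard to hold in mind while reading linearly.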
The encoding reveals that racialization operates in both explicit and implied ways. Direct descriptors of appearance coexist with indirect cues such as animal imagery, moral judgment, and social exclusion. These three categories help frame the language around Heathcliff and let readers see how he becomes attached to specific language that is meant to taint his character and craft a villain. The project demonstrates that Heathcliff’s identity is shaped less by a single label than by the accumulation of these descriptors over time and across narrators. The findings also highlight the importance of historical context in understanding how racial meaning is constructed. References to places such as Liverpool, when encoded alongside racialized descriptions, reinforce the novel’s connection to Britain’s imperial and slave-trading history, a history of which Brontë would very likely have been aware. Encoding these references makes visible how geography, history, and character description work together to racialize Heathcliff without ever naming his origins directly.
By customizing the schema to distinguish between explicit and implied racial descriptors and to mark historical context, the project shows how digital tools can be designed to support interpretive questions about race. These findings suggest that race-centered encoding offers a powerful framework for reexamining nineteenth-century texts.
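The explicit/implied distinction can also be checked and summarized programmatically. The sketch below is an illustration under stated assumptions rather than the project's schema itself: the namespace URI is a placeholder, the set of allowed `type` values is inferred from the two values named in this essay, and the markup used for historical context is left out because it is not shown here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TC_NS = "http://example.org/ns/tc"       # hypothetical URI for the "tc" prefix
ALLOWED_TYPES = {"explicit", "implied"}  # assumed from the two values named in the essay

SAMPLE = """<div xmlns:tc="http://example.org/ns/tc">
  <p>He is a <tc:racedesc type="explicit">dark-skinned gipsy</tc:racedesc> in aspect.</p>
  <p>A perfect <tc:racedesc type="implied">misanthropist's Heaven</tc:racedesc></p>
</div>"""


def tally_types(xml_text):
    """Return (counts per type, phrases whose type is missing or unrecognized)."""
    root = ET.fromstring(xml_text)
    tally = Counter()
    problems = []
    for span in root.iter(f"{{{TC_NS}}}racedesc"):
        kind = span.get("type")
        if kind in ALLOWED_TYPES:
            tally[kind] += 1
        else:
            # Collect the phrase so an encoder can revisit it.
            problems.append("".join(span.itertext()))
    return dict(tally), problems


counts, problems = tally_types(SAMPLE)
print(counts)
print(problems)
```

A real TEI customization would normally enforce the allowed values in the schema; a script like this is only a quick secondary check and a way to report totals per category.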
Heathcliff’s identity is shaped throughout Wuthering Heights by description rather than explanation, unfolding through the ways characters observe, name, and respond to him over time. The novel offers no single account of who Heathcliff is or where he comes from, but instead presents a series of moments that invite readers to notice how difference is expressed and understood. Here we will explore two examples of my TEI encoding, one explicit and one implied, to examine how those moments appear across the text and how their language contributes to Heathcliff’s racial ambiguity. By looking at these patterns together, the project offers a clearer view of how identity is constructed through repetition, context, and narrative perspective. I encourage you to keep exploring these descriptions by clicking through the individual chapters linked to see the exact words used.
Explicit: “He is a <tc:racedesc type="explicit">dark-skinned gipsy</tc:racedesc> in aspect.”
In this example, the narrator uses a direct racial descriptor to mark Heathcliff’s appearance. This is a passage people often bring up when discussing Heathcliff’s race. The phrase “dark-skinned gipsy” functions as an explicit label that racializes Heathcliff through both physical description and ethnic association, drawing on nineteenth-century stereotypes that link darkness with foreignness and social marginality. Even though Brontë uses this explicit term, I would still argue that we cannot be sure which specific racial group Heathcliff belongs to, since other clues within the text could describe other marginalized people of the time. The example remains important because it clearly indicates that Heathcliff differs greatly from the other people of the house.
Implied: “A perfect <tc:racedesc type="implied">misanthropist’s Heaven</tc:racedesc>—and Mr. Heathcliff and I are such a suitable pair to divide the desolation between us.”
This example showcases a more implicit form of racialized language. Instead of directly naming race or appearance, the term “misanthropist’s Heaven” suggests social and emotional isolation that can be read as tied to Heathcliff’s outsider status. The language implies a kind of otherness through characterization and mood rather than explicit racial labeling, reflecting how the novel often conveys racial meaning through atmosphere, behavior, and social exclusion rather than straightforward description. This subtlety allows the narrative to maintain Heathcliff’s ambiguous identity while still signaling difference to the reader. By capturing these implied references, the project reveals how racialization operates not only through direct statements but also through tone and implication, an effect that becomes clearest when all of the examples in the text are examined together.
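To examine explicit and implied examples side by side, the encoded phrases can be pulled out together with their surrounding passage. The following Python sketch shows one possible way to do that. It assumes a placeholder namespace URI for the `tc` prefix and plain `p` elements for paragraphs, neither of which is confirmed by the project files, and the second sample passage is truncated after the encoded phrase.

```python
import xml.etree.ElementTree as ET

TC_NS = "http://example.org/ns/tc"  # hypothetical URI for the "tc" prefix

# The two examples discussed above; the second passage is truncated.
SAMPLE = """<div xmlns:tc="http://example.org/ns/tc">
  <p>He is a <tc:racedesc type="explicit">dark-skinned gipsy</tc:racedesc> in aspect.</p>
  <p>A perfect <tc:racedesc type="implied">misanthropist's Heaven</tc:racedesc></p>
</div>"""


def normalize(text):
    """Collapse runs of whitespace so passages print on one line."""
    return " ".join(text.split())


def spans_in_context(xml_text):
    """Return (type, encoded phrase, enclosing paragraph) for each encoded span."""
    root = ET.fromstring(xml_text)
    rows = []
    for para in root.iter("p"):
        context = normalize("".join(para.itertext()))
        for span in para.iter(f"{{{TC_NS}}}racedesc"):
            phrase = normalize("".join(span.itertext()))
            rows.append((span.get("type"), phrase, context))
    return rows


for kind, phrase, context in spans_in_context(SAMPLE):
    print(f"{kind:8} | {phrase} | {context}")
```

Listing each phrase next to its passage keeps the context that gives an implied descriptor its meaning, which a bare word list would lose.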
Wuthering Heights has received plenty of adaptations both on screen and in literature. The concept of taking a beloved classic novel and repurposing it is not new, but when it comes to Wuthering Heights these adaptations fail to explore key themes that make Brontë’s work so important. Her focus on class, gender, and race is what makes the novel so valuable as a commentary on her time period. Only one film adaptation, the 2011 version, has cast Heathcliff as a non-white character.
Recently, there have been ongoing debates, particularly on social media platforms such as TikTok, about a new film version by filmmaker Emerald Fennell that will be released in February 2026. The promotional content released for the film shows many changes that strip out key parts of the original novel. Adaptation always invites debate: some believe a novel should be adapted faithfully with no changes, while others believe looser adaptations can keep the essence of the original work and still provide something new. I enjoy many types of adaptations, and I don’t always need a frame-for-frame film. Sometimes it is good to see something different.
However, the issue with the 2026 version is that we are expected to believe this is a true adaptation. Casting Jacob Elordi, a white man, as Heathcliff completely erases a key aspect of the character’s identity, one that plays a pivotal part in the purpose of this novel. The casting choice forecloses the novel’s persistent refusal to name Heathcliff’s origins while simultaneously ignoring the racialized language that repeatedly marks him as Other within the text. By presenting whiteness as a neutral default, the adaptation reframes ambiguity as absence rather than tension, flattening the historical, colonial, and racial contexts that shape Heathcliff’s social position. In doing so, the film risks reinscribing a long-standing tendency to separate Wuthering Heights from the imperial histories embedded in its language.
Viewed through this project’s encoding categories, the stakes of Fennell’s casting become precise in a way that goes beyond general representation debates. The explicit raceDesc category establishes that the racial coding of Heathcliff is not a matter of interpretation but is written directly into the text through terms like “dark-skinned gipsy” and “savage.” The implied raceDesc category reveals something that only becomes visible through the encoding work this project has done. Unlike the explicit descriptors, which are easy to identify, the implied descriptors are embedded throughout the novel. The darkness, animality, foreignness, and social exclusion attached to Heathcliff build on each other until they form a system of racial othering that shapes how every character understands and responds to him. This is why the implied category makes the 2026 casting choice so significant: it shows how deeply Heathcliff’s racial identity is woven into the text. Recasting Heathcliff as white severs him from the racialized language that makes him who he is, rendering the explicit descriptors meaningless and the implied ones invisible. This is what the encoding makes clear: the racial othering of Heathcliff. This is what the 2026 adaptation has chosen to erase.
This project shows how combining close reading with customized TEI encoding can deepen our understanding of how race operates within Wuthering Heights. By making patterns of racialized language visible across the novel, the project demonstrates that Heathcliff’s ambiguity is not the absence of racial meaning but the result of its careful construction through description, narrative voice, and historical context. I hope this work encourages readers to return to the novel with renewed attention to the language that shapes Heathcliff’s identity and to the imperial histories embedded in the text.
By making both the encoded text and the custom schema publicly accessible, the project invites further scholarly engagement, classroom use, and methodological adaptation. Ultimately, it argues for a digital humanities practice that does not merely accommodate race as data, but builds race into the structure of analysis itself, offering a model for future work in nineteenth-century studies and beyond.