Finally, to wrap up my dive into the Wheel of Time series using Natural Language Processing (NLP) techniques, today I’ll show the results of Probabilistic Topic Modeling, to reveal some of the most interesting insights yet. In particular, we will look at Latent Dirichlet Allocation to find our topics.

It is common in NLP, and machine learning in general, to create a simple model of a complex process, and by looking at how the simple model works, we can gain some understanding of the more complex situation. This depends a lot on our choice of model and its match for the particular situation.

For our model today, let’s try to model and understand the process of how an author writes. We can imagine that an author wanted to write a piece of text using only the top 50 words from the Wheel of Time. Here is a very simple way to do so:

  • The author owns a 50-sided die.
  • This die has one word on each face
  • To write a word in the text, the author rolls the die.
  • The author then writes the word on the top face.
  • The author repeats this process until the text is complete.

Returning to the frequency counts of our earlier posts, we can improve this algorithm by weighting the die so that the more frequent words in the corpus are chosen more often. But this only goes so far in actually modeling the process of writing. The resulting text is gibberish, and we don’t have any more insight into why the words are chosen with these frequencies.

Let’s enrich our model. Usually when we write, there are a few main ideas we are trying to communicate or topics we want to discuss. And if we try to emulate another text, we are really trying to write about the topics and not just use the same words.

So, if we assume instead that there are ten main topics in the text, we can alter our process for writing, such that a piece of text is composed in the following way:

  • The author owns 10 20-sided die.
  • Each die has 20 words, one on each face, chosen from the 50 words.
  • To write a word in the text, the author first randomly chooses a die.
  • The author rolls the chosen die.
  • The author then writes the word on the top face.
  • The author repeats this process until the text is complete.

Our model is more accurate of the process, but we needed to introduce many new pieces. How do we determine the words on each die, or their frequencies? How do we randomly choose a die?

Latent Dirichlet Allocation (LDA) can be used to iteratively try to find the best set of probabilities for the topics and the topic word frequencies. We start out with random choices for everything, and then slowly altering these probabilities to match the distribution of words in the text.

It can seem like magic, but we actually do this all the time. If we are playing a game involving dice, we would assume at the start that the die are fair, but would probably start to revise this assumption once we started rolling a 6 much more often than the other sides. Eventually after enough rolls, we would have a pretty good idea of the frequencies for the die.

LDA Heat Map

OK, let’s see this algorithm in action. For my experiment, I chose to find ten topics, using the Gensim LDA library. The most frequent words were thrown out, so we lose stop words, plus Aes, Sedai, and Rand. Running this algorithm with my four-core MacBookPro took about 30 minutes, the most resource-intensive algorithm yet.

LDA Topics

In the above picture, I have created a heatmap showing the prevalence of each learned topic across the fourteen books in the series. The colors go from black to red to yellow to white, where black means not mentioned at all, and white means mentioned all the time.

Can you guess what topics were learned by LDA? Read on to see!

Spoiler Alert!

Discussions of plot points and threads across the whole series follow. Please take appropriate cautions if this is important to you.

Topics

I was surprised and impressed with what LDA found for topics! Here, I’ve tried to organize them somewhat chronologically, along with a few details of why I think the topic makes sense. The top 20 words in each topic are ordered by their frequency.

1 - Emond’s Field

perrin, faile, trollocs, rivers, tam, master, village, field, mat, lan, whitecloaks, gaul, emonds, cloak, children, mistress, axe, alvere, aram, perrins

We start the series in Emond’s Field, a small village in the Two Rivers region. The elders are referred to as Master and Mistress. A Trolloc attack is part of the inspiration for our characters to begin their journey, and Perrin returns here in book 4 to again fight the Trollocs and continues his conflict with the Whitecloaks.

2 - Beginning of the Epic

moiraine, mat, nynaeve, perrin, egwene, loial, lan, thom, door, trollocs, ogier, min, horn, tar, horses, valon, big, ingtar, hurin, cloak

Here we see the main characters for the first three books (minus Rand, who is present most everywhere and thus not included in these topic models). The book most connected to this topic is The Great Hunt, which brings the Horn of Valere to the front.

3 - Visiting the Aiel Wasteland

aiel, wise, maidens, aviendha, cadsuane, egwene, lews, althor, amys, min, therin, moiraine, rhuarc, saidin, spear, rands, cairhien, cairhienin, shaido, bashere

Next, we pick up the Aiel people, their land, and their culture, a topic carried through books 4-8. We see represented the Wise Ones, the Maidens of the Spear, the invasion of Cairhein, and the prominence of Aviendha. Mixed in with this is Rand’s awakening to and discussions with his alter-ego Lews Therin.

4 - Elayne’s War for Succession

elayne, nynaeve, birgitte, aviendha, faile, palace, perrin, lady, wise, min, quite, door, sisters, aiel, sister, mistress, throne, caemlyn, elaynes, queen

While Rand and Egwene are traveling through the Aiel world, Elayne and Nynaeve continue their own search for the Black Ajah. But more prominently in this topic, and the main subplot of Crown of Swords, Path of Daggers, and Winter’s Heart, is Elayne’s war for succession in Andor to become Queen of Caemlyn.

5 - Mat’s Adventures and Seanchan

mat, seanchan, thom, tuon, bloody, dice, damane, egeanin, suldam, wagon, ebou, talmanes, luca, juilin, selucia, horses, gold, soldiers, dar, mats

While never taking front stage, this topic focuses on Mat and is present from early on in The Great Hunt, and he continues rolling his dice for gold, reluctantly leading his soldiers, and swearing his bloody curses across much of the series. Also found in this topic, probably because of Mat’s turbulent courtship of Tuon, this topic is also the most related to the invading Seanchan armies and their horrible Sul’dam and Damane pairs.

6 - Aes Sedai Training and Politics

egwene, siuan, amyrlin, elaida, nynaeve, sisters, sheriam, hall, ajah, mother, leane, novices, accepted, sister, seat, sitters, romanda, lelaine, egwenes, bryne

Even after removing the ever-present words “Aes Sedai”, we find this topic encompassing much of the background Aes Sedai intrique and politics plots, with Egwene, the unlikely Amyrlin Seat, leading the way. All the Tower hierarchy is here, with the novices, accepted, Ajah, sitters, the Great Hall, and even the late entry of Gareth Bryne shows up.

7 - Sanderson’s Contractions (and Perrin’s Adventures)

perrin, didnt, gawyn, hed, shed, camp, wasnt, faile, ituralde, althor, hadnt, cadsuane, galad, couldnt, aiel, seanchan, soldiers, battle, morgase, min

The next two topics are the most interesting. First, we see that they are the focus of the last three books, where Brandon Sanderson took over the series writing after Robert Jordan’s passing. Remember the contractions we saw in a few of the word clouds earlier? It looks like Sanderson used many more contractions than Jordan did (didn’t, he’d, she’d, wasn’t, hadn’t, couldn’t)! The remaining words in this topic all point to the aftermath of Perrin’s quest to rescue Faile and background character movements in the Towers of Midnight and the Gathering Storm.

8 - The Last Battle (Sanderson)

elayne, mat, perrin, egwene, nynaeve, trollocs, didnt, battle, lan, androl, fight, bloody, birgitte, gateway, aviendha, hed, galad, fighting, army, nearby

Finally, Tarmon Gai’don, or the Last Battle, which takes up the majority of the final book A Memory of Light, brings together most of our favorite characters across the series for the final struggle against the dark forces (see battle, fight, fighting, army). While Elayne and Mat are the leaders of the battle, even Androl and his gateway weaves make a strong showing.

9 - Prologues, Appendices, and World-building

aiel, war, channel, seanchan, age, 1, reborn, ajah, tongue, children, shadow, ability, andor, during, wheel, breaking, wars, land, sea, trolloc

Finally, we have the most subtle topics found. As far as I can tell, this topic is small but present in every book. My guess is that these are the world-building sections and discussions of the wheel of time and pattern itself, most often seen in the prologues and appendices of the books.

10 - Mostly Obscure Characters

bors, wind, malenarin, torval, almen, lan, illian, blade, hopwil, practice, elenia, keemlin, nasin, cadsuane, mans, semaradrid, command, dagger, companions, jargen

Last but not least, well, kind of least, is a collection of second and third-tier characters in the series. Only showing up enough to be noticed in The Path of Daggers, there does not seem to be a common thread across many of these names, but without these small threads the larger fabric of the series would crumble.

Conclusion

What else do you see highlighted in these topics? Are there connections I’ve missed, or that you disagree with?

I have been surprised at every turn how much insight into the series can be found using these NLP tools. I had finally finished reading the series last year, and it has proved to be an excellent testbed for explaining how valuable the combination of computation and human insight can be when analyzing text.

Each week in my class this spring, I was excited to tell my students “Watch this, it actually works!” and then discuss with them some of what I’ve shared in these posts. Hopefully this whets you appetite for some computational analysis, if not on the Wheel of Time, then for your favorite series. I’d love to hear what you investigate, please share your discoveries below!