Episode 39. Saana Svärd: Digital Assyriology in Helsinki: Transcript

0:14  JT

Welcome to the Thin End of the Wedge, the podcast where experts from around the world share new and interesting stories about life in the ancient Middle East. My name is Jon. Each episode I talk to friends and colleagues, and get them to explain their work in a way we can all understand.

0:32  JT

There have been digital resources in Assyriology for far longer than you might imagine. They were usually created by and for individual scholars. In the space of just the last 20 years, that situation has transformed beyond all recognition. And the field of Assyriology has fundamentally changed because of it. Very much for the better. It’s an exciting time to be in the field. 

1:03  JT

Digital tools are giving us access to more material, and letting us work much faster on it, typically together in all sorts of fruitful ways. Crucially, we can study bigger and more complex questions, as well as more delicate and nuanced ones. What’s more, we now get a quantitative component to our results. These tools effectively give us all a dedicated team of research assistants.

1:36  JT

Our guest today is someone trained in the traditional manner, who also has experience in digital humanities. She is leading an interdisciplinary team who are creating powerful tools that will help us better understand Akkadian. 

1:54  JT

So get yourself a cup of tea, make yourself comfortable, and let’s meet today’s guest.

2:08  JT

Hello, and welcome to Thin End of the Wedge. Thank you for joining us.

2:13  SS

Nice to be here. Thanks for inviting me.

2:16  JT

Can you tell us please: who are you, and what do you do?

2:21  SS

Sure. So my name is Saana Svärd. I’m an associate professor in Ancient Near Eastern Studies at the University of Helsinki. I’m also the director of the Centre of Excellence in Ancient Near Eastern Empires. My own research for the past years has been concentrating on gender in Mesopotamia. So studying how women and men existed in Mesopotamia, what did they do? How should these things be understood in the context of gender studies?

2:52  SS

But in very recent years, there’s been a second interest of mine, which is digital humanities. I guess you could call it the digital assyriology. So how can we approach these amazing archives, this amazing text material, from a digital perspective? So in other words, by using quantitative analytical tools that have been used for other languages, but not yet for Akkadian or Sumerian.

3:20  JT

Now we’re going to talk about some research you’ve been doing on digital tools to help find the meaning of Akkadian words. That research is carried out as part of a team you lead within the Centre of Excellence in Ancient Near Eastern Empires, of which you are the overall director. Before we get into details, could you first introduce us to the centre and its work, please?

3:45  SS

Yes, I’d be happy to. The Centre of Excellence was, and is, funded by the Academy of Finland and by the University of Helsinki. And there’s this special Centres of Excellence program in Finland, whereby about 10 centres are chosen every four years. And these centres are chosen from all fields of study in all of Finland. So it’s quite hard to get funding for one of these centres. But then if you do, it’s a really, really good deal for the whole field. Basically, what we got from the Academy and from the University of Helsinki is eight years of funding to study ancient Near Eastern empires in first-millennium Mesopotamia.

4:30  SS

We started on the first of January in 2018, which means that the Centre will be concluded by the end of 2025. The Centre currently has about 35 members or associate members. And these 35 people are divided into three distinct themes. And when we were planning the Centre, we decided not to focus on chronological themes, but instead develop themes based on the methodologies. So I’m personally … well, I’m leading the whole Centre of Excellence, but additionally, I’m leading personally Team One, which is focusing on digital humanities, on digital methods. Then we have Dr. Jason Silverman, whose team is focusing on social scientific approaches to ancient materials. And then we have Antti Lahelma who’s leading Team Three, which focuses on the material culture and cultural heritage.

5:29  SS

And in practice, of course, I mean, members of all teams work together. But we find it very useful that each team has very interesting variation in their expertise on the ancient Near East. And I think that has led to some really interesting interdisciplinary work. The 35 members that we currently have, not all of them are directly funded by the Centre of Excellence. So the reason why it’s called a Centre and not a project is precisely this idea that the Centre is bringing together people who are interested in this research mission that we have in the Centre of Excellence. So it’s not about hiring certain people to do certain jobs. Certainly we hire people, yes. But it’s also about bringing people together who are interested in the same kind of research agenda. And that’s kind of the added value, as they say, in the language of applications.

6:28  JT

Well, congratulations, that is quite an achievement in itself. Could you tell us more about the specific goal of your teams, please?

6:38  SS

Well, my own team, Team One, is focusing on texts and on written material. And more precisely, we aim to apply two distinct sets of digital tools. So on the one hand, we apply social network analysis. And on the other hand, we apply language technological methods. And these can be very powerful tools, when we have a lot of texts; when we have a lot of text materials, or we have a lot of individuals to deal with. And then in support of these two digital approaches, Team One is also engaging in a kind of more traditional, philologically oriented work.

7:22  SS

And perhaps this is a good time to say that both of these digital methods, social network analysis, as well as the language technological methods, they require considerable expertise on the languages and cultures of ancient Mesopotamia. So it’s not any kind of deus ex machina that suddenly solves all our problems. But they are more of a very useful helping tool to do linguistic and historical research. So I would also like to take this opportunity to thank all the Team One members who have been doing an amazing job developing these tools, particularly Krister Linden has been a crucial part of this project right from the start. We started collaborating with Krister almost six years ago. He’s a language technologist by training, who nonetheless embraced the opportunity of working with this strange Akkadian corpus with assyriologists. And I think that our collaboration has been one of the key aspects of making this all work.

8:30  SS

Team Two is particularly interested in developing sociological approaches for the study of the ancient world. They are currently working on a series of articles based on Bourdieu’s field theory, to see how that can be applied fruitfully to Ancient Near Eastern texts and history.

8:50  JT

What is Bourdieu’s field theory?

8:53  SS

Well, I guess in a short way, you could think about Bourdieu’s field theory as a way of chopping up social institutions to have a look at them from different perspectives simultaneously. So it’s basically a theoretical framework that can help us analyse social institutions in Mesopotamia. And it’s been used a little bit in the study of Mesopotamian history, but the Team Two people are doing a series of collaborative articles and developing these kinds of methodologies together with anthropologists and also historians who are experienced in different phases of Mesopotamian history.

9:34  JT

And how about Team Three?

9:37  SS

Team Three is our so-called archaeological team. So most of our archaeologists are located there. They have two major initiatives that are currently underway. So on the one hand, they have been planning, and now they hopefully will finally be able to go to Jordan to do their field work. Obviously Covid has delayed that by almost two years. But they are going to do a survey mission in Jordan, especially to explore the environment of the so-called King’s Highway. The idea is that Team Three is looking into the material remains, looking into the pattern of settlements, and the pattern of roads, to try and find out more about these so-called fringe areas. And then that data and that insight is incorporated into what Teams One and Two are doing with texts.

10:33  SS

We have a rather ambitious interdisciplinary approach here. We are now sort of halfway through with this Centre of Excellence. And once they get the first season of the field work done, then I think we will have a better idea of what kind of publishable results we are going to get. The second major initiative that Team Three is engaged in is arranging a museum exhibition in the National Museum of Finland here in Helsinki. So we want to make Ancient Near Eastern research more accessible to the general public. Obviously, it’s not the British Museum. We don’t have a lot of material from the ancient Near East in Finnish museums in general. But actually, it’s been a really interesting project to figure out, what do we have in Finland? And why have these objects ended up as parts of Finnish collections? So this Centre, and the work that Team Three is doing, has kind of enabled us to reflect on the position of ancient Near Eastern Studies in Finnish society and in Finnish history. And that’s what the exhibition is also going to be about. The exhibition will open in spring.

11:44  JT

That sounds interesting. I must come back and ask them about that in the spring. For the moment, though, I’d like to focus on your team, Team One. We’re fortunate today to have at least three complete Akkadian dictionaries. But all tools have their advantages. What does a digital dictionary-like tool do for us that a paper one doesn’t?

12:11  SS

That’s a very good question. Basically, what I think it does better than modern dictionaries is depicting relationships between words. Like, obviously, the information is what it is, right? We don’t have any mysterious hidden sources that the CAD wouldn’t have had, for example. But the way that the information is analysed and quantified, and then made into these graphic networks, is actually revealing something about the semantic relationships between individual words that in some cases could be traced via careful dictionary work, but in many cases, opens up new questions and new interesting insight into vocabulary, and into semantics … lexical semantics of Akkadian.

13:02  JT

Can you give us an example of that in action?

13:05  SS

Yeah, sure. So our most recently published article is authored by myself and Tero Alstola, Heidi Jauhiainen, Aleksi Sahala, and Krister Linden. It’s called “Fear in Akkadian texts: new digital perspectives on lexical semantics”. And it was published in The Expression of Emotions in Ancient Egypt and Mesopotamia, edited by Shih-Wei Hsu and Jaume Llop-Raduà. We bought an open access option for it. So it should be openly available on Brill’s web pages. And in that article, we basically compared the view that we get on fear from current dictionaries with the view that we get on the same words by using the lexical network approach that our team has developed. And more precisely, we looked into the verbs adaru, galatu, palahu, paradu, and shahatu. So we only looked at these verbs and their derivatives that occurred often enough in our material.

14:15  SS

So this is our first caution for a quantitative approach: that if you want to examine a word or a cluster of words that occurs less than 30 times in the text material, then it’s definitely more effective to just do that by hand. And just look up the text examples and figure it out. But then if you have like, you know, in our case we had five different verbs and their derivatives occurring at least 20 times in the material, in the corpus that we had. So tracing these interdependencies then becomes complicated enough, that it’s worth our time to use digital tools to analyse it. So what we wanted to find out was, well, first of all, are the digital tools yielding similar results as CAD, for example?
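As a small illustration of the frequency caution above, the following Python sketch keeps only lemmas that occur often enough to be worth analysing quantitatively. It is not the team's code: the function name `frequent_lemmas`, the toy counts, and the cut-off of 20 are assumptions chosen for the example, and the cut-off is a parameter precisely because the right value depends on the corpus.

```python
from collections import Counter

def frequent_lemmas(lemmas, min_count=20):
    """Return {lemma: count} for lemmas occurring at least `min_count` times."""
    counts = Counter(lemmas)
    return {lemma: n for lemma, n in counts.items() if n >= min_count}

# Invented toy data: the counts are made up for illustration only.
toy_lemmas = ["palahu"] * 25 + ["adaru"] * 20 + ["shahatu"] * 3
kept = frequent_lemmas(toy_lemmas)
print(kept)
```

Anything that falls below the cut-off is left out of the result and would be examined by hand instead.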

15:06  SS

And the second agenda point was, how much do we understand about different nuances of these verbs? I mean, they are all translated as “to fear” or with similar English words. But what was their nuance exactly in Akkadian? We even had a rather ambitious cognitive idea, or we had an idea from cognitive sciences, that the lexemes and their semantic connotations could help us in a way to get into the head of the ancient language speakers. And, of course, I mean, we all know that in Mesopotamia, the language was, in many ways, the tool for the elite. So the texts that we have, are hardly, you know, recordings of usual speech in market squares or something like that. But nonetheless, we thought it’s worth a try to trace the connections between these five fear words. And the results were very encouraging.

16:08  SS

We used two different language technological approaches. One was called PMI, and the second one was called fastText. And the results were similar. They supported the interpretation that we did based on the traditional dictionaries. But they also added something that I guess we could call quantitative meaning. We could not see all the dictionary meanings in the results. But this probably would have been the case even if we had the exact same text material at our disposal as the dictionaries had. So the quantitative approach, of course, focuses on usual meanings and common meanings. But then I guess this relates to a fairly interesting question like, what is the meaning of a word? And do we consider meaning to be the sum of all potential meanings for one particular word? Or can meaning be defined by the most common usage of the word? And of course, the quantitative approach is simply more suitable for answering the latter question about what are the common uses of a given word. The quantitative analysis that we used is good for finding the clearest context for a given word. So this means that it’s good at finding the most common pattern of use.

17:35  SS

And it seems that these methods that I mentioned–PMI and fastText–they are particularly good at identifying contexts that have specialised vocabulary, or some kind of easily recognisable setting; for example, warfare. The quantitative analysis immediately picks up any context that relates to warfare, because that, of course, has very distinct surrounding vocabulary in itself.

18:01  SS

And of course, the second thing that the methods really easily pick up, are synonyms or word pairs. In this study, where we looked at fear, gilittu and pirittu were very easily highlighted as a pair of words that practically always occurs together. So gilittu and pirittu, usually translated as fear and terror. They occur most of the time in a very specific text genre, and they very, very rarely appear anywhere else. Whereas palahu, of course, has many different kinds of usages in the corpus. Palahu has a clear positive connotation of respect. At the same time, galatu, paradu, and adaru, had a clear connotation to sickness or symptoms. So they clearly had something to do with the body trembling, for example. It’s a little bit difficult to describe these results in speech, because the main result of the article was producing this lexical semantic map of these five verbs and their derivatives.

19:15  SS

And the data set that we produced for this article is openly available in Zenodo. Anybody can go and scroll through it and see what the connections are between these five words. And what are the connections between these words and their derivatives and the rest of the vocabulary in our corpus? It’s a starting point in a way, or that’s how I see it. That the lexical semantic map is a starting point for a serious quantitative inquiry into the common usage patterns of individual words.

19:50  JT

So you’ve published two networks that help people understand Akkadian words. Why two? What’s the difference?

19:58  SS

Well, first of all, we released last summer two versions of these kind of lexical semantic maps that are available online in the web pages of the Centre of Excellence. And the reason why we wanted to produce two different maps is that the methods themselves give a slightly different view on our texts. PMI is an abbreviation of Pointwise Mutual Information. And the way it works is calculating co-occurrence probabilities across the whole corpus in question. So it’s basically doing the same kind of semantic work that assyriologists have always done: looking at the context of the word to define the meaning of the word. But it’s just doing it on a mathematical basis, and very, very fast. fastText on the other hand, is something a little bit trickier. So fastText is also based on contexts. But fastText is calculating which words tend to appear under similar circumstances. So, for example, in the English language, you could say that “I was terrified of spiders”. Or you could say, “I was afraid of spiders”, right? So then fastText would calculate in its own special way that, okay, so “to fear” and “to be terrified” probably have something to do with each other.
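To make the PMI description above concrete, here is a minimal Python sketch of pointwise mutual information computed from co-occurrence counts. It is an illustration under stated assumptions, not the team's implementation: the function name `pmi_scores`, the window size, and the toy corpus of transliterated lemmas are all invented for the example. The score for a pair is log2 of P(a, b) divided by P(a) times P(b), so it is high when two words appear together more often than their individual frequencies would predict.

```python
import math
from collections import Counter

def pmi_scores(sentences, window=2):
    """Return PMI for each unordered word pair co-occurring within `window` tokens."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        for i, left in enumerate(tokens):
            # Pair each word with the next `window` words to its right.
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    pair_counts[tuple(sorted((left, right)))] += 1
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / total_pairs            # probability of the pair
        p_a = word_counts[a] / total_words  # probability of each word alone
        p_b = word_counts[b] / total_words
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores

# Invented toy corpus of lemmatised lines (for illustration only).
toy_corpus = [
    ["gilittu", "pirittu", "sabatu"],
    ["gilittu", "pirittu", "maqatu"],
    ["sharru", "palahu", "ilu"],
    ["ilu", "palahu", "sharru"],
]
scores = pmi_scores(toy_corpus)
print(scores[("gilittu", "pirittu")])
```

Pairs that never co-occur within the window, such as gilittu and palahu in this toy corpus, simply receive no score. With very small counts PMI also tends to overrate rare pairs, which is one reason a minimum-frequency cut-off matters.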

21:33  JT

Would it be fair to say then, that fastText helps you find words that are similar. And PMI helps you understand how those related words differ from each other?

21:44  SS

Yeah, that’s one way of summarising it. When we have done experiments with both fastText and PMI, we realised that for Akkadian, the results are often quite similar. And that could be because the corpus is fairly small. I mean, in language technological terms, the corpus is fairly small. But it can also have something to do with the very specific context that we have. Like if you think about royal inscriptions, for example, I mean, there is a very clear tradition on what goes into a royal inscription. And there’s a very clear tradition of having long lists, having certain things following one after another. So this kind of blurs the line between finding words that appear in similar contexts, or finding contexts that appear with similar words.
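The idea of words that "appear under similar circumstances" can be sketched with plain context-count vectors and cosine similarity, using the spiders sentences from earlier in the conversation. This is a deliberately simplified stand-in and not fastText itself, which learns dense vectors from subword information. The function names, window size, and sentences below are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

sentences = [
    "i was terrified of spiders".split(),
    "i was afraid of spiders".split(),
    "the king built a temple".split(),
]
vectors = context_vectors(sentences)
similar = cosine(vectors["terrified"], vectors["afraid"])
unrelated = cosine(vectors["terrified"], vectors["temple"])
print(similar, unrelated)
```

Note that "terrified" and "afraid" never occur in the same sentence here, so a co-occurrence measure such as PMI would not link them directly, while the shared contexts do. In a formulaic corpus where the same words keep appearing side by side in the same settings, the two views can end up looking much alike.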

22:40  SS

But in any case, the results were different enough that we wanted to sort of give the people the chance to have a look at it themselves. So that’s why we made two separate lexical networks. We made one by using PMI, and then we made another one by using fastText. And then we just uploaded them and modified them, so that they can be easily zoomed in and zoomed out and browsed through a web page. So we wanted to avoid a situation where people would actually have to download and install a lot of software before they can navigate these word networks. We wanted to make it as easy as possible for people to see them and evaluate for themselves.

23:23  JT

With any kind of analytical tool, what you get out will be influenced by what you put in. Where do you get your material from? How do you analyse it? And how do those choices affect your results?

23:40  SS

That’s a really good question. Ah, well, when we started the project, we decided to start with Oracc. So Oracc, the Open Richly Annotated Cuneiform Corpus, has been absolutely invaluable for our work. In fact, these kinds of approaches could not be done if you didn’t already have a very large corpus that is available for research. We downloaded the Oracc data, and then some of the team members worked on that for quite a while to sort of transform that into a format that can be analysed easily with PMI or fastText. And this format made it also possible for us to do a kind of like a shadow Oracc corpus. So in the Helsinki University, we are hosting something called the Language Bank of Finland, which is precisely a repository for research material and for research texts and text corpora. So this formatted version, this enriched version of Oracc data was uploaded to this Language Bank of Finland service which is called Korp. And that is the corpus that we are working with in Helsinki.

24:50  SS

The downside of that is that this kind of like Oracc-in-Korp corpus, it’s obviously always a snapshot from Oracc. This month or next month, there should be the new version available. That’s then a snapshot of Oracc in 2021. This is perhaps a very good opportunity to say out loud, how grateful we are for people like Karen Radner and Simo Parpola, and many, many others who have made these research data and these texts electronically available, and who have lemmatised it and enriched the data to a point that we can actually start experimenting with these language technological tools.

25:32  SS

For the fear article, which is not that old, we had 7346 texts, most of which come from the Neo-Assyrian period. And obviously, most of those come from Nineveh. So what we have now in our lexical portal–this can’t be emphasised too much–it’s a diachronically flat view. The data that we have definitely should be taken into account when we ask our research questions. So learning to use these kind of new research methodologies means that you have to understand what these methods do. And what does PMI do? And what does fastText do? And what is the basic data that we are using? And then you can use it, you know, in a considered way to support your lexical analysis of an individual Akkadian concept, for example.

26:29  JT

Digital tools have already changed how assyriology is done. And the pace of that change is increasing all the time. Where do you think digital tools and resources might take us next?

26:43  SS

Hmm, nobody can know for sure, of course, but my personal idea is that using digital tools will also bring assyriology closer to other fields of study. Because in many cases, historians or linguists are interested in ancient Mesopotamia. But as a field, it’s not easy to understand what assyriologists do. And it’s not easy often, even to have access to the published texts. Like where are they published? I mean, what’s the most recent publication? How do you use publications? The digital tools or the digital approaches, I think they can make an impact in the field of historical inquiry. I think personally, that the history of ancient Mesopotamia has been bypassed too long in the general field of historical inquiry.

27:42  SS

And also in linguistic terms, we have this amazing material for a Semitic language. Like a written tradition of hundreds and hundreds and hundreds of years that has been researched within assyriology. But how much has that been researched in the linguistic community, in linguistic research? I think that would enrich the field of assyriology. To have more comparative work done, and more collaborative work done, both on the linguistics of ancient Mesopotamian languages, and also on the social phenomena and historical developments in Mesopotamia. And of course, it’s not all going to be done by just applying digital methods. But I think it can sort of help us to move in that direction.

28:32  JT

How can we follow your progress with this research?

28:36  SS

I think that the ANEE web pages are the best way to follow the work of our team. So it’s in the Helsinki University web pages. But basically, you can just Google “Centre of Excellence in Ancient Near Eastern Empires, Helsinki”, and you will end up in our web pages. And if you’re particularly interested in the lexicon portals, in the main web page, there’s a little tab called “Research Data”. So under “Research Data”, you’ll be able to find the links to the networks.

29:08  SS

Perhaps I should add that the current versions, you know, on the web pages, they are based on all of Oracc data. But we also didn’t filter them for any particular words. So we published this fear article that focused particularly on certain words, usually translated as “fear”. But for this lexical portal that’s on our web pages, we just took everything, and then just asked PMI and fastText to calculate the relationships between the words. So you should be able to find in the lexical portal any word that you’re interested in, if it appears often enough in the research corpus.

29:49  SS

In addition to doing a PMI version, and the fastText version of this kind of a lexical portal, we also did a kind of like a flipped network where we used the English terms instead of the Akkadian terms. So for the PMI network and for the fastText network, there’s also an English language version available, for those who don’t know Akkadian so well. In the web pages, we tried to include enough background information about what these lexical networks are, and how they work and how they were created.

30:25  SS

And we also have links to our open data repositories, as well as to our articles, many of which are open access. But I would recommend reading at least part of that material, because the lexical portal is a very specific creation following laws of its own. And, for example, if you don’t find a word you’re interested in, there’s probably a good reason for that either in the data or in the methods that we are using. And also, I’d like to say that we welcome, really honestly … like people always say that they welcome feedback. But we actually really do welcome feedback, because we have still more than four years left for the Centre of Excellence. So we hope … really, really hope … to have produced a usable, interesting, maybe even exciting tool for the field of assyriology by the end of these next four years. So seriously, the feedback is very much appreciated.

31:27  JT

Well, thank you very much.

31:29  SS

Yeah, sure. No worries. Thanks.

31:33  JT

I’d also like to thank our patrons: Tyler Russell, Enrique Jimenez, Jana Matuszak, Nancy Highcock, Jay C, Rune Rattenborg, Woodthrush, Elisa Rossberger, Mark Weeden, Jordi Mon Companys, Thomas Bolin, Joan Porter MacIver, John MacGinnis, Andrew George, Yelena Rakic, Michael Katsevman, Mend Mariwany, Kathryn Topper, Zach Rubin, Sabina Franke, Sophus Helle, Shai Gordin, Aaron Macks, Jonathan Stökl, Maarja Seire, Jaafar Jotheri, Morgan Hite, Chikako Watanabe, Mark McElwaine, Heather Baker, Sukanya Ramanujan, Laura Battini, Jonathan Blanchard Smith, Vanessa Richards, Kliment Ohr, TT, Christina Tsouparopoulou, Andwer Senior, Melanie Gross, as well as those who prefer to remain anonymous.

32:39  JT

I really appreciate your support. It makes a big difference. Every penny received has contributed towards translations. Thanks of course to the lovely people who have worked on the translations on a voluntary basis or for well below the market rate. For Arabic, thanks in particular to Zainab Mizyidawi, as well as Lina Meerchyad and May Al-Aseel. For Turkish, thank you to Pinar Durgun and Nesrin Akan. TEW is still young, but I want to reach a sustainable level, where translators are given proper compensation for their hard work.

33:18  JT

And thank you for listening to Thin End of the Wedge. If you enjoy what we do, and you would like to help make these podcasts available in Middle Eastern languages, please consider joining our Patreon family. You can find us at patreon.com/wedgepod. You can also support us in other ways: simply subscribe to the podcast; leave us a five star review on iTunes or your podcatcher of choice; recommend us to your friends; follow us on Twitter: @wedge_pod. If you want the latest podcast news, you can sign up for our newsletter. You can find all the links in the show notes and on our website at wedgepod.org. Thanks, and I hope you’ll join us next time.