Last updated: 5 months ago
The Oxford Arabic – English Dictionary (OAD) is the most up-to-date Arabic dictionary on the market. It contains more than 130,000 words and phrases and 200,000 translations. It also helps with verb conjugation and gives tips on how to write the numbers correctly. I am therefore delighted that the Editor-in-Chief and team leader of the Oxford Arabic Dictionary, Tressy Arts, is answering questions for us in this interview.
Tressy Arts is an Arabist and lexicographer who worked on the Nijmegen Dutch Arabic dictionary project and is Chief Editor of the Oxford Arabic Dictionary. She studied Arabic at the Radboud Universiteit Nijmegen (the Netherlands), specializing in linguistics and translation. When not involved with dictionaries, she has worked as a language teacher, tour leader, translator, and proofreader; but lexicography is her passion.
In this interview, you'll learn why swear words aren't easy to translate into Arabic, how computers helped create the dictionary, what role a Dutch dictionary played in it, and which Arabic words are particularly tricky.
Facts about the Oxford Arabic Dictionary
- How many entries are there in the Oxford Dictionary?
- When did the dictionary first appear (first edition)? How “thick” was it at that time (pages, words)?
- How many people work in the dictionary team?
- What is the biggest difficulty in writing a dictionary? What do you have to pay particular attention to?
- What does the daily work of a dictionary editor look like?
- For which Arabic or English texts does the dictionary work best? Can I use it to read the Qur'an or any religious text? Or can I only read texts that appeared in the last 100 years?
- According to what Arabic spelling conventions is the dictionary written? For example, Syrian or Egyptian spelling of the Hamza?
- What kind of words do you think are the hardest to translate in Arabic?
- Which is more difficult to write and research: an English -> Arabic dictionary or an Arabic -> English dictionary? Why?
- Do you also use AI in your work?
- To what extent is youth language included in the dictionary?
- How many Arabic roots (3 radicals) and how many with 4 radicals are there in the dictionary?
- Which steps have to happen before a word ends up in the Oxford Dictionary? Who decides?
- Who collects the words? What are the sources? Newspapers? Novels?
- What are your main sources for analyzing the frequency and meaning of Arabic words?
- Are there Arabic words that disappeared from the dictionary because they are no longer used?
- How do you deal with dialect expressions? (Arabic slang)
- How do you deal with the fact that Arabic words often describe more of an idea than a specific event, thing or action? Where do you draw the line?
- After the 2011 revolution, the word الفلول was widespread, also used in Egyptian media for the remnants of the Mubarak regime. I couldn't find it in the dictionary. Would you also include such terms?
- How do you deal with English terms for which there is no Arabic word yet? (for example: “ghosting”)
- Are there letters or comments from readers that you remember? What were they about? Specific words or expressions?
- Are you considering making the dictionary available as a browser extension? E.g.: Getting instant translation when hovering over Arabic words?
How many entries are there in the Oxford Dictionary?
The current online version has 26,738 Arabic-English entries and 27,964 English-Arabic entries. The printed book has around 50,000 entries in total. We keep adding to the online version all the time, but there probably won't be a new print edition.
When did the dictionary first appear (first edition)? How “thick” was it at that time (pages, words)?
The dictionary was launched in August 2014. The book has 2048 pages that we absolutely crammed full. We reduced the margins and front pages to a minimum so we could put in all the information we wanted, whilst still making sure the fully vocalized Arabic text remained clear to read.
How many people work in the dictionary team?
When we compiled the Arabic dictionary, the team consisted of sixteen editors and three consultants, who communicated and worked together fully via the internet. Meetings were held on Skype, and the editing happened in an online database, which allowed a team from all parts of the world to work closely together. This meant we could find people without having to worry about them relocating to Oxford.
In addition to this core editorial team, there was a team in Oxford that dealt with the project management, data and corpus development, etc.
What is the biggest difficulty in writing a dictionary? What do you have to pay particular attention to?
I have worked on dictionaries of many languages now over the years. I think the biggest challenge for editors of any bilingual dictionary is that it is really hard to translate a word, as opposed to a text. That sounds paradoxical – surely a text is composed of many words, so should be harder?
But in practice, getting down to word level is really difficult. When you translate a text, you have the full context; if one word doesn't quite work, you can play around with the surrounding text to make sure that the translation matches the intention of the original. When writing a dictionary, you don't have that space: every word needs to be perfect.
Most words have several meanings, which we call senses, so you'll have to use indicators and collocates to explain to people in which context you use which translation. Some older dictionaries, like Hans Wehr for Arabic, don't provide this sense division, which can make it very hard to find the right translation.
For Arabic dictionaries in particular, a major issue is that you can't rely on other dictionaries. In Arabic lexicography, conservatism rules, and works like the Lisan al-Arab are still seen as the standard. Which is great for finding out the original meanings of a word and looking into the history, but not if I want to know what the most common meaning of a word is in this day and age, or if I want to work out how to say computer in Arabic. So when deciding on the meanings of a word, we couldn't really make much use of monolingual dictionaries or earlier bilingual dictionaries, like editors of many other languages can.
Also, for most languages, you have to be wary of language purists, who want to write a prescriptivist dictionary, a dictionary that tells you how language ‘should be’ used. Oxford dictionaries are descriptivist, meaning that they tell you how language is actually used.
What does the daily work of a dictionary editor look like?
For most editors, it consisted of translation of entries, senses, and examples.
When creating this dictionary, we started with what's called a ‘framework'. A framework is basically the source language half of the dictionary.
Oxford University Press had a very good English-language framework, consisting of entries already divided into senses and containing indicators, collocates, examples, and extra hints for translators to make sure they understood the English perfectly before translating it into Arabic to create the English-Arabic half of the dictionary. The English-Arabic translations were then reviewed by Arabic reviewing editors from different countries.
For the Arabic-English editors, the framework consisted of the ‘Bulaaq Arabic-Dutch dictionary’ compiled by Jan Hoogland and his team (of which I was one, before I moved to Britain). We actually hired mostly Dutch editors to work on this side (whose Arabic and English were top notch, of course) so the usefulness of the Dutch translations didn't get lost. Their work was then reviewed by native English speakers from both Britain and the US.
So in practice, as an editor you would choose a letter and be assigned batches of words of that letter, which you then worked on, translating or reviewing. When you were working, you could make notes for other editors inside the dictionary database if you wanted to check things. We had codes to assign questions to Arabic speakers, English speakers, specialists, or the chief editor. There was a lot of communication throughout.
As a reviewer, again you'd be assigned batches to review, and you'd communicate with the original editor to get to the best solutions.
For which Arabic or English texts does the dictionary work best? Can I use it to read the Qur'an or any religious text? Or can I only read texts that appeared in the last 100 years?
The dictionary aims to be a dictionary of Modern Standard Arabic. There are many excellent classical Arabic dictionaries that can be used to read the Qur'an or classical texts, but this is not it. We aim to be helpful for English speakers learning Arabic and Arabic speakers learning English when reading current texts.
Virtual tour: the Oxford Arabic Dictionary
If you would like to learn more about how to use the dictionary and what it has to offer, you can watch a 15-minute presentation that walks you through it here. It is available on the Oxford Dictionary website. You need to provide your email address and register in order to watch it, but it is accessible for free.
According to what Arabic spelling conventions is the dictionary written? For example, Syrian or Egyptian spelling of the Hamza?
We use the standard Arabic spelling of the Hamza, as illustrated in the excellent book ‘Arabic for Nerds’ in paragraph 63 😉
Because all Arabic in the dictionary is fully vocalized, we had to create our own conventions for these, which we listed in the Style Guide (a list of conventions in the dictionary, so all editors used the same rules).
We decided not to write Hamzat al-Wasl (همزة الوصل), for example, and we write no sukun (سكون) or shadda (شدّة) on the first letter after al- (ال). We do write the case endings, but only when syntactically relevant (so we don't write every entry with a nominative ending), and we don't vocalize some of the most common words, such as مع.
What kind of words do you think are the hardest to translate in Arabic?
For any pair of languages, the words which are hardest to translate are words that are specific to one language region but not the other: dishes, religious, national, and cultural practices, names of institutions, etc. The more different the cultures and countries are, the larger this category grows. So in that case, often you have to resort to a gloss or description to describe to the speaker of the target language what something is.
A gloss might also be useful if a language does have a term for something, but it's much rarer than in the other language. For example, many Islamic terms have an English equivalent, but they're not very well-known among non-Muslims, so then it helps to give the equivalent plus a short description of what it is. Similarly, the Arabic reader may want some help explaining what exactly haggis is.
Haggis is a dish from Scotland containing different sheep's organs (heart, liver, and lungs), minced with onions and spices and cooked inside a sheep's stomach.
A problem specific to English and Arabic is the register difference. English is used in all areas of life, and has words from the most familiar and vulgar to the highest register. Standard Arabic, by its nature, lacks those intimate and profane registers. So when you translate something like fuck off into Standard Arabic you can capture the meaning, but it won't have the same feel.
Which is more difficult to write and research: an English -> Arabic dictionary or an Arabic -> English dictionary? Why?
You know, I don't think I could say! Both have their unique difficulties.
English as a source language is much more researched and so building the framework is easier, but finding the right Arabic equivalents can be really hard, also because there are no native speakers of standard Arabic.
On the other hand, finding the exact meaning of an Arabic word can be really difficult and require a lot of corpus research, which is in itself hard because of the morphological complexity of Arabic.
Do you also use AI in your work?
Do you mean like translation software? No, it's all done by humans. We do use corpus analysis software but I'm not sure if that is classed as AI.
To what extent is youth language included in the dictionary?
Quite little. For Arabic, as the dictionary focuses on Standard Arabic, the language is by nature quite formal. For English, we tend to include only words with staying power, and youth language, being quite dynamic, tends to have few of those.
How many Arabic roots (3 radicals) and how many with 4 radicals are there in the dictionary?
That's hard to say.
Because the printed dictionary is ordered by root, every word in the dictionary needed to be assigned a root, including loan words which don't have a ‘proper' root. For example: كمبيوتر was given the root كمبيوتر. So when looking it up you can find it between كمبيالة and كمت, but of course كمبيوتر is not an Arabic root.
So unless I do a manual count, which would take hours, I am afraid I can't tell you.
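The filing of a loan word under a pseudo-root can be pictured with a small sketch. It assumes, purely for illustration, that the index is a sorted list of keys and that plain Unicode code point order is close enough to Arabic alphabetical order; a real dictionary would use its own collation rules. The function name `file_under_root` and the two-item list are hypothetical.

```python
import bisect

# Toy stand-in for a root-ordered index. The two neighbours are the ones
# named in the interview; everything else about this list is an assumption.
root_keys = ["كمبيالة", "كمت"]


def file_under_root(keys, new_key):
    """Insert new_key into the sorted key list and return its position."""
    # bisect compares strings by code point, which only approximates
    # proper Arabic dictionary order.
    pos = bisect.bisect_left(keys, new_key)
    keys.insert(pos, new_key)
    return pos


position = file_under_root(root_keys, "كمبيوتر")
print(position, root_keys)
```

In this toy index the loan word lands between the two neighbouring keys, which matches where the interview says it can be found in print.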
How can you access the Oxford Arabic Dictionary online?
The Oxford Arabic Dictionary is available via annual subscription, which also gives you access to the eight other Oxford Premium language dictionaries: Chinese, English, French, German, Italian, Portuguese, Russian, and Spanish.
Many libraries and universities already have a subscription to this website.
The prices for a subscription are $19.99 per year for the US, and £16.66 per year for the UK and the rest of the world. If you want to subscribe, follow this link.
Unfortunately, there is no mobile app so far.
For those who like to browse dictionaries, the printed edition is (still) recommended. It was published in 2014 and is available at Amazon.
Please note that we are a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for us to earn fees by linking to Amazon.com and affiliated sites.
Which steps have to happen before a word ends up in the Oxford Dictionary? Who decides?
That's a very good question, and the answer is kind of the same for English and Arabic: the language users decide. What that means is that we use corpora (singular corpus) to find and research our words.
A corpus is a gigantic mass of texts of many different sources in a particular language, that can be researched with software to find how a particular word behaves in context.
Our criteria are as follows: if a word is used often enough, in a variety of types of texts (so not only in computer manuals or only in fashion blogs), by different authors, without quotation marks, and in texts which are in the language intended (i.e. Modern Standard Arabic for this dictionary), it is a candidate for inclusion.
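As a rough illustration, the criteria above could be written as a simple filter. This is only a sketch: the interview gives no numbers, so every threshold below is a hypothetical value, and the `CandidateStats` fields are assumed summaries that some corpus tool would have to supply.

```python
from dataclasses import dataclass


@dataclass
class CandidateStats:
    word: str
    frequency: int            # total occurrences in the corpus
    text_types: int           # number of distinct kinds of text it appears in
    authors: int              # number of distinct authors using it
    quoted_share: float       # fraction of occurrences inside quotation marks
    in_target_language: bool  # found in texts of the intended language (MSA)


# Hypothetical thresholds, chosen only to make the example concrete.
MIN_FREQUENCY = 50
MIN_TEXT_TYPES = 3
MIN_AUTHORS = 5
MAX_QUOTED_SHARE = 0.5


def is_candidate(stats: CandidateStats) -> bool:
    """Return True if the word meets every criterion at once."""
    return (
        stats.frequency >= MIN_FREQUENCY
        and stats.text_types >= MIN_TEXT_TYPES
        and stats.authors >= MIN_AUTHORS
        and stats.quoted_share <= MAX_QUOTED_SHARE
        and stats.in_target_language
    )


widespread = CandidateStats("إنترنت", 900, 6, 40, 0.02, True)
niche = CandidateStats("jargon", 900, 1, 40, 0.02, True)  # one text type only
print(is_candidate(widespread), is_candidate(niche))
```

A word that passes is only a candidate; as the interview makes clear, editors still decide whether it goes in.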
Who collects the words? What are the sources? Newspapers? Novels?
The sources are corpora, which ideally have a collection of all types of text – newspapers, novels, educational material, etc.
Software can search these corpora for frequency, so how often a particular word occurs. The more frequent a word is, the more likely it is that people will want to look it up in a bilingual dictionary, so we compare a frequency list to a list of the words already in the dictionary, and sift through those, to find words that are useful to add to the dictionary.
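The comparison of a frequency list with the existing headword list might look roughly like this. It is a minimal sketch on toy data: the function name `missing_frequent_words` is made up for this example, and a real corpus would be tokenized and lemmatized first.

```python
from collections import Counter


def missing_frequent_words(tokens, headwords, top_n=5):
    """Return the most frequent tokens that are not yet dictionary headwords."""
    counts = Counter(tokens)
    missing = [(word, n) for word, n in counts.most_common() if word not in headwords]
    return missing[:top_n]


# Toy corpus and toy headword list, for illustration only.
corpus_tokens = ["كتاب", "كتاب", "إنترنت", "إنترنت", "إنترنت", "حاسوب"]
existing_headwords = {"كتاب"}

for word, count in missing_frequent_words(corpus_tokens, existing_headwords):
    print(word, count)
```

The output of such a step is a list for editors to sift through, not a list of new entries.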
This is particularly difficult for Arabic, because of its dense morphology.
A word like أكل (to eat) has 48 different potential forms, and that's without accounting for clitics (for more information, see below). Most of those forms could also belong to آكل. So corpus linguists list all possible forms of أكل and tell the computer that these are all forms of one word, and then the software can tell how frequent أكل is, because it knows that تأكل and أكلتم are also forms of أكل. But how can you do that with a new word, which the software doesn't know yet?
Fortunately, we had an editor who specializes in this kind of question: Mohammed Attia (http://www.attiaspace.com) has been immensely helpful in using corpus analysis and algorithms to identify new (to us) Arabic words.
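The idea of telling the computer that several forms belong to one word can be sketched with a lookup table. This is a simplified illustration, not the method used for the dictionary: the table holds only a handful of the possible forms, it ignores the ambiguity with آكل, and the names `FORM_TO_LEMMA` and `lemma_frequencies` are invented for the example.

```python
from collections import Counter

# A tiny, hand-written subset of the forms of أكل (to eat).
FORMS_OF_AKALA = ["أكل", "تأكل", "أكلتم", "يأكل", "أكلت"]
FORM_TO_LEMMA = {form: "أكل" for form in FORMS_OF_AKALA}


def lemma_frequencies(tokens):
    """Count tokens per known lemma and collect the forms nobody has listed."""
    known = Counter()
    unknown = Counter()
    for token in tokens:
        lemma = FORM_TO_LEMMA.get(token)
        if lemma is None:
            unknown[token] += 1  # the software has no entry for this form
        else:
            known[lemma] += 1
    return known, unknown


known, unknown = lemma_frequencies(["تأكل", "أكلتم", "أكل", "فلول"])
print(known, unknown)
```

Forms that end up in `unknown` are exactly the hard case described above: without a listed set of forms, the software cannot tell which of them belong together.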
A clitic is a word or part of a word that has a host (= neighboring word). It cannot stand on its own.
In English, the ‘s’ in the expression What's the matter? would be called a clitic. Arabic, however, works differently. We do not have apostrophes. In Arabic, clitics are usually single letters attached to the beginning or end of a word without any orthographic marks.
Object suffixes such as ه or ك are clitics, as are possessive determiners, and the interrogative particle أ also counts as a clitic.
An extreme example in Arabic containing several clitics would be the following sentence (although it looks like a single word):
أَوَسَتعطونيها؟ – And will you (pl.) give it to me?
Now, what are the clitics here? There are many: أ (question particle), و (and), س (future particle), ن (helping device), ي (first-person pronoun, ‘me’), ها (object pronoun, ‘it’).
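The breakdown of the example can be mimicked with a deliberately naive segmenter. This is only a sketch under strong assumptions: it knows just the clitics from this one example, peels them off in a fixed order, and does no real morphological analysis. The names `PROCLITICS`, `ENCLITICS` and `segment` are hypothetical.

```python
# Short vowel marks and related signs (U+064B to U+0652) are removed first,
# so the vocalized and unvocalized spellings are treated alike.
HARAKAT = {chr(code) for code in range(0x064B, 0x0653)}

PROCLITICS = ["أ", "و", "س"]   # tried once each, in this stacking order
ENCLITICS = ["ها", "ي", "ن"]   # peeled from the end of the word inwards


def strip_vowel_marks(word):
    return "".join(ch for ch in word if ch not in HARAKAT)


def segment(word):
    """Split a word into (proclitics, remaining core, enclitics)."""
    rest = strip_vowel_marks(word)
    prefixes = []
    for clitic in PROCLITICS:
        # Keep at least two letters so the core never disappears entirely.
        if rest.startswith(clitic) and len(rest) > len(clitic) + 1:
            prefixes.append(clitic)
            rest = rest[len(clitic):]
    suffixes = []
    for clitic in ENCLITICS:
        if rest.endswith(clitic) and len(rest) > len(clitic) + 1:
            suffixes.insert(0, clitic)
            rest = rest[:-len(clitic)]
    return prefixes, rest, suffixes


print(segment("أَوَسَتعطونيها"))
print(segment("ويلز"))
```

The second call shows the weakness of such rules: the name ويلز is split into و plus a remainder, although the و is simply part of the word. Blind clitic stripping over-segments, which is why real corpus analysis needs far more than a list of letters.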
What are your main sources for analyzing the frequency and meaning of Arabic words?
When creating the dictionary, we largely used the Gigaword corpus (https://catalog.ldc.upenn.edu/LDC2003T12), analyzed by Sketch Engine software (https://www.sketchengine.eu/); enhanced and improved by Oxford University Press (OUP) corpus linguists.
Nowadays, we also use the more modern corpora that Sketch Engine have on their website, though the analysis is less fine-tuned.
Are there Arabic words that disappeared from the dictionary because they are no longer used?
When we adapted the Arabic-Dutch dictionary to an Arabic-English dictionary, our Arabic editors looked at the Arabic entries, and marked ones that they said were no longer in use, so we removed those, to make more space for current words.
How do you deal with dialect expressions? (Arabic slang)
We have some very, very common ones in the dictionary, marked as ‘colloquial', but very few.
How do you deal with the fact that Arabic words often describe more of an idea than a specific event, thing or action? Where do you draw the line?
Ohhh, interesting question; which leads me to a counter-question – isn't that the case for all languages?
Words and their context all paint a picture in one's head, a feeling, a sense, often sentiments, moral judgements, etcetera, all of which are very hard or impossible to capture in a dictionary.
For any language pair, semantic fields (by which I mean the meanings covered by a specific word) rarely overlap; only for very concrete specific things like carbon dioxide. It's always a challenge to capture the overlaps between the semantic field of word A in language A and word B in language B; and the further apart the languages are, the more difficult it gets.
But specifically for Arabic dictionaries – we divide roots into word forms, which often have a more concrete meaning than the root. Then we divide these word forms into senses, which try to capture a significant segment of the semantic field and translate it into English, which is the aim.
We will never be able to fully transmit the Platonic idea of قام, but we can tell a user in which contexts which English words would be the best equivalent.
After the 2011 revolution, the word الفلول was widespread, also used in Egyptian media for the remnants of the Mubarak regime. I couldn't find it in the dictionary. Would you also include such terms?
Yes, we would.
And with this, you've given an excellent example of how hard it is to find new words computationally!
When searching for new terms, the software ignores words that we already have in the dictionary. We already had فلول as the plural of فل, but not with that sense. And that's the weak spot of corpus software: it can recognize new words, but not new meanings. So as far as the analysis software was concerned, people were just talking about notches a lot.
Of course we did a lot of human reading of the corpus to find new senses like these, but as we didn't have unlimited time, things inevitably got missed – like this.
This is an issue for any language, not just for Arabic, but because of Arabic's dense morphology, it does form a specific problem for Arabic corpus analysis. For example, in the Gigaword corpus, ويلز (Wales) was consistently seen as a form of لَزَّ (وَيَلُزُّ is a potential form). So it happens a lot that new meanings and entire new words are missed by the software – and we need attentive users like you to point these gaps out!
If readers encounter any words or senses like this, please send them to me: [email protected], and we will add them to the list of words to be considered for inclusion.
How do you deal with English terms for which there is no Arabic word yet? (for example: “ghosting”)
If the English word is used in Modern Standard Arabic, like إنترنت, we treat it as an Arabic word. It's our policy that loan words are part of a language's vocabulary.
If the English word is simply hardly ever used by Arabic writers, we describe what the English means, so Arabic users can understand the English word if they encounter it.
Are there letters or comments from readers that you remember? What were they about? Specific words or expressions?
The one that always will stay with me is the comment from Amir on Amazon: ‘WORST dictionary ever'.
That was it. No comments on why.
There's also the memorable comment from R.A.: ‘In Ordnung.' – which must be the most German review ever. 😊 (Remark: In Ordnung here means something like acceptable; okay; fairly well).
Fortunately, most comments have been very favourable, mostly about the breadth and modernity of the vocabulary. Tim Buckwalter, who received a print copy shortly before the launch in summer 2014, loved it so much he took all six pounds of it with him on holiday!
Note: Tim Buckwalter is a famous Arabic lexicographer, a pioneer of and leading expert in Arabic morphological analysis for dictionary look-ups.
Are you considering making the dictionary available as a browser extension? E.g.: Getting instant translation when hovering over Arabic words?
I'm afraid I have zero say in that, but the data is available to be purchased by developers. I, personally, would love to have that in my browser!