The MADAR corpus is a collection of parallel sentences covering the Arabic dialects of 25 Arab cities, in addition to English, French, and MSA. In this article I will show how learners of Arabic dialects can use this database.

MADAR – a collection of Arabic dialects from 25 Arab cities

How can you access the database?
How can this database be useful for Arabic learners?
1. Example 1: How do people in Alexandria, Beirut, Mosul, Tunis, and Rabat express “to want”?
2. Example 2: How is “to want” used in Egyptian and Levantine Arabic in full sentences?
Which 25 cities are covered?
How to download the corpus data
Other corpus data for Arabic
1. Tunisian Arabic
2. Collection for Modern Standard Arabic including Egyptian Arabic
3. The OPUS project

In Arabic, مَدار means axis or center, but also orbit. MADAR, however, is an acronym that stands for Multi-Arabic Dialect Applications and Resources. The goal of the project is to create a unified framework for Arabic dialects that can also be used for machine translation.

There are two datasets:

Corpus-26: a set of 2,000 sentences translated into 25 city dialects (so each sentence has 25 parallel dialect translations), in addition to MSA.
Corpus-6: a set of 12,000 sentences translated into the dialects of five selected cities (Doha, Beirut, Cairo, Tunis, and Rabat), in addition to MSA.
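
To get a feeling for the size of the two datasets, here is a quick back-of-the-envelope calculation in Python. It uses only the figures above and assumes that every sentence exists exactly once per city dialect plus once in MSA (which would also explain the numbers 26 and 6 in the dataset names, although that reading is my assumption). The function name is made up for this sketch.

```python
# Rough size estimate for the two MADAR datasets, based on the figures above.
# Assumption: every sentence exists once per city dialect plus once in MSA.

def sentence_versions(sentences: int, cities: int) -> int:
    """Total number of sentence versions: one per city dialect plus one for MSA."""
    varieties = cities + 1  # city dialects + MSA
    return sentences * varieties

corpus_26 = sentence_versions(sentences=2_000, cities=25)
corpus_6 = sentence_versions(sentences=12_000, cities=5)

print(f"Corpus-26: {corpus_26:,} sentence versions")
print(f"Corpus-6:  {corpus_6:,} sentence versions")
```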

Unfortunately, the English and French translations are not publicly available. The authors cite copyright restrictions.


How can you access the database?

You can access the database on MADAR's website:

https://camel.abudhabi.nyu.edu/madar


If you are interested in the genesis and aims of the project, you can download a paper by the project members.

How can this database be useful for Arabic learners?

In principle, typing in a word and browsing the results is enough; you can learn a lot from that alone. Two examples show how valuable this database is.

Example 1: How do people in Alexandria, Beirut, Mosul, Tunis, and Rabat express “to want”?

Let's assume you want to see how the verb “to want” is expressed in Alexandria, Beirut, Mosul, Tunis, and Rabat.

We use the MADAR Lexicon Viewer for this. Choose the cities you want to compare and enter the word in question. This is what you get:

[Screenshots: MADAR Lexicon Viewer results for “to want” in the selected cities]

In my opinion, this is an outstanding resource for anyone interested in Arabic dialects. The words that differ between dialects are usually the most common verbs and nouns, which are exactly the ones that are crucial for understanding. Less common words (except for those relating to nature and food) are usually much the same across dialects.

Example 2: How is “to want” used in Egyptian and Levantine Arabic in full sentences?

We use the MADAR Corpus Viewer for this. We choose the cities Alexandria and Beirut. In the “English” field, we enter “want”. The results are stunning! We see how the verb is used in colloquial Arabic and what the equivalent in Modern Standard Arabic would be.

[Screenshot: MADAR Corpus Viewer results for “want” in Alexandria and Beirut]
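
If you prefer to work offline, the same kind of side-by-side search can be sketched in a few lines of Python. This is only an illustration of the idea, not how the MADAR viewer works internally. The three rows below are toy sentences I wrote myself, not data from MADAR, and the function and field names are made up. Because the English side is not publicly available, the sketch searches the MSA column instead.

```python
# Toy parallel corpus: each row is one sentence in several varieties.
# These rows are hand-written for illustration and are NOT taken from MADAR.
TOY_CORPUS = [
    {"id": 1, "MSA": "أريد فنجان قهوة", "Alexandria": "عايز فنجان قهوة", "Beirut": "بدي فنجان قهوة"},
    {"id": 2, "MSA": "أين المحطة؟", "Alexandria": "فين المحطة؟", "Beirut": "وين المحطة؟"},
    {"id": 3, "MSA": "أريد تذكرة", "Alexandria": "عايز تذكرة", "Beirut": "بدي تذكرة"},
]

def search(corpus, keyword, search_in="MSA", cities=("Alexandria", "Beirut")):
    """Return rows whose `search_in` sentence contains `keyword`,
    reduced to the sentence ID, the searched column, and the chosen cities."""
    hits = []
    for row in corpus:
        if keyword in row[search_in]:
            hit = {"id": row["id"], search_in: row[search_in]}
            hit.update({city: row[city] for city in cities})
            hits.append(hit)
    return hits

# Find every sentence whose MSA version contains "أريد" (I want).
for hit in search(TOY_CORPUS, "أريد"):
    print(hit["id"], "|", hit["MSA"], "|", hit["Alexandria"], "|", hit["Beirut"])
```

A plain substring match is the simplest option. It will miss inflected forms of the verb, so for real study you would search for several forms.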

Which 25 cities are covered?

MADAR aims to produce a large parallel corpus of 25 Arabic city dialects, in addition to a preexisting parallel set for English, French, and Modern Standard Arabic (MSA). The cities covered are:

Morocco: Rabat, Fes
Algeria: Algiers
Tunisia: Tunis, Sfax
Libya: Tripoli, Benghazi
Egypt: Cairo, Alexandria, Aswan
Sudan: Khartoum
Palestine: Jerusalem
Jordan: Amman, Salt
Lebanon: Beirut
Syria: Damascus, Aleppo
Iraq: Mosul, Baghdad, Basra
Qatar: Doha
Oman: Muscat
Saudi Arabia: Riyadh, Jeddah
Yemen: Sana'a

How to download the corpus data

If you would like to download the corpus data, click the button below and fill out the form. You will then receive a link to download a ZIP file.

Download the MADAR dialect corpus data
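
Once you have the ZIP file, you can read it with Python's standard library. The following is a minimal sketch under assumptions: I assume one tab-separated file per variety, named after the variety, with a header row containing the columns "id" and "sentence". That layout is hypothetical, so check the actual file names and columns in your download and adjust the code. To keep the example runnable without the real data, it builds a tiny stand-in archive in memory using toy sentences of my own.

```python
import csv
import io
import zipfile

def read_parallel_tsv(zip_bytes: bytes) -> dict:
    """Read every .tsv file in a ZIP archive into {variety: {sentence_id: sentence}}.

    Assumed layout (hypothetical): one file per variety, named "<variety>.tsv",
    with a header row containing the columns "id" and "sentence".
    """
    corpus = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".tsv"):
                continue
            # "some/folder/Cairo.tsv" -> "Cairo"
            variety = name.rsplit("/", 1)[-1][: -len(".tsv")]
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
                corpus[variety] = {row["id"]: row["sentence"] for row in reader}
    return corpus

# Build a tiny stand-in archive in memory (toy sentences, NOT MADAR data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Cairo.tsv", "id\tsentence\n1\tعايز فنجان قهوة\n")
    archive.writestr("Beirut.tsv", "id\tsentence\n1\tبدي فنجان قهوة\n")

corpus = read_parallel_tsv(buffer.getvalue())
for variety, sentences in sorted(corpus.items()):
    print(variety, "->", sentences["1"])
```

With the real download, you would pass the bytes of the ZIP file you received instead of the in-memory stand-in.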

Other corpus data for Arabic

I must confess that I have fairly little experience with corpus systems, that is, databases that collect Arabic texts and make them analyzable. However, I am convinced that these databases are an important tool for anyone who wants to develop a better feeling for Arabic.

Tunisian Arabic

Tunisiya.org is a project, led by Karen McNeil and Miled Faiza, seeking to build a four-million-word corpus of Tunisian Spoken Arabic. There are currently 2,006 texts in the corpus, comprising 881,964 words. It is free.

tunisiya.org

Collection for Modern Standard Arabic including Egyptian Arabic

The website https://arabicorpus.byu.edu is a fascinating tool if you want to analyze words in context. It offers a variety of corpora, including many newspapers. It is a great tool to see how words are used in Modern Standard Arabic.

You need to register first, but it is completely free.

arabicorpus.byu.edu

The OPUS project

The OPUS project covers many languages. It is a growing collection of translated texts from the web that provides the community with a publicly available parallel corpus. For example, you get side-by-side translations of TED talks. You need some time to digest all the data and to work out what you are looking for.

opus.nlpl.eu