Arabic dialects: A database of 25 cities from Rabat to Baghdad

MADAR is a remarkable tool for analyzing Arabic dialects. The database contains 2,000 sample sentences, each translated into the dialects of 25 Arab cities. Here is how to use it.

LAST UPDATED: 1 month ago

The MADAR corpus is a collection of parallel sentences covering the Arabic dialects of 25 Arab cities, in addition to English, French, and MSA. In this article, I will show how learners of Arabic dialects can use this database.

In Arabic, مَدار means axis or center, but also orbit. MADAR, however, is an acronym and stands for Multi-Arabic Dialect Applications and Resources. The goal of MADAR is to create a unified framework for Arabic dialects which could also be used for machine translation.

There are two datasets:

  • Corpus-26: a set of 2,000 sentences translated into 25 city dialects (each of these sentences has 25 corresponding parallel translations), in addition to MSA.
  • Corpus-6: a set of 12,000 sentences translated into the dialects of five selected cities (Doha, Beirut, Cairo, Tunis, and Rabat), in addition to MSA.

Unfortunately, the English and French translations are not publicly available. The authors cite copyright restrictions.

Screenshot of the MADAR website: https://camel.abudhabi.nyu.edu/madar

How can you access the database?

You can access the database on MADAR’s website:

https://camel.abudhabi.nyu.edu/madar

If you are interested in the genesis and aim of the project, you can download a paper by the project members.

How can this database be useful for Arabic learners?

In principle, just typing in a word and looking at the results is enough; you can learn a lot from that alone. To demonstrate how valuable this database is, here are two examples.

Example 1: How do people in Alexandria, Beirut, Mosul, Tunis, and Rabat express “to want”?

Let’s assume you want to see how the verb to want is expressed in Alexandria, Beirut, Mosul, Tunis, and Rabat.

We use the MADAR lexicon viewer for this. You choose the cities you want to analyze and enter the word in question. The viewer then shows the matching entries for each selected city.

In my opinion, this is an outstanding resource for anyone interested in Arabic dialects. It is usually the most important verbs and nouns that differ between dialects, and these are exactly the words that are crucial for understanding. Less common words (except for those relating to nature and food) are usually much the same across dialects.

Example 2: How is “to want” used in Egyptian and Levantine Arabic in full sentences?

We use the MADAR Corpus Viewer for this. We choose the cities Alexandria and Beirut. In the option field “English”, we write “want”. The results are stunning! We see how the verb is used in colloquial Arabic and what would be the equivalent in Modern Standard Arabic.
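
If you prefer to work with the downloaded data rather than the website, the same kind of lookup can be sketched in a few lines of Python. Since the English translations are not publicly available, this sketch searches the MSA version instead. Please note that the data layout (one dictionary per sentence, keyed by variety name) and the function name find_parallel are my own assumptions for illustration, and that the three sample rows are placeholders I wrote myself, not sentences taken from MADAR.

```python
from typing import Dict, List

# Hypothetical layout: one dict per sentence, keyed by variety name.
# These rows are illustrative placeholders, NOT taken from the MADAR corpus.
SAMPLE: List[Dict[str, str]] = [
    {"MSA": "أريد قهوة", "Cairo": "عايز قهوة", "Beirut": "بدي قهوة"},
    {"MSA": "أين الفندق", "Cairo": "فين الفندق", "Beirut": "وين الفندق"},
    {"MSA": "أريد تذكرة", "Cairo": "عايز تذكرة", "Beirut": "بدي تذكرة"},
]


def find_parallel(
    rows: List[Dict[str, str]],
    keyword: str,
    cities: List[str],
    search_in: str = "MSA",
) -> List[Dict[str, str]]:
    """Return the chosen varieties of every sentence whose `search_in`
    version contains `keyword` (plain substring match)."""
    hits = []
    for row in rows:
        if keyword in row.get(search_in, ""):
            # Keep only the search column and the requested cities.
            hits.append({name: row.get(name, "") for name in [search_in, *cities]})
    return hits


if __name__ == "__main__":
    for hit in find_parallel(SAMPLE, "أريد", ["Cairo", "Beirut"]):
        print(hit)
```

The match is a plain substring comparison, so it will miss spellings that differ in vowel marks or hamza. For serious work you would probably want to normalize the text first.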

Which 25 cities are covered?

The MADAR project aims to produce a large parallel corpus of 25 Arabic city dialects, in addition to a preexisting parallel set for English, French, and Modern Standard Arabic (MSA).

  • Morocco: Rabat, Fes
  • Algeria: Algiers
  • Tunisia: Tunis, Sfax
  • Libya: Tripoli, Benghazi
  • Egypt: Cairo, Alexandria, Aswan
  • Sudan: Khartoum
  • Palestine: Jerusalem
  • Jordan: Amman, Salt
  • Lebanon: Beirut
  • Syria: Damascus, Aleppo
  • Iraq: Mosul, Baghdad, Basra
  • Qatar: Doha
  • Oman: Muscat
  • Saudi Arabia: Riyadh, Jeddah
  • Yemen: Sana’a

How to download the corpus data

If you would like to download the corpus data, click the download button and fill out the form. You will then receive a link to download a ZIP file.
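
Once you have unpacked the ZIP file, you may want to line up the sentences of several cities yourself. Here is a minimal sketch of how that could look in Python. It rests on assumptions that you should check against your own download: that there is one tab-separated file per city with a header row, and that the columns for the sentence ID and the sentence text are called sentID and sent. Both names are placeholders, as are the function names.

```python
import csv
import io
from collections import defaultdict
from pathlib import Path
from typing import Dict, TextIO

# ASSUMPTION: each city file is tab-separated, has a header row, and contains
# one column with a sentence ID and one with the sentence text.
# Check the real column names in your download and adjust these constants.
ID_COLUMN = "sentID"
TEXT_COLUMN = "sent"


def read_city(handle: TextIO) -> Dict[str, str]:
    """Read one city's file and return {sentence ID: sentence}."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[ID_COLUMN]: row[TEXT_COLUMN] for row in reader}


def align(cities: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Turn {city: {ID: sentence}} into {ID: {city: sentence}}."""
    aligned: Dict[str, Dict[str, str]] = defaultdict(dict)
    for city, sentences in cities.items():
        for sent_id, text in sentences.items():
            aligned[sent_id][city] = text
    return dict(aligned)


def load_folder(folder: str) -> Dict[str, Dict[str, str]]:
    """Load every *.tsv file in `folder`; the file name serves as city label."""
    cities = {}
    for path in sorted(Path(folder).glob("*.tsv")):
        with path.open(encoding="utf-8", newline="") as handle:
            cities[path.stem] = read_city(handle)
    return align(cities)


if __name__ == "__main__":
    # Tiny in-memory demo with placeholder text instead of real corpus files.
    cairo = read_city(io.StringIO("sentID\tsent\n1\tsentence A\n"))
    beirut = read_city(io.StringIO("sentID\tsent\n1\tsentence B\n"))
    print(align({"Cairo": cairo, "Beirut": beirut}))
```

Keying everything by sentence ID means that a sentence missing from one city's file simply has fewer entries, instead of shifting all the following lines out of alignment.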

Other corpus data for Arabic

I must confess that I have fairly little experience with corpus systems, that is, databases that collect Arabic texts and make them analyzable. However, I am convinced that these databases are an important tool for anyone who wants to develop a better feeling for Arabic.

Tunisian Arabic

Tunisiya.org is a project, led by Karen McNeil and Miled Faiza, seeking to build a four-million-word corpus of Tunisian Spoken Arabic. There are currently 2,006 texts in the corpus, comprising 881,964 words. It is free.

Collection for Modern Standard Arabic including Egyptian Arabic

The website https://arabicorpus.byu.edu is a fascinating tool if you want to analyze words in context. It offers a variety of corpora, including many newspapers. It is a great tool to see how words are used in Modern Standard Arabic.

You need to register first, but registration is completely free.

The OPUS project

The OPUS project covers many languages. OPUS is a growing collection of translated texts from the web. OPUS provides the community with a publicly available parallel corpus. For example, you get side-by-side translations of TED talks, etc. You need some time to digest all the data and to know what you are looking for.
