LAST UPDATED: 3 weeks ago
Automatic speech recognition and transcription have made great advances. They work very well for English and German, and even YouTube can do it. Arabic speech recognition has been a disaster so far, mainly due to the dialects and, of course, the lack of training data. Now there is a new tool that is very promising: Maqsam (مقسم).
Arabic Speech Recognition
- What is Maqsam?
- Does AI understand your spoken Arabic? Here is how to test it
- Transcribe Arabic YouTube videos with Maqsam
- How well does Maqsam transcribe Arabic YouTube videos?
- Final remarks
The Jordan-based technology company Maqsam (مقسم) has done a lot of research in the field of Arabic speech recognition. They offer a trial version online that everyone can access for free. Let's see what it is capable of.
What is Maqsam?
Maqsam (مقسم) has launched a new Arabic speech recognition tool (تحويل الكلام الى نص) and claims that it can outperform Google and Microsoft as well as other regional competitors in the race to transcribe the Arabic dialects of the Middle East and North Africa (MENA) region. Here are a few key facts about the new tool:
- You can access the trial version at https://intelligence.maqsam.com
- Maqsam Speech to Text recognizes and annotates not only everyday expressions, but also domain-specific jargon, even when it is mixed with other languages.
- Maqsam can deal with noise very well.
- Maqsam's transcription model has been heavily trained with custom acoustic, language, and pronunciation models.
- It “understands” almost all Arabic dialects and variants, including the more difficult North African dialects.
Does AI understand your spoken Arabic? Here is how to test it
It's very exciting to check whether an automatic speech recognition system will recognize and understand what you say in Arabic. If a computer can understand your spoken Arabic, a human being should be able to do so as well.
How can you test it? You need a microphone on your computer, or you can simply use your mobile phone. To test Maqsam with your own voice, visit https://intelligence.maqsam.com and press Record Audio.
The speech recognition is fantastic and understands dialects without any problems. I just tried it with a sentence in Egyptian Arabic. Interestingly, Maqsam's competitors also understood it well.
Transcribe Arabic YouTube videos with Maqsam
YouTube is already quite good at automatic transcripts, but not at Arabic dialects. We can use Maqsam to transcribe videos. Since we are only using the trial version, we have to take a detour. I will show how to do that.
First, we need to look for a YouTube video in Arabic. Then, you have two options:
Option 1: Speakers and Mic
Use your speakers and your microphone: It will work, but the result will contain many errors. To do this, play the YouTube video and press record on Maqsam. I do not recommend doing it this way.
Option 2: Download the audio of the YouTube video
This is the preferred way of transcribing the audio of videos. So, the question is: How do we extract the audio of a YouTube video?
There are many online tools; just search for “download audio youtube”. One that works quite well and is free is accessible here. It is called Online Video Converter Pro.
If you use Linux, there is a handy tool called youtube-dl. Many distros already include it; otherwise, you may have to install it first. Once it is installed on your system, use the following syntax:
$ youtube-dl -x --audio-format mp3 'http://www.youtube.com/watch?v=HRIF4_WzU1w'
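Since the trial version only accepts audio files up to one minute long, a longer download usually has to be cut before uploading. Here is a minimal sketch of how that could be done, assuming ffmpeg is installed. The function names to_seconds and clip_audio and the file names are my own placeholders; they are not part of youtube-dl or Maqsam.

```shell
#!/bin/sh
# Convert a timestamp such as "7:35" or "1:02:03" into seconds.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Cut a clip out of an audio file with ffmpeg (assumed to be installed).
# Usage: clip_audio INPUT START DURATION_SECONDS OUTPUT
clip_audio() {
    ffmpeg -ss "$(to_seconds "$2")" -i "$1" -t "$3" "$4"
}

# Example (file names are placeholders):
#   clip_audio speech.mp3 7:35 60 clip.mp3
```

The clip is re-encoded rather than stream-copied, which is slower but tends to give a more accurate cut point.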
Now, let's do some tests.
How well does Maqsam transcribe Arabic YouTube videos?
I will use three videos.
- The first example is an official statement; the announcement of the resignation of Hosni Mubarak. It has pretty clear audio and the speaker uses Standard Arabic, but the pronunciation is Egyptian (the letter ج is like “g” in the English word girl).
- The second example is a song by Elissa. Songs in particular are often tricky to understand (which is true for any language).
- The third example is the most difficult one: a part of the legendary “zanga zanga” speech of Muammar al-Gaddafi (معمر القذافي).
Test 1: Mubarak's resignation announcement
Omar Suleiman (عمر سليمان) – as well as the guy in the background, by the way – became famous for announcing the resignation of Hosni Mubarak during the Arab Spring (11th of February 2011). Here is the official announcement, which was broadcast by Egyptian State Television as breaking news.
I will download the audio file of the video and upload it to Maqsam.
Note: In case you encounter difficulties when generating the audio file, you can download it from my server – just click here to download the Omar Suleiman Resignation Speech file.
This is what I get (one transcript per engine):
سادتي وسادتي نقدم لحضراتكم الان بيانا هاما من رياسة الجمهورية بسم الله الرحمن الرحيم ايها المواطن نون في هذه الظروف العصيبة التي يتمر بها البلاد قرر الرئيس محمد حسني مبارك تخليه عن منصب رئيسمهورية وكلف المجلس الاعلى للقوات المسلحة لادارة شؤون البلاد والله الموفق والمستعان
سيداتي وسادتي نقدم لحضراتكم الان بيان هام من رئاسه الجمهوريه بسم الله الرحمن الرحيم ايها المواطنون في هذه الظروف العصيبه التي تمر بها البلاد قرر الرئيس محمد حسني مبارك تخليه عن منصب رئيس الجمهوريه وكلف المجلس الاعلى للقوات المسلحه لاداره شئون البلاد والله الموفق والمستعان
أهلا سيداتي وسادتي، نقدم لحضراتكم الآن بيانا هاما من رياسة الجمهوريةبسم الله الرحمن الرحيم.أيها المواطن ونفي هذه الظروف العصيبة.التي تمربها البلادقرر الرئيس محمد حسني مبارك.تخليه عن منصب رئيس الجمهورية؟وكلف المجلس الأعلى للقوات المسلحةلإدارة شؤون البلاد.والله الموفق والمستعان
Full transcript in Standard Arabic:
سيداتي وسادتي نقدم لحضراتكم الآن بيانا هاما من رياسة الجمهورية.
بسم الله الرحمن الرحيم أيها المواطنون في هذه الظروف العصيبة التي تمر بها البلاد قرر الرئيس محمد حسني مبارك تخليه عن منصب رئيس الجمهورية وكلف المجلس الأعلى للقوات المسلحة لإدارة شؤون البلاد والله الموفق والمستعان.
Dear ladies and gentlemen, we present to you now an important statement from the Presidency of the Republic:
In the name of God the merciful, the compassionate; my fellow citizens, in these very difficult circumstances Egypt is going through, President Hosni Mubarak has decided to step down from the office of president of the republic and has charged the supreme council of the armed forces to administer the affairs of the country. We seek God's help and guidance.
To be honest, all three made some small (but obvious) mistakes, but all three are quite good. If you listen to the video again and read the transcript, it will be easy for you to correct the mistakes.
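If you save two transcripts as text files, a word-by-word comparison makes the differences easier to spot. The following is only a rough sketch using standard Unix tools; the function name word_diff and the file names are placeholders of mine.

```shell
#!/bin/sh
# Print the words that differ between two transcript files.
# Lines starting with "<" come from the first file, ">" from the second.
word_diff() {
    _wd_a=$(mktemp) && _wd_b=$(mktemp) || return 1
    # Put every word on its own line so that diff compares word by word.
    tr -s ' ' '\n' < "$1" > "$_wd_a"
    tr -s ' ' '\n' < "$2" > "$_wd_b"
    diff "$_wd_a" "$_wd_b" | grep '^[<>]'
    rm -f "$_wd_a" "$_wd_b"
}

# Example (file names are placeholders):
#   word_diff engine_output.txt reference.txt
```

This only shows where two texts disagree. It cannot tell you which one is right, so you still have to listen to the recording.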
Test 2: The song “Halet Hob” by Elissa
Next we try Halet Hob (حالة حب), a song performed by Elissa (إليسا) which was released in 2014. Elissa is one of Lebanon's most famous singers and one of the best-known artists in the Arab world.
عيشة حالة حبي معك واخداني وصعب انها تتكرر تاني وبعيشها لو انت بعيد او قدامي واخيرا الايام ردي علي اخيرا جهي حبيبي يوم هي يرتاح من اسوديامي سدني اسرح فيك شوية وانسى ايام ضاو مني نفسي عمري عتيبي وانت بانك ضل حضني وانا جنبكايفة منك حاجة من ريحة ابويا حب الدنيا دي الجوا
عايشه حاله حب معاك واخداني وصعب انها تتكرر ثاني وبعيشها وانت بعيد او قدامي واخيرا الايام رضيو اخيرا يا حبيبي هم ليرتاح من اسود ايامي سبني اسرح فيك شويه وانسى ايام ضاعو مني نفسي عمري انتبه على دول حضرني وانا جنبك شايفه منك حاجه من ريحه ابويا احبك اغنيه
هاي شحال حبي معاك واخدة ان يوص امي. النها تتكرر تني و بعشه لو انت بعد أو أداب امي وأخرا الأيام ردي علي أخيرا فهي حبيبي ومدي ارتاح من أسود أيامي
These are the original lyrics in Egyptian Arabic:
عايشة حالة حب معاك واخداني وصعب انها تتكرر تاني وبعيشها لو انت بعيد أو قدامي وأخيرا الأيام رضيوا عليا أخيرا جه يا حبيبي اليوم ليا أرتاح من قسوة أيامي
3aysha 7alet 7ob ma3ak wakhdany we sa3b ennaha tetkarrar tany we ba3eshha law enta ba3eed aw oudamy we akheeran el ayam redyo 3alaya akheeran geh ya 7abibi youm leya arta7 men 2aswet ayami
Rough English translation:
I am living a state of love with you, it is taking me and it is hard to be lived again. And I live it whether you're far or standing in front of me. Finally, life is tender with me (= the days are on my side). The day has come, my love, where I can heal from my darkest days (= to get over all the hard times).
It is really impressive. There are some mistakes; for example, the software understood the ض in رضيوا as د. But apart from that, you would only need to fix some minor things. Interestingly, Google also did a great job. I suspect Google used material already available on the web (e.g., there are databases for song lyrics).
Test 3: Gaddafi's “Zanga Zanga” speech
On February 22, 2011, the Libyan dictator Muammar Gaddafi (مُعمّر القذّافي) gave a televised speech amidst violent social unrest against his government. In the speech, Gaddafi vowed to hunt down protesters “inch by inch, house by house, home by home, alleyway by alleyway.”
زنقة زنقة is pronounced in Libyan dialect as “Zanga Zanga”. The Arabic word زنقة describes a very narrow path. In Alexandria in Egypt, for example, there is a famous market area called زنقة الستات with very narrow alleyways.
We will only look at the most famous part, which starts at 7:35 in the video.
Note: In case you encounter difficulties when generating the audio file, you can download it from my server – just click here to download the Zanga Zanga file.
What Maqsam understood (note that this can be different on different setups; it may also become better over time):
انا سنوجه نداء للملايين من الصحراء للصحراء وسنزحف انواع الملايين لتطهير ليبيا شبر شبر بيت بيت دار دار زنجة زنجة فرد فرد حتى تتطهر البلد من الدنسي والارجاس لا يمكن نسمح لليبيا الضيء
Google and Microsoft had no chance of understanding Gaddafi's Arabic – and failed completely.
But Maqsam was sensationally good, considering the quality of the recording and of the spoken Arabic. You only have to fix a few small things (including spelling), but you can work very well with that.
Here is what Gaddafi said in this part (I have used some Standard Arabic spelling to make it easier):
أنا سأوجّه نداء للملايين من الصحراء إلى الصحراء، وسنزحف أنا والملايين لتطهير ليبيا شبرا شبرا.. بيتا بيتا.. دارا دارا.. زنقة زنقة.. فردا فردا.. حتى تتطهر البلد من الدنس والأرجاس
Rough English translation:
I am calling upon the millions, from one end of the desert to the other. And we, the millions and I, will march to purify Libya inch by inch, house by house, home by home, alley by alley, person by person… until the country is clean of the dirt and impurities.
Gaddafi is a tough nut to crack. I'm pretty excited about the result. Especially if you want to analyze or transcribe longer videos, Maqsam can be of great help.
Off-topic: Zanga Zanga song
Gaddafi's speech became even more famous later. “Zenga Zenga”, an auto-tuned song that parodied the Libyan leader Muammar Gaddafi, became a viral YouTube video. The song, released on February 22, 2011, quickly became popular among the Libyan opposition active in the 2011 Libyan civil war. The song was created by Noy Alooshe, an Israeli journalist and musician. His family is of Tunisian-Jewish descent. The original music video has more than 5 million views.
Final remarks
- In the trial version, you can upload audio files up to one minute long.
- Interestingly, the machine has big problems when the speaker pronounces the case endings.
- Some words are transcribed in an acoustically correct way, but that doesn't mean the spelling is correct. For example, in Gaddafi's Zanga Zanga speech, Maqsam detected the word زنجة, which may be difficult to find in a dictionary since the Standard Arabic equivalent is زنقة. In many Arabic dialects, the ق is pronounced as “g” as in English girl. In Egyptian Arabic, however, “Zanga” is pronounced “zan2a” with a hamza.
- The Arabic spelling is not consistent, especially the spelling of hamza.
- It is not clear what grammar and spelling rules the software uses for each dialect. The output is often contradictory. For example, the Standard Arabic letter ظ is pronounced as ض in some dialects. Maqsam sometimes writes ط in such instances and at other times ض. This is not really a problem, as Arabic speakers and learners usually know how to deal with it.
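Because the spelling of hamza and of the final ة varies between transcripts, it can help to flatten these variants before you search a transcript or compare it with another text. Below is a minimal sketch with sed. The mapping is my own illustrative assumption (it is not the rule set Maqsam uses), and normalize_ar is a made-up name.

```shell
#!/bin/sh
# Collapse a few common Arabic spelling variants (reads stdin, writes stdout).
# Assumption: alif with hamza or madda becomes plain alif, and
# ta marbuta becomes ha. Adjust the list to your own needs.
normalize_ar() {
    sed -e 's/أ/ا/g' -e 's/إ/ا/g' -e 's/آ/ا/g' \
        -e 's/ة/ه/g'
}

# Example (file names are placeholders):
#   normalize_ar < transcript.txt > transcript_normalized.txt
```

Note that this throws information away, so it should only be used on a copy for searching or comparing, not on the transcript you want to keep.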