Last updated: 6 months ago
Watching a video and reading what you hear is perhaps the best way to learn and understand a language. Arabic, however, offers few opportunities in this regard. Subtitles are often poorly written by volunteers and tend to summarize the content instead of accurately reflecting what was said. Thanks to AI, however, it is now possible to create transcripts of videos yourself – and have them translated at the same time.
In this article, I show how you can do this using OpenAI. It all looks a bit nerdy, but it’s actually quite simple and can be done in 5 minutes, even for people without terminal/shell or programming skills. We’ll do it all step-by-step.
Using Whisper AI to transcribe and translate Arabic videos
Whisper is a general-purpose speech recognition model developed by OpenAI (the company that created ChatGPT). It is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Whisper is written in Python, so you need Python on your computer to run it.
It works very well with Spanish, English and German and, surprisingly, Japanese. Arabic is in the middle in terms of word error rate. Compared to other speech-to-text systems, I have to say that Whisper is the best free system available so far. Let’s see what it can do.
An example of a machine-transcribed Arabic video
The media coverage on the major Arabic news channels al-Arabiyya and al-Jazeera is more or less in standard Arabic – but since standard Arabic is not heard in everyday life, it is often simply too fast, especially for Arabic learners. It helps a lot to have the text in front of you. Whisper makes it easy to create transcripts – and I think they’re surprisingly good.
Here is an example of a video from Al Jazeera:
Here is a snippet of the transcript that OpenAI’s Whisper generated with its transcribe task:
معنا من غزة مرسلة الجزيرة هباء عكيلة
هباء أخر التطورات في القطاع الذي يعاني أوضاعاً إنسانياً صعبة
الغارات الإسرائيلية ما زالت مستمرة
وهذه الغارات تركزت في الساعات القليلة الماضية
في منطقة شمال غرب مدينة غزة وفي شمال قطاع غزة
كذلك الحال في منطقة دير البلح وسط قطاع غزة
كان هناك عدد من الغارات استهدفت مناطق متفرقة
القصف الإسرائيلي أيضاً تواصل ساعات الماضية في جنوب قطاع غزة في رفح في خان يونس
وكان الأبرز خلال ساعات الليلة الماضية ومنذ صباح هذا اليوم وحتى الآن
هو استمرار الغارات الإسرائيلية مستهدفة المنازل والبنايات السكنية
مما أدى إلى وقوع عشرات الشهداء والجرحى
هذه الغارات كانت تستهدف هذه المنازل ومعظم هذه المنازل مأهولة بالسكان
وبالتالي هدم هذه المنازل على رؤوس ساكنيها وهذا ما رفع من عدد الشهداء والجرحى
نتحدث أيضاً أن هذه المنازل التي استهدفت على سبيل المثال منزل عائلة شهاب في بلدة جباليا
كانت حصيلة هذا القصف 44 شهيداً وعشرات الجرحى
لأن هذه الأسر التي تسكن في هذا المبنى السكني كانت تستضيف أعداد من الأقارب والمعارف
النازحين من المناطق الحدودية في بيت حانون وفي شمال قطاع غزة
وبالتالي هذا تكرر أيضاً في مناطق عدة في لبريج في منزل عائلة أبو مدين
الذي أوقع 17 شهيداً وعشرات الإصابات
…كذلك الحال في مخيم جباليا شمال قطاع غزة منزل عائلة حلاوة
There are some issues, of course, and they often involve proper names. For example, هباء عكيلة should be هبة عكيلة. But despite some minor mistakes, I would say the result is really remarkable.
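If the same names keep coming out wrong, a small correction table can tidy up a transcript after the fact. The following is only a sketch: the table NAME_FIXES and the function fix_names are made up for illustration, and the single entry is the example mentioned above.

```python
# Hypothetical correction table: misrecognized spelling -> intended spelling.
# Only this one entry comes from the example above; extend it as you find more.
NAME_FIXES = {
    "هباء عكيلة": "هبة عكيلة",
}


def fix_names(transcript, fixes=NAME_FIXES):
    """Replace every known misrecognized name with its corrected spelling."""
    for wrong, right in fixes.items():
        transcript = transcript.replace(wrong, right)
    return transcript
```

This only catches the exact spellings listed in the table. A first name that appears on its own, like هباء in the second line of the transcript, still has to be checked by hand, because the same letters can also be an ordinary word.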
An example of a machine-translated Arabic video
You can even have Whisper translate videos automatically. You don’t need to go the Google Translate route; you can use OpenAI’s model directly. Again, the results are really impressive. Let’s take the above video again and let Whisper translate it automatically:
We have with us from Gaza Al-Jazeera correspondent Hiba Akila
Hiba, what are the latest developments in the sector that is facing a difficult humanitarian situation?
The Israeli raids are still going on
and these raids were concentrated in the last few hours
in the north-west of Gaza City and in the north of the Gaza Strip
and also in the area of Deir El-Balah
in the middle of the Gaza Strip
there were a number of raids targeting different areas
The Israeli bombardment also continued in the last few hours
in the south of the Gaza Strip, in Rafah, in Khan Younes
and the most prominent during the last few hours of the night
and since the morning of this day until now
is the continuation of the Israeli raids targeting houses and residential buildings
which led to the fall of dozens of martyrs and injuries
These raids were targeting these houses
and most of these houses are inhabited
and therefore these houses were destroyed on the heads of its inhabitants
and this increased the number of martyrs and injuries
We also say that these houses that were targeted
for example, the house of the Shihab family in the city of Jabalia
this bombardment had 44 martyrs and dozens of injuries
because these families that live in this residential building
were hosting a number of relatives and acquaintances
who were displaced from the border areas in Bayt Hanoun
and in the north of the Gaza Strip
and therefore this was repeated in several areas
in Librej, in the house of the Abu Meddan family
who killed 17 martyrs and dozens of injuries
as well as in the camp of Jabalia in the north of the Gaza Strip…
Here, too, we have some minor issues. For example, Gaza is often simply called القطاع (short for “the Gaza Strip”), which is translated as “the sector”. And again, you need to double-check names.
So how do we get such transcripts and translations? There are several options depending on your operating system and budget.
I have been using Linux since my childhood. Therefore, I cannot say first-hand how well the software solutions for Macintosh or Windows work. However, if in doubt, anyone can try out the Google Cloud solution described below.
Apple users (paid): Whisperscript
From the product description: “Quickly and easily transcribe audio files into text with Whisper. Whether you’re recording a meeting, lecture, or other important audio, MacWhisper quickly transcribes your audio files into text.”
Easy solution (free): Using the Google cloud
This is a handy solution if you would rather not use your desktop computer. You can even use it on your mobile, since we will run all required software and processes in the Google Cloud.
Unless you’re a software developer or work a lot with data analytics, you may not have heard of Google Colab. It is a really good platform to run Python code – and Whisper depends on Python. Basically, it is like Google giving you a computer to work with that already has Python installed.
Requirements: You do need a Google account to access it.
So, how does this work? It is easy. Here are the steps:
- Click the button below or here
- Log in with your Google account
- Save a copy of the Colab notebook to your Google Drive
- Now press the play buttons one after another.
- Adjust the YouTube link or upload an audio file
If you struggle with Google Colab, there are many YouTube tutorials. Here is a simple explanation:
Google Colab, or “Colaboratory”, is a cloud-based Jupyter notebook environment that allows you to write and execute Python code in your browser without any configuration required. It provides access to GPUs free of charge and is especially well-suited for machine learning, data analysis, and education.
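For orientation: a notebook cell that runs Whisper usually comes down to a few lines of Python. The following is a minimal sketch and not a copy of the notebook linked above. The function name transcribe_file and the file name in the comment are made up for illustration, and the sketch assumes that the openai-whisper package and ffmpeg are available in the Colab runtime.

```python
VALID_TASKS = ("transcribe", "translate")


def transcribe_file(path, task="transcribe", model_name="small"):
    """Run Whisper on one audio file and return the recognized text.

    task="transcribe" keeps the Arabic; task="translate" returns English.
    """
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")

    # Imported here so the check above also works where Whisper is missing.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(path, language="ar", task=task)
    return result["text"]


# Example call inside a notebook cell (the file name is hypothetical):
# print(transcribe_file("al-jazeera.mp3", task="translate"))
```

Larger models such as large are slower but usually more accurate, so it is worth switching the Colab runtime to a GPU before trying them.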
KDENLIVE (free): Generating Arabic or English subtitles
A reader (@Clorijn) recently asked me if I knew of any YouTube videos with Arabic subtitles. These are really hard to find. The streaming platform Shahid has this option. For YouTube videos, however, the only solution is to generate the subtitles yourself. Whisper from OpenAI is a good choice for this. Combined with the free video editor Kdenlive, this is easy.
You can download Kdenlive from its website (editions for Linux, Apple and Windows, plus an AppImage). Then go to Settings ‣ Configure Kdenlive ‣ Speech to Text. You may need to install some dependencies.
If you encounter any difficulties, you can follow this YouTube tutorial:
Note: If you are a Linux user and have the flatpak edition of Kdenlive, you may need to adjust some things in the environment settings. For details, take a look at this page and scroll to the bottom.
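If you would rather create a subtitle file outside Kdenlive, the common SRT format is easy to write yourself. The whisper command can do this directly with --output_format srt. The sketch below shows what happens behind that option, assuming you already have Whisper’s segments in Python as a list of dictionaries with start, end and text. The function names are my own and not part of Whisper.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp style SRT files use."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Turn Whisper-style segments (dicts with start, end, text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Subtitle blocks are separated by one empty line.
    return "\n".join(blocks)
```

Save the returned text with UTF-8 encoding so that the Arabic letters survive.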
Local installation (free)
I work mainly in the Linux terminal with the command line. Once Whisper is installed, you can quickly create transcriptions. The same is true for Windows and Mac.
You can find the setup instructions on the Whisper GitHub page, or click the button below:
Windows users
You can do everything by hand, but apparently there is a script that does the job for you. How it works:
- Search for “PowerShell” and choose “Run as administrator”
- When the shell is open, run the following command:
C:\... iex (irm whisper.tc.ht)
- Note: irm is short for Invoke-RestMethod; it downloads a script from that website. iex is short for Invoke-Expression; it runs the downloaded script.
- Now all relevant files should be installed automatically. If it worked, you should be able to run whisper in your shell.
Sounds difficult? There is a YouTube video explaining all the steps:
Linux users
Linux distributions are ideal for Whisper because they already come with a current Python version and package management. Ubuntu, for example, uses apt to install packages and modules; Python is already available in version 3.10 or newer. First, start your distribution’s command line (“terminal”) program. Note: The following commands are for Ubuntu/Debian users; Arch users can simply use yay whisper or pacman -S whisper if they have the community repositories enabled:
sudo apt install ffmpeg git
pip install git+https://github.com/openai/whisper.git
Older versions of Python are not in the standard repositories, so you need to add an additional software repository for them. If you are using Ubuntu 22.04 or a derivative based on it, enter the following lines in the terminal and confirm each with Enter:
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.10-venv
sudo apt install python3.9
sudo apt install python3.9-venv
For people who want to use their GPU (Nvidia):
Install the package extra/python-pytorch-cuda. If your Linux is Arch-based, just type yay python-pytorch-cuda in your terminal.
Apple users
To use Whisper on the Mac, you first need the command line tools of the Xcode development environment. If you install Xcode via the Mac App Store, it will take up around 40 GB of permanent storage. If you don’t plan to develop your own apps, you can save space: the command xcode-select --install installs only the essential software packages, which need less than 2 GB.
They provide the basis for the Homebrew package management. The git version management system, which Whisper requires, is also transferred to the computer in this way. You install Homebrew yourself by entering the following command in the “Terminal” program:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Then follow the instructions in the command line window. The whole thing takes maybe a minute. After that, enter the following commands to install ffmpeg and Python 3.9:
brew update
brew install ffmpeg python@3.9
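Whichever operating system you use, it can be worth checking that everything is in place before the first run. Here is a small sketch that looks for Python, ffmpeg and Whisper on your machine. The function name check_setup is my own invention; it relies only on Python’s standard library.

```python
import importlib.util
import shutil
import sys


def check_setup():
    """Report whether the pieces Whisper needs can be found on this machine."""
    return {
        "python": sys.version.split()[0],  # e.g. "3.10.12"
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "whisper_module": importlib.util.find_spec("whisper") is not None,
        "whisper_command": shutil.which("whisper") is not None,
    }


# Example: print one line per component.
# for name, value in check_setup().items():
#     print(name, value)
```

If ffmpeg or one of the whisper entries shows False, go back to the installation steps for your system.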
How to use the command line
The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:
whisper al-jazeera.mp3 --language Arabic --model large --output_format txt
Adding --task translate will translate the speech into English:
whisper al-jazeera.mp3 --language Arabic --task translate
To view all available options:
whisper --help
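If you have a whole folder of recordings, you can let a short Python script call the whisper command for each file. This is a sketch under a few assumptions: the function names whisper_command and transcribe_folder are mine, the files are MP3s, and the language is passed as the code ar, which Whisper accepts for Arabic.

```python
import subprocess
from pathlib import Path


def whisper_command(audio, task="transcribe", model="small", output_format="txt"):
    """Build the argument list for one whisper call on an Arabic audio file."""
    return [
        "whisper", str(audio),
        "--language", "ar",            # language code for Arabic
        "--model", model,
        "--task", task,                # "transcribe" or "translate"
        "--output_format", output_format,
    ]


def transcribe_folder(folder, **options):
    """Run whisper on every .mp3 in a folder, one file after another."""
    for audio in sorted(Path(folder).glob("*.mp3")):
        # check=True stops the loop if whisper reports an error.
        subprocess.run(whisper_command(audio, **options), check=True)


# Example (the folder name is hypothetical):
# transcribe_folder("news", task="translate", model="large")
```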
Tools to convert YouTube videos to audio
There are online tools, browser add-ons and offline command-line tools that extract just the audio from a video.