Building AI for Arabic-speaking businesses in the UAE
Building AI products for the UAE means handling Arabic, English, Hindi, Tagalog, Urdu — often in the same conversation. Most generic AI stacks break on contact with reality. Here is what works.
A UAE business deploying an AI chatbot for customer support quickly discovers a problem that the global "best practices" articles do not address: their customers do not write in one language. They switch between Arabic and English mid-sentence. They use Egyptian dialect in one message and Khaleeji dialect in the next. They send voice notes in Hindi to a contact who speaks English back. They use Latin-script Arabic ("3arabizi" — writing Arabic with numerals for letters that English does not have) when they cannot find the Arabic keyboard.
Every one of these patterns breaks the default AI customer-support stacks built for English-speaking markets. This article is what we have learned about getting them to work for UAE deployments.
Model choice — the practical landscape
The Arabic capability of large language models has improved dramatically. You no longer need a specialist Arabic model for most tasks. But the gap between "works in English" and "works in Arabic" is still meaningful, and the gap is largest for dialects.
What works in practice for UAE deployments:
- For chat and customer support: the major frontier models (Claude, GPT, Gemini) handle Modern Standard Arabic well. Khaleeji dialect is hit-or-miss — generally OK for understanding, weaker for generation. If your customers speak Gulf dialect and expect dialect responses, you need either a dialect-specific fine-tune or a prompt strategy that explicitly asks the model to use Gulf register.
- For document processing: Arabic OCR has improved but is still meaningfully worse than English OCR, especially for handwriting and for documents with stamps overlaid on text. Test against your actual document samples; do not assume.
- For voice: Arabic ASR (automatic speech recognition) is now production-quality for MSA and for major dialects. Egyptian, Levantine, and Khaleeji are reasonably well-served. Less common dialects like Sudanese or Maghrebi remain harder.
- For embedding-based search: multilingual embedding models work but have noticeable quality drops on Arabic vs English. For high-stakes search (legal documents, medical records), test on your real corpus before committing.
The four UAE-specific patterns to handle
1. Code-switching (Arabic-English mixing). A typical WhatsApp message from a UAE customer might read: "Hi، أبغى أعرف if my order وصل ولا لا?" Mid-sentence language switches confuse some pipelines that do language detection upfront and route to language-specific models. The fix: detect the dominant language for routing, but pass the full original message to the model. Frontier models handle code-switching natively far better than separate language pipelines.
2. RTL rendering with mixed-direction content. Arabic renders right-to-left. English renders left-to-right. UAE business interfaces almost always contain both — an Arabic message thread might include English product names, English customer names, English link URLs. Browser and email-client rendering of bidirectional text is famously imperfect. Test in actual Arabic-UI environments (Arabic Outlook, Arabic Gmail mobile) before launching, not just in your developer Chrome window.
3. Voice input as a first-class channel. WhatsApp voice notes are how a huge fraction of UAE customer interactions happen, especially for older or Arabic-speaking customers. If your AI customer service handles only typed input, you are missing a large slice of your customer base. The pattern that works: transcribe the voice note (Arabic ASR), pass the transcription to the model, generate a text response, optionally also generate a voice response using Arabic TTS for customers who interact mostly by voice.
4. Multilingual customer service teams. Many UAE customer-service operations have agents who handle multiple languages — an agent might speak English, Arabic, Hindi, and Tagalog, switching based on who is on the line. AI tooling that assumes one language per agent does not fit. The pattern: store the customer's language preference (or detect it from their first message), surface the AI's draft response to the agent in the customer's language, but let the agent edit and respond in whatever language they choose to use.
Prompt engineering for Arabic
A few patterns we keep landing on:
Always prompt in the language you want the model to respond in. If you want Arabic output, write the system prompt in Arabic — even if your application code is in English. Mixed-language prompts consistently produce mixed-language outputs.
For Khaleeji (Gulf) dialect specifically, be explicit. Just saying "respond in Arabic" produces Modern Standard Arabic, which to a UAE customer reads as formal and slightly cold — like a newscaster responding to a casual question. Saying "respond in Khaleeji dialect" or "respond in the natural conversational Arabic of an Emirati" produces noticeably warmer, more appropriate output.
Provide style examples in the prompt. Two or three example exchanges in the style and register you want will produce more consistent output than a paragraph of style instructions.
Handle religious and cultural sensitivities explicitly. Greetings, condolences, and references to religious occasions follow conventions in Arabic that differ from English. If your AI is going to greet customers during Ramadan, send condolences on a death, or close a conversation politely, those conventions matter. Either include them in the prompt or use a thin layer of templated responses for these specific moments.
Common deployment patterns
For WhatsApp customer support, the typical stack is: WhatsApp Business API → webhook to your application → message normalisation (language detection, voice transcription if applicable) → AI model with conversation history context → response generation → translation if needed → WhatsApp reply. The whole flow takes 2-5 seconds end-to-end when tuned.
For internal AI assistants (sales, finance, ops), the typical stack is similar but with a RAG layer over the business's internal documents — typically a mix of Arabic and English content. The retrieval layer needs to handle queries in either language and surface documents in either language. We have had good results with multilingual embedding models for retrieval, then letting the generation model do the language matching in its response.
For voice IVR replacements, the typical stack is: call audio → real-time Arabic ASR → AI model → Arabic TTS → caller. The latency budget is much tighter (sub-second for natural conversation) and the model needs to handle interruptions, which is its own substantial engineering effort.
Cost considerations
Arabic text is denser per token than English in most current tokenizers, but not by enough to dominate cost decisions. The bigger cost driver is typically conversation length — UAE customer conversations tend to run longer than equivalent English ones, partly because of cultural norms around pleasantries, and partly because mixed-language conversations often need more clarification cycles. Budget accordingly.
If you are scoping an Arabic AI project, the AI automation services page walks through how matching works for AI engineering.