Mohammad Abu Sheikh is transforming the AI landscape in the MENA region, driving a shift from passive consumption to sovereign innovation. As CEO of CNTXT AI and founder of a $10 million AI fund, he has led three successful exits and secured over a billion dollars in funding. His work is laying the foundation for an AI ecosystem rooted in language, culture, and data sovereignty.
We saw the abundance of underutilized data in this part of the world. A lot of problems in scaling AI came from the lack of data readiness — which eventually meant a lack of AI readiness. That’s why we started CNTXT AI.
Initially, we were solving the same problems we faced while building LocAI… We saw these challenges firsthand working with AI71, TII and G42 (IIAI). As we helped these entities solve those problems, the vision got clearer and the business just kept growing.
You’ve played a key role in building the largest Arabic digital library for AI training. What were some of the biggest challenges in doing so, and how did you overcome them?
Quality was one of the biggest challenges. Another was the limited availability of high-quality Arabic data online: Arabic is seriously underrepresented. Only a small portion of Arabic-language content has been digitized, and just 3–5% of all online content is in Arabic. That’s almost nothing. We overcame that problem by deploying data labelers, annotators, and data scientists to digitize, create, and curate the data ourselves.
CNTXT AI operates at the intersection of culture and computation. How do you balance cutting-edge AI innovation with the goal of building culturally relevant solutions for the MENA region?
We build culturally grounded models from the ground up. From infrastructure to final product, culture is embedded from the very beginning — it’s not something we add later. We design, innovate, and build with specific cultures, dialects, and needs in mind from day one. Arabic is one language, but it carries many dialects and cultural contexts across the region, so we build local products for local countries. And we do that by working with local annotators, people on the ground, in their own countries.
You’ve also co-founded LocAI and lead the SMPL AI Fund. How do these ventures complement the mission of CNTXT AI?
LocAI is the application layer — the part people actually interact with. It sits right on top of the data and infrastructure built by CNTXT AI. That’s what made it successful: it transforms AI foundations provided by CNTXT AI into real-world solutions people can use.
SMPL AI, on the other hand, is about giving back to the community. It focuses on investing in early-stage startups and helping build the regional AI ecosystem. We share the tools and lessons we’ve learned from building AI ourselves, so founders can grow faster and avoid common pitfalls.
Munsit has been called the most accurate Arabic speech recognition model in the world. What drove the development of this model, and why now?
What drove the development of this model was simple: the need.
We always build out of necessity. We looked at the market and saw the landscape was ripe — government agencies and private clients were all asking for a solution like this.
The existing models just weren’t up to the task. Most are built on English tech and then adapted. They aren’t designed for Arabic from the ground up, and definitely not for the specific problems we’re solving.
So we decided to build our own. It’s Arabic first — by design.
The research behind Munsit introduces a weakly supervised learning approach. Can you explain what that means and why it was essential for training Arabic ASR at scale?
Annotation is expensive. So we had to move beyond traditional methods that depend on large amounts of manual transcription. Weakly supervised learning helped us scale without having to label every audio file by hand — which is especially important for Arabic, a language with limited data and many different dialects.
Instead of using professionally transcribed audio, we started with 30,000 hours of unlabeled Arabic speech. We built an annotation pipeline that generates transcripts, then filters and cleans them with automated checks so that only the best are kept. This gave us a high-quality 15,000-hour dataset, all without human transcription.
This approach made it possible to train our model from scratch, capturing the richness of spoken Arabic across real-life situations, quickly and cost-effectively. Without this method, building an Arabic ASR system at this scale would have taken years and millions in manual effort.
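Editor's note: the interview does not say which automated checks the pipeline uses. As a rough illustration of the general idea only, the sketch below assumes each clip has been transcribed by two independent recognizers and keeps a clip when the two machine transcripts largely agree. The `Clip` type, the `disagreement` measure, and the `max_disagreement` threshold are all hypothetical and are not taken from Munsit.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    # Hypothetical record for one unlabeled audio clip.
    clip_id: str
    hours: float
    hypotheses: list[str]  # machine-generated transcripts, no human labels


def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance computed over words rather than characters."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def disagreement(h1: str, h2: str) -> float:
    """Fraction of words on which two transcripts differ (0.0 = identical)."""
    w1, w2 = h1.split(), h2.split()
    if not w1 and not w2:
        return 1.0  # two empty transcripts carry no usable label
    return word_edit_distance(w1, w2) / max(len(w1), len(w2))


def filter_clips(clips: list[Clip], max_disagreement: float = 0.2):
    """Keep clips whose first two hypotheses agree closely (assumed check)."""
    kept = []
    for clip in clips:
        if len(clip.hypotheses) < 2:
            continue  # cannot cross-check a single transcript
        first, second = clip.hypotheses[:2]
        if disagreement(first, second) <= max_disagreement:
            kept.append((clip, first))  # use the first transcript as weak label
    return kept


if __name__ == "__main__":
    clips = [
        Clip("a", 0.5, ["one two three four", "one two three four"]),
        Clip("b", 0.5, ["hello world", "goodbye moon now"]),
    ]
    kept = filter_clips(clips)
    print([clip.clip_id for clip, _ in kept])
    print(sum(clip.hours for clip, _ in kept), "hours kept")
```

A filter of this kind trades quantity for label quality, which is consistent with the reduction from 30,000 raw hours to a 15,000-hour training set described above, although the real pipeline's criteria may differ.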
Munsit outperformed models from OpenAI, Microsoft, and Meta across multiple benchmarks. What does this achievement say about the future of Arabic AI innovation?
The future of Arabic AI is in our hands, and that’s exactly what this achievement proves. We can no longer afford to rely on technologies we don’t own or depend on third parties who don’t prioritize our region.
Munsit shows that we can build world-class AI, from the region, for the region — using local talent to solve local problems. It’s a clear signal that the next wave of Arabic AI innovation will come from within.
How do you see Munsit evolving in future versions, and what are the next frontiers for Arabic voice AI at CNTXT?
You’ll just have to wait and see. What I can say is that we have a fresh, new suite of Arabic-first AI solutions on the way — all powered by Munsit and other models we’re currently building at CNTXT AI. This is just the beginning.
You often speak about the importance of “sovereign AI.” What does that term mean to you, and why is it critical for the Gulf and broader MENA region?
To me, sovereign AI means having full ownership and control over the data, infrastructure, and models that shape our future. It’s critical because we need to own our own fate, and that starts with data.
Data sovereignty is everything. Data is precious, and we need to make sure it stays in our hands.
We can’t afford to hand over our future and sit idle while others build the technology for us. The future of AI in this region will come from this region. That’s exactly what we’re working toward.
How do you see CNTXT AI shaping the AI ecosystem in the Middle East over the next five years?
By enabling true AI readiness. We go in, understand what companies and governments need, build the data and AI strategies, and then help them build, test, deploy and scale.
If data is the new oil, then unstructured data is oil unrefined — full of potential but useless until processed. That’s why we’ve built CNTXT AI to help organizations clean, structure, and activate their data. Because that’s where real AI transformation begins.
From your vantage point as both an entrepreneur and investor, what advice would you give to other founders building AI startups in emerging markets?
Start now. Move quickly. Fail fast, learn faster, and keep iterating.
Most importantly, build for real problems. Stay close to the ground — listen to users, not just the hype. In emerging markets, relevance and adaptability are key.
Thank you for the great interview. Readers who wish to learn more should visit CNTXT AI.