Building a Multilingual LLM for Southeast Asia
Most global large language models (LLMs) are trained predominantly on English and Western-centric data. This raises an important question: what about users in Southeast Asia? The region is rich in languages and cultural nuances—so how well can today’s AI truly understand local users?
This challenge is what inspired AI Singapore to develop Sea Lion, an open-source language model built specifically for Southeast Asia. Dr. William Tjhi, Head of Applied Research at AI Singapore, and Potsawee Manakul, Senior AI Researcher at SCB 10X, shared the goals and challenges behind creating a model that “truly understands our own cultures.”
1️⃣ What’s the Problem with Today’s Global AI Models?
Most AI models are developed in the U.S. or China, and nearly 95% of their training data is in English. As a result, they often struggle to understand Southeast Asian users and local contexts.
Examples include:
- AI misidentifying Southeast Asian cultural clothing or food when generating images
- Incorrect or irrelevant business information about smaller SEA cities
- Responses that overlook cultural sensitivities or use a tone that would be considered rude or inappropriate in local contexts
2️⃣ How Does Sea Lion Solve This Problem?
Sea Lion tackles these issues through three main strategies:
- Using real linguistic and cultural data from Southeast Asia, combined with verification by local experts
- Training the AI on cultural appropriateness, teaching it how to respond respectfully and contextually in each SEA country
- Collaborating with regional partners, such as SCB 10X (Thailand) and Gojek (Indonesia), to ensure real users’ expectations are reflected in the model’s behavior
3️⃣ Key Challenges in Teaching AI to Understand Southeast Asian Languages & Cultures
Two major obstacles stand out:
- Mixed or hybrid languages
Countries like Singapore use Singlish, a blend of Chinese, Malay, and English, while the Philippines has Taglish. Training AI to understand and speak these naturally is difficult and highly complex.
- Lack of standard evaluation metrics
Global AI benchmarks do not work for SEA languages, as no standard tests exist. AI Singapore created a new evaluation suite called Seahound, developed with language experts to properly assess SEA language understanding.
4️⃣ Sea Lion Becomes Multimodal: Understanding Images in Southeast Asian Contexts
Sea Lion has expanded to multimodal capabilities—understanding both text and images. The main focus is image understanding rather than image generation, targeting real regional needs:
- Tourism / Culture / Food: Identifying historic sites, understanding local dishes, or suggesting food pairings
- Safety & cultural sensitivity: Detecting and filtering culturally inappropriate images, a necessity for many ASEAN countries
5️⃣ What’s Next for Sea Lion and AI Singapore (2025–2026)?
Sea Lion’s roadmap focuses on four pillars:
- Collaboration
Expanding partnerships with global players like Google and with SEA countries such as the Philippines.
- Value Creation
Building applications in infrastructure, public health, education, and public services.
- Safety
Strengthening alignment and adversarial robustness to ensure trustworthy AI.
- Resource Efficiency
Developing smaller, more efficient models, reflecting the region’s need for cost-effective AI solutions.
The Future: Cooperation Over Competition
Local AI models should collaborate rather than compete. Global LLMs excel at reasoning and coding, while regional LLMs provide cultural grounding. Together, they create products that are both powerful and locally relevant.
Sea Lion represents Southeast Asia’s commitment to building AI that is as smart as global models—but deeply connected to the lives, languages, and cultures of our region.
See more at: https://youtu.be/kS44VoIZT3Y?si=zQqfcsjHioa8A_GO





