Technology
December 16, 2025

Building Multilingual LLM for Southeast Asia

Most global large language models (LLMs) are trained predominantly on English and Western-centric data. This raises an important question: what about users in Southeast Asia? The region is rich in languages and cultural nuances—so how well can today’s AI truly understand local users?




This challenge is what inspired AI Singapore to develop Sea Lion, an open-source language model built specifically for Southeast Asia. Dr. William Tjhi, Head of Applied Research at AI Singapore, and Potsawee Manakul, Senior AI Researcher at SCB 10X, shared the goals and challenges behind creating a model that “truly understands our own cultures.”

1️⃣ What’s the Problem with Today’s Global AI Models?

Most AI models are developed in the U.S. or China, and nearly 95% of their training data is in English. As a result, they often struggle to understand Southeast Asian users and local contexts.

Examples include:

  • AI misidentifying Southeast Asian cultural clothing or food when generating images
  • Incorrect or irrelevant business information about smaller SEA cities
  • Responses that overlook cultural sensitivities or use tone that would be considered rude or inappropriate in local contexts

2️⃣ How Does Sea Lion Solve This Problem?

Sea Lion tackles these issues through three main strategies:

  1. Using real linguistic and cultural data from Southeast Asia, combined with verification by local experts
  2. Training the AI on cultural appropriateness, teaching it how to respond respectfully and contextually in each SEA country
  3. Collaborating with regional partners, such as SCB 10X (Thailand) and Gojek (Indonesia), to ensure real users’ expectations are reflected in the model’s behavior

3️⃣ Key Challenges in Teaching AI to Understand Southeast Asian Languages & Cultures

Two major obstacles stand out:

  • Mixed or hybrid languages
    Singapore has Singlish, a blend of Chinese, Malay, and English, while the Philippines has Taglish, a mix of Tagalog and English. Training AI to understand and produce these mixed languages naturally is difficult.
  • Lack of standard evaluation metrics
    Global AI benchmarks do not transfer well to SEA languages, and no standard tests existed for them. AI Singapore created a new evaluation suite called Seahound, developed with language experts, to properly assess SEA language understanding.
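To make the evaluation gap concrete, here is a minimal sketch of why per-language scoring matters: a single overall score can hide weak performance in one language. This is not how Seahound works; the function name, the language codes, and the sample records are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

def accuracy_by_language(records):
    """Compute exact-match accuracy per language.

    records: iterable of (language, expected, predicted) tuples.
    The matching rule (case-insensitive, whitespace-trimmed) is an
    assumption made for this sketch, not a published metric.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for language, expected, predicted in records:
        totals[language] += 1
        if expected.strip().lower() == predicted.strip().lower():
            correct[language] += 1
    return {lang: correct[lang] / totals[lang] for lang in totals}

# Hypothetical sample: the model does well in English but worse in Thai.
sample = [
    ("en", "yes", "yes"),
    ("en", "no", "no"),
    ("th", "yes", "no"),
    ("th", "no", "no"),
    ("id", "yes", "Yes"),
]
scores = accuracy_by_language(sample)
```

Reporting `scores` per language, rather than one averaged number, is what lets a weak language stand out.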

4️⃣ Sea Lion Becomes Multimodal: Understanding Images in Southeast Asian Contexts

Sea Lion has expanded to multimodal capabilities—understanding both text and images. The main focus is image understanding rather than image generation, targeting real regional needs:

  • Tourism / Culture / Food: Identifying historic sites, understanding local dishes, or suggesting food pairings
  • Safety & cultural sensitivity: Detecting and filtering culturally inappropriate images, a necessity for many ASEAN countries

5️⃣ What’s Next for Sea Lion and AI Singapore (2025–2026)?

Sea Lion’s roadmap focuses on four pillars:

  1. Collaboration
    Expanding partnerships with global players like Google and with SEA countries such as the Philippines.
  2. Value Creation
    Building applications in infrastructure, public health, education, and public services.
  3. Safety
    Strengthening alignment and adversarial robustness to ensure trustworthy AI.
  4. Resource Efficiency
    Developing smaller, more efficient models, reflecting the region’s need for cost-effective AI solutions.

The Future: Cooperation Over Competition

Local AI models should collaborate with global ones rather than compete with them. Global LLMs excel at reasoning and coding, while regional LLMs provide cultural grounding. Together, they create products that are both powerful and locally relevant.
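One way a product might combine the two kinds of model is to route each request to whichever is the better fit. The sketch below is only an illustration of that idea, not something described by AI Singapore: the `choose_model` function and the "regional" and "global" labels are hypothetical, and the script check is deliberately simple.

```python
# Unicode blocks for some SEA scripts. Latin-script languages such as
# Indonesian, Malay, Tagalog, or Vietnamese are NOT detected by this check.
SEA_SCRIPT_RANGES = [
    (0x0E00, 0x0E7F),  # Thai
    (0x0E80, 0x0EFF),  # Lao
    (0x1000, 0x109F),  # Myanmar
    (0x1780, 0x17FF),  # Khmer
]

def uses_sea_script(text):
    """Return True if any character falls in one of the ranges above."""
    return any(
        low <= ord(ch) <= high
        for ch in text
        for low, high in SEA_SCRIPT_RANGES
    )

def choose_model(prompt):
    """Return a hypothetical model label for the prompt."""
    return "regional" if uses_sea_script(prompt) else "global"
```

For example, `choose_model("สวัสดี")` picks the regional model, while an English coding request falls through to the global one. A real system would need a proper language identifier to handle Latin-script and code-mixed input.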

Sea Lion represents Southeast Asia’s commitment to building AI that is as smart as global models—but deeply connected to the lives, languages, and cultures of our region.

See more at: https://youtu.be/kS44VoIZT3Y?si=zQqfcsjHioa8A_GO

 
