Most LLMs ignore Baltic and Eastern European languages: TildeLM is the solution

Developed in Latvia with support from the EU’s top supercomputers and the European Commission, TildeLM is designed to serve the 250 million Europeans whose languages are ignored by mainstream AI.

We talk a lot about LLMs like ChatGPT and Claude, but far less attention goes to national language models.

Europe alone has 24 official EU languages, and over 80 spoken across the continent. 

However, commercial models are trained primarily on English and other major languages, while smaller or regional languages such as Latvian, Catalan, or Basque have limited digital representation.

Yet collectively, these "small" languages are spoken by over 250 million Europeans.

This means that English-centric models dominate AI, and there is a lack of digital equity.

Models like ChatGPT and DeepSeek struggle not only with language diversity but also with recognising local context. 

Further, most of these models are hosted outside the EU, in the US or China, which raises sovereignty and data privacy concerns. Governments cannot simply send sensitive documents abroad for processing.

To reset the balance, Latvian initiative Tilde is building TildeLM – an open-source foundational LLM with over 30 billion parameters designed for Baltic and Eastern European languages. 

A Baltic-born model for Europe’s underserved languages

Tilde brings together a team of over 150 dedicated professionals—including software engineers, researchers, computational linguists, professional translators, and business experts—working across offices in Riga, Vilnius, and Tallinn. Over the years, Tilde has developed a vast R&D partnership network with leading EU research centres and universities and serves as a language technology research hub for the Baltic region. 

I spoke to Toms Bergmanis, a researcher at Tilde, to find out more.

Bergmanis studied abroad in the UK and was always fascinated by the idea that we can build technologies that understand language better than we do. 

He admits:

 "I've never been good with languages—terrible at French. At 27, I found out I'm dyslexic, which explained why I always struggled with reading and writing. But it never stopped me from studying or diving deep into natural language processing (NLP). That's what I ended up doing my PhD in."

According to Bergmanis, LLMs "are great in English, German, and French, but smaller languages suffer."

"The models often make basic mistakes, like generating made-up words or failing to handle grammatical nuances, especially in languages with gendered cases or flexible word order like Latvian, Polish, or Russian."

When he returned to Latvia, he discovered that few local companies were working in AI or NLP. But he had some prior contacts, including a company started 30 years ago to support regional languages in the post-Soviet space. 

He recalled:

"Back then, most software was either outdated Russian or foreign and didn't support local languages."

The company started with keyboard layouts and font localisation, then moved into more advanced NLP, such as translation and grammar correction tools.

From fonts to foundational models

Today, Tilde has two central departments. 

One is language services, such as translation and transcription. The other is language technology, where it builds tools like grammar checkers, translation systems, and more advanced products like virtual assistants. For example, it built a virtual assistant for the European Space Agency that helps users browse its star photography database.

Tilde is creating smaller, locally deployable models that work without needing to send data to the cloud. They'll be open-source and freely available. 

Bergmanis explained:

 "Our business is not to commercialise the model itself, but to integrate it into products and offer localisation services. Some clients want models deployed within their own infrastructure — or at least within their national borders — because of data regulations."

Specifically, the model is on the larger side of "small," or the very small side of "large," so it can run on reasonably priced hardware. The project is also exploring compression techniques to reduce computational requirements. This makes it viable for clients without massive infrastructure.

Secure, customisable, and built for European sovereignty

TildeLM empowers both businesses and governments to build AI solutions that truly understand and reflect Europe's linguistic and cultural diversity. It offers custom AI solutions tailored to specific industries, workflows, and languages, ranging from virtual assistants and secure translation to speech technology and beyond.

For governments and public institutions, it's a platform for developing national language models that promote digital sovereignty, support public services, and ensure inclusion of all EU official languages.

Built for customisation, security, and multilingual capability, TildeLM enables organisations to fine-tune the model with their own data and deploy it securely, either on-premises or in the cloud.

The project targets Eastern European and Baltic languages such as Bulgarian, Croatian, Czech, Estonian, Finnish, Latvian, Lithuanian, Macedonian, Montenegrin, Polish, Serbian, Slovak, Slovene, and Ukrainian. The model will also support larger languages, such as English, French, German, and Russian, in balanced proportions to support translation and related multilingual tasks. 

TildeLM secures 2 million GPU hours on LUMI to champion multilingual AI 

Tilde was selected as a winner of the LARGE AI GRAND CHALLENGE, an EU-backed competition launched under the AI-BOOST project with support from the European Commission and the EuroHPC Joint Undertaking. The initiative aims to foster European leadership in the development of large-scale AI models. As part of the award, Tilde gained access to Europe's top supercomputers — LUMI and Leonardo — for a 12-month development period and is also collaborating with Germany's upcoming exascale system, JUPITER.

With over 2 million GPU hours allocated on the EuroHPC LUMI supercomputer and strong institutional backing, the project stands among the most ambitious AI efforts currently underway in Europe.

TildeLM has completed its foundational training and is now ready for fine-tuning, with applications such as contextual translation and question answering over documents. Demo applications will also showcase the model's capabilities.

Bergmanis admits:

 "The project is partly emotional for us. Coming from the Baltic states, it means a lot to build tools that support our own languages and those of our neighbours.

Europe deserves language technology that reflects its diversity, not just tools that work best in English."

A growing Pan-European movement 

Tilde is not the only European initiative focused on language equity in LLMs.

In 2023, SiloGen, the LLM arm of Europe's largest private AI lab, Silo AI (now Silo AMD since its acquisition by AMD), launched a consortium with TurkuNLP, a research group at the University of Turku, to develop a family of open LLMs for all EU languages, including Swedish, Danish, Icelandic, and Norwegian.

Further, ETH Zurich, in collaboration with École Polytechnique Fédérale de Lausanne (EPFL) and the Swiss National Supercomputing Centre (CSCS), is preparing to release a fully open-source LLM in late summer 2025.

Designed for broad public benefit, the model has been trained on more than 15 trillion tokens, covering over 1,500 languages, with support for more than 1,000 languages at deployment, making it one of the most multilingual LLMs to date. Two versions will be available: an 8 billion parameter model for lighter applications and a 70 billion parameter model, placing it among the most powerful open-source LLMs globally.
