Techno Blender
Digitally Yours.

AI model Poro sets new milestones for multilingual LLMs in Europe




Poro is a 34.2-billion-parameter model designed to process English, Finnish, and code. It was trained on a dataset of 1 trillion tokens.

“What we are proving with Poro is that we can build competitive models for low-resource languages, like Finnish,” Peter Sarlin, co-founder and CEO of Silo AI, told TNW.

Sarlin explained that high-resource languages like English dominate generic LLMs. As a result, these models’ capabilities in low-resource languages extend only as far as translation and do not represent the language and culture of a specific country.

According to the startup, Poro outperforms all existing open-source language models on Finnish, including Mistral, FinGPT, Llama, and the 176-billion-parameter BLUUMI model.

To achieve this, the team used a novel training approach that pairs Finnish with high-resource languages. It determined optimal data reuse frequencies for low-resource languages and integrated translated paired texts between Finnish and English. The method relies on cross-lingual signals to strengthen the model’s understanding of the connections between languages and, in turn, to boost performance in Finnish without compromising it in English.
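
To make the idea of data reuse concrete, here is a minimal Python sketch of how repeating a small corpus changes its share of a training mix. It is not Silo AI’s pipeline: the corpus names, token counts, and reuse factors are hypothetical values chosen only for illustration, and the article does not report the real proportions.

```python
from dataclasses import dataclass


@dataclass
class Corpus:
    name: str
    unique_tokens: float  # tokens available before any repetition
    epochs: float         # how many times the corpus is reused in training


def mixture_weights(corpora):
    """Return each corpus's share of the total tokens seen in training."""
    # Reusing a corpus multiplies the number of tokens it contributes.
    effective = {c.name: c.unique_tokens * c.epochs for c in corpora}
    total = sum(effective.values())
    return {name: tokens / total for name, tokens in effective.items()}


# Hypothetical figures, NOT taken from the article or from Poro's training setup.
corpora = [
    Corpus("english", 500e9, 1.0),
    Corpus("finnish", 30e9, 4.0),      # low-resource data repeated several times
    Corpus("code", 200e9, 1.0),
    Corpus("fi_en_pairs", 5e9, 2.0),   # translated Finnish-English text pairs
]

weights = mixture_weights(corpora)
for name, share in weights.items():
    print(f"{name}: {share:.1%}")
```

In this toy setup, repeating the Finnish data raises its share of the mix well above what its raw token count alone would give, which is the general effect a data reuse frequency is meant to control.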

Poro has also achieved another milestone: it’s the first multilingual model that has been trained on a EuroHPC supercomputer. “This is proof that we’re able to train LLMs on the AMD-based LUMI supercomputer, instead of an NVIDIA-based supercomputer,” Sarlin said.

A step towards European sovereignty

Open-source multilingual LLMs are key to ensuring language diversity, cultural representation, and democratic access in artificial intelligence. They’re also critical for Europe’s AI sovereignty.

“From a commercial perspective, these models build a baseline and infrastructure that allows European companies to innovate on top,” Sarlin noted. “This way companies can create IP, create competitive edge, and [create] great business that ensures that value stays in Europe with them.”

Poro is available for free under the Apache 2.0 License, which allows both commercial and research use. Silo AI is currently working on the Nordic languages (Swedish, Norwegian, Danish, and Icelandic) and plans to expand to all other official languages of the EU.



