Tests the performance of LLMs on zero-shot multiple-choice question answering and analyses how well their output follows the required format.
Languages covered: Czech, English, Estonian, Finnish, French, German, Hungarian, Latvian, Lithuanian, Polish, Russian, Ukrainian
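As a rough illustration of the two measurements described above (answer accuracy and adherence to the required output format), here is a minimal sketch. The format rule (a single option letter A-D), the pattern, and the function name `score_replies` are assumptions made for this example; the benchmark's actual answer format and scoring are not specified here.

```python
import re

# Assumed format rule (hypothetical): the reply must be exactly one uppercase
# option letter A-D, optionally followed by "." or ")".
ANSWER_PATTERN = re.compile(r"^\s*([A-D])[.)]?\s*$")


def score_replies(replies, gold):
    """Return (accuracy, format_compliance) for parallel lists of strings."""
    if len(replies) != len(gold):
        raise ValueError("replies and gold must have the same length")
    if not replies:
        raise ValueError("at least one reply is required")

    correct = 0
    well_formed = 0
    for reply, answer in zip(replies, gold):
        match = ANSWER_PATTERN.match(reply)
        if match is None:
            # A malformed reply counts as wrong and as a format failure.
            continue
        well_formed += 1
        if match.group(1) == answer:
            correct += 1

    total = len(replies)
    return correct / total, well_formed / total


accuracy, compliance = score_replies(
    ["B", "C", "The answer is A", "D."],
    ["B", "A", "A", "D"],
)
print(accuracy, compliance)
```

In this sketch a reply such as "The answer is A" is penalised on both measures even though it names the right option, which is one possible way to keep accuracy and format adherence visible as separate numbers.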
Tests the one-shot translation performance of LLMs. Sentences are translated without document context (a document-level benchmark is coming soon).
Languages covered: English, Estonian, French, German, Latvian, Lithuanian, Polish, Russian
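To make "one-shot" concrete: the model is shown a single example translation before the sentence to translate. The sketch below builds such a prompt. The wording of the prompt and the function name `build_one_shot_prompt` are hypothetical; the prompt actually used by the benchmark is not given here.

```python
def build_one_shot_prompt(src_lang, tgt_lang, example_src, example_tgt, sentence):
    """Build a sentence-level prompt containing exactly one worked example."""
    return (
        f"Translate from {src_lang} to {tgt_lang}.\n\n"
        # The single demonstration pair (the "one shot").
        f"{src_lang}: {example_src}\n"
        f"{tgt_lang}: {example_tgt}\n\n"
        # The sentence to translate, given without any document context.
        f"{src_lang}: {sentence}\n"
        f"{tgt_lang}:"
    )


prompt = build_one_shot_prompt(
    "English", "German",
    "Good morning.", "Guten Morgen.",
    "Thank you very much.",
)
print(prompt)
```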
Tests the robustness of machine translation models, LLMs, and commercial MT services on unseen data and tagged content, both of which production systems must routinely handle.
Languages covered: English, Latvian
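One simple way to check how a system handles tagged content is to verify that the inline tags in the source survive translation. The sketch below assumes XML/HTML-style tags and compares them as multisets, ignoring their order; the tag format and the criterion are assumptions for illustration, not the benchmark's documented method.

```python
import re
from collections import Counter

# Assumed tag shape (hypothetical): XML/HTML-like opening and closing tags.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^<>]*>")


def tags_preserved(source, translation):
    """Return True if both strings contain the same tags the same number of times."""
    source_tags = Counter(TAG_PATTERN.findall(source))
    translated_tags = Counter(TAG_PATTERN.findall(translation))
    return source_tags == translated_tags


print(tags_preserved("Click <b>Save</b> now.", "Noklikšķiniet uz <b>Saglabāt</b> tagad."))
print(tags_preserved("Click <b>Save</b> now.", "Noklikšķiniet uz Saglabāt tagad."))
```

Order is deliberately ignored because word order, and with it tag order, can legitimately change between languages; a stricter check would also verify that each tag still wraps the corresponding words.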
Compares LLM tokenizers by total number of tokens generated, tokens-per-word ratio, and vocabulary size.
Languages covered: Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Hungarian, Icelandic, Italian, Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Ukrainian
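The total token count and tokens-per-word ratio can be sketched as follows. Words are assumed to be whitespace-separated, and `char_bigram_tokenize` is a toy stand-in for a real LLM tokenizer; both are assumptions for illustration. Vocabulary size is a property of the tokenizer itself and is not computed here.

```python
def tokenizer_stats(tokenize, texts):
    """Compute total tokens and tokens per word for a tokenizer over some texts.

    `tokenize` is any callable mapping a string to a list of tokens.
    """
    total_tokens = sum(len(tokenize(text)) for text in texts)
    # Assumption: a "word" is a whitespace-separated chunk of text.
    total_words = sum(len(text.split()) for text in texts)
    if total_words == 0:
        raise ValueError("texts contain no words")
    return {
        "total_tokens": total_tokens,
        "total_words": total_words,
        "tokens_per_word": total_tokens / total_words,
    }


def char_bigram_tokenize(text):
    """Toy tokenizer: split every word into pieces of at most two characters."""
    pieces = []
    for word in text.split():
        pieces.extend(word[i:i + 2] for i in range(0, len(word), 2))
    return pieces


stats = tokenizer_stats(char_bigram_tokenize, ["labas rytas", "hello"])
print(stats)
```

A lower tokens-per-word ratio on the same text means the tokenizer represents that language more compactly, which is why the same texts should be used for every tokenizer being compared.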
Manual error analysis for Latvian, Lithuanian, and Estonian, with a focus on European LLMs.
Languages covered: Estonian, Latvian, Lithuanian
More benchmarks coming soon...