Whether your site accepts multiple languages or just a single one, you need to check the language for each item that gets submitted.
Implio has a built-in, AI-powered language detection feature that makes this process fully automated and highly accurate.
Automatic language detection
Implio automatically detects the language in which the title and body of an item are written. The result is exposed as a BLANG variable, using ISO 639-1 codes. You can then create automation rules that make use of its value.
You can also specify the language that you expect the title and body to be written in using the content.languageExpected API input field (see API documentation). If specified, this value will be taken into consideration when the language is automatically detected, making the language prediction more reliable.
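As a rough illustration, the sketch below builds an item payload that carries an expected language. Only the `content.languageExpected` field name comes from this article. The nesting under a `content` object is inferred from the dotted name, and the `id`, `title` and `body` keys and the `build_item` helper are hypothetical, so check the API documentation for the real request shape.

```python
import json
import re
from typing import Optional

# ISO 639-1 codes are two lowercase ASCII letters (e.g. "en", "fr").
ISO_639_1 = re.compile(r"[a-z]{2}")


def build_item(item_id: str, title: str, body: str,
               language_expected: Optional[str] = None) -> dict:
    """Build a hypothetical item payload.

    Assumption: only the `content.languageExpected` field name is taken from
    the article; every other key here is illustrative.
    """
    content = {"title": title, "body": body}
    if language_expected is not None:
        # Reject values that are not shaped like an ISO 639-1 code.
        if not ISO_639_1.fullmatch(language_expected):
            raise ValueError(f"not an ISO 639-1 code: {language_expected!r}")
        content["languageExpected"] = language_expected
    return {"id": item_id, "content": content}


if __name__ == "__main__":
    item = build_item("ad-1", "Vélo à vendre", "Très bon état.", "fr")
    print(json.dumps(item, ensure_ascii=False, indent=2))
```

Leaving `language_expected` unset simply omits the field, which matches the case where no expected language is specified.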
The expected language is also available as a corresponding BLANG variable, so you can compare it against the detected language:
Name (simple mode) | Variable name (advanced mode) | Description | Type | Example |
---|---|---|---|---|
Language detected | $text.languageDetected | Contains the ISO-639-1 code of the language in which the title and the body were detected to be written, or the string "unknown" if no language could be detected. | String | "fr" |
Language expected | $text.languageExpected | Contains the ISO-639-1 code of the language in which the title and the body are expected to be written, or <undefined> if no value was specified in the content.languageExpected API input field. | String | "en" |
Making use of the detected language
The following automation rule ensures that the detected language matches the expected one:
$text.languageDetected EQUALS $text.languageExpected
This rule handles cases where no language was predicted:
$text.languageDetected EQUALS "unknown"
This rule ensures that the language is either English or French:
$text.languageDetected CONTAINS ("en", "fr")
If you have a longer list of supported languages, you can add them to a list:
$text.languageDetected CONTAINS @supportedLanguages
You may also combine this variable with other ones to create more complex conditions.
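To reason about these conditions outside Implio, for example when drafting a moderation policy, the checks can be mirrored in a few lines of Python. This is only a sketch of the logic described above, not Implio's rule engine. The function names are hypothetical, and treating a missing expected language as a non-match is an assumption.

```python
from typing import Iterable, Optional

# Value the article gives for $text.languageDetected when detection fails.
UNKNOWN = "unknown"


def matches_expected(detected: str, expected: Optional[str]) -> bool:
    """Mirror of: $text.languageDetected EQUALS $text.languageExpected.

    Assumption: an unspecified expected language never matches.
    """
    return expected is not None and detected == expected


def is_undetected(detected: str) -> bool:
    """Mirror of: $text.languageDetected EQUALS "unknown"."""
    return detected == UNKNOWN


def is_supported(detected: str, supported: Iterable[str]) -> bool:
    """Mirror of the check against a list of supported languages."""
    return detected in set(supported)


if __name__ == "__main__":
    supported_languages = ["en", "fr"]
    for detected, expected in [("fr", "fr"), ("de", "en"), ("unknown", None)]:
        print(detected,
              matches_expected(detected, expected),
              is_undetected(detected),
              is_supported(detected, supported_languages))
```

Keeping the three checks as separate predicates makes it easy to combine them with other conditions, much like combining variables in a rule.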
Supported languages
Automatic language detection currently supports the following 123 languages:
Language | ISO 639-1 code |
---|---|
Afrikaans | af |
Albanian | sq |
Amharic | am |
Arabic | ar |
Aragonese | an |
Armenian | hy |
Assamese | as |
Avaric | av |
Azerbaijani | az |
Bashkir | ba |
Basque | eu |
Belarusian | be |
Bengali | bn |
Bihari languages | bh |
Bosnian | bs |
Breton | br |
Bulgarian | bg |
Burmese | my |
Catalan | ca |
Central Khmer | km |
Chechen | ce |
Chinese | zh |
Chuvash | cv |
Cornish | kw |
Corsican | co |
Croatian | hr |
Czech | cs |
Danish | da |
Divehi | dv |
Dutch | nl |
English | en |
Esperanto | eo |
Estonian | et |
Finnish | fi |
French | fr |
Gaelic | gd |
Galician | gl |
Georgian | ka |
German | de |
Greek | el |
Guarani | gn |
Gujarati | gu |
Haitian | ht |
Hebrew | he |
Hindi | hi |
Hungarian | hu |
Icelandic | is |
Ido | io |
Indonesian | id |
Interlingua | ia |
Interlingue | ie |
Irish | ga |
Italian | it |
Japanese | ja |
Javanese | jv |
Kannada | kn |
Kazakh | kk |
Kirghiz | ky |
Komi | kv |
Korean | ko |
Kurdish | ku |
Lao | lo |
Latin | la |
Latvian | lv |
Limburgan | li |
Lithuanian | lt |
Luxembourgish | lb |
Macedonian | mk |
Malagasy | mg |
Malay | ms |
Malayalam | ml |
Maltese | mt |
Manx | gv |
Marathi | mr |
Mongolian | mn |
Nepali | ne |
Norwegian | no |
Norwegian Nynorsk | nn |
Occitan | oc |
Oriya | or |
Ossetian | os |
Panjabi | pa |
Pashto | ps |
Persian | fa |
Polish | pl |
Portuguese | pt |
Quechua | qu |
Romanian | ro |
Romansh | rm |
Russian | ru |
Sanskrit | sa |
Sardinian | sc |
Serbian | sr |
Serbo-Croatian | sh |
Sindhi | sd |
Sinhala | si |
Slovak | sk |
Slovenian | sl |
Somali | so |
Spanish | es |
Sundanese | su |
Swahili | sw |
Swedish | sv |
Tagalog | tl |
Tajik | tg |
Tamil | ta |
Tatar | tt |
Telugu | te |
Thai | th |
Tibetan | bo |
Turkish | tr |
Turkmen | tk |
Uighur | ug |
Ukrainian | uk |
Urdu | ur |
Uzbek | uz |
Vietnamese | vi |
Volapük | vo |
Walloon | wa |
Welsh | cy |
Western Frisian | fy |
Yiddish | yi |
Yoruba | yo |
Known limitations
Automatic language detection uses state-of-the-art machine learning for accurate predictions. But as with any predictive model, it may sometimes mistake one language for another, especially when those languages are very similar (e.g. Russian and Ukrainian).
One way to avoid some of those mistakes is to specify the expected language (see above).
Also, a language may not always be predicted, for instance when the text is very short. You can choose how to handle those cases in your automation rules by testing $text.languageDetected for the "unknown" string (see example above).