How to detect toxic language using Text Vision

Whether you are running a dating site, a classifieds site or another type of marketplace, you are likely to encounter toxic language and need to deal with it to keep your site clean and protect your users.

This article explains how to use Implio's built-in Text Vision filters to detect and block offensive content before it reaches your site.

Target audience

Toxic language filters can be useful for the following types of marketplaces and content:

Type of marketplace
  • Classifieds site
  • Dating site
  • Sharing economy site
Type of content
  • Classified ad
  • Profile description
  • One-to-one message

Supported fields and languages

Toxic language filters operate on the following API input fields:

  • content.title
  • content.body

The following languages are currently supported (ISO 639-1 code in parentheses):

  • English (en)
  • French (fr)
  • Spanish (es)

Not seeing the language you are looking for? Reach out to our support team to learn more about upcoming languages!

How to use toxic language filters

Before you start

Implio's built-in Text Vision filters are language-specific, so they rely on automatic language detection.
For optimal results, set the content.languageExpected API input field to make language detection more reliable. See How to check the language in which users write for more information.
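As an illustration, the Python sketch below assembles the content part of a request using the three input fields named in this article. It assumes that the dotted names (content.title, content.body, content.languageExpected) map to a nested JSON object and that content.languageExpected takes a list of ISO 639-1 codes; both are assumptions, and the rest of the request (endpoint, authentication, other fields) is deliberately left out. Check the Implio API reference before relying on this shape.

```python
import json

# ISO 639-1 codes of the languages supported by the toxic language filters,
# taken from the supported languages listed above.
SUPPORTED_LANGUAGES = {"en", "fr", "es"}


def build_content(title: str, body: str, expected_languages: list[str]) -> dict:
    """Build a hypothetical "content" object for an Implio API request.

    Assumption: content.languageExpected is a list of ISO 639-1 codes.
    """
    unsupported = [code for code in expected_languages if code not in SUPPORTED_LANGUAGES]
    if unsupported:
        # Not a request error: the toxic language counts would simply stay at 0.
        print(f"Warning: toxic language filters do not support {unsupported}")
    return {
        "content": {
            "title": title,                          # content.title
            "body": body,                            # content.body
            "languageExpected": expected_languages,  # content.languageExpected (assumed format)
        }
    }


payload = build_content("Bike for sale", "Barely used, pick-up only.", ["en"])
print(json.dumps(payload, indent=2))
```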

BLANG variables

Toxic language filters are exposed as several BLANG variables, each corresponding to a different kind of toxic language.
Each variable is an integer containing the number of terms found in the text, or 0 if no term matched or if the language isn't supported:

  • $text.blasphemyCount: number of blasphemy terms detected in the text
  • $text.sexualTermCount: number of sexual terms detected in the text
  • $text.badWordCount: number of bad words detected in the text
  • $text.violenceTermCount: number of terms related to violence detected in the text
  • $text.extremismTermCount: number of terms related to extremism detected in the text
  • $text.racismTermCount: number of terms related to racism detected in the text

Note that each term or expression is counted at most once across the above variables. In other words, there is no overlap between the different filters.

Additionally, the following variable contains the sum of all the variables listed above:

  • $text.toxicTermCount: total number of toxic terms detected in the text (integer; 0 if no term matched or if the language isn't supported)
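To make the relationship between these variables concrete, the Python sketch below models them as a plain data structure. The class and field names are hypothetical and only mirror the BLANG variable names; the sketch encodes the documented properties: each count is a non-negative integer that stays at 0 for unsupported languages, and $text.toxicTermCount is the sum of the six category counts.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ToxicCounts:
    """Hypothetical mirror of the $text.*Count BLANG variables."""

    blasphemy_count: int = 0       # $text.blasphemyCount
    sexual_term_count: int = 0     # $text.sexualTermCount
    bad_word_count: int = 0        # $text.badWordCount
    violence_term_count: int = 0   # $text.violenceTermCount
    extremism_term_count: int = 0  # $text.extremismTermCount
    racism_term_count: int = 0     # $text.racismTermCount

    def __post_init__(self) -> None:
        # Counts of detected terms can never be negative.
        for f in fields(self):
            if getattr(self, f.name) < 0:
                raise ValueError(f"{f.name} must be non-negative")

    @property
    def toxic_term_count(self) -> int:
        """$text.toxicTermCount: the sum of all category counts.

        Each term is counted by at most one category, so the sum
        does not count any term twice.
        """
        return sum(getattr(self, f.name) for f in fields(self))


# Text in an unsupported language: every count keeps its default of 0.
unsupported = ToxicCounts()
# Text with one bad word and two violence-related terms.
sample = ToxicCounts(bad_word_count=1, violence_term_count=2)
print(unsupported.toxic_term_count, sample.toxic_term_count)  # 0 3
```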

Filtering toxic language using rules

This BLANG condition will pick up any occurrence of toxic language in the text:

$text.toxicTermCount>0

which is strictly equivalent to:

$text.blasphemyCount>0 OR $text.badWordCount>0 OR $text.sexualTermCount>0 OR $text.violenceTermCount>0 OR $text.extremismTermCount>0 OR $text.racismTermCount>0
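The equivalence holds because a sum of non-negative integers is greater than zero exactly when at least one of its terms is. The short Python check below confirms this by brute force over small count values; it only illustrates the arithmetic and says nothing about how Implio evaluates BLANG internally.

```python
from itertools import product


def sum_condition(counts: tuple[int, ...]) -> bool:
    # Mirrors: $text.toxicTermCount>0
    return sum(counts) > 0


def or_condition(counts: tuple[int, ...]) -> bool:
    # Mirrors: $text.blasphemyCount>0 OR ... OR $text.racismTermCount>0
    return any(count > 0 for count in counts)


# Every combination of 0, 1 or 2 hits across the six categories (3**6 = 729 cases).
cases = list(product(range(3), repeat=6))
mismatches = [c for c in cases if sum_condition(c) != or_condition(c)]
print(len(cases), len(mismatches))  # 729 0
```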

Depending on which types of toxic language you wish to filter out, you may remove some of the variables from the condition (you may, for instance, consider blasphemy acceptable) or split the condition into several rules with different actions.
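For example, a site that tolerates blasphemy but wants to treat hate-related terms more strictly could split the condition into two rules. The Python sketch below emulates such a split; the rule names, the grouping of categories and the chosen actions are hypothetical, and in Implio each rule would be a separate BLANG condition (shown in the comments) with its own action.

```python
from typing import Callable

Counts = dict[str, int]

# Each hypothetical rule: (name, condition, action). The conditions mirror
# BLANG expressions; blasphemyCount is deliberately left out of both.
RULES: list[tuple[str, Callable[[Counts], bool], str]] = [
    (
        "Hate speech",
        # $text.extremismTermCount>0 OR $text.racismTermCount>0
        lambda c: c["extremismTermCount"] > 0 or c["racismTermCount"] > 0,
        "Refuse",
    ),
    (
        "Other toxic language",
        # $text.badWordCount>0 OR $text.sexualTermCount>0 OR $text.violenceTermCount>0
        lambda c: c["badWordCount"] > 0 or c["sexualTermCount"] > 0 or c["violenceTermCount"] > 0,
        "Send to manual",
    ),
]


def matching_rules(counts: Counts) -> list[tuple[str, str]]:
    """Return (rule name, action) for every rule whose condition matches."""
    return [(name, action) for name, condition, action in RULES if condition(counts)]


counts = {
    "blasphemyCount": 2, "sexualTermCount": 0, "badWordCount": 1,
    "violenceTermCount": 0, "extremismTermCount": 0, "racismTermCount": 0,
}
print(matching_rules(counts))  # [('Other toxic language', 'Send to manual')]
```

Here the two blasphemy terms trigger nothing on their own; only the bad word sends the content to manual review.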

Setting the rule's action

Mild profanity and other kinds of toxic language can sometimes be acceptable, depending on the context in which they are used and how much you tolerate on your site.
For instance, the use of a common word like 'crap' may not be reason enough to refuse a piece of content.

For this reason, it is preferable to set the corresponding rule's action to Send to manual rather than Refuse, so that the content can be reviewed by a moderator.

Finally, you may decide to refuse content that contains multiple occurrences of toxic language. You can do so by adding a rule such as:

$text.toxicTermCount>=3

and setting its action to Refuse.
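Taken together, the two rules described in this section form a simple threshold scheme: three or more toxic terms lead to a refusal, one or two lead to manual review. The Python sketch below expresses that scheme. The function name and the "No action" outcome are illustrative, and checking the stricter rule first is a choice made in the sketch, not a description of how Implio resolves several matching rules.

```python
def toxic_language_action(toxic_term_count: int) -> str:
    """Map $text.toxicTermCount to the actions suggested in this article.

    Hypothetical helper: the stricter rule is checked first, so content
    matching both conditions is reported as refused.
    """
    if toxic_term_count >= 3:  # $text.toxicTermCount>=3  -> Refuse
        return "Refuse"
    if toxic_term_count > 0:   # $text.toxicTermCount>0   -> Send to manual
        return "Send to manual"
    return "No action"         # no toxic term found, or unsupported language


for count in (0, 1, 2, 3, 5):
    print(count, toxic_language_action(count))
```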

Known limitations

Our Text Vision filters have been meticulously crafted by our team of linguists and data scientists and tested against large corpora of user-generated content.

However, they may sometimes produce false positives. Conversely, they may miss some terms or expressions.

We update Text Vision filters regularly. We value and welcome your feedback to help us improve Implio.
