Image Vision (AI) models and features

Implio comes with a wide range of Image Vision models carefully crafted by our team of Data Scientists and moderation experts. These models allow you to make sense of user-generated images and handle their moderation specifically, based on what they contain.

This page describes all available models and how to use their predictions.

Note that these models need to be enabled before they can be used. To do so, please contact our support team.

Types of model

There are two types of models, classification and detection:

  • Classification models yield predictions on the entire image.  
    Here's how they are represented in the Implio user interface:
  • Detection models, on the other hand, detect individual objects or elements within the image.  
    Here's how they are represented in the Implio user interface:

Detection models come with additional features – see the Detection models features section below for more information on how to use their unique features.

List of available models

The following table lists all currently available models, organized by topic.

Each model may yield one or several tags per image. The last column explains how these tags should be interpreted:

| Topic | Model | Type | Tags | How to interpret tags |
| --- | --- | --- | --- | --- |
| Nudity | Nudity | Classification | nudity | Image contains full-on nudity. |
| Nudity | Suggestive | Classification | suggestive | Image is suggestive/racy/sexy. |
| Nudity | Sex toy | Classification | sex_toy | Image contains a sex toy. |
| Vulgarity | Middle finger | Classification | middle_finger | Image contains a hand with its middle finger raised. |
| Vulgarity | Tongue out | Classification | tongue_out | Image contains a person with their tongue out. |
| Violence & extremism | Gore | Classification | gore | Image contains elements of gore or violence. |
| Violence & extremism | Weapon | Classification | weapon | Image contains a weapon. |
| Violence & extremism | Nazi | Classification | nazi | Image contains Nazi symbols, figures or propaganda. |
| Violence & extremism | Communism | Classification | communism | Image contains communist symbols, figures or propaganda. |
| Violence & extremism | Daech | Classification | daech | Image contains Daech/ISIS symbols, figures or propaganda. |
| Violence & extremism | Terrorist | Classification | terrorist | Image contains a terrorist. |
| Substances | Drug | Classification | drug | Image contains drugs/substances that may be prohibited. |
| Substances | Marijuana | Classification | marijuana | Image contains marijuana or references to it. |
| Substances | Tobacco | Classification | tobacco | Image contains tobacco products or people smoking. |
| Substances | Alcohol | Classification | alcohol | Image contains alcoholic beverages or people drinking alcohol. |
| Faces | Face | Detection | face | Image contains a human face (face is visible). |
| Faces | Gender | Classification | male, female | One of the faces detected looks like a male/female. |
| Faces | Minor | Classification | minor | One of the faces detected looks like a minor (less than 18 years old). |
| People | Child | Classification | child | Image contains a child (face is not necessarily visible). |
| People | Ski mask | Classification | ski_mask | Image contains a ski mask or a person wearing a ski mask. |
| Fake / misrepresentation | Stock photo | Classification | stock_photo | Image looks like a stock photo. |
| Fake / misrepresentation | Model | Classification | male_model, female_model | Image looks like a photo of a male/female model. |
| Art | Artwork | Classification | artwork | Image contains artwork: painting, drawing, or other artistic work. |
| Art | Painting | Classification | painting | Image contains a painting. |
| Art | Drawing | Classification | drawing | Image contains a drawing. |
| Art | Manga | Classification | manga | Image contains a manga (book) or manga art. |
| Art | Comic art | Classification | comic_art | Image contains a comic book or comic art. |
| Art | Statue | Classification | statue | Image contains a statue. |
| Text & logos | Text OCR | Detection | text.ocr | Image contains text. |
| Text & logos | Handwritten text | Classification | handwritten_text | Image contains handwritten text. |
| Text & logos | Watermark | Classification | watermark | Image contains a watermark (text or logo). |
| PII & contact information | License plate | Classification | license_plate | Image contains a license plate. |
| PII & contact information | Contact info | Classification | contact_info | Image contains contact information. |
| PII & contact information | Phone number | Classification | phone_number | Image contains a phone number. |
| PII & contact information | QR code | Classification | qr_code | Image contains a QR code. |
| PII & contact information | Facebook profile | Classification | facebook_profile | Image looks like a Facebook profile. |
| PII & contact information | Instagram profile | Classification | instagram_profile | Image looks like an Instagram profile. |
| PII & contact information | User profile | Classification | user_profile | Image looks like a user profile of some sort. |
| PII & contact information | Social profile | Classification | social_profile | Image looks like a social media profile of some sort. |
| Image characteristics | Low quality | Classification | low_quality | Image is of poor quality (under/overexposed, grainy, etc). |
| Image characteristics | Orientation | Classification | misoriented | Orientation of image is incorrect (off by 90, 180 or 270 degrees). |
| Image characteristics | Solid color | Classification | solid_white, solid_black | Image is a solid white/black image. |
| Image characteristics | Screenshot | Classification | screenshot | Image is a screenshot. |
| Vehicles | Car | Classification | car | Image contains a car. |
| Vehicles | Car interior | Classification | car_interior | Image contains a car interior. |

Making use of predictions in automation rules

This section describes how to leverage the output of the above-listed models.

Image tags

Each of the above-listed models can output one or several image tags for each image of an item submitted to Implio for moderation.

These tags are exposed via the following automation variables:

  • $images[0].tags
  • $images[1].tags

These tags can then be queried using the CONTAINS operator. For instance, you can determine whether the item's first image contains nudity or is suggestive using the following expression:

$images[0].tags CONTAINS ("nudity", "suggestive")

In addition, the $images.tags variable contains the concatenation of tags across all images of an item.

For instance, the following expression checks whether any of the item's images contains nudity or is suggestive:

$images.tags CONTAINS ("nudity", "suggestive")
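To make the difference between per-image and item-level tags concrete, here is a minimal Python sketch of the matching described above. It is an illustrative model, not Implio code: the helper name contains_any and the sample tags are hypothetical, and it assumes that CONTAINS with a list matches as soon as at least one of the listed tags is present, as the "contains nudity or is suggestive" wording suggests.

```python
# Illustrative model only; not Implio's implementation.
def contains_any(tags, wanted):
    """Return True if at least one of the wanted tags is present in tags."""
    return any(tag in tags for tag in wanted)


# Hypothetical item with two images and the tags predicted for each.
image_tags = [
    ["car"],                 # plays the role of $images[0].tags
    ["suggestive", "face"],  # plays the role of $images[1].tags
]

# Plays the role of $images.tags: the tags of all images, concatenated.
all_tags = [tag for tags in image_tags for tag in tags]

first_image_match = contains_any(image_tags[0], ["nudity", "suggestive"])
any_image_match = contains_any(all_tags, ["nudity", "suggestive"])
print(first_image_match, any_image_match)
```

In this sample, only the second image carries a matching tag, so the per-image check on the first image fails while the item-level check succeeds.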

These tags are calibrated for high precision (typically around 90%), so you should get a relatively low proportion of false positives.

Uncertain image tags

In addition, image models will output specific tags when the confidence of the prediction is low.

Those tags are exposed via the $images[0].uncertain_tags, $images[1].uncertain_tags, etc. and $images.uncertain_tags automation variables.

They are used in the exact same way as regular (high precision) image tags.  
For instance, the following expression will match lower-precision nudity or suggestive images:

$images[0].uncertain_tags CONTAINS ("nudity", "suggestive")

You should use uncertain tags if you need maximum recall (i.e. catching as many images as possible that contain the notion you are looking for) at the expense of precision. In other words, you will get a higher proportion of false positives using uncertain tags compared to normal (high-precision) tags.
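The trade-off between precision and recall can be illustrated with a small worked example in Python. All counts below are made up for illustration and are not measurements of Implio's models; only the two standard formulas are certain.

```python
def precision(true_pos, false_pos):
    """Share of flagged images that really contain the notion."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Share of images containing the notion that were flagged."""
    return true_pos / (true_pos + false_neg)


# Hypothetical batch in which 100 images truly contain nudity.
# Rule on normal tags only: 60 images flagged, 54 of them correctly.
normal_precision = precision(true_pos=54, false_pos=6)
normal_recall = recall(true_pos=54, false_neg=46)

# Rule on normal + uncertain tags: 150 images flagged, 90 of them correctly.
combined_precision = precision(true_pos=90, false_pos=60)
combined_recall = recall(true_pos=90, false_neg=10)

print(normal_precision, normal_recall)
print(combined_precision, combined_recall)
```

With these invented counts, adding uncertain tags catches more of the relevant images but a larger share of the flagged images are false positives.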

Items with uncertain tags are typically sent for manual review, so moderators can have a closer look at images and determine how corresponding items should be handled.

You can create separate automation rules for normal and uncertain image tags and set different actions for them. For instance, you could automatically reject items with normal tags, and send those with uncertain tags to a manual moderation queue for closer inspection. 
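The two-rule setup described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the function route_item and the action names "reject", "manual_review" and "no_match" are hypothetical, and the rule on normal tags is assumed to take priority over the rule on uncertain tags.

```python
# Illustrative model only; action names and rule priority are assumptions.
WATCHED_TAGS = ("nudity", "suggestive")


def route_item(tags, uncertain_tags, watched=WATCHED_TAGS):
    """Decide what to do with an item from its normal and uncertain tags."""
    if any(tag in tags for tag in watched):
        # High-precision match: safe to act automatically.
        return "reject"
    if any(tag in uncertain_tags for tag in watched):
        # Low-confidence match: let a moderator take a closer look.
        return "manual_review"
    return "no_match"


print(route_item(tags=["nudity"], uncertain_tags=[]))
print(route_item(tags=["car"], uncertain_tags=["suggestive"]))
print(route_item(tags=["car"], uncertain_tags=[]))
```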

Detection models features

Unlike classification models, which predict whether the entire image contains the desired notion, detection models detect individual objects within images.

Additional pieces of information – object count and area – are available via specific automation variables.

Optical Character Recognition variables

The Text OCR model exposes the text recognized in images through additional variables. Because these variables contain text, they can be used with the LENGTH function.

| Variable name | Possible values | Description |
| --- | --- | --- |
| $images[n].text.ocr | string | Contains the characters recognized in image n. For instance, $images[2].text.ocr contains “Hello”. |
| $images.text.ocr | string | Contains the characters recognized in all images that the item contains. This is the concatenation of the above $images[n].text.ocr variables, each separated by a line break. For instance, if $images[1].text.ocr contains “Bonjour” and $images[2].text.ocr contains “Hello”, then $images.text.ocr contains both strings, separated by a line break. |


Sample rule expressions

For instance, the following expression will match items whose images contain 30 or more characters of recognized text in total:

LENGTH($images.text.ocr) >= 30

The following expression will match items with an image containing a sequence of 6 or more consecutive digits:

$images.text.ocr CONTAINS /\d{6,}/
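For readers who want to test such conditions outside Implio, the two expressions above can be approximated in Python. This is a sketch, not Implio code: the helper names and the sample OCR strings are hypothetical, and it assumes that the regular expression behaves like Python's re module for this simple pattern.

```python
import re


def has_long_text(ocr_text, min_length=30):
    """Rough equivalent of LENGTH($images.text.ocr) >= 30."""
    return len(ocr_text) >= min_length


def has_digit_run(ocr_text, min_digits=6):
    """Rough equivalent of $images.text.ocr CONTAINS /\\d{6,}/."""
    return re.search(r"\d{%d,}" % min_digits, ocr_text) is not None


# Hypothetical OCR output for two images, joined by a line break
# as described for $images.text.ocr.
per_image_text = ["Bonjour", "Call 0612345678"]
item_text = "\n".join(per_image_text)

print(has_long_text(item_text))
print(has_digit_run(item_text))
```

Note that the line break counts as a character in the combined text, and that a digit sequence split across two images would not form a single run.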

Object count

The number of objects detected in images is exposed via the following variables:

| Variable name | Possible values | Description |
| --- | --- | --- |
| $images[n].<tag>.count | integer | Number of <tag> objects detected in image n. For instance, $images[0].face.count will contain the number of faces detected in the item's first image. |
| $images.<tag>.count | integer | Number of <tag> objects detected across all of the item's images. For instance, considering an item with 2 images containing 3 faces each, $images.face.count will equal 6. |
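The relationship between per-image and item-level counts can be modelled with a few lines of Python. The data structure below (a list of detections per image, each with a "tag" key) is a hypothetical stand-in for the detection output, not Implio's actual format.

```python
def count_objects(detections, tag):
    """Number of detected objects with the given tag in one image."""
    return sum(1 for detection in detections if detection["tag"] == tag)


# Hypothetical item with 2 images containing 3 faces each.
item_images = [
    [{"tag": "face"}, {"tag": "face"}, {"tag": "face"}],
    [{"tag": "face"}, {"tag": "face"}, {"tag": "face"}],
]

# Plays the role of $images[n].face.count for each image n.
faces_per_image = [count_objects(dets, "face") for dets in item_images]

# Plays the role of $images.face.count: the sum over all images.
total_faces = sum(faces_per_image)
print(faces_per_image, total_faces)
```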

Object area

The following variables can be used to match images based on the proportion represented by <tag> object(s) over the total image area.

The area for a given tag object (e.g. a face) is calculated as the number of pixels represented by the object's bounding box, divided by the total number of pixels that the image contains. The result is a floating-point number between 0 and 1.
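As a worked example of this calculation, the following Python sketch computes the area ratio of a single bounding box. The function name, its parameters and the pixel dimensions are hypothetical and chosen only to illustrate the formula.

```python
def box_area_ratio(box_width, box_height, image_width, image_height):
    """Bounding-box pixels divided by total image pixels (between 0 and 1)."""
    return (box_width * box_height) / (image_width * image_height)


# Hypothetical face bounding box of 200 x 300 pixels
# in an image of 1000 x 600 pixels.
ratio = box_area_ratio(200, 300, 1000, 600)
print(ratio)
```

Here the box covers 60,000 of the image's 600,000 pixels, that is 10% of the image area.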

| Variable name | Possible values | Description |
| --- | --- | --- |
| $images[n].<tag>.area | float [0-1]; 0 if image n doesn’t contain any <tag> object. | Proportion of image area represented by all <tag> objects in image n. For instance, if image 0 contains two faces, one representing 10% of the image area and the other 5%, then $images[0].face.area will equal 0.15. |
| $images[n].<tag>.minObjectArea | float [0-1]; n/a (no value) if image n doesn’t contain any <tag> object. | Proportion of image area represented by the smallest <tag> object in image n. For instance, if image 0 contains two faces, one representing 10% of the image area and the other 5%, then $images[0].face.minObjectArea will equal 0.05. |
| $images[n].<tag>.maxObjectArea | float [0-1]; n/a (no value) if image n doesn’t contain any <tag> object. | Proportion of image area represented by the largest <tag> object in image n. For instance, if image 0 contains two faces, one representing 10% of the image area and the other 5%, then $images[0].face.maxObjectArea will equal 0.1. |
| $images.<tag>.area | float [0-1]; 0 if none of the item's images contain a <tag> object. | Average area represented by <tag> objects across all of the item's images. For instance, if an item contains 2 images with one face in each, representing respectively 10% of the first image area and 5% of the second image area, then $images.face.area will equal 0.075. |
| $images.<tag>.minArea | float [0-1]; 0 if none of the item's images contain a <tag> object. | Minimum area represented by a <tag> object across all of the item's images. For instance, if an item contains 2 images with one face in each, representing respectively 10% of the first image area and 5% of the second image area, then $images.face.minArea will equal 0.05. |
| $images.<tag>.maxArea | float [0-1]; 0 if none of the item's images contain a <tag> object. | Maximum area represented by a <tag> object across all of the item's images. For instance, if an item contains 2 images with one face in each, representing respectively 10% of the first image area and 5% of the second image area, then $images.face.maxArea will equal 0.10. |
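The way these variables relate to each other can be sketched in Python, using the examples given in the descriptions. This is a simplified model rather than Implio's implementation: the function names are hypothetical, None stands in for "n/a (no value)", and the item-level average is assumed to be taken over all of the item's images, which the documentation does not spell out for images without any object.

```python
def image_stats(object_areas):
    """Per-image values from the area ratios of the objects in one image."""
    if not object_areas:
        return {"area": 0.0, "minObjectArea": None, "maxObjectArea": None}
    return {
        "area": sum(object_areas),
        "minObjectArea": min(object_areas),
        "maxObjectArea": max(object_areas),
    }


def item_stats(images):
    """Item-level values from a list of per-image object area lists."""
    all_objects = [area for areas in images for area in areas]
    if not all_objects:
        return {"area": 0.0, "minArea": 0.0, "maxArea": 0.0}
    per_image_area = [image_stats(areas)["area"] for areas in images]
    return {
        # Assumption: average over every image of the item.
        "area": sum(per_image_area) / len(per_image_area),
        "minArea": min(all_objects),
        "maxArea": max(all_objects),
    }


# One image with two faces covering 10% and 5% of its area.
single_image = image_stats([0.10, 0.05])

# An item with two images, one face in each (10% and 5%).
whole_item = item_stats([[0.10], [0.05]])
print(single_image)
print(whole_item)
```

Because of floating-point rounding, the printed sums may differ from 0.15 and 0.075 in the last decimal places.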

Sample rule expressions

For instance, the following expression will match items whose first image has text objects representing over 50% of the total image area:

$images[0].text.area > 0.5

Similarly, the following expression will match items where the largest text object across all of the item's images represents over 50% of the image in which it is found:

$images.text.maxArea > 0.5
