Types of Computer Vision | CrowdAI Knowledge Base

Let’s use this image of a cake as an example to illustrate the types of computer vision “recipes”. Side note: these “recipes” are often called “architectures.”

The four types of computer vision architectures we offer are classification, detection, segmentation, and keypoint. While you don’t necessarily need to remember these names, it is helpful to understand what each is capable of doing, so you know which one you might want to use on your own images and videos.

Classification Architecture

Classification is the simplest type of computer vision. Think of it as adding a tag or a label to the image to say what’s in it.

Classification models can have one or many tags, which are called classes. Here’s how that looks using our cake image:

Binary classification model - yes/no or true/false that a single class is present in the image
Multi-label classification model - yes/no or true/false, answered separately for each class (or label), that the class is present in the image

So, a simple binary classification model to detect cake in our example image might look like this:

Input	Output
	cake = true
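
To make the two variants concrete, here is a minimal Python sketch of how a classification result could be turned into yes/no answers. It assumes the model returns a confidence score between 0 and 1 for each class; the function names, the scores, and the 0.5 threshold are illustrative assumptions, not part of the platform.

```python
from typing import Dict


def binary_classify(score: float, threshold: float = 0.5) -> bool:
    """Answer yes/no for a single class from its confidence score."""
    return score >= threshold


def multi_label_classify(scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, bool]:
    """Answer yes/no independently for every class in the model."""
    return {name: score >= threshold for name, score in scores.items()}


# Hypothetical scores for the cake image (made up for illustration).
print("cake =", binary_classify(0.97))
print(multi_label_classify({"cake": 0.97, "candles": 0.88, "balloons": 0.04}))
```

Note that the multi-label result answers each class on its own, so several classes can be true for the same image.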

Detection Architecture

The next type of CV is detection, often also called object detection. Detection models are relatively easy to create and flexible to apply.

With a detection model, you’re looking to draw a box around the object you’re looking for in the image (these are called bounding boxes, as they represent the “bounds” of the object within the image). Whereas classification can only give you a “yes/no” answer, detection gives you that same answer, as well as the size and location of the cake in the image.

Like classification, detection models can have one or more classes of object. So, you could train a model to just draw boxes around cake, or one that attempts to do the same for both cake and candles at the same time.

So, a detection model to just detect cake might look like this:

Input	Output
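
As a rough illustration of what a detection result carries beyond a yes/no answer, the Python sketch below represents a bounding box as a label plus pixel coordinates. The `BoundingBox` class, its field names, and the coordinates are hypothetical; an actual model's output format may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BoundingBox:
    """One detected object: a class label plus the pixel corners of its box."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    @property
    def width(self) -> int:
        return self.x_max - self.x_min

    @property
    def height(self) -> int:
        return self.y_max - self.y_min

    @property
    def area(self) -> int:
        return self.width * self.height

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def contains_class(detections: List[BoundingBox], label: str) -> bool:
    """The same yes/no answer a classification model would give."""
    return any(box.label == label for box in detections)


# Hypothetical detection for the cake image (coordinates are made up).
detections = [BoundingBox("cake", 120, 80, 520, 380)]
print(contains_class(detections, "cake"))
print(detections[0].width, detections[0].height, detections[0].center)
```

The size comes from the box's width and height, and the location from its corners or center.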

Segmentation Architecture

Then we have a more advanced type of CV: segmentation, more formally called semantic segmentation. Segmentation takes all the pixels that make up an image and groups them based on whether or not they’re a part of the object you’re looking for.

This allows you to get the precise group of pixels that represent your object, opening up a lot of more advanced analysis that we’ll get into later.

Just like classification and detection, segmentation models can have one or more classes of object.

A simple segmentation model to detect cake might look like this:

Input	Output
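
One way to picture a segmentation result is as a grid with one class ID per pixel. The Python sketch below assumes that representation, with a hypothetical `CLASS_NAMES` mapping and a tiny made-up mask, and shows how pixel counts per class fall out of it.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical class IDs; 0 is reserved for "not part of any object".
CLASS_NAMES = {0: "background", 1: "cake", 2: "candles"}


def pixels_per_class(mask: List[List[int]]) -> Dict[str, int]:
    """Count how many pixels were assigned to each class."""
    counts = Counter(class_id for row in mask for class_id in row)
    return {CLASS_NAMES[class_id]: n for class_id, n in counts.items()}


def coverage(mask: List[List[int]], class_id: int) -> float:
    """Fraction of the image covered by the given class."""
    total = sum(len(row) for row in mask)
    hits = sum(row.count(class_id) for row in mask)
    return hits / total


# A tiny 4 x 6 "image": each number is the class of one pixel.
mask = [
    [0, 0, 2, 2, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(pixels_per_class(mask))
print(round(coverage(mask, 1), 3))
```

Because every pixel is labeled, measurements such as area or coverage can be computed directly from the mask rather than estimated from a box.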

Segmentation is where things get really interesting. The human eye can differentiate multiple examples of the same object at once and our brains can understand that those objects are distinct from one another. When you look at a lush green tree, you can see all of the leafy area as one big blob of leaves, but you are also able to pick out an individual leaf and know that it’s separate from the other leaves nearby.

With segmentation, we can train a model to do the same thing. This is known as instance segmentation.

With instance segmentation, you can teach a model to understand the difference between one instance of cake and a separate instance of cake in the same image. It might look like this:

Input	Output
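
The difference from semantic segmentation can be sketched in Python by giving each pixel an instance ID instead of a class ID. The mask layout and function names below are assumptions made for illustration, with 0 standing for background and each positive number for one separate cake.

```python
from collections import Counter
from typing import Dict, List


def instance_sizes(instance_mask: List[List[int]]) -> Dict[int, int]:
    """Pixel count for each separate object (ID 0 is background and is skipped)."""
    counts = Counter(i for row in instance_mask for i in row if i != 0)
    return dict(counts)


def to_semantic(instance_mask: List[List[int]]) -> List[List[int]]:
    """Collapse instances into a single cake / not-cake mask."""
    return [[1 if i != 0 else 0 for i in row] for row in instance_mask]


# Two hypothetical cakes in a tiny 3 x 6 "image".
instance_mask = [
    [1, 1, 0, 0, 2, 2],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
]
sizes = instance_sizes(instance_mask)
print(len(sizes), "cakes found")
print(sizes)
print(to_semantic(instance_mask))
```

Collapsing the instance mask recovers the semantic mask, but not the other way around: the semantic mask alone cannot say how many cakes there are.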

Keypoint Architecture

Lastly, the Saab, Inc. Computer Vision Platform hosts a KeyPoint model architecture. KeyPoint detection provides essential information about the location, pose, and structure of objects or entities within an image. This is a more detailed architecture that requires the object of interest to have unique features that can be identified.

Common uses for KeyPoint detection are:

Human or facial feature recognition
Object tracking and identification
Augmented reality

A KeyPoint model might look something like this:

Input	Output
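
As a hedged illustration of how keypoints describe structure and pose, the Python sketch below stores each keypoint as a named (x, y) pixel position and derives two simple measurements from a pair of them. The facial keypoint names, coordinates, and helper functions are hypothetical and not taken from the platform.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def distance(a: Point, b: Point) -> float:
    """Straight-line distance in pixels between two keypoints."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def tilt_degrees(a: Point, b: Point) -> float:
    """Angle of the line from a to b relative to the image's horizontal axis."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))


# Hypothetical facial keypoints (pixel coordinates are made up).
keypoints: Dict[str, Point] = {
    "left_eye": (100.0, 120.0),
    "right_eye": (160.0, 120.0),
    "nose": (130.0, 150.0),
}
print(distance(keypoints["left_eye"], keypoints["right_eye"]))
print(tilt_degrees(keypoints["left_eye"], keypoints["right_eye"]))
```

Measurements like these only work when the same feature can be located reliably in every image, which is why the object needs distinct, identifiable features.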