Inference is the process of showing images or video to a model and having the model attempt to label that media. It’s called inference because the model uses the knowledge you trained into it to infer whether, and where, the target object(s) are present. This is the stage where you let the model you worked so hard to train do some work for you!
Types of inference
With computer vision, there are typically two types of inference. Time is the most important factor in deciding which one you need.
Ask yourself: How often do I need the model to analyze new imagery?
Constant (or ongoing) inference is when you set up your model to continuously analyze new images or video as they become available. Think of it as a pipeline or a queue: as new imagery is created, it’s fed into the model’s pipeline for processing, and continues in a straight line until inference is complete. This can be done in real time or near-real time, depending on a number of factors (e.g. your hardware setup). With real-time inference, you’re expecting the model to provide outputs almost instantaneously as it’s fed new imagery; with near-real-time inference, you might be okay with a delay of a few seconds or minutes.
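To make the queue idea concrete, here is a minimal Python sketch of constant inference. It is an illustration under stated assumptions, not how any particular platform implements it: `fake_model`, `inference_worker`, and `run_pipeline` are hypothetical names, and `fake_model` is a stand-in that labels an image based on its file name rather than a real trained model.

```python
import queue
import threading


def fake_model(image):
    # Hypothetical stand-in for a trained model: "detects" an object
    # only when the file name contains "cat".
    return {"image": image, "labels": ["object"] if "cat" in image else []}


def inference_worker(inbox, results, model):
    # Pull images off the queue in arrival order and run the model on each.
    while True:
        image = inbox.get()
        if image is None:  # sentinel value: no more imagery is coming
            break
        results.append(model(image))


def run_pipeline(images, model=fake_model):
    inbox = queue.Queue()
    results = []
    worker = threading.Thread(target=inference_worker, args=(inbox, results, model))
    worker.start()
    # In a real setup, a camera or upload service would add images here
    # as they are created; this sketch feeds in a fixed list instead.
    for image in images:
        inbox.put(image)
    inbox.put(None)  # tell the worker to stop
    worker.join()
    return results
```

Calling `run_pipeline(["cat_001.jpg", "empty_002.jpg"])` processes each image in the order it arrived. Whether this counts as real time or near-real time depends on how quickly the worker keeps up with new imagery.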
Batch inference is when you want to feed media into your model in a more manual way. This is useful when your media is not created on an ongoing or predictable basis. When you have new media that the model hasn’t processed yet, you can use the platform’s interface to push that media to the model of your choice and await results. This is also useful when you want a “one-off” processing job, as it doesn’t require setting up dedicated data pipelines.
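The same idea can be sketched in Python for comparison. This is only an illustration of a one-off batch job, assuming you track which media has already been processed; the platform’s interface handles this for you, and `fake_model` and `run_batch` are hypothetical names rather than part of any real API.

```python
def fake_model(item):
    # Hypothetical stand-in for a trained model: "detects" an object
    # only when the file name contains "cat".
    return ["object"] if "cat" in item else []


def run_batch(media, model, already_processed):
    # Run the model once over every item it has not seen before.
    results = {}
    for item in media:
        if item in already_processed:
            continue  # skip media the model has already handled
        results[item] = model(item)
        already_processed.add(item)
    return results
```

Unlike the constant case, nothing here runs until you call `run_batch` yourself, so there is no queue or background worker to maintain between jobs.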