Detection

After learning about computer vision in general and how YOLO works, we can start using YOLO for our own purposes. We will see how much we can do with very little code.

Project Setup

We start with the project structure from before and create a new Jupyter notebook yolo_detect.ipynb:

📁 computer_vision/
    ├── 📁 .venv/
    ├── 📁 pics/
    └── 📄 yolo_detect.ipynb

Make sure the virtual environment (here .venv) is selected and all needed packages (ultralytics) are installed.

Now we can download our test pictures, extract the ZIP file and save the pictures in the 📁 pics/ folder.

Test Pictures

Now we are all set for our first detection 🎉

We will start with the picture pic2.jpg, which is quite challenging due to motion blur.

Inference

In machine learning, inference refers to the process of using a trained model to make predictions or decisions on new, unseen data. It's the production phase where the model applies what it learned during training to analyze new inputs and generate outputs, like when a trained image recognition model identifies objects in a new photo.

Pretrained Models

Fortunately, we don't have to start from scratch: YOLO comes with pretrained models that we can use right away. These models have been trained on large amounts of data (for detection, e.g., on the COCO dataset) and are available to us without any further effort. For detection, for example:

Model	Size (pixels)	mAP val 50-95	Speed CPU ONNX (ms)	Speed T4 TensorRT10 (ms)	Params (M)	FLOPs (B)
YOLO11n	640	39.5	56.1 ± 0.8	1.5 ± 0.0	2.6	6.5
YOLO11s	640	47.0	90.0 ± 1.2	2.5 ± 0.0	9.4	21.5
YOLO11m	640	51.5	183.2 ± 2.0	4.7 ± 0.1	20.1	68.0
YOLO11l	640	53.4	238.6 ± 1.4	6.2 ± 0.1	25.3	86.9
YOLO11x	640	54.7	462.8 ± 6.7	11.3 ± 0.2	56.9	194.9

Available pretrained YOLO models for detection (Source: Ultralytics)
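
To make the trade-off in the table more tangible, the following sketch compares each model with the nano model. It only uses the mAP and CPU speed values copied from the table above; the variable and function names are chosen for illustration.

```python
# (model, mAP val 50-95, CPU ONNX speed in ms), copied from the table above
MODELS = [
    ("YOLO11n", 39.5, 56.1),
    ("YOLO11s", 47.0, 90.0),
    ("YOLO11m", 51.5, 183.2),
    ("YOLO11l", 53.4, 238.6),
    ("YOLO11x", 54.7, 462.8),
]


def compare_to_baseline(models):
    """Return (name, mAP gain, slowdown factor) relative to the first entry."""
    _, base_map, base_ms = models[0]
    return [
        (name, round(m_ap - base_map, 1), round(ms / base_ms, 2))
        for name, m_ap, ms in models
    ]


comparison = compare_to_baseline(MODELS)
for name, gain, slowdown in comparison:
    print(f"{name}: +{gain:.1f} mAP, {slowdown:.2f}x CPU time")
```

According to these numbers, the extra large model gains about 15 mAP points over the nano model on CPU, at roughly eight times the inference time.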

Running the Detection

Based on these pretrained models, only a few lines of code are required to detect objects in an image with YOLO:

# Import required libraries
from ultralytics import YOLO

# Define the path to the source picture
picpath = "pics/pic2.jpg"

# Load a pretrained YOLO11 Model (Size: Nano)
model_det = YOLO("yolo11n.pt")

# Apply the model to our source picture
results = model_det(picpath)

>>> Output

image 1/1 c:\path\to\pics\pic2.jpg: 448x640 6 persons, 1 car, 5 motorcycles, 1 traffic light, 1 stop sign, 40.3ms
Speed: 2.0ms preprocess, 40.3ms inference, 0.0ms postprocess per image at shape (1, 3, 448, 640)

And that's it! You performed your first detection.

Analyzing the Output

Now we can take a closer look at the different parts of the output:

Information about the image:
- image 1/1 c:\path\to\pics\pic2.jpg: 448x640
  
  This includes the path to the image and the image size YOLO uses for the detection (this is not the original image size).
- shape (1, 3, 448, 640)
  
  batch size (number of pictures): 1
  
  picture channels: 3 (RGB)
  
  picture height: 448
  
  picture width: 640
Detected objects:
- 6 persons, 1 car, 5 motorcycles, 1 traffic light, 1 stop sign
Speed Metrics:
- Speed: 2.0ms preprocess, 40.3ms inference, 0.0ms postprocess per image
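
The 448x640 size in the output can be reproduced with a little arithmetic. As far as the output suggests, the image is scaled so that its longer side fits the default size of 640, and the shorter side is then padded up to the next multiple of the model stride. The sketch below mirrors that calculation in plain Python. The stride of 32 and the original size of 7500x5000 pixels are assumptions for illustration, not values taken from the output, and the function name is hypothetical.

```python
import math


def inference_shape(orig_h, orig_w, imgsz=640, stride=32):
    """Estimate the (height, width) used for inference on a non-square image."""
    # Scale factor that makes the longer side fit into imgsz
    scale = min(imgsz / orig_h, imgsz / orig_w)
    new_h, new_w = round(orig_h * scale), round(orig_w * scale)
    # Pad each side up to the next multiple of the stride
    pad_h = math.ceil(new_h / stride) * stride
    pad_w = math.ceil(new_w / stride) * stride
    return pad_h, pad_w


# Assumed 3:2 landscape photo (height 5000, width 7500)
shape = inference_shape(5000, 7500)
print(shape)  # (448, 640)
```

With these assumptions, the height is first scaled to 427 pixels and then padded to 448, which matches the shape reported above.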

But are those all of the results? Where are those objects in the image? How sure are we that there are six persons? We need to go deeper into the results.

Understanding the results

The output we have seen so far is just a brief overview of the detection. All the information is stored in results:

# Get the first (and only) image's results
result = results[0]

print(result)

Task: Analyze the Results

Take a look at the result and answer the following questions (don't forget to use Google and the docs):

What is the original shape of the image?
How many different classes are available and what are they?
Why are keypoints, masks and obb None? What do you think?

Now go deeper and analyze result.boxes

What are the detected classes? How do they correspond to the labels (person, car, ...)?
How sure is YOLO about the detected objects? (Hint: the right term is confidence.)
What are the differences between xywh, xywhn, xyxy and xyxyn?
What does data collect?
How are data and shape connected?

Multiple Images

You can also pass multiple images at once. To access the results for a specific image, you need to choose one from the results list.

results = model_det(['pics/pic1.jpg', 'pics/pic2.jpg'])
first_pic_result = results[0]
second_pic_result = results[1]

Now that we are familiar with the results, we can write a small function to print a more detailed overview:

def print_overview(result): 
    # Print object count
    print(f"Detected {len(result.boxes)} objects")

    # Examine each detection
    for box in result.boxes:
        # Get class name (result.names maps class IDs to labels)
        class_id = int(box.cls)
        class_name = result.names[class_id]

        # Get confidence
        confidence = float(box.conf)

        # Get coordinates (x1, y1, x2, y2 format)
        x1, y1, x2, y2 = box.xyxy[0].tolist()

        print("\nDetection:")
        print(f"- Class: {class_name}")
        print(f"- Confidence: {confidence:.2f}")
        print(f"- Coordinates: ({x1:.1f}, {y1:.1f}) to ({x2:.1f}, {y2:.1f})")

print_overview(result)

>>> Output

Detected 14 objects

Detection:
- Class: motorcycle
- Confidence: 0.81
- Coordinates: (2177.1, 3142.4) to (3490.8, 4281.8)

Detection:
- Class: motorcycle
- Confidence: 0.75
- Coordinates: (0.0, 3574.5) to (755.0, 4952.4)

...
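
The coordinates printed above are in xyxy format, i.e. the top-left and bottom-right corner in pixels. As a minimal sketch of how the other box formats relate to it, the following plain-Python helpers convert one box to a center-based xywh form and to normalized values. The helper names are hypothetical, and the image size of 7500x5000 pixels is an assumption; in practice, it would come from the original image shape.

```python
def xyxy_to_xywh(x1, y1, x2, y2):
    """Convert corner coordinates to (center x, center y, width, height)."""
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2, y1 + h / 2, w, h


def normalize(box, img_w, img_h):
    """Scale pixel values to the 0..1 range (x-like entries by width, y-like by height)."""
    return tuple(v / (img_w if i % 2 == 0 else img_h) for i, v in enumerate(box))


# First detection from the output above
xyxy = (2177.1, 3142.4, 3490.8, 4281.8)
xywh = xyxy_to_xywh(*xyxy)
print([round(v, 2) for v in xywh])

# Assumed image size (width 7500, height 5000), for illustration only
xyxyn = normalize(xyxy, 7500, 5000)
print([round(v, 3) for v in xyxyn])
```

Normalized coordinates are independent of the image resolution, which can be handy when the same box has to be drawn on a resized copy of the picture.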

So far we have everything we need to detect objects in images and work with the results.
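
As a complement to the per-detection overview, a per-class summary like the one in the console output can be rebuilt with collections.Counter. The sketch below works on a plain list of class IDs so that it runs on its own. The IDs and the names mapping are written out by hand to mirror the detections reported above; with a real result, they would come from the boxes and the names mapping instead.

```python
from collections import Counter

# Hand-written stand-ins for the class IDs of the 14 detections and their labels
class_ids = [0] * 6 + [2] + [3] * 5 + [9] + [11]
names = {0: "person", 2: "car", 3: "motorcycle", 9: "traffic light", 11: "stop sign"}


def count_per_class(class_ids, names):
    """Return a Counter mapping each class name to its number of detections."""
    return Counter(names[class_id] for class_id in class_ids)


counts = count_per_class(class_ids, names)
for class_name, n in counts.most_common():
    print(f"{class_name}: {n}")
```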

But sometimes a picture is worth a thousand words!

Visualize the Results

In some cases, the 'simple' extraction of the results is not sufficient and a visualization is needed.

For visualizing the results there are two options available: showing and saving the picture.

Showing the Resulting Image

YOLO makes it really easy to show the resulting image. With the method .show() the result can be visualized:

result.show()

This should display the image shown above.

Saving the Resulting Image

There are multiple ways to save the resulting image.

Option 1: YOLO

YOLO offers various visualization arguments, which can be used directly in the inference command. One of them is save=True:

results = model_det(picpath, save=True)

Unless otherwise specified, the file will be saved in a new folder 📁 runs/detect/predict/. Use the arguments project and name to specify a different folder (the file is then saved in project/name).

Option 2: OpenCV

When you install ultralytics, the package opencv-python is installed automatically as a dependency. OpenCV is a powerful tool for image and video processing and can also be used for saving the image:

import cv2

# Render the detections onto the image (returns a NumPy array in BGR channel order)
annotated_image = result.plot()

# Save the image
cv2.imwrite("output_detection.jpg", annotated_image)

Unless otherwise specified, the file will be saved directly into the working directory 📁 computer_vision.

See the Magic Happen

If you are interested in seeing what the model is seeing and doing, you can use another visualization argument:

results = model_det(picpath, visualize=True)

The resulting pictures in the folder 📁 runs/detect/predict/pic2 are an 'Intermediate Features Visualization'.

This parameter saves visualizations of the intermediate feature maps or activations from the YOLO model.
These feature maps show what parts of the image the model focuses on during different stages of the neural network.
These outputs help in debugging and understanding the model's inner workings.

Inference Arguments

Having worked on the output side of YOLO, it's now time to focus on the input side. Besides the visualization arguments we have used before, there are numerous inference arguments that can be passed to the model. An overview can be found in the documentation.
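
To illustrate what such an argument does conceptually, here is a plain-Python sketch of a confidence threshold: detections scoring below the threshold are dropped. The detection list is mostly made up for illustration (only the first two scores appear in the output above), and the function is hypothetical rather than part of the ultralytics API.

```python
# (class name, confidence); only the first two entries are taken from the output above,
# the remaining ones are invented for illustration
detections = [
    ("motorcycle", 0.81),
    ("motorcycle", 0.75),
    ("person", 0.62),
    ("car", 0.41),
    ("traffic light", 0.28),
]


def filter_by_confidence(detections, threshold):
    """Keep only detections whose confidence reaches the threshold."""
    return [(name, conf) for name, conf in detections if conf >= threshold]


high = filter_by_confidence(detections, 0.5)
low = filter_by_confidence(detections, 0.25)
print(len(high), len(low))  # 3 5
```

A higher threshold yields fewer but more reliable detections, while a lower one keeps more candidates at the risk of false positives.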

Task: Inference Arguments

Confidence
- Run a detection with confidence threshold 0.5.
- Save this image as high_conf.jpg
- Now try confidence 0.25, save as low_conf.jpg
- Compare both images - what differences do you notice?
- What is the default value?
Classes
- Limit the detection to the classes motorcycle and car
- Save the results as class_limit.jpg
Adjusting the output path
- Use the two inference arguments project and name to adjust the output folder to 📁 output_pics/detection_pics/.
Adjust the visual outcome
- Run a detection for our pic2.jpg image and adjust:
  - no label should be shown
  - no confidence should be shown
  - line width of the boxes should be 3

Further Adjustments

As we have discussed before, YOLO comes in different model sizes, trading speed for accuracy.

Task: Model Comparison

For each model size, determine the following characteristics by running a detection on pic2.jpg:

Time the detection speed
Count detected objects
Compare confidence scores
Create a table with your findings:

Model	Detection Time	Objects Found	Avg Confidence
nano
small
medium
large
extra large
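
For the timing and averaging part of this task, a small helper can be sketched in plain Python. The function names are hypothetical, and the stand-in workload only exists so that the snippet runs on its own; in the notebook, the timed callable would be the detection call for the respective model.

```python
import time
from statistics import mean


def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn several times and return (last result, best wall-clock time in ms)."""
    best_ms = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best_ms = min(best_ms, (time.perf_counter() - start) * 1000)
    return result, best_ms


def average_confidence(confidences):
    """Mean of a list of confidence values, or 0.0 if nothing was detected."""
    return mean(confidences) if confidences else 0.0


# Stand-in workload instead of a real detection call
total, elapsed_ms = time_call(sum, range(100_000))
print(f"{elapsed_ms:.3f} ms")
print(f"{average_confidence([0.81, 0.75]):.2f}")
```

Taking the best of several runs reduces the influence of one-time effects such as loading the model on the first call.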

🎉 Congratulations

You are now able to perform object detection on images and work with the results!

Detection Fail (Source: Visualizing Object Detection Features on Springer Nature Link)