Train Ultralytics YOLOv8 Object Detection Network with converted BIRDS 525 dataset on Google Colab

1. Overview

I tested training the YOLOv8 object detection network on the converted BIRDS 525 dataset described on this page, using the free version of Google Colab.

The free version of Google Drive is limited to 15 GB, but the BIRDS 525 dataset is only 1.96 GB in its zipped form, so the whole procedure can be run on the free version.

Training for 100 epochs with yolov8n, which has a relatively small network size, completed in about 8 minutes and 30 seconds.

2. Upload a compressed file of BIRDS 525 dataset on Google Drive
2.1. Download BIRDS 525 SPECIES – IMAGE CLASSIFICATION dataset

Download the BIRDS 525 SPECIES – IMAGE CLASSIFICATION dataset from this page.

2.2. Upload archive.zip to Google Drive

Upload the archive.zip downloaded in 2.1 above to Google Drive. The commands in 3.2 expect it at MyDrive/kaggle/birds525/archive.zip.

3. Running commands on Google Colab
3.1. Mounting Google Drive

Write the following Python script in the code cell and run the script. Google Drive will be mounted on /content/drive.

from google.colab import drive
drive.mount('/content/drive')
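Before copying anything, it can help to confirm that the mount worked and that the archive is where the later commands look for it. The following is a minimal sketch, assuming the archive was uploaded to MyDrive/kaggle/birds525/ (the path used in 3.2). The helper name check_archive is made up for illustration.

```python
from pathlib import Path

# Path used by the copy command in 3.2; adjust it if the archive was uploaded elsewhere.
ARCHIVE = Path("/content/drive/MyDrive/kaggle/birds525/archive.zip")

def check_archive(path: Path) -> str:
    """Return a short status line for the uploaded archive (hypothetical helper)."""
    if not path.is_file():
        return f"missing: {path}"
    size_gb = path.stat().st_size / 1e9
    return f"found: {path} ({size_gb:.2f} GB)"

print(check_archive(ARCHIVE))
```

If the line starts with "missing", either Google Drive is not mounted or the archive sits in a different folder.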

3.2. Decompress the BIRDS 525 compressed file

Write the following in the code cell and execute the script. The archive.zip file will be copied from Google Drive to Google Colab and extracted there. Extracting the zip file took about 1 minute and 30 seconds.

%%bash
mkdir -p kaggle/birds525
cd kaggle/birds525
cp /content/drive/MyDrive/kaggle/birds525/archive.zip .
unzip archive.zip
cd /content

Note:
I also tried extracting the files directly on Google Drive with the script at the end of this note. The extracted files then remain even if the Google Colab connection is lost. In this case, however, it took about 20 minutes to extract the zip file.

Training the neural network also seems to take a long time when the data stays on Google Drive and is read from Google Colab. Although the archive has to be copied and extracted again after every reconnect, it is better to copy the data to Google Colab and extract it there.

%%bash
cd /content/drive/MyDrive/kaggle/birds525/
unzip archive.zip
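The copy-then-extract approach recommended above can also be written in Python, which makes it easy to time the step. This is only a sketch using the standard library. The function name copy_and_extract is made up, and Python's zipfile module will not necessarily match the unzip timings quoted above.

```python
import shutil
import time
import zipfile
from pathlib import Path

def copy_and_extract(archive: Path, dest: Path) -> float:
    """Copy `archive` into `dest`, extract it there, and return the elapsed seconds."""
    start = time.perf_counter()
    dest.mkdir(parents=True, exist_ok=True)
    # Copy first so that extraction reads from the local disk, not from Google Drive.
    local_copy = Path(shutil.copy(archive, dest))
    with zipfile.ZipFile(local_copy) as zf:
        zf.extractall(dest)
    return time.perf_counter() - start

src = Path("/content/drive/MyDrive/kaggle/birds525/archive.zip")
if src.is_file():  # only runs when Google Drive is mounted and the archive exists
    elapsed = copy_and_extract(src, Path("/content/kaggle/birds525"))
    print(f"copied and extracted in {elapsed:.0f} s")
```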

3.3. Installing Ultralytics YOLO

Write the following in the code cell and execute the script.

%pip install ultralytics
import ultralytics
ultralytics.checks()
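The training time reported in this article was measured on a GPU runtime (the log in 3.5 shows a Tesla T4). If the Colab runtime type is set to CPU, training will be much slower, so a quick check before training can be useful. A small sketch, with a made-up helper name describe_accelerator:

```python
import importlib.util

def describe_accelerator() -> str:
    """Report which device PyTorch would train on (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # installed together with ultralytics
    if torch.cuda.is_available():
        return f"GPU: {torch.cuda.get_device_name(0)}"
    return "CPU only"

print(describe_accelerator())
```

If this prints "CPU only", the runtime type can be changed from the Runtime menu of Google Colab.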

3.4. Creating datasets in Ultralytics YOLO format

I have prepared a Python script on this GitHub page to create an Ultralytics YOLO format dataset from the BIRDS 525 dataset. Write the following in the code cell and execute the script.

!git clone https://github.com/fukagai-takuya/birds525yolo.git

Write the following in the code cell and execute the script. The script below creates a dataset in Ultralytics YOLO format from the BIRDS 525 dataset.

In the script below, /content/kaggle/birds525/ is the directory where the extracted BIRDS 525 dataset is located, and /content/birds525-yolo-data is the output directory for the generated Ultralytics YOLO format dataset.

%%bash
mkdir -p birds525-yolo-data
cd birds525yolo
python3 ./create_yolo_dataset_from_birds525_limit_bird_species.py /content/kaggle/birds525/ /content/birds525-yolo-data
cd /content

The script prints the log below. As described on this page, some images are excluded while generating the YOLO format training data, for example when multiple birds are detected in an image that is supposed to contain only one bird.
Each excluded image produces a message beginning with Failed. Many such messages appear, but they do not indicate a problem.

/content/kaggle/birds525/ satisfy the requirement of birds525/archive/
Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov9c.pt to 'yolov9c.pt'...
YOLOv9c summary: 618 layers, 25,590,912 parameters, 0 gradients, 104.0 GFLOPs
valid
Failed: len(boxes.cls):2, label_name:BLUE HERON, image_file:2.jpg
success_counter: 19
...
failure_counter_single_bird_multiple_objects: 1
train
Failed: number_of_birds:0, label_name:BLUE HERON, image_file:088.jpg
...
Failed: number_of_birds:2, label_name:ROCK DOVE, image_file:115.jpg
success_counter: 471
failure_counter_results_not_one: 0
failure_counter_no_birds: 11
failure_counter_multiple_birds: 65
failure_counter_single_bird_multiple_objects: 25
100%|██████████| 49.4M/49.4M [00:00<00:00, 362MB/s]
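For reference, an Ultralytics YOLO detection label is a text file with one line per object: the class index followed by the box center x, center y, width, and height, all normalized to the range 0 to 1. The sketch below shows that conversion for a 224 x 224 image. It is not taken from the birds525yolo script. The function name and the example box are made up for illustration.

```python
def to_yolo_label(class_id: int, box_xyxy, img_w: int = 224, img_h: int = 224) -> str:
    """Convert a pixel box (x1, y1, x2, y2) to one YOLO label line."""
    x1, y1, x2, y2 = box_xyxy
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A made-up box for class 0 in a 224 x 224 image.
print(to_yolo_label(0, (56, 28, 168, 196)))
# prints: 0 0.500000 0.500000 0.500000 0.750000
```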

3.5. Running the training command

Write the following in the code cell and execute the script.

The yolov8n.pt passed as the model parameter is the weights file of a pretrained network. To keep the training time short, yolov8n, which has a relatively small network size, is used. This network, already trained on other data, serves as the initial network and is then trained further on the data prepared here.

The data parameter points to the data.yaml file of the generated Ultralytics YOLO format dataset. The epochs parameter sets the number of epochs. The example below specifies 100, so the network is trained with 100 passes over the prepared training data.

The images in the BIRDS 525 dataset are 224 x 224 pixels, and training with the larger default image size of 640 did not progress. The image size is therefore passed explicitly as imgsz=224.

!yolo train model=yolov8n.pt data=/content/birds525-yolo-data/data.yaml epochs=100 imgsz=224
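The same run can also be started from Python instead of the yolo command line. The sketch below uses the YOLO class of the ultralytics package with the same three settings. The wrapper function train_birds and the TRAIN_ARGS dictionary are made up for illustration.

```python
# Same settings as the command line above.
TRAIN_ARGS = {
    "data": "/content/birds525-yolo-data/data.yaml",
    "epochs": 100,
    "imgsz": 224,
}

def train_birds(weights: str = "yolov8n.pt", **overrides):
    """Fine-tune the pretrained weights on the generated dataset (hypothetical wrapper)."""
    from ultralytics import YOLO  # imported here so the settings can be inspected without it
    model = YOLO(weights)
    args = {**TRAIN_ARGS, **overrides}
    return model.train(**args)
```

Calling train_birds() reproduces the settings above, and train_birds(epochs=10) gives a shorter trial run.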

The training completed in about 8 minutes and 30 seconds, and the following log was output. The trainer itself reports 0.132 hours, roughly 8 minutes, for the 100 epochs.

Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n.pt to 'yolov8n.pt'...
100% 6.25M/6.25M [00:00<00:00, 280MB/s]
Ultralytics YOLOv8.2.98 🚀 Python-3.10.12 torch-2.4.1+cu121 CUDA:0 (Tesla T4, 15102MiB)
engine/trainer: task=detect, mode=train, model=yolov8n.pt, data=/content/birds525-yolo-data/data.yaml, epochs=100, time=None, patience=100, batch=16, imgsz=224, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train
Downloading https://ultralytics.com/assets/Arial.ttf to '/root/.config/Ultralytics/Arial.ttf'...
100% 755k/755k [00:00<00:00, 130MB/s]
Overriding model.yaml nc=80 with nc=4

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]     

 ...
 ...
 ...

 22        [15, 18, 21]  1    752092  ultralytics.nn.modules.head.Detect           [4, [64, 128, 256]]           
Model summary: 225 layers, 3,011,628 parameters, 3,011,612 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
TensorBoard: Start with 'tensorboard --logdir runs/detect/train', view at http://localhost:6006/
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
train: Scanning /content/birds525-yolo-data/labels/train... 496 images, 0 backgrounds, 0 corrupt: 100% 496/496 [00:00<00:00, 1829.14it/s]
train: New cache created: /content/birds525-yolo-data/labels/train.cache
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01, num_output_channels=3, method='weighted_average'), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
val: Scanning /content/birds525-yolo-data/labels/val... 20 images, 0 backgrounds, 0 corrupt: 100% 20/20 [00:00<00:00, 1490.62it/s]
val: New cache created: /content/birds525-yolo-data/labels/val.cache
Plotting labels to runs/detect/train/labels.jpg... 
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.00125, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
TensorBoard: model graph visualization added ✅
Image sizes 224 train, 224 val
Using 2 dataloader workers
Logging results to runs/detect/train
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      1/100     0.386G     0.9005      3.037      1.234         41        224: 100% 31/31 [00:07<00:00,  4.10it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 1/1 [00:01<00:00,  1.65s/it]
                   all         20         20     0.0162          1      0.489       0.44

      ...
      ...
      ...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
    100/100     0.348G     0.1748      0.178     0.8938         16        224: 100% 31/31 [00:03<00:00,  9.97it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 1/1 [00:00<00:00, 10.73it/s]
                   all         20         20      0.988          1      0.995      0.945

100 epochs completed in 0.132 hours.
Optimizer stripped from runs/detect/train/weights/last.pt, 6.2MB
Optimizer stripped from runs/detect/train/weights/best.pt, 6.2MB

Validating runs/detect/train/weights/best.pt...
Ultralytics YOLOv8.2.98 🚀 Python-3.10.12 torch-2.4.1+cu121 CUDA:0 (Tesla T4, 15102MiB)
Model summary (fused): 168 layers, 3,006,428 parameters, 0 gradients, 8.1 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 1/1 [00:00<00:00,  9.44it/s]
                   all         20         20      0.984          1      0.995      0.954
            BLUE HERON          5          5      0.984          1      0.995        0.9
  EUROPEAN TURTLE DOVE          5          5      0.976          1      0.995      0.995
          MALLARD DUCK          5          5      0.991          1      0.995      0.926
             ROCK DOVE          5          5      0.984          1      0.995      0.995
Speed: 0.0ms preprocess, 0.6ms inference, 0.0ms loss, 1.1ms postprocess per image
Results saved to runs/detect/train
💡 Learn more at https://docs.ultralytics.com/modes/train
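A few numbers in this log can be cross-checked with simple arithmetic. The sketch below only uses values that appear in the log above: 496 training images, batch=16, 100 epochs, and 0.132 hours in total. The per-epoch time is an average over everything the trainer did in that period.

```python
import math

train_images = 496   # "train: Scanning ... 496 images"
batch_size = 16      # batch=16 in the trainer settings
epochs = 100
total_hours = 0.132  # "100 epochs completed in 0.132 hours."

iters_per_epoch = math.ceil(train_images / batch_size)
total_iters = iters_per_epoch * epochs
seconds_per_epoch = total_hours * 3600 / epochs

print(iters_per_epoch)              # 31, matching the "31/31" progress bars
print(total_iters)                  # 3100
print(f"{seconds_per_epoch:.2f}")   # 4.75
```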

3.6. Running object detection on trained network

Write the following in the code cell and execute the script.

This runs object detection with the network trained in 3.5 above.
The model parameter specifies best.pt, the weights obtained from training.
The input image given in the source parameter comes from the test directory of the BIRDS 525 dataset and was not used for training.

!yolo predict model=/content/runs/detect/train/weights/best.pt source="/content/kaggle/birds525/test/EUROPEAN TURTLE DOVE/1.jpg"

The command ran in about 7 seconds and output the following log.

Ultralytics YOLOv8.2.98 🚀 Python-3.10.12 torch-2.4.1+cu121 CUDA:0 (Tesla T4, 15102MiB)
Model summary (fused): 168 layers, 3,006,428 parameters, 0 gradients, 8.1 GFLOPs

image 1/1 /content/kaggle/birds525/test/EUROPEAN TURTLE DOVE/1.jpg: 224x224 1 EUROPEAN TURTLE DOVE, 20.2ms
Speed: 1.3ms preprocess, 20.2ms inference, 858.0ms postprocess per image at shape (1, 3, 224, 224)
Results saved to runs/detect/predict
💡 Learn more at https://docs.ultralytics.com/modes/predict

Double-click the detection result image /content/runs/detect/predict/1.jpg in the file browser on the left to view the detection result. The network has successfully detected a EUROPEAN TURTLE DOVE.
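The detection result can also be read in Python instead of opening the image. The sketch below uses the predict method of the ultralytics YOLO class and turns each detection into a label with its confidence. The helper names summarize and predict_and_summarize are made up for illustration.

```python
def summarize(names, class_ids, confidences):
    """Format detections as 'LABEL (confidence)' strings."""
    return [f"{names[int(c)]} ({float(p):.2f})" for c, p in zip(class_ids, confidences)]

def predict_and_summarize(weights: str, source: str):
    """Run detection on one image and summarize the boxes (hypothetical helper)."""
    from ultralytics import YOLO  # imported here so summarize() works without it
    result = YOLO(weights).predict(source=source)[0]
    # result.names maps class index to label; boxes.cls and boxes.conf are tensors.
    return summarize(result.names, result.boxes.cls.tolist(), result.boxes.conf.tolist())
```

For the example above, the call would be predict_and_summarize("/content/runs/detect/train/weights/best.pt", "/content/kaggle/birds525/test/EUROPEAN TURTLE DOVE/1.jpg").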
