Convert BIRDS 525 dataset for object detection and train Ultralytics YOLOv8

1. Overview

I trained Ultralytics YOLOv8 object detection neural network using the BIRDS 525 SPECIES – IMAGE CLASSIFICATION dataset.

The objective is to train an object detection neural network to be able to estimate the type and location of birds in images.

When the YOLOv8n object detection network was trained on the BIRDS 525 dataset alone, in which each image contains a single bird that fills most of the frame, estimation of bird type and location on the BIRDS 525 test images worked nearly as intended.

However, when I used images taken on my side, in which the birds occupy only part of the frame, the estimation failed for many of them. I therefore increased the scale augmentation parameter, which varies the size of the training images, from its default value so that training would also see larger and smaller versions of the images. This slightly improved the estimation results on my test images.

In addition, although the number of images is small, I included some of my own images, in which the birds occupy only part of the frame, in the training and validation data. The BIRDS 525 images are 224 x 224 pixels, while my images are 6000 x 4000 pixels. When training YOLOv8n with Ultralytics YOLO, the image size had to be set to the smaller 224 x 224 or training would not proceed as expected. My images were therefore resized down to 224 x 224 (when the aspect ratio differs, the image is resized while keeping the aspect ratio and the empty space appears to be padded with zeros), and this combination did not improve the estimation results very much.
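As an aside, the aspect-ratio-preserving resize with zero padding mentioned above (often called letterboxing) can be sketched as follows. This is an illustrative NumPy-only sketch of the idea, not the actual Ultralytics implementation, and the function name letterbox_224 is my own.

```python
import numpy as np

def letterbox_224(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize while keeping the aspect ratio; pad the empty space with zeros."""
    h, w = image.shape[:2]
    scale = size / max(h, w)
    new_h = int(round(h * scale))
    new_w = int(round(w * scale))
    # Nearest-neighbour resize in pure NumPy (a real pipeline would use cv2.resize)
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]
    # Center the resized image on a black square canvas
    canvas = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```

With this sketch, a 6000 x 4000 image becomes a 224 x 224 image with black bars above and below the bird photo.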

Next, I would like to prepare more images for training that I have taken on my side and test how the results will turn out. The types of birds that can be photographed near my house change with the seasons, but I hope to gradually enhance the dataset.

Note:
The following procedure used an ESPRIMO WD2/H2 desktop computer with an NVIDIA GeForce GTX 1650 (GPGPU). The program was run on Ubuntu 22.04 with WSL2 under Windows 11.

I used a PC with the specification in the table at the bottom of this page.

2. Creating birds dataset in Ultralytics YOLO format
2.1. Install the latest Ultralytics YOLO

Follow the instructions on this page to install the latest version of Ultralytics YOLO.

2.2. Download BIRDS 525 SPECIES – IMAGE CLASSIFICATION dataset

Download the BIRDS 525 SPECIES – IMAGE CLASSIFICATION dataset from this page.

2.3. Creating datasets in Ultralytics YOLO format

I prepared a Python script on this GitHub page that creates a dataset in Ultralytics YOLO format from the BIRDS 525 dataset. Execute the following command to obtain the script from GitHub.

git clone https://github.com/fukagai-takuya/birds525yolo.git

Execute the following command to create a dataset in Ultralytics YOLO format from the BIRDS 525 dataset. In the following command, path/to/kaggle/birds525/archive/ is the path to the directory where the BIRDS 525 dataset is located. path/to/outputdir/ is the output directory for the Ultralytics YOLO format dataset.

python3 ./create_yolo_dataset_from_birds525_limit_bird_species.py path/to/kaggle/birds525/archive/ path/to/outputdir/

As described in the README on this page, the following line in create_yolo_dataset_from_birds525_limit_bird_species.py can be modified to change the types of birds in the output dataset. If not modified, the Ultralytics YOLO format datasets for the four bird species “BLUE HERON”, “MALLARD DUCK”, “EUROPEAN TURTLE DOVE”, and “ROCK DOVE” will be output.

def update_train_valid_subdirectory_dict(subdirectories, labels, subdir_dict):

    ...

    target_bird_species = [
        'BLUE HERON',
        'MALLARD DUCK',
        'EUROPEAN TURTLE DOVE',
        'ROCK DOVE',
    ]

    ...
2.4. Bird species in the training data

The BIRDS 525 dataset contains data on 525 bird species, but I decided to select only four species of birds that are similar to those found near my location, as shown in the example code above.

“BLUE HERON” is different from the “GREY HERON” I see in my neighborhood, but it is close in appearance, so I decided to use it as training data. “MALLARD DUCK” is found in a nearby river. The “EUROPEAN TURTLE DOVE” is different from the “ORIENTAL TURTLE DOVE” seen nearby, but it is close in appearance, so I decided to use it as training data. “ROCK DOVE” is also often seen.

One reason for narrowing down the bird species to be trained is to reduce training time: because I wanted to run training many times under different conditions, I limited the number of species.

2.5. About the script to convert the BIRDS 525 dataset to Ultralytics YOLO format

The dataset BIRDS 525 SPECIES – IMAGE CLASSIFICATION contains training, validation, and test images for image classification. The image data is classified according to bird types, but there is no data to indicate where the bird is located in the image. The Python script below, which converts the BIRDS 525 dataset to Ultralytics YOLO format, uses the trained weight data yolov9c.pt to estimate bird locations represented by rectangular regions.

create_yolo_dataset_from_birds525_limit_bird_species.py

Each image in the BIRDS 525 dataset contains a single bird at a size that spreads across almost the entire image. The images are placed in directories separated by species, as shown in the example below.

fukagai@ESPRIMO_WD2H2 MINGW64 /c/dev/data/kaggle/birds525/archive/train
$ ls
'ABBOTTS BABBLER'/              CAPUCHINBIRD/                'GREEN MAGPIE'/                   'PYGMY KINGFISHER'/
'ABBOTTS BOOBY'/               'CARMINE BEE-EATER'/          'GREEN WINGED DOVE'/               PYRRHULOXIA/
'ABYSSINIAN GROUND HORNBILL'/  'CASPIAN TERN'/               'GREY CUCKOOSHRIKE'/               QUETZAL/
'AFRICAN CROWNED CRANE'/        CASSOWARY/                   'GREY HEADED CHACHALACA'/         'RAINBOW LORIKEET'/
'AFRICAN EMERALD CUCKOO'/      'CEDAR WAXWING'/              'GREY HEADED FISH EAGLE'/          RAZORBILL/
'AFRICAN FIREFINCH'/           'CERULEAN WARBLER'/           'GREY PLOVER'/                    'RED BEARDED BEE EATER'/
'AFRICAN OYSTER CATCHER'/      'CHARA DE COLLAR'/            'GROVED BILLED ANI'/              'RED BELLIED PITTA'/
'AFRICAN PIED HORNBILL'/       'CHATTERING LORY'/            'GUINEA TURACO'/                  'RED BILLED TROPICBIRD'/
'AFRICAN PYGMY GOOSE'/         'CHESTNET BELLIED EUPHONIA'/   GUINEAFOWL/                      'RED BROWED FINCH'/
 ALBATROSS/                    'CHESTNUT WINGED CUCKOO'/     'GURNEYS PITTA'/                  'RED CROSSBILL'/
'ALBERTS TOWHEE'/              'CHINESE BAMBOO PARTRIDGE'/    GYRFALCON/                       'RED FACED CORMORANT'/
'ALEXANDRINE PARAKEET'/        'CHINESE POND HERON'/          HAMERKOP/                        'RED FACED WARBLER'/
'ALPINE CHOUGH'/               'CHIPPING SPARROW'/           'HARLEQUIN DUCK'/                 'RED FODY'/
'ALTAMIRA YELLOWTHROAT'/       'CHUCAO TAPACULO'/            'HARLEQUIN QUAIL'/                'RED HEADED DUCK'/
...

If yolov9c.pt detects a single bird in a BIRDS 525 input image, the image and the estimated bounding box are converted to and saved in the Ultralytics YOLO format described on this page.

Images in which multiple birds were detected, as in the example below, were not used as training data.

I also excluded images in which no bird was detected, as shown in the image below.

There were many images in which only one bird was detected, and these were used as training data.

For images like the one below, where the estimation result contains one bird together with objects other than birds, I decided to use only the bird as the training label.
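The selection rule described in this section (keep an image only when exactly one bird is detected, ignoring non-bird detections) can be sketched as a small helper. This is an illustrative sketch, not the actual code of create_yolo_dataset_from_birds525_limit_bird_species.py; the function name select_bird_box is my own, and class id 14 is the id the COCO-pretrained yolov9c model uses for birds, as seen in section 7 below.

```python
COCO_BIRD_CLASS = 14  # class id that the COCO-pretrained yolov9c model uses for "bird"

def select_bird_box(class_ids, boxes):
    """Return the single bird box, or None if the image should be skipped.

    class_ids: detected class ids for one image
    boxes: matching (x_center, y_center, width, height) tuples
    """
    bird_boxes = [box for cls, box in zip(class_ids, boxes) if cls == COCO_BIRD_CLASS]
    if len(bird_boxes) != 1:
        return None  # zero or multiple birds: exclude from the training data
    return bird_boxes[0]  # exactly one bird: usable as a training label
```

Non-bird detections, such as the extra objects mentioned above, simply fall out of the list and do not disqualify the image.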

2.6. About Ultralytics YOLO format

I created a Python script to convert to Ultralytics YOLO format based on this page.

The contents of the file data.yaml in the output path below would be as follows.

/mnt/c/dev/data/kaggle/birds525/ultralytics_yolo_format_limited_for_webpage/data.yaml

path: /mnt/c/dev/data/kaggle/birds525/ultralytics_yolo_format_limited_for_webpage  # dataset root dir
train: images/train  # train images (relative to 'path')
val: images/val  # val images (relative to 'path')

# Classes
names:
  0: BLUE HERON
  1: EUROPEAN TURTLE DOVE
  2: MALLARD DUCK
  3: ROCK DOVE

Other files under the output directory ultralytics_yolo_format_limited_for_webpage are located as follows.

            - ultralytics_yolo_format_limited_for_webpage/
                ├─ images/
                │   ├─ train/
                │   │   ├─ BLUE_HERON_001.jpg
                │   │   ├─ BLUE_HERON_002.jpg
                │   │   ├─ ...
                │   │   └─ ROCK_DOVE_132.jpg
                │   └─ val/
                │       ├─ BLUE_HERON_1.jpg
                │       ├─ BLUE_HERON_2.jpg
                │       ├─ ...
                │       └─ ROCK_DOVE_5.jpg
                └─ labels/
                    ├─ train/
                    │   ├─ BLUE_HERON_001.txt
                    │   ├─ BLUE_HERON_002.txt
                    │   ├─ ...
                    │   └─ ROCK_DOVE_132.txt
                    └─ val/
                        ├─ BLUE_HERON_1.txt
                        ├─ BLUE_HERON_2.txt
                        ├─ ...
                        └─ ROCK_DOVE_5.txt

The contents of the label file labels/train/MALLARD_DUCK_001.txt under the labels directory are as follows. The leading 2 corresponds to "2: MALLARD DUCK" in the file data.yaml. The four numbers that follow are the center position of the object and its width and height, each normalized to a value between 0 and 1 by dividing by the width and height of the entire image. In the example below, the single detected bird spans almost the entire image, so the center position is close to (0.5, 0.5) and the width and height are close to (1.0, 1.0).

2 0.5347467660903931 0.5482243299484253 0.9086593389511108 0.8864112496376038
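As a sketch of how such a label line is produced, the following hypothetical helper converts a pixel-space bounding box into a normalized Ultralytics YOLO label line (the actual conversion script may differ in detail):

```python
def yolo_label_line(class_id: int, box_px: tuple, img_w: int, img_h: int) -> str:
    """Format one Ultralytics YOLO label line from a pixel-space box.

    box_px is (x_min, y_min, x_max, y_max) in pixels.  The output holds the
    box center, width, and height, each divided by the image size so that
    every value lies between 0 and 1.
    """
    x_min, y_min, x_max, y_max = box_px
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center} {y_center} {width} {height}"
```

For a 224 x 224 image with a box covering the whole frame, yolo_label_line(2, (0, 0, 224, 224), 224, 224) returns "2 0.5 0.5 1.0 1.0".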
3. Training using the generated Ultralytics YOLO format dataset
3.1. Training command

The following command in Ultralytics YOLO was executed to train the object detection network. The data parameter is the data.yaml file of the generated Ultralytics YOLO format dataset.

The image size for the BIRDS 525 dataset is 224 x 224 pixels, and training with the larger default image size of 640 did not progress. Therefore, the image size is passed as a parameter, such as imgsz=224.

The yolov8n.pt in the model parameter is the pretrained network weights. I chose yolov8n, which has a relatively small network size, because I would run the training many times with different settings.

epochs is the number of epochs.

yolo train model=yolov8n.pt data=/mnt/c/dev/data/kaggle/birds525/ultralytics_yolo_format_limited_for_webpage/data.yaml epochs=100 imgsz=224
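The same training can also be started from the Ultralytics Python API. The sketch below mirrors the CLI command above; train_birds and TRAIN_ARGS are my own names, and the import is deferred so the settings can be read without ultralytics installed.

```python
# Training settings equivalent to the CLI command above
TRAIN_ARGS = dict(
    data="/mnt/c/dev/data/kaggle/birds525/ultralytics_yolo_format_limited_for_webpage/data.yaml",
    epochs=100,  # number of training epochs
    imgsz=224,   # BIRDS 525 images are 224 x 224; the default 640 did not train well
)

def train_birds():
    """Run the same training as `yolo train model=yolov8n.pt ...` from Python."""
    from ultralytics import YOLO  # deferred so this file can be read without ultralytics
    model = YOLO("yolov8n.pt")    # pretrained yolov8n weights as the starting point
    return model.train(**TRAIN_ARGS)
```

Calling train_birds() writes its results under ./runs/detect/train*/ just like the CLI command.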
3.2. Training result

The training results are stored in ./runs/detect/train/ under the directory where the above command was executed.

If the training command is executed multiple times, train directories are created in sequence with names like the following.

fukagai@ESPRIMOWD2H2:/repos/birds525yolo$ ls -d ./runs/detect/train*
./runs/detect/train    ./runs/detect/train13  ./runs/detect/train17  ./runs/detect/train4  ./runs/detect/train8
./runs/detect/train10  ./runs/detect/train14  ./runs/detect/train18  ./runs/detect/train5  ./runs/detect/train9
./runs/detect/train11  ./runs/detect/train15  ./runs/detect/train2   ./runs/detect/train6
./runs/detect/train12  ./runs/detect/train16  ./runs/detect/train3   ./runs/detect/train7

The contents of ./runs/detect/train/ are as follows.

fukagai@ESPRIMOWD2H2:/repos/misc/test_clone_birds525/birds525yolo$ ls ./runs/detect/train/
F1_curve.png  args.yaml                        labels_correlogram.jpg  train_batch1.jpg     train_batch2792.jpg
PR_curve.png  confusion_matrix.png             results.csv             train_batch2.jpg     val_batch0_labels.jpg
P_curve.png   confusion_matrix_normalized.png  results.png             train_batch2790.jpg  val_batch0_pred.jpg
R_curve.png   labels.jpg                       train_batch0.jpg        train_batch2791.jpg  weights

The trained network weights obtained in training are stored under the following name. last.pt is the weight file saved at the end of the training, and best.pt is the weight file of the best performance during the training.

fukagai@ESPRIMOWD2H2:/repos/misc/test_clone_birds525/birds525yolo$ ls runs/detect/train/weights/
best.pt  last.pt
3.3. About image files in the train directory

The following six images are part of the output in the train directory.

  1. confusion_matrix is the recognition result for 20 validation photos of the 4 bird species, five of each species. The image below shows that all 20 images were recognized correctly.
  2. results represents the change in the loss and estimation accuracy metrics as the training progresses. The horizontal axis is the number of epochs, and as the number of epochs increases, the loss decreases and the evaluation index representing estimation accuracy increases. In this training, epochs=100 was used, so the horizontal axis is up to 100.
  3. train_batch0 and train_batch2792 are examples of training images shown with labels. The training image shown in the example below is the result of augmentation applied to the image, which changes the saturation, brightness, translation, scaling, and applies merging of the four images.
  4. val_batch0_labels is an example validation image shown with the labels.
  5. val_batch0_pred is an example of estimation results for a validation image using the trained weights. There is a slight difference in the position of the rectangular area surrounding the bird, but the results in this example are almost identical to val_batch0_labels. When training failed, the estimation results differed significantly from the example in the image below, with val_batch0_pred sometimes displaying no estimation rectangle or text at all.
4. Estimation of bird species and location using trained weight data
4.1. For test images in BIRDS 525 dataset

The following command was used to perform bird type and location estimation on test images from the BIRDS 525 dataset. The model parameter specifies the weight data best.pt obtained as the output of training. The target image is specified by the source parameter.

yolo predict model=./runs/detect/train/weights/best.pt source=/mnt/c/dev/data/kaggle/birds525/archive/test/EUROPEAN\ TURTLE\ DOVE/1.jpg

For training, I used images from the train and valid directories of the BIRDS 525 dataset, but not the test images from the test directory. Using image data not used for training, the trained network was able to obtain the bird type and location estimation results shown in the image below.

The source parameter of the following command, which estimates bird species and location, can also specify a directory. The example command below specifies the directory where the five test images of “MALLARD DUCK” are located.

yolo predict model=./runs/detect/train/weights/best.pt source=/mnt/c/dev/data/kaggle/birds525/archive/test/MALLARD\ DUCK/

If a directory is specified as above, estimation of bird type and location is executed for all images in the directory. The image below shows the result of the execution.

The resulting image of the estimation will be output to the directory ./runs/detect/predict under the directory where the above yolo predict command was executed. The second run of yolo predict will be output to predict2, and the third run will be output to predict3.

fukagai@ESPRIMOWD2H2:/repos/misc/test_clone_birds525/birds525yolo$ ls -d ./runs/detect/predict*
./runs/detect/predict  ./runs/detect/predict2
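The prediction step can likewise be driven from the Ultralytics Python API. The sketch below mirrors the yolo predict commands above; predict_birds is my own name for the helper.

```python
def predict_birds(weights: str, source: str):
    """Run bird type and location estimation, like the `yolo predict` command above."""
    from ultralytics import YOLO  # deferred import, as in the training sketch
    model = YOLO(weights)  # e.g. "./runs/detect/train/weights/best.pt"
    results = model.predict(source=source, save=True)  # annotated images go to runs/detect/predict*
    for r in results:
        for box in r.boxes:
            # class name plus normalized (x_center, y_center, width, height)
            print(r.names[int(box.cls)], box.xywhn.squeeze().tolist())
    return results
```

As with the CLI, source may be a single image or a directory, in which case every image in the directory is processed.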
4.2. For my images containing birds

When using an image in which the bird fills most of the frame, like the BIRDS 525 images used to train the object detection network, the bird type and location were successfully estimated, as shown in the example image below.

However, when an image such as the one below, in which a bird appears in some area of the entire image, is used as input, the estimation of the bird type and location fails.

Here is another example using images I took as input. The estimation of bird type and location failed for these as well. The image on the left shows Eurasian wigeons, which are not in the training data. The image on the right shows mallard ducks.

5. Training for test images other than BIRDS 525
5.1. When the augmentation parameter “scale” is increased

The default value of the augmentation parameter "scale", described on this page, is 0.5.

The scale parameter is described in the comments of the Ultralytics YOLO source code ultralytics/data/augment.py as follows. I set the scale parameter to 0.9, so the training images were scaled between 10% and 190%. I thought that scaling the training images down would make it possible to handle cases where the bird occupies only part of the image, as in my photos.

scale (float): Scaling factor interval, e.g., a scale factor of 0.5 allows a resize between 50%-150%.
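In other words, the resize factor is drawn uniformly from [1 - scale, 1 + scale], so scale=0.9 gives factors between 0.1 and 1.9. A minimal sketch of the sampling (sampled_scale_factor is my own name):

```python
import random

def sampled_scale_factor(scale: float) -> float:
    """Sample a resize factor the way RandomPerspective does:
    uniformly from the interval [1 - scale, 1 + scale]."""
    return random.uniform(1 - scale, 1 + scale)

# With the default scale=0.5 the factor lies in [0.5, 1.5];
# with scale=0.9 it lies in [0.1, 1.9], i.e. 10% to 190%.
```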

The following command was used to run the training.

 yolo train model=yolov8n.pt data=/mnt/c/dev/data/kaggle/birds525/ultralytics_yolo_format_limited_for_webpage/data.yaml epochs=100 imgsz=224 scale=0.9

The following six images are part of the images output to the train directory as in 3.3. above. As indicated by the confusion matrix, all validation images seemed to be recognized correctly even when the scale was changed from the default 0.5 to scale=0.9.

train_batch0 and train_batch2792 contain images that vary in size more than in 3.3. above.

The images below are the results of the estimation using the four images I took as input. The upper left image shows a false positive for “BLUE HERON” in the right area, but “ROCK DOVE” is correctly recognized. As in 4.2. above, the upper right image does not correctly estimate the bird type and location.

The lower right image shows that one of the two "MALLARD DUCK"s was detected in a partial area of the image.

5.2. Change the augmentation to only a conversion that results in a smaller image

The Ultralytics YOLO source code ultralytics/data/augment.py was slightly modified to change the augmentation process to only a transformation that makes the image smaller, and the same training was performed as in 5.1. above. The birds in BIRDS 525 were taken at a size that spanned the entire image, so I thought it might be more effective to limit the transformation to make them smaller, rather than make them larger.

First, I created a new Ubuntu account so that Ultralytics YOLO could be run from the edited source code.

sudo adduser ultralytics-git
sudo gpasswd -a ultralytics-git sudo
su - ultralytics-git

Next, I installed Ultralytics YOLO under my newly created account ultralytics-git by executing the following “installation commands via Git clone” on the Ultralytics YOLO installation instructions page.

I had to run the third command below, "python3 -m pip install --upgrade pip", before the fourth command "pip install -e ." would succeed.

# Clone the ultralytics repository
$ git clone https://github.com/ultralytics/ultralytics

# Navigate to the cloned directory
$ cd ultralytics

# "pip install -e ." failed when the following command was not executed
$ python3 -m pip install --upgrade pip

# Install the package in editable mode for development
$ pip install -e .

Edit /home/ultralytics-git/.bashrc and add the following line at the end.

export PATH=$PATH:~/.local/bin

The following command was executed to reflect the settings in the environment variables.

$ source .bashrc

Then, I rewrote the code in ultralytics/data/augment.py in the directory where I git cloned Ultralytics YOLO as follows.

ultralytics-git@ESPRIMOWD2H2:/repos2/ultralytics$ git diff
diff --git a/ultralytics/data/augment.py b/ultralytics/data/augment.py
index 3a980222..e651a22d 100644
--- a/ultralytics/data/augment.py
+++ b/ultralytics/data/augment.py
@@ -454,7 +454,8 @@ class RandomPerspective:
         R = np.eye(3, dtype=np.float32)
         a = random.uniform(-self.degrees, self.degrees)
         # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations
-        s = random.uniform(1 - self.scale, 1 + self.scale)
+        # s = random.uniform(1 - self.scale, 1 + self.scale)
+        s = random.uniform(1 - self.scale, 1)
         # s = 2 ** random.uniform(-scale, scale)
         R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)

As in 5.1. above, the object detection neural network was trained with the following command.

 yolo train model=yolov8n.pt data=/mnt/c/dev/data/kaggle/birds525/ultralytics_yolo_format_limited_for_webpage/data.yaml epochs=100 imgsz=224 scale=0.9

The following six images are part of the images output to the train directory as in 3.3. and 5.1. above. The confusion matrix for the validation images in the val directory indicates that the images seem to be recognized correctly, except for one case where the background is misidentified as ROCK DOVE.

Comparing train_batch0 and train_batch2792 with the images in 5.1. above, I can confirm that there are training images that have been reduced in size, but not enlarged.

The images below are the results of the estimation using the four images I took as input. In the upper left image, a “ROCK DOVE” is recognized correctly. As in 4.2. and 5.1. above, the bird type and location failed to be estimated in the upper right image.

The lower left image shows several Eurasian wigeons. The trained network recognizes three of them as "MALLARD DUCK". Among the four bird species used for training, the Eurasian wigeon most resembles "MALLARD DUCK", so I consider this a reasonable estimate.

As in 5.1. above, the image at the lower right successfully detects one of the two “MALLARD DUCKs” in a partial area.

The results of bird type and location estimation for my images are a little better than in 5.1. above.

6. Other

I decided not to describe on this blog the changes I made to include my own images in the training and validation data, as outlined in 1. above. I tried several conditions through trial and error, but have not obtained the expected results.

Next time, I am going to train the network using only the images I took, instead of using the BIRDS 525 data.

7. Note

When bird positions can be correctly estimated from the custom images using trained YOLO weight data such as yolov9c, the bird position information can be output to a text file by adding the option save_txt=true.

yolo predict model=yolov9c save_txt=true source=/mnt/c/dev/data/custom_data/

The output text file is shown in the example below. The label number 14 at the beginning of each line is the class id yolov9c assigns to birds; if it is rewritten to the number corresponding to the bird species in the custom training data, the file can be used as a label file for training the bird type estimation network.

ultralytics-git@ESPRIMOWD2H2:/repos2/ultralytics$ cat ./runs/detect/predict43/labels/Duck_DSC00154-min.txt
14 0.649205 0.246654 0.0571026 0.0965638
14 0.629693 0.66835 0.111612 0.08429
14 0.121613 0.915089 0.0674569 0.12507
14 0.328448 0.472258 0.0662099 0.100589
14 0.284406 0.400611 0.0694954 0.0906924
14 0.623863 0.538058 0.0708187 0.0899402
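The relabeling step described above, rewriting the leading class id 14 to the class number of the custom dataset (e.g. 2 for "MALLARD DUCK" in data.yaml), can be sketched as follows; relabel_line and relabel_file are hypothetical names:

```python
def relabel_line(line: str, new_class_id: int, bird_class_id: int = 14) -> str:
    """Replace the leading yolov9c 'bird' class id with the custom dataset's class id."""
    fields = line.split()
    if fields and fields[0] == str(bird_class_id):
        fields[0] = str(new_class_id)
    return " ".join(fields)

def relabel_file(path: str, new_class_id: int) -> None:
    """Rewrite every line of one predicted label file in place."""
    with open(path) as f:
        lines = [relabel_line(line, new_class_id) for line in f]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

Lines with other class ids are left untouched, so stray non-bird detections would not be mislabeled.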
