Performing Image Annotation using Python and OpenCV | by Wei-Meng Lee | Apr, 2023

By Jessie Hobb On Apr 27, 2023

Learn how to create bounding boxes for your images

One of the common tasks in deep learning is object detection, a process in which you locate specific objects in a given image. For example, you could detect the cars in an image and tally how many there are, which might be useful when you need to analyze the traffic flow at a specific junction.

In order to train a deep learning model to detect specific objects, you need to supply your model with a set of training images, with the coordinates of the specific object in the images all mapped out. This process is known as image annotation. Image annotation assigns labels to objects present in an image, with the objects all marked out.

In this article, I will show you how to use Python and OpenCV to annotate your images. You will use your mouse to mark out the object that you are annotating, and the application will draw a bounding rectangle around the object. You can then see the coordinates of the object you have mapped out and optionally save them to a log file.

First, create a text file and name it bounding.py. Then, populate it with the following statements:

import argparse
import cv2

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required = True, help = "Path to image")
args = vars(ap.parse_args())

# load the image
image = cv2.imread(args["image"])

# keep an untouched copy of the image
image_clone = image.copy()

# loop until the 'c' key is pressed
while True:
    # display the image
    cv2.imshow("image", image)

    # wait for a keypress
    key = cv2.waitKey(1)
    if key == ord("c"):
        break

# close all open windows
cv2.destroyAllWindows()

The above Python console application takes in an argument from the console, which is the path of the image to display. Once the image is loaded, you use OpenCV to display it. At the same time, you want to keep a clone of the image so that you can use it later on. To stop the program, press the c key while the image window has focus (or press Ctrl-C in the terminal).
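To see what the argument parsing produces without launching the program from a terminal, here is a minimal sketch. It assumes you pass the arguments as an explicit list, which stands in for what would normally come from the command line; the file name is just the one used in this article.

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to image")

# An explicit list stands in for the command-line arguments
# (normally parse_args() reads them from sys.argv)
args = vars(ap.parse_args(["-i", "Cabs.jpg"]))

# vars() turns the parsed Namespace into a plain dictionary
print(args)  # {'image': 'Cabs.jpg'}
```

This is why the program can later look up the path with args["image"].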

To run the program, go to Terminal and type in the following command:

$ python bounding.py -i Cabs.jpg

The above Cabs.jpg file can be downloaded from https://en.wikipedia.org/wiki/Taxi#/media/File:Cabs.jpg.

The image should now be displayed:

Source: https://en.wikipedia.org/wiki/Taxi#/media/File:Cabs.jpg. Image by Users Omnibus, Uris on en.wikipedia — Uris took this photograph.

We want the user to be able to click on the image using their mouse and then drag across the image to select a particular region of interest (ROI). For this, let’s add two global variables into the program:

import argparse
import cv2

# to store the points for region of interest
roi_pt = []

# to indicate if the left mouse button is depressed
is_button_down = False

roi_pt will store the coordinates of the ROI as a list of two (x, y) points: the point where the left mouse button was pressed, followed by the point where it was released.

You will now define a function named draw_rectangle() to be the handler for mouse events. This function takes in five arguments: event, x, y, flags, and param. We will only be using the first three arguments for this exercise:

def draw_rectangle(event, x, y, flags, param):
    # image is reassigned below, so it must be declared global too
    global roi_pt, is_button_down, image

    if event == cv2.EVENT_MOUSEMOVE and is_button_down:
        # get the original image to paint the new rectangle
        image = image_clone.copy()

        # draw new rectangle
        cv2.rectangle(image, roi_pt[0], (x,y), (0, 255, 0), 2)

    if event == cv2.EVENT_LBUTTONDOWN:
        # record the first point
        roi_pt = [(x, y)]
        is_button_down = True

    # if the left mouse button was released
    elif event == cv2.EVENT_LBUTTONUP:
        roi_pt.append((x, y))     # append the end point

        # ======================
        # print the bounding box
        # ======================
        # in (x1,y1,x2,y2) format
        print(roi_pt)

        # in (x,y,w,h) format
        bbox = (roi_pt[0][0],
                roi_pt[0][1],
                roi_pt[1][0] - roi_pt[0][0],
                roi_pt[1][1] - roi_pt[0][1])
        print(bbox)

        # button has now been released
        is_button_down = False

        # draw the bounding box
        cv2.rectangle(image, roi_pt[0], roi_pt[1], (0, 255, 0), 2)
        cv2.imshow("image", image)

In the above function:

When the left mouse button is depressed (cv2.EVENT_LBUTTONDOWN), you record the first point of the ROI. You then set the is_button_down variable to True so that you can start drawing a rectangle when the user moves the mouse while holding down the left mouse button.
When the user moves the mouse with the left mouse button depressed (cv2.EVENT_MOUSEMOVE and is_button_down), you draw a rectangle on a copy of the original image. You draw on a fresh copy because each mouse movement must also erase the rectangle drawn previously. The easiest way to accomplish this is to discard the previous image and draw the new rectangle on a copy of the untouched original.
When the user finally releases the left mouse button (cv2.EVENT_LBUTTONUP), you append the end point of the ROI to roi_pt. You then print out the bounding box coordinates. For some deep learning packages, the bounding box coordinates are in the format of (x, y, width, height), so I also computed the ROI coordinates in this format.
Finally, you draw the bounding box for the ROI and refresh the window.
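One thing to watch for: the (x, y, width, height) computation in the handler assumes that you drag from the top-left corner to the bottom-right corner. If you drag in any other direction, the width or height comes out negative. Here is a minimal sketch of a helper that normalizes the two points first. The name to_xywh is my own and is not part of OpenCV.

```python
def to_xywh(pt1, pt2):
    """Convert two opposite corners to (x, y, w, h) with a non-negative size."""
    (x1, y1), (x2, y2) = pt1, pt2
    # the top-left corner is the smaller x and the smaller y of the two points
    x, y = min(x1, x2), min(y1, y2)
    return (x, y, abs(x2 - x1), abs(y2 - y1))

# dragging from top-left to bottom-right
print(to_xywh((430, 409), (764, 656)))  # (430, 409, 334, 247)

# dragging the other way gives the same box
print(to_xywh((764, 656), (430, 409)))  # (430, 409, 334, 247)
```

You could call it as to_xywh(roi_pt[0], roi_pt[1]) in the handler in place of the inline tuple.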

To wire up the mouse events to the event handler, add in the following statements:

...
# keep an untouched copy of the image
image_clone = image.copy()

# ======ADD the following======
# setup the mouse click handler
cv2.namedWindow("image")
cv2.setMouseCallback("image", draw_rectangle)
# =============================

# loop until the 'c' key is pressed
while True:
    ...

Run the program one more time and you can now select the ROI from the image and a rectangle will be displayed:

At the same time, the coordinates of the ROI will also be displayed:

[(430, 409), (764, 656)]
(430, 409, 334, 247)
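At this point the coordinates are only printed to the console. If you want to keep them, here is a minimal sketch of one way to append each bounding box to a CSV file using Python's csv module. The function name save_bbox and the column names are my own assumptions, so adapt them to whatever your training pipeline expects.

```python
import csv
import os

def save_bbox(csv_path, image_name, bbox):
    """Append one (x, y, w, h) box to a CSV file, adding a header if the file is new."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            # hypothetical column names, chosen for illustration only
            writer.writerow(["image", "x", "y", "w", "h"])
        writer.writerow([image_name, *bbox])
```

For example, calling save_bbox("annotations.csv", args["image"], bbox) right after print(bbox) in the handler would add one row per rectangle you draw.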

For your convenience, here is the complete Python program:

import argparse
import cv2

# to store the points for region of interest
roi_pt = []

# to indicate if the left mouse button is depressed
is_button_down = False

def draw_rectangle(event, x, y, flags, param):
    # image is reassigned below, so it must be declared global too
    global roi_pt, is_button_down, image

    if event == cv2.EVENT_MOUSEMOVE and is_button_down:
        # get the original image to paint the new rectangle
        image = image_clone.copy()

        # draw new rectangle
        cv2.rectangle(image, roi_pt[0], (x,y), (0, 255, 0), 2)

    if event == cv2.EVENT_LBUTTONDOWN:
        # record the first point
        roi_pt = [(x, y)]
        is_button_down = True

    # if the left mouse button was released
    elif event == cv2.EVENT_LBUTTONUP:
        roi_pt.append((x, y))     # append the end point

        # ======================
        # print the bounding box
        # ======================
        # in (x1,y1,x2,y2) format
        print(roi_pt)

        # in (x,y,w,h) format
        bbox = (roi_pt[0][0],
                roi_pt[0][1],
                roi_pt[1][0] - roi_pt[0][0],
                roi_pt[1][1] - roi_pt[0][1])
        print(bbox)

        # button has now been released
        is_button_down = False

        # draw the bounding box
        cv2.rectangle(image, roi_pt[0], roi_pt[1], (0, 255, 0), 2)
        cv2.imshow("image", image)

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required = True, help = "Path to image")
args = vars(ap.parse_args())

# load the image
image = cv2.imread(args["image"])

# keep an untouched copy of the image
image_clone = image.copy()

# setup the mouse click handler
cv2.namedWindow("image")
cv2.setMouseCallback("image", draw_rectangle)

# loop until the 'c' key is pressed
while True:
    # display the image
    cv2.imshow("image", image)

    # wait for a keypress
    key = cv2.waitKey(1)
    if key == ord("c"):
        break

# close all open windows
cv2.destroyAllWindows()

In this short article, I demonstrated how you can annotate an image by selecting the object in an image. Of course, once the coordinates of the object have been mapped out, you need to store them in an external file (such as a JSON or CSV file). I will leave this as an exercise for the reader. Let me know if this is useful, or what annotation tools you use in your daily work.