
Detectron2 Starter Guide for Researchers | by Faruk Cankaya | Oct, 2022



Common steps for starting a project on top of detectron2, the state-of-the-art detection and segmentation framework

Photo by geralt on Pixabay

Detectron2 is one of the most powerful deep learning toolboxes for visual recognition. It is designed to be as flexible as possible, so you can easily switch between tasks such as Object Detection, Instance Segmentation, Person Keypoint Detection, and Panoptic Segmentation. It has built-in support for popular datasets (COCO, Cityscapes, LVIS, PascalVOC) and many backbone combinations for Faster/Mask R-CNN (ResNet + FPN, C4, Dilated-C5). It also provides ready-to-use baselines with pre-trained weights [1].

I was looking for a toolbox that covers both 2D Instance Segmentation and Object Detection in a single codebase. At first glance, detectron2 seemed the most preferable option compared to its competitors in terms of speed, flexibility, and the simplicity of the provided API. Still, some features, like early stopping and validation loss, are not provided out of the box, presumably for the sake of flexibility. It also has no explicit notion of an epoch, but you can implement one yourself.

For a deep learning project, we need a data loader to load data from files into memory in an appropriate format, a training loop to iterate over the loaded dataset in batches, a model to be trained, an evaluator called periodically during training to test the model's performance, and a logger to periodically save intermediate results and metrics.

Detectron2 provides a hook system for tasks that are called periodically during training, an abstraction for dataset registration, and a flexible configuration file in which you can customize almost every part of the built-in modules. I suggest looking at detectron2/projects to see how these pieces can be customized for different tasks. In this section, I'll explain the default implementations and abstractions that Detectron2 provides to bring all these needs together, so that you can build your own project on top of it.

Data Loading

Using Builtin Datasets and Using Custom Datasets show very well how to set up built-in datasets and register new ones. There are also a couple of great articles about setting up a custom dataset in detectron2 [2, 3]. Here, I'll explain how it works by referring to the codebase.

Detectron2 has two global dictionaries: DatasetCatalog, for loading raw data into memory and storing it in a predefined format (such as masks as bitmaps or polygons), and MetadataCatalog, for storing dataset metadata such as label ids, label names, and label colors. For example, the registration process for ADEChallengeData2016 looks like this:

DatasetCatalog.register(
    name,
    lambda x=image_dir, y=gt_dir: load_sem_seg(y, x, gt_ext="png", image_ext="jpg"),
)
MetadataCatalog.get(name).set(..., evaluator_type="sem_seg")

All built-in datasets are registered this way, with predefined dataset names, in builtin.py. A built-in dataset can be selected in the configuration file by its predefined name, e.g., coco_2017_train. You can register a new dataset by using your own name and loader method in place of name and load_sem_seg in the example above.

It is important to wrap the loader method in a lambda function: the data is not loaded when DatasetCatalog.register() is called; only the lambda is saved. The data is loaded only when you call DatasetCatalog.get(name), which invokes the loader function you provided.

Once we have registered the dataset, accessing the data is easy:
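For example, a minimal sketch (my_dataset is a hypothetical placeholder for whatever name you registered):

from detectron2.data import DatasetCatalog, MetadataCatalog

# get() invokes the loader lambda registered earlier and returns the dataset
# as a list[dict] in detectron2's standard format.
dataset_dicts = DatasetCatalog.get("my_dataset")
metadata = MetadataCatalog.get("my_dataset")
print(len(dataset_dicts), metadata.evaluator_type)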

Hook System and Trainers

Detectron2 provides DefaultTrainer, which sets up the default logic for a standard training workflow. It has the methods depicted in the image below, which you can extend according to your needs.

Figure 1: Detectron2’s DefaultTrainer and Hook Functions (by Author)

Basically, the first five methods are called when a DefaultTrainer is instantiated. Once the train method is called, it starts a training loop that runs until the iteration count reaches cfg.SOLVER.MAX_ITER, which is defined in the config file. The test method loads the validation dataset and runs the evaluation script periodically during training.

A good part of DefaultTrainer is its hook system. Hooks are simple Python classes that can implement four methods (before_train, after_train, before_step, after_step). In the training loop, the corresponding method of every registered hook is called. For instance, EvalHook is registered in the 5th step during the initiation of the trainer. It implements only the after_step and after_train methods, in which it calls the test method depending on the iteration count, so we can run an evaluation on the validation dataset every N iterations, where N is configurable. In this way, any custom feature can be implemented to be called during training. If you look at the build_hooks method, many other tasks, such as saving models periodically, learning rate scheduling, and tensorboard logging, are handled by hooks. If you like this abstraction, you can go with train_net.py. Otherwise, check plain_train_net.py to see how to implement a custom training loop using the existing methods.
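For illustration, here is a minimal custom hook sketch (my own example, not part of detectron2) that prints the latest total loss every N iterations, following the same after_step pattern as EvalHook:

from detectron2.engine import HookBase

class PrintLossHook(HookBase):
    # Minimal custom hook sketch: report the latest total loss periodically.
    def __init__(self, period=100):
        self._period = period

    def after_step(self):
        if (self.trainer.iter + 1) % self._period == 0:
            # EventStorage.latest() maps metric names to (value, iteration) pairs.
            latest = self.trainer.storage.latest()
            if "total_loss" in latest:
                value, _ = latest["total_loss"]
                print(f"iter {self.trainer.iter}: total_loss={value:.4f}")

You would register it next to the built-in hooks with trainer.register_hooks([PrintLossHook(period=100)]) after building the trainer.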

Model

Detectron2 provides a meta-architecture with 3 major blocks that can be generalized for various visual recognition tasks.

Figure 2: Detectron2's flexible Mask R-CNN architecture (Illustration by Author, horse image from COCO 2017)

In the image above, I show the default implementation choice for each block, but every block can be changed through the configuration file. Like datasets, models are also wired up through a registration system. For example, ResNet is implemented in a method named build_resnet_backbone, which is registered with the @BACKBONE_REGISTRY.register() decorator in backbone/resnet.py. You can find all available backbones by searching for @BACKBONE_REGISTRY.register() in detectron2. Finally, you can tell detectron2 which backbone to use via cfg.MODEL.BACKBONE.NAME = "build_resnet_backbone" in the config file. Here are the currently available implementations:

Table by Author

If you need to implement a completely new architecture, you can register your models as described here and then tell detectron2 to use your custom models through the configuration file.
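For instance, a toy sketch following the registration pattern from detectron2's documentation (the class and its single layer are illustrative, not a useful backbone):

import torch.nn as nn
from detectron2.modeling import BACKBONE_REGISTRY, Backbone, ShapeSpec

@BACKBONE_REGISTRY.register()
class ToyBackbone(Backbone):
    # Illustrative backbone: one conv layer that downsamples by 16x.
    def __init__(self, cfg, input_shape):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=16, stride=16)

    def forward(self, image):
        return {"conv1": self.conv1(image)}

    def output_shape(self):
        return {"conv1": ShapeSpec(channels=64, stride=16)}

After registration, you select it with cfg.MODEL.BACKBONE.NAME = "ToyBackbone" in the config.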

Even if you only want to make small changes to existing models, I would suggest registering a new model that extends them. That way, you keep all the files required for your work decoupled from detectron2, so you don't need to diff detectron2 to see which parts you wrote.

In detectron2/tools, there are 4 training scripts available:

  • train_net.py: A fast way to start training/inference. It implements a Trainer that encapsulates the training loop and handles tasks such as evaluation, logging, and periodically saving model weights via the hook system.
  • lazyconfig_train_net.py: The same as train_net.py, but it loads configuration from a Python file (a LazyConfig) instead of a YAML file. I haven't seen an explicit announcement, but LazyConfig seems to be the preferred choice, since new baseline configs are shared in that format.
  • plain_train_net.py: It has neither a default Trainer nor hooks, but it gives you an explicit training loop. You therefore don't need to invest time in understanding the detectron2 API and can customize the training loop directly.
  • lightning_train_net.py: It doesn't use detectron2's Trainer but the Trainer mechanism that PyTorch Lightning provides. If you are familiar with PyTorch Lightning, this would be a quick start for you.

plain_train_net.py looks a bit messy, but it lets you implement custom logic easily; if you plan to try many different scenarios, it is the better option. When your work is ready to be shared, you may want to consider publishing the code with train_net.py, since it is easier to understand.

Whatever trainer you choose, I suggest working in a separate project directory and installing detectron2 as a library. Customizing files inside detectron2 itself costs extra work later; for example, if you train in a container, you will have to rebuild the Docker image every time you change the code. You can check a couple of great examples in detectron2/projects: the popular image segmentation model PointRend [4], for instance, is implemented on top of detectron2 with very few lines of code. A starter project may look like this:

- root_dir
  - configs                  # config yaml/py files for different settings
  - project_package
    - config.py              # your project-related configs
    - data
      - dataset_mapper.py    # to manipulate images and GT annotations
      - dataset.py           # to manipulate data loading and evaluation
    - modelling              # your custom model implementations (layers, losses, etc.)
    - tools                  # utility scripts for pre/post-processing data
  - plain_train_net.py

Iterations vs. Epochs

An iteration is one pass over a single batch; an epoch is one pass over the entire training dataset. There is no concept of an epoch in detectron2 [5]. The whole system is set up in terms of iterations, also called 'steps' [6]. That means all metrics, such as training loss, mAP, and accuracy, are logged by iteration number, and they are shown on tensorboard by iteration number as well.

Therefore, you have to be careful when comparing runs with different batch sizes on tensorboard. If the batch sizes differ, the runs will not have seen the same number of training samples at the same iteration (step) number: at iteration 100, a run with batch size 16 has seen 1,600 samples, while a run with batch size 64 has seen 6,400.

In detectron2, the batch size is set with cfg.SOLVER.IMS_PER_BATCH, i.e., the number of training samples per iteration (step). If you run detectron2 on multiple GPUs, the samples are distributed evenly, with cfg.SOLVER.IMS_PER_BATCH / #GPUs images per GPU. To be concrete, with 16 GPUs and IMS_PER_BATCH = 32, each GPU sees 2 images per batch [7].
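If you want epoch-like behavior anyway, the conversion is straightforward; a small sketch (the dataset size and epoch count are illustrative):

from detectron2.config import get_cfg

cfg = get_cfg()
dataset_size = 118_287   # e.g., number of images in COCO 2017 train
num_epochs = 12          # hypothetical training budget
cfg.SOLVER.IMS_PER_BATCH = 16
# The solver only knows iterations, so derive MAX_ITER from the epoch budget.
iters_per_epoch = dataset_size // cfg.SOLVER.IMS_PER_BATCH
cfg.SOLVER.MAX_ITER = num_epochs * iters_per_epoch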

Validation Loss

Surprisingly, detectron2 doesn't provide built-in validation loss calculation; this has been discussed in issue #810. I have used two solutions posted under that discussion. If you are using train_net.py, which has the hook system, ortegatron's LossEvalHook implementation works. You only need to add this hook to your codebase and register it like the other hooks.

If you use plain_train_net.py, you can calculate validation loss by taking inspiration from mnslarcher's suggestion (a sketch follows this list):

  • Prepare validation dataset loader
  • Implement validation loss calculator
  • Integrate it with the training loop
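Putting the three steps together, a minimal sketch of the idea (my paraphrase, not mnslarcher's exact code):

import torch
from detectron2.data import DatasetMapper, build_detection_test_loader

def compute_validation_loss(cfg, model):
    # Step 1: a validation loader; is_train=True keeps the GT annotations
    # the model needs in order to return losses.
    val_loader = build_detection_test_loader(
        cfg, cfg.DATASETS.TEST[0], mapper=DatasetMapper(cfg, is_train=True)
    )
    # Step 2: detectron2 models return a loss dict only in training mode,
    # so switch to train() but disable gradient computation.
    was_training = model.training
    model.train()
    total, batches = 0.0, 0
    with torch.no_grad():
        for inputs in val_loader:
            loss_dict = model(inputs)
            total += sum(loss_dict.values()).item()
            batches += 1
    model.train(was_training)
    return total / max(batches, 1)

Step 3 is then a matter of calling compute_validation_loss from the training loop, e.g., every cfg.TEST.EVAL_PERIOD iterations, and logging the result.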

Early Stopping

Early stopping automatically stops training when a chosen metric has stopped improving. This lets you stop training when it converges (in other words, when it doesn't improve for a couple of epochs). Implementing early stopping in the hook system is tricky because hooks work asynchronously, but it can be done by overriding EvalHook and DefaultTrainer, as @ahsennazir implemented here. I'll share my simple implementation for plain_train_net.py here:
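In outline, it looks like this (a minimal sketch of the approach, not the full implementation):

class EarlyStopper:
    # Minimal sketch: signal a stop when the metric hasn't improved
    # by at least min_delta for `patience` consecutive evaluations.
    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_evals = 0

    def should_stop(self, metric):
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

In plain_train_net.py's loop, call should_stop with your evaluation metric (e.g., an AP value returned by do_test) after each periodic evaluation and break out of the loop when it returns True.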

Running on Docker

Detectron2 provides a Dockerfile out of the box for running in a container. However, it currently fails due to a version mismatch introduced by recent updates, as reported in #4335 and #4394. Until these are resolved, you can use the Python 3.7 upgraded version of the Dockerfile or the Ubuntu 20.04 upgraded version. The detectron2 docker image can be built without a GPU with the command below. If you want to run it on your own machine, you can follow this article.

docker build --build-arg USER_ID=$UID -t detectron2:v0 .

I suggest not adding your project code to this image, so you don't have to wait for the entire image to rebuild after a small code change. Use it as an environment or operating system instead. For example, I can mount a data source into containers in the cluster using MapR; simply put, MapR allows mounting a cloud directory into other cloud services, such as docker containers, or into your local workstation. I copy my project files into that mounted space with rsync. The directory I push my code to behaves like a regular directory on my computer and is also accessible from the docker containers. When I test new code, all I need to do is copy my files to that directory and start training on the cluster with the detectron2 image I built once. I rebuild the detectron2 image only when I need additional dependencies such as cityscapesScripts or open3d.

This is a good option for fast prototyping. You can also push your code as a docker image to run in the cluster. In most scenarios, the docker image is built automatically by continuous deployment tools when you push your code to a version control system, e.g., Git. In this case, a two-image build helps you build fast and keep the image size down: one image for detectron2 and one for your project. While the detectron2 image is built once, your project's Dockerfile can use it as a base image via FROM detectron2:v0. When you need additional dependencies, rebuild the detectron2 image and use the newer version, e.g., FROM detectron2:v1.
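A sketch of the project-stage Dockerfile under this scheme (the paths and the requirements file are hypothetical):

# Reuse the heavy detectron2 image built once earlier.
FROM detectron2:v0
# Only project code and project-specific dependencies live in this image.
COPY . /workspace/project
WORKDIR /workspace/project
RUN pip install -r requirements.txt
ENTRYPOINT ["python", "plain_train_net.py"]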

Speeding Up Training

Detectron2 reportedly processes up to 62 images per second [8], but throughput depends heavily on image resolution, the number of GPUs, and many other configurations. In this section, I'll list the configuration options you can change to gain speed. Then I'll give more advanced tips in the 'Choosing Small Subset of Validation/Train Set Randomly' and 'Doing Parallel Evaluation in Different Pods' subsections.

  • Image resolution: Downscaling the image resolution yields a large speedup. You can resize images using the ResizeShortestEdge or ResizeScale augmentations.
  • Batch size: A bigger batch size gives more stable gradients and processes more images per iteration. Unless it is a critical parameter that must stay fixed for your work, use the largest batch size that fits on your GPU.
  • Logging/Visualization periods: In detectron2, all logs are added to the EventStorage and written to the file system only every 20 iterations by default. This number is hardcoded in plain_train_net.py; for train_net.py it can be changed via the period parameter of the PeriodicWriter hook. Detectron2 can also add predicted images/masks to tensorboard during training, configured by cfg.VIS_PERIOD. You can gain speed by visualizing less frequently, and you can consider not visualizing classes you don't need [9]. Since the entire logging/visualization process runs on the CPU, it steals time from the allocated GPU, so choose reasonable logging and visualization periods.
  • Multiple workers: The number of data-loading workers is set by cfg.DATALOADER.NUM_WORKERS. If you have enough memory, you can speed up data loading by increasing this number; generally, 4 workers are used per GPU. More workers mean more memory usage. (The config keys for these options are sketched after this list.)
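For reference, the config keys behind these options, with purely illustrative values (the plain_train_net.py logging period is hardcoded, as noted above):

from detectron2.config import get_cfg

cfg = get_cfg()
cfg.INPUT.MIN_SIZE_TRAIN = (640,)  # shorter-edge size for ResizeShortestEdge during training
cfg.SOLVER.IMS_PER_BATCH = 16      # total batch size per iteration
cfg.VIS_PERIOD = 1000              # visualize predictions every 1000 iterations (0 disables)
cfg.DATALOADER.NUM_WORKERS = 4     # data-loading workers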

Choosing Small Subset of Validation/Train Set Randomly

Even if you gain some speed by changing the parameters above, it may not be enough for fast development. For debugging, we want the data to load very quickly, which is impossible when the dataset is big. The first idea that comes to mind is to use a very small subset of the dataset. However, while some datasets provide image/GT paths in txt files, others load data directly from directories, so dividing the dataset into subsets manually is inefficient. Also, if you want quick feedback but still want to measure your model's performance on the entire dataset, training on a statically sampled small subset can be misleading.

For training, you can use RandomSubsetTrainingSampler, which comes built-in with detectron2 and loads only a fraction of the dataset for training. You can change the config file like this:

DATALOADER:
  SAMPLER_TRAIN: "RandomSubsetTrainingSampler"
  RANDOM_SUBSET_RATIO: 0.1

or you can simply pass a sampler to the train loader:

from detectron2.data import build_detection_train_loader
from detectron2.data.samplers import RandomSubsetTrainingSampler

subset_sampler = RandomSubsetTrainingSampler(len(dataset), 0.1)  # keep a random 10%
data_loader = build_detection_train_loader(cfg, sampler=subset_sampler)

For validation, you can similarly pass a subset sampler to build_detection_test_loader in the do_test method of plain_train_net.py or the build_test_loader method of DefaultTrainer.

Doing Parallel Evaluation in Different Pods

You can set the evaluation period via cfg.TEST.EVAL_PERIOD. Basically, detectron2 loads the validation dataset registered in cfg.DATASETS.TEST and runs inference on the entire validation set.

It is common practice to use a batch size of 1 for evaluation, and for most public datasets evaluation consists of writing performance metrics to files, so the evaluation process is CPU-intensive due to I/O operations. My recent project, which uses the KITTI-360 dataset, takes 1.5 hours to evaluate 12,276 samples for instance segmentation. Decoupling evaluation from training removes this work from the training process so the GPU can be fully utilized; running evaluations in parallel with training also lets us detect convergence earlier. Here are the steps I followed to set up parallel evaluation:

  1. Save model weights periodically using the DetectionCheckpointer already defined in the trainers. Concretely, it creates weight files named model-{storage.iter}; e.g., at iteration 1000 it creates model-1000.pth in the log directory.
  2. Start an evaluation for the created weights in a new pod, e.g.:
    ./plain_train_net.py --config-file configs/mask_rcnn_R_50_FPN_1x.yaml --eval-only MODEL.WEIGHTS /tensorboard-logs/trainingx/model-1000.pth OUTPUT_DIR /eval-logs/trainingx-model-1000
    Once the evaluation completes, the evaluation metrics.json and a tensorboard event file are in the /eval-logs/trainingx-model-1000 directory.
  3. Collect the evaluation results from the different evaluation directories (/eval-logs/trainingx-model-N) and move them into the main log directory /tensorboard-logs/trainingx. Tensorboard creates one event file per run, named like events.out.tfevents.1664670922.container-name-trainingx-l27mc.1.0. During the move, I rename it to events.out.tfevents.0001000.container-name-trainingx-model1000-l27mc.1.0 and rename the metrics file to metrics_0001000.json. Tensorboard reads event files in name order, so the evaluation results imported from /eval-logs/trainingx-model-1000 become visible on the tensorboard of the training directory /tensorboard-logs/trainingx without any extra effort.
  4. (Optionally) You can detect convergence, similar to the early stopping feature discussed above, and stop training automatically. To this end, extend step 1 so that every iteration checks whether a file named 'training_completed.txt' exists in the log directory (/tensorboard-logs/trainingx); once the training script detects this file, it breaks the training loop. Then, in step 3, after moving the evaluation metrics, check whether the latest metrics improve on the previous best. If there is no improvement for a certain number of evaluations, create the 'training_completed.txt' file so the training loop terminates, as sketched below.
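A minimal sketch of that file-based stop signal (the directory follows the example above):

import os

STOP_FILE = "/tensorboard-logs/trainingx/training_completed.txt"

def should_stop_training():
    # The evaluation pod creates this file once the metric stops improving;
    # the training loop checks for it every iteration and breaks when found.
    return os.path.exists(STOP_FILE)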

Detectron2 provides TensorboardXWriter to write metrics to tensorboard. It basically takes the metrics to be written from EventStorage and sends them to tensorboard. You can add logs via the put_image and put_scalar methods of EventStorage; TensorboardXWriter periodically checks for logs added to EventStorage and sends them to tensorboard using its add_image and add_scalar methods.
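For example, a minimal logging sketch from inside the training loop (the metric name is arbitrary):

from detectron2.utils.events import get_event_storage

# Must run inside an active EventStorage context, e.g. within the training loop.
storage = get_event_storage()
storage.put_scalar("validation/total_loss", 0.42)
# TensorboardXWriter later picks this up and forwards it via add_scalar.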

