Table Of Contents
- Description
- How does this sample work?
- Preparing sample data
- Running the sample
- Additional resources
- License
- Changelog
- Known issues
This sample, sampleFasterRCNN, uses TensorRT plugins, performs inference, and implements a fused custom layer for end-to-end inferencing of a Faster R-CNN model. Specifically, this sample demonstrates the implementation of a Faster R-CNN network in TensorRT, performs a quick performance test in TensorRT, implements a fused custom layer, and constructs the basis for further optimization, for example using INT8 calibration or a user-trained network. The Faster R-CNN network is based on the paper Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.
Faster R-CNN is a fusion of Fast R-CNN and RPN (Region Proposal Network). The latter is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. It can be merged with Fast R-CNN into a single network because it is trained end-to-end along with the Fast R-CNN detection network and thus shares with it the full-image convolutional features, enabling nearly cost-free region proposals. These region proposals will then be used by Fast R-CNN for detection.
Faster R-CNN is faster and more accurate than its predecessors (R-CNN, Fast R-CNN) because it allows end-to-end inference and does not need a standalone region proposal algorithm (such as selective search in Fast R-CNN) or a separate classification method (such as the SVM in R-CNN).
Specifically, this sample performs the following steps:
- Preprocessing the input
- Defining the network
- Building the engine
- Running the engine
- Verifying the output
The sampleFasterRCNN sample uses a plugin from the TensorRT plugin library to include a fused implementation of Faster R-CNN’s Region Proposal Network (RPN) and ROIPooling layers. These particular layers are from the Faster R-CNN paper and are implemented together as a single plugin called RPNROIPlugin. This plugin is registered in the TensorRT Plugin Registry with the name RPROI_TRT.
Faster R-CNN takes 3-channel 375x500 images as input. Since TensorRT does not depend on any computer vision libraries, the images are represented as binary R, G, and B values for each pixel. The format is Portable PixMap (PPM), which is a netpbm color image format. In this format, the R, G, and B values for each pixel are usually each represented by one byte (an integer from 0 to 255), and they are stored together, pixel by pixel.
However, the authors of Faster R-CNN have trained the network such that the first Convolution layer sees the image data in B, G, and R order. Therefore, you need to reverse the order when the PPM images are being put into the network input buffer.
    float* data = new float[N*INPUT_C*INPUT_H*INPUT_W];
    // pixel mean used by the Faster R-CNN's author
    float pixelMean[3]{ 102.9801f, 115.9465f, 122.7717f }; // also in BGR order
    for (int i = 0, volImg = INPUT_C*INPUT_H*INPUT_W; i < N; ++i)
    {
        for (int c = 0; c < INPUT_C; ++c)
        {
            // the color image to input should be in BGR order
            for (unsigned j = 0, volChl = INPUT_H*INPUT_W; j < volChl; ++j)
            {
                data[i*volImg + c*volChl + j] = float(ppms[i].buffer[j*INPUT_C + 2 - c]) - pixelMean[c];
            }
        }
    }
There is a simple PPM reading function called readPPMFile.
Note: The readPPMFile function will not work correctly if the header of the PPM image contains any annotations starting with #.
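If your PPM files do contain `#` annotations, one option is to parse the header token by token and discard comment lines. The following is a minimal sketch of that idea, not the sample's readPPMFile; the names PPMHeader, nextHeaderToken, and parsePPMHeader are hypothetical, and it only handles comments that begin a token.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical type (not part of the sample): header fields of a binary PPM (P6) file.
struct PPMHeader
{
    std::string magic;
    int width{0};
    int height{0};
    int maxVal{0};
};

// Returns the next whitespace-delimited header token, skipping any
// "# ..." comment, which runs to the end of its line.
inline std::string nextHeaderToken(std::istream& in)
{
    std::string token;
    while (in >> token)
    {
        if (token[0] != '#')
        {
            return token;
        }
        std::string restOfLine;
        std::getline(in, restOfLine); // discard the remainder of the comment line
    }
    throw std::runtime_error("unexpected end of PPM header");
}

// Parses magic number, width, height, and maximum value, then leaves the
// stream positioned at the first byte of pixel data.
inline PPMHeader parsePPMHeader(std::istream& in)
{
    PPMHeader h;
    h.magic = nextHeaderToken(in);
    h.width = std::stoi(nextHeaderToken(in));
    h.height = std::stoi(nextHeaderToken(in));
    h.maxVal = std::stoi(nextHeaderToken(in));
    in.get(); // consume the single whitespace byte that precedes the pixel data
    return h;
}
```

The stream passed in would typically be a std::ifstream opened in binary mode; the pixel bytes can then be read directly after the header is parsed.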
The sample also contains a function called writePPMFileWithBBox, which draws a given bounding box on the image using one-pixel-wide red lines.
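To illustrate what drawing a one-pixel red outline involves, here is a rough sketch that operates on an interleaved RGB byte buffer such as the one a PPM file holds. It is not the sample's writePPMFileWithBBox; the function name drawRedBox and its parameters are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: outlines the box (x1, y1)-(x2, y2), inclusive, in red
// on an interleaved RGB image of the given width and height.
inline void drawRedBox(std::vector<std::uint8_t>& rgb, int width, int height,
                       int x1, int y1, int x2, int y2)
{
    // Paints one pixel red; coordinates outside the image are ignored.
    auto setRed = [&](int x, int y) {
        if (x < 0 || x >= width || y < 0 || y >= height)
        {
            return;
        }
        const std::size_t idx = (static_cast<std::size_t>(y) * width + x) * 3;
        rgb[idx + 0] = 255; // R
        rgb[idx + 1] = 0;   // G
        rgb[idx + 2] = 0;   // B
    };

    for (int x = x1; x <= x2; ++x) // top and bottom edges
    {
        setRed(x, y1);
        setRed(x, y2);
    }
    for (int y = y1; y <= y2; ++y) // left and right edges
    {
        setRed(x1, y);
        setRed(x2, y);
    }
}
```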
To obtain PPM images, you can use command-line tools such as ImageMagick to resize and convert JPEG images.
If you choose to use off-the-shelf image processing libraries to preprocess the inputs, ensure that the TensorRT inference engine sees the input data in the form that it is supposed to.
The network is defined in a prototxt file which is shipped with the sample and located in the data/faster-rcnn directory. The prototxt file is very similar to the one used by the inventors of Faster R-CNN, except that the RPN and the ROI pooling layers are fused and replaced by a custom layer named RPROIFused.
This sample uses the plugin registry to add the plugin to the network. The Caffe parser adds the plugin object to the network based on the layer name as specified in the Caffe prototxt file, for example, RPROI.
To build the TensorRT engine, see Building An Engine In C++.
Note: In the Faster R-CNN sample, maxWorkspaceSize is set to 10 * (2^20) bytes, that is 10 MiB, because the plugin layer needs roughly 6 MiB of scratch space for batch size 5.
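When writing this value in C++, keep in mind that `^` is the bitwise XOR operator, not exponentiation. A small sketch of how the 10 MiB figure could be expressed is shown below; the constant names kMiB and kWorkspaceBytes are illustrative and do not come from the sample.

```cpp
#include <cstddef>

// In C++, 2^20 is a bitwise XOR and evaluates to 22, so a shift is used
// to express 2 to the power of 20.
constexpr std::size_t kMiB = std::size_t{1} << 20;

// 10 MiB of workspace, matching the value quoted in the note above.
constexpr std::size_t kWorkspaceBytes = 10 * kMiB;

static_assert(kWorkspaceBytes == 10485760, "10 MiB is 10,485,760 bytes");
```

The resulting constant would then be passed to whichever workspace-size setter your TensorRT version provides.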
After the engine is built, the next steps are to serialize the engine, then run the inference with the deserialized engine. For more information, see Serializing A Model In C++.
To deserialize the engine, see Performing Inference In C++.
In sampleFasterRCNN.cpp, there are two inputs to the inference function:
- data is the image input.
- imInfo is the image information array which stores the number of rows, columns, and the scale for each image in a batch.

and four outputs:

- bbox_pred is the predicted offsets to the heights, widths and center coordinates.
- cls_prob is the probability associated with each object class of every bounding box.
- rois is the height, width, and the center coordinates for each bounding box.
- count is deprecated and can be ignored.

Note: The count output was used to specify the number of resulting NMS bounding boxes if the output is not aligned to nmsMaxOut. Although it is deprecated, always allocate the engine buffer of size batchSize * sizeof(int) for it until it is completely removed from a future version of TensorRT.
The outputs of the Faster R-CNN network need to be post-processed in order to obtain human interpretable results.
First, because the bounding boxes are now represented by offsets to the center, height, and width, they need to be unscaled back to the raw image space by dividing by the scale defined in imInfo (image info).
Ensure you apply the inverse transformation on the bounding boxes and clip the resulting coordinates so that they do not go beyond the image boundaries.
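The unscale, inverse-transform, and clip steps can be sketched as follows. This is an illustration under stated assumptions rather than the sample's exact code: it assumes each ROI is available as corner coordinates in the scaled input space, that the offsets follow the usual Faster R-CNN parameterization (dx, dy, dw, dh), and that widths use the inclusive "+1" pixel convention. The types RoiBox and BoxDelta and the function decodeAndClip are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical types for this sketch.
struct RoiBox   { float x1, y1, x2, y2; }; // corner coordinates
struct BoxDelta { float dx, dy, dw, dh; }; // predicted offsets

// Maps one ROI plus its predicted offsets to a box in raw image space.
// scale is the resize factor stored in imInfo; imgW and imgH are the raw
// image width and height in pixels.
inline RoiBox decodeAndClip(const RoiBox& roi, const BoxDelta& d,
                            float scale, float imgW, float imgH)
{
    // 1. Undo the input resize.
    const float x1 = roi.x1 / scale;
    const float y1 = roi.y1 / scale;
    const float x2 = roi.x2 / scale;
    const float y2 = roi.y2 / scale;

    // 2. Convert to center/size form and apply the inverse transformation.
    const float w = x2 - x1 + 1.0f;
    const float h = y2 - y1 + 1.0f;
    const float cx = x1 + 0.5f * w;
    const float cy = y1 + 0.5f * h;
    const float predCx = d.dx * w + cx;
    const float predCy = d.dy * h + cy;
    const float predW = std::exp(d.dw) * w;
    const float predH = std::exp(d.dh) * h;

    // 3. Convert back to corners and clip to the image boundaries.
    auto clip = [](float v, float hi) { return std::max(0.0f, std::min(v, hi)); };
    return RoiBox{clip(predCx - 0.5f * predW, imgW - 1.0f),
                  clip(predCy - 0.5f * predH, imgH - 1.0f),
                  clip(predCx + 0.5f * predW, imgW - 1.0f),
                  clip(predCy + 0.5f * predH, imgH - 1.0f)};
}
```

In a full post-processing pass this would be applied once per ROI and per class, using the slice of bbox_pred that belongs to that class.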
Lastly, overlapping predictions have to be removed by the non-maximum suppression algorithm. The post-processing code runs on the CPU because it is neither compute intensive nor memory intensive.
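For reference, a greedy non-maximum suppression pass on the CPU might look like the sketch below. It is a generic formulation, not necessarily identical to the sample's implementation; the Detection struct, the function names, and the choice of IoU as the overlap measure are assumptions for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical type: one scored box in corner form.
struct Detection { float x1, y1, x2, y2, score; };

// Intersection over union of two boxes.
inline float boxIoU(const Detection& a, const Detection& b)
{
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    const float uni = areaA + areaB - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Returns the indices of the detections that survive suppression,
// ordered from highest to lowest score.
inline std::vector<std::size_t> nonMaxSuppress(const std::vector<Detection>& dets,
                                               float iouThreshold)
{
    std::vector<std::size_t> order(dets.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
        [&](std::size_t a, std::size_t b) { return dets[a].score > dets[b].score; });

    std::vector<std::size_t> keep;
    for (std::size_t idx : order)
    {
        // Keep a box only if it does not overlap too much with a
        // higher-scoring box that was already kept.
        const bool suppressed = std::any_of(keep.begin(), keep.end(),
            [&](std::size_t k) { return boxIoU(dets[idx], dets[k]) > iouThreshold; });
        if (!suppressed)
        {
            keep.push_back(idx);
        }
    }
    return keep;
}
```

Suppression of this kind is normally run separately for each object class, so that a box of one class never suppresses a box of another.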
After all of the above work, the bounding boxes are available in terms of the class number, the confidence score (probability), and four coordinates. They are drawn in the output PPM images using the writePPMFileWithBBox function.
RPNROIPlugin has four inputs (bbox confidence, bbox offset, feature map, and image info) and two outputs (feature map and rois). This plugin supports fp32 and int8 I/O for the feature map, and only fp32 for the other inputs and outputs. A per-tensor dynamic range file, located in data/faster-rcnn, is required for int8 precision. Each line in this file contains a tensor name (the same as the layer name) and a dynamic range value. The dynamic range value is the absolute value of the tensor's boundary value, so the tensor's range is symmetric around zero.
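Reading such a file amounts to building a map from tensor name to range value. The sketch below assumes each line has the form `name: value`; that separator is an assumption about the file layout, and parseDynamicRanges is a hypothetical helper rather than a function from the sample.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: parses lines of the assumed form "tensorName: value"
// into a name -> dynamic range map. Lines without a ':' are skipped.
inline std::map<std::string, float> parseDynamicRanges(std::istream& in)
{
    std::map<std::string, float> ranges;
    std::string line;
    while (std::getline(in, line))
    {
        // Split at the last ':' so tensor names that contain ':' still work.
        const std::string::size_type sep = line.rfind(':');
        if (sep == std::string::npos)
        {
            continue;
        }
        const std::string name = line.substr(0, sep);
        // std::stof skips leading whitespace and throws on malformed values.
        ranges[name] = std::stof(line.substr(sep + 1));
    }
    return ranges;
}
```

Each parsed value r would then describe the symmetric interval [-r, r] for the named tensor, which is the form that ITensor::setDynamicRange expects as its minimum and maximum.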
Because one output of this plugin, rois, is marked as a network output, TensorRT requires all inputs and outputs of this plugin to run in fp32 precision. When int8 precision is enabled, addIdentity() is added after rois to remove this requirement.
In this sample, the following layers are used. For more information about these layers, see the TensorRT Developer Guide: Layers documentation.
Activation layer
The Activation layer implements element-wise activation functions. Specifically, this sample uses the Activation layer with the type kRELU.

Convolution layer
The Convolution layer computes a 2D (channel, height, and width) convolution, with or without bias.

FullyConnected layer
The FullyConnected layer implements a matrix-vector product, with or without bias.

Plugin (RPROI) layer
Plugin layers are user-defined and provide the ability to extend the functionalities of TensorRT. See Extending TensorRT With Custom Layers for more details.

Pooling layer
The Pooling layer implements pooling within a channel. Supported pooling types are maximum, average and maximum-average blend.

Shuffle layer
The Shuffle layer implements a reshape and transpose operator for tensors.

SoftMax layer
The SoftMax layer applies the SoftMax function on the input tensor along an input dimension specified by the user.

Identity layer
The Identity layer implements the identity operation.
- Set $TRT_DATADIR to point to the sample data directory.

- Download the faster_rcnn_models.tgz dataset.

      export TRT_DATADIR=/usr/src/tensorrt/data
      mkdir -p $TRT_DATADIR/faster-rcnn
      wget --no-check-certificate https://dl.dropboxusercontent.com/s/o6ii098bu51d139/faster_rcnn_models.tgz?dl=0 -O $TRT_DATADIR/faster-rcnn/faster-rcnn.tgz

- Extract the dataset into the data/faster-rcnn directory.

      tar zxvf $TRT_DATADIR/faster-rcnn/faster-rcnn.tgz -C $TRT_DATADIR/faster-rcnn --strip-components=1 --exclude=ZF_*

- Compile the sample by following the build instructions in the TensorRT README.

- Run the sample to perform object detection with the trained model:

      ./sample_fasterRCNN --datadir=$TRT_DATADIR/faster-rcnn

- Verify that the sample ran successfully. If the sample runs successfully you should see output similar to the following:

      Sample output
      [I] Detected car in 000456.ppm with confidence 99.0063% (Result stored in car-0.990063.ppm).
      [I] Detected person in 000456.ppm with confidence 97.4725% (Result stored in person-0.974725.ppm).
      [I] Detected cat in 000542.ppm with confidence 99.1191% (Result stored in cat-0.991191.ppm).
      [I] Detected dog in 001150.ppm with confidence 99.9603% (Result stored in dog-0.999603.ppm).
      [I] Detected dog in 001763.ppm with confidence 99.7705% (Result stored in dog-0.997705.ppm).
      [I] Detected horse in 004545.ppm with confidence 99.467% (Result stored in horse-0.994670.ppm).
      &&&& PASSED TensorRT.sample_fasterRCNN # ./build/x86_64-linux/sample_fasterRCNN

  The PASSED line in this output shows that the sample ran successfully.
To see the full list of available options and their descriptions, use the -h or --help command line option.
The following resources provide a deeper understanding about object detection with Faster R-CNN:
Faster R-CNN
Documentation
For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.
February 2019
This README.md file was recreated, updated and reviewed.
There are no known issues in this sample.