CUHK-AHU Dataset: Promoting Practical Self-Driving Applications in The Complex Airport Logistics, Hill and Urban Environments
Wen Chen, Zhe Liu, Hongchao Zhao, Shunbo Zhou, Haoang Li, Yun-Hui Liu
in dynamic urban environments. Both datasets are collected at different times over one year, suffering from structural changes of the environment and various weather conditions. Other relevant datasets released in [5], [6], [7] are collected from urban environments using different sensor combinations. These datasets have played an important role in promoting the development of autonomous driving in urban or suburban environments. However, their collection routes are relatively simple, so few datasets can cover different environments in a single collection. Moreover, to the best of our knowledge, there is no dataset that targets the challenges of the industrial logistics environment and the undulating hill environment. In these two environments, autonomous driving faces challenges that differ from those in existing datasets. Structural changes are always taking place in the logistics environment: the environment may become completely different as goods are moved in and out at any time, which makes robust localization and mapping very difficult. In the undulating hill environment, the vehicle does not move on a flat plane, and the surroundings are structureless and noisy.

In this paper, we present a LiDAR and image dataset that focuses on three types of environments, i.e., the air cargo terminal environment, the undulating hill environment and the mixed complex urban environment, collected using the two different vehicles shown in Fig. 1. All sensors are mounted on a detachable data collection platform shown in Fig. 1(c), which is then installed on the top of the vehicles. The industrial tractor in Fig. 1(a) is used to collect data in the HACT, the second busiest air cargo terminal in the world. The passenger car in Fig. 1(b) is used to collect data around the Chinese University of Hong Kong (CUHK) campus, which is located on a hill. As for the mixed complex urban environment, we also use the passenger car to collect data along a route through highly dynamic residential blocks, sloped roads and highways. We repeatedly collect LiDAR, camera, IMU and GPS data along several paths in these environments to capture the structural changes, the illumination changes and the different degrees of road undulation. In summary, the key contributions of this paper are as follows:
• provides a novel dataset that is the first to cover three types of challenging environments: the highly dynamic industrial logistics environment, the undulating hill environment and the mixed complex urban environment.
• provides baseline trajectories generated by the proposed SLAM approach, which combines state-of-the-art LiDAR odometry, graph-based optimization and point cloud based place recognition.

The first batch of the presented dataset is available at: http://ri.cuhk.edu.hk/research/public_datasets/CUHK_AHU_Dataset. More data and development tools will be released periodically.

II. COLLECTION PLATFORM CONFIGURATION

A. Hardware Platform

Fig. 1(c) shows the sensors' locations on the data collection platform. All sensors are mounted on a carrier-independent mechanical structure which is attached to the top of the vehicles. The main objective of this modular design is to acquire omnidirectional visual and range information in totally different application environments, such as the airport cargo terminal and undulating hill roads. The configuration of the sensors is summarized in Table I.

A 3D LiDAR ("VLP-16") is installed at the center of the platform to provide 360-degree, 16-channel range information of the surrounding scene at a frequency of 10Hz. For the inertial measurements, a consumer-level MTi IMU sensor is mounted under the LiDAR by coaxial installation. It provides nine-axis measurements: linear acceleration, rotational angular velocity and geomagnetic orientation. The platform also has a dual-antenna GPS with RTK correction signals that provides position measurements at 2.5Hz.

The platform includes six color cameras on a flat plane: two Point Grey Grasshopper3 cameras and four Point Grey Blackfly cameras. The two Grasshopper3 cameras provide the front and rear views for better imaging quality, and the remaining four Blackfly cameras are evenly distributed over the other view directions around the central axis of the LiDAR to form an omnidirectional visual sensing unit. To synchronize the six-camera unit in time, the synchronized capture technology provided by Point Grey is applied to ensure the same frame rate: one "primary" camera triggers the other "secondary" cameras through their linked GPIO pins. In our application, "Camera 2" is treated as the "primary" camera, yielding a 20Hz image capture rate.

All sensors are logged using an industrial computer running Ubuntu Linux with an i7 processor, 32GB DDR4 memory and two 2TB SSDs. All the sensor drivers are developed on ROS Kinetic, and the logger is based on the rosbag package. The timestamp for each sensor measurement is created by the related driver running on the computer.

B. Sensor Calibration

1) Camera Calibration: The camera calibration includes the estimation of the intrinsic parameters of each camera and the extrinsic transformations among the six cameras. The intrinsic calibration is performed separately for each camera using the Single Camera Calibrator App in MATLAB. To improve the intrinsic calibration, we remove high-error images so that the mean reprojection error is lower than 0.3 pixels.

For the extrinsic calibration among the cameras, feature-matching based methods are hard to apply because of the limited overlapping field of view between any two adjacent cameras. Hence, we choose to recover the transformation matrix between any two cameras from the calibration results of the corresponding LiDAR-camera pairs.

2) Joint LiDAR-Camera Calibration: The calibration problem between a LiDAR and a camera is a typical Perspective-n-Point (PnP) problem. One common strategy is to transfer the 3D-to-2D problem to a 3D-to-3D problem by estimating the depth of each 2D image feature point with the help of a checkerboard. However, in practice, the placement of the checkerboard affects the final calibration result since the working distance of the LiDAR ranges from 1m to 100m.
TABLE I
DETAILED CONFIGURATION INFORMATION OF THE SENSORS USED ON THE PLATFORM
In order to reduce the influence of this factor, we manually select 3D-2D point pairs from different distance zones. For each LiDAR-camera pair, we adopt the same method to find the 3D-to-2D correspondences, which are initially solved by the non-iterative EPnP algorithm [8]. In addition, we also select the image points associated with the same 3D LiDAR point from the overlapping area of two adjacent images to enhance the connection between cameras. Finally, all corresponding pairs are put into a joint non-linear optimization problem solved by a Gauss-Newton scheme. More details can be found in [9].
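To make the initialization step concrete, the following minimal sketch runs OpenCV's EPnP solver on synthetic stand-ins for the manually picked 3D-2D pairs; the intrinsics, the assumed ground-truth extrinsic and the point distribution are placeholder values, not dataset parameters.

```python
# Hypothetical sketch of the EPnP initialization (placeholder values,
# not the released calibration code): synthetic 3D-2D pairs stand in
# for the manually picked LiDAR-image correspondences.
import cv2
import numpy as np

rng = np.random.default_rng(0)
K = np.array([[900., 0., 640.],
              [0., 900., 512.],
              [0., 0., 1.]])                      # assumed intrinsics
rvec_gt = np.array([0.02, -0.05, 0.01])           # assumed LiDAR->camera rotation
tvec_gt = np.array([[0.1], [-0.2], [0.05]])       # assumed LiDAR->camera translation

# Points spread over the LiDAR's 1 m-100 m working range (z forward here).
pts_lidar = rng.uniform([-5., -2., 1.], [5., 2., 100.], (30, 3))
pix, _ = cv2.projectPoints(pts_lidar, rvec_gt, tvec_gt, K, None)

# Non-iterative EPnP provides the initial extrinsic, which is then refined
# in a joint Gauss-Newton optimization over all LiDAR-camera pairs.
ok, rvec, tvec = cv2.solvePnP(pts_lidar, pix, K, None, flags=cv2.SOLVEPNP_EPNP)
print(ok, rvec.ravel(), tvec.ravel())             # should recover rvec_gt/tvec_gt
```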
3) GPS/IMU-LiDAR Calibration: The fusion of GPS and IMU provides an estimation of the vehicle states [10]. In addition, the trajectory of the vehicle can be estimated by matching the edge and planar feature points extracted from LiDAR measurements [11]. The calibration between the GPS/IMU and the LiDAR thus becomes a well-known hand-eye calibration problem [12]. However, the accuracy of the INS and the LiDAR odometry highly depends on the surrounding scenario: the former needs to work in open and unsheltered environments, and the latter needs environments full of structural information. Thus, we choose an outdoor parking lot to calibrate the extrinsic parameters between the GPS/IMU and the LiDAR.
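To illustrate how such a hand-eye formulation can be set up (a sketch under our own assumptions; the paper does not state which solver is used), synchronized INS poses and LiDAR odometry poses can be passed to a generic solver such as OpenCV's calibrateHandEye:

```python
# Hypothetical hand-eye setup for the GPS/IMU-LiDAR extrinsic: INS poses play
# the role of gripper-in-base poses, and inverted LiDAR odometry poses play the
# role of target-in-camera poses (the odometry origin acts as the fixed "target").
import cv2
import numpy as np

def inv_se3(T):
    """Invert a 4x4 rigid transform."""
    Ti = np.eye(4)
    Ti[:3, :3] = T[:3, :3].T
    Ti[:3, 3] = -T[:3, :3].T @ T[:3, 3]
    return Ti

def calibrate_ins_lidar(T_world_ins, T_odom_lidar):
    """Both arguments: lists of 4x4 poses sampled at the same times."""
    T_lidar_odom = [inv_se3(T) for T in T_odom_lidar]
    R_x, t_x = cv2.calibrateHandEye(
        [T[:3, :3] for T in T_world_ins], [T[:3, 3] for T in T_world_ins],
        [T[:3, :3] for T in T_lidar_odom], [T[:3, 3] for T in T_lidar_odom],
        method=cv2.CALIB_HAND_EYE_PARK)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R_x, t_x.ravel()
    return T  # the INS-to-LiDAR extrinsic
```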
III. DATASET DESCRIPTION

The datasets collected using our two types of manually driven vehicles cover different environments: the industrial logistics environment, the undulating hill area and the mixed complex urban environment. All data are grouped by the path of collection, and each group contains at least two sets of data acquired at different times from day to night. The collection paths of each group are shown in Fig. 2, and the statistics of the size, frame counts, complexity and elevation of each path are listed in Table II.
A. Acquisition Environments

1) Functioning air cargo terminal: The HACT group of datasets is collected in a normally functioning air cargo terminal capable of handling 3.5 million tonnes of cargo every year. It contains an indoor multi-level storage system and several outdoor box storage zones, as shown in Fig. 2(a). In order to record the most realistic logistics scenario, we collect the data of the whole terminal environment along forward and backward paths, without special traffic control.

Fig. 3(a) shows the dramatic changes in the terminal. During data collection, goods, trailers or cargo planes parked at any location may be removed or replaced by others. In addition, the GPS signals inevitably become unstable or unavailable as the tractor frequently switches between indoor and outdoor areas. Thus, GPS measurements are not provided in this group.

2) Undulating hill environment: Fig. 2(b) shows the three collection paths on the CUHK campus, which is situated on an undulating hill. The elevation difference along the paths can exceed one hundred meters. During the acquisition process, the average driving speed of the car was about 30km/h. Samples from this data group are shown in Fig. 3(b).

3) Mixed complex urban environment: Fig. 2(c) presents the path through mixed complex urban environments, including non-flat mountain roads, highly dynamic living blocks, a structured industrial zone and highways. The driving speed varied from 0km/h to 90km/h. In the process of collection, we experienced many situations in manual driving, such as long traffic jams, overtaking and being overtaken. Fig. 3(c) illustrates typical scenes in the living blocks and on the highways.

B. Data Formats

1) Synchronized Images: All datasets are divided by the collection paths into groups ("HACT", "CUHK", "Taipo"). In order to reduce unnecessary trouble in the downloading process, each set is further divided into several sub-sets whose size does not exceed 5GB. Each sub-set is compressed into a tar file named <Group>_<Path>_<No. sets>_<No. subsets>.tar (e.g., HACT_forward_1_01.tar). The file tree of a sub-set is presented in Fig. 4. The timestamps of the measurements from the six cameras and the Velodyne LiDAR are saved respectively in <sensor>_timestamps.csv. The formats of the other measurements are described in the following.

All six cameras synchronize their time using a primary-secondary trigger mechanism. The timestamps of all images are also recorded in case higher accuracy is needed in time processing. The synchronized frequency of these cameras is 20Hz. The images are saved in the lossless PNG format with the RGGB8 Bayer mode. To convert the Bayer images to RGB images, the demosaic function in MATLAB or the cvtColor function in OpenCV can be applied.
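For instance, with OpenCV (a minimal sketch; the file name below is a placeholder):

```python
# Demosaic one of the RGGB PNGs with OpenCV. Note the naming offset in
# OpenCV's Bayer constants: an RGGB sensor pattern uses the "BayerBG" code.
import cv2

raw = cv2.imread("some_timestamp.png", cv2.IMREAD_UNCHANGED)  # single-channel Bayer
rgb = cv2.cvtColor(raw, cv2.COLOR_BayerBG2RGB)
```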
2) 3D LiDAR scans: The 3D LiDAR scans assembled from the packets returned by the VLP-16 are saved as binary files at a frequency of 10Hz. The timestamp of the last packet of a rotation is recorded as the timestamp of the scan and is used to name the saved file: <Timestamp>.bin. Users should pay attention to the recovery of the cut angle, since it is not specified during collection.
[Fig. 2(a): map of the air cargo terminal with the numbered collection paths, showing the indoor multi-level storage system and the outdoor box storage zone.]
TABLE II
STATISTICS OF THE WHOLE DATASET

Group | Path     | No. Sets | Size      | Duration (s) | No. of images/clouds | Complexity | Elevation (Min/Avg/Max)
HACT  | Forward  | 8        | 2922.0 GB | 10173        | 205200/101689        | *****      | 0 meters
HACT  | Backward | 8        | 2771.5 GB | 9653         | 192709/96461         | *****      | 0 meters
CUHK  | Path 1   | 6        | 1558.2 GB | 9671         | 193577/96373         | ***        | 5/37/123 meters
CUHK  | Path 2   | 6        | 1541.3 GB | 9852         | 198092/98226         | ***        | 6/61/133 meters
CUHK  | Path 3   | 4        | 2591.1 GB | 9003         | 181576/90010         | ***        | 4/23/128 meters
Taipo | Path 1   | 2        | 2098.7 GB | 10734        | 214428/107295        | *****      | 1/18/127 meters
All the points are represented by four-dimensional floating-point tuples (X, Y, Z, I), where I is the intensity of the point and (X, Y, Z) is the coordinate of the point in the local Cartesian frame defined w.r.t. the moving LiDAR at the time corresponding to the recorded timestamp. The motion distortion problem of the 3D point clouds is therefore always present, and we have not applied any compensation to the raw data.
3) GPS/IMU: The GPS measurements are recorded in gps.csv at 2.5Hz. They are formatted as seven-dimensional tuples: (timestamp, latitude, longitude, altitude, latitude_std, longitude_std, altitude_std). The imu.csv stores the IMU measurements at 100Hz. The IMU provides nine-axis measurements from the accelerometer, gyroscope and magnetometer, formatted as (timestamp, acc_x, acc_y, acc_z, gyr_x, gyr_y, gyr_z, ori_qua_w, ori_qua_x, ori_qua_y, ori_qua_z). We provide two minutes of data for each dataset during which the vehicle is stationary before moving, so that initial values for the bias and gravity estimation can be computed. The fusion results of GPS and IMU are recorded in ins.csv.
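As a sketch of this initialization (assuming comma-separated rows with no header and timestamps in seconds, neither of which is spelled out above):

```python
# Estimate the gyroscope bias and the gravity direction from the stationary
# two minutes at the start of imu.csv. Column order follows the tuple above:
# timestamp, acc_x..z, gyr_x..z, ori_qua_w..z.
import numpy as np

data = np.genfromtxt("imu.csv", delimiter=",")
still = data[data[:, 0] - data[0, 0] < 120.0]  # first two minutes, vehicle static
gyro_bias = still[:, 4:7].mean(axis=0)         # mean angular rate ~ gyro bias
g_body = still[:, 1:4].mean(axis=0)            # mean specific force ~ -gravity
print(gyro_bias, g_body / np.linalg.norm(g_body))
```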
IV. BASELINE TRAJECTORIES

Ground-truth trajectories provided by high-precision sensors are always needed to evaluate the performance of different algorithms on public datasets. However, in our acquisition environments, GPS-based positioning methods are not applicable because they are easily affected by the surrounding buildings and trees, and some paths are even indoors. Similar to [2], we provide baseline trajectories estimated by graph-based SLAM technology [13].

Our SLAM system fuses the sensor information from the LiDAR, IMU and GPS. The measurements from the IMU are first applied to compensate for the motion distortion of the LiDAR points. Similar to the state-of-the-art LiDAR odometry [11], the planar and edge features extracted from the undistorted point clouds are used to establish constraints with adjacent frames or local map clouds. The integral of the IMU data is used as the initial guess of the optimization problem constructed from these constraints.
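The de-skewing step can be pictured with the rotation-only sketch below (our simplification, not the authors' implementation): gyroscope increments are integrated over one 0.1s sweep and each point is rotated into the LiDAR frame at the sweep's end time.

```python
# Rotation-only LiDAR de-skewing: integrate gyroscope samples over the sweep,
# interpolate an orientation for each point's timestamp, and re-express all
# points in the LiDAR frame at the end of the sweep. Translation is ignored.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew(points, point_times, imu_times, gyr):
    """points: (N,4) x,y,z,intensity; point_times: (N,) s; imu_times: (M,)
    increasing timestamps covering the sweep; gyr: (M,3) rad/s at 100Hz."""
    dts = np.diff(imu_times, prepend=imu_times[0])
    incs = Rotation.from_rotvec(gyr * dts[:, None])
    rots = [Rotation.identity()]
    for inc in incs[1:]:
        rots.append(rots[-1] * inc)           # orientation relative to sweep start
    slerp = Slerp(imu_times, Rotation.concatenate(rots))
    R_end = slerp([imu_times[-1]])[0]
    R_pts = slerp(np.clip(point_times, imu_times[0], imu_times[-1]))
    xyz = (R_end.inv() * R_pts).apply(points[:, :3])
    return np.hstack([xyz, points[:, 3:]])
```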
The back-end pose graph optimization is then performed to obtain a globally consistent map. The key to this optimization problem is not only to estimate the relative transformations between pose nodes, but also to obtain closed-loop constraints through place recognition. In our previous work, a deep neural network, LPN-Net [14], was proposed to achieve point cloud based place recognition in large-scale environments. In this paper, we encode the point cloud of each key frame into a global descriptor using LPN-Net. Re-visited places can be recognized by comparing the Euclidean distance between the newly added frame and the old key frames. Then the relative transformation between the
Fig. 3. Image samples from each group. The columns from left to right show the images captured by camera 0 to camera 5. (a) Samples from the HACT group, presenting the dramatic changes that happen every day in the air cargo terminal. (b) Samples from the CUHK group, illustrating the ups and downs of the roads in the hill environment. (c) Samples from the Taipo group, showing the crowded roads and the high-speed roads in the mixed complex urban environment.
<Group>_<Path>_<No. sets>_<No. subsets>.tar
    camera_0/ … camera_5/
        <Time_stamp>.png
    velodyne/
        <Time_stamp>.bin
    camera_0_timestamps.csv … camera_5_timestamps.csv
    velodyne_timestamps.csv
    gps.csv
    ins.csv
    imu.csv

Fig. 4. The file tree of a sub-set.
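As a usage example (assuming one timestamp per row in these CSV files, which the text does not specify), camera frames can be associated with the nearest LiDAR scan like this:

```python
# Match each camera_0 frame to its nearest velodyne scan by timestamp.
import numpy as np

cam_t = np.genfromtxt("camera_0_timestamps.csv", delimiter=",").ravel()
lid_t = np.genfromtxt("velodyne_timestamps.csv", delimiter=",").ravel()
idx = np.clip(np.searchsorted(lid_t, cam_t), 1, len(lid_t) - 1)
idx -= (cam_t - lid_t[idx - 1]) < (lid_t[idx] - cam_t)  # snap to closer neighbor
pairs = list(zip(cam_t, lid_t[idx]))  # (image time, nearest scan time)
```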