Get Started with PointPillars
PointPillars is a method for 3-D object detection using 2-D convolutional layers [1]. The PointPillars network has a learnable encoder that uses PointNets to learn a representation of point clouds organized in pillars (vertical columns). The network then runs a 2-D convolutional neural network (CNN) to produce predictions, decodes those predictions, and generates 3-D bounding boxes for object classes such as cars, trucks, and pedestrians.
The PointPillars network has these main stages.
1. Use a feature encoder to convert a point cloud to a sparse pseudoimage.
2. Process the pseudoimage into a high-level representation using a 2-D convolutional backbone.
3. Detect and regress 3-D bounding boxes using detection heads.
PointPillars Network
A PointPillars network requires two inputs: pillar indices as a P-by-2 matrix and pillar features as a P-by-N-by-K matrix. P is the number of pillars, N is the number of points per pillar, and K is the feature dimension of each point.
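As a concrete illustration of these input shapes, this minimal MATLAB sketch allocates dummy inputs. The values of P, N, and K are assumptions chosen for illustration only, not requirements of the detector.

```matlab
% Illustrative PointPillars input shapes; the sizes are assumed example values.
P = 12000;  % number of pillars
N = 100;    % number of points per pillar
K = 9;      % feature dimension of each point
pillarFeatures = zeros(P,N,K,"single");  % P-by-N-by-K pillar features
pillarIndices  = zeros(P,2,"int32");     % P-by-2 pillar indices
```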
The network begins with a feature encoder, which is a simplified PointNet. It contains a series of convolution, batch normalization, and ReLU layers followed by a max pooling layer. A scatter layer at the end maps the extracted features into a 2-D space using the pillar indices.
Next, the network has a 2-D CNN backbone that consists of encoder-decoder blocks. Each encoder block consists of convolution, batch normalization, and ReLU layers, and extracts features at different spatial resolutions. Each decoder block consists of transposed convolution, batch normalization, and ReLU layers.
The network then concatenates the output features of the decoder blocks, and passes these features through six detection heads with convolutional and sigmoid layers to predict occupancy, location, size, angle, heading, and class.
Create PointPillars Network
You can use the Deep Network Designer (Deep Learning Toolbox) app to interactively create a PointPillars deep learning network. To programmatically create a PointPillars network, use the pointPillarsObjectDetector object.
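For example, this sketch creates an untrained detector for two classes. The point cloud range, class names, and anchor box values are illustrative assumptions; choose them from your own data, and see the pointPillarsObjectDetector reference page for the exact anchor box format.

```matlab
% Cuboid region of interest, [xmin xmax ymin ymax zmin zmax] (assumed values).
pcRange = [0 69.12 -39.68 39.68 -5 5];
classNames = {'car','truck'};

% One cell of anchor boxes per class. Each row here uses an assumed
% [length width height zCenter yawAngle] form with illustrative values.
anchorBoxes = { ...
    [3.9 1.6 1.56 -1.78 0; 3.9 1.6 1.56 -1.78 pi/2], ...  % car
    [10.2 2.9 3.5 -0.62 0; 10.2 2.9 3.5 -0.62 pi/2]};     % truck

detector = pointPillarsObjectDetector(pcRange,classNames,anchorBoxes);
```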
Transfer Learning
Transfer learning is a common deep learning technique in which you take a pretrained network as a starting point to train a network for a new task.
To perform transfer learning with a pretrained pointPillarsObjectDetector network, specify the new object classes and their corresponding anchor boxes. Then, train the network on a new data set.
Anchor boxes capture the scale and aspect ratio of specific object classes you want to detect, and are typically chosen based on object sizes in your training data set. For more information on anchor boxes, see Anchor Boxes for Object Detection.
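As one hedged sketch of choosing anchors from object sizes, the code below estimates a single anchor box per class by averaging ground-truth cuboid dimensions. The gtBoxes values and the nine-column [xctr yctr zctr xlen ylen zlen xrot yrot zrot] cuboid layout are assumptions for illustration.

```matlab
% Dummy ground-truth cuboids for one class (assumed values), M-by-9:
% [xctr yctr zctr xlen ylen zlen xrot yrot zrot].
gtBoxes = [10  2 -1.7 3.9 1.6 1.5 0 0 0.1;
           22 -4 -1.8 4.1 1.7 1.6 0 0 1.4];

meanDims = mean(gtBoxes(:,4:6),1);   % mean [length width height]
meanZ    = mean(gtBoxes(:,3));       % mean z-center

% Using two yaw orientations (0 and pi/2) per class is a common choice.
carAnchors = [meanDims meanZ 0; meanDims meanZ pi/2];
```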
Train PointPillars Object Detector and Perform Object Detection
Use the trainPointPillarsObjectDetector function to train a PointPillars network. To perform object detection using a trained PointPillars network, use the detect function.
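The following sketch shows the overall call pattern, assuming detector is a pointPillarsObjectDetector (as created above), trainingData is a datastore that returns point clouds with ground-truth cuboid labels, and ptCloud is a pointCloud object. The training option values are illustrative, not recommendations.

```matlab
% Training options (illustrative values only).
options = trainingOptions("adam", ...
    MaxEpochs=20, ...
    MiniBatchSize=2, ...
    InitialLearnRate=2e-4);

% Train the detector. trainingData is an assumed datastore of point clouds
% and ground-truth cuboid labels.
[detector,info] = trainPointPillarsObjectDetector(trainingData,detector,options);

% Detect objects in a point cloud with the trained detector.
[bboxes,scores,labels] = detect(detector,ptCloud);
```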
For more information on how to train a PointPillars network, see Lidar 3-D Object Detection Using PointPillars Deep Learning.
Code Generation
To learn how to generate CUDA® code for a PointPillars network, see Code Generation for Lidar Object Detection Using PointPillars Deep Learning.
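As a brief sketch, code generation typically wraps detection in an entry-point function and uses a GPU Coder configuration such as the one below. The entry-point name pointpillarsDetect, its signature, and the input size are hypothetical; the linked example shows the full workflow.

```matlab
% Hedged sketch of a GPU Coder setup for a hypothetical entry-point function
% "pointpillarsDetect" that loads a saved detector and calls detect.
cfg = coder.gpuConfig("mex");
cfg.TargetLang = "C++";
cfg.DeepLearningConfig = coder.DeepLearningConfig("cudnn");

% The variable-size M-by-4 point cloud input is an assumption; match the
% argument types to your own entry-point function.
codegen -config cfg pointpillarsDetect -args {coder.typeof(single(0),[Inf 4],[1 0])} -report
```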
References
[1] Lang, Alex H., Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. “PointPillars: Fast Encoders for Object Detection From Point Clouds.” In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12689–97. Long Beach, CA, USA: IEEE, 2019. https://doi.org/10.1109/CVPR.2019.01298.
[2] Hesai and Scale. PandaSet. https://scale.com/open-datasets/pandaset.
See Also
Apps
- Deep Network Designer (Deep Learning Toolbox) | Lidar Viewer | Lidar Labeler
Objects
- pointPillarsObjectDetector
Functions
- trainPointPillarsObjectDetector | detect
Related Examples
- Lidar 3-D Object Detection Using PointPillars Deep Learning
- Code Generation for Lidar Object Detection Using PointPillars Deep Learning
- Lane Detection in 3-D Lidar Point Cloud
- Unorganized to Organized Conversion of Point Clouds Using Spherical Projection
More About
- Deep Learning in MATLAB (Deep Learning Toolbox)
- Deep Learning with Point Clouds