
Object Detection

Perform classification, object detection, and transfer learning using convolutional neural networks (CNNs, or ConvNets), and create customized detectors

Object detection is a computer vision technique for locating instances of objects in images or videos. Object detection algorithms typically leverage machine learning or deep learning to produce meaningful results. When looking at images or video, humans can recognize and locate objects of interest in a matter of moments. The goal of object detection is to replicate this intelligence using a computer. The best approach for object detection depends on your application and the problem you are trying to solve.

Deep learning techniques require a large number of labeled training images, so a GPU is recommended to decrease the time needed to train a model. Deep learning approaches to object detection use convolutional neural networks (CNNs or ConvNets), such as R-CNN, YOLO v2, and single shot detector (SSD) networks. You can train a custom object detector, or start from a pretrained detector or network and fine-tune it for your application using transfer learning. Convolutional neural networks require Deep Learning Toolbox™. Training and prediction are supported on a CUDA®-capable GPU, and GPU use requires Parallel Computing Toolbox™. For more information, see Computer Vision Toolbox Preferences and Parallel Computing Support in MathWorks Products (Parallel Computing Toolbox).

Machine learning techniques for object detection include aggregate channel features (ACF), support vector machine (SVM) classification using histogram of oriented gradients (HOG) features, and the Viola-Jones algorithm for human face or upper-body detection. You can start with a pretrained object detector or create a custom object detector to suit your application.
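HOG features, such as those used by vision.PeopleDetector, are built from per-cell histograms of gradient orientations. The following Python sketch computes one such histogram under simplifying assumptions: unsigned orientations (0 to 180 degrees) in nine bins, centered-difference gradients on interior pixels only, linear vote splitting between neighboring bins, and no block normalization. The function name hog_cell_histogram is hypothetical and is not part of Computer Vision Toolbox.

```python
import math

def hog_cell_histogram(cell, num_bins=9):
    """Simplified HOG orientation histogram for one cell (hypothetical helper).

    cell: 2-D list of grayscale intensities (rows of equal length).
    Gradients use centered [-1, 0, 1] differences on interior pixels only.
    Orientations are unsigned (0 to 180 degrees); each gradient magnitude
    is split linearly between the two nearest bin centers.
    """
    bin_width = 180.0 / num_bins
    hist = [0.0] * num_bins
    rows, cols = len(cell), len(cell[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Bin centers sit at (i + 0.5) * bin_width; find the lower neighbor.
            pos = angle / bin_width - 0.5
            lower = math.floor(pos)
            frac = pos - lower
            # Orientation wraps around, so bin indices are taken modulo num_bins.
            hist[lower % num_bins] += magnitude * (1.0 - frac)
            hist[(lower + 1) % num_bins] += magnitude * frac
    return hist

# A horizontal intensity ramp has purely horizontal gradients (0 degrees),
# which fall halfway between the first and last bin centers.
ramp = [[10 * x for x in range(4)] for _ in range(4)]
print(hog_cell_histogram(ramp))  # [40.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 40.0]
```

A full HOG descriptor concatenates many such cell histograms and normalizes them over overlapping blocks before passing them to a classifier such as an SVM.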



Apps

Image Labeler: Label images for computer vision applications
Video Labeler: Label video for computer vision applications



Deep Learning Detectors

rcnnObjectDetector: Detect objects using R-CNN deep learning detector
fastRCNNObjectDetector: Detect objects using Fast R-CNN deep learning detector
fasterRCNNObjectDetector: Detect objects using Faster R-CNN deep learning detector
ssdObjectDetector: Detect objects using SSD deep learning detector
yolov2ObjectDetector: Detect objects using YOLO v2 object detector
yolov3ObjectDetector: Create YOLO v3 object detector
maskrcnn: Detect objects using Mask R-CNN instance segmentation

Feature-Based Detectors

ocr: Recognize text using optical character recognition
readAprilTag: Detect and estimate pose for AprilTag in image
readBarcode: Detect and decode 1-D or 2-D barcode in image
acfObjectDetector: Detect objects using aggregate channel features
peopleDetectorACF: Detect people using aggregate channel features
vision.CascadeObjectDetector: Detect objects using the Viola-Jones algorithm
vision.ForegroundDetector: Foreground detection using Gaussian mixture models
vision.PeopleDetector: Detect upright people using HOG features
vision.BlobAnalysis: Properties of connected regions
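The Viola-Jones algorithm behind vision.CascadeObjectDetector evaluates rectangle (Haar-like) features, and an integral image lets any rectangle sum be computed with four lookups regardless of the rectangle's size. The Python sketch below illustrates that idea with a two-rectangle feature. The helper names, the zero-padded table layout, and the 0-based indexing are assumptions made for this example, not the toolbox's internals.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Entry (y+1, x+1) holds the sum of img[0..y][0..x].
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, width, height):
    """Sum of pixels in the rectangle with top-left (x, y), using four lookups."""
    return (ii[y + height][x + width] - ii[y][x + width]
            - ii[y + height][x] + ii[y][x])

def two_rect_haar_feature(ii, x, y, width, height):
    """Left-half sum minus right-half sum; width is assumed to be even."""
    half = width // 2
    return rect_sum(ii, x, y, half, height) - rect_sum(ii, x + half, y, half, height)

# A vertical edge: bright on the left, dark on the right.
img = [[9, 9, 1, 1] for _ in range(3)]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 4, 3))               # 60
print(two_rect_haar_feature(ii, 0, 0, 4, 3))  # 48
```

A cascade classifier evaluates thousands of such features at many positions and scales, which is why constant-time rectangle sums matter.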

Detect Objects Using Point Features

detectBRISKFeatures: Detect BRISK features and return BRISKPoints object
detectFASTFeatures: Detect corners using FAST algorithm and return cornerPoints object
detectHarrisFeatures: Detect corners using Harris–Stephens algorithm and return cornerPoints object
detectKAZEFeatures: Detect KAZE features and return KAZEPoints object
detectMinEigenFeatures: Detect corners using minimum eigenvalue algorithm and return cornerPoints object
detectMSERFeatures: Detect MSER features and return MSERRegions object
detectORBFeatures: Detect ORB keypoints and return ORBPoints object
detectSIFTFeatures: Detect scale-invariant feature transform (SIFT) features and return SIFTPoints object
detectSURFFeatures: Detect SURF features and return SURFPoints object
extractFeatures: Extract interest point descriptors
matchFeatures: Find matching features
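Matching point features, the step after extractFeatures, typically compares descriptors by distance and rejects ambiguous matches. The Python sketch below implements a nearest-neighbor search with a ratio test: a match is kept only if the closest descriptor in the second set is clearly closer than the runner-up. Euclidean distance, the 0.6 ratio, and the function names are assumptions; matchFeatures exposes a comparable ratio threshold through its MaxRatio option, but this sketch is not its implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def match_ratio_test(desc1, desc2, max_ratio=0.6):
    """Match each descriptor in desc1 to its nearest neighbor in desc2.

    A match (i, j) is kept only if the nearest distance divided by the
    second-nearest distance is below max_ratio. Requires len(desc2) >= 2.
    """
    matches = []
    for i, d in enumerate(desc1):
        dists = sorted((euclidean(d, e), j) for j, e in enumerate(desc2))
        (best, j_best), (second, _) = dists[0], dists[1]
        if second > 0 and best / second < max_ratio:
            matches.append((i, j_best))
    return matches

desc1 = [[0.0, 0.0], [5.0, 5.0], [5.0, 0.0]]
desc2 = [[0.1, 0.0], [10.0, 10.0], [5.0, 5.2]]
print(match_ratio_test(desc1, desc2))  # [(0, 0), (1, 2)]
```

The third query point is almost equally close to two candidates, so the ratio test discards it rather than risk a wrong correspondence.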

Select Detected Objects

selectStrongestBbox: Select strongest bounding boxes from overlapping clusters
selectStrongestBboxMulticlass: Select strongest multiclass bounding boxes from overlapping clusters
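Detectors usually return several overlapping boxes for the same object. One common way to keep only the strongest is greedy non-maximum suppression: sort by score, keep the top box, and discard any remaining box whose overlap ratio with an already kept box exceeds a threshold. The Python sketch below assumes [x, y, width, height] boxes, intersection-over-union as the overlap ratio, and a threshold of 0.5; overlap_ratio and select_strongest are hypothetical names, and the sketch is not the selectStrongestBbox implementation.

```python
def overlap_ratio(a, b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def select_strongest(boxes, scores, overlap_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes in score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any stronger kept box too much.
        if all(overlap_ratio(boxes[i], boxes[k]) <= overlap_threshold for k in keep):
            keep.append(i)
    return keep

boxes = [[10, 10, 50, 50], [12, 12, 50, 50], [100, 100, 30, 30]]
scores = [0.9, 0.75, 0.8]
print(select_strongest(boxes, scores))  # [0, 2]
```

Box 1 overlaps box 0 with a ratio of about 0.85 and scores lower, so it is suppressed; box 2 overlaps neither and is kept.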

Load Training Data

boxLabelDatastore: Datastore for bounding box label data
groundTruth: Ground truth label data
imageDatastore: Datastore for image data
objectDetectorTrainingData: Create training data for an object detector
combine: Combine data from multiple datastores

Train Feature-Based Object Detectors

trainACFObjectDetector: Train ACF object detector
trainCascadeObjectDetector: Train cascade object detector model
trainImageCategoryClassifier: Train an image category classifier

Train Deep Learning Based Object Detectors

trainRCNNObjectDetector: Train an R-CNN deep learning object detector
trainFastRCNNObjectDetector: Train a Fast R-CNN deep learning object detector
trainFasterRCNNObjectDetector: Train a Faster R-CNN deep learning object detector
trainSSDObjectDetector: Train an SSD deep learning object detector
trainYOLOv2ObjectDetector: Train YOLO v2 object detector

Augment and Preprocess Training Data for Deep Learning

balanceBoxLabels: Balance bounding box labels for object detection
bboxcrop: Crop bounding boxes
bboxerase: Remove bounding boxes
bboxresize: Resize bounding boxes
bboxwarp: Apply geometric transformation to bounding boxes
bbox2points: Convert rectangle to corner points list
imwarp: Apply geometric transformation to image
imcrop: Crop image
imresize: Resize image
randomAffine2d: Create randomized 2-D affine transformation
centerCropWindow2d: Create rectangular center cropping window
randomWindow2d: Randomly select rectangular region in image
integralImage: Calculate 2-D integral image
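Geometric augmentation must move the boxes together with the image. One simple approach, which the Python sketch below assumes, converts an [x, y, width, height] box to its four corners (the same idea as bbox2points), applies the affine map to each corner, and takes the axis-aligned box that encloses the result. The sketch treats coordinates as continuous values, ignores the pixel-center conventions described in the Coordinate Systems topic, and skips the clipping and box removal that a function such as bboxwarp may perform; warp_box and bbox_to_points are hypothetical names.

```python
def bbox_to_points(box):
    """Corners of an [x, y, width, height] box, clockwise from the top-left."""
    x, y, w, h = box
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]

def warp_box(box, a, b, c, d, tx, ty):
    """Map (u, v) -> (a*u + b*v + tx, c*u + d*v + ty) and return the enclosing box.

    No clipping to the output image is performed.
    """
    pts = [(a * u + b * v + tx, c * u + d * v + ty) for u, v in bbox_to_points(box)]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]

# Scale by 2 and shift right by 5 pixels.
print(warp_box([10, 20, 30, 40], 2, 0, 0, 2, 5, 0))     # [25, 40, 60, 80]
# Horizontal flip of an image that is 100 pixels wide.
print(warp_box([10, 20, 30, 40], -1, 0, 0, 1, 100, 0))  # [60, 20, 30, 40]
```

Applying the same transformation parameters to the image (for example with imwarp) and to its boxes keeps the labels aligned with the augmented pixels. For rotations, the enclosing box grows, so it is only an approximation of the rotated object.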

R-CNN (Regions with Convolutional Neural Networks)

rcnnBoxRegressionLayer: Box regression layer for Fast and Faster R-CNN
fasterRCNNLayers: Create a Faster R-CNN object detection network
rpnSoftmaxLayer: Softmax layer for region proposal network (RPN)
rpnClassificationLayer: Classification layer for region proposal networks (RPNs)
regionProposalLayer: Region proposal layer for Faster R-CNN
roiAlignLayer: Non-quantized ROI pooling layer for Mask R-CNN
roiInputLayer: ROI input layer for Fast R-CNN
roiMaxPooling2dLayer: Neural network layer used to output fixed-size feature maps for rectangular ROIs
roialign: Non-quantized ROI pooling of dlarray data
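roiMaxPooling2dLayer produces fixed-size feature maps for rectangular ROIs of any size. The Python sketch below shows the general Fast R-CNN-style idea on a single-channel feature map given as nested lists: split the ROI into an out_h-by-out_w grid of bins and take the maximum in each bin. The integer ROI format (x, y, width, height) in feature-map cells, the floor/ceil bin boundaries, and the function name are assumptions for illustration. roiAlignLayer and roialign instead sample at non-quantized positions, which this sketch does not do.

```python
import math

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Quantized ROI max pooling on one channel (hypothetical helper).

    roi = (x, y, width, height) in integer feature-map cells, 0-based.
    Each output cell is the maximum over one bin of a roughly even grid,
    so every ROI yields an out_h x out_w result regardless of its size.
    """
    x0, y0, w, h = roi
    pooled = []
    for i in range(out_h):
        ys = range(y0 + math.floor(i * h / out_h), y0 + math.ceil((i + 1) * h / out_h))
        row = []
        for j in range(out_w):
            xs = range(x0 + math.floor(j * w / out_w), x0 + math.ceil((j + 1) * w / out_w))
            row.append(max(feature_map[y][x] for y in ys for x in xs))
        pooled.append(row)
    return pooled

fmap = [[r * 10 + c for c in range(6)] for r in range(6)]
print(roi_max_pool(fmap, (1, 1, 4, 4), 2, 2))  # [[22, 24], [42, 44]]
```

Because the output size is fixed, the pooled features can feed fully connected layers that expect a constant input size.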

YOLO (You Only Look Once)

yolov2Layers: Create YOLO v2 object detection network
yolov2TransformLayer: Create transform layer for YOLO v2 object detection network
yolov2OutputLayer: Create output layer for YOLO v2 object detection network
yolov2ReorgLayer: (Not recommended) Create reorganization layer for YOLO v2 object detection network
spaceToDepthLayer: Space to depth layer
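Each YOLO v2 prediction is expressed relative to a grid cell and an anchor box. The YOLO v2 (YOLO9000) paper decodes raw outputs (tx, ty, tw, th, to) as a sigmoid offset of the box center within its cell, an exponential scaling of the anchor size, and a sigmoid objectness score. The Python sketch below applies that parameterization to one prediction. The argument layout, pixel units for anchors, and the conversion to a top-left [x, y, width, height] box are assumptions, and the sketch does not describe what yolov2TransformLayer or yolov2OutputLayer compute internally.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def decode_yolov2_box(t, cell_col, cell_row, anchor_w, anchor_h, stride):
    """Decode raw predictions t = (tx, ty, tw, th, to) for one anchor in one cell.

    stride is the number of input pixels per grid cell (an assumption of this
    sketch). Returns ([x, y, width, height] in pixels, objectness score).
    """
    tx, ty, tw, th, to = t
    # Sigmoid keeps the predicted center inside its own grid cell.
    cx = (sigmoid(tx) + cell_col) * stride
    cy = (sigmoid(ty) + cell_row) * stride
    # Exponential scaling of the anchor keeps width and height positive.
    w = anchor_w * math.exp(tw)
    h = anchor_h * math.exp(th)
    return [cx - w / 2, cy - h / 2, w, h], sigmoid(to)

box, objectness = decode_yolov2_box((0, 0, 0, 0, 0), 3, 2, 64, 32, 32)
print(box, objectness)  # [80.0, 64.0, 64.0, 32.0] 0.5
```

With all raw outputs at zero, the box is the anchor centered in its cell, which is a convenient sanity check when reading the decoded outputs.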

Focal Loss Layers

focalLossLayer: Create focal loss layer using focal loss function
focalCrossEntropy: Compute focal cross-entropy loss
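Focal loss reweights cross-entropy so that well-classified examples contribute little, which helps when background examples vastly outnumber objects. In its binary form the loss is FL = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The Python sketch below evaluates that formula for one prediction. The values gamma = 2 and alpha = 0.25 are those suggested in the original focal loss paper and are used here only as illustrative defaults, not as the defaults of focalLossLayer or focalCrossEntropy.

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one binary prediction.

    p: predicted probability of the positive class; y: 1 for positive, 0 for negative.
    FL = -alpha_t * (1 - p_t)**gamma * log(p_t), where p_t is the probability
    assigned to the true class.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy, confidently correct negative versus a hard, misclassified positive.
easy = binary_focal_loss(0.1, 0)  # p_t = 0.9
hard = binary_focal_loss(0.1, 1)  # p_t = 0.1
print(easy, hard)
```

With gamma = 0 the modulating factor (1 - p_t)^gamma equals 1, and the expression reduces to alpha-weighted cross-entropy.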

SSD (Single Shot Detector)

ssdMergeLayer: Create SSD merge layer for object detection
ssdLayers: SSD multibox object detection network
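SSD predicts offsets relative to default boxes attached to several feature maps of decreasing resolution, with larger boxes on coarser maps. The original SSD paper spaces the scales linearly, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) for m feature maps, and gives each aspect ratio a a width of s_k*sqrt(a) and a height of s_k/sqrt(a). The Python sketch below reproduces that rule with the paper's s_min = 0.2 and s_max = 0.9 and an assumed subset of aspect ratios. It omits the paper's extra aspect-ratio-1 box and is not necessarily how ssdLayers chooses its anchor boxes.

```python
import math

def ssd_default_box_sizes(num_maps, s_min=0.2, s_max=0.9, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default box sizes per feature map, following the SSD paper's scale rule.

    Sizes are fractions of the input image side. Assumes num_maps >= 2.
    Returns one list of (width, height) pairs per feature map.
    """
    # k runs from 0, so k / (num_maps - 1) matches (k - 1) / (m - 1) in the paper.
    scales = [s_min + (s_max - s_min) * k / (num_maps - 1) for k in range(num_maps)]
    sizes = []
    for s in scales:
        # Width and height keep the box area at s**2 for every aspect ratio.
        sizes.append([(s * math.sqrt(a), s / math.sqrt(a)) for a in aspect_ratios])
    return sizes

for k, boxes in enumerate(ssd_default_box_sizes(6)):
    print(k, [(round(w, 3), round(h, 3)) for w, h in boxes])
```

Keeping the area fixed while varying the aspect ratio lets each feature map cover objects of one approximate size but several shapes.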

Anchor Boxes

anchorBoxLayer: Create anchor box layer for object detection
estimateAnchorBoxes: Estimate anchor boxes for deep learning object detectors

Visualize Detection Results

insertObjectAnnotation: Annotate truecolor or grayscale image or video stream
insertObjectMask: Insert masks in image or video stream
insertShape: Insert shapes in image or video
showShape: Display shapes on image, video, or point cloud

Evaluate Detection Results

evaluateDetectionAOS: Evaluate average orientation similarity metric for object detection
evaluateDetectionMissRate: Evaluate miss rate metric for object detection
evaluateDetectionPrecision: Evaluate precision metric for object detection
bboxOverlapRatio: Compute bounding box overlap ratio
bboxPrecisionRecall: Compute bounding box precision and recall against ground truth
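Detection metrics such as those computed by evaluateDetectionPrecision and bboxPrecisionRecall rest on matching detections to ground truth by overlap. The Python sketch below shows one common scheme for a single image and class: visit detections in descending score order, count a detection as a true positive if it overlaps a not-yet-matched ground-truth box with intersection-over-union of at least 0.5, and otherwise count it as a false positive. The threshold, the greedy matching rule, and the function names are assumptions for illustration, not the toolbox's exact procedure.

```python
def overlap_ratio(a, b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, scores, ground_truth, iou_threshold=0.5):
    """Cumulative (precision, recall) after each detection, highest score first.

    Assumes at least one ground-truth box. Each ground-truth box can be
    matched by at most one detection.
    """
    order = sorted(range(len(detections)), key=lambda i: scores[i], reverse=True)
    matched = set()
    tp = fp = 0
    curve = []
    for i in order:
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truth):
            iou = overlap_ratio(detections[i], gt)
            if j not in matched and iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is None:
            fp += 1
        else:
            matched.add(best_j)
            tp += 1
        curve.append((tp / (tp + fp), tp / len(ground_truth)))
    return curve

gt = [[10, 10, 50, 50], [100, 100, 40, 40]]
dets = [[12, 12, 50, 50], [11, 11, 50, 50], [200, 200, 20, 20]]
print(precision_recall(dets, [0.9, 0.8, 0.7], gt))
```

The second detection duplicates the first, so it becomes a false positive once its ground-truth box is already matched. Average precision summarizes the resulting precision-recall curve as a single number.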


Blocks

Deep Learning Object Detector: Detect objects using trained deep learning object detector


Get Started

Getting Started with Object Detection Using Deep Learning

Object detection using deep learning neural networks.

Point Feature Types

Choose functions that return and accept points objects for several types of features

Coordinate Systems

Specify pixel indices, spatial coordinates, and 3-D coordinate systems

Local Feature Detection and Extraction

Learn the benefits and applications of local feature detection and extraction.

Image Classification with Bag of Visual Words

Use the Computer Vision Toolbox™ functions for image category classification by creating a bag of visual words.

Get Started with Cascade Object Detector

Train a custom classifier

Choose Function to Visualize Detected Objects

Compare visualization functions.

Training Data for Object Detection and Semantic Segmentation

Get Started with the Image Labeler

Interactively label rectangular ROIs for object detection, pixels for semantic segmentation, polygons for instance segmentation, and scenes for image classification.

Get Started with the Video Labeler

Interactively label rectangular ROIs for object detection, pixels for semantic segmentation, polygons for instance segmentation, and scenes for image classification in a video or image sequence.

Datastores for Deep Learning (Deep Learning Toolbox)

Learn how to use datastores in deep learning applications.

Getting Started with Mask R-CNN for Instance Segmentation

Perform multiclass instance segmentation using Mask R-CNN and deep learning.

Training Data for Object Detection and Semantic Segmentation

Create training data for object detection or semantic segmentation using the Image Labeler or Video Labeler.

Get Started With Deep Learning

Deep Network Designer (Deep Learning Toolbox)

List of Deep Learning Layers (Deep Learning Toolbox)

Discover all the deep learning layers in MATLAB®.

Deep Learning in MATLAB (Deep Learning Toolbox)

Discover deep learning capabilities in MATLAB using convolutional neural networks for classification and regression, including pretrained networks and transfer learning, and training on GPUs, CPUs, clusters, and clouds.

Pretrained Deep Neural Networks (Deep Learning Toolbox)

Learn how to download and use pretrained convolutional neural networks for classification, transfer learning and feature extraction.

Featured Examples