I've been following the object-detection tutorial (https://www.mathworks.com/help/vision/ug/object-detection-using-deep-learning.html) and trying to apply it to my own image data: specifically, I want to train a detector to identify tassels in aerial crop images. I've followed the tutorial code closely; the only major change is the image dataset used to train the detector (at the call to trainRCNNObjectDetector), since I'm identifying tassels, not stop signs.

I've run many tests with this detector, using several different training datasets, and the problem is consistent: the detector misses many of the tassels in each image (I need the detection rate to be as close as possible to 100% of the visible tassels). Most recently, I used a published dataset (specifically the one available here: https://github.com/poppinace/mtdc) and trained the detector on 186 of its images using the annotations provided, but I'm still seeing the same quality of results as before. The attached image is typical: only 22 bounding boxes drawn, but 55 tassels visible.
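For reference, my training call follows the tutorial's pattern. This is just a minimal sketch of what I'm doing; the variable names (tasselData, baseNetwork) and the option values are illustrative, not my exact settings:

```matlab
% tasselData: table with an imageFilename column and a column of
% [x y w h] bounding boxes per image, built from the MTDC annotations
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 32, ...
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 10);

% Train the R-CNN detector as in the tutorial
% (baseNetwork = the small CNN the tutorial constructs)
detector = trainRCNNObjectDetector(tasselData, baseNetwork, options);
```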
I've established that the problem is not the quality or quantity of my training images, which means the problem lies either in the training parameters (do I need substantially more layers in the network, for instance?) or in the detection algorithm itself. Can anyone offer some insight? Is the basic code in the linked tutorial even meant to identify all objects of a given type in an image, or is it only intended to return some of the highest-scoring matches? Is there a detection threshold I can lower so that more bounding boxes are drawn?
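The closest thing I've found is filtering the scores returned by detect myself. A sketch of what I mean (the 0.3 cutoff is an arbitrary value I picked for illustration):

```matlab
% Run the trained detector and keep every box above a lower cutoff
[bboxes, scores] = detect(detector, I);
keep = scores >= 0.3;   % lower cutoff to admit weaker detections
bboxes = bboxes(keep, :);
scores = scores(keep);

% Visualise all kept detections
detectedImg = insertObjectAnnotation(I, 'rectangle', bboxes, scores);
figure, imshow(detectedImg)
```

Is this the right way to expose more candidate boxes, or is the detector pruning candidates earlier in the pipeline, before scores are ever returned?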