Achieving reliable real-time localization and mapping is essential for robotics and AR
applications. The Computer Vision Toolbox™ provides a performant, configurable, and easy-to-use interface that offers an
out-of-the-box solution for visual simultaneous localization and mapping (vSLAM), handling
tasks such as feature extraction, matching, pose estimation, mapping, loop closure, and IMU
sensor fusion internally. To meet performance demands, you can improve the accuracy,
robustness, and efficiency of your visual SLAM system by optimizing sensor use for loop
closure and tuning key parameters. For a general description of why SLAM matters and how it
works for different applications, see What is SLAM?
Using Verbose Mode to Diagnose SLAM Errors
During SLAM processing, you can diagnose and troubleshoot errors using runtime
messages returned to the command line as the algorithm runs. To display these messages,
set the Verbose name-value argument to true for
the monovslam,
stereovslam, or rgbdvslam
object. In addition to enabling Verbose mode, see Techniques to Improve Accuracy for common sources of
inaccuracy and ways to improve SLAM accuracy.
Progress information display, specified as [], 1, 2, or 3. When log files are created, their paths are displayed in the command window.
| Verbose value | Display description | Display location |
| --- | --- | --- |
| [] or false | Display is turned off | — |
| 1 or true | Stages of vSLAM execution | Command window |
| 2 | Stages of vSLAM execution, with details on how each frame is processed, such as the artifacts used to initialize the map | The MATLAB® command window displays a link to the log file |
| 3 | Stages of vSLAM execution, artifacts used to initialize the map, poses and map points before and after bundle adjustment, and loop closure optimization data | The MATLAB command window displays a link to the log file |
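For example, this sketch enables the most detailed logging level when constructing a monocular vSLAM object. The intrinsic parameter values are placeholders; substitute your own calibration results.

```matlab
% Camera intrinsics from a prior calibration (placeholder values).
intrinsics = cameraIntrinsics([800 800],[640 360],[720 1280]);

% Create the vSLAM object with the most detailed diagnostics (level 3).
% The MATLAB command window displays a link to the generated log file.
vslam = monovslam(intrinsics,Verbose=3);
```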
This table lists some of the most common messages and root causes you may
encounter.
| Verbose Message | Root Cause | Parameters to Tune |
| --- | --- | --- |
| Not enough matched points with frame {K}. NumMatchedPoints=X is less than MinNumPoints=Y; Not enough feature points in frame {K}. NumMatchedPoints=X is less than MinExtractedPoints=Y; Not enough world points in frame {K}. NumWorldPoints=X is less than MinWorldPoints=Y | Occasionally, the count of tracked features or points may fall below a critical threshold, resulting in initialization failures or a loss of tracking. This issue can arise from factors such as inadequate image quality, abrupt variations in brightness, or rapid movements. | To mitigate this problem, consider extracting a larger number of 2-D features or reducing the number of frames skipped between each pair of keyframes. |
| Loop not closed. All loop candidates were rejected | Loop closure failures typically arise from two primary factors: the loop closure threshold may be set too high, resulting in missed matches, or the bag of words used may not be well suited to the input data. | Gradually lower the loop closure threshold to improve results, but be cautious not to set it too low, as this may lead to false positives. When other methods do not yield sufficient performance, consider generating a new bag-of-words model using data from a camera sensor with characteristics similar to the target sensor. |
Achieving high accuracy in visual SLAM is challenging because errors can arise from
many sources. Issues with sensor calibration, data association, or environmental
complexity can all lead to drift or inaccurate maps. Understanding where these
inaccuracies originate is the first step toward improving system performance. The
accuracy of SLAM systems can be affected by several factors, including:
Camera Calibration — Inaccurate camera calibration, such as errors in the intrinsic parameters, can lead to incorrect pose estimation and mapping results.
SLAM Initialization — If the system cannot extract or reliably match enough visual features between the initial frames, it may struggle to track motion or build a consistent map.
Tracking and Keyframe Management — Tracking can be lost due to factors such as motion blur, fast camera movements, or scenes with few distinctive visual features.
Loop Closure — A missed loop closure can occur if the system fails to recognize that it has revisited a location (a false negative) or incorrectly detects a loop closure when it has not (a false positive). In both cases, accumulated errors in the position estimate may not be properly corrected.
Visual-Inertial SLAM (Sensor Fusion) — Poor fusion of camera and IMU data is often caused by inaccurate IMU calibration or incorrect noise models.
Techniques to Improve Accuracy
Improving SLAM accuracy involves optimizing several key components of the system. This
section outlines techniques such as camera calibration, initialization, tracking and
keyframe management, loop closure, and visual-inertial sensor fusion, each contributing
to more reliable and precise mapping and localization.
Camera Calibration Accuracy
Accurate camera calibration is essential in SLAM because it ensures precise
mapping of 3-D environments and reliable pose estimation. A camera calibration is
accurate when the reprojection error is low, typically below one pixel, and remains
evenly distributed across all images. Undistorting images that contain straight
lines should preserve their straightness, with no bending or structured artifacts.
The calibration should also perform reliably in downstream tasks such as pose
estimation or SLAM and should not introduce curvature, drift, or scale inconsistencies.
Camera intrinsic parameters describe how a camera projects 3-D world points onto a 2-D image plane, and include the focal length, principal point, and lens distortion coefficients. Accurate intrinsic parameters are critical for 3-D reconstruction, visual SLAM, and image undistortion.
To improve calibration accuracy, capture a sufficient number of high-quality images that provide diverse views and full image coverage of the calibration pattern. For image capture guidelines, see Prepare Camera and Capture Images for Camera Calibration.
Image quality directly affects the accuracy of SLAM and other feature-based
computer vision tasks. Lens distortion can bend straight lines and bias feature
detection, reducing geometric consistency across images. To improve image
quality and downstream accuracy, undistort images using the camera parameters
obtained from calibration. Image undistortion removes lens distortion and
preserves scene geometry, enabling feature detection and matching with corrected
images.
Use the undistortImage function with
the distorted image and the cameraParameters object to generate an undistorted image. This
correction ensures that straight lines in the real world appear straight in the
image.
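As a minimal sketch, the calibration and undistortion workflow might look like the following. The folder name and checkerboard square size are placeholder assumptions.

```matlab
% Calibrate from checkerboard images (folder name and square size are
% placeholders for this sketch).
imds = imageDatastore("calibrationImages");
[imagePoints,boardSize] = detectCheckerboardPoints(imds.Files);
squareSize = 25; % checkerboard square size in millimeters
worldPoints = generateCheckerboardPoints(boardSize,squareSize);

I = readimage(imds,1);
params = estimateCameraParameters(imagePoints,worldPoints, ...
    ImageSize=size(I,[1 2]));

% Remove lens distortion so straight lines in the scene stay straight.
J = undistortImage(I,params);
imshowpair(I,J,"montage")
```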
Fisheye images introduce strong lens distortion that can reduce feature
tracking and geometric consistency. To improve SLAM accuracy and compatibility,
convert fisheye images to a standard pinhole model using the undistortFisheyeImage function with parameters estimated from
fisheye calibration. This conversion generates new intrinsic parameters
corresponding to the equivalent pinhole camera model. Providing these
undistorted images and updated parameters to the monovslam object ensures compatibility with algorithms designed
for pinhole cameras while maintaining accurate feature detection and
tracking.
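A sketch of the fisheye-to-pinhole conversion, assuming fisheyeParams holds the result of a prior fisheye calibration; the image file name is a placeholder.

```matlab
% Undistort a fisheye frame and obtain equivalent pinhole intrinsics.
I = imread("fisheyeFrame.png");  % placeholder file name
[J,pinholeIntrinsics] = undistortFisheyeImage(I,fisheyeParams.Intrinsics);

% Feed the undistorted frames and the new intrinsics to monovslam,
% which expects a pinhole camera model.
vslam = monovslam(pinholeIntrinsics);
```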
SLAM initialization establishes the first reference frame and creates the initial
3-D map of the environment. During this phase, the system detects and matches visual
features to estimate the camera’s pose and the positions of scene keypoints. A good
initialization means that the position of the camera is stable and does not jump
around unexpectedly. It also requires a non-degenerate baseline between the first
keyframes, which means the camera must move enough so that the 3-D structure of the
scene can be estimated clearly. In addition, the points in the map should be
well-triangulated, with positive depth and enough parallax for accurate
reconstruction. If the first few frames can be tracked reliably and the map does not
quickly collapse, change shape, or scale incorrectly, then the initialization is
sufficient for normal SLAM operation to continue.
The initialization process differs primarily based on how depth information is obtained:
Monocular SLAM — Depth must be inferred from motion. The system
analyzes feature correspondences across several frames to estimate
relative depth through parallax. This process introduces scale
ambiguity, meaning the map is created up to an unknown scale until
additional data (e.g., IMU or a known object size in the scene)
resolves it. Parallax, the apparent shift in
the position of objects when the camera moves sideways, is essential
for estimating depth and constructing an accurate map. During
initialization, insufficient camera motion can reduce parallax,
making it difficult to estimate depth reliably. These early errors
can persist throughout the mapping process, affecting overall
accuracy.
Stereo or RGB-D SLAM — These systems have immediate access to
metric depth information, either through disparity computation
(stereo) or a depth sensor (RGB-D), allowing initialization with
absolute scale and improved robustness.
The initialization stage
relies heavily on feature extraction and matching to estimate the initial camera
pose and the 3-D positions of keypoints. Image resolution directly affects this
process by determining how many distinctive features can be detected and matched
across frames. Finding the right balance between feature richness and processing
speed is essential for stable and efficient initialization.
Use an image resolution between 480x640 (SD) and 1080x1920 (HD) and adjust the tuning parameters accordingly. These tuning parameters are typically specified as name-value arguments of SLAM objects such as monovslam, stereovslam, or rgbdvslam.
MaxNumPoints — Controls the number of ORB keypoints extracted from each frame. Higher values improve map density and matching reliability but result in more computation.
ScaleFactor — Determines the scale step between pyramid levels during feature extraction. Smaller values produce more pyramid levels, increasing scale invariance and matching robustness at the cost of speed.
NumLevels — Defines the number of pyramid levels for feature detection. More levels improve robustness to scale changes but result in more computation.
Recommended MaxNumPoints values for different resolutions:

| Resolution | MaxNumPoints | Characteristics |
| --- | --- | --- |
| Low (~480x640) | 1000 | Fewer, less distinctive features; fast processing but low robustness |
| Medium (~720x1280) | 2000 | Moderate feature density and distinctiveness; balanced accuracy and speed |
| High (~1080x1920) | 2000-3000 | Rich, detailed features; slower initialization due to a greater number of computations |
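For example, a medium-resolution (~720x1280) sequence might be configured as in this sketch. The intrinsics object is a placeholder, and the ScaleFactor and NumLevels values are illustrative starting points rather than prescriptions.

```matlab
% Feature extraction tuned for a ~720x1280 sequence (sketch).
vslam = monovslam(intrinsics, ...
    MaxNumPoints=2000, ...  % moderate feature density
    ScaleFactor=1.2, ...    % finer pyramid steps for scale robustness
    NumLevels=8);           % more levels improve scale invariance
```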
These examples show the tuning of ScaleFactor and
NumLevels and their effect on the total number of matches
and runtime. Runtime can vary based on your hardware configuration.
For stereo visual SLAM, initialization relies on accurate disparity estimation between the left and right camera images to reconstruct depth from pixel correspondences. The DisparityRange name-value argument of the stereovslam object, specified as a two-element array [minDisparity maxDisparity], defines the valid pixel range used during this stereo matching process.
The disparity range directly affects both the quality of depth reconstruction
and the computational efficiency of the initialization process:
Range too narrow — Valid depth points are lost, resulting in
incomplete or noisy map reconstruction.
Range too wide — Computational cost increases and false
correspondences may occur, reducing map accuracy.
Best practice range — Select a range that fully spans the expected
depth variation of your environment. For guidance on choosing
appropriate values, refer to Choosing Range of Disparity.
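For instance, a stereo rig observing scenes a few meters away might be configured as in this sketch. The intrinsics, extrinsics, and range values are all placeholders to tune for your hardware.

```matlab
% Stereo vSLAM with an explicit disparity range (sketch; all inputs
% are placeholders for your calibrated stereo parameters).
vslam = stereovslam(intrinsics1,intrinsics2,rotationOfCamera2, ...
    translationOfCamera2, ...
    DisparityRange=[0 48]); % [minDisparity maxDisparity], in pixels
```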
Tracking and Keyframe Management
Tracking and keyframe management are critical components of SLAM systems. Tracking
estimates the camera's motion over time, while keyframes are selected frames that
capture significant changes in viewpoint and serve as stable reference points to
maintain map consistency and support robust localization. The methods for managing
tracking and keyframes are described in these techniques:
In monocular visual SLAM, tracking continuously estimates the camera pose by
detecting and matching visual features between the current frame and the
existing key frames. This process allows the system to localize the camera,
decide when to add new keyframes, and update the map with newly observed
features.
Stable tracking depends on maintaining a sufficient number of reliable feature
correspondences across frames. If tracking is lost, mapping halts and
relocalization may be required. Tracking behavior and keyframe selection are
primarily controlled by the SkipMaxFrames and
TrackFeatureRange name-value arguments, which can be
configured by the monovslam, stereovslam, or rgbdvslam object.
SkipMaxFrames — Defines the maximum number of
frames that can be skipped before forcing a new keyframe. Lower
values are recommended for sequences with fast or irregular motion.
If videos are not recorded at 30 fps or have already been downsampled, consider reducing the value of SkipMaxFrames.
| Frame Rate/Motion Scenario | SkipMaxFrames | Characteristics |
| --- | --- | --- |
| Slow or static motion | ~20 | Skips more frames between keyframes to improve speed when motion is minimal. Safe for static or slow sequences. Excessive skipping during motion can cause drift. |
| Moderate motion/handheld | 10-15 | Balances performance and robustness. Maintains consistent localization with manageable computational load. |
| Fast or abrupt motion | 5-10 | Reduces skipped frames to maintain robustness during rapid camera movement. Increases computational load but prevents tracking failure. |
TrackFeatureRange — Specifies the lower and upper limits for the number of tracked points required for keyframe creation, and helps control the rate of new keyframe insertion. The lower limit should be in the range [30,50]. The upper limit should be approximately 15% of the MaxNumPoints value.
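Putting these together, a fast handheld 30 fps sequence might be configured as in this sketch. The intrinsics object is a placeholder, and the upper TrackFeatureRange limit follows the 15%-of-MaxNumPoints guideline (0.15*2000 = 300).

```matlab
% Keyframe management tuned for fast or abrupt motion (sketch).
vslam = monovslam(intrinsics, ...
    MaxNumPoints=2000, ...
    SkipMaxFrames=10, ...          % skip fewer frames during fast motion
    TrackFeatureRange=[40 300]);   % lower limit in [30,50]; upper ~15% of MaxNumPoints
```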
The checkStatus enumeration provides diagnostic feedback
during runtime, indicating the health of the tracking process. Use these
messages to identify issues such as insufficient feature matches or complete
tracking loss, and adjust parameters like MaxNumPoints,
SkipMaxFrames, or feature extraction settings as
needed.
| checkStatus | Definition | Recommended Action |
| --- | --- | --- |
| TrackingLost | Too few reliable correspondences exist. The number of tracked feature points in the current frame is below the lower limit set by TrackFeatureRange, indicating that the image does not contain enough features or that the camera is moving too fast. | Increase the upper limit of TrackFeatureRange, decrease the SkipMaxFrames value to add keyframes more frequently, or both. |
| TrackingSuccessful | Tracking is successful. The number of tracked feature points in the current frame is between the lower and upper limits set by TrackFeatureRange. | Continue mapping. |
| FrequentKeyFrames | Tracking adds keyframes too frequently. The number of tracked feature points exceeds the upper limit of TrackFeatureRange. | Increase the lower limit of TrackFeatureRange so that keyframes are not inserted too frequently, or reduce MaxNumPoints to limit feature density. |
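A processing loop might poll checkStatus after each frame, as in this sketch. The imageDatastore folder name is a placeholder, and the comparison assumes the returned enumeration member can be compared against its name as a string.

```matlab
% Monitor tracking health while adding frames (sketch; vslam is an
% existing monovslam object).
imds = imageDatastore("sequenceFrames"); % placeholder folder
while hasdata(imds)
    I = read(imds);
    addFrame(vslam,I);
    if checkStatus(vslam) == "TrackingLost"
        % Too few tracked points: widen TrackFeatureRange or
        % decrease SkipMaxFrames.
        disp("Tracking lost on current frame.")
    end
end
```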
Loop Closure
Loop closure is a process in SLAM that detects when the camera revisits a
previously mapped area. By recognizing these revisits, the system can correct
accumulated drift and refine both the trajectory and the map, ensuring consistency.
Loop closure typically runs in the background using feature-based place recognition,
matching visual features from the current view against those from past keyframes.
Effective loop closure significantly improves the accuracy and robustness of SLAM in
large or repeatedly traversed environments.
LoopClosureThreshold — Sets the similarity
threshold for confirming a loop closure between keyframes.
CustomBagOfFeatures — Custom bag of words (BoW)
vocabulary for loop closure detection. Using this argument requires
a pre-trained BoW vocabulary.
| Argument | Purpose | Sensitivity | Best Practice |
| --- | --- | --- | --- |
| CustomBagOfFeatures | Define a custom bag-of-words (BoW) vocabulary to improve place recognition during loop closure. | Using an untrained or generic vocabulary may cause missed matches or false positives, especially in scenes with repetitive textures or unique lighting. | Train a BoW vocabulary on representative images from the target environment using the bagOfFeaturesDBoW object (based on DBoW2). A well-trained vocabulary improves loop closure detection reliability and reduces false matches. |
| LoopClosureThreshold | Set the similarity score threshold for confirming a loop closure candidate. | If set too high, the system may miss valid loop closures; if set too low, it increases the risk of incorrect matches and map distortion. | Start with the default threshold, then adjust: increase it in feature-rich environments to reduce false positives, or decrease it in low-texture scenes to avoid missed closures. When increasing MaxNumPoints, raise this threshold proportionally. |
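The two arguments might be combined as in this sketch. The image folder and threshold value are placeholders, and the constructor call assumes bagOfFeaturesDBoW accepts an imageDatastore of training images.

```matlab
% Train a custom DBoW vocabulary from representative environment
% images (sketch; folder name is a placeholder).
trainImds = imageDatastore("environmentImages");
bag = bagOfFeaturesDBoW(trainImds);

% Use the custom vocabulary and a tuned similarity threshold.
vslam = monovslam(intrinsics, ...
    CustomBagOfFeatures=bag, ...
    LoopClosureThreshold=75);  % placeholder value; adjust as described above
```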
Visual-Inertial SLAM (Sensor Fusion)
Visual-inertial SLAM uses both camera and IMU data to improve motion tracking. By
combining these measurements, the system stays accurate even during rapid motion or
challenging visual conditions, where feature extraction degrades. Key techniques for leveraging IMU data and optimizing its integration include the following:
Incorporating IMU data into SLAM improves robustness by providing continuous motion information, which helps maintain accurate tracking during rapid movements, which can produce blurry images, and in texture-poor environments. IMU measurements supply accelerations and angular velocities at high rates, filling gaps between camera frames and compensating for visual ambiguities.
Visual–Inertial SLAM (VI-SLAM) combines camera and IMU data to achieve robust
localization and mapping, even in challenging environments. The fusion of visual
camera and inertial IMU sensor data provides scale information, stabilizes
tracking, and improves accuracy during fast motion or visual degradation.
Achieving precise results requires careful calibration, parameter tuning, and
initialization.
To enable visual-inertial fusion, you must configure a factorIMUParameters (Navigation Toolbox) object that stores IMU-specific parameters
such as the sampling rate, sensor noise characteristics, and biases for both the
accelerometer and gyroscope. These parameters are typically provided by the IMU
sensor manufacturer. However, if they are not available, you can estimate them
using techniques such as Allan variance analysis. For example, by using the
allanvar (Navigation Toolbox) function. For an example
that uses this function, see Inertial Sensor Noise Analysis Using Allan Variance (Navigation Toolbox).
In monovslam, IMU noise values are expected as covariances rather
than standard deviations. If the manufacturer provides random walk standard
deviations (or if you obtain them from Allan variance analysis), square them to
convert to covariance. This differs from some open-source frameworks, which
typically use standard deviations.
For example, if the gyroscope bias random walk standard deviation is 1e-4 rad/s, then:

GyroscopeBiasNoise = (1e-4)^2 = 1e-8 (rad/s)^2
You must ensure that all arguments are specified in the correct units:
GyroscopeBiasNoise — (rad/s)^2
AccelerometerBiasNoise — (m/s^2)^2
GyroscopeNoise — (rad/s)^2
AccelerometerNoise — (m/s^2)^2
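The squaring step can be made explicit in code, as in this sketch. The sample rate and random-walk values are placeholder datasheet numbers.

```matlab
% Convert random-walk standard deviations to the covariances expected
% by factorIMUParameters (values are placeholder datasheet numbers).
gyroBiasRW  = 1e-4; % rad/s
accelBiasRW = 1e-3; % m/s^2

imuParams = factorIMUParameters( ...
    SampleRate=100, ...                      % IMU rate in Hz
    GyroscopeBiasNoise=gyroBiasRW^2, ...     % (1e-4)^2 = 1e-8 (rad/s)^2
    AccelerometerBiasNoise=accelBiasRW^2);   % (1e-3)^2 = 1e-6 (m/s^2)^2
```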
A well-tuned IMU parameter set improves sensor fusion accuracy, reduces drift,
and ensures consistent pose estimation across long sequences.
IMU initialization in monocular visual-inertial SLAM involves estimating both
gravity rotation and pose scale. These steps are essential to resolve the scale
ambiguity inherent in monocular vision and to align inertial and visual data
within a consistent reference frame.
Gravity rotation — The gravity rotation estimation aligns the
inertial measurements with the visual data, ensuring the orientation
of the system reflects the true gravitational direction. This
alignment is essential for accurate motion estimation because
accelerometer readings include the constant acceleration due to
gravity, which does not represent actual motion and must be removed
before sensor fusion.
Since the input pose reference frame may not match the IMU local
navigation frame, typically North–East–Down (NED) or East–North–Up
(ENU), in which the gravity direction is known, it is necessary to
transform the estimated camera poses to the local navigation frame
to remove the known gravity effect. The estimated rotation provides
this transformation, aligning the input pose reference frame to the
IMU local navigation reference frame.
The estimated gravity alignment is returned in the GravityRotation property. When this alignment is successfully estimated, the IsIMUAligned property is set to true.
Pose scale — Pose scale estimation determines the real-world metric scale of the scene, enabling accurate and drift-free 3-D reconstruction and trajectory estimation.
For monocular systems, estimating the pose scale is necessary
because the real-world scale of the scene cannot be directly
inferred from images alone. By leveraging inertial data, the system
can resolve this scale ambiguity, resulting in more accurate and
reliable mapping and localization.
The estimated scale factor is available in the IMUScale property.
Together, the gravity rotation and pose scale estimations enable
the system to produce metrically accurate 3-D reconstructions and trajectories.
It is important to note that camera-IMU fusion cannot proceed if the IMU
initialization is not successful.
The animation illustrates the effects of properly tuning the gravity rotation
and pose scale estimations by showing the SLAM trajectory before and after
alignment. After the estimation is applied, the trajectory plot is automatically
updated to reflect the path in the newly aligned reference frame, incorporating
the corrected (estimated) scale. This ensures that the visualized trajectory is
both spatially accurate and metrically consistent with the real-world
environment.
The monovslam, stereovslam, and
rgbdvslam SLAM objects automatically estimate gravity
rotation and pose scale using internally designed factor graphs. With sufficient data coverage and appropriate tuning of the NumPosesThreshold and AlignmentFraction name-value arguments, reliable initialization can be achieved with minimal user intervention.
Best Practices for a Successful Camera-IMU
Alignment:
The estimation of gravity rotation and pose scale is typically performed early
in the sequence once a sufficient number of camera poses have been collected.
For more information on this calibration technique, see Gravity Rotation and Pose Scale (Navigation Toolbox).
To obtain a reliable estimation of gravity rotation and pose scale, the
collected poses should satisfy several conditions:
Number of poses — Try to keep the number of poses under 30 for
most cameras and frame rates. A larger number increases drift, while
fewer than 10 poses may not provide enough information for a robust
estimation.
Motion diversity — Include rotation around all three axes.
Vertical translation — Incorporate upward motion (opposite the
gravity direction).
Pose accuracy — Ensure accurate camera pose estimates by tuning
SLAM name-value arguments, such as
TrackFeatureRange or
SkipMaxFrames.
The number of camera poses used for camera–IMU alignment is controlled by the
NumPosesThreshold and
AlignmentFraction name-value arguments. These settings
determine when the alignment process begins and how much of the available data
is used.
To perform accurate calibration between the camera and IMU, a sufficient
number of camera-only poses must be collected. The
NumPosesThreshold defines the minimum number of camera
poses required before alignment can begin, while
AlignmentFraction determines the proportion of the total
dataset to use during the alignment process.
In summary, these arguments help ensure that enough spatial and temporal
information is available to reliably align the camera and IMU data streams. Each
plays a distinct role in the calibration process:
NumPosesThreshold — Number of estimated camera poses required to trigger IMU alignment. Choosing an appropriate threshold is critical: a value set too low may not provide enough data for accurate calibration, while a value set too high can introduce drift and noise from accumulated pose errors. Try to keep the number of poses under 30 for most cameras and frame rates, and use at least 10.
AlignmentFraction — Subset of the most recent poses used for alignment, specified as a scalar in the range (0,1]. The number of poses considered is calculated as round(NumPosesThreshold*AlignmentFraction). This filters out initial, potentially noisy pose estimates, ensuring that only the most relevant data contributes to the alignment for improved accuracy.
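As a sketch, the alignment settings might be passed alongside the IMU parameters. This assumes the monovslam syntax that accepts a factorIMUParameters object for visual-inertial SLAM; all values are placeholders.

```matlab
% Camera-IMU alignment configuration (sketch). With these values,
% round(20*0.8) = 16 of the most recent poses are used for alignment.
vslam = monovslam(intrinsics,imuParams, ...
    NumPosesThreshold=20, ...
    AlignmentFraction=0.8);
```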
Key Takeaways for Improving SLAM Accuracy
Achieving robust and accurate SLAM depends on careful tuning and validation. After
setting parameters for camera calibration, initialization, tracking, loop closure, and
IMU fusion, validate your system by visualizing trajectories, checking for drift, and
confirming that loop closures and IMU alignment occur consistently. To compare estimated
trajectories against ground truth, you can use the compareTrajectories function.
Use the diagnostic messages, mapping visualizations, and performance metrics to
identify weak points in the processing of your data and environment. Adjust parameters
as needed until tracking remains stable under varying motion, lighting, and
environmental conditions.
Improving SLAM accuracy is an iterative process that combines precise sensor
calibration, thoughtful parameter tuning, and validation against real-world data. By
systematically refining your configuration and verifying performance using the
visualization and diagnostic tools in the Computer Vision Toolbox and the Navigation Toolbox™, you can achieve high-accuracy, real-time SLAM suitable for robotics, AR,
and autonomous navigation applications.