
detectTextCRAFT

Detect text in images by using the CRAFT deep learning model

Since R2022a

    Description

    bboxes = detectTextCRAFT(I) detects text in an image by using the character region awareness for text detection (CRAFT) deep learning model. The detectTextCRAFT function uses a pretrained CRAFT deep learning model to detect text regions in an image. The pretrained CRAFT model can detect text in nine languages: Chinese, Japanese, Korean, Italian, English, French, Arabic, German, and Bangla (Indian).


    Note

    To use the pretrained CRAFT model, you must install the Computer Vision Toolbox™ Model for Text Detection. You can download and install the Computer Vision Toolbox Model for Text Detection from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons. Running this function also requires Deep Learning Toolbox™.

    bboxes = detectTextCRAFT(I,roi) detects text within a region of interest (ROI) in the image.


    bboxes = detectTextCRAFT(___,Name=Value) specifies additional options by using one or more name-value arguments. You can use the name-value arguments to fine-tune the detection results.


    Examples


    Read an input image into the MATLAB workspace.

    I = imread("handicapSign.jpg");

    Compute the text detection results by using the detectTextCRAFT function. The region and the affinity thresholds are set to default values. The output is a set of bounding boxes that contain the detected text regions.

    bboxes = detectTextCRAFT(I);

    Draw the output bounding boxes on the image by using the insertShape function.

    Iout = insertShape(I,"rectangle",bboxes,LineWidth=3);

    Display the text detection results.

    figure
    imshow(Iout)

    Figure: The detected text regions displayed as rectangles overlaid on the input image.
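
    The detected bounding boxes can be passed on to text recognition. The sketch below, which assumes the Computer Vision Toolbox ocr function and continues from the image and bboxes computed above, is one possible way to recognize the words inside each detected region; it is not part of the detectTextCRAFT interface.

    % Recognize the text inside each detected region by passing the
    % bounding boxes to the ocr function as regions of interest.
    results = ocr(I,bboxes);               % one ocrText object per bounding box
    recognizedWords = vertcat(results.Words)   % combine the recognized words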

    Read an input image into the MATLAB workspace.

    visiondatadir = fullfile(toolboxdir('vision'),'visiondata'); 
    I = imread(fullfile(visiondatadir,'imageSets','books','pairOfBooks.jpg'));

    Specify a region of interest (ROI) within the input image.

    roi = [120,80,250,200];

    Detect text within the specified ROI by using the detectTextCRAFT function. The region and affinity thresholds are set to their default values. The output is a set of bounding boxes that contain the detected text regions.

    bboxes = detectTextCRAFT(I,roi);

    Draw the ROI and the output bounding boxes on the input image. Display the text detection results.

    I = insertObjectAnnotation(I,"rectangle",roi,"ROI",Color="green");
    Iout = insertShape(I,"rectangle",bboxes,LineWidth=3);
    figure
    imshow(Iout)

    Figure: The ROI annotation and the detected text regions displayed as rectangles overlaid on the input image.
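
    If you prefer to choose the search region interactively rather than hard-coding it, one possible approach is to draw a rectangle on the displayed image and pass its position vector to detectTextCRAFT. This is only a sketch; drawrectangle is a standard MATLAB ROI tool and is not required by detectTextCRAFT.

    % Interactively draw the ROI on the original input image and use its
    % [x y width height] position as the search region for text detection.
    I = imread(fullfile(visiondatadir,'imageSets','books','pairOfBooks.jpg'));
    figure
    imshow(I)
    rect = drawrectangle;           % draw the region with the mouse
    roi = round(rect.Position);     % convert to integer pixel coordinates
    bboxes = detectTextCRAFT(I,roi);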

    This example shows how to detect each character in the text regions of an input image by using the CRAFT model. You can achieve this by modifying the affinity threshold. This example also demonstrates the effect of different affinity threshold values on the detection results.

    Read an input image into the MATLAB workspace.

    visiondatadir = fullfile(toolboxdir('vision'),'visiondata'); 
    I = imread(fullfile(visiondatadir,'bookCovers','book27.jpg'));

    Specify the affinity threshold values to consider for detecting the text regions in the image.

    threshold = [1 0.1 0.01 0.001 0.0004];

    Preallocate a 4-D array Iout to store the output image with detection results.

    Iout = zeros(size(I,1),size(I,2),size(I,3),length(threshold));

    Compute the text detection results for each of the specified affinity threshold values. The output is a set of bounding boxes that contain the detected text regions. Draw the output bounding boxes on the image by using the insertShape function. The region threshold is set to the default value, 0.4.

    for cnt = 1:length(threshold)
        bboxes = detectTextCRAFT(I,LinkThreshold=threshold(cnt));
        Iout(:,:,:,cnt) = insertShape(I,"rectangle",bboxes,LineWidth=3);
    end

    Display the text detection results obtained for the different affinity threshold values. Notice that as the affinity threshold decreases, characters with lower affinity scores are treated as connected components and grouped into a single instance. For good localization and detection results, the affinity threshold must be greater than zero.

    figure
    montage(uint8(Iout),Size=[1 5],BackgroundColor="white");
    title(['LinkThreshold = ' num2str(threshold(1)) ' | LinkThreshold = ' num2str(threshold(2)) ' | LinkThreshold = ' num2str(threshold(3)) ...
        ' | LinkThreshold = ' num2str(threshold(4)) ' | LinkThreshold = ' num2str(threshold(5))]);

    Figure: Montage of the detection results for LinkThreshold values of 1, 0.1, 0.01, 0.001, and 0.0004.
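
    To quantify the effect shown in the montage, you could also record the number of detected regions for each threshold value. The sketch below re-runs the detector for simplicity; you could equally store size(bboxes,1) inside the loop above.

    % Count the number of detected text regions for each affinity threshold.
    numDetections = zeros(1,length(threshold));
    for cnt = 1:length(threshold)
        bboxes = detectTextCRAFT(I,LinkThreshold=threshold(cnt));
        numDetections(cnt) = size(bboxes,1);
    end
    table(threshold(:),numDetections(:), ...
        VariableNames=["LinkThreshold","NumDetections"])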

    Input Arguments


    Input image, specified as a 2-D grayscale image or 2-D color image.

    Data Types: single | double | int16 | uint8 | uint16 | logical

    Rectangular search region of interest in the image, specified as a four-element vector of the form [x y width height]. The vector specifies the upper-left corner and size of the region in pixels. The region must be fully contained within the image.

    When you specify this value, the detectTextCRAFT function detects only the text within this ROI.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
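
    Because the ROI must lie completely inside the image, you may want to clamp a candidate region before calling the function. The following lines are purely illustrative and are not part of the detectTextCRAFT interface; the image file name is taken from the earlier example.

    % Clamp a candidate ROI so that it lies fully inside the image.
    I = imread("handicapSign.jpg");
    roi = [120 80 250 200];                       % candidate [x y width height]
    roi(1:2) = max(roi(1:2),1);                   % keep the corner inside the image
    roi(3) = min(roi(3),size(I,2) - roi(1) + 1);  % limit width to the image bounds
    roi(4) = min(roi(4),size(I,1) - roi(2) + 1);  % limit height to the image bounds
    bboxes = detectTextCRAFT(I,roi);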

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: bboxes = detectTextCRAFT(I,MaxSize=[10,10]) specifies the maximum size of the text region to detect in the input image.

    Region threshold for localizing each character in the image, specified as a positive scalar in the range [0, 1]. To increase the number of detections, lower the region threshold value. However, lowering the value can also produce more false positives. To reduce the number of false positives, increase the region threshold value.

    Data Types: single | double
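
    For example, you can raise the region threshold above its default value to suppress false positives. The argument name CharacterThreshold used in this sketch is an assumption; check the name-value arguments documented for your release.

    % Raise the region threshold to reduce false positives.
    % CharacterThreshold is an assumed argument name, not confirmed above.
    I = imread("handicapSign.jpg");
    bboxes = detectTextCRAFT(I,CharacterThreshold=0.6);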

    Link threshold for grouping adjacent characters into a word, specified as a positive scalar in the range [0, 1]. You can increase the number of character-level detections by increasing the link threshold. To detect each character in the image, set this value to 1. For good localization and detection results, the link threshold must be greater than zero.

    Data Types: single | double
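
    For example, setting the link threshold to 1 disables the grouping of adjacent characters and returns character-level detections. LinkThreshold is the same name-value argument used in the earlier example; the image file name is reused from the first example.

    % Detect individual characters instead of grouped words.
    I = imread("handicapSign.jpg");
    charBoxes = detectTextCRAFT(I,LinkThreshold=1);
    figure
    imshow(insertShape(I,"rectangle",charBoxes,LineWidth=2))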

    Size of the smallest detectable text region in the image, specified as a two-element vector of the form [height width].

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    Size of the largest detectable text region in the image, specified as a two-element vector of the form [height width]. By default, this value is set to the height and width of the input image.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
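
    The sketch below restricts detection to text regions within a given size range. MaxSize appears in the example near the start of this section; the MinSize name used here is assumed from the description of the smallest detectable text region above.

    % Ignore text regions smaller than 20-by-20 pixels or larger than
    % half of the image in each dimension (MinSize is an assumed name).
    I = imread("handicapSign.jpg");
    bboxes = detectTextCRAFT(I, ...
        MinSize=[20 20], ...
        MaxSize=round([size(I,1) size(I,2)]/2));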

    Hardware resource for processing images with the CRAFT model, specified as "auto", "gpu", or "cpu".

    ExecutionEnvironment    Description
    "auto"                  Use a GPU if available. Otherwise, use the CPU. The use of GPU requires Parallel Computing Toolbox™ and a CUDA® enabled NVIDIA® GPU. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).
    "gpu"                   Use the GPU. If a suitable GPU is not available, the function returns an error message.
    "cpu"                   Use the CPU.

    Data Types: char | string
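
    For example, the following sketch selects the execution environment explicitly based on GPU availability. canUseGPU is a MATLAB function that returns true when a supported GPU and Parallel Computing Toolbox are available; the image file name is reused from the first example.

    % Fall back to the CPU when no supported GPU is available.
    I = imread("handicapSign.jpg");
    if canUseGPU
        bboxes = detectTextCRAFT(I,ExecutionEnvironment="gpu");
    else
        bboxes = detectTextCRAFT(I,ExecutionEnvironment="cpu");
    end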

    Performance optimization, specified as "auto", "mex", or "none".

    Acceleration    Description
    "auto"          Automatically apply a number of optimizations suitable for the input network and hardware resource.
    "mex"           Compile and execute a MEX function. This option is available only when using a GPU. You must also have a C/C++ compiler installed. For setup instructions, see MEX Setup (GPU Coder).
    "none"          Disable all acceleration.

    The default option is "auto". With the "auto" option, MATLAB® never generates a MEX function.

    Using the "Acceleration" options "auto" and "mex" can offer performance benefits, but at the expense of an increased initial run time. Subsequent calls with compatible parameters are faster. Use performance optimization when you plan to call the function multiple times using new input data.

    The "mex" option generates and executes a MEX function based on the network and parameters used in the function call. You can have several MEX functions associated with a single network at one time. Clearing the network variable also clears any MEX functions associated with that network.

    The "mex" option is only available when you are using a GPU. Using a GPU requires Parallel Computing Toolbox and a CUDA enabled NVIDIA GPU. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox). If Parallel Computing Toolbox or a suitable GPU is not available, then the function returns an error.

    Output Arguments


    Bounding boxes of the detected text regions, returned as an M-by-4 matrix, where M is the number of detected text regions. Each row of the matrix is a vector of the form [x y width height], which specifies the upper-left corner and size of the detected region in pixels.
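
    Each row of bboxes can be used directly with functions that accept [x y width height] rectangles. For example, continuing from the first example above, the following sketch crops every detected text region into a cell array.

    % Crop each detected text region from the input image.
    textRegions = cell(size(bboxes,1),1);
    for k = 1:size(bboxes,1)
        textRegions{k} = imcrop(I,bboxes(k,:));   % rectangle in [x y width height]
    end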

    Extended Capabilities

    Version History

    Introduced in R2022a
