isInNetworkDistribution

Determine whether data is within the distribution of the network

Since R2023a

    Description

    tf = isInNetworkDistribution(net,X) returns a logical array that indicates which observations in X are in-distribution (ID) and which observations are out-of-distribution (OOD). If an observation is ID, then the corresponding element of tf is 1 (true). Otherwise, the corresponding element of tf is 0 (false).

    The function computes the distribution confidence score for each observation using the baseline method. For more information, see Softmax-Based Methods. The function classifies any observation with a score less than or equal to the threshold as OOD. To use the default threshold value, use this syntax.
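
The thresholding rule above can be sketched in plain MATLAB. This is an illustration only, not the toolbox implementation: it assumes that the baseline score is the largest softmax probability of each observation, and the variable names (probs, scores, threshold) and all numeric values are hypothetical.

```matlab
% Hypothetical softmax outputs: 3 classes (rows) by 4 observations (columns).
probs = [0.98 0.40 0.05 0.34;
         0.01 0.35 0.90 0.33;
         0.01 0.25 0.05 0.33];

% Assumed baseline score: the largest softmax probability per observation.
scores = max(probs,[],1);

% Illustrative threshold, not the default value used by the function.
threshold = 0.5;

% A score less than or equal to the threshold marks the observation as OOD,
% so an observation is ID only when its score exceeds the threshold.
tf = scores > threshold;
```

In this sketch, the first and third observations have a confident top class and are marked ID, while the second and fourth are marked OOD.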

    To set the threshold, use the Threshold name-value argument. Alternatively, use the networkDistributionDiscriminator function to create a discriminator object that automatically finds an optimal threshold, and pass that object as the first input argument instead of net. You can also use the discriminator object to specify a different method for computing the distribution confidence scores.

    tf = isInNetworkDistribution(discriminator,X) determines which observations in X are ID and which observations are OOD using discriminator. To create a discriminator object, use the networkDistributionDiscriminator function. This syntax uses the threshold stored in the Threshold property of discriminator. Use this syntax to specify additional options for the software to use when it computes the distribution confidence scores and to automatically find a suitable threshold. For example, when creating a discriminator, you can specify whether to use a target true positive or false positive rate to pick the threshold. For more information, see networkDistributionDiscriminator.

    tf = isInNetworkDistribution(net,X1,...,XN) determines whether the data is in distribution for networks with multiple inputs using the specified in-memory data.

    tf = isInNetworkDistribution(discriminator,X1,...,XN) determines whether the data is in distribution for a discriminator constructed with a network with multiple inputs using the specified in-memory data.

    tf = isInNetworkDistribution(___,Name=Value) sets the Threshold and VerbosityLevel options using one or more name-value arguments in addition to the input arguments in previous syntaxes.

    Examples

    Load a pretrained classification network.

    load("digitsClassificationMLPNetwork.mat")

    Load data. Convert the data to a dlarray object.

    X = digitTrain4DArrayData;
    X = dlarray(X,"SSCB");

    Determine if the data is ID.

    tf = isInNetworkDistribution(net,X);

    Find the proportion of observations that the function classifies as OOD.

    oodProportion = (sum(1-tf)/numel(tf))
    oodProportion = 
    0.0026
    

    Load a pretrained classification network.

    load("digitsClassificationMLPNetwork.mat")

    Load data and convert the data to a dlarray object.

    X = digitTrain4DArrayData;
    X = dlarray(X,"SSCB");

    Determine if the data is ID using a threshold of 0.9.

    tf = isInNetworkDistribution(net,X,Threshold=0.9);

    Find the proportion of observations that the function classifies as OOD.

    oodProportion = (sum(1-tf)/numel(tf))
    oodProportion = 
    0
    

    Load a pretrained classification network.

    load("digitsClassificationMLPNetwork.mat")

    Load ID data. Convert the data to a dlarray object.

    X = digitTrain4DArrayData;
    X = dlarray(X,"SSCB");

    Create a discriminator using the networkDistributionDiscriminator function. Set the method to "odin" and the true positive goal to 0.975. The software finds the threshold that satisfies the true positive goal.

    method = "odin";
    discriminator = networkDistributionDiscriminator(net,X,[],method, ...
        TruePositiveGoal=0.975);

    Determine if data is ID.

    tf = isInNetworkDistribution(discriminator,X);

    Find the true positive rate.

    truePositives = sum(tf);
    falseNegatives = sum(1-tf); 
    truePositiveRate = truePositives/(truePositives + falseNegatives)
    truePositiveRate = 
    0.9750
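
Because every observation in X is ID in this example, truePositives + falseNegatives equals numel(tf), so the true positive rate is simply the mean of tf. The following self-contained check illustrates the arithmetic with a hypothetical logical vector in place of the function output.

```matlab
% Hypothetical result for 10 ID observations (8 classified as ID).
tf = logical([1 1 0 1 1 1 1 0 1 1]);

truePositives = sum(tf);      % ID observations classified as ID
falseNegatives = sum(~tf);    % ID observations classified as OOD
truePositiveRate = truePositives/(truePositives + falseNegatives);

% With ID-only data, the rate is the fraction of true elements in tf.
assert(abs(truePositiveRate - mean(tf)) < eps)
```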
    

    Input Arguments

    Neural network, specified as a dlnetwork object with a single softmax output.

    The software uses the baseline method to compute the distribution confidence scores. To use another method, such as ODIN or energy, specify discriminator as the first input argument. For more information about methods for computing distribution confidence scores, see Distribution Confidence Scores.

    For networks without a single softmax layer, create a discriminator object using the networkDistributionDiscriminator function with method set to "hbos" and use this object as the first input argument instead.
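
As a sketch of the HBOS workflow, the following code creates a discriminator with the "hbos" method and passes it as the first input argument. It reuses the example network and data from this page purely for convenience (that network does have a softmax output) and relies on the default HBOS options, so treat it as an outline rather than a tuned setup.

```matlab
% Sketch only: requires Deep Learning Toolbox and the example files used on this page.
load("digitsClassificationMLPNetwork.mat")   % provides the variable net

X = digitTrain4DArrayData;
X = dlarray(X,"SSCB");

% Create an HBOS discriminator from ID data only (no OOD data supplied).
discriminator = networkDistributionDiscriminator(net,X,[],"hbos");

% Use the discriminator, not net, as the first input argument.
tf = isInNetworkDistribution(discriminator,X);
oodProportion = sum(1-tf)/numel(tf)
```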

    Input data, specified as a formatted dlarray or a minibatchqueue object that returns a formatted dlarray. For more information about dlarray formats, see the fmt input argument of dlarray.

    Use a minibatchqueue object for a network with multiple inputs where the data does not fit in memory. If you have data that fits in memory and does not require additional processing, then it is usually easiest to specify the input data as in-memory arrays. For more information, see X1,...,XN.

    In-memory data for a multi-input network, specified as dlarray objects. The input Xi corresponds to the network input net.InputNames(i) if net is the first input argument, or discriminator.Network.InputNames(i) if discriminator is the first input argument.

    For multi-input networks, if you have data that fits in memory and does not require additional processing, then it is usually easiest to specify the input data as in-memory arrays. If you want to classify data stored on disk as ID or OOD, then specify X as a minibatchqueue object.
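
The following sketch shows one way to pass the input data as a minibatchqueue object. It uses the single-input example network from this page and an in-memory array wrapped in an arrayDatastore only to keep the example short. For data on disk, you would substitute a datastore that reads from files. The mini-batch size and the anonymous MiniBatchFcn are illustrative choices, not requirements of the function.

```matlab
% Sketch only: requires Deep Learning Toolbox and the example files used on this page.
load("digitsClassificationMLPNetwork.mat")   % provides the variable net

% Read one image per call along the fourth (observation) dimension.
XData = digitTrain4DArrayData;
ds = arrayDatastore(XData,IterationDimension=4);

% The mini-batch size of 128 is an arbitrary illustrative choice.
% The custom MiniBatchFcn stacks the images along the fourth dimension
% so that the "SSCB" format labels line up with the data.
mbq = minibatchqueue(ds, ...
    MiniBatchSize=128, ...
    MiniBatchFcn=@(images) cat(4,images{:}), ...
    MiniBatchFormat="SSCB");

% "detailed" prints mini-batch progress when the input is a minibatchqueue.
tf = isInNetworkDistribution(net,mbq,VerbosityLevel="detailed");
```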

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

    Example: Threshold=0.9,VerbosityLevel="off"

    Distribution threshold, specified as a scalar in the range [0, 1]. The software uses this value to separate the ID and OOD data.

    Dependency

    You can only specify this input when the first argument is net. If the first argument is discriminator, then the software uses the threshold stored in the Threshold property of discriminator. For more information, see networkDistributionDiscriminator.
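
To illustrate the dependency, the sketch below reads the threshold that a discriminator found and then passes that value explicitly through the Threshold argument when net is the first input. The choice of the "baseline" method and the true positive goal of 0.975 are illustrative, and the variable name thr is hypothetical.

```matlab
% Sketch only: requires Deep Learning Toolbox and the example files used on this page.
load("digitsClassificationMLPNetwork.mat")   % provides the variable net

X = digitTrain4DArrayData;
X = dlarray(X,"SSCB");

% The discriminator stores the threshold it finds in its Threshold property.
discriminator = networkDistributionDiscriminator(net,X,[],"baseline", ...
    TruePositiveGoal=0.975);
thr = discriminator.Threshold;

% With discriminator as the first argument, the stored threshold is used.
tfDiscriminator = isInNetworkDistribution(discriminator,X);

% With net as the first argument, supply the threshold explicitly.
tfNet = isInNetworkDistribution(net,X,Threshold=thr);
```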

    Verbosity level of the Command Window output, specified as one of these values:

    • "off" — Do not display progress information.

    • "summary" — Display a summary of the progress information.

    • "detailed" — Display detailed information about the progress. This option prints the mini-batch progress. If you do not specify the input data as a minibatchqueue object, then the "detailed" and "summary" options print the same information.

    More About

    References

    [1] Shalev, Gal, Gabi Shalev, and Joseph Keshet. “A Baseline for Detecting Out-of-Distribution Examples in Image Captioning.” In Proceedings of the 30th ACM International Conference on Multimedia, 4175–84. Lisboa Portugal: ACM, 2022. https://doi.org/10.1145/3503161.3548340.

    [2] Shiyu Liang, Yixuan Li, and R. Srikant, “Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks” arXiv:1706.02690 [cs.LG], August 30, 2020, http://arxiv.org/abs/1706.02690.

    [3] Weitang Liu, Xiaoyun Wang, John D. Owens, and Yixuan Li, “Energy-based Out-of-distribution Detection” arXiv:2010.03759 [cs.LG], April 26, 2021, http://arxiv.org/abs/2010.03759.

    [4] Markus Goldstein and Andreas Dengel. "Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm." KI-2012: poster and demo track 9 (2012).

    [5] Jingkang Yang, Kaiyang Zhou, Yixuan Li, and Ziwei Liu, “Generalized Out-of-Distribution Detection: A Survey” August 3, 2022, http://arxiv.org/abs/2110.11334.

    [6] Lee, Kimin, Kibok Lee, Honglak Lee, and Jinwoo Shin. “A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks.” arXiv, October 27, 2018. http://arxiv.org/abs/1807.03888.

    Extended Capabilities

    Version History

    Introduced in R2023a
