decode

Convert token codes to tokens

Since R2023b

    Description

    str = decode(tokenizer,tokenCodes) decodes the specified token codes using the tokenizer object tokenizer.

    Examples

    Load a pretrained BERT-Base neural network and corresponding tokenizer using the bert function.

    [net,tokenizer] = bert;

    View the tokenizer.

    tokenizer
    tokenizer = 
      bertTokenizer with properties:
    
            IgnoreCase: 1
          StripAccents: 1
          PaddingToken: "[PAD]"
           PaddingCode: 1
            StartToken: "[CLS]"
             StartCode: 102
          UnknownToken: "[UNK]"
           UnknownCode: 101
        SeparatorToken: "[SEP]"
         SeparatorCode: 103
           ContextSize: 512
    
    

    Decode an array of token codes using the decode function.

    tokenCodes = [102 7227 7443 7543 2390 4373 16045 2100 15067 2014 19082 103];
    str = decode(tokenizer,tokenCodes)
    str = 
    "[CLS] bidirectional encoder representations from transformers [SEP]"
    
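    As a complementary check, the following minimal sketch round-trips a string through the tokenizer. It assumes that the encode function of the tokenizer returns a cell array with one vector of token codes per input string; the variable names inputStr and decoded are illustrative only.

```matlab
% Load the pretrained BERT-Base network and its tokenizer.
[net,tokenizer] = bert;

% Encode a string. Assumption: encode returns a cell array that holds
% one vector of token codes for each input string.
inputStr = "Bidirectional Encoder Representations from Transformers";
tokenCodes = encode(tokenizer,inputStr);

% Decode the codes of the first (and only) input string.
decoded = decode(tokenizer,tokenCodes{1})
```

    Because the tokenizer shown above has IgnoreCase set to 1, the decoded text is not expected to match the original string character for character. Differences in letter case, and any start and separator tokens added during encoding, are likely.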

    Input Arguments

    tokenizer: Tokenizer, specified as a bertTokenizer or bpeTokenizer object.

    tokenCodes: Token codes, specified as a vector of positive integers.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
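
    The input requirements above can be checked before calling decode. The following is a minimal sketch that uses the standard MATLAB validation functions mustBeVector, mustBeInteger, and mustBePositive; the explicit checks and the int32 cast are illustrative additions, not something decode requires.

```matlab
% Load the pretrained BERT-Base network and its tokenizer.
[net,tokenizer] = bert;

% Token codes stored in one of the listed integer types.
tokenCodes = int32([102 7227 7443 7543 2390 4373 16045 2100 15067 2014 19082 103]);

% Each validation function throws an error if its condition is not met.
mustBeVector(tokenCodes)
mustBeInteger(tokenCodes)
mustBePositive(tokenCodes)

str = decode(tokenizer,tokenCodes)
```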

    Output Arguments

    str: Decoded tokens, returned as a string array.
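
    To illustrate what decoding means conceptually, the following sketch maps codes to tokens with a small hypothetical vocabulary and plain MATLAB string functions. It is not the implementation of decode. The vocabulary, the 1-based code convention, and the merging of "##" continuation pieces (the WordPiece convention used by BERT vocabularies) are assumptions made for illustration.

```matlab
% Hypothetical toy vocabulary. Code k refers to the kth entry.
vocab = ["[PAD]" "[UNK]" "[CLS]" "[SEP]" "token" "##izer" "##s"];

% Token codes to decode.
codes = [3 5 6 7 4];

% Look up each code, then join the pieces with spaces.
pieces = vocab(codes);
joined = join(pieces," ");

% Merge continuation pieces (marked with "##") into the preceding piece.
str = replace(joined," ##","")
% str = "[CLS] tokenizers [SEP]"
```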

    Algorithms

    References

    [1] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." Preprint, submitted May 24, 2019. https://doi.org/10.48550/arXiv.1810.04805.

    [2] Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun et al. "Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation." Preprint, submitted October 8, 2016. https://doi.org/10.48550/arXiv.1609.08144.

    Version History

    Introduced in R2023b