encodeTokens
Syntax
Description
[tokenCodes,segments] = encodeTokens(tokenizer,tokens) encodes tokens using the specified tokenizer and returns the token codes and segments. This syntax automatically adds special tokens to the input.
[tokenCodes,segments] = encodeTokens(tokenizer,tokens1,tokens2) encodes the sentence pair tokens1,tokens2. This syntax automatically adds special tokens to the input.
[tokenCodes,segments,idx] = encodeTokens(___) also returns the mapping between the input and the encoded output.
___ = encodeTokens(___,AddSpecialTokens=tf) specifies whether to add special tokens to the input.
Examples
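The following is a minimal sketch of the four syntaxes described above, not a definitive example. It assumes that Text Analytics Toolbox and the support package for the pretrained bert model are installed. The example strings and the variable names (str1, str2, pairCodes, pairSegments, rawCodes) are illustrative and do not come from this page. The bert and wordTokenize functions are used only to obtain a tokenizer and word-level tokens.

```matlab
% Sketch under stated assumptions: requires Text Analytics Toolbox and the
% pretrained BERT support package. The strings below are made-up examples.
[~,tokenizer] = bert;

% Split the example text into words using the tokenizer.
str1 = "The sorter blows fuses at start up.";
str2 = "Replace the fuse and restart the machine.";
tokens1 = wordTokenize(tokenizer,str1);
tokens2 = wordTokenize(tokenizer,str2);

% Single input: returns token codes and segments. Special tokens are added.
[tokenCodes,segments] = encodeTokens(tokenizer,tokens1);

% Sentence pair: encodes tokens1 and tokens2 together. Special tokens are added.
[pairCodes,pairSegments] = encodeTokens(tokenizer,tokens1,tokens2);

% Third output: mapping between the input and the encoded output.
[tokenCodes,segments,idx] = encodeTokens(tokenizer,tokens1);

% Name-value argument: encode without adding special tokens.
rawCodes = encodeTokens(tokenizer,tokens1,AddSpecialTokens=false);
```

Comparing the codes returned with and without AddSpecialTokens=false is a quick way to see what the default syntaxes add to the input.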
Input Arguments
Output Arguments
Algorithms
References
[1] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." Preprint, submitted May 24, 2019. https://doi.org/10.48550/arXiv.1810.04805.
[2] Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun et al. "Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation." Preprint, submitted October 8, 2016. https://doi.org/10.48550/arXiv.1609.08144.
Version History
Introduced in R2023b