export BERT to MATLAB: Load pre-trained BERT models
- We have tested exporting models from both PyTorch and TensorFlow.
- Pre-trained models for a downstream task are supported. This comprises text classification (e.g. sentiment, multiclass, or multilabel models), token classification (e.g. for named entity recognition, NER), and question answering.
- Models with a different architecture than BERT (such as RoBERTa) are not supported.
- Only models that use the WordPiece tokenizer are currently supported (a quick check is sketched below).
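A quick way to check whether a model uses a WordPiece tokenizer, as a minimal sketch using Hugging Face's "transformers" library (the model name is the one from the export example further below):

    from transformers import AutoTokenizer

    # Load the tokenizer shipped with the model.
    tok = AutoTokenizer.from_pretrained("bert-base-german-cased")

    # WordPiece tokenizers split rare words into "##"-prefixed sub-tokens,
    # and BERT models report a BertTokenizer(Fast) class.
    print(tok.tokenize("Bundesfinanzministerium"))
    print(type(tok).__name__)  # e.g. BertTokenizerFast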
- Install Python (we have only tested Python 3.9.x-3.11.x)
- Create an environment using the "bert2matlab.yml" file provided in our "Python" folder. This installs PyTorch, TensorFlow, and Hugging Face's "transformers" library, which are needed to import the pre-trained Python models. GPU support is not necessary.
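- Assuming "bert2matlab.yml" is a conda environment file, the environment can typically be created with "conda env create -f bert2matlab.yml" and then activated with "conda activate" followed by the environment name defined inside the yml file.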
- A specific IDE is not necessary to export models; you can use the Python command line interface.
- For example, the following commands export a plain, pre-trained German BERT model from Hugging Face. The export call takes the Hugging Face model name, the type of model ("none", "text-classification", "token-classification" or "question-answering"), and the model format ("tf" or "pt"):
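A minimal sketch of such an export (the module and function names below are assumptions; see "MinimalExample.txt" in our "Python" folder for the authoritative syntax):

    # Hypothetical export call; the exact names are defined in the provided
    # Python folder (see "MinimalExample.txt").
    import bert2matlab  # assumed module name

    bert2matlab.export_model(
        "bert-base-german-cased",  # Hugging Face model name
        "none",                    # "none", "text-classification",
                                   # "token-classification" or "question-answering"
        "pt",                      # "pt" (PyTorch) or "tf" (TensorFlow)
    )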
- For the Python syntax to import a model from Hugging Face, see the included "MinimalExample.txt" file.
- It is also possible to export your own models. Simply provide the path to the model instead of a Hugging Face model name. Since the model is loaded internally via the Python Transformers function AutoModel.from_pretrained(), the same loading conditions apply as for that function. You can also specify a separate tokenizer, either by its Hugging Face name or by a path to your own tokenizer; however, the tokenizer must be compatible with the BERT model. This is only useful if, for example, the folder of your own BERT model contains no tokenizer, so that one has to be provided separately (see the sketch below).
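A sketch of exporting your own model with a separately specified tokenizer (same assumed function name as above; the "tokenizer" keyword is likewise an assumption):

    # The path is forwarded to AutoModel.from_pretrained(), so any directory
    # that function accepts will work here as well.
    bert2matlab.export_model(
        "/path/to/my_finetuned_bert",        # local path instead of a model name
        "text-classification",
        "pt",
        tokenizer="bert-base-german-cased",  # separate, BERT-compatible tokenizer
    )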
- Use the "readBertFromPython.m" function to load the exported model into MATLAB and apply it to your NLP tasks.
- The SBERT embeddings are mostly based on the MPNet-BERT model with relative positional encoding, which would require a slight modification of the MathWorks BERT implementation.
- Multilingual sentence embeddings rely on Byte-Pair Encoding (BPE), for which we do not have a compatible implementation.
Acknowledgements
Inspired by: Transformer Models
Version | Published | Release Notes
---|---|---
2.0.3 | | Update README
2.0.2 | | Update of the input structure of BERT heads for downstream tasks. It is now enough to pass the pooler weights (in the case of sequence classification) and the task-specific weights to the respective functions.
1.0.4 | | Updates all code to the new transformer version of MATLAB R2023b.
1.0.3 | | Changed license in files