How can I make a vision transformer model that receives multiple images as input?

Question

수민 안 on 26 Dec 2023


Edited: Debraj Maji on 27 Dec 2023

Is it possible to create and train a deep learning model in MATLAB that receives multiple images as input and produces one sequence as output?

For example, I would like to feed in 20 consecutive images and get back a sequence such as '11153'.

Thanks for reading


Answer 1

Debraj Maji on 27 Dec 2023


Edited: Debraj Maji on 27 Dec 2023

Hi @수민 안

I understand that you are trying to create a Vision Transformer (ViT) model that takes multiple images as input and generates a sequence.

Creating a Vision Transformer (ViT) model that receives multiple images as input in MATLAB involves adapting the ViT architecture to handle sequences of images. The original ViT architecture is designed for single-image classification tasks. To modify it for sequential multi-image input, you would treat each image as a token in the sequence and process these tokens in a way similar to how transformers process sequential data in natural language processing (NLP).

Here is a conceptual outline of how you could approach this:

Step 1: Preprocessing:

Resize all images to a fixed size.
Flatten each image into a 1D vector or use patches as tokens, as done in ViT.
Add positional encoding to retain the order of the images; without it, self-attention treats the tokens as an unordered set.
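
The preprocessing step above can be sketched as follows. This is a minimal, dependency-free illustration written in Python rather than MATLAB, not a Deep Learning Toolbox implementation. The function names, the tiny 2x2 frame size, and the choice of a sinusoidal encoding are assumptions made for the example. Each image becomes one token, and a position vector is added so the frame order survives.

```python
import math

def flatten_image(image):
    """Flatten a 2D image (list of rows) into a 1D token vector."""
    return [pixel for row in image for pixel in row]

def sinusoidal_position(pos, dim):
    """Standard sinusoidal positional encoding for a single position."""
    enc = []
    for i in range(dim):
        angle = pos / (10000 ** (2 * (i // 2) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

def images_to_tokens(images):
    """Turn a sequence of equally sized images into position-aware tokens."""
    tokens = []
    for pos, image in enumerate(images):
        flat = flatten_image(image)
        pe = sinusoidal_position(pos, len(flat))
        tokens.append([x + p for x, p in zip(flat, pe)])
    return tokens

# Hypothetical input: 20 consecutive 2x2 grayscale frames of constant intensity
frames = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(20)]
tokens = images_to_tokens(frames)
print(len(tokens), len(tokens[0]))  # 20 tokens, each of length 4
```

Although all 20 frames are identical here, the resulting tokens differ because each one carries its own position.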

Step 2: Transformer Encoder:

Use a series of transformer encoder layers to process the sequence of image tokens.
Each transformer encoder layer would include multi-head self-attention and feedforward neural networks.
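
To make the self-attention part of the encoder concrete, here is a small Python sketch of single-head scaled dot-product attention over a sequence of image tokens. It is a simplification under stated assumptions: queries, keys, and values are the tokens themselves (identity projections), whereas a real encoder layer learns projection matrices and adds multiple heads, residual connections, normalization, and a feedforward block. The names `softmax` and `self_attention` are illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption for brevity: queries, keys, and values are the tokens
    themselves; a trained layer would apply learned projections first.
    """
    dim = len(tokens[0])
    output = []
    for query in tokens:
        # Similarity of this token to every token in the sequence
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        # Weighted average of all tokens
        mixed = [sum(w * value[j] for w, value in zip(weights, tokens))
                 for j in range(dim)]
        output.append(mixed)
    return output

# Three hypothetical image tokens of dimension 2
image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(image_tokens)
print(len(attended), len(attended[0]))  # one output vector per input token
```

Each output vector is a weighted mix of every image token, which is how information from all 20 frames can reach every position.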

Step 3: Sequence Decoder:

After processing the images through the transformer encoder, you need to decode the output into a sequence.
You can use an RNN, an LSTM, or a transformer decoder to generate the output sequence.

Step 4: Output Layer:

The output layer produces the final sequence. It could be a series of classification heads, one for each position in the output sequence.
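
As an illustration of the "one classification head per position" idea, the Python sketch below turns per-position class scores into an output string. The scores are hand-written stand-ins for what a trained decoder would emit, and the helper names and the digit alphabet are assumptions for the example.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def decode_sequence(position_scores, alphabet="0123456789"):
    """Pick the highest-scoring symbol at each output position."""
    return "".join(alphabet[argmax(scores)] for scores in position_scores)

# Hypothetical scores for a 5-position output, 10 classes per position.
# A real model would produce these; here they are one-hot by construction.
target_digits = (1, 1, 1, 5, 3)
position_scores = [[1.0 if c == d else 0.0 for c in range(10)]
                   for d in target_digits]

print(decode_sequence(position_scores))  # prints 11153
```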

In MATLAB, you can use Deep Learning Toolbox to create custom layers and models. MATLAB currently provides a predefined ViT; however, this scenario would require you to implement the transformer layers manually. You can follow this documentation for steps on how to define custom deep learning layers:

https://www.mathworks.com/help/deeplearning/ug/define-custom-deep-learning-layer.html

For the Pretrained ViT available in MATLAB you can refer to the following documentation: https://www.mathworks.com/help/vision/ref/visiontransformer.html

For additional info on pre-defined Deep Learning Layers in MATLAB you can refer to the following link:

https://www.mathworks.com/help/deeplearning/ug/list-of-deep-learning-layers.html

I hope this resolves your query.

With regards,

Debraj.


Answer 2

Shubham on 27 Dec 2023


Hi 수민 안,

Yes, it is possible to create a deep learning model in MATLAB that takes multiple images as input and outputs a sequence of numbers. This can be done by using a Convolutional Neural Network (CNN) for image feature extraction combined with a Recurrent Neural Network (RNN) or Long Short-Term Memory (LSTM) network for sequence prediction.

Here's a high-level overview of how you might approach this:

1. Data Preparation:

Organize your images and corresponding sequence labels.
Preprocess the images (resizing, normalization, etc.).
Split the data into training, validation, and test sets.
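
A split along these lines could look like the following Python sketch. The 70/15/15 ratio, the fixed seed, and the sample layout (one entry per 20-image sequence paired with its label) are assumptions for illustration, not recommendations from this thread.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle once, then cut into training, validation, and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Hypothetical dataset: 100 image sequences, each with a label string
samples = [("sequence_%03d" % i, "11153") for i in range(100)]
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))  # 70 15 15
```

Splitting by whole sequences, rather than by individual images, keeps frames from the same sequence out of both the training and test sets.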

2. Model Architecture:

Use a CNN as the feature extractor for the images. You can use pre-trained networks like VGG, ResNet, or create your own.
Flatten the output of the CNN or use global pooling to reduce the dimensionality.
Feed the output into an RNN or LSTM layer(s) to handle the sequence prediction.
The final output layer should emit one prediction per position of the output sequence, with a softmax over the possible symbols (for example, the digits 0 to 9) if you are treating each position as a classification problem.
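
The data flow of this architecture can be sketched in a few lines of Python. This is a toy under stated assumptions: the feature maps are hand-written stand-ins for CNN output, a plain (Elman) RNN step is swapped in for the LSTM to keep the code short, and the weights are fixed rather than learned. None of the names correspond to MATLAB functions.

```python
import math

def global_average_pool(feature_maps):
    """Reduce each channel (a 2D map) to its mean: one vector per image."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_maps]

def rnn_step(x, h, w_in, w_rec):
    """One step of a plain RNN: h_new = tanh(W_in x + W_rec h)."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(w_in[j], x)) +
                      sum(wr * hi for wr, hi in zip(w_rec[j], h)))
            for j in range(len(h))]

# Hypothetical CNN output: 20 frames, each with two 2x2 feature maps
frame_feature_maps = [
    [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    for _ in range(20)
]

# Fixed illustrative weights; a trained network would learn these
w_in = [[0.1, 0.0], [0.0, 0.1]]
w_rec = [[0.5, 0.0], [0.0, 0.5]]

hidden = [0.0, 0.0]
for feature_maps in frame_feature_maps:
    pooled = global_average_pool(feature_maps)       # image -> feature vector
    hidden = rnn_step(pooled, hidden, w_in, w_rec)   # accumulate over frames

print(len(hidden))  # size of the summary state passed to the output layer
```

The final hidden state summarizes all 20 frames and would feed the per-position output layer.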

3. Training:

Compile the model with an appropriate loss function (e.g., categorical cross-entropy if you're treating the sequence prediction as a classification problem).
Train the model using the training data with validation data to monitor performance.
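
When each output position is treated as a classification problem, the sequence loss is commonly the cross-entropy averaged over positions. The Python sketch below shows that calculation; the function names and the uniform "untrained" predictions are assumptions for the example.

```python
import math

def cross_entropy(probabilities, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probabilities[target_index])

def sequence_loss(predicted, targets):
    """Average the per-position cross-entropy over the output sequence."""
    losses = [cross_entropy(p, t) for p, t in zip(predicted, targets)]
    return sum(losses) / len(losses)

# Hypothetical untrained model: uniform probabilities over 10 digits
predicted = [[0.1] * 10 for _ in range(5)]
targets = [1, 1, 1, 5, 3]  # the label '11153' as class indices

loss = sequence_loss(predicted, targets)
print(round(loss, 4))  # equals ln(10) for uniform predictions
```

A loss near ln(10) therefore indicates the model is doing no better than guessing among ten digits.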

4. Evaluation and Testing:

Evaluate the model's performance on the test set.
Adjust hyperparameters or model architecture as needed based on performance.

5. Prediction:

Use the trained model to predict sequences from new sets of images.

I hope this helps!
