Slow Training of RL Agent on HPC Compared to Local Machine
I am currently running a MATLAB R2021a script (execute.m, attached for reference) to train a reinforcement learning (RL) agent in Simulink to control a drone. On my local machine, training connects to 6 workers and runs much faster than on the HPC cluster, where it connects to 12 workers. I have ensured that the whole node, with 28 cores in total, is assigned to the job.
Here is the SLURM script:
#!/bin/bash -l
#SBATCH -J MATLAB_Execute # Job name
#SBATCH -N 1 # Number of nodes
#SBATCH -n 1 # Number of tasks (1 instance of the program)
#SBATCH -c 28 # Number of CPU cores per task
#SBATCH --gres=gpu:0 # Number of GPUs per node (none requested)
#SBATCH --time=1:00:00 # Time limit (1 hour)
#SBATCH -p batch -C skylake # Partition name (batch), restricted to Skylake nodes
export JAVA_LOG_DIR=/scratch/users/gshetty/java_logs
mkdir -p "$JAVA_LOG_DIR"
# Load the MATLAB module
module load math/MATLAB/2021a
module load openssl/1.1.1k
export LD_PRELOAD=/usr/lib64/libcrypto.so.1.1
# Run the MATLAB script
srun matlab -nodisplay -nosplash -r execute -logfile execute.out
What could be the reason?
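One thing worth checking is whether the worker count follows the SLURM allocation at all: 12 workers on a 28-core node suggests the pool size comes from a MATLAB setting rather than from the job (an assumption; nothing in the thread confirms it). The lines below are a minimal sketch that could go into the job script before the `srun` line. `SLURM_CPUS_PER_TASK` is set by SLURM when `-c` is given; `RL_NUM_WORKERS` and `resolve_worker_count` are hypothetical names introduced here for illustration.

```shell
#!/bin/bash
# Sketch: derive a worker count from the SLURM allocation instead of
# relying on whatever pool size MATLAB picks by default.

# Print the CPU count SLURM granted to this task; fall back to nproc
# when the variable is not set (e.g. when run outside a SLURM job).
resolve_worker_count() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        echo "${SLURM_CPUS_PER_TASK}"
    else
        nproc
    fi
}

# RL_NUM_WORKERS is a hypothetical variable name; MATLAB does not read it
# on its own, so execute.m would have to pick it up explicitly.
export RL_NUM_WORKERS="$(resolve_worker_count)"

# Log what the job actually sees, to compare against the local machine.
echo "Allocated CPUs: ${SLURM_CPUS_PER_TASK:-unset}, visible CPUs: $(nproc), workers requested: ${RL_NUM_WORKERS}"
```

Inside execute.m, the value could then be read with `str2double(getenv('RL_NUM_WORKERS'))` and passed to `parpool` before training starts, provided the cluster profile allows that many workers. Whether more workers actually speeds up this particular training is a separate question; the point of the sketch is only to make the allocation visible and the pool size explicit.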
4 comments
Harald
7 June 2024
Hi,
that's a big difference, indeed. If it takes hours on HPC, I am surprised that it finishes at all since you have specified a time limit.
If you get error messages, please copy the precise error message you get and the code that throws them. That makes it easier to investigate.
Assuming that we are speaking of run time and not any time that your job may be queued, waiting for resources to become available, I cannot imagine why it would take that long on HPC.
If no further ideas come up here, consider reaching out to Technical Support: https://www.mathworks.com/support/contact_us.html
Best wishes,
Harald
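The distinction between run time and queue time raised in the comment above can be checked from SLURM's accounting data. As a rough sketch, assuming GNU `date` is available, the helper below computes the seconds between two timestamps such as the `Submit` and `Start` values reported by `sacct`. The function name `seconds_between` and the example timestamps are made up for illustration.

```shell
#!/bin/bash
# Sketch: measure how long a job waited in the queue versus how long it ran.

# Print the number of seconds between two ISO-8601 timestamps.
# Relies on GNU date's -d option; both timestamps are interpreted as UTC
# so that the local time zone does not affect the difference.
seconds_between() {
    local from to
    from=$(date -u -d "$1" +%s) || return 1
    to=$(date -u -d "$2" +%s) || return 1
    echo $(( to - from ))
}

# Example with made-up Submit and Start timestamps (a wait of 5 min 30 s).
seconds_between "2024-06-07T10:00:00" "2024-06-07T10:05:30"   # prints 330

# On the cluster, the real timestamps would come from the accounting records:
#   sacct -j <jobid> -X --format=JobID,Submit,Start,End,Elapsed
```

If `Elapsed` is close to the wall-clock time observed, the slowdown is in the training itself and not in the queue.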
Answers (0)