Cluster error: Opening log file

Question

Ceren GURKAN on 25 Feb 2013

https://fr.mathworks.com/matlabcentral/answers/64871-cluster-error-opening-log-file

Hi everybody,

I am running MATLAB code on the university cluster. It is basically a for loop that submits a job to the cluster, waits 2.5 hours for the results to be generated, and then moves on to the next iteration. However, say it completes generation 8, then after 2.5 hours it starts generation 9 and completes that as well, but at the point where it is supposed to move on to generation 10, this message appears on the screen: "Opening log file: /eng/cvcluster/eggurkanc/java.log.3643", and it does not move on to the 10th generation. I have no idea how to cope with this; any help will be appreciated.

Thanks in advance.

Answer 1

Jason Ross on 25 Feb 2013

https://fr.mathworks.com/matlabcentral/answers/64871-cluster-error-opening-log-file#answer_76455

Edited: Jason Ross on 25 Feb 2013

Are you out of disk space? Have you exceeded a disk quota? Looks like you aren't in a normal "home" directory, so there may be more restrictive limits on the cluster.
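As a rough illustration of the disk-space check, here is a minimal Python sketch (the thread contains no code of its own, so the language and the helper names `free_space_report` and `has_room` are my assumptions). It reports filesystem-level free space only; it will generally not reveal a per-user quota, which needs a site-specific tool or a word with the admins.

```python
import shutil


def free_space_report(path):
    """Return (total, used, free) in bytes for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.total, usage.used, usage.free


def has_room(path, needed_bytes):
    """True if at least `needed_bytes` are free on the filesystem holding `path`."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    # Hypothetical path: substitute the directory your job writes its logs to.
    log_dir = "."
    total, used, free = free_space_report(log_dir)
    print(f"total={total} used={used} free={free}")
```

Running a check like this just before each generation is submitted would show whether free space is shrinking toward zero as the run progresses.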

Does the queue you are submitting to have restrictions on job time or hours of the day it runs? You might need to check with the admins.

Are you getting pre-empted by some other job that jumps the queue?

Are there any emails from the cluster about your job?

If you check the job status, what does it show? (The exact command depends on the scheduler you are using, but it might be something like qstat.)

4 comments (2 older comments not shown)

Ceren GURKAN on 26 Feb 2013

I am not sure whether I understand you completely, so first of all, sorry for that :( What I can say is that I am running only this specific code and nothing else. So I am not sure whether I could be using the log file simultaneously, and if I am, how can I detect that and prevent it from happening?

Jason Ross on 26 Feb 2013

One of the common problems on clustered systems is that something you test or prototype as a single execution works fine, but the resource it uses becomes a shared resource when you run it on a cluster. Since you can now have multiple threads of execution acting on the same resource, this can become a problem. For example, the following will work fine with one process:

cd to /cluster/shared/filesystem
open a file named "myresults"
write to "myresults"
close "myresults" when done.

Then you submit this to a cluster and problems start. When you had one process working on that file, everything was OK. Now you have n processes trying to write to the file simultaneously. You end up with, at best, a jumbled mess of output; at worst, the processes deadlock and the run stalls.

To get out of this, there are many solutions. One is to use the PID to try to make the log file name unique (which it looks like is already being tried, but you can still get a clash). You can also use random numbers, the machine name, etc. to make the file names even less likely to collide, and then concatenate the files at the end of your run.
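The unique-name idea could be sketched like this in Python. This is only an illustration under my own assumptions: the function names, the "myresults" prefix borrowed from the example above, and the merged file name are all hypothetical, and nothing here is what the cluster software itself does.

```python
import os
import socket
import uuid
from pathlib import Path


def unique_log_path(directory, prefix="myresults"):
    """Build a per-process file name from host name, PID and a random suffix."""
    host = socket.gethostname()
    pid = os.getpid()
    # The random part covers the case where two nodes hand out the same PID.
    suffix = uuid.uuid4().hex[:8]
    return Path(directory) / f"{prefix}.{host}.{pid}.{suffix}.log"


def merge_logs(directory, prefix="myresults", merged_name="myresults.all"):
    """Concatenate every per-process file into one file, in sorted name order."""
    directory = Path(directory)
    merged = directory / merged_name
    # The merged name does not end in ".log", so it is never picked up here.
    parts = sorted(directory.glob(f"{prefix}.*.log"))
    with merged.open("w") as out:
        for part in parts:
            out.write(part.read_text())
    return merged
```

Each worker would call `unique_log_path` once at start-up and write only to its own file; a single process calls `merge_logs` after all workers have finished, so no two processes ever write to the same file.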

This is a pretty simple example, but I'd inspect and further instrument the code to see where it's getting to and what is stopping the execution.
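One possible way to instrument a generation loop, again as a hedged Python sketch: `mark`, `run_generations` and the `step` callback are hypothetical names standing in for whatever the real code does to submit a job and wait for it. Each marker is flushed to disk immediately, so the last line in the file shows how far the run got before it stopped.

```python
import os
import time


def mark(progress_file, message):
    """Append a timestamped line and force it to disk so it survives a hang or kill."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(progress_file, "a") as fh:
        fh.write(f"{stamp} pid={os.getpid()} {message}\n")
        fh.flush()
        os.fsync(fh.fileno())


def run_generations(n_generations, progress_file, step):
    """Call `step(g)` for each generation, recording a marker before and after."""
    for g in range(1, n_generations + 1):
        mark(progress_file, f"generation {g}: start")
        step(g)  # assumption: stands in for "submit job, wait, collect results"
        mark(progress_file, f"generation {g}: done")
```

If the file ends with "generation 9: done" and nothing after it, the stall happens between iterations; if it ends with "generation 10: start", the stall is inside the submit-and-wait step.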
