Speeding up code: pre-allocation, vectorization, parfor, spmd....

Question

0 votes

I have written a very tricky and large piece of code. It processes a data set of 5 million values. In outline, the code goes like this.

1. Outer parfor loop (1: 500K).

2. Next loop (1: ~100)

3. Test lots of conditions

4. Inner for loop (1:K). Assign values to a growing cell array, where K is its current length. Each cell contains a struct, which in turn contains cell arrays (it's a really high-dimensional data set!).

The problem is that it currently takes 6 seconds to carry out one run of the outer loop. I need to dramatically speed it up. (24 hours run time would be ok. 24 mins would be better :)).

I have used the profiler extensively, and other than a warning telling me to pre-allocate, it all looks OK.

I also use http://research.microsoft.com/en-us/um/people/minka/software/lightspeed/

I have read lots of material such as http://www.ee.columbia.edu/~marios/matlab/Writing_Fast_MATLAB_Code.pdf

Stuff I am NOT currently doing:

1. pre-allocate the cell object. The reason is that I would need to search through the object to find which values are active or not each time I accessed the object. I assume this would take more time than would be saved by pre-allocation.

2. Vectorization. This is what I normally use when possible. However, this is such a complex bit of code, with many loops inside loops, that I don't even know where to start. Any hints?

3. I have the parallel toolbox, though only one 64 bit machine with enough RAM to load the dataset. Should I be using this? I have not done so before.

I am using Win7 on a quad-core machine with 16 GB of RAM.

Any sensible comments welcome. Thank you.

2 comments

Jan on 7 Feb 2013

An overly generous pre-allocation is usually much cheaper than letting an array grow repeatedly. Either allocate too many elements initially, or let the array grow in blocks of 1000 elements on demand to reduce the effect.
Try to vectorize the innermost loop only. But in most cases I've seen here in the forum, eliminating repeated calculations has been a great advantage already.
Find the bottlenecks and use PARFOR to use all cores of your machine.
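
Editor's sketch of the "grow by 1000 elements on demand" idea. The workload (2500 appended values) and the variable names are hypothetical; the point is only that the buffer is reallocated once per block rather than once per element.

```matlab
% Grow a buffer in blocks instead of one element at a time.
blockSize = 1000;                 % growth step, as suggested above
buf       = zeros(1, blockSize);  % initial over-allocation
n         = 0;                    % number of slots actually in use

for k = 1:2500                    % hypothetical workload
    if n == numel(buf)
        buf(end + blockSize) = 0; % extend by one block (one reallocation)
    end
    n      = n + 1;
    buf(n) = k^2;                 % placeholder for the real value
end

buf = buf(1:n);                   % trim the unused tail once, at the end
```

With these numbers the buffer is extended only twice, compared with 2500 times for element-by-element growth.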

Matlab2010 on 8 Feb 2013

quick update.

The loop time has decreased from 6s to 0.05s by pre-allocating the complex cell array/structure object.

thank you for your comments so far. I will update in much more detail next week.

Answer 1

Jonathan Sullivan on 6 Feb 2013

1 vote

It's really hard to say what exactly you can do without knowing the nature of what is inside those for-loops.

Some ideas come to mind:

Eliminate redundant calculations
Vectorize any operations that you can
Avoid using things like repmat and use things like bsxfun, where possible
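
Editor's sketch of the repmat versus bsxfun point, using made-up data (a 4-by-4 magic square with its column means subtracted). Both lines compute the same result; the repmat version builds an explicit full-size copy of the row vector first.

```matlab
A  = magic(4);          % example data, 4-by-4
mu = mean(A, 1);        % 1-by-4 row vector of column means

% repmat: materializes a 4-by-4 temporary copy of mu before subtracting
B1 = A - repmat(mu, size(A, 1), 1);

% bsxfun: expands mu implicitly, without the explicit temporary
B2 = bsxfun(@minus, A, mu);

sameResult = isequal(B1, B2);   % both approaches give identical output
```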

But post your code. It's really hard to say what's applicable without seeing it.

1 comment

Jan on 6 Feb 2013

+1 for "Eliminate redundant calculations". However, "Vectorize any operations" can be counterproductive when the required temporary arrays are more expensive than the overhead of running loops.
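
Editor's sketch of what eliminating a redundant calculation can look like. The data and the expression are hypothetical; the pattern is simply to move a loop-invariant computation out of the loop.

```matlab
x  = linspace(0, 1, 1000);   % hypothetical data that never changes in the loop
n  = 200;
y1 = zeros(1, n);
y2 = zeros(1, n);

% Redundant: norm(x) is recomputed in every iteration
for k = 1:n
    y1(k) = k / norm(x);
end

% Hoisted: the invariant is computed once, outside the loop
nx = norm(x);
for k = 1:n
    y2(k) = k / nx;
end
```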

Answer 2

Jason Ross on 6 Feb 2013

Edited: Jason Ross on 6 Feb 2013

1 vote

Check using the Resource Monitor to see if you are swapping to disk while you are running. Given that you say you have enough RAM to load the data set in memory, I'm wondering if you also have enough RAM to deal with everything else that's going on. If that's the case, get more RAM.

It might also be useful to look at CPU utilization to see if the cores are pegged. When you say you have four cores, are those physical compute cores or "hyper-threaded" ones?

As for the parallel toolbox, if you only have one machine available, and you are taking up all the RAM with your existing program, then parallelization isn't likely to generate a speedup. But if you look at your RAM utilization and CPU utilization and could fit multiple copies of your program on the machine, it might help -- but if you can't, you'll likely end up going slower.
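
Editor's sketch of the basic parfor pattern, assuming the iterations are independent and that the data each worker needs fits in RAM. The loop body is a placeholder, not the asker's computation.

```matlab
n   = 8;
out = zeros(1, n);              % pre-allocated, sliced output variable

parfor i = 1:n
    % Each iteration depends only on i, so iterations can run in any order
    out(i) = sum((1:i).^2);     % placeholder for the real per-iteration work
end
```

Each worker holds its own copy of any broadcast data, which is why memory use can grow with the number of workers.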

If you are doing a lot of disk I/O, an SSD will trump a "traditional" hard drive for performance and access time.

But it's likely that the above should be considered when you know your program is as tight as it can be. Sometimes throwing money at a problem works, but not always.

Answer 3

Dan K on 6 Feb 2013

Edited: Dan K on 6 Feb 2013

0 votes

Make sure that it is a function and not a script... Huge speed difference there. When you pre-allocate, make sure you're allocating enough for the eventual usage. I've seen cases where one pre-allocates, fills it up, then adds a little more to the end.
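
Editor's sketch of pre-allocating a cell array of structs while tracking how many slots are in use, so that no search for active entries is needed. The field names (`id`, `vals`), the upper bound of 500, and the loop are all hypothetical.

```matlab
maxItems = 500;                               % assumed upper bound on entries
template = struct('id', 0, 'vals', {{}});     % hypothetical struct layout
items    = repmat({template}, 1, maxItems);   % pre-allocated cell array
nActive  = 0;                                 % slots 1..nActive are in use

for k = 1:120                                 % hypothetical workload
    nActive = nActive + 1;
    items{nActive}.id   = k;
    items{nActive}.vals = {k, 2*k};
end

active = items(1:nActive);                    % active entries, no search needed
```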

A few more thoughts...

If you are making many calls to a simple subroutine consider putting it inline, rather than calling it over and over.

If there is a particular computation that is really the bottleneck, you could consider mex-ing it.

hope it helps.

Dan

Answer 4

Jan on 6 Feb 2013

0 votes

Concentrate on optimizing the bottlenecks only. If you spend hours improving code that accounts for 2% of the total processing time, even a speed-up by a factor of 1000 makes the program at most about 2% faster. So use the profiler, or better, some TIC/TOCs, to locate the bottlenecks first.

Unfortunately, the profiler disables the JIT acceleration, because the JIT can change the processing order of lines, while the profiler must measure the lines in their original order.
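
Editor's note: the 2% figure can be checked with a few lines of arithmetic. The variable names are illustrative only.

```matlab
p = 0.02;     % fraction of total run time spent in the optimized code
s = 1000;     % speed-up factor achieved for that fraction

newTime = (1 - p) + p / s;   % new run time relative to the original
saving  = 1 - newTime;       % fraction of total run time saved

fprintf('Run time saved: %.3f%%\n', 100 * saving);
```

The saving comes out just under 2%, however large the speed-up factor is made.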

2 comments

Dan K on 7 Feb 2013

Really? Thank you Jan, that is something I didn't know! Do you know any way to get an accurate measure of the impact of JIT on code?

Jan on 7 Feb 2013

Edited: Jan on 7 Feb 2013

feature('JIT', 'off')
feature('accel', 'off')

Then TIC/TOC the function again. Re-enabling is straightforward.
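
Editor's sketch of such a comparison. It relies on the undocumented `feature` switch quoted above, and it assumes that `feature('accel', 'on')` restores the default; both may behave differently, or have no effect, depending on the MATLAB release. The loop is a hypothetical stand-in for the function under test.

```matlab
n = 1e5;

feature('accel', 'off');          % undocumented switch, as quoted above
tic;
sOff = 0;
for k = 1:n
    sOff = sOff + sqrt(k);        % stand-in for the code being measured
end
tOff = toc;

feature('accel', 'on');           % assumption: 'on' re-enables acceleration
tic;
sOn = 0;
for k = 1:n
    sOn = sOn + sqrt(k);
end
tOn = toc;

fprintf('accel off: %.4f s, accel on: %.4f s\n', tOff, tOn);
```

A single TIC/TOC pair is noisy, so repeating each measurement several times gives a more reliable comparison.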

It should be explained at a very prominent location, that the tool to measure the performance influences the efficiency of the code execution substantially. This is a massive design error of the profiler.

Answer 5

Sean de Wolski on 7 Feb 2013

0 votes

I'll also throw in the use of MATLAB Coder to generate MEX files from various pieces of code. Although the mileage will vary, you can sometimes see a pretty good speed-up with the MEXed version.
