Code is running very slowly; how can I make it faster?
Hello,
Please, I am having trouble getting this code to run faster. I am using it for my final-year dissertation, and I would really appreciate your help in optimizing it.
The code computes the Zero Normalized Cross Correlation (ZNCC), which I want to use for template matching. I have attached all the files, including a screenshot of my profiling of the code, which shows that it is very slow. I have also attached the workspace needed. Everything is in the link above.
Other things you need to know for running the code are in the "Recommended Instruction for Executing Code.txt" file.
I will really appreciate it if you can help me.
Thanks a lot
4 comments
Mario Malic
on 24 Jul 2020
Recommended Instruction for Executing Code.txt doesn't exist.
Fego Etese
on 24 Jul 2020
Mario Malic
on 25 Jul 2020
Edited: Mario Malic
on 29 Jul 2020
Is double necessary?
% convert to single
target = single(matchImg);
template = single(enrollTemplateImage);
Changing both to single gives some improvement:
>> tic, posZNCC = znccPrf(enrollTemplateImage); toc % with double, .png image
Elapsed time is 1.675404 seconds.
>> tic, posZNCC = znccPrf(enrollTemplateImage); toc % with single, .png image
Elapsed time is 1.418316 seconds.
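One reason single can help is that it halves the amount of data the loops move around. A minimal sketch on synthetic data (a random uint8 array standing in for matchImg, not the actual fingerprint image) of what the conversion changes: memory use halves, pixel values stay exact, and only accumulated sums pick up rounding error.

```matlab
% Sketch: memory and rounding effect of single vs double for image data.
% A random uint8 array stands in for matchImg (synthetic data).
rng(0);                                         % reproducible random numbers
img  = uint8(randi([0 255], 374, 388));         % 374x388, the matchImg size quoted in this thread
imgD = double(img);
imgS = single(img);

infoD = whos('imgD');
infoS = whos('imgS');
fprintf('double: %d bytes, single: %d bytes\n', infoD.bytes, infoS.bytes);

% Integers 0..255 are stored exactly in both classes; rounding differences
% only show up in derived quantities such as long sums of products.
sumD   = sum(imgD(:) .^ 2);
sumS   = sum(imgS(:) .^ 2);
relErr = abs(double(sumS) - sumD) / sumD;
fprintf('relative error of the sum of squares in single: %g\n', relErr);
```

The exact relative error depends on the summation order MATLAB uses, but for ZNCC scores it is typically far below the precision shown by format short.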
Also, you run your code in a OneDrive folder; the .mlx file may generate files while running, and the OneDrive sync may cause problems.
Also, consider the difference in the outputs of matchImg when you supply different images. Finger1A will not generate the same output as Finger1E even though they are in the same format.
Fego Etese
on 25 Jul 2020
Answers (1)
per isakson
on 25 Jul 2020
Edited: per isakson
on 7 Aug 2020
Caveat: I've never seriously used the Live Editor.
I've taken the following steps:
- uploaded your files to a new folder, which I made the current folder
- read "Recommended Instruction for Executing Code.txt" file.
- loaded workspace.mat
- converted znccPrf.mlx to znccPrf.m (an old time m-file)
- changed imread('finger1E.tif'); to imread('finger1E.png'); since there was no tif-file in the upload.
- profiled posZNCC = znccPrf(enrollTemplateImage);. The statement meanRef=mean(mean(ref)); dominated the profile, together with the function's "self time".
- replaced mean(mean(ref)); by mean(ref,'all');. That helped a bit. And sum(reshape(ref,1,[]))/numel(ref); is still a bit faster.
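Step 7 swaps one way of averaging all elements of a matrix for another. A sketch comparing the variants mentioned in this thread on random single data (the 150x120 window size is made up): all of them give the same value up to rounding, and only their speed differs. mean(ref,'all') is left out because older releases do not support it.

```matlab
% Sketch: equivalent ways to average all elements of a matrix.
% ref is random stand-in data; the window size is hypothetical.
ref = single(rand(150, 120));

m1 = mean(mean(ref));                        % original: column means, then their mean
m2 = mean(ref(:));                           % works in every release
m3 = sum(reshape(ref, 1, [])) / numel(ref);  % step 7 variant
m4 = sum(ref(:)) / numel(ref);               % variant used in znccPrf_poi
fprintf('%.7f  %.7f  %.7f  %.7f\n', m1, m2, m3, m4);

% Rough timing; numbers depend on hardware and MATLAB release.
N = 2000;
tic; for k = 1:N, m = mean(mean(ref)); end; t1 = toc;
tic; for k = 1:N, m = sum(ref(:)) / numel(ref); end; t2 = toc;
fprintf('mean(mean(ref)): %.3f s, sum(ref(:))/numel(ref): %.3f s\n', t1, t2);
```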
Finally, I ran
>> tic, posZNCC = znccPrf(enrollTemplateImage); toc
Elapsed time is 2.003947 seconds.
>> tic, posZNCC = znccPrf(enrollTemplateImage); toc
Elapsed time is 2.024163 seconds.
The code I ran differs from the code in your profiling screenshots, and the profiling results differ dramatically.
I use Matlab R2018b, Win10 and a fairly new desktop PC.
In response to comments
I've made a few more changes to your code and achieved close to a doubling of the speed compared to your function.
I use the uploaded png-file, finger1E.png, in both cases. (Is the code intended to process png or tif files?)
Furthermore, I use the lines
for y = 1:rTem % <<<<<<<<<<<<
for x = 1:cTem % <<<<<<<<<<<<
in both functions, since I believe that's the relevant case. Why do you use " = 1:2 " in some cases?
The script
%%
tic, posZNCC = znccPrf( enrollTemplateImage ); toc
tic, posZNCC_poi = znccPrf_poi( enrollTemplateImage, 'png' ); toc
posZNCC, posZNCC_poi
%%
t = bench()
outputs
>> fego
Elapsed time is 2.611501 seconds.
Elapsed time is 1.430623 seconds.
posZNCC =
209 103 0.76105
posZNCC_poi =
1×3 single row vector
209 103 0.76105
t =
0.081743 0.078711 0.01291 0.083046 1.2827 2.0448
The two functions return the same value of posZNCC, at least within the precision displayed by format short. The last line describes the performance of my Matlab+PC: the first four numbers are good, the last two are poor.
Measures to improve the speed
Use single instead of double. It introduces rounding errors, which I believe are acceptable.
% convert to single
target = single(matchImg); % <<<<<<<<<<<<
template = single(enrollTemplateImage); % <<<<<<<<<<<<
Split the calculation of the temporary variable, ref, into two steps. This should decrease the need for shuffling data.
for jj = 1 : (rTar - rTem + 1)
refjj = target( jj:(jj+rTem-1), : ); % <<<<<<<<<<<<
for ii = 1 : (cTar - cTem + 1)
ref = refjj( :, ii:(ii+cTem-1) ); % <<<<<<<<<<<<
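The two-step extraction only changes how the window is copied, not which pixels end up in ref. A small check on a 12x12 magic square with a made-up 4x5 template size: every window built from the horizontal strip equals the window obtained by direct 2-D indexing.

```matlab
% Sketch: two-step window extraction vs direct 2-D indexing.
% The target and template sizes are hypothetical.
target = single(magic(12));
rTem = 4; cTem = 5;
[rTar, cTar] = size(target);

nChecked = 0;
for jj = 1 : (rTar - rTem + 1)
    refjj = target(jj:(jj+rTem-1), :);            % one horizontal strip per row offset
    for ii = 1 : (cTar - cTem + 1)
        ref    = refjj(:, ii:(ii+cTem-1));        % window taken from the strip
        direct = target(jj:(jj+rTem-1), ii:(ii+cTem-1));
        assert(isequal(ref, direct));             % identical pixels
        nChecked = nChecked + 1;
    end
end
fprintf('%d windows checked, all identical\n', nChecked);
```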
Choose a more efficient way to calculate the mean of a matrix. In a reply to Walter's question I showed a comparison of six different ways to calculate the mean.
meanRef = sum(ref(:))/numelTem; % <<<<<<<<<<<<
Vectorize the two inner loops
tmT = template - meanTem; % <<<<<<<<<<<<
rmR = ref - meanRef; % <<<<<<<<<<<<
sum1 = sum( reshape( tmT.*rmR, [],1 ) ); % <<<<<<<<<<<<
sum2 = sum( reshape( tmT.*tmT, [],1 ) ); % <<<<<<<<<<<<
sum3 = sum( reshape( rmR.*rmR, [],1 ) ); % <<<<<<<<<<<<
ZNCC = sum1 / (sqrt(sum2) * sqrt(sum3)); % <<<<<<<<<<<<
That was a lot of work and it didn't even double the speed. (The two functions are attached.)
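For readers without the attachments, here is a sketch of a brute-force ZNCC search that combines the measures listed above: single precision, two-step window extraction, sum/numel for the mean, and vectorized sums. It is not the attached znccPrf_poi. One change goes beyond the thread: tmT and sum2 depend only on the template, so they are computed once before the loops. The data are synthetic: the template is cut out of a random target at a known position, so the search should return that position with a score of about 1.

```matlab
% Sketch: brute-force ZNCC template matching (synthetic data).
rng(1);
target   = single(randi([0 255], 60, 80));
rowTrue  = 21; colTrue = 34;                        % hypothetical true location
template = target(rowTrue:rowTrue+14, colTrue:colTrue+19);

[rTar, cTar] = size(target);
[rTem, cTem] = size(template);
numelTem = numel(template);

% Template terms do not depend on the window, so compute them once
% (hoisting these out of the loops is an extra change, not from the thread).
tmT  = template - sum(template(:)) / numelTem;
sum2 = sum(tmT(:) .* tmT(:));

bestScore = -Inf; bestRow = 0; bestCol = 0;
for jj = 1 : (rTar - rTem + 1)
    refjj = target(jj:(jj+rTem-1), :);              % step 1: horizontal strip
    for ii = 1 : (cTar - cTem + 1)
        ref  = refjj(:, ii:(ii+cTem-1));             % step 2: window from strip
        rmR  = ref - sum(ref(:)) / numelTem;         % zero-mean window
        sum1 = sum(tmT(:) .* rmR(:));
        sum3 = sum(rmR(:) .* rmR(:));
        zncc = sum1 / (sqrt(sum2) * sqrt(sum3));     % NaN for a flat window
        if zncc > bestScore                          % NaN never wins
            bestScore = zncc; bestRow = jj; bestCol = ii;
        end
    end
end
fprintf('best match at (%d, %d), ZNCC = %.5f\n', bestRow, bestCol, bestScore);
```

bestRow and bestCol are the top-left corner of the best window. A perfectly flat window gives 0/0 = NaN, which the > comparison simply skips.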
One more measure (2020-08-07)
The execution time of the script** increases faster than linearly with the size of the image, i.e. with the size of the variable matchImg in the code. The image finger1E.png has fairly large white areas to the left and right. Removing most of that white area decreases the execution time substantially without affecting the result.
I ran this little test:
>> pic = 'finger1E';
>> crop = false;
>> tic, [ posZNCC, P ] = znccPrf_poi_v2( enrollTemplateImage, pic, crop ); toc
Elapsed time is 1.413413 seconds.
>> crop = true;
>> tic, [ posZNCC, P ] = znccPrf_poi_v2( enrollTemplateImage, pic, crop ); toc
Elapsed time is 0.802015 seconds.
All of the measures described above are implemented in znccPrf_poi_v2. With crop==false the elapsed time, 1.41 s, is close enough to the 1.43 s reported above for znccPrf_poi. With crop==true the leftmost 90 and rightmost 58 columns of the 374x388 matchImg are removed by
matchImg = imread('finger1E.png');
if nargin==3 && crop
matchImg = matchImg( :, 91:330 );
end
**) Should be "function".
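Instead of hard-coding the column range 91:330, the white margins could be detected from the image itself. A minimal sketch under two assumptions not taken from the thread: a column counts as background when all its pixels are at least 250, and the template never needs to overlap those columns. The image is synthetic.

```matlab
% Sketch: crop away near-white columns on the left and right.
% Threshold 250 and the all-pixels rule are assumptions.
rng(2);
img = 255 * ones(40, 100, 'uint8');               % white synthetic background
img(:, 31:70) = uint8(randi([0 200], 40, 40));    % darker "finger" region

threshold = 250;
isContent = any(img < threshold, 1);              % true for columns with non-white pixels
firstCol  = find(isContent, 1, 'first');
lastCol   = find(isContent, 1, 'last');
cropped   = img(:, firstCol:lastCol);
fprintf('kept columns %d:%d (%d of %d)\n', firstCol, lastCol, ...
        size(cropped, 2), size(img, 2));
```

Cropping shifts the coordinates: a match found at column c in the cropped image lies at column c + firstCol - 1 in the original.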
35 comments
Walter Roberson
on 25 Jul 2020
How about mean(ref(:)) for timing?
per isakson
on 25 Jul 2020
Edited: per isakson
on 26 Jul 2020

(:) is a bit faster than reshape( ___,1,[]).
The overall differences are too large for my taste.
Fego Etese
on 25 Jul 2020
Fego Etese
on 28 Jul 2020
Mario Malic
on 28 Jul 2020
Edited: Mario Malic
on 28 Jul 2020
I would suggest trying this on a different machine, since there might be something wrong with your laptop.
What he did in step 7: he calculated the mean value of a matrix in other ways. I also ran it in the Live Editor, and the code finished in a similar time to his.
Also, importing an image with a significantly higher pixel count can result in a much longer run time.
Fego Etese
on 28 Jul 2020
Edited: Fego Etese
on 28 Jul 2020
Mario Malic
on 28 Jul 2020
If you consider that his computer spent 1.5 s on that line out of 2 s of total time, then the execution of these lines takes less than 0.5 s and is irrelevant.
As I said, if you were working with a .tiff image, it is not the same as with a .png, for the reasons I mentioned in my comment.
If you are working with different code or different files (which may explain the difference between our times and yours), it is hard for us to troubleshoot where the problem is.
Fego Etese
on 28 Jul 2020
Fego Etese
on 28 Jul 2020
Fego Etese
on 28 Jul 2020
Edited: Fego Etese
on 28 Jul 2020
Fego Etese
on 28 Jul 2020
Mario Malic
on 28 Jul 2020
Before I continue on my side, I suggest you upload your .mlx file here or to the OneDrive link, together with the same image you are using.
Fego Etese
on 29 Jul 2020
Edited: per isakson
on 29 Jul 2020
Fego Etese
on 29 Jul 2020
Edited: Fego Etese
on 29 Jul 2020
Walter Roberson
on 29 Jul 2020
Replace
mean(template, 'all')
with
mean(template(:))
If I recall correctly, the 'all' option was added in the release after the one you are using.
Fego Etese
on 2 Aug 2020
Fego Etese
on 2 Aug 2020
Fego Etese
on 3 Aug 2020
Mario Malic
on 3 Aug 2020
I have R2018a.
Finger1A and znccPrf_poi Elapsed time is 12.098890 seconds.
Finger1E and znccPrf_poi Elapsed time is 3.686021 seconds.
Even though the images look similar, one is grayscale and the other is truecolor.
If you uncomment the figure line, you'll see that for the truecolor image the code processes two more fingerprints, i.e. two extra calculations. Was that your intention?
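One way to make sure a truecolor file such as Finger1A is matched as a single grayscale plane is to convert it before the search. rgb2gray from the Image Processing Toolbox does this (per isakson uses it further down); the sketch below does the same weighted sum by hand with the coefficients 0.2989, 0.5870 and 0.1140 documented for rgb2gray, so it runs without the toolbox. The image is synthetic.

```matlab
% Sketch: reduce a truecolor (M-by-N-by-3) image to one grayscale plane
% before template matching; a 2-D image would pass through unchanged.
rng(3);
img = uint8(randi([0 255], 20, 30, 3));        % synthetic truecolor image

if ndims(img) == 3 && size(img, 3) == 3
    % Same luma weights as rgb2gray; uint8() rounds and saturates.
    img = uint8(0.2989 * double(img(:,:,1)) + 0.5870 * double(img(:,:,2)) ...
              + 0.1140 * double(img(:,:,3)));
end
fprintf('size after conversion: %s, class %s\n', mat2str(size(img)), class(img));
```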
Fego Etese
on 3 Aug 2020
per isakson
on 7 Aug 2020
Make your screenshot easier to read for my old eyes by placing the outputs below the code in the Live Editor. Use the icon in the upper right corner.
per isakson
on 7 Aug 2020
"Please I don't actually understand the step you took at no 7."
There are many ways in Matlab to calculate the average of all values of a matrix. I've tried a handful. The fastest is about three times as fast as the slowest, as I showed in my reply to a comment by Walter. Your code spends a large portion of its time calculating averages of matrices. Thus, I replaced mean(mean(ref)) (which is the slowest) with a faster way.
per isakson
on 7 Aug 2020
Edited: per isakson
on 7 Aug 2020
"[...] I still don't know why I can't get mine to be similar to yours"
We have to run exactly the same code with exactly the same image file to be able to compare execution times in a meaningful way, and measure time in the same way: tic/toc or profile.
I believe that the major part of the differences you report is due to differences in code and image file, and that only a minor part is due to our different hardware.
"[...] could it be that version 2018a is slower than others?" Mario Malic runs R2018a. I would be surprised if the difference between R2018a and R2018b explains more than a few percent.
It's confusing that the script named znccPrf_poi exists in many versions.
What exactly does "I'll change the files to png" and "I extracted the gray part" mean?
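To compare times in the same way on both machines, one option is to repeat the call a few times and report the median, which is less sensitive to first-call overhead and background activity such as OneDrive syncing. A sketch with a matrix product as a stand-in workload; in practice the timed line would be the znccPrf call. MATLAB's built-in timeit function follows a similar idea, returning the median of several measurements.

```matlab
% Sketch: repeat a measurement and report the median run time.
% The workload is a stand-in; replace it with e.g.
% posZNCC = znccPrf(enrollTemplateImage);
nRuns = 5;
times = zeros(1, nRuns);
A = rand(300);
for k = 1:nRuns
    t0 = tic;
    B = A * A;                     % stand-in workload
    times(k) = toc(t0);
end
fprintf('median of %d runs: %.4f s (min %.4f s)\n', ...
        nRuns, median(times), min(times));
```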
Fego Etese
on 7 Aug 2020
Fego Etese
on 7 Aug 2020
Edited: Fego Etese
on 7 Aug 2020
per isakson
on 7 Aug 2020
Also, see my addition to the answer.
Fego Etese
on 7 Aug 2020
per isakson
on 7 Aug 2020
I ran the following (see my answer regarding znccPrf_poi_v2):
>> profile on
>> [ posZNCC, P ] = znccPrf_poi_v2( enrollTemplateImage, 'finger1A', false );
>> profile viewer
where finger1A.png is handled by
case 'finger1A'
matchImg = imread('finger1A.png');
matchImg = rgb2gray( matchImg );
if nargin==3 && crop
matchImg = matchImg( :, 91:end );
end
Excerpt from the profiling result:

Your code runs more than three times as many iterations of the inner loop as mine. The reason is probably the value of cTar.
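A back-of-the-envelope check of the remark about cTar: the inner loop runs (rTar - rTem + 1) * (cTar - cTem + 1) times in total. The sketch uses the 374x388 image size from this thread and a made-up 100x100 template. It also shows one possible way cTar can grow by roughly a factor of three, related to Mario Malic's truecolor observation: when size is called with two outputs on an M-by-N-by-3 array, the second output is N*3. Whether this is what happens in Fego's code cannot be confirmed from the thread.

```matlab
% Sketch: inner-loop iteration count, grayscale vs truecolor storage.
% The template size is hypothetical.
rTem = 100; cTem = 100;

gray = zeros(374, 388, 'uint8');        % grayscale matchImg size from the thread
rgb  = zeros(374, 388, 3, 'uint8');     % same image stored as truecolor

[rTar, cTar] = size(gray);
nGray = (rTar - rTem + 1) * (cTar - cTem + 1);

% With two outputs, size() folds the trailing dimensions into the last
% output, so the column count becomes 388*3 = 1164 for the truecolor array.
[rTarRGB, cTarRGB] = size(rgb);
nRGB = (rTarRGB - rTem + 1) * (cTarRGB - cTem + 1);

fprintf('cTar: %d vs %d\n', cTar, cTarRGB);
fprintf('inner-loop iterations: %d vs %d (ratio %.2f)\n', nGray, nRGB, nRGB / nGray);
```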
per isakson
on 7 Aug 2020
Edited: per isakson
on 7 Aug 2020
I knew there was a good reason to "Caveat: I've never seriously used the Live Editor."
Fego Etese
on 7 Aug 2020
per isakson
on 7 Aug 2020
"Yes, I have read this before, how live scripts run slower"
How come you have not tested whether this affects your function, and how come you didn't provide this reference in your question or a comment? You let people work on your problem without providing this potentially important information. That's not fair!
Fego Etese
on 7 Aug 2020
Edited: Fego Etese
on 7 Aug 2020
per isakson
on 10 Aug 2020
Edited: per isakson
on 10 Aug 2020
"I said I read it before, but I haven't seen it actually happen"
The problem is that I can only know what you have actually written in this thread of comments. The thread has been going on for two weeks and contains thirty-two comments. I don't remember all the details, and I have no good idea why you see such poor performance.
You may very well have read "live scripts run slower" and compared the performance of mlx-functions and m-functions, but I cannot know that since you have not reported it here. Your justification "because of it's UI that is nice" sounds absurd to me when the focus should be on performance.
Please don't assume that I can deduce from the profiling screenshots whether they show results from mlx- or m-functions.
per isakson
on 10 Aug 2020
You need to approach the problem more systematically. Don't cut corners.
What do you know for sure?
What could be the reasons for the poor performance? There were some good hypotheses in a recent comment (now deleted) by Mario Malic. Make your own list and test one hypothesis at a time. Document the results.
znccPrf_poi_v2 differs from znccPrf_poi only regarding the cropping of white area. Thus, I think it's better that you concentrate on reproducing my result with znccPrf_poi.
per isakson
on 10 Aug 2020
Edited: per isakson
on 10 Aug 2020
I looked at your screenshots of the for ii = 1 : (cTar - cTem + 1) loop. I noticed that the number of iterations differs between the first screenshot and the rest. I find nothing in your July 28 comments that explains the difference.
- Fego Etese on 28 Jul 2020 shows 96672 iterations
- Fego Etese on 28 Jul 2020 shows 332576 iterations
- Fego Etese on 2 Aug 2020 shows 332576 iterations
- Fego Etese on 7 Aug 2020 shows 332576 iterations
The screenshot of my 7 Aug 2020 comment shows 96672 iterations.
P.S. The dates are the dates displayed here. Local time may add or subtract a day.