isprime function seems to have poor performance
13 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
Thomas
le 21 Août 2019
Modifié(e) : John D'Errico
le 23 Déc 2022
Why is MatLab's "isprime" function so much slower than Octave's "isprime" function?
I am using MatLab's "isprime" function to check whether a large number is a prime or not using the symbolic toolbox. I found that the performance of "isprime" in MatLab is much slower than in Octave. Why is this the case or what am I doing wrong with MatLab?
My tests with octave testing large Mersenne-primes produced the following runtimes for "isprime":
tested prime runtime in seconds
2^607-1, 0.15724
2^1279-1, 0.18309
2^2203-1, 0.41784
2^2281-1, 0.70215
2^3217-1, 1.7013
2^4253-1, 2.4854
2^4423-1, 2.2523
2^9689-1, 25.7571
2^9941-1, 25.761
2^11213-1, 38.3376
and with MatLab's "isprime":
tested prime runtime in seconds
2^607-1 31.930225
2^1279-1 547.414940
2^2203-1 5168.567632
2^2281-1 5578.169207
2^3217-1 461.535261
2^4253-1 739.918345
2^4423-1 3805.209681
2^9689-1 8954.457005
2^9941-1 10550.740359
2^11213-1 11865.530147
MatLab's documentation about "isprime" says, that 10 random tests based on the Miller-Rabin method are done. I believe, Octave only does 4 random tests (I suppose also Miller-Rabin, but I am not sure). However this does by far not explain the huge difference in runtime.
In both cases no parallelisation was used and the program ran on one thread on the CPU.
This is the MatLab code I used to run the test. The Octave version is basically the same...
function speedtrace_isprime();
% teste Dauer der Ausführung von "isprime" in Abhängigkeit von wachsenden Mersenne-Primzahlen 2^p-1
% zu testende p's:
p = [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497, 86243, 110503, 132049, 216091, 756839, 859433, ...
1257787, 2976221, 3021377, 6972593, 13466917, 20996011, 24036583, 25964951, 30402457, 32582657, ...
37156667, 42643801, 43112609, 57885161, 74207281, 77232917, 82589933];
speedtrace = fopen('speedtrace_isprime.txt', 'w'); %trace-file öffnen
fprintf(speedtrace, "%s %s \n", "Start: ", string(datetime)); % schreiben
fprintf(speedtrace, "%s \n", "getestete Primzahl, Zeit in Sekunden, Uhrzeit/Datum"); % schreiben
fclose(speedtrace); % file wieder zumachen
base = sym("2");
disp(["getestete Primzahl, Zeit in Sekunden, Uhrzeit/Datum"]);
for k = 1:1:numel(p);
tic;
isprime(base^p(k)-1);
Zeit(k) = toc;
fprintf("2^%i-1 %f %s \n", p(k), Zeit(k), datetime);
speedtrace = fopen('speedtrace_isprime.txt', 'a');
fprintf(speedtrace,"2^%i-1 %f %s \n", p(k), Zeit(k), datetime);
fclose(speedtrace);
% figure(1); plot(Zeit);
end;
speedtrace = fopen('speedtrace_isprime.txt', 'a');
fprintf(speedtrace, "%s %s \n", ["Ende: ", string(datetime)]);
fclose(speedtrace);
end
4 commentaires
Walter Roberson
le 22 Août 2019
Octave sym appears to use Python sympy and the isprime for that is documented :
https://docs.sympy.org/latest/modules/ntheory.html#sympy.ntheory.primetest.isprime
Your inputs are over 2^64 so BPSW would be used, not Miller Rabin.
Réponse acceptée
John D'Errico
le 21 Août 2019
Modifié(e) : John D'Errico
le 21 Août 2019
First, using isprime to test for primality of a Mersenne number is a bad idea. The Lucas-Lehmer test, is as I recall, much more efficient here, and it gives a statement of primality, NOT a statement of isprobable primality. The Miller-Rabin test is an isprobable test, not a proof of primality! That is why multiple internal runs are employed. Do some reading about how these methods can fail.
But more imprtantly, read about Lucas-Lehmer.
In fact, sym/isprime has greatly improved over previous ersions. I recall testing it some years ago, and it was pretty slow. But that is not so true now. This means if you have an old release, it might gain you to upgrade. By the way, if I do a test for primality of a number with on the order of 70000 digits, it is a matter typically of 20 minutes, as I recall. (Its been a month or so since I was running massive overnight jobs, searching for prime members of a specific family of primes, thus quasi-modified Woodal primes of a specific class. They can be exceedingly rare if you choose the right family.)
Anyway, you don't want to use isprime to test for a Mersenne prime. Use Lucas-Lehmer. It is REALLY fast in comparison. For a quick implementation...
function s = Lucas_Lehmer(p)
% Returns true for prime 2^p-1
% don't even bother if p is not prime
if ~isprime(p)
s = false;
return
end
s = 4;
M = sym(2)^p - 1;
for i = 1:p-2
s = mod(s*s - 2,M);
end
s = logical(s == 0);
Seriously, you can't do something much simpler.
Lucas_Lehmer(607)
ans =
logical
1
isprime(sym(2)^607-1)
ans =
logical
1
timeit(@() Lucas_Lehmer(607))
ans =
0.4162580338225
tic,isprime(sym(2)^607-1),toc
ans =
logical
1
Elapsed time is 9.295827 seconds.
Lucas_Lehmer(613)
ans =
logical
0
% A quick check of the list of known Mersenne primes tells me that 21701 is...
tic,Lucas_Lehmer(21701),toc
ans =
logical
1
Elapsed time is 17.560138 seconds.
2^21701-1 is not really that large of a number, but I expect that isprime will take considerably longer. (My own test right now is still running after about 10 minutes.)
tic,isprime(sym(2)^21701-1),toc
???
Finally, if your goal is to find seriously large Mersenne primes, MATLAB is probably not the tool to use if you are looking for primes with millions of digits at some point. But I have found it to work reasonably well in my own play time, searching for primes with 50000-100000 decimal digits. For any seriously large investigation, I'd suggest looking at GIMPS, which seems to be the state of the art.
If your goal is to search for primes in some other family, then there are sometimes tricks to make things more efficient, but they are often strongly dependent on the number family itself. For example, Cullen and Woodall number families have some neat things you can do, to avoid calling isprime too often.
4 commentaires
Emerson Dutra
le 13 Nov 2020
n = 10000;
primes = [];
if n>=2
primes = [2];
end
for value = 3:2:n
isprime = true;
for divisor = 2:floor(value/2)
if mod(value, divisor) == 0
isprime = false;
break;
end
end
if isprime
primes (end +1) = value;
end
end
primes
Plus de réponses (3)
John D'Errico
le 21 Déc 2022
As a followup, to this question, I've now learned where the time has gone.
In R2018 or so, the symbolic toolbox isprime call was changed to a tue test for primality. Note that the tests for prime in tools like Python and java.math.BigInteger are usually implementations of the MIller-Rabin test, which can be fooled. Essentially they should be truly called isprobableprime tests. In fact, the accuracy of those tests is VERY good. Unlike, for example, the Fermat test for primality, which can pretty easily be tricked, MIller-Rabin is not easily tricked. But in some rare cases, it can be.
Anyway, the current implementation of isprime in the symbolic TB is different, and is a true test for primality. But it is slower.
3 commentaires
Paul
le 23 Déc 2022
Modifié(e) : Paul
le 23 Déc 2022
I would have thought a change in behavior of sym/isprime would have been listed in the Release Notes, but I didn't see anything, at least as far back as 2017b, which looked to be as far back as can be searched.
John D'Errico
le 23 Déc 2022
Modifié(e) : John D'Errico
le 23 Déc 2022
I finally got around to putting in a problem report into TMW, asing why isprime was so slow. It took some looking, but they figured out where the time is going.
What I learned was about the fact that elliptic curves are now used in isprime, to provide a true test for primality. And that is a good thing. Sort of. Because if someone wanted to send in a prime to the large primes database, knowing a number is prime might ne nice, instead of being hopeful that it probably isprime.
As for it being a change in behavior, I've not checked the release notes for those years. They may not have realized that people would be using isprime to test some pretty large primes. And isprime is not that godawfully terrible for small-ish numbers, say on the order of a few hundred digits. For example, using my own VPIJ toolbox (not VPI, VPIJ is my replacement for VPI. I never posted it on the FEX, probably because of rumors of the gradual disappearance of Java)...
Pj = nextprime(vpij(10)^500)
Pj =
100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000961
So 500 digits. 10^500+961 is a prime, according to VPIJ.
I used my own version of nextprime, which is much faster than isprime. But the test in VPIJ is juat an isprobableprime call to the java.math.BigInteger.isprobableprime utility. Pretty hard to fool, since Miller-Rabin is actually quite good in general.
>> tic,isprime(Ps),toc
ans =
logical
1
Elapsed time is 296.594971 seconds.
Pj = vpij(10)^500 + 961
Pj =
100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000961
isprime(Pj)
ans =
logical
1
timeit(@() isprime(Pj))
ans =
0.027499
So isprime is still pretty slow.
I did look back at the release notes, as far back as 2017. They do not say the internal behaviour of isprime had changed. But to be honest, TMW generallty does not promise they will tell you about internal algorithmic changes in the release notes. They tell you when new tools are relased, or when the interface to a code has changed. And while nyone might argue this is something they should have mentioned, it seems not be be shown.
Anyway, I did ask for a new, isprobableprime code, as a feature request. I also asked for the ability to set the strength of the test, thuse effectively to set the number of witnesses to be applied. Yes, it is something I could write. So I slapped together a simple MIller-Rabin code using the symbolic TB.
tic,millerRabinTest(Ps,2:50),toc
ans =
1×49 logical array
Columns 1 through 23
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Columns 24 through 46
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Columns 47 through 49
1 1 1
Elapsed time is 0.524319 seconds.
tic,millerRabinTest(Pj,2:50),toc
ans =
1×49 logical array
Columns 1 through 23
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Columns 24 through 46
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Columns 47 through 49
1 1 1
Elapsed time is 0.272139 seconds.
This was just a test bed, to see how fast I could perform for 50 different MIller-Rabin witnesses, here, all of which agree that Ps is indeed prime. Interstingly, even there, the Java libraries seem to be a little faster, but not a lot.
Still way slower than the VPIJ code, but my test was written in MATLAB itself. The strength of a test with 50 different witnesses would be at least
4^-50
ans =
7.8886e-31
And usually MIller-Rabin is much better than that, based on what I've read. a 25% failure rate per witness is kind of a worst case.
Anyway, I could post VPIJ, if anyone felt a desire for it. At least for now, it would be a boost in capability.
John D'Errico
le 25 Juil 2022
As a follow-up, I've also compared the MATLAB isprime to that of Python and Mathematica. On significanty large numbers, The MATLAB sym/isprime was significantly slower. Note that an isprime test on these large numbers is not a true test, but should really be called isprobable prime, as it is possible for the test to be mistaken.
Anyway, as a point of comparison, you can still use Java, which has its own BigInteger form, and a test for primality there. Some years ago, I wrote a variant of my VPI code to use Java, called, naturally, VPIJ.
The next prime number after 10^200 is 10^200+357. the sym is prime tool took nearly 9 seconds to identify it as prime, while the Java tool took a tiny fraction of a second.
tic,isprime(sym(10)^200 + 357),toc
ans =
logical
1
Elapsed time is 8.680651 seconds.
tic,isprime(vpij(10)^200 + 357),toc
ans =
logical
1
Elapsed time is 0.007850 seconds.
The above computations were done in R2022a, so sym/isprime is still slow. I gave up waiting for sym/isprime to return a result for this next 1000 digit prime, but Java was reasonably fast.
tic,isprime(vpij(10)^1000 + 453),toc
ans =
logical
1
Elapsed time is 0.170706 seconds.
Chris
le 21 Août 2019
>> tic, isprime(2^607-1), toc
ans =
logical
0
Elapsed time is 0.453590 seconds.
>> tic, isprime(uint64(2^1279-1)),toc
ans =
logical
0
Elapsed time is 26.890225 seconds.
Cant talk about the speed difference with octave but ML seemes to have trouble with the large numbers; without the uint64 ML took the second input as inf. I am using 19a and have a few year old computer.
6 commentaires
John D'Errico
le 22 Août 2019
Um, first of all, 4 cores is 4 cores. 8 threads is not relevant. That is just your computer making believe it has 8 cores. Hyperthreading is a fallacy in terms of CPU performance, sold to you by the manufacturers, as a way of letting you think you have real power there. If one core is running flat out, you don't gain by having two hyperthreaded cores interleaved, both running flat out.
Hyperthreading was a gain when you had only a limited set of cores (1 is the real issue here) and you want to do several things at once. It allows you to browse the web, or check your mail, while MATLAB is at work in the background. And since your computer is always doing things in the background, hyper threading was arguably a good idea in the past. Now? Just a mirage.
Next, you will probably find that a CPU monitor tells you that MATLAB is using only ONE of those cores anyway, in a call to isprime. symbolic toolbox tools are not multi-threaded at all. At least they are not so in release R2019a. (This is one thing I'd like to see change in the future for MATLAB symbolic toolbox calls.)
So you might find that part of the speed bump found by Octave is it might be multi-threaded for symbolic problems. And if they do fewer iterations of Miller-Rabin than does MATLAB, that might explain as much as an 8-1 speed bump for this specific problem.
Voir également
Catégories
En savoir plus sur Software Development Tools dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!