python - Why do two algorithms for finding primes differ in speed so much even though they seem to do the same number of iterations?

I have two algorithms for finding primes in Python. The inner loop of each one seems to be executed the same number of times and is equally simple. However, one of them takes 10 times as long as the other. My question is:

Why? Is this some quirk of Python that can be optimized away (how?), or am I missing something else?

The problem I am solving is essentially from http://www.spoj.pl/problems/PRIME1/. In my case, I have N = 10 ** 9, delta = 10 ** 5, and I want to find all primes between N-delta and N. I also have smallprimes, a pre-made list of all primes less than or equal to the square root of N. The first algorithm is very simple:

def range_f1(lo, hi, smallprimes):
    """Finds all primes p with lo <= p <= hi.

    smallprimes is the sorted list of all primes up to (at least) the square
    root of hi. hi and lo may be large, but hi-lo+1 must fit into a long."""

    primes = []
    for i in xrange(hi - lo + 1):
        n = lo + i

        isprime = True
        for p in smallprimes:
            if n % p == 0:
                isprime = False
                break

        if isprime:
            primes.append(n)

    return primes

Calling range_f1(N-delta,N,smallprimes) takes a long time (about 10 seconds). The inner loop is called 195170 times. I also have a version of this algorithm that replaces the explicit loop with a generator expression (that is the one I actually use for profiling; see the end of the question), but it is no faster.

The second version is (an ugly implementation of) the sieve of Eratosthenes:

def range_sieve(lo, hi, smallprimes):
    """Parameters as before."""

    # So ugly! But SO FAST!! How??

    delta = hi - lo + 1
    iamprime = [True] * delta      # iamprime[i] says whether lo + i is prime
    if lo <= 1:
        iamprime[1 - lo] = False

    def sillyfun():      # For profiling & speed comparison
        pass

    for p in smallprimes:
        rem = lo % p
        if rem == 0:
            iamprime[0] = False
        for i in xrange(p - rem, delta, p):
            iamprime[i] = False
            sillyfun()

        if p >= lo and p <= hi:
            iamprime[p - lo] = True

    return [p + lo for (p, ami) in enumerate(iamprime) if ami]

This is about 10 times as fast and takes less than 2 seconds. However, the inner loop (the call to sillyfun()) is executed 259982 times, more than in the previous case. I am at a loss to explain why this is so much faster.

I thought that maybe the reason is that the inner loop of the first algorithm contains modular arithmetic, while the second only has an assignment. However, the following seems to imply that assignment is no faster than modular arithmetic:

>>> from timeit import timeit
>>> timeit("10**9 % 2341234")
0.23445186469234613
>>> timeit("a[5000]=False", "a = [True] * 10000")
0.47924750212666822
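
(One could also move the operands into the setup statement so that both measurements pay comparable name-lookup costs; this is just a sketch of that variant of the measurement, with results omitted since they will vary by machine:)

>>> timeit("n % p", "n = 10**9; p = 2341234")
>>> timeit("a[i] = False", "a = [True] * 10000; i = 5000")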

Here's the (less readable) version of the first algorithm I actually use:

def range_f2(lo, hi, smallprimes):

    primes = []
    for i in xrange(hi - lo + 1):
        n = lo + i

        try:
            (1 for p in smallprimes if n % p == 0).next()
        except StopIteration:
            primes.append(n)

    return primes

Here is the result of calling the profiler on range_f2() (note the number of times the generator expression is evaluated):

>>> from cProfile import run as prof
>>> prof("range_f2(N-delta,N,sp)")
 200005 function calls in 13.866 CPU seconds

 Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000   13.866   13.866 <string>:1(<module>)
 195170   12.632    0.000   12.632    0.000 prime1.py:101(<genexpr>)
      1    1.224    1.224   13.865   13.865 prime1.py:90(range_f2)
   4832    0.009    0.000    0.009    0.000 {method 'append' of 'list' objects}
      1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Here is the profiler result for range_sieve():

>>> prof("range_sieve(N-delta,N,sp)")
259985 function calls in 1.370 CPU seconds

Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.003    0.003    1.370    1.370 <string>:1(<module>)
     1    0.877    0.877    1.367    1.367 prime1.py:39(range_sieve)
259982    0.490    0.000    0.490    0.000 prime1.py:51(sillyfun)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Finally, here is the complete code that generates the list of small primes (in a very silly way), so that you can check what results you get: http://pastebin.com/7sfN4mG4
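
(If you just want something to test with, a plain sieve up to the square root of N will do; the following is only a sketch of that idea, not the code behind the link:)

def small_primes_up_to(limit):
    # Simple sieve of Eratosthenes, good enough to build smallprimes
    # up to sqrt(N).  Assumes limit >= 2.
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in xrange(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for multiple in xrange(p * p, limit + 1, p):
                sieve[multiple] = False
    return [p for p, is_p in enumerate(sieve) if is_p]

# e.g. smallprimes = small_primes_up_to(int((10 ** 9) ** 0.5) + 1)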

UPDATE: By popular demand, the profiling data for the first chunk of code. There is no data on how many times the inner loop is executed, but it seems pretty clear that it is the same as for the third chunk (range_f2).

>>> prof("range_f1(N-delta,N,sp)")
      4835 function calls in 14.184 CPU seconds
Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000   14.184   14.184 <string>:1(<module>)
     1   14.162   14.162   14.183   14.183 prime1.py:69(range_f1)
  4832    0.021    0.000    0.021    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


1 Reply


The difference is an algorithmic one. In the first version, trial division, you test each candidate against all small primes. The fact that you don't stop once the small prime exceeds candidate ** 0.5 doesn't matter for the range (10**9 - 10**5, 10**9) if smallprimes has a good upper bound, but it would if the length of the range were much larger relative to that bound. For composites, that doesn't incur much cost, since most of them have at least one small prime divisor. But for primes, you have to go all the way up to N**0.5. There are roughly 10**5/log(10**9) primes in that interval, and each of them is trial-divided by about 10**4.5/log(10**4.5) primes, so that makes about 1.47*10**7 trial divisions against primes.
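
(A quick back-of-the-envelope check of that estimate, using the same approximations:)

from math import log

primes_in_range = 10 ** 5 / log(10 ** 9)      # ~4.8e3 primes expected in [N-delta, N]
small_primes = 10 ** 4.5 / log(10 ** 4.5)     # ~3.1e3 primes up to sqrt(N)
print(primes_in_range * small_primes)         # ~1.47e7 trial divisions against primes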

On the other hand, with the sieve you only cross off composites; each composite is crossed off as many times as it has distinct prime divisors, while primes aren't crossed off at all (so primes are free). The number of prime divisors of n is bounded by (a multiple of) log(n) (that's a crude upper bound, usually a large overestimate), so that gives an upper bound of 10**5*log(10**9) (times a small constant) crossings-off, about 2*10**6. On top of that, crossing off may be less work than a division (I don't know about Python; it is for C arrays). So you're doing less work with the sieve.
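
(And the corresponding back-of-the-envelope figure for the sieve, using the same crude bound:)

from math import log

# Crude upper bound: each of the 10**5 entries is crossed off at most ~log(n) times
print(10 ** 5 * log(10 ** 9))   # ~2.07e6 crossings-off, versus ~1.5e7 divisions above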

Edit: I collected the actual numbers for the range 10**9 - 10**5 to 10**9:

Ticks (sieve crossings-off): 259987, finding 4832 primes
Divisions (trial division): 20353799, finding 4832 primes

The sieve does only 259987 crossings-off; you can see that the crude upper bound above overestimates by a large factor. The trial division needs more than 20 million divisions; 16433632 of those are for primes (x/log x underestimates the number of primes, for x = 10**4.5 by about 10%), and 3435183 are used for the 3297 numbers in that range whose smallest prime factor is larger than n**(1/3).
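
(If you want to reproduce counts like these, one way is to instrument both versions with explicit counters instead of the sillyfun() trick; the following is only a sketch with made-up names, and it assumes lo > 1:)

def count_divisions(lo, hi, smallprimes):
    # Instrumented trial division: count every n % p that is performed.
    divisions = 0
    primes_found = 0
    for i in xrange(hi - lo + 1):
        n = lo + i
        isprime = True
        for p in smallprimes:
            divisions += 1
            if n % p == 0:
                isprime = False
                break
        if isprime:
            primes_found += 1
    return divisions, primes_found

def count_crossings(lo, hi, smallprimes):
    # Instrumented sieve: count every entry that gets crossed off.
    delta = hi - lo + 1
    iamprime = [True] * delta
    crossings = 0
    for p in smallprimes:
        rem = lo % p
        start = 0 if rem == 0 else p - rem   # index of the first multiple of p in the range
        for i in xrange(start, delta, p):
            iamprime[i] = False
            crossings += 1
        if lo <= p <= hi:                    # p itself is prime, un-cross it
            iamprime[p - lo] = True
    return crossings, sum(iamprime)

# e.g. count_divisions(10**9 - 10**5, 10**9, smallprimes)
#      count_crossings(10**9 - 10**5, 10**9, smallprimes)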

