Another appeal to clever programmers: generating primes in linear time

Shalmanese · Apr 9, 2002

Argh, help again, we have to write a program which generates the first n primes and it has to generate them in linear time (Is this even possible?).

The specs of the assignment is that we have to write a function which has the following specs:

n___ p__ r
100 547 700000
200 1229 140000
400 2749 320000
800 6143 7000000
1600 13513 1500000
3200 29453 3300000

where n is the nth prime number, p is what the number actually is and r is how many "reductions" we can do to obtain this number.

This means I get roughly 240 reductions per number searched assuming I only calculate odd numbers.

Now, a multiplication takes 16 reductions, and addition takes 16, a gcd takes roughly 50 and a mod takes roughly 80 so I only have about 4 or 5 operations per number.

The best so far that I have managed is to do it in polynomial time but at 10 times the allowable time for even the smallest one so I have NFI how I can make this any faster.

BTW: The language being used is Haskell, a functional programming language but what I am doing right now is:

take a list of primes you already know and a test number.

if the test number is divisible by the first prime, go onto the next test number

otherwise, test if it is divisible by the next prime until you get to the prime that is more that the square root of the test number. If you have gotten this far, then the tst number is a prime so you can add it to the list of primes and start with the new test number.

Does anyone have a better algorithm?

Peter · Apr 9, 2002

This sounds like a variation of the "Erathosthenes' Sieve" method, ancient greek, originally performed on beach sand or so.
History repeating, after thousands of years on paper, that algorithm is now back onto small piles of sand 🙂

Here goes:

* Make a list of all numbers up to a given limit, starting with 2.

One iteration goes like this:

* Find the first non-tagged number entry in the list, starting from the bottom. This is a prime number. Output it.

* Tag all multiples of this number in the list.

Optimizations:
* Code 2 as a given, and work on odd numbers only. This halves the array size.
If memory consumption is a concern, use a bit array rather than a byte array.
If speed is more important, use a byte array. Make the array size an input parameter
to the program.

* Do not rescan the list for non-tagged entries from the start.
Remember where you were from one iteration to the next.

* Iterate over the lower half of the list only, since once you're across that point, there will
be no more multiples to tag. After that, just scan the top half for untagged entries and output them.

Order of complexity? Should be elaborated in any halfway decent math or CS book. After all,
this is (excuse the upcoming pun) a prime example of how to program, often used for
introductory classes.

The obvious fact: On a given array size, as numbers are increasing, the algorithm needs less and less time for tagging.
The weakness: It's rather useless for finding really large prime numbers.

Last time I coded that was on an 8 MHz Atari ST (those were the days), and although I used a bit array,
it ran pretty quickly even when I had it search up to 1,000,000... with an array size of just 15000
(to reach up to 30000) and a current system, it'll be done in ridiculously short time.

regards, Peter

Shalmanese · Apr 9, 2002

I tried that but the problem is that I need say the first 100 primes.

To use the sieve, you need to know the end point. So the sieve would be good if you wanted all the primes between1 and a million but not the first 10,000 primes.

Is there any way of dynamically extending the sieve?

m0ti · Apr 9, 2002

There is another optimization that can be done to the "Erathosthenes' Sieve" which has some interesting results.

Instead of ordering your numbers in the standard way order them as follows:

1 2 3 4 5 6
7 8 9 10 11 12
.
.
.

as you can see each row adds on 6. What this means is that you can eliminate the 2's,3's,4's and 6's column. (we can also eliminate the 1 - though not the column).

Additonally, examine what happens when you have to eliminate a number. Effectively we get two columns:

one is 5 + 6n, and the other is 7+6m.
Let's assume that we have found a prime in one of the columns. First off, you have to eliminate all the numbers in the same column that are multiples of this number. This is done easily by moving forward the size of the number in the same column.

In other words: if x + 6n is prime then : x + 6n + (x+6n)*i is not a prime (since it divides by x+6n).

The only other problem is eliminating cross-columns. Interestingly the 5's column cancels the 7's but the reverse ins't true (meaning that as one searches for higher and higher prime numbers they will be equal to 5+6n with growing probability).

let's take a look at the two columns (drawn out here as rows):

5 11 17 23 29 35 41 47 53
7 13 19 !25! 31 37 43 49 !55!

the 25 and 55 have to be cancelled by numbers in the 5's column.

what we see is the following:

we'll rebuild the table

_1 _2 _3 _4 _5 _6
_7 _8 _9 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
31 32 33 34 35 36
37 38 39 40 41 42
43 44 45 46 47 48
49 50 51 52 53 54
55 56 57 58 59 60
61 62 63 64 65 66

what we see here is the following: for each row we move down we add on 6. therefore, if we start at the 5, and go down one row and left one column we get a number that divides by 5. if we were to start at the 11, then we would have to go down two rows (+12) and back 1. To get from 5 to 25 we have to move down 4 times, and left 4 times. Therefore the index is the same as the 5's + 4. looking at the 11, we go down twice for every move left, therefore we have to go down 8 times, and left four times.

this works because we index 5 with 1, and 11 with 2, etc: in other words working off a base of -1 + 6k. For C, we'd change this to 5+6(k-1) = 5 + 6k, so that 5 would have index 0.

But, to keep things simple we're going to go with 5 having index 1.

So we've got two arrays, one which represents -1+6n (5's column), and the other which is 1+6m (7's column).

If -1 + 6n is a prime number then 1+6(n (starting index) + 4*n (4 times we move n down and one over)) divides by it = 1+6(n+4n) = 1 + 6*5n.

Therefore, since we know that the number at index 5n divides by -1+6n, we also know that: 1+6*5n + i(-1+6n) divides by -1+6n and isn't prime.
So, we can mark off all those as not being prime either (note that moving forward by -1+6n is minimal. If you move any less than a whole increment the resulting number will not divide by -1+6n).

If we look at the 7's column, we see a similar thing. Go down 1 row (+6) and moving right (+1) yields a number which is divisible by 7.
Therefore for a number 1+6n which is prime, the number n rows down (+6n) and 1 column over (+1) will divide by it.
Therefore moving 4 rows over we get that if 1+6n is prime then:
5+6(n+ 4*n) divides by it = 5 + 6*5n

However, this number is divisible by 5!!!!!!!!!!!

Hence, if we first handle the case of 5, any primes found in the 7's column don't affect those in the 5's column.

So here's the algorithm:

two arrays one for -1 + 6n (5 is at index 1), one for 1 + 6m (7 is at index 1). The index of each array maps to n and m respectively. Alternately, you can use a regular sized array and remember that the index is one smaller than what is needed for calculating cancellations.

i <- 1.
Print out 2 as being prime, print out 3 as being prime.
1) in the 5's take the next number (index i). if it hasn't been marked go down the array by -1+6i steps tagging as you go. Mark 5n in the 7's array as not being prime, and go down that array by -1+6i steps tagging as you go. (if it has been marked, continue straight to 2)
2) in the 7's take the next number (index i). if it hasn't been marked go down the array by 1+6i steps tagging as you go.
3) i<- i+1. go to 1.

This will allow you to print out the primes in the correct order with no problems whatsoever.

I implemented this once, for any number of primes (millions, billions, etc). it works really quite well.
Let's say you're looking for the first 1 thousand primes.
You can easily hold that many in a 1000x2 array, where you save the index in one field, and whether it's in the 5 column or the 7's column in the other field (if you're sticky for space use two arrays, one of which can be a bit array. If you need to do a billion or so (or any amount that you don't have enough memory for) you'll have to use a file).
Let's assume that you're working in increments of 100.

go over the above algorithm saving the indeces of the primes as you go along. When you're array is filled flush it (reset to the blank case) add one to the counter and fix the numbers as needed. This step isn't hard. After the first pass you've got the indeces of the primes, now then, we want to use that knowledge to throw out the numbers which are multiples of them on the second pass. This is just simple arithmetic (for each index find the first number in each arrray to which it maps (this is an arithmetic calculation), and then eliminate the following numbers as necessary). Note that you have to do two sets of cancellations for indeces in the 5's column: one in the 5's column and one in the 7's.

In any case, even you decide to go for the simpler standard algorithm, you can still use the same trick of remembering which numbers you found to already be prime, and eliminate as needed.

Xalista · Apr 9, 2002

Last time I checked (a couple of weeks ago) Erathosthenes' Sieve had time complexity O(n log log n), which is clearly not linear, almost, but not quite. While m0ti's algorithm will certainly speed things up, it will do so only by a constant. So, if the "linearness" of your algorithm is a really strict demand, Erathosthenes' Sieve is not the way to go. On the other hand, I have never heard of a faster algorithm for generating primes than Erathosthenes' Sieve, and I even doubt there is one.....

Turkey · Apr 9, 2002

Try the primes page. Maybe it'll have something you're looking for.

QwErTyBk · Apr 10, 2002

Not positive, but you may be able to use this to code up something that runs in linear time.

Just when I thought discrete math had no practical use, here I am using it... 🙂

To show a given integer is prime:

Therom - If n is a composite integer (not prime), n has a prime divisor less than or equal to sqr(n)

Proof:
if n is composite, it has a factor a 1<a<n. n=ab, where ab are pos integers >1. a<= sqr(n) or b<= sqr(n) since otherwise ab > sqr(n) * sqr(n) = n. So n has a positive divisor not exceeding sqr(n). This divisor is either prime or, by fundamental theorm of Arithmetic, has a prime divisor. Either way, n has a prime divisor <= sqr(n).

So we see an integer is prime if it is not divisible by any prime less than or equal to its square root.

so to check if 101 is prime for example: only primes <= sqr(101) are 2,3,5,7. 101 isnt divisible by any of them and therefore is prime.

Hope that helps.

-qwertybk

Sahakiel · Apr 10, 2002

Stupid question, but what does it mean to generate the list in linear time? Does it mean that if you want the second prime it'll take 2x the time to figure out compared to finding the first prime and 100x for the 100th number?

Peter · Apr 10, 2002

The "see if it's divisable" approach is GOING to fail on numeric systems because of the limited exactness of calculation.
And it's slow, dividing and multiplying is expensive.

Sahakiel, the order of complexity is a measure of how the workload grows with the size of the problem.
Linear means that for a problem n-times the size, the algorithm will n-times as much time.

Usually you want your algorithms to be of linear order or better, so they don't suddenly take ages when your users
start throwing real life tasks at them 🙂

A good example of how not to do it: Microsoft Word's as-you-type WYSIWYG document formatter.
This is way above linear for growing document sizes.
regards, Peter

Shalmanese · Apr 10, 2002

As I stated before, I can't use the sieve method as I need to find the nth prime number, not all the prime numbers between 1 and k.

The sieve can only work if you have a fixed endpoint which is not what I have unless there is a method of generating the nth prime without generating all n-1 primes (which there isnt to the best of my knowledge).

Anyway, I think I have worked out how to do it. It turns out that the gcd function is roughly linear with respect to the smaller of the two inputs so what I have done is to keep the product of all the past primes in the memory and then just tested if the gcd of the product of all the primes and the test number was 1. If this was so, it was a prime.

Previously, I tried doing this but I was recalculating the product every loop, hence the n^2 efficiency 🙁. (This is not as stupid as it seems as I am using Haskell, a functional programming language where it is not just a case of defining a new variable. There are no variables in Haskell)

Thanks for all the help.

Turkey · Apr 11, 2002

Haven't you been given the n'th prime number (of value k)?

And does it have to actually generate them? There are sites with tables of numbers, just put 'em in a file and read 'em out 🙂. I'll bet you'd be the only project in the class that output primes in constant time 😀.

Shalmanese · Apr 11, 2002

I thought of that but they check the source as well 🙂

aznmist · Apr 13, 2002

perhaps doing the hokey pokey would help??
😕 i don't..comprehend...what's going on!?

Another appeal to clever programmers: generating primes in linear time

Shalmanese

Platinum Member

Peter

Elite Member

Shalmanese

Platinum Member

m0ti

Senior member

Xalista

Member

Turkey

Senior member

QwErTyBk

Member

Sahakiel

Golden Member

Peter

Elite Member

Shalmanese

Platinum Member

Turkey

Senior member

Shalmanese

Platinum Member

aznmist

Golden Member

TRENDING THREADS