Judging prime numbers

Prime Number Algorithm

Prime numbers are very important in computer science. In the famous RSA protocol, it is done by factoring large integers. Its difficulty determines the reliability of the RSA algorithm.

In Goldbach's conjecture, the calculation of prime numbers is also very important. In future section, we will study how to prove Goldbach's conjecture on finite sets.

what is prime number

The opposite of prime numbers is composite numbers.

A prime number refers to a number that is factored. If the factor is only 1 and itself, then the number is a prime number.

For example, the smallest prime number and the only even prime number is 2, because 2=1*2, only 1 and itself are factors.

Then 3 and 5 and 7 and 11 are also prime numbers.

We see 4=2*2, so 4 is a sum, then there are 6, 8, 9, 10, 12 and so on.

According to the above method, it is easy to write a program to calculate the prime numbers:

first try

In the first attempt we didn't consider efficiency, we just wrote a program that would work.

def prime_detect_v1(num):
    for i in range(2, num):
        if num % i == 0:
            return 'composite number'
    return 'prime number'

for i in range(2, 20):
    print(i, end='\t')
    print(prime_detect_v1(i))

The program is very simple. If a number is not divisible by all numbers less than itself, then the number is a prime number, otherwise a composite number.

We simply run it and the program is correct.

second attempt

In the first attempt we only considered correctness, this version we consider efficiency:

The most time-consuming thing in determining prime function is the loop. If we can reduce the number of loops, then the efficiency of our program can increase.

If there is a number 29, and we want to determine whether it is a prime number or not, do we need to try from 2 all the way to 28? No, because if numbers up to (square 29)+1 are not divisible by 29, then neither can numbers over (square 29)+1.

Such a method reduces a considerable amount of computation.

third attempt

The second attempt reduced the computational load by a considerable amount, so can we go further?

YES!

We know that all even numbers except 2 are not prime numbers, so it is very simple, we can exclude all even numbers except 2.

But since 2 is also the first number to be tested if you follow the normal process, this method does not change that much. Using this method reduces the amount of computation by about half.

fourth attempt

In the third attempt, we ruled out all multiples of 2, so can we do the same for multiples of 3? Of course it is possible.

Any natural number can always be expressed in one of the following six forms: 6n, 6n+1, 6n+2, 6n+3, 6n+4, 6n+5 (n=0,1,2...)

We can find that, apart from 2 and 3, only numbers of the form 6n+1 and 6n+5 can be prime numbers. And if numbers of the form 6n+1 and 6n+5 are not prime numbers, their factors will also contain numbers of the form 6n+1 or 6n+5, so the following algorithm can be obtained:

This approach has reached the limit of artificial optimization, so we will not use the same method to optimize this program (you can add 5 or 7 or 11 ... optimization algorithms, but these optimizations decrease marginally as the number increases, and the code changes is too complicated, so I will not continue to use this method to optimize)

fifth attempt

The previous algorithm has reached the limit of optimization, so if we want to continue to optimize the program, we can only change the algorithm.

There are three different measures of Labor complexity and space complexity and time complexity in the program.

Labor complexity refers to the labor cost of writing a program. The simpler the program, the lower the labor complexity, and vice versa.

We use python because the Labor complexity of python is very low, so many python programs do not care so much about the time complexity or space complexity, because in scientific research, most programs do not run many times, and the length of time to write code is long. Much larger than the running time, and writing the program quickly is the most important.

Space complexity refers to the memory and disk size occupied by a program. The larger the space complexity, the larger the disk and memory size occupied by the program.

Time complexity refers to the length of the running time of the program. In theory, we can calculate the time complexity of the program running (check algorithm books for details, and mathematics is not explained here). However, due to the multi-level cache mechanism of the computer and the connection mechanism of the memory, the time complexity is complicated. It is difficult to predict correctly (we are only predicting a trend), so the time complexity of a program is often calculated by running the program thousands of times (most programs run for a very short time, and the measured results are inaccurate) and measure the actual time divided by number of times the programming runs.

For a series of algorithms that are not stupid, the weighted product of Labor complexity and space complexity and time complexity is generally similar. That is to say, if the Labor complexity of an algorithm is relatively high, then the space complexity and time complexity of the algorithm will be relatively low.

In the previous program, the program became more and more complex, which is our Labor complexity has been increasing, but the corresponding time complexity has been decreasing.

Now our time complexity can't seem to be reduced, so we can convert some time complexity and Labor complexity into space complexity to reduce time complexity and Labor complexity.

Today's computers are more resourceful (not always running out of memory as in 1990) and many programs can mediate between these three complexities.

In this method we build an array in which we keep all the prime numbers that have been tested.

Then we only need to divide the detected number by all the values in the array that are less than the square root of the detected number when we perform the detection.

This method has the same time complexity as the previous method, and the space complexity is much larger, but it is very easy to think of.

If a number is even, then the first number 2 in the array can rule out that the number is prime, and if a number is a multiple of 3, then the second number 3 in the array can rule out the number being prime.

The disadvantage of this method is that the continuous sequence must be detected, and in reality we generally only need to detect whether a certain number is a prime number. Which this method doesnot work.

ensemble

We can put these methods together to see the correctness:

A basic way to check whether a program is right or wrong is a checker: use random numbers or a lot of data to check your method multiple times against a library function method (or two methods you write, one algorithm, the other is the brute force calculation) results are the same or not.

Intuitively, it can be seen that these five methods can get the same answer. In the actual program, we generally do not print the answer but check it inside the program with logic.

In general, check the program is correct for millions of times does not take more than 1 minute. And with such a large amount of data, it is almost certain that the program is correct. (Sometimes we need special value detection, that is, enter some special values to check the correctness of the program, such as 0 or 1 or 2, because some programs work well under random numbers, but special values will not work.)

Statistics

Start time of this page: February 2, 2022

Completion time of this page: February 4, 2022

Last updated