Let be a large positive integer whose “prime versus composite” status is not known. One way to know whether is prime or composite is to factor into its prime factors. If there is a non-trivial factor (one that is neither 1 nor ), it is composite. Otherwise is prime. This may sound like a reasonable approach in performing primality testing – checking whether a number is prime or composite. In reality, factoring and primality testing, though related, are very different problems. For a very large number (e.g. with at least 300 decimal digits), it is possible that, even with the state of the art in computing, factoring it may take more than a few million years. On the other hand, it will take a modern computer less than a second to determine whether a 300-digit number is prime or composite. Interestingly this disparity is one reason that makes the RSA work as a practical and secure cryptosystem. In this post, we use the RSA cryptosystem as an example to give a sense that factoring is a “hard” problem while primality testing is an “easy” problem. The primality test used in the examples is the Fermat primality test.
The brute force approach
There is a natural and simple approach in factoring, which is to do trial divisions. To factor the number , we divide by every integer in the range . Once a factor is found, we repeat the process with the complementary factor until all the prime factors of are found. This is simple in concept and is sure to produce the correct answer. For applications in cryptography, this brute force approach is essentially useless since the amount of time to try every candidate factor is prohibitively huge. The amount of time required may be more than the age of the universe if the brute force approach is used.
The brute force approach can be improved upon slightly by doing the trial divisions using candidate factors up to . It is well known that if a composite integer is greater than one, then it has a prime divisor such that . So instead of dividing by every number with , we can divide by every prime number with . But even this improved brute force approach will still take too long to be practical.
Let’s look at a quick example for brute force factoring. Let . Note that . There are 1192 odd primes less than 9676. In dividing by these primes, we stop at 127 and we have . We now focus the attention on . Note that and there are 147 odd primes less than 858. Dividing by these 147 candidate factors, we find that none of them is a factor. We can conclude is prime. Then we have the factorization .
Example of RSA
RSA is a public-key cryptosystem and is widely used for secure data transmission. The RSA public key consists of two parts. One is the modulus that is the product of two distinct prime factors. Suppose the modulus is called and we have where and are distinct prime numbers. How large does have to be? The larger the is, the more secure RSA is. The current practice is that for corporate use the modulus is at least a 1024-bit number (the bit size is called the key length). If data is extra sensitive or if the data needs to be retained for a long time, then a larger key length should be used (e.g. 2048-bit). With a 1024-bit modulus , each prime factor is a 512-bit number. The other part of the RSA public key is the encryption key, which is an integer that is relatively prime to the integer .
Let’s say we want to generate a 1024-bit modulus. There are two challenges with a key of this size. One is that a reliable way is needed to obtain two prime numbers that are 512-bit long. Given a large integer that is at least 512-bit long, how do we determine reliably that it is prime? Is it possible to factor a 512-bit integer using the brute force approach? The other challenge is from the perspective of an attacker – successful factoring the 1024-bit modulus would break RSA and allow the attacker to read the secret message. Let’s look at the problem of the attacker trying to factor a 1024-bit number. A 1024-bit number is approximately . The following calculation converts it to a decimal basis:
We use to denote the logarithm of base 10. Note that . So a 1024-bit number has over 300 digits.
Let’s see what the challenge is if you want to factor a 1024-bit number. Suppose your chosen large number is such that . Note that . According to the improved brute force approach described above, in effect you will need to divide by every prime number less than .
Now let’s get an estimate on the number of prime numbers less than . According to the prime number theorem, the number of prime numbers at most is approximately
where is the number of primes at most . Then . This is certainly a lot of prime numbers to check.
It is hard to comprehend such large numbers. Let’s put this into perspective. Currently the world population is about 7 billion. Let’s say each person in the world possesses a supercomputer that can check prime numbers per second (i.e. to check whether they are factors of the number ). This scenario clearly far exceeds the computing resources that are currently available. Suppose that the 7 billion supercomputers are available and that each one can check many primes per second. Then in each second, the following is the number of prime numbers that can be checked by the 7 billion supercomputers.
The following is the number of seconds it will take to check many prime numbers:
The universe is estimated to be about 13 billion years old. The following calculation converts it to seconds.
With 7 billion fast suppercomputers (one for each person in the world) running in the entire life of the universe, you can only finish checking
of the many prime numbers. Note that is a tiny portion of 1%. So by taking the entire life of the universe to run the 7 billion supercomputers, each checking many candidate prime factors per second, you would not even make a dent in the problem!
The security of RSA rests on the apparent difficulty of factoring large numbers. If the modulus can be factored, then an eavesdropper can obtain the private key from the public key and be able to read the message. The difficulty in factoring means there is a good chance that RSA is secure. In order to break RSA, an attacker would probably have to explore other possible vulnerabilities instead of factoring the modulus.
By carrying out a similar calculation, we can also see that factoring a 512-bit number by brute force factoring is also not feasible. Thus in the RSA key generation process, it is not feasible to use factoring as a way to test primality. The alternative is to use efficient primality tests such as Fermat test or Miller-Rabin test. The computation for these tests is based on the fast powering algorithm, which is a very efficient algorithm.
The story told by RSA numbers
The required time of more than the life of the universe as discussed above is based on the naïve brute force approach of factoring. There are many other factoring approaches that are much more efficient and much faster, e.g., the quadratic sieve algorithm, the number field sieve algorithm, and the general number field sieve algorithm. For these methods, with ample computing resources at the ready, factoring a 1024-bit or 2048-bit number may not take the entire life of the universe but make take decades or more. Even with these better methods, the disparity between slow factoring and fast primality testing is still very pronounced and dramatic.
The best evidence of slow factoring even with using modern methods is from the RSA numbers. The RSA numbers are part of the the RSA Factoring Challenge, which was created in 1991 to foster research in computational number theory and the practical difficulty of factoring large integers. The challenge was declared inactive in 2007. The effort behind the successful factorization of some of these numbers gives us an idea of the monumental challenges in factoring large numbers.
According to the link given in the above paragraph, there are 54 RSA numbers, ranging from 330 bits long to 2048 bits long (100 decimal digits to 617 decimal digits). Each of these numbers is a product of two prime numbers. Of these 54 numbers, 18 were successfully factored (as of the writing of this post). They were all massive efforts involving large groups of volunteers (in some cases using hundreds or thousands of computers), spanning over months or years. Some of methods used are the quadratic sieve algorithm, the number field sieve algorithm, and the general number field sieve algorithm.
The largest RSA number that was successfully factored is the RSA-768, which is 768 bits long and has 232 decimal digits (completed in December 2009). The method used was the Number Field Sieve method. There were 4 main steps in this effort. The first step is the polynomial selection, which took half a year using 80 processors. The second step is the sieving step, which took almost two years on many hundreds of machines. If only using a single core 2.2 GHz AMD Opteron processor with 2 GB RAM, the second step would take about 1500 years! The third step is the matrix step, which took a couple of weeks on a few processors. The final step took a few days, which involved a great deal of debugging.
The number field sieve method is the fastest known method for factoring large numbers that are a product of two primes (i.e. RSA moduli). The effort that went into factoring RSA-768 was massive and involved many years of complicated calculations and processing. This was only a medium size number on the list!
Another interesting observation that can be made is on the RSA numbers that have not been factored yet. There are 36 unfactored numbers in the list. One indication that RSA is secure in the current environment is that the larger numbers in the list are not yet factored (e.g. RSA-1024 which is 1024-bit long). Successful factorization of these numbers has important security implication for RSA. The largest number on the list is RSA-2048, which is 2048-bit long and has 617 digits. It is widely believed that RSA-2048 will stay unfactored in the decades to come, barring any dramatic and significant advance in computing technology.
The factoring challenge for the RSA numbers certainly provides empirical evidence that factoring is hard. Of course, no one should be complacent. We should not think that factoring will always be hard. Technology will continue to improve. A 768-bit RSA modulus was once considered secure. With the successful factorization of RSA-768, key size of 768 bits is no longer considered secure. Currently 1024 bit key size is considered secure. The RSA number RSA-1024 could very well be broken in within the next decade.
There could be new advances in factoring algorithm too. A problem that is thought to be hard may eventually turn out to be easy. Just because everyone thinks that there is no fast way of factoring, it does not mean that no such method exists. It is possible that someone has discovered such a method but decides to keep it secret in order to maintain the advantage. Beyond the issue of factoring, there could be some other vulnerabilities in RSA that can be explored and exploited by attackers.
Fermat primality test
We now give some examples showing primality testing is a much better approach (over factoring) if the goal is to check the “prime or composite” status only. We use Fermat primality test as an example.
Let . This is a small number. So factoring would be practical as a primality test. We use it to illustrate the point that the “prime or composite” status can be determined without factoring. One option is to use Fermat’s little theorem (hence the name of Fermat primality test):
Fermat’s little theorem
If is a prime number and if is an integer that is relatively prime to , then .
Consider the contrapositive of the theorem. If we can find an , relatively prime to such that , then we know for sure is not prime. Such a value of is said to be a Fermat witness for the compositeness of .
If a Fermat witness is found, then we can say conclusively that is composite. On the other hand, if is relatively prime to and , then is probably a prime. We can then declare is prime or choose to run the test for a few more random values of .
The exponentiation is done using the fast powering algorithm, which involves a series of squarings and multiplications. Even for large moduli, the computer implementation of this algorithm is fast and efficient.
Let’s try some value of , say . Using an online calculator, we have
In this case, one congruence calculation tells us that is not prime (if it were, the congruence calculation would lead to a value of one). It turns out that is a product of two primes where . Of course, this is not a secure modulus for RSA. The current consensus is to use a modulus that is at least 1024-bit long.
Let . This is also a small number (in relation what is required for RSA). Once again this is an illustrative example. We calculate for , the first few values of . All such congruence values are one. We suspect that may be prime. So we randomly choose 20 values of and compute . The following shows the results.
For all 20 random values of , . This represents strong evidence (though not absolute proof) that is a prime. In fact, we can attach the following probability statement to the above table of 20 random values of .
If were a composite number that has at least one Fermat witness, there is at most a 0.0000953674% chance that 20 randomly selected values of are not Fermat witnesses.
In other words, if were a composite number that has at least one Fermat witness, there is at most a 0.0000953674% chance of getting 20 1’s in the above computation.
In general, if has at least one Fermat witness, the probability that all randomly selected values of with are not Fermat witnesses is at most . For , , which is 0.0000953674%. The probability statement should give us enough confidence to consider a prime number.
There is a caveat that has to be mentioned. For the above probability statement to be valid, the number must have at least one Fermat witness. If a number is composite, we would like the test to produce a Fermat witness. It turns out that there are composite numbers that have no Fermat witnesses. These numbers are called Carmichael numbers. If is such a number, for any that is relatively prime to the number . In other words, the Fermat test will always indicate “probably prime” for Carmichael numbers. Unless you are lucky and randomly pick a value of that shares a common prime factor with , the Fermat test will always incorrectly identify a Carmichael number as prime. Fortunately Carmichael numbers are rare, even though there are infinitely many of them. In this previous post, we estimate that a randomly selected 1024-bit odd integer has a less than one in chance of being a Carmichael number!
The Fermat test is a powerful test when the number being tested is a prime number or a composite number that has a Fermat witness. For Carmichael numbers, the test is likely to produce a false positive (identifying a composite number as prime). Thus the existence of Carmichael numbers is the biggest weakness of the Fermat test. Fortunately Carmichael numbers are rare. Though they are rare, their existence may still make the Fermat test unsuitable in some situation, e.g., when you test a number provided by your adversary. If you really want to avoid situations like these, you can always switch to the Miller-Rabin test.