# Factorization versus primality testing

Let $n$ be a large positive integer whose “prime versus composite” status is not known. One way to determine whether $n$ is prime or composite is to factor $n$ into its prime factors: if there is a non-trivial factor (one that is neither 1 nor $n$), then $n$ is composite; otherwise $n$ is prime. This may sound like a reasonable approach to primality testing, i.e. checking whether a number is prime or composite. In reality, factoring and primality testing, though related, are very different problems. For a very large number (e.g. one with at least 300 decimal digits), it is possible that, even with the state of the art in computing, factoring it may take more than a few million years. On the other hand, it takes a modern computer less than a second to determine whether a 300-digit number is prime or composite. Interestingly, this disparity is one reason that RSA works as a practical and secure cryptosystem. In this post, we use the RSA cryptosystem as an example to give a sense that factoring is a “hard” problem while primality testing is an “easy” problem. The primality test used in the examples is the Fermat primality test.

____________________________________________________________________________

The brute force approach

There is a natural and simple approach to factoring: trial division. To factor the number $n$, we divide $n$ by every integer $a$ in the range $1 < a < n$. Once a factor $a$ is found, we repeat the process with the complementary factor $\frac{n}{a}$ until all the prime factors of $n$ are found. This is simple in concept and is sure to produce the correct answer. For applications in cryptography, however, this brute force approach is essentially useless: the time required to try every candidate factor is prohibitively huge, possibly more than the age of the universe.

The brute force approach can be improved upon slightly by doing the trial divisions using candidate factors up to $\sqrt{n}$. It is well known that every composite integer $n > 1$ has a prime divisor $d$ such that $1 < d \le \sqrt{n}$. So instead of dividing $n$ by every number $a$ with $1 < a < n$, we can divide $n$ by every prime number $a$ with $1 < a \le \sqrt{n}$. But even this improved brute force approach still takes too long to be practical.

Let’s look at a quick example of brute force factoring. Let $n=93638243$. Note that $\sqrt{n}=\sqrt{93638243} \approx 9676$. There are 1192 odd primes less than 9676. Dividing $n$ by these primes, we stop at 127 and obtain $n=93638243=127 \cdot 737309$. We now focus attention on $737309$. Note that $\sqrt{737309} \approx 858.67$ and there are 147 odd primes less than 858. Dividing $737309$ by these 147 candidate factors, we find that none of them is a factor. We can conclude that $737309$ is prime. Thus the factorization is $n=93638243=127 \cdot 737309$.
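The trial division procedure can be sketched in a few lines of Python (a minimal illustration; for simplicity it divides by every odd candidate rather than only primes, and the function name `trial_division` is ours):

```python
def trial_division(n):
    """Factor n by trial division: divide out 2, then try odd
    candidates up to the square root of the remaining cofactor."""
    factors = []
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    d = 3
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 2
    if n > 1:
        factors.append(n)  # the remaining cofactor is prime
    return factors

print(trial_division(127 * 737309))  # [127, 737309]
```

Running it on the example product recovers both prime factors almost instantly; on a 1024-bit modulus, the same loop would never finish.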

____________________________________________________________________________

Example of RSA

RSA is a public-key cryptosystem and is widely used for secure data transmission. The RSA public key consists of two parts. One is the modulus, which is the product of two distinct prime factors. Suppose the modulus is called $N$ and we have $N=pq$ where $p$ and $q$ are distinct prime numbers. How large does $N$ have to be? The larger $N$ is, the more secure RSA is. The current practice is that for corporate use the modulus is at least a 1024-bit number (the bit size is called the key length). If the data is extra sensitive or needs to be retained for a long time, then a larger key length should be used (e.g. 2048 bits). With a 1024-bit modulus $N=pq$, each prime factor is a 512-bit number. The other part of the RSA public key is the encryption key, an integer $e$ that is relatively prime to the integer $(p-1) \cdot (q-1)$.

Let’s say we want to generate a 1024-bit modulus. There are two challenges with a key of this size. One is that a reliable way is needed to obtain two prime numbers that are 512 bits long: given a large integer that is at least 512 bits long, how do we determine reliably that it is prime? Is it feasible to factor a 512-bit integer using the brute force approach? The other challenge is from the perspective of an attacker: successfully factoring the 1024-bit modulus would break RSA and allow the attacker to read the secret message. Let’s look at the problem of the attacker trying to factor a 1024-bit number. A 1024-bit number is approximately $2^{1024}$. The following calculation converts it to decimal:

$\displaystyle 2^{1024}=(10^{\text{log(2)}})^{1024} \approx 10^{308.25}$

We use $\text{log}(x)$ to denote the base-10 logarithm. Note that $1024 \cdot \text{log}(2) \approx 308.25$. So a 1024-bit number has over 300 decimal digits.
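This bit-to-digit conversion is easy to check with a short calculation (a quick sketch using Python's standard `math` module):

```python
import math

bits = 1024
digits = bits * math.log10(2)  # approximate decimal digit count of a 1024-bit number
print(round(digits, 2))        # 308.25
```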

Let’s see what the challenge is if you want to factor a 1024-bit number. Suppose your chosen large number $n$ is such that $n \approx 10^{308}$. Note that $\sqrt{10^{308}}=10^{154}$. According to the improved brute force approach described above, in effect you will need to divide $n$ by every prime number less than $10^{154}$.

Now let’s get an estimate on the number of prime numbers less than $10^{154}$. According to the prime number theorem, the number of prime numbers at most $x$ is approximately

$\displaystyle \pi(x) \approx \frac{x}{\ln x}$

where $\pi(x)$ is the number of primes at most $x$. Then $\pi(10^{154}) \approx 2.82 \cdot 10^{151}$. This is certainly a lot of prime numbers to check.
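The estimate $\pi(10^{154}) \approx 2.82 \cdot 10^{151}$ can be reproduced directly from the prime number theorem approximation (a quick sketch; ordinary floating point suffices since we only need two significant figures):

```python
import math

x = 10**154
pi_estimate = x / math.log(x)  # prime number theorem: pi(x) ~ x / ln(x)
print(f"{pi_estimate:.3g}")    # about 2.82e+151
```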

It is hard to comprehend such large numbers, so let’s put this into perspective. The world population is currently about 7 billion. Suppose each person in the world possesses a supercomputer that can check $10^{40}$ prime numbers per second (i.e. check whether they are factors of the number $n$). This scenario clearly far exceeds the computing resources currently available, but suppose it anyway. Then in each second, the following is the number of prime numbers that can be checked by the 7 billion supercomputers.

$\displaystyle 7 \cdot 10^9 \cdot 10^{40}=7 \cdot 10^{49} \text{ prime numbers per second}$

The following is the number of seconds it will take to check $2.82 \cdot 10^{151}$ many prime numbers:

$\displaystyle \frac{2.82 \cdot 10^{151}}{7 \cdot 10^{49}} \approx 4 \cdot 10^{101} \text{ seconds}$

The universe is estimated to be about 13 billion years old. The following calculation converts it to seconds.

$\displaystyle 13 \text{ billion years}=13 \cdot 10^9 \cdot 365 \cdot 24 \cdot 3600 \approx 4 \cdot 10^{17} \text{ seconds}$

With 7 billion fast supercomputers (one for each person in the world) running for the entire life of the universe, you can only finish checking

$\displaystyle \frac{4 \cdot 10^{17}}{4 \cdot 10^{101}}=\frac{1}{10^{84}}$

of the $2.82 \cdot 10^{151}$ prime numbers. Note that $\frac{1}{10^{84}}$ is a vanishingly small fraction of 1%. So even by devoting the entire life of the universe to running the 7 billion supercomputers, each checking $10^{40}$ candidate prime factors per second, you would not even make a dent in the problem!
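The entire back-of-the-envelope chain above can be reproduced in a few lines (a sketch using the same round figures as the text):

```python
import math

primes_to_check = 10**154 / math.log(10**154)      # ~2.82e151 primes, by the PNT
checks_per_sec = 7e9 * 1e40                        # 7 billion machines, 1e40 checks/sec each
seconds_needed = primes_to_check / checks_per_sec  # ~4e101 seconds
age_of_universe = 13e9 * 365 * 24 * 3600           # ~4e17 seconds

fraction_done = age_of_universe / seconds_needed   # ~1e-84 of the work finished
print(f"{seconds_needed:.0e} seconds needed; fraction finished: {fraction_done:.0e}")
```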

The security of RSA rests on the apparent difficulty of factoring large numbers. If the modulus $N=pq$ can be factored, then an eavesdropper can obtain the private key from the public key and be able to read the message. The difficulty in factoring means there is a good chance that RSA is secure. In order to break RSA, an attacker would probably have to explore other possible vulnerabilities instead of factoring the modulus.

By carrying out a similar calculation, we can see that factoring a 512-bit number by brute force is also not feasible. Thus in the RSA key generation process, it is not feasible to use factoring as a way to test primality. The alternative is to use efficient primality tests such as the Fermat test or the Miller-Rabin test. The computation in these tests is based on the fast powering algorithm, which is very efficient.

____________________________________________________________________________

The story told by RSA numbers

The required time of more than the life of the universe as discussed above is based on the naïve brute force approach to factoring. There are other factoring methods that are far more efficient, e.g. the quadratic sieve algorithm and the number field sieve algorithm (including its general form, the general number field sieve). For these methods, with ample computing resources at the ready, factoring a 1024-bit or 2048-bit number may not take the entire life of the universe but may take decades or more. Even with these better methods, the disparity between slow factoring and fast primality testing is still very pronounced and dramatic.

The best evidence of slow factoring, even with modern methods, comes from the RSA numbers. The RSA numbers are part of the RSA Factoring Challenge, which was created in 1991 to foster research in computational number theory and the practical difficulty of factoring large integers. The challenge was declared inactive in 2007. The effort behind the successful factorization of some of these numbers gives us an idea of the monumental challenges in factoring large numbers.

There are 54 RSA numbers in the challenge, ranging from 330 bits to 2048 bits long (100 to 617 decimal digits). Each of these numbers is a product of two prime numbers. Of these 54 numbers, 18 had been successfully factored as of the writing of this post. The factorizations were all massive efforts involving large groups of volunteers (in some cases using hundreds or thousands of computers), spanning months or years. Among the methods used are the quadratic sieve algorithm and the number field sieve algorithm.

The largest RSA number that has been successfully factored is RSA-768, which is 768 bits long and has 232 decimal digits (completed in December 2009). The method used was the number field sieve, and the effort had four main steps. The first step was polynomial selection, which took half a year on 80 processors. The second step was sieving, which took almost two years on many hundreds of machines; on a single-core 2.2 GHz AMD Opteron processor with 2 GB RAM, the sieving alone would have taken about 1500 years! The third step was the matrix step, which took a couple of weeks on a few processors. The final step, which involved a great deal of debugging, took a few days.

The number field sieve is the fastest known method for factoring large numbers that are a product of two primes (i.e. RSA moduli). The effort that went into factoring RSA-768 was massive, involving years of complicated calculations and processing. And this was only a medium-sized number on the list!

Another interesting observation concerns the RSA numbers that have not been factored yet. There are 36 unfactored numbers in the list. One indication that RSA is secure in the current environment is that the larger numbers in the list remain unfactored (e.g. RSA-1024, which is 1024 bits long). Successful factorization of these numbers would have important security implications for RSA. The largest number on the list is RSA-2048, which is 2048 bits long and has 617 decimal digits. It is widely believed that RSA-2048 will stay unfactored in the decades to come, barring any dramatic and significant advance in computing technology.

The factoring challenge for the RSA numbers certainly provides empirical evidence that factoring is hard. Of course, no one should be complacent. We should not assume that factoring will always be hard; technology will continue to improve. A 768-bit RSA modulus was once considered secure. With the successful factorization of RSA-768, a key size of 768 bits is no longer considered secure. Currently a 1024-bit key size is considered secure. The RSA number RSA-1024 could very well be broken within the next decade.

There could be new advances in factoring algorithms too. A problem that is thought to be hard may eventually turn out to be easy. Just because everyone thinks there is no fast way of factoring does not mean that no such method exists. It is possible that someone has discovered such a method but keeps it secret in order to maintain the advantage. Beyond the issue of factoring, there could be other vulnerabilities in RSA that attackers can explore and exploit.

____________________________________________________________________________

Fermat primality test

We now give some examples showing that primality testing is a much better approach than factoring if the goal is only to check the “prime or composite” status. We use the Fermat primality test as an example.

Example 1
Let $n=15144781$. This is a small number, so factoring would be practical as a primality test; we use it to illustrate the point that the “prime or composite” status can be determined without factoring. One option is to use Fermat’s little theorem (hence the name Fermat primality test):

Fermat’s little theorem
If $n$ is a prime number and if $a$ is an integer that is relatively prime to $n$, then $a^{n-1} \equiv 1 \ (\text{mod} \ n)$.

Consider the contrapositive of the theorem. If we can find an $a$, relatively prime to $n$, such that $a^{n-1} \not \equiv 1 \ (\text{mod} \ n)$, then we know for sure that $n$ is not prime. Such a value of $a$ is said to be a Fermat witness for the compositeness of $n$.

If a Fermat witness is found, then we can say conclusively that $n$ is composite. On the other hand, if $a$ is relatively prime to $n$ and $a^{n-1} \equiv 1 \ (\text{mod} \ n)$, then $n$ is probably a prime. We can then declare $n$ is prime or choose to run the test for a few more random values of $a$.

The exponentiation $a^{n-1} \ (\text{mod} \ n)$ is done using the fast powering algorithm, which involves a series of squarings and multiplications. Even for large moduli, the computer implementation of this algorithm is fast and efficient.
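A minimal sketch of the square-and-multiply idea follows; note that Python's built-in three-argument `pow` already performs modular exponentiation efficiently, so the hand-rolled `power_mod` below is purely illustrative:

```python
def power_mod(a, e, n):
    """Compute a**e mod n by repeated squaring (fast powering)."""
    result = 1
    a %= n
    while e > 0:
        if e & 1:                     # if the current binary digit of e is 1,
            result = result * a % n   # multiply the current square into the result
        a = a * a % n                 # square the base
        e >>= 1                       # move to the next binary digit of e
    return result

print(power_mod(2, 90, 13) == pow(2, 90, 13))  # True
```

The number of squarings is the bit length of the exponent, which is why the test remains fast even for 1024-bit moduli.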

Let’s try some value of $a$, say $a=2$. Using an online calculator, we have

$2^{15144780} \equiv 1789293 \not \equiv 1 \ (\text{mod} \ 15144781)$

In this case, one congruence calculation tells us that $n=15144781$ is not prime (if it were prime, the calculation would produce a value of one). It turns out that $n=15144781$ is a product of two primes: $n=15144781=3733 \cdot 4057$. Of course, this is not a secure modulus for RSA; the current consensus is to use a modulus that is at least 1024 bits long.
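This single-witness check is easy to reproduce with Python's built-in modular exponentiation:

```python
n = 15144781
residue = pow(2, n - 1, n)  # Fermat test with base a = 2
print(residue != 1)         # True: 2 is a Fermat witness, so n is composite
print(n == 3733 * 4057)     # True: the factorization quoted in the text
```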

Example 2
Let $n=15231691$. This is also a small number (in relation to what is required for RSA); once again this is an illustrative example. We calculate $a^{15231690} \ (\text{mod} \ 15231691)$ for $a=2,3,4,5,6,7$, the first few values of $a$. All of these congruence values are one, so we suspect that $n=15231691$ may be prime. We then randomly choose 20 values of $a$ and compute $a^{15231690} \ (\text{mod} \ 15231691)$. The following shows the results.

$\left[\begin{array}{rrr} a & \text{ } & a^{n-1} \ \text{mod} \ n \\ \text{ } & \text{ } & n=15,231,691 \\ \text{ } & \text{ } & \text{ } \\ 3,747,236 & \text{ } & 1 \\ 370,478 & \text{ } & 1 \\ 12,094,560 & \text{ } & 1 \\ 705,835 & \text{ } & 1 \\ 10,571,714 & \text{ } & 1 \\ 15,004,366 & \text{ } & 1 \\ 12,216,046 & \text{ } & 1 \\ 10,708,300 & \text{ } & 1 \\ 6,243,738 & \text{ } & 1 \\ 1,523,626 & \text{ } & 1 \\ 10,496,554 & \text{ } & 1 \\ 10,332,033 & \text{ } & 1 \\ 10,233,123 & \text{ } & 1 \\ 3,996,691 & \text{ } & 1 \\ 4,221,958 & \text{ } & 1 \\ 3,139,943 & \text{ } & 1 \\ 1,736,767 & \text{ } & 1 \\ 12,672,150 & \text{ } & 1 \\ 12,028,143 & \text{ } & 1 \\ 8,528,642 & \text{ } & 1 \end{array}\right]$

For all 20 random values of $a$, $a^{15231690} \equiv 1 \ (\text{mod} \ 15231691)$. This represents strong evidence (though not absolute proof) that $n=15231691$ is a prime. In fact, we can attach the following probability statement to the above table of 20 random values of $a$.

If $n=15231691$ were a composite number that has at least one Fermat witness, there is at most a 0.0000953674% chance that 20 randomly selected values of $a$ are not Fermat witnesses.

In other words, if $n=15231691$ were a composite number that has at least one Fermat witness, there is at most a 0.0000953674% chance of getting 20 1’s in the above computation.

In general, if $n$ has at least one Fermat witness, the probability that $k$ randomly selected values of $a$ with $1 < a < n$ are all not Fermat witnesses is at most $0.5^k$. For $k=20$, $0.5^{20} \approx 0.000000953674$, which is 0.0000953674%. This probability statement should give us enough confidence to consider $n=15231691$ a prime number.
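The whole procedure, trying $k$ random bases and declaring “probably prime” only if none is a witness, can be sketched as follows (the function name `fermat_test` and the optional `bases` parameter are our own choices):

```python
import random

def fermat_test(n, k=20, bases=None):
    """Fermat primality test: False if a Fermat witness is found,
    True ("probably prime") if all k bases pass."""
    if n < 4:
        return n in (2, 3)
    if bases is None:
        bases = [random.randrange(2, n - 1) for _ in range(k)]
    for a in bases:
        if pow(a, n - 1, n) != 1:
            return False  # a is a Fermat witness: n is definitely composite
    return True           # probably prime (error probability at most 0.5**k)

print(fermat_test(15144781, bases=[2]))            # False: 2 is a witness
print(fermat_test(15231691, bases=[2, 3, 5, 7]))   # True: all bases pass
```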

There is a caveat that has to be mentioned. For the above probability statement to be valid, the number $n$ must have at least one Fermat witness. If a number $n$ is composite, we would like the test to produce a Fermat witness. It turns out that there are composite numbers that have no Fermat witnesses; these are called Carmichael numbers. If $n$ is such a number, then $a^{n-1} \equiv 1 \ (\text{mod} \ n)$ for every $a$ that is relatively prime to $n$. In other words, unless you are lucky enough to randomly pick a value of $a$ that shares a common prime factor with $n$, the Fermat test will always incorrectly identify a Carmichael number $n$ as prime. Fortunately Carmichael numbers are rare, even though there are infinitely many of them. In a previous post, we estimated that a randomly selected 1024-bit odd integer has a less than one in $10^{88}$ chance of being a Carmichael number!
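The smallest Carmichael number, $561 = 3 \cdot 11 \cdot 17$, illustrates the caveat concretely: every base coprime to 561 passes the Fermat test even though 561 is composite (a quick Python check):

```python
from math import gcd

n = 561  # the smallest Carmichael number
print(n == 3 * 11 * 17)  # True
# Every base coprime to n passes the Fermat test, despite n being composite:
liars = all(pow(a, n - 1, n) == 1 for a in range(2, n) if gcd(a, n) == 1)
print(liars)  # True
# A base sharing a prime factor with n does reveal compositeness:
print(pow(3, n - 1, n) == 1)  # False
```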

The Fermat test is a powerful test when the number being tested is either a prime number or a composite number that has a Fermat witness. For Carmichael numbers, the test is likely to produce a false positive (identifying a composite number as prime); the existence of Carmichael numbers is thus the biggest weakness of the Fermat test. Though Carmichael numbers are rare, their existence may still make the Fermat test unsuitable in some situations, e.g. when you test a number provided by an adversary. If you want to avoid such situations, you can always switch to the Miller-Rabin test.

____________________________________________________________________________

$\copyright \ 2014 \text{ by Dan Ma}$