Boswell at the Zoo

Newby look into RSA

Loose notes on me going through and poorly implementing an RSA encryption scheme. I'll be using Wikipedia - you'll find that my explanation is formatted just like Wikipedias, and in fact the Wikipedia page is certainly much more helpful.

  1. Pick two primes, p and q. These should be very large, the first source I saw said ~2^10 bits, which is around 300 digits. This is to increase the computation needed to 'crack' the encryption, which I'll get to, but also makes intended computation more expensive, so I won't be doing it.

  2. n = p * q

  3. Find λ(n).

    • Related: φ, or Euler's totient function, is a function that takes a positive integer as input, say n, and outputs the number of integers below it with which n is coprime. Two numbers are coprime if they share no factors besides 1. So φ just goes through range(1, n-1) and checks if the two are coprime, if so tally += 1
      • Note: when checking if a and b are coprime, if a or b are prime, then they must be coprime, but they can also be coprime without either being prime.
    • λ, or the Carmichael function also takes a positive integer as input, n, but outputs m such that:
      a^m ≡ 1 (mod n).
      - a is a subset of all number coprime to n, so where φ just takes the length of a, λ performs some checks using the each values as input.
      - mod - mod is an operation, it just returns the remainder after dividing by n, so: 17 mod 5 is 2, because 17/5 = 3 + 2/5.
      - ≡ means =, but you assume that the mod at the end of the line applies to both sides of the equation, so 15 ≡ 9 (mod 6).
      To actually use these definitions, now: get a list of all numbers less and coprime to n, call it a. If you'd like to brute force it: start with m = 1. For each element of a, raise it to the mth power, add one, and check if n can neatly divide this. If so: go to the next element of a. If not, start back at the first element of a, and m += 1. So you're trying to find the smallest m such that (all coprimes of n)^ m (mod n) = 1. So, what does this actually mean? Well, all elements of a are coprime to n, so they do not share any dividers. Raising ai (an element of a) to a power will not change this - 7's factors are 7 and 1 - 7^2 = 49, 49's factors are 1, 7, 49, so -ai ^ m (mod n) will never be 0. It will be exlusivley coprime numbers, though, and I don't know why. More importantly here- it cycles through these coprime numbers as m increases, so say you're looking at 3^m (mod 5).
      At m = 1, 3 mod 5 is 3.
      At m = 2, 9 mod 5 is 4.
      at m = 3, 27 mod 5 is 2.
      at m = 4, 81 mod 5 is 1.
      at m = 5, 243 mod 5 is 3 (again)
      at m = 6, 729 mod 5 is 4 (again)
      at m = 7, 2187 mod 5 is 2 (again)
      etc...
      So, 2 important things about these cycles are their length and the numbers they contain. About all I know about either of these is that they all contain a 1, and so the formula waits for them all to line up and all spit out the 1 at the same time. I have no idea what the significance of this is.
      Note: I'm really lost here. I don't know why it does what it does or why its so important.\
    • Some ways to simplify:
      -λ(n) = lcm(λ℗, λ(q)). I really do not understand why. Clearly there is a pattern in λ that I am totally missing out on. (lcm is lowest common multiple - lcm(12, 8) = 24. at most lcm(a, b) = a * b)
      -Since p and q are each prime, λ℗ = φ℗ (why? i dont know!) = p - 1 (because since p is prime, any value less than it will be coprime with it)
      -If you accept the above, which I've given you no reason to. But. λ(n) = lcm(p-1, q-1), which is simple enough, and can be solved with:
      - Eulidean Algorithm: One of the first recorded algorithms, given two numbers, a and b, such that a > b, GCD(a, b) = GCD(a-b, b). You can keep doing this until a and b are equal, their value will be the GCD of the original a and b. and b. However. With, say, GCD(1, 100000) you'd need to take 100000 steps to get that the GCD is 1. Instead, the modern version uses a % b in place of a - b.\
  4. Choose a value e such that 1 < e < λ(n), and e and λ(n) are coprime. This is a public value, and influential in the efficiency of encoding + decoding.\

  5. Determine d if d is the modular multiplicative inverse of e module λ(n)

    • The modular multiplicative inverse takes 2 arguments - the subject and base of a modulo equation, and outputs one value. For example, the mmi of 3 mod 7 is: \3x mod 7 = 1
      (the 1 is always 1) - and so since 5 * 3 = 15 and 15 mod 7 = 1, the mmi of 3mod7 is 5. Of course, there are many x s such that this is true, but you only consider the one below the mod base, 7 in this case, if there is one which is not a given.
      There are ways to simplify this by modification to the Eulidean Algorithm- it involves keeping track of factors of the numbers you mod. I don't know enough about it to explain any more.

This is all! You now have:
a value n - the big number which is a product of 2 big primes. This will be public
a value e - a big number. Also public.
a value d - the number that satisfies d * e (mod λ(n)) = 1. This is private, and important.
The tdlr here is that you have n and e which you give out freely, and your only proof of self is d, which is derived from n and e! but, thanks to the highlighted text, because you know what n's divisors are, the proccess of finding d is much less expensive than for anyone without that knowledge.

To use this, you broadcast n and e, take your message, encode it to a number in an agreed upon way (no secrecy needed here, this is not a cypher), call that m, then broadcast:
m ^ e (mod n)
The receiver (and maybe others!) will get this message, but only someone who knows the value of d can reverse it cheaply.
if c is the message after being encoded and encrypted:
c ^ d (mod n) = the message!\

This happens constantly when using the internet - up to hundreds a minute with regular browsing according to claude! Each of these would take years of computation to correctly crack, so we're safe for now..\

Some other stuff I read about:
Fermat's Little Theorem: For any prime, p, for any int, a: a^p - a (mod p) = 0 has a nice proof: Take a as the number of letters in an alphabet, and p as the length of combinations of letters you will be making - there are a^p combinations of letters - if you identify letters that are identical, but shifted, eg. AAB and ABA, and group them, the group sizes will be: the length of the shortest string S such that S * n = original string (including itself) - so AAA is in a group of one, as it is A * 3, and len('A') = 1. Now - for any substrings to exist, len(S) must be a divisor of the original length, and since the length of the strings are prime, they can either be that prime, or 1. you can now imagine 2 rectangles - one 1 by a, representing the combinations like AAA or BBB - and one p by ((a^p) - a) / p - representing the group of length p, of which there are ((a^p) - a) / p. (To intuit why the groups are of length p, remember that p is the length, and so every unsimplifiable combo will be able to be cycled one to the right or left, p times). Once you imagine the 2 squares, recognizing that the entire thing is simply a representation of a^p, that the small bit is a and the large bit is p*(that post-hoc length above), it is clear that the large rectangle is susceptible to mod℗, and will leave only a.
Chinese Remainder Theorem: that given an int x and a set of coprimes n {n1, n2, etc}, x (mod (sum(n))) exists, and it appears every n1 * n2 * n3 etc numbers.

Final Thoughts:
This was ultimately wildly unsatisfying. I learned very little staring at these autistic machinations of primes which I should be exploring - maybe RSA is just too complex a subject to broach like this - there were certainly shortfalls from just looking through Wikipedia, although it forced me to work quite a bit more out than I would have otherwise.

Here is implementation in python:

p = 2467
q = 4481
e = 65537
n = p * q

def lcm(a, b):
    if a < 0:
        a = a * -1
    if b < 0:
        b = b * -1
    gcd, _, _ = newEuler(a, b)
    return int(a * b / gcd)

def newEuler(a, b):
    if b == 0:
        return a, 1, 0
    else:
        gcd, x1, y1 = newEuler(b, a % b)
        x = y1
        y = x1 - (a // b) * y1
        return gcd, x, y

lambda_n = lcm(p - 1, q - 1)


gcd, d, _ = newEuler(e, lambda_n)
# make sure d is positive - courtesy of claude
d = d % lambda_n

print(f"Public key: (n={n}, e={e})")
print(f"Private key: d={d}")

m = 800813
c = (m ** e) % n

print(f"Encrypted message: c={c}")

print((c**d) % n)