Bignum arithmetic notes / 08

Barrett and pseudo-Mersenne reduction

Montgomery reduction is not the only way to avoid division. Reciprocal and special-form reductions trade precomputation for structured arithmetic.

Barrett reduction fixes a modulus \(m\) and precomputes an approximation to \(1/m\). For radix \(B\) and \(k\) limbs, define

\[\mu=\left\lfloor\frac{B^{2k}}{m}\right\rfloor.\]

For \(x<B^{2k}\), estimate \(q\approx \lfloor x/m\rfloor\) using high limbs of \(x\mu\), then compute \(r=x-qm\) and correct by subtracting \(m\) a small bounded number of times.

Barrett contract

Precondition: \(0<m<B^k\) and \(0\le x<B^{2k}\). Postcondition: output \(r\equiv x\pmod m\) and \(0\le r<m\) after correction.

The exact quotient approximation depends on the Barrett variant. The implementation must document which high limbs are used and how many correction subtractions are required.

Example: one fixed RSA public modulus

For repeated public reductions modulo a fixed RSA modulus, Barrett can be useful when operands are public or when a constant-time correction schedule is implemented. For private RSA exponentiation, Montgomery is usually preferable because every multiply is already inside the same odd modulus.

Pseudo-Mersenne primes

If

\[p=2^k-c\]

with small \(c\), then \(2^k\equiv c\pmod p\). A high part can be folded into the low part by multiplying by \(c\).

Example: \(2^{255}-19\)

Write \(x=x_0+x_1 2^{255}\). Modulo \(p=2^{255}-19\),

\[x\equiv x_0+19x_1\pmod p.\]

The fold is cheap, but the representation must leave enough headroom for the multiplication by 19 and subsequent carries.

Example: \(2^{448}-2^{224}-1\)

Here \(2^{448}\equiv 2^{224}+1\pmod p\). A high limb chunk folds into two lower positions. The reduction is still linear, but not a single multiply by a small constant.

Barrett versus Montgomery

Situation Better default
Many products modulo one odd modulus Montgomery
One-off public reduction Barrett or division
Special prime with sparse relation pseudo-Mersenne/Solinas
Secret-dependent correction impossible to bound redesign

C sketch: pseudo-Mersenne fold

/* Conceptual fold for p = 2^255 - 19, not a complete field implementation. */
void fold_25519(limb_t out[16], const limb_t in[32]) {
    /* Split at bit 255, fold high part multiplied by 19, then carry. */
    (void)out;
    (void)in;
}

The omitted details are exactly the important ones: bit 255 cuts through the top 16-bit limb. A 32-bit-only production representation may use uniform 16-bit limbs or another carefully proved sub-32-bit radix, but it cannot rely on a scalar product wider than uint32_t.

SageMath reduction check

p = 2^255 - 19
for x in [0, 1, p-1, p, 2^510 - 1]:
    x0 = x % 2^255
    x1 = x // 2^255
    print((x0 + 19*x1) % p == x % p)