A Fast Large-Integer Extended GCD Algorithm and Hardware Design for Verifiable Delay Functions and Modular Inversion
Speaker: Kavya Sreedhar, PhD Student, Stanford University
Date: May 4, 2022
The extended GCD computation is the most expensive operation in state-of-the-art implementations of two advanced cryptography applications that are gathering increasing interest — verifiable delay functions that involve squaring in class groups, and constant-time modular inversion for elliptic curve cryptography. These application developments are relatively recent, so few papers have investigated XGCD hardware acceleration. In this talk, we explore the large-integer XGCD accelerator design space across several axes: algorithm choice, large-integer arithmetic circuit design, and computation constraints due to various application requirements. While all prior hardware works build from Euclid’s division-based algorithm, we find that building upon Stein’s subtraction algorithm instead is a more promising approach for faster hardware. Our use of carry-save adders and redundant representations enable us to achieve 4+GHz clock frequencies in 16nm. Our 16nm ASIC design calculates 1024-bit XGCD in 294ns (8X faster than the state-of-the-art ASIC) and constant-time 255-bit XGCD for inverses in the field of integers modulo the prime 2^255 - 19 in 87ns (31X faster than state-of-the-art software). We plan to tape out this design in GF12 in September.