37.3.8 PUKCL Requirements and Performance
Services Stack Usage
The library runs its computations on the main core and therefore shares some resources with the application.
The application may need to know how much RAM the library functions use. Note that the library does not use any global variables.
The following table gives the minimum number of bytes that must be available on the stack for each service to execute correctly. For some options, a service may use fewer bytes than the number listed. The values in this table are estimates.
PUKCL Service | STACK Usage (Bytes) |
---|---|
SelfTest | 112 |
ClearFlags | 0 |
Swap | 8 |
Fill | 8 |
CondCopy | 24 |
FastCopy | 16 |
Smult | 16 |
Smult (with reduction) | 88 |
Comp | 8 |
Fmult | 24 |
Fmult (with reduction) | 96 |
Square | 16 |
Square (with reduction) | 88 |
Div | 144 |
GCD | 136 |
RedMod (Setup) | 160 |
RedMod (using fast reduction) | 80 |
RedMod (randomize) | 80 |
RedMod (Normalize) | 80 |
RedMod (Using Division) | 184 |
ExpMod | 200 |
PrimeGen | 416 |
CRT | 304 |
ZpEccAddFast | 104 |
ZpEccAddSubFast | 92 |
ZpEcConvProjToAffine | 280 |
ZpEcConvAffineToProjective | 64 |
ZpEccDblFast | 96 |
ZpEccMulFast | 168 |
ZpEccQuickDualMulFast | 216 |
ZpEcDsaGenerateFast | 392 |
ZpEcDsaVerifyFast | 456 |
ZpEcDsaQuickVerify | 368 |
ZpEcRandomiseCoordinate | 56 |
GF2NEccAddFast | 128 |
GF2NEcConvProjToAffine | 264 |
GF2NEcConvAffineToProjective | 56 |
GF2NEccDblFast | 136 |
GF2NEccMulFast | 208 |
GF2NEcDsaGenerateFast | 376 |
GF2NEcDsaVerifyFast | 440 |
GF2NEcRandomiseCoordinate | 56 |
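As an illustration of how the figures above might be used when sizing a stack, the following minimal sketch takes the largest requirement among the services an application calls. It assumes the services are called one after another from the same stack depth, which the table does not state. The function name `stack_budget` and the `margin_bytes` parameter are hypothetical and not part of the PUKCL API.

```python
# Stack usage in bytes, copied from the table above (estimated values).
STACK_BYTES = {
    "SelfTest": 112,
    "ExpMod": 200,
    "PrimeGen": 416,
    "CRT": 304,
    "ZpEcDsaGenerateFast": 392,
    "ZpEcDsaVerifyFast": 456,
}


def stack_budget(services, margin_bytes=0):
    """Return the stack bytes to reserve for the given PUKCL services.

    Assumption: services run sequentially from the same stack depth,
    so only the largest single requirement matters.
    margin_bytes is a hypothetical, application-chosen safety margin.
    """
    unknown = [s for s in services if s not in STACK_BYTES]
    if unknown:
        raise KeyError(f"no stack figure for: {unknown}")
    return max(STACK_BYTES[s] for s in services) + margin_bytes


rsa_and_ecdsa = ["ExpMod", "CRT", "ZpEcDsaGenerateFast", "ZpEcDsaVerifyFast"]
print(stack_budget(rsa_and_ecdsa, margin_bytes=64))  # 456 + 64 = 520
```

The application's own stack depth at the point of the call has to be added on top of this figure.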
Parameter Size Limits for Different Services
The following table lists parameter size limits for different services.
For the services ExpMod, PrimeGen, and CRT, additional details are available in the service description.
API | Min/Max Sizes | Comments |
---|---|---|
SelfTest | — | — |
ClearFlags | — | — |
Swap | 4 bytes to 2044 bytes | Per block to be swapped |
Fill | 4 bytes to 4088 bytes | — |
Fast Copy/Clear | 4 bytes to 2044 bytes | Supposing Length(R) = Length(X) |
Conditional Copy/Clear | 4 bytes to 2044 bytes | Supposing Length(R) = Length(X) |
Smult | 4 bytes to 2040 bytes | Supposing Length(R) = Length(X) + 4 Bytes, No Z Parameter, No Reduction |
Compare | 4 bytes to 2044 bytes | Supposing Length(X) = Length(Y) |
FMult | Input: 4 bytes to 1020 bytes; Output: 4 bytes to 2040 bytes | Supposing Length(Y) = Length(X), No Z Parameter, No Reduction |
Square | Input: 4 bytes to 1020 bytes; Output: 4 bytes to 2040 bytes | Supposing No Z Parameter, No Reduction |
Euclidean Division | Divider: 8 to 1016 bytes; Num.: 8 to 2032 bytes | Supposing Length(Num) = 2*Length(Divider) |
Mod. inv. / GCD | 8 to 1012 bytes | — |
ModRed | Modulus: 12 to 1016 bytes; Input: 24 to 2032 bytes | Supposing RBase = XBase |
Fast ModExp Exp in Crypto RAM | 12 to 576 bytes (96 to 4608 bits) | Supposing Length(Exponent) = Length(Modulus), Window Size = 1, with the Exponent in Crypto RAM |
Fast ModExp Exp not in Crypto RAM | 12 to 672 bytes (96 to 5376 bits) | Supposing Length(Exponent) = Length(Modulus), Window Size = 1, with the Exponent not in Crypto RAM |
Prime Gen. | Prime Number: 12 to 448 bytes (96 to 3584 bits) | Supposing Window Size = 1 |
CRT | Modulus = Two Primes: Size of one prime from 24 to 448 bytes; Modulus = from 48 to 896 bytes (384 to 7168 bits) | Supposing Length(Exponent) = Length(Modulus), Window Size = 1 |
ECC Addition and Subtraction GF(p) | Modulus: 12 to 308 bytes | — |
ECC Doubling GF(p) | Modulus: 12 to 400 bytes | — |
ECC Multiplication GF(p) | Modulus: 12 to 264 bytes | Supposing Length(Scalar) = Length(Modulus) |
ECC Quick Dual Multiplication GF(p) | Modulus: 12 to 152 bytes | — |
ECDSA Generate GF(p) | Modulus: 12 to 220 bytes (up to 521 bits for common curves) | Supposing Length(Scalar) = Length(Modulus) |
ECDSA Verify GF(p) | Modulus: 12 to 188 bytes (up to 521 bits for common curves) | Supposing Length(Scalar) = Length(Modulus) |
ECC Addition GF(2^n) | Modulus: 12 to 248 bytes | — |
ECC Doubling GF(2^n) | Modulus: 12 to 364 bytes | — |
ECC Multiplication GF(2^n) | Modulus: 12 to 250 bytes | Supposing Length(Scalar) = Length(Modulus) |
ECDSA Generate GF(2^n) | Modulus: 12 to 208 bytes (up to 571 bits for common curves) | Supposing Length(Scalar) = Length(Modulus) |
ECDSA Verify GF(2^n) | Modulus: 12 to 180 bytes (up to 571 bits for common curves) | Supposing Length(Scalar) = Length(Modulus) |
ECDSA Quick Verify GF(2^n) | Modulus: 12 to 140 bytes (up to 571 bits for common curves) | Supposing Length(Scalar) = Length(Modulus) |
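A quick range check against a few of the limits above might look like the following sketch. It only compares a modulus size with the tabulated minimum and maximum under the conditions given in the Comments column. It does not model any alignment, padding, or window-size rules from the individual service descriptions. The names `MODULUS_LIMITS`, `bits_to_bytes`, and `modulus_fits` are hypothetical helpers, not part of the library.

```python
# (min_bytes, max_bytes) for the modulus, copied from the table above.
MODULUS_LIMITS = {
    "Fast ModExp, exponent in Crypto RAM": (12, 576),
    "Fast ModExp, exponent not in Crypto RAM": (12, 672),
    "CRT": (48, 896),
    "ECDSA Generate GF(p)": (12, 220),
    "ECDSA Verify GF(p)": (12, 188),
}


def bits_to_bytes(bits):
    """Round a bit length up to whole bytes."""
    return (bits + 7) // 8


def modulus_fits(service, modulus_bits):
    """Return True if the modulus size lies within the tabulated range."""
    lo, hi = MODULUS_LIMITS[service]
    return lo <= bits_to_bytes(modulus_bits) <= hi


# A 4096-bit modulus is 512 bytes, which is inside the 12..576 byte range.
print(modulus_fits("Fast ModExp, exponent in Crypto RAM", 4096))
# An 8192-bit modulus is 1024 bytes, which exceeds the 896-byte CRT limit.
print(modulus_fits("CRT", 8192))
```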
Service Timing
The values in the following tables are estimated performance figures for a CPU clock of 64 MHz, with the CPU and the PUKCC operated at the same frequency. Because parameter values can vary, the measurements are approximate.
Other test conditions:
- PUKCL library data in Crypto RAM
- Test code and test data in SRAM
- ICache and DCache are disabled
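The cycle counts and times in the tables are related through the 64 MHz clock stated above. The following minimal sketch shows that conversion. The helper name `cycles_to_ms` is hypothetical, and the `clock_hz` parameter is included only for illustration: whether the cycle counts hold at other frequencies is not stated here.

```python
CPU_HZ = 64_000_000  # CPU and PUKCC clock used for the measurements


def cycles_to_ms(cycles, clock_hz=CPU_HZ):
    """Convert a cycle count to milliseconds at the given clock."""
    return cycles * 1000.0 / clock_hz


# 3.05 MCycles at 64 MHz is roughly 47.66 ms.
print(cycles_to_ms(3.05e6))
```

A result computed this way can differ slightly from the tabulated time, because the cycle counts in the tables are given to only about three significant digits.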
Service Timing for RSA
RSA uses the ExpMod service for encryption and decryption. The following tables show service timing, where ‘W’ indicates the window size.
Operation | Clock Cycles | Timing one block |
---|---|---|
RSA 1024 decryption / signature generation. No CRT, Regular implementation, W=4 | 3.05 MCycles | 47.799 ms |
RSA 1024 decryption / signature generation. With CRT, Regular implementation, W=4 | 1.09 MCycles | 17.109 ms |
RSA 1024 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=3 | 0.07 MCycles | 1.141 ms |
RSA 1024 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=0x10001 | 0.07 MCycles | 1.129 ms |
Operation | Clock Cycles | Timing One block |
---|---|---|
RSA 2048 decryption / signature generation. No CRT, Regular implementation, W=4 | 21.6 MCycles | 338.249 ms |
RSA 2048 decryption / signature generation. With CRT, Regular implementation, W=4 | 6.36 MCycles | 99.408 ms |
RSA 2048 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=3 | 0.24 MCycles | 3.843 ms |
RSA 2048 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=0x10001 | 0.24 MCycles | 3.827 ms |
Operation | Clock Cycles | Timing One block |
---|---|---|
RSA 4096 Decryption / signature generation. No CRT, Regular implementation, W=1 | 209 MCycles | 3.2742s |
RSA 4096 Decryption / signature generation. With CRT, Regular implementation, W=3 | 46.1 MCycles | 720.95 ms |
RSA 4096 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=3 | 0.91 MCycles | 14.346 ms |
RSA 4096 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=0x10001 | 0.91 MCycles | 14.337 ms |
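To compare the private-key rows with and without CRT, the ratio of the cycle counts can be computed directly from the tables. The sketch below does only that. The names are hypothetical. Note that the RSA 4096 rows use different window sizes (W=1 without CRT, W=3 with CRT), so that ratio reflects more than the CRT alone.

```python
# (no-CRT MCycles, with-CRT MCycles) for decryption / signature generation,
# copied from the RSA tables above.
RSA_PRIVATE_MCYCLES = {
    1024: (3.05, 1.09),
    2048: (21.6, 6.36),
    4096: (209.0, 46.1),
}


def crt_speedup(key_bits):
    """Return the no-CRT to CRT cycle ratio for the given key size."""
    plain, crt = RSA_PRIVATE_MCYCLES[key_bits]
    return plain / crt


for bits in sorted(RSA_PRIVATE_MCYCLES):
    print(f"RSA {bits}: CRT is about {crt_speedup(bits):.1f}x faster")
```

With the tabulated values, the ratios come out at roughly 2.8, 3.4, and 4.5 for 1024, 2048, and 4096 bits respectively.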
Service Timing for Prime Generation
Prime generation uses the PrimeGen service.
Operation | Clock Cycles | Timing One Block |
---|---|---|
Regular Generation of two primes, Prime_Length=512 bits, W=4, Rabin Miller Iterations Number = 3, (average of 200 samples) | Mean = 47.4 MCycles | Mean = 0.74s |
Regular Generation of two primes, Prime_Length=512 bits, W=4, Rabin Miller Iterations Number = 3, (Standard Deviation for 200 samples) | Std Dev = 30.3 MCycles | Std Dev = 0.47s |
Regular Generation of two primes, Prime_Length=1024 bits, W=4, Rabin Miller Iterations Number = 3, (average of 200 samples) | Mean = 419.71 MCycles | Mean = 6.558s |
Regular Generation of two primes, Prime_Length=1024 bits, W=4, Rabin Miller Iterations Number = 3, (Standard Deviation for 200 samples) | Std Dev = 294 MCycles | Std Dev = 4.59s |
Regular Generation of two primes, Prime_Length=2048 bits, W=4, Rabin Miller Iterations Number = 3, (average of 200 samples) | Mean = 4.78 GCycles | Mean = 74.68s |
Regular Generation of two primes, Prime_Length=2048 bits, W=4, Rabin Miller Iterations Number = 3, (Standard Deviation for 200 samples) | Std Dev = 3.05 GCycles | Std Dev = 47.65s |
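Prime generation time varies widely from run to run, which the standard deviations above make visible. The following sketch recomputes the times from the tabulated cycle counts at 64 MHz and expresses the spread relative to the mean. The names `PRIMEGEN_CYCLES` and `summarize` are hypothetical, and the relative spread is a derived figure, not one given in the table.

```python
CPU_HZ = 64_000_000  # clock used for the measurements

# (mean cycles, standard deviation in cycles) for generating two primes,
# copied from the table above, keyed by prime length in bits.
PRIMEGEN_CYCLES = {
    512: (47.4e6, 30.3e6),
    1024: (419.71e6, 294e6),
    2048: (4.78e9, 3.05e9),
}


def summarize(prime_bits):
    """Return mean and std dev in seconds, plus std dev / mean."""
    mean, std = PRIMEGEN_CYCLES[prime_bits]
    return {
        "mean_s": mean / CPU_HZ,
        "std_s": std / CPU_HZ,
        "rel_spread": std / mean,
    }


for bits in sorted(PRIMEGEN_CYCLES):
    s = summarize(bits)
    print(f"{bits} bits: mean {s['mean_s']:.2f} s, "
          f"std dev {s['std_s']:.2f} s, ratio {s['rel_spread']:.2f}")
```

For all three prime lengths the standard deviation is roughly 0.64 to 0.70 of the mean, so a single run can take considerably longer than the average.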
Service Timing for ECDSA on Prime Field
In the following table, ECDSA signature generation uses the ZpEcDsaGenerateFast service and signature verification uses the ZpEcDsaQuickVerify service.
Operation | Clock Cycles | Timing One block |
---|---|---|
ECDSA GF(p) 256 Generate Fast | 2.67 MCycles | 41.864 ms |
ECDSA GF(p) 256 Verify Quick W=(4,4) Scalar in PUKCC RAM | 1.84 MCycles | 28.888 ms |
ECDSA GF(p) 384 Generate Fast | 6.18 MCycles | 96.712 ms |
ECDSA GF(p) 384 Verify Quick W=(4,4) Scalar in PUKCC RAM | 4.15 MCycles | 64.868 ms |
ECDSA GF(p) 521 Generate Fast | 13.36 MCycles | 208.869 ms |
ECDSA GF(p) 521 Verify Quick W=(4,4) Scalar in PUKCC RAM | 8.81 MCycles | 137.711 ms |
Service Timing for ECDSA on Binary Field
In the following table, ECDSA signature generation uses the GF2NEcDsaGenerateFast service and signature verification uses the GF2NEcDsaVerifyFast service.
Operation | Clock Cycles | Timing One block |
---|---|---|
ECDSA GF(2^n) B283 Generate Fast | 3.21 MCycles | 50.301 ms |
ECDSA GF(2^n) B283 Verify | 6.40 MCycles | 100.150 ms |
ECDSA GF(2^n) B409 Generate Fast | 6.94 MCycles | 108.554 ms |
ECDSA GF(2^n) B409 Verify | 13.73 MCycles | 214.571 ms |
ECDSA GF(2^n) B571 Generate Fast | 15.08 MCycles | 235.704 ms |
ECDSA GF(2^n) B571 Verify | 30.07 MCycles | 469.972 ms |