43.3.8 PUKCL Requirements and Performance
Services Stack Usage
The library executes its computations on the main core and therefore shares some resources with the application.
The application may need to know how much RAM the library functions use. Note that the library does not use any global variables.
The following table gives the minimum number of bytes that must be available on the stack for each service to execute correctly. With some options, the library may use fewer bytes than the value listed. The values in this table are estimates.
PUKCL Service | STACK Usage (Bytes) |
---|---|
SelfTest | 112 |
ClearFlags | 0 |
Swap | 8 |
Fill | 8 |
CondCopy | 24 |
FastCopy | 16 |
Smult | 16 |
Smult (with reduction) | 88 |
Comp | 8 |
Fmult | 24 |
Fmult (with reduction) | 96 |
Square | 16 |
Square (with reduction) | 88 |
Div | 144 |
GCD | 136 |
RedMod (Setup) | 160 |
RedMod (using fast reduction) | 80 |
RedMod (randomize) | 80 |
RedMod (Normalize) | 80 |
RedMod (Using Division) | 184 |
ExpMod | 200 |
PrimeGen | 416 |
CRT | 304 |
ZpEccAddFast | 104 |
ZpEccAddSubFast | 92 |
ZpEcConvProjToAffine | 280 |
ZpEcConvAffineToProjective | 64 |
ZpEccDblFast | 96 |
ZpEccMulFast | 168 |
ZpEccQuickDualMulFast | 216 |
ZpEcDsaGenerateFast | 392 |
ZpEcDsaVerifyFast | 456 |
ZpEcDsaQuickVerify | 368 |
ZpEcRandomiseCoordinate | 56 |
GF2NEccAddFast | 128 |
GF2NEcConvProjToAffine | 264 |
GF2NEcConvAffineToProjective | 56 |
GF2NEccDblFast | 136 |
GF2NEccMulFast | 208 |
GF2NEcDsaGenerateFast | 376 |
GF2NEcDsaVerifyFast | 440 |
GF2NEcRandomiseCoordinate | 56 |
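
The stack figures above can feed a worst-case stack budget for the application. The following is a minimal sketch, not part of the library: the enum, array, and function names are hypothetical, and only a few rows of the table are copied in for illustration. It returns the largest estimated stack requirement among the services an application actually calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers for a subset of services (not PUKCL names). */
typedef enum {
    SVC_SELF_TEST,
    SVC_EXP_MOD,
    SVC_PRIME_GEN,
    SVC_CRT,
    SVC_ZP_ECDSA_GENERATE_FAST,
    SVC_ZP_ECDSA_VERIFY_FAST,
    SVC_COUNT
} pukcl_service_id;

/* Estimated stack usage in bytes, copied from the table above. */
const uint16_t pukcl_stack_bytes[SVC_COUNT] = {
    [SVC_SELF_TEST]              = 112,
    [SVC_EXP_MOD]                = 200,
    [SVC_PRIME_GEN]              = 416,
    [SVC_CRT]                    = 304,
    [SVC_ZP_ECDSA_GENERATE_FAST] = 392,
    [SVC_ZP_ECDSA_VERIFY_FAST]   = 456,
};

/* Return the largest stack requirement among the listed services,
 * or 0 when the list is empty. */
uint16_t pukcl_worst_case_stack(const pukcl_service_id *used, size_t count)
{
    uint16_t worst = 0;
    for (size_t i = 0; i < count; ++i) {
        assert(used[i] < SVC_COUNT);
        uint16_t need = pukcl_stack_bytes[used[i]];
        if (need > worst) {
            worst = need;
        }
    }
    return worst;
}
```

Because the table values are estimates, an application would normally add its own margin, plus the stack used by the calling code and by interrupts, on top of the value returned.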
Parameter Size Limits for Different Services
The following table lists parameter size limits for different services.
For the services ModExp, PrimeGen, and CRT, additional details are available in the service description.
API | Min/Max Sizes | Comments |
---|---|---|
SelfTest | – | – |
ClearFlags | – | – |
Swap | 4 bytes to 2044 bytes | Per block to be swapped |
Fill | 4 bytes to 4088 bytes | – |
Fast Copy/Clear | 4 bytes to 2044 bytes | Supposing Length(R) = Length(X) |
Conditional Copy/Clear | 4 bytes to 2044 bytes | Supposing Length(R) = Length(X) |
Smult | 4 bytes to 2040 bytes | Supposing Length(R) = Length(X) + 4 Bytes, No Z Parameter, No Reduction |
Compare | 4 bytes to 2044 bytes | Supposing Length(X) = Length(Y) |
FMult | Input: 4 bytes to 1020 bytes; Output: 4 bytes to 2040 bytes | Supposing Length(Y) = Length(X), No Z Parameter, No Reduction |
Square | Input: 4 bytes to 1020 bytes; Output: 4 bytes to 2040 bytes | Supposing No Z Parameter, No Reduction |
Euclidean Division | Divider: 8 to 1016 bytes; Num.: 8 to 2032 bytes | Supposing Length(Num) = 2*Length(Divider) |
Mod. inv. / GCD | 8 to 1012 bytes | – |
ModRed | Modulus: 12 to 1016 bytes; Input: 24 to 2032 bytes | Supposing RBase = XBase |
Fast ModExp Exp in Crypto RAM | 12 to 576 bytes (96 to 4608 bits) | Supposing Length(Exponent) = Length(Modulus), Window Size = 1, with the Exponent in Crypto RAM |
Fast ModExp Exp not in Crypto RAM | 12 to 672 bytes (96 to 5376 bits) | Supposing Length(Exponent) = Length(Modulus), Window Size = 1, with the Exponent not in Crypto RAM |
Prime Gen. | Prime Number: 12 to 448 bytes (96 to 3584 bits) | Supposing Window Size = 1 |
CRT | Modulus = Two Primes; Size of one prime: 24 to 448 bytes; Modulus: 48 to 896 bytes (384 to 7168 bits) | Supposing Length(Exponent) = Length(Modulus), Window Size = 1 |
ECC Addition and Subtraction GF(p) | Modulus: 12 to 308 bytes | – |
ECC Doubling GF(p) | Modulus: 12 to 400 bytes | – |
ECC Multiplication GF(p) | Modulus: 12 to 264 bytes | Supposing Length(Scalar) = Length(Modulus) |
ECC Quick Dual Multiplication GF(p) | Modulus: 12 to 152 bytes | – |
ECDSA Generate GF(p) | Modulus: 12 to 220 bytes (up to 521 bits for common curves) | Supposing Length(Scalar) = Length(Modulus) |
ECDSA Verify GF(p) | Modulus: 12 to 188 bytes (up to 521 bits for common curves) | Supposing Length(Scalar) = Length(Modulus) |
ECC Addition GF(2^n) | Modulus: 12 to 248 bytes | – |
ECC Doubling GF(2^n) | Modulus: 12 to 364 bytes | – |
ECC Multiplication GF(2^n) | Modulus: 12 to 250 bytes | Supposing Length(Scalar) = Length(Modulus) |
ECDSA Generate GF(2^n) | Modulus: 12 to 208 bytes (up to 571 bits for common curves) | Supposing Length(Scalar) = Length(Modulus) |
ECDSA Verify GF(2^n) | Modulus: 12 to 180 bytes (up to 571 bits for common curves) | Supposing Length(Scalar) = Length(Modulus) |
ECDSA Quick Verify GF(2^n) | Modulus: 12 to 140 bytes (up to 571 bits for common curves) | Supposing Length(Scalar) = Length(Modulus) |
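
An application can check a parameter length against these limits before calling a service. The sketch below covers only the two Fast ModExp rows and uses a hypothetical function name. It assumes the conditions stated in the Comments column (Length(Exponent) = Length(Modulus), Window Size = 1) and checks only the byte range. Any other constraint given in the service description is outside its scope.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the modulus length (in bytes) falls inside
 * the Fast ModExp range listed in the table above. The upper bound depends
 * on whether the exponent is placed in Crypto RAM. */
bool expmod_modulus_len_in_range(uint16_t modulus_len_bytes,
                                 bool exponent_in_crypto_ram)
{
    const uint16_t min_len = 12;                                  /* 96 bits */
    const uint16_t max_len = exponent_in_crypto_ram ? 576 : 672;  /* 4608 or 5376 bits */

    return modulus_len_bytes >= min_len && modulus_len_bytes <= max_len;
}
```

For example, a 4096-bit modulus (512 bytes) is in range in both configurations, while a 5120-bit modulus (640 bytes) is in range only when the exponent is kept outside Crypto RAM.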
Service Timing
The values in the following tables are estimated performance figures for a CPU clock of 120 MHz, with the CPU and PUKCC operated at the same frequency. Because parameter values can vary, the measurements are approximate.
Other test conditions:
- PUKCL library data in Crypto RAM
- Test code and test data in SRAM
- ICache and DCache are disabled
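
The timing columns in this section follow from the cycle counts divided by the clock frequency. As a minimal sketch (the function name is hypothetical), the conversion can be written as below. Scaling to a frequency other than 120 MHz assumes the cycle count itself does not change, which may not hold if memory wait states or cache settings differ from the test conditions above.

```c
#include <stdint.h>

/* Convert a cycle count to whole microseconds (truncated) at the given
 * clock frequency. Returns 0 if clock_hz is 0 to avoid dividing by zero. */
uint64_t cycles_to_us(uint64_t cycles, uint32_t clock_hz)
{
    if (clock_hz == 0) {
        return 0;
    }
    return (cycles * 1000000ULL) / clock_hz;
}
```

With the 3.05 MCycles figure listed for RSA 1024 decryption without CRT, `cycles_to_us(3050000, 120000000)` gives 25416 µs, which matches the 25.42 ms in the RSA table.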
Service Timing for RSA
RSA uses the ExpMod service for encryption and decryption. The following tables show the service timing, where ‘W’ indicates the window size.
Operation | Clock Cycles | Timing one block |
---|---|---|
RSA 1024 decryption / signature generation. No CRT, Regular implementation, W=4 | 3.05 MCycles | 25.42 ms |
RSA 1024 decryption / signature generation. With CRT, Regular implementation, W=4 | 1.04 MCycles | 8.67 ms |
RSA 1024 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=3 | 0.07 MCycles | 0.58 ms |
RSA 1024 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=0x10001 | 0.07 MCycles | 0.58 ms |
Operation | Clock Cycles | Timing One block |
---|---|---|
RSA 2048 decryption / signature generation. No CRT, Regular implementation, W=4 | 21.9 MCycles | 182 ms |
RSA 2048 decryption / signature generation. With CRT, Regular implementation, W=4 | 6.19 MCycles | 51.6 ms |
RSA 2048 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=3 | 0.24 MCycles | 2 ms |
RSA 2048 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=0x10001 | 0.24 MCycles | 2 ms |
Operation | Clock Cycles | Timing One block |
---|---|---|
RSA 4096 decryption / signature generation. No CRT, Regular implementation, W=1 | 208 MCycles | 1.73 s |
RSA 4096 decryption / signature generation. With CRT, Regular implementation, W=3 | 45.5 MCycles | 379 ms |
RSA 4096 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=3 | 0.92 MCycles | 7.67 ms |
RSA 4096 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=0x10001 | 0.92 MCycles | 7.67 ms |
Service Timing for Prime Generation
Prime generation uses the PrimeGen service.
Operation | Clock Cycles | Timing One Block |
---|---|---|
Regular Generation of two primes, Prime_Length=512 bits, W=4, Rabin Miller Iterations Number = 3, (average of 200 samples) | Mean = 47.4 MCycles | Mean = 0.40 s |
Regular Generation of two primes, Prime_Length=512 bits, W=4, Rabin Miller Iterations Number = 3, (Standard Deviation for 200 samples) | Std Dev = 30.3 MCycles | Std Dev = 0.25 s |
Regular Generation of two primes, Prime_Length=1024 bits, W=4, Rabin Miller Iterations Number = 3, (average of 200 samples) | Mean = 448 MCycles | Mean = 3.73 s |
Regular Generation of two primes, Prime_Length=1024 bits, W=4, Rabin Miller Iterations Number = 3, (Standard Deviation for 200 samples) | Std Dev = 294 MCycles | Std Dev = 2.45 s |
Regular Generation of two primes, Prime_Length=2048 bits, W=4, Rabin Miller Iterations Number = 3, (average of 200 samples) | Mean = 4.78 GCycles | Mean = 39.8 s |
Regular Generation of two primes, Prime_Length=2048 bits, W=4, Rabin Miller Iterations Number = 3, (Standard Deviation for 200 samples) | Std Dev = 3.05 GCycles | Std Dev = 25.4 s |
Service Timing for ECDSA on Prime Field
In the following table, ECDSA signature generation uses the ZpEcDsaGenerateFast service and signature verification uses the ZpEcDsaQuickVerify service.
Operation | Clock Cycles | Timing One block |
---|---|---|
ECDSA GF(p) 256 Generate Fast | 2.72 MCycles | 22.7 ms |
ECDSA GF(p) 256 Verify Quick W=(6,6) Scalar in Classical RAM | 1.78 MCycles | 14.8 ms |
ECDSA GF(p) 256 Verify Quick W=(4,4) Scalar in PUKCC RAM | 1.83 MCycles | 15.2 ms |
ECDSA GF(p) 384 Generate Fast | 6.28 MCycles | 52.3 ms |
ECDSA GF(p) 384 Verify Quick W=(5,5) Scalar in Classical RAM | 3.93 MCycles | 32.8 ms |
ECDSA GF(p) 384 Verify Quick W=(4,4) Scalar in PUKCC RAM | 4.09 MCycles | 34.1 ms |
ECDSA GF(p) 521 Generate Fast | 13.4 MCycles | 112 ms |
ECDSA GF(p) 521 Verify Quick W=(4,5) Scalar in Classical RAM | 8.4 MCycles | 70.3 ms |
ECDSA GF(p) 521 Verify Quick W=(4,4) Scalar in PUKCC RAM | 8.6 MCycles | 72 ms |
Service Timing for ECDSA on Binary Field
In the following table, ECDSA signature generation uses the GF2NEcDsaGenerateFast service and signature verification uses the GF2NEcDsaVerifyFast service.
Operation | CPU Cycles | Timing One block |
---|---|---|
ECDSA GF(2^n) B283 Generate Fast | 3.21 MCycles | 26.8 ms |
ECDSA GF(2^n) B283 Verify | 6.44 MCycles | 53.5 ms |
ECDSA GF(2^n) B409 Generate Fast | 6.93 MCycles | 57.8 ms |
ECDSA GF(2^n) B409 Verify | 13.8 MCycles | 115 ms |
ECDSA GF(2^n) B571 Generate Fast | 15.1 MCycles | 125 ms |
ECDSA GF(2^n) B571 Verify | 30.1 MCycles | 251 ms |