37.3.8 PUKCL Requirements and Performance

Services Stack Usage

The library executes its computations on the main core and therefore shares some resources with the application.

The application may need to know how much RAM the library functions use. The library does not use any global variables.

The following table gives the minimum number of bytes that must be available on the stacks for each service to execute correctly. For some options, a service may use fewer bytes than the number listed. The values in this table are estimates.

Table 37-112. Services Stack Usage
PUKCL Service STACK Usage (Bytes)
SelfTest 112
ClearFlags 0
Swap 8
Fill 8
CondCopy 24
FastCopy 16
Smult 16
Smult (with reduction) 88
Comp 8
Fmult 24
Fmult (with reduction) 96
Square 16
Square (with reduction) 88
Div 144
GCD 136
RedMod (Setup) 160
RedMod (using fast reduction) 80
RedMod (randomize) 80
RedMod (Normalize) 80
RedMod (Using Division) 184
ExpMod 200
PrimeGen 416
CRT 304
ZpEccAddFast 104
ZpEccAddSubFast 92
ZpEcConvProjToAffine 280
ZpEcConvAffineToProjective 64
ZpEccDblFast 96
ZpEccMulFast 168
ZpEccQuickDualMulFast 216
ZpEcDsaGenerateFast 392
ZpEcDsaVerifyFast 456
ZpEcDsaQuickVerify 368
ZpEcRandomiseCoordinate 56
GF2NEccAddFast 128
GF2NEcConvProjToAffine 264
GF2NEcConvAffineToProjective 56
GF2NEccDblFast 136
GF2NEccMulFast 208
GF2NEcDsaGenerateFast 376
GF2NEcDsaVerifyFast 440
GF2NEcRandomiseCoordinate 56
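
The stack figures above can be folded into a simple start-up check. The following is a minimal sketch, not part of the PUKCL API: the type and function names are hypothetical, only a few table rows are copied in, and it assumes that services are called one at a time from a single context, so the budget is the largest single entry rather than the sum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: one row of the stack-usage table. */
typedef struct {
    const char *service;     /* service name as printed in the table */
    uint16_t    stack_bytes; /* estimated stack usage in bytes */
} pukcl_stack_entry_t;

/* A few rows copied from Table 37-112; extend with the services in use. */
static const pukcl_stack_entry_t k_stack_table[] = {
    { "SelfTest",            112 },
    { "ExpMod",              200 },
    { "PrimeGen",            416 },
    { "CRT",                 304 },
    { "ZpEcDsaGenerateFast", 392 },
    { "ZpEcDsaVerifyFast",   456 },
};

/* Returns the largest table value among the named services,
 * or -1 if a name is not present in the local table. */
static int pukcl_worst_case_stack(const char *const *services, size_t count)
{
    int worst = 0;
    for (size_t i = 0; i < count; i++) {
        int found = -1;
        for (size_t j = 0; j < sizeof k_stack_table / sizeof k_stack_table[0]; j++) {
            if (strcmp(services[i], k_stack_table[j].service) == 0) {
                found = k_stack_table[j].stack_bytes;
                break;
            }
        }
        if (found < 0) {
            return -1;
        }
        if (found > worst) {
            worst = found;
        }
    }
    return worst;
}

/* Returns 1 if the free stack covers the worst case plus a caller-chosen margin. */
static int pukcl_stack_fits(int worst_case, size_t free_stack_bytes, size_t margin_bytes)
{
    return worst_case >= 0 &&
           free_stack_bytes >= (size_t)worst_case + margin_bytes;
}
```

An application that uses ExpMod, ZpEcDsaVerifyFast, and SelfTest would get 456 bytes from `pukcl_worst_case_stack`. Since the table values are estimates, the margin passed to `pukcl_stack_fits` should be chosen conservatively.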

Parameter Size Limits for Different Services

The following table lists parameter size limits for different services.

For the services ExpMod, PrimeGen, and CRT, additional details are available in the service description.

Table 37-113. Parameter Size Limits
API Min/Max Sizes Comments
Swap 4 bytes to 2044 bytes Per block to be swapped
Fill 4 bytes to 4088 bytes
Fast Copy/Clear 4 bytes to 2044 bytes Supposing Length(R) = Length(X)
Conditional Copy/Clear 4 bytes to 2044 bytes Supposing Length(R) = Length(X)
Smult 4 bytes to 2040 bytes Supposing Length(R) = Length(X) + 4 bytes, No Z Parameter, No Reduction
Compare 4 bytes to 2044 bytes Supposing Length(X) = Length(Y)
Fmult Input: 4 bytes to 1020 bytes; Output: 4 bytes to 2040 bytes Supposing Length(Y) = Length(X), No Z Parameter, No Reduction
Square Input: 4 bytes to 1020 bytes; Output: 4 bytes to 2040 bytes Supposing No Z Parameter, No Reduction
Euclidean Division Divider: 8 to 1016 bytes; Num.: 8 to 2032 bytes Supposing Length(Num) = 2*Length(Divider)
Mod. inv. / GCD 8 to 1012 bytes
RedMod Modulus: 12 to 1016 bytes; Input: 24 to 2032 bytes Supposing RBase = XBase
Fast ModExp, Exp in Crypto RAM 12 to 576 bytes (96 to 4608 bits) Supposing Length(Exponent) = Length(Modulus), Window Size = 1, with the Exponent in Crypto RAM
Fast ModExp, Exp not in Crypto RAM 12 to 672 bytes (96 to 5376 bits) Supposing Length(Exponent) = Length(Modulus), Window Size = 1, with the Exponent not in Crypto RAM
Prime Gen. Prime Number: 12 to 448 bytes (96 to 3584 bits) Supposing Window Size = 1
CRT Modulus = Two Primes: size of one prime from 24 to 448 bytes; Modulus from 48 to 896 bytes (384 to 7168 bits) Supposing Length(Exponent) = Length(Modulus), Window Size = 1
ECC Addition and Subtraction GF(p) Modulus: 12 to 308 bytes
ECC Doubling GF(p) Modulus: 12 to 400 bytes
ECC Multiplication GF(p) Modulus: 12 to 264 bytes Supposing Length(Scalar) = Length(Modulus)
ECC Quick Dual Multiplication GF(p) Modulus: 12 to 152 bytes
ECDSA Generate GF(p) Modulus: 12 to 220 bytes (up to 521 bits for common curves) Supposing Length(Scalar) = Length(Modulus)
ECDSA Verify GF(p) Modulus: 12 to 188 bytes (up to 521 bits for common curves) Supposing Length(Scalar) = Length(Modulus)
ECC Addition GF(2^n) Modulus: 12 to 248 bytes
ECC Doubling GF(2^n) Modulus: 12 to 364 bytes
ECC Multiplication GF(2^n) Modulus: 12 to 250 bytes Supposing Length(Scalar) = Length(Modulus)
ECDSA Generate GF(2^n) Modulus: 12 to 208 bytes (up to 571 bits for common curves) Supposing Length(Scalar) = Length(Modulus)
ECDSA Verify GF(2^n) Modulus: 12 to 180 bytes (up to 571 bits for common curves) Supposing Length(Scalar) = Length(Modulus)
ECDSA Quick Verify GF(2^n) Modulus: 12 to 140 bytes (up to 571 bits for common curves) Supposing Length(Scalar) = Length(Modulus)
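
The size limits can be checked in software before a service is called. The following is a minimal sketch with hypothetical names (none of them belong to the PUKCL API); it covers only the byte range from the table and assumes the conditions listed in the Comments column hold. Any further constraints given in the individual service descriptions are not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper type: inclusive byte range from the size-limit table. */
typedef struct {
    size_t min_bytes;
    size_t max_bytes;
} pukcl_size_limit_t;

/* Modulus limits copied from Table 37-113. */
static const pukcl_size_limit_t k_limit_expmod_exp_in_cryptoram     = { 12, 576 };
static const pukcl_size_limit_t k_limit_expmod_exp_not_in_cryptoram = { 12, 672 };
static const pukcl_size_limit_t k_limit_ecdsa_generate_gfp          = { 12, 220 };

/* Returns true if len_bytes lies inside the inclusive range. */
static bool pukcl_size_ok(size_t len_bytes, pukcl_size_limit_t limit)
{
    return len_bytes >= limit.min_bytes && len_bytes <= limit.max_bytes;
}

/* Converts a length in bytes to bits, as shown in parentheses in the table. */
static size_t pukcl_bytes_to_bits(size_t len_bytes)
{
    return len_bytes * 8u;
}
```

For example, a 512-byte (4096-bit) modulus passes the check against either Fast ModExp limit, whereas a 600-byte modulus passes only when the exponent is not in Crypto RAM.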

Service Timing

The values in the following tables are estimated performance figures for a CPU clock of 64 MHz, with the CPU and the PUKCC operating at the same frequency. Because the timing depends on the parameter values, the measurements are approximate.

Other test conditions:

  • PUKCL library data in Crypto RAM
  • Test code and test data in SRAM
  • ICache and DCache are disabled
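
To estimate timing at a different clock frequency, a cycle count can be converted to time by dividing by the clock rate. The following is a minimal sketch with hypothetical names; it assumes that the cycle count itself does not change with frequency, which the text above does not state.

```c
#include <stdint.h>

/* Clock frequency used for the timing tables in this section. */
#define PUKCL_TIMING_CLOCK_HZ 64000000u

/* Converts a cycle count to microseconds at the given clock.
 * Integer arithmetic, rounds down. Returns 0 if clock_hz is 0. */
static uint64_t pukcl_cycles_to_us(uint64_t cycles, uint32_t clock_hz)
{
    if (clock_hz == 0u) {
        return 0u;
    }
    return (cycles * 1000000u) / clock_hz;
}
```

For example, 3.05 MCycles at 64 MHz gives 47656 µs. Recomputed times will not match the tables exactly, presumably because the published cycle counts are rounded.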

Service Timing for RSA

RSA uses the ExpMod service for encryption and decryption. The following tables show the service timing, where 'W' indicates the window size.

Table 37-114. RSA1024
Operation Clock Cycles Timing One Block
RSA 1024 decryption / signature generation. No CRT, Regular implementation, W=4 3.05 MCycles 47.799 ms
RSA 1024 decryption / signature generation. With CRT, Regular implementation, W=4 1.09 MCycles 17.109 ms
RSA 1024 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=3 0.07 MCycles 1.141 ms
RSA 1024 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=0x10001 0.07 MCycles 1.129 ms
Table 37-115. RSA2048
Operation Clock Cycles Timing One Block
RSA 2048 decryption / signature generation. No CRT, Regular implementation, W=4 21.6 MCycles 338.249 ms
RSA 2048 decryption / signature generation. With CRT, Regular implementation, W=4 6.36 MCycles 99.408 ms
RSA 2048 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=3 0.24 MCycles 3.843 ms
RSA 2048 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=0x10001 0.24 MCycles 3.827 ms
Table 37-116. RSA4096
Operation Clock Cycles Timing One Block
RSA 4096 decryption / signature generation. No CRT, Regular implementation, W=1 209 MCycles 3.2742 s
RSA 4096 decryption / signature generation. With CRT, Regular implementation, W=3 46.1 MCycles 720.95 ms
RSA 4096 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=3 0.91 MCycles 14.346 ms
RSA 4096 encryption / signature verification. No CRT, Fast implementation, W=1 Exponent=0x10001 0.91 MCycles 14.337 ms

Service Timing for Prime Generation

Prime generation uses the PrimeGen service.

Table 37-117. Prime Generation
Operation Clock Cycles Timing One Block
Regular Generation of two primes, Prime_Length=512 bits, W=4, Rabin Miller Iterations Number = 3, (average of 200 samples) Mean = 47.4 MCycles Mean = 0.74 s
Regular Generation of two primes, Prime_Length=512 bits, W=4, Rabin Miller Iterations Number = 3, (Standard Deviation for 200 samples) Std Dev = 30.3 MCycles Std Dev = 0.47 s
Regular Generation of two primes, Prime_Length=1024 bits, W=4, Rabin Miller Iterations Number = 3, (average of 200 samples) Mean = 419.71 MCycles Mean = 6.558 s
Regular Generation of two primes, Prime_Length=1024 bits, W=4, Rabin Miller Iterations Number = 3, (Standard Deviation for 200 samples) Std Dev = 294 MCycles Std Dev = 4.59 s
Regular Generation of two primes, Prime_Length=2048 bits, W=4, Rabin Miller Iterations Number = 3, (average of 200 samples) Mean = 4.78 GCycles Mean = 74.68 s
Regular Generation of two primes, Prime_Length=2048 bits, W=4, Rabin Miller Iterations Number = 3, (Standard Deviation for 200 samples) Std Dev = 3.05 GCycles Std Dev = 47.65 s
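
Because the standard deviation is large relative to the mean, an application that waits on prime generation may want a time budget well above the mean. The following is a minimal sketch with hypothetical names that computes a budget of mean plus k standard deviations. The choice of k is an assumption left to the application, and the result is a planning figure only, not an upper bound on the generation time.

```c
#include <stdint.h>

/* Budget in cycles: mean + k * standard deviation. */
static uint64_t primegen_budget_cycles(uint64_t mean_cycles,
                                       uint64_t stddev_cycles,
                                       unsigned k)
{
    return mean_cycles + (uint64_t)k * stddev_cycles;
}

/* Converts cycles to milliseconds at the given clock.
 * Integer arithmetic, rounds down. Returns 0 if clock_hz is 0. */
static uint64_t primegen_cycles_to_ms(uint64_t cycles, uint32_t clock_hz)
{
    if (clock_hz == 0u) {
        return 0u;
    }
    return (cycles * 1000u) / clock_hz;
}
```

With the 1024-bit figures from the table (mean 419.71 MCycles, standard deviation 294 MCycles) and k = 3, the budget is 1301.71 MCycles, or 20339 ms at 64 MHz.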

Service Timing for ECDSA on Prime Field

In the following table, ECDSA signature generation uses the ZpEcDsaGenerateFast service and signature verification uses the ZpEcDsaQuickVerify service.

Table 37-118. ECDSA GF(p)
Operation Clock Cycles Timing One Block
ECDSA GF(p) 256 Generate Fast 2.67 MCycles 41.864 ms
ECDSA GF(p) 256 Verify Quick W=(4,4), Scalar in PUKCC RAM 1.84 MCycles 28.888 ms
ECDSA GF(p) 384 Generate Fast 6.18 MCycles 96.712 ms
ECDSA GF(p) 384 Verify Quick W=(4,4), Scalar in PUKCC RAM 4.15 MCycles 64.868 ms
ECDSA GF(p) 521 Generate Fast 13.36 MCycles 208.869 ms
ECDSA GF(p) 521 Verify Quick W=(4,4), Scalar in PUKCC RAM 8.81 MCycles 137.711 ms

Service Timing for ECDSA on Binary Field

In the following table, ECDSA signature generation uses the GF2NEcDsaGenerateFast service and signature verification uses the GF2NEcDsaVerifyFast service.

Table 37-119. ECDSA GF(2^n)
Operation Clock Cycles Timing One Block
ECDSA GF(2^n) B283 Generate Fast 3.21 MCycles 50.301 ms
ECDSA GF(2^n) B283 Verify 6.40 MCycles 100.150 ms
ECDSA GF(2^n) B409 Generate Fast 6.94 MCycles 108.554 ms
ECDSA GF(2^n) B409 Verify 13.73 MCycles 214.571 ms
ECDSA GF(2^n) B571 Generate Fast 15.08 MCycles 235.704 ms
ECDSA GF(2^n) B571 Verify 30.07 MCycles 469.972 ms