37.3.8 PUKCL Requirements and Performance

Services Stack Usage

The library executes its computations on the main core and therefore shares some resources with the application.

The application may need to know how much RAM the library functions use. Note that the library does not use any global variables.

The following table lists the minimum number of bytes that must be available on the stack for each service to execute correctly. The values are estimates, and for some options a service may use fewer bytes than listed.

Table 37-112. Services Stack Usage
PUKCL Service | Stack Usage (Bytes)
SelfTest | 112
ClearFlags | 0
Swap | 8
Fill | 8
CondCopy | 24
FastCopy | 16
Smult | 16
Smult (with reduction) | 88
Comp | 8
Fmult | 24
Fmult (with reduction) | 96
Square | 16
Square (with reduction) | 88
Div | 144
GCD | 136
RedMod (Setup) | 160
RedMod (using fast reduction) | 80
RedMod (randomize) | 80
RedMod (Normalize) | 80
RedMod (Using Division) | 184
ExpMod | 200
PrimeGen | 416
CRT | 304
ZpEccAddFast | 104
ZpEccAddSubFast | 92
ZpEcConvProjToAffine | 280
ZpEcConvAffineToProjective | 64
ZpEccDblFast | 96
ZpEccMulFast | 168
ZpEccQuickDualMulFast | 216
ZpEcDsaGenerateFast | 392
ZpEcDsaVerifyFast | 456
ZpEcDsaQuickVerify | 368
ZpEcRandomiseCoordinate | 56
GF2NEccAddFast | 128
GF2NEcConvProjToAffine | 264
GF2NEcConvAffineToProjective | 56
GF2NEccDblFast | 136
GF2NEccMulFast | 208
GF2NEcDsaGenerateFast | 376
GF2NEcDsaVerifyFast | 440
GF2NEcRandomiseCoordinate | 56
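
As a minimal sketch of how an application might use these figures, the C fragment below copies a few rows of Table 37-112 into a lookup table and checks a stack budget before a service is called. The enum, array, and function names are hypothetical and are not part of the PUKCL API. How the free stack space is measured, and how large the safety margin should be, is left to the application.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application-side identifiers (not PUKCL API names). */
typedef enum {
    APP_SVC_SELFTEST,
    APP_SVC_EXPMOD,
    APP_SVC_PRIMEGEN,
    APP_SVC_CRT,
    APP_SVC_ZPECDSA_GENERATE_FAST,
    APP_SVC_ZPECDSA_VERIFY_FAST,
    APP_SVC_COUNT
} app_pukcl_service;

/* Estimated minimum stack usage in bytes, copied from Table 37-112. */
static const uint16_t app_pukcl_stack_bytes[APP_SVC_COUNT] = {
    [APP_SVC_SELFTEST]              = 112,
    [APP_SVC_EXPMOD]                = 200,
    [APP_SVC_PRIMEGEN]              = 416,
    [APP_SVC_CRT]                   = 304,
    [APP_SVC_ZPECDSA_GENERATE_FAST] = 392,
    [APP_SVC_ZPECDSA_VERIFY_FAST]   = 456,
};

/* Returns 1 when free_bytes covers the estimated stack usage of the
 * service plus an application-chosen margin, 0 otherwise. */
int app_pukcl_stack_ok(app_pukcl_service svc, size_t free_bytes,
                       size_t margin_bytes)
{
    if ((int)svc < 0 || svc >= APP_SVC_COUNT) {
        return 0; /* unknown service: refuse rather than guess */
    }
    return free_bytes >= (size_t)app_pukcl_stack_bytes[svc] + margin_bytes;
}
```

For example, app_pukcl_stack_ok(APP_SVC_PRIMEGEN, 416, 0) succeeds, while the same call with 415 free bytes fails. Because the table values are estimates, a nonzero margin is a reasonable precaution.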

Parameter Size Limits for Different Services

The following table lists parameter size limits for different services.

For the services ModExp, PrimeGen, and CRT, additional details are available in the service description.

Table 37-113. Parameter Size Limits
API | Min/Max Sizes | Comments
SelfTest | |
ClearFlags | |
Swap | 4 bytes to 2044 bytes | Per block to be swapped
Fill | 4 bytes to 4088 bytes |
Fast Copy/Clear | 4 bytes to 2044 bytes | Supposing Length(R) = Length(X)
Conditional Copy/Clear | 4 bytes to 2044 bytes | Supposing Length(R) = Length(X)
Smult | 4 bytes to 2040 bytes | Supposing Length(R) = Length(X) + 4 Bytes, No Z Parameter, No Reduction
Compare | 4 bytes to 2044 bytes | Supposing Length(X) = Length(Y)
FMult | Input: 4 bytes to 1020 bytes; Output: 4 bytes to 2040 bytes | Supposing Length(Y) = Length(X), No Z Parameter, No Reduction
Square | Input: 4 bytes to 1020 bytes; Output: 4 bytes to 2040 bytes | Supposing No Z Parameter, No Reduction
Euclidean Division | Divider: 8 to 1016 bytes; Num.: 8 to 2032 bytes | Supposing Length(Num) = 2*Length(Divider)
Mod. inv. / GCD | 8 to 1012 bytes |
ModRed | Modulus: 12 to 1016 bytes; Input: 24 to 2032 bytes | Supposing RBase = XBase
Fast ModExp, Exp in Crypto RAM | 12 to 576 bytes (96 to 4608 bits) | Supposing Length(Exponent) = Length(Modulus), Window Size = 1, with the Exponent in Crypto RAM
Fast ModExp, Exp not in Crypto RAM | 12 to 672 bytes (96 to 5376 bits) | Supposing Length(Exponent) = Length(Modulus), Window Size = 1, with the Exponent not in Crypto RAM
Prime Gen. | Prime Number: 12 to 448 bytes (96 to 3584 bits) | Supposing Window Size = 1
CRT | Modulus = Two Primes: size of one prime from 24 to 448 bytes; Modulus from 48 to 896 bytes (384 to 7168 bits) | Supposing Length(Exponent) = Length(Modulus), Window Size = 1
ECC Addition and Subtraction GF(p) | Modulus: 12 to 308 bytes |
ECC Doubling GF(p) | Modulus: 12 to 400 bytes |
ECC Multiplication GF(p) | Modulus: 12 to 264 bytes | Supposing Length(Scalar) = Length(Modulus)
ECC Quick Dual Multiplication GF(p) | Modulus: 12 to 152 bytes |
ECDSA Generate GF(p) | Modulus: 12 to 220 bytes (up to 521 bits for common curves) | Supposing Length(Scalar) = Length(Modulus)
ECDSA Verify GF(p) | Modulus: 12 to 188 bytes (up to 521 bits for common curves) | Supposing Length(Scalar) = Length(Modulus)
ECC Addition GF(2^n) | Modulus: 12 to 248 bytes |
ECC Doubling GF(2^n) | Modulus: 12 to 364 bytes |
ECC Multiplication GF(2^n) | Modulus: 12 to 250 bytes | Supposing Length(Scalar) = Length(Modulus)
ECDSA Generate GF(2^n) | Modulus: 12 to 208 bytes (up to 571 bits for common curves) | Supposing Length(Scalar) = Length(Modulus)
ECDSA Verify GF(2^n) | Modulus: 12 to 180 bytes (up to 571 bits for common curves) | Supposing Length(Scalar) = Length(Modulus)
ECDSA Quick Verify GF(2^n) | Modulus: 12 to 140 bytes (up to 571 bits for common curves) | Supposing Length(Scalar) = Length(Modulus)
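
The byte and bit figures in Table 37-113 differ by a factor of eight (for example, 576 bytes = 4608 bits). The sketch below shows one way an application might pre-check a modulus length against the Fast ModExp limits before calling the service. The macro and function names are hypothetical. The limits hold only under the assumptions stated in the table (Length(Exponent) = Length(Modulus), Window Size = 1), and any further constraints given in the service description are not checked here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Fast ModExp limits in bytes, copied from Table 37-113. */
#define APP_EXPMOD_MIN_BYTES         12u
#define APP_EXPMOD_MAX_BYTES_EXP_IN  576u /* exponent in Crypto RAM     */
#define APP_EXPMOD_MAX_BYTES_EXP_OUT 672u /* exponent not in Crypto RAM */

/* Converts a length in bytes to a length in bits. */
size_t app_bytes_to_bits(size_t n_bytes)
{
    return n_bytes * 8u;
}

/* Returns true when modulus_bytes lies inside the table's range for the
 * chosen exponent placement (hypothetical helper, not a PUKCL call). */
bool app_expmod_modulus_len_ok(size_t modulus_bytes, bool exp_in_crypto_ram)
{
    size_t max_bytes = exp_in_crypto_ram ? APP_EXPMOD_MAX_BYTES_EXP_IN
                                         : APP_EXPMOD_MAX_BYTES_EXP_OUT;
    return modulus_bytes >= APP_EXPMOD_MIN_BYTES &&
           modulus_bytes <= max_bytes;
}
```

Under these assumptions, a 512-byte (4096-bit) modulus passes for either exponent placement, whereas a 640-byte modulus passes only when the exponent is kept outside the Crypto RAM.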

Service Timing

The values in the following tables are estimated performance figures for a CPU clock of 64 MHz, with the CPU and the PUKCC operating at the same frequency. Because parameter values may vary, the measurements are approximate.

Other test conditions:

  • PUKCL library data in Crypto RAM
  • Test code and test data in SRAM
  • ICache and DCache are disabled
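
The timing tables report both a cycle count and a time, and the two are related by t = cycles / f_clk. The helper below is an illustrative sketch only, and its name is hypothetical. Because the cycle counts in the tables are rounded, a time computed this way differs slightly from the listed one: 3.05 MCycles at 64 MHz gives about 47.66 ms, against a listed 47.799 ms for RSA 1024 decryption without CRT.

```c
/* Converts a cycle count to milliseconds for a given clock frequency.
 * Hypothetical helper for reading the timing tables, not a PUKCL call. */
double app_cycles_to_ms(double cycles, double clock_hz)
{
    return cycles / clock_hz * 1000.0;
}
```

Using the same formula to estimate timings at another clock frequency assumes that the cycle counts do not change with frequency, which this section does not state.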

Service Timing for RSA

RSA uses the ExpMod service for encryption and decryption. The following tables show the service timing, where 'W' indicates the window size.

Table 37-114. RSA1024
Operation | Clock Cycles | Timing One Block
RSA 1024 decryption / signature generation. No CRT, Regular implementation, W=4 | 3.05 MCycles | 47.799 ms
RSA 1024 decryption / signature generation. With CRT, Regular implementation, W=4 | 1.09 MCycles | 17.109 ms
RSA 1024 encryption / signature verification. No CRT, Fast implementation, W=1, Exponent=3 | 0.07 MCycles | 1.141 ms
RSA 1024 encryption / signature verification. No CRT, Fast implementation, W=1, Exponent=0x10001 | 0.07 MCycles | 1.129 ms

Table 37-115. RSA2048
Operation | Clock Cycles | Timing One Block
RSA 2048 decryption / signature generation. No CRT, Regular implementation, W=4 | 21.6 MCycles | 338.249 ms
RSA 2048 decryption / signature generation. With CRT, Regular implementation, W=4 | 6.36 MCycles | 99.408 ms
RSA 2048 encryption / signature verification. No CRT, Fast implementation, W=1, Exponent=3 | 0.24 MCycles | 3.843 ms
RSA 2048 encryption / signature verification. No CRT, Fast implementation, W=1, Exponent=0x10001 | 0.24 MCycles | 3.827 ms

Table 37-116. RSA4096
Operation | Clock Cycles | Timing One Block
RSA 4096 decryption / signature generation. No CRT, Regular implementation, W=1 | 209 MCycles | 3.2742 s
RSA 4096 decryption / signature generation. With CRT, Regular implementation, W=3 | 46.1 MCycles | 720.95 ms
RSA 4096 encryption / signature verification. No CRT, Fast implementation, W=1, Exponent=3 | 0.91 MCycles | 14.346 ms
RSA 4096 encryption / signature verification. No CRT, Fast implementation, W=1, Exponent=0x10001 | 0.91 MCycles | 14.337 ms

Service Timing for Prime Generation

Prime generation uses the PrimeGen service.

Table 37-117. Prime Generation
Operation | Clock Cycles | Timing One Block
Regular Generation of two primes, Prime_Length=512 bits, W=4, Rabin Miller Iterations Number = 3 (average of 200 samples) | Mean = 47.4 MCycles | Mean = 0.4 s
Regular Generation of two primes, Prime_Length=512 bits, W=4, Rabin Miller Iterations Number = 3 (standard deviation for 200 samples) | Std Dev = 30.3 MCycles | Std Dev = 0.47 s
Regular Generation of two primes, Prime_Length=1024 bits, W=4, Rabin Miller Iterations Number = 3 (average of 200 samples) | Mean = 419.71 MCycles | Mean = 6.558 s
Regular Generation of two primes, Prime_Length=1024 bits, W=4, Rabin Miller Iterations Number = 3 (standard deviation for 200 samples) | Std Dev = 294 MCycles | Std Dev = 4.59 s
Regular Generation of two primes, Prime_Length=2048 bits, W=4, Rabin Miller Iterations Number = 3 (average of 200 samples) | Mean = 4.78 GCycles | Mean = 74.68 s
Regular Generation of two primes, Prime_Length=2048 bits, W=4, Rabin Miller Iterations Number = 3 (standard deviation for 200 samples) | Std Dev = 3.05 GCycles | Std Dev = 47.65 s

Service Timing for ECDSA on Prime Field

In the following table, ECDSA signature generation uses the ZpEcDsaGenerateFast service and signature verification uses the ZpEcDsaQuickVerify service.

Table 37-118. ECDSA GF(p)
Operation | Clock Cycles | Timing One Block
ECDSA GF(p) 256 Generate Fast | 2.67 MCycles | 41.864 ms
ECDSA GF(p) 256 Verify Quick, W=(4,4), Scalar in PUKCC RAM | 1.84 MCycles | 28.888 ms
ECDSA GF(p) 384 Generate Fast | 6.18 MCycles | 96.712 ms
ECDSA GF(p) 384 Verify Quick, W=(4,4), Scalar in PUKCC RAM | 4.15 MCycles | 64.868 ms
ECDSA GF(p) 521 Generate Fast | 13.36 MCycles | 208.869 ms
ECDSA GF(p) 521 Verify Quick, W=(4,4), Scalar in PUKCC RAM | 8.81 MCycles | 137.711 ms

Service Timing for ECDSA on Binary Field

In the following table, ECDSA signature generation uses the GF2NEcDsaGenerateFast service and signature verification uses the GF2NEcDsaVerifyFast service.

Table 37-119. ECDSA GF(2^n)
Operation | CPU Cycles | Timing One Block
ECDSA GF(2^n) B283 Generate Fast | 3.21 MCycles | 50.301 ms
ECDSA GF(2^n) B283 Verify | 6.40 MCycles | 100.150 ms
ECDSA GF(2^n) B409 Generate Fast | 6.94 MCycles | 108.554 ms
ECDSA GF(2^n) B409 Verify | 13.73 MCycles | 214.571 ms
ECDSA GF(2^n) B571 Generate Fast | 15.08 MCycles | 235.704 ms
ECDSA GF(2^n) B571 Verify | 30.07 MCycles | 469.972 ms