2011-02-22

"OpenSSL speed" on iPhone 4

A test I thought might give a rough idea of the processing power of the iPhone 4:

Mijails-iPhone:~ mobile$ uname -a
Darwin Mijails-iPhone 10.4.0 Darwin Kernel Version 10.4.0: Wed Oct 20 20:14:45 PDT 2010; root:xnu-1504.58.28~3/RELEASE_ARM_S5L8930X iPhone3,1 arm N90AP Darwin
Mijails-iPhone:~ mobile$ openssl speed
To get the most accurate results, try to run this
program when this computer is idle.
Doing md2 for 3s on 16 size blocks: 113903 md2's in 2.93s
Doing md2 for 3s on 64 size blocks: 59789 md2's in 2.96s
Doing md2 for 3s on 256 size blocks: 20740 md2's in 2.94s
Doing md2 for 3s on 1024 size blocks: 5722 md2's in 2.95s
Doing md2 for 3s on 8192 size blocks: 741 md2's in 2.95s
Doing md4 for 3s on 16 size blocks: 962184 md4's in 2.99s
Doing md4 for 3s on 64 size blocks: 808365 md4's in 2.94s
Doing md4 for 3s on 256 size blocks: 553565 md4's in 2.94s
Doing md4 for 3s on 1024 size blocks: 251644 md4's in 2.96s
Doing md4 for 3s on 8192 size blocks: 41204 md4's in 2.94s
Doing md5 for 3s on 16 size blocks: 512656 md5's in 2.88s
Doing md5 for 3s on 64 size blocks: 446107 md5's in 2.70s
Doing md5 for 3s on 256 size blocks: 323702 md5's in 2.65s
Doing md5 for 3s on 1024 size blocks: 169849 md5's in 2.79s
Doing md5 for 3s on 8192 size blocks: 30073 md5's in 2.68s
Doing hmac(md5) for 3s on 16 size blocks: 933276 hmac(md5)'s in 2.62s
Doing hmac(md5) for 3s on 64 size blocks: 741506 hmac(md5)'s in 2.72s
Doing hmac(md5) for 3s on 256 size blocks: 490370 hmac(md5)'s in 2.62s
Doing hmac(md5) for 3s on 1024 size blocks: 202907 hmac(md5)'s in 2.63s
Doing hmac(md5) for 3s on 8192 size blocks: 28168 hmac(md5)'s in 2.32s
Doing sha1 for 3s on 16 size blocks: 466460 sha1's in 2.83s
Doing sha1 for 3s on 64 size blocks: 498797 sha1's in 2.92s
Doing sha1 for 3s on 256 size blocks: 298535 sha1's in 2.90s
Doing sha1 for 3s on 1024 size blocks: 114257 sha1's in 2.94s
Doing sha1 for 3s on 8192 size blocks: 16773 sha1's in 2.94s
Doing sha256 for 3s on 16 size blocks: 577481 sha256's in 2.93s
Doing sha256 for 3s on 64 size blocks: 350266 sha256's in 2.96s
Doing sha256 for 3s on 256 size blocks: 158648 sha256's in 2.96s
Doing sha256 for 3s on 1024 size blocks: 49984 sha256's in 2.93s
Doing sha256 for 3s on 8192 size blocks: 6722 sha256's in 2.88s
Doing sha512 for 3s on 16 size blocks: 120505 sha512's in 2.93s
Doing sha512 for 3s on 64 size blocks: 121406 sha512's in 2.91s
Doing sha512 for 3s on 256 size blocks: 43968 sha512's in 2.92s
Doing sha512 for 3s on 1024 size blocks: 15084 sha512's in 2.94s
Doing sha512 for 3s on 8192 size blocks: 2119 sha512's in 2.96s
Doing rmd160 for 3s on 16 size blocks: 500321 rmd160's in 2.88s
Doing rmd160 for 3s on 64 size blocks: 518314 rmd160's in 2.93s
Doing rmd160 for 3s on 256 size blocks: 294776 rmd160's in 2.93s
Doing rmd160 for 3s on 1024 size blocks: 108497 rmd160's in 2.92s
Doing rmd160 for 3s on 8192 size blocks: 14321 rmd160's in 2.69s
Doing rc4 for 3s on 16 size blocks: 8838055 rc4's in 2.96s
Doing rc4 for 3s on 64 size blocks: 2573509 rc4's in 2.91s
Doing rc4 for 3s on 256 size blocks: 670210 rc4's in 2.93s
Doing rc4 for 3s on 1024 size blocks: 168703 rc4's in 2.92s
Doing rc4 for 3s on 8192 size blocks: 20854 rc4's in 2.71s
Doing des cbc for 3s on 16 size blocks: 1656140 des cbc's in 2.75s
Doing des cbc for 3s on 64 size blocks: 425114 des cbc's in 2.83s
Doing des cbc for 3s on 256 size blocks: 108385 des cbc's in 2.87s
Doing des cbc for 3s on 1024 size blocks: 28718 des cbc's in 2.76s
Doing des cbc for 3s on 8192 size blocks: 3510 des cbc's in 2.81s
Doing des ede3 for 3s on 16 size blocks: 646206 des ede3's in 2.83s
Doing des ede3 for 3s on 64 size blocks: 164355 des ede3's in 2.87s
Doing des ede3 for 3s on 256 size blocks: 41628 des ede3's in 2.83s
Doing des ede3 for 3s on 1024 size blocks: 10274 des ede3's in 2.87s
Doing des ede3 for 3s on 8192 size blocks: 1241 des ede3's in 2.80s
Doing aes-128 cbc for 3s on 16 size blocks: 2403683 aes-128 cbc's in 2.89s
Doing aes-128 cbc for 3s on 64 size blocks: 649635 aes-128 cbc's in 2.81s
Doing aes-128 cbc for 3s on 256 size blocks: 174904 aes-128 cbc's in 2.89s
Doing aes-128 cbc for 3s on 1024 size blocks: 45134 aes-128 cbc's in 2.94s
Doing aes-128 cbc for 3s on 8192 size blocks: 5175 aes-128 cbc's in 2.68s
Doing aes-192 cbc for 3s on 16 size blocks: 2168292 aes-192 cbc's in 2.91s
Doing aes-192 cbc for 3s on 64 size blocks: 600481 aes-192 cbc's in 2.93s
Doing aes-192 cbc for 3s on 256 size blocks: 155498 aes-192 cbc's in 2.90s
Doing aes-192 cbc for 3s on 1024 size blocks: 39192 aes-192 cbc's in 2.91s
Doing aes-192 cbc for 3s on 8192 size blocks: 4777 aes-192 cbc's in 2.86s
Doing aes-256 cbc for 3s on 16 size blocks: 1983041 aes-256 cbc's in 2.94s
Doing aes-256 cbc for 3s on 64 size blocks: 540276 aes-256 cbc's in 2.90s
Doing aes-256 cbc for 3s on 256 size blocks: 137713 aes-256 cbc's in 2.89s
Doing aes-256 cbc for 3s on 1024 size blocks: 35393 aes-256 cbc's in 2.93s
Doing aes-256 cbc for 3s on 8192 size blocks: 4384 aes-256 cbc's in 2.93s
Doing aes-128 ige for 3s on 16 size blocks: 2375637 aes-128 ige's in 2.92s
Doing aes-128 ige for 3s on 64 size blocks: 698695 aes-128 ige's in 2.94s
Doing aes-128 ige for 3s on 256 size blocks: 183967 aes-128 ige's in 2.91s
Doing aes-128 ige for 3s on 1024 size blocks: 46891 aes-128 ige's in 2.86s
Doing aes-128 ige for 3s on 8192 size blocks: 5870 aes-128 ige's in 2.93s
Doing aes-192 ige for 3s on 16 size blocks: 2129667 aes-192 ige's in 2.94s
Doing aes-192 ige for 3s on 64 size blocks: 615957 aes-192 ige's in 2.90s
Doing aes-192 ige for 3s on 256 size blocks: 159366 aes-192 ige's in 2.89s
Doing aes-192 ige for 3s on 1024 size blocks: 40431 aes-192 ige's in 2.90s
Doing aes-192 ige for 3s on 8192 size blocks: 4566 aes-192 ige's in 2.62s
Doing aes-256 ige for 3s on 16 size blocks: 1874993 aes-256 ige's in 2.85s
Doing aes-256 ige for 3s on 64 size blocks: 546698 aes-256 ige's in 2.93s
Doing aes-256 ige for 3s on 256 size blocks: 143584 aes-256 ige's in 2.87s
Doing aes-256 ige for 3s on 1024 size blocks: 36373 aes-256 ige's in 2.82s
Doing aes-256 ige for 3s on 8192 size blocks: 4479 aes-256 ige's in 2.88s
Doing idea cbc for 3s on 16 size blocks: 1616991 idea cbc's in 2.84s
Doing idea cbc for 3s on 64 size blocks: 419827 idea cbc's in 2.85s
Doing idea cbc for 3s on 256 size blocks: 107747 idea cbc's in 2.91s
Doing idea cbc for 3s on 1024 size blocks: 26694 idea cbc's in 2.90s
Doing idea cbc for 3s on 8192 size blocks: 3366 idea cbc's in 2.91s
Doing rc2 cbc for 3s on 16 size blocks: 1456945 rc2 cbc's in 2.86s
Doing rc2 cbc for 3s on 64 size blocks: 387618 rc2 cbc's in 2.92s
Doing rc2 cbc for 3s on 256 size blocks: 98467 rc2 cbc's in 2.89s
Doing rc2 cbc for 3s on 1024 size blocks: 24847 rc2 cbc's in 2.92s
Doing rc2 cbc for 3s on 8192 size blocks: 3091 rc2 cbc's in 2.95s
Doing blowfish cbc for 3s on 16 size blocks: 3588480 blowfish cbc's in 2.92s
Doing blowfish cbc for 3s on 64 size blocks: 969771 blowfish cbc's in 2.92s
Doing blowfish cbc for 3s on 256 size blocks: 244745 blowfish cbc's in 2.89s
Doing blowfish cbc for 3s on 1024 size blocks: 63589 blowfish cbc's in 2.97s
Doing blowfish cbc for 3s on 8192 size blocks: 7246 blowfish cbc's in 2.65s
Doing cast cbc for 3s on 16 size blocks: 2605484 cast cbc's in 2.93s
Doing cast cbc for 3s on 64 size blocks: 689905 cast cbc's in 2.88s
Doing cast cbc for 3s on 256 size blocks: 158869 cast cbc's in 2.80s
Doing cast cbc for 3s on 1024 size blocks: 44980 cast cbc's in 2.90s
Doing cast cbc for 3s on 8192 size blocks: 6318 cast cbc's in 2.92s
Doing 512 bit private rsa's for 10s: 2657 512 bit private RSA's in 9.77s
Doing 512 bit public rsa's for 10s: 29891 512 bit public RSA's in 9.75s
Doing 1024 bit private rsa's for 10s: 507 1024 bit private RSA's in 9.81s
Doing 1024 bit public rsa's for 10s: 9986 1024 bit public RSA's in 9.80s
Doing 2048 bit private rsa's for 10s: 79 2048 bit private RSA's in 9.53s
Doing 2048 bit public rsa's for 10s: 2934 2048 bit public RSA's in 9.85s
Doing 4096 bit private rsa's for 10s: 13 4096 bit private RSA's in 10.50s
Doing 4096 bit public rsa's for 10s: 818 4096 bit public RSA's in 9.70s
Doing 512 bit sign dsa's for 10s: 3096 512 bit DSA signs in 9.80s
Doing 512 bit verify dsa's for 10s: 2681 512 bit DSA verify in 9.78s
Doing 1024 bit sign dsa's for 10s: 1004 1024 bit DSA signs in 9.58s
Doing 1024 bit verify dsa's for 10s: 877 1024 bit DSA verify in 9.78s
Doing 2048 bit sign dsa's for 10s: 297 2048 bit DSA signs in 9.68s
Doing 2048 bit verify dsa's for 10s: 249 2048 bit DSA verify in 9.73s
OpenSSL 0.9.8k 25 Mar 2009
built on: date not available
options:bn(64,32) md2(int) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr) 
compiler: arm-apple-darwin9-gcc -fPIC -fno-common -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -D__DARWIN_UNIX03 -O3 -fomit-frame-pointer -fno-common
available timing options: TIMEB USE_TOD HZ=100 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                622.00k     1292.74k     1805.93k     1986.21k     2057.72k
mdc2                 0.00         0.00         0.00         0.00         0.00 
md4               5148.81k    17597.06k    48201.58k    87055.22k   114810.60k
md5               2848.09k    10574.39k    31270.83k    62338.84k    91924.63k
hmac(md5)         5699.40k    17447.20k    47914.02k    79002.57k    99462.18k
sha1              2637.23k    10932.54k    26353.43k    39795.64k    46736.20k
rmd160            2779.56k    11321.53k    25755.17k    38048.26k    43612.50k
rc4              47773.27k    56599.51k    58557.60k    59161.60k    63039.10k
des cbc           9635.72k     9613.89k     9667.79k    10654.79k    10232.71k
des ede3          3653.46k     3665.06k     3765.64k     3665.71k     3630.81k
idea cbc          9109.81k     9427.69k     9478.77k     9425.74k     9475.69k
seed cbc             0.00         0.00         0.00         0.00         0.00 
rc2 cbc           8150.74k     8495.74k     8722.34k     8713.47k     8583.55k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00 
blowfish cbc     19662.90k    21255.25k    21679.83k    21924.29k    22399.71k
cast cbc         14227.90k    15331.22k    14525.17k    15882.59k    17725.02k
aes-128 cbc      13307.59k    14795.96k    15493.23k    15720.14k    15818.51k
aes-192 cbc      11921.88k    13116.31k    13726.72k    13791.27k    13682.93k
aes-256 cbc      10792.06k    11923.33k    12198.80k    12369.43k    12257.25k
camellia-128 cbc        0.00         0.00         0.00         0.00         0.00 
camellia-192 cbc        0.00         0.00         0.00         0.00         0.00 
camellia-256 cbc        0.00         0.00         0.00         0.00         0.00 
sha256            3153.48k     7573.32k    13720.91k    17468.81k    19120.36k
sha512             658.05k     2670.10k     3854.73k     5253.75k     5864.48k
aes-128 ige      13017.19k    15209.69k    16184.04k    16788.95k    16411.96k
aes-192 ige      11590.02k    13593.53k    14116.85k    14276.33k    14276.59k
aes-256 ige      10526.28k    11941.53k    12807.49k    13207.78k    12740.27k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.003677s 0.000326s    272.0   3065.7
rsa 1024 bits 0.019349s 0.000981s     51.7   1019.0
rsa 2048 bits 0.120633s 0.003357s      8.3    297.9
rsa 4096 bits 0.807692s 0.011858s      1.2     84.3
                  sign    verify    sign/s verify/s
dsa  512 bits 0.003165s 0.003648s    315.9    274.1
dsa 1024 bits 0.009542s 0.011152s    104.8     89.7
dsa 2048 bits 0.032593s 0.039076s     30.7     25.6
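
The summary table can be rebuilt from the raw "Doing ..." lines: each entry is the operation count times the block size, divided by the measured time, in thousands of bytes per second. A minimal Python sketch, assuming the line format shown above (the regular expression and the helper name `throughput_k` are my own, not part of OpenSSL):

```python
import re

# Matches raw lines such as:
#   Doing md2 for 3s on 16 size blocks: 113903 md2's in 2.93s
# The pattern is an assumption based on the output pasted above.
LINE_RE = re.compile(
    r"Doing (?P<name>.+?) for \d+s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) .+? in (?P<secs>[\d.]+)s"
)

def throughput_k(line):
    """Return (name, block_size, thousands of bytes per second), or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    size = int(m.group("size"))
    count = int(m.group("count"))
    secs = float(m.group("secs"))
    # bytes processed = count * size; the table reports 1000s of bytes/s
    return m.group("name"), size, count * size / secs / 1000.0

sample = [
    "Doing md2 for 3s on 16 size blocks: 113903 md2's in 2.93s",
    "Doing aes-128 cbc for 3s on 8192 size blocks: 5175 aes-128 cbc's in 2.68s",
]
for line in sample:
    name, size, k = throughput_k(line)
    print(f"{name:12s} {size:5d} bytes  {k:10.2f}k")
```

For the two sample lines this gives the same figures as the table (622.00k for md2 at 16 bytes, 15818.51k for aes-128 cbc at 8192 bytes). The sign/s and verify/s columns work the same way without the block size: 2657 private RSA operations in 9.77s is about 272.0 per second.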

OpenSSL has optimizations for different architectures; if I remember correctly, when I checked the sources there were implementations using vector extensions (MMX, SSE and such) and outright hand-written assembly for Intel processors.

So it would be interesting to see whether any of those optimizations exist for ARM (and, if so, how good they are). I already ran into this "problem" when using the openssl speed command to compare speeds between OS X/PowerPC and Windows/Intel: the PowerPC version seemed stupidly slow when it should have been faster, given that Seti@Home, which is supposedly actively ported and optimized, reported better speeds on the PowerPC machine.
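
One rough way to check a given build is to look at the "compiler:" line of the output. As far as I recall, x86 builds that link in the assembly modules show defines such as SHA1_ASM or AES_ASM there; that is my assumption, not something confirmed by this run. A small Python sketch of that heuristic (the helper name `asm_defines` is made up for illustration):

```python
import re

# The "compiler:" line copied from the openssl speed output above.
compiler_line = (
    "compiler: arm-apple-darwin9-gcc -fPIC -fno-common -DOPENSSL_PIC "
    "-DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H "
    "-D__DARWIN_UNIX03 -O3 -fomit-frame-pointer -fno-common"
)

def asm_defines(line):
    """Return the -D macros on the line whose name contains 'ASM'.

    Assumption: builds with assembly modules advertise them through
    macros like SHA1_ASM or AES_ASM. This is a heuristic, not proof.
    """
    macros = re.findall(r"-D(\w+)", line)
    return [m for m in macros if "ASM" in m]

found = asm_defines(compiler_line)
print(found if found else "no *_ASM defines on the compiler line")
```

On the line above it finds nothing, which would suggest (without proving it) that this iPhone build is plain C compiled with -O3.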

Of course, the comparison is still interesting in that, at the end of the day, it tells you how fast the full "stack" (CPU/hardware, compiler, sources and optimizations) is going to work; but it's difficult to get any finer granularity than that.

It would be interesting to see how a JIT would change the speeds, both in absolute terms on each platform (vs. hand-optimized vs. non-optimized) and relative to each other (maybe all platforms would end up closer together?).
