KERNEL PERFORMANCE TESTS ON FREEBSD 10.0-CURRENT, SEPTEMBER 2012, PART 2
========================================================================

INTRODUCTION
------------

These tests aim to give an indication of the runtime performance of FreeBSD
kernels compiled with different compilers, at various optimization levels.

The compilers tested were:

- gcc 4.2.1, the system compiler in FreeBSD.
- clang 3.2 (trunk 162107), which is the default version of clang in FreeBSD
  10.0-CURRENT, after r239462.

All tests were run on a machine graciously provided by Gavin Atkinson, which is
based on an Intel DQ57TM desktop board, with a 3.20 GHz Intel Core i5 CPU
(id=0x20652; 2 cores x 2 SMT threads, so 4 logical CPUs), and 4 GB RAM. It runs
FreeBSD/amd64 10.0-CURRENT as of Tue Sep 11 19:11:00 UTC 2012.

An excerpt of dmesg follows:

  CPU: Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz (3192.08-MHz K8-class CPU)
    Origin = "GenuineIntel" Id = 0x20652 Family = 6 Model = 25 Stepping = 2
    Features=0xbfebfbff
    Features2=0x298e3ff
    AMD Features=0x28100800
    AMD Features2=0x1
    TSC: P-state invariant, performance statistics
  real memory = 4294967296 (4096 MB)
  avail memory = 3882647552 (3702 MB)
  Event timer "LAPIC" quality 600
  ACPI APIC Table:
  FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
  FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 SMT threads
   cpu0 (BSP): APIC ID: 0
   cpu1 (AP): APIC ID: 1
   cpu2 (AP): APIC ID: 4
   cpu3 (AP): APIC ID: 5

With each compiler, stock GENERIC kernels for amd64 were built from head as of
r240384, for each of the following optimization flags:

  -O2 -frename-registers -pipe -fno-strict-aliasing
  -O1 -pipe
  -O0 -pipe

Note that clang does not support -frename-registers, so it was omitted for the
corresponding kernel builds. No CPU-specific optimization flags (-march=) were
used.

Each kernel was installed into a separate kernel installation directory under
/boot. The system was then booted with each of these kernels, without modifying
anything else, and multiple runs of "make -j8 buildworld" were done.
Between each run, the /usr/obj directory was fully cleaned out, and filesystems
were synced.

The timing results, processed with ministat(1), are below.

Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2 -O0
-----------------------------------------------------------------------------
         N           Min           Max        Median           Avg        Stddev
real     6       6503.62       6527.84       6520.49     6517.2817     8.3845558
user     6      12534.49      12576.55      12555.29     12555.547     14.079771
sys      6        9655.1       9733.92        9716.1     9709.9533     28.981809
maxrss   6        758208        758248        758224     758222.67     13.779213
ixrss    6          4396          4401          4397     4397.1667     1.9407902
idrss    6           523           523           523           523             0
isrss    6           126           126           126           126             0
minflt   6 6.6264519e+08 6.6337812e+08 6.6297908e+08 6.6299306e+08     249092.49
majflt   6          4354         10457          5722     6207.8333     2208.4725
nswap    6            40            56            42     44.333333     6.1210021
inblock  6         25167         44267         29212     31042.667     6677.3727
oublock  6         32801         34666         33500     33635.167     692.27897
msgsnd   6             0             0             0             0             0
msgrcv   6             0             0             0             0             0
nsignals 6         60495         60504         60502         60500     3.5213634
nvcsw    6       1750409       1759010       1754971     1754668.8     3641.3163
nivcsw   6       1867335       1943885       1924258     1909641.2     30495.366

Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2 -O1
-----------------------------------------------------------------------------
         N           Min           Max        Median           Avg        Stddev
real     6       4788.59       4831.96       4798.01      4802.305      15.48322
user     6      12239.94       12285.9      12268.91       12263.5     17.190572
sys      6       4041.05        4100.4       4083.92      4076.235     21.374684
maxrss   6        758212        758256        758256     758242.67     18.532855
ixrss    6          4963          4971          4964     4964.6667     3.1411251
idrss    6           589           590           589     589.16667    0.40824829
isrss    6           132           132           132           132             0
minflt   6  6.617985e+08 6.6339562e+08  6.629315e+08 6.6272587e+08     574835.78
majflt   6          7935         23481         17450     16901.667      5324.564
nswap    6            40            52            48     47.333333     3.9327683
inblock  6         25121         44292         29173     30980.667     6715.0864
oublock  6         24867         28037         26579     26667.167      1162.513
msgsnd   6             0             0             0             0             0
msgrcv   6             0             0             0             0             0
nsignals 6         60492         60500         60498     60496.667     3.4448028
nvcsw    6       1559857       1576788       1562507     1565002.8     6454.8513
nivcsw   6       1632143       1721204       1688209       1682830      35836.46

Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2 -O2
-----------------------------------------------------------------------------
         N           Min           Max        Median           Avg        Stddev
real     6       4780.24       4819.77       4801.98     4798.5867     14.236627
user     6      12242.91      12275.04      12256.37     12255.905     11.676621
sys      6       4052.75       4118.65       4104.76     4096.2217     22.874298
maxrss   6        758220        758256        758256     758244.67     17.603031
ixrss    6          4960          4970          4964     4963.8333     3.4880749
idrss    6           589           590           589     589.16667    0.40824829
isrss    6           132           132           132           132             0
minflt   6 6.6248246e+08 6.6340936e+08 6.6300404e+08 6.6293496e+08     324940.82
majflt   6          4300         22493         14128     12176.833     6396.7734
nswap    6            40            52            48            46     4.8989795
inblock  6         29120         44375         29277         31760     6180.4181
oublock  6         24915         28157         25984     26315.333      1251.164
msgsnd   6             0             0             0             0             0
msgrcv   6             0             0             0             0             0
nsignals 6         60490         60499         60497     60495.667     3.2041639
nvcsw    6       1559291       1575794       1570626     1569117.3      5467.274
nivcsw   6       1593865       1678135       1654604       1640246     31701.067

Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1 -O0
-----------------------------------------------------------------------------
         N           Min           Max        Median           Avg        Stddev
real     6       6083.69       6101.08       6096.85     6094.4383     6.5165003
user     6      12424.93      12462.24      12438.63      12441.97     12.975073
sys      6       8305.66       8394.45       8377.26     8366.4767     32.469675
maxrss   6        758208        758256        758224     758225.33      16.52473
ixrss    6          4481          4491          4484     4484.6667     3.3862467
idrss    6           533           534           533     533.16667    0.40824829
isrss    6           127           127           127           127             0
minflt   6 6.6241224e+08 6.6339646e+08 6.6301629e+08 6.6292507e+08     336924.37
majflt   6          4357          9603          6231     6667.8333     1812.2422
nswap    6            40            48            40     41.666667      3.204164
inblock  6         29162         44302         29272     31759.333     6145.0026
oublock  6         30081         32816         31538       31281.5     1163.8237
msgsnd   6             0             0             0             0             0
msgrcv   6             0             0             0             0             0
nsignals 6         60500         60501         60500     60500.333    0.51639753
nvcsw    6       1701009       1713077       1709140       1707903     3975.4753
nivcsw   6       1854572       1936195       1896858     1894873.2     26725.543

Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1 -O1
-----------------------------------------------------------------------------
         N           Min           Max        Median           Avg        Stddev
real     6       4943.74       4965.28       4955.62       4953.78     7.2888627
user     6      12274.46      12334.13      12322.13     12314.472     21.858036
sys      6       4576.99       4621.09       4617.21       4609.75     16.658918
maxrss   6        758208        758256        758224        758232     19.595918
ixrss    6          4897          4902          4898     4898.6667     1.9663842
idrss    6           581           582           581     581.33333    0.51639778
isrss    6           131           131           131           131             0
minflt   6  6.626435e+08  6.634147e+08 6.6301953e+08  6.629835e+08     279004.88
majflt   6          6092         11215          9188     8755.1667     1849.3565
nswap    6            40            62            48     49.333333     7.1180522
inblock  6         29076         44462         29163         31697     6253.6444
oublock  6         25415         28495         28175     27508.167     1179.5914
msgsnd   6             0             0             0             0             0
msgrcv   6             0             0             0             0             0
nsignals 6         60488         60499         60495     60494.333     3.9832984
nvcsw    6       1575048       1588567       1584504     1582316.7     5705.6913
nivcsw   6       1682902       1745827       1730506     1722802.3     24060.717

Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1 -O2
-----------------------------------------------------------------------------
         N           Min           Max        Median           Avg        Stddev
real     6       4876.16       4901.55       4895.24     4888.7583     10.598318
user     6      12241.35      12306.04      12283.94     12278.767     23.922356
sys      6       4400.43       4452.62       4446.22     4438.0117     19.231095
maxrss   6        758212        758256        758224     758229.33     17.095809
ixrss    6          4899          4905          4900     4900.6667     2.2509257
idrss    6           581           582           582     581.83333    0.40824829
isrss    6           131           131           131           131             0
minflt   6 6.6214332e+08 6.6334997e+08 6.6298766e+08 6.6278723e+08     436172.22
majflt   6          6055         12473          9169        8895.5     2381.6063
nswap    6            40            54            48            48     4.5607017
inblock  6         29193         44443         29313         31804     6192.0071
oublock  6         25113         28152         26770     26490.167     1254.3383
msgsnd   6             0             0             0             0             0
msgrcv   6             0             0             0             0             0
nsignals 6         60496         60501         60499     60498.667     2.2509257
nvcsw    6       1566521       1592140       1579251     1578889.5      9354.883
nivcsw   6       1686675       1809406       1785290     1756283.7     50719.325

Summary:
--------

On a kernel compiled with clang 3.2 -O2, building world in multi-threaded mode
is ~1.9% faster in real time than on a kernel compiled with gcc 4.2.1 -O2, and
~8.3% faster in system time.
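
As a cross-check, the percentages in this summary can be recomputed from the
"Avg" values of the "real" and "sys" rows in the tables above. The Python sketch
below is not part of the original test procedure. It assumes (an inference from
the numbers, not something the text states) that each percentage is the gap
between the two averages, divided by the average of the faster kernel. The
names AVG and percent_gap are made up for this illustration.

```python
# Avg values for the "real" and "sys" rows, copied from the ministat tables.
AVG = {
    "clang -O1": {"real": 4802.305, "sys": 4076.235},
    "clang -O2": {"real": 4798.5867, "sys": 4096.2217},
    "gcc -O1": {"real": 4953.78, "sys": 4609.75},
    "gcc -O2": {"real": 4888.7583, "sys": 4438.0117},
}


def percent_gap(faster, slower, metric):
    """Gap between two kernels' averages, as a percentage of the faster one.

    Assumption: the summary percentages use the faster kernel's average as
    the denominator; this choice reproduces the quoted figures.
    """
    fast = AVG[faster][metric]
    slow = AVG[slower][metric]
    return round((slow - fast) / fast * 100.0, 1)


# The three comparisons for which the summary quotes a percentage.
PAIRS = [
    ("clang -O2", "gcc -O2"),
    ("clang -O1", "gcc -O1"),
    ("gcc -O2", "gcc -O1"),
]

for faster, slower in PAIRS:
    real = percent_gap(faster, slower, "real")
    sys_time = percent_gap(faster, slower, "sys")
    print(f"{faster} vs {slower}: real ~{real}%, sys ~{sys_time}%")
```

Dividing by the slower kernel's average instead would give slightly smaller
figures (for example about 1.8% rather than 1.9% for the -O2 real time), so the
choice of denominator matters when comparing against other reports.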

On a kernel compiled with clang 3.2 -O1, building world in multi-threaded mode
is ~3.2% faster in real time than on a kernel compiled with gcc 4.2.1 -O1, and
~13.1% faster in system time.

On a kernel compiled with gcc 4.2.1 -O2, building world in multi-threaded mode
is ~1.3% faster in real time than on a kernel compiled with gcc 4.2.1 -O1, and
~3.9% faster in system time.

The difference between building world in multi-threaded mode on kernels
compiled with clang 3.2 -O2 and -O1 is not significant (to within 1 standard
deviation).

Conclusion:
-----------

Kernels compiled with clang are a little faster in real time for building world
(roughly 2% to 3% at -O1 and -O2), and in system time the difference is even
larger, roughly 8% to 13%. For clang, the difference between -O1 and -O2 is not
measurable, but for gcc, -O2 is slightly faster than -O1.

================================================================================
Copyright (c) 2012 Dimitry Andric

Verbatim copying and redistribution of this entire text are permitted, provided
this notice is preserved.
================================================================================