[Embench] How to measure code size fairly

Ofer Shinaar Ofer.Shinaar at wdc.com
Wed Sep 4 14:43:21 CEST 2019


On 03-Sep-19 13:10, Jon Taylor wrote:

Hi Jon,
Regarding the context switch, you are right; let's take this to a
different thread and continue the discussion there.
As for library usage, I don't think we should ignore it; on the
contrary, any test case is good.
My point is that a real embedded project with a small memory budget
(like IoT) will not take library benchmark results as a rule of thumb
that one ISA gives better results than another. It might, but it is
not certain (we know that companies sometimes tweak a compiler or core
just to get a good CoreMark score, as an example).

Such devices will usually prefer to have hardware IP blocks process
those algorithms, with the MCU just controlling them.
This means, for example, that memory accesses, loops, inlining and more
will affect the size more than library usage.
Based on that, I proposed my two options. The RTOS suggestion is just
because an RTOS accesses memory a lot and contains varied code. But we
can add more... for example, drivers are very real embedded usage:
SPI, I2C, UART, memory matrix operations, TCP/IP, etc.

Thanks,
Ofer


> Hi Ofer,
>
> I take your point that the kernels may not be the dominant factor in an overall application size; but I'm still concerned that we shouldn't be ignoring library size, particularly when libraries may be the most significant fraction of a kernel. If it's a small fraction - then great, it doesn't matter overall anyway. But for tests (such as cubic) where it is a very large fraction, it seems odd to choose to ignore it.
>
> Having some more complete applications is potentially interesting, although the scope gets harder to strictly define to ensure comparisons are like-for-like. For instance you could also then consider aspects such as MPU/PMP reprogramming, and context switch timings may become more meaningful if you're measuring OS behaviour, rather than just the kernel of a context switch. But I think that's a whole other thread and we still need to get the basic elements (such as context switch kernel) in place first.
>
> Regards,
>
> Jon
>
>> -----Original Message-----
>> From: Ofer Shinaar <Ofer.Shinaar at wdc.com>
>> Sent: 29 August 2019 16:42
>> To: Jon Taylor <Jon.Taylor at arm.com>; embench at lists.librecores.org
>> Cc: nd <nd at arm.com>
>> Subject: RE: [Embench] How to measure code size fairly
>>
>> Hi Jon,
>> I want to share my two cents regarding code size.
>> Measuring code size calls for a different approach than measuring the
>> performance of synthetic/non-synthetic benchmarks.
>> While performance is tested over libraries and application code (like
>> CRC, SHA, Fourier transform, etc.), measuring size over those will be
>> largely irrelevant, since embedded firmware will usually have more
>> "control code" than "library usage".
>> For example, firmware can use a JPEG encoder that takes 4 kB on some
>> target, while the overall program code is 100 or 1000 times bigger.
>> Today's IoT devices are "fighting" over several bytes (we call small
>> embedded devices IoT today just because it fits the concept, but we
>> can have big ones as well). So how should we measure code size?
>>
>> Well, I think practical comparison is one of the options. If we spot
>> code that shows a difference between ARM/RISC-V/x86/other, we can
>> use it as a "test case".
>> Such code will have varied C functionality (loops, ifs, inlining,
>> etc.). Of course this will depend massively on the compiler, but also
>> on the ISA and ABI rules; we have already spotted cases like this
>> internally and we are open sourcing those test cases.
>>
>> Another approach would be to use "big FW applications" that exercise
>> a lot of varied C functionality, such as an RTOS.
>> For example, we could examine the size of FreeRTOS built for RV32IMC
>> vs. ARM (Thumb-2). This would be very interesting for small embedded
>> devices that depend on an RTOS, as it could highlight how much better
>> or worse one target is than the other from a size perspective.
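>>
>> As a rough sketch, a comparison like that could be scripted with the
>> GNU "size" tool; the ELF names and toolchain prefixes below are only
>> assumptions about what the two builds produce:
>>
>>     import subprocess
>>
>>     def text_size(size_tool, elf):
>>         # GNU size, Berkeley format: a header line, then
>>         # "text data bss dec hex filename"
>>         out = subprocess.check_output([size_tool, elf], text=True)
>>         return int(out.splitlines()[1].split()[0])
>>
>>     rv = text_size("riscv32-unknown-elf-size", "freertos-rv32imc.elf")
>>     arm = text_size("arm-none-eabi-size", "freertos-thumb2.elf")
>>     print(f"RV32IMC {rv} B, Thumb-2 {arm} B, ratio {rv / arm:.2f}")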
>>
>> Thanks,
>> Ofer
>>
>>
>>
>>> -----Original Message-----
>>> From: Embench [mailto:embench-bounces at lists.librecores.org] On Behalf
>>> Of Jon Taylor
>>> Sent: Thursday, August 29, 2019 10:16
>>> To: embench at lists.librecores.org
>>> Cc: nd <nd at arm.com>
>>> Subject: Re: [Embench] How to measure code size fairly
>>>
>>> Thanks Jeremy.
>>>
>>> Firstly my opinion is that any code we're measuring the size or
>>> performance of needs to be functional. If an algorithm requires lots
>>> of maths library code (such as cubic), there is a benefit to having an
>>> optimised library available and that should be reflected in a
>>> benchmark score. This could also include allowing a library optimised
>>> for a processor with custom instruction extensions. I'm really not
>>> sure what measuring the performance of something that can't be
>>> executed tells us - for example "cubic" is about 1k of code with
>>> dummy libraries, but ~9k with libraries (Arm GCC, building with
>>> -O2). We wouldn't measure the runtime without libraries, so why
>>> would measuring the size without libraries be considered valid?
>>> Having said that, I think it likely (particularly for benchmarks run
>>> on actual hardware) that use of printf will be desirable for
>>> recording the runtime (e.g. via a UART, trace port or other
>>> mechanism), but measuring the size of the printf library is not
>>> helpful because it's effectively only for debug, not functional
>>> purposes. Comparing code with and without printf, the printf library
>>> adds ~20k to Arm code size, and ~60k to RISC-V; when many of the
>>> tests are a kB or two in size, this massively distorts the results.
>>> Having an empty test allows this to be discarded, since the printf
>>> would be in common code and thus compiled into the empty test too.
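>>>
>>> As a minimal sketch of that subtraction (the ELF names are
>>> hypothetical, and both binaries are assumed to link the same boot,
>>> timer and printf support code):
>>>
>>>     import subprocess
>>>
>>>     def code_size(elf):
>>>         # GNU size, Berkeley format: text and data are the first
>>>         # two fields on the values line
>>>         out = subprocess.check_output(["arm-none-eabi-size", elf],
>>>                                       text=True)
>>>         text, data = out.splitlines()[1].split()[:2]
>>>         return int(text) + int(data)
>>>
>>>     # benchmark-only size = full benchmark minus the empty test
>>>     print(code_size("crc32.elf") - code_size("empty.elf"), "bytes")
>>>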
>>> I'm not sure I understand the point about needing a different dummy
>>> for each benchmark. My expectation is that a test consists of:
>>> <bootcode>
>>> <test initialisation>
>>> <start timer>
>>> <test>
>>> <stop timer>
>>> <possible cleanup code>
>>>
>>> We want to discount everything that is not <test> - and an empty test
>>> would achieve this (assuming that we are happy counting library code
>>> that is required by the benchmarks). Everything outside of <test>
>>> should be common code across all of the tests, so only a single dummy
>>> is needed. I do think we need to allow for LTO being used as it can
>>> offer some significant size and performance benefits, but we should
>>> investigate whether it distorts the results significantly.
>>>
>>> Kind regards,
>>>
>>> Jon
>>>
>>>> -----Original Message-----
>>>> From: Embench <embench-bounces at lists.librecores.org> On Behalf Of
>>>> Jeremy Bennett
>>>> Sent: 26 August 2019 19:36
>>>> To: embench at lists.librecores.org
>>>> Subject: [Embench] How to measure code size fairly
>>>>
>>>> Hi all,
>>>>
>>>> Jon Taylor from ARM has posed some useful questions about how
>>>> Embench measures code size. This is a new thread to get input from
>>>> the community.
>>>> I think we can do better, and would welcome advice on improved
>>>> approaches.
>>>>
>>>> Background
>>>> ----------
>>>>
>>>> At present, the scripts measure size by building a benchmark with
>>>> dummy libraries and dummy startup code. This minimizes the impact
>>>> of such code on the measurement. Since libraries are not typically
>>>> rebuilt with the same compiler options, they can provide a constant
>>>> bias on each benchmark measurement.
>>>>
>>>> This is particularly the case with the relatively small benchmarks
>>>> we have in Embench. We can see this if we compare ARM and RISC-V
>>>> benchmarks out of the box. Most of the time ARM appears to be much
>>>> larger, but this is because its startup code is much more general
>>>> purpose than RISC-V's, and adds 4 Kbyte to the code size. Strip
>>>> this out and ARM code comes out generally somewhat smaller than
>>>> RISC-V.
>>>>
>>>> Conversely, in the few benchmarks that have floating point
>>>> calculations, ARM does very well, due to its hand-optimized
>>>> floating point library.
>>>>
>>>> By using dummy startup code and libraries, we can remove this bias.
>>>>
>>>> However...
>>>>
>>>> The programs will not then execute, so there is no guarantee that
>>>> the compiler has generated correct code. There is also much greater
>>>> potential for global inter-procedural optimization (LTO) than would
>>>> be the case with real libraries.
>>>>
>>>> I refer to this current approach as "Option 0". Here are some other
>>>> options which might be better.
>>>>
>>>> Option 1: Just accept the bias
>>>> ------------------------------
>>>>
>>>> We could just accept that the bias is there, and use size as measured.
>>>> This option relies on very few assumptions about the target and tools.
>>>>
>>>> The problem with this is that, with small programs, the bias is
>>>> substantial and we lose a lot of insight. Instead of being able to
>>>> see which architecture and compiler features are beneficial, we
>>>> just measure start-up code and library design for the architecture.
>>>>
>>>> Option 2: Have a dummy benchmark with no code to subtract
>>>> ---------------------------------------------------------
>>>>
>>>> This would give us a good result, but with garbage collection of
>>>> sections, modern tool chains only link in the code they actually
>>>> use. So we would need a different dummy for each benchmark,
>>>> potentially quite complex to construct. This gets even harder with
>>>> LTO, potentially moving code in and out of libraries.
>>>>
>>>> This option starts to require more assumptions about the target and tools.
>>>>
>>>> Option 3: Just count the size of the object files before linking
>>>> ----------------------------------------------------------------
>>>>
>>>> This is relatively straightforward to do. The problem is that it
>>>> precludes any benchmarking of link time optimizations such as
>>>> global inter-procedural optimization (LTO). Given the importance of
>>>> such techniques, this significantly reduces the value of Embench to
>>>> the compiler community.
>>>> This option makes relatively few assumptions about the target
>>>> architecture and tools.
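>>>>
>>>> As a rough sketch of option 3 (the object list is a placeholder,
>>>> and "size" stands for the target toolchain's GNU size tool):
>>>>
>>>>     import subprocess
>>>>
>>>>     def object_text_size(obj):
>>>>         # "size -A" (sysv format) prints one section per line:
>>>>         # name size address
>>>>         out = subprocess.check_output(["size", "-A", obj], text=True)
>>>>         return sum(int(f[1])
>>>>                    for f in (l.split() for l in out.splitlines())
>>>>                    if len(f) >= 2 and f[0].startswith(".text"))
>>>>
>>>>     objs = ["crc32.o", "support.o"]   # placeholder object files
>>>>     print(sum(object_text_size(o) for o in objs), "bytes pre-link")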
>>>>
>>>> Option 4: Subtract the size of the startup and library code
>>>> -----------------------------------------------------------
>>>>
>>>> We can look at the compiled binary and subtract any code/data
>>>> associated with libraries and startup.
>>>>
>>>> This would be compatible with link time optimizations, although with
>>>> a measurement error if such optimizations migrate benchmark code
>>>> to/from library code.
>>>>
>>>> This option makes assumptions about code and data layout, for
>>>> example that a function starts at its label and ends at the label
>>>> with the next highest address.
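>>>>
>>>> A sketch of what option 4 might look like; the ELF name and the
>>>> set of symbols to discount are purely illustrative (a real script
>>>> would derive the list from the startup and library objects):
>>>>
>>>>     import subprocess
>>>>
>>>>     LIBRARY_SYMS = {"_start", "memcpy", "__libc_init_array"}
>>>>
>>>>     def code_symbol_sizes(elf):
>>>>         # "nm -S" prints "address size type name" for symbols that
>>>>         # carry a size; keep only code symbols (type T or t)
>>>>         out = subprocess.check_output(["nm", "-S", elf], text=True)
>>>>         return {f[3]: int(f[1], 16)
>>>>                 for f in (l.split() for l in out.splitlines())
>>>>                 if len(f) == 4 and f[2] in "Tt"}
>>>>
>>>>     sizes = code_symbol_sizes("crc32.elf")
>>>>     discount = sum(sizes.get(s, 0) for s in LIBRARY_SYMS)
>>>>     print(sum(sizes.values()) - discount, "bytes")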
>>>>
>>>> Option 5: Link but measure only benchmark code
>>>> ----------------------------------------------
>>>>
>>>> This is a combination of options 3 and 4. We look at the benchmark
>>>> code pre-linking to determine the symbols used in the benchmark
>>>> code and data. We then link and only count the size of the symbols
>>>> from the benchmark code.
>>>> Also potentially vulnerable to error with link time optimizations,
>>>> and makes all the same assumptions as options 3 and 4.
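>>>>
>>>> A sketch of option 5, reusing code_symbol_sizes() from the option 4
>>>> sketch above (object and ELF names are again placeholders):
>>>>
>>>>     import subprocess
>>>>
>>>>     def defined_symbols(obj):
>>>>         # defined symbols print as "value type name"; undefined
>>>>         # ones ("U name") have fewer fields and are skipped
>>>>         out = subprocess.check_output(["nm", obj], text=True)
>>>>         return {f[2] for f in (l.split() for l in out.splitlines())
>>>>                 if len(f) == 3 and f[1] != "U"}
>>>>
>>>>     bench_syms = defined_symbols("crc32.o")
>>>>     linked = code_symbol_sizes("crc32.elf")  # see option 4 sketch
>>>>     print(sum(sz for s, sz in linked.items() if s in bench_syms),
>>>>           "bytes")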
>>>>
>>>> Option 6: Statistically eliminate the bias
>>>> ------------------------------------------
>>>>
>>>> This uses the current option 0 together with option 1 to provide a
>>>> per-benchmark estimate of startup and library code size. This still
>>>> actually includes dummy code size, but potentially option 4 could
>>>> be used to estimate this.
>>>> This makes relatively few assumptions about target and tools (at
>>>> least without option 4), but might be hard to explain to people.
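>>>>
>>>> A toy illustration of option 6; all the numbers are made up, and
>>>> the model assumes the real link adds a roughly constant
>>>> per-toolchain bias on top of the dummy-library size:
>>>>
>>>>     option0 = {"crc32": 1200, "cubic": 1100, "aha-mont64": 1500}
>>>>     option1 = {"crc32": 5300, "cubic": 9200, "aha-mont64": 5600}
>>>>
>>>>     # the mean difference is the least-squares fit of a constant
>>>>     diffs = [option1[b] - option0[b] for b in option0]
>>>>     bias = sum(diffs) / len(diffs)
>>>>     print("estimated startup/library bias:", bias, "bytes")
>>>>     print({b: option1[b] - bias for b in option0})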
>>>>
>>>>
>>>> Feedback very welcome.
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Jeremy
>>>>
>>>> --
>>>> Tel: +44 (1590) 610184
>>>> Cell: +44 (7970) 676050
>>>> SkypeID: jeremybennett
>>>> Twitter: @jeremypbennett
>>>> Email: jeremy.bennett at embecosm.com
>>>> Web: www.embecosm.com
>>>> PGP key: 1024D/BEF58172FB4754E1 2009-03-20