[Embench] WD Code size compiler tests

Roger Shepherd roger.shepherd at chipless.eu
Fri May 14 12:39:29 CEST 2021

I replied directly to Nidal about this which was probably not the right way to have done things. Prompted by Jeremy’s response, I’ll copy what I said to Nidal here. I’ll leave it to Nidal to add his response.


I’ve had a careful look at a couple of the benchmarks and I think they give me a feel for what’s been done. I do like the idea of having a set of Embench compiler tests, and these could be a good starting point. Before we put too much effort into setting things up, I think we need to try to agree a set of achievable and useful objectives. I think we need to cover

1. Audience/Users

2. Specific aspects of compiler that we’re trying to assess

3. Assessment methodology

4. Anything else

To get things going, let me make some suggestions - if only so that we have something concrete to disagree with!

1. People wanting to assess compiler maturity. This includes compiler developers and users.

2. A set of particular scenarios where compilers in the past have shown a lack of specific optimisations

3. Compilation of the source code. An automated measurement of the quality of the generated code; perhaps size is good enough.

It’s not clear that “size” provides a useful metric for comparing maturity between architectures, or between different ABIs on the same architecture. I suspect that if I were undertaking this work myself, I’d probably eyeball the code to see what was going on rather than rely on any automated measurement, but I think for Embench we should do better than that.

4. There must be a check that the code will execute as expected.

I don’t trust code that doesn’t execute. But making code executable has its own problems - in particular, the extra code needed may make it difficult to see what is happening regarding the particular optimisation being checked for.

Before we get too dragged into the details of 3 and 4, I do think we need to check that we have 1, the audience, sorted out. We have a different task on our hands if we are looking at providing a set of microbenchmarks for architects and tool developers than if we are trying to provide benchmarks to help users select cores and tools.


In the light of this my response to Jeremy’s mail is:

> On 02/05/2021 11:57, Nidal Faour wrote:
>> Hi all,
>> I would like to have your opinion about opening a new repo under the Embench to hold the test cases done by WD. (located at the following link)
>> https://github.com/westerndigitalcorporation/riscv32-Code-density-test-bench
> Hi Nidal,
> I'm up for this. I think we need to get buy-in from the Embench community. What do others think?
It needs buy-in and commitment. That means some people need to commit to working on it, rather than just agreeing it is a good thing.

>> My suggestion is to have a new repo to hold these tests and labeled “compiler benchmark” or whatever better name you may suggest.
> That would be the thing to do. We'd need to make sure it was consistent with the 7 principles behind Embench. One that springs to mind is the need to make sure that the benchmarks are self-verifying.

Yes, but… see my (second) point 4 above. I think the tests are probably useful even if they aren’t benchmarks as such (e.g. “here are some useful code fragments for assessing compiler maturity/code size”), but that might be beyond the scope of Embench. I think this is also tied in with the questions of who will use them and what they will use them for.

>> Also, I think it would be better to add a new column labeled “Compiler” to the Embench Baseline Data slide to emphasize that these tests are made to test the compiler maturity for the specific target.
> Well they test three things - the processor, the compiler and the libraries. There is always the issue of how much information to put on the summary.
>> Just to remind you, here is the Baseline Data table I’m referring to as an example that I took from one of Jeremy's presentations. (see the last column I’ve added)
>> Name        Comments                   Orig Source  C LOC  code size  data size  time (ms)  branch  memory  compute  Compiler
>> aha-mont64  Montgomery multiplication  AHA            162      1,072          0      4,004   low     low     high
>> crc32       CRC error checking 32b     MiBench        101        284      1,024      4,010   high    med     low
>> cubic       Cubic root solver          MiBench        125      1,584          0      3,831   low     med     med
> I don't think the compiler alone adds much. What matters is all the supplementary data (in the JSON file) providing all the details to reproduce the results.
Agreed that providing the information needed to enable reproducibility is key for benchmarks.

