[Librecores Discussion] IEEE754 FPU in nmigen

Luke Kenneth Casson Leighton lkcl at lkcl.net
Thu Mar 28 21:15:34 CET 2019


Rudi wrote:

> Curious: What is the cycle time of the generated code versus
> native verilog code ?

hi rudi, apologies for taking such a long time to reply, i've been
extremely busy doing a massive redesign of the jon dawson IEEE754 FPU
into a modular pipeline-based design.

honestly it's hard to tell.  whilst hand-written verilog uses if /
else if / else constructs, yosys, through its AST, generates *casez*
(not case, *casez*) mutually-exclusive statements where only one bit
is set in each case, and the boolean logic that would normally be
carried out by a verilog-to-gate-level converter (chaining the
inversion of each if condition and ANDing it with the next if test) is
handled by yosys directly.

so, where normally you'd see an "if else if else if else if" single
statement, instead you get a block of 1-bit boolean tests joined
together that in turn go into an assigner.
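to illustrate, here's a minimal plain-python sketch (not nmigen or
yosys code, and the function names are mine) of that transformation: a
priority chain of conditions becomes a set of mutually-exclusive 1-bit
select lines, each one its own condition ANDed with the inversion of
every earlier condition, all feeding a single assigner (a mux):

```python
def priority_selects(conds):
    """turn a list of condition bits (the if / else-if tests, in
       priority order) into one-hot, mutually-exclusive select bits:
       each select is its condition ANDed with the inversion of every
       condition before it, so at most one select is ever set."""
    selects = []
    none_before = 1   # 1 until some earlier condition has fired
    for c in conds:
        selects.append(c & none_before)
        none_before &= (~c) & 1
    return selects

def assigner(selects, values, default):
    """the single 'assigner': pick the value whose select bit is set,
       falling back to the default (the final else)."""
    for s, v in zip(selects, values):
        if s:
            return v
    return default
```

running `priority_selects([0, 1, 1])` gives `[0, 1, 0]`: the second
condition wins and the third is masked off, exactly the behaviour of
an if / else-if chain collapsed into parallel mutually-exclusive
cases.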

actually... *multiple* assigners, because yosys splits out each
separate individual variable (register), creating a totally separate
state machine for each register / input / output.  that makes reading
the generated output a pain in the neck, although it's quite
understandable why it's done.

the point is, then, that it's not *nmigen* that's making these
decisions: it's yosys.

that having been said, it's actually fine (other than being hard to
read).  i've given up reading the auto-generated verilog, choosing
instead to generate yosys "ilang" files (its native intermediary
language), and to look at the graphviz interpretation of them rather
than the actual .il files.

i quickly became able to identify latches (a register looped directly
back to itself that also has CLK coming in), and have found that by
splitting down into modules at every possible opportunity, the graphs
become really easy to understand, making it straightforward to verify
that the code is sane.

that, and an awful lot of unit tests, side-by-side with gtkwave output :)

anyway, i'm delighted to be able to say that after something close to
6 weeks i have a 3-stage pipeline adder *and* a finite-state-machine
version using the exact same codebase, plus an extensive pipeline API
that can handle buffered [1] and unbuffered handshaking, and both a
multi-input (fan-in) stage *and* a multi-output (fan-out) stage that
may be used as the basis for a Reservation Station / Function Unit in
an out-of-order design.
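the "same codebase, two implementations" idea can be sketched in plain
python (this is not the actual pipeline API: the stage names and
functions here are invented placeholders).  each stage is just a
combinatorial function, and the identical list of stage functions can
be run either as a clocked pipeline with a register between each
stage, or stepped one stage at a time like an FSM:

```python
# hypothetical placeholder stages (real ones would be alignment,
# mantissa add, normalisation etc.)
def stage_a(x): return x + 1
def stage_b(x): return x * 2
def stage_c(x): return x - 3

STAGES = [stage_a, stage_b, stage_c]

def run_fsm(x):
    """FSM-style: one stage per step through a single state register."""
    for f in STAGES:
        x = f(x)
    return x

def run_pipelined(inputs):
    """pipeline-style: regs[i] holds the output of stage i.  one new
       input enters per clock; None acts as a bubble to flush the pipe.
       returns results in input order."""
    regs = [None] * len(STAGES)
    results = []
    for data in list(inputs) + [None] * len(STAGES):
        out = regs[-1]                       # value leaving the pipe
        for i in range(len(STAGES) - 1, 0, -1):
            regs[i] = None if regs[i - 1] is None else STAGES[i](regs[i - 1])
        regs[0] = None if data is None else STAGES[0](data)
        if out is not None:
            results.append(out)
    return results
```

both paths compute the same function, so
`run_pipelined([1, 2, 3])` equals `[run_fsm(1), run_fsm(2),
run_fsm(3)]`; the pipeline simply trades latency for one result per
clock once full.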

the possibility also exists to use the fan-in and fan-out stages to
create early-out (branching / rejoining) pipelines (which is fine for
an OoO design).

i'm now going to look at doing something a little more sophisticated
(more python-like), as i really liked this code by pyrtl when i first
encountered it:
https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/add/pipeline_example.py;h=544b745b0a5d7b710b7d9eea38397acab5f4799a;hb=HEAD

it's extremely clever: it over-rides __getattr__ and uses that to
detect the first allocation of a register, *automatically* creating a
verilog variable prefixed by the current stage name and initialising
it from the *previous* stage.  the end result is code that looks
clean, obvious and in sequence, using the same variable names in each
stage.
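the trick can be sketched in a few lines of plain python (this is my
own illustration, not the linked pyrtl-derived code, and it hooks
__setattr__ as well as __getattr__ to catch the first allocation):

```python
class Stage:
    """each attribute becomes a stage-local 'register', named with the
       stage prefix; reading a name not yet set in this stage falls
       through to the previous stage, i.e. the register is initialised
       from stage n-1."""
    def __init__(self, name, prev=None):
        # object.__setattr__ avoids triggering our own hook below
        object.__setattr__(self, "_name", name)
        object.__setattr__(self, "_prev", prev)
        object.__setattr__(self, "_regs", {})

    def __setattr__(self, attr, value):
        # first (and any subsequent) allocation lands in this stage
        self._regs[attr] = value

    def __getattr__(self, attr):
        # only called when normal lookup fails, i.e. for register names
        regs = object.__getattribute__(self, "_regs")
        if attr in regs:
            return regs[attr]
        prev = object.__getattribute__(self, "_prev")
        if prev is not None:
            return getattr(prev, attr)   # pull from the previous stage
        raise AttributeError(attr)

    def regname(self, attr):
        # the stage-prefixed variable name that would be emitted
        return "%s_%s" % (self._name, attr)
```

so `s2.n = s2.n + 1` in a stage reads `n` from the previous stage and
writes a new `s2_n`, letting every stage use the same variable names
in sequence, exactly as described above.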

l.

[1] https://zipcpu.com/blog/2017/08/14/strategies-for-pipelining.html

