[Open SoC Debug] packet vs. memory interface
stefan at wallentowitz.de
Thu Jan 28 14:47:24 CET 2016
On 27.01.2016 20:42, Tim Newsome wrote:
> My thoughts here are that this is a lot of overhead if all you
> care about is accessing memory mapped registers. It seems more
> straightforward to implement a protocol that doesn't require an
> extra layer of packet format on top of a memory bus. Am I
> overestimating the extra complexity here?
I think you are overestimating it a bit, but it also depends on how you
define a bus. If you have a look at state-of-the-art systems, they
actually have little in common with the old tristate lines plus arbiter.
Below I wrote a small overview of interconnect topologies. But the
most important thing to say is that we plan composable modules in a
way that they are split into the Debug Interface Interconnect
(DII)-specific frontend and an interconnect-independent part with the
MMIO interface or similar wherever possible. As a side note: in
trace modules that split sits between the trace generation and the
packetization, but because the trace primitives can generally be of
arbitrary size, that depends a bit on the specific module.
For a general discussion around topologies, let me briefly summarize my
rough knowledge and thoughts:
# Old-School Debug Interconnect
That's of course the good old JTAG. Traditionally, the JTAG TAPs are
chained up in a device and you get one large shift register spanning
your chip. This has changed slightly for modern debug systems, where
you often find a tree of multiplexers that are themselves controlled
through a JTAG register. Slide 10 in  is a good picture of this. There
you can also find the equivalent for trace streams on slide 16.
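To make the daisy-chain idea concrete, here is a small behavioral sketch in Python (not a real JTAG implementation; the register widths are invented): the data registers of several TAPs form one long shift register, and each bit shifted in at TDI pushes everything one position toward TDO.

```python
class TapChain:
    """Model N TAP data registers daisy-chained into one shift path."""

    def __init__(self, reg_widths):
        # One bit list per TAP; index 0 is the register closest to TDI.
        self.regs = [[0] * w for w in reg_widths]

    def shift(self, tdi_bit):
        """Shift one bit through the whole chain; return the TDO bit."""
        carry = tdi_bit
        for reg in self.regs:
            # The bit leaving this register feeds the next one.
            carry, reg[:] = reg[-1], [carry] + reg[:-1]
        return carry

    def total_length(self):
        return sum(len(r) for r in self.regs)


chain = TapChain([4, 8, 4])          # three TAPs, 16 bits in total
pattern = [1, 0, 1, 1] + [0] * 12    # bits to load into the chain
for bit in reversed(pattern):        # shift total_length() bits in
    chain.shift(bit)
# The pattern now occupies the whole chain, front to back.
```

This is also why plain JTAG scales badly: every access has to shift through the full length of the chain, which is what the multiplexer trees avoid.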
# The Bus
In the old days a bus was a shared medium with tri-state drivers and
an arbiter. In on-chip implementations this is very rare nowadays, and
on FPGAs internal tri-state buses have not even been possible since the
Virtex-II Pro. Instead there are a few building blocks: one
#Masters-Mux, one arbiter, one #Slave-Demux and an address decoder. I
put up a rough sketch in . To increase the throughput you can use a
crossbar instead, and many processors actually did. There you have
#Masters many (#Slave-Demux, address decoder) pairs and #Slaves many
(#Masters-Mux, arbiter) pairs. I have similarly drawn it in .
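As a rough behavioral sketch of those two building blocks (the address map and the round-robin policy are made-up assumptions for illustration, not from any specific bus):

```python
def decode(addr, address_map):
    """Address decoder: return the index of the slave whose
    (base, size) region contains addr."""
    for idx, (base, size) in enumerate(address_map):
        if base <= addr < base + size:
            return idx
    raise ValueError("address %#x not mapped" % addr)

def arbitrate(requests, last_grant):
    """Simple round-robin arbiter: grant the next requesting master
    after the previously granted one; None if nobody requests."""
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_grant + offset) % n
        if requests[candidate]:
            return candidate
    return None

# Hypothetical two-slave address map: 4 KiB each.
ADDRESS_MAP = [(0x0000, 0x1000), (0x1000, 0x1000)]
grant = arbitrate([True, True], last_grant=0)  # master 1 goes next
slave = decode(0x1004, ADDRESS_MAP)            # lands in slave 1
```

In a shared bus there is one such (arbiter, decoder) pair; in a crossbar each master gets its own decoder and each slave its own arbiter, which is exactly where the extra area goes.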
# The Ring
The ring is essentially the simplest network-on-chip, built from
point-to-point connections. As you say, it is generally packet-based,
as in our case. Each ring router looks like : two 2-demuxes, two
comparators, two 2-muxes, two arbiters and buffers. The buffers allow
parallel transmission of multiple packets by partitioning the ring,
and they increase the achievable clock speed, which makes the ring a
very fast interconnect. If you look at current Intel processors, the
previous crossbar between the cores and slaves has now been replaced
with a ring .
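A minimal behavioral sketch of one such router hop, assuming a simple (destination, payload) packet format: the comparator checks whether the incoming packet's destination matches the local node and either ejects it or forwards it to the next hop.

```python
from collections import deque

class RingRouter:
    """One node on the ring: compare destination, eject or forward."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.local_out = deque()    # ejected packets, to the local module
        self.forward_buf = deque()  # buffer toward the next router

    def route(self, packet):
        dest, _payload = packet
        if dest == self.node_id:
            self.local_out.append(packet)    # eject at destination
        else:
            self.forward_buf.append(packet)  # pass along the ring

def deliver(routers, start, packet):
    """Walk a packet around the ring from `start`; return hop count."""
    idx, hops = start, 0
    while True:
        router = routers[idx]
        router.route(packet)
        if router.local_out and router.local_out[-1] is packet:
            return hops
        packet = router.forward_buf.pop()
        idx = (idx + 1) % len(routers)
        hops += 1

routers = [RingRouter(i) for i in range(4)]
hops = deliver(routers, 0, (2, "trace word"))  # node 0 -> node 2
```

The per-hop buffers are what lets several packets be in flight on different ring segments at the same time.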
# The modern channel-based interfaces: AXI, NASTI, TileLink
The description of a bus above pretty much matches what you find in
simple AHB buses etc. For modern interfaces like AXI (or NASTI or
TileLink) the design changes a bit. Due to their channel-based nature
the interconnects don't share a common medium; instead the different
requests and responses are very much decoupled. The AXI Interconnect
from Xilinx is still a bus or crossbar on the channels . Given that
each channel is pretty wide, adding a port actually implies a
relatively large amount of logic. If you use the ARM CoreLink NIC-400
in a large design, the thin link (TLX) feature is often used. What it
does is serialize and de-serialize the requests and responses to
reduce the internal connectivity. I roughly depicted how SoC
interconnects then look in . On the outside there are protocol
adapters, and internally some kind of packet format is actually routed
between them.
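The thin-link idea can be sketched as plain serialization: a wide channel word is chopped into narrow flits, sent over the thin internal link, and reassembled on the far side. The 8-bit flit width and the 32-bit request word below are invented for illustration; the real TLX protocol is of course more involved.

```python
FLIT_BITS = 8  # assumed thin-link width

def serialize(value, width_bits):
    """Split a wide value into FLIT_BITS-wide flits, LSB-first."""
    flits = []
    for _ in range((width_bits + FLIT_BITS - 1) // FLIT_BITS):
        flits.append(value & ((1 << FLIT_BITS) - 1))
        value >>= FLIT_BITS
    return flits

def deserialize(flits):
    """Reassemble flits back into the original wide value."""
    value = 0
    for i, flit in enumerate(flits):
        value |= flit << (i * FLIT_BITS)
    return value

request = 0xDEADBEEF               # a 32-bit "wide channel" word
flits = serialize(request, 32)     # four 8-bit flits on the thin link
assert deserialize(flits) == request
```

The trade-off is the usual one: fewer wires inside the interconnect, paid for with extra cycles per transfer.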
# Conclusion for Debug Interconnect
I am sorry that this got rather long. But the point I wanted to make
is that the ring is actually not that complex if you compare it to
what's in the field as system buses. As a packet-based interconnect
its advantages are better scalability, the ability to span a chip
thanks to the buffer partitioning, and higher speed. Nevertheless the
throughput is not that high, so I am really looking forward to
exploring more sophisticated topologies.
A further justification: the decision for a packet-based interconnect
actually stems from the demand for trace-based debugging, which I
personally see as much more important in future systems. Using an MMIO
interface here is also possible, but then you map it to something like
writes with variable-length bursts to a static FIFO address. That's
not much different from the packet-based interconnect then.
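A tiny sketch of that FIFO-style mapping (the register address and the length-header format are assumptions for illustration): a variable-length burst of writes to one fixed address carries the same information as a packet would.

```python
FIFO_ADDR = 0x2000  # hypothetical memory-mapped trace FIFO register

class TraceFifo:
    """Sink that only accepts writes to the single FIFO address."""

    def __init__(self):
        self.words = []

    def mmio_write(self, addr, word):
        if addr != FIFO_ADDR:
            raise ValueError("only the FIFO register is modeled")
        self.words.append(word)

def send_packet(fifo, payload_words):
    """Variable-length burst: a length header word, then the payload."""
    fifo.mmio_write(FIFO_ADDR, len(payload_words))
    for word in payload_words:
        fifo.mmio_write(FIFO_ADDR, word)

fifo = TraceFifo()
send_packet(fifo, [0x11, 0x22, 0x33])
# fifo.words now holds [3, 0x11, 0x22, 0x33] -- a packet, in effect.
```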