September 26th, 2019 – By: Dominik Strasse
Expectations for C++/SystemC designs must be set properly.
Not so long ago, some EDA vendors were painting a very attractive picture of chip design in the then-near future. The idea was that an architectural team would write a single description of the complete system in some high-level language, usually C/C++/SystemC, and that a new class of EDA tool would automatically partition the design into hardware and software, choosing the functionality of each to meet the project’s power/performance/area (PPA) goals. Such a tool would generate SystemVerilog RTL for the hardware, which would be turned into a chip (or chips) by logic synthesis. The tool would generate C code for the software portions, in the form of programs that could be compiled directly and run on the hardware. The system-level description would remain the golden source, and neither hardware engineers nor programmers would have to make any changes to the generated design and software. Further, virtually all functional verification would be done on the system-level model.
This was a nice vision while it lasted, but the industry has been able to take only relatively small steps toward its realisation. Automatic partitioning has proven to be a very difficult problem, due in part to the different ways that software and RTL are written. As one example, hardware generally stores the least amount of data needed to get started processing. A huge amount of streaming data can be processed even though only a portion of it may be within the chip at any given point in time. The hardware micro-architecture must reflect this. Software, on the other hand, tends to keep large buffers in memory. In software, the data is generally persistent; in hardware, it’s temporal. Software also tends to regard reading and writing memory as essentially free, whereas memory accesses in hardware may have a high cost.
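The buffering difference can be made concrete with a small C++ sketch. It is only an illustration under assumed names (kWindow, window_sum_buffered, WindowSumStream and check_equivalence are all hypothetical, not taken from any tool or design): the first function is written the way software typically is, with the whole input held in memory, while the class keeps only the last few samples, roughly as a streaming hardware block would.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical window length, chosen only for illustration.
constexpr std::size_t kWindow = 4;

// Software style: the complete input sits in a buffer and is re-read freely.
std::vector<std::uint32_t> window_sum_buffered(const std::vector<std::uint32_t>& in) {
    std::vector<std::uint32_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const std::size_t first = (i + 1 >= kWindow) ? i + 1 - kWindow : 0;
        std::uint32_t sum = 0;
        for (std::size_t j = first; j <= i; ++j) {
            sum += in[j];  // every sample is fetched from memory up to kWindow times
        }
        out[i] = sum;
    }
    return out;
}

// Hardware style: only kWindow samples are stored, one sample arrives per call.
class WindowSumStream {
public:
    std::uint32_t push(std::uint32_t sample) {
        sum_ += sample - taps_[head_];  // add the newest sample, drop the oldest
        taps_[head_] = sample;
        head_ = (head_ + 1) % kWindow;
        return sum_;
    }

private:
    std::array<std::uint32_t, kWindow> taps_{};  // small, fixed-size state
    std::uint32_t sum_ = 0;
    std::size_t head_ = 0;
};

// Checks that both styles compute the same result for a given input.
void check_equivalence(const std::vector<std::uint32_t>& in) {
    const std::vector<std::uint32_t> expected = window_sum_buffered(in);
    WindowSumStream stream;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const std::uint32_t got = stream.push(in[i]);
        assert(got == expected[i]);
        (void)got;  // avoids an unused-variable warning when NDEBUG is defined
    }
}
```

Both versions produce the same sums, but the streaming form needs storage for only kWindow samples regardless of how long the input is, which is the kind of restructuring a hardware micro-architecture has to reflect.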
Most software and system-level models are written without a specific hardware architecture in mind. An algorithm might read a series of locations from memory (for example, a packet header), process them, and serially write them back or into another set of memory locations. This would be inefficient in hardware, so parallelism and pipelining are used. Expecting an EDA tool to take a system-level description and define the entire hardware micro-architecture automatically is beyond the current state of the art. To be sure, high-level synthesis (HLS) tools are in wide use today, but their input is only a few steps more abstract than RTL code, not a C description of the complete hardware-plus-software system. In practice, teams who use C/C++/SystemC models for architectural analysis early in the project spend a lot of effort refining those models so that HLS can handle them and produce results meeting PPA goals.
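To illustrate why the serial read/process/write pattern is inefficient in hardware, here is a toy C++ (C++17) cycle-count model. It rests on loudly stated assumptions: each of the three steps costs exactly one cycle, process() is a placeholder operation, and all names (PipelineResult, run_serial, run_pipelined) are hypothetical. It is not how an HLS tool models timing, only a sketch of the overlap that pipelining provides.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct PipelineResult {
    std::vector<std::uint32_t> out;  // words written back
    std::size_t cycles;              // cycles consumed under the one-step-per-cycle assumption
};

// Placeholder for the real per-word processing (assumption: a simple increment).
inline std::uint32_t process(std::uint32_t word) { return word + 1; }

// Serial, software-like order: read, process, and write one word before starting the next.
PipelineResult run_serial(const std::vector<std::uint32_t>& mem) {
    PipelineResult r{{}, 0};
    for (std::uint32_t word : mem) {
        const std::uint32_t fetched = word;           ++r.cycles;  // read
        const std::uint32_t done = process(fetched);  ++r.cycles;  // process
        r.out.push_back(done);                        ++r.cycles;  // write
    }
    return r;
}

// Three-stage pipeline: in each cycle all stages work on different words.
PipelineResult run_pipelined(const std::vector<std::uint32_t>& mem) {
    PipelineResult r{{}, 0};
    std::optional<std::uint32_t> read_reg;  // register between read and process
    std::optional<std::uint32_t> proc_reg;  // register between process and write
    std::size_t next = 0;
    while (r.out.size() < mem.size()) {
        // Stages are evaluated back to front so each sees last cycle's register value.
        if (proc_reg) {
            r.out.push_back(*proc_reg);  // write stage
        }
        proc_reg = read_reg ? std::optional<std::uint32_t>(process(*read_reg))
                            : std::nullopt;  // process stage
        read_reg = (next < mem.size()) ? std::optional<std::uint32_t>(mem[next++])
                                       : std::nullopt;  // read stage
        ++r.cycles;
    }
    return r;
}
```

Under these assumptions, four words take 12 cycles serially and 6 cycles pipelined (N + 2 for N words, once the pipeline has filled). Deciding where such stage boundaries belong is part of the micro-architecture work that a system-level description does not spell out.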