The challenge to MCU application dominance has begun. Field programmable logic devices (FPGAs) with on-chip fixed-function processing subsystems, also known as system-on-chip (SoC) FPGAs, have recently emerged as potential contenders for high-end processing applications. This raises the question: as application performance requirements continue to increase, will SoC FPGAs become a challenger in broader applications, or will MCUs evolve to better compete with SoC FPGAs? If you are considering a new design, which approach is best for you today – MCU or SoC FPGA?
This article will quickly review some of the key advantages and disadvantages of SoC FPGAs compared to MCUs. It will also explore some of the latest innovations in MCUs that make them more flexible and better equipped to address some of the key advantages offered by SoC FPGAs. Armed with this information, you will be better able to choose between MCU and SoC FPGA in your next design.
SoC FPGAs combine new flexibility with familiar processing systems
SoC FPGAs combine the best of both worlds. First, SoC FPGAs provide a familiar processing system—the CPU—to execute familiar sequential processing algorithms. In fact, many SoC FPGAs have converged on the common ARM processor architecture to form the basis of their “fixed-function” processing subsystems. This leverages an extensive ecosystem of ARM-compatible tools, intellectual property cores (IP cores) and supporting technologies, making development a very familiar process.
SoC FPGAs also provide a flexible programmable alternative to sequential processing. The programmable fabric can implement almost any hardware function you need to augment the sequential processing done in the processing subsystem. Programmable fabrics are inherently concurrent: multiple hardware modules can run at once, either in parallel, where the logic is replicated, or in a pipelined fashion, where the algorithm is divided into stages whose execution overlaps. Either approach can yield large throughput gains compared to sequential execution.
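To make the parallel-versus-pipelined trade-off concrete, here is a back-of-the-envelope cycle-count model. The functions and numbers are purely illustrative assumptions (a 5-stage algorithm processing 1,000 items), not tied to any specific device:

```python
import math

def sequential_cycles(items: int, stages: int) -> int:
    # One item at a time, all stages executed in series.
    return items * stages

def parallel_cycles(items: int, stages: int, copies: int) -> int:
    # 'copies' replicated logic blocks each process a share of the items.
    return math.ceil(items / copies) * stages

def pipelined_cycles(items: int, stages: int) -> int:
    # After the pipeline fills (stages cycles), one result emerges per cycle.
    return stages + items - 1

items, stages = 1000, 5
seq = sequential_cycles(items, stages)    # 5000 cycles
par = parallel_cycles(items, stages, 4)   # 1250 cycles with 4 copies of the logic
pipe = pipelined_cycles(items, stages)    # 1004 cycles once the pipeline fills
print(seq / par, seq / pipe)              # rough speedup factors
```

The model shows why long data streams favor pipelining: the fill cost (`stages` cycles) is paid once, after which throughput approaches one result per cycle regardless of algorithm depth.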
SoC FPGAs are particularly useful when high performance is required for part of an algorithm that can be implemented in hardware using parallel or pipelined (or combined) techniques. Let’s look at an example device to better understand how this would work in a real system.
The Xilinx Zynq-7000 SoC FPGA block diagram is shown in Figure 1 below. The top of the figure shows all the fixed-function blocks contained on the chip. These modules implement a complete dual-core ARM application processing unit with extensive support for interconnect buses, peripherals, memory and off-chip interfaces. The programmable logic section is shown at the very bottom of the diagram and is accessible through various system-level interfaces. A notable improvement in the SoC FPGA approach is that the fixed-function elements can all operate without the programmable logic being configured. This means that the processor system can “boot up” first and then configure the programmable logic. Previously, the non-SoC approach required the programmable logic to be configured before the processor could boot.
Figure 1: Xilinx Zynq SoC FPGA block diagram. (Courtesy of Xilinx)
In fact, code developers can think of programmable logic in an SoC as a hardware resource used to speed up parts of code that are too slow to implement on a processor. A design team member may focus their activities on creating the hardware acceleration that the programmer requires, or the programmer may be able to implement the hardware themselves. Either way, the algorithm becomes the focus of development, with a variety of implementation options available.
The SoC approach seems to work best when there are multiple performance-oriented algorithms running concurrently. One application area where SoC FPGAs have had significant success is complex image processing. These algorithms can often be pipelined and/or parallelized, making them good candidates for FPGA acceleration. If the processor also needs to handle high-bandwidth traffic on and off-chip (possibly using high-speed serial interfaces and large off-chip buffer memory), additional hardware support for offloading low-level tasks from the processor may also pay off in big dividends.
Multicore Response to SoC FPGAs
There are other ways to achieve parallel and pipelined implementations for applications such as image processing. One approach taken by MCU vendors is to implement multiple processing engines (multicore) on-chip, allowing designers to decompose complex algorithms. When each processor has the same architecture, a complex algorithm can easily be broken into fragments, each executed on a different but functionally identical processor.
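The decomposition described above can be sketched in a few lines. This is a minimal illustration of the principle using Python's `multiprocessing` on a host machine, not an MCU toolchain; the thresholding kernel and fragment sizes are assumptions chosen for clarity:

```python
from multiprocessing import Pool

def process_fragment(pixels):
    # Stand-in for an image-processing kernel: simple binary thresholding.
    return [255 if p > 128 else 0 for p in pixels]

def split(data, n):
    # Break the workload into n roughly equal fragments, one per core.
    k = (len(data) + n - 1) // n
    return [data[i:i + k] for i in range(0, len(data), k)]

if __name__ == "__main__":
    image_row = list(range(256))
    fragments = split(image_row, 4)            # one fragment per core
    with Pool(4) as pool:                      # 4 identical workers
        results = pool.map(process_fragment, fragments)
    merged = [p for frag in results for p in frag]
    print(merged[:4], merged[-4:])             # [0, 0, 0, 0] [255, 255, 255, 255]
```

Because every worker runs the same code on a different slice of the data, the decomposition is mechanical; the hard part in a real multicore MCU design is balancing the fragments and moving data between cores efficiently.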
For example, Texas Instruments (TI) offers the TMS320C66x multicore fixed- and floating-point digital signal processors (DSPs) with eight DSP cores plus a network coprocessor and a Multicore Navigator that uses hardware queues to simplify data movement (Figure 2). The DSP cores provide very high processing power for a variety of complex algorithms in areas such as audio, video, analytics, industrial automation and media processing.
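The idea behind hardware-queue-based data movement can be illustrated with a small producer/consumer sketch. This is a conceptual model in Python threads, not TI's actual Navigator API: a host core enqueues work descriptors without blocking, a worker core drains the queue, and results flow back through a second queue:

```python
import queue
import threading

work_q = queue.Queue()   # host -> worker descriptors
done_q = queue.Queue()   # completed results back to the host

def worker():
    while True:
        item = work_q.get()
        if item is None:            # sentinel: shut down the worker
            break
        done_q.put(item * 2)        # stand-in for a DSP kernel
        work_q.task_done()

t = threading.Thread(target=worker)
t.start()
for sample in [1, 2, 3]:
    work_q.put(sample)              # enqueue without waiting on the worker
work_q.join()                       # block until all descriptors drain
work_q.put(None)
t.join()
results = sorted(done_q.get() for _ in range(3))
print(results)                      # [2, 4, 6]
```

The value of doing this in hardware, as the Multicore Navigator does, is that enqueue and dequeue cost the cores almost nothing, so the processors spend their cycles on the algorithm rather than on shuffling data.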