Meeting the Dramatically Higher Computational Demands of 3G/4G/LTE Base Stations
By Alan Taylor, Director of Marketing,
Mindspeed’s Multiservice Access Business Unit
The Mobile Internet is bracing for an avalanche of mobile devices that will devour data while consuming already scarce bandwidth in the network infrastructure. Standing in the path of this avalanche is the base station’s baseband processor, which provides the only significant pathway to increased bandwidth within finite resources. Unlike the wired infrastructure, in which bandwidth can be increased simply by adding new lines, the mobile infrastructure must operate with finite available radio spectrum resources. Increased bandwidth comes from making the most efficient use of that available spectrum, which requires technologies that impose a heavy computational burden on the base station’s baseband processor.
The burden is daunting, and growing. In a report released in February entitled “Visual Networking Index (VNI) Global Mobile Data Forecast for 2009-2014,” Cisco Systems projected that global mobile data traffic will reach 3.6 exabytes per month, or an annual run rate of 40 exabytes, by 2014. That figure equates to a 39-fold increase from 2009 to 2014, or a compound annual growth rate (CAGR) of 108 percent, the report said. According to the report, two major global trends are driving this increase. The first is the proliferation of mobile-ready devices. By 2014, there could be over 5 billion personal devices connecting to mobile networks – and billions more machine-to-machine nodes, according to the Cisco report. The second trend is the growth in mobile video, which the study projected would represent 66 percent of all mobile data traffic by 2014. This would be a 66-fold increase from 2009 to 2014 -- the highest growth rate of any mobile data application tracked in the Cisco VNI Global Mobile Data Forecast.
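The growth figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function and constant names are assumptions made for this example, and the inputs are simply the numbers cited from the Cisco report. Twelve months at 3.6 exabytes comes to a little over 43 exabytes, which is in line with the rounded 40-exabyte run rate.

```python
# Figures as quoted from the Cisco VNI forecast in the text above.
GROWTH_MULTIPLE = 39          # 2014 traffic relative to 2009
YEARS = 5                     # 2009 -> 2014
MONTHLY_EXABYTES_2014 = 3.6   # projected monthly traffic in 2014


def compound_annual_growth_rate(multiple: float, years: int) -> float:
    """Return the yearly growth rate (as a fraction) that compounds to `multiple`."""
    return multiple ** (1.0 / years) - 1.0


def annual_run_rate(monthly_exabytes: float) -> float:
    """Scale a monthly traffic figure to a twelve-month run rate."""
    return monthly_exabytes * 12


cagr = compound_annual_growth_rate(GROWTH_MULTIPLE, YEARS)
run_rate = annual_run_rate(MONTHLY_EXABYTES_2014)

print(f"CAGR for a {GROWTH_MULTIPLE}-fold increase over {YEARS} years: {cagr:.0%}")
print(f"Annual run rate at {MONTHLY_EXABYTES_2014} EB/month: {run_rate:.1f} EB")
```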
The wireless industry has had to reinvent itself many times over the last two decades for the sole purpose of increasing air interface efficiency. The first such reinvention was from FM radio-based cellular technology (e.g. AMPS, advanced mobile phone system, or 1G) to digital TDMA/CDMA-based GSM, IS-54 or IS-95 (2G). The industry then moved to W-CDMA/cdma2000 (2.5G/3G), and then to HSPA/HSPA+/cdma2000 EVDO (3G+, or enhanced 3G). Virtually every transition was motivated by the need to increase air-interface efficiency in order to provide the required service enhancements to satisfy growing market demand. Figure 1 shows the air-interface evolution and associated bandwidth increases over the past several years.

Until now, network equipment manufacturers have generally used an array of digital signal processor (DSP) devices plus very expensive field programmable gate array (FPGA) offload engines to support the growing computational loads of 3G networks. The majority of this 3G computational load comes from very simple correlations. In contrast, 4G standards require significantly more complex computations, primarily fast Fourier transforms (FFT) and inverse FFTs (iFFT). 3G baseband processing solutions are at a significant disadvantage in the 4G computational environment. First, they impose a high system cost while consuming a significant amount of power. In addition, because they are multi-chip solutions, it is necessary to partition tasks and deal with inevitable integration challenges. They also require longer development cycles than optimized, application-specific system-on-chip (SoC) solutions, and impose longer system latencies due to data transfer across devices.
Finally, the nature of general-purpose DSP approaches is that they require a significant amount of software maintenance. The traditional DSP-plus-FPGA approach simply will not scale to full LTE/WiMAX without significant adaptation.
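To make the "very simple correlations" of 3G concrete, the following is a minimal sketch of code-division despreading, in which the receiver recovers one user's symbols by multiply-accumulating the received chips against that user's spreading code. The two 8-chip codes, the symbol values and the function names are assumptions chosen for illustration; real W-CDMA receivers add scrambling, multipath combining and much longer codes.

```python
from typing import List, Sequence

# Two orthogonal 8-chip spreading codes (illustrative values, dot product is zero).
CODE_A = [1, 1, 1, 1, -1, -1, -1, -1]
CODE_B = [1, -1, 1, -1, 1, -1, 1, -1]


def spread(symbols: Sequence[int], code: Sequence[int]) -> List[int]:
    """Multiply each +1/-1 symbol by every chip of the spreading code."""
    return [s * c for s in symbols for c in code]


def despread(chips: Sequence[int], code: Sequence[int]) -> List[int]:
    """Correlate each code-length block with the code and slice to +1/-1."""
    n = len(code)
    decisions = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        acc = sum(x * c for x, c in zip(block, code))  # multiply-accumulate
        decisions.append(1 if acc >= 0 else -1)
    return decisions


user_a = [1, -1, -1, 1]
user_b = [-1, -1, 1, 1]

# Both users transmit at once; the channel simply adds their chip streams.
on_air = [a + b for a, b in zip(spread(user_a, CODE_A), spread(user_b, CODE_B))]

recovered_a = despread(on_air, CODE_A)
recovered_b = despread(on_air, CODE_B)
print(recovered_a, recovered_b)
```

The inner loop is nothing more than a multiply-accumulate per chip, which is why this class of workload maps well onto arrays of conventional DSPs.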
Supporting 4G Radio Technologies
4G base station baseband processors must support many radio access technologies (RATs) that are designed to reduce cost per bit for broadband data services delivered over limited radio spectrum resources. These include orthogonal frequency division multiplexing (OFDM)/orthogonal frequency division multiple access (OFDMA) and multiple-input, multiple-output (MIMO) antenna technologies, which help increase data throughput but also increase radio transmission power, which can lead to interference. The higher data throughput also creates the need for higher computational complexity in the baseband processing – as much as one order of magnitude greater for 4G systems (LTE/WiMAX) than for 3G systems.
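The FFT/iFFT workload behind OFDM can be illustrated with a toy round trip: the transmitter maps symbols onto subcarriers with an inverse FFT and prepends a cyclic prefix, and the receiver strips the prefix and applies a forward FFT. This is a sketch under simplifying assumptions (8 subcarriers, QPSK, an ideal channel, a textbook recursive radix-2 transform); the function names are invented for the example, and production LTE or WiMAX chains use far larger transforms plus equalization and channel coding.

```python
import cmath
from typing import List


def fft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Unscaled recursive radix-2 Cooley-Tukey transform; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(x: List[complex]) -> List[complex]:
    """Inverse transform with the conventional 1/N scaling."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def ofdm_modulate(subcarriers: List[complex], cp_len: int) -> List[complex]:
    """Build one OFDM symbol: iFFT, then copy the last cp_len samples to the front (cp_len >= 1)."""
    time_samples = ifft(subcarriers)
    return time_samples[-cp_len:] + time_samples


def ofdm_demodulate(samples: List[complex], cp_len: int) -> List[complex]:
    """Drop the cyclic prefix and return to the frequency domain."""
    return fft(samples[cp_len:])


QPSK = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
tx_symbols = [QPSK[i % 4] for i in range(8)]   # one symbol per subcarrier
rx_symbols = ofdm_demodulate(ofdm_modulate(tx_symbols, cp_len=2), cp_len=2)

max_error = max(abs(t - r) for t, r in zip(tx_symbols, rx_symbols))
print(f"largest round-trip error: {max_error:.2e}")
```

Each OFDM symbol costs on the order of N log N complex multiplies per antenna path, in each direction, which gives some intuition for why the load grows so sharply compared with chip-rate correlation.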
In addition to reducing the cost per bit for wireless data services, 4G RATs also must dramatically reduce end-to-end latency. For now, 3G RATs can’t deliver satisfactory quality of service (QoS) for voice conversations, since voice requires end-to-end latency of no more than 150 milliseconds (ms), and today’s 3G RATs have an average latency of 200ms or more. As a result, voice is still very inefficiently carried primarily over a 2G/2.5G RAT-based infrastructure. In order to support voice traffic over a 3G or 4G data channel, the RAT’s end-to-end latency has to be short -- a painful lesson learned from VoIP applications. An added benefit of a low-latency 3G or 4G RAT is that it will drive the acceptance of popular wireline applications such as interactive gaming, which have very low latency requirements, on wireless networks.
The key to all of the above RAT capabilities is generating the horsepower to handle a dramatically increased level of 4G baseband processing complexity, without ratcheting up cost or power consumption.
Solving the Challenge at the Silicon Level
Enabling 4G networks requires more than just processor core speed increases. The problem with this brute-force approach is that it also brings power and cost increases. While the VLSI industry will certainly continue to push DSP performance, it is unlikely that significant 4G computational improvement can be achieved based on core speed (i.e. MHz) increases, alone. Instead, it is increasingly clear that a multi-core approach is the only pragmatic solution for increasing performance with satisfactory cost and power efficiency.
In a multi-core approach, each core performs the tasks it does best. General-purpose DSP cores are used to implement complex algorithms such as speech compression, channel estimation, etc. Reduced instruction set computer (RISC) processors are used to implement control or protocol layer processing. And finally, ASICs or co-processors are used to perform computationally intensive yet algorithmically simple or fixed applications. Using this approach, it is possible to accommodate 4G’s increased computational complexity while accommodating all of the other 3G/4G and legacy system needs.
Another advantage of the multi-core approach is that it can exploit the now-mature 45/40nm manufacturing process and continuing reductions in transistor geometry. These factors will enable the industry to either shrink the size of DSPs/SoCs or integrate more VLSI elements into the same footprint, which will open the door for a new breed of compact, cost-effective and ultra-low-power DSPs/SoCs that use multiple, task-oriented processing engines to deliver dramatic increases in computational performance.
Over the last decade, Mindspeed has innovated and commercialized some of the earliest examples of high-performance SoCs that integrate multiple task-oriented processing engines, initially for carrier-class VoIP, data and mixed-media processors and, more recently, for triple-play customer premises equipment (CPE) gateways. The first generation of Comcerto® VoIP SoCs incorporates two RISC processors (ARM-9) and two general-purpose DSP cores.
Today’s highest-performing Comcerto SoC combines two ARM-11 RISC processors and eight DSP cores, plus eight co-processors. It has a total combined processing power of 22.4GMAC/s and 900M RISC instructions per second and can handle up to 640 channels of VoIP. The Comcerto SoC family performs the entire processing function from time division multiplexing (TDM) samples to IP packet processing, enabling a single Comcerto device to support applications ranging from single-chip access gateways (as a Class V switch replacement for PSTN) to very-high-capacity media gateways (as a Class IV switch replacement).
This same approach can also be used to address 4G baseband processing needs for platforms ranging from cost-sensitive picocell equipment to high-performance macrocell base stations. It can also very flexibly support the full range of 4G standards, including LTE-FDD, LTE-TDD and WiMax 802.16d, 16e and 16m, as well as 3G standards like W-CDMA (WCDMA, HSPA and HSPA+) or TD-SCDMA.

Figure 2 shows a block diagram of one of the industry’s first examples of this approach. The Transcede 4000 family uses a heterogeneous multi-core approach built on an ARM multi-layer AXI interconnect fabric. A central fabric connects processing and I/O clusters, each with its own internal fabrics. There are two processing clusters and two I/O clusters. A system cluster supports system control and packet processing and uses one quad-core and one dual-core ARM Cortex A9, which are combined with hardware accelerators for forward error correction and encryption on a local AXI fabric along with a DMA engine, cache, and DDR2/3 DRAM interface.
The second processor cluster is connected to the first cluster via the chip’s system-level AXI. It contains 10 CEVA 1641 programmable DSP cores and 10 proprietary filter-processor cores with their local memory interfaces, a second DMA engine and a second DRAM interface, all on a second local AXI fabric. Two more clusters on the system AXI provide the system I/Os. These include two Gigabit Ethernet ports on a local AMBA AHB bus, plus 10 high-performance serializer/deserializer (SerDes) lanes that are multiplexed into two sRIO controllers, one PCIe gen II controller, and one CPRI controller.
The lowest-power T4000, with less than 12W typical power consumption, features integrated 600 MHz processor cores and supports complete LTE or WiMAX Layers 1 and 2 (L1 and L2) processing needs for 3 sectors of 10 MHz with 2x2 MIMO in a single device. It delivers these capabilities with more than 40 percent DSP headroom for future additions, such as features needed for evolving and new standards. The T4020 features integrated 750 MHz processor cores and performs similar LTE or WiMAX Layers 1 and 2 processing functions for up to 3 sectors of 20 MHz with 2x2 MIMO in a single device that consumes less than 15W of power. Each is optimized for HSPA solutions supporting up to 144 HSPA+ active users, with a full duplex data rate of 40 Mbps. The inclusion of both an L1 physical layer (PHY) and L2 media access control (MAC) on the same device provides the lowest possible system latency.
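As a rough way to compare the two devices, the quoted sector counts, channel widths and power bounds can be reduced to aggregate spectrum and an upper bound on watts per megahertz served. These derived ratios do not appear in the original text; they are simple arithmetic on the figures above, and the class and field names are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceSpec:
    name: str
    sectors: int
    channel_mhz: float     # carrier bandwidth per sector
    power_bound_w: float   # "less than" power figure quoted in the text

    @property
    def aggregate_mhz(self) -> float:
        """Total spectrum processed across all sectors."""
        return self.sectors * self.channel_mhz

    @property
    def watts_per_mhz(self) -> float:
        """Upper bound on power per megahertz of spectrum served."""
        return self.power_bound_w / self.aggregate_mhz


T4000 = DeviceSpec("T4000", sectors=3, channel_mhz=10, power_bound_w=12)
T4020 = DeviceSpec("T4020", sectors=3, channel_mhz=20, power_bound_w=15)

for dev in (T4000, T4020):
    print(f"{dev.name}: {dev.aggregate_mhz:.0f} MHz aggregate, "
          f"under {dev.watts_per_mhz:.2f} W per MHz")
```

On these numbers the larger device serves twice the spectrum for about a quarter more power, which is the kind of scaling an integrated SoC is expected to deliver.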
The Transcede family’s hardware architecture not only includes a mix of programmable DSPs and task-specific hardware-acceleration elements, but also enables the use of a simplified, single-threaded programming model to efficiently deal with all of the parallel-processing elements. This approach also creates the challenge of programming a large multi-core architecture so that it can operate at wire speed. The answer is a new approach to modeling the system. Mindspeed has created a hardware abstraction mechanism that has matured and been proven in its VoIP product lines. The user-programmable computing loads are broken into tasks and coded in C. The tasks access a library of inter-task communication utilities that use a task-control block convention. Task scheduling is performed using a combination of hardware schedulers and software. Developers have been given access to the task scheduling mechanism so they can leverage it to optimize their systems. They don’t need to partition the hardware resources in a traditional static manner, which eases design, shortens time to maturity for their wireless baseband products and streamlines future software maintenance.
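The task-based model described above can be sketched in a few lines. The following is not Mindspeed's API, and the real tasks are written in C and dispatched with hardware assistance; it is a hypothetical Python model in which every name (`TaskControlBlock`, `Scheduler`, the engine labels and the stand-in task bodies) is an assumption. It shows the central idea: the developer declares tasks and their dependencies, and a scheduler assigns each ready task to whichever engine of the right class is free, rather than binding work to cores statically.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class TaskControlBlock:
    """Hypothetical descriptor: what to run, on which engine class, after which tasks."""
    name: str
    engine: str                                   # e.g. "dsp" or "fec"
    func: Callable[[Dict[str, object]], object]   # receives results of finished tasks
    depends_on: Tuple[str, ...] = ()


class Scheduler:
    def __init__(self, engines: Dict[str, int]):
        self.engines = dict(engines)                # engine class -> units in the pool
        self.log: List[Tuple[int, str, str]] = []   # (cycle, engine unit, task name)

    def run(self, tasks: List[TaskControlBlock]) -> Dict[str, object]:
        results: Dict[str, object] = {}
        pending = list(tasks)
        cycle = 0
        while pending:
            free = dict(self.engines)               # every unit is free at cycle start
            finished: Dict[str, object] = {}
            waiting: List[TaskControlBlock] = []
            for tcb in pending:
                ready = all(dep in results for dep in tcb.depends_on)
                if ready and free.get(tcb.engine, 0) > 0:
                    free[tcb.engine] -= 1
                    unit = self.engines[tcb.engine] - free[tcb.engine] - 1
                    finished[tcb.name] = tcb.func(results)
                    self.log.append((cycle, f"{tcb.engine}{unit}", tcb.name))
                else:
                    waiting.append(tcb)
            if not finished:
                raise RuntimeError("no runnable task: unmet dependency or missing engine")
            results.update(finished)                # outputs become visible next cycle
            pending = waiting
            cycle += 1
        return results


# Stand-in task bodies; real tasks would perform signal processing.
tasks = [
    TaskControlBlock("fft_ant0", "dsp", lambda r: [1, 2]),
    TaskControlBlock("fft_ant1", "dsp", lambda r: [3, 4]),
    TaskControlBlock("combine", "dsp",
                     lambda r: [a + b for a, b in zip(r["fft_ant0"], r["fft_ant1"])],
                     depends_on=("fft_ant0", "fft_ant1")),
    TaskControlBlock("decode", "fec", lambda r: sum(r["combine"]),
                     depends_on=("combine",)),
]

scheduler = Scheduler({"dsp": 2, "fec": 1})
results = scheduler.run(tasks)
for entry in scheduler.log:
    print(entry)
```

In this toy run the two per-antenna tasks land on different DSP units in the same cycle without the task code naming a core, so changing the pool sizes passed to `Scheduler` changes the placement but not the task definitions.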
To facilitate debug, the Transcede device leverages the internal debug monitors of the Cortex processors and AXI buses. This gives designers insights into control-processor threads as well as interprocessor traffic, so that they can optimize processing operations for their particular design. Additionally, the Transcede 4000 family uses a scalable hardware architecture that enables the same software to be used not only for macrocell, microcell and picocell designs, but also for derivative, low-cost enterprise femtocell designs. An optimized I/O capability enables concurrent use of both common public radio interface (CPRI)/PCI Express (PCIe) and Serial RapidIO (sRIO) interfaces. The SoC architecture also features IEEE 1588 version 2 clock recovery, and a built-in ciphering engine for radio interface and backhaul.
The migration to 4G mobile networks poses a number of challenges, which must be resolved in order to support the rapid acceleration in data-enabled devices and associated bandwidth demand. Next-generation silicon solutions are forging new territory to increase throughput while mitigating RF interference. They must also shoulder significantly more computationally complex processing and deliver high enough levels of integration to minimize system latency, cost and power consumption.
MINDSPEED