Design of advanced LDPC decoders using traditional and new implementation technologies

Awais, Muhammad

Low-Density Parity-Check (LDPC) codes, a class of linear block codes, have gained considerable attention in the digital communication domain. Binary LDPC codes were invented by Gallager in 1963 and rediscovered by MacKay and Neal in 1995. Thanks to their near-Shannon-limit performance, low error floor, intrinsic parallelism and affordable complexity, binary LDPC codes have been adopted in a number of standards, e.g. WiMAX (IEEE 802.16e), WiFi (IEEE 802.11n), WLAN and DVB-S2. Non-binary LDPC (NB-LDPC) codes, an extension of binary LDPC codes to higher-order Galois fields, show better performance when the code length is small or when a high-order modulation is used in the communication system. Despite these attractive features, the hardware design of LDPC decoders that meet area, power and speed constraints remains a challenging task and requires considerable research effort. In this thesis we focus on the efficient design of high-performance LDPC decoders using traditional CMOS VLSI and “beyond CMOS” technologies. The thesis contributes to both binary and non-binary LDPC decoding. Its main contributions are summarized in the following paragraphs.
The processing core of a binary LDPC decoder is the check node (CN), which executes the actual decoding algorithm and contributes to the overall complexity, throughput and performance of the whole decoder. State-of-the-art LDPC decoders mainly feature a partially parallel architecture, in which a number of CNs are realized in hardware to achieve flexible, high-throughput iterative decoding. However, in most published work on LDPC decoders the CN itself is implemented serially, which limits the achievable throughput to a large extent. Realizing high-throughput decoders (supporting data rates up to a few hundred Mbps) then requires either a massive number of CNs or a high clock frequency, both of which result in significant area and power overhead.
Parallelism at the check node level is an essential step that can bring a significant increase in throughput. However, a straightforward parallel implementation suffers from the large complexity of the CN. In the first part of the thesis we therefore propose a generic implementation of a parallel check node based on a novel “Tree-way” approach. In addition, we present a generalization of the “Tree-way” approach for check node degrees dc up to 32, which provides compile-time flexibility to support a large number of LDPC codes for next-generation standards. The “Tree-way” check node architecture is used to design a fully parameterized LDPC decoder IP core for the WiMAX and WiFi standards. Thanks to efficient datapath reuse and a simple control mechanism, the proposed decoder based on the “Tree-way” check node achieves high throughput with affordable complexity.
The second part of the thesis deals with the VLSI hardware implementation of a novel Belief Propagation (BP) algorithm named Analog Digital Belief Propagation (ADBP). The ADBP algorithm works on factor graphs over linear models and uses messages in the form of Gaussian-like probability distributions, tracking their parameters. In particular, ADBP can deal with system variables that are discrete and/or wrapped. A variant of ADBP can be applied to the iterative decoding of a particular class of NB-LDPC codes and yields decoders whose complexity is independent of the modulation alphabet size M, which makes it possible to construct efficient decoders for digital transmission systems with unbounded spectral efficiency. In this work, we propose simplifications to the ADBP update rules that are suitable for hardware implementation. In addition, we analyze the effect of finite precision on the decoding performance of the algorithm. A careful selection of the quantization scheme for input, output and intermediate variables allows us to construct a complete ADBP decoding architecture that performs close to the double-precision implementation and shows promising complexity for large values of M.
Because of the computation-intensive nature of LDPC decoding algorithms, a CMOS VLSI implementation of an LDPC decoder incurs considerable area and power. In addition, the limited switching frequency of CMOS transistors puts an upper bound on the achievable throughput. This motivates implementing LDPC decoders in advanced “beyond CMOS” technologies. Quantum-dot Cellular Automata (QCA) is an emerging nanotechnology that has gained significant research interest in recent years. Extremely small feature sizes, ultra-low power consumption and high clock frequency make QCA a potentially attractive solution for implementing computing architectures at the nanoscale. In the third part of the thesis we present a novel QCA architecture for a binary LDPC check node that executes the Normalized Min-Sum algorithm. We adapt the decoding architecture to the specific characteristics of QCA technology by exploiting majority voting circuits and the inherent delaying and pipelining behavior of wires. The proposed CN is fully pipelined, partially parallel and reconfigurable to support degrees up to dc = 20. The circuit is described using a realistic layout-aware VHDL model, which allows, in addition to circuit simulation, area and power estimation for the two implementations of QCA technology, i.e. magnetic and molecular. Simulation results show that remarkable area savings and high throughput can be achieved with the molecular QCA implementation, while magnetic QCA is attractive for achieving low power. In both cases, the proposed design has a smaller area and a clock speed comparable to or much larger than its implementation in up-to-date CMOS technology.
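For readers unfamiliar with the Normalized Min-Sum check node update mentioned above, the following is a minimal software sketch of the textbook rule. It is not the QCA or “Tree-way” architecture of the thesis. The function name normalized_min_sum_cn and the normalization factor alpha = 0.75 are illustrative assumptions, not values taken from the thesis.

```python
def normalized_min_sum_cn(llrs, alpha=0.75):
    """Textbook Normalized Min-Sum update for one check node.

    llrs  : incoming variable-to-check messages (log-likelihood ratios)
    alpha : normalization factor (0.75 is an assumed, illustrative value)
    Returns one outgoing check-to-variable message per edge.
    """
    if len(llrs) < 2:
        raise ValueError("a check node needs at least two inputs")
    mags = [abs(x) for x in llrs]
    # Position of the smallest magnitude, then the two smallest values.
    idx_min = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[idx_min]
    min2 = min(m for i, m in enumerate(mags) if i != idx_min)
    # Parity of the input signs: -1 if an odd number of inputs are negative.
    total_sign = 1
    for x in llrs:
        if x < 0:
            total_sign = -total_sign
    out = []
    for i, x in enumerate(llrs):
        # Each edge receives the minimum magnitude over all *other* edges.
        mag = min2 if i == idx_min else min1
        sign_i = -1 if x < 0 else 1
        # Multiplying by sign_i removes edge i's own sign from the product.
        out.append(alpha * total_sign * sign_i * mag)
    return out
```

Only the two smallest magnitudes and the overall sign parity are needed to form every outgoing message, which is why hardware check nodes are usually organized around a minimum search.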
Finally, we present a QCA implementation of the Fast Fourier Transform (FFT) algorithm, which has applications in the decoding of non-binary LDPC codes. A novel architecture for a partially parallel FFT processor is presented that not only reduces circuit complexity but also eliminates the need for feedback signals, which allows the throughput to be maximized. Again, circuit performance is estimated with the help of a layout-aware VHDL model for magnetic and molecular QCA technologies.
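As background on the role of the FFT in non-binary decoding: for codes over GF(2^m), FFT-based belief propagation commonly uses the Walsh-Hadamard form of the transform, which needs only additions and subtractions. The abstract does not state which form the thesis implements, so the sketch below is an assumption for illustration only, and the function name walsh_hadamard is hypothetical.

```python
def walsh_hadamard(p):
    """Unnormalized fast Walsh-Hadamard transform of a length-2^m sequence.

    Assumed here as the FFT variant for GF(2^m); the thesis abstract does
    not specify the transform it implements.
    """
    n = len(p)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(p)
    h = 1
    while h < n:
        # One butterfly stage: combine elements that are h positions apart.
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    return out
```

Applying the transform twice returns the input scaled by its length, so the same butterfly network can serve as both the forward and the inverse transform, up to a constant factor.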

PORTO @ Archivio Istituzionale della Ricerca