Research Topic


Heterogeneous Chiplet-Based Architectures

Current Researchers: Yuan Li

In the dark silicon era, only a fraction of the transistors on a chip can be switched on simultaneously because of the constrained power budget. To improve energy efficiency, general-purpose cores are augmented with multiple types of accelerators. The general-purpose cores and accelerators can be integrated on a single chip or in an emerging chiplet-based system. Integrating heterogeneous cores on a chip or in a chiplet-based system places stringent demands on the communication fabric, because cores with different microarchitectures and programming models usually have distinct traffic patterns and different sensitivities to network latency and bandwidth.

In this research project, we address these interconnection design challenges by exploring the traffic patterns of diverse types of cores and then designing an interconnection network that can be configured to adapt to specific traffic patterns. We are especially interested in using the wiring resources of the silicon interposer in chiplet systems to build the interconnection network.


Network-on-Chip (NoC) Design for Heterogeneous Manycores

Heterogeneous manycores composed of CPUs, GPUs and accelerators place stringent demands on networks-on-chip (NoCs). The NoC must support the combined traffic, which includes both latency-sensitive CPU traffic and throughput-sensitive GPU and accelerator traffic. We study the characteristics of the combined traffic and observe that (1) limited injection bandwidth is the main obstacle to throughput improvement, and (2) latency due to local and global contention accounts for a significant portion of the network latency. We propose a router architecture for heterogeneous manycores that introduces two optimizations: (1) increasing injection bandwidth to improve throughput, and (2) resolving local and global contention to reduce network latency. Specifically, our design increases the injection bandwidth through modifications to the injection link, the crossbar switch and the buffer organization in the injection port of the router. It identifies upcoming local contention and resolves it by optimally selecting traffic routes. It also uses a supervised learning engine to detect global contention through traffic analysis and prediction, and alleviates that contention by adaptively adjusting traffic injection. Simulation results using the Rodinia benchmark show that, compared to the baseline router, the proposed router architecture provides a 28% throughput increase, a 24% latency reduction, a 22% execution time speedup, and a 19% energy efficiency improvement.
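
As a rough illustration of the last mechanism (predicting global contention and adapting injection), the Python sketch below trains a small online linear regressor on traffic features and lowers the injection rate when the predicted buffer occupancy is high. It is not the learning engine used in the proposed router: the choice of a least-mean-squares regressor, the feature vector, the names ContentionPredictor and adjust_injection_rate, and the threshold and step values are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentionPredictor:
    """Hypothetical online linear regressor (least-mean-squares update).

    Stands in for a supervised learning engine; the real engine's model
    and input features are not specified here.
    """
    n_features: int
    learning_rate: float = 0.05
    weights: List[float] = field(default_factory=list)
    bias: float = 0.0

    def __post_init__(self) -> None:
        # Start from all-zero weights unless the caller supplied some.
        if not self.weights:
            self.weights = [0.0] * self.n_features

    def predict(self, features: List[float]) -> float:
        """Predict next-window buffer occupancy from traffic features."""
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def update(self, features: List[float], observed: float) -> None:
        """Supervised step: move the weights toward the observed occupancy."""
        error = observed - self.predict(features)
        self.weights = [
            w + self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias += self.learning_rate * error


def adjust_injection_rate(
    current_rate: float,
    predicted_occupancy: float,
    threshold: float = 0.7,   # assumed contention threshold
    step: float = 0.1,        # assumed adjustment step
    min_rate: float = 0.1,
    max_rate: float = 1.0,
) -> float:
    """Throttle injection when contention is predicted, otherwise ramp up."""
    if predicted_occupancy > threshold:
        return max(min_rate, current_rate - step)
    return min(max_rate, current_rate + step)
```

In such a scheme, each router or network interface would call update once per sampling window with the occupancy it actually observed, and call predict followed by adjust_injection_rate before the next window. A lightweight model is used here only because it is easy to read; it says nothing about the hardware cost or accuracy of the actual design.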

High Performance Computing Architectures & Technologies Lab

Department of Electrical and Computer Engineering
School of Engineering and Applied Science
The George Washington University

800 22nd Street NW
Washington, DC 20052
United States of America 


Ahmed Louri, IEEE Fellow
David and Marilyn Karlgaard Endowed Chair Professor of ECE
Director, HPCAT Lab

Phone: +1 (202) 994 8241