Research Topic

Machine Learning For High Performance Reliable On-chip/Off-chip Communications

Current Researchers: Dr. Ke Wang, Dr. Hao Zheng, Yuan Li, Yingnan Zhao, and Jiaqi Yang;

Mobirise

With continued aggressive technology scaling, Network-on- Chips (NoCs) architectures are facing three major challenges including minimizing power consumption, scaling performance and providing a reliable and robust communication limited by area, power, and cost constraints.

Researchers have proposed various techniques individually tackling these challenges, while few efforts to date have simultaneously targeted improving power, performance and reliability altogether. Due to the complexity of the interactions among three competing objectives and explosion of design space, it is harder to manually design rules and strategies for an interconnection system for optimal power, reliability and performance.

In our research, we use Machine Learning (ML) algorithms, which can work with high-dimensional data and automatically infer complex decisions, to balance reliability, performance, and energy efficiency for NoCs. We first use supervised ML algorithms to build predictive decision models, which can optimize competing goals of two of the three targets (e.g. reliability and performance, performance and power, etc.).

We further use reinforcement learning (RL) to eschew the prediction step and automatically learn a decision policy that directly maps system-level states to optimal decisions which can yield maximum benefits on reducing power, enhancing reliability, and improving performance simultaneously.

01.

H. Zheng and A. Louri, “Agile: A Learning-Enabled Power and Performance-Efficient Network-on-Chip Design", IEEE Transactions on Emerging Topics in Computing, 10.1 (2022): 223-236.

A number of techniques to achieve power-efficient Network-on-Chips (NoCs) have been proposed, two of which are power-gating and dynamic voltage and frequency scaling (DVFS). Power-gating reduces static power, and DVFS reduces dynamic power. With the goal of reducing both static and dynamic power, it is intuitive to simultaneously deploy both techniques. However, we observe that the straightforward combination of power-gating and DVFS can result in reduced power benefits and degraded performance. In this project, we uniquely combine power-gating and DVFS with the aim of maximizing the NoC power savings and improving performance. The proposed NoC design, called Agile, consists of several architectural designs and a reinforcement learning (RL) based control policy to mitigate the negative effects induced by the combined power-gating and DVFS. Specifically, a simple bypass switch is deployed to maintain network connectivity, avoiding frequently waking up the powered-off router. An optimized pipeline can simply pipeline stages of the bypass switch to reduce network latency. Reversible link channel buffers can be dynamically allocated to where they are needed to improve throughput. In addition, the RL control policy predicts NoC traffic and decides optimal power-gating decisions, voltage/frequency levels and NoC architecture configurations at runtime. Furthermore, we explore the use of an artificial neural network (ANN) to efficiently reduce the area overhead of implementing RL. We evaluate our design using PARSEC benchmarks suite. The full system simulation results show that the proposed design improves the overall power savings by up to 58 percent while improving the performance up to 11 percent as compared to state-of-the-art designs. The ANN-based RL implementation and bypass switch incur nominal area overhead of 5 percent, as compared to a conventional router.

02.

Y. Li and A. Louri, "ALPHA: A Learning-Enabled High-Performance Network-on-Chip Router Design for Heterogeneous Manycore Architectures" IEEE Transactions on Sustainable Computing 6.2 (2020): 274-288.

Heterogeneous manycores comprised of CPUs, GPUs and accelerators are putting stringent demands on network-on-chips (NoC). The NoCs need to support the combined traffic, including both latency-sensitive CPU traffic and throughput-sensitive GPU and accelerator traffic. We study the characteristics of the combined traffic, and observe that (1) the limited injection bandwidth is the main obstacle to throughput improvement, and (2) the latency due to local and global contention accounts for a significant portion of the network latency. We propose a router architecture named ALPHA for heterogeneous manycores. ALPHA introduces two new optimizations: (1) increasing injection bandwidth to improve throughput, and (2) resolving local and global contention to reduce network latency. Specifically, ALPHA increases the injection bandwidth through modifications to injection link, crossbar switch and buffer organization in the injection port of the router; ALPHA identifies the upcoming local contention and resolves it by optimally selecting traffic routes; ALPHA detects and alleviates the global contention by utilizing a supervised learning engine for traffic analysis, prediction, and adjustment. Simulation results using Rodinia benchmark show that ALPHA provides 28 percent throughput increase, 24 percent latency reduction, 22 percent execution time speedup, and 19 percent energy efficiency improvement, compared to the baseline router.

03.

H. Zheng and A. Louri, “An Energy-Efficient Network-on-Chip Design using Reinforcement Learning”, in Proceedings of 56th Design Automation Conference (DAC’19), Las Vegas, NV, June 2-6, 2019.

The design space for energy-efficient Network-on-Chips (NoCs) has expanded significantly comprising a number of techniques. The simultaneous application of these techniques to yield maximum energy efficiency requires the monitoring of a large number of system parameters which often results in substantial engineering efforts and complicated control policies. This motivates us to explore the use of reinforcement learning (RL) approach that automatically learns an optimal control policy to improve NoC energy efficiency. First, we deploy power-gating (PG) and dynamic voltage and frequency scaling (DVFS) to simultaneously reduce both static and dynamic power. Second, we use RL to automatically explore the dynamic interactions among PG, DVFS, and system parameters, learn the critical system parameters contained in the router and cache, and eventually evolve optimal per-router control policies that significantly improve energy efficiency. Moreover, we introduce an artificial neural network (ANN) to efficiently implement the large state-action table required by RL. Simulation results using PARSEC benchmark show that the proposed RL approach improves power consumption by 26%, while improving system performance by 7%, as compared to a combined PG and DVFS design without RL. Additionally, the ANN design yields 67% area reduction, as compared to a conventional RL implementation.

04.

K. Wang and A. Louri, "CURE: A High-Performance, Low-Power, and Reliable Network-on-Chip Design Using Reinforcement Learning", IEEE Transactions on Parallel and Distributed Systems 31.9 (2020): 2125-2138.

We propose CURE, a deep reinforcement learning (DRL)-based NoC design framework that simultaneously reduces network latency, improves energy-efficiency, and tolerates transient errors and permanent faults. CURE has several architectural innovations and a DRL-based hardware controller to manage design complexity and optimize trade-offs. First, in CURE, we propose reversible multi-function adaptive channels (RMCs) to reduce NoC power consumption and network latency. Second, we implement a new fault-secure adaptive error correction hardware in each router to enhance reliability for both transient errors and permanent faults. Third, we propose a router power-gating and bypass design that powers off NoC components to reduce power and extend chip lifespan. Further, for the complex dynamic interactions of these techniques, we propose using DRL to train a proactive control policy to provide improved fault-tolerance, reduce power consumption, and improved performance. Simulation using the PARSEC benchmark shows that CURE reduces end-to-end packet latency by 39 percent, improves energy efficiency by 92 percent, and lowers static and dynamic power consumption by 24 and 38 percent, respectively, over conventional solutions. Using mean-time-to-failure, we show that CURE is 7.7 X more reliable than the conventional NoC design.

05.

K. Wang, A. Louri, A. Karanth and R. Bunescu, “IntelliNoC: A Holistic Design Framework for Energy-Efficient and Reliable On-chip Communication for Manycores”, in Proceedings of the 46th International Symposium on Computer Architecture (ISCA-46), Phoenix, Arizona, June 22-26, 2019.

Network-on-Chips (NoCs), currently being used for on-chip communication in manycore architectures, face several problems including high network latency, excessive power consumption, and low reliability. Simultaneously addressing these problems is proving to be difficult due to the explosion of the design space and the complexity of handling many trade-offs. In this paper, we propose IntelliNoC, an intelligent NoC design framework which introduces architectural innovations and uses reinforcement learning to manage the design complexity and simultaneously optimize performance, energy-efficiency, and reliability in a holistic manner. IntelliNoC integrates three NoC architectural techniques: (1) multifunction adaptive channels (MFACs) to improve energy-efficiency; (2) adaptive error detection/correction and re-transmission control to enhance reliability; and (3) a stress-relaxing bypass feature which dynamically powers off NoC components to prevent overheating and fatigue. To handle the complex dynamic interactions induced by these techniques, we train a dynamic control policy using Q-learning, with the goal of providing improved fault-tolerance and performance while reducing power consumption and area overhead. Simulation using PARSEC benchmarks shows that our proposed IntelliNoC design improves energy-efficiency by 67% and mean-time-to-failure (MTTF) by 77%, and decreases end-to-end packet latency by 32% and area requirements by 25% over baseline NoC architecture.

06.

K. Wang, A. Louri, A. Karanth, and R. Bunescu, "High-performance, Energy-efficient, Fault-tolerant Network-on-Chip Design Using Reinforcement Learning", in Proceedings of the Design Automation & Test in Europe Conference & Exhibition (DATE), Florence, Italy, 2019.

As technology continues to scale, transistors and wires on the chip are becoming increasingly vulnerable to various fault mechanisms, especially timing errors, resulting in exacerbation of energy efficiency and performance for NoCs. Typical techniques for handling timing errors are reactive in nature, responding to the faults after their occurrence. They rely on error detection/correction techniques which have resulted in excessive power consumption and degraded performance, since the error detection/correction hardware is constantly enabled. On the other hand, indiscriminately disabling error handling hardware can induce more errors and intrusive retransmission traffic. Therefore, the challenge is to balance the trade-offs among error rate, packet retransmission, performance, and energy. In this paper, we propose a proactive fault-tolerant mechanism to optimize energy efficiency and performance with reinforcement learning (RL). First, we propose a new proactive error handling technique comprised of a dynamic scheme for enabling per-router error detection/correction hardware and an effective retransmission mechanism. Second, we propose the use of RL to train the dynamic control policy with the goals of providing increased fault-tolerance, reduced power consumption and improved performance as compared to conventional techniques. Our evaluation indicates that, on average, end-to-end packet latency is lowered by 55%, energy efficiency is improved by 64%, and retransmission caused by faults is reduced by 48% over the reactive error correction techniques.

07.

M. Clark, A. Kodi, R. Bunescu and A. Louri, “LEAD: Learning-enabled Energy-Aware Dynamic Voltage/Frequency Scaling in NoCs,” in Proceedings of the 55th Design Automation Conference (DAC), San Francisco, CA, 2018.

Dynamic Voltage and Frequency Scaling (DVFS) is a popular technique that allows dynamic energy to be saved, but it can potentially lead to loss in throughput. We propose LEAD - Learning-enabled Energy-Aware Dynamic voltage/frequency scaling for NoC architectures wherein we use machine learning techniques to enable energy-performance trade-offs at reduced overhead cost. LEAD enables a proactive energy management strategy that relies on an offline trained regression model and provides a wide variety of voltage/frequency pairs (modes). LEAD groups each router and the router’s outgoing links locally into the same V/F domain, allowing energy management at a finer granularity without additional timing complications and overhead. Our simulation shows an average dynamic energy savings of 17% with a minimal loss of 4% in throughput and no latency increase.

08.

S. V. Winkle, A. Kodi, R. Bunescu and A. Louri, “Extending the Power-Efficiency and Performance of Photonic Interconnects for Heterogeneous Multicores with Machine Learning,” in Proceedings of the 24th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Vienna, 2018.

As communication energy exceeds computation energy in future technologies, traditional on-chip electrical interconnects face fundamental challenges in the many-core era. Photonic interconnects have been proposed as a disruptive technology solution due to superior performance per Watt, distance independent energy consumption and CMOS compatibility for on-chip interconnects. Static power due to the laser being always switched on, varying link utilization due to spatial and temporal traffic fluctuations and thermal sensitivity are some of the critical challenges facing photonics interconnects. We propose photonic interconnects for heterogeneous multicores using a checkerboard pattern that clusters CPU-GPU cores together and implements bandwidth reconfiguration using local router information without global coordination. To reduce the static power, we also propose a dynamic laser scaling technique that predicts the power level for the next epoch using the buffer occupancy of previous epoch. To further improve power-performance trade-offs, we also propose a regression-based machine learning technique for scaling the power of the photonic link. Our simulation results demonstrate a 34% performance improvement over a baseline electrical CMESH while consuming 25% less energy per bit when dynamically reallocating bandwidth. When dynamically scaling laser power, our buffer-based reactive and ML-based proactive prediction techniques show 40 - 65% in power savings with 0 - 14% in throughput loss depending on the reservation window size.

09.

D. DiTomaso, A. Sikder, A. Kodi and A. Louri, “Machine learning enabled power-aware Network-on-Chip design,” in Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, Lausanne, 2017, pp. 1354-1359.

NoC designs suffer from excessive static and dynamic power consumption. High dynamic power consumption results from switching and storing data within routers/links while excess static power is consumed when routers and links are not utilized for communication and yet have to be powered up. We propose LESSON (Learning Enabled Sleepy Storage Links and Routers in NoCs) to reduce both static and dynamic power consumption by power-gating the links and routers at low network utilization and moving the data storage from within the routers to the links at high network utilization. As the network utilization increases from low-to- high, to accommodate more traffic, we design the same channels to flow traffic in either direction, thereby avoiding complex routing or look-ahead wake-up algorithms. Machine learning algorithms predict when to power-gate the channels and routers and when to increase the channel bandwidths such that power savings are maximized while performance penalty is minimized. Our results show that we can improve total network power consumption when compared to conventional NoC buffer designs by 85.6% and when compared with aggressive NoC buffer designs by 31.7%. Our predictor shows marginal performance penalties and by dynamically changing the direction of the links, we can improve packet latency by 14%.

10.

D. DiTomaso, T. Boraten, A. Kodi and A. Louri, “Dynamic error mitigation in NoCs using intelligent prediction techniques,” in Proceedings of the 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Taipei, 2016, pp. 1-12.

We develop a new approach to proactive fault-tolerance which uses machine learning (ML) algorithms to predict and mitigate errors. We provide a comprehensive fault-prediction system in which we (a) create a methodology to obtain realistic data sets, (b) train a ML algorithm to predict timing faults on links, and (c) mitigate for soft errors. We develop a fault model, which accounts for parameter variation and device wear-out, to create training/testing data sets for the ML algorithm. Using the training data set and the ID3 algorithm, we create decision trees which can be used to accurately predict the number of errors. Finally, we dynamically mitigate the errors using a combination of error correction codes (ECC) and a relaxed transmission. Our network results indicate a 26.8% reduction in packet retransmissions, a 3.31× speedup, and an energy savings of 60.0% on average over other designs.

HPCAT Lab
High Performance Computing Architectures & Technologies Lab

Department of Electrical and Computer Enginnering
School of Engineering and Applied Science
The George Washington University


800 22nd Street NW
Washington, DC 20052
United States of America 

Contact

Ahmed Louri, IEEE Fellow
David and Marilyn Karlgaard Endowed Chair Professor of ECE
Director,  HPCAT Lab 


Email: louri@gwu.edu                    
Phone: +1 (202) 994 8241

Mobirise website builder - Try here