Research Topic

Accelerating Machine Learning with Approximation

Current Researchers: Yuechen Chen

Machine learning algorithms suffer from high computational complexity and a significant amount of data movement. Recent studies show that machine learning algorithms can tolerate modest errors, thus opening a new design dimension, namely, trading accuracy for better system performance. With reduced accuracy, the multi-core system processes less data, which lowers both power consumption and execution time. In this project, we explore the possibility of supporting approximation in multi-core architectures to reduce the execution time and power consumption of machine learning algorithms. Further, we are also exploring different approximation methods for different memory technologies (e.g., HBM, NVM) and architectures (e.g., GPU, FPGA).

01.

Y. Chen, A. Louri, S. Liu, and F. Lombardi, "Slack-Aware Packet Approximation for Energy-Efficient Network-on-Chips", IEEE Transactions on Sustainable Computing, pp. 1-12, Oct. 2022, doi: 10.1109/TSUSC.2022.3213469.

Network-on-Chips (NoCs) are the standard on-chip communication fabrics for connecting cores, caches, and memory controllers in multi/many-core systems. With the increase in communication load introduced by emerging parallel computing applications, on-chip communication is becoming more costly than computation in terms of energy consumption. This paper contributes to existing research on approximate communication by proposing a slack-aware packet approximation technique to reduce the energy consumed by NoCs for sustainable parallel computation. The proposed approximation technique lowers both the execution time and NoC power consumption by reducing the packet size based on slack. Slack is the number of cycles by which a packet can be delayed in the network with no effect on execution time. Thus, low-slack packets are considered critical to system performance, and prioritizing these packets during transmission significantly reduces execution time. The proposed technique includes a slack-aware control policy that identifies low-slack packets and accelerates them using two packet approximation mechanisms, namely, an in-network approximation (INAP) and a network interface approximation (NIAP). The INAP mechanism prioritizes low-slack packets during the arbitration phase of the router by approximating high-slack packets. The NIAP mechanism reduces the latency of the network links and switch traversals by truncating data for the low-slack packets. An approximate network interface and router are implemented to support the proposed technique with lightweight packet approximation hardware for lower power consumption and execution time. Cycle-accurate simulations using the AxBench and PARSEC benchmark suites show that the proposed approximate communication technique achieves reductions of up to 24% in execution time and 38% in energy consumption with 1.1% less accuracy loss on average compared to existing approximate communication techniques.
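
To make the slack-aware control policy concrete, the sketch below shows, in simplified Python, how a router-level decision might prioritize a low-slack packet while approximating a competing high-slack one (INAP) and how a network interface might truncate a critical packet's payload (NIAP). The Packet class, the slack threshold, and the 50% truncation ratio are illustrative assumptions, not the paper's hardware implementation.

```python
# Illustrative sketch of a slack-aware approximation policy (not the paper's hardware).
from dataclasses import dataclass

SLACK_THRESHOLD = 4   # cycles; packets with slack below this are treated as critical

@dataclass
class Packet:
    dest: int
    slack: int                  # cycles the packet can be delayed without hurting runtime
    payload: bytes
    approximable: bool = True   # True if the payload carries error-resilient data

def truncate(payload, keep_ratio):
    """Keep only the leading fraction of the payload so the packet needs fewer flits."""
    keep = max(1, int(len(payload) * keep_ratio))
    return payload[:keep]

def niap_inject(packet):
    """NIAP-style approximation at the network interface: truncate critical
    (low-slack) packets so their link and switch traversals take fewer cycles."""
    if packet.slack < SLACK_THRESHOLD and packet.approximable:
        packet.payload = truncate(packet.payload, keep_ratio=0.5)
    return packet

def inap_arbitrate(candidates):
    """INAP-style arbitration: grant the output port to the lowest-slack packet
    and approximate competing high-slack packets so they free resources sooner."""
    winner = min(candidates, key=lambda p: p.slack)
    for p in candidates:
        if p is not winner and p.slack >= SLACK_THRESHOLD and p.approximable:
            p.payload = truncate(p.payload, keep_ratio=0.5)
    return winner

# Example: two packets compete for the same output port.
a = niap_inject(Packet(dest=3, slack=1, payload=bytes(64)))   # critical
b = Packet(dest=3, slack=9, payload=bytes(64))                # slack-rich
granted = inap_arbitrate([a, b])
print(granted is a, len(a.payload), len(b.payload))           # True 32 32
```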

02.

Y. Chen, A. Louri, S. Liu, and F. Lombardi, "Approximate Network-on-Chips with Application to Image Classification", in Proceedings of the 16th IEEE International Conference on Networking, Architecture, and Storage, Philadelphia, PA, October 3-4, 2022.

Approximation is an emerging design methodology for reducing the power consumption and latency of on-chip communication in many computing applications. However, existing approximation techniques either achieve modest improvements in these metrics or require retraining after approximation. Since classifying many images introduces intensive on-chip communication, reductions in both network latency and power consumption are highly desired. In this paper, we propose an approximate communication technique (ACT) to improve the efficiency of on-chip communication for image classification applications. The proposed technique exploits the error tolerance of the image classification process to reduce the power consumption and latency of on-chip communication, resulting in better overall performance for image classification. This is achieved by incorporating novel quality control and data approximation mechanisms that reduce the packet size. In particular, the proposed quality control mechanisms identify the error-resilient variables and automatically adjust their error thresholds based on the image classification accuracy. The proposed data approximation mechanisms significantly reduce the packet size when these variables are transmitted. The proposed technique reduces the number of flits in each data packet as well as the overall on-chip communication traffic while maintaining excellent image classification accuracy. Cycle-accurate simulation results show that ACT achieves a 27% reduction in network latency and a 28% reduction in dynamic power compared to existing approximate communication techniques, with less than 0.85% classification accuracy loss.
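
The quality control idea can be illustrated with a short sketch: per-variable error thresholds are relaxed while the observed classification accuracy stays at or above the target and tightened when it drops below. The variable names, step size, and maximum threshold below are assumptions for illustration, not the interface used in the paper.

```python
# Illustrative sketch of accuracy-driven threshold adjustment (assumed interface,
# not the ACT hardware). Each error-resilient variable gets a relative error
# threshold that the NoC may introduce when transmitting it.

def adjust_thresholds(thresholds, measured_accuracy, target_accuracy,
                      step=0.01, max_threshold=0.2):
    """Relax thresholds (more approximation) while accuracy meets the target;
    tighten them when classification accuracy falls below the target."""
    updated = {}
    for var, th in thresholds.items():
        if measured_accuracy >= target_accuracy:
            updated[var] = min(th + step, max_threshold)
        else:
            updated[var] = max(th - step, 0.0)
    return updated

# Example: two error-resilient variables from a CNN inference kernel (hypothetical names).
thresholds = {"feature_map": 0.05, "weight_tile": 0.02}
thresholds = adjust_thresholds(thresholds, measured_accuracy=0.91, target_accuracy=0.90)
print(thresholds)   # both thresholds relaxed by one step
```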

03.

Y. Chen, S. Liu, F. Lombardi, and A. Louri, "A Technique for Approximate Communication in Network-on-Chips for Image Classification", IEEE Transactions on Emerging Topics in Computing, 2022.

Approximation is an emerging design methodology for reducing the power consumption and latency of on-chip communication in many computing applications. However, existing approximation techniques either achieve modest improvements in these metrics or require retraining after approximation. Since classifying many images introduces intensive on-chip communication, reductions in both network latency and power consumption are highly desired. In this project, we propose an approximate communication technique (ACT) to improve the efficiency of on-chip communication for image classification applications. The proposed technique exploits the error tolerance of the image classification process to reduce the power consumption and latency of on-chip communication, resulting in better overall performance for image classification. This is achieved by incorporating novel quality control and data approximation mechanisms that reduce the packet size. The proposed technique reduces the number of flits in each data packet as well as the overall on-chip communication traffic while maintaining excellent image classification accuracy. Cycle-accurate simulation results show that ACT achieves a 23 percent reduction in network latency and a 24 percent reduction in dynamic power compared to existing approximate communication techniques with less than 0.99 percent classification accuracy loss.
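
To see why truncating error-resilient values reduces the number of flits per packet, the back-of-the-envelope calculation below assumes 128-bit flits, one head flit, and a 64-byte payload of 32-bit values; the actual packet format is defined by the NoC, not by this sketch.

```python
# Back-of-the-envelope flit count for a cache-line-sized data packet,
# assuming 128-bit flits, a 64-byte payload of 32-bit values, and one head flit.
import math

FLIT_BITS = 128
HEAD_FLITS = 1

def flits(total_payload_bits):
    return HEAD_FLITS + math.ceil(total_payload_bits / FLIT_BITS)

exact_bits  = 16 * 32           # 16 values x 32 bits each = 512 bits -> 4 body flits
approx_bits = 16 * 16           # keep only the upper 16 bits of each value -> 2 body flits

print(flits(exact_bits), flits(approx_bits))   # 5 3  (a 40% smaller packet)
```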

04.

Y. Chen and A. Louri, “Learning-Based Quality Management for Approximate Communication in Network-on-Chips,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 11, pp. 3724–3735, Nov. 2020, doi: 10.1109/TCAD.2020.3012235.

Current multi/many-core systems spend large amounts of time and power transmitting data across on-chip interconnects. This problem is aggravated when data-intensive applications, such as machine learning and pattern recognition, are executed in these systems. Recent studies show that some data-intensive applications can tolerate modest errors, thus opening a new design dimension, namely, trading result quality for better system performance. In this article, we explore application error tolerance and propose an approximate communication framework to reduce the power consumption and latency of network-on-chips (NoCs). The proposed framework incorporates a quality control method and a data approximation mechanism to reduce the packet size to decrease network power consumption and latency. The quality control method automatically identifies the error-resilient variables that can be approximated during transmission and calculates their error thresholds based on the quality requirements of the application by analyzing the source code. The data approximation method includes a lightweight lossy compression scheme, which significantly reduces packet size when the error-resilient variables are transmitted. This framework results in fewer flits in each data packet and reduces traffic in NoCs while guaranteeing the quality requirements of applications. Our cycle-accurate simulation using the AxBench benchmark suite shows that the proposed approximate communication framework achieves 36 percent latency reduction and 46 percent dynamic power reduction compared to previous approximate communication techniques.
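
As a simplified illustration of turning an application-level quality requirement into per-variable error thresholds, the sketch below splits the error budget equally among the approximable variables. The equal-split policy and the example variable names are assumptions; the framework itself derives thresholds by analyzing the application's source code.

```python
# Simplified sketch: allocate an application quality requirement across the
# error-resilient variables. The equal-split policy is an assumption for
# illustration; the framework derives thresholds from source-code analysis.

def allocate_thresholds(quality_requirement, approximable_vars):
    """quality_requirement: acceptable relative output error (e.g., 0.05 for 95% quality).
    Returns a per-variable relative error threshold."""
    if not approximable_vars:
        return {}
    per_var = quality_requirement / len(approximable_vars)
    return {var: per_var for var in approximable_vars}

# Example with hypothetical variables from a blackscholes-like kernel.
print(allocate_thresholds(0.05, ["spot_price", "volatility", "rate"]))
# {'spot_price': 0.0166..., 'volatility': 0.0166..., 'rate': 0.0166...}
```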

05.

Y. Chen and A. Louri, “An Approximate Communication Framework for Network-on-Chips,” IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 6, pp. 1434–1446, Jun. 2020, doi: 10.1109/TPDS.2020.2968068.

In this article, we explore application error tolerance and propose an approximate communication framework to reduce the power consumption and latency of network-on-chips (NoCs). The proposed framework incorporates a quality control method and a data approximation mechanism to reduce the packet size to decrease network power consumption and latency. The quality control method automatically identifies the error-resilient variables that can be approximated during transmission and calculates their error thresholds based on the quality requirements of the application by analyzing the source code. The data approximation method includes a lightweight lossy compression scheme, which significantly reduces packet size when the error-resilient variables are transmitted. This framework results in fewer flits in each data packet and reduces traffic in NoCs while guaranteeing the quality requirements of applications. Our cycle-accurate simulation using the AxBench benchmark suite shows that the proposed approximate communication framework achieves 62 percent latency reduction and 43 percent dynamic power reduction compared to previous approximate communication techniques while ensuring 95 percent result quality. 
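
A minimal sketch of one possible lightweight lossy compression step is shown below: an error-resilient 32-bit floating-point value is truncated to its upper 16 bits (sign, exponent, and 7 mantissa bits), halving its on-wire size with a worst-case relative error of about 2^-7 (under 1%). This particular encoding is an assumption for illustration, not necessarily the scheme used in the framework.

```python
# Minimal sketch of a lightweight lossy compression step (an assumption, not the
# framework's exact scheme): a 32-bit float is truncated to its 16 most
# significant bits and reconstructed by zero-filling the dropped bits.
import struct

def compress(value: float) -> int:
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 16                      # keep sign, exponent, and 7 mantissa bits

def decompress(half: int) -> float:
    return struct.unpack(">f", struct.pack(">I", half << 16))[0]

x = 3.14159
approx = decompress(compress(x))
print(approx, abs(approx - x) / abs(x))    # ~3.140625, relative error ~0.03%
```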

06.

Y. Chen and A. Louri, "An Online Quality Management Framework for Approximate Communication in Network-on-Chips", in Proceedings of the 33rd International Conference on Supercomputing (ICS), Phoenix, AZ, June 26-28, 2019.

Approximate communication is being seriously considered as an effective technique for reducing power consumption and improving the communication efficiency of network-on-chips (NoCs). A major problem faced by these techniques is quality control: how do we ensure that the network will transmit data with sufficient accuracy for applications to produce acceptable results? In this paper, we propose a hardware-based quality management framework for approximate communication. The proposed framework employs a configuration algorithm to continuously adjust the quality of every piece of data based on the difference between the output quality and the application's quality requirement. When the proposed framework is implemented in a network, every request packet can be transmitted with the updated approximation level. This framework results in fewer flits in each data packet and reduces traffic in NoCs while meeting the quality requirements of applications. Our cycle-accurate simulation using the AxBench benchmark suite shows that the proposed online quality management framework can reduce network latency by up to 52% and dynamic power consumption by 59% compared to previous approximate communication techniques while ensuring 95% output quality. This hardware-software codesign incurs 1% area overhead over previous techniques.
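
The configuration algorithm can be pictured as a simple feedback controller: the approximation level attached to request packets is raised while the measured output quality exceeds the application's requirement and lowered when it falls short. The discrete levels, step size, and quality samples below are assumptions for illustration.

```python
# Sketch of an online approximation-level controller (a simplified stand-in for
# the paper's configuration algorithm): the level attached to request packets
# is raised while output quality exceeds the requirement and lowered otherwise.

MAX_LEVEL = 4   # hypothetical scale: 0 = exact, 4 = most aggressive approximation

def update_level(level, output_quality, required_quality):
    if output_quality > required_quality:
        return min(level + 1, MAX_LEVEL)   # headroom: approximate more
    if output_quality < required_quality:
        return max(level - 1, 0)           # quality slipped: back off
    return level

# Example trace: quality samples arriving as the application runs.
level = 0
for q in [0.99, 0.98, 0.96, 0.94, 0.97]:
    level = update_level(level, q, required_quality=0.95)
    print(level, end=" ")                  # 1 2 3 2 3
```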

07.

Y. Chen, M.F. Reza and A. Louri, “DEC-NoC: An Approximate Framework based on Dynamic Error Control with Applications to Energy-efficient NoCs,” in Proceedings of the 36th IEEE International Conference on Computer Design (ICCD), Orlando, FL, October 7-10, 2018.

Network-on-Chips (NoCs) have emerged as the standard on-chip communication fabrics for multi/many-core systems and systems-on-chip. However, as the number of cores on chip increases, so does power consumption. Recent studies have shown that NoC power consumption can reach up to 40% of the overall chip power. Considerable research efforts have been devoted to significantly reducing NoC power consumption. In this paper, we build on approximate computing techniques and propose an approximate communication methodology called DEC-NoC for reducing NoC power consumption. The proposed DEC-NoC leverages applications' error tolerance and dynamically reduces the amount of error checking and correction in packet transmission, which results in a significant reduction in the number of retransmitted packets. The reduction in packet retransmission, in turn, reduces power consumption. Our cycle-accurate simulation using the PARSEC benchmark suite shows that DEC-NoC achieves up to 56% latency reduction and up to 58% dynamic power reduction compared to NoC architectures with conventional error control techniques.
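
A conceptual sketch of dynamic error control follows: critical data keeps full error checking and retransmission, while error-tolerant data receives lighter checking and accepts residual bit errors instead of triggering a retransmission. The protection levels and tolerance encoding are assumptions for illustration, not the DEC-NoC hardware design.

```python
# Conceptual sketch of dynamic error control (simplified; the protection levels
# and tolerance flag are assumptions, not the DEC-NoC hardware encoding).

def protection_level(error_tolerance: float) -> str:
    """Map an application-supplied tolerance (0 = none) to a coding choice."""
    if error_tolerance == 0.0:
        return "crc32+retransmit"      # full protection for critical data
    if error_tolerance < 0.01:
        return "parity+retransmit"     # light detection, still retransmit
    return "none"                      # accept residual errors, never retransmit

def needs_retransmission(error_detected: bool, level: str) -> bool:
    return error_detected and level.endswith("retransmit")

# Example: the same detected link error triggers a retransmission only for
# critical data, so the error-tolerant stream avoids the retransmission power.
for tol in [0.0, 0.05]:
    lvl = protection_level(tol)
    print(lvl, needs_retransmission(True, lvl))
# crc32+retransmit True
# none False
```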

HPCAT Lab
High Performance Computing Architectures & Technologies Lab

Department of Electrical and Computer Engineering
School of Engineering and Applied Science
The George Washington University


800 22nd Street NW
Washington, DC 20052
United States of America 

Contact

Ahmed Louri, IEEE Fellow
David and Marilyn Karlgaard Endowed Chair Professor of ECE
Director,  HPCAT Lab 


Email: louri@gwu.edu                    
Phone: +1 (202) 994 8241