arXiv:2403.19724 [pdf]

Towards Reverse-Engineering the Brain: Brain-Derived Neuromorphic Computing Approach with Photonic, Electronic, and Ionic Dynamicity in 3D integrated circuits

Authors: S. J. Ben Yoo, Luis El-Srouji, Suman Datta, Shimeng Yu, Jean Anne Incorvia, Alberto Salleo, Volker Sorger, Juejun Hu, Lionel C Kimerling, Kristofer Bouchard, Joy Geng, Rishidev Chaudhuri, Charan Ranganath, Randall O'Reilly

Abstract: The human brain has immense learning capabilities at extreme energy efficiencies and scale that no artificial system has been able to match. For decades, reverse engineering the brain has been one of the top priorities of science and technology research. Despite numerous efforts, conventional electronics-based methods have failed to match the scalability, energy efficiency, and self-supervised learning capabilities of the human brain. On the other hand, very recent progress in the development of new generations of photonic and electronic memristive materials, device technologies, and 3D electronic-photonic integrated circuits (3D EPIC) promises to realize new brain-derived neuromorphic systems with comparable connectivity, density, energy-efficiency, and scalability. When combined with bio-realistic learning algorithms and architectures, it may be possible to realize an 'artificial brain' prototype with general self-learning capabilities. This paper argues the possibility of reverse-engineering the brain through architecting a prototype of a brain-derived neuromorphic computing system consisting of artificial electronic, ionic, photonic materials, devices, and circuits with dynamicity resembling the bio-plausible molecular, neuro/synaptic, neuro-circuit, and multi-structural hierarchical macro-circuits of the brain based on well-tested computational models. We further argue the importance of bio-plausible local learning algorithms applicable to the neuromorphic computing system that capture the flexible and adaptive unsupervised and self-supervised learning mechanisms central to human intelligence. Most importantly, we emphasize that the unique capabilities in brain-derived neuromorphic computing prototype systems will enable us to understand links between specific neuronal and network-level properties with system-level functioning and behavior.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 15 pages, 12 figures

arXiv:2308.00215 [pdf, other]

From Talent Shortage to Workforce Excellence in the CHIPS Act Era: Harnessing Industry 4.0 Paradigms for a Sustainable Future in Domestic Chip Production

Authors: Aida Damanpak Rizi, Antika Roy, Rouhan Noor, Hyo Kang, Nitin Varshney, Katja Jacob, Sindia Rivera-Jimenez, Nathan Edwards, Volker J. Sorger, Hamed Dalir, Navid Asadizanjani

Abstract: The CHIPS Act is driving the U.S. towards a self-sustainable future in domestic chip production. Decades of outsourced manufacturing, assembly, testing, and packaging have diminished the workforce ecosystem, imposing major limitations on semiconductor companies racing to build new fabrication sites as part of the CHIPS Act. In response, a systemic alliance between academic institutions, the industry, government, various consortiums, and organizations has emerged to establish a pipeline to educate and onboard the next generation of talent. Establishing a stable and continuous flow of talent requires significant time investments and comes with no guarantees, particularly factoring in the low workplace desirability in current fabrication houses for the U.S. workforce. This paper will explore the feasibility of two paradigms of Industry 4.0, automation and Augmented Reality (AR)/Virtual Reality (VR), to complement ongoing workforce development efforts and optimize workplace desirability by catalyzing core manufacturing processes and effectively enhancing the education, onboarding, and professional realms, all with promising capabilities amid the ongoing talent shortage and trajectory towards advanced packaging.

Submitted 31 July, 2023; originally announced August 2023.

Comments: 18 pages, 8 figures

arXiv:2211.05276 [pdf, other]

PhotoFourier: A Photonic Joint Transform Correlator-Based Neural Network Accelerator

Authors: Shurui Li, Hangbo Yang, Chee Wei Wong, Volker J. Sorger, Puneet Gupta

Abstract: The last few years have seen a lot of work to address the challenge of low-latency and high-throughput convolutional neural network inference. Integrated photonics has the potential to dramatically accelerate neural networks because of its low-latency nature. Combined with the concept of Joint Transform Correlator (JTC), the computationally expensive convolution functions can be computed instantaneously (time of flight of light) with almost no cost. This 'free' convolution computation provides the theoretical basis of the proposed PhotoFourier JTC-based CNN accelerator. PhotoFourier addresses a myriad of challenges posed by on-chip photonic computing in the Fourier domain including 1D lenses and high-cost optoelectronic conversions. The proposed PhotoFourier accelerator achieves more than 28X better energy-delay product compared to state-of-the-art photonic neural network accelerators.

Submitted 9 November, 2022; originally announced November 2022.

Comments: 12 pages, 13 figures, accepted in HPCA 2023

arXiv:2211.01476 [pdf, other]

doi 10.1109/JLT.2023.3269957

Integrated Photonic Tensor Processing Unit for a Matrix Multiply: a Review

Authors: Nicola Peserico, Bhavin J. Shastri, Volker J. Sorger

Abstract: The explosion of artificial intelligence and machine-learning algorithms, connected to the exponential growth of the exchanged data, is driving a search for novel application-specific hardware accelerators. Among the many, the photonics field appears to be in the perfect spotlight for this global data explosion, thanks to its almost infinite bandwidth capacity associated with limited energy consumption. In this review, we will overview the major advantages that photonics has over electronics for hardware accelerators, followed by a comparison between the major architectures implemented on Photonic Integrated Circuits (PICs) for both the linear and nonlinear parts of Neural Networks. By the end, we will highlight the main driving forces for the next generation of photonic accelerators, as well as the main limits that must be overcome.

Submitted 2 November, 2022; originally announced November 2022.

arXiv:2209.09189 [pdf]

Bistable all-optical devices based on nonlinear epsilon-near-zero (ENZ) materials

Authors: J. Gosciniak, Z. Hu, M. Thomaschewski, V. Sorger, J. B. Khurgin

Abstract: Non-linear and bistable optical systems are a key enabling technology for the next-generation optical networks and photonic neural systems with many potential applications in optical logic and information processing. Here, we propose a novel bistable resonator-free all-optical waveguide device based on indium tin oxide as a nonlinear epsilon-near-zero material providing a cost-efficient and high-performance binarity photonic platform. The salient features of the proposed device are compatibility with silicon photonics, enabling sub-picosecond operation speeds with moderate switching power. The device can act as an optical analogue of a memristor or thyristor and can become an enabling element of photonic neural networks not requiring OEO conversions.

Submitted 19 September, 2022; originally announced September 2022.

Comments: 9 pages, 5 figures

arXiv:2112.12844 [pdf]

Emerging Devices and Packaging Strategies for Electronic-Photonic AI Accelerators

Authors: Nicola Peserico, Thomas Ferreira De Lima, Paul R. Prucnal, Volker J. Sorger

Abstract: The field of mimicking the structure of the brain on a chip is experiencing interest driven by the demand for machine-intelligence applications. However, the power consumption and available performance of machine-learning (ML) accelerating hardware still leave much room for improvement. In this letter, we share viewpoints, challenges, and prospects of electronic-photonic neural network (NN) accelerators. Combining electronics with photonics offers synergistic co-design strategies for high-performance AI application-specific integrated circuits (ASICs) and systems. Taking advantage of photonic signal processing capabilities and combining them with electronic logic control and data storage is an emerging prospect. However, the optical component library leaves much to be desired and is challenged by the enormous size of photonic devices. Within this context, we will review the emerging electro-optic materials, functional devices, and systems packaging strategies that, when realized, provide significant performance gains and fuel the ongoing AI revolution, leading to a stand-alone photonics-inside AI ASIC 'black-box' for streamlined plug-and-play board integration in future AI processors.

Submitted 23 December, 2021; originally announced December 2021.

arXiv:2112.12297 [pdf]

High Throughput Multi-Channel Parallelized Diffraction Convolutional Neural Network Accelerator

Authors: Zibo Hu, Shurui Li, Russell L. T. Schwartz, Maria Solyanik-Gorgone, Mario Miscuglio, Puneet Gupta, Volker J. Sorger

Abstract: Convolutional neural networks are paramount in image and signal processing, including the relevant classification and training tasks alike, and constitute the majority of machine learning compute demand today. With convolution operations being computationally intensive, next-generation hardware accelerators need to offer parallelization and algorithmic-hardware homomorphism. Fortunately, diffractive display optics is capable of million-channel parallel data processing at low latency; however, it has thus far shown only slow, tens-of-hertz single-image and single-kernel capability, thereby significantly underdelivering on its performance potential. Here, we demonstrate an operation-parallelized high-throughput Fourier optic convolutional neural network accelerator. For the first time, simultaneous processing of multiple kernels in the Fourier domain, enabled by optical diffraction, has been achieved alongside the input parallelism already conventional in the field. Additionally, we show an about one hundred times system speed-up over existing optical diffraction-based processors, and this demonstration rivals the performance of modern electronic solutions. Therefore, this system is capable of processing large-scale matrices about ten times faster than state-of-the-art electronic systems.

Submitted 7 July, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

Comments: 13 pages, 4 figures

arXiv:2111.06862 [pdf, other]

doi 10.1364/OPTICA.475493

Silicon Photonic Architecture for Training Deep Neural Networks with Direct Feedback Alignment

Authors: Matthew J. Filipovich, Zhimu Guo, Mohammed Al-Qadasi, Bicky A. Marquez, Hugh D. Morison, Volker J. Sorger, Paul R. Prucnal, Sudip Shekhar, Bhavin J. Shastri

Abstract: There has been growing interest in using photonic processors for performing neural network inference operations; however, these networks are currently trained using standard digital electronics. Here, we propose on-chip training of neural networks enabled by a CMOS-compatible silicon photonic architecture to harness the potential for massively parallel, efficient, and fast data operations. Our scheme employs the direct feedback alignment training algorithm, which trains neural networks using error feedback rather than error backpropagation, and can operate at speeds of trillions of multiply-accumulate (MAC) operations per second while consuming less than one picojoule per MAC operation. The photonic architecture exploits parallelized matrix-vector multiplications using arrays of microring resonators for processing multi-channel analog signals along single waveguide buses to calculate the gradient vector for each neural network layer in situ. We also experimentally demonstrate training deep neural networks with the MNIST dataset using on-chip MAC operation results. Our novel approach for efficient, ultra-fast neural network training showcases photonics as a promising platform for executing AI applications.

Submitted 18 August, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

Comments: 15 pages, 6 figures

Journal ref: Optica 9, 1323-1332 (2022)
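
Note: the abstract above contrasts direct feedback alignment (DFA) with backpropagation. As a rough illustration only, and not the authors' photonic implementation, the following pure-Python sketch shows the one step where the two differ: under DFA the output error reaches a hidden layer through a fixed random feedback matrix instead of the transpose of the forward weights. The function names, the tanh activation, and the plain gradient-descent update are assumptions made for this sketch.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dfa_hidden_delta(feedback, output_error, pre_activation):
    """Hidden-layer error signal under DFA (hypothetical helper).

    `feedback` is a fixed random matrix that is never trained. Backpropagation
    would use the transposed output-layer weights in its place.
    The tanh activation is an assumption of this sketch.
    """
    projected = matvec(feedback, output_error)
    return [p * (1.0 - math.tanh(a) ** 2) for p, a in zip(projected, pre_activation)]


def dfa_weight_update(weights, delta, layer_input, lr):
    """Gradient-descent step: W <- W - lr * outer(delta, layer_input)."""
    return [[w - lr * d * x for w, x in zip(row, layer_input)]
            for row, d in zip(weights, delta)]
```

Because each hidden layer needs only the output error and its own fixed feedback matrix, the per-layer updates can in principle be computed in parallel, which is the property the abstract's matrix-vector hardware is said to exploit.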

arXiv:2105.09943 [pdf, other]

doi 10.1080/23746149.2021.1981155

Prospects and applications of photonic neural networks

Authors: Chaoran Huang, Volker J. Sorger, Mario Miscuglio, Mohammed Al-Qadasi, Avilash Mukherjee, Sudip Shekhar, Lukas Chrostowski, Lutz Lampe, Mitchell Nichols, Mable P. Fok, Daniel Brunner, Alexander N. Tait, Thomas Ferreira de Lima, Bicky A. Marquez, Paul R. Prucnal, Bhavin J. Shastri

Abstract: Neural networks have enabled applications in artificial intelligence through machine learning, and neuromorphic computing. Software implementations of neural networks on conventional computers that have separate memory and processor (and that operate sequentially) are limited in speed and energy efficiency. Neuromorphic engineering aims to build processors in which hardware mimics neurons and synapses in the brain for distributed and parallel processing. Neuromorphic engineering enabled by photonics (optical physics) can offer sub-nanosecond latencies and high bandwidth with low energies to extend the domain of artificial intelligence and neuromorphic computing applications to machine learning acceleration, nonlinear programming, intelligent signal processing, etc. Photonic neural networks have been demonstrated on integrated platforms and free-space optics depending on the class of applications being targeted. Here, we discuss the prospects and demonstrated applications of these photonic neural networks.

Submitted 20 May, 2021; originally announced May 2021.

arXiv:2102.10398 [pdf]

All-Chalcogenide Programmable All-Optical Deep Neural Networks

Authors: Ting Yu, Xiaoxuan Ma, Ernest Pastor, Jonathan K. George, Simon Wall, Mario Miscuglio, Robert E. Simpson, Volker J. Sorger

Abstract: Deep learning algorithms are revolutionising many aspects of modern life. Typically, they are implemented in CMOS-based hardware with severely limited memory access times and inefficient data-routing. All-optical neural networks without any electro-optic conversions could alleviate these shortcomings. However, an all-optical nonlinear activation function, which is a vital building block for optical neural networks, needs to be developed efficiently on-chip. Here, we introduce and demonstrate both optical synapse weighting and all-optical nonlinear thresholding using two different effects in a chalcogenide material photonic platform. We show how the structural phase transitions in a wide-bandgap phase-change material enable storing the neural network weights via non-volatile photonic memory, whilst resonant bond destabilisation is used as a nonlinear activation threshold without changing the material. These two different transitions within chalcogenides enable programmable neural networks with near-zero static power consumption once trained, in addition to picosecond delays performing inference tasks not limited by the wire charging that limits electrical circuits; for instance, we show that nanosecond-order weight programming and near-instantaneous weight updates enable accurate inference tasks within 20 picoseconds in a 3-layer all-optical neural network. Optical neural networks that bypass electro-optic conversion altogether hold promise for network-edge machine learning applications where decision-making in real time is critical, such as for autonomous vehicles or navigation systems such as signal pre-processing of LIDAR systems.

Submitted 27 February, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

arXiv:2011.07391 [pdf, other]

Channel Tiling for Improved Performance and Accuracy of Optical Neural Network Accelerators

Authors: Shurui Li, Mario Miscuglio, Volker J. Sorger, Puneet Gupta

Abstract: Low latency, high throughput inference on Convolutional Neural Networks (CNNs) remains a challenge, especially for applications requiring large input or large kernel sizes. 4F optics provides a solution to accelerate CNNs by converting convolutions into Fourier-domain point-wise multiplications that are computationally 'free' in the optical domain. However, existing 4F CNN systems suffer from the all-positive sensor readout issue, which makes the implementation of a multi-channel, multi-layer CNN not scalable or even impractical. In this paper we propose a simple channel tiling scheme for 4F CNN systems that utilizes the high resolution of the 4F system to perform channel summation inherently in the optical domain before sensor detection, so the outputs of different channels can be correctly accumulated. Compared to the state of the art, channel tiling gives similar accuracy, significantly better robustness to sensing quantization error (33% improvement in required sensing precision) and noise (10 dB reduction in tolerable sensing noise), 0.5X total filters required, 10-50X+ throughput improvement and as much as 3X reduction in required output camera resolution/bandwidth. Not requiring any additional optical hardware, the proposed channel tiling approach addresses an important throughput and precision bottleneck of high-speed, massively-parallel optical 4F computing systems.

Submitted 14 January, 2021; v1 submitted 14 November, 2020; originally announced November 2020.

Comments: 11 pages, 8 figures
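
Note: the abstract above relies on the convolution theorem, under which a convolution becomes a point-wise multiplication in the Fourier domain. The sketch below checks that identity numerically in pure Python. It is only an illustration under simplifying assumptions: it uses a naive discrete Fourier transform and circular (wrap-around) convolution on short 1D sequences, whereas an optical 4F system operates on 2D fields. All function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for k in range(n))
           for j in range(n)]
    return [v / n for v in out] if inverse else out


def circular_convolve_direct(a, b):
    """Circular convolution computed term by term: n^2 multiplications."""
    n = len(a)
    return [sum(a[k] * b[(i - k) % n] for k in range(n)) for i in range(n)]


def circular_convolve_fourier(a, b):
    """Same result via transform, point-wise product, inverse transform."""
    spectrum = [p * q for p, q in zip(dft(a), dft(b))]
    return [v.real for v in dft(spectrum, inverse=True)]
```

In the electronic sketch the transforms dominate the cost; the premise of the optical approach described in the abstract is that lenses perform the transforms passively, leaving only the point-wise product.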

arXiv:2007.05380 [pdf]

Analog Computing with Metatronic Circuits

Authors: Mario Miscuglio, Yaliang Gui, Xiaoxuan Ma, Shuai Sun, Tarek El-Ghazawi, Tatsuo Itoh, Andrea Alù, Volker J. Sorger

Abstract: Analog photonic solutions offer unique opportunities to address complex computational tasks with unprecedented performance in terms of energy dissipation and speeds, overcoming current limitations of modern computing architectures based on electron flows and digital approaches. The lack of modularization and lumped element reconfigurability in photonics has prevented the transition to an all-optical analog computing platform. Here, we explore a nanophotonic platform based on epsilon-near-zero materials capable of solving partial differential equations (PDEs) in the analog domain. Wavelength stretching in zero-index media enables highly nonlocal interactions within the board based on the conduction of electric displacement, which can be monitored to extract the solution of a broad class of PDE problems. By exploiting control of deposition technique through process parameters, we demonstrate the possibility of implementing the proposed nano-optic processor using CMOS-compatible indium-tin-oxide, whose optical properties can be tuned by carrier injection to obtain programmability at high speeds and low energy requirements. Our nano-optical analog processor can be integrated at chip-scale, processing arbitrary inputs at the speed of light.

Submitted 10 July, 2020; originally announced July 2020.

arXiv:2006.08533 [pdf, other]

A Design Methodology for Post-Moore's Law Accelerators: The Case of a Photonic Neuromorphic Processor

Authors: Armin Mehrabian, Volker J. Sorger, Tarek El-Ghazawi

Abstract: Over the past decade alternative technologies have gained momentum as conventional digital electronics continue to approach their limitations, due to the end of Moore's Law and Dennard Scaling. At the same time, we are facing new application challenges such as those due to the enormous increase in data. Attention has therefore shifted from homogeneous computing to specialized heterogeneous solutions. As an example, brain-inspired computing has re-emerged as a viable solution for many applications. Such new processors, however, have widened the abstraction gamut from device level to applications. Therefore, efficient abstractions that can provide vertical design-flow tools for such technologies have become critical. Photonics in general, and neuromorphic photonics in particular, are among the promising alternatives to electronics. While the device-level toolbox for photonics and high-level neural network platforms are both rapidly expanding, there has not been much work to bridge this gap. Here, we present a design methodology to mitigate this problem by extending high-level hardware-agnostic neural network design tools with functional and performance models of photonic components. In this paper we detail this tool and methodology by using design examples and associated results. We show that adopting this approach enables designers to efficiently navigate the design space and devise hardware-aware systems with alternative technologies.

Submitted 15 June, 2020; originally announced June 2020.

Comments: 4 pages, 4 figures

ACM Class: C.1.4; C.1.m; C.3; D.2.2; I.2; I.2.11; I.2.m; J.6

arXiv:2002.03780 [pdf]

doi 10.1063/5.0001942

Photonic tensor cores for machine learning

Authors: Mario Miscuglio, Volker J. Sorger

Abstract: With an ongoing trend in computing hardware towards increased heterogeneity, domain-specific co-processors are emerging as alternatives to centralized paradigms. The tensor core unit (TPU) has been shown to outperform graphics processing units by almost 3 orders of magnitude, enabled by higher signal throughput and energy efficiency. In this context, photons bear a number of synergistic physical properties while phase-change materials allow for local nonvolatile mnemonic functionality in these emerging distributed non-von Neumann architectures. While several photonic neural network designs have been explored, a photonic TPU to perform matrix vector multiplication and summation is yet outstanding. Here we introduce an integrated photonics-based TPU by strategically utilizing a) photonic parallelism via wavelength division multiplexing, b) high throughputs of 2 peta-operations per second enabled by 10s of picosecond-short delays from optoelectronics and compact photonic integrated circuitry, and c) zero power-consuming novel photonic multi-state memories based on phase-change materials featuring vanishing losses in the amorphous state. Combining these physical synergies of material, function, and system, we show that the performance of this 8-bit photonic TPU can be 2-3 orders higher compared to an electrical TPU whilst featuring similar chip areas. This work shows that photonic specialized processors have the potential to augment electronic systems and may perform exceptionally well in network-edge devices in the looming 5G networks and beyond.

Submitted 29 June, 2020; v1 submitted 1 February, 2020; originally announced February 2020.

Journal ref: Applied Physics Reviews 7, 031404 (2020)

arXiv:2002.01308

Integrated Photonic FFT for Optical Convolutions towards Efficient and High-Speed Neural Networks

Authors: Moustafa Ahmed, Yas Al-Hadeethi, Ahmed Bakry, Hamed Dalir, Volker J. Sorger

Abstract: The technologically-relevant task of feature extraction from data performed in deep-learning systems is routinely accomplished as repeated fast Fourier transforms (FFT) electronically in prevalent domain-specific architectures such as in graphics processing units (GPUs). However, electronic systems are limited with respect to power dissipation and delay, both due to wire-charging challenges related to interconnect capacitance. Here we present a silicon photonics-based architecture for convolutional neural networks that harnesses the phase property of light to perform FFTs efficiently by executing the convolution as a multiplication in the Fourier-domain. The algorithmic execution time is determined by the time-of-flight of the signal through this photonic reconfigurable passive FFT filter circuit and is on the order of 10s of picoseconds. A sensitivity analysis shows that this optical processor must be thermally phase stabilized corresponding to a few degrees. Furthermore, we find that for a small sample number, the obtainable number of convolutions per (time-power-chip area) outperforms GPUs by about 2 orders of magnitude. Lastly, we show that, conceptually, the optical FFT and convolution-processing performance is indeed directly linked to optoelectronic device-level, and improvements in plasmonics, metamaterials or nanophotonics are fueling next generation densely interconnected intelligent photonic circuits with relevance for edge-computing 5G networks.

Submitted 23 March, 2020; v1 submitted 1 February, 2020; originally announced February 2020.

Comments: Revisions required

arXiv:1906.10487 [pdf, other]

A Winograd-based Integrated Photonics Accelerator for Convolutional Neural Networks

Authors: Armin Mehrabian, Mario Miscuglio, Yousra Alkabani, Volker J. Sorger, Tarek El-Ghazawi

Abstract: Neural Networks (NNs) have become the mainstream technology in the artificial intelligence (AI) renaissance over the past decade. Among different types of neural networks, convolutional neural networks (CNNs) have been widely adopted as they have achieved leading results in many fields such as computer vision and speech recognition. This success in part is due to the widespread availability of capable underlying hardware platforms. Applications have always been a driving factor for design of such hardware architectures. Hardware specialization can expose us to novel architectural solutions, which can outperform general purpose computers for tasks at hand. Although different applications demand different performance measures, they all share speed and energy efficiency as high priorities. Meanwhile, photonics processing has seen a resurgence due to its inherently high-speed and low-power nature. Here, we investigate the potential of using photonics in CNNs by proposing a CNN accelerator design based on the Winograd filtering algorithm. Our evaluation results show that while a photonic accelerator can compete with current state-of-the-art electronic platforms in terms of both speed and power, it has the potential to improve the energy efficiency by up to three orders of magnitude.

Submitted 4 December, 2019; v1 submitted 25 June, 2019; originally announced June 2019.

Comments: 12 pages, photonics, artificial intelligence, convolutional neural networks, Winograd

MSC Class: B.0; B.7; C.1; C.1.2; C.1.4; C.3; C.5; I.2; I.2.5; I.2.10; I.2.11; I.4; I.5; I.5.2; I.5.4; I.5.5; I.6; I.6.3 ACM Class: B.0; B.7; C.1; C.1.2; C.1.4; C.3; C.5; I.2; I.2.5; I.2.10; I.2.11; I.4; I.5; I.5.2; I.5.4; I.5.5; I.6; I.6.3
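
The Winograd filtering named in the abstract above can be illustrated with its smallest textbook case, F(2,3), which produces two outputs of a 3-tap filter using four multiplications instead of six. This is only a sketch of the general technique: the abstract does not state which tile size the paper uses, and the function names `direct_f23` and `winograd_f23` are hypothetical.

```python
def direct_f23(d, g):
    """Reference result: a 3-tap filter g slid over 4 inputs d (6 multiplies)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs with 4 data-dependent multiplies."""
    # Filter transform; in a CNN this can be precomputed once per kernel.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform followed by the four element-wise multiplies.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [1.0, 0.0, -1.0]
    print(direct_f23(d, g), winograd_f23(d, g))
```

The saving comes from moving work into additions and a one-time filter transform, which is why the method suits hardware where multiplications are the expensive resource.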

arXiv:1807.08792 [pdf, other]

doi 10.1109/SOCC.2018.8618542

PCNNA: A Photonic Convolutional Neural Network Accelerator

Authors: Armin Mehrabian, Yousra Al-Kabani, Volker J Sorger, Tarek El-Ghazawi

Abstract: Convolutional Neural Networks (CNN) have been the centerpiece of many applications including but not limited to computer vision, speech processing, and Natural Language Processing (NLP). However, the computationally expensive convolution operations impose many challenges to the performance and scalability of CNNs. In parallel, photonic systems, which are traditionally employed for data communication, have enjoyed recent popularity for data processing due to their high bandwidth, low power consumption, and reconfigurability. Here we propose a Photonic Convolutional Neural Network Accelerator (PCNNA) as a proof of concept design to speed up the convolution operation for CNNs. Our design is based on the recently introduced silicon photonic microring weight banks, which use the broadcast-and-weight protocol to perform Multiply And Accumulate (MAC) operations and move data through layers of a neural network. Here, we aim to exploit the synergy between the inherent parallelism of photonics in the form of Wavelength Division Multiplexing (WDM) and sparsity of connections between input feature maps and kernels in CNNs. While our full system design offers up to more than 3 orders of magnitude speedup in execution time, its optical core potentially offers more than 5 orders of magnitude speedup compared to state-of-the-art electronic counterparts.

Submitted 23 July, 2018; originally announced July 2018.

Comments: 5 Pages, 6 Figures, IEEE SOCC 2018

arXiv:1804.02389

Energy-Quality Scaling in Analog Mesh Computers

Authors: Jeff Anderson, Engin Kayraklioglu, Vikram Narayana, Volker Sorger, Tarek El-Ghazawi

Abstract: The recent push for post-Moore computer architectures has introduced a wide variety of application-specific accelerators. One particular accelerator, the resistance network analogue, has been well received due to its ability to efficiently solve partial differential equations by eliminating the iterative stages required by today's numerical solvers. However, in the age of programmable integrated circuits, the static nature of the resistance network analogue, and other analog mesh computers like it, has relegated it to an academic curiosity. Recent developments in materials, such as the memristor, have made the resistance network analogue viable for inclusion in future heterogeneous computer architectures. However, selection of an appropriately sized mesh to be incorporated into a computer system requires that energy-quality trade-offs are made regarding the problem size and required resolution of the solution. This paper provides an in-depth study of the scaling of analog mesh computer hardware, from the perspective of energy per bit and required resolution, introduces a metric to aid in quantifying analog mesh computers with different parameters, and introduces a method of virtualization which enables an analog mesh computer of a fixed size to approximate the calculations of a larger-sized mesh.

Submitted 18 November, 2018; v1 submitted 5 April, 2018; originally announced April 2018.

Comments: large simulation error effectively nullifies results

arXiv:1712.00049 [pdf]

Integrated Nanophotonics Architecture for Residue Number System Arithmetic

Authors: Jiaxin Peng, Shuai Sun, Vikram K. Narayana, Volker J. Sorger, Tarek El-Ghazawi

Abstract: Residue number system (RNS) enables dimensionality reduction of an arithmetic problem by representing a large number as a set of smaller integers, where the number is decomposed by prime number factorization using the moduli as basic functions. These reduced problem sets can then be processed independently and in parallel, thus improving computational efficiency and speed. Here we show an optical RNS hardware representation based on integrated nanophotonics. The digit-wise shifting in RNS arithmetic is expressed as spatial routing of an optical signal in 2x2 hybrid photonic-plasmonic switches. Here the residue is represented by spatially shifting the input waveguides relative to the router's outputs, where the moduli are represented by the number of waveguides. By cascading the photonic 2x2 switches, we design a photonic RNS adder and a multiplier forming an all-to-all sparse directional network. The advantage of this photonic arithmetic processor is the short (10's ps) computational execution time given by the optical propagation delay through the integrated nanophotonic router. Furthermore, we show how photonic processing in-the-network leverages the natural parallelism of optics such as wavelength-division-multiplexing or optical angular momentum in this RNS processor. A key application for photonic RNS is the functional analysis convolution with widespread usage in numerical linear algebra, computer vision, language, image, and signal processing, and neural networks.

Submitted 30 November, 2017; originally announced December 2017.

Comments: 7 pages, 5 figures
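
The residue arithmetic described in the abstract above can be sketched in software. The example assumes the standard RNS setup with pairwise coprime moduli and Chinese Remainder Theorem reconstruction; the moduli (3, 5, 7) and all function names are hypothetical choices for illustration, and nothing here models the photonic routing hardware.

```python
from math import prod

# Hypothetical pairwise-coprime moduli; representable range is 0..104 (3*5*7 - 1).
MODULI = (3, 5, 7)


def to_rns(x, moduli=MODULI):
    """Encode an integer as its residues modulo each modulus."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Add digit-wise; each residue channel is independent of the others."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Multiply digit-wise, again with no carries between channels."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Reconstruct the integer with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        total += r * partial * pow(partial, -1, m)
    return total % big_m


if __name__ == "__main__":
    print(to_rns(17), from_rns(rns_add(to_rns(17), to_rns(29))))
```

Because adding a residue r modulo m is a cyclic shift by r positions among m slots, each channel maps naturally onto the kind of spatial shifting the abstract describes. Results are only valid while the true value stays below the product of the moduli.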

arXiv:1709.02684 [pdf]

Identifying Mirror Symmetry Density with Delay in Spiking Neural Networks

Authors: Jonathan K. George, Cesare Soci, Volker J. Sorger

Abstract: The ability to rapidly identify symmetry and anti-symmetry is an essential attribute of intelligence. Symmetry perception is a central process in human vision and may be key to human 3D visualization. While previous work in understanding neuron symmetry perception has concentrated on the neuron as an integrator, here we show how the coincidence detecting property of the spiking neuron can be used to reveal symmetry density in spatial data. We develop a method for synchronizing symmetry-identifying spiking artificial neural networks to enable layering and feedback in the network. We show a method for building a network capable of identifying symmetry density between sets of data and present a digital logic implementation demonstrating an 8x8 leaky-integrate-and-fire symmetry detector in a field programmable gate array. Our results show that the efficiencies of spiking neural networks can be harnessed to rapidly identify symmetry in spatial data with applications in image processing, 3D computer vision, and robotics.

Submitted 25 August, 2017; originally announced September 2017.

Comments: 8 pages, 8 figures

arXiv:1708.09534 [pdf]

Towards On-Chip Optical FFTs for Convolutional Neural Networks

Authors: Jonathan George, Hani Nejadriahi, Volker Sorger

Abstract: Convolutional neural networks have become an essential element of spatial deep learning systems. In the prevailing architecture, the convolution operation is performed with Fast Fourier Transforms (FFT) electronically in GPUs. The parallelism of GPUs provides an efficiency over CPUs; however, both approaches, being electronic, are bound by the speed and power limits of the interconnect delay inside the circuits. Here we present a silicon photonics based architecture for convolutional neural networks that harnesses the phase property of light to perform FFTs efficiently. Our all-optical FFT is based on nested Mach-Zehnder Interferometers, directional couplers, and phase shifters, with backend electro-optic modulators for sampling. The FFT delay depends only on the propagation delay of the optical signal through the silicon photonics structures. Designing and analyzing the performance of a convolutional neural network deployed with our on-chip optical FFT, we find dramatic improvements by up to 10^4 when compared to state-of-the-art GPUs when exploring a compounded figure-of-merit given by power per convolution over area. At a high level, this performance is enabled by mapping the desired mathematical function, an FFT, synergistically onto hardware, in this case optical delay interferometers.

Submitted 30 August, 2017; originally announced August 2017.
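
The FFT-based convolution that the abstract above refers to rests on the convolution theorem: transform both signals, multiply point-wise, and transform back. The following is a minimal pure-Python sketch of that identity for 1D signals, using a textbook recursive radix-2 FFT. The function names are hypothetical, and the code says nothing about the optical implementation itself.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd-indexed half.
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fft_convolve(a, b):
    """Linear convolution of two real sequences via zero-padded FFTs."""
    size = len(a) + len(b) - 1
    n = 1
    while n < size:
        n *= 2  # pad to the next power of two
    fa = fft([complex(v) for v in a] + [0j] * (n - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (n - len(b)))
    spectrum = [x * y for x, y in zip(fa, fb)]
    result = fft(spectrum, inverse=True)
    # The unnormalized inverse transform needs a 1/n scale factor.
    return [v.real / n for v in result[:size]]


if __name__ == "__main__":
    print(fft_convolve([1, 2, 3], [0, 1, 0.5]))
```

Zero-padding to at least len(a) + len(b) - 1 samples is what turns the FFT's circular convolution into the linear convolution expected here.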

arXiv:1708.06721 [pdf, other]

D3NOC: Dynamic Data-Driven Network On Chip in Photonic Electronic Hybrids

Authors: Armin Mehrabian, Shuai Sun, Vikram K. Narayana, Volker J. Sorger, Tarek El-Ghazawi

Abstract: In this paper, we present a reconfigurable hybrid Photonic-Plasmonic Network-on-Chip (NoC) based on the Dynamic Data Driven Application System (DDDAS) paradigm. In DDDAS, computations and measurements form a dynamic closed feedback loop in which they tune one another in response to changes in the environment. Our proposed system enables dynamic augmentation of a base electrical mesh topology with an optical express bus during run-time. In addition, the measurement process itself adjusts to the environment. In order to achieve lower latencies, lower dynamic power, and higher throughput, we take advantage of a Configurable Hybrid Photonic Plasmonic Interconnect (CHyPPI) for our reconfigurable connections. We evaluate the performance and power of our system against kernels from NAS Parallel Benchmark (NPB) in addition to some synthetically generated traffic. In comparison to a 16x16 base electrical mesh, D3NOC shows up to 89% latency and 67% dynamic power net improvements beyond overhead-corrected performance. It should be noted that the design-space of NoC reconfiguration is vast and the goal of this study is not design-space exploration. Our goal is to show the potential of adaptive dynamic measurements when coupled with other reconfiguration techniques in the NoC context.

Submitted 22 August, 2017; originally announced August 2017.

Comments: 8 pages

arXiv:1703.04646 [pdf, other]

doi 10.1109/ICPP.2017.22

HyPPI NoC: Bringing Hybrid Plasmonics to an Opto-Electronic Network-on-Chip

Authors: Vikram K. Narayana, Shuai Sun, Armin Mehrabian, Volker J. Sorger, Tarek El-Ghazawi

Abstract: As we move towards an era of hundreds of cores, the research community has witnessed the emergence of opto-electronic network-on-chip designs based on nanophotonics, in order to achieve higher network throughput, lower latencies, and lower dynamic power. However, traditional nanophotonics options face limitations such as large device footprints compared with electronics, higher static power due to continuous laser operation, and an upper limit on achievable data rates due to large device capacitances. Nanoplasmonics is an emerging technology that has the potential for providing transformative gains on multiple metrics due to its potential to increase the light-matter interaction. In this paper, we propose and analyze a hybrid opto-electric NoC that incorporates Hybrid Plasmonics Photonics Interconnect (HyPPI), an optical interconnect that combines photonics with plasmonics. We explore various opto-electronic network hybridization options by augmenting a mesh network with HyPPI links, and compare them with the equivalent options afforded by conventional nanophotonics as well as pure electronics. Our design space exploration indicates that augmenting an electronic NoC with HyPPI gives a performance to cost ratio improvement of up to 1.8x. To further validate our estimates, we conduct trace based simulations using the NAS Parallel Benchmark suite. These benchmarks show latency improvements up to 1.64x, with negligible energy increase. We then further carry out performance and cost projections for fully optical NoCs, using HyPPI as well as conventional nanophotonics. These futuristic projections indicate that all-HyPPI NoCs would be two orders of magnitude more energy efficient than electronics, and two orders of magnitude more area efficient than all-photonic NoCs.

Submitted 14 March, 2017; originally announced March 2017.

Comments: 10 pages, 8 figures

ACM Class: B.4.3; B.4.4; C.1.2

arXiv:1701.05930 [pdf, other]

doi 10.1016/j.micpro.2017.03.006

MorphoNoC: Exploring the Design Space of a Configurable Hybrid NoC using Nanophotonics

Authors: Vikram K. Narayana, Shuai Sun, Abdel-Hameed A. Badawy, Volker J. Sorger, Tarek El-Ghazawi

Abstract: As diminishing feature sizes drive down the energy for computations, the power budget for on-chip communication is steadily rising. Furthermore, the increasing number of cores is placing a huge performance burden on the network-on-chip (NoC) infrastructure. While NoCs are designed as regular architectures that allow scaling to hundreds of cores, the lack of a flexible topology gives rise to higher latencies, lower throughput, and increased energy costs. In this paper, we explore MorphoNoCs - scalable, configurable, hybrid NoCs obtained by extending regular electrical networks with configurable nanophotonic links. In order to design MorphoNoCs, we first carry out a detailed study of the design space for Multi-Write Multi-Read (MWMR) nanophotonics links. After identifying optimum design points, we then discuss the router architecture for deploying them in hybrid electronic-photonic NoCs. We then explore the design space at the network level, by varying the waveguide lengths and the number of hybrid routers. This allows us to carry out energy-latency trade-offs. For our evaluations, we adopt traces from synthetic benchmarks as well as the NAS Parallel Benchmark suite. Our results indicate that MorphoNoCs can achieve latency improvements of up to 3.0x or energy improvements of up to 1.37x over the base electronic network.

Submitted 14 March, 2017; v1 submitted 12 December, 2016; originally announced January 2017.

Comments: 14 pages, 15 figures

arXiv:1612.02898 [pdf]

Moore's Law in CLEAR Light

Authors: Shuai Sun, Vikram K. Narayana, Tarek El-Ghazawi, Volker J. Sorger

Abstract: The inability of Moore's Law and other figures of merit (FOMs) to accurately explain the technology development of the semiconductor industry demands a holistic merit to guide the industry. Here we introduce a FOM termed CLEAR that accurately postdicts technology developments from the 1940s until today, and predicts photonics as a logical extension to keep up the pace of information-handling machines. We show that CLEAR (Capability-to-Latency-Energy-Amount-Resistance) is multi-hierarchical, applying to the device, interconnect, and system level. Being a holistic FOM, we show that empirical trends such as Moore's Law and Makimoto's wave are special cases of the universal CLEAR merit. Looking ahead, photonic board- and chip-level technologies are able to continue the observed doubling rate of the CLEAR value every 12 months, while electronic technologies are unable to keep pace.

Submitted 8 December, 2016; originally announced December 2016.

Comments: 10 pages, 2 figures

arXiv:1612.02486 [pdf]

A Universal Multi-Hierarchy Figure-of-Merit for On-Chip Computing and Communications

Authors: Shuai Sun, Vikram K. Narayana, Armin Mehrabian, Tarek El-Ghazawi, Volker J. Sorger

Abstract: Continuing demands for increased compute efficiency and communication bandwidth have led to the development of novel interconnect technologies with the potential to outperform conventional electrical interconnects. With a plurality of interconnect technologies to include electronics, photonics, plasmonics, and hybrids thereof, the simple approach of counting on-chip devices to capture performance is insufficient. While some efforts have been made to capture the performance evolution more accurately, they eventually deviate from the observed development pace. Thus, a holistic figure of merit (FOM) is needed to adequately compare these recent technology paradigms. Here we introduce the Capability-to-Latency-Energy-Amount-Resistance (CLEAR) FOM derived from device and link performance criteria of both active optoelectronic devices and passive components alike. As such CLEAR incorporates communication delay, energy efficiency, on-chip scaling and economic cost. We show that CLEAR accurately describes compute development including most recent machines. Since this FOM is derived bottom-up, we demonstrate remarkable adaptability to applications ranging from device-level to network and system-level. Applying CLEAR to benchmark device, link, and network performance against fundamental physical compute and communication limits shows that photonics is competitive even for fractions of the die-size, thus making a case for on-chip optical interconnects.

Submitted 7 December, 2016; originally announced December 2016.

Comments: 10 pages

Showing 1–26 of 26 results for author: Sorger, V