-
Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics
Authors:
George Michelogiannakis,
Yehia Arafa,
Brandon Cook,
Liang Yuan Dai,
Abdel Hameed Badawy,
Madeleine Glick,
Yuyang Wang,
Keren Bergman,
John Shalf
Abstract:
The diversity of workload requirements and increasing hardware heterogeneity in emerging high performance computing (HPC) systems motivate resource disaggregation. Resource disaggregation allows compute and memory resources to be allocated individually as required to each workload. However, it is unclear how to efficiently realize this capability and cost-effectively meet the stringent bandwidth and latency requirements of HPC applications. To that end, we describe how modern photonics can be co-designed with modern HPC racks to implement flexible intra-rack resource disaggregation and fully meet the bit error rate (BER) and high escape bandwidth of all chip types in modern HPC racks. Our photonic-based disaggregated rack provides an average application speedup of 11% (46% maximum) for 25 CPU and 61% for 24 GPU benchmarks compared to a similar system that instead uses modern electronic switches for disaggregation. Using observed resource usage from a production system, we estimate that an iso-performance intra-rack disaggregated HPC system using photonics would require 4x fewer memory modules and 2x fewer NICs than a non-disaggregated baseline.
Submitted 17 July, 2023; v1 submitted 9 January, 2023;
originally announced January 2023.
-
Massively Scalable Wavelength Diverse Integrated Photonic Linear Neuron
Authors:
Matthew van Niekerk,
Anthony Rizzo,
Hector Rubio Rivera,
Gerald Leake,
Daniel Coleman,
Christopher Tison,
Michael Fanto,
Keren Bergman,
Stefan Preble
Abstract:
As computing resource demands continue to escalate in the face of big data, cloud-connectivity and the internet of things, it has become imperative to develop new low-power, scalable architectures. Neuromorphic photonics, or photonic neural networks, have become a feasible solution for the physical implementation of efficient algorithms directly on-chip. This application is primarily due to the linear nature of light and the scalability of silicon photonics, specifically leveraging the wide-scale complementary metal-oxide-semiconductor (CMOS) manufacturing infrastructure used to fabricate microelectronics chips. Current neuromorphic photonic implementations stem from two paradigms: wavelength coherent and incoherent. Here, we introduce a novel architecture that supports coherent and incoherent operation to increase the capability and capacity of photonic neural networks with a dramatic reduction in footprint compared to previous demonstrations. As a proof-of-principle, we experimentally demonstrate simple addition and subtraction operations on a foundry-fabricated silicon photonic chip. Additionally, we experimentally validate an on-chip network to predict the logical 2-bit gates AND, OR, and XOR to accuracies of 96.8%, 99%, and 98.5%, respectively. This architecture is compatible with highly wavelength parallel sources, enabling massively scalable photonic neural networks.
Submitted 25 August, 2022; v1 submitted 11 May, 2022;
originally announced May 2022.
-
COUDER: Robust Topology Engineering for Optical Circuit Switched Data Center Networks
Authors:
Min Yee Teh,
Shizhen Zhao,
Peirui Cao,
Keren Bergman
Abstract:
Many optical circuit switched data center networks (DCN) have been proposed in the past to attain higher capacity and topology reconfigurability, though commercial adoption of these architectures has been minimal. One major challenge these architectures face is the difficulty of handling uncertain traffic demands using commercial optical circuit switches (OCS) with high switching latency. Prior works have generally focused on developing fast-switching OCS prototypes to quickly react to traffic changes through frequent reconfigurations. This approach, however, adds tremendous complexity to the control plane, and raises the barrier for commercial adoption of optical circuit switched data center networks.
We propose COUDER, a robust topology and routing optimization framework for reconfigurable optical circuit switched data centers. COUDER optimizes topology and routing based on a convex set of traffic matrices, and offers strict throughput guarantees for any future traffic matrices bounded by the convex set. For the bursty traffic demands that are unbounded by the convex set, we employ a desensitization technique to reduce the performance hit. This enables COUDER to generate topology and routing solutions capable of handling unexpected traffic changes without relying on frequent topology reconfigurations. Our extensive evaluations based on Facebook's production DCN traces show that, even with daily reconfiguration, COUDER achieves about 20% higher throughput, and about 32% lower average hop count compared to cost-equivalent static topologies. Our work shows that adoption of reconfigurable topologies in commercial DCNs is feasible even without fast OCSs.
Submitted 30 September, 2020;
originally announced October 2020.
-
Optically Connected Memory for Disaggregated Data Centers
Authors:
Jorge Gonzalez,
Alexander Gazman,
Maarten Hattink,
Mauricio G. Palma,
Meisam Bahadori,
Ruth Rubio-Noriega,
Lois Orosa,
Madeleine Glick,
Onur Mutlu,
Keren Bergman,
Rodolfo Azevedo
Abstract:
Recent advances in integrated photonics enable the implementation of reconfigurable, high-bandwidth, and low energy-per-bit interconnects in next-generation data centers. We propose and evaluate an Optically Connected Memory (OCM) architecture that disaggregates the main memory from the computation nodes in data centers. OCM is based on micro-ring resonators (MRRs), and it does not require any modification to the DRAM memory modules. We calculate energy consumption from real photonic devices and integrate them into a system simulator to evaluate performance. Our results show that (1) OCM is capable of interconnecting four DDR4 memory channels to a computing node using two fibers with 1.07 pJ energy-per-bit consumption and (2) OCM performs up to 5.5x faster than a disaggregated memory with 40G PCIe NIC connectors to computing nodes.
Submitted 24 August, 2020;
originally announced August 2020.
-
METTEOR: Robust Multi-Traffic Topology Engineering for Commercial Data Center Networks
Authors:
Min Yee Teh,
Shizhen Zhao,
Keren Bergman
Abstract:
Numerous optical circuit switched data center networks have been proposed over the past decade for higher capacity, though commercial adoption of these architectures has been minimal so far. One major challenge commonly facing these architectures is the difficulty of handling bursty traffic with optical circuit switches (OCS) with high switching latency. Prior works generally rely on fast-switching OCS prototypes to better react to traffic changes via frequent reconfigurations. This approach, unfortunately, adds further complexity to the control plane. We propose METTEOR, an easily deployable solution for optical circuit switched data centers that is designed for the current capabilities of commercial OCSs. Using multiple predicted traffic matrices, METTEOR designs data center topologies that are less sensitive to traffic changes, thus eliminating the need to frequently reconfigure OCSs upon traffic changes. Results based on extensive evaluations using production traces show that METTEOR increases the percentage of direct-hop traffic by about 80% over a fat tree at comparable cost, and by about 30% over a uniform mesh, at comparable maximum link utilizations. Compared to ideal solutions that reconfigure OCSs on every traffic matrix, METTEOR achieves close-to-optimal bandwidth utilization even with biweekly reconfiguration. This drastically lowers the controller and management complexity needed to perform METTEOR in commercial settings.
Submitted 2 February, 2020;
originally announced February 2020.
-
Optimization-based motion planning for multi-steered articulated vehicles
Authors:
Oskar Ljungqvist,
Kristoffer Bergman,
Daniel Axehill
Abstract:
The task of maneuvering a multi-steered articulated vehicle in confined environments is difficult even for experienced drivers. In this work, we present an optimization-based trajectory planner targeting low-speed maneuvers in unstructured environments for multi-steered N-trailer vehicles, which are comprised of a car-like tractor and an arbitrary number of interconnected trailers with fixed or steerable wheels. The proposed trajectory planning framework is divided into two steps, where a lattice-based trajectory planner is used in a first step to compute a resolution optimal solution to a discretized version of the trajectory planning problem. The output from the lattice planner is then used in a second step to initialize an optimal control problem solver, which enables the framework to compute locally optimal trajectories that start at the vehicle's initial state and reach the goal state exactly. The performance of the proposed optimization-based trajectory planner is evaluated in a set of practically relevant scenarios for a multi-steered 3-trailer vehicle with a car-like tractor where the last trailer is steerable.
Submitted 2 March, 2020; v1 submitted 12 December, 2019;
originally announced December 2019.
-
Software-Defined Silicon Photonics based Metro Node for Spatial and Wavelength Superchannel Switching
Authors:
Vidak Vujicic,
Aravind P. Anthur,
Alexander Gazman,
Colm Browning,
M. Deseada Gutierrez Pascual,
Ziyi Zhu,
Keren Bergman,
Liam P. Barry
Abstract:
Due to the growing popularity of optical superchannels and software defined networking, reconfigurable optical add-drop multiplexer (ROADM) architectures for superchannel switching have recently attracted significant attention. ROADMs based on micro electro-mechanical system (MEMS) and liquid crystal-on-silicon (LCoS) technologies are predominantly used. Motivated by requirements for low power, high-speed, small area footprint and compact switching solutions, we propose and demonstrate spatial and wavelength flexible superchannel switching using monolithically integrated silicon photonics (SiP) micro-ring resonators (MRR). We demonstrate the MRRs' capabilities and potential to be used as a fundamental building block in ROADMs. Unicast and multicast switching operation of an entire superchannel is demonstrated after transmission over 50 km of standard single mode fiber. The performance of each sub-channel from the 120 Gb/s QPSK Nyquist superchannel is analyzed, and degradation in error vector magnitude performance is observed for outer sub-channels due to the 3-dB bandwidth of the MRRs, which is comparable with the superchannel bandwidth. However, all sub-channels for all switching cases (unicast, multicast and bi-directional operation) exhibit performance far below the 7% FEC limit. The switching time of the SiP MRR chip is such that high capacity superchannel interconnects between users can be set up and reconfigured on the microsecond timescale.
Submitted 9 February, 2017; v1 submitted 29 November, 2016;
originally announced December 2016.