arXiv:2407.20773 [pdf]

UpDown: Programmable fine-grained Events for Scalable Performance on Irregular Applications

Authors: Andronicus Rajasukumar, Jiya Su, Yuqing Wang, Tianshuo Su, Marziyeh Nourian, Jose M Monsalve Diaz, Tianchi Zhang, Jianru Ding, Wenyi Wang, Ziyi Zhang, Moubarak Jeje, Henry Hoffmann, Yanjing Li, Andrew A. Chien

Abstract: Applications with irregular data structures, data-dependent control flows and fine-grained data transfers (e.g., real-world graph computations) perform poorly on cache-based systems. We propose the UpDown accelerator that supports fine-grained execution with novel architecture mechanisms - lightweight threading, event-driven scheduling, efficient ultra-short threads, and split-transaction DRAM access with software-controlled synchronization. These hardware primitives support software programmable events, enabling high performance on diverse data structures and algorithms. UpDown also supports scalable performance; hardware replication enables programs to scale up performance. Evaluation results show UpDown's flexibility and scalability enable it to outperform CPUs on graph mining and analytics computations by up to 116-195x geomean speedup and more than 4x speedup over prior accelerators. We show that UpDown generates high memory parallelism (~4.6x over CPU) required for memory intensive graph computations. We present measurements that attribute the performance of UpDown (23x architectural advantage) to its individual architectural mechanisms. Finally, we also analyze the area and power cost of UpDown's mechanisms for software programmability.

Submitted 30 July, 2024; originally announced July 2024.

Comments: 14 pages, 23 figures

arXiv:2407.00123 [pdf, other]

Modeling Performance of Data Collection Systems for High-Energy Physics

Authors: Wilkie Olin-Ammentorp, Xingfu Wu, Andrew A. Chien

Abstract: Exponential increases in scientific experimental data are outstripping the rate of progress in silicon technology. As a result, heterogeneous combinations of architectures and process or device technologies are increasingly important to meet the computing demands of future scientific experiments. However, the complexity of heterogeneous computing systems requires systematic modeling to understand performance. We present a model which addresses this need by framing key aspects of data collection pipelines and constraints, and combines them with the important vectors of technology that shape alternatives, computing metrics that allow complex alternatives to be compared. For instance, a data collection pipeline may be characterized by parameters such as sensor sampling rates, amount of data collected, and the overall relevancy of retrieved samples. Alternatives to this pipeline are enabled by hardware development vectors including advancing CMOS, GPUs, neuromorphic computing, and edge computing. By calculating metrics for each alternative such as overall F1 score, power, hardware cost, and energy expended per relevant sample, this model allows alternate data collection systems to be rigorously compared. To demonstrate this model's capability, we apply it to the CMS experiment (and planned HL-LHC upgrade) to evaluate and compare the application of novel technologies in the data acquisition system (DAQ). We demonstrate that improvements to early stages in the DAQ are highly beneficial, greatly reducing the resources required at later stages of processing (such as a 60% power reduction) and increasing the amount of relevant data retrieved from the experiment per unit power (improving from 0.065 to 0.31 samples/kJ). However, we predict further advances will be required in order to meet overall power and cost constraints for the DAQ.

Submitted 27 June, 2024; originally announced July 2024.

Comments: 22 pages, 6 figures

arXiv:2311.11645 [pdf, other]

doi 10.1145/3632775.3661959

Exploding AI Power Use: an Opportunity to Rethink Grid Planning and Management

Authors: Liuzixuan Lin, Rajini Wijayawardana, Varsha Rao, Hai Nguyen, Wedan Emmanuel Gnibga, Andrew A. Chien

Abstract: The unprecedented rapid growth of computing demand for AI is projected to increase global annual datacenter (DC) growth from 7.2% to 11.3%. We project the 5-year AI DC demand for several power grids and assess whether they will allow desired AI growth (resource adequacy). If not, several "desperate measures" -- grid policies that enable more load growth and maintain grid reliability by sacrificing new DC reliability -- are considered. We find that two DC hotspots -- EirGrid (Ireland) and Dominion (US) -- will have difficulty accommodating new DCs needed by the AI growth. In EirGrid, relaxing new DC reliability guarantees increases the power available to 1.6x--4.1x while maintaining 99.6% actual power availability for the new DCs, sufficient for the 5-year AI demand. In Dominion, relaxing reliability guarantees increases available DC capacity similarly (1.5x--4.6x) but not enough for the 5-year AI demand. New DCs only receive 89% power availability. Study of other US power grids -- SPP, CAISO, ERCOT -- shows that sufficient capacity exists for the projected AI load growth. Our results suggest the need to rethink adequacy assessment and also grid planning and management. New research opportunities include coordinated planning, reliability models that incorporate load flexibility, and adaptive load abstractions.

Submitted 30 April, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: Accepted by ACM e-Energy '24: the 15th ACM International Conference on Future and Sustainable Energy Systems

arXiv:2301.03148 [pdf, other]

doi 10.1145/3575813.3595197

Adapting Datacenter Capacity for Greener Datacenters and Grid

Authors: Liuzixuan Lin, Andrew A. Chien

Abstract: Cloud providers are adapting datacenter (DC) capacity to reduce carbon emissions. With hyperscale datacenters exceeding 100 MW individually, and in some grids exceeding 15% of power load, DC adaptation is large enough to harm power grid dynamics, increasing carbon emissions or power prices, or reducing grid reliability. To avoid harm, we explore coordination of DC capacity change varying scope in space and time. In space, coordination scope spans a single datacenter, a group of datacenters, and datacenters with the grid. In time, scope ranges from online to day-ahead. We also consider what DC and grid information is used (e.g. real-time and day-ahead average carbon, power price, and compute backlog). For example, in our proposed PlanShare scheme, each datacenter uses day-ahead information to create a capacity plan and shares it, allowing global grid optimization (over all loads, over entire day). We evaluate DC carbon emissions reduction. Results show that local coordination scope fails to reduce carbon emissions significantly (3.2%--5.4% reduction). Expanding coordination scope to a set of datacenters improves slightly (4.9%--7.3%). PlanShare, with grid-wide coordination and full-day capacity planning, performs the best. PlanShare reduces DC emissions by 11.6%--12.6%, 1.56x--1.26x better than the best local, online approach's results. PlanShare also achieves lower cost. We expect these advantages to increase as renewable generation in power grids increases. Further, a known full-day DC capacity plan provides a stable target for DC resource management.

Submitted 23 June, 2023; v1 submitted 8 January, 2023; originally announced January 2023.

Comments: Published at e-Energy '23: Proceedings of the 14th ACM International Conference on Future Energy Systems

arXiv:2204.07336 [pdf, ps, other]

Preparing for the Future -- Rethinking Proxy Apps

Authors: Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Ray Bair, Andrew A. Chien, Jeffrey S. Vetter, John Shalf

Abstract: A considerable amount of research and engineering went into designing proxy applications, which represent common high-performance computing workloads, to co-design and evaluate the current generation of supercomputers, e.g., RIKEN's Supercomputer Fugaku, ANL's Aurora, or ORNL's Frontier. This process was necessary to standardize the procurement while avoiding duplicated effort at each HPC center to develop their own benchmarks. Unfortunately, proxy applications force HPC centers and providers (vendors) into an undesirable state of rigidity, in contrast to the fast-moving trends of current technology and future heterogeneity. To accommodate an extremely-heterogeneous future, we have to reconsider how to co-design supercomputers during the next decade, and avoid repeating the past mistakes. This position paper outlines the current state-of-the-art in system co-design, challenges encountered over the past years, and a proposed plan to move forward.

Submitted 15 April, 2022; originally announced April 2022.

arXiv:1607.02133 [pdf, other]

Extreme Scaling of Supercomputing with Stranded Power: Costs and Capabilities

Authors: Fan Yang, Andrew A. Chien

Abstract: Power consumption (supply, heat, cost) and associated carbon emissions (environmental impact) are increasingly critical challenges in scaling supercomputing to Exascale and beyond. We propose to exploit stranded power, renewable energy that has no value to the power grid, for scaling supercomputers, Zero-Carbon Cloud (ZCCloud), and showing that stranded power can be employed effectively to expand computing [1]. We build on those results with a new analysis of stranded power, characterizing temporal, geographic, and interval properties. We simulate production supercomputing workloads and model datacenter total-cost-of-ownership (TCO), assessing the costs and capabilities of stranded-power based supercomputing. Results show that the ZCCloud approach is cost-effective today in regions with high cost power. The ZCCloud approach reduces TCO by 21-45%, and improves cost-effectiveness up to 34%. We study many scenarios. With higher power price, cheaper computing hardware and higher system power density, benefits rise to 55%, 97% and 116% respectively. Finally, we study future extreme-scale systems, showing that beyond terascale, projected power requirements in excess of 100MW make ZCCloud up to 45% lower cost and, for a fixed budget, increase peak PFLOPS achievable by 80%.

Submitted 7 July, 2016; originally announced July 2016.

Comments: 12 pages, 22 figures

arXiv:1606.00350 [pdf, ps, other]

Data Centers as Dispatchable Loads to Harness Stranded Power

Authors: Kibaek Kim, Fan Yang, Victor M. Zavala, Andrew A. Chien

Abstract: We analyze how both traditional data center integration and dispatchable load integration affect power grid efficiency. We use detailed network models, parallel optimization solvers, and thousands of renewable generation scenarios to perform our analysis. Our analysis reveals that significant spillage and stranded power will be observed in power grids as wind power levels are increased. A counter-intuitive finding is that collocating data centers with inflexible loads next to wind farms has limited impacts on renewable portfolio standard (RPS) goals because it provides limited system-level flexibility and can in fact increase stranded power and fossil-fueled generation. In contrast, optimally placing data centers that are dispatchable (with flexible loads) provides system-wide flexibility, reduces stranded power, and improves efficiency. In short, optimally placed dispatchable computing loads can enable better scaling to high RPS. We show that these dispatchable computing loads are powered to 60-80% of their requested capacity, indicating that there are significant economic incentives provided by stranded power.

Submitted 1 June, 2016; originally announced June 2016.

Showing 1–7 of 7 results for author: Chien, A A