-
Recoverable and Detectable Self-Implementations of Swap
Authors:
Tomer Lev Lehman,
Hagit Attiya,
Danny Hendler
Abstract:
Recoverable algorithms tolerate failures and recoveries of processes by using non-volatile memory. Of particular interest are self-implementations of key operations, in which a recoverable operation is implemented from its non-recoverable counterpart (in addition to reads and writes). This paper presents two self-implementations of the SWAP operation. One works in the system-wide failures model, where all processes fail and recover together, and the other in the independent failures model, where each process crashes and recovers independently of the other processes. Both algorithms are wait-free in crash-free executions, but their recovery code is blocking. We prove that this is inherent for the independent failures model. The impossibility result is proved for implementations of distinguishable operations using interfering functions, and in particular, it applies to a recoverable self-implementation of swap.
Submitted 7 August, 2023;
originally announced August 2023.
-
Managed Network Services for Exascale Data Movement Across Large Global Scientific Collaborations
Authors:
Frank Würthwein,
Jonathan Guiang,
Aashay Arora,
Diego Davila,
John Graham,
Dima Mishin,
Thomas Hutton,
Igor Sfiligoi,
Harvey Newman,
Justas Balcas,
Tom Lehman,
Xi Yang,
Chin Guok
Abstract:
Unique scientific instruments designed and operated by large global collaborations are expected to produce Exabyte-scale data volumes per year by 2030. These collaborations depend on globally distributed storage and compute to turn raw data into science. While all of these infrastructures have batch scheduling capabilities to share compute, Research and Education networks lack those capabilities. There is thus uncontrolled competition for bandwidth between and within collaborations. As a result, data "hogs" disk space at processing facilities for much longer than it takes to process, leading to vastly over-provisioned storage infrastructures. Integrated co-scheduling of networks as part of high-level managed workflows might reduce these storage needs by more than an order of magnitude. This paper describes such a solution, demonstrates its functionality in the context of the Large Hadron Collider (LHC) at CERN, and presents the next steps towards its use in production.
Submitted 27 September, 2022;
originally announced September 2022.
-
Snowmass 2021 Computational Frontier CompF4 Topical Group Report: Storage and Processing Resource Access
Authors:
W. Bhimji,
D. Carder,
E. Dart,
J. Duarte,
I. Fisk,
R. Gardner,
C. Guok,
B. Jayatilaka,
T. Lehman,
M. Lin,
C. Maltzahn,
S. McKee,
M. S. Neubauer,
O. Rind,
O. Shadura,
N. V. Tran,
P. van Gemmeren,
G. Watts,
B. A. Weaver,
F. Würthwein
Abstract:
Computing plays a significant role in all areas of high energy physics. The Snowmass 2021 CompF4 topical group's scope is facilities R&D, where we consider "facilities" as the computing hardware and software infrastructure inside the data centers plus the networking between data centers, irrespective of who owns them, and what policies are applied for using them. In other words, it includes commercial clouds, federally funded High Performance Computing (HPC) systems for all of science, and systems funded explicitly for a given experimental or theoretical program. This topical group report summarizes the findings and recommendations for the storage, processing, networking and associated software service infrastructures for future high energy physics research, based on the discussions organized through the Snowmass 2021 community study.
Submitted 29 September, 2022; v1 submitted 19 September, 2022;
originally announced September 2022.
-
The LBNL Superfacility Project Report
Authors:
Deborah Bard,
Cory Snavely,
Lisa Gerhardt,
Jason Lee,
Becci Totzke,
Katie Antypas,
William Arndt,
Johannes Blaschke,
Suren Byna,
Ravi Cheema,
Shreyas Cholia,
Mark Day,
Bjoern Enders,
Aditi Gaur,
Annette Greiner,
Taylor Groves,
Mariam Kiran,
Quincey Koziol,
Tom Lehman,
Kelly Rowland,
Chris Samuel,
Ashwin Selvarajan,
Alex Sim,
David Skinner,
Laurie Stephey
, et al. (2 additional authors not shown)
Abstract:
The Superfacility model is designed to leverage HPC for experimental science. It is more than simply a model of connected experiment, network, and HPC facilities; it encompasses the full ecosystem of infrastructure, software, tools, and expertise needed to make connected facilities easy to use. The three-year Lawrence Berkeley National Laboratory (LBNL) Superfacility project was initiated in 2019 to coordinate work being performed at LBNL to support this model, and to provide a coherent and comprehensive set of science requirements to drive existing and new work.
A key component of the project was in-depth engagement with eight science teams that represent challenging use cases across the DOE Office of Science. By the close of the project, we met our project goal by enabling our science application engagements to demonstrate automated pipelines that analyze data from remote facilities at large scale, without routine human intervention. In several cases, we have gone beyond demonstrations and now provide production-level services. To achieve this goal, the Superfacility team developed tools, infrastructure, and policies for near-real-time computing support, dynamic high-performance networking, data management and movement tools, API-driven automation, HPC-scale notebooks via Jupyter, authentication using Federated Identity, and support for container-based edge services.
The lessons we learned during this project provide a valuable model for future large, complex, cross-disciplinary collaborations. There is a pressing need for a coherent computing infrastructure across national facilities, and LBNL's Superfacility project is a unique model for success in tackling the challenges that will be faced in hardware, software, policies, and services across multiple science domains.
Submitted 27 June, 2022; v1 submitted 23 June, 2022;
originally announced June 2022.
-
Data Transfer and Network Services management for Domain Science Workflows
Authors:
Tom Lehman,
Xi Yang,
Chin Guok,
Frank Wuerthwein,
Igor Sfiligoi,
John Graham,
Aashay Arora,
Dima Mishin,
Diego Davila,
Jonathan Guiang,
Tom Hutton,
Harvey Newman,
Justas Balcas
Abstract:
This paper describes a vision and work in progress to elevate network resources and data transfer management to the same level as compute and storage in the context of services access, scheduling, life cycle management, and orchestration. While domain science workflows often include active compute resource allocation and management, the data transfers and associated network resource coordination are not handled in a similar manner. As a result, data transfers can introduce a degree of uncertainty in workflow operations, and the associated lack of network information does not allow for either the workflow operations or the network use to be optimized. The net result is that domain science workflow processes are forced to view the network as an opaque infrastructure into which they inject data and hope that it emerges at the destination with an acceptable Quality of Experience. There is little ability for applications to interact with the network to exchange information, negotiate performance parameters, discover expected performance metrics, or receive status/troubleshooting information in real time. Developing mechanisms to allow an application workflow to obtain information regarding the network services, capabilities, and options, to a degree similar to what is possible for compute resources, is the primary motivation for this work. The initial focus is on the Open Science Grid (OSG)/Compact Muon Solenoid (CMS) Large Hadron Collider (LHC) workflows with Rucio/FTS/XRootD-based data transfers and the interoperation with the ESnet SENSE (Software-Defined Network for End-to-end Networked Science at the Exascale) system.
Submitted 20 March, 2022; v1 submitted 15 March, 2022;
originally announced March 2022.
-
Analyzing Behavioral Changes of Twitter Users After Exposure to Misinformation
Authors:
Yichen Wang,
Richard Han,
Tamara Lehman,
Qin Lv,
Shivakant Mishra
Abstract:
Social media platforms have been exploited to disseminate misinformation in recent years. The widespread online misinformation has been shown to affect users' beliefs and is connected to social impact such as polarization. In this work, we focus on misinformation's impact on specific user behavior and aim to understand whether general Twitter users changed their behavior after being exposed to misinformation. We compare the before and after behavior of exposed users to determine whether the frequency of the tweets they posted, or the sentiment of their tweets underwent any significant change. Our results indicate that users overall exhibited statistically significant changes in behavior across some of these metrics. Through language distance analysis, we show that exposed users were already different from baseline users before the exposure. We also study the characteristics of two specific user groups, multi-exposure and extreme change groups, which were potentially highly impacted. Finally, we study if the changes in the behavior of the users after exposure to misinformation tweets vary based on the number of their followers or the number of followers of the tweet authors, and find that their behavioral changes are all similar.
Submitted 1 November, 2021;
originally announced November 2021.
-
Analyzing Twitter Users' Behavior Before and After Contact by the Internet Research Agency
Authors:
Upasana Dutta,
Rhett Hanscom,
Jason Shuo Zhang,
Richard Han,
Tamara Lehman,
Qin Lv,
Shivakant Mishra
Abstract:
Social media platforms have been exploited to conduct election interference in recent years. In particular, the Russian-backed Internet Research Agency (IRA) has been identified as a key source of misinformation spread on Twitter prior to the 2016 U.S. presidential election. The goal of this research is to understand whether general Twitter users changed their behavior in the year following first contact from an IRA account. We compare the before and after behavior of contacted users to determine whether there were differences in their mean tweet count, the sentiment of their tweets, and the frequency and sentiment of tweets mentioning @realDonaldTrump or @HillaryClinton. Our results indicate that users overall exhibited statistically significant changes in behavior across most of these metrics, and that those users that engaged with the IRA generally showed greater changes in behavior.
Submitted 15 February, 2021; v1 submitted 3 August, 2020;
originally announced August 2020.
-
Software-Defined Network for End-to-end Networked Science at the Exascale
Authors:
Inder Monga,
Chin Guok,
John MacAuley,
Alex Sim,
Harvey Newman,
Justas Balcas,
Phil DeMar,
Linda Winkler,
Tom Lehman,
Xi Yang
Abstract:
Domain science applications and workflow processes are currently forced to view the network as an opaque infrastructure into which they inject data and hope that it emerges at the destination with an acceptable Quality of Experience. There is little ability for applications to interact with the network to exchange information, negotiate performance parameters, discover expected performance metrics, or receive status/troubleshooting information in real time. The work presented here is motivated by a vision for a new smart network and smart application ecosystem that will provide a more deterministic and interactive environment for domain science workflows. The Software-Defined Network for End-to-end Networked Science at Exascale (SENSE) system includes a model-based architecture, implementation, and deployment which enables automated end-to-end network service instantiation across administrative domains. An intent-based interface allows applications to express their high-level service requirements, while an intelligent orchestrator and resource control systems allow for custom tailoring of scalability and real-time responsiveness based on individual application and infrastructure operator requirements. This allows the science applications to manage the network as a first-class schedulable resource, as is the current practice for instruments, compute, and storage systems. Deployment and experiments on production networks and testbeds have validated SENSE functions and performance. Emulation-based testing verified the scalability needed to support research and education infrastructures. Key contributions of this work include an architecture definition, reference implementation, and deployment. This provides the basis for further innovation of smart network services to accelerate scientific discovery in the era of big data, cloud computing, machine learning and artificial intelligence.
Submitted 13 April, 2020;
originally announced April 2020.
-
The Future of CISE Distributed Research Infrastructure
Authors:
Jay Aikat,
Ilya Baldin,
Mark Berman,
Joe Breen,
Richard Brooks,
Prasad Calyam,
Jeff Chase,
Wallace Chase,
Russ Clark,
Chip Elliott,
Jim Griffioen,
Dijiang Huang,
Julio Ibarra,
Tom Lehman,
Inder Monga,
Abrahim Matta,
Christos Papadopoulos,
Mike Reiter,
Dipankar Raychaudhuri,
Glenn Ricart,
Robert Ricci,
Paul Ruth,
Ivan Seskar,
Jerry Sobieski,
Kobus Van der Merwe
, et al. (3 additional authors not shown)
Abstract:
Shared research infrastructure that is globally distributed and widely accessible has been a hallmark of the networking community. This paper presents an initial snapshot of a vision for a possible future of mid-scale distributed research infrastructure aimed at enabling new types of research and discoveries. The paper is written from the perspective of "lessons learned" in constructing and operating the Global Environment for Network Innovations (GENI) infrastructure and attempts to project future concepts and solutions based on these lessons. The goal of this paper is to engage the community to contribute new ideas and to inform funding agencies about future research directions to realize this vision.
Submitted 27 March, 2018;
originally announced March 2018.