-
The Llama 3 Herd of Models
Authors:
Abhimanyu Dubey,
Abhinav Jauhri,
Abhinav Pandey,
Abhishek Kadian,
Ahmad Al-Dahle,
Aiesha Letman,
Akhil Mathur,
Alan Schelten,
Amy Yang,
Angela Fan,
Anirudh Goyal,
Anthony Hartshorn,
Aobo Yang,
Archi Mitra,
Archie Sravankumar,
Artem Korenev,
Arthur Hinsvark,
Arun Rao,
Aston Zhang,
Aurelien Rodriguez,
Austen Gregerson,
Ava Spataru,
Baptiste Roziere,
Bethany Biron,
Binh Tang,
et al. (510 additional authors not shown)
Abstract:
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
Submitted 15 August, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
Utilizing Explainability Techniques for Reinforcement Learning Model Assurance
Authors:
Alexander Tapley,
Kyle Gatesman,
Luis Robaina,
Brett Bissey,
Joseph Weissman
Abstract:
Explainable Reinforcement Learning (XRL) can provide transparency into the decision-making process of a Deep Reinforcement Learning (DRL) model and increase user trust and adoption in real-world use cases. By utilizing XRL techniques, researchers can identify potential vulnerabilities within a trained DRL model prior to deployment, thereby limiting the potential for mission failure or mistakes by the system. This paper introduces the ARLIN (Assured RL Model Interrogation) Toolkit, an open-source Python library that identifies potential vulnerabilities and critical points within trained DRL models through detailed, human-interpretable explainability outputs. To illustrate ARLIN's effectiveness, we provide explainability visualizations and vulnerability analysis for a publicly available DRL model. The open-source code repository is available for download at https://github.com/mitre/arlin.
Submitted 27 November, 2023;
originally announced November 2023.
-
Locality, Latency and Spatial-Aware Data Placement Strategies at the Edge
Authors:
N. Sreekumar,
A. Chandra,
J. B. Weissman
Abstract:
The vast data deluge at the network's edge is raising multiple challenges for the edge computing community. One of them is identifying edge storage servers where data from edge devices/sensors have to be stored to ensure low latency access services to emerging edge applications. Existing data placement algorithms mainly focus on locality, latency, and zoning to select edge storage servers under multiple environmental constraints. This paper uses a data placement framework to compare distance-based, latency-based, and spatial-awareness-based data placement strategies, which all share a decision-making system with similar constraints. Based on simulation experiments, we observed that the spatial-awareness-based strategy could provide a quality of service on par with the latency-based and better than the distance-based strategy.
Submitted 6 April, 2023; v1 submitted 4 December, 2022;
originally announced December 2022.
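The comparison in "Locality, Latency and Spatial-Aware Data Placement Strategies at the Edge" rests on strategies that share one decision-making step and differ in how they rank candidate edge storage servers. The Python sketch below illustrates that idea under stated assumptions: the `EdgeServer` fields (planar coordinates, a measured round-trip time, free capacity), the capacity filter, and every function name are hypothetical and are not taken from the paper's framework. The spatial-awareness-based strategy is omitted because the abstract does not describe how it ranks servers.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class EdgeServer:
    """A candidate edge storage server (hypothetical fields for illustration)."""
    name: str
    x_km: float     # planar position; a real system might use GPS coordinates
    y_km: float
    rtt_ms: float   # measured round-trip latency from the data source
    free_gb: float  # remaining storage capacity


def distance_km(server: EdgeServer, device_xy: Tuple[float, float]) -> float:
    """Euclidean distance between a server and the device producing the data."""
    return math.hypot(server.x_km - device_xy[0], server.y_km - device_xy[1])


def place(servers: Sequence[EdgeServer], size_gb: float,
          score: Callable[[EdgeServer], float]) -> Optional[EdgeServer]:
    """Shared decision step: apply the same constraint, then pick the lowest score."""
    feasible = [s for s in servers if s.free_gb >= size_gb]
    return min(feasible, key=score, default=None)


def distance_based(servers, size_gb, device_xy):
    # Rank feasible servers by physical distance to the device.
    return place(servers, size_gb, lambda s: distance_km(s, device_xy))


def latency_based(servers, size_gb):
    # Rank feasible servers by measured round-trip latency.
    return place(servers, size_gb, lambda s: s.rtt_ms)


# Hypothetical topology: "c" is closest and fastest but too full for 1 GB.
SERVERS = [
    EdgeServer("a", 1.0, 0.0, rtt_ms=40.0, free_gb=10.0),
    EdgeServer("b", 5.0, 0.0, rtt_ms=12.0, free_gb=10.0),
    EdgeServer("c", 0.5, 0.0, rtt_ms=5.0, free_gb=0.1),
]

print(distance_based(SERVERS, 1.0, (0.0, 0.0)).name)  # prints a
print(latency_based(SERVERS, 1.0).name)               # prints b
```

With these made-up numbers the two strategies disagree: the nearest feasible server is not the one with the lowest measured latency, which is the kind of divergence such a comparison is meant to expose. Another strategy would plug into the same decision step by passing a different scoring function to `place`.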
-
Constellation: An Edge-Based Semantic Runtime System for Internet of Things Applications
Authors:
Mitch Terrell,
Yixuan Wang,
Matt Dorow,
Soumya Agrawal,
Bhaargav Sriraman,
Zach Leidall,
Abhishek Chandra,
Jon Weissman
Abstract:
With the global Internet of Things (IoT) market size predicted to grow to over 1 trillion dollars in the next 5 years, many large corporations are scrambling to solidify their product line as the de facto device suite for consumers. This has led to each corporation developing its devices in a siloed environment with unique protocols and runtime frameworks that explicitly exclude the ability to work with competitors' devices. This development silo has created problems with programming complexity for application developers, as well as concurrency and scalability limitations for applications that involve a network of IoT devices. The Constellation project is a distributed IoT runtime system that attempts to address these challenges by creating an operating system layer that decouples applications from devices. This layer provides mechanisms designed to allow applications to interface with an underlying substrate of IoT devices while abstracting away the complexities of application concurrency, device interoperability, and system scalability. This paper provides an overview of the Constellation system and details four new project expansions to improve system scalability.
Submitted 28 January, 2022;
originally announced January 2022.
-
Armada: A Robust Latency-Sensitive Edge Cloud in Heterogeneous Edge-Dense Environments
Authors:
Lei Huang,
Zhiying Liang,
Nikhil Sreekumar,
Sumanth Kaushik Vishwanath,
Cody Perakslis,
Abhishek Chandra,
Jon Weissman
Abstract:
Edge computing has enabled a large set of emerging edge applications by exploiting data proximity and offloading latency-sensitive and computation-intensive workloads to nearby edge servers. However, supporting edge application users at scale in wide-area environments poses challenges due to limited point-of-presence edge sites and constrained elasticity. In this paper, we introduce Armada: a densely-distributed edge cloud infrastructure that explores the use of dedicated and volunteer resources to serve geo-distributed users in heterogeneous environments. We describe the lightweight Armada architecture and optimization techniques including performance-aware edge selection, auto-scaling and load balancing on the edge, fault tolerance, and in-situ data access. We evaluate Armada in both real-world volunteer environments and emulated platforms to show how common edge applications, namely real-time object detection and face recognition, can be easily deployed on Armada serving distributed users at scale with low latency.
Submitted 23 November, 2021;
originally announced November 2021.
-
Integrating Abstractions to Enhance the Execution of Distributed Applications
Authors:
Matteo Turilli,
Feng Liu,
Zhao Zhang,
Andre Merzky,
Michael Wilde,
Jon Weissman,
Daniel S. Katz,
Shantenu Jha
Abstract:
One of the factors that limits the scale, performance, and sophistication of distributed applications is the difficulty of concurrently executing them on multiple distributed computing resources. In part, this is due to a poor understanding of the general properties and performance of the coupling between applications and dynamic resources. This paper addresses this issue by integrating abstractions representing distributed applications, resources, and execution processes into a pilot-based middleware. The middleware provides a platform that can specify distributed applications, execute them on multiple resources and in different configurations, and is instrumented to support investigative analysis. We analyzed the execution of distributed applications using experiments that measure the benefits of using multiple resources, the late-binding of scheduling decisions, and the use of backfill scheduling.
Submitted 18 February, 2016; v1 submitted 18 April, 2015;
originally announced April 2015.
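The abstract of "Integrating Abstractions to Enhance the Execution of Distributed Applications" names two scheduling ideas, late binding and backfill scheduling, layered on pilots spread across multiple resources. The Python sketch below illustrates both under simplifying assumptions; the `Pilot` and `Task` types, the `schedule` function, and the resource names are hypothetical and are not the middleware's API. Tasks wait in one shared queue and are bound to a pilot only once that pilot is active (late binding). The backfill shown is a greedy first-fit variant with no reservation for the blocked task, so, unlike EASY backfilling, it can delay large tasks indefinitely.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    cores: int


@dataclass
class Pilot:
    """A placeholder job holding cores on one resource (hypothetical type)."""
    resource: str
    cores: int
    running: List[Task] = field(default_factory=list)

    @property
    def free(self) -> int:
        # Cores not yet occupied by tasks bound to this pilot.
        return self.cores - sum(t.cores for t in self.running)


def schedule(queue: Deque[Task], active_pilots: List[Pilot]) -> List[Tuple[str, str]]:
    """Bind queued tasks to already-active pilots (late binding).

    Tasks are scanned in submission order; a task that does not fit stays
    queued while later, smaller tasks may start ahead of it (greedy backfill).
    """
    placements = []
    for task in list(queue):  # iterate over a copy so we can remove from queue
        pilot = next((p for p in active_pilots if p.free >= task.cores), None)
        if pilot is None:
            continue  # no active pilot fits this task yet; leave it queued
        pilot.running.append(task)
        queue.remove(task)
        placements.append((task.name, pilot.resource))
    return placements


queue = deque([Task("big", 16), Task("t1", 4), Task("t2", 4)])
pilot_a = Pilot("cluster-A", cores=8)
first = schedule(queue, [pilot_a])          # only cluster-A's pilot is active
pilot_b = Pilot("cluster-B", cores=32)      # a second pilot becomes active later
second = schedule(queue, [pilot_a, pilot_b])
print(first)   # [('t1', 'cluster-A'), ('t2', 'cluster-A')]
print(second)  # [('big', 'cluster-B')]
```

Here the 16-core task cannot start on the 8-core pilot, so the two smaller tasks behind it run first; the large task is bound only when a pilot on a second resource becomes active, rather than being committed to a particular resource at submission time.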
-
Survey and Analysis of Production Distributed Computing Infrastructures
Authors:
Daniel S. Katz,
Shantenu Jha,
Manish Parashar,
Omer Rana,
Jon Weissman
Abstract:
This report has two objectives. First, we describe a set of the production distributed infrastructures currently available, so that the reader has a basic understanding of them. This includes explaining why each infrastructure was created and made available and how it has succeeded and failed. The set is not complete, but we believe it is representative.
Second, we describe the infrastructures in terms of their use, which is a combination of how they were designed to be used and how users have found ways to use them. Applications are often designed and created with specific infrastructures in mind, with both an appreciation of the existing capabilities provided by those infrastructures and an anticipation of their future capabilities. Here, the infrastructures we discuss were often designed and created with specific applications in mind, or at least specific types of applications. The reader should understand how the interplay between the infrastructure providers and the users leads to such usages, which we call usage modalities. These usage modalities are really abstractions that exist between the infrastructures and the applications; they influence the infrastructures by representing the applications, and they influence the applications by representing the infrastructures.
Submitted 13 August, 2012;
originally announced August 2012.