Search | arXiv e-print repository

ARVO: Atlas of Reproducible Vulnerabilities for Open Source Software

Authors: Xiang Mei, Pulkit Singh Singaria, Jordi Del Castillo, Haoran Xi, Abdelouahab Benchikh, Tiffany Bao, Ruoyu Wang, Yan Shoshitaishvili, Adam Doupé, Hammond Pearce, Brendan Dolan-Gavitt

Abstract: High-quality datasets of real-world vulnerabilities are enormously valuable for downstream research in software security, but existing datasets are typically small, require extensive manual effort to update, and are missing crucial features that such research needs. In this paper, we introduce ARVO: an Atlas of Reproducible Vulnerabilities in Open-source software. By sourcing vulnerabilities from C/C++ projects that Google's OSS-Fuzz discovered and implementing a reliable re-compilation system, we successfully reproduce more than 5,000 memory vulnerabilities across over 250 projects, each with a triggering input, the canonical developer-written patch for fixing the vulnerability, and the ability to automatically rebuild the project from source and run it at its vulnerable and patched revisions. Moreover, our dataset can be automatically updated as OSS-Fuzz finds new vulnerabilities, allowing it to grow over time. We provide a thorough characterization of the ARVO dataset, show that it can locate fixes more accurately than Google's own OSV reproduction effort, and demonstrate its value for future research through two case studies: firstly evaluating real-world LLM-based vulnerability repair, and secondly identifying over 300 falsely patched (still-active) zero-day vulnerabilities from projects improperly labeled by OSS-Fuzz.

Submitted 4 August, 2024; originally announced August 2024.

Comments: 14 pages, 9 figures

arXiv:2405.02326 [pdf, other]

Evaluating LLMs for Hardware Design and Test

Authors: Jason Blocklove, Siddharth Garg, Ramesh Karri, Hammond Pearce

Abstract: Large Language Models (LLMs) have demonstrated capabilities for producing code in Hardware Description Languages (HDLs). However, most of the focus remains on their abilities to write functional code, not test code. The hardware design process consists of both design and test, and so eschewing validation and verification leaves considerable potential benefit unexplored, given that a design and test framework may allow for progress towards full automation of the digital design pipeline. In this work, we perform one of the first studies exploring how an LLM can both design and test hardware modules from provided specifications. Using a suite of 8 representative benchmarks, we examined the capabilities and limitations of state-of-the-art conversational LLMs when producing Verilog for functional and verification purposes. We taped out the benchmarks on a Skywater 130nm shuttle and received the functional chip.

Submitted 23 April, 2024; originally announced May 2024.

arXiv:2404.15446 [pdf, other]

OffRAMPS: An FPGA-based Intermediary for Analysis and Modification of Additive Manufacturing Control Systems

Authors: Jason Blocklove, Md Raz, Prithwish Basu Roy, Hammond Pearce, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri

Abstract: Cybersecurity threats in Additive Manufacturing (AM) are an increasing concern as AM adoption continues to grow. AM is now being used for parts in the aerospace, transportation, and medical domains. Threat vectors which allow for part compromise are particularly concerning, as any failure in these domains would have life-threatening consequences. A major challenge to investigation of AM part-compromises comes from the difficulty in evaluating and benchmarking both identified threat vectors as well as methods for detecting adversarial actions. In this work, we introduce a generalized platform for systematic analysis of attacks against and defenses for 3D printers. Our "OFFRAMPS" platform is based on the open-source 3D printer control board "RAMPS." OFFRAMPS allows analysis, recording, and modification of all control signals and I/O for a 3D printer. We show the efficacy of OFFRAMPS by presenting a series of case studies based on several Trojans, including ones identified in the literature, and show that OFFRAMPS can both emulate and detect these attacks, i.e., it can both change and detect arbitrary changes to the g-code print commands.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.07235 [pdf, other]

Explaining EDA synthesis errors with LLMs

Authors: Siyu Qiu, Benjamin Tan, Hammond Pearce

Abstract: Training new engineers in digital design is a challenge, particularly when it comes to teaching the complex electronic design automation (EDA) tooling used in this domain. Learners will typically deploy designs in the Verilog and VHDL hardware description languages to Field Programmable Gate Arrays (FPGAs) from Altera (Intel) and Xilinx (AMD) via proprietary closed-source toolchains (Quartus Prime and Vivado, respectively). These tools are complex and difficult to use -- yet, as they are the tools used in industry, they are an essential first step in this space. In this work, we examine how recent advances in artificial intelligence may be leveraged to address aspects of this challenge. Specifically, we investigate if Large Language Models (LLMs), which have demonstrated text comprehension and question-answering capabilities, can be used to generate novice-friendly explanations of compile-time synthesis error messages from Quartus Prime and Vivado. To perform this study we generate 936 error message explanations using three OpenAI LLMs over 21 different buggy code samples. These are then graded for relevance and correctness, and we find that in approximately 71% of cases the LLMs give correct & complete explanations suitable for novice learners.

Submitted 7 April, 2024; originally announced April 2024.

Comments: 6 pages, 6 figures

arXiv:2312.12575 [pdf, other]

LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks

Authors: Saad Ullah, Mingji Han, Saurabh Pujar, Hammond Pearce, Ayse Coskun, Gianluca Stringhini

Abstract: Large Language Models (LLMs) have been suggested for use in automated vulnerability repair, but benchmarks showing they can consistently identify security-related bugs are lacking. We thus develop SecLLMHolmes, a fully automated evaluation framework that performs the most detailed investigation to date on whether LLMs can reliably identify and reason about security-related bugs. We construct a set of 228 code scenarios and analyze eight of the most capable LLMs across eight different investigative dimensions using our framework. Our evaluation shows LLMs provide non-deterministic responses, incorrect and unfaithful reasoning, and perform poorly in real-world scenarios. Most importantly, our findings reveal significant non-robustness in even the most advanced models like 'PaLM2' and 'GPT-4': by merely changing function or variable names, or by the addition of library functions in the source code, these models can yield incorrect answers in 26% and 17% of cases, respectively. These findings demonstrate that further LLM advances are needed before LLMs can be used as general purpose security assistants.

Submitted 24 July, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted for publication in IEEE Symposium on Security and Privacy 2024

arXiv:2311.04887 [pdf, other]

AutoChip: Automating HDL Generation Using LLM Feedback

Authors: Shailja Thakur, Jason Blocklove, Hammond Pearce, Benjamin Tan, Siddharth Garg, Ramesh Karri

Abstract: Traditionally, designs are written in the Verilog hardware description language (HDL) and debugged by hardware engineers. While this approach is effective, it is time-consuming and error-prone for complex designs. Large language models (LLMs) are promising in automating HDL code generation. LLMs are trained on massive datasets of text and code, and they can learn to generate code that compiles and is functionally accurate. We aim to evaluate the ability of LLMs to generate functionally correct HDL models. We build AutoChip by combining the interactive capabilities of LLMs and the output from Verilog simulations to generate Verilog modules. We start with a design prompt for a module and the context from compilation errors and debugging messages, which highlight differences between the expected and actual outputs. This ensures that accurate Verilog code can be generated without human intervention. We evaluate AutoChip using problem sets from HDLBits. We conduct a comprehensive analysis of AutoChip using several LLMs and problem categories. The results show that incorporating context from compiler tools, such as Icarus Verilog, improves the effectiveness, yielding 24.20% more accurate Verilog. We release our evaluation scripts and datasets as open-source contributions at the following link: https://github.com/shailja-thakur/AutoChip.

Submitted 4 June, 2024; v1 submitted 8 November, 2023; originally announced November 2023.
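
The generate-check-retry loop that the AutoChip abstract describes can be illustrated with a minimal Python sketch. This is not the authors' implementation: `feedback_loop`, `toy_generate`, and `toy_check` are hypothetical names, the generator stands in for an LLM call, and the checker stands in for compiling and simulating the candidate (for example with Icarus Verilog) and returning its messages.

```python
from typing import Callable, Optional, Tuple


def feedback_loop(
    design_prompt: str,
    generate: Callable[[str], str],
    check: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Ask for a design, check it, and feed tool messages back until it passes."""
    context = design_prompt
    for round_no in range(1, max_rounds + 1):
        candidate = generate(context)
        passed, feedback = check(candidate)
        if passed:
            return candidate, round_no
        # Append the failed attempt and the tool output to the next prompt.
        context = (
            f"{design_prompt}\n\nPrevious attempt:\n{candidate}\n\n"
            f"Tool feedback:\n{feedback}"
        )
    return None, max_rounds


# Hypothetical stand-ins so the sketch runs without an LLM or a simulator.
GOOD = "module and2(input a, input b, output y); assign y = a & b; endmodule"
BAD = "module and2(input a, input b, output y); assign y = a | b; endmodule"


def toy_generate(context: str) -> str:
    # Pretend the model only gets it right once it has seen tool feedback.
    return GOOD if "Tool feedback" in context else BAD


def toy_check(candidate: str) -> Tuple[bool, str]:
    # Pretend testbench: accepts only the AND implementation.
    if "a & b" in candidate:
        return True, ""
    return False, "mismatch: a=1 b=0 expected y=0, got y=1"


result, rounds = feedback_loop(
    "Write a 2-input AND gate in Verilog.", toy_generate, toy_check
)
```

In this toy run the first candidate fails the check and the second, generated with the feedback in its prompt, passes. A real setup would replace the two stand-ins and would likely cap `max_rounds` to bound cost.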

arXiv:2310.10560 [pdf, other]

Towards the Imagenets of ML4EDA

Authors: Animesh Basak Chowdhury, Shailja Thakur, Hammond Pearce, Ramesh Karri, Siddharth Garg

Abstract: Despite the growing interest in ML-guided EDA tools from RTL to GDSII, there are no standard datasets or prototypical learning tasks defined for the EDA problem domain. Experience from the computer vision community suggests that such datasets are crucial to spur further progress in ML for EDA. Here we describe our experience curating two large-scale, high-quality datasets for Verilog code generation and logic synthesis. The first, VeriGen, is a dataset of Verilog code collected from GitHub and Verilog textbooks. The second, OpenABC-D, is a large-scale, labeled dataset designed to aid ML for logic synthesis tasks. The dataset consists of 870,000 And-Inverter-Graphs (AIGs) produced from 1500 synthesis runs on a large number of open-source hardware projects. In this paper we will discuss challenges in curating, maintaining and growing the size and scale of these datasets. We will also touch upon questions of dataset quality and security, and the use of novel data augmentation tools that are tailored for the hardware domain.

Submitted 16 October, 2023; originally announced October 2023.

Comments: Invited paper, ICCAD 2023

Report number: October 16 Update

Journal ref: ICCAD 2023

arXiv:2310.05135 [pdf, other]

Are Emily and Greg Still More Employable than Lakisha and Jamal? Investigating Algorithmic Hiring Bias in the Era of ChatGPT

Authors: Akshaj Kumar Veldanda, Fabian Grob, Shailja Thakur, Hammond Pearce, Benjamin Tan, Ramesh Karri, Siddharth Garg

Abstract: Large Language Models (LLMs) such as GPT-3.5, Bard, and Claude exhibit applicability across numerous tasks. One domain of interest is their use in algorithmic hiring, specifically in matching resumes with job categories. Yet, this introduces issues of bias on protected attributes like gender, race and maternity status. The seminal work of Bertrand & Mullainathan (2003) set the gold standard for identifying hiring bias via field experiments where the response rate for identical resumes that differ only in protected attributes, e.g., racially suggestive names such as Emily or Lakisha, is compared. We replicate this experiment on state-of-the-art LLMs (GPT-3.5, Bard, Claude and Llama) to evaluate bias (or lack thereof) on gender, race, maternity status, pregnancy status, and political affiliation. We evaluate LLMs on two tasks: (1) matching resumes to job categories; and (2) summarizing resumes with employment-relevant information. Overall, LLMs are robust across race and gender. They differ in their performance on pregnancy status and political affiliation. We use contrastive input decoding on open-source LLMs to uncover potential sources of bias.

Submitted 8 October, 2023; originally announced October 2023.

arXiv:2308.11873 [pdf, other]

Dcc --help: Generating Context-Aware Compiler Error Explanations with Large Language Models

Authors: Andrew Taylor, Alexandra Vassar, Jake Renzella, Hammond Pearce

Abstract: In the challenging field of introductory programming, high enrollments and failure rates drive us to explore tools and systems to enhance student outcomes, especially automated tools that scale to large cohorts. This paper presents and evaluates the dcc --help tool, an integration of a Large Language Model (LLM) into the Debugging C Compiler (DCC) to generate unique, novice-focused explanations tailored to each error. dcc --help prompts an LLM with contextual information of compile- and run-time error occurrences, including the source code, error location and standard compiler error message. The LLM is instructed to generate novice-focused, actionable error explanations and guidance, designed to help students understand and resolve problems without providing solutions. dcc --help was deployed to our CS1 and CS2 courses, with 2,565 students using the tool over 64,000 times in ten weeks. We analysed a subset of these error/explanation pairs to evaluate their properties, including conceptual correctness, relevancy, and overall quality. We found that the LLM-generated explanations were conceptually accurate in 90% of compile-time and 75% of run-time cases, but often disregarded the instruction not to provide solutions in code. Our findings, observations and reflections following deployment indicate that dcc --help provides novel opportunities for scaffolding students' introduction to programming.

Submitted 15 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: 7 pages, 2 figures. Accepted in SIGCSE'24

arXiv:2308.00708 [pdf, other]

VeriGen: A Large Language Model for Verilog Code Generation

Authors: Shailja Thakur, Baleegh Ahmad, Hammond Pearce, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri, Siddharth Garg

Abstract: In this study, we explore the capability of Large Language Models (LLMs) to automate hardware design by generating high-quality Verilog code, a common language for designing and modeling digital systems. We fine-tune pre-existing LLMs on Verilog datasets compiled from GitHub and Verilog textbooks. We evaluate the functional correctness of the generated Verilog code using a specially designed test suite, featuring a custom problem set and testing benches. Here, our fine-tuned open-source CodeGen-16B model outperforms the commercial state-of-the-art GPT-3.5-turbo model with a 1.1% overall increase. Upon testing with a more diverse and complex problem set, we find that the fine-tuned model shows competitive performance against the state-of-the-art GPT-3.5-turbo, excelling in certain scenarios. Notably, it demonstrates a 41% improvement in generating syntactically correct Verilog code across various problem categories compared to its pre-trained counterpart, highlighting the potential of smaller, in-house LLMs in hardware design automation.

Submitted 27 July, 2023; originally announced August 2023.

Comments: arXiv admin note: text overlap with arXiv:2212.11140

arXiv:2306.14027 [pdf]

doi 10.1109/TIFS.2024.3372809

(Security) Assertions by Large Language Models

Authors: Rahul Kande, Hammond Pearce, Benjamin Tan, Brendan Dolan-Gavitt, Shailja Thakur, Ramesh Karri, Jeyavijayan Rajendran

Abstract: The security of computer systems typically relies on a hardware root of trust. As vulnerabilities in hardware can have severe implications on a system, there is a need for techniques to support security verification activities. Assertion-based verification is a popular verification technique that involves capturing design intent in a set of assertions that can be used in formal verification or testing-based checking. However, writing security-centric assertions is a challenging task. In this work, we investigate the use of emerging large language models (LLMs) for code generation in hardware assertion generation for security, where primarily natural language prompts, such as those one would see as code comments in assertion files, are used to produce SystemVerilog assertions. We focus our attention on a popular LLM and characterize its ability to write assertions out of the box, given varying levels of detail in the prompt. We design an evaluation framework that generates a variety of prompts, and we create a benchmark suite comprising real-world hardware designs and corresponding golden reference assertions that we want to generate with the LLM.

Submitted 9 July, 2024; v1 submitted 24 June, 2023; originally announced June 2023.

Comments: This article has been accepted for publication in IEEE Transactions on Information Forensics and Security. This is the author's version. See https://ieeexplore.ieee.org/document/10458667 for the published version of the paper. Citation information: DOI 10.1109/TIFS.2024.3372809. See https://www.ieee.org/publications/rights/index.html for information on publication rights

Journal ref: IEEE Transactions on Information Forensics and Security. 2024 Mar 4

arXiv:2306.12643 [pdf, other]

FLAG: Finding Line Anomalies (in code) with Generative AI

Authors: Baleegh Ahmad, Benjamin Tan, Ramesh Karri, Hammond Pearce

Abstract: Code contains security and functional bugs. The process of identifying and localizing them is difficult and relies on human labor. In this work, we present a novel approach (FLAG) to assist human debuggers. FLAG is based on the lexical capabilities of generative AI, specifically, Large Language Models (LLMs). Here, we input a code file, then extract and regenerate each line within that file for self-comparison. By comparing the original code with an LLM-generated alternative, we can flag notable differences as anomalies for further inspection, with features such as distance from comments and LLM confidence also aiding this classification. This reduces the inspection search space for the designer. Unlike other automated approaches in this area, FLAG is language-agnostic, can work on incomplete (and even non-compiling) code and requires no creation of security properties, functional tests or definition of rules. In this work, we explore the features that help LLMs in this classification and evaluate the performance of FLAG on known bugs. We use 121 benchmarks across C, Python and Verilog, with each benchmark containing a known security or functional weakness. We conduct the experiments using two state-of-the-art LLMs, OpenAI's code-davinci-002 and gpt-3.5-turbo, but our approach may be used by other models. FLAG can identify 101 of the defects and helps reduce the search space to 12-17% of source code.

Submitted 21 June, 2023; originally announced June 2023.
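
The line-by-line self-comparison that the FLAG abstract describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `flag_lines` and `toy_regenerate` are hypothetical names, the regeneration function stands in for an LLM call, and the similarity score from the standard library's `difflib.SequenceMatcher` is a substitute for whatever distance measures and extra features (comment distance, model confidence) the paper actually uses.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def flag_lines(
    lines: List[str],
    regenerate: Callable[[List[str], List[str]], str],
    threshold: float = 1.0,
) -> List[Tuple[int, str, str, float]]:
    """Return (line number, original, suggestion, similarity) for suspicious lines.

    A line is flagged when its similarity to the regenerated alternative
    falls below `threshold`; 1.0 means any difference is flagged.
    """
    flagged = []
    for i, original in enumerate(lines):
        # Regenerate line i from the code before and after it.
        suggestion = regenerate(lines[:i], lines[i + 1:])
        similarity = SequenceMatcher(
            None, original.strip(), suggestion.strip()
        ).ratio()
        if similarity < threshold:
            flagged.append((i + 1, original, suggestion, similarity))
    return flagged


# Hypothetical stand-in for an LLM: it "knows" the intended code.
INTENDED = [
    "int buf[10];",
    "for (int i = 0; i < 10; i++) {",
    "    buf[i] = 0;",
    "}",
]


def toy_regenerate(prefix: List[str], suffix: List[str]) -> str:
    return INTENDED[len(prefix)]


# Line 2 has an off-by-one bound (<= instead of <).
source = [
    "int buf[10];",
    "for (int i = 0; i <= 10; i++) {",
    "    buf[i] = 0;",
    "}",
]

flagged = flag_lines(source, toy_regenerate)
```

With the default threshold only the loop header is flagged, so a reviewer inspects one line out of four. Lowering `threshold` trades recall for a smaller inspection set, since near-identical lines such as this one would then pass unflagged.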

arXiv:2305.13243 [pdf, other]

doi 10.1109/MLCAD58807.2023.10299874

Chip-Chat: Challenges and Opportunities in Conversational Hardware Design

Authors: Jason Blocklove, Siddharth Garg, Ramesh Karri, Hammond Pearce

Abstract: Modern hardware design starts with specifications provided in natural language. These are then translated by hardware engineers into appropriate Hardware Description Languages (HDLs) such as Verilog before synthesizing circuit elements. Automating this translation could reduce sources of human error from the engineering process. But, it is only recently that artificial intelligence (AI) has demonstrated capabilities for machine-based end-to-end design translations. Commercially-available instruction-tuned Large Language Models (LLMs) such as OpenAI's ChatGPT and Google's Bard claim to be able to produce code in a variety of programming languages; but studies examining them for hardware are still lacking. In this work, we thus explore the challenges faced and opportunities presented when leveraging these recent advances in LLMs for hardware design. Given that these 'conversational' LLMs perform best when used interactively, we perform a case study where a hardware engineer co-architects a novel 8-bit accumulator-based microprocessor architecture with the LLM according to real-world hardware constraints. We then sent the processor to tapeout in a Skywater 130nm shuttle, meaning that this 'Chip-Chat' resulted in what we believe to be the world's first wholly-AI-written HDL for tapeout.

Submitted 14 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: 6 pages, 8 figures. Accepted in 2023 ACM/IEEE 5th Workshop on Machine Learning for CAD (MLCAD)

arXiv:2305.06902 [pdf, other]

REMaQE: Reverse Engineering Math Equations from Executables

Authors: Meet Udeshi, Prashanth Krishnamurthy, Hammond Pearce, Ramesh Karri, Farshad Khorrami

Abstract: Cybersecurity attacks on embedded devices for industrial control systems and cyber-physical systems may cause catastrophic physical damage as well as economic loss. This could be achieved by infecting device binaries with malware that modifies the physical characteristics of the system operation. Mitigating such attacks benefits from reverse engineering tools that recover sufficient semantic knowledge in terms of mathematical equations of the implemented algorithm. Conventional reverse engineering tools can decompile binaries to low-level code, but offer little semantic insight. This paper proposes the REMaQE automated framework for reverse engineering of math equations from binary executables. Improving over state-of-the-art, REMaQE handles equation parameters accessed via registers, the stack, global memory, or pointers, and can reverse engineer object-oriented implementations such as C++ classes. Using REMaQE, we discovered a bug in the Linux kernel thermal monitoring tool "tmon". To evaluate REMaQE, we generate a dataset of 25,096 binaries with math equations implemented in C and Simulink. REMaQE successfully recovers a semantically matching equation for all 25,096 binaries. REMaQE executes in 0.48 seconds on average and in up to 2 seconds for complex equations. Real-time execution enables integration in an interactive math-oriented reverse engineering workflow.

Submitted 11 April, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

ACM Class: C.3; D.2.5

arXiv:2302.01215 [pdf, other]

doi 10.1109/TIFS.2024.3374558

Fixing Hardware Security Bugs with Large Language Models

Authors: Baleegh Ahmad, Shailja Thakur, Benjamin Tan, Ramesh Karri, Hammond Pearce

Abstract: Novel AI-based code-writing Large Language Models (LLMs) such as OpenAI's Codex have demonstrated capabilities in many coding-adjacent domains. In this work we consider how LLMs may be leveraged to automatically repair security-relevant bugs present in hardware designs. We focus on bug repair in code written in the Hardware Description Language Verilog. For this study we build a corpus of domain-representative hardware security bugs. We then design and implement a framework to quantitatively evaluate the performance of any LLM tasked with fixing the specified bugs. The framework supports design space exploration of prompts (i.e., prompt engineering) and identifying the best parameters for the LLM. We show that an ensemble of LLMs can repair all ten of our benchmarks. This ensemble outperforms the state-of-the-art Cirfix hardware bug repair tool on its own suite of bugs. These results show that LLMs can repair hardware security bugs and the framework is an important step towards the ultimate goal of an automated end-to-end bug repair framework.

Submitted 2 February, 2023; originally announced February 2023.

arXiv:2301.10336 [pdf, other]

A survey of Digital Manufacturing Hardware and Software Trojans

Authors: Prithwish Basu Roy, Mudit Bhargava, Chia-Yun Chang, Ellen Hui, Nikhil Gupta, Ramesh Karri, Hammond Pearce

Abstract: Digital Manufacturing (DM) refers to the on-going adoption of smarter, more agile manufacturing processes and cyber-physical systems. This includes modern techniques and technologies such as Additive Manufacturing (AM)/3D printing, as well as the Industrial Internet of Things (IIoT) and the broader trend toward Industry 4.0. However, this adoption is not without risks: with a growing complexity and connectivity, so too grows the cyber-physical attack surface. Here, malicious actors might seek to steal sensitive information or sabotage products or production lines, causing financial and reputational loss. Of particular concern is where such malicious attacks may enter the complex supply chains of DM systems as Trojans -- malicious modifications that may trigger their payloads at later times or stages of the product lifecycle. In this work, we thus present a comprehensive overview of the threats posed by Trojans in Digital Manufacturing. We cover both hardware and software Trojans which may exist in products or their production and supply lines. From this, we produce a novel taxonomy for classifying and analyzing these threats, and elaborate on how different side channels (e.g. visual, thermal, acoustic, power, and magnetic) may be either used to enhance the impact of a given Trojan or utilized as part of a defensive strategy. Other defenses are also presented -- including hardware, web-, and software-related. To conclude, we discuss seven different case studies and elaborate how they fit into our taxonomy. Overall, this paper presents a detailed survey of the Trojan landscape for Digital Manufacturing: threats, defenses, and the importance of implementing secure practices.

Submitted 24 January, 2023; originally announced January 2023.

Comments: 15 pages

arXiv:2212.11140 [pdf, other]

Benchmarking Large Language Models for Automated Verilog RTL Code Generation

Authors: Shailja Thakur, Baleegh Ahmad, Zhenxing Fan, Hammond Pearce, Benjamin Tan, Ramesh Karri, Brendan Dolan-Gavitt, Siddharth Garg

Abstract: Automating hardware design could obviate a significant amount of human error from the engineering process and lead to fewer errors. Verilog is a popular hardware description language to model and design digital systems, thus generating Verilog code is a critical first step. Emerging large language models (LLMs) are able to write high-quality code in other programming languages. In this paper, we characterize the ability of LLMs to generate useful Verilog. For this, we fine-tune pre-trained LLMs on Verilog datasets collected from GitHub and Verilog textbooks. We construct an evaluation framework comprising test-benches for functional analysis and a flow to test the syntax of Verilog code generated in response to problems of varying difficulty. Our findings show that across our problem scenarios, the fine-tuning results in LLMs more capable of producing syntactically correct code (25.9% overall). Further, when analyzing functional correctness, a fine-tuned open-source CodeGen LLM can outperform the state-of-the-art commercial Codex LLM (6.5% overall). Training/evaluation scripts and LLM checkpoints are available: https://github.com/shailja-thakur/VGen.

Submitted 13 December, 2022; originally announced December 2022.

Comments: Accepted in DATE 2023. 7 pages, 4 tables, 7 figures

arXiv:2209.01291 [pdf, other]

doi 10.1145/3508352.3549369

Don't CWEAT It: Toward CWE Analysis Techniques in Early Stages of Hardware Design

Authors: Baleegh Ahmad, Wei-Kai Liu, Luca Collini, Hammond Pearce, Jason M. Fung, Jonathan Valamehr, Mohammad Bidmeshki, Piotr Sapiecha, Steve Brown, Krishnendu Chakrabarty, Ramesh Karri, Benjamin Tan

Abstract: To help prevent hardware security vulnerabilities from propagating to later design stages where fixes are costly, it is crucial to identify security concerns as early as possible, such as in RTL designs. In this work, we investigate the practical implications and feasibility of producing a set of security-specific scanners that operate on Verilog source files. The scanners indicate parts of code that might contain one of a set of MITRE's common weakness enumerations (CWEs). We explore the CWE database to characterize the scope and attributes of the CWEs and identify those that are amenable to static analysis. We prototype scanners and evaluate them on 11 open source designs - 4 system-on-chips (SoC) and 7 processor cores - and explore the nature of identified weaknesses. Our analysis reported 53 potential weaknesses in the OpenPiton SoC used in Hack@DAC-21, 11 of which we confirmed as security concerns.

Submitted 2 September, 2022; originally announced September 2022.

arXiv:2208.09727 [pdf, other]

Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants

Authors: Gustavo Sandoval, Hammond Pearce, Teo Nys, Ramesh Karri, Siddharth Garg, Brendan Dolan-Gavitt

Abstract: Large Language Models (LLMs) such as OpenAI Codex are increasingly being used as AI-based coding assistants. Understanding the impact of these tools on developers' code is paramount, especially as recent work showed that LLMs may suggest cybersecurity vulnerabilities. We conduct a security-driven user study (N=58) to assess code written by student programmers when assisted by LLMs. Given the potential severity of low-level bugs as well as their relative frequency in real-world projects, we tasked participants with implementing a singly-linked 'shopping list' structure in C. Our results indicate that the security impact in this setting (low-level C with pointer and array manipulations) is small: AI-assisted users produce critical security bugs at a rate no greater than 10% more than the control, indicating the use of LLMs does not introduce new security risks.

Submitted 27 February, 2023; v1 submitted 20 August, 2022; originally announced August 2022.

Comments: Accepted for publication in USENIX'23. For associated dataset see https://doi.org/10.5281/zenodo.7187359. 18 pages, 12 figures. G. Sandoval and H. Pearce contributed equally to this work

arXiv:2207.10466 [pdf, other]

doi 10.1145/3577200

High-Level Approaches to Hardware Security: A Tutorial

Authors: Hammond Pearce, Ramesh Karri, Benjamin Tan

Abstract: Designers use third-party intellectual property (IP) cores and outsource various steps in the integrated circuit (IC) design and manufacturing flow. As a result, security vulnerabilities have been rising. This is forcing IC designers and end users to re-evaluate their trust in ICs. If attackers get hold of an unprotected IC, they can reverse engineer the IC and pirate the IP. Similarly, if attackers get hold of a design, they can insert malicious circuits or take advantage of "backdoors" in a design. Unintended design bugs can also result in security weaknesses. This tutorial paper provides an introduction to the domain of hardware security through two pedagogical examples of hardware security problems. The first is a walk-through of the scan chain-based side channel attack. The second is a walk-through of logic locking of digital designs. The tutorial material is accompanied by open access digital resources that are linked in this article.

Submitted 6 March, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: Accepted in IEEE TECS. 41 pages, 13 figures

arXiv:2202.01142 [pdf, other]

Pop Quiz! Can a Large Language Model Help With Reverse Engineering?

Authors: Hammond Pearce, Benjamin Tan, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Brendan Dolan-Gavitt

Abstract: Large language models (such as OpenAI's Codex) have demonstrated impressive zero-shot multi-task capabilities in the software domain, including code explanation. In this work, we examine if this ability can be used to help with reverse engineering. Specifically, we investigate prompting Codex to identify the purpose, capabilities, and important variable names or values from code, even when the code is produced through decompilation. Alongside an examination of the model's responses in answering open-ended questions, we devise a true/false quiz framework to characterize the performance of the language model. We present an extensive quantitative analysis of the measured performance of the language model on a set of program purpose identification and information extraction tasks: of the 136,260 questions we posed, it answered 72,754 correctly. A key takeaway is that while promising, LLMs are not yet ready for zero-shot reverse engineering.

Submitted 2 February, 2022; originally announced February 2022.

Comments: 18 pages, 19 figures. Linked dataset: https://doi.org/10.5281/zenodo.5949075

arXiv:2112.02125 [pdf, other]

Examining Zero-Shot Vulnerability Repair with Large Language Models

Authors: Hammond Pearce, Benjamin Tan, Baleegh Ahmad, Ramesh Karri, Brendan Dolan-Gavitt

Abstract: Human developers can produce code with cybersecurity bugs. Can emerging 'smart' code completion tools help repair those bugs? In this work, we examine the use of large language models (LLMs) for code (such as OpenAI's Codex and AI21's Jurassic J-1) for zero-shot vulnerability repair. We investigate challenges in the design of prompts that coax LLMs into generating repaired versions of insecure code. This is difficult due to the numerous ways to phrase key information - both semantically and syntactically - with natural languages. We perform a large scale study of five commercially available, black-box, "off-the-shelf" LLMs, as well as an open-source model and our own locally-trained model, on a mix of synthetic, hand-crafted, and real-world security bug scenarios. Our experiments demonstrate that while the approach has promise (the LLMs could collectively repair 100% of our synthetically generated and hand-crafted scenarios), a qualitative evaluation of the model's performance over a corpus of historical real-world examples highlights challenges in generating functionally correct code.

Submitted 15 August, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

Comments: 18 pages, 19 figures. Accepted for publication in 2023 IEEE Symposium on Security and Privacy (SP)

arXiv:2111.12746 [pdf, other]

doi 10.1109/LES.2021.3129108

Needle in a Haystack: Detecting Subtle Malicious Edits to Additive Manufacturing G-code Files

Authors: Caleb Beckwith, Harsh Sankar Naicker, Svara Mehta, Viba R. Udupa, Nghia Tri Nim, Varun Gadre, Hammond Pearce, Gary Mac, Nikhil Gupta

Abstract: Increasing usage of Digital Manufacturing (DM) in safety-critical domains is increasing attention on the cybersecurity of the manufacturing process, as malicious third parties might aim to introduce defects in digital designs. In general, the DM process involves creating a digital object (as CAD files) before using a slicer program to convert the models into printing instructions (e.g. g-code) suitable for the target printer. As the g-code is an intermediate machine format, malicious edits may be difficult to detect, especially when the golden (original) models are not available to the manufacturer. In this work we aim to quantify this hypothesis through a red-team/blue-team case study, whereby the red-team aims to introduce subtle defects that would impact the properties (strengths) of the 3D printed parts, and the blue-team aims to detect these modifications in the absence of the golden models. The case study had two sets of models, the first with 180 designs (with 2 compromised using 2 methods) and the second with 4320 designs (with 60 compromised using 6 methods). Using statistical modelling and machine learning (ML), the blue-team was able to detect all the compromises in the first set of data, and 50 of the compromises in the second.

Submitted 24 November, 2021; originally announced November 2021.

Comments: To appear in IEEE Embedded Systems Letters

arXiv:2110.01974 [pdf, other]

Runtime Interchange for Adaptive Re-use of Intelligent Cyber-Physical System Controllers

Authors: Hammond Pearce, Xin Yang, Srinivas Pinisetty, Partha S. Roop

Abstract: Cyber-Physical Systems (CPSs) such as those found within autonomous vehicles are increasingly adopting Artificial Neural Network (ANN)-based controllers. To ensure the safety of these controllers, there is a spate of recent activity to formally verify the ANN-based designs. There are two challenges with these approaches: (1) The verification of such systems is difficult and time consuming. (2) These verified controllers are not able to adapt to frequent requirements changes, which are typical in situations like autonomous driving. This raises the question: how can trained and verified controllers, which have gone through expensive training and verification processes, be re-used to deal with requirement changes? This paper addresses this challenge for the first time by proposing a new framework that is capable of dealing with requirement changes at runtime through a mechanism we term runtime interchange. Our approach functions via a continual exchange and selection process of multiple pre-verified controllers. It represents a key step on the way to component-oriented engineering for intelligent designs, as it preserves the behaviours of the original controllers while introducing additional functionality. To demonstrate the efficacy of our approach we utilise an existing autonomous driving case study as well as a set of smaller benchmarks. These show that introduced overheads are extremely minimal and that the approach is very scalable.

Submitted 23 September, 2021; originally announced October 2021.

Comments: 10 pages, 7 figures

arXiv:2108.09293 [pdf, other]

Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions

Authors: Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri

Abstract: There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described 'AI pair programmer', GitHub Copilot, a language model trained over open-source GitHub code. However, code often contains bugs - and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilot's code contributions. In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk CWEs (e.g. those from MITRE's "Top 25" list). We explore Copilot's performance on three distinct code generation axes -- examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, producing 1,689 programs. Of these, we found approximately 40% to be vulnerable.

Submitted 16 December, 2021; v1 submitted 20 August, 2021; originally announced August 2021.

Comments: Accepted for publication in IEEE Symposium on Security and Privacy 2022

arXiv:2104.09562 [pdf, other]

FLAW3D: A Trojan-based Cyber Attack on the Physical Outcomes of Additive Manufacturing

Authors: Hammond Pearce, Kaushik Yanamandra, Nikhil Gupta, Ramesh Karri

Abstract: Additive Manufacturing (AM) systems such as 3D printers use inexpensive microcontrollers that rarely feature cybersecurity defenses. This is a risk, especially given the rising threat landscape within the larger digital manufacturing domain. In this work we demonstrate this risk by presenting the design and study of a malicious Trojan (the FLAW3D bootloader) for AVR-based Marlin-compatible 3D printers (>100 commercial models). We show that the Trojan can hide from programming tools, and even within tight design constraints (less than 1.7 kilobytes in size), it can compromise the quality of additively manufactured prints and reduce tensile strengths by up to 50%.

Submitted 19 April, 2021; originally announced April 2021.

Comments: 8 pages, 11 figures

arXiv:2009.01026 [pdf, other]

doi 10.1145/3380446.3430634

DAVE: Deriving Automatically Verilog from English

Authors: Hammond Pearce, Benjamin Tan, Ramesh Karri

Abstract: While specifications for digital systems are provided in natural language, engineers undertake significant efforts to translate them into the programming languages understood by compilers for digital systems. Automating this process allows designers to work with the language in which they are most comfortable -- the original natural language -- and focus instead on other downstream design challenges. We explore the use of state-of-the-art machine learning (ML) to automatically derive Verilog snippets from English via fine-tuning GPT-2, a natural language ML system. We describe our approach for producing a suitable dataset of novice-level digital design tasks and provide a detailed exploration of GPT-2, finding encouraging translation performance across our task sets (94.8% correct), with the ability to handle both simple and abstract design tasks.

Submitted 27 August, 2020; originally announced September 2020.

Comments: 6 pages, 2 figures

arXiv:2008.11830 [pdf, other]

doi 10.1109/LES.2020.3009910

Designing Neural Networks for Real-Time Systems

Authors: Hammond Pearce, Xin Yang, Partha S. Roop, Marc Katzef, Tórur Biskopstø Strøm

Abstract: Artificial Neural Networks (ANNs) are increasingly being used within safety-critical Cyber-Physical Systems (CPSs). They are often co-located with traditional embedded software, and may perform advisory or control-based roles. It is important to validate both the timing and functional correctness of these systems. However, most approaches in the literature consider guaranteeing only the functionality of ANN based controllers. This issue stems largely from the implementation strategies used within common neural network frameworks -- their underlying source code is often simply unsuitable for formal techniques such as static timing analysis. As a result, developers of safety-critical CPS must rely on informal techniques such as measurement based approaches to prove correctness, techniques that provide weak guarantees at best. In this work we address this challenge. We propose a design pipeline whereby neural networks trained using the popular deep learning framework Keras are compiled to functionally equivalent C code. This C code is restricted to simple constructs that may be analysed by existing static timing analysis tools. As a result, if compiled to a suitable time-predictable platform all execution bounds may be statically derived. To demonstrate the benefits of our approach we execute an ANN trained to drive an autonomous vehicle around a race track. We compile the ANN to the Patmos time-predictable controller, and show that we can derive worst case execution timings.

Submitted 26 August, 2020; originally announced August 2020.

Comments: 4 pages, 2 figures. IEEE Embedded Systems Letters, 2020

arXiv:1703.06581 [pdf, other]

Disaggregated Benders decomposition and lazy constraints for solving the budget-constrained dynamic uncapacitated facility location and network design problem

Authors: Robin H Pearce, Michael Forbes

Abstract: We present an approach for solving to optimality the budget-constrained Dynamic Uncapacitated Facility Location and Network Design problem (DUFLNDP). This is a problem where a network must be constructed or expanded and facilities placed in the network, subject to a budget, in order to satisfy a number of demands. With the demands satisfied, the objective is to minimise the running cost of the network and the cost of moving demands to facilities. The problem can be disaggregated over two different sets simultaneously, leading to many smaller models which can be solved more easily. Using disaggregated Benders decomposition and lazy constraints, we solve many instances to optimality that have not previously been solved. We use an analytic procedure to generate Benders optimality cuts which are provably Pareto-optimal.

Submitted 19 March, 2017; originally announced March 2017.

Comments: 31 pages, 1 figure, 7 tables

MSC Class: 90B10; 90B80

arXiv:1603.02384 [pdf, other]

Column Generation and Lazy Constraints for solving the Liner Ship Fleet Repositioning Problem with cargo flows

Authors: Robin H. Pearce, Alexis Tyler, Michael Forbes

Abstract: We consider an important problem in the shipping industry known as the liner shipping fleet repositioning problem (LSFRP). We examine a public data set for this problem including many instances which have not previously been solved to optimality. We present several improvements on a previous mathematical formulation, however the largest instances still result in models too difficult to solve in reasonable time. The implementation of column generation reduces the model size significantly, allowing all instances to be solved, with some taking two to three hours. A novel application of lazy constraints further reduces the size of the model, and results in all instances being solved to optimality in under four minutes.

Submitted 8 March, 2016; originally announced March 2016.

MSC Class: 90B06 (Primary); 90B10 (Secondary)

arXiv:1603.02378 [pdf, other]

Disaggregated Benders Decomposition for solving a Network Maintenance Scheduling Problem

Authors: Robin H. Pearce, Michael Forbes

Abstract: We consider a problem concerning a network and a set of maintenance requests to be undertaken. We wish to schedule the maintenance in such a way as to minimise the impact on the total throughput of the network. We apply disaggregated Benders cuts and lazy constraints to solve the problem to optimality, as well as exploring the strengths and weaknesses of the technique. We prove that our Benders cuts are pareto optimal. Solutions to the LP relaxation also provide further valid inequalities to reduce total solve time. We implement these techniques on simulated data presented in previous papers, and compare our solution technique to previous methods and a direct MIP formulation. We prove optimality in many problem instances that have not previously been proven.

Submitted 15 March, 2017; v1 submitted 7 March, 2016; originally announced March 2016.

Comments: 22 pages, 7 tables, 2 figures

MSC Class: 90B35 (Primary); 68M20; 90B25; 90B10 (Secondary)

arXiv:1304.7276 [pdf, ps, other]

doi 10.1093/mnras/stt715

High velocity outflows from young star-forming galaxies in the UKIDSS Ultra-Deep Survey

Authors: E. J. Bradshaw, O. Almaini, W. G. Hartley, K. T. Smith, C. J. Conselice, J. S. Dunlop, C. Simpson, R. W. Chuter, M. Cirasuolo, S. Foucaud, R. J. McLure, A. Mortlock, H. Pearce

Abstract: We investigate galactic-scale outflows in the redshift range 0.71 < z < 1.63, using 413 K-band selected galaxies observed in the spectroscopic follow-up of the UKIDSS Ultra-Deep Survey (UDSz). The galaxies have an average stellar mass of ~10^9.5 solar masses and span a wide range in rest-frame colours, representing typical star-forming galaxies at this epoch. We stack the spectra by various galaxy properties, including stellar mass, [OII] equivalent width, star-formation rate, specific star-formation rate and rest-frame spectral indices. We find that outflows are present in virtually all spectral stacks, with velocities ranging from 100-1000 km s^-1, indicating that large-scale outflowing winds are a common property at these redshifts. The highest velocity outflows (>500 km s^-1) are found in galaxies with the highest stellar masses and the youngest stellar populations. Our findings suggest that high velocity galactic outflows are mostly driven by star-forming processes rather than AGN, with implied mass outflow rates comparable to the rates of star formation. Such behaviour is consistent with models required to reproduce the high-redshift mass-metallicity relation.

Submitted 26 April, 2013; originally announced April 2013.

Comments: 16 pages, 15 figures, accepted by MNRAS

arXiv:1303.0816 [pdf, ps, other]

doi 10.1093/mnras/stt383

Studying the emergence of the red sequence through galaxy clustering: host halo masses at z > 2

Authors: William G. Hartley, Omar Almaini, Alice Mortlock, Christopher J. Conselice, Ruth Grützbauch, Chris Simpson, Emma J. Bradshaw, Rob W. Chuter, Sebastien Foucaud, Michele Cirasuolo, James S. Dunlop, Ross J. McLure, Henry Pearce

Abstract: We use the UKIDSS Ultra-Deep Survey, the deepest degree-scale near-infrared survey to date, to investigate the clustering of star-forming and passive galaxies to z ~ 3.5. Our new measurements include the first determination of the clustering for passive galaxies at z > 2, which we achieve using a cross-correlation technique. We find that passive galaxies are the most strongly clustered, typically hosted by massive dark matter halos with M_halo > 5 x 10^12 M_sun irrespective of redshift or stellar mass. Our findings are consistent with models in which a critical halo mass determines the transition from star-forming to passive galaxies. Star-forming galaxies show no strong correlation between stellar mass and halo mass, but passive galaxies show evidence for an anti-correlation; low-mass passive galaxies appear, on average, to be located in the most massive halos. These results can be understood if the termination of star formation is most efficient for galaxies of low stellar mass in very dense environments.

Submitted 6 March, 2013; v1 submitted 4 March, 2013; originally announced March 2013.

Comments: Accepted for publication in MNRAS. 16 pages, 9 figures, 1 table

arXiv:1205.4058 [pdf, other]

doi 10.1093/mnras/sts092

The sizes, masses and specific star-formation rates of massive galaxies at 1.3<z<1.5: strong evidence in favour of evolution via minor mergers

Authors: R. J. McLure, H. J. Pearce, J. S. Dunlop, M. Cirasuolo, E. Curtis-Lake, V. A. Bruce, K. Caputi, O. Almaini, D. G. Bonfield, E. J. Bradshaw, F. Buitrago, R. Chuter, S. Foucaud, W. G. Hartley, M. J. Jarvis

Abstract: We report the results of a comprehensive study of the relationship between galaxy size, stellar mass and specific star-formation rate (sSFR) at redshifts 1.3<z<1.5. Based on a mass complete (M_star >= 6x10^10 Msun), spectroscopic sample from the UKIDSS Ultra-deep Survey (UDS), with accurate stellar-mass measurements derived from spectro photometric fitting, we find that at z~1.4 the location of massive galaxies on the size-mass plane is determined primarily by their sSFR. At this epoch we find that massive galaxies which are passive (sSFR <= 0.1 Gyr^-1) follow a tight size-mass relation, with half-light radii a factor f=2.4+/-0.2 smaller than their local counterparts. Moreover, amongst the passive sub-sample we find no evidence that the off-set from the local size-mass relation is a function of stellar population age. Based on a sub-sample with dynamical mass estimates we also derive an independent estimate of f=2.3+/-0.3 for the typical growth in half-light radius between z~1.4 and the present day. Focusing on the passive sub-sample, we conclude that to produce the necessary evolution predominantly via major mergers would require an unfeasible number of merger events and over populate the high-mass end of the local stellar mass function. In contrast, we find that a scenario in which mass accretion is dominated by minor mergers can produce the necessary evolution, whereby an increase in stellar mass by a factor of ~2, accompanied by an increase in size by a factor of ~3.5, is sufficient to reconcile the size-mass relation at z~1.4 with that observed locally. Finally, we note that a significant fraction (44+/-12%) of the passive galaxies in our sample have a disk-like morphology, providing additional evidence that separate physical processes are responsible for the quenching of star-formation and the morphological transformation of massive galaxies (abridged).

Submitted 15 November, 2012; v1 submitted 17 May, 2012; originally announced May 2012.

Comments: 21 pages, 11 figures, accepted for publication in MNRAS. Replaced to match accepted version

arXiv:1201.3609 [pdf, ps, other]

doi 10.1088/1475-7516/2012/08/006

Improved constraints on the expansion rate of the Universe up to z~1.1 from the spectroscopic evolution of cosmic chronometers

Authors: M. Moresco, A. Cimatti, Raul Jimenez, L. Pozzetti, G. Zamorani, M. Bolzonella, J. Dunlop, F. Lamareille, M. Mignoli, H. Pearce, P. Rosati, D. Stern, L. Verde, E. Zucca, C. M. Carollo, T. Contini, J. -P. Kneib, O. Le Fevre, S. J. Lilly, V. Mainieri, A. Renzini, M. Scodeggio, I. Balestra, R. Gobat, R. McLure, et al. (43 additional authors not shown)

Abstract: We present new improved constraints on the Hubble parameter H(z) in the redshift range 0.15 < z < 1.1, obtained from the differential spectroscopic evolution of early-type galaxies as a function of redshift. We extract a large sample of early-type galaxies (~11000) from several spectroscopic surveys, spanning almost 8 billion years of cosmic lookback time (0.15 < z < 1.42). We select the most massive, red elliptical galaxies, passively evolving and without signature of ongoing star formation. Those galaxies can be used as standard cosmic chronometers, as first proposed by Jimenez & Loeb (2002), whose differential age evolution as a function of cosmic time directly probes H(z). We analyze the 4000 Å break (D4000) as a function of redshift, use stellar population synthesis models to theoretically calibrate the dependence of the differential age evolution on the differential D4000, and estimate the Hubble parameter taking into account both statistical and systematic errors. We provide 8 new measurements of H(z) (see Tab. 4), and determine its change in H(z) to a precision of 5-12% mapping homogeneously the redshift range up to z ~ 1.1; for the first time, we place a constraint on H(z) at z ≠ 0 with a precision comparable with the one achieved for the Hubble constant (about 5-6% at z ~ 0.2), and cover a redshift range (0.5 < z < 0.8) which is crucial to distinguish many different quintessence cosmologies.
These measurements have been tested to best match a ΛCDM model, clearly providing a statistically robust indication that the Universe is undergoing an accelerated expansion. This method shows the potential to open a new avenue for constraining a variety of alternative cosmologies, especially when future surveys (e.g. Euclid) will open the possibility to extend it up to z ~ 2.

Submitted 5 February, 2013; v1 submitted 17 January, 2012; originally announced January 2012.

Comments: 34 pages, 15 figures, 6 tables, published in JCAP. It is a companion to Moresco et al. (2012b, http://arxiv.org/abs/1201.6658) and Jimenez et al. (2012, http://arxiv.org/abs/1201.3608). The H(z) data can be downloaded at http://www.physics-astronomy.unibo.it/en/research/areas/astrophysics/cosmology-with-cosmic-chronometers

arXiv:1110.1722 [pdf, other]

doi 10.1111/j.1365-2966.2012.20720.x

A remarkably high fraction of strong Ly_alpha emitters amongst luminous redshift 6.0<z<6.5 Lyman break galaxies in the UKIDSS Ultra-Deep Survey

Authors: E. Curtis-Lake, R. J. McLure, H. J. Pearce, J. S. Dunlop, M. Cirasuolo, D. P. Stark, O. Almaini, E. J. Bradshaw, R. Chuter, S. Foucaud, W. G. Hartley

Abstract: We present spectroscopic confirmation of ten highly luminous (L >= 2L*) Lyman alpha emitters in the redshift range 6.01<z<6.49 (nine galaxies and one AGN), initially drawn from a sample of fourteen z_phot >= 6 Lyman break galaxies (LBGs) selected from an area of 0.25 square degrees within the UKIDSS Ultra-deep Survey (UDS). Overall, our high rate of spectroscopic confirmation (>= 71%) and low rate of contamination provide a strong vindication of the photometric redshift analysis used to define the original sample. By considering star-formation rate estimates based on the Ly_alpha and UV continuum luminosity we conclude that our sample is consistent with a Ly_alpha escape fraction of ~25%. Moreover, after careful consideration of the potential uncertainties and biases, we find that 40%-50% of our sample of L >= 2L* galaxies at 6.0<z<6.5 display strong Ly_alpha emission (rest-frame equivalent width >= 25 Angs), a fraction which is a factor of ~2 higher than previously reported for L <= L* galaxies at z~6. Our results suggest that, as the epoch of reionization is approached, it is plausible that the Ly_alpha emitter fraction amongst luminous (L >= 2L*) LBGs shows a similarly sharp increase to that observed in their lower-luminosity (L <= L*) counterparts.

Submitted 22 February, 2012; v1 submitted 8 October, 2011; originally announced October 2011.

Comments: accepted by MNRAS, 13 pages, 7 figures

Showing 1–36 of 36 results for author: Pearce, H