Search | arXiv e-print repository

Modeling Human Strategy for Flattening Wrinkled Cloth Using Neural Networks

Authors: Nilay Kant, Ashrut Aryal, Rajiv Ranganathan, Ranjan Mukherjee, Charles Owen

Abstract: This paper explores a novel approach to modeling strategies for flattening wrinkled cloth by learning from humans. A human participant study was conducted where the participants were presented with various wrinkle types and tasked with flattening the cloth using the fewest actions possible. A camera and ArUco marker were used to capture images of the cloth and finger movements, respectively. The human strategies for flattening the cloth were modeled using a supervised regression neural network, where the cloth images served as input and the human actions as output. Before training the neural network, a series of image processing techniques were applied, followed by Principal Component Analysis (PCA) to extract relevant features from each image and reduce the input dimensionality. This reduction decreased the model's complexity and computational cost. The actions predicted by the neural network closely matched the actual human actions on an independent data set, demonstrating the effectiveness of neural networks in modeling human actions for flattening wrinkled cloth.

Submitted 19 August, 2024; originally announced September 2024.

Comments: 6 Pages

arXiv:2409.02823 [pdf, other]

Design Contradictions: Help or Hindrance?

Authors: Aron E. Owen, Jonathan C. Roberts

Abstract: The need for innovative ideas in data visualisation drives us to explore new creative approaches. Combining two or more creative words, particularly those that contradict each other, can positively impact the creative process, sparking novel ideas and designs. As we move towards AI-driven design, an open question arises: do these design contradictions work positively with AI tools? Currently, the answer is no. AI systems, like large language models (LLMs), rely on algorithms that engender similarity, whereas creativity often requires divergence and novelty. This poster initiates a conversation on how to drive AI systems to be more creative and generate new ideas. This research invites us to reconsider traditional design methods and explore new approaches in an AI-driven world. Can we apply the same techniques used in traditional design, like the double diamond model, or do we need new methods for design engineering? How can we quickly design visualisations and craft new ideas with generative AI? This paper seeks to start this critical conversation and offers practical insights into the potential of AI in driving creativity in data visualisation.

Submitted 4 September, 2024; originally announced September 2024.

arXiv:2409.02036 [pdf, other]

Towards Metrics for Evaluating Creativity in Visualisation Design

Authors: Aron E Owen, Jonathan C Roberts

Abstract: Creativity in visualisation design is essential for designers and data scientists who need to present data in innovative ways. It is often achieved through sketching or drafting low-fidelity prototypes. However, judging this innovation is often difficult. A creative visualisation test would offer a structured approach to enhancing visual thinking and design skills, which are vital across many fields. Such a test can facilitate objective evaluation, skill identification, benchmarking, fostering innovation, and improving learning outcomes. In developing such a test, we propose focusing on four criteria: Quantity, Correctness, Novelty, and Feasibility. These criteria integrate into a test that is easy to administer. We name it the Rowen Test of Creativity in Visualisation Design. We introduce the test, scoring system and results from using eight visualisation experts.

Submitted 3 September, 2024; originally announced September 2024.

arXiv:2409.01969 [pdf, other]

Connectivity structure and dynamics of nonlinear recurrent neural networks

Authors: David G. Clark, Owen Marschall, Alexander van Meegen, Ashok Litwin-Kumar

Abstract: We develop a theory to analyze how structure in connectivity shapes the high-dimensional, internally generated activity of nonlinear recurrent neural networks. Using two complementary methods -- a path-integral calculation of fluctuations around the saddle point, and a recently introduced two-site cavity approach -- we derive analytic expressions that characterize important features of collective activity, including its dimensionality and temporal correlations. To model structure in the coupling matrices of real neural circuits, such as synaptic connectomes obtained through electron microscopy, we introduce the random-mode model, which parameterizes a coupling matrix using random input and output modes and a specified spectrum. This model enables systematic study of the effects of low-dimensional structure in connectivity on neural activity. These effects manifest in features of collective activity, which we calculate, and can be undetectable when analyzing only single-neuron activities. We derive a relation between the effective rank of the coupling matrix and the dimension of activity. By extending the random-mode model, we compare the effects of single-neuron heterogeneity and low-dimensional connectivity. We also investigate the impact of structured overlaps between input and output modes, a feature of biological coupling matrices. Our theory provides tools to relate neural-network architecture and collective dynamics in artificial and biological systems.

Submitted 3 September, 2024; originally announced September 2024.

Comments: 35 pages, 11 figures

arXiv:2409.01283 [pdf, other]

Towards a Generative AI Design Dialogue

Authors: Aron E. Owen, Jonathan C. Roberts

Abstract: Traditional visualisation designers often start with sketches before implementation. With generative AI, these sketches can be turned into AI-generated visualisations using specific prompts. However, guiding AI to create compelling visuals can be challenging. We propose a new design process where designers verbalise their thoughts during work, later converting these narratives into AI prompts. This approach helps AI generate accurate visuals and assists designers in refining their concepts, enhancing the overall design process. Blending human creativity with AI capabilities enables rapid iteration, leading to higher quality and more innovative visualisations, making design more accessible and efficient.

Submitted 19 August, 2024; originally announced September 2024.

arXiv:2408.15116 [pdf, other]

Evaluating Stability of Unreflective Alignment

Authors: James Lucassen, Mark Henry, Philippa Wright, Owen Yeung

Abstract: Many theoretical obstacles to AI alignment are consequences of reflective stability - the problem of designing alignment mechanisms that the AI would not disable if given the option. However, problems stemming from reflective stability are not obviously present in current LLMs, leading to disagreement over whether they will need to be solved to enable safe delegation of cognitive labor. In this paper, we propose Counterfactual Priority Change (CPC) destabilization as a mechanism by which reflective stability problems may arise in future LLMs. We describe two risk factors for CPC-destabilization: 1) CPC-based stepping back and 2) preference instability. We develop preliminary evaluations for each of these risk factors, and apply them to frontier LLMs. Our findings indicate that in current LLMs, increased scale and capability are associated with increases in both CPC-based stepping back and preference instability, suggesting that CPC-destabilization may cause reflective stability problems in future LLMs.

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.10439 [pdf, other]

Visual Storytelling: A Methodological Approach to Designing and Implementing a Visualisation Poster

Authors: Rhiannon Owen, Jonathan Roberts

Abstract: We present a design study of developing a visualisation poster. Posters can be difficult to create, and the story on a poster is not always clear. Using a case-study approach we propose three important aspects: the poster should have a clear focus (especially a hero visualisation), envisioning its use helps to drive the important aspects, and third the essence (its fundamental concept and guiding idea) must be clear. We will use case studies that have focused on the use of the Five Design-Sheet method (FdS) as a way to sketch and plan a visualisation, before successfully implementing and creating the visual poster. The case studies serve as a practical illustration of the workflow, offering a means to explain the three key processes involved: (1) comprehending the data, (2) employing a design study with the FdS (Five Design-Sheet), (3) crafting, evaluating and refining the visualisation.

Submitted 19 August, 2024; originally announced August 2024.

Comments: 5 pages, 1 figure, accepted for publication to the EG UK Computer Graphics & Visual Computing (CGVC) 2024

ACM Class: I.3.8; K.3.0

arXiv:2408.06507 [pdf]

Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset

Authors: Stefano Puliti, Emily R. Lines, Jana Müllerová, Julian Frey, Zoe Schindler, Adrian Straker, Matthew J. Allen, Lukas Winiwarter, Nataliia Rehush, Hristina Hristova, Brent Murray, Kim Calders, Louise Terryn, Nicholas Coops, Bernhard Höfle, Samuli Junttila, Martin Krůček, Grzegorz Krok, Kamil Král, Shaun R. Levick, Linda Luck, Azim Missarov, Martin Mokroš, Harry J. F. Owen, Krzysztof Stereńczak, et al. (8 additional authors not shown)

Abstract: Proximally-sensed laser scanning offers significant potential for automated forest data capture, but challenges remain in automatically identifying tree species without additional ground data. Deep learning (DL) shows promise for automation, yet progress is slowed by the lack of large, diverse, openly available labeled datasets of single tree point clouds. This has impacted the robustness of DL models and the ability to establish best practices for species classification. To overcome these challenges, the FOR-species20K benchmark dataset was created, comprising over 20,000 tree point clouds from 33 species, captured using terrestrial (TLS), mobile (MLS), and drone laser scanning (ULS) across various European forests, with some data from other regions. This dataset enables the benchmarking of DL models for tree species classification, including both point cloud-based (PointNet++, MinkNet, MLP-Mixer, DGCNNs) and multi-view image-based methods (SimpleView, DetailView, YOLOv5). 2D image-based models generally performed better (average OA = 0.77) than 3D point cloud-based models (average OA = 0.72), with consistent results across different scanning platforms and sensors. The top model, DetailView, was particularly robust, handling data imbalances well and generalizing effectively across tree sizes. The FOR-species20K dataset, available at https://zenodo.org/records/13255198, is a key resource for developing and benchmarking DL models for tree species classification using laser scanning data, providing a foundation for future advancements in the field.

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.03319 [pdf, other]

Training LLMs to Recognize Hedges in Spontaneous Narratives

Authors: Amie J. Paige, Adil Soubki, John Murzaku, Owen Rambow, Susan E. Brennan

Abstract: Hedges allow speakers to mark utterances as provisional, whether to signal non-prototypicality or "fuzziness", to indicate a lack of commitment to an utterance, to attribute responsibility for a statement to someone else, to invite input from a partner, or to soften critical feedback in the service of face-management needs. Here we focus on hedges in an experimentally parameterized corpus of 63 Roadrunner cartoon narratives spontaneously produced from memory by 21 speakers for co-present addressees, transcribed to text (Galati and Brennan, 2010). We created a gold standard of hedges annotated by human coders (the Roadrunner-Hedge corpus) and compared three LLM-based approaches for hedge detection: fine-tuning BERT, and zero and few-shot prompting with GPT-4o and LLaMA-3. The best-performing approach was a fine-tuned BERT model, followed by few-shot GPT-4o. After an error analysis on the top performing approaches, we used an LLM-in-the-Loop approach to improve the gold standard coding, as well as to highlight cases in which hedges are ambiguous in linguistically interesting ways that will guide future research. This is the first step in our research program to train LLMs to interpret and generate collateral signals appropriately and meaningfully in conversation.

Submitted 6 August, 2024; originally announced August 2024.

Comments: Amie Paige, Adil Soubki, and John Murzaku contributed equally to this study

ACM Class: I.2.7

Journal ref: SIGDIAL 2024

arXiv:2408.02798 [pdf, other]

Examining Gender and Power on Wikipedia Through Face and Politeness

Authors: Adil Soubki, Shyne Choi, Owen Rambow

Abstract: We propose a framework for analyzing discourse by combining two interdependent concepts from sociolinguistic theory: face acts and politeness. While politeness has robust existing tools and data, face acts are less resourced. We introduce a new corpus created by annotating Wikipedia talk pages with face acts and we use this to train a face act tagger. We then employ our framework to study how face and politeness interact with gender and power in discussions between Wikipedia editors. Among other findings, we observe that female Wikipedians are not only more polite, which is consistent with prior studies, but that this difference corresponds with significantly more language directed at humbling aspects of their own face. Interestingly, the distinction nearly vanishes once limiting to editors with administrative power.

Submitted 5 August, 2024; originally announced August 2024.

Journal ref: SIGDIAL 2024

arXiv:2407.18712 [pdf, other]

Cluster-norm for Unsupervised Probing of Knowledge

Authors: Walter Laurito, Sharan Maiya, Grégoire Dhimoïla, Owen Yeung, Kaarel Hänni

Abstract: The deployment of language models brings challenges in generating reliable information, especially when these models are fine-tuned using human preferences. To extract encoded knowledge without (potentially) biased human labels, unsupervised probing techniques like Contrast-Consistent Search (CCS) have been developed (Burns et al., 2022). However, salient but unrelated features in a given dataset can mislead these probes (Farquhar et al., 2023). Addressing this, we propose a cluster normalization method to minimize the impact of such features by clustering and normalizing activations of contrast pairs before applying unsupervised probing techniques. While this approach does not address the issue of differentiating between knowledge in general and simulated knowledge - a major issue in the literature of latent knowledge elicitation (Christiano et al., 2021) - it significantly improves the ability of unsupervised probes to identify the intended knowledge amidst distractions.

Submitted 26 July, 2024; originally announced July 2024.

Comments: 34 pages, 35 figures

arXiv:2407.16593 [pdf, other]

A Comparative Study on Patient Language across Therapeutic Domains for Effective Patient Voice Classification in Online Health Discussions

Authors: Giorgos Lysandrou, Roma English Owen, Vanja Popovic, Grant Le Brun, Aryo Pradipta Gema, Beatrice Alex, Elizabeth A. L. Fairley

Abstract: There exists an invisible barrier between healthcare professionals' perception of a patient's clinical experience and the reality. This barrier may be induced by the environment that hinders patients from sharing their experiences openly with healthcare professionals. As patients are observed to discuss and exchange knowledge more candidly on social media, valuable insights can be leveraged from these platforms. However, the abundance of non-patient posts on social media necessitates filtering out such irrelevant content to distinguish the genuine voices of patients, a task we refer to as patient voice classification. In this study, we analyse the importance of linguistic characteristics in accurately classifying patient voices. Our findings underscore the essential role of linguistic and statistical text similarity analysis in identifying common patterns among patient groups. These results allude to even starker differences in the way patients express themselves at a disease level and across various therapeutic domains. Additionally, we fine-tuned a pre-trained Language Model on the combined datasets with similar linguistic patterns, resulting in a highly accurate automatic patient voice classification. Being the pioneering study on the topic, our focus on extracting authentic patient experiences from social media stands as a crucial step towards advancing healthcare standards and fostering a patient-centric approach.

Submitted 23 July, 2024; originally announced July 2024.

Comments: 14 pages, 4 figures, 5 tables, funded by Talking Medicines Limited

arXiv:2407.11894 [pdf, other]

Deep Learning without Global Optimization by Random Fourier Neural Networks

Authors: Owen Davis, Gianluca Geraci, Mohammad Motamed

Abstract: We introduce a new training algorithm for a variety of deep neural networks that utilize random complex exponential activation functions. Our approach employs a Markov Chain Monte Carlo sampling procedure to iteratively train network layers, avoiding global and gradient-based optimization while maintaining error control. It consistently attains the theoretical approximation rate for residual networks with complex exponential activation functions, determined by network complexity. Additionally, it enables efficient learning of multiscale and high-frequency features, producing interpretable parameter distributions. Despite using sinusoidal basis functions, we do not observe Gibbs phenomena in approximating discontinuous target functions.

Submitted 16 July, 2024; originally announced July 2024.

MSC Class: 65T40; 90C15; 65C05; 65C40; 60J22; 68T07

arXiv:2407.08162 [pdf, other]

Improving Visual Place Recognition Based Robot Navigation Through Verification of Localization Estimates

Authors: Owen Claxton, Connor Malone, Helen Carson, Jason Ford, Gabe Bolton, Iman Shames, Michael Milford

Abstract: Visual Place Recognition (VPR) systems often have imperfect performance, which affects robot navigation decisions. This research introduces a novel Multi-Layer Perceptron (MLP) integrity monitor for VPR which demonstrates improved performance and generalizability over the previous state-of-the-art SVM approach, removing per-environment training and reducing manual tuning requirements. We test our proposed system in extensive real-world experiments, where we also present two real-time integrity-based VPR verification methods: an instantaneous rejection method for a robot navigating to a goal zone (Experiment 1); and a historical method that takes a best, verified, match from its recent trajectory and uses an odometer to extrapolate forwards to a current position estimate (Experiment 2). Noteworthy results for Experiment 1 include a decrease in aggregate mean along-track goal error from ~9.8m to ~3.1m in missions the robot pursued to completion, and an increase in the aggregate rate of successful mission completion from ~41% to ~55%. Experiment 2 showed a decrease in aggregate mean along-track localization error from ~2.0m to ~0.5m, and an increase in the aggregate precision of localization attempts from ~97% to ~99%. Overall, our results demonstrate the practical usefulness of a VPR integrity monitor in real-world robotics to improve VPR localization and consequent navigation performance.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Currently Under Review
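The historical method summarised in the abstract above (Experiment 2) can be illustrated with a small sketch. This is not the authors' implementation: the `VprMatch` record, the `score` field, the one-dimensional along-track position, and the rule "highest score among verified matches" are all assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VprMatch:
    """Hypothetical record of one VPR match in the recent trajectory."""
    odometer_m: float   # odometer reading when the match was made
    position_m: float   # along-track position reported by VPR
    score: float        # match quality, higher is better (assumed)
    verified: bool      # True if the integrity monitor accepted the match


def extrapolate_position(history: Sequence[VprMatch],
                         current_odometer_m: float) -> Optional[float]:
    """Estimate the current position from the best verified match.

    Returns None when no match in the history passed verification.
    """
    verified = [m for m in history if m.verified]
    if not verified:
        return None  # nothing trustworthy to extrapolate from
    best = max(verified, key=lambda m: m.score)
    # Add the distance travelled since the chosen match to its position.
    return best.position_m + (current_odometer_m - best.odometer_m)


history = [
    VprMatch(odometer_m=10.0, position_m=12.0, score=0.9, verified=False),
    VprMatch(odometer_m=14.0, position_m=13.5, score=0.7, verified=True),
    VprMatch(odometer_m=18.0, position_m=17.0, score=0.6, verified=True),
]
estimate = extrapolate_position(history, current_odometer_m=20.0)
print(estimate)
```

The highest-scoring match is skipped because it failed verification, so the estimate is built from the second entry plus the 6 m travelled since then.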

arXiv:2407.05206 [pdf, other]

Helios: An extremely low power event-based gesture recognition for always-on smart eyewear

Authors: Prarthana Bhattacharyya, Joshua Mitton, Ryan Page, Owen Morgan, Ben Menzies, Gabriel Homewood, Kemi Jacobs, Paolo Baesso, David Trickett, Chris Mair, Taru Muhonen, Rory Clark, Louis Berridge, Richard Vigars, Iain Wallace

Abstract: This paper introduces Helios, the first extremely low-power, real-time, event-based hand gesture recognition system designed for all-day on smart eyewear. As augmented reality (AR) evolves, current smart glasses like the Meta Ray-Bans prioritize visual and wearable comfort at the expense of functionality. Existing human-machine interfaces (HMIs) in these devices, such as capacitive touch and voice controls, present limitations in ergonomics, privacy and power consumption. Helios addresses these challenges by leveraging natural hand interactions for a more intuitive and comfortable user experience. Our system utilizes an extremely low-power and compact 3mmx4mm/20mW event camera to perform natural hand-based gesture recognition for always-on smart eyewear. The camera's output is processed by a convolutional neural network (CNN) running on an NXP Nano UltraLite compute platform, consuming less than 350mW. Helios can recognize seven classes of gestures, including subtle microgestures like swipes and pinches, with 91% accuracy. We also demonstrate real-time performance across 20 users at a remarkably low latency of 60ms. Our user testing results align with the positive feedback we received during our recent successful demo at AWE-USA-2024.

Submitted 26 August, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

Comments: Accepted at ECCV-Integrating Computer Vision in Smart Eyewear, 2024. 18 pages, 10 figures. First three authors contributed equally to this paper

arXiv:2407.04915 [pdf, other]

Safe Generative Chats in a WhatsApp Intelligent Tutoring System

Authors: Zachary Levonian, Owen Henkel

Abstract: Large language models (LLMs) are flexible, personalizable, and available, which makes their use within Intelligent Tutoring Systems (ITSs) appealing. However, that flexibility creates risks: inaccuracies, harmful content, and non-curricular material. Ethically deploying LLM-backed ITS systems requires designing safeguards that ensure positive experiences for students. We describe the design of a conversational system integrated into an ITS, and our experience evaluating its safety with red-teaming, an in-classroom usability test, and field deployment. We present empirical data from more than 8,000 student conversations with this system, finding that GPT-3.5 rarely generates inappropriate messages. Comparatively more common are inappropriate messages from students, which prompts us to reason about safeguarding as a content moderation and classroom management problem. The student interaction behaviors we observe provide implications for designers - to focus on student inputs as a content moderation problem - and implications for researchers - to focus on subtle forms of bad content.

Submitted 5 July, 2024; originally announced July 2024.

Comments: EDM 2024 LLM Workshop

arXiv:2407.04153 [pdf, other]

Mixture of A Million Experts

Authors: Xu Owen He

Abstract: The feedforward (FFW) layers in standard transformer architectures incur a linear increase in computational costs and activation memory as the hidden layer width grows. Sparse mixture-of-experts (MoE) architectures have emerged as a viable approach to address this issue by decoupling model size from computational cost. The recent discovery of the fine-grained MoE scaling law shows that higher granularity leads to better performance. However, existing MoE models are limited to a small number of experts due to computational and optimization challenges. This paper introduces PEER (parameter efficient expert retrieval), a novel layer design that utilizes the product key technique for sparse retrieval from a vast pool of tiny experts (over a million). Experiments on language modeling tasks demonstrate that PEER layers outperform dense FFWs and coarse-grained MoEs in terms of performance-compute trade-off. By enabling efficient utilization of a massive number of experts, PEER unlocks the potential for further scaling of transformer models while maintaining computational efficiency.

Submitted 4 July, 2024; originally announced July 2024.
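The product key technique named in the abstract above can be sketched in a few lines. This is a generic illustration of product-key top-k retrieval only, not the PEER layer itself; the function names, the plain-list vectors, and the dot-product scoring are assumptions made here. The idea is that each of the n × n expert keys is the concatenation of one sub-key from each of two small sets, so the top-k experts can be found by scoring 2n sub-keys and then combining at most k × k candidates.

```python
import heapq
from itertools import product
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(scores: Sequence[float], k: int) -> List[Tuple[float, int]]:
    """Return the k largest (score, index) pairs, best first."""
    return heapq.nlargest(k, ((s, i) for i, s in enumerate(scores)))


def product_key_top_k(query: Vector,
                      sub_keys_a: Sequence[Vector],
                      sub_keys_b: Sequence[Vector],
                      k: int) -> List[Tuple[float, int]]:
    """Top-k (score, expert_index) over all len(a) * len(b) product keys.

    The score of expert (i, j) is the sum of the two half-query scores,
    so its top-k set is contained in top-k(a) x top-k(b).
    """
    half = len(query) // 2
    q_a, q_b = query[:half], query[half:]
    best_a = top_k([dot(q_a, key) for key in sub_keys_a], k)
    best_b = top_k([dot(q_b, key) for key in sub_keys_b], k)
    n_b = len(sub_keys_b)
    # Flatten the (i, j) pair into a single expert index i * n_b + j.
    candidates = [(sa + sb, ia * n_b + ib)
                  for (sa, ia), (sb, ib) in product(best_a, best_b)]
    return heapq.nlargest(k, candidates)


sub_keys_a = [[1, 0], [0, 1], [1, 1]]
sub_keys_b = [[1, 0], [0, 1], [-1, 0]]
query = [2, 1, 0, 3]
result = product_key_top_k(query, sub_keys_a, sub_keys_b, k=2)
print(result)
```

With n sub-keys per set, this scores 2n sub-keys and k × k candidates instead of all n × n expert keys, which is what makes retrieval from a very large expert pool tractable.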

arXiv:2406.15649

Efficient Human Pose Estimation: Leveraging Advanced Techniques with MediaPipe

Authors: Sandeep Singh Sengar, Abhishek Kumar, Owen Singh

Abstract: This study presents significant enhancements in human pose estimation using the MediaPipe framework. The research focuses on improving accuracy, computational efficiency, and real-time processing capabilities by comprehensively optimising the underlying algorithms. Novel modifications are introduced that substantially enhance pose estimation accuracy across challenging scenarios, such as dynamic movements and partial occlusions. The improved framework is benchmarked against traditional models, demonstrating considerable precision and computational speed gains. The advancements have wide-ranging applications in augmented reality, sports analytics, and healthcare, enabling more immersive experiences, refined performance analysis, and advanced patient monitoring. The study also explores the integration of these enhancements within mobile and embedded systems, addressing the need for computational efficiency and broader accessibility. The implications of this research set a new benchmark for real-time human pose estimation technologies and pave the way for future innovations in the field. The implementation code for the paper is available at https://github.com/avhixd/Human_pose_estimation.

Submitted 13 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

Comments: There is an error in this work. By mistake, the angle in Section 3.3 is calculated wrongly

arXiv:2406.15646 [pdf, other]

VigilEye -- Artificial Intelligence-based Real-time Driver Drowsiness Detection

Authors: Sandeep Singh Sengar, Aswin Kumar, Owen Singh

Abstract: This study presents a novel driver drowsiness detection system that combines deep learning techniques with the OpenCV framework. The system utilises facial landmarks extracted from the driver's face as input to Convolutional Neural Networks trained to recognise drowsiness patterns. The integration of OpenCV enables real-time video processing, making the system suitable for practical implementation. Extensive experiments on a diverse dataset demonstrate high accuracy, sensitivity, and specificity in detecting drowsiness. The proposed system has the potential to enhance road safety by providing timely alerts to prevent accidents caused by driver fatigue. This research contributes to advancing real-time driver monitoring systems and has implications for automotive safety and intelligent transportation systems. The successful application of deep learning techniques in this context opens up new avenues for future research in driver monitoring and vehicle safety. The implementation code for the paper is available at https://github.com/LUFFY7001/Driver-s-Drowsiness-Detection.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.15317 [pdf, other]

Diverse beam search to find densest-known planar unit distance graphs

Authors: Peter Engel, Owen Hammond-Lee, Yiheng Su, Dániel Varga, Pál Zsámboki

Abstract: This paper addresses the problem of determining the maximum number of edges in a unit distance graph (UDG) of $n$ vertices using computer search. An unsolved problem of Paul Erdős asks the maximum number of edges $u(n)$ a UDG of $n$ vertices can have. Those UDGs that attain $u(n)$ are called "maximally dense." In this paper, we seek to demonstrate a computer algorithm to generate dense UDGs for vertex counts up to at least 100. Via beam search with an added visitation metric, our algorithm finds all known maximally dense UDGs up to isomorphism at the push of a button. In addition, for $15 < n$, where $u(n)$ is unknown, i) the algorithm finds all previously published densest UDGs up to isomorphism for $15 < n \le 30$, and ii) the rate of growth of $u(n)/n$ remains similar for $30 < n$. The code and database of over 60 million UDGs found by our algorithm will be open-sourced at time of publication.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.13961 [pdf, other]

Equivariant Offline Reinforcement Learning

Authors: Arsh Tangri, Ondrej Biza, Dian Wang, David Klee, Owen Howell, Robert Platt

Abstract: Sample efficiency is critical when applying learning-based methods to robotic manipulation due to the high cost of collecting expert demonstrations and the challenges of on-robot policy learning through online Reinforcement Learning (RL). Offline RL addresses this issue by enabling policy learning from an offline dataset collected using any behavioral policy, regardless of its quality. However, recent advancements in offline RL have predominantly focused on learning from large datasets. Given that many robotic manipulation tasks can be formulated as rotation-symmetric problems, we investigate the use of $SO(2)$-equivariant neural networks for offline RL with a limited number of demonstrations. Our experimental results show that equivariant versions of Conservative Q-Learning (CQL) and Implicit Q-Learning (IQL) outperform their non-equivariant counterparts. We provide empirical evidence demonstrating how equivariance improves offline learning algorithms in the low-data regime.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12800 [pdf, other]

Supporting Human Raters with the Detection of Harmful Content using Large Language Models

Authors: Kurt Thomas, Patrick Gage Kelley, David Tao, Sarah Meiklejohn, Owen Vallis, Shunwen Tan, Blaž Bratanič, Felipe Tiengo Ferreira, Vijay Kumar Eranti, Elie Bursztein

Abstract: In this paper, we explore the feasibility of leveraging large language models (LLMs) to automate or otherwise assist human raters with identifying harmful content including hate speech, harassment, violent extremism, and election misinformation. Using a dataset of 50,000 comments, we demonstrate that LLMs can achieve 90% accuracy when compared to human verdicts. We explore how to best leverage these capabilities, proposing five design patterns that integrate LLMs with human rating, such as pre-filtering non-violative content, detecting potential errors in human rating, or surfacing critical context to support human rating. We outline how to support all of these design patterns using a single, optimized prompt. Beyond these synthetic experiments, we share how piloting our proposed techniques in a real-world review queue yielded a 41.5% improvement in optimizing available human rater capacity, and a 9--11% increase (absolute) in precision and recall for detecting violative content.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12131 [pdf, other]

Gram2Vec: An Interpretable Document Vectorizer

Authors: Peter Zeng, Eric Sclafani, Owen Rambow

Abstract: We present Gram2Vec, a grammatical style embedding algorithm that embeds documents into a higher dimensional space by extracting the normalized relative frequencies of grammatical features present in the text. Compared to neural approaches, Gram2Vec offers inherent interpretability based on how the feature vectors are generated. In our demo, we present a way to visualize a mapping of authors to documents based on their Gram2Vec vectors and highlight the ability to drop or add features to view which authors make certain linguistic choices. Next, we use authorship attribution as an application to show how Gram2Vec can explain why a document is attributed to a certain author, using cosine similarities between the Gram2Vec feature vectors to calculate the distances between candidate documents and a query document.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 6 pages, 2 figures
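
The Gram2Vec abstract above describes two steps: turning counts of grammatical features into normalized relative frequencies, and comparing documents by cosine similarity. The following is a minimal sketch of those two steps under stated assumptions, not the Gram2Vec code itself: the feature names (`NOUN`, `VERB`, `ADJ`) and the pre-computed count dictionaries are hypothetical placeholders, and a real pipeline would obtain the counts from a tagger or parser.

```python
import math


def relative_frequencies(feature_counts, vocabulary):
    """Map raw feature counts to relative frequencies over a fixed vocabulary.

    Each position in the result corresponds to one named feature, which is
    what keeps the vector interpretable.
    """
    total = sum(feature_counts.get(f, 0) for f in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [feature_counts.get(f, 0) / total for f in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


# Hypothetical feature vocabulary and counts, for illustration only.
VOCAB = ["NOUN", "VERB", "ADJ"]
query_doc = relative_frequencies({"NOUN": 6, "VERB": 3, "ADJ": 1}, VOCAB)
candidate = relative_frequencies({"NOUN": 12, "VERB": 6, "ADJ": 2}, VOCAB)
similarity = cosine_similarity(query_doc, candidate)
```

Because the two toy documents have the same feature proportions, their vectors coincide and the similarity is 1 up to floating-point rounding, regardless of document length.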

arXiv:2406.10786 [pdf, other]

Evaluating LLMs with Multiple Problems at once: A New Paradigm for Probing LLM Capabilities

Authors: Zhengxiang Wang, Jordan Kodner, Owen Rambow

Abstract: Current LLM evaluation predominantly performs evaluation with prompts comprising single problems. We propose multi-problem evaluation as an additional approach to study the multiple problem handling capabilities of LLMs. We present a systematic study in this regard by comprehensively examining 7 LLMs on 4 related types of tasks constructed from 6 classification benchmarks. The 4 task types include traditional single-problem tasks, homogeneous multi-problem tasks, and two index selection tasks that embed the multi-problem tasks. We find that LLMs are competent multi-problem solvers: they generally perform (nearly) as well on multi-problem tasks as on single-problem tasks. Furthermore, contrary to common expectation, they often do not suffer from a positional bias with long inputs. This makes multi-problem prompting a simple and cost-efficient prompting method of practical significance. However, our results also strongly indicate that LLMs lack true understanding: they perform significantly worse in the two index selection tasks than in the multi-problem task under various evaluation settings, although they can indeed do index selection in general.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 20 pages, 15 figures, 9 tables

arXiv:2406.07466 [pdf, other]

Multimodal Belief Prediction

Authors: John Murzaku, Adil Soubki, Owen Rambow

Abstract: Recognizing a speaker's level of commitment to a belief is a difficult task; humans do not only interpret the meaning of the words in context, but also understand cues from intonation and other aspects of the audio signal. Many papers and corpora in the NLP community have approached the belief prediction task using text-only approaches. We are the first to frame and present results on the multimodal belief prediction task. We use the CB-Prosody corpus (CBP), containing aligned text and audio with speaker belief annotations. We first report baselines and significant features using acoustic-prosodic features and traditional machine learning methods. We then present text and audio baselines for the CBP corpus fine-tuning on BERT and Whisper respectively. Finally, we present our multimodal architecture which fine-tunes on BERT and Whisper and uses multiple fusion methods, improving on both modalities alone.

Submitted 11 June, 2024; originally announced June 2024.

Comments: John Murzaku and Adil Soubki contributed equally to this work

Journal ref: Interspeech 2024

arXiv:2406.07263 [pdf, other]

Active learning for affinity prediction of antibodies

Authors: Alexandra Gessner, Sebastian W. Ober, Owen Vickery, Dino Oglić, Talip Uçar

Abstract: The primary objective of most lead optimization campaigns is to enhance the binding affinity of ligands. For large molecules such as antibodies, identifying mutations that enhance antibody affinity is particularly challenging due to the combinatorial explosion of potential mutations. When the structure of the antibody-antigen complex is available, relative binding free energy (RBFE) methods can offer valuable insights into how different mutations will impact the potency and selectivity of a drug candidate, thereby reducing the reliance on costly and time-consuming wet-lab experiments. However, accurately simulating the physics of large molecules is computationally intensive. We present an active learning framework that iteratively proposes promising sequences for simulators to evaluate, thereby accelerating the search for improved binders. We explore different modeling approaches to identify the most effective surrogate model for this task, and evaluate our framework both using pre-computed pools of data and in a realistic full-loop setting.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06576 [pdf, other]

OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step

Authors: Owen Dugan, Donato Manuel Jimenez Beneto, Charlotte Loh, Zhuo Chen, Rumen Dangovski, Marin Soljačić

Abstract: Despite significant advancements in text generation and reasoning, Large Language Models (LLMs) still face challenges in accurately performing complex arithmetic operations. Language model systems often enable LLMs to generate code for arithmetic operations to achieve accurate calculations. However, this approach compromises speed and security, and fine-tuning risks the language model losing prior capabilities. We propose a framework that enables exact arithmetic in a single autoregressive step, providing faster, more secure, and more interpretable LLM systems with arithmetic capabilities. We use the hidden states of an LLM to control a symbolic architecture that performs arithmetic. Our implementation using Llama 3 with OccamNet as a symbolic model (OccamLlama) achieves 100% accuracy on single arithmetic operations ($+,-,\times,÷,\sin{},\cos{},\log{},\exp{},\sqrt{}$), outperforming GPT 4o with and without a code interpreter. Furthermore, OccamLlama outperforms GPT 4o with and without a code interpreter on average across a range of mathematical problem solving benchmarks, demonstrating that OccamLLMs can excel in arithmetic tasks, even surpassing much larger models. We will make our code public shortly.

Submitted 2 September, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.04109 [pdf, other]

Intention and Face in Dialog

Authors: Adil Soubki, Owen Rambow

Abstract: The notion of face described by Brown and Levinson (1987) has been studied in great detail, but a critical aspect of the framework, that which focuses on how intentions mediate the planning of turns which impose upon face, has received far less attention. We present an analysis of three computational systems trained for classifying both intention and politeness, focusing on how the former influences the latter. In politeness theory, agents attend to the desire to have their wants appreciated (positive face), and a complementary desire to act unimpeded and maintain freedom (negative face). Similar to speech acts, utterances can perform so-called face acts which can either raise or threaten the positive or negative face of the speaker or hearer. We begin by using an existing corpus to train a model which classifies face acts, achieving a new SoTA in the process. We then observe that every face act has an underlying intention that motivates it and perform additional experiments integrating dialog act annotations to provide these intentions by proxy. Our analysis finds that dialog acts improve performance on face act detection for minority classes and points to a close relationship between aspects of face and intent.

Submitted 6 June, 2024; originally announced June 2024.

Journal ref: May 2024. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 9143-9153, Torino, Italia. ELRA and ICCL

arXiv:2406.03476 [pdf, other]

Does your data spark joy? Performance gains from domain upsampling at the end of training

Authors: Cody Blakeney, Mansheej Paul, Brett W. Larsen, Sean Owen, Jonathan Frankle

Abstract: Pretraining datasets for large language models (LLMs) have grown to trillions of tokens composed of large amounts of CommonCrawl (CC) web scrape along with smaller, domain-specific datasets. It is expensive to understand the impact of these domain-specific datasets on model capabilities as training at large FLOP scales is required to reveal significant changes to difficult and emergent benchmarks. Given the increasing cost of experimenting with pretraining data, how does one determine the optimal balance between the diversity in general web scrapes and the information density of domain specific data? In this work, we show how to leverage the smaller domain specific datasets by upsampling them relative to CC at the end of training to drive performance improvements on difficult benchmarks. This simple technique allows us to improve up to 6.90 pp on MMLU, 8.26 pp on GSM8K, and 6.17 pp on HumanEval relative to the base data mix for a 7B model trained for 1 trillion (T) tokens, thus rivaling Llama-2 (7B), a model trained for twice as long. We experiment with ablating the duration of domain upsampling from 5% to 30% of training and find that 10% to 20% is optimal for navigating the tradeoff between general language modeling capabilities and targeted benchmarks. We also use domain upsampling to characterize at scale the utility of individual datasets for improving various benchmarks by removing them during this final phase of training. This tool opens up the ability to experiment with the impact of different pretraining datasets at scale, but at an order of magnitude lower cost compared to full pretraining runs.

Submitted 5 June, 2024; originally announced June 2024.

Comments: The first three authors contributed equally
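
The domain upsampling abstract above describes switching from a base data mix to one that upweights domain-specific datasets for the final portion of training. One way to picture this is as a two-phase sampling schedule. The sketch below is illustrative only: the function name, the dataset names, and every mixture weight are hypothetical, and only the 10% to 20% range for the final phase comes from the abstract.

```python
def mixture_at(progress, base_mix, upsampled_mix, upsample_fraction=0.2):
    """Return the sampling weights to use at a given point in training.

    progress is the fraction of training completed, in [0, 1]. The base mix
    is used until the last `upsample_fraction` of training, after which the
    domain-upsampled mix takes over.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be between 0 and 1")
    if progress >= 1.0 - upsample_fraction:
        return upsampled_mix
    return base_mix


# Hypothetical weights for illustration; each mix sums to 1.
BASE_MIX = {"web": 0.90, "code": 0.05, "math": 0.05}
UPSAMPLED_MIX = {"web": 0.50, "code": 0.25, "math": 0.25}

early = mixture_at(0.50, BASE_MIX, UPSAMPLED_MIX)  # still on the base mix
late = mixture_at(0.90, BASE_MIX, UPSAMPLED_MIX)   # inside the final 20%
```

Dropping one dataset from the hypothetical `UPSAMPLED_MIX` and renormalizing would correspond to the ablation the abstract mentions, where individual datasets are removed during the final phase to measure their contribution.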

arXiv:2406.00132 [pdf, other]

QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation

Authors: Zhuo Chen, Rumen Dangovski, Charlotte Loh, Owen Dugan, Di Luo, Marin Soljačić

Abstract: We propose Quantum-informed Tensor Adaptation (QuanTA), a novel, easy-to-implement, fine-tuning method with no inference overhead for large-scale pre-trained language models. By leveraging quantum-inspired methods derived from quantum circuit structures, QuanTA enables efficient high-rank fine-tuning, surpassing the limitations of Low-Rank Adaptation (LoRA)--low-rank approximation may fail for complicated downstream tasks. Our approach is theoretically supported by the universality theorem and the rank representation theorem to achieve efficient high-rank adaptations. Experiments demonstrate that QuanTA significantly enhances commonsense reasoning, arithmetic reasoning, and scalability compared to traditional methods. Furthermore, QuanTA shows superior performance with fewer trainable parameters compared to other approaches and can be designed to integrate with existing fine-tuning algorithms for further improvement, providing a scalable and efficient solution for fine-tuning large language models and advancing state-of-the-art in natural language processing.

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.21015 [pdf, other]

The rising costs of training frontier AI models

Authors: Ben Cottier, Robi Rahman, Loredana Fattorini, Nestor Maslej, David Owen

Abstract: The costs of training frontier AI models have grown dramatically in recent years, but there is limited public data on the magnitude and growth of these expenses. This paper develops a detailed cost model to address this gap, estimating training costs using three approaches that account for hardware, energy, cloud rental, and staff expenses. The analysis reveals that the amortized cost to train the most compute-intensive models has grown precipitously at a rate of 2.4x per year since 2016 (95% CI: 2.0x to 3.1x). For key frontier models, such as GPT-4 and Gemini, the most significant expenses are AI accelerator chips and staff costs, each costing tens of millions of dollars. Other notable costs include server components (15-22%), cluster-level interconnect (9-13%), and energy consumption (2-6%). If the trend of growing development costs continues, the largest training runs will cost more than a billion dollars by 2027, meaning that only the most well-funded organizations will be able to finance frontier AI models.

Submitted 31 May, 2024; originally announced May 2024.
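
The training-cost abstract above reports growth of 2.4x per year. To make that rate concrete, the short sketch below converts an annual growth factor into a doubling time and compounds it over several years. It is simple exponential-growth arithmetic rather than the paper's cost model, and the function names are chosen for this example.

```python
import math


def doubling_time_years(growth_per_year):
    """Years needed for a quantity to double at a constant annual factor."""
    return math.log(2) / math.log(growth_per_year)


def growth_multiple(growth_per_year, years):
    """Total multiplicative increase after compounding for `years` years."""
    return growth_per_year ** years


central = doubling_time_years(2.4)   # about 0.79 years, roughly 9.5 months
two_year = growth_multiple(2.4, 2)   # 2.4 * 2.4, about 5.76x
```

At the lower end of the reported confidence interval (2.0x per year) the doubling time is exactly one year, so the central estimate implies costs doubling somewhat faster than annually.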

arXiv:2405.18600 [pdf, other]

OpenConvoy: Universal Platform for Real-World Testing of Cooperative Driving Systems

Authors: Owen Burns, Hossein Maghsoumi, Yaser Fallah, Israel Charles

Abstract: Cooperative driving, enabled by communication between automated vehicle systems, promises significant benefits to fuel efficiency, road capacity, and safety over single-vehicle driver assistance systems such as adaptive cruise control (ACC). However, the responsible development and implementation of these algorithms pose substantial challenges due to the need for extensive real-world testing. We address this issue and introduce OpenConvoy, an open and extensible framework designed for the implementation and assessment of cooperative driving policies on physical connected and autonomous vehicles (CAVs). We demonstrate the capabilities of OpenConvoy through a series of experiments on a convoy of multi-scale vehicles controlled by Platooning to show the stability of our system across vehicle configurations and its ability to effectively measure convoy cohesion across driving scenarios including varying degrees of communication loss.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 7 pages, 8 figures

arXiv:2405.17813 [pdf, other]

The Impacts of Data, Ordering, and Intrinsic Dimensionality on Recall in Hierarchical Navigable Small Worlds

Authors: Owen Pendrigh Elliott, Jesse Clark

Abstract: Vector search systems, pivotal in AI applications, often rely on the Hierarchical Navigable Small Worlds (HNSW) algorithm. However, the behaviour of HNSW under real-world scenarios using vectors generated with deep learning models remains under-explored. Existing Approximate Nearest Neighbours (ANN) benchmarks and research typically over-rely on simplistic datasets like MNIST or SIFT1M and fail to reflect the complexity of current use-cases. Our investigation focuses on HNSW's efficacy across a spectrum of datasets, including synthetic vectors tailored to mimic specific intrinsic dimensionalities, widely-used retrieval benchmarks with popular embedding models, and proprietary e-commerce image data with CLIP models. We survey the most popular HNSW vector databases and collate their default parameters to provide a realistic fixed parameterisation for the duration of the paper. We discover that the recall of approximate HNSW search, in comparison to exact K Nearest Neighbours (KNN) search, is linked to the vector space's intrinsic dimensionality and significantly influenced by the data insertion sequence. Our methodology highlights how insertion order, informed by measurable properties such as the pointwise Local Intrinsic Dimensionality (LID) or known categories, can shift recall by up to 12 percentage points. We also observe that running popular benchmark datasets with HNSW instead of KNN can shift rankings by up to three positions for some models. This work underscores the need for more nuanced benchmarks and design considerations in developing robust vector search systems using approximate vector search algorithms. This study presents a number of scenarios with varying real world applicability which aim to better increase understanding and future development of ANN algorithms and embedding

Submitted 28 May, 2024; originally announced May 2024.

Comments: 15 pages, 2 figures
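
The HNSW abstract above measures the recall of approximate search against exact K Nearest Neighbours search. A minimal sketch of that metric follows, assuming squared Euclidean distance and a brute-force exact search; the approximate result list is supplied by hand here, since building an HNSW index is outside the scope of this example, and the function names are illustrative.

```python
def exact_knn(query, vectors, k):
    """Indices of the k vectors closest to `query` (squared Euclidean)."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(query, v)), i)
        for i, v in enumerate(vectors)
    )
    return [i for _, i in distances[:k]]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)


# Toy data: four 2-D points and a query at the origin.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
truth = exact_knn([0.0, 0.0], points, 2)   # the two nearest points
score = recall_at_k([0, 3], truth)         # a hypothetical approximate result
```

A shift of 12 percentage points, as reported in the abstract, would correspond to this value moving by 0.12 when averaged over many queries.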

arXiv:2405.15537 [pdf, ps, other]

Do Not Trust Power Management: Challenges and Hints for Securing Future Trusted Execution Environments

Authors: Owen Le Gonidec, Maria Méndez Real, Guillaume Bouffard, Jean-Christophe Prévotet

Abstract: Over the past few years, several research groups have introduced innovative hardware designs for Trusted Execution Environments (TEEs), aiming to secure applications against potentially compromised privileged software, including the kernel. In 2017, Tang et al. introduced a new class of software-enabled hardware attacks, which leverages energy management mechanisms. These attacks aim at bypassing TEE security guarantees and exposing sensitive information like cryptographic keys. They have increased in prevalence over the past few years. Despite that, current RISC-V TEE architectures have yet to incorporate them into their threat models. Proprietary implementations, such as Arm TrustZone and Intel SGX, embed countermeasures. However, these countermeasures are not viable in the long term and hinder the capabilities of energy management mechanisms. This article presents the first comprehensive knowledge survey of these attacks, along with an evaluation of literature countermeasures. Our analysis highlights a substantial security gap between assumed threat models and the actual ones, presenting considerable threats in modern systems-on-chip that can undermine even the security guarantees provided by TEEs. We advocate for the enhancement of the next generation of RISC-V TEEs to address these attacks within their threat models, and we believe this study will spur further community efforts in this direction.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.13640 [pdf, other]

Knowledge Graph Reasoning with Self-supervised Reinforcement Learning

Authors: Ying Ma, Owen Burns, Mingqiu Wang, Gang Li, Nan Du, Laurent El Shafey, Liqiang Wang, Izhak Shafran, Hagen Soltau

Abstract: Reinforcement learning (RL) is an effective method of finding reasoning pathways in incomplete knowledge graphs (KGs). To overcome the challenges of a large action space, a self-supervised pre-training method is proposed to warm up the policy network before the RL training stage. To alleviate the distributional mismatch issue in general self-supervised RL (SSRL), in our supervised learning (SL) stage, the agent selects actions based on the policy network and learns from generated labels; this self-generation of labels is the intuition behind the name self-supervised. With this training framework, the information density of our SL objective is increased and the agent is prevented from getting stuck with the early rewarded paths. Our self-supervised RL (SSRL) method improves the performance of RL by pairing it with the wide coverage achieved by SL during pretraining, since the breadth of the SL objective makes it infeasible to train an agent with that alone. We show that our SSRL model meets or exceeds current state-of-the-art results on all Hits@k and mean reciprocal rank (MRR) metrics on four large benchmark KG datasets. This SSRL method can be used as a plug-in for any RL architecture for a KGR task. We adopt two RL architectures, i.e., MINERVA and MultiHopKG, as our baseline RL models and experimentally show that our SSRL model consistently outperforms both baselines on all four of these KG reasoning tasks. Full code for the paper is available at https://github.com/owenonline/Knowledge-Graph-Reasoning-with-Self-supervised-Reinforcement-Learning.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 17 pages, 11 figures

arXiv:2405.06727 [pdf, other]

Approximation Error and Complexity Bounds for ReLU Networks on Low-Regular Function Spaces

Authors: Owen Davis, Gianluca Geraci, Mohammad Motamed

Abstract: In this work, we consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks. We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function and inversely proportional to the product of network width and depth. We inherit this approximation error bound from Fourier features residual networks, a type of neural network that uses complex exponential activation functions. Our proof is constructive and proceeds by conducting a careful complexity analysis associated with the approximation of a Fourier features residual network by a ReLU network.

Submitted 10 May, 2024; originally announced May 2024.

MSC Class: 41A25; 41A30; 41A46; 68T07

arXiv:2405.05594 [pdf, other]

Expected Work Search: Combining Win Rate and Proof Size Estimation

Authors: Owen Randall, Martin Müller, Ting Han Wei, Ryan Hayward

Abstract: We propose Expected Work Search (EWS), a new game solving algorithm. EWS combines win rate estimation, as used in Monte Carlo Tree Search, with proof size estimation, as used in Proof Number Search. The search efficiency of EWS stems from minimizing a novel notion of Expected Work, which predicts the expected computation required to solve a position. EWS outperforms traditional solving algorithms on the games of Go and Hex. For Go, we present the first solution to the empty 5x5 board with the commonly used positional superko ruleset. For Hex, our algorithm solves the empty 8x8 board in under 4 minutes. Experiments show that EWS succeeds both with and without extensive domain-specific knowledge.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.04288 [pdf, other]

BetterNet: An Efficient CNN Architecture with Residual Learning and Attention for Precision Polyp Segmentation

Authors: Owen Singh, Sandeep Singh Sengar

Abstract: Colorectal cancer contributes significantly to cancer-related mortality. Timely identification and elimination of polyps through colonoscopy screening is crucial in order to decrease mortality rates. Accurately detecting polyps in colonoscopy images is difficult because of the differences in characteristics such as size, shape, texture, and similarity to surrounding tissues. Current deep-learning methods often face difficulties in capturing long-range connections necessary for segmentation. This research presents BetterNet, a convolutional neural network (CNN) architecture that combines residual learning and attention methods to enhance the accuracy of polyp segmentation. Its primary contributions are: (1) a residual decoder architecture that facilitates efficient gradient propagation and integration of multiscale features; (2) channel and spatial attention blocks within the decoder block that concentrate the learning process on the relevant polyp regions; (3) state-of-the-art performance on polyp segmentation benchmarks while retaining computational efficiency; (4) thorough ablation tests confirming the influence of architectural components; and (5) open-source model code for further contribution. Extensive evaluations conducted on datasets such as Kvasir-SEG, CVC ClinicDB, Endoscene, EndoTect, and Kvasir-Sessile demonstrate that BetterNet outperforms current SOTA models in terms of segmentation accuracy by significant margins. The lightweight design enables real-time inference for various applications. BetterNet shows promise in integrating computer-assisted diagnosis techniques to enhance the detection of polyps and the early recognition of cancer. Link to the code: https://github.com/itsOwen/BetterNet

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.02985 [pdf]

Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability to Mark Short Answer Questions in K-12 Education

Authors: Owen Henkel, Adam Boxer, Libby Hills, Bill Roberts

Abstract: This paper reports on a series of experiments with a novel dataset evaluating how well Large Language Models (LLMs) can mark (i.e. grade) open text responses to short answer questions. Specifically, we explore how well different combinations of GPT version and prompt engineering strategies performed at marking real student answers to short answer questions across different domain areas (Science and History) and grade levels (spanning ages 5-16) using a new, never-used-before dataset from Carousel, a quizzing platform. We found that GPT-4 with basic few-shot prompting performed well (Kappa, 0.70) and, importantly, very close to human-level performance (0.75). This research builds on prior findings that GPT-4 could reliably score short answer reading comprehension questions at a performance level very close to that of expert human raters. The proximity to human-level performance across a variety of subjects and grade levels suggests that LLMs could be a valuable tool for supporting low-stakes formative assessment tasks in K-12 education and has important implications for real-world education delivery.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2404.16767 [pdf, other]

REBEL: Reinforcement Learning via Regressing Relative Rewards

Authors: Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

Abstract: While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g. value networks, clipping), and is notorious for its sensitivity to the precise implementation of these components. In response, we take a step back and ask what a minimalist RL algorithm for the era of generative models would look like. We propose REBEL, an algorithm that cleanly reduces the problem of policy optimization to regressing the relative reward between two completions to a prompt in terms of the policy, enabling strikingly lightweight implementation. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL, which allows us to match the strongest known theoretical guarantees in terms of convergence and sample complexity in the RL literature. REBEL can also cleanly incorporate offline data and be extended to handle the intransitive preferences we frequently see in practice. Empirically, we find that REBEL provides a unified approach to language modeling and image generation with performance stronger than or similar to that of PPO and DPO, all while being simpler to implement and more computationally efficient than PPO. When fine-tuning Llama-3-8B-Instruct, REBEL achieves strong performance in AlpacaEval 2.0, MT-Bench, and Open LLM Leaderboard.
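The idea of "regressing the relative reward between two completions" can be illustrated with a small sketch. The abstract does not give the objective, so everything below is an assumption for illustration: the function name `relative_reward_regression_loss`, the scale parameter `eta`, and the choice to predict the reward gap from the difference of policy log-probability ratios are all hypothetical and should not be read as the paper's definition.

```python
def relative_reward_regression_loss(
    logp_new_a: float,
    logp_old_a: float,
    logp_new_b: float,
    logp_old_b: float,
    reward_a: float,
    reward_b: float,
    eta: float = 1.0,
) -> float:
    """Squared error between a policy-based prediction of the reward gap
    and the observed reward gap for two completions (a, b) of one prompt.

    Assumed form (hypothetical, not taken from the abstract):
    prediction = (1 / eta) * [(logp_new_a - logp_old_a) - (logp_new_b - logp_old_b)]
    """
    # How much the updated policy shifted probability toward a relative to b.
    predicted_gap = ((logp_new_a - logp_old_a) - (logp_new_b - logp_old_b)) / eta
    # How much better completion a scored than completion b.
    observed_gap = reward_a - reward_b
    return (predicted_gap - observed_gap) ** 2


if __name__ == "__main__":
    # Toy numbers: the policy raised log-prob of a by 0.5 and left b unchanged,
    # while a's reward exceeds b's by 0.5, so the prediction matches the target.
    loss = relative_reward_regression_loss(-1.0, -1.5, -2.0, -2.0, 1.0, 0.5)
    print(loss)
```

In this reading, no value network or clipping term appears; the only learned quantity is the policy's log-probability, which is one way the "lightweight implementation" claim could be understood.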

Submitted 1 September, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: New experimental results on general chat

arXiv:2404.11525 [pdf, other]

JointViT: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed OCTA

Authors: Zeyu Zhang, Xuyin Qi, Mingxi Chen, Guangxi Li, Ryan Pham, Ayub Qassim, Ella Berry, Zhibin Liao, Owen Siggs, Robert Mclaughlin, Jamie Craig, Minh-Son To

Abstract: The oxygen saturation level in the blood (SaO2) is crucial for health, particularly in relation to sleep-related breathing disorders. However, continuous monitoring of SaO2 is time-consuming and highly variable depending on patients' conditions. Recently, optical coherence tomography angiography (OCTA) has shown promising development in rapidly and effectively screening eye-related lesions, offering the potential for diagnosing sleep-related disorders. To bridge this gap, our paper presents three key contributions. Firstly, we propose JointViT, a novel model based on the Vision Transformer architecture, incorporating a joint loss function for supervision. Secondly, we introduce a balancing augmentation technique during data preprocessing to improve the model's performance, particularly on the long-tail distribution within the OCTA dataset. Lastly, through comprehensive experiments on the OCTA dataset, our proposed method significantly outperforms other state-of-the-art methods, achieving improvements of up to 12.28% in overall accuracy. This advancement lays the groundwork for the future utilization of OCTA in diagnosing sleep-related disorders. See project website https://steve-zeyu-zhang.github.io/JointViT

Submitted 28 July, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted to MIUA 2024 Oral

arXiv:2404.08495 [pdf, other]

Dataset Reset Policy Optimization for RLHF

Authors: Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun

Abstract: Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude3 Opus. This framework often consists of two steps: learning a reward model from an offline preference dataset followed by running online RL to optimize the learned reward model. In this work, leveraging the idea of reset, we propose a new RLHF algorithm with provable guarantees. Motivated by the fact that the offline preference dataset provides informative states (i.e., data that is preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset reset: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution. In theory, we show that DR-PO learns to perform at least as well as any policy that is covered by the offline dataset under general function approximation with finite sample complexity. In experiments, we demonstrate that on both the TL;DR summarization and the Anthropic Helpful and Harmless (HH) dataset, the generation from DR-PO is better than that from Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), under the metric of GPT4 win-rate. Code for this work can be found at https://github.com/Cornell-RL/drpo.

Submitted 16 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: 28 pages, 6 tables, 3 Figures, 3 Algorithms

arXiv:2404.04837 [pdf, other]

GATlab: Modeling and Programming with Generalized Algebraic Theories

Authors: Owen Lynch, Kris Brown, James Fairbanks, Evan Patterson

Abstract: Categories and categorical structures are increasingly recognized as useful abstractions for modeling in science and engineering. To uniformly implement category-theoretic mathematical models in software, we introduce GATlab, a domain-specific language for algebraic specification embedded in a technical programming language. GATlab is based on generalized algebraic theories (GATs), a logical system extending algebraic theories with dependent types so as to encompass category theory. Using GATlab, the programmer can specify generalized algebraic theories and their models, including both free models, based on symbolic expressions, and computational models, defined by arbitrary code in the host language. Moreover, the programmer can define maps between theories and use them to declaratively migrate models of one theory to models of another. In short, GATlab aims to provide a unified environment for both computer algebra and software interface design with generalized algebraic theories. In this paper, we describe the design, implementation, and applications of GATlab.

Submitted 8 June, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: 14 pages plus references and appendix. To appear at MFPS 2024

arXiv:2404.03673 [pdf, other]

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

Authors: Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun

Abstract: Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction following capabilities. However, the resulting generative policies inherit the same iterative sampling process of diffusion models that causes slow generation. To overcome this limitation, consistency models proposed learning a new class of generative models that directly map noise to data, resulting in a model that can generate an image in as few as one sampling iteration. In this work, to optimize text-to-image generative models for task-specific rewards and enable fast training and inference, we propose a framework for fine-tuning consistency models via RL. Our framework, called Reinforcement Learning for Consistency Model (RLCM), frames the iterative inference process of a consistency model as an RL procedure. Compared to RL fine-tuned diffusion models, RLCM trains significantly faster, improves the quality of the generation measured under the reward objectives, and speeds up the inference procedure by generating high quality images with as few as two inference steps. Experimentally, we show that RLCM can adapt text-to-image consistency models to objectives that are challenging to express with prompting, such as image compressibility, and those derived from human feedback, such as aesthetic quality. Our code is available at https://rlcm.owenoertell.com.

Submitted 22 June, 2024; v1 submitted 25 March, 2024; originally announced April 2024.

Comments: 18 pages, 9 figures, 1 table

arXiv:2403.19851 [pdf, other]

Localizing Paragraph Memorization in Language Models

Authors: Niklas Stoehr, Mitchell Gordon, Chiyuan Zhang, Owen Lewis

Abstract: Can we localize the weights and mechanisms used by a language model to memorize and recite entire paragraphs of its training data? In this paper, we show that while memorization is spread across multiple layers and model components, gradients of memorized paragraphs have a distinguishable spatial pattern, being larger in lower model layers than gradients of non-memorized examples. Moreover, the memorized examples can be unlearned by fine-tuning only the high-gradient weights. We localize a low-layer attention head that appears to be especially involved in paragraph memorization. This head predominantly focuses its attention on distinctive, rare tokens that are least frequent in a corpus-level unigram distribution. Next, we study how localized memorization is across the tokens in the prefix by perturbing tokens and measuring the resulting change in the decoding. A few distinctive tokens early in a prefix can often corrupt the entire continuation. Overall, memorized continuations are harder not only to unlearn but also to corrupt than non-memorized ones.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.14100 [pdf, other]

Causal knowledge engineering: A case study from COVID-19

Authors: Steven Mascaro, Yue Wu, Ross Pearson, Owen Woodberry, Jessica Ramsay, Tom Snelling, Ann E. Nicholson

Abstract: COVID-19 appeared abruptly in early 2020, requiring a rapid response amid a context of great uncertainty. Good quality data and knowledge were initially lacking, and many early models had to be developed with causal assumptions and estimations built in to supplement limited data, often with no reliable approach for identifying, validating and documenting these causal assumptions. Our team embarked on a knowledge engineering process to develop a causal knowledge base consisting of several causal BNs for diverse aspects of COVID-19. The unique challenges of the setting led to experiments with the elicitation approach, and what emerged was a knowledge engineering method we call Causal Knowledge Engineering (CKE). The CKE provides a structured approach for building a causal knowledge base that can support the development of a variety of application-specific models. Here we describe the CKE method, and use our COVID-19 work as a case study to provide a detailed discussion and analysis of the method.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 22 pages (plus 19 pages in appendices), 9 figures, submitted for review

arXiv:2403.09362 [pdf, other]

Komodo: A Linguistic Expedition into Indonesia's Regional Languages

Authors: Louis Owen, Vishesh Tripathi, Abhay Kumar, Biddwan Ahmed

Abstract: The recent breakthroughs in Large Language Models (LLMs) have mostly focused on languages with easily available and sufficient resources, such as English. However, there remains a significant gap for languages that lack sufficient linguistic resources in the public domain. Our work introduces Komodo-7B, a family of 7-billion-parameter Large Language Models designed to address this gap by seamlessly operating across Indonesian, English, and 11 regional languages in Indonesia. The Komodo-7B family consists of Komodo-7B-Base and Komodo-7B-Instruct. Komodo-7B-Instruct stands out by achieving state-of-the-art performance in various tasks and languages, outperforming the benchmarks set by OpenAI's GPT-3.5, Cohere's Aya-101, Llama-2-Chat-13B, Mixtral-8x7B-Instruct-v0.1, Gemma-7B-it, and many more. This model not only demonstrates superior performance in both language-specific and overall assessments but also highlights its capability to excel in linguistic diversity. Our commitment to advancing language models extends beyond well-resourced languages, aiming to bridge the gap for those with limited linguistic assets. Additionally, Komodo-7B-Instruct's better cross-language understanding contributes to addressing educational disparities in Indonesia, offering direct translations from English to 11 regional languages, a significant improvement compared to existing language translation services. Komodo-7B represents a crucial step towards inclusivity and effectiveness in language models, catering to the linguistic needs of diverse communities.

Submitted 19 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: 30 Pages, 8 Figures, 4 Tables

arXiv:2403.08777 [pdf, other]

doi 10.1109/IPDPS57955.2024.00043

Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs

Authors: Herbert Owen, Dominik Ernst, Thomas Gruber, Oriol Lemkuhl, Guillaume Houzeaux, Lucas Gasparino, Gerhard Wellein

Abstract: This paper addresses the challenge of providing portable and highly efficient code structures for CPU and GPU architectures. We choose the assembly of the right-hand term in the incompressible flow module of the High-Performance Computational Mechanics code Alya, which is one of the two CFD codes in the Unified European Benchmark Suite. Starting from an efficient CPU code and a related OpenACC port for GPUs, we successively investigate performance potentials arising from code specialization, algorithmic restructuring and low-level optimizations. We demonstrate that only the combination of these different dimensions of runtime optimization unveils the full performance potential on the GPU and CPU. Roofline-based performance modelling is applied in this process and we demonstrate the need to investigate new optimization strategies if a classical roofline limit such as memory bandwidth utilization is achieved, rather than stopping the process. The final unified OpenACC-based implementation boosts performance by more than 50x on an NVIDIA A100 GPU (achieving approximately 2.5 TF/s FP64) and a further factor of 5x for an Intel Icelake based CPU-node (achieving approximately 1.0 TF/s FP64). The insights gained in our manual approach lay the groundwork for implementing unified but still highly efficient code structures for related kernels in Alya and other applications. These can be realized by manual coding or automatic code generation frameworks.

Submitted 22 January, 2024; originally announced March 2024.

arXiv:2403.08127 [pdf]

Guidelines for the Creation of Analysis Ready Data

Authors: Harriette Phillips, Aiden Price, Owen Forbes, Claire Boulange, Kerrie Mengersen, Marketa Reeves, Rebecca Glauert

Abstract: Globally, there is an increased need for guidelines to produce high-quality data outputs for analysis. No framework currently exists that provides guidelines for a comprehensive approach to producing analysis ready data (ARD). Through critically reviewing and summarising current literature, this paper proposes such guidelines for the creation of ARD. The guidelines proposed in this paper inform ten steps in the generation of ARD: ethics, project documentation, data governance, data management, data storage, data discovery and collection, data cleaning, quality assurance, metadata, and data dictionary. These steps are illustrated through a substantive case study that aimed to create ARD for a digital spatial platform: the Australian Child and Youth Wellbeing Atlas (ACYWA).

Submitted 29 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: 49 pages, 3 figures, 3 tables, and 5 appendices

arXiv:2403.05812 [pdf, other]

Algorithmic progress in language models

Authors: Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla

Abstract: We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012-2023, we find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months, substantially faster than hardware gains per Moore's Law. We estimate augmented scaling laws, which enable us to quantify algorithmic progress and determine the relative contributions of scaling models versus innovations in training algorithms. Despite the rapid pace of algorithmic progress and the development of new architectures such as the transformer, our analysis reveals that the increase in compute made an even larger contribution to overall performance improvements over this time period. Though limited by noisy benchmark data, our analysis quantifies the rapid progress in language modeling, shedding light on the relative contributions from compute and algorithms.
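To make the halving-time figure concrete, the short sketch below converts a halving time into a cumulative reduction factor. It assumes, purely for illustration, that the halving rate is constant over the whole period; the function name `compute_reduction_factor` and the 132-month span (11 years, taken from the 2012-2023 range) are choices made here, not quantities reported in the abstract.

```python
def compute_reduction_factor(elapsed_months: float, halving_months: float) -> float:
    """Factor by which required compute shrinks after `elapsed_months`,
    assuming it halves every `halving_months` at a constant rate."""
    if halving_months <= 0:
        raise ValueError("halving_months must be positive")
    return 2.0 ** (elapsed_months / halving_months)


if __name__ == "__main__":
    span = 11 * 12  # assumed span in months for 2012-2023
    # Central estimate (8 months) plus the two ends of the quoted interval.
    for halving in (5, 8, 14):
        factor = compute_reduction_factor(span, halving)
        print(f"halving every {halving} months -> about {factor:,.0f}x less compute")
```

The wide spread between the 5-month and 14-month cases shows how sensitive a cumulative estimate is to the halving time, which is consistent with the abstract's caution about noisy benchmark data.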

Submitted 9 March, 2024; originally announced March 2024.

Showing 1–50 of 329 results for author: Owen