-
UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback
Authors:
Jason Wu,
Eldon Schoop,
Alan Leung,
Titus Barik,
Jeffrey P. Bigham,
Jeffrey Nichols
Abstract:
Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs. Existing approaches to improve generation rely on expensive human feedback or distilling a proprietary model. In this paper, we explore the use of automated feedback (compilers and multi-modal models) to guide LLMs to generate high-quality UI code. Our method starts with an…
▽ More
Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs. Existing approaches to improve generation rely on expensive human feedback or distilling a proprietary model. In this paper, we explore the use of automated feedback (compilers and multi-modal models) to guide LLMs to generate high-quality UI code. Our method starts with an existing LLM and iteratively produces improved models by self-generating a large synthetic dataset using an original model, applying automated tools to aggressively filter, score, and de-duplicate the data into a refined higher quality dataset. The original LLM is improved by finetuning on this refined dataset. We applied our approach to several open-source LLMs and compared the resulting performance to baseline models with both automated metrics and human preferences. Our evaluation shows the resulting models outperform all other downloadable baselines and approach the performance of larger proprietary models.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain
Authors:
Brian Hu,
Bill Ray,
Alice Leung,
Amy Summerville,
David Joy,
Christopher Funk,
Arslan Basharat
Abstract:
In difficult decision-making scenarios, it is common to have conflicting opinions among expert human decision-makers as there may not be a single right answer. Such decisions may be guided by different attributes that can be used to characterize an individual's decision. We introduce a novel dataset for medical triage decision-making, labeled with a set of decision-maker attributes (DMAs). This da…
▽ More
In difficult decision-making scenarios, it is common to have conflicting opinions among expert human decision-makers as there may not be a single right answer. Such decisions may be guided by different attributes that can be used to characterize an individual's decision. We introduce a novel dataset for medical triage decision-making, labeled with a set of decision-maker attributes (DMAs). This dataset consists of 62 scenarios, covering six different DMAs, including ethical principles such as fairness and moral desert. We present a novel software framework for human-aligned decision-making by utilizing these DMAs, paving the way for trustworthy AI with better guardrails. Specifically, we demonstrate how large language models (LLMs) can serve as ethical decision-makers, and how their decisions can be aligned to different DMAs using zero-shot prompting. Our experiments focus on different open-source models with varying sizes and training techniques, such as Falcon, Mistral, and Llama 2. Finally, we also introduce a new form of weighted self-consistency that improves the overall quantified performance. Our results provide new research directions in the use of LLMs as alignable decision-makers. The dataset and open-source software are publicly available at: https://github.com/ITM-Kitware/llm-alignable-dm.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks
Authors:
Ruijia Cheng,
Titus Barik,
Alan Leung,
Fred Hohman,
Jeffrey Nichols
Abstract:
Programmers frequently engage with machine learning tutorials in computational notebooks and have been adopting code generation technologies based on large language models (LLMs). However, they encounter difficulties in understanding and working with code produced by LLMs. To mitigate these challenges, we introduce a novel workflow into computational notebooks that augments LLM-based code generati…
▽ More
Programmers frequently engage with machine learning tutorials in computational notebooks and have been adopting code generation technologies based on large language models (LLMs). However, they encounter difficulties in understanding and working with code produced by LLMs. To mitigate these challenges, we introduce a novel workflow into computational notebooks that augments LLM-based code generation with an additional ephemeral UI step, offering users UI scaffolds as an intermediate stage between user prompts and code generation. We present this workflow in BISCUIT, an extension for JupyterLab that provides users with ephemeral UIs generated by LLMs based on the context of their code and intentions, scaffolding users to understand, guide, and explore with LLM-generated code. Through a user study where 10 novices used BISCUIT for machine learning tutorials, we found that BISCUIT offers users representations of code to aid their understanding, reduces the complexity of prompt engineering, and creates a playground for users to explore different variables and iterate on their ideas.
△ Less
Submitted 11 July, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
On Improving the Performance of Glitch Classification for Gravitational Wave Detection by using Generative Adversarial Networks
Authors:
Jianqi Yan,
Alex P. Leung,
David C. Y. Hui
Abstract:
Spectrogram classification plays an important role in analyzing gravitational wave data. In this paper, we propose a framework to improve the classification performance by using Generative Adversarial Networks (GANs). As substantial efforts and expertise are required to annotate spectrograms, the number of training examples is very limited. However, it is well known that deep networks can perform…
▽ More
Spectrogram classification plays an important role in analyzing gravitational wave data. In this paper, we propose a framework to improve the classification performance by using Generative Adversarial Networks (GANs). As substantial efforts and expertise are required to annotate spectrograms, the number of training examples is very limited. However, it is well known that deep networks can perform well only when the sample size of the training set is sufficiently large. Furthermore, the imbalanced sample sizes in different classes can also hamper the performance. In order to tackle these problems, we propose a GAN-based data augmentation framework. While standard data augmentation methods for conventional images cannot be applied on spectrograms, we found that a variant of GANs, ProGAN, is capable of generating high-resolution spectrograms which are consistent with the quality of the high-resolution original images and provide a desirable diversity. We have validated our framework by classifying glitches in the {\it Gravity Spy} dataset with the GAN-generated spectrograms for training. We show that the proposed method can provide an alternative to transfer learning for the classification of spectrograms using deep networks, i.e. using a high-resolution GAN for data augmentation instead. Furthermore, fluctuations in classification performance with small sample sizes for training and evaluation can be greatly reduced. Using the trained network in our framework, we have also examined the spectrograms with label anomalies in {\it Gravity Spy}.
△ Less
Submitted 8 July, 2022;
originally announced July 2022.
-
3D modelling of survey scene from images enhanced with a multi-exposure fusion
Authors:
Kwok-Leung Chan,
Liping Li,
Arthur Wing-Tak Leung,
Ho-Yin Chan
Abstract:
In current practice, scene survey is carried out by workers using total stations. The method has high accuracy, but it incurs high costs if continuous monitoring is needed. Techniques based on photogrammetry, with the relatively cheaper digital cameras, have gained wide applications in many fields. Besides point measurement, photogrammetry can also create a three-dimensional (3D) model of the scen…
▽ More
In current practice, scene survey is carried out by workers using total stations. The method has high accuracy, but it incurs high costs if continuous monitoring is needed. Techniques based on photogrammetry, with the relatively cheaper digital cameras, have gained wide applications in many fields. Besides point measurement, photogrammetry can also create a three-dimensional (3D) model of the scene. Accurate 3D model reconstruction depends on high quality images. Degraded images will result in large errors in the reconstructed 3D model. In this paper, we propose a method that can be used to improve the visibility of the images, and eventually reduce the errors of the 3D scene model. The idea is inspired by image dehazing. Each original image is first transformed into multiple exposure images by means of gamma-correction operations and adaptive histogram equalization. The transformed images are analyzed by the computation of the local binary patterns. The image is then enhanced, with each pixel generated from the set of transformed image pixels weighted by a function of the local pattern feature and image saturation. Performance evaluation has been performed on benchmark image dehazing datasets. Experimentations have been carried out on outdoor and indoor surveys. Our analysis finds that the method works on different types of degradation that exist in both outdoor and indoor images. When fed into the photogrammetry software, the enhanced images can reconstruct 3D scene models with sub-millimeter mean errors.
△ Less
Submitted 10 November, 2021;
originally announced November 2021.
-
Contact Graph Epidemic Modelling of COVID-19 for Transmission and Intervention Strategies
Authors:
Abby Leung,
Xiaoye Ding,
Shenyang Huang,
Reihaneh Rabbany
Abstract:
The coronavirus disease 2019 (COVID-19) pandemic has quickly become a global public health crisis unseen in recent years. It is known that the structure of the human contact network plays an important role in the spread of transmissible diseases. In this work, we study a structure aware model of COVID-19 CGEM. This model becomes similar to the classical compartment-based models in epidemiology if…
▽ More
The coronavirus disease 2019 (COVID-19) pandemic has quickly become a global public health crisis unseen in recent years. It is known that the structure of the human contact network plays an important role in the spread of transmissible diseases. In this work, we study a structure aware model of COVID-19 CGEM. This model becomes similar to the classical compartment-based models in epidemiology if we assume the contact network is a Erdos-Renyi (ER) graph, i.e. everyone comes into contact with everyone else with the same probability. In contrast, CGEM is more expressive and allows for plugging in the actual contact networks, or more realistic proxies for it. Moreover, CGEM enables more precise modelling of enforcing and releasing different non-pharmaceutical intervention (NPI) strategies. Through a set of extensive experiments, we demonstrate significant differences between the epidemic curves when assuming different underlying structures. More specifically we demonstrate that the compartment-based models are overestimating the spread of the infection by a factor of 3, and under some realistic assumptions on the compliance factor, underestimating the effectiveness of some of NPIs, mischaracterizing others (e.g. predicting a later peak), and underestimating the scale of the second peak after reopening.
△ Less
Submitted 6 October, 2020;
originally announced October 2020.
-
Incorporating Dynamic Flight Network in SEIR to Model Mobility between Populations
Authors:
Xiaoye Ding,
Shenyang Huang,
Abby Leung,
Reihaneh Rabbany
Abstract:
Current efforts of modelling COVID-19 are often based on the standard compartmental models such as SEIR and their variations. As pre-symptomatic and asymptomatic cases can spread the disease between populations through travel, it is important to incorporate mobility between populations into the epidemiological modelling. In this work, we propose to modify the commonly-used SEIR model to account fo…
▽ More
Current efforts of modelling COVID-19 are often based on the standard compartmental models such as SEIR and their variations. As pre-symptomatic and asymptomatic cases can spread the disease between populations through travel, it is important to incorporate mobility between populations into the epidemiological modelling. In this work, we propose to modify the commonly-used SEIR model to account for the dynamic flight network, by estimating the imported cases based on the air traffic volume as well as the test positive rate at the source. This modification, called Flight-SEIR, can potentially enable 1). early detection of outbreaks due to imported pre-symptomatic and asymptomatic cases, 2). more accurate estimation of the reproduction number and 3). evaluation of the impact of travel restrictions and the implications of lifting these measures. The proposed Flight-SEIR is essential in navigating through this pandemic and the next ones, given how interconnected our world has become.
△ Less
Submitted 3 October, 2020;
originally announced October 2020.
-
DRIFT: Deep Reinforcement Learning for Functional Software Testing
Authors:
Luke Harries,
Rebekah Storan Clarke,
Timothy Chapman,
Swamy V. P. L. N. Nallamalli,
Levent Ozgur,
Shuktika Jain,
Alex Leung,
Steve Lim,
Aaron Dietrich,
José Miguel Hernández-Lobato,
Tom Ellis,
Cheng Zhang,
Kamil Ciosek
Abstract:
Efficient software testing is essential for productive software development and reliable user experiences. As human testing is inefficient and expensive, automated software testing is needed. In this work, we propose a Reinforcement Learning (RL) framework for functional software testing named DRIFT. DRIFT operates on the symbolic representation of the user interface. It uses Q-learning through Ba…
▽ More
Efficient software testing is essential for productive software development and reliable user experiences. As human testing is inefficient and expensive, automated software testing is needed. In this work, we propose a Reinforcement Learning (RL) framework for functional software testing named DRIFT. DRIFT operates on the symbolic representation of the user interface. It uses Q-learning through Batch-RL and models the state-action value function with a Graph Neural Network. We apply DRIFT to testing the Windows 10 operating system and show that DRIFT can robustly trigger the desired software functionality in a fully automated manner. Our experiments test the ability to perform single and combined tasks across different applications, demonstrating that our framework can efficiently test software with a large range of testing objectives.
△ Less
Submitted 16 July, 2020;
originally announced July 2020.
-
PinView: Implicit Feedback in Content-Based Image Retrieval
Authors:
Zakria Hussain,
Arto Klami,
Jussi Kujala,
Alex P. Leung,
Kitsuchart Pasupa,
Peter Auer,
Samuel Kaski,
Jorma Laaksonen,
John Shawe-Taylor
Abstract:
This paper describes PinView, a content-based image retrieval system that exploits implicit relevance feedback collected during a search session. PinView contains several novel methods to infer the intent of the user. From relevance feedback, such as eye movements or pointer clicks, and visual features of images, PinView learns a similarity metric between images which depends on the current intere…
▽ More
This paper describes PinView, a content-based image retrieval system that exploits implicit relevance feedback collected during a search session. PinView contains several novel methods to infer the intent of the user. From relevance feedback, such as eye movements or pointer clicks, and visual features of images, PinView learns a similarity metric between images which depends on the current interests of the user. It then retrieves images with a specialized online learning algorithm that balances the tradeoff between exploring new images and exploiting the already inferred interests of the user. We have integrated PinView to the content-based image retrieval system PicSOM, which enables applying PinView to real-world image databases. With the new algorithms PinView outperforms the original PicSOM, and in online experiments with real users the combination of implicit and explicit feedback gives the best results.
△ Less
Submitted 2 October, 2014;
originally announced October 2014.