-
Mathematical Models of Computation in Superposition
Authors:
Kaarel Hänni,
Jake Mendel,
Dmitry Vaintrob,
Lawrence Chan
Abstract:
Superposition -- when a neural network represents more ``features'' than it has dimensions -- seems to pose a serious challenge to mechanistically interpreting current AI systems. Existing theory work studies \emph{representational} superposition, where superposition is only used when passing information through bottlenecks. In this work, we present mathematical models of \emph{computation} in sup…
▽ More
Superposition -- when a neural network represents more ``features'' than it has dimensions -- seems to pose a serious challenge to mechanistically interpreting current AI systems. Existing theory work studies \emph{representational} superposition, where superposition is only used when passing information through bottlenecks. In this work, we present mathematical models of \emph{computation} in superposition, where superposition is actively helpful for efficiently accomplishing the task.
We first construct a task of efficiently emulating a circuit that takes the AND of the $\binom{m}{2}$ pairs of each of $m$ features. We construct a 1-layer MLP that uses superposition to perform this task up to $\varepsilon$-error, where the network only requires $\tilde{O}(m^{\frac{2}{3}})$ neurons, even when the input features are \emph{themselves in superposition}. We generalize this construction to arbitrary sparse boolean circuits of low depth, and then construct ``error correction'' layers that allow deep fully-connected networks of width $d$ to emulate circuits of width $\tilde{O}(d^{1.5})$ and \emph{any} polynomial depth. We conclude by providing some potential applications of our work for interpreting neural networks that implement computation in superposition.
△ Less
Submitted 10 August, 2024;
originally announced August 2024.
-
On The Stability of Moral Preferences: A Problem with Computational Elicitation Methods
Authors:
Kyle Boerstler,
Vijay Keswani,
Lok Chan,
Jana Schaich Borg,
Vincent Conitzer,
Hoda Heidari,
Walter Sinnott-Armstrong
Abstract:
Preference elicitation frameworks feature heavily in the research on participatory ethical AI tools and provide a viable mechanism to enquire and incorporate the moral values of various stakeholders. As part of the elicitation process, surveys about moral preferences, opinions, and judgments are typically administered only once to each participant. This methodological practice is reasonable if par…
▽ More
Preference elicitation frameworks feature heavily in the research on participatory ethical AI tools and provide a viable mechanism to enquire and incorporate the moral values of various stakeholders. As part of the elicitation process, surveys about moral preferences, opinions, and judgments are typically administered only once to each participant. This methodological practice is reasonable if participants' responses are stable over time such that, all other relevant factors being held constant, their responses today will be the same as their responses to the same questions at a later time. However, we do not know how often that is the case. It is possible that participants' true moral preferences change, are subject to temporary moods or whims, or are influenced by environmental factors we don't track. If participants' moral responses are unstable in such ways, it would raise important methodological and theoretical issues for how participants' true moral preferences, opinions, and judgments can be ascertained. We address this possibility here by asking the same survey participants the same moral questions about which patient should receive a kidney when only one is available ten times in ten different sessions over two weeks, varying only presentation order across sessions. We measured how often participants gave different responses to simple (Study One) and more complicated (Study Two) repeated scenarios. On average, the fraction of times participants changed their responses to controversial scenarios was around 10-18% across studies, and this instability is observed to have positive associations with response time and decision-making difficulty. We discuss the implications of these results for the efficacy of moral preference elicitation, highlighting the role of response instability in causing value misalignment between stakeholders and AI tools trained on their moral judgments.
△ Less
Submitted 5 August, 2024;
originally announced August 2024.
-
Compact Proofs of Model Performance via Mechanistic Interpretability
Authors:
Jason Gross,
Rajashree Agrawal,
Thomas Kwa,
Euan Ong,
Chun Hei Yip,
Alex Gibson,
Soufiane Noubir,
Lawrence Chan
Abstract:
We propose using mechanistic interpretability -- techniques for reverse engineering model weights into human-interpretable algorithms -- to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving lower bounds on the accuracy of 151 small transformers trained on a Max-of-$K$ task. We create 102 different computer-assisted proof strategies an…
▽ More
We propose using mechanistic interpretability -- techniques for reverse engineering model weights into human-interpretable algorithms -- to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving lower bounds on the accuracy of 151 small transformers trained on a Max-of-$K$ task. We create 102 different computer-assisted proof strategies and assess their length and tightness of bound on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover, we find that more faithful mechanistic understanding leads to tighter performance bounds. We confirm these connections by qualitatively examining a subset of our proofs. Finally, we identify compounding structureless noise as a key challenge for using mechanistic interpretability to generate compact proofs on model performance.
△ Less
Submitted 21 July, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
A Pilot Study on the Comparison of Prefrontal Cortex Activities of Robotic Therapies on Elderly with Mild Cognitive Impairment
Authors:
King Tai Henry Au-Yeung,
William Wai Lam Chan,
Kwan Yin Brian Chan,
Hongjie Jiang,
Junpei Zhong
Abstract:
Demographic shifts have led to an increase in mild cognitive impairment (MCI), and this study investigates the effects of cognitive training (CT) and reminiscence therapy (RT) conducted by humans or socially assistive robots (SARs) on prefrontal cortex activation in elderly individuals with MCI, aiming to determine the most effective therapy-modality combination for promoting cognitive function. T…
▽ More
Demographic shifts have led to an increase in mild cognitive impairment (MCI), and this study investigates the effects of cognitive training (CT) and reminiscence therapy (RT) conducted by humans or socially assistive robots (SARs) on prefrontal cortex activation in elderly individuals with MCI, aiming to determine the most effective therapy-modality combination for promoting cognitive function. This pilot study employs a randomized control trial (RCT) design. Additionally, the study explores the efficacy of Reminiscence Therapy (RT) in comparison to Cognitive Training (CT). Eight MCI subjects, with a mean age of 70.125 years, were randomly assigned to ``human-led'' or ``SAR-led'' groups. Utilizing Functional Near-infrared Spectroscopy (fNIRS) to measure oxy-hemoglobin concentration changes in the dorsolateral prefrontal cortex (DLPFC), the study found no significant differences in the effects of human-led and SAR-led cognitive training on DLPFC activation. However, distinct patterns emerged in memory encoding and retrieval phases between RT and CT, shedding light on the impacts of these interventions on brain activation in the context of MCI.
△ Less
Submitted 4 May, 2024;
originally announced May 2024.
-
Classification of Nasopharyngeal Cases using DenseNet Deep Learning Architecture
Authors:
W. S. H. M. W. Ahmad,
M. F. A. Fauzi,
M. K. Abdullahi,
Jenny T. H. Lee,
N. S. A. Basry,
A Yahaya,
A. M. Ismail,
A. Adam,
Elaine W. L. Chan,
F. S. Abas
Abstract:
Nasopharyngeal carcinoma (NPC) is one of the understudied yet deadliest cancers in South East Asia. In Malaysia, the prevalence is identified mainly in Sarawak, among the ethnic of Bidayuh. NPC is often late-diagnosed because it is asymptomatic at the early stage. There are several tissue representations from the nasopharynx biopsy, such as nasopharyngeal inflammation (NPI), lymphoid hyperplasia (…
▽ More
Nasopharyngeal carcinoma (NPC) is one of the understudied yet deadliest cancers in South East Asia. In Malaysia, the prevalence is identified mainly in Sarawak, among the ethnic of Bidayuh. NPC is often late-diagnosed because it is asymptomatic at the early stage. There are several tissue representations from the nasopharynx biopsy, such as nasopharyngeal inflammation (NPI), lymphoid hyperplasia (LHP), nasopharyngeal carcinoma (NPC) and normal tissue. This paper is our first initiative to identify the difference between NPC, NPI and normal cases. Seven whole slide images (WSIs) with gigapixel resolutions from seven different patients and two hospitals were experimented with using two test setups, consisting of a different set of images. The tissue regions are patched into smaller blocks and classified using DenseNet architecture with 21 dense layers. Two tests are carried out, each for proof of concept (Test 1) and real-test scenario (Test 2). The accuracy achieved for NPC class is 94.8% for Test 1 and 67.0% for Test 2.
△ Less
Submitted 4 April, 2024;
originally announced April 2024.
-
Evaluating Language-Model Agents on Realistic Autonomous Tasks
Authors:
Megan Kinniment,
Lucas Jun Koba Sato,
Haoxing Du,
Brian Goodrich,
Max Hasin,
Lawrence Chan,
Luke Harold Miles,
Tao R. Lin,
Hjalmar Wijk,
Joel Burget,
Aaron Ho,
Elizabeth Barnes,
Paul Christiano
Abstract:
In this report, we explore the ability of language model agents to acquire resources, create copies of themselves, and adapt to novel challenges they encounter in the wild. We refer to this cluster of capabilities as "autonomous replication and adaptation" or ARA. We believe that systems capable of ARA could have wide-reaching and hard-to-anticipate consequences, and that measuring and forecasting…
▽ More
In this report, we explore the ability of language model agents to acquire resources, create copies of themselves, and adapt to novel challenges they encounter in the wild. We refer to this cluster of capabilities as "autonomous replication and adaptation" or ARA. We believe that systems capable of ARA could have wide-reaching and hard-to-anticipate consequences, and that measuring and forecasting ARA may be useful for informing measures around security, monitoring, and alignment. Additionally, once a system is capable of ARA, placing bounds on a system's capabilities may become significantly more difficult.
We construct four simple example agents that combine language models with tools that allow them to take actions in the world. We then evaluate these agents on 12 tasks relevant to ARA. We find that these language model agents can only complete the easiest tasks from this list, although they make some progress on the more challenging tasks. Unfortunately, these evaluations are not adequate to rule out the possibility that near-future agents will be capable of ARA. In particular, we do not think that these evaluations provide good assurance that the ``next generation'' of language models (e.g. 100x effective compute scaleup on existing models) will not yield agents capable of ARA, unless intermediate evaluations are performed during pretraining. Relatedly, we expect that fine-tuning of the existing models could produce substantially more competent agents, even if the fine-tuning is not directly targeted at ARA.
△ Less
Submitted 4 January, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Finite Volume Features, Global Geometry Representations, and Residual Training for Deep Learning-based CFD Simulation
Authors:
Loh Sher En Jessica,
Naheed Anjum Arafat,
Wei Xian Lim,
Wai Lee Chan,
Adams Wai Kin Kong
Abstract:
Computational fluid dynamics (CFD) simulation is an irreplaceable modelling step in many engineering designs, but it is often computationally expensive. Some graph neural network (GNN)-based CFD methods have been proposed. However, the current methods inherit the weakness of traditional numerical simulators, as well as ignore the cell characteristics in the mesh used in the finite volume method, a…
▽ More
Computational fluid dynamics (CFD) simulation is an irreplaceable modelling step in many engineering designs, but it is often computationally expensive. Some graph neural network (GNN)-based CFD methods have been proposed. However, the current methods inherit the weakness of traditional numerical simulators, as well as ignore the cell characteristics in the mesh used in the finite volume method, a common method in practical CFD applications. Specifically, the input nodes in these GNN methods have very limited information about any object immersed in the simulation domain and its surrounding environment. Also, the cell characteristics of the mesh such as cell volume, face surface area, and face centroid are not included in the message-passing operations in the GNN methods. To address these weaknesses, this work proposes two novel geometric representations: Shortest Vector (SV) and Directional Integrated Distance (DID). Extracted from the mesh, the SV and DID provide global geometry perspective to each input node, thus removing the need to collect this information through message-passing. This work also introduces the use of Finite Volume Features (FVF) in the graph convolutions as node and edge attributes, enabling its message-passing operations to adjust to different nodes. Finally, this work is the first to demonstrate how residual training, with the availability of low-resolution data, can be adopted to improve the flow field prediction accuracy. Experimental results on two datasets with five different state-of-the-art GNN methods for CFD indicate that SV, DID, FVF and residual training can effectively reduce the predictive error of current GNN-based methods by as much as 41%.
△ Less
Submitted 24 November, 2023;
originally announced November 2023.
-
An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets
Authors:
Faris F. Gulamali,
Ashwin S. Sawant,
Lora Liharska,
Carol R. Horowitz,
Lili Chan,
Patricia H. Kovatch,
Ira Hofer,
Karandeep Singh,
Lynne D. Richardson,
Emmanuel Mensah,
Alexander W Charney,
David L. Reich,
Jianying Hu,
Girish N. Nadkarni
Abstract:
The adoption of diagnosis and prognostic algorithms in healthcare has led to concerns about the perpetuation of bias against disadvantaged groups of individuals. Deep learning methods to detect and mitigate bias have revolved around modifying models, optimization strategies, and threshold calibration with varying levels of success. Here, we generate a data-centric, model-agnostic, task-agnostic ap…
▽ More
The adoption of diagnosis and prognostic algorithms in healthcare has led to concerns about the perpetuation of bias against disadvantaged groups of individuals. Deep learning methods to detect and mitigate bias have revolved around modifying models, optimization strategies, and threshold calibration with varying levels of success. Here, we generate a data-centric, model-agnostic, task-agnostic approach to evaluate dataset bias by investigating the relationship between how easily different groups are learned at small sample sizes (AEquity). We then apply a systematic analysis of AEq values across subpopulations to identify and mitigate manifestations of racial bias in two known cases in healthcare - Chest X-rays diagnosis with deep convolutional neural networks and healthcare utilization prediction with multivariate logistic regression. AEq is a novel and broadly applicable metric that can be applied to advance equity by diagnosing and remediating bias in healthcare datasets.
△ Less
Submitted 6 November, 2023;
originally announced November 2023.
-
AutoEncoding Tree for City Generation and Applications
Authors:
Wenyu Han,
Congcong Wen,
Lazarus Chok,
Yan Liang Tan,
Sheung Lung Chan,
Hang Zhao,
Chen Feng
Abstract:
City modeling and generation have attracted an increased interest in various applications, including gaming, urban planning, and autonomous driving. Unlike previous works focused on the generation of single objects or indoor scenes, the huge volumes of spatial data in cities pose a challenge to the generative models. Furthermore, few publicly available 3D real-world city datasets also hinder the d…
▽ More
City modeling and generation have attracted an increased interest in various applications, including gaming, urban planning, and autonomous driving. Unlike previous works focused on the generation of single objects or indoor scenes, the huge volumes of spatial data in cities pose a challenge to the generative models. Furthermore, few publicly available 3D real-world city datasets also hinder the development of methods for city generation. In this paper, we first collect over 3,000,000 geo-referenced objects for the city of New York, Zurich, Tokyo, Berlin, Boston and several other large cities. Based on this dataset, we propose AETree, a tree-structured auto-encoder neural network, for city generation. Specifically, we first propose a novel Spatial-Geometric Distance (SGD) metric to measure the similarity between building layouts and then construct a binary tree over the raw geometric data of building based on the SGD metric. Next, we present a tree-structured network whose encoder learns to extract and merge spatial information from bottom-up iteratively. The resulting global representation is reversely decoded for reconstruction or generation. To address the issue of long-dependency as the level of the tree increases, a Long Short-Term Memory (LSTM) Cell is employed as a basic network element of the proposed AETree. Moreover, we introduce a novel metric, Overlapping Area Ratio (OAR), to quantitatively evaluate the generation results. Experiments on the collected dataset demonstrate the effectiveness of the proposed model on 2D and 3D city generation. Furthermore, the latent features learned by AETree can serve downstream urban planning applications.
△ Less
Submitted 27 September, 2023;
originally announced September 2023.
-
Embracing assay heterogeneity with neural processes for markedly improved bioactivity predictions
Authors:
Lucian Chan,
Marcel Verdonk,
Carl Poelking
Abstract:
Predicting the bioactivity of a ligand is one of the hardest and most important challenges in computer-aided drug discovery. Despite years of data collection and curation efforts by research organizations worldwide, bioactivity data remains sparse and heterogeneous, thus hampering efforts to build predictive models that are accurate, transferable and robust. The intrinsic variability of the experi…
▽ More
Predicting the bioactivity of a ligand is one of the hardest and most important challenges in computer-aided drug discovery. Despite years of data collection and curation efforts by research organizations worldwide, bioactivity data remains sparse and heterogeneous, thus hampering efforts to build predictive models that are accurate, transferable and robust. The intrinsic variability of the experimental data is further compounded by data aggregation practices that neglect heterogeneity to overcome sparsity. Here we discuss the limitations of these practices and present a hierarchical meta-learning framework that exploits the information synergy across disparate assays by successfully accounting for assay heterogeneity. We show that the model achieves a drastic improvement in affinity prediction across diverse protein targets and assay types compared to conventional baselines. It can quickly adapt to new target contexts using very few observations, thus enabling large-scale virtual screening in early-phase drug discovery.
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
Raphtory: The temporal graph engine for Rust and Python
Authors:
Ben Steer,
Naomi Arnold,
Cheick Tidiane Ba,
Renaud Lambiotte,
Haaroon Yousaf,
Lucas Jeub,
Fabian Murariu,
Shivam Kapoor,
Pedro Rico,
Rachel Chan,
Louis Chan,
James Alford,
Richard G. Clegg,
Felix Cuadrado,
Matthew Russell Barnes,
Peijie Zhong,
John N. Pougué Biyong,
Alhamza Alnaimi
Abstract:
Raphtory is a platform for building and analysing temporal networks. The library includes methods for creating networks from a variety of data sources; algorithms to explore their structure and evolution; and an extensible GraphQL server for deployment of applications built on top. Raphtory's core engine is built in Rust, for efficiency, with Python interfaces, for ease of use. Raphtory is develop…
▽ More
Raphtory is a platform for building and analysing temporal networks. The library includes methods for creating networks from a variety of data sources; algorithms to explore their structure and evolution; and an extensible GraphQL server for deployment of applications built on top. Raphtory's core engine is built in Rust, for efficiency, with Python interfaces, for ease of use. Raphtory is developed by network scientists, with a background in Physics, Applied Mathematics, Engineering and Computer Science, for use across academia and industry.
△ Less
Submitted 3 January, 2024; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Computational Pathology: A Survey Review and The Way Forward
Authors:
Mahdi S. Hosseini,
Babak Ehteshami Bejnordi,
Vincent Quoc-Huy Trinh,
Danial Hasan,
Xingwen Li,
Taehyo Kim,
Haochen Zhang,
Theodore Wu,
Kajanan Chinniah,
Sina Maghsoudlou,
Ryan Zhang,
Stephen Yang,
Jiadai Zhu,
Lyndon Chan,
Samir Khaki,
Andrei Buin,
Fatemeh Chaji,
Ala Salehi,
Bich Ngoc Nguyen,
Dimitris Samaras,
Konstantinos N. Plataniotis
Abstract:
Computational Pathology CPath is an interdisciplinary science that augments developments of computational approaches to analyze and model medical histopathology images. The main objective for CPath is to develop infrastructure and workflows of digital diagnostics as an assistive CAD system for clinical pathology, facilitating transformational changes in the diagnosis and treatment of cancer that a…
▽ More
Computational Pathology CPath is an interdisciplinary science that augments developments of computational approaches to analyze and model medical histopathology images. The main objective for CPath is to develop infrastructure and workflows of digital diagnostics as an assistive CAD system for clinical pathology, facilitating transformational changes in the diagnosis and treatment of cancer that are mainly address by CPath tools. With evergrowing developments in deep learning and computer vision algorithms, and the ease of the data flow from digital pathology, currently CPath is witnessing a paradigm shift. Despite the sheer volume of engineering and scientific works being introduced for cancer image analysis, there is still a considerable gap of adopting and integrating these algorithms in clinical practice. This raises a significant question regarding the direction and trends that are undertaken in CPath. In this article we provide a comprehensive review of more than 800 papers to address the challenges faced in problem design all-the-way to the application and implementation viewpoints. We have catalogued each paper into a model-card by examining the key works and challenges faced to layout the current landscape in CPath. We hope this helps the community to locate relevant works and facilitate understanding of the field's future directions. In a nutshell, we oversee the CPath developments in cycle of stages which are required to be cohesively linked together to address the challenges associated with such multidisciplinary science. We overview this cycle from different perspectives of data-centric, model-centric, and application-centric problems. We finally sketch remaining challenges and provide directions for future technical developments and clinical integration of CPath (https://github.com/AtlasAnalyticsLab/CPath_Survey).
△ Less
Submitted 27 January, 2024; v1 submitted 11 April, 2023;
originally announced April 2023.
-
$\text{DC}^2$: Dual-Camera Defocus Control by Learning to Refocus
Authors:
Hadi Alzayer,
Abdullah Abuolaim,
Leung Chun Chan,
Yang Yang,
Ying Chen Lou,
Jia-Bin Huang,
Abhishek Kar
Abstract:
Smartphone cameras today are increasingly approaching the versatility and quality of professional cameras through a combination of hardware and software advancements. However, fixed aperture remains a key limitation, preventing users from controlling the depth of field (DoF) of captured images. At the same time, many smartphones now have multiple cameras with different fixed apertures -- specifica…
▽ More
Smartphone cameras today are increasingly approaching the versatility and quality of professional cameras through a combination of hardware and software advancements. However, fixed aperture remains a key limitation, preventing users from controlling the depth of field (DoF) of captured images. At the same time, many smartphones now have multiple cameras with different fixed apertures -- specifically, an ultra-wide camera with wider field of view and deeper DoF and a higher resolution primary camera with shallower DoF. In this work, we propose $\text{DC}^2$, a system for defocus control for synthetically varying camera aperture, focus distance and arbitrary defocus effects by fusing information from such a dual-camera system. Our key insight is to leverage real-world smartphone camera dataset by using image refocus as a proxy task for learning to control defocus. Quantitative and qualitative evaluations on real-world data demonstrate our system's efficacy where we outperform state-of-the-art on defocus deblurring, bokeh rendering, and image refocus. Finally, we demonstrate creative post-capture defocus control enabled by our method, including tilt-shift and content-based defocus effects.
△ Less
Submitted 6 April, 2023;
originally announced April 2023.
-
KG-Hub -- Building and Exchanging Biological Knowledge Graphs
Authors:
J Harry Caufield,
Tim Putman,
Kevin Schaper,
Deepak R Unni,
Harshad Hegde,
Tiffany J Callahan,
Luca Cappelletti,
Sierra AT Moxon,
Vida Ravanmehr,
Seth Carbon,
Lauren E Chan,
Katherina Cortes,
Kent A Shefchek,
Glass Elsarboukh,
James P Balhoff,
Tommaso Fontana,
Nicolas Matentzoglu,
Richard M Bruskiewich,
Anne E Thessen,
Nomi L Harris,
Monica C Munoz-Torres,
Melissa A Haendel,
Peter N Robinson,
Marcin P Joachimiak,
Christopher J Mungall
, et al. (1 additional authors not shown)
Abstract:
Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of knowledge graphs is lacking. Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of knowledge graphs. Features include a simp…
▽ More
Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of knowledge graphs is lacking. Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of knowledge graphs. Features include a simple, modular extract-transform-load (ETL) pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate knowledge graphs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph machine learning, including node embeddings and training of models for link prediction and node classification.
△ Less
Submitted 31 January, 2023;
originally announced February 2023.
-
A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations
Authors:
Bilal Chughtai,
Lawrence Chan,
Neel Nanda
Abstract:
Universality is a key hypothesis in mechanistic interpretability -- that different models learn similar features and circuits when trained on similar tasks. In this work, we study the universality hypothesis by examining how small neural networks learn to implement group composition. We present a novel algorithm by which neural networks may implement composition for any finite group via mathematic…
▽ More
Universality is a key hypothesis in mechanistic interpretability -- that different models learn similar features and circuits when trained on similar tasks. In this work, we study the universality hypothesis by examining how small neural networks learn to implement group composition. We present a novel algorithm by which neural networks may implement composition for any finite group via mathematical representation theory. We then show that networks consistently learn this algorithm by reverse engineering model logits and weights, and confirm our understanding using ablations. By studying networks of differing architectures trained on various groups, we find mixed evidence for universality: using our algorithm, we can completely characterize the family of circuits and features that networks learn on this task, but for a given network the precise circuits learned -- as well as the order they develop -- are arbitrary.
△ Less
Submitted 24 May, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Progress measures for grokking via mechanistic interpretability
Authors:
Neel Nanda,
Lawrence Chan,
Tom Lieberum,
Jess Smith,
Jacob Steinhardt
Abstract:
Neural networks often exhibit emergent behavior, where qualitatively new capabilities arise from scaling up the amount of parameters, training data, or training steps. One approach to understanding emergence is to find continuous \textit{progress measures} that underlie the seemingly discontinuous qualitative changes. We argue that progress measures can be found via mechanistic interpretability: r…
▽ More
Neural networks often exhibit emergent behavior, where qualitatively new capabilities arise from scaling up the amount of parameters, training data, or training steps. One approach to understanding emergence is to find continuous \textit{progress measures} that underlie the seemingly discontinuous qualitative changes. We argue that progress measures can be found via mechanistic interpretability: reverse-engineering learned behaviors into their individual components. As a case study, we investigate the recently-discovered phenomenon of ``grokking'' exhibited by small transformers trained on modular addition tasks. We fully reverse engineer the algorithm learned by these networks, which uses discrete Fourier transforms and trigonometric identities to convert addition to rotation about a circle. We confirm the algorithm by analyzing the activations and weights and by performing ablations in Fourier space. Based on this understanding, we define progress measures that allow us to study the dynamics of training and split training into three continuous phases: memorization, circuit formation, and cleanup. Our results show that grokking, rather than being a sudden shift, arises from the gradual amplification of structured mechanisms encoded in the weights, followed by the later removal of memorizing components.
△ Less
Submitted 19 October, 2023; v1 submitted 12 January, 2023;
originally announced January 2023.
-
Language models are better than humans at next-token prediction
Authors:
Buck Shlegeris,
Fabien Roger,
Lawrence Chan,
Euan McLean
Abstract:
Current language models are considered to have sub-human capabilities at natural language tasks like question-answering or writing code. However, language models are not trained to perform well at these tasks, they are trained to accurately predict the next token given previous tokes in tokenized text. It is not clear whether language models are better or worse than humans at next token prediction…
▽ More
Current language models are considered to have sub-human capabilities at natural language tasks like question-answering or writing code. However, language models are not trained to perform well at these tasks, they are trained to accurately predict the next token given previous tokes in tokenized text. It is not clear whether language models are better or worse than humans at next token prediction. To try to answer this question, we performed two distinct experiments to directly compare humans and language models on this front: one measuring top-1 accuracy and the other measuring perplexity. In both experiments, we find humans to be consistently \emph{worse} than even relatively small language models like GPT3-Ada at next-token prediction.
△ Less
Submitted 15 July, 2024; v1 submitted 21 December, 2022;
originally announced December 2022.
-
The Alignment Problem from a Deep Learning Perspective
Authors:
Richard Ngo,
Lawrence Chan,
Sören Mindermann
Abstract:
In coming years or decades, artificial general intelligence (AGI) may surpass human capabilities at many critical tasks. We argue that, without substantial effort to prevent it, AGIs could learn to pursue goals that are in conflict (i.e. misaligned) with human interests. If trained like today's most capable models, AGIs could learn to act deceptively to receive higher reward, learn misaligned inte…
▽ More
In coming years or decades, artificial general intelligence (AGI) may surpass human capabilities at many critical tasks. We argue that, without substantial effort to prevent it, AGIs could learn to pursue goals that are in conflict (i.e. misaligned) with human interests. If trained like today's most capable models, AGIs could learn to act deceptively to receive higher reward, learn misaligned internally-represented goals which generalize beyond their fine-tuning distributions, and pursue those goals using power-seeking strategies. We review emerging evidence for these properties. AGIs with these properties would be difficult to align and may appear aligned even when they are not. Finally, we briefly outline how the deployment of misaligned AGIs might irreversibly undermine human control over the world, and we review research directions aimed at preventing this outcome.
△ Less
Submitted 19 March, 2024; v1 submitted 29 August, 2022;
originally announced September 2022.
-
A method for comparing multiple imputation techniques: a case study on the U.S. National COVID Cohort Collaborative
Authors:
Elena Casiraghi,
Rachel Wong,
Margaret Hall,
Ben Coleman,
Marco Notaro,
Michael D. Evans,
Jena S. Tronieri,
Hannah Blau,
Bryan Laraway,
Tiffany J. Callahan,
Lauren E. Chan,
Carolyn T. Bramante,
John B. Buse,
Richard A. Moffitt,
Til Sturmer,
Steven G. Johnson,
Yu Raymond Shao,
Justin Reese,
Peter N. Robinson,
Alberto Paccanaro,
Giorgio Valentini,
Jared D. Huling,
Kenneth Wilkins,
:,
Tell Bennet
, et al. (12 additional authors not shown)
Abstract:
Healthcare datasets obtained from Electronic Health Records have proven to be extremely useful to assess associations between patients' predictors and outcomes of interest. However, these datasets often suffer from missing values in a high proportion of cases and the simple removal of these cases may introduce severe bias. For these reasons, several multiple imputation algorithms have been propose…
▽ More
Healthcare datasets obtained from Electronic Health Records have proven to be extremely useful to assess associations between patients' predictors and outcomes of interest. However, these datasets often suffer from missing values in a high proportion of cases and the simple removal of these cases may introduce severe bias. For these reasons, several multiple imputation algorithms have been proposed to attempt to recover the missing information. Each algorithm presents strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithms works best in a given scenario. Furthermore, the selection of each algorithm parameters and data-related modelling choices are also both crucial and challenging. In this paper, we propose a novel framework to numerically evaluate strategies for handling missing data in the context of statistical analysis, with a particular focus on multiple imputation techniques. We demonstrate the feasibility of our approach on a large cohort of type-2 diabetes patients provided by the National COVID Cohort Collaborative (N3C) Enclave, where we explored the influence of various patient characteristics on outcomes related to COVID-19. Our analysis included classic multiple imputation techniques as well as simple complete-case Inverse Probability Weighted models. The experiments presented here show that our approach could effectively highlight the most valid and performant missing-data handling strategy for our case study. Moreover, our methodology allowed us to gain an understanding of the behavior of the different models and of how it changed as we modified their parameters. Our method is general and can be applied to different research fields and on datasets containing heterogeneous types.
△ Less
Submitted 25 September, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Adversarial Training for High-Stakes Reliability
Authors:
Daniel M. Ziegler,
Seraphina Nix,
Lawrence Chan,
Tim Bauman,
Peter Schmidt-Nielsen,
Tao Lin,
Adam Scherlis,
Noa Nabeshima,
Ben Weinstein-Raun,
Daniel de Haas,
Buck Shlegeris,
Nate Thomas
Abstract:
In the future, powerful AI systems may be deployed in high-stakes settings, where a single failure could be catastrophic. One technique for improving AI safety in high-stakes settings is adversarial training, which uses an adversary to generate examples to train on in order to achieve better worst-case performance.
In this work, we used a safe language generation task (``avoid injuries'') as a t…
▽ More
In the future, powerful AI systems may be deployed in high-stakes settings, where a single failure could be catastrophic. One technique for improving AI safety in high-stakes settings is adversarial training, which uses an adversary to generate examples to train on in order to achieve better worst-case performance.
In this work, we used a safe language generation task (``avoid injuries'') as a testbed for achieving high reliability through adversarial training. We created a series of adversarial training techniques -- including a tool that assists human adversaries -- to find and eliminate failures in a classifier that filters text completions suggested by a generator. In our task, we determined that we can set very conservative classifier thresholds without significantly impacting the quality of the filtered outputs. We found that adversarial training increased robustness to the adversarial attacks that we trained on -- doubling the time for our contractors to find adversarial examples both with our tool (from 13 to 26 minutes) and without (from 20 to 44 minutes) -- without affecting in-distribution performance.
We hope to see further work in the high-stakes reliability setting, including more powerful tools for enhancing human adversaries and better ways to measure high levels of reliability, until we can confidently rule out the possibility of catastrophic deployment-time failures of powerful models.
△ Less
Submitted 9 November, 2022; v1 submitted 3 May, 2022;
originally announced May 2022.
-
3D pride without 2D prejudice: Bias-controlled multi-level generative models for structure-based ligand design
Authors:
Lucian Chan,
Rajendra Kumar,
Marcel Verdonk,
Carl Poelking
Abstract:
Generative models for structure-based molecular design hold significant promise for drug discovery, with the potential to speed up the hit-to-lead development cycle, while improving the quality of drug candidates and reducing costs. Data sparsity and bias are, however, two main roadblocks to the development of 3D-aware models. Here we propose a first-in-kind training protocol based on multi-level…
▽ More
Generative models for structure-based molecular design hold significant promise for drug discovery, with the potential to speed up the hit-to-lead development cycle, while improving the quality of drug candidates and reducing costs. Data sparsity and bias are, however, two main roadblocks to the development of 3D-aware models. Here we propose a first-in-kind training protocol based on multi-level contrastive learning for improved bias control and data efficiency. The framework leverages the large data resources available for 2D generative modelling with datasets of ligand-protein complexes. The result are hierarchical generative models that are topologically unbiased, explainable and customizable. We show how, by deconvolving the generative posterior into chemical, topological and structural context factors, we not only avoid common pitfalls in the design and evaluation of generative models, but furthermore gain detailed insight into the generative process itself. This improved transparency significantly aids method development, besides allowing fine-grained control over novelty vs familiarity.
△ Less
Submitted 22 April, 2022;
originally announced April 2022.
-
Investigating Positive and Negative Qualities of Human-in-the-Loop Optimization for Designing Interaction Techniques
Authors:
Liwei Chan,
Yi-Chi Liao,
George B. Mo,
John J. Dudley,
Chun-Lien Cheng,
Per Ola Kristensson,
Antti Oulasvirta
Abstract:
Designers reportedly struggle with design optimization tasks where they are asked to find a combination of design parameters that maximizes a given set of objectives. In HCI, design optimization problems are often exceedingly complex, involving multiple objectives and expensive empirical evaluations. Model-based computational design algorithms assist designers by generating design examples during…
▽ More
Designers reportedly struggle with design optimization tasks where they are asked to find a combination of design parameters that maximizes a given set of objectives. In HCI, design optimization problems are often exceedingly complex, involving multiple objectives and expensive empirical evaluations. Model-based computational design algorithms assist designers by generating design examples during design, however they assume a model of the interaction domain. Black box methods for assistance, on the other hand, can work with any design problem. However, virtually all empirical studies of this human-in-the-loop approach have been carried out by either researchers or end-users. The question stands out if such methods can help designers in realistic tasks. In this paper, we study Bayesian optimization as an algorithmic method to guide the design optimization process. It operates by proposing to a designer which design candidate to try next, given previous observations. We report observations from a comparative study with 40 novice designers who were tasked to optimize a complex 3D touch interaction technique. The optimizer helped designers explore larger proportions of the design space and arrive at a better solution, however they reported lower agency and expressiveness. Designers guided by an optimizer reported lower mental effort but also felt less creative and less in charge of the progress. We conclude that human-in-the-loop optimization can support novice designers in cases where agency is not critical.
△ Less
Submitted 15 April, 2022;
originally announced April 2022.
-
Windmills of the minds: an algorithm for Fermat's Two Squares Theorem
Authors:
Hing Lun Chan
Abstract:
The two squares theorem of Fermat is a gem in number theory, with a spectacular one-sentence "proof from the Book". Here is a formalisation of this proof, with an interpretation using windmill patterns. The theory behind involves involutions on a finite set, especially the parity of the number of fixed points in the involutions. Starting as an existence proof that is non-constructive, there is an…
▽ More
The two squares theorem of Fermat is a gem in number theory, with a spectacular one-sentence "proof from the Book". Here is a formalisation of this proof, with an interpretation using windmill patterns. The theory behind involves involutions on a finite set, especially the parity of the number of fixed points in the involutions. Starting as an existence proof that is non-constructive, there is an ingenious way to turn it into a constructive one. This gives an algorithm to compute the two squares by iterating the two involutions alternatively from a known fixed point.
△ Less
Submitted 14 January, 2022; v1 submitted 5 December, 2021;
originally announced December 2021.
-
Human irrationality: both bad and good for reward inference
Authors:
Lawrence Chan,
Andrew Critch,
Anca Dragan
Abstract:
Assuming humans are (approximately) rational enables robots to infer reward functions by observing human behavior. But people exhibit a wide array of irrationalities, and our goal with this work is to better understand the effect they can have on reward inference. The challenge with studying this effect is that there are many types of irrationality, with varying degrees of mathematical formalizati…
▽ More
Assuming humans are (approximately) rational enables robots to infer reward functions by observing human behavior. But people exhibit a wide array of irrationalities, and our goal with this work is to better understand the effect they can have on reward inference. The challenge with studying this effect is that there are many types of irrationality, with varying degrees of mathematical formalization. We thus operationalize irrationality in the language of MDPs, by altering the Bellman optimality equation, and use this framework to study how these alterations would affect inference.
We find that wrongly modeling a systematically irrational human as noisy-rational performs a lot worse than correctly capturing these biases -- so much so that it can be better to skip inference altogether and stick to the prior! More importantly, we show that an irrational human, when correctly modelled, can communicate more information about the reward than a perfectly rational human can. That is, if a robot has the correct model of a human's irrationality, it can make an even stronger inference than it ever could if the human were rational. Irrationality fundamentally helps rather than hinder reward inference, but it needs to be correctly accounted for.
△ Less
Submitted 12 November, 2021;
originally announced November 2021.
-
The Who in XAI: How AI Background Shapes Perceptions of AI Explanations
Authors:
Upol Ehsan,
Samir Passi,
Q. Vera Liao,
Larry Chan,
I-Hsiang Lee,
Michael Muller,
Mark O. Riedl
Abstract:
Explainability of AI systems is critical for users to take informed actions. Understanding "who" opens the black-box of AI is just as important as opening it. We conduct a mixed-methods study of how two different groups--people with and without AI background--perceive different types of AI explanations. Quantitatively, we share user perceptions along five dimensions. Qualitatively, we describe how…
▽ More
Explainability of AI systems is critical for users to take informed actions. Understanding "who" opens the black-box of AI is just as important as opening it. We conduct a mixed-methods study of how two different groups--people with and without AI background--perceive different types of AI explanations. Quantitatively, we share user perceptions along five dimensions. Qualitatively, we describe how AI background can influence interpretations, elucidating the differences through lenses of appropriation and cognitive heuristics. We find that (1) both groups showed unwarranted faith in numbers for different reasons and (2) each group found value in different explanations beyond their intended design. Carrying critical implications for the field of XAI, our findings showcase how AI generated explanations can have negative consequences despite best intentions and how that could lead to harmful manipulation of trust. We propose design interventions to mitigate them.
△ Less
Submitted 5 March, 2024; v1 submitted 28 July, 2021;
originally announced July 2021.
-
Optimal Cost Design for Model Predictive Control
Authors:
Avik Jain,
Lawrence Chan,
Daniel S. Brown,
Anca D. Dragan
Abstract:
Many robotics domains use some form of nonconvex model predictive control (MPC) for planning, which sets a reduced time horizon, performs trajectory optimization, and replans at every step. The actual task typically requires a much longer horizon than is computationally tractable, and is specified via a cost function that cumulates over that full horizon. For instance, an autonomous car may have a…
▽ More
Many robotics domains use some form of nonconvex model predictive control (MPC) for planning, which sets a reduced time horizon, performs trajectory optimization, and replans at every step. The actual task typically requires a much longer horizon than is computationally tractable, and is specified via a cost function that cumulates over that full horizon. For instance, an autonomous car may have a cost function that makes a desired trade-off between efficiency, safety, and obeying traffic laws. In this work, we challenge the common assumption that the cost we optimize using MPC should be the same as the ground truth cost for the task (plus a terminal cost). MPC solvers can suffer from short planning horizons, local optima, incorrect dynamics models, and, importantly, fail to account for future replanning ability. Thus, we propose that in many tasks it could be beneficial to purposefully choose a different cost function for MPC to optimize: one that results in the MPC rollout having low ground truth cost, rather than the MPC planned trajectory. We formalize this as an optimal cost design problem, and propose a zeroth-order optimization-based approach that enables us to design optimal costs for an MPC planning robot in continuous MDPs. We test our approach in an autonomous driving domain where we find costs different from the ground truth that implicitly compensate for replanning, short horizon, incorrect dynamics models, and local minima issues. As an example, the learned cost incentivizes MPC to delay its decision until later, implicitly accounting for the fact that it will get more information in the future and be able to make a better decision. Code and videos available at https://sites.google.com/berkeley.edu/ocd-mpc/.
△ Less
Submitted 9 June, 2021; v1 submitted 22 April, 2021;
originally announced April 2021.
-
Indecision Modeling
Authors:
Duncan C McElfresh,
Lok Chan,
Kenzie Doyle,
Walter Sinnott-Armstrong,
Vincent Conitzer,
Jana Schaich Borg,
John P Dickerson
Abstract:
AI systems are often used to make or contribute to important decisions in a growing range of applications, including criminal justice, hiring, and medicine. Since these decisions impact human lives, it is important that the AI systems act in ways which align with human values. Techniques for preference modeling and social choice help researchers learn and aggregate peoples' preferences, which are…
▽ More
AI systems are often used to make or contribute to important decisions in a growing range of applications, including criminal justice, hiring, and medicine. Since these decisions impact human lives, it is important that the AI systems act in ways which align with human values. Techniques for preference modeling and social choice help researchers learn and aggregate peoples' preferences, which are used to guide AI behavior; thus, it is imperative that these learned preferences are accurate. These techniques often assume that people are willing to express strict preferences over alternatives; which is not true in practice. People are often indecisive, and especially so when their decision has moral implications. The philosophy and psychology literature shows that indecision is a measurable and nuanced behavior -- and that there are several different reasons people are indecisive. This complicates the task of both learning and aggregating preferences, since most of the relevant literature makes restrictive assumptions on the meaning of indecision. We begin to close this gap by formalizing several mathematical \emph{indecision} models based on theories from philosophy, psychology, and economics; these models can be used to describe (indecisive) agent decisions, both when they are allowed to express indecision and when they are not. We test these models using data collected from an online survey where participants choose how to (hypothetically) allocate organs to patients waiting for a transplant.
△ Less
Submitted 12 March, 2021; v1 submitted 15 December, 2020;
originally announced December 2020.
-
Accounting for Human Learning when Inferring Human Preferences
Authors:
Harry Giles,
Lawrence Chan
Abstract:
Inverse reinforcement learning (IRL) is a common technique for inferring human preferences from data. Standard IRL techniques tend to assume that the human demonstrator is stationary, that is that their policy $π$ doesn't change over time. In practice, humans interacting with a novel environment or performing well on a novel task will change their demonstrations as they learn more about the enviro…
▽ More
Inverse reinforcement learning (IRL) is a common technique for inferring human preferences from data. Standard IRL techniques tend to assume that the human demonstrator is stationary, that is that their policy $π$ doesn't change over time. In practice, humans interacting with a novel environment or performing well on a novel task will change their demonstrations as they learn more about the environment or task. We investigate the consequences of relaxing this assumption of stationarity, in particular by modelling the human as learning. Surprisingly, we find in some small examples that this can lead to better inference than if the human was stationary. That is, by observing a demonstrator who is themselves learning, a machine can infer more than by observing a demonstrator who is noisily rational. In addition, we find evidence that misspecification can lead to poor inference, suggesting that modelling human learning is important, especially when the human is facing an unfamiliar environment.
△ Less
Submitted 1 December, 2020; v1 submitted 11 November, 2020;
originally announced November 2020.
-
Bias Field Poses a Threat to DNN-based X-Ray Recognition
Authors:
Binyu Tian,
Qing Guo,
Felix Juefei-Xu,
Wen Le Chan,
Yupeng Cheng,
Xiaohong Li,
Xiaofei Xie,
Shengchao Qin
Abstract:
The chest X-ray plays a key role in screening and diagnosis of many lung diseases including the COVID-19. More recently, many works construct deep neural networks (DNNs) for chest X-ray images to realize automated and efficient diagnosis of lung diseases. However, bias field caused by the improper medical image acquisition process widely exists in the chest X-ray images while the robustness of DNN…
▽ More
The chest X-ray plays a key role in screening and diagnosis of many lung diseases including the COVID-19. More recently, many works construct deep neural networks (DNNs) for chest X-ray images to realize automated and efficient diagnosis of lung diseases. However, bias field caused by the improper medical image acquisition process widely exists in the chest X-ray images while the robustness of DNNs to the bias field is rarely explored, which definitely poses a threat to the X-ray-based automated diagnosis system. In this paper, we study this problem based on the recent adversarial attack and propose a brand new attack, i.e., the adversarial bias field attack where the bias field instead of the additive noise works as the adversarial perturbations for fooling the DNNs. This novel attack posts a key problem: how to locally tune the bias field to realize high attack success rate while maintaining its spatial smoothness to guarantee high realisticity. These two goals contradict each other and thus has made the attack significantly challenging. To overcome this challenge, we propose the adversarial-smooth bias field attack that can locally tune the bias field with joint smooth & adversarial constraints. As a result, the adversarial X-ray images can not only fool the DNNs effectively but also retain very high level of realisticity. We validate our method on real chest X-ray datasets with powerful DNNs, e.g., ResNet50, DenseNet121, and MobileNet, and show different properties to the state-of-the-art attacks in both image realisticity and attack transferability. Our method reveals the potential threat to the DNN-based X-ray automated diagnosis and can definitely benefit the development of bias-field-robust automated diagnosis system.
△ Less
Submitted 3 May, 2021; v1 submitted 19 September, 2020;
originally announced September 2020.
-
Artificial Artificial Intelligence: Measuring Influence of AI 'Assessments' on Moral Decision-Making
Authors:
Lok Chan,
Kenzie Doyle,
Duncan McElfresh,
Vincent Conitzer,
John P. Dickerson,
Jana Schaich Borg,
Walter Sinnott-Armstrong
Abstract:
Given AI's growing role in modeling and improving decision-making, how and when to present users with feedback is an urgent topic to address. We empirically examined the effect of feedback from false AI on moral decision-making about donor kidney allocation. We found some evidence that judgments about whether a patient should receive a kidney can be influenced by feedback about participants' own d…
▽ More
Given AI's growing role in modeling and improving decision-making, how and when to present users with feedback is an urgent topic to address. We empirically examined the effect of feedback from false AI on moral decision-making about donor kidney allocation. We found some evidence that judgments about whether a patient should receive a kidney can be influenced by feedback about participants' own decision-making perceived to be given by AI, even if the feedback is entirely random. We also discovered different effects between assessments presented as being from human experts and assessments presented as being from AI.
△ Less
Submitted 13 January, 2020;
originally announced January 2020.
-
A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains
Authors:
Lyndon Chan,
Mahdi S. Hosseini,
Konstantinos N. Plataniotis
Abstract:
Recently proposed methods for weakly-supervised semantic segmentation have achieved impressive performance in predicting pixel classes despite being trained with only image labels which lack positional information. Because image annotations are cheaper and quicker to generate, weak supervision is more practical than full supervision for training segmentation algorithms. These methods have been pre…
▽ More
Recently proposed methods for weakly-supervised semantic segmentation have achieved impressive performance in predicting pixel classes despite being trained with only image labels which lack positional information. Because image annotations are cheaper and quicker to generate, weak supervision is more practical than full supervision for training segmentation algorithms. These methods have been predominantly developed to solve the background separation and partial segmentation problems presented by natural scene images and it is unclear whether they can be simply transferred to other domains with different characteristics, such as histopathology and satellite images, and still perform well. This paper evaluates state-of-the-art weakly-supervised semantic segmentation methods on natural scene, histopathology, and satellite image datasets and analyzes how to determine which method is most suitable for a given dataset. Our experiments indicate that histopathology and satellite images present a different set of problems for weakly-supervised semantic segmentation than natural scene images, such as ambiguous boundaries and class co-occurrence. Methods perform well for datasets they were developed on, but tend to perform poorly on other datasets. We present some practical techniques for these methods on unseen datasets and argue that more work is needed for a generalizable approach to weakly-supervised semantic segmentation. Our full code implementation is available on GitHub: https://github.com/lyndonchan/wsss-analysis.
△ Less
Submitted 17 October, 2020; v1 submitted 23 December, 2019;
originally announced December 2019.
-
Bell Diagonal and Werner state generation: entanglement, non-locality, steering and discord on the IBM quantum computer
Authors:
Elias Riedel Gårding,
Nicolas Schwaller,
Su Yeon Chang,
Samuel Bosch,
Willy Robert Laborde,
Javier Naya Hernandez,
Chun Lam Chan,
Frédéric Gessler,
Xinyu Si,
Marc-André Dupertuis,
Nicolas Macris
Abstract:
We propose the first correct special-purpose quantum circuits for preparation of Bell-diagonal states (BDS), and implement them on the IBM Quantum computer, characterizing and testing complex aspects of their quantum correlations in the full parameter space. Among the circuits proposed, one involves only two quantum bits but requires adapted quantum tomography routines handling classical bits in p…
▽ More
We propose the first correct special-purpose quantum circuits for preparation of Bell-diagonal states (BDS), and implement them on the IBM Quantum computer, characterizing and testing complex aspects of their quantum correlations in the full parameter space. Among the circuits proposed, one involves only two quantum bits but requires adapted quantum tomography routines handling classical bits in parallel. The entire class of Bell-diagonal states is generated, and a number of characteristic indicators, namely entanglement of formation, CHSH non-locality, steering and discord, are experimentally evaluated over the full parameter space and compared with theory. As a by-product of this work we also find a remarkable general inequality between "quantum discord" and "asymmetric relative entropy of discord": the former never exceeds the latter. We also prove that for all BDS the two coincide.
△ Less
Submitted 16 May, 2021; v1 submitted 12 December, 2019;
originally announced December 2019.
-
Decoding of visual-related information from the human EEG using an end-to-end deep learning approach
Authors:
Lingling Yang,
Leanne Lai Hang Chan,
Yao Lu
Abstract:
There is increasing interest in using deep learning approach for EEG analysis as there are still rooms for the improvement of EEG analysis in its accuracy. Convolutional long short-term (CNNLSTM) has been successfully applied in time series data with spatial structure through end-to-end learning. Here, we proposed a CNNLSTM based neural network architecture termed EEG_CNNLSTMNet for the classifica…
▽ More
There is increasing interest in using deep learning approach for EEG analysis as there are still rooms for the improvement of EEG analysis in its accuracy. Convolutional long short-term (CNNLSTM) has been successfully applied in time series data with spatial structure through end-to-end learning. Here, we proposed a CNNLSTM based neural network architecture termed EEG_CNNLSTMNet for the classification of EEG signals in response to grating stimuli with different spatial frequencies. EEG_CNNLSTMNet comprises two convolutional layers and one bidirectional long short-term memory (LSTM) layer. The convolutional layers capture local temporal characteristics of the EEG signal at each channel as well as global spatial characteristics across channels, while the LSTM layer extracts long-term temporal dependency of EEG signals. Our experiment showed that EEG_CNNLSTMNet performed much better at EEG classification than a traditional machine learning approach, i.e. a support vector machine (SVM) with features. Additionally, EEG_CNNLSTMNet outperformed EEGNet, a state-of-art neural network architecture for the intra-subject case. We infer that the underperformance when using an LSTM layer in the inter-subject case is due to long-term dependency characteristics in the EEG signal that vary greatly across subjects. Moreover, the inter-subject fine-tuned classification model using very little data of the new subject achieved much higher accuracy than that trained only on the data from the other subjects. Our study suggests that the fine-tuned inter-subject model can be a potential end-to-end EEG analysis method considering both the accuracy and the required training data of the new subject.
△ Less
Submitted 19 December, 2019; v1 submitted 1 November, 2019;
originally announced November 2019.
-
Domain Generalization via Multidomain Discriminant Analysis
Authors:
Shoubo Hu,
Kun Zhang,
Zhitang Chen,
Laiwan Chan
Abstract:
Domain generalization (DG) aims to incorporate knowledge from multiple source domains into a single model that could generalize well on unseen target domains. This problem is ubiquitous in practice since the distributions of the target data may rarely be identical to those of the source data. In this paper, we propose Multidomain Discriminant Analysis (MDA) to address DG of classification tasks in…
▽ More
Domain generalization (DG) aims to incorporate knowledge from multiple source domains into a single model that could generalize well on unseen target domains. This problem is ubiquitous in practice since the distributions of the target data may rarely be identical to those of the source data. In this paper, we propose Multidomain Discriminant Analysis (MDA) to address DG of classification tasks in general situations. MDA learns a domain-invariant feature transformation that aims to achieve appealing properties, including a minimal divergence among domains within each class, a maximal separability among classes, and overall maximal compactness of all classes. Furthermore, we provide the bounds on excess risk and generalization error by learning theory analysis. Comprehensive experiments on synthetic and real benchmark datasets demonstrate the effectiveness of MDA.
△ Less
Submitted 25 July, 2019;
originally announced July 2019.
-
Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings
Authors:
Jean-Francois Ton,
Lucian Chan,
Yee Whye Teh,
Dino Sejdinovic
Abstract:
Current meta-learning approaches focus on learning functional representations of relationships between variables, i.e. on estimating conditional expectations in regression. In many applications, however, we are faced with conditional distributions which cannot be meaningfully summarized using expectation only (due to e.g. multimodality). Hence, we consider the problem of conditional density estima…
▽ More
Current meta-learning approaches focus on learning functional representations of relationships between variables, i.e. on estimating conditional expectations in regression. In many applications, however, we are faced with conditional distributions which cannot be meaningfully summarized using expectation only (due to e.g. multimodality). Hence, we consider the problem of conditional density estimation in the meta-learning setting. We introduce a novel technique for meta-learning which combines neural representation and noise-contrastive estimation with the established literature of conditional mean embeddings into reproducing kernel Hilbert spaces. The method is validated on synthetic and real-world problems, demonstrating the utility of sharing learned representations across multiple conditional density estimation tasks.
△ Less
Submitted 23 February, 2021; v1 submitted 5 June, 2019;
originally announced June 2019.
-
Mutual Information for the Stochastic Block Model by the Adaptive Interpolation Method
Authors:
Jean Barbier,
Chun Lam Chan,
Nicolas Macris
Abstract:
We rigorously derive a single-letter variational expression for the mutual information of the asymmetric two-groups stochastic block model in the dense graph regime. Existing proofs in the literature are indirect, as they involve mapping the model to a rank-one matrix estimation problem whose mutual information is then determined by a combination of methods (e.g., interpolation, cavity, algorithmi…
▽ More
We rigorously derive a single-letter variational expression for the mutual information of the asymmetric two-groups stochastic block model in the dense graph regime. Existing proofs in the literature are indirect, as they involve mapping the model to a rank-one matrix estimation problem whose mutual information is then determined by a combination of methods (e.g., interpolation, cavity, algorithmic, spatial coupling). In this contribution we provide a self-contained direct method using only the recently introduced adaptive interpolation method.
△ Less
Submitted 16 July, 2019; v1 submitted 19 February, 2019;
originally announced February 2019.
-
The Assistive Multi-Armed Bandit
Authors:
Lawrence Chan,
Dylan Hadfield-Menell,
Siddhartha Srinivasa,
Anca Dragan
Abstract:
Learning preferences implicit in the choices humans make is a well studied problem in both economics and computer science. However, most work makes the assumption that humans are acting (noisily) optimally with respect to their preferences. Such approaches can fail when people are themselves learning about what they want. In this work, we introduce the assistive multi-armed bandit, where a robot a…
▽ More
Learning preferences implicit in the choices humans make is a well studied problem in both economics and computer science. However, most work makes the assumption that humans are acting (noisily) optimally with respect to their preferences. Such approaches can fail when people are themselves learning about what they want. In this work, we introduce the assistive multi-armed bandit, where a robot assists a human playing a bandit task to maximize cumulative reward. In this problem, the human does not know the reward function but can learn it through the rewards received from arm pulls; the robot only observes which arms the human pulls but not the reward associated with each pull. We offer sufficient and necessary conditions for successfully assisting the human in this framework. Surprisingly, better human performance in isolation does not necessarily lead to better performance when assisted by the robot: a human policy can do better by effectively communicating its observed rewards to the robot. We conduct proof-of-concept experiments that support these results. We see this work as contributing towards a theory behind algorithms for human-robot interaction.
△ Less
Submitted 24 January, 2019;
originally announced January 2019.
-
Automated Rationale Generation: A Technique for Explainable AI and its Effects on Human Perceptions
Authors:
Upol Ehsan,
Pradyumna Tambwekar,
Larry Chan,
Brent Harrison,
Mark Riedl
Abstract:
Automated rationale generation is an approach for real-time explanation generation whereby a computational model learns to translate an autonomous agent's internal state and action data representations into natural language. Training on human explanation data can enable agents to learn to generate human-like explanations for their behavior. In this paper, using the context of an agent that plays F…
▽ More
Automated rationale generation is an approach for real-time explanation generation whereby a computational model learns to translate an autonomous agent's internal state and action data representations into natural language. Training on human explanation data can enable agents to learn to generate human-like explanations for their behavior. In this paper, using the context of an agent that plays Frogger, we describe (a) how to collect a corpus of explanations, (b) how to train a neural rationale generator to produce different styles of rationales, and (c) how people perceive these rationales. We conducted two user studies. The first study establishes the plausibility of each type of generated rationale and situates their user perceptions along the dimensions of confidence, humanlike-ness, adequate justification, and understandability. The second study further explores user preferences between the generated rationales with regard to confidence in the autonomous agent, communicating failure and unexpected behavior. Overall, we find alignment between the intended differences in features of the generated rationales and the perceived differences by users. Moreover, context permitting, participants preferred detailed rationales to form a stable mental model of the agent's behavior.
△ Less
Submitted 11 January, 2019;
originally announced January 2019.
-
Tooth morphometry using quasi-conformal theory
Authors:
Gary P. T. Choi,
Hei Long Chan,
Robin Yong,
Sarbin Ranjitkar,
Alan Brook,
Grant Townsend,
Ke Chen,
Lok Ming Lui
Abstract:
Shape analysis is important in anthropology, bioarchaeology and forensic science for interpreting useful information from human remains. In particular, teeth are morphologically stable and hence well-suited for shape analysis. In this work, we propose a framework for tooth morphometry using quasi-conformal theory. Landmark-matching Teichmüller maps are used for establishing a 1-1 correspondence be…
▽ More
Shape analysis is important in anthropology, bioarchaeology and forensic science for interpreting useful information from human remains. In particular, teeth are morphologically stable and hence well-suited for shape analysis. In this work, we propose a framework for tooth morphometry using quasi-conformal theory. Landmark-matching Teichmüller maps are used for establishing a 1-1 correspondence between tooth surfaces with prescribed anatomical landmarks. Then, a quasi-conformal statistical shape analysis model based on the Teichmüller mapping results is proposed for building a tooth classification scheme. We deploy our framework on a dataset of human premolars to analyze the tooth shape variation among genders and ancestries. Experimental results show that our method achieves much higher classification accuracy with respect to both gender and ancestry when compared to the existing methods. Furthermore, our model reveals the underlying tooth shape difference between different genders and ancestries in terms of the local geometric distortion and curvatures.
△ Less
Submitted 6 January, 2019;
originally announced January 2019.
-
Focus Quality Assessment of High-Throughput Whole Slide Imaging in Digital Pathology
Authors:
Mahdi S. Hosseini,
Yueyang Zhang,
Lyndon Chan,
Konstantinos N. Plataniotis,
Jasper A. Z. Brawley-Hayes,
Savvas Damaskinos
Abstract:
One of the challenges facing the adoption of digital pathology workflows for clinical use is the need for automated quality control. As the scanners sometimes determine focus inaccurately, the resultant image blur deteriorates the scanned slide to the point of being unusable. Also, the scanned slide images tend to be extremely large when scanned at greater or equal 20X image resolution. Hence, for…
▽ More
One of the challenges facing the adoption of digital pathology workflows for clinical use is the need for automated quality control. As the scanners sometimes determine focus inaccurately, the resultant image blur deteriorates the scanned slide to the point of being unusable. Also, the scanned slide images tend to be extremely large when scanned at greater or equal 20X image resolution. Hence, for digital pathology to be clinically useful, it is necessary to use computational tools to quickly and accurately quantify the image focus quality and determine whether an image needs to be re-scanned. We propose a no-reference focus quality assessment metric specifically for digital pathology images, that operates by using a sum of even-derivative filter bases to synthesize a human visual system-like kernel, which is modeled as the inverse of the lens' point spread function. This kernel is then applied to a digital pathology image to modify high-frequency image information deteriorated by the scanner's optics and quantify the focus quality at the patch level. We show in several experiments that our method correlates better with ground-truth $z$-level data than other methods, and is more computationally efficient. We also extend our method to generate a local slide-level focus quality heatmap, which can be used for automated slide quality control, and demonstrate the utility of our method for clinical scan quality control by comparison with subjective slide quality scores.
△ Less
Submitted 14 November, 2018;
originally announced November 2018.
-
Hyperparameter Learning via Distributional Transfer
Authors:
Ho Chung Leon Law,
Peilin Zhao,
Lucian Chan,
Junzhou Huang,
Dino Sejdinovic
Abstract:
Bayesian optimisation is a popular technique for hyperparameter learning but typically requires initial exploration even in cases where similar prior tasks have been solved. We propose to transfer information across tasks using learnt representations of training datasets used in those tasks. This results in a joint Gaussian process model on hyperparameters and data representations. Representations…
▽ More
Bayesian optimisation is a popular technique for hyperparameter learning but typically requires initial exploration even in cases where similar prior tasks have been solved. We propose to transfer information across tasks using learnt representations of training datasets used in those tasks. This results in a joint Gaussian process model on hyperparameters and data representations. Representations make use of the framework of distribution embeddings into reproducing kernel Hilbert spaces. The developed method has a faster convergence compared to existing baselines, in some cases requiring only a few evaluations of the target objective.
△ Less
Submitted 26 May, 2019; v1 submitted 15 October, 2018;
originally announced October 2018.
-
Causal Inference and Mechanism Clustering of A Mixture of Additive Noise Models
Authors:
Shoubo Hu,
Zhitang Chen,
Vahid Partovi Nia,
Laiwan Chan,
Yanhui Geng
Abstract:
The inference of the causal relationship between a pair of observed variables is a fundamental problem in science, and most existing approaches are based on one single causal model. In practice, however, observations are often collected from multiple sources with heterogeneous causal models due to certain uncontrollable factors, which renders causal analysis results obtained by a single model skep…
▽ More
The inference of the causal relationship between a pair of observed variables is a fundamental problem in science, and most existing approaches are based on one single causal model. In practice, however, observations are often collected from multiple sources with heterogeneous causal models due to certain uncontrollable factors, which renders causal analysis results obtained by a single model skeptical. In this paper, we generalize the Additive Noise Model (ANM) to a mixture model, which consists of a finite number of ANMs, and provide the condition of its causal identifiability. To conduct model estimation, we propose Gaussian Process Partially Observable Model (GPPOM), and incorporate independence enforcement into it to learn latent parameter associated with each observation. Causal inference and clustering according to the underlying generating mechanisms of the mixture model are addressed in this work. Experiments on synthetic and real data demonstrate the effectiveness of our proposed approach.
△ Less
Submitted 11 November, 2018; v1 submitted 23 September, 2018;
originally announced September 2018.
-
A Kernel Embedding-based Approach for Nonstationary Causal Model Inference
Authors:
Shoubo Hu,
Zhitang Chen,
Laiwan Chan
Abstract:
Although nonstationary data are more common in the real world, most existing causal discovery methods do not take nonstationarity into consideration. In this letter, we propose a kernel embedding-based approach, ENCI, for nonstationary causal model inference where data are collected from multiple domains with varying distributions. In ENCI, we transform the complicated relation of a cause-effect p…
▽ More
Although nonstationary data are more common in the real world, most existing causal discovery methods do not take nonstationarity into consideration. In this letter, we propose a kernel embedding-based approach, ENCI, for nonstationary causal model inference where data are collected from multiple domains with varying distributions. In ENCI, we transform the complicated relation of a cause-effect pair into a linear model of variables of which observations correspond to the kernel embeddings of the cause-and-effect distributions in different domains. In this way, we are able to estimate the causal direction by exploiting the causal asymmetry of the transformed linear model. Furthermore, we extend ENCI to causal graph discovery for multiple variables by transforming the relations among them into a linear nongaussian acyclic model. We show that by exploiting the nonstationarity of distributions, both cause-effect pairs and two kinds of causal graphs are identifiable under mild conditions. Experiments on synthetic and real-world data are conducted to justify the efficacy of ENCI over major existing methods.
△ Less
Submitted 23 September, 2018;
originally announced September 2018.
-
TBI Contusion Segmentation from MRI using Convolutional Neural Networks
Authors:
Snehashis Roy,
John A. Butman,
Leighton Chan,
Dzung L. Pham
Abstract:
Traumatic brain injury (TBI) is caused by a sudden trauma to the head that may result in hematomas and contusions and can lead to stroke or chronic disability. An accurate quantification of the lesion volumes and their locations is essential to understand the pathophysiology of TBI and its progression. In this paper, we propose a fully convolutional neural network (CNN) model to segment contusions…
▽ More
Traumatic brain injury (TBI) is caused by a sudden trauma to the head that may result in hematomas and contusions and can lead to stroke or chronic disability. An accurate quantification of the lesion volumes and their locations is essential to understand the pathophysiology of TBI and its progression. In this paper, we propose a fully convolutional neural network (CNN) model to segment contusions and lesions from brain magnetic resonance (MR) images of patients with TBI. The CNN architecture proposed here was based on a state of the art CNN architecture from Google, called Inception. Using a 3-layer Inception network, lesions are segmented from multi-contrast MR images. When compared with two recent TBI lesion segmentation methods, one based on CNN (called DeepMedic) and another based on random forests, the proposed algorithm showed improved segmentation accuracy on images of 18 patients with mild to severe TBI. Using a leave-one-out cross validation, the proposed model achieved a median Dice of 0.75, which was significantly better (p<0.01) than the two competing methods.
△ Less
Submitted 27 July, 2018;
originally announced July 2018.
-
Adaptive Path Interpolation for Sparse Systems: Application to a Simple Censored Block Model
Authors:
Jean Barbier,
Chun Lam Chan,
Nicolas Macris
Abstract:
Recently a new adaptive path interpolation method has been developed as a simple and versatile scheme to calculate exactly the asymptotic mutual information of Bayesian inference problems defined on dense factor graphs. These include random linear and generalized estimation, sparse superposition codes, or low-rank matrix and tensor estimation. For all these systems, the adaptive interpolation meth…
▽ More
Recently a new adaptive path interpolation method has been developed as a simple and versatile scheme to calculate exactly the asymptotic mutual information of Bayesian inference problems defined on dense factor graphs. These include random linear and generalized estimation, sparse superposition codes, or low-rank matrix and tensor estimation. For all these systems, the adaptive interpolation method directly proves that the replica symmetric prediction is exact, in a simple and unified manner. When the underlying factor graph of the inference problem is sparse the replica prediction is considerably more complicated, and rigorous results are often lacking or obtained by rather complicated methods. In this work we show how to extend the adaptive path interpolation method to sparse systems. We concentrate on a Censored Block Model, where hidden variables are measured through a binary erasure channel, for which we fully prove the replica prediction.
△ Less
Submitted 18 July, 2019; v1 submitted 13 June, 2018;
originally announced June 2018.
-
Causal Inference on Discrete Data via Estimating Distance Correlations
Authors:
Furui Liu,
Laiwan Chan
Abstract:
In this paper, we deal with the problem of inferring causal directions when the data is on discrete domain. By considering the distribution of the cause $P(X)$ and the conditional distribution mapping cause to effect $P(Y|X)$ as independent random variables, we propose to infer the causal direction via comparing the distance correlation between $P(X)$ and $P(Y|X)$ with the distance correlation bet…
▽ More
In this paper, we deal with the problem of inferring causal directions when the data is on discrete domain. By considering the distribution of the cause $P(X)$ and the conditional distribution mapping cause to effect $P(Y|X)$ as independent random variables, we propose to infer the causal direction via comparing the distance correlation between $P(X)$ and $P(Y|X)$ with the distance correlation between $P(Y)$ and $P(X|Y)$. We infer "$X$ causes $Y$" if the dependence coefficient between $P(X)$ and $P(Y|X)$ is smaller. Experiments are performed to show the performance of the proposed method.
△ Less
Submitted 6 August, 2018; v1 submitted 20 March, 2018;
originally announced March 2018.
-
Confounder Detection in High Dimensional Linear Models using First Moments of Spectral Measures
Authors:
Furui Liu,
Laiwan Chan
Abstract:
In this paper, we study the confounder detection problem in the linear model, where the target variable $Y$ is predicted using its $n$ potential causes $X_n=(x_1,...,x_n)^T$. Based on an assumption of rotation invariant generating process of the model, recent study shows that the spectral measure induced by the regression coefficient vector with respect to the covariance matrix of $X_n$ is close t…
▽ More
In this paper, we study the confounder detection problem in the linear model, where the target variable $Y$ is predicted using its $n$ potential causes $X_n=(x_1,...,x_n)^T$. Based on an assumption of rotation invariant generating process of the model, recent study shows that the spectral measure induced by the regression coefficient vector with respect to the covariance matrix of $X_n$ is close to a uniform measure in purely causal cases, but it differs from a uniform measure characteristically in the presence of a scalar confounder. Then, analyzing spectral measure pattern could help to detect confounding. In this paper, we propose to use the first moment of the spectral measure for confounder detection. We calculate the first moment of the regression vector induced spectral measure, and compare it with the first moment of a uniform spectral measure, both defined with respect to the covariance matrix of $X_n$. The two moments coincide in non-confounding cases, and differ from each other in the presence of confounding. This statistical causal-confounding asymmetry can be used for confounder detection. Without the need of analyzing the spectral measure pattern, our method does avoid the difficulty of metric choice and multiple parameter optimization. Experiments on synthetic and real data show the performance of this method.
△ Less
Submitted 20 March, 2018; v1 submitted 19 March, 2018;
originally announced March 2018.
-
The View from the Other Side: The Border Between Controversial Speech and Harassment on Kotaku in Action
Authors:
Shagun Jhaver,
Larry Chan,
Amy Bruckman
Abstract:
In this paper, we use mixed methods to study a controversial Internet site: The Kotaku in Action (KiA) subreddit. Members of KiA are part of GamerGate, a distributed social movement. We present an emic account of what takes place on KiA who are they, what are their goals and beliefs, and what rules do they follow. Members of GamerGate in general and KiA in particular have often been accused of har…
▽ More
In this paper, we use mixed methods to study a controversial Internet site: The Kotaku in Action (KiA) subreddit. Members of KiA are part of GamerGate, a distributed social movement. We present an emic account of what takes place on KiA who are they, what are their goals and beliefs, and what rules do they follow. Members of GamerGate in general and KiA in particular have often been accused of harassment. However, KiA site policies explicitly prohibit such behavior, and members insist that they have been falsely accused. Underlying the controversy over whether KiA supports harassment is a complex disagreement about what "harassment" is, and where to draw the line between freedom of expression and censorship. We propose a model that characterizes perceptions of controversial speech, dividing it into four categories: criticism, insult, public shaming, and harassment. We also discuss design solutions that address the challenges of moderating harassment without impinging on free speech, and communicating across different ideologies.
△ Less
Submitted 8 February, 2018; v1 submitted 11 December, 2017;
originally announced December 2017.
-
ProLanGO: Protein Function Prediction Using Neural~Machine Translation Based on a Recurrent Neural Network
Authors:
Renzhi Cao,
Colton Freitas,
Leong Chan,
Miao Sun,
Haiqing Jiang,
Zhangxin Chen
Abstract:
With the development of next generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from protein sequences because of limitations of traditional biological experimental techniques. Protein function prediction has been a long standing challenge to fill the gap between the huge amount of protein sequences…
▽ More
With the development of next generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from protein sequences because of limitations of traditional biological experimental techniques. Protein function prediction has been a long standing challenge to fill the gap between the huge amount of protein sequences and the known function. In this paper, we propose a novel method to convert the protein function problem into a language translation problem by the new proposed protein sequence language "ProLan" to the protein function language "GOLan", and build a neural machine translation model based on recurrent neural networks to translate "ProLan" language to "GOLan" language. We blindly tested our method by attending the latest third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluate the performance of our methods on selected proteins whose function was released after CAFA competition. The good performance on the training and testing datasets demonstrates that our new proposed method is a promising direction for protein function prediction. In summary, we first time propose a method which converts the protein function prediction problem to a language translation problem and applies a neural machine translation model for protein function prediction.
△ Less
Submitted 19 October, 2017;
originally announced October 2017.
-
The Message or the Messenger? Inferring Virality and Diffusion Structure from Online Petition Signature Data
Authors:
Chi Ling Chan,
Justin Lai,
Bryan Hooi,
Todd Davies
Abstract:
Goel et al. (2016) examined diffusion data from Twitter to conclude that online petitions are shared more virally than other types of content. Their definition of structural virality, which measures the extent to which diffusion follows a broadcast model or is spread person to person (virally), depends on knowing the topology of the diffusion cascade. But often the diffusion structure cannot be ob…
▽ More
Goel et al. (2016) examined diffusion data from Twitter to conclude that online petitions are shared more virally than other types of content. Their definition of structural virality, which measures the extent to which diffusion follows a broadcast model or is spread person to person (virally), depends on knowing the topology of the diffusion cascade. But often the diffusion structure cannot be observed directly. We examined time-stamped signature data from the Obama White House's We the People petition platform. We developed measures based on temporal dynamics that, we argue, can be used to infer diffusion structure as well as the more intrinsic notion of virality sometimes known as infectiousness. These measures indicate that successful petitions are likely to be higher in both intrinsic and structural virality than unsuccessful petitions are. We also investigate threshold effects on petition signing that challenge simple contagion models, and report simulations for a theoretical model that are consistent with our data.
△ Less
Submitted 11 August, 2017;
originally announced August 2017.