Search | arXiv e-print repository

To Code, or Not To Code? Exploring Impact of Code in Pre-training

Authors: Viraat Aryabumi, Yixuan Su, Raymond Ma, Adrien Morisot, Ivan Zhang, Acyr Locatelli, Marzieh Fadaee, Ahmet Üstün, Sara Hooker

Abstract: Including code in the pre-training data mixture, even for models not specifically designed for code, has become a common practice in LLMs pre-training. While there has been anecdotal consensus among practitioners that code data plays a vital role in general LLMs' performance, there is only limited work analyzing the precise impact of code on non-code tasks. In this work, we systematically investig… ▽ More Including code in the pre-training data mixture, even for models not specifically designed for code, has become a common practice in LLMs pre-training. While there has been anecdotal consensus among practitioners that code data plays a vital role in general LLMs' performance, there is only limited work analyzing the precise impact of code on non-code tasks. In this work, we systematically investigate the impact of code data on general performance. We ask "what is the impact of code data used in pre-training on a large variety of downstream tasks beyond code generation". We conduct extensive ablations and evaluate across a broad range of natural language reasoning tasks, world knowledge tasks, code benchmarks, and LLM-as-a-judge win-rates for models with sizes ranging from 470M to 2.8B parameters. Across settings, we find a consistent results that code is a critical building block for generalization far beyond coding tasks and improvements to code quality have an outsized impact across all tasks. In particular, compared to text-only pre-training, the addition of code results in up to relative increase of 8.2% in natural language (NL) reasoning, 4.2% in world knowledge, 6.6% improvement in generative win-rates, and a 12x boost in code performance respectively. Our work suggests investments in code quality and preserving code during pre-training have positive impacts. △ Less

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2406.16900 [pdf, other]

Utilizing Weak-to-Strong Consistency for Semi-Supervised Glomeruli Segmentation

Authors: Irina Zhang, Jim Denholm, Azam Hamidinekoo, Oskar Ålund, Christopher Bagnall, Joana Palés Huix, Michal Sulikowski, Ortensia Vito, Arthur Lewis, Robert Unwin, Magnus Soderberg, Nikolay Burlutskiy, Talha Qaiser

Abstract: Accurate segmentation of glomerulus instances attains high clinical significance in the automated analysis of renal biopsies to aid in diagnosing and monitoring kidney disease. Analyzing real-world histopathology images often encompasses inter-observer variability and requires a labor-intensive process of data annotation. Therefore, conventional supervised learning approaches generally achieve sub… ▽ More Accurate segmentation of glomerulus instances attains high clinical significance in the automated analysis of renal biopsies to aid in diagnosing and monitoring kidney disease. Analyzing real-world histopathology images often encompasses inter-observer variability and requires a labor-intensive process of data annotation. Therefore, conventional supervised learning approaches generally achieve sub-optimal performance when applied to external datasets. Considering these challenges, we present a semi-supervised learning approach for glomeruli segmentation based on the weak-to-strong consistency framework validated on multiple real-world datasets. Our experimental results on 3 independent datasets indicate superior performance of our approach as compared with existing supervised baseline models such as U-Net and SegFormer. △ Less

Submitted 30 May, 2024; originally announced June 2024.

Comments: accepted to MIDL'24

arXiv:2403.14770 [pdf, other]

Beehive: A Flexible Network Stack for Direct-Attached Accelerators

Authors: Katie Lim, Matthew Giordano, Theano Stavrinos, Pratyush Patel, Jacob Nelson, Irene Zhang, Baris Kasikci, Tom Anderson

Abstract: Direct-attached accelerators, where application accelerators are directly connected to the datacenter network via a hardware network stack, offer substantial benefits in terms of reduced latency, CPU overhead, and energy use. However, a key challenge is that modern datacenter network stacks are complex, with interleaved protocol layers, network management functions, and virtualization support. To… ▽ More Direct-attached accelerators, where application accelerators are directly connected to the datacenter network via a hardware network stack, offer substantial benefits in terms of reduced latency, CPU overhead, and energy use. However, a key challenge is that modern datacenter network stacks are complex, with interleaved protocol layers, network management functions, and virtualization support. To operators, network feature agility, diagnostics, and manageability are often considered just as important as raw performance. By contrast, existing hardware network stacks only support basic protocols and are often difficult to extend since they use fixed processing pipelines. We propose Beehive, a new, open-source FPGA network stack for direct-attached accelerators designed to enable flexible and adaptive construction of complex network functionality in hardware. Application and network protocol elements are modularized as tiles over a network-on-chip substrate. Elements can be added or scaled up/down to match workload characteristics with minimal effort or changes to other elements. Flexible diagnostics and control are integral, with tooling to ensure deadlock safety. Our implementation interoperates with standard Linux TCP and UDP clients, with a 4x improvement in end-to-end remote procedure call tail latency for Linux UDP clients versus a CPU-attached accelerator △ Less

Submitted 30 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

arXiv:2312.09118 [pdf, other]

LayerZero

Authors: Ryan Zarick, Bryan Pellegrino, Isaac Zhang, Thomas Kim, Caleb Banister

Abstract: In this paper, we present the first intrinsically secure and semantically universal omnichain interoperability protocol: LayerZero. Utilizing an immutable endpoint, append-only verification modules, and fully-configurable verification infrastructure, LayerZero provides the security, configurability, and extensibility necessary to achieve omnichain interoperability. LayerZero enforces strict applic… ▽ More In this paper, we present the first intrinsically secure and semantically universal omnichain interoperability protocol: LayerZero. Utilizing an immutable endpoint, append-only verification modules, and fully-configurable verification infrastructure, LayerZero provides the security, configurability, and extensibility necessary to achieve omnichain interoperability. LayerZero enforces strict application-exclusive ownership of protocol security and cost through its novel trust-minimized modular security framework which is designed to universally support all blockchains and use cases. Omnichain applications (OApps) built on the LayerZero protocol achieve frictionless blockchain-agnostic interoperation through LayerZero's universal network semantics. △ Less

Submitted 23 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

arXiv:2311.08639 [pdf, other]

ColorTrace: Fungible token coloring and attribution

Authors: Ryan Zarick, Bryan Pellegrino, Isaac Zhang, Thomas Kim, Caleb Banister

Abstract: We formally define the fungible token coloring problem of attributing (coloring) fungible tokens to originating entities (minters), and present, to our knowledge, the first practical onchain algorithm to solve it. Tracking attribution of colored tokens losslessly using existing approaches such as the Colored Coins protocol is computationally intractable due to the per-wallet storage requirements g… ▽ More We formally define the fungible token coloring problem of attributing (coloring) fungible tokens to originating entities (minters), and present, to our knowledge, the first practical onchain algorithm to solve it. Tracking attribution of colored tokens losslessly using existing approaches such as the Colored Coins protocol is computationally intractable due to the per-wallet storage requirements growing in proportion to the number of minters. Our first contribution is an elegant solution to the single-chain token coloring problem, where colored tokens are atomically burned and minted to ensure each wallet only contains tokens of a single color. Our second contribution is an extension to this single-chain token coloring algorithm to allow safe and efficient crosschain token transfers. We present ColorTrace, an onchain algorithm to achieve globally consistent, economically feasible, fungible token coloring. △ Less

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.08041 [pdf, other]

ColorFloat: Constant space token coloring

Authors: Ryan Zarick, Bryan Pellegrino, Isaac Zhang, Thomas Kim, Caleb Banister

Abstract: We present ColorFloat, a family of O(1) space complexity algorithms that solve the problem of attributing (coloring) fungible tokens to the entity that minted them (minter). Tagging fungible tokens with metadata is not a new problem and was first formalized in the Colored Coins protocol. In certain contexts, practical solutions to this challenge have been implemented and deployed such as NFT. We d… ▽ More We present ColorFloat, a family of O(1) space complexity algorithms that solve the problem of attributing (coloring) fungible tokens to the entity that minted them (minter). Tagging fungible tokens with metadata is not a new problem and was first formalized in the Colored Coins protocol. In certain contexts, practical solutions to this challenge have been implemented and deployed such as NFT. We define the fungible token coloring problem, one specific aspect of the Colored Coins problem, to be the problem of retaining fungible characteristics of the underlying token while accurately tracking the attribution of fungible tokens to their respective minters. Fungible token coloring has a wide range of Web3 applications. One application which we highlight in this paper is the onchain yield-sharing collateral-based stablecoin. △ Less

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2310.03121 [pdf]

OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials

Authors: Peter Eastman, Raimondas Galvelis, Raúl P. Peláez, Charlles R. A. Abreu, Stephen E. Farr, Emilio Gallicchio, Anton Gorenko, Michael M. Henry, Frank Hu, Jing Huang, Andreas Krämer, Julien Michel, Joshua A. Mitchell, Vijay S. Pande, João PGLM Rodrigues, Jaime Rodriguez-Guerra, Andrew C. Simmonett, Sukrit Singh, Jason Swails, Philip Turner, Yuanqing Wang, Ivy Zhang, John D. Chodera, Gianni De Fabritiis, Thomas E. Markland

Abstract: Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general… ▽ More Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general purpose, pretrained potential functions. A collection of optimized CUDA kernels and custom PyTorch operations greatly improves the speed of simulations. We demonstrate these features on simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein (GFP) chromophore in water. Taken together, these features make it practical to use machine learning to improve the accuracy of simulations at only a modest increase in cost. △ Less

Submitted 29 November, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: 16 pages, 5 figures

ACM Class: J.2; J.3

arXiv:2304.04488 [pdf, other]

Hybrid Computing for Interactive Datacenter Applications

Authors: Pratyush Patel, Katie Lim, Kushal Jhunjhunwalla, Ashlie Martinez, Max Demoulin, Jacob Nelson, Irene Zhang, Thomas Anderson

Abstract: Field-Programmable Gate Arrays (FPGAs) are more energy efficient and cost effective than CPUs for a wide variety of datacenter applications. Yet, for latency-sensitive and bursty workloads, this advantage can be difficult to harness due to high FPGA spin-up costs. We propose that a hybrid FPGA and CPU computing framework can harness the energy efficiency benefits of FPGAs for such workloads at rea… ▽ More Field-Programmable Gate Arrays (FPGAs) are more energy efficient and cost effective than CPUs for a wide variety of datacenter applications. Yet, for latency-sensitive and bursty workloads, this advantage can be difficult to harness due to high FPGA spin-up costs. We propose that a hybrid FPGA and CPU computing framework can harness the energy efficiency benefits of FPGAs for such workloads at reasonable cost. Our key insight is to use FPGAs for stable-state workload and CPUs for short-term workload bursts. Using this insight, we design Spork, a lightweight hybrid scheduler that can realize these energy efficiency and cost benefits in practice. Depending on the desired objective, Spork can trade off energy efficiency for cost reduction and vice versa. It is parameterized with key differences between FPGAs and CPUs in terms of power draw, performance, cost, and spin-up latency. We vary this parameter space and analyze various application and worker configurations on production and synthetic traces. Our evaluation of cloud workloads shows that energy-optimized Spork is not only more energy efficient but it is also cheaper than homogeneous platforms--for short application requests with tight deadlines, it is 1.53x more energy efficient and 2.14x cheaper than using only FPGAs. Relative to an idealized version of an existing cost-optimized hybrid scheduler, energy-optimized Spork provides 1.2-2.4x higher energy efficiency at comparable cost, while cost-optimized Spork provides 1.1-2x higher energy efficiency at 1.06-1.2x lower cost. △ Less

Submitted 10 April, 2023; originally announced April 2023.

Comments: 13 pages

arXiv:2201.02120 [pdf, other]

Treehouse: A Case For Carbon-Aware Datacenter Software

Authors: Thomas Anderson, Adam Belay, Mosharaf Chowdhury, Asaf Cidon, Irene Zhang

Abstract: The end of Dennard scaling and the slowing of Moore's Law has put the energy use of datacenters on an unsustainable path. Datacenters are already a significant fraction of worldwide electricity use, with application demand scaling at a rapid rate. We argue that substantial reductions in the carbon intensity of datacenter computing are possible with a software-centric approach: by making energy and… ▽ More The end of Dennard scaling and the slowing of Moore's Law has put the energy use of datacenters on an unsustainable path. Datacenters are already a significant fraction of worldwide electricity use, with application demand scaling at a rapid rate. We argue that substantial reductions in the carbon intensity of datacenter computing are possible with a software-centric approach: by making energy and carbon visible to application developers on a fine-grained basis, by modifying system APIs to make it possible to make informed trade offs between performance and carbon emissions, and by raising the level of application programming to allow for flexible use of more energy efficient means of compute and storage. We also lay out a research agenda for systems software to reduce the carbon footprint of datacenter computing. △ Less

Submitted 6 January, 2022; originally announced January 2022.

arXiv:2108.07790 [pdf, other]

Mitigating harm in language models with conditional-likelihood filtration

Authors: Helen Ngo, Cooper Raterink, João G. M. Araújo, Ivan Zhang, Carol Chen, Adrien Morisot, Nicholas Frosst

Abstract: Language models trained on large-scale unfiltered datasets curated from the open web acquire systemic biases, prejudices, and harmful views from their training data. We present a methodology for programmatically identifying and removing harmful text from web-scale datasets. A pretrained language model is used to calculate the log-likelihood of researcher-written trigger phrases conditioned on a sp… ▽ More Language models trained on large-scale unfiltered datasets curated from the open web acquire systemic biases, prejudices, and harmful views from their training data. We present a methodology for programmatically identifying and removing harmful text from web-scale datasets. A pretrained language model is used to calculate the log-likelihood of researcher-written trigger phrases conditioned on a specific document, which is used to identify and filter documents from the dataset. We demonstrate that models trained on this filtered dataset exhibit lower propensity to generate harmful text, with a marginal decrease in performance on standard language modeling benchmarks compared to unfiltered baselines. We provide a partial explanation for this performance gap by surfacing examples of hate speech and other undesirable content from standard language modeling benchmarks. Finally, we discuss the generalization of this method and how trigger phrases which reflect specific values can be used by researchers to build language models which are more closely aligned with their values. △ Less

Submitted 27 November, 2021; v1 submitted 4 August, 2021; originally announced August 2021.

arXiv:2010.01196 [pdf, other]

doi 10.1039/D2SC02739A

End-to-End Differentiable Molecular Mechanics Force Field Construction

Authors: Yuanqing Wang, Josh Fass, Benjamin Kaminow, John E. Herr, Dominic Rufa, Ivy Zhang, Iván Pulido, Mike Henry, John D. Chodera

Abstract: Molecular mechanics (MM) potentials have long been a workhorse of computational chemistry. Leveraging accuracy and speed, these functional forms find use in a wide variety of applications in biomolecular modeling and drug discovery, from rapid virtual screening to detailed free energy calculations. Traditionally, MM potentials have relied on human-curated, inflexible, and poorly extensible discret… ▽ More Molecular mechanics (MM) potentials have long been a workhorse of computational chemistry. Leveraging accuracy and speed, these functional forms find use in a wide variety of applications in biomolecular modeling and drug discovery, from rapid virtual screening to detailed free energy calculations. Traditionally, MM potentials have relied on human-curated, inflexible, and poorly extensible discrete chemical perception rules or applying parameters to small molecules or biopolymers, making it difficult to optimize both types and parameters to fit quantum chemical or physical property data. Here, we propose an alternative approach that uses graph neural networks to perceive chemical environments, producing continuous atom embeddings from which valence and nonbonded parameters can be predicted using invariance-preserving layers. Since all stages are built from smooth neural functions, the entire process is modular and end-to-end differentiable with respect to model parameters, allowing new force fields to be easily constructed, extended, and applied to arbitrary molecules. We show that this approach is not only sufficiently expressive to reproduce legacy atom types, but that it can learn to accurately reproduce and extend existing molecular mechanics force fields. Trained with arbitrary loss functions, it can construct entirely new force fields self-consistently applicable to both biopolymers and small molecules directly from quantum chemical calculations, with superior fidelity than traditional atom or parameter typing schemes. When trained on the same quantum chemical small molecule dataset used to parameterize the openff-1.2.0 small molecule force field augmented with a peptide dataset, the resulting espaloma model shows superior accuracy vis-à-vis experiments in computing relative alchemical free energy calculations for a popular benchmark set. △ Less

Submitted 18 April, 2022; v1 submitted 2 October, 2020; originally announced October 2020.

arXiv:2008.06536 [pdf, other]

Making Distributed Mobile Applications SAFE: Enforcing User Privacy Policies on Untrusted Applications with Secure Application Flow Enforcement

Authors: Adriana Szekeres, Irene Zhang, Katelin Bailey, Isaac Ackerman, Haichen Shen, Franziska Roesner, Dan R. K. Ports, Arvind Krishnamurthy, Henry M. Levy

Abstract: Today's mobile devices sense, collect, and store huge amounts of personal information, which users share with family and friends through a wide range of applications. Once users give applications access to their data, they must implicitly trust that the apps correctly maintain data privacy. As we know from both experience and all-too-frequent press articles, that trust is often misplaced. While us… ▽ More Today's mobile devices sense, collect, and store huge amounts of personal information, which users share with family and friends through a wide range of applications. Once users give applications access to their data, they must implicitly trust that the apps correctly maintain data privacy. As we know from both experience and all-too-frequent press articles, that trust is often misplaced. While users do not trust applications, they do trust their mobile devices and operating systems. Unfortunately, sharing applications are not limited to mobile clients but must also run on cloud services to share data between users. In this paper, we leverage the trust that users have in their mobile OSes to vet cloud services. To do so, we define a new Secure Application Flow Enforcement (SAFE) framework, which requires cloud services to attest to a system stack that will enforce policies provided by the mobile OS for user data. We implement a mobile OS that enforces SAFE policies on unmodified mobile apps and two systems for enforcing policies on untrusted cloud services. Using these prototypes, we demonstrate that it is possible to enforce existing user privacy policies on unmodified applications. △ Less

Submitted 14 August, 2020; originally announced August 2020.

arXiv:2008.01896 [pdf, other]

doi 10.1109/TMI.2021.3059282

A coarse-to-fine framework for unsupervised multi-contrast MR image deformable registration with dual consistency constraint

Authors: Weijian Huang, Hao Yang, Xinfeng Liu, Cheng Li, Ian Zhang, Rongpin Wang, Hairong Zheng, Shanshan Wang

Abstract: Multi-contrast magnetic resonance (MR) image registration is useful in the clinic to achieve fast and accurate imaging-based disease diagnosis and treatment planning. Nevertheless, the efficiency and performance of the existing registration algorithms can still be improved. In this paper, we propose a novel unsupervised learning-based framework to achieve accurate and efficient multi-contrast MR i… ▽ More Multi-contrast magnetic resonance (MR) image registration is useful in the clinic to achieve fast and accurate imaging-based disease diagnosis and treatment planning. Nevertheless, the efficiency and performance of the existing registration algorithms can still be improved. In this paper, we propose a novel unsupervised learning-based framework to achieve accurate and efficient multi-contrast MR image registrations. Specifically, an end-to-end coarse-to-fine network architecture consisting of affine and deformable transformations is designed to improve the robustness and achieve end-to-end registration. Furthermore, a dual consistency constraint and a new prior knowledge-based loss function are developed to enhance the registration performances. The proposed method has been evaluated on a clinical dataset containing 555 cases, and encouraging performances have been achieved. Compared to the commonly utilized registration methods, including VoxelMorph, SyN, and LT-Net, the proposed method achieves better registration performance with a Dice score of 0.8397 in identifying stroke lesions. With regards to the registration speed, our method is about 10 times faster than the most competitive method of SyN (Affine) when testing on a CPU. Moreover, we prove that our method can still perform well on more challenging tasks with lacking scanning information data, showing high robustness for the clinical application. △ Less

Submitted 16 February, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

Journal ref: IEEE Transactions on Medical Imaging (2021)

arXiv:2001.08250 [pdf, other]

doi 10.1145/3427228.3427231

Talek: Private Group Messaging with Hidden Access Patterns

Authors: Raymond Cheng, William Scott, Elisaweta Masserova, Irene Zhang, Vipul Goyal, Thomas Anderson, Arvind Krishnamurthy, Bryan Parno

Abstract: Talek is a private group messaging system that sends messages through potentially untrustworthy servers, while hiding both data content and the communication patterns among its users. Talek explores a new point in the design space of private messaging; it guarantees access sequence indistinguishability, which is among the strongest guarantees in the space, while assuming an anytrust threat model,… ▽ More Talek is a private group messaging system that sends messages through potentially untrustworthy servers, while hiding both data content and the communication patterns among its users. Talek explores a new point in the design space of private messaging; it guarantees access sequence indistinguishability, which is among the strongest guarantees in the space, while assuming an anytrust threat model, which is only slightly weaker than the strongest threat model currently found in related work. Our results suggest that this is a pragmatic point in the design space, since it supports strong privacy and good performance: we demonstrate a 3-server Talek cluster that achieves throughput of 9,433 messages/second for 32,000 active users with 1.7-second end-to-end latency. To achieve its security goals without coordination between clients, Talek relies on information-theoretic private information retrieval. To achieve good performance and minimize server-side storage, Talek introduces new techniques and optimizations that may be of independent interest, e.g., a novel use of blocked cuckoo hashing and support for private notifications. The latter provide a private, efficient mechanism for users to learn, without polling, which logs have new messages. △ Less

Submitted 15 December, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

arXiv:1910.03655 [pdf, other]

Executing Instructions in Situated Collaborative Interactions

Authors: Alane Suhr, Claudia Yan, Charlotte Schluger, Stanley Yu, Hadi Khader, Marwa Mouallem, Iris Zhang, Yoav Artzi

Abstract: We study a collaborative scenario where a user not only instructs a system to complete tasks, but also acts alongside it. This allows the user to adapt to the system abilities by changing their language or deciding to simply accomplish some tasks themselves, and requires the system to effectively recover from errors as the user strategically assigns it new goals. We build a game environment to stu… ▽ More We study a collaborative scenario where a user not only instructs a system to complete tasks, but also acts alongside it. This allows the user to adapt to the system abilities by changing their language or deciding to simply accomplish some tasks themselves, and requires the system to effectively recover from errors as the user strategically assigns it new goals. We build a game environment to study this scenario, and learn to map user instructions to system actions. We introduce a learning approach focused on recovery from cascading errors between instructions, and modeling methods to explicitly reason about instructions with multiple goals. We evaluate with a new evaluation protocol using recorded interactions and online games with human users, and observe how users adapt to the system abilities. △ Less

Submitted 22 November, 2022; v1 submitted 8 October, 2019; originally announced October 2019.

Comments: EMNLP 2019 long paper

arXiv:1905.13678 [pdf, other]

Learning Sparse Networks Using Targeted Dropout

Authors: Aidan N. Gomez, Ivan Zhang, Siddhartha Rao Kamalakara, Divyam Madaan, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton

Abstract: Neural networks are easier to optimise when they have many more weights than are required for modelling the mapping from inputs to outputs. This suggests a two-stage learning procedure that first learns a large net and then prunes away connections or hidden units. But standard training does not necessarily encourage nets to be amenable to pruning. We introduce targeted dropout, a method for traini… ▽ More Neural networks are easier to optimise when they have many more weights than are required for modelling the mapping from inputs to outputs. This suggests a two-stage learning procedure that first learns a large net and then prunes away connections or hidden units. But standard training does not necessarily encourage nets to be amenable to pruning. We introduce targeted dropout, a method for training a neural network so that it is robust to subsequent pruning. Before computing the gradients for each weight update, targeted dropout stochastically selects a set of units or weights to be dropped using a simple self-reinforcing sparsity criterion and then computes the gradients for the remaining weights. The resulting network is robust to post hoc pruning of weights or units that frequently occur in the dropped sets. The method improves upon more complicated sparsifying regularisers while being simple to implement and easy to tune. △ Less

Submitted 9 September, 2019; v1 submitted 31 May, 2019; originally announced May 2019.

arXiv:1811.00491 [pdf, other]

A Corpus for Reasoning About Natural Language Grounded in Photographs

Authors: Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, Yoav Artzi

Abstract: We introduce a new dataset for joint reasoning about natural language and images, with a focus on semantic diversity, compositionality, and visual reasoning challenges. The data contains 107,292 examples of English sentences paired with web photographs. The task is to determine whether a natural language caption is true about a pair of photographs. We crowdsource the data using sets of visually ri… ▽ More We introduce a new dataset for joint reasoning about natural language and images, with a focus on semantic diversity, compositionality, and visual reasoning challenges. The data contains 107,292 examples of English sentences paired with web photographs. The task is to determine whether a natural language caption is true about a pair of photographs. We crowdsource the data using sets of visually rich images and a compare-and-contrast task to elicit linguistically diverse language. Qualitative analysis shows the data requires compositional joint reasoning, including about quantities, comparisons, and relations. Evaluation using state-of-the-art visual reasoning methods shows the data presents a strong challenge. △ Less

Submitted 21 July, 2019; v1 submitted 1 November, 2018; originally announced November 2018.

Comments: ACL 2019 Long Paper

arXiv:1801.04883 [pdf, other]

Unsupervised Cipher Cracking Using Discrete GANs

Authors: Aidan N. Gomez, Sicong Huang, Ivan Zhang, Bryan M. Li, Muhammad Osama, Lukasz Kaiser

Abstract: This work details CipherGAN, an architecture inspired by CycleGAN used for inferring the underlying cipher mapping given banks of unpaired ciphertext and plaintext. We demonstrate that CipherGAN is capable of cracking language data enciphered using shift and Vigenere ciphers to a high degree of fidelity and for vocabularies much larger than previously achieved. We present how CycleGAN can be made… ▽ More This work details CipherGAN, an architecture inspired by CycleGAN used for inferring the underlying cipher mapping given banks of unpaired ciphertext and plaintext. We demonstrate that CipherGAN is capable of cracking language data enciphered using shift and Vigenere ciphers to a high degree of fidelity and for vocabularies much larger than previously achieved. We present how CycleGAN can be made compatible with discrete data and train in a stable way. We then prove that the technique used in CipherGAN avoids the common problem of uninformative discrimination associated with GANs applied to discrete data. △ Less

Submitted 15 January, 2018; originally announced January 2018.

Showing 1–18 of 18 results for author: Zhang, I