
Showing 1–40 of 40 results for author: Reid, M

Searching in archive cs.
  1. arXiv:2408.12655 [pdf, other]

    cs.LG cs.HC

    Improving Radiography Machine Learning Workflows via Metadata Management for Training Data Selection

    Authors: Mirabel Reid, Christine Sweeney, Oleg Korobkin

    Abstract: Most machine learning models require many iterations of hyper-parameter tuning, feature engineering, and debugging to produce effective results. As machine learning models become more complicated, this pipeline becomes more difficult to manage effectively. In the physical sciences, there is an ever-increasing pool of metadata that is generated by the scientific research cycle. Tracking this metada…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 14 pages, 9 figures

  2. arXiv:2408.00118 [pdf, other]

    cs.CL cs.AI

    Gemma 2: Improving Open Language Models at a Practical Size

    Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman, et al. (172 additional authors not shown)

    Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al…

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  3. arXiv:2406.14722 [pdf, other]

    cs.AI

    Does GPT Really Get It? A Hierarchical Scale to Quantify Human vs AI's Understanding of Algorithms

    Authors: Mirabel Reid, Santosh S. Vempala

    Abstract: As Large Language Models (LLMs) perform (and sometimes excel at) more and more complex cognitive tasks, a natural question is whether AI really understands. The study of understanding in LLMs is in its infancy, and the community has yet to incorporate well-trodden research in philosophy, psychology, and education. We initiate this, specifically focusing on understanding algorithms, and propose a h…

    Submitted 20 August, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 13 pages, 8 figures

    ACM Class: I.2.m; F.1.1

  4. arXiv:2403.08295 [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  5. arXiv:2403.05530 [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  6. arXiv:2312.11805 [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  7. arXiv:2305.14857 [pdf, other]

    cs.CL

    BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer

    Authors: Akari Asai, Sneha Kudugunta, Xinyan Velocity Yu, Terra Blevins, Hila Gonen, Machel Reid, Yulia Tsvetkov, Sebastian Ruder, Hannaneh Hajishirzi

    Abstract: Despite remarkable advancements in few-shot generalization in natural language processing, most models are developed and evaluated primarily in English. To facilitate research on few-shot cross-lingual transfer, we introduce a new benchmark, called BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format and provides a fixed set of few-shot examples and instructi…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: The data and code are available at https://buffetfs.github.io/

  8. arXiv:2305.14224 [pdf, other]

    cs.CL

    mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations

    Authors: Jonas Pfeiffer, Francesco Piccinno, Massimo Nicosia, Xinyi Wang, Machel Reid, Sebastian Ruder

    Abstract: Multilingual sequence-to-sequence models perform poorly with increased language coverage and fail to consistently generate text in the correct target language in few-shot settings. To address these challenges, we propose mmT5, a modular multilingual sequence-to-sequence model. mmT5 utilizes language-specific modules during pre-training, which disentangle language-specific information from language…

    Submitted 23 May, 2023; originally announced May 2023.

  9. arXiv:2212.10173 [pdf, other]

    cs.CL

    On the Role of Parallel Data in Cross-lingual Transfer Learning

    Authors: Machel Reid, Mikel Artetxe

    Abstract: While prior work has established that the use of parallel data is conducive to cross-lingual learning, it is unclear if the improvements come from the data itself, or if it is the modeling of parallel interactions that matters. Exploring this, we examine the usage of unsupervised machine translation to generate synthetic parallel data, and compare it to supervised machine translation and gold par…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: Preprint

  10. arXiv:2210.16886 [pdf, other]

    cs.CL cs.LG

    DiffusER: Discrete Diffusion via Edit-based Reconstruction

    Authors: Machel Reid, Vincent J. Hellendoorn, Graham Neubig

    Abstract: In text generation, models that generate text from scratch one token at a time are currently the dominant paradigm. Despite being performant, these models lack the ability to revise existing text, which limits their usability in many practical scenarios. We look to address this, with DiffusER (Diffusion via Edit-based Reconstruction), a new edit-based generative model for text based on denoising d…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: Preprint. Work in progress

  11. arXiv:2210.07370 [pdf, other]

    cs.CL

    M2D2: A Massively Multi-domain Language Modeling Dataset

    Authors: Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer

    Abstract: We present M2D2, a fine-grained, massively multi-domain corpus for studying domain adaptation in language models (LMs). M2D2 consists of 8.5B tokens and spans 145 domains extracted from Wikipedia and Semantic Scholar. Using ontologies derived from Wikipedia and ArXiv categories, we organize the domains in each data source into 22 groups. This two-level hierarchy enables the study of relationships…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  12. arXiv:2205.12374 [pdf, other]

    cs.CL cs.LG

    Learning to Model Editing Processes

    Authors: Machel Reid, Graham Neubig

    Abstract: Most existing sequence generation models produce outputs in one pass, usually left-to-right. However, this is in contrast with a more natural approach that humans use in generating content: iterative refinement and editing. Recent work has introduced edit-based models for various tasks (such as neural machine translation and text style transfer), but these generally model a single edit step. In th…

    Submitted 24 May, 2022; originally announced May 2022.

  13. arXiv:2205.11916 [pdf, other]

    cs.CL cs.AI cs.LG

    Large Language Models are Zero-Shot Reasoners

    Authors: Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa

    Abstract: Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars. Notably, chain of thought (CoT) prompting, a recent technique for eliciting complex multi-step reasoning through step-by-step answer examples, achieved the state-of-the-art performances in arithmetics and sy…

    Submitted 29 January, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Accepted to NeurIPS 2022. Our code is available at https://github.com/kojima-takeshi188/zero_shot_cot
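    The two-stage zero-shot chain-of-thought prompting this paper is known for can be sketched as plain prompt construction. This is a minimal sketch, not the authors' released code: the reasoning trigger "Let's think step by step." is the phrase the paper popularised, while the answer-extraction wording, the helper names, and the `generate` callable (any text-completion function) are assumptions for illustration.

```python
from typing import Callable

# Stage-1 trigger used by zero-shot CoT; the stage-2 wording is an assumption.
REASONING_TRIGGER = "Let's think step by step."
ANSWER_TRIGGER = "Therefore, the answer (arabic numerals) is"


def build_reasoning_prompt(question: str) -> str:
    """Stage 1: ask the model to produce a free-form reasoning chain."""
    return f"Q: {question}\nA: {REASONING_TRIGGER}"


def build_answer_prompt(question: str, reasoning: str) -> str:
    """Stage 2: append the generated reasoning, then ask for the final answer."""
    return f"{build_reasoning_prompt(question)} {reasoning}\n{ANSWER_TRIGGER}"


def zero_shot_cot(question: str, generate: Callable[[str], str]) -> str:
    """Run both stages with a hypothetical text-completion function `generate`."""
    reasoning = generate(build_reasoning_prompt(question))
    return generate(build_answer_prompt(question, reasoning)).strip()
```

    No exemplars appear in either prompt, which is what distinguishes this from few-shot CoT; any model client can be wrapped as `generate`.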

  14. arXiv:2205.09488 [pdf]

    cs.SE cs.LG cs.NI

    PSI Draft Specification

    Authors: Mark Reid, James Montgomery, Barry Drake, Avraham Ruderman

    Abstract: This document presents the draft specification for delivering machine learning services over HTTP, developed as part of the Protocols and Structures for Inference project, which concluded in 2013. It presents the motivation for providing machine learning as a service, followed by a description of the essential and optional components of such a service.

    Submitted 1 May, 2022; originally announced May 2022.

    Comments: Software specification for PSI machine learning web services. 42 pages, 2 figures

  15. arXiv:2205.02022 [pdf, other]

    cs.CL

    A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation

    Authors: David Ifeoluwa Adelani, Jesujoba Oluwadara Alabi, Angela Fan, Julia Kreutzer, Xiaoyu Shen, Machel Reid, Dana Ruiter, Dietrich Klakow, Peter Nabende, Ernie Chang, Tajuddeen Gwadabe, Freshia Sackey, Bonaventure F. P. Dossou, Chris Chinenye Emezue, Colin Leong, Michael Beukman, Shamsuddeen Hassan Muhammad, Guyo Dub Jarso, Oreen Yousuf, Andre Niyongabo Rubungo, Gilles Hacheme, Eric Peter Wairagala, Muhammad Umair Nasir, Benjamin Ayoade Ajibade, Tunde Oluwaseyi Ajayi, et al. (20 additional authors not shown)

    Abstract: Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represented on the web and therefore excluded from the large-scale crawls used to create datasets. Furthermore, downstream users of these models…

    Submitted 22 August, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: Accepted to NAACL 2022 (added evaluation data for amh, kin, nya, sna, xho)

  16. arXiv:2203.12680 [pdf, other]

    math.PR cs.DM

    The $k$-Cap Process on Geometric Random Graphs

    Authors: Mirabel Reid, Santosh S. Vempala

    Abstract: The $k$-cap (or $k$-winners-take-all) process on a graph works as follows: in each iteration, exactly $k$ vertices of the graph are in the cap (i.e., winners); the next round winners are the vertices that have the highest total degree to the current winners, with ties broken randomly. This natural process is a simple model of firing activity in the brain. We study its convergence on geometric rand…

    Submitted 15 November, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Revised to extend the analysis of the discrete k-cap process from 1-D interval graphs to d-dimensional graphs for constant d

    MSC Class: 37E05; 37E25

    ACM Class: F.1.1; G.3
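    The k-cap update described in this abstract (exactly k winners per round; the next winners are the vertices with the most edges into the current winners, ties broken at random) can be simulated directly. A minimal sketch under stated assumptions: the 1-D geometric random graph (uniform points in [0, 1], an edge whenever two points lie within a hypothetical `radius`), the absence of self-loops, and all function names are illustrative choices, not necessarily the paper's exact model.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def geometric_graph(n: int, radius: float, rng: random.Random) -> Graph:
    """Assumed 1-D model: n uniform points in [0, 1], edge if within radius."""
    xs = [rng.random() for _ in range(n)]
    adj: Graph = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if abs(xs[u] - xs[v]) <= radius:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def k_cap_step(adj: Graph, winners: Set[int], k: int, rng: random.Random) -> Set[int]:
    """Next cap: the k vertices with the most edges into the current winners."""
    score = {v: len(nbrs & winners) for v, nbrs in adj.items()}
    order = list(adj)
    rng.shuffle(order)  # random order first, so that ...
    order.sort(key=lambda v: score[v], reverse=True)  # ... the stable sort breaks ties randomly
    return set(order[:k])


def run_k_cap(adj: Graph, k: int, rounds: int, seed: int = 0) -> List[Set[int]]:
    """Start from k uniformly random winners and record the cap after each round."""
    rng = random.Random(seed)
    winners = set(rng.sample(sorted(adj), k))
    history = [winners]
    for _ in range(rounds):
        winners = k_cap_step(adj, winners, k, rng)
        history.append(winners)
    return history
```

    Python's sort stays stable even with `reverse=True`, so shuffling before sorting gives a uniformly random choice among tied vertices at the cut-off.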

  17. arXiv:2201.12122 [pdf, other]

    cs.LG cs.AI cs.CL

    Can Wikipedia Help Offline Reinforcement Learning?

    Authors: Machel Reid, Yutaro Yamada, Shixiang Shane Gu

    Abstract: Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of large-scale off-the-shelf datasets as well as high variance in transferability among different environments. Recent work has looked at tackling offline RL from the perspective of sequence modeling with improved results as a result of the introduction of the Transformer architecture. However, when the model is tr…

    Submitted 23 July, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

  18. arXiv:2109.04715 [pdf, other]

    cs.CL

    AfroMT: Pretraining Strategies and Reproducible Benchmarks for Translation of 8 African Languages

    Authors: Machel Reid, Junjie Hu, Graham Neubig, Yutaka Matsuo

    Abstract: Reproducible benchmarks are crucial in driving progress of machine translation research. However, existing machine translation benchmarks have been mostly limited to high-resource or well-represented languages. Despite an increasing interest in low-resource machine translation, there are no standardized reproducible benchmarks for many African languages, many of which are used by millions of speak…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  19. arXiv:2108.01887 [pdf, other]

    cs.CL cs.LG

    PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining

    Authors: Machel Reid, Mikel Artetxe

    Abstract: Despite the success of multilingual sequence-to-sequence pretraining, most existing approaches rely on monolingual corpora, and do not make use of the strong cross-lingual signal contained in parallel data. In this paper, we present PARADISE (PARAllel & Denoising Integration in SEquence-to-sequence models), which extends the conventional denoising objective used to train these models by (i) replac…

    Submitted 4 August, 2021; originally announced August 2021.

    Comments: Preprint

  20. arXiv:2105.08206 [pdf, other]

    cs.CL

    LEWIS: Levenshtein Editing for Unsupervised Text Style Transfer

    Authors: Machel Reid, Victor Zhong

    Abstract: Many types of text style transfer can be achieved with only small, precise edits (e.g., sentiment transfer from "I had a terrible time..." to "I had a great time..."). We propose a coarse-to-fine editor for style transfer that transforms text using Levenshtein edit operations (e.g., insert, replace, delete). Unlike prior single-span edit methods, our method concurrently edits multiple spans in the sourc…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: ACL-IJCNLP 2021 (Findings)
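    The idea of expressing a style change as token-level Levenshtein operations can be illustrated with the standard edit-distance dynamic program plus a backtrace. This is a minimal sketch under stated assumptions: it recovers one minimal keep/replace/insert/delete script between two whitespace-tokenised sentences; it is not the LEWIS coarse-to-fine editor, and `levenshtein_ops` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

EditOp = Tuple[str, str, str]  # (operation, source token, target token)


def levenshtein_ops(src: Sequence[str], tgt: Sequence[str]) -> Tuple[int, List[EditOp]]:
    """Token-level edit distance plus one minimal keep/replace/insert/delete script."""
    n, m = len(src), len(tgt)
    # d[i][j] = edit distance between src[:i] and tgt[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete src[i-1]
                          d[i][j - 1] + 1,         # insert tgt[j-1]
                          d[i - 1][j - 1] + cost)  # keep or replace
    # Walk back from the bottom-right cell to recover the operations.
    ops: List[EditOp] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            kind = "keep" if src[i - 1] == tgt[j - 1] else "replace"
            ops.append((kind, src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", src[i - 1], ""))
            i -= 1
        else:
            ops.append(("insert", "", tgt[j - 1]))
            j -= 1
    ops.reverse()
    return d[n][m], ops
```

    On the abstract's example pair, "I had a terrible time" and "I had a great time", the script is four keeps and a single replace of "terrible" with "great".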

  21. arXiv:2101.11489 [pdf, other]

    hep-ex cs.DC

    Parallelizing the Unpacking and Clustering of Detector Data for Reconstruction of Charged Particle Tracks on Multi-core CPUs and Many-core GPUs

    Authors: Giuseppe Cerati, Peter Elmer, Brian Gravelle, Matti Kortelainen, Vyacheslav Krutelyov, Steven Lantz, Mario Masciovecchio, Kevin McDermott, Boyana Norris, Allison Reinsvold Hall, Micheal Reid, Daniel Riley, Matevž Tadel, Peter Wittich, Bei Wang, Frank Würthwein, Avraham Yagil

    Abstract: We present results from parallelizing the unpacking and clustering steps of the raw data from the silicon strip modules for reconstruction of charged particle tracks. Throughput is further improved by concurrently processing multiple events using nested OpenMP parallelism on CPU or CUDA streams on GPU. The new implementation along with earlier work in developing a parallelized and vectorized imple…

    Submitted 27 January, 2021; originally announced January 2021.

  22. arXiv:2101.00234 [pdf, other]

    cs.CL cs.LG

    Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers

    Authors: Machel Reid, Edison Marrese-Taylor, Yutaka Matsuo

    Abstract: Transformers have shown improved performance when compared to previous architectures for sequence processing such as RNNs. Despite their sizeable performance gains, as recently suggested, the model is computationally expensive to train and has a high parameter budget. In light of this, we explore parameter-sharing methods in Transformers with a specific focus on generative models. We perform an a…

    Submitted 8 September, 2021; v1 submitted 1 January, 2021; originally announced January 2021.

    Comments: EMNLP 2021 Findings

  23. arXiv:2010.03124 [pdf, other]

    cs.CL cs.LG

    VCDM: Leveraging Variational Bi-encoding and Deep Contextualized Word Representations for Improved Definition Modeling

    Authors: Machel Reid, Edison Marrese-Taylor, Yutaka Matsuo

    Abstract: In this paper, we tackle the task of definition modeling, where the goal is to learn to generate definitions of words and phrases. Existing approaches for this task are discriminative, combining distributional and lexical semantics in an implicit rather than direct way. To tackle this issue we propose a generative model for the task, introducing a continuous latent variable to explicitly model the…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020, 10 pages

  24. arXiv:2004.09143 [pdf, other]

    cs.CL

    Variational Inference for Learning Representations of Natural Language Edits

    Authors: Edison Marrese-Taylor, Machel Reid, Yutaka Matsuo

    Abstract: Document editing has become a pervasive component of the production of information, with version control systems enabling edits to be efficiently stored and applied. In light of this, the task of learning distributed representations of edits has been recently proposed. With this in mind, we propose a novel approach that employs variational inference to learn a continuous latent space of vector rep…

    Submitted 3 January, 2021; v1 submitted 20 April, 2020; originally announced April 2020.

    Comments: Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

  25. arXiv:2003.04419 [pdf, ps, other]

    cs.CL cs.LG

    Combining Pretrained High-Resource Embeddings and Subword Representations for Low-Resource Languages

    Authors: Machel Reid, Edison Marrese-Taylor, Yutaka Matsuo

    Abstract: The contrast between the need for large amounts of data for current Natural Language Processing (NLP) techniques, and the lack thereof, is accentuated in the case of African languages, most of which are considered low-resource. To help circumvent this issue, we explore techniques exploiting the qualities of morphologically rich languages (MRLs), while leveraging pretrained word vectors in well-res…

    Submitted 21 April, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

    Comments: Accepted to the "AfricaNLP - Unlocking Local Languages" workshop at ICLR 2020

  26. arXiv:2003.04032 [pdf, other]

    cs.CL

    Shallow Discourse Annotation for Chinese TED Talks

    Authors: Wanqiu Long, Xinyi Cai, James E. M. Reid, Bonnie Webber, Deyi Xiong

    Abstract: Text corpora annotated with language-related properties are an important resource for the development of Language Technology. The current work contributes a new resource for Chinese Language Technology and for Chinese-English translation, in the form of a set of TED talks (some originally given in English, some in Chinese) that have been annotated with discourse relations in the style of the Penn…

    Submitted 6 April, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

  27. arXiv:1606.03203 [pdf, other]

    stat.ML cs.LG

    Causal Bandits: Learning Good Interventions via Causal Inference

    Authors: Finnian Lattimore, Tor Lattimore, Mark D. Reid

    Abstract: We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment. Our formalism combines multi-arm bandits and causal inference to model a novel type of bandit feedback that is not exploited by existing approaches. We propose a new algorithm that exploits the causal feedback and prove a bound on its simple regret that is…

    Submitted 10 June, 2016; originally announced June 2016.

  28. arXiv:1602.02852 [pdf, other]

    stat.ML cs.LG

    Compliance-Aware Bandits

    Authors: Nicolás Della Penna, Mark D. Reid, David Balduzzi

    Abstract: Motivated by clinical trials, we study bandits with observable non-compliance. At each step, the learner chooses an arm; afterwards, instead of observing only the reward, it also observes the action that took place. We show that such non-compliance can be helpful or hurtful to the learner in general. Unfortunately, naively incorporating compliance information into bandit algorithms loses guarantees on s…

    Submitted 8 February, 2016; originally announced February 2016.

  29. arXiv:1507.02592 [pdf, other]

    cs.LG stat.ML

    Fast rates in statistical and online learning

    Authors: Tim van Erven, Peter D. Grünwald, Nishant A. Mehta, Mark D. Reid, Robert C. Williamson

    Abstract: The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning: a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most…

    Submitted 1 September, 2015; v1 submitted 9 July, 2015; originally announced July 2015.

    Comments: 69 pages, 3 figures

    Journal ref: Journal of Machine Learning Research 16(54):1793-1861, 2015

  30. arXiv:1410.0413 [pdf, other]

    cs.GT cs.AI math.OC

    Risk Dynamics in Trade Networks

    Authors: Rafael M. Frongillo, Mark D. Reid

    Abstract: We introduce a new framework to model interactions among agents which seek to trade to minimize their risk with respect to some future outcome. We quantify this risk using the concept of risk measures from finance, and introduce a class of trade dynamics which allow agents to trade contracts contingent upon the future outcome. We then show that these trade dynamics exactly correspond to a variant…

    Submitted 9 October, 2014; v1 submitted 1 October, 2014; originally announced October 2014.

  31. arXiv:1406.6130 [pdf, other]

    cs.LG

    Generalized Mixability via Entropic Duality

    Authors: Mark D. Reid, Rafael M. Frongillo, Robert C. Williamson, Nishant Mehta

    Abstract: Mixability is a property of a loss which characterizes when fast convergence is possible in the game of prediction with expert advice. We show that a key property of mixability generalizes, and the exp and log operations present in the usual theory are not as special as one might have thought. In doing this we introduce a more general notion of $Φ$-mixability where $Φ$ is a general entropy (i.e., a…

    Submitted 23 June, 2014; originally announced June 2014.

    Comments: 20 pages, 1 figure. Supersedes the work in arXiv:1403.2433 [cs.LG]

  32. arXiv:1403.2433 [pdf, ps, other]

    cs.LG stat.ML

    Generalised Mixability, Constant Regret, and Bayesian Updating

    Authors: Mark D. Reid, Rafael M. Frongillo, Robert C. Williamson

    Abstract: Mixability of a loss is known to characterise when constant regret bounds are achievable in games of prediction with expert advice through the use of Vovk's aggregating algorithm. We provide a new interpretation of mixability via convex analysis that highlights the role of the Kullback-Leibler divergence in its definition. This naturally generalises to what we call $Φ$-mixability where the Bregman…

    Submitted 10 March, 2014; originally announced March 2014.

    Comments: 12 pages
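    The link between mixability, constant regret, and Bayesian updating is easiest to see in the classical special case. A minimal sketch, assuming binary outcomes and log loss with learning rate η = 1: there, Vovk's aggregating algorithm predicts with the posterior-weighted mixture of the experts, the exponential-weights update coincides with Bayes' rule, and the regret to the best of N experts is at most ln N. This is the standard case, not the Φ-mixability generalisation the paper proposes; the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def aggregating_log_loss(expert_probs: Sequence[Sequence[float]],
                         outcomes: Sequence[int]) -> Tuple[float, List[float]]:
    """Aggregating algorithm for binary log loss with eta = 1.

    expert_probs[t][i] is expert i's probability (strictly in (0, 1)) that
    outcome t equals 1. Returns the learner's and each expert's cumulative loss.
    """
    n = len(expert_probs[0])
    weights = [1.0 / n] * n  # uniform prior over the experts
    learner_loss = 0.0
    expert_loss = [0.0] * n
    for probs, y in zip(expert_probs, outcomes):
        # For log loss with eta = 1 the prediction is the weighted mixture.
        p_one = sum(w * q for w, q in zip(weights, probs))
        learner_loss -= math.log(p_one if y == 1 else 1.0 - p_one)
        # Likelihood each expert assigned to the outcome that occurred.
        lik = [q if y == 1 else 1.0 - q for q in probs]
        expert_loss = [acc - math.log(lk) for acc, lk in zip(expert_loss, lik)]
        # exp(-loss) equals the likelihood, so this update is Bayes' rule.
        weights = [w * lk for w, lk in zip(weights, lik)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return learner_loss, expert_loss
```

    Because the learner's cumulative loss equals -ln((1/N) Σ_i exp(-L_i)), where L_i is expert i's cumulative loss, the regret bound ln N holds for every outcome sequence, which is the constant-regret behaviour the abstract refers to.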

  33. arXiv:1402.1921 [pdf, other]

    cs.LG cs.AI cs.CV

    A Hybrid Loss for Multiclass and Structured Prediction

    Authors: Qinfeng Shi, Mark Reid, Tiberio Caetano, Anton van den Hengel, Zhenhua Wang

    Abstract: We propose a novel hybrid loss for multiclass and structured prediction problems that is a convex combination of a log loss for Conditional Random Fields (CRFs) and a multiclass hinge loss for Support Vector Machines (SVMs). We provide a sufficient condition for when the hybrid loss is Fisher consistent for classification. This condition depends on a measure of dominance between labels, specifical…

    Submitted 9 February, 2014; originally announced February 2014.

    Comments: 12 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:1009.3346
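    For flat multiclass prediction, the hybrid loss described above can be written down directly from per-class scores. A minimal sketch under stated assumptions: the hinge term uses the Crammer-Singer form with a 0/1 margin, the mixing weight `alpha` (weighting the log loss) and all function names are hypothetical, and the structured-output case is not covered.

```python
import math
from typing import Sequence


def crf_log_loss(scores: Sequence[float], y: int) -> float:
    """Multiclass log loss -log softmax(scores)[y], computed stably."""
    m = max(scores)
    log_partition = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_partition - scores[y]


def multiclass_hinge(scores: Sequence[float], y: int) -> float:
    """Assumed Crammer-Singer hinge: margin of the true class over the best rival.

    Requires at least two classes.
    """
    rival = max(s for j, s in enumerate(scores) if j != y)
    return max(0.0, 1.0 + rival - scores[y])


def hybrid_loss(scores: Sequence[float], y: int, alpha: float) -> float:
    """Convex combination: alpha * log loss + (1 - alpha) * hinge loss."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * crf_log_loss(scores, y) + (1.0 - alpha) * multiclass_hinge(scores, y)
```

    Setting `alpha` to 1 or 0 recovers the pure CRF log loss or the pure hinge loss; intermediate values trade the hinge's sparsity in the margin against the log loss's probabilistic calibration.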

  34. arXiv:1206.4664 [pdf]

    cs.LG stat.ML

    Tighter Variational Representations of f-Divergences via Restriction to Probability Measures

    Authors: Avraham Ruderman, Mark Reid, Dario Garcia-Garcia, James Petterson

    Abstract: We show that the variational representations for f-divergences currently used in the literature can be tightened. This has implications for a number of methods recently proposed based on this representation. As an example application we use our tighter representation to derive a general f-divergence estimator based on two i.i.d. samples and derive the dual program for this estimator that performs w…

    Submitted 18 June, 2012; originally announced June 2012.

    Comments: ICML 2012

  35. arXiv:1206.4663 [pdf]

    cs.LG stat.ML

    The Convexity and Design of Composite Multiclass Losses

    Authors: Mark Reid, Robert Williamson, Peng Sun

    Abstract: We consider composite loss functions for multiclass prediction comprising a proper (i.e., Fisher-consistent) loss over probability distributions and an inverse link function. We establish conditions for their (strong) convexity and explore the implications. We also show how the separation of concerns afforded by using this composite representation allows for the design of families of losses with t…

    Submitted 18 June, 2012; originally announced June 2012.

    Comments: ICML 2012
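    The composite structure above (a proper loss on the probability simplex applied after an inverse link from real-valued scores) separates cleanly in code. A minimal sketch, assuming softmax as the inverse link and log loss or Brier (squared) loss as the proper loss; these are standard textbook examples chosen for illustration, not necessarily the families studied in the paper, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

ProperLoss = Callable[[Sequence[float], int], float]
InverseLink = Callable[[Sequence[float]], List[float]]


def softmax(v: Sequence[float]) -> List[float]:
    """Inverse link mapping real-valued scores onto the probability simplex."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    z = sum(e)
    return [x / z for x in e]


def proper_log_loss(p: Sequence[float], y: int) -> float:
    """Log loss on a probability vector (strictly proper)."""
    return -math.log(p[y])


def proper_brier_loss(p: Sequence[float], y: int) -> float:
    """Squared distance to the one-hot label (strictly proper)."""
    return sum((pi - (1.0 if i == y else 0.0)) ** 2 for i, pi in enumerate(p))


def composite(proper: ProperLoss, inverse_link: InverseLink) -> Callable[[Sequence[float], int], float]:
    """Composite loss: evaluate the proper loss on the inverse link of the scores."""
    def loss(v: Sequence[float], y: int) -> float:
        return proper(inverse_link(v), y)
    return loss
```

    Log loss composed with softmax reduces to the familiar logsumexp(v) - v_y, which is convex in the scores; swapping in another proper loss reuses the same link, which is the kind of separation of concerns the abstract describes.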

  36. arXiv:1204.3511 [pdf, ps, other]

    cs.SI cs.GT

    Crowd & Prejudice: An Impossibility Theorem for Crowd Labelling without a Gold Standard

    Authors: Nicolás Della Penna, Mark D. Reid

    Abstract: A common use of crowdsourcing is to obtain labels for a dataset. Several algorithms have been proposed to identify uninformative members of the crowd so that their labels can be disregarded and the cost of paying them avoided. One common motivation of these algorithms is to try to do without any initial set of trusted labeled data. We analyse this class of algorithms as mechanisms in a game-theo…

    Submitted 16 April, 2012; originally announced April 2012.

    Comments: Presented at Collective Intelligence conference, 2012 (arXiv:1204.2991)

    Report number: CollectiveIntelligence/2012/33

  37. arXiv:1112.0076 [pdf, other]

    q-fin.TR cs.GT stat.ML

    Bandit Market Makers

    Authors: Nicolas Della Penna, Mark D. Reid

    Abstract: We introduce a modular framework for market making. It combines cost-function based automated market makers with bandit algorithms. We obtain worst-case profit guarantees relative to the best in hindsight within a class of natural "overround" cost functions. This combination allows us to have distribution-free guarantees on the regret of profits while preserving the bounded worst-case losses and…

    Submitted 1 August, 2013; v1 submitted 30 November, 2011; originally announced December 2011.

    Comments: A previous version of this work appeared in the NIPS 2011 Workshop on Computational Social Science and the Wisdom of the Crowds

  38. arXiv:1110.3907 [pdf, ps, other]

    stat.ML cs.AI cs.CV

    AOSO-LogitBoost: Adaptive One-Vs-One LogitBoost for Multi-Class Problem

    Authors: Peng Sun, Mark D. Reid, Jie Zhou

    Abstract: This paper presents an improvement to model learning when using multi-class LogitBoost for classification. Motivated by the statistical view, LogitBoost can be seen as additive tree regression. Two important factors in this setting are: 1) coupled classifier output due to a sum-to-zero constraint, and 2) the dense Hessian matrices that arise when computing tree node split gain and node value fitti…

    Submitted 4 July, 2012; v1 submitted 18 October, 2011; originally announced October 2011.

    Comments: 8-page camera-ready version for ICML 2012

  39. arXiv:1009.3346 [pdf, other]

    cs.LG

    Conditional Random Fields and Support Vector Machines: A Hybrid Approach

    Authors: Qinfeng Shi, Mark D. Reid, Tiberio Caetano

    Abstract: We propose a novel hybrid loss for multiclass and structured prediction problems that is a convex combination of log loss for Conditional Random Fields (CRFs) and a multiclass hinge loss for Support Vector Machines (SVMs). We provide a sufficient condition for when the hybrid loss is Fisher consistent for classification. This condition depends on a measure of dominance between labels, specificall…

    Submitted 17 September, 2010; originally announced September 2010.

    Comments: 16 pages, 3 figures

  40. arXiv:0906.1244 [pdf, other]

    cs.IT

    Generalised Pinsker Inequalities

    Authors: Mark D. Reid, Robert C. Williamson

    Abstract: We generalise the classical Pinsker inequality which relates variational divergence to Kullback-Leibler divergence in two ways: we consider arbitrary f-divergences in place of KL divergence, and we assume knowledge of a sequence of values of generalised variational divergences. We then develop a best possible inequality for this doubly generalised situation. Specialising our result to the classi…

    Submitted 5 June, 2009; originally announced June 2009.

    Comments: 21 pages, 3 figures, accepted to COLT 2009
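    Both quantities in the classical Pinsker inequality are f-divergences, which is the viewpoint this abstract generalises. A minimal numerical sketch, assuming finite distributions with every q_i > 0 and the convention V(P, Q) = Σ_i |p_i - q_i| (some texts halve this, which changes the constant): KL uses the generator f(t) = t ln t, V uses f(t) = |t - 1|, and the classical inequality reads KL(P‖Q) ≥ V(P, Q)² / 2. The function names are hypothetical, and this checks only the classical case, not the paper's generalised bound.

```python
import math
from typing import Callable, Sequence


def f_divergence(p: Sequence[float], q: Sequence[float],
                 f: Callable[[float], float]) -> float:
    """D_f(P, Q) = sum_i q_i * f(p_i / q_i), assuming every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t: float) -> float:
    # f(t) = t ln t gives KL(P || Q); the convention 0 ln 0 = 0 handles p_i = 0.
    return t * math.log(t) if t > 0 else 0.0


def variational_generator(t: float) -> float:
    # f(t) = |t - 1| gives V(P, Q) = sum_i |p_i - q_i|.
    return abs(t - 1.0)


def pinsker_holds(p: Sequence[float], q: Sequence[float]) -> bool:
    """Check the classical bound KL(P || Q) >= V(P, Q)^2 / 2 (natural log)."""
    kl = f_divergence(p, q, kl_generator)
    v = f_divergence(p, q, variational_generator)
    return kl >= v * v / 2.0 - 1e-12  # small tolerance for rounding
```

    For P = (0.1, 0.6, 0.3) and Q = (0.3, 0.3, 0.4), V = 0.6, so the bound requires KL ≥ 0.18; the KL divergence here is roughly 0.22.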