Search | arXiv e-print repository

Retrieval-Enhanced Machine Learning: Synthesis and Opportunities

Authors: To Eun Kim, Alireza Salemi, Andrew Drozdov, Fernando Diaz, Hamed Zamani

Abstract: In the field of language modeling, models augmented with retrieval components have emerged as a promising solution to address several challenges faced in the natural language processing (NLP) field, including knowledge grounding, interpretability, and scalability. Despite the primary focus on NLP, we posit that the paradigm of retrieval-enhancement can be extended to a broader spectrum of machine… ▽ More In the field of language modeling, models augmented with retrieval components have emerged as a promising solution to address several challenges faced in the natural language processing (NLP) field, including knowledge grounding, interpretability, and scalability. Despite the primary focus on NLP, we posit that the paradigm of retrieval-enhancement can be extended to a broader spectrum of machine learning (ML) such as computer vision, time series prediction, and computational biology. Therefore, this work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature in various domains in ML with consistent notations which is missing from the current literature. Also, we found that while a number of studies employ retrieval components to augment their models, there is a lack of integration with foundational Information Retrieval (IR) research. We bridge this gap between the seminal IR research and contemporary REML studies by investigating each component that comprises the REML framework. Ultimately, the goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research. △ Less

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2406.11565 [pdf, other]

Extrinsic Evaluation of Cultural Competence in Large Language Models

Authors: Shaily Bhatt, Fernando Diaz

Abstract: Productive interactions between diverse users and language technologies require outputs from the latter to be culturally relevant and sensitive. Prior works have evaluated models' knowledge of cultural norms, values, and artifacts, without considering how this knowledge manifests in downstream applications. In this work, we focus on extrinsic evaluation of cultural competence in two text generatio… ▽ More Productive interactions between diverse users and language technologies require outputs from the latter to be culturally relevant and sensitive. Prior works have evaluated models' knowledge of cultural norms, values, and artifacts, without considering how this knowledge manifests in downstream applications. In this work, we focus on extrinsic evaluation of cultural competence in two text generation tasks, open-ended question answering and story generation. We quantitatively and qualitatively evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts. Although we find that model outputs do vary when varying nationalities and feature culturally relevant words, we also find weak correlations between text similarity of outputs for different countries and the cultural values of these countries. Finally, we discuss important considerations in designing comprehensive evaluation of cultural competence in user-facing tasks. △ Less

Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Under peer review

arXiv:2402.01849 [pdf, other]

doi 10.1016/j.engappai.2020.104113

Capturing waste collection planning expert knowledge in a fitness function through preference learning

Authors: Laura Fernández Díaz, Miriam Fernández Díaz, José Ramón Quevedo, Elena Montañés

Abstract: This paper copes with the COGERSA waste collection process. Up to now, experts have been manually designed the process using a trial and error mechanism. This process is not globally optimized, since it has been progressively and locally built as council demands appear. Planning optimization algorithms usually solve it, but they need a fitness function to evaluate a route planning quality. The dra… ▽ More This paper copes with the COGERSA waste collection process. Up to now, experts have been manually designed the process using a trial and error mechanism. This process is not globally optimized, since it has been progressively and locally built as council demands appear. Planning optimization algorithms usually solve it, but they need a fitness function to evaluate a route planning quality. The drawback is that even experts are not able to propose one in a straightforward way due to the complexity of the process. Hence, the goal of this paper is to build a fitness function though a preference framework, taking advantage of the available expert knowledge and expertise. Several key performance indicators together with preference judgments are carefully established according to the experts for learning a promising fitness function. Particularly, the additivity property of them makes the task be much more affordable, since it allows to work with routes rather than with route plannings. Besides, a feature selection analysis is performed over such indicators, since the experts suspect of a potential existing (but unknown) redundancy among them. The experiment results confirm this hypothesis, since the best $C-$index ($98\%$ against around $94\%$) is reached when 6 or 8 out of 21 indicators are taken. Particularly, truck load seems to be a highly promising key performance indicator, together to the travelled distance along non-main roads. A comparison with other existing approaches shows that the proposed method clearly outperforms them, since the $C-$index goes from $72\%$ or $90\%$ to $98\%$. △ Less

Submitted 2 February, 2024; originally announced February 2024.

Journal ref: Engineering Applications of Artificial Intelligence 2021 Volume 99 104113

arXiv:2312.12251 [pdf, other]

Fairness and Consensus in Opinion Models (Technical Report)

Authors: Jesús Aranda, Sebastián Betancourt, Juan Fco. Díaz, Frank Valencia

Abstract: We introduce a DeGroot-based model for opinion dynamics in social networks. A community of agents is represented as a weighted directed graph whose edges indicate how much agents influence one another. The model is formalized using labeled transition systems, henceforth called opinion transition systems (OTS), whose states represent the agents' opinions and whose actions are the edges of the influ… ▽ More We introduce a DeGroot-based model for opinion dynamics in social networks. A community of agents is represented as a weighted directed graph whose edges indicate how much agents influence one another. The model is formalized using labeled transition systems, henceforth called opinion transition systems (OTS), whose states represent the agents' opinions and whose actions are the edges of the influence graph. If a transition labeled $(i,j)$ is performed, agent $j$ updates their opinion taking into account the opinion of agent $i$ and the influence $i$ has over $j$. We study (convergence to) opinion consensus among the agents of strongly-connected graphs with influence values in the interval $(0,1)$. We show that consensus cannot be guaranteed under the standard strong fairness assumption on transition systems. We derive that consensus is guaranteed under a stronger notion from the literature of concurrent systems; bounded fairness. We argue that bounded-fairness is too strong of a notion for consensus as it almost surely rules out random runs and it is not a constructive liveness property. We introduce a weaker fairness notion, called $m$-bounded fairness, and show that it guarantees consensus. The new notion includes almost surely all random runs and it is a constructive liveness property. Finally, we consider OTS with dynamic influence and show convergence to consensus holds under $m$-bounded fairness if the influence changes within a fixed interval $[L,U]$ with $0<L<U<1$. We illustrate OTS with examples and simulations, offering insights into opinion formation under fairness and dynamic influence. △ Less

Submitted 11 July, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: 31 pages total: 15 main work, 2 for references and 13 for two proof appendices. 5 figures

arXiv:2310.20091 [pdf, other]

Density-based User Representation using Gaussian Process Regression for Multi-interest Personalized Retrieval

Authors: Haolun Wu, Ofer Meshi, Masrour Zoghi, Fernando Diaz, Xue Liu, Craig Boutilier, Maryam Karimzadehgan

Abstract: Accurate modeling of the diverse and dynamic interests of users remains a significant challenge in the design of personalized recommender systems. Existing user modeling methods, like single-point and multi-point representations, have limitations w.r.t.\ accuracy, diversity, and adaptability. To overcome these deficiencies, we introduce density-based user representations (DURs), a novel method tha… ▽ More Accurate modeling of the diverse and dynamic interests of users remains a significant challenge in the design of personalized recommender systems. Existing user modeling methods, like single-point and multi-point representations, have limitations w.r.t.\ accuracy, diversity, and adaptability. To overcome these deficiencies, we introduce density-based user representations (DURs), a novel method that leverages Gaussian process regression (GPR) for effective multi-interest recommendation and retrieval. Our approach, GPR4DUR, exploits DURs to capture user interest variability without manual tuning, incorporates uncertainty-awareness, and scales well to large numbers of users. Experiments using real-world offline datasets confirm the adaptability and efficiency of GPR4DUR, while online experiments with simulated users demonstrate its ability to address the exploration-exploitation trade-off by effectively utilizing model uncertainty. △ Less

Submitted 22 May, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: 22 pages

arXiv:2309.05892 [pdf, other]

doi 10.1145/3613455

Distributionally-Informed Recommender System Evaluation

Authors: Michael D. Ekstrand, Ben Carterette, Fernando Diaz

Abstract: Current practice for evaluating recommender systems typically focuses on point estimates of user-oriented effectiveness metrics or business metrics, sometimes combined with additional metrics for considerations such as diversity and novelty. In this paper, we argue for the need for researchers and practitioners to attend more closely to various distributions that arise from a recommender system (o… ▽ More Current practice for evaluating recommender systems typically focuses on point estimates of user-oriented effectiveness metrics or business metrics, sometimes combined with additional metrics for considerations such as diversity and novelty. In this paper, we argue for the need for researchers and practitioners to attend more closely to various distributions that arise from a recommender system (or other information access system) and the sources of uncertainty that lead to these distributions. One immediate implication of our argument is that both researchers and practitioners must report and examine more thoroughly the distribution of utility between and within different stakeholder groups. However, distributions of various forms arise in many more aspects of the recommender systems experimental process, and distributional thinking has substantial ramifications for how we design, evaluate, and present recommender systems evaluation and research results. Leveraging and emphasizing distributions in the evaluation of recommender systems is a necessary step to ensure that the systems provide appropriate and equitably-distributed benefit to the people they affect. △ Less

Submitted 11 September, 2023; originally announced September 2023.

Comments: Accepted to ACM Transactions on Recommender Systems

arXiv:2308.14601 [pdf, other]

Fairness Through Domain Awareness: Mitigating Popularity Bias For Music Discovery

Authors: Rebecca Salganik, Fernando Diaz, Golnoosh Farnadi

Abstract: As online music platforms grow, music recommender systems play a vital role in helping users navigate and discover content within their vast musical databases. At odds with this larger goal, is the presence of popularity bias, which causes algorithmic systems to favor mainstream content over, potentially more relevant, but niche items. In this work we explore the intrinsic relationship between mus… ▽ More As online music platforms grow, music recommender systems play a vital role in helping users navigate and discover content within their vast musical databases. At odds with this larger goal, is the presence of popularity bias, which causes algorithmic systems to favor mainstream content over, potentially more relevant, but niche items. In this work we explore the intrinsic relationship between music discovery and popularity bias. To mitigate this issue we propose a domain-aware, individual fairness-based approach which addresses popularity bias in graph neural network (GNNs) based recommender systems. Our approach uses individual fairness to reflect a ground truth listening experience, i.e., if two songs sound similar, this similarity should be reflected in their representations. In doing so, we facilitate meaningful music discovery that is robust to popularity bias and grounded in the music domain. We apply our BOOST methodology to two discovery based tasks, performing recommendations at both the playlist level and user level. Then, we ground our evaluation in the cold start setting, showing that our approach outperforms existing fairness benchmarks in both performance and recommendation of lesser-known content. Finally, our analysis explains why our proposed methodology is a novel and promising approach to mitigating popularity bias and improving the discovery of new and niche content in music recommender systems. △ Less

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.02887 [pdf, other]

The Impact of Group Membership Bias on the Quality and Fairness of Exposure in Ranking

Authors: Ali Vardasbi, Maarten de Rijke, Fernando Diaz, Mostafa Dehghani

Abstract: When learning to rank from user interactions, search and recommender systems must address biases in user behavior to provide a high-quality ranking. One type of bias that has recently been studied in the ranking literature is when sensitive attributes, such as gender, have an impact on a user's judgment about an item's utility. For example, in a search for an expertise area, some users may be bias… ▽ More When learning to rank from user interactions, search and recommender systems must address biases in user behavior to provide a high-quality ranking. One type of bias that has recently been studied in the ranking literature is when sensitive attributes, such as gender, have an impact on a user's judgment about an item's utility. For example, in a search for an expertise area, some users may be biased towards clicking on male candidates over female candidates. We call this type of bias group membership bias. Increasingly, we seek rankings that are fair to individuals and sensitive groups. Merit-based fairness measures rely on the estimated utility of the items. With group membership bias, the utility of the sensitive groups is under-estimated, hence, without correcting for this bias, a supposedly fair ranking is not truly fair. In this paper, first, we analyze the impact of group membership bias on ranking quality as well as merit-based fairness metrics and show that group membership bias can hurt both ranking and fairness. Then, we provide a correction method for group bias that is based on the assumption that the utility score of items in different groups comes from the same distribution. This assumption has two potential issues of sparsity and equality-instead-of-equity; we use an amortized approach to address these. We show that our correction method can consistently compensate for the negative impact of group membership bias on ranking quality and fairness metrics. △ Less

Submitted 29 April, 2024; v1 submitted 5 August, 2023; originally announced August 2023.

arXiv:2307.03201 [pdf, ps, other]

Scaling Laws Do Not Scale

Authors: Fernando Diaz, Michael Madaio

Abstract: Recent work has proposed a power law relationship, referred to as ``scaling laws,'' between the performance of artificial intelligence (AI) models and aspects of those models' design (e.g., dataset size). In other words, as the size of a dataset (or model parameters, etc) increases, the performance of a given model trained on that dataset will correspondingly increase. However, while compelling in… ▽ More Recent work has proposed a power law relationship, referred to as ``scaling laws,'' between the performance of artificial intelligence (AI) models and aspects of those models' design (e.g., dataset size). In other words, as the size of a dataset (or model parameters, etc) increases, the performance of a given model trained on that dataset will correspondingly increase. However, while compelling in the aggregate, this scaling law relationship overlooks the ways that metrics used to measure performance may be precarious and contested, or may not correspond with how different groups of people may perceive the quality of models' output. In this paper, we argue that as the size of datasets used to train large AI models grows, the number of distinct communities (including demographic groups) whose data is included in a given dataset is likely to grow, each of whom may have different values. As a result, there is an increased risk that communities represented in a dataset may have values or preferences not captured by (or in the worst case, at odds with) the metrics used to evaluate model performance for scaling laws. We end the paper with implications for AI scaling laws -- that models may not, in fact, continue to improve as the datasets get larger -- at least not for all people or communities impacted by those models. △ Less

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.07908 [pdf, other]

Best-Case Retrieval Evaluation: Improving the Sensitivity of Reciprocal Rank with Lexicographic Precision

Authors: Fernando Diaz

Abstract: Across a variety of ranking tasks, researchers use reciprocal rank to measure the effectiveness for users interested in exactly one relevant item. Despite its widespread use, evidence suggests that reciprocal rank is brittle when discriminating between systems. This brittleness, in turn, is compounded in modern evaluation settings where current, high-precision systems may be difficult to distingui… ▽ More Across a variety of ranking tasks, researchers use reciprocal rank to measure the effectiveness for users interested in exactly one relevant item. Despite its widespread use, evidence suggests that reciprocal rank is brittle when discriminating between systems. This brittleness, in turn, is compounded in modern evaluation settings where current, high-precision systems may be difficult to distinguish. We address the lack of sensitivity of reciprocal rank by introducing and connecting it to the concept of best-case retrieval, an evaluation method focusing on assessing the quality of a ranking for the most satisfied possible user across possible recall requirements. This perspective allows us to generalize reciprocal rank and define a new preference-based evaluation we call lexicographic precision or lexiprecision. By mathematical construction, we ensure that lexiprecision preserves differences detected by reciprocal rank, while empirically improving sensitivity and robustness across a broad set of retrieval and recommendation tasks. △ Less

Submitted 13 June, 2023; originally announced June 2023.

arXiv:2303.09335 [pdf, other]

doi 10.1051/0004-6361/202346417

ExoplANNET: A deep learning algorithm to detect and identify planetary signals in radial velocity data

Authors: L. A. Nieto, R. F. Díaz

Abstract: The detection of exoplanets with the radial velocity method consists in detecting variations of the stellar velocity caused by an unseen sub-stellar companion. Instrumental errors, irregular time sampling, and different noise sources originating in the intrinsic variability of the star can hinder the interpretation of the data, and even lead to spurious detections. In recent times, work began to e… ▽ More The detection of exoplanets with the radial velocity method consists in detecting variations of the stellar velocity caused by an unseen sub-stellar companion. Instrumental errors, irregular time sampling, and different noise sources originating in the intrinsic variability of the star can hinder the interpretation of the data, and even lead to spurious detections. In recent times, work began to emerge in the field of extrasolar planets that use Machine Learning algorithms, some with results that exceed those obtained with the traditional techniques in the field. We seek to explore the scope of the neural networks in the radial velocity method, in particular for exoplanet detection in the presence of correlated noise of stellar origin. In this work, a neural network is proposed to replace the computation of the significance of the signal detected with the radial velocity method and to classify it as of planetary origin or not. The algorithm is trained using synthetic data of systems with and without planetary companions. We injected realistic correlated noise in the simulations, based on previous studies of the behaviour of stellar activity. The performance of the network is compared to the traditional method based on null hypothesis significance testing. The network achieves 28 % fewer false positives. The improvement is observed mainly in the detection of small-amplitude signals associated with low-mass planets. In addition, its execution time is five orders of magnitude faster than the traditional method. The superior performance exhibited by the algorithm has only been tested on simulated radial velocity data so far. Although in principle it should be straightforward to adapt it for use in real time series, its performance has to be tested thoroughly. Future work should permit evaluating its potential for adoption as a valuable tool for exoplanet detection. △ Less

Submitted 1 July, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

Comments: Accepted for publication; Corrected typos; Added section 6.1 with a robustness analysis of the method; Added section 6.2 with tests on a real time series; Added section 6.3 with a more detailed analysis of the caution of the network around activity periods; Added other tested models to the appendix

Journal ref: A&A 677, A48 (2023)

arXiv:2302.11370 [pdf, other]

Recall, Robustness, and Lexicographic Evaluation

Authors: Fernando Diaz, Bhaskar Mitra

Abstract: Although originally developed to evaluate sets of items, recall is often used to evaluate rankings of items, including those produced by recommender, retrieval, and other machine learning systems. The application of recall without a formal evaluative motivation has led to criticism of recall as a vague or inappropriate measure. In light of this debate, we reflect on the measurement of recall in ra… ▽ More Although originally developed to evaluate sets of items, recall is often used to evaluate rankings of items, including those produced by recommender, retrieval, and other machine learning systems. The application of recall without a formal evaluative motivation has led to criticism of recall as a vague or inappropriate measure. In light of this debate, we reflect on the measurement of recall in rankings from a formal perspective. Our analysis is composed of three tenets: recall, robustness, and lexicographic evaluation. First, we formally define `recall-orientation' as the sensitivity of a metric to a user interested in finding every relevant item. Second, we analyze recall-orientation from the perspective of robustness with respect to possible content consumers and providers, connecting recall to recent conversations about fair ranking. Finally, we extend this conceptual and theoretical treatment of recall by developing a practical preference-based evaluation method based on lexicographic comparison. Through extensive empirical analysis across three recommendation tasks and 17 information retrieval tasks, we establish that our new evaluation method, lexirecall, has convergent validity (i.e., it is correlated with existing recall metrics) and exhibits substantially higher sensitivity in terms of discriminative power and stability in the presence of missing labels. Our conceptual, theoretical, and empirical analysis substantially deepens our understanding of recall and motivates its adoption through connections to robustness and fairness. △ Less

Submitted 8 March, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: Under review

arXiv:2302.11360 [pdf, other]

Commonality in Recommender Systems: Evaluating Recommender Systems to Enhance Cultural Citizenship

Authors: Andres Ferraro, Gustavo Ferreira, Fernando Diaz, Georgina Born

Abstract: Recommender systems have become the dominant means of curating cultural content, significantly influencing individual cultural experience. Since recommender systems tend to optimize for personalized user experience, they can overlook impacts on cultural experience in the aggregate. After demonstrating that existing metrics do not center culture, we introduce a new metric, commonality, that measure… ▽ More Recommender systems have become the dominant means of curating cultural content, significantly influencing individual cultural experience. Since recommender systems tend to optimize for personalized user experience, they can overlook impacts on cultural experience in the aggregate. After demonstrating that existing metrics do not center culture, we introduce a new metric, commonality, that measures the degree to which recommendations familiarize a given user population with specified categories of cultural content. We developed commonality through an interdisciplinary dialogue between researchers in computer science and the social sciences and humanities. With reference to principles underpinning public service media systems in democratic societies, we identify universality of address and content diversity in the service of strengthening cultural citizenship as particularly relevant goals for recommender systems delivering cultural content. We develop commonality as a measure of recommender system alignment with the promotion of content toward a shared cultural experience across a population of users. We empirically compare the performance of recommendation algorithms using commonality with existing metrics, demonstrating that commonality captures a novel property of system behavior complementary to existing metrics. Alongside existing fairness and diversity metrics, commonality contributes to a growing body of scholarship developing `public good' rationales for machine learning systems. △ Less

Submitted 22 February, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: extended version of "Measuring Commonality in Recommendation of Cultural Content: Recommender Systems to Enhance Cultural Citizenship", published at RecSys 2022

arXiv:2301.03971 [pdf, other]

Unsupervised Mandarin-Cantonese Machine Translation

Authors: Megan Dare, Valentina Fajardo Diaz, Averie Ho Zoen So, Yifan Wang, Shibingfeng Zhang

Abstract: Advancements in unsupervised machine translation have enabled the development of machine translation systems that can translate between languages for which there is not an abundance of parallel data available. We explored unsupervised machine translation between Mandarin Chinese and Cantonese. Despite the vast number of native speakers of Cantonese, there is still no large-scale corpus for the lan… ▽ More Advancements in unsupervised machine translation have enabled the development of machine translation systems that can translate between languages for which there is not an abundance of parallel data available. We explored unsupervised machine translation between Mandarin Chinese and Cantonese. Despite the vast number of native speakers of Cantonese, there is still no large-scale corpus for the language, due to the fact that Cantonese is primarily used for oral communication. The key contributions of our project include: 1. The creation of a new corpus containing approximately 1 million Cantonese sentences, and 2. A large-scale comparison across different model architectures, tokenization schemes, and embedding structures. Our best model trained with character-based tokenization and a Transformer architecture achieved a character-level BLEU of 25.1 when translating from Mandarin to Cantonese and of 24.4 when translating from Cantonese to Mandarin. In this paper we discuss our research process, experiments, and results. △ Less

Submitted 10 January, 2023; originally announced January 2023.

arXiv:2212.08038 [pdf, ps, other]

Redefining Relationships in Music

Authors: Christian Detweiler, Beth Coleman, Fernando Diaz, Lieke Dom, Chris Donahue, Jesse Engel, Cheng-Zhi Anna Huang, Larry James, Ethan Manilow, Amanda McCroskery, Kyle Pedersen, Pamela Peter-Agbia, Negar Rostamzadeh, Robert Thomas, Marco Zamarato, Ben Zevenbergen

Abstract: AI tools increasingly shape how we discover, make and experience music. While these tools can have the potential to empower creativity, they may fundamentally redefine relationships between stakeholders, to the benefit of some and the detriment of others. In this position paper, we argue that these tools will fundamentally reshape our music culture, with profound effects (for better and for worse)… ▽ More AI tools increasingly shape how we discover, make and experience music. While these tools can have the potential to empower creativity, they may fundamentally redefine relationships between stakeholders, to the benefit of some and the detriment of others. In this position paper, we argue that these tools will fundamentally reshape our music culture, with profound effects (for better and for worse) on creators, consumers and the commercial enterprises that often connect them. By paying careful attention to emerging Music AI technologies and developments in other creative domains and understanding the implications, people working in this space could decrease the possible negative impacts on the practice, consumption and meaning of music. Given that many of these technologies are already available, there is some urgency in conducting analyses of these technologies now. It is important that people developing and working with these tools address these issues now to help guide their evolution to be equitable and empower creativity. We identify some potential risks and opportunities associated with existing and forthcoming AI tools for music, though more work is needed to identify concrete actions which leverage the opportunities while mitigating risks. △ Less

Submitted 16 December, 2022; v1 submitted 13 December, 2022; originally announced December 2022.

Comments: Presented at Cultures in AI/AI in Culture workshop at NeurIPS 2022

arXiv:2211.06348 [pdf, other]

Striving for data-model efficiency: Identifying data externalities on group performance

Authors: Esther Rolf, Ben Packer, Alex Beutel, Fernando Diaz

Abstract: Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance. In this work, we seek to better understand how we might characterize, detect, and design for data-model synergies. We focus on a particular type of data-model inefficiency, in which adding training data… ▽ More Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance. In this work, we seek to better understand how we might characterize, detect, and design for data-model synergies. We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population, a phenomenon we refer to as negative data externalities on group performance. Such externalities can arise in standard learning settings and can manifest differently depending on conditions between training set size and model size. Data externalities directly imply a lower bound on feasible model improvements, yet improving models efficiently requires understanding the underlying data-model tensions. From a broader perspective, our results indicate that data-efficiency is a key component of both accurate and trustworthy machine learning. △ Less

Submitted 11 November, 2022; originally announced November 2022.

Comments: 9 pages, 3 figures. Trustworthy and Socially Responsible Machine Learning (TSRML 2022) workshop co-located with NeurIPS 2022

arXiv:2210.05145 [pdf, other]

Retrieval Augmentation for T5 Re-ranker using External Sources

Authors: Kai Hui, Tao Chen, Zhen Qin, Honglei Zhuang, Fernando Diaz, Mike Bendersky, Don Metzler

Abstract: Retrieval augmentation has shown promising improvements in different tasks. However, whether such augmentation can assist a large language model based re-ranker remains unclear. We investigate how to augment T5-based re-rankers using high-quality information retrieved from two external corpora -- a commercial web search engine and Wikipedia. We empirically demonstrate how retrieval augmentation ca… ▽ More Retrieval augmentation has shown promising improvements in different tasks. However, whether such augmentation can assist a large language model based re-ranker remains unclear. We investigate how to augment T5-based re-rankers using high-quality information retrieved from two external corpora -- a commercial web search engine and Wikipedia. We empirically demonstrate how retrieval augmentation can substantially improve the effectiveness of T5-based re-rankers for both in-domain and zero-shot out-of-domain re-ranking tasks. △ Less

Submitted 11 October, 2022; originally announced October 2022.

arXiv:2209.03904 [pdf, ps, other]

Analyzing the Effect of Sampling in GNNs on Individual Fairness

Authors: Rebecca Salganik, Fernando Diaz, Golnoosh Farnadi

Abstract: Graph neural network (GNN) based methods have saturated the field of recommender systems. The gains of these systems have been significant, showing the advantages of interpreting data through a network structure. However, despite the noticeable benefits of using graph structures in recommendation tasks, this representational form has also bred new challenges which exacerbate the complexity of miti… ▽ More Graph neural network (GNN) based methods have saturated the field of recommender systems. The gains of these systems have been significant, showing the advantages of interpreting data through a network structure. However, despite the noticeable benefits of using graph structures in recommendation tasks, this representational form has also bred new challenges which exacerbate the complexity of mitigating algorithmic bias. When GNNs are integrated into downstream tasks, such as recommendation, bias mitigation can become even more difficult. Furthermore, the intractability of applying existing methods of fairness promotion to large, real world datasets places even more serious constraints on mitigation attempts. Our work sets out to fill in this gap by taking an existing method for promoting individual fairness on graphs and extending it to support mini-batch, or sub-sample based, training of a GNN, thus laying the groundwork for applying this method to a downstream recommendation task. We evaluate two popular GNN methods: Graph Convolutional Network (GCN), which trains on the entire graph, and GraphSAGE, which uses probabilistic random walks to create subgraphs for mini-batch training, and assess the effects of sub-sampling on individual fairness. We implement an individual fairness notion called \textit{REDRESS}, proposed by Dong et al., which uses rank optimization to learn individual fair node, or item, embeddings. We empirically show on two real world datasets that GraphSAGE is able to achieve, not just, comparable accuracy, but also, improved fairness as compared with the GCN model. These finding have consequential ramifications to individual fairness promotion, GNNs, and in downstream form, recommender systems, showing that mini-batch training facilitate individual fairness promotion by allowing for local nuance to guide the process of fairness promotion in representation learning. △ Less

Submitted 9 September, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

arXiv:2208.01696 [pdf, other]

doi 10.1145/3523227.3551476

Measuring Commonality in Recommendation of Cultural Content: Recommender Systems to Enhance Cultural Citizenship

Authors: Andres Ferraro, Gustavo Ferreira, Fernando Diaz, Georgina Born

Abstract: Recommender systems have become the dominant means of curating cultural content, significantly influencing the nature of individual cultural experience. While the majority of research on recommender systems optimizes for personalized user experience, this paradigm does not capture the ways that recommender systems impact cultural experience in the aggregate, across populations of users. Although e… ▽ More Recommender systems have become the dominant means of curating cultural content, significantly influencing the nature of individual cultural experience. While the majority of research on recommender systems optimizes for personalized user experience, this paradigm does not capture the ways that recommender systems impact cultural experience in the aggregate, across populations of users. Although existing novelty, diversity, and fairness studies probe how systems relate to the broader social role of cultural content, they do not adequately center culture as a core concept and challenge. In this work, we introduce commonality as a new measure that reflects the degree to which recommendations familiarize a given user population with specified categories of cultural content. Our proposed commonality metric responds to a set of arguments developed through an interdisciplinary dialogue between researchers in computer science and the social sciences and humanities. With reference to principles underpinning non-profit, public service media systems in democratic societies, we identify universality of address and content diversity in the service of strengthening cultural citizenship as particularly relevant goals for recommender systems delivering cultural content. Taking diversity in movie recommendation as a case study in enhancing pluralistic cultural experience, we empirically compare systems' performance using commonality and existing utility, diversity, and fairness metrics. Our results demonstrate that commonality captures a property of system behavior complementary to existing metrics and suggest the need for alternative, non-personalized interventions in recommender systems oriented to strengthening cultural citizenship across populations of users. In this way, commonality contributes to a growing body of scholarship developing 'public good' rationales for digital media and ML systems. △ Less

Submitted 2 August, 2022; originally announced August 2022.

Comments: The 16th ACM Conference on Recommender Systems

arXiv:2205.09403 [pdf, other]

doi 10.1145/3477495.3531873

On Natural Language User Profiles for Transparent and Scrutable Recommendation

Authors: Filip Radlinski, Krisztian Balog, Fernando Diaz, Lucas Dixon, Ben Wedin

Abstract: Natural interaction with recommendation and personalized search systems has received tremendous attention in recent years. We focus on the challenge of supporting people's understanding and control of these systems and explore a fundamentally new way of thinking about representation of knowledge in recommendation and personalization systems. Specifically, we argue that it may be both desirable and… ▽ More Natural interaction with recommendation and personalized search systems has received tremendous attention in recent years. We focus on the challenge of supporting people's understanding and control of these systems and explore a fundamentally new way of thinking about representation of knowledge in recommendation and personalization systems. Specifically, we argue that it may be both desirable and possible for algorithms that use natural language representations of users' preferences to be developed. We make the case that this could provide significantly greater transparency, as well as affordances for practical actionable interrogation of, and control over, recommendations. Moreover, we argue that such an approach, if successfully applied, may enable a major step towards systems that rely less on noisy implicit observations while increasing portability of knowledge of one's interests. △ Less

Submitted 19 May, 2022; originally announced May 2022.

Comments: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22), 2022

arXiv:2205.01230 [pdf, other]

doi 10.1145/3477495.3531722

Retrieval-Enhanced Machine Learning

Authors: Hamed Zamani, Fernando Diaz, Mostafa Dehghani, Donald Metzler, Michael Bendersky

Abstract: Although information access systems have long supported people in accomplishing a wide range of tasks, we propose broadening the scope of users of information access systems to include task-driven machines, such as machine learning models. In this way, the core principles of indexing, representation, retrieval, and ranking can be applied and extended to substantially improve model generalization,… ▽ More Although information access systems have long supported people in accomplishing a wide range of tasks, we propose broadening the scope of users of information access systems to include task-driven machines, such as machine learning models. In this way, the core principles of indexing, representation, retrieval, and ranking can be applied and extended to substantially improve model generalization, scalability, robustness, and interpretability. We describe a generic retrieval-enhanced machine learning (REML) framework, which includes a number of existing models as special cases. REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization. The REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence. △ Less

Submitted 2 May, 2022; originally announced May 2022.

Comments: To appear in proceedings of ACM SIGIR 2022

arXiv:2205.00048 [pdf, other]

Joint Multisided Exposure Fairness for Recommendation

Authors: Haolun Wu, Bhaskar Mitra, Chen Ma, Fernando Diaz, Xue Liu

Abstract: Prior research on exposure fairness in the context of recommender systems has focused mostly on disparities in the exposure of individual or groups of items to individual users of the system. The problem of how individual or groups of items may be systemically under or over exposed to groups of users, or even all users, has received relatively less attention. However, such systemic disparities in… ▽ More Prior research on exposure fairness in the context of recommender systems has focused mostly on disparities in the exposure of individual or groups of items to individual users of the system. The problem of how individual or groups of items may be systemically under or over exposed to groups of users, or even all users, has received relatively less attention. However, such systemic disparities in information exposure can result in observable social harms, such as withholding economic opportunities from historically marginalized groups (allocative harm) or amplifying gendered and racialized stereotypes (representational harm). Previously, Diaz et al. developed the expected exposure metric -- that incorporates existing user browsing models that have previously been developed for information retrieval -- to study fairness of content exposure to individual users. We extend their proposed framework to formalize a family of exposure fairness metrics that model the problem jointly from the perspective of both the consumers and producers. Specifically, we consider group attributes for both types of stakeholders to identify and mitigate fairness concerns that go beyond individual users and items towards more systemic biases in recommendation. Furthermore, we study and discuss the relationships between the different exposure fairness dimensions proposed in this paper, as well as demonstrate how stochastic ranking policies can be optimized towards said fairness goals. △ Less

Submitted 29 April, 2022; originally announced May 2022.

arXiv:2204.11400 [pdf, other]

Offline Retrieval Evaluation Without Evaluation Metrics

Authors: Fernando Diaz, Andres Ferraro

Abstract: Offline evaluation of information retrieval and recommendation has traditionally focused on distilling the quality of a ranking into a scalar metric such as average precision or normalized discounted cumulative gain. We can use this metric to compare the performance of multiple systems for the same request. Although evaluation metrics provide a convenient summary of system performance, they also c… ▽ More Offline evaluation of information retrieval and recommendation has traditionally focused on distilling the quality of a ranking into a scalar metric such as average precision or normalized discounted cumulative gain. We can use this metric to compare the performance of multiple systems for the same request. Although evaluation metrics provide a convenient summary of system performance, they also collapse subtle differences across users into a single number and can carry assumptions about user behavior and utility not supported across retrieval scenarios. We propose recall-paired preference (RPP), a metric-free evaluation method based on directly computing a preference between ranked lists. RPP simulates multiple user subpopulations per query and compares systems across these pseudo-populations. Our results across multiple search and recommendation tasks demonstrate that RPP substantially improves discriminative power while correlating well with existing metrics and being equally robust to incomplete data. △ Less

Submitted 24 April, 2022; originally announced April 2022.

Comments: to appear at SIGIR 2022

arXiv:2110.07701 [pdf, other]

Exposing Query Identification for Search Transparency

Authors: Ruohan Li, Jianxiang Li, Bhaskar Mitra, Fernando Diaz, Asia J. Biega

Abstract: Search systems control the exposure of ranked content to searchers. In many cases, creators value not only the exposure of their content but, moreover, an understanding of the specific searches where the content is surfaced. The problem of identifying which queries expose a given piece of content in the ranking results is an important and relatively under-explored search transparency challenge. Ex… ▽ More Search systems control the exposure of ranked content to searchers. In many cases, creators value not only the exposure of their content but, moreover, an understanding of the specific searches where the content is surfaced. The problem of identifying which queries expose a given piece of content in the ranking results is an important and relatively under-explored search transparency challenge. Exposing queries are useful for quantifying various issues of search bias, privacy, data protection, security, and search engine optimization. Exact identification of exposing queries in a given system is computationally expensive, especially in dynamic contexts such as web search. We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems: dense dual-encoder models and traditional BM25 models. We then propose how this approach can be improved through metric learning over the retrieval embedding space. We further derive an evaluation metric to measure the quality of a ranking of exposing queries, as well as conducting an empirical analysis focusing on various practical aspects of approximate EQI. Overall, our work contributes a novel conception of transparency in search systems and computational means of achieving it. △ Less

Submitted 11 April, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

arXiv:2108.05152 [pdf, other]

Estimation of Fair Ranking Metrics with Incomplete Judgments

Authors: Ömer Kırnap, Fernando Diaz, Asia Biega, Michael Ekstrand, Ben Carterette, Emine Yılmaz

Abstract: There is increasing attention to evaluating the fairness of search system ranking decisions. These metrics often consider the membership of items to particular groups, often identified using protected attributes such as gender or ethnicity. To date, these metrics typically assume the availability and completeness of protected attribute labels of items. However, the protected attributes of individu… ▽ More There is increasing attention to evaluating the fairness of search system ranking decisions. These metrics often consider the membership of items to particular groups, often identified using protected attributes such as gender or ethnicity. To date, these metrics typically assume the availability and completeness of protected attribute labels of items. However, the protected attributes of individuals are rarely present, limiting the application of fair ranking metrics in large scale systems. In order to address this problem, we propose a sampling strategy and estimation technique for four fair ranking metrics. We formulate a robust and unbiased estimator which can operate even with very limited number of labeled items. We evaluate our approach using both simulated and real world data. Our experimental results demonstrate that our method can estimate this family of fair ranking metrics and provides a robust, reliable alternative to exhaustive or random data annotation. △ Less

Submitted 11 August, 2021; originally announced August 2021.

Comments: Published in Proceedings of the Web Conference 2021 (WWW '21)

arXiv:2108.05135 [pdf, other]

Overview of the TREC 2020 Fair Ranking Track

Authors: Asia J. Biega, Fernando Diaz, Michael D. Ekstrand, Sergey Feldman, Sebastian Kohlmeier

Abstract: This paper provides an overview of the NIST TREC 2020 Fair Ranking track. For 2020, we again adopted an academic search task, where we have a corpus of academic article abstracts and queries submitted to a production academic search engine. The central goal of the Fair Ranking track is to provide fair exposure to different groups of authors (a group fairness framing). We recognize that there may b… ▽ More This paper provides an overview of the NIST TREC 2020 Fair Ranking track. For 2020, we again adopted an academic search task, where we have a corpus of academic article abstracts and queries submitted to a production academic search engine. The central goal of the Fair Ranking track is to provide fair exposure to different groups of authors (a group fairness framing). We recognize that there may be multiple group definitions (e.g. based on demographics, stature, topic) and hoped for the systems to be robust to these. We expected participants to develop systems that optimize for fairness and relevance for arbitrary group definitions, and did not reveal the exact group definitions until after the evaluation runs were submitted.The track contains two tasks,reranking and retrieval, with a shared evaluation. △ Less

Submitted 11 August, 2021; originally announced August 2021.

Comments: Published in The Twenty-Ninth Text REtrieval Conference Proceedings (TREC 2020). arXiv admin note: substantial text overlap with arXiv:2003.11650

arXiv:2107.08096 [pdf, other]

Learning to Limit Data Collection via Scaling Laws: A Computational Interpretation for the Legal Principle of Data Minimization

Authors: Divya Shanmugam, Samira Shabanian, Fernando Diaz, Michèle Finck, Asia Biega

Abstract: Modern machine learning systems are increasingly characterized by extensive personal data collection, despite the diminishing returns and increasing societal costs of such practices. Yet, data minimisation is one of the core data protection principles enshrined in the European Union's General Data Protection Regulation ('GDPR') and requires that only personal data that is adequate, relevant and li… ▽ More Modern machine learning systems are increasingly characterized by extensive personal data collection, despite the diminishing returns and increasing societal costs of such practices. Yet, data minimisation is one of the core data protection principles enshrined in the European Union's General Data Protection Regulation ('GDPR') and requires that only personal data that is adequate, relevant and limited to what is necessary is processed. However, the principle has seen limited adoption due to the lack of technical interpretation. In this work, we build on literature in machine learning and law to propose FIDO, a Framework for Inhibiting Data Overcollection. FIDO learns to limit data collection based on an interpretation of data minimization tied to system performance. Concretely, FIDO provides a data collection stopping criterion by iteratively updating an estimate of the performance curve, or the relationship between dataset size and performance, as data is acquired. FIDO estimates the performance curve via a piecewise power law technique that models distinct phases of an algorithm's performance throughout data collection separately. Empirical experiments show that the framework produces accurate performance curves and data collection stopping criteria across datasets and feature acquisition algorithms. We further demonstrate that many other families of curves systematically overestimate the return on additional data. Results and analysis from our investigation offer deeper insights into the relevant considerations when designing a data minimization framework, including the impacts of active feature acquisition on individual users and the feasability of user-specific data minimization. We conclude with practical recommendations for the implementation of data minimization. △ Less

Submitted 12 June, 2022; v1 submitted 16 July, 2021; originally announced July 2021.

Comments: To appear at ACM Conference on Fairness, Accountability, and Transparency, 2022

arXiv:2107.07002 [pdf, other]

The Benchmark Lottery

Authors: Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals

Abstract: The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods. This paper proposes the notion of "a benchmark lottery" that describes the overall fragility of the ML benchmarking process. The benchmark lottery postulates that many factors, other than fundamental algorithmic superiority, may lead to a… ▽ More The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods. This paper proposes the notion of "a benchmark lottery" that describes the overall fragility of the ML benchmarking process. The benchmark lottery postulates that many factors, other than fundamental algorithmic superiority, may lead to a method being perceived as superior. On multiple benchmark setups that are prevalent in the ML community, we show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks, highlighting the fragility of the current paradigms and potential fallacious interpretation derived from benchmarking ML methods. Given that every benchmark makes a statement about what it perceives to be important, we argue that this might lead to biased progress in the community. We discuss the implications of the observed phenomena and provide recommendations on mitigating them using multiple machine learning domains and communities as use cases, including natural language processing, computer vision, information retrieval, recommender systems, and reinforcement learning. △ Less

Submitted 14 July, 2021; originally announced July 2021.

arXiv:2105.05779 [pdf, other]

doi 10.1561/1500000079

Fairness in Information Access Systems

Authors: Michael D. Ekstrand, Anubrata Das, Robin Burke, Fernando Diaz

Abstract: Recommendation, information retrieval, and other information access systems pose unique challenges for investigating and applying the fairness and non-discrimination concepts that have been developed for studying other machine learning systems. While fair information access shares many commonalities with fair classification, the multistakeholder nature of information access applications, the rank-… ▽ More Recommendation, information retrieval, and other information access systems pose unique challenges for investigating and applying the fairness and non-discrimination concepts that have been developed for studying other machine learning systems. While fair information access shares many commonalities with fair classification, the multistakeholder nature of information access applications, the rank-based problem setting, the centrality of personalization in many cases, and the role of user response complicate the problem of identifying precisely what types and operationalizations of fairness may be relevant, let alone measuring or promoting them. In this monograph, we present a taxonomy of the various dimensions of fair information access and survey the literature to date on this new and rapidly-growing topic. We preface this with brief introductions to information access and algorithmic fairness, to facilitate use of this work by scholars with experience in one (or neither) of these fields who wish to learn about their intersection. We conclude with several open problems in fair information access, along with some suggestions for how to approach research in this space. △ Less

Submitted 12 July, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

ACM Class: H.3.3; K.4

Journal ref: Foundations and Trends in Information Retrieval 16:1-2 - 2022

arXiv:2105.02951 [pdf, other]

Multi-FR: A Multi-objective Optimization Framework for Multi-stakeholder Fairness-aware Recommendation

Authors: Haolun Wu, Chen Ma, Bhaskar Mitra, Fernando Diaz, Xue Liu

Abstract: Nowadays, most online services are hosted on multi-stakeholder marketplaces, where consumers and producers may have different objectives. Conventional recommendation systems, however, mainly focus on maximizing consumers' satisfaction by recommending the most relevant items to each individual. This may result in unfair exposure of items, thus jeopardizing producer benefits. Additionally, they do n… ▽ More Nowadays, most online services are hosted on multi-stakeholder marketplaces, where consumers and producers may have different objectives. Conventional recommendation systems, however, mainly focus on maximizing consumers' satisfaction by recommending the most relevant items to each individual. This may result in unfair exposure of items, thus jeopardizing producer benefits. Additionally, they do not care whether consumers from diverse demographic groups are equally satisfied. To address these limitations, we propose a multi-objective optimization framework for fairness-aware recommendation, Multi-FR, that adaptively balances accuracy and fairness for various stakeholders with Pareto optimality guarantee. We first propose four fairness constraints on consumers and producers. In order to train the whole framework in an end-to-end way, we utilize the smooth rank and stochastic ranking policy to make these fairness criteria differentiable and friendly to back-propagation. Then, we adopt the multiple gradient descent algorithm to generate a Pareto set of solutions, from which the most appropriate one is selected by the Least Misery Strategy. The experimental results demonstrate that Multi-FR largely improves recommendation fairness on multiple stakeholders over the state-of-the-art approaches while maintaining almost the same recommendation accuracy. The training efficiency study confirms our model's ability to simultaneously optimize different fairness constraints for many stakeholders efficiently. △ Less

Submitted 9 August, 2022; v1 submitted 6 May, 2021; originally announced May 2021.

Comments: 29 pages

arXiv:2101.07124 [pdf, ps, other]

Tip of the Tongue Known-Item Retrieval: A Case Study in Movie Identification

Authors: Jaime Arguello, Adam Ferguson, Emery Fine, Bhaskar Mitra, Hamed Zamani, Fernando Diaz

Abstract: While current information retrieval systems are effective for known-item retrieval where the searcher provides a precise name or identifier for the item being sought, systems tend to be much less effective for cases where the searcher is unable to express a precise name or identifier. We refer to this as tip of the tongue (TOT) known-item retrieval, named after the cognitive state of not being abl… ▽ More While current information retrieval systems are effective for known-item retrieval where the searcher provides a precise name or identifier for the item being sought, systems tend to be much less effective for cases where the searcher is unable to express a precise name or identifier. We refer to this as tip of the tongue (TOT) known-item retrieval, named after the cognitive state of not being able to retrieve an item from memory. Using movie search as a case study, we explore the characteristics of questions posed by searchers in TOT states in a community question answering website. We analyze how searchers express their information needs during TOT states in the movie domain. Specifically, what information do searchers remember about the item being sought and how do they convey this information? Our results suggest that searchers use a combination of information about: (1) the content of the item sought, (2) the context in which they previously engaged with the item, and (3) previous attempts to find the item using other resources (e.g., search engines). Additionally, searchers convey information by sometimes expressing uncertainty (i.e., hedging), opinions, emotions, and by performing relative (vs. absolute) comparisons with attributes of the item. As a result of our analysis, we believe that searchers in TOT states may require specialized query understanding methods or document representations. Finally, our preliminary retrieval experiments show the impact of each information type presented in information requests on retrieval performance. △ Less

Submitted 18 January, 2021; originally announced January 2021.

arXiv:2012.12712 [pdf, other]

Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures

Authors: Candelaria Mosquera, Facundo Nahuel Diaz, Fernando Binder, Jose Martin Rabellino, Sonia Elizabeth Benitez, Alejandro Daniel Beresñak, Alberto Seehaus, Gabriel Ducrey, Jorge Alberto Ocantos, Daniel Roberto Luna

Abstract: BACKGROUND AND OBJECTIVES: The multiple chest x-ray datasets released in the last years have ground-truth labels intended for different computer vision tasks, suggesting that performance in automated chest-xray interpretation might improve by using a method that can exploit diverse types of annotations. This work presents a Deep Learning method based on the late fusion of different convolutional a… ▽ More BACKGROUND AND OBJECTIVES: The multiple chest x-ray datasets released in the last years have ground-truth labels intended for different computer vision tasks, suggesting that performance in automated chest-xray interpretation might improve by using a method that can exploit diverse types of annotations. This work presents a Deep Learning method based on the late fusion of different convolutional architectures, that allows training with heterogeneous data with a simple implementation, and evaluates its performance on independent test data. We focused on obtaining a clinically useful tool that could be successfully integrated into a hospital workflow. MATERIALS AND METHODS: Based on expert opinion, we selected four target chest x-ray findings, namely lung opacities, fractures, pneumothorax and pleural effusion. For each finding we defined the most adequate type of ground-truth label, and built four training datasets combining images from public chest x-ray datasets and our institutional archive. We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool. Performance was measured on two test datasets: an external openly-available dataset, and a retrospective institutional dataset, to estimate performance on local population. RESULTS: The external and local test sets had 4376 and 1064 images, respectively, for which the model showed an area under the Receiver Operating Characteristics curve of 0.75 (95%CI: 0.74-0.76) and 0.87 (95%CI: 0.86-0.89) in the detection of abnormal chest x-rays. For the local population, a sensitivity of 86% (95%CI: 84-90), and a specificity of 88% (95%CI: 86-90) were obtained, with no significant differences between demographic subgroups. We present examples of heatmaps to show the accomplished level of interpretability, examining true and false positives. △ Less

Submitted 23 December, 2020; originally announced December 2020.

arXiv:2008.08871 [pdf]

doi 10.1038/s41467-021-25221-2

Deep learning-based transformation of the H&E stain into special stains

Authors: Kevin de Haan, Yijie Zhang, Jonathan E. Zuckerman, Tairan Liu, Anthony E. Sisk, Miguel F. P. Diaz, Kuang-Yu Jen, Alexander Nobori, Sofia Liou, Sarah Zhang, Rana Riahi, Yair Rivenson, W. Dean Wallace, Aydogan Ozcan

Abstract: Pathology is practiced by visual inspection of histochemically stained slides. Most commonly, the hematoxylin and eosin (H&E) stain is used in the diagnostic workflow and it is the gold standard for cancer diagnosis. However, in many cases, especially for non-neoplastic diseases, additional "special stains" are used to provide different levels of contrast and color to tissue components and allow p… ▽ More Pathology is practiced by visual inspection of histochemically stained slides. Most commonly, the hematoxylin and eosin (H&E) stain is used in the diagnostic workflow and it is the gold standard for cancer diagnosis. However, in many cases, especially for non-neoplastic diseases, additional "special stains" are used to provide different levels of contrast and color to tissue components and allow pathologists to get a clearer diagnostic picture. In this study, we demonstrate the utility of supervised learning-based computational stain transformation from H&E to different special stains (Masson's Trichrome, periodic acid-Schiff and Jones silver stain) using tissue sections from kidney needle core biopsies. Based on evaluation by three renal pathologists, followed by adjudication by a fourth renal pathologist, we show that the generation of virtual special stains from existing H&E images improves the diagnosis in several non-neoplastic kidney diseases sampled from 58 unique subjects. A second study performed by three pathologists found that the quality of the special stains generated by the stain transformation network was statistically equivalent to those generated through standard histochemical staining. As the transformation of H&E images into special stains can be achieved within 1 min or less per patient core specimen slide, this stain-to-stain transformation framework can improve the quality of the preliminary diagnosis when additional special stains are needed, along with significant savings in time and cost, reducing the burden on healthcare system and patients. △ Less

Submitted 12 August, 2021; v1 submitted 20 August, 2020; originally announced August 2020.

Comments: 27 Pages, 6 Figures

MSC Class: 68T01; 68T05; 68U10; 62M45; 78M32; 92C50; 92C55; 94A08 ACM Class: I.2; I.2.1; I.2.6; I.2.10; I.3; I.3.3; I.4.3; I.4.4; I.4.9; J.3

Journal ref: Nature Communications (2021)

arXiv:2007.05039 [pdf, other]

doi 10.5210/fm.v27i2.10887

On the Social and Technical Challenges of Web Search Autosuggestion Moderation

Authors: Timothy J. Hazen, Alexandra Olteanu, Gabriella Kazai, Fernando Diaz, Michael Golebiewski

Abstract: Past research shows that users benefit from systems that support them in their writing and exploration tasks. The autosuggestion feature of Web search engines is an example of such a system: It helps users in formulating their queries by offering a list of suggestions as they type. Autosuggestions are typically generated by machine learning (ML) systems trained on a corpus of search logs and docum… ▽ More Past research shows that users benefit from systems that support them in their writing and exploration tasks. The autosuggestion feature of Web search engines is an example of such a system: It helps users in formulating their queries by offering a list of suggestions as they type. Autosuggestions are typically generated by machine learning (ML) systems trained on a corpus of search logs and document representations. Such automated methods can become prone to issues that result in problematic suggestions that are biased, racist, sexist or in other ways inappropriate. While current search engines have become increasingly proficient at suppressing such problematic suggestions, there are still persistent issues that remain. In this paper, we reflect on past efforts and on why certain issues still linger by covering explored solutions along a prototypical pipeline for identifying, detecting, and addressing problematic autosuggestions. To showcase their complexity, we discuss several dimensions of problematic suggestions, difficult issues along the pipeline, and why our discussion applies to the increasing number of applications beyond web search that implement similar textual suggestion features. By outlining persistent social and technical challenges in moderating web search suggestions, we provide a renewed call for action. △ Less

Submitted 9 July, 2020; originally announced July 2020.

Comments: 17 Pages, 4 images displayed within 3 latex figures

Journal ref: First Monday, Volume 27, Number 2, February 7, 2022

arXiv:2006.00166 [pdf, other]

Analyzing and Learning from User Interactions for Search Clarification

Authors: Hamed Zamani, Bhaskar Mitra, Everest Chen, Gord Lueck, Fernando Diaz, Paul N. Bennett, Nick Craswell, Susan T. Dumais

Abstract: Asking clarifying questions in response to search queries has been recognized as a useful technique for revealing the underlying intent of the query. Clarification has applications in retrieval systems with different interfaces, from the traditional web search interfaces to the limited bandwidth interfaces as in speech-only and small screen devices. Generation and evaluation of clarifying question… ▽ More Asking clarifying questions in response to search queries has been recognized as a useful technique for revealing the underlying intent of the query. Clarification has applications in retrieval systems with different interfaces, from the traditional web search interfaces to the limited bandwidth interfaces as in speech-only and small screen devices. Generation and evaluation of clarifying questions have been recently studied in the literature. However, user interaction with clarifying questions is relatively unexplored. In this paper, we conduct a comprehensive study by analyzing large-scale user interactions with clarifying questions in a major web search engine. In more detail, we analyze the user engagements received by clarifying questions based on different properties of search queries, clarifying questions, and their candidate answers. We further study click bias in the data, and show that even though reading clarifying questions and candidate answers does not take significant efforts, there still exist some position and presentation biases in the data. We also propose a model for learning representation for clarifying questions based on the user interaction data as implicit feedback. The model is used for re-ranking a number of automatically generated clarifying questions for a given query. Evaluation on both click data and human labeled data demonstrates the high quality of the proposed method. △ Less

Submitted 29 May, 2020; originally announced June 2020.

Comments: To appear in the Proceedings of SIGIR 2020

arXiv:2005.13718 [pdf, other]

Operationalizing the Legal Principle of Data Minimization for Personalization

Authors: Asia J. Biega, Peter Potash, Hal Daumé III, Fernando Diaz, Michèle Finck

Abstract: Article 5(1)(c) of the European Union's General Data Protection Regulation (GDPR) requires that "personal data shall be [...] adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed (`data minimisation')". To date, the legal and computational definitions of `purpose limitation' and `data minimization' remain largely unclear. In particular, the… ▽ More Article 5(1)(c) of the European Union's General Data Protection Regulation (GDPR) requires that "personal data shall be [...] adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed (`data minimisation')". To date, the legal and computational definitions of `purpose limitation' and `data minimization' remain largely unclear. In particular, the interpretation of these principles is an open issue for information access systems that optimize for user experience through personalization and do not strictly require personal data collection for the delivery of basic service. In this paper, we identify a lack of a homogeneous interpretation of the data minimization principle and explore two operational definitions applicable in the context of personalization. The focus of our empirical study in the domain of recommender systems is on providing foundational insights about the (i) feasibility of different data minimization definitions, (ii) robustness of different recommendation algorithms to minimization, and (iii) performance of different minimization strategies.We find that the performance decrease incurred by data minimization might not be substantial, but that it might disparately impact different users---a finding which has implications for the viability of different formal minimization definitions. Overall, our analysis uncovers the complexities of the data minimization problem in the context of personalization and maps the remaining computational and regulatory challenges. △ Less

Submitted 27 May, 2020; originally announced May 2020.

Comments: SIGIR 2020 paper: In Proc. of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

arXiv:2004.13157 [pdf, other]

doi 10.1145/3340531.3411962

Evaluating Stochastic Rankings with Expected Exposure

Authors: Fernando Diaz, Bhaskar Mitra, Michael D. Ekstrand, Asia J. Biega, Ben Carterette

Abstract: We introduce the concept of \emph{expected exposure} as the average attention ranked items receive from users over repeated samples of the same query. Furthermore, we advocate for the adoption of the principle of equal expected exposure: given a fixed information need, no item should receive more or less expected exposure than any other item of the same relevance grade. We argue that this principl… ▽ More We introduce the concept of \emph{expected exposure} as the average attention ranked items receive from users over repeated samples of the same query. Furthermore, we advocate for the adoption of the principle of equal expected exposure: given a fixed information need, no item should receive more or less expected exposure than any other item of the same relevance grade. We argue that this principle is desirable for many retrieval objectives and scenarios, including topical diversity and fair ranking. Leveraging user models from existing retrieval metrics, we propose a general evaluation methodology based on expected exposure and draw connections to related metrics in information retrieval evaluation. Importantly, this methodology relaxes classic information retrieval assumptions, allowing a system, in response to a query, to produce a \emph{distribution over rankings} instead of a single fixed ranking. We study the behavior of the expected exposure metric and stochastic rankers across a variety of information access conditions, including \emph{ad hoc} retrieval and recommendation. We believe that measuring and optimizing expected exposure metrics using randomization opens a new area for retrieval algorithm development and progress. △ Less

Submitted 20 October, 2020; v1 submitted 27 April, 2020; originally announced April 2020.

Comments: In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM '20). Association for Computing Machinery, New York, NY, USA

arXiv:2003.11650 [pdf, other]

Overview of the TREC 2019 Fair Ranking Track

Authors: Asia J. Biega, Fernando Diaz, Michael D. Ekstrand, Sebastian Kohlmeier

Abstract: The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers in addition to classic notions of relevance. As part of the benchmark, we defined standardized fairness metrics with evaluation protocols and released a dataset for the fair ranking problem. The 2019 task focused on reranking academic paper abstrac… ▽ More The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers in addition to classic notions of relevance. As part of the benchmark, we defined standardized fairness metrics with evaluation protocols and released a dataset for the fair ranking problem. The 2019 task focused on reranking academic paper abstracts given a query. The objective was to fairly represent relevant authors from several groups that were unknown at the system submission time. Thus, the track emphasized the development of systems which have robust performance across a variety of group definitions. Participants were provided with querylog data (queries, documents, and relevance) from Semantic Scholar. This paper presents an overview of the track, including the task definition, descriptions of the data and the annotation process, as well as a comparison of the performance of submitted systems. △ Less

Submitted 25 March, 2020; originally announced March 2020.

Comments: Published in The Twenty-Eighth Text REtrieval Conference Proceedings (TREC 2019)

arXiv:1907.03693 [pdf, ps, other]

Incorporating Query Term Independence Assumption for Efficient Retrieval and Ranking using Deep Neural Networks

Authors: Bhaskar Mitra, Corby Rosset, David Hawking, Nick Craswell, Fernando Diaz, Emine Yilmaz

Abstract: Classical information retrieval (IR) methods, such as query likelihood and BM25, score documents independently w.r.t. each query term, and then accumulate the scores. Assuming query term independence allows precomputing term-document scores using these models---which can be combined with specialized data structures, such as inverted index, for efficient retrieval. Deep neural IR models, in contras… ▽ More Classical information retrieval (IR) methods, such as query likelihood and BM25, score documents independently w.r.t. each query term, and then accumulate the scores. Assuming query term independence allows precomputing term-document scores using these models---which can be combined with specialized data structures, such as inverted index, for efficient retrieval. Deep neural IR models, in contrast, compare the whole query to the document and are, therefore, typically employed only for late stage re-ranking. We incorporate query term independence assumption into three state-of-the-art neural IR models: BERT, Duet, and CKNRM---and evaluate their performance on a passage ranking task. Surprisingly, we observe no significant loss in result quality for Duet and CKNRM---and a small degradation in the case of BERT. However, by operating on each query term independently, these otherwise computationally intensive models become amenable to offline precomputation---dramatically reducing the cost of query evaluations employing state-of-the-art neural ranking models. This strategy makes it practical to use deep models for retrieval from large collections---and not restrict their usage to late stage re-ranking. △ Less

Submitted 8 July, 2019; originally announced July 2019.

arXiv:1710.06871 [pdf]

Promoting Saving for College Through Data Science

Authors: Fernando Diaz, Natnaell Mammo

Abstract: The cost of attending college has been steadily rising and in 10 years is estimated to reach $140,000 for a 4-year public university. Recent surveys estimate just over half of US families are saving for college. State-operated 529 college savings plans are an effective way for families to plan and save for future college costs, but only 3% of families currently use them. The Office of the Illinois… ▽ More The cost of attending college has been steadily rising and in 10 years is estimated to reach $140,000 for a 4-year public university. Recent surveys estimate just over half of US families are saving for college. State-operated 529 college savings plans are an effective way for families to plan and save for future college costs, but only 3% of families currently use them. The Office of the Illinois State Treasurer (Treasurer) administers two 529 plans to help its residents save for college. In order to increase the number of families saving for college, the Treasurer and Civis Analytics used data science techniques to identify the people most likely to sign up for a college savings plan. In this paper, we will discuss the use of person matching to join accountholder data from the Treasurer to the Civis National File, as well as the use of lookalike modeling to identify new potential signups. In order to avoid reinforcing existing demographic imbalances in who saves for college, the lookalike models used were ensured to be racially and economically balanced. We will also discuss how these new signup targets were then individually served digital ads to encourage opening college savings accounts. △ Less

Submitted 18 October, 2017; originally announced October 2017.

Comments: Presented at the Data For Good Exchange 2017

arXiv:1705.10689 [pdf, other]

doi 10.1145/3041021.3054197

Auditing Search Engines for Differential Satisfaction Across Demographics

Authors: Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, Emine Yilmaz

Abstract: Many online services, such as search engines, social media platforms, and digital marketplaces, are advertised as being available to any user, regardless of their age, gender, or other demographic factors. However, there are growing concerns that these services may systematically underserve some groups of users. In this paper, we present a framework for internally auditing such services for differ… ▽ More Many online services, such as search engines, social media platforms, and digital marketplaces, are advertised as being available to any user, regardless of their age, gender, or other demographic factors. However, there are growing concerns that these services may systematically underserve some groups of users. In this paper, we present a framework for internally auditing such services for differences in user satisfaction across demographic groups, using search engines as a case study. We first explain the pitfalls of naïvely comparing the behavioral metrics that are commonly used to evaluate search engines. We then propose three methods for measuring latent differences in user satisfaction from observed differences in evaluation metrics. To develop these methods, we drew on ideas from the causal inference literature and the multilevel modeling literature. Our framework is broadly applicable to other online services, and provides general insight into interpreting their evaluation metrics. △ Less

Submitted 24 May, 2017; originally announced May 2017.

Comments: 8 pages Accepted at WWW 2017

arXiv:1702.05042 [pdf, ps, other]

Luandri: a Clean Lua Interface to the Indri Search Engine

Authors: Bhaskar Mitra, Fernando Diaz, Nick Craswell

Abstract: In recent years, the information retrieval (IR) community has witnessed the first successful applications of deep neural network models to short-text matching and ad-hoc retrieval. It is exciting to see the research on deep neural networks and IR converge on these tasks of shared interest. However, the two communities have less in common when it comes to the choice of programming languages. Indri,… ▽ More In recent years, the information retrieval (IR) community has witnessed the first successful applications of deep neural network models to short-text matching and ad-hoc retrieval. It is exciting to see the research on deep neural networks and IR converge on these tasks of shared interest. However, the two communities have less in common when it comes to the choice of programming languages. Indri, an indexing framework popularly used by the IR community, is written in C++, while Torch, a popular machine learning library for deep learning, is written in the light-weight scripting language Lua. To bridge this gap, we introduce Luandri (pronounced "laundry"), a simple interface for exposing the search capabilities of Indri to Torch models implemented in Lua. △ Less

Submitted 16 February, 2017; originally announced February 2017.

Comments: Under review for SIGIR'17

arXiv:1610.08136 [pdf, other]

Learning to Match Using Local and Distributed Representations of Text for Web Search

Authors: Bhaskar Mitra, Fernando Diaz, Nick Craswell

Abstract: Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on the other hand, terms have discrete or local representations, and the relevance of a document is determined by the exact matches of query terms in the body text.… ▽ More Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on the other hand, terms have discrete or local representations, and the relevance of a document is determined by the exact matches of query terms in the body text. We hypothesize that matching with distributed representations complements matching with traditional local representations, and that a combination of the two is favorable. We propose a novel document ranking model composed of two separate deep neural networks, one that matches the query and the document using a local representation, and another that matches the query and the document using learned distributed representations. The two networks are jointly trained as part of a single neural network. We show that this combination or `duet' performs significantly better than either neural network individually on a Web page ranking task, and also significantly outperforms traditional baselines and other recently proposed models based on neural networks. △ Less

Submitted 25 October, 2016; originally announced October 2016.

arXiv:1609.02075 [pdf, other]

The Social Dynamics of Language Change in Online Networks

Authors: Rahul Goel, Sandeep Soni, Naman Goyal, John Paparrizos, Hanna Wallach, Fernando Diaz, Jacob Eisenstein

Abstract: Language change is a complex social phenomenon, revealing pathways of communication and sociocultural influence. But, while language change has long been a topic of study in sociolinguistics, traditional linguistic research methods rely on circumstantial evidence, estimating the direction of change from differences between older and younger speakers. In this paper, we use a data set of several mil… ▽ More Language change is a complex social phenomenon, revealing pathways of communication and sociocultural influence. But, while language change has long been a topic of study in sociolinguistics, traditional linguistic research methods rely on circumstantial evidence, estimating the direction of change from differences between older and younger speakers. In this paper, we use a data set of several million Twitter users to track language changes in progress. First, we show that language change can be viewed as a form of social influence: we observe complex contagion for phonetic spellings and "netspeak" abbreviations (e.g., lol), but not for older dialect markers from spoken language. Next, we test whether specific types of social network connections are more influential than others, using a parametric Hawkes process model. We find that tie strength plays an important role: densely embedded social ties are significantly better conduits of linguistic influence. Geographic locality appears to play a more limited role: we find relatively little evidence to support the hypothesis that individuals are more influenced by geographically local social ties, even in their usage of geographical dialect markers. △ Less

Submitted 7 September, 2016; originally announced September 2016.

Comments: This paper appears in the Proceedings of the International Conference on Social Informatics (SocInfo16). The final publication is available at springer.com

ACM Class: I.2.7; J.4; J.5

arXiv:1605.07891 [pdf, other]

Query Expansion with Locally-Trained Word Embeddings

Authors: Fernando Diaz, Bhaskar Mitra, Nick Craswell

Abstract: Continuous space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability to model term similarity and other relationships. We study the use of term relatedness in the context of query expansion for ad hoc information retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when trained globally,… ▽ More Continuous space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability to model term similarity and other relationships. We study the use of term relatedness in the context of query expansion for ad hoc information retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when trained globally, underperform corpus and query specific embeddings for retrieval tasks. These results suggest that other tasks benefiting from global embeddings may also benefit from local embeddings. △ Less

Submitted 22 June, 2016; v1 submitted 25 May, 2016; originally announced May 2016.

arXiv:1605.03664 [pdf, other]

Real-Time Web Scale Event Summarization Using Sequential Decision Making

Authors: Chris Kedzie, Fernando Diaz, Kathleen McKeown

Abstract: We present a system based on sequential decision making for the online summarization of massive document streams, such as those found on the web. Given an event of interest (e.g. "Boston marathon bombing"), our system is able to filter the stream for relevance and produce a series of short text updates describing the event as it unfolds over time. Unlike previous work, our approach is able to join… ▽ More We present a system based on sequential decision making for the online summarization of massive document streams, such as those found on the web. Given an event of interest (e.g. "Boston marathon bombing"), our system is able to filter the stream for relevance and produce a series of short text updates describing the event as it unfolds over time. Unlike previous work, our approach is able to jointly model the relevance, comprehensiveness, novelty, and timeliness required by time-sensitive queries. We demonstrate a 28.3% improvement in summary F1 and a 43.8% improvement in time-sensitive F1 metrics. △ Less

Submitted 11 May, 2016; originally announced May 2016.

Comments: in Proceedings of the 25th International Joint Conference on Artificial Intelligence 2016

arXiv:1604.04146 [pdf, other]

doi 10.1007/s00500-016-2114-1

A Discrete Firefly Algorithm to Solve a Rich Vehicle Routing Problem Modelling a Newspaper Distribution System with Recycling Policy

Authors: E. Osaba, Xin-She Yang, F. Diaz, E. Onieva, A. D. Masegosa, A. Perallos

Abstract: A real-world newspaper distribution problem with recycling policy is tackled in this work. In order to meet all the complex restrictions contained in such a problem, it has been modeled as a rich vehicle routing problem, which can be more specifically considered as an asymmetric and clustered vehicle routing problem with simultaneous pickup and deliveries, variable costs and forbidden paths (AC-VR… ▽ More A real-world newspaper distribution problem with recycling policy is tackled in this work. In order to meet all the complex restrictions contained in such a problem, it has been modeled as a rich vehicle routing problem, which can be more specifically considered as an asymmetric and clustered vehicle routing problem with simultaneous pickup and deliveries, variable costs and forbidden paths (AC-VRP-SPDVCFP). This is the first study of such a problem in the literature. For this reason, a benchmark composed by 15 instances has been also proposed. In the design of this benchmark, real geographical positions have been used, located in the province of Bizkaia, Spain. For the proper treatment of this AC-VRP-SPDVCFP, a discrete firefly algorithm (DFA) has been developed. This application is the first application of the firefly algorithm to any rich vehicle routing problem. To prove that the proposed DFA is a promising technique, its performance has been compared with two other well-known techniques: an evolutionary algorithm and an evolutionary simulated annealing. Our results have shown that the DFA has outperformed these two classic meta-heuristics. △ Less

Submitted 14 April, 2016; originally announced April 2016.

Comments: 7 tables and 4 figures

MSC Class: 78M32

arXiv:1604.04138 [pdf, other]

doi 10.1016/j.engappai.2015.10.006

An Improved Discrete Bat Algorithm for Symmetric and Asymmetric Traveling Salesman Problems

Authors: Eneko Osaba, Xin-She Yang, Fernando Diaz, Pedro Lopez-Garcia, Roberto Carballedo

Abstract: Bat algorithm is a population metaheuristic proposed in 2010 which is based on the echolocation or bio-sonar characteristics of microbats. Since its first implementation, the bat algorithm has been used in a wide range of fields. In this paper, we present a discrete version of the bat algorithm to solve the well-known symmetric and asymmetric traveling salesman problems. In addition, we propose an… ▽ More Bat algorithm is a population metaheuristic proposed in 2010 which is based on the echolocation or bio-sonar characteristics of microbats. Since its first implementation, the bat algorithm has been used in a wide range of fields. In this paper, we present a discrete version of the bat algorithm to solve the well-known symmetric and asymmetric traveling salesman problems. In addition, we propose an improvement in the basic structure of the classic bat algorithm. To prove that our proposal is a promising approximation method, we have compared its performance in 37 instances with the results obtained by five different techniques: evolutionary simulated annealing, genetic algorithm, an island based distributed genetic algorithm, a discrete firefly algorithm and an imperialist competitive algorithm. In order to obtain fair and rigorous comparisons, we have conducted three different statistical tests along the paper: the Student's $t$-test, the Holm's test, and the Friedman test. We have also compared the convergence behaviour shown by our proposal with the ones shown by the evolutionary simulated annealing, and the discrete firefly algorithm. The experimentation carried out in this study has shown that the presented improved bat algorithm outperforms significantly all the other alternatives in most of the cases. △ Less

Submitted 14 April, 2016; originally announced April 2016.

Comments: 1 figure, 8 tables

MSC Class: 78M32

Journal ref: Engineering Applications of Artificial Intelligence, 48 (1), 59-71 (2016)

arXiv:1603.04119 [pdf, other]

Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains

Authors: David Abel, Alekh Agarwal, Fernando Diaz, Akshay Krishnamurthy, Robert E. Schapire

Abstract: High-dimensional observations and complex real-world dynamics present major challenges in reinforcement learning for both function approximation and exploration. We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals. And second, we propose an exploration strateg… ▽ More High-dimensional observations and complex real-world dynamics present major challenges in reinforcement learning for both function approximation and exploration. We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals. And second, we propose an exploration strategy inspired by the principles of state abstraction and information acquisition under uncertainty. We demonstrate the empirical effectiveness of these techniques, first, as a preliminary check, on two standard tasks (Blackjack and $n$-Chain), and then on two much larger and more realistic tasks with high-dimensional observation spaces. Specifically, we introduce two benchmarks built within the game Minecraft where the observations are pixel arrays of the agent's visual field. A combination of our two algorithmic techniques performs competitively on the standard reinforcement-learning tasks while consistently and substantially outperforming baselines on the two tasks with high-dimensional observation spaces. The new function approximator, exploration strategy, and evaluation benchmarks are each of independent interest in the pursuit of reinforcement-learning methods that scale to real-world domains. △ Less

Submitted 13 March, 2016; originally announced March 2016.

arXiv:1507.03928 [pdf, other]

Pseudo-Query Reformulation

Authors: Fernando Diaz

Abstract: Automatic query reformulation refers to rewriting a user's original query in order to improve the ranking of retrieval results compared to the original query. We present a general framework for automatic query reformulation based on discrete optimization. Our approach, referred to as pseudo-query reformulation, treats automatic query reformulation as a search problem over the graph of unweighted q… ▽ More Automatic query reformulation refers to rewriting a user's original query in order to improve the ranking of retrieval results compared to the original query. We present a general framework for automatic query reformulation based on discrete optimization. Our approach, referred to as pseudo-query reformulation, treats automatic query reformulation as a search problem over the graph of unweighted queries linked by minimal transformations (e.g. term additions, deletions). This framework allows us to test existing performance prediction methods as heuristics for the graph search process. We demonstrate the effectiveness of the approach on several publicly available datasets. △ Less

Submitted 14 July, 2015; originally announced July 2015.

Showing 1–50 of 51 results for author: Diaz, F