Showing 1–50 of 51 results for author: Diaz, F

Searching in archive cs.
  1. arXiv:2407.12982  [pdf, other]

    cs.LG cs.CL cs.IR

    Retrieval-Enhanced Machine Learning: Synthesis and Opportunities

    Authors: To Eun Kim, Alireza Salemi, Andrew Drozdov, Fernando Diaz, Hamed Zamani

    Abstract: In the field of language modeling, models augmented with retrieval components have emerged as a promising solution to address several challenges faced in the natural language processing (NLP) field, including knowledge grounding, interpretability, and scalability. Despite the primary focus on NLP, we posit that the paradigm of retrieval-enhancement can be extended to a broader spectrum of machine…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2406.11565  [pdf, other]

    cs.CL cs.CY

    Extrinsic Evaluation of Cultural Competence in Large Language Models

    Authors: Shaily Bhatt, Fernando Diaz

    Abstract: Productive interactions between diverse users and language technologies require outputs from the latter to be culturally relevant and sensitive. Prior works have evaluated models' knowledge of cultural norms, values, and artifacts, without considering how this knowledge manifests in downstream applications. In this work, we focus on extrinsic evaluation of cultural competence in two text generatio…

    Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Under peer review

  3. Capturing waste collection planning expert knowledge in a fitness function through preference learning

    Authors: Laura Fernández Díaz, Miriam Fernández Díaz, José Ramón Quevedo, Elena Montañés

    Abstract: This paper copes with the COGERSA waste collection process. Up to now, experts have manually designed the process using a trial-and-error mechanism. This process is not globally optimized, since it has been progressively and locally built as council demands appear. Planning optimization algorithms usually solve it, but they need a fitness function to evaluate a route planning quality. The dra…

    Submitted 2 February, 2024; originally announced February 2024.

    Journal ref: Engineering Applications of Artificial Intelligence 2021 Volume 99 104113

  4. arXiv:2312.12251  [pdf, other]

    cs.SI

    Fairness and Consensus in Opinion Models (Technical Report)

    Authors: Jesús Aranda, Sebastián Betancourt, Juan Fco. Díaz, Frank Valencia

    Abstract: We introduce a DeGroot-based model for opinion dynamics in social networks. A community of agents is represented as a weighted directed graph whose edges indicate how much agents influence one another. The model is formalized using labeled transition systems, henceforth called opinion transition systems (OTS), whose states represent the agents' opinions and whose actions are the edges of the influ…

    Submitted 11 July, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: 31 pages total: 15 main work, 2 for references and 13 for two proof appendices. 5 figures

  5. arXiv:2310.20091  [pdf, other]

    cs.IR

    Density-based User Representation using Gaussian Process Regression for Multi-interest Personalized Retrieval

    Authors: Haolun Wu, Ofer Meshi, Masrour Zoghi, Fernando Diaz, Xue Liu, Craig Boutilier, Maryam Karimzadehgan

    Abstract: Accurate modeling of the diverse and dynamic interests of users remains a significant challenge in the design of personalized recommender systems. Existing user modeling methods, like single-point and multi-point representations, have limitations w.r.t. accuracy, diversity, and adaptability. To overcome these deficiencies, we introduce density-based user representations (DURs), a novel method tha…

    Submitted 22 May, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: 22 pages

  6. arXiv:2309.05892  [pdf, other]

    cs.IR cs.HC

    Distributionally-Informed Recommender System Evaluation

    Authors: Michael D. Ekstrand, Ben Carterette, Fernando Diaz

    Abstract: Current practice for evaluating recommender systems typically focuses on point estimates of user-oriented effectiveness metrics or business metrics, sometimes combined with additional metrics for considerations such as diversity and novelty. In this paper, we argue for the need for researchers and practitioners to attend more closely to various distributions that arise from a recommender system (o…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: Accepted to ACM Transactions on Recommender Systems

  7. arXiv:2308.14601  [pdf, other]

    cs.CY cs.IR cs.LG

    Fairness Through Domain Awareness: Mitigating Popularity Bias For Music Discovery

    Authors: Rebecca Salganik, Fernando Diaz, Golnoosh Farnadi

    Abstract: As online music platforms grow, music recommender systems play a vital role in helping users navigate and discover content within their vast musical databases. At odds with this larger goal is the presence of popularity bias, which causes algorithmic systems to favor mainstream content over potentially more relevant but niche items. In this work we explore the intrinsic relationship between mus…

    Submitted 28 August, 2023; originally announced August 2023.

  8. arXiv:2308.02887  [pdf, other]

    cs.IR

    The Impact of Group Membership Bias on the Quality and Fairness of Exposure in Ranking

    Authors: Ali Vardasbi, Maarten de Rijke, Fernando Diaz, Mostafa Dehghani

    Abstract: When learning to rank from user interactions, search and recommender systems must address biases in user behavior to provide a high-quality ranking. One type of bias that has recently been studied in the ranking literature is when sensitive attributes, such as gender, have an impact on a user's judgment about an item's utility. For example, in a search for an expertise area, some users may be bias…

    Submitted 29 April, 2024; v1 submitted 5 August, 2023; originally announced August 2023.

  9. arXiv:2307.03201  [pdf, ps, other]

    cs.LG cond-mat.dis-nn cs.AI cs.CY

    Scaling Laws Do Not Scale

    Authors: Fernando Diaz, Michael Madaio

    Abstract: Recent work has proposed a power law relationship, referred to as "scaling laws," between the performance of artificial intelligence (AI) models and aspects of those models' design (e.g., dataset size). In other words, as the size of a dataset (or model parameters, etc) increases, the performance of a given model trained on that dataset will correspondingly increase. However, while compelling in…

    Submitted 5 July, 2023; originally announced July 2023.

  10. arXiv:2306.07908  [pdf, other]

    cs.IR

    Best-Case Retrieval Evaluation: Improving the Sensitivity of Reciprocal Rank with Lexicographic Precision

    Authors: Fernando Diaz

    Abstract: Across a variety of ranking tasks, researchers use reciprocal rank to measure the effectiveness for users interested in exactly one relevant item. Despite its widespread use, evidence suggests that reciprocal rank is brittle when discriminating between systems. This brittleness, in turn, is compounded in modern evaluation settings where current, high-precision systems may be difficult to distingui…

    Submitted 13 June, 2023; originally announced June 2023.

  11. arXiv:2303.09335  [pdf, other]

    astro-ph.EP astro-ph.IM cs.LG

    ExoplANNET: A deep learning algorithm to detect and identify planetary signals in radial velocity data

    Authors: L. A. Nieto, R. F. Díaz

    Abstract: The detection of exoplanets with the radial velocity method consists in detecting variations of the stellar velocity caused by an unseen sub-stellar companion. Instrumental errors, irregular time sampling, and different noise sources originating in the intrinsic variability of the star can hinder the interpretation of the data, and even lead to spurious detections. In recent times, work began to e…

    Submitted 1 July, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted for publication; Corrected typos; Added section 6.1 with a robustness analysis of the method; Added section 6.2 with tests on a real time series; Added section 6.3 with a more detailed analysis of the caution of the network around activity periods; Added other tested models to the appendix

    Journal ref: A&A 677, A48 (2023)

  12. arXiv:2302.11370  [pdf, other]

    cs.IR

    Recall, Robustness, and Lexicographic Evaluation

    Authors: Fernando Diaz, Bhaskar Mitra

    Abstract: Although originally developed to evaluate sets of items, recall is often used to evaluate rankings of items, including those produced by recommender, retrieval, and other machine learning systems. The application of recall without a formal evaluative motivation has led to criticism of recall as a vague or inappropriate measure. In light of this debate, we reflect on the measurement of recall in ra…

    Submitted 8 March, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: Under review

  13. arXiv:2302.11360  [pdf, other]

    cs.IR

    Commonality in Recommender Systems: Evaluating Recommender Systems to Enhance Cultural Citizenship

    Authors: Andres Ferraro, Gustavo Ferreira, Fernando Diaz, Georgina Born

    Abstract: Recommender systems have become the dominant means of curating cultural content, significantly influencing individual cultural experience. Since recommender systems tend to optimize for personalized user experience, they can overlook impacts on cultural experience in the aggregate. After demonstrating that existing metrics do not center culture, we introduce a new metric, commonality, that measure…

    Submitted 22 February, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: extended version of "Measuring Commonality in Recommendation of Cultural Content: Recommender Systems to Enhance Cultural Citizenship", published at RecSys 2022

  14. arXiv:2301.03971  [pdf, other]

    cs.CL

    Unsupervised Mandarin-Cantonese Machine Translation

    Authors: Megan Dare, Valentina Fajardo Diaz, Averie Ho Zoen So, Yifan Wang, Shibingfeng Zhang

    Abstract: Advancements in unsupervised machine translation have enabled the development of machine translation systems that can translate between languages for which there is not an abundance of parallel data available. We explored unsupervised machine translation between Mandarin Chinese and Cantonese. Despite the vast number of native speakers of Cantonese, there is still no large-scale corpus for the lan…

    Submitted 10 January, 2023; originally announced January 2023.

  15. arXiv:2212.08038  [pdf, ps, other]

    cs.CY

    Redefining Relationships in Music

    Authors: Christian Detweiler, Beth Coleman, Fernando Diaz, Lieke Dom, Chris Donahue, Jesse Engel, Cheng-Zhi Anna Huang, Larry James, Ethan Manilow, Amanda McCroskery, Kyle Pedersen, Pamela Peter-Agbia, Negar Rostamzadeh, Robert Thomas, Marco Zamarato, Ben Zevenbergen

    Abstract: AI tools increasingly shape how we discover, make and experience music. While these tools can have the potential to empower creativity, they may fundamentally redefine relationships between stakeholders, to the benefit of some and the detriment of others. In this position paper, we argue that these tools will fundamentally reshape our music culture, with profound effects (for better and for worse)…

    Submitted 16 December, 2022; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Presented at Cultures in AI/AI in Culture workshop at NeurIPS 2022

  16. arXiv:2211.06348  [pdf, other]

    cs.LG stat.ML

    Striving for data-model efficiency: Identifying data externalities on group performance

    Authors: Esther Rolf, Ben Packer, Alex Beutel, Fernando Diaz

    Abstract: Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance. In this work, we seek to better understand how we might characterize, detect, and design for data-model synergies. We focus on a particular type of data-model inefficiency, in which adding training data…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: 9 pages, 3 figures. Trustworthy and Socially Responsible Machine Learning (TSRML 2022) workshop co-located with NeurIPS 2022

  17. arXiv:2210.05145  [pdf, other]

    cs.IR cs.CL

    Retrieval Augmentation for T5 Re-ranker using External Sources

    Authors: Kai Hui, Tao Chen, Zhen Qin, Honglei Zhuang, Fernando Diaz, Mike Bendersky, Don Metzler

    Abstract: Retrieval augmentation has shown promising improvements in different tasks. However, whether such augmentation can assist a large language model based re-ranker remains unclear. We investigate how to augment T5-based re-rankers using high-quality information retrieved from two external corpora -- a commercial web search engine and Wikipedia. We empirically demonstrate how retrieval augmentation ca…

    Submitted 11 October, 2022; originally announced October 2022.

  18. arXiv:2209.03904  [pdf, ps, other]

    cs.LG cs.AI cs.CY

    Analyzing the Effect of Sampling in GNNs on Individual Fairness

    Authors: Rebecca Salganik, Fernando Diaz, Golnoosh Farnadi

    Abstract: Graph neural network (GNN) based methods have saturated the field of recommender systems. The gains of these systems have been significant, showing the advantages of interpreting data through a network structure. However, despite the noticeable benefits of using graph structures in recommendation tasks, this representational form has also bred new challenges which exacerbate the complexity of miti…

    Submitted 9 September, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

  19. Measuring Commonality in Recommendation of Cultural Content: Recommender Systems to Enhance Cultural Citizenship

    Authors: Andres Ferraro, Gustavo Ferreira, Fernando Diaz, Georgina Born

    Abstract: Recommender systems have become the dominant means of curating cultural content, significantly influencing the nature of individual cultural experience. While the majority of research on recommender systems optimizes for personalized user experience, this paradigm does not capture the ways that recommender systems impact cultural experience in the aggregate, across populations of users. Although e…

    Submitted 2 August, 2022; originally announced August 2022.

    Comments: The 16th ACM Conference on Recommender Systems

  20. On Natural Language User Profiles for Transparent and Scrutable Recommendation

    Authors: Filip Radlinski, Krisztian Balog, Fernando Diaz, Lucas Dixon, Ben Wedin

    Abstract: Natural interaction with recommendation and personalized search systems has received tremendous attention in recent years. We focus on the challenge of supporting people's understanding and control of these systems and explore a fundamentally new way of thinking about representation of knowledge in recommendation and personalization systems. Specifically, we argue that it may be both desirable and…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22), 2022

  21. arXiv:2205.01230  [pdf, other]

    cs.LG cs.CL cs.IR

    Retrieval-Enhanced Machine Learning

    Authors: Hamed Zamani, Fernando Diaz, Mostafa Dehghani, Donald Metzler, Michael Bendersky

    Abstract: Although information access systems have long supported people in accomplishing a wide range of tasks, we propose broadening the scope of users of information access systems to include task-driven machines, such as machine learning models. In this way, the core principles of indexing, representation, retrieval, and ranking can be applied and extended to substantially improve model generalization,…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: To appear in proceedings of ACM SIGIR 2022

  22. arXiv:2205.00048  [pdf, other]

    cs.IR cs.AI cs.LG

    Joint Multisided Exposure Fairness for Recommendation

    Authors: Haolun Wu, Bhaskar Mitra, Chen Ma, Fernando Diaz, Xue Liu

    Abstract: Prior research on exposure fairness in the context of recommender systems has focused mostly on disparities in the exposure of individual or groups of items to individual users of the system. The problem of how individual or groups of items may be systemically under or over exposed to groups of users, or even all users, has received relatively less attention. However, such systemic disparities in…

    Submitted 29 April, 2022; originally announced May 2022.

  23. arXiv:2204.11400  [pdf, other]

    cs.IR

    Offline Retrieval Evaluation Without Evaluation Metrics

    Authors: Fernando Diaz, Andres Ferraro

    Abstract: Offline evaluation of information retrieval and recommendation has traditionally focused on distilling the quality of a ranking into a scalar metric such as average precision or normalized discounted cumulative gain. We can use this metric to compare the performance of multiple systems for the same request. Although evaluation metrics provide a convenient summary of system performance, they also c…

    Submitted 24 April, 2022; originally announced April 2022.

    Comments: to appear at SIGIR 2022

  24. arXiv:2110.07701  [pdf, other]

    cs.IR cs.AI cs.LG

    Exposing Query Identification for Search Transparency

    Authors: Ruohan Li, Jianxiang Li, Bhaskar Mitra, Fernando Diaz, Asia J. Biega

    Abstract: Search systems control the exposure of ranked content to searchers. In many cases, creators value not only the exposure of their content but, moreover, an understanding of the specific searches where the content is surfaced. The problem of identifying which queries expose a given piece of content in the ranking results is an important and relatively under-explored search transparency challenge. Ex…

    Submitted 11 April, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

  25. arXiv:2108.05152  [pdf, other]

    cs.IR cs.CY cs.LG

    Estimation of Fair Ranking Metrics with Incomplete Judgments

    Authors: Ömer Kırnap, Fernando Diaz, Asia Biega, Michael Ekstrand, Ben Carterette, Emine Yılmaz

    Abstract: There is increasing attention to evaluating the fairness of search system ranking decisions. These metrics often consider the membership of items to particular groups, often identified using protected attributes such as gender or ethnicity. To date, these metrics typically assume the availability and completeness of protected attribute labels of items. However, the protected attributes of individu…

    Submitted 11 August, 2021; originally announced August 2021.

    Comments: Published in Proceedings of the Web Conference 2021 (WWW '21)

  26. arXiv:2108.05135  [pdf, other]

    cs.IR cs.CY cs.LG

    Overview of the TREC 2020 Fair Ranking Track

    Authors: Asia J. Biega, Fernando Diaz, Michael D. Ekstrand, Sergey Feldman, Sebastian Kohlmeier

    Abstract: This paper provides an overview of the NIST TREC 2020 Fair Ranking track. For 2020, we again adopted an academic search task, where we have a corpus of academic article abstracts and queries submitted to a production academic search engine. The central goal of the Fair Ranking track is to provide fair exposure to different groups of authors (a group fairness framing). We recognize that there may b…

    Submitted 11 August, 2021; originally announced August 2021.

    Comments: Published in The Twenty-Ninth Text REtrieval Conference Proceedings (TREC 2020). arXiv admin note: substantial text overlap with arXiv:2003.11650

  27. arXiv:2107.08096  [pdf, other]

    cs.LG cs.CY cs.IR

    Learning to Limit Data Collection via Scaling Laws: A Computational Interpretation for the Legal Principle of Data Minimization

    Authors: Divya Shanmugam, Samira Shabanian, Fernando Diaz, Michèle Finck, Asia Biega

    Abstract: Modern machine learning systems are increasingly characterized by extensive personal data collection, despite the diminishing returns and increasing societal costs of such practices. Yet, data minimisation is one of the core data protection principles enshrined in the European Union's General Data Protection Regulation ('GDPR') and requires that only personal data that is adequate, relevant and li…

    Submitted 12 June, 2022; v1 submitted 16 July, 2021; originally announced July 2021.

    Comments: To appear at ACM Conference on Fairness, Accountability, and Transparency, 2022

  28. arXiv:2107.07002  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.IR

    The Benchmark Lottery

    Authors: Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals

    Abstract: The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods. This paper proposes the notion of "a benchmark lottery" that describes the overall fragility of the ML benchmarking process. The benchmark lottery postulates that many factors, other than fundamental algorithmic superiority, may lead to a…

    Submitted 14 July, 2021; originally announced July 2021.

  29. Fairness in Information Access Systems

    Authors: Michael D. Ekstrand, Anubrata Das, Robin Burke, Fernando Diaz

    Abstract: Recommendation, information retrieval, and other information access systems pose unique challenges for investigating and applying the fairness and non-discrimination concepts that have been developed for studying other machine learning systems. While fair information access shares many commonalities with fair classification, the multistakeholder nature of information access applications, the rank-…

    Submitted 12 July, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

    ACM Class: H.3.3; K.4

    Journal ref: Foundations and Trends in Information Retrieval 16:1-2 - 2022

  30. arXiv:2105.02951  [pdf, other]

    cs.IR

    Multi-FR: A Multi-objective Optimization Framework for Multi-stakeholder Fairness-aware Recommendation

    Authors: Haolun Wu, Chen Ma, Bhaskar Mitra, Fernando Diaz, Xue Liu

    Abstract: Nowadays, most online services are hosted on multi-stakeholder marketplaces, where consumers and producers may have different objectives. Conventional recommendation systems, however, mainly focus on maximizing consumers' satisfaction by recommending the most relevant items to each individual. This may result in unfair exposure of items, thus jeopardizing producer benefits. Additionally, they do n…

    Submitted 9 August, 2022; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: 29 pages

  31. arXiv:2101.07124  [pdf, ps, other]

    cs.IR cs.HC

    Tip of the Tongue Known-Item Retrieval: A Case Study in Movie Identification

    Authors: Jaime Arguello, Adam Ferguson, Emery Fine, Bhaskar Mitra, Hamed Zamani, Fernando Diaz

    Abstract: While current information retrieval systems are effective for known-item retrieval where the searcher provides a precise name or identifier for the item being sought, systems tend to be much less effective for cases where the searcher is unable to express a precise name or identifier. We refer to this as tip of the tongue (TOT) known-item retrieval, named after the cognitive state of not being abl…

    Submitted 18 January, 2021; originally announced January 2021.

  32. arXiv:2012.12712  [pdf, other]

    eess.IV cs.CV

    Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures

    Authors: Candelaria Mosquera, Facundo Nahuel Diaz, Fernando Binder, Jose Martin Rabellino, Sonia Elizabeth Benitez, Alejandro Daniel Beresñak, Alberto Seehaus, Gabriel Ducrey, Jorge Alberto Ocantos, Daniel Roberto Luna

    Abstract: BACKGROUND AND OBJECTIVES: The multiple chest x-ray datasets released in the last years have ground-truth labels intended for different computer vision tasks, suggesting that performance in automated chest-xray interpretation might improve by using a method that can exploit diverse types of annotations. This work presents a Deep Learning method based on the late fusion of different convolutional a…

    Submitted 23 December, 2020; originally announced December 2020.

  33. arXiv:2008.08871  [pdf]

    eess.IV cs.CV cs.LG physics.med-ph

    Deep learning-based transformation of the H&E stain into special stains

    Authors: Kevin de Haan, Yijie Zhang, Jonathan E. Zuckerman, Tairan Liu, Anthony E. Sisk, Miguel F. P. Diaz, Kuang-Yu Jen, Alexander Nobori, Sofia Liou, Sarah Zhang, Rana Riahi, Yair Rivenson, W. Dean Wallace, Aydogan Ozcan

    Abstract: Pathology is practiced by visual inspection of histochemically stained slides. Most commonly, the hematoxylin and eosin (H&E) stain is used in the diagnostic workflow and it is the gold standard for cancer diagnosis. However, in many cases, especially for non-neoplastic diseases, additional "special stains" are used to provide different levels of contrast and color to tissue components and allow p…

    Submitted 12 August, 2021; v1 submitted 20 August, 2020; originally announced August 2020.

    Comments: 27 Pages, 6 Figures

    MSC Class: 68T01; 68T05; 68U10; 62M45; 78M32; 92C50; 92C55; 94A08 ACM Class: I.2; I.2.1; I.2.6; I.2.10; I.3; I.3.3; I.4.3; I.4.4; I.4.9; J.3

    Journal ref: Nature Communications (2021)

  34. On the Social and Technical Challenges of Web Search Autosuggestion Moderation

    Authors: Timothy J. Hazen, Alexandra Olteanu, Gabriella Kazai, Fernando Diaz, Michael Golebiewski

    Abstract: Past research shows that users benefit from systems that support them in their writing and exploration tasks. The autosuggestion feature of Web search engines is an example of such a system: It helps users in formulating their queries by offering a list of suggestions as they type. Autosuggestions are typically generated by machine learning (ML) systems trained on a corpus of search logs and docum…

    Submitted 9 July, 2020; originally announced July 2020.

    Comments: 17 Pages, 4 images displayed within 3 latex figures

    Journal ref: First Monday, Volume 27, Number 2, February 7, 2022

  35. arXiv:2006.00166  [pdf, other]

    cs.IR

    Analyzing and Learning from User Interactions for Search Clarification

    Authors: Hamed Zamani, Bhaskar Mitra, Everest Chen, Gord Lueck, Fernando Diaz, Paul N. Bennett, Nick Craswell, Susan T. Dumais

    Abstract: Asking clarifying questions in response to search queries has been recognized as a useful technique for revealing the underlying intent of the query. Clarification has applications in retrieval systems with different interfaces, from the traditional web search interfaces to the limited bandwidth interfaces as in speech-only and small screen devices. Generation and evaluation of clarifying question…

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: To appear in the Proceedings of SIGIR 2020

  36. arXiv:2005.13718  [pdf, other]

    cs.CY cs.IR cs.LG

    Operationalizing the Legal Principle of Data Minimization for Personalization

    Authors: Asia J. Biega, Peter Potash, Hal Daumé III, Fernando Diaz, Michèle Finck

    Abstract: Article 5(1)(c) of the European Union's General Data Protection Regulation (GDPR) requires that "personal data shall be [...] adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed ('data minimisation')". To date, the legal and computational definitions of 'purpose limitation' and 'data minimization' remain largely unclear. In particular, the…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: SIGIR 2020 paper: In Proc. of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

  37. Evaluating Stochastic Rankings with Expected Exposure

    Authors: Fernando Diaz, Bhaskar Mitra, Michael D. Ekstrand, Asia J. Biega, Ben Carterette

    Abstract: We introduce the concept of expected exposure as the average attention ranked items receive from users over repeated samples of the same query. Furthermore, we advocate for the adoption of the principle of equal expected exposure: given a fixed information need, no item should receive more or less expected exposure than any other item of the same relevance grade. We argue that this principl…

    Submitted 20 October, 2020; v1 submitted 27 April, 2020; originally announced April 2020.

    Comments: In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM '20). Association for Computing Machinery, New York, NY, USA

  38. arXiv:2003.11650  [pdf, other]

    cs.IR cs.DL cs.LG

    Overview of the TREC 2019 Fair Ranking Track

    Authors: Asia J. Biega, Fernando Diaz, Michael D. Ekstrand, Sebastian Kohlmeier

    Abstract: The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers in addition to classic notions of relevance. As part of the benchmark, we defined standardized fairness metrics with evaluation protocols and released a dataset for the fair ranking problem. The 2019 task focused on reranking academic paper abstrac…

    Submitted 25 March, 2020; originally announced March 2020.

    Comments: Published in The Twenty-Eighth Text REtrieval Conference Proceedings (TREC 2019)

  39. arXiv:1907.03693  [pdf, ps, other]

    cs.IR cs.LG

    Incorporating Query Term Independence Assumption for Efficient Retrieval and Ranking using Deep Neural Networks

    Authors: Bhaskar Mitra, Corby Rosset, David Hawking, Nick Craswell, Fernando Diaz, Emine Yilmaz

    Abstract: Classical information retrieval (IR) methods, such as query likelihood and BM25, score documents independently w.r.t. each query term, and then accumulate the scores. Assuming query term independence allows precomputing term-document scores using these models -- which can be combined with specialized data structures, such as inverted index, for efficient retrieval. Deep neural IR models, in contras…

    Submitted 8 July, 2019; originally announced July 2019.

  40. arXiv:1710.06871  [pdf]

    cs.CY

    Promoting Saving for College Through Data Science

    Authors: Fernando Diaz, Natnaell Mammo

    Abstract: The cost of attending college has been steadily rising and in 10 years is estimated to reach $140,000 for a 4-year public university. Recent surveys estimate just over half of US families are saving for college. State-operated 529 college savings plans are an effective way for families to plan and save for future college costs, but only 3% of families currently use them. The Office of the Illinois…

    Submitted 18 October, 2017; originally announced October 2017.

    Comments: Presented at the Data For Good Exchange 2017

  41. Auditing Search Engines for Differential Satisfaction Across Demographics

    Authors: Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, Emine Yilmaz

    Abstract: Many online services, such as search engines, social media platforms, and digital marketplaces, are advertised as being available to any user, regardless of their age, gender, or other demographic factors. However, there are growing concerns that these services may systematically underserve some groups of users. In this paper, we present a framework for internally auditing such services for differ…

    Submitted 24 May, 2017; originally announced May 2017.

    Comments: 8 pages. Accepted at WWW 2017

  42. arXiv:1702.05042  [pdf, ps, other]

    cs.IR

    Luandri: a Clean Lua Interface to the Indri Search Engine

    Authors: Bhaskar Mitra, Fernando Diaz, Nick Craswell

    Abstract: In recent years, the information retrieval (IR) community has witnessed the first successful applications of deep neural network models to short-text matching and ad-hoc retrieval. It is exciting to see the research on deep neural networks and IR converge on these tasks of shared interest. However, the two communities have less in common when it comes to the choice of programming languages. Indri,…

    Submitted 16 February, 2017; originally announced February 2017.

    Comments: Under review for SIGIR'17

  43. arXiv:1610.08136  [pdf, other]

    cs.IR

    Learning to Match Using Local and Distributed Representations of Text for Web Search

    Authors: Bhaskar Mitra, Fernando Diaz, Nick Craswell

    Abstract: Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on the other hand, terms have discrete or local representations, and the relevance of a document is determined by the exact matches of query terms in the body text.…

    Submitted 25 October, 2016; originally announced October 2016.

  44. arXiv:1609.02075  [pdf, other]

    cs.CL cs.SI physics.soc-ph

    The Social Dynamics of Language Change in Online Networks

    Authors: Rahul Goel, Sandeep Soni, Naman Goyal, John Paparrizos, Hanna Wallach, Fernando Diaz, Jacob Eisenstein

    Abstract: Language change is a complex social phenomenon, revealing pathways of communication and sociocultural influence. But, while language change has long been a topic of study in sociolinguistics, traditional linguistic research methods rely on circumstantial evidence, estimating the direction of change from differences between older and younger speakers. In this paper, we use a data set of several mil…

    Submitted 7 September, 2016; originally announced September 2016.

    Comments: This paper appears in the Proceedings of the International Conference on Social Informatics (SocInfo16). The final publication is available at springer.com

    ACM Class: I.2.7; J.4; J.5

  45. arXiv:1605.07891  [pdf, other]

    cs.IR cs.CL

    Query Expansion with Locally-Trained Word Embeddings

    Authors: Fernando Diaz, Bhaskar Mitra, Nick Craswell

    Abstract: Continuous space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability to model term similarity and other relationships. We study the use of term relatedness in the context of query expansion for ad hoc information retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when trained globally,…

    Submitted 22 June, 2016; v1 submitted 25 May, 2016; originally announced May 2016.

  46. arXiv:1605.03664  [pdf, other]

    cs.CL

    Real-Time Web Scale Event Summarization Using Sequential Decision Making

    Authors: Chris Kedzie, Fernando Diaz, Kathleen McKeown

    Abstract: We present a system based on sequential decision making for the online summarization of massive document streams, such as those found on the web. Given an event of interest (e.g. "Boston marathon bombing"), our system is able to filter the stream for relevance and produce a series of short text updates describing the event as it unfolds over time. Unlike previous work, our approach is able to join…

    Submitted 11 May, 2016; originally announced May 2016.

    Comments: in Proceedings of the 25th International Joint Conference on Artificial Intelligence 2016

  47. arXiv:1604.04146  [pdf, other]

    cs.NE cs.AI math.OC

    A Discrete Firefly Algorithm to Solve a Rich Vehicle Routing Problem Modelling a Newspaper Distribution System with Recycling Policy

    Authors: E. Osaba, Xin-She Yang, F. Diaz, E. Onieva, A. D. Masegosa, A. Perallos

    Abstract: A real-world newspaper distribution problem with recycling policy is tackled in this work. In order to meet all the complex restrictions contained in such a problem, it has been modeled as a rich vehicle routing problem, which can be more specifically considered as an asymmetric and clustered vehicle routing problem with simultaneous pickup and deliveries, variable costs and forbidden paths (AC-VR…

    Submitted 14 April, 2016; originally announced April 2016.

    Comments: 7 tables and 4 figures

    MSC Class: 78M32

  48. An Improved Discrete Bat Algorithm for Symmetric and Asymmetric Traveling Salesman Problems

    Authors: Eneko Osaba, Xin-She Yang, Fernando Diaz, Pedro Lopez-Garcia, Roberto Carballedo

    Abstract: Bat algorithm is a population metaheuristic proposed in 2010 which is based on the echolocation or bio-sonar characteristics of microbats. Since its first implementation, the bat algorithm has been used in a wide range of fields. In this paper, we present a discrete version of the bat algorithm to solve the well-known symmetric and asymmetric traveling salesman problems. In addition, we propose an…

    Submitted 14 April, 2016; originally announced April 2016.

    Comments: 1 figure, 8 tables

    MSC Class: 78M32

    Journal ref: Engineering Applications of Artificial Intelligence, 48 (1), 59-71 (2016)

  49. arXiv:1603.04119  [pdf, other]

    cs.AI cs.LG stat.ML

    Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains

    Authors: David Abel, Alekh Agarwal, Fernando Diaz, Akshay Krishnamurthy, Robert E. Schapire

    Abstract: High-dimensional observations and complex real-world dynamics present major challenges in reinforcement learning for both function approximation and exploration. We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals. And second, we propose an exploration strateg…

    Submitted 13 March, 2016; originally announced March 2016.

  50. arXiv:1507.03928  [pdf, other]

    cs.IR

    Pseudo-Query Reformulation

    Authors: Fernando Diaz

    Abstract: Automatic query reformulation refers to rewriting a user's original query in order to improve the ranking of retrieval results compared to the original query. We present a general framework for automatic query reformulation based on discrete optimization. Our approach, referred to as pseudo-query reformulation, treats automatic query reformulation as a search problem over the graph of unweighted q…

    Submitted 14 July, 2015; originally announced July 2015.