Showing 1–50 of 66 results for author: Balog, K

  1. arXiv:2408.08379  [pdf, other]

    cs.CL cs.IR cs.LG

    Towards Realistic Synthetic User-Generated Content: A Scaffolding Approach to Generating Online Discussions

    Authors: Krisztian Balog, John Palowitch, Barbara Ikica, Filip Radlinski, Hamidreza Alvari, Mehdi Manshadi

    Abstract: The emergence of synthetic data represents a pivotal shift in modern machine learning, offering a solution to satisfy the need for large volumes of data in domains where real data is scarce, highly private, or difficult to obtain. We investigate the feasibility of creating realistic, large-scale synthetic datasets of user-generated content, noting that such content is increasingly prevalent and a…

    Submitted 15 August, 2024; originally announced August 2024.

  2. Towards a Formal Characterization of User Simulation Objectives in Conversational Information Access

    Authors: Nolwenn Bernard, Krisztian Balog

    Abstract: User simulation is a promising approach for automatically training and evaluating conversational information access agents, enabling the generation of synthetic dialogues and facilitating reproducible experiments at scale. However, the objectives of user simulation for the different uses remain loosely defined, hindering the development of effective simulators. In this work, we formally characteri…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Proceedings of the 2024 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR '24), July 13, 2024, Washington, DC, USA

  3. arXiv:2406.18960  [pdf, ps, other]

    cs.IR

    A Surprisingly Simple yet Effective Multi-Query Rewriting Method for Conversational Passage Retrieval

    Authors: Ivica Kostric, Krisztian Balog

    Abstract: Conversational passage retrieval is challenging as it often requires the resolution of references to previous utterances and needs to deal with the complexities of natural language, such as coreference and ellipsis. To address these challenges, pre-trained sequence-to-sequence neural query rewriters are commonly used to generate a single de-contextualized query based on conversation history. Previ…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

  4. Identifying Breakdowns in Conversational Recommender Systems using User Simulation

    Authors: Nolwenn Bernard, Krisztian Balog

    Abstract: We present a methodology to systematically test conversational recommender systems with regard to conversational breakdowns. It involves examining conversations generated between the system and simulated users for a set of pre-defined breakdown types, extracting responsible conversational paths, and characterizing them in terms of the underlying dialogue intents. User simulation offers the advant…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: ACM Conversational User Interfaces 2024 (CUI '24), July 8–10, 2024, Luxembourg, Luxembourg

  5. Dataset and Models for Item Recommendation Using Multi-Modal User Interactions

    Authors: Simone Borg Bruun, Krisztian Balog, Maria Maistro

    Abstract: While recommender systems with multi-modal item representations (image, audio, and text) have been widely explored, learning recommendations from multi-modal user interactions (e.g., clicks and speech) remains an open problem. We study the case of multi-modal user interactions in a setting where users engage with a service provider through multiple channels (website and call center). In such case…

    Submitted 7 May, 2024; originally announced May 2024.

  6. Explainability for Transparent Conversational Information-Seeking

    Authors: Weronika Łajewska, Damiano Spina, Johanne Trippas, Krisztian Balog

    Abstract: The increasing reliance on digital information necessitates advancements in conversational search systems, particularly in terms of information transparency. While prior research in conversational information-seeking has concentrated on improving retrieval techniques, the challenge remains in generating responses useful from a user perspective. This study explores different methods of explaining t…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: This is the author's version of the work. The definitive version is published in: 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24), July 14-18, 2024, Washington, DC, USA

  7. Towards Self-Contained Answers: Entity-Based Answer Rewriting in Conversational Search

    Authors: Ivan Sekulić, Krisztian Balog, Fabio Crestani

    Abstract: Conversational information-seeking (CIS) is an emerging paradigm for knowledge acquisition and exploratory search. Traditional web search interfaces enable easy exploration of entities, but this is limited in conversational settings due to the limited-bandwidth interface. This paper explores ways to rewrite answers in CIS, so that users can understand them without having to resort to external servi…

    Submitted 4 March, 2024; originally announced March 2024.

  8. IAI MovieBot 2.0: An Enhanced Research Platform with Trainable Neural Components and Transparent User Modeling

    Authors: Nolwenn Bernard, Ivica Kostric, Krisztian Balog

    Abstract: While interest in conversational recommender systems has been on the rise, operational systems suitable for serving as research platforms for comprehensive studies are currently lacking. This paper introduces an enhanced version of the IAI MovieBot conversational movie recommender system, aiming to evolve it into a robust and adaptable platform for conducting user-facing experiments. The key highl…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: Proceedings of the 17th ACM International Conference on Web Search and Data Mining (WSDM '24), March 4–8, 2024, Merida, Mexico

  9. arXiv:2402.07540  [pdf, other]

    cs.HC cs.AI cs.CL

    PKG API: A Tool for Personal Knowledge Graph Management

    Authors: Nolwenn Bernard, Ivica Kostric, Weronika Łajewska, Krisztian Balog, Petra Galuščáková, Vinay Setty, Martin G. Skjæveland

    Abstract: Personal knowledge graphs (PKGs) offer individuals a way to store and consolidate their fragmented personal data in a central place, improving service personalization while maintaining full user control. Despite their potential, practical PKG implementations with user-friendly interfaces remain scarce. This work addresses this gap by proposing a complete solution to represent, manage, and interfac…

    Submitted 12 February, 2024; originally announced February 2024.

  10. arXiv:2401.11463  [pdf, ps, other]

    cs.IR cs.CL

    Estimating the Usefulness of Clarifying Questions and Answers for Conversational Search

    Authors: Ivan Sekulić, Weronika Łajewska, Krisztian Balog, Fabio Crestani

    Abstract: While the body of research directed towards constructing and generating clarifying questions in mixed-initiative conversational search systems is vast, research aimed at processing and comprehending users' answers to such questions is scarce. To this end, we present a simple yet effective method for processing answers to clarifying questions, moving away from previous work that simply appends answ…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: This is the author's version of the work. The definitive version is published in: Proceedings of the 46th European Conference on Information Retrieval (ECIR '24), March 24-28, 2024, Glasgow, Scotland

  11. arXiv:2401.11452  [pdf, other]

    cs.IR cs.CL

    Towards Reliable and Factual Response Generation: Detecting Unanswerable Questions in Information-Seeking Conversations

    Authors: Weronika Łajewska, Krisztian Balog

    Abstract: Generative AI models face the challenge of hallucinations that can undermine users' trust in such systems. We approach the problem of conversational information seeking as a two-step process, where relevant passages in a corpus are identified first and then summarized into a final system response. This way we can automatically assess if the answer to the user's question is present in the corpus. S…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: This is the author's version of the work. The definitive version is published in: Proceedings of the 46th European Conference on Information Retrieval (ECIR '24), March 24–28, 2024, Glasgow, Scotland

  12. Towards Filling the Gap in Conversational Search: From Passage Retrieval to Conversational Response Generation

    Authors: Weronika Łajewska, Krisztian Balog

    Abstract: Research on conversational search has so far mostly focused on query rewriting and multi-stage passage retrieval. However, synthesizing the top retrieved passages into a complete, relevant, and concise response is still an open challenge. Having snippet-level annotations of relevant passages would enable both (1) the training of response generation models that are able to ground answers in actual…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Extended version of the paper that appeared in the Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23)

  13. arXiv:2307.14225  [pdf, ps, other]

    cs.IR cs.LG

    Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences

    Authors: Scott Sanner, Krisztian Balog, Filip Radlinski, Ben Wedin, Lucas Dixon

    Abstract: Traditional recommender systems leverage users' item preference history to recommend novel content that users may like. However, modern dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendati…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: To appear at RecSys'23

  14. arXiv:2306.08550  [pdf, other]

    cs.HC cs.AI cs.IR

    User Simulation for Evaluating Information Access Systems

    Authors: Krisztian Balog, ChengXiang Zhai

    Abstract: Information access systems, such as search engines, recommender systems, and conversational assistants, have become integral to our daily lives as they help us satisfy our information needs. However, evaluating the effectiveness of these systems presents a long-standing and complex scientific challenge. This challenge is rooted in the difficulty of assessing a system's overall effectiveness in ass…

    Submitted 23 May, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: v1: initial draft; v2: final version to appear in Foundations and Trends in Information Retrieval

  15. MG-ShopDial: A Multi-Goal Conversational Dataset for e-Commerce

    Authors: Nolwenn Bernard, Krisztian Balog

    Abstract: Conversational systems can be particularly effective in supporting complex information seeking scenarios with evolving information needs. Finding the right products on an e-commerce platform is one such scenario, where a conversational agent would need to be able to provide search capabilities over the item catalog, understand and make recommendations based on the user's preferences, and answer a…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23), July 23–27, 2023, Taipei, Taiwan

  16. An Ecosystem for Personal Knowledge Graphs: A Survey and Research Roadmap

    Authors: Martin G. Skjæveland, Krisztian Balog, Nolwenn Bernard, Weronika Łajewska, Trond Linjordet

    Abstract: This paper presents an ecosystem for personal knowledge graphs (PKGs), commonly defined as resources of structured information about entities related to an individual, their attributes, and the relations between them. PKGs are a key enabler of secure and sophisticated personal data management and personalized services. However, there are challenges that need to be addressed before PKGs can achieve…

    Submitted 15 March, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

    Comments: Published in AI Open, 2024

    Journal ref: An Ecosystem for Personal Knowledge Graphs: A Survey and Research Roadmap, M. G. Skjæveland, K. Balog, N. Bernard, W. Łajewska, and T. Linjordet. In: AI Open, 5:55-69, 2024

  17. Measuring the Impact of Explanation Bias: A Study of Natural Language Justifications for Recommender Systems

    Authors: Krisztian Balog, Filip Radlinski, Andrey Petrov

    Abstract: Despite the potential impact of explanations on decision making, there is a lack of research on quantifying their effect on users' choices. This paper presents an experimental protocol for measuring the degree to which positively or negatively biased explanations can lead to users choosing suboptimal recommendations. Key elements of this protocol include a preference elicitation stage to allow for…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (CHI EA '23), 2023

  18. Beyond Single Items: Exploring User Preferences in Item Sets with the Conversational Playlist Curation Dataset

    Authors: Arun Tejasvi Chaganty, Megan Leszczynski, Shu Zhang, Ravi Ganti, Krisztian Balog, Filip Radlinski

    Abstract: Users in consumption domains, like music, are often able to more efficiently provide preferences over a set of items (e.g. a playlist or radio) than over single items (e.g. songs). Unfortunately, this is an underexplored area of research, with most existing recommendation systems limited to understanding preferences over single items. Curating an item set exponentiates the search space that recomm…

    Submitted 5 May, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

    Comments: Appearing in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

  19. arXiv:2301.11489  [pdf, other]

    cs.IR cs.CL

    Talk the Walk: Synthetic Data Generation for Conversational Music Recommendation

    Authors: Megan Leszczynski, Shu Zhang, Ravi Ganti, Krisztian Balog, Filip Radlinski, Fernando Pereira, Arun Tejasvi Chaganty

    Abstract: Recommender systems are ubiquitous yet often difficult for users to control, and adjust if recommendation quality is poor. This has motivated conversational recommender systems (CRSs), with control provided through natural language feedback. However, as with most application domains, building robust CRSs requires training data that reflects system usage, here conversations with user…

    Submitted 17 November, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

  20. arXiv:2301.10493  [pdf, other]

    cs.IR

    From Baseline to Top Performer: A Reproducibility Study of Approaches at the TREC 2021 Conversational Assistance Track

    Authors: Weronika Lajewska, Krisztian Balog

    Abstract: This paper reports on an effort of reproducing the organizers' baseline as well as the top performing participant submission at the 2021 edition of the TREC Conversational Assistance track. TREC systems are commonly regarded as reference points for effectiveness comparison. Yet, the papers accompanying them have less strict requirements than peer-reviewed publications, which can make reproducibili…

    Submitted 25 January, 2023; originally announced January 2023.

  21. UserSimCRS: A User Simulation Toolkit for Evaluating Conversational Recommender Systems

    Authors: Jafar Afzali, Aleksander Mark Drzewiecki, Krisztian Balog, Shuo Zhang

    Abstract: We present an extensible user simulation toolkit to facilitate automatic evaluation of conversational recommender systems. It builds on an established agenda-based approach and extends it with several novel elements, including user satisfaction prediction, persona and context modeling, and conditional natural language generation. We showcase the toolkit with a pre-existing movie recommender system…

    Submitted 24 January, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

    Comments: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining

  22. DAGFiNN: A Conversational Conference Assistant

    Authors: Ivica Kostric, Krisztian Balog, Tølløv Alexander Aresvik, Nolwenn Bernard, Eyvinn Thu Dørheim, Pholit Hantula, Sander Havn-Sørensen, Rune Henriksen, Hengameh Hosseini, Ekaterina Khlybova, Weronika Lajewska, Sindre Ekrheim Mosand, Narmin Orujova

    Abstract: DAGFiNN is a conversational conference assistant that can be made available for a given conference both as a chatbot on the website and as a Furhat robot physically exhibited at the conference venue. Conference participants can interact with the assistant to get advice on various questions, ranging from where to eat in the city or how to get to the airport to which sessions we recommend them to at…

    Submitted 29 November, 2022; originally announced November 2022.

  23. Would You Ask it that Way? Measuring and Improving Question Naturalness for Knowledge Graph Question Answering

    Authors: Trond Linjordet, Krisztian Balog

    Abstract: Knowledge graph question answering (KGQA) facilitates information access by leveraging structured data without requiring formal query language expertise from the user. Instead, users can express their information needs by simply asking their questions in natural language (NL). Datasets used to train KGQA models that would provide such a service are expensive to construct, both in terms of expert a…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: 9 pages, 3 figures. Accepted for publication as a resource paper in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22), July 11-15, 2022, Madrid, Spain. For test collection, see https://github.com/iai-group/IQN-KGQA

  24. On Natural Language User Profiles for Transparent and Scrutable Recommendation

    Authors: Filip Radlinski, Krisztian Balog, Fernando Diaz, Lucas Dixon, Ben Wedin

    Abstract: Natural interaction with recommendation and personalized search systems has received tremendous attention in recent years. We focus on the challenge of supporting people's understanding and control of these systems and explore a fundamentally new way of thinking about representation of knowledge in recommendation and personalization systems. Specifically, we argue that it may be both desirable and…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22), 2022

  25. Analyzing and Simulating User Utterance Reformulation in Conversational Recommender Systems

    Authors: Shuo Zhang, Mu-Chun Wang, Krisztian Balog

    Abstract: User simulation has been a cost-effective technique for evaluating conversational recommender systems. However, building a human-like simulator is still an open challenge. In this work, we focus on how users reformulate their utterances when a conversational agent fails to understand them. First, we perform a user study, involving five conversational agents across different domains, to identify co…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

  26. arXiv:2201.11030  [pdf, other]

    cs.DL

    Diverse Reviewer Suggestion for Extending Conference Program Committees

    Authors: Christin Katharina Kreutz, Krisztian Balog, Ralf Schenkel

    Abstract: Automated reviewer recommendation for scientific conferences currently relies on the assumption that the program committee has the necessary expertise to handle all submissions. However, topical discrepancies between received submissions and reviewer candidates might lead to unreliable reviews or overburdening of reviewers, and may result in the rejection of high-quality papers. In this work, we p…

    Submitted 26 January, 2022; originally announced January 2022.

  27. arXiv:2111.13463  [pdf, other]

    cs.IR cs.AI cs.CL

    Soliciting User Preferences in Conversational Recommender Systems via Usage-related Questions

    Authors: Ivica Kostric, Krisztian Balog, Filip Radlinski

    Abstract: A key distinguishing feature of conversational recommender systems over traditional recommender systems is their ability to elicit user preferences using natural language. Currently, the predominant approach to preference elicitation is to ask questions directly about items or item attributes. These strategies do not perform well in cases where the user does not have sufficient knowledge of the ta…

    Submitted 26 November, 2021; originally announced November 2021.

    Comments: Proceedings of ACM Conference on Recommender Systems (RecSys '21)

    Journal ref: Proceedings of the 15th ACM Conference on Recommender Systems, RecSys '21, 2021, pp. 724-729

  28. arXiv:2109.06714  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    Semantic Answer Type Prediction using BERT: IAI at the ISWC SMART Task 2020

    Authors: Vinay Setty, Krisztian Balog

    Abstract: This paper summarizes our participation in the SMART Task of the ISWC 2020 Challenge. A particular question we are interested in answering is how well neural methods, and specifically transformer models, such as BERT, perform on the answer type prediction task compared to traditional approaches. Our main finding is that coarse-grained answer types can be identified effectively with standard text c…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: Published in Proceedings of the SeMantic AnsweR Type prediction task (SMART) at ISWC 2020 Semantic Web Challenge co-located with the 19th International Semantic Web Conference (ISWC 2020). http://ceur-ws.org/Vol-2774/paper-02.pdf

    Journal ref: SMART@ISWC 2020: 10-18

  29. POINTREC: A Test Collection for Narrative-driven Point of Interest Recommendation

    Authors: Jafar Afzali, Aleksander Mark Drzewiecki, Krisztian Balog

    Abstract: This paper presents a test collection for contextual point of interest (POI) recommendation in a narrative-driven scenario. There, user history is not available; instead, user requests are described in natural language. The requests in our collection are manually collected from social sharing websites, and are annotated with various types of metadata, including location, categories, constraints, a…

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21), 2021

  30. On Interpretation and Measurement of Soft Attributes for Recommendation

    Authors: Krisztian Balog, Filip Radlinski, Alexandros Karatzoglou

    Abstract: We address how to robustly interpret natural language refinements (or critiques) in recommender systems. In particular, in human-human recommendation settings people frequently use soft attributes to express preferences about items, including concepts like the originality of a movie plot, the noisiness of a venue, or the complexity of a recipe. While binary tagging is extensively studied in the co…

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21), 2021

  31. Semantic Table Retrieval using Keyword and Table Queries

    Authors: Shuo Zhang, Krisztian Balog

    Abstract: Tables on the Web contain a vast amount of knowledge in a structured form. To tap into this valuable resource, we address the problem of table retrieval: answering an information need with a ranked list of tables. We investigate this problem in two different variants, based on how the information need is expressed: as a keyword query or as an existing table ("query-by-table"). The main novel contr…

    Submitted 13 May, 2021; originally announced May 2021.

    Comments: ACM Transactions on the Web (TWEB). arXiv admin note: substantial text overlap with arXiv:1802.06159

  32. Conversational Entity Linking: Problem Definition and Datasets

    Authors: Hideaki Joko, Faegheh Hasibi, Krisztian Balog, Arjen P. de Vries

    Abstract: Machine understanding of user utterances in conversational systems is of utmost importance for enabling engaging and meaningful conversations with users. Entity Linking (EL) is one of the means of text understanding, with proven efficacy for various downstream tasks in information retrieval. In this paper, we study entity linking for conversational systems. To develop a better understanding of wha…

    Submitted 11 May, 2021; originally announced May 2021.

    ACM Class: H.3

  33. Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems

    Authors: Weiwei Sun, Shuo Zhang, Krisztian Balog, Zhaochun Ren, Pengjie Ren, Zhumin Chen, Maarten de Rijke

    Abstract: Evaluation is crucial in the development process of task-oriented dialogue systems. As an evaluation method, user simulation allows us to tackle issues such as scalability and cost-efficiency, making it a viable choice for large-scale automatic evaluation. To help build a human-like user simulator that can measure the quality of a dialogue, we propose the following task: simulating user satisfacti…

    Submitted 8 May, 2021; originally announced May 2021.

    Comments: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21), 2021

  34. ArXivDigest: A Living Lab for Personalized Scientific Literature Recommendation

    Authors: Kristian Gingstad, Øyvind Jekteberg, Krisztian Balog

    Abstract: Providing personalized recommendations that are also accompanied by explanations as to why an item is recommended is a research area of growing importance. At the same time, progress is limited by the availability of open evaluation resources. In this work, we address the task of scientific literature recommendation. We present arXivDigest, which is an online service providing personalized arXiv r…

    Submitted 24 September, 2020; originally announced September 2020.

    Comments: Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM'20), Oct 2020

  35. Sanitizing Synthetic Training Data Generation for Question Answering over Knowledge Graphs

    Authors: Trond Linjordet, Krisztian Balog

    Abstract: Synthetic data generation is important to training and evaluating neural models for question answering over knowledge graphs. The quality of the data and the partitioning of the datasets into training, validation and test splits impact the performance of the models trained on this data. If the synthetic data generation depends on templates, as is the predominant approach for this task, there may b…

    Submitted 10 September, 2020; originally announced September 2020.

    Comments: Proceedings of the 2020 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR '20), 2020. 6 pages, 3 figures

  36. IAI MovieBot: A Conversational Movie Recommender System

    Authors: Javeria Habib, Shuo Zhang, Krisztian Balog

    Abstract: Conversational recommender systems support users in accomplishing recommendation-related goals via multi-turn conversations. To better model dynamically changing user preferences and provide the community with a reusable development framework, we introduce IAI MovieBot, a conversational recommender system for movies. It features a task-specific dialogue flow, a multi-modal chat interface, and an e…

    Submitted 8 September, 2020; originally announced September 2020.

    Comments: Proceedings of the 29th ACM International Conference on Information and Knowledge Management, Oct 2020

  37. Generating Categories for Sets of Entities

    Authors: Shuo Zhang, Krisztian Balog, Jamie Callan

    Abstract: Category systems are central components of knowledge bases, as they provide a hierarchical grouping of semantically related concepts and entities. They are a unique and valuable resource that is utilized in a broad range of information access tasks. To aid knowledge editors in the manual process of expanding a category system, this paper presents a method of generating categories for sets of entit…

    Submitted 19 August, 2020; originally announced August 2020.

    Comments: Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM '20)

  38. Evaluating Conversational Recommender Systems via User Simulation

    Authors: Shuo Zhang, Krisztian Balog

    Abstract: Conversational information access is an emerging research area. Currently, human evaluation is used for end-to-end system evaluation, which is both very time and resource intensive at scale, and thus becomes a bottleneck of progress. As an alternative, we propose automated evaluation by means of simulating users. Our user simulator aims to generate responses that a real human would give by conside…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20), 2020

  39. REL: An Entity Linker Standing on the Shoulders of Giants

    Authors: Johannes M. van Hulst, Faegheh Hasibi, Koen Dercksen, Krisztian Balog, Arjen P. de Vries

    Abstract: Entity linking is a standard component in modern retrieval systems that is often performed by third-party toolkits. Despite the plethora of open source options, it is difficult to find a single system that has a modular architecture where certain components may be replaced, does not depend on external sources, can easily be updated to newer Wikipedia versions, and, most important of all, has state-…

    Submitted 2 June, 2020; originally announced June 2020.

    ACM Class: H.3

  40. Summarizing and Exploring Tabular Data in Conversational Search

    Authors: Shuo Zhang, Zhuyun Dai, Krisztian Balog, Jamie Callan

    Abstract: Tabular data provide answers to a significant portion of search queries. However, reciting an entire result table is impractical in conversational search systems. We propose to generate natural language summaries as answers to describe the complex information contained in a table. Through crowdsourcing experiments, we build a new conversation-oriented, open-domain table summarization dataset. It i…

    Submitted 10 July, 2020; v1 submitted 23 May, 2020; originally announced May 2020.

    Comments: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020), 2020

  41. arXiv:2002.00207  [pdf, other]

    cs.IR

    Web Table Extraction, Retrieval and Augmentation: A Survey

    Authors: Shuo Zhang, Krisztian Balog

    Abstract: Tables are a powerful and popular tool for organizing and manipulating data. A vast number of tables can be found on the Web, which represents a valuable knowledge resource. The objective of this survey is to synthesize and present two decades of research on web tables. In particular, we organize existing literature into six main categories of information access tasks: table extraction, table inte…

    Submitted 5 February, 2020; v1 submitted 1 February, 2020; originally announced February 2020.

    Comments: ACM Transactions on Intelligent Systems and Technology. 11(2): Article 13, January 2020

  42. Novel Entity Discovery from Web Tables

    Authors: Shuo Zhang, Edgar Meij, Krisztian Balog, Ridho Reinanda

    Abstract: When working with any sort of knowledge base (KB) one has to make sure it is as complete and also as up-to-date as possible. Both tasks are non-trivial as they require recall-oriented efforts to determine which entities and relationships are missing from the KB. As such they require a significant amount of labor. Tables on the Web, on the other hand, are abundant and have the distinct potential to…

    Submitted 1 February, 2020; originally announced February 2020.

    Comments: Proceedings of The Web Conference 2020 (WWW '20), 2020

  43. arXiv:2001.06910  [pdf, ps, other]

    cs.IR

    Common Conversational Community Prototype: Scholarly Conversational Assistant

    Authors: Krisztian Balog, Lucie Flekova, Matthias Hagen, Rosie Jones, Martin Potthast, Filip Radlinski, Mark Sanderson, Svitlana Vakulenko, Hamed Zamani

    Abstract: This paper discusses the potential for creating academic resources (tools, data, and evaluation approaches) to support research in conversational search, by focusing on realistic information needs and conversational interactions. Specifically, we propose to develop and operate a prototype conversational search system for scholarly activities. This Scholarly Conversational Assistant would serve as…

    Submitted 19 January, 2020; originally announced January 2020.

  44. Auto-completion for Data Cells in Relational Tables

    Authors: Shuo Zhang, Krisztian Balog

    Abstract: We address the task of auto-completing data cells in relational tables. Such tables describe entities (in rows) with their attributes (in columns). We present the CellAutoComplete framework to tackle several novel aspects of this problem, including: (i) enabling a cell to have multiple, possibly conflicting values, (ii) supplementing the predicted values with supporting evidence, (iii) combining e…

    Submitted 5 February, 2020; v1 submitted 8 September, 2019; originally announced September 2019.

    Comments: In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM '19), 2019

  45. arXiv:1908.01798  [pdf, other]

    cs.IR cs.AI cs.CL

    Unsupervised Context Retrieval for Long-tail Entities

    Authors: Darío Garigliotti, Dyaa Albakour, Miguel Martinez, Krisztian Balog

    Abstract: Monitoring entities in media streams often relies on rich entity representations, like structured information available in a knowledge base (KB). For long-tail entities, such monitoring is highly challenging, due to their limited, if not entirely missing, representation in the reference KB. In this paper, we address the problem of retrieving textual contexts for monitoring long-tail entities. We p…

    Submitted 5 August, 2019; originally announced August 2019.

    Comments: Proceedings of the 2019 ACM International Conference on Theory of Information Retrieval (ICTIR '19)

  46. arXiv:1907.03595  [pdf, other]

    cs.IR

    Recommending Related Tables

    Authors: Shuo Zhang, Krisztian Balog

    Abstract: Tables are an extremely powerful visual and interactive tool for structuring and manipulating data, making spreadsheet programs one of the most popular computer applications. In this paper we introduce and address the task of recommending related tables: given an input table, identifying and returning a ranked list of relevant tables. One of the many possible application scenarios for this task is…

    Submitted 25 July, 2019; v1 submitted 8 July, 2019; originally announced July 2019.

  47. arXiv:1907.03007  [pdf, other]

    cs.IR cs.AI cs.CL

    NeuType: A Simple and Effective Neural Network Approach for Predicting Missing Entity Type Information in Knowledge Bases

    Authors: Jon Arne Bø Hovda, Darío Garigliotti, Krisztian Balog

    Abstract: Knowledge bases store information about the semantic types of entities, which can be utilized in a range of information access tasks. This information, however, is often incomplete, due to new entities emerging on a daily basis. We address the task of automatically assigning types to entities in a knowledge base from a type taxonomy. Specifically, we present two neural network architectures, which…

    Submitted 5 July, 2019; originally announced July 2019.

  48. arXiv:1906.00041  [pdf, other]

    cs.IR cs.CL cs.LG

    Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval

    Authors: Li Deng, Shuo Zhang, Krisztian Balog

    Abstract: Tables contain valuable knowledge in a structured form. We employ neural language modeling approaches to embed tabular data into vector spaces. Specifically, we consider different table elements, such as caption, column headings, and cells, for training word and entity embeddings. These embeddings are then utilized in three particular table-related tasks, row population, column population, and table…

    Submitted 31 May, 2019; originally announced June 2019.

    Comments: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19), 2019

  49. arXiv:1901.10496  [pdf, other]

    cs.IR cs.CL

    Impact of Training Dataset Size on Neural Answer Selection Models

    Authors: Trond Linjordet, Krisztian Balog

    Abstract: It is held as a truism that deep neural networks require large datasets to train effective models. However, large datasets, especially with high-quality labels, can be expensive to obtain. This study sets out to investigate (i) how large a dataset must be to train well-performing models, and (ii) what impact can be shown from fractional changes to the dataset size. A practical method to investigat…

    Submitted 29 January, 2019; originally announced January 2019.

    Comments: 7 pages, 2 figures

  50. Identifying Unclear Questions in Community Question Answering Websites

    Authors: Jan Trienes, Krisztian Balog

    Abstract: Thousands of complex natural language questions are submitted to community question answering websites on a daily basis, rendering them as one of the most important information sources these days. However, oftentimes submitted questions are unclear and cannot be answered without further clarification questions by expert community members. This study is the first to investigate the complex task of…

    Submitted 18 January, 2019; originally announced January 2019.

    Comments: Proceedings of the 41st European Conference on Information Retrieval (ECIR '19), 2019