
Showing 1–19 of 19 results for author: Pavlopoulos, J

Searching in archive cs.
  1. arXiv:2408.07779

    cs.DL

    A New Framework for Error Analysis in Computational Paleographic Dating of Greek Papyri

    Authors: Giuseppe De Gregorio, Lavinia Ferretti, Rodrigo C. G. Pena, Isabelle Marthot-Santaniello, Maria Konstantinidou, John Pavlopoulos

    Abstract: The study of Greek papyri from ancient Egypt is fundamental for understanding Graeco-Roman Antiquity, offering insights into various aspects of ancient culture and textual production. Palaeography, traditionally used for dating these manuscripts, relies on identifying chronologically relevant features in handwriting styles yet lacks a unified methodology, resulting in subjective interpretations an…

    Submitted 14 August, 2024; originally announced August 2024.

  2. arXiv:2407.14487

    cs.CL

    Evaluating the Reliability of Self-Explanations in Large Language Models

    Authors: Korbinian Randl, John Pavlopoulos, Aron Henriksson, Tony Lindgren

    Abstract: This paper investigates the reliability of explanations generated by large language models (LLMs) when prompted to explain their previous output. We evaluate two kinds of such self-explanations, extractive and counterfactual, using three state-of-the-art LLMs (2B to 8B parameters) on two different classification tasks (objective and subjective). Our findings reveal that, while these self-explan…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Not peer-reviewed. Under review at Discovery Science 2024

  3. arXiv:2407.09861

    cs.CL cs.AI

    Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP

    Authors: Juli Bakagianni, Kanella Pouli, Maria Gavriilidou, John Pavlopoulos

    Abstract: Natural Language Processing (NLP) research has traditionally been predominantly focused on English, driven by the availability of resources, the size of the research community, and market demands. Recently, there has been a noticeable shift towards multilingualism in NLP, recognizing the need for inclusivity and effectiveness across diverse languages and cultures. Monolingual surveys have the pote…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 68 pages

  4. arXiv:2406.14164

    cs.AI cs.CL

    A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning

    Authors: Panagiotis Kaliosis, John Pavlopoulos, Foivos Charalampakos, Georgios Moschovis, Ion Androutsopoulos

    Abstract: Diagnostic Captioning (DC) automatically generates a diagnostic text from one or more medical images (e.g., X-rays, MRIs) of a patient. Treated as a draft, the generated text may assist clinicians by providing an initial estimation of the patient's condition, speeding up and helping safeguard the diagnostic process. The accuracy of a diagnostic text, however, strongly depends on how well the key…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: [Pre-print] ACL Findings 2024, 17 pages, 7 figures, 7 tables

  5. arXiv:2403.11904

    cs.CL cs.LG

    CICLe: Conformal In-Context Learning for Largescale Multi-Class Food Risk Classification

    Authors: Korbinian Randl, John Pavlopoulos, Aron Henriksson, Tony Lindgren

    Abstract: Contaminated or adulterated food poses a substantial risk to human health. Given sets of labeled web texts for training, Machine Learning and Natural Language Processing can be applied to automatically detect such risks. We publish a dataset of 7,546 short texts describing public food recall announcements. Each text is manually labeled, on two granularity levels (coarse and fine), for food product…

    Submitted 30 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  6. arXiv:2401.05831

    cs.LG cs.AI

    Revisiting Silhouette Aggregation

    Authors: John Pavlopoulos, Georgios Vardakas, Aristidis Likas

    Abstract: The silhouette coefficient is an established internal clustering evaluation measure that produces a score per data point, assessing the quality of its cluster assignment. To assess the quality of the clustering of the whole dataset, the scores of all the points in the dataset are typically (micro) averaged into a single value. An alternative path, however, that is rarely employed, is to average fir…

    Submitted 22 June, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  7. arXiv:2210.12883

    cs.CL cs.AI

    A Greek Parliament Proceedings Dataset for Computational Linguistics and Political Analysis

    Authors: Konstantina Dritsa, Kaiti Thoma, John Pavlopoulos, Panos Louridas

    Abstract: Large, diachronic datasets of political discourse are hard to come by, especially for resource-lean languages such as Greek. In this paper, we introduce a curated dataset of the Greek Parliament Proceedings that extends chronologically from 1989 up to 2020. It consists of more than 1 million speeches with extensive metadata, extracted from 5,355 parliamentary record files. We explain how it wa…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Accepted to the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks

  8. arXiv:2210.06918

    cs.CL

    Automotive Multilingual Fault Diagnosis

    Authors: John Pavlopoulos, Alv Romell, Jacob Curman, Olof Steinert, Tony Lindgren, Markus Borg

    Abstract: Automated fault diagnosis can facilitate diagnostics assistance, speedier troubleshooting, and better-organised logistics. Currently, AI-based prognostics and health management in the automotive industry ignore the textual descriptions of the experienced problems or symptoms. With this study, however, we show that a multilingual pre-trained Transformer can effectively classify the textual claims f…

    Submitted 13 October, 2022; originally announced October 2022.

  9. arXiv:2210.04756

    cs.CL

    Metaphorical Paraphrase Generation: Feeding Metaphorical Language Models with Literal Texts

    Authors: Giorgio Ottolina, John Pavlopoulos

    Abstract: This study presents a new approach to metaphorical paraphrase generation by masking literal tokens of literal sentences and unmasking them with metaphorical language models. Unlike similar studies, the proposed algorithm focuses not only on verbs but also on nouns and adjectives. Despite the fact that the transfer rate for the former is the highest (56%), the transfer of the latter is feasible…

    Submitted 13 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: 14 pages, 2 figures

  10. arXiv:2205.12012

    cs.CL

    Analysing the Greek Parliament Records with Emotion Classification

    Authors: John Pavlopoulos, Vanessa Lislevand

    Abstract: In this project, we tackle emotion classification for the Greek language, presenting and releasing a new dataset in Greek. We fine-tune and assess Transformer-based masked language models that were pre-trained on monolingual and multilingual resources, and we present the results per emotion and by aggregating at the sentiment and subjectivity level. The potential of the presented resources is inve…

    Submitted 24 May, 2022; originally announced May 2022.

  11. arXiv:2111.10223

    cs.CL

    Toxicity Detection can be Sensitive to the Conversational Context

    Authors: Alexandros Xenos, John Pavlopoulos, Ion Androutsopoulos, Lucas Dixon, Jeffrey Sorensen, Leo Laugier

    Abstract: User posts whose perceived toxicity depends on the conversational context are rare in current toxicity detection datasets. Hence, toxicity detectors trained on existing datasets will also tend to disregard context, making the detection of context-sensitive toxicity harder when it does occur. We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels: (i) annotato…

    Submitted 19 November, 2021; originally announced November 2021.

    Comments: 13 pages, 8 figures

  12. arXiv:2102.05456

    cs.CL cs.AI cs.LG

    Civil Rephrases Of Toxic Texts With Self-Supervised Transformers

    Authors: Leo Laugier, John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon

    Abstract: Platforms that support online commentary, from social networks to news sites, are increasingly leveraging machine learning to assist their moderation efforts. But this process does not typically provide feedback to the author that would help them contribute according to the community guidelines. This is prohibitively time-consuming for human moderators to do, and computational approaches are still…

    Submitted 11 February, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

  13. arXiv:2101.07299

    cs.CV

    Diagnostic Captioning: A Survey

    Authors: John Pavlopoulos, Vasiliki Kougia, Ion Androutsopoulos, Dimitris Papamichail

    Abstract: Diagnostic Captioning (DC) concerns the automatic generation of a diagnostic text from a set of medical images of a patient collected during an examination. DC can assist inexperienced physicians, reducing clinical errors. It can also help experienced physicians produce diagnostic reports faster. Following the advances of deep learning, especially in generic image captioning, DC has recently attra…

    Submitted 18 January, 2021; originally announced January 2021.

  14. arXiv:2006.12040

    cs.CL

    Clinical Predictive Keyboard using Statistical and Neural Language Modeling

    Authors: John Pavlopoulos, Panagiotis Papapetrou

    Abstract: A language model can be used to predict the next word during authoring, to correct spelling or to accelerate writing (e.g., in SMS or emails). Language models, however, have only been applied on a very small scale to assist physicians during authoring (e.g., discharge summaries or radiology reports). But along with the assistance to the physician, computer-based systems which expedite the patient'…

    Submitted 22 June, 2020; originally announced June 2020.

    Comments: To appear in CBMS'20

  15. arXiv:2006.06316

    cs.CV

    RTEX: A novel methodology for Ranking, Tagging, and Explanatory diagnostic captioning of radiography exams

    Authors: Vasiliki Kougia, John Pavlopoulos, Panagiotis Papapetrou, Max Gordon

    Abstract: This paper introduces RTEx, a novel methodology for a) ranking radiography exams based on their probability of containing an abnormality, b) generating abnormality tags for abnormal exams, and c) providing a diagnostic explanation in natural language for each abnormal exam. The task of ranking radiography exams is an important first step for practitioners who want to identify and prioritize those rad…

    Submitted 11 June, 2020; originally announced June 2020.

  16. arXiv:2006.00998

    cs.CL

    Toxicity Detection: Does Context Really Matter?

    Authors: John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon, Nithum Thain, Ion Androutsopoulos

    Abstract: Moderation is crucial to promoting healthy online discussions. Although several 'toxicity' detection datasets and models have been published, most of them ignore the context of the posts, implicitly assuming that comments may be judged independently. We investigate this assumption by focusing on two questions: (a) does context affect the human judgement, and (b) does conditioning on context improv…

    Submitted 1 June, 2020; originally announced June 2020.

  17. arXiv:1905.13302

    cs.CV cs.AI

    A Survey on Biomedical Image Captioning

    Authors: Vasiliki Kougia, John Pavlopoulos, Ion Androutsopoulos

    Abstract: Image captioning applied to biomedical images can assist and accelerate the diagnosis process followed by clinicians. This article is the first survey of biomedical image captioning, discussing datasets, evaluation measures, and state-of-the-art methods. Additionally, we suggest two baselines, a weak and a stronger one; the latter outperforms all current state-of-the-art systems on one of the data…

    Submitted 26 May, 2019; originally announced May 2019.

    Comments: SiVL 2019

  18. arXiv:1708.03699

    cs.CL

    Improved Abusive Comment Moderation with User Embeddings

    Authors: John Pavlopoulos, Prodromos Malakasiotis, Juli Bakagianni, Ion Androutsopoulos

    Abstract: Experimenting with a dataset of approximately 1.6M user comments from a Greek sports news portal, we explore how a state-of-the-art RNN-based moderation method can be improved by adding user embeddings, user type embeddings, user biases, or user type biases. We observe improvements in all cases, with user embeddings leading to the biggest performance gains.

    Submitted 11 August, 2017; originally announced August 2017.

  19. arXiv:1705.09993

    cs.CL cs.LG

    Deep Learning for User Comment Moderation

    Authors: John Pavlopoulos, Prodromos Malakasiotis, Ion Androutsopoulos

    Abstract: Experimenting with a new dataset of 1.6M user comments from a Greek news portal and existing datasets of English Wikipedia comments, we show that an RNN outperforms the previous state of the art in moderation. A deep, classification-specific attention mechanism further improves the overall performance of the RNN. We also compare against a CNN and a word-list baseline, considering both fully automa…

    Submitted 17 July, 2017; v1 submitted 28 May, 2017; originally announced May 2017.