Showing 1–9 of 9 results for author: Nowakowski, A

Searching in archive cs.
  1. arXiv:2405.11942  [pdf, other]

    cs.CL

    FAME-MT Dataset: Formality Awareness Made Easy for Machine Translation Purposes

    Authors: Dawid Wiśniewski, Zofia Rostek, Artur Nowakowski

    Abstract: People use language for various purposes. Apart from sharing information, individuals may use it to express emotions or to show respect for another person. In this paper, we focus on the formality level of machine-generated translations and present FAME-MT -- a dataset consisting of 11.2 million translations between 15 European source languages and 8 European target languages classified to formal…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted at EAMT 2024

  2. arXiv:2405.11937  [pdf, other]

    cs.CL cs.AI cs.LG

    Chasing COMET: Leveraging Minimum Bayes Risk Decoding for Self-Improving Machine Translation

    Authors: Kamil Guttmann, Mikołaj Pokrywka, Adrian Charkiewicz, Artur Nowakowski

    Abstract: This paper explores Minimum Bayes Risk (MBR) decoding for self-improvement in machine translation (MT), particularly for domain adaptation and low-resource languages. We implement the self-improvement process by fine-tuning the model on its MBR-decoded forward translations. By employing COMET as the MBR utility metric, we aim to achieve the reranking of translations that better aligns with human p…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: EAMT 2024

  3. arXiv:2309.12810  [pdf, other]

    cs.CL

    StyloMetrix: An Open-Source Multilingual Tool for Representing Stylometric Vectors

    Authors: Inez Okulska, Daria Stetsenko, Anna Kołos, Agnieszka Karlińska, Kinga Głąbińska, Adam Nowakowski

    Abstract: This work aims to provide an overview on the open-source multilanguage tool called StyloMetrix. It offers stylometric text representations that cover various aspects of grammar, syntax and lexicon. StyloMetrix covers four languages: Polish as the primary language, English, Ukrainian and Russian. The normalized output of each feature can become a fruitful course for machine learning models and a va…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: 26 pages, 6 figures, pre-print for the conference

  4. arXiv:2304.05336  [pdf, other]

    cs.CL

    Exploring the Use of Foundation Models for Named Entity Recognition and Lemmatization Tasks in Slavic Languages

    Authors: Gabriela Pałka, Artur Nowakowski

    Abstract: This paper describes Adam Mickiewicz University's (AMU) solution for the 4th Shared Task on SlavNER. The task involves the identification, categorization, and lemmatization of named entities in Slavic languages. Our approach involved exploring the use of foundation models for these tasks. In particular, we used models based on the popular BERT and T5 model architectures. Additionally, we used exte…

    Submitted 11 April, 2023; originally announced April 2023.

    Comments: Slavic NLP 2023 @ EACL 2023

  5. arXiv:2209.11016  [pdf, ps, other]

    cs.CL

    Approaching English-Polish Machine Translation Quality Assessment with Neural-based Methods

    Authors: Artur Nowakowski

    Abstract: This paper presents our contribution to the PolEval 2021 Task 2: Evaluation of translation quality assessment metrics. We describe experiments with pre-trained language models and state-of-the-art frameworks for translation quality assessment in both nonblind and blind versions of the task. Our solutions ranked second in the nonblind version and third in the blind version.

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: PolEval 2021

    Journal ref: Proceedings of the PolEval 2021 Workshop, 2021, 73-78

  6. arXiv:2209.02962  [pdf, other]

    cs.CL

    Adam Mickiewicz University at WMT 2022: NER-Assisted and Quality-Aware Neural Machine Translation

    Authors: Artur Nowakowski, Gabriela Pałka, Kamil Guttmann, Mikołaj Pokrywka

    Abstract: This paper presents Adam Mickiewicz University's (AMU) submissions to the constrained track of the WMT 2022 General MT Task. We participated in the Ukrainian $\leftrightarrow$ Czech translation directions. The systems are a weighted ensemble of four models based on the Transformer (big) architecture. The models use source factors to utilize the information about named entities present in the input…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: WMT 2022

  7. arXiv:2204.02100  [pdf, other]

    cs.LG

    Self-supervised learning -- A way to minimize time and effort for precision agriculture?

    Authors: Michael L. Marszalek, Bertrand Le Saux, Pierre-Philippe Mathieu, Artur Nowakowski, Daniel Springer

    Abstract: Machine learning, satellites or local sensors are key factors for a sustainable and resource-saving optimisation of agriculture and proved its values for the management of agricultural land. Up to now, the main focus was on the enlargement of data which were evaluated by means of supervised learning methods. Nevertheless, the need for labels is also a limiting and time-consuming factor, while in c…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted for ISPRS Archives 2022

  8. arXiv:2108.10580  [pdf, ps, other]

    cs.CL

    Detection of Criminal Texts for the Polish State Border Guard

    Authors: Artur Nowakowski, Krzysztof Jassem

    Abstract: This paper describes research on the detection of Polish criminal texts appearing on the Internet. We carried out experiments to find the best available setup for the efficient classification of unbalanced and noisy data. The best performance was achieved when our model was fine-tuned on a pre-trained Polish-based transformer language model. For the detection task, a large corpus of annotated Inte…

    Submitted 24 August, 2021; originally announced August 2021.

    Comments: Accepted for MIS2 workshop at KDD 2021

  9. arXiv:2106.12226  [pdf, other]

    cs.CV cs.AI eess.IV

    Spatio-Temporal SAR-Optical Data Fusion for Cloud Removal via a Deep Hierarchical Model

    Authors: Alessandro Sebastianelli, Artur Nowakowski, Erika Puglisi, Maria Pia Del Rosso, Jamila Mifdal, Fiora Pirri, Pierre Philippe Mathieu, Silvia Liberata Ullo

    Abstract: Cloud removal is a relevant topic in Remote Sensing as it fosters the usability of high-resolution optical images for Earth monitoring and study. Related techniques have been analyzed for years with a progressively clearer view of the appropriate methods to adopt, from multi-spectral to inpainting methods. Recent applications of deep generative models and sequence-to-sequence-based models have pro…

    Submitted 28 March, 2022; v1 submitted 23 June, 2021; originally announced June 2021.