Showing 1–7 of 7 results for author: Zyblewski, P

Searching in archive cs.
  1. arXiv:2408.02568

    cs.LG cs.CV

    Cross-Modality Clustering-based Self-Labeling for Multimodal Data Classification

    Authors: Paweł Zyblewski, Leandro L. Minku

    Abstract: Technological advances facilitate the ability to acquire multimodal data, posing a challenge for recognition systems while also providing an opportunity to use the heterogeneous nature of the information to increase the generalization capability of models. An often overlooked issue is the cost of the labeling process, which is typically high due to the need for a significant investment in time and…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures, 9 tables

  2. arXiv:2407.10807

    cs.CL cs.LG

    Employing Sentence Space Embedding for Classification of Data Stream from Fake News Domain

    Authors: Paweł Zyblewski, Jakub Klikowski, Weronika Borek-Marciniec, Paweł Ksieniewicz

    Abstract: Tabular data is considered the last unconquered castle of deep learning, yet the task of data stream classification is stated to be an equally important and demanding research area. Due to the temporal constraints, it is assumed that deep learning methods are not the optimal solution for application in this field. However, excluding the entire -- and prevalent -- group of methods seems rather rash…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 8 pages, 8 figures

  3. arXiv:2406.10255

    cs.CL cs.SI

    WarCov -- Large multilabel and multimodal dataset from social platform

    Authors: Weronika Borek-Marciniec, Pawel Zyblewski, Jakub Klikowski, Pawel Ksieniewicz

    Abstract: In the classification tasks, from raw data acquisition to the curation of a dataset suitable for use in evaluating machine learning models, a series of steps - often associated with high costs - are necessary. In the case of Natural Language Processing, initial cleaning and conversion can be performed automatically, but obtaining labels still requires the rationalized input of human experts. As a…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures

  4. arXiv:2404.15836

    cs.LG

    Employing Two-Dimensional Word Embedding for Difficult Tabular Data Stream Classification

    Authors: Paweł Zyblewski

    Abstract: Rapid technological advances are inherently linked to the increased amount of data, a substantial portion of which can be interpreted as data stream, capable of exhibiting the phenomenon of concept drift and having a high imbalance ratio. Consequently, developing new approaches to classifying difficult data streams is a rapidly growing research area. At the same time, the proliferation of deep lea…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 16 pages, 8 figures

  5. arXiv:2206.11867

    cs.CL cs.LG

    Lifelong Learning Natural Language Processing Approach for Multilingual Data Classification

    Authors: Jędrzej Kozal, Michał Leś, Paweł Zyblewski, Paweł Ksieniewicz, Michał Woźniak

    Abstract: The abundance of information in digital media, which in today's world is the main source of knowledge about current events for the masses, makes it possible to spread disinformation on a larger scale than ever before. Consequently, there is a need to develop novel fake news detection approaches capable of adapting to changing factual contexts and generalizing previously or concurrently acquired kn…

    Submitted 25 May, 2022; originally announced June 2022.

  6. arXiv:2112.10150

    cs.LG

    Active Weighted Aging Ensemble for Drifted Data Stream Classification

    Authors: Michał Woźniak, Paweł Zyblewski, Paweł Ksieniewicz

    Abstract: One of the significant problems of streaming data classification is the occurrence of concept drift, consisting of the change of probabilistic characteristics of the classification task. This phenomenon destabilizes the performance of the classification model and seriously degrades its quality. An appropriate strategy counteracting this phenomenon is required to adapt the classifier to the changin…

    Submitted 19 December, 2021; originally announced December 2021.

    Comments: 29 pages, 3 figures

  7. arXiv:2001.11077

    cs.LG cs.CV stat.ML

    stream-learn -- open-source Python library for difficult data stream batch analysis

    Authors: Paweł Ksieniewicz, Paweł Zyblewski

    Abstract: stream-learn is a Python package compatible with scikit-learn and developed for the drifting and imbalanced data stream analysis. Its main component is a stream generator, which allows to produce a synthetic data stream that may incorporate each of the three main concept drift types (i.e. sudden, gradual and incremental drift) in their recurring or non-recurring versions. The package allows conduc…

    Submitted 29 January, 2020; originally announced January 2020.