Search | arXiv e-print repository

Post-OCR Text Correction for Bulgarian Historical Documents

Authors: Angel Beshirov, Milena Dobreva, Dimitar Dimitrov, Momchil Hardalov, Ivan Koychev, Preslav Nakov

Abstract: The digitization of historical documents is crucial for preserving the cultural heritage of society. An important step in this process is converting scanned images to text using Optical Character Recognition (OCR), which can enable further search, information extraction, etc. Unfortunately, this is a hard problem as standard OCR tools are not tailored to deal with historical orthography as well as with challenging layouts. Thus, it is standard to apply an additional text correction step on the OCR output when dealing with such documents. In this work, we focus on Bulgarian, and we create the first benchmark dataset for evaluating the OCR text correction for historical Bulgarian documents written in the first standardized Bulgarian orthography: the Drinov orthography from the 19th century. We further develop a method for automatically generating synthetic data in this orthography, as well as in the subsequent Ivanchev orthography, by leveraging vast amounts of contemporary Bulgarian literary texts. We then use state-of-the-art LLMs and an encoder-decoder framework, which we augment with diagonal attention loss and copy and coverage mechanisms, to improve the post-OCR text correction. The proposed method reduces the errors introduced during recognition and improves the quality of the documents by 25%, which is an increase of 16% compared to the state-of-the-art on the ICDAR 2019 Bulgarian dataset. We release our data and code at https://github.com/angelbeshirov/post-ocr-text-correction.

Submitted 31 August, 2024; originally announced September 2024.

Comments: Accepted for publication in the International Journal on Digital Libraries

arXiv:2407.12833 [pdf, other]

ESQA: Event Sequences Question Answering

Authors: Irina Abdullaeva, Andrei Filatov, Mikhail Orlov, Ivan Karpukhin, Viacheslav Vasilev, Denis Dimitrov, Andrey Kuznetsov, Ivan Kireev, Andrey Savchenko

Abstract: Event sequences (ESs) arise in many practical domains including finance, retail, social networks, and healthcare. In the context of machine learning, event sequences can be seen as a special type of tabular data with annotated timestamps. Despite the importance of ESs modeling and analysis, little effort was made in adapting large language models (LLMs) to the ESs domain. In this paper, we highlight the common difficulties of ESs processing and propose a novel solution capable of solving multiple downstream tasks with little or no finetuning. In particular, we solve the problem of working with long sequences and improve time and numeric features processing. The resulting method, called ESQA, effectively utilizes the power of LLMs and, according to extensive experiments, achieves state-of-the-art results in the ESs domain.

Submitted 19 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: 25 pages, 3 figures

arXiv:2406.14227 [pdf, other]

Modular Synthesis of Efficient Quantum Uncomputation

Authors: Hristo Venev, Timon Gehr, Dimitar Dimitrov, Martin Vechev

Abstract: A key challenge of quantum programming is uncomputation: the reversible deallocation of qubits. And while there has been much recent progress on automating uncomputation, state-of-the-art methods are insufficient for handling today's expressive quantum programming languages. A core reason is that they operate on primitive quantum circuits, while quantum programs express computations beyond circuits; for instance, they can capture families of circuits defined recursively in terms of uncomputation and adjoints. In this paper, we introduce the first modular automatic approach to synthesize correct and efficient uncomputation for expressive quantum programs. Our method is based on two core technical contributions: (i) an intermediate representation (IR) that can capture expressive quantum programs and comes with support for uncomputation, and (ii) modular algorithms over that IR for synthesizing uncomputation and adjoints. We have built a complete end-to-end implementation of our method, including an implementation of the IR and the synthesis algorithms, as well as a translation from an expressive fragment of the Silq programming language to our IR and circuit generation from the IR. Our experimental evaluation demonstrates that we can handle programs beyond the capabilities of existing uncomputation approaches, while being competitive on the benchmarks they can handle. More broadly, we show that it is possible to benefit from the greater expressivity and safety offered by high-level quantum languages without sacrificing efficiency.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 25 pages, 9 figures

ACM Class: D.3.1

arXiv:2405.15586 [pdf, other]

DAGER: Exact Gradient Inversion for Large Language Models

Authors: Ivo Petrov, Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, Martin Vechev

Abstract: Federated learning works by aggregating locally computed gradients from multiple clients, thus enabling collaborative training without sharing private client data. However, prior work has shown that the data can actually be recovered by the server using so-called gradient inversion attacks. While these attacks perform well when applied on images, they are limited in the text domain and only permit approximate reconstruction of small batches and short input sequences. In this work, we propose DAGER, the first algorithm to recover whole batches of input text exactly. DAGER leverages the low-rank structure of self-attention layer gradients and the discrete nature of token embeddings to efficiently check if a given token sequence is part of the client data. We use this check to exactly recover full batches in the honest-but-curious setting without any prior on the data for both encoder- and decoder-based architectures using exhaustive heuristic search and a greedy approach, respectively. We provide an efficient GPU implementation of DAGER and show experimentally that it recovers full batches of size up to 128 on large language models (LLMs), beating prior attacks in speed (20x at same batch size), scalability (10x larger batches), and reconstruction quality (ROUGE-1/2 > 0.99).

Submitted 24 May, 2024; originally announced May 2024.

ACM Class: I.2.7; I.2.11

arXiv:2405.12250 [pdf, other]

Your Transformer is Secretly Linear

Authors: Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Nikolai Gerasimenko, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

Abstract: This paper reveals a novel linear characteristic exclusive to transformer decoders, including models such as GPT, LLaMA, OPT, BLOOM and others. We analyze embedding transformations between sequential layers, uncovering a near-perfect linear relationship (Procrustes similarity score of 0.99). However, linearity decreases when the residual component is removed due to a consistently low output norm of the transformer layer. Our experiments show that removing or linearly approximating some of the most linear blocks of transformers does not significantly affect the loss or model performance. Moreover, in our pretraining experiments on smaller models we introduce a cosine-similarity-based regularization, aimed at reducing layer linearity. This regularization improves performance metrics on benchmarks like Tiny Stories and SuperGLUE and also successfully decreases the linearity of the models. This study challenges the existing understanding of transformer architectures, suggesting that their operation may be more linear than previously assumed.

Submitted 19 May, 2024; originally announced May 2024.

Comments: 9 pages, 9 figures

arXiv:2405.04398 [pdf]

A first-principles study of structural, elastic, electronic, and transport properties of Cs2Te

Authors: Gaoxue Wang, Jinlin Zhang, Chengkun Huang, Dimitre A. Dimitrov, Anna M. Alexander, Evgenya I Simakov

Abstract: The pursuit to operate photocathodes at high accelerating gradients to increase brightness of electron beams is gaining interest within the accelerator community. Cesium telluride (Cs2Te) is a widely used photocathode material and it is presumed to offer resilience to higher gradients because of its wider band gap compared to other semiconductors. Despite its advantages, crucial material properties of Cs2Te remain largely unknown both in theory and experiments. In this study, we employ first-principles calculations to provide detailed structural, elastic, electronic and transport properties of Cs2Te. It is found that Cs2Te has an intrinsic mobility of 20 cm²/Vs for electrons and 2.0 cm²/Vs for holes at room temperature. The low mobility is primarily limited by the strong polar optical phonon scattering. Cs2Te also exhibits ultralow lattice thermal conductivity of 0.2 W/(m·K) at room temperature. Based on the energy gain/loss balance under external field and electron-phonon scattering, we predict that Cs2Te has a dielectric breakdown field in the range from ~60 MV/m to ~132 MV/m at room temperature, depending on the doping level of Cs2Te. Our results are crucial to advance the understanding of applicability of Cs2Te photocathodes for high-gradient operation.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.06212 [pdf, other]

OmniFusion Technical Report

Authors: Elizaveta Goncharova, Anton Razzhigaev, Matvey Mikhalchuk, Maxim Kurkin, Irina Abdullaeva, Matvey Skripkin, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

Abstract: Last year, multimodal architectures served up a revolution in AI-based approaches and solutions, extending the capabilities of large language models (LLMs). We propose an OmniFusion model based on a pretrained LLM and adapters for visual modality. We evaluated and compared several architecture design principles for better text and visual data coupling: MLP and transformer adapters, various CLIP ViT-based encoders (SigLIP, InternVIT, etc.), and their fusing approach, image encoding method (whole image or tiles encoding) and two 7B LLMs (the proprietary one and open-source Mistral). Experiments on 8 visual-language benchmarks show the top score for the best OmniFusion setup in terms of different VQA tasks in comparison with open-source LLaVA-like solutions: VizWiz, Pope, MM-Vet, ScienceQA, MMBench, TextVQA, VQAv2, MMMU. We also propose a variety of situations, where OmniFusion provides highly-detailed answers in different domains: housekeeping, sightseeing, culture, medicine, handwritten and scanned equations recognition, etc. The Mistral-based OmniFusion model is an open-source solution with weights, training and inference scripts available at https://github.com/AIRI-Institute/OmniFusion.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 17 pages, 4 figures, 9 tables, 2 appendices

MSC Class: 6804; 68T50 (Primary)

ACM Class: I.2.7; I.2.10; I.4.9

arXiv:2404.01992 [pdf, other]

Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models

Authors: Stephan Linzbach, Dimitar Dimitrov, Laura Kallmeyer, Kilian Evang, Hajira Jabeen, Stefan Dietze

Abstract: Pre-trained Language Models (PLMs) are known to contain various kinds of knowledge. One method to infer relational knowledge is through the use of cloze-style prompts, where a model is tasked to predict missing subjects or objects. Typically, designing these prompts is a tedious task because small differences in syntax or semantics can have a substantial impact on knowledge retrieval performance. Simultaneously, evaluating the impact of either prompt syntax or information is challenging due to their interdependence. We designed CONPARE-LAMA - a dedicated probe, consisting of 34 million distinct prompts that facilitate comparison across minimal paraphrases. These paraphrases follow a unified meta-template enabling the controlled variation of syntax and semantics across arbitrary relations. CONPARE-LAMA enables insights into the independent impact of either syntactical form or semantic information of paraphrases on the knowledge retrieval performance of PLMs. Extensive knowledge retrieval experiments using our probe reveal that prompts following clausal syntax have several desirable properties in comparison to appositive syntax: i) they are more useful when querying PLMs with a combination of supplementary information, ii) knowledge is more consistently recalled across different combinations of supplementary information, and iii) they decrease response uncertainty when retrieving known facts. In addition, range information can boost knowledge retrieval performance more than domain information, even though domain information is more reliably helpful across syntactic forms.

Submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted for NAACL 2024

arXiv:2403.10378 [pdf, other]

EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models

Authors: Rocktim Jyoti Das, Simeon Emilov Hristov, Haonan Li, Dimitar Iliyanov Dimitrov, Ivan Koychev, Preslav Nakov

Abstract: We introduce EXAMS-V, a new challenging multi-discipline multimodal multilingual exam benchmark for evaluating vision language models. It consists of 20,932 multiple-choice questions across 20 school disciplines covering natural science, social science, and other miscellaneous studies, e.g., religion, fine arts, business, etc. EXAMS-V includes a variety of multimodal features such as text, images, tables, figures, diagrams, maps, scientific symbols, and equations. The questions come in 11 languages from 7 language families. Unlike existing benchmarks, EXAMS-V is uniquely curated by gathering school exam questions from various countries, with a variety of education systems. This distinctive approach calls for intricate reasoning across diverse languages and relies on region-specific knowledge. Solving the problems in the dataset requires advanced perception and joint reasoning over the text and the visual content of the image. Our evaluation results demonstrate that this is a challenging dataset, which is difficult even for advanced vision-text models such as GPT-4V and Gemini; this underscores the inherent complexity of the dataset and its significance as a future benchmark.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.03945 [pdf, other]

SPEAR: Exact Gradient Inversion of Batches in Federated Learning

Authors: Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, Martin Vechev

Abstract: Federated learning is a framework for collaborative machine learning where clients only share gradient updates and not their private data with a server. However, it was recently shown that gradient inversion attacks can reconstruct this data from the shared gradients. In the important honest-but-curious setting, existing attacks enable exact reconstruction only for a batch size of $b=1$, with larger batches permitting only approximate reconstruction. In this work, we propose SPEAR, the first algorithm reconstructing whole batches with $b >1$ exactly. SPEAR combines insights into the explicit low-rank structure of gradients with a sampling-based algorithm. Crucially, we leverage ReLU-induced gradient sparsity to precisely filter out large numbers of incorrect samples, making a final reconstruction step tractable. We provide an efficient GPU implementation for fully connected networks and show that it recovers high-dimensional ImageNet inputs in batches of up to $b \lesssim 25$ exactly while scaling to large networks. Finally, we show theoretically that much larger batches can be reconstructed with high probability given exponential time.

Submitted 3 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

ACM Class: I.2.11

arXiv:2401.04531 [pdf, other]

MERA: A Comprehensive LLM Evaluation in Russian

Authors: Alena Fenogenova, Artem Chervyakov, Nikita Martynov, Anastasia Kozlova, Maria Tikhonova, Albina Akhmetgareeva, Anton Emelyanov, Denis Shevelev, Pavel Lebedev, Leonid Sinev, Ulyana Isaeva, Katerina Kolomeytseva, Daniil Moskovskiy, Elizaveta Goncharova, Nikita Savushkin, Polina Mikhailova, Denis Dimitrov, Alexander Panchenko, Sergei Markov

Abstract: Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). As the models' size increases, LMs demonstrate enhancements in measurable aspects and the development of new qualitative features. However, despite researchers' attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these issues, we introduce an open Multimodal Evaluation of Russian-language Architectures (MERA), a new instruction benchmark for evaluating foundation models oriented towards the Russian language. The benchmark encompasses 21 evaluation tasks for generative models in 11 skill domains and is designed as a black-box test to ensure the exclusion of data leakage. The paper introduces a methodology to evaluate FMs and LMs in zero- and few-shot fixed instruction settings that can be extended to other modalities. We propose an evaluation methodology, an open-source code base for the MERA assessment, and a leaderboard with a submission system. We evaluate open LMs as baselines and find that they are still far behind the human level. We publicly release MERA to guide forthcoming research, anticipate groundbreaking model features, standardize the evaluation procedure, and address potential societal drawbacks.

Submitted 2 August, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

Comments: The paper version comparable with the release code v.1.1.0 of the benchmark MERA. ACL-2024 main track camera ready version

arXiv:2312.03511 [pdf, other]

Kandinsky 3.0 Technical Report

Authors: Vladimir Arkhipkin, Andrei Filatov, Viacheslav Vasilev, Anastasia Maltseva, Said Azizov, Igor Pavlov, Julia Agafonova, Andrey Kuznetsov, Denis Dimitrov

Abstract: We present Kandinsky 3.0, a large-scale text-to-image generation model based on latent diffusion, continuing the series of text-to-image Kandinsky models and reflecting our progress to achieve higher quality and realism of image generation. In this report we describe the architecture of the model, the data collection procedure, the training technique, and the production system for user interaction. We focus on the key components that, as we have identified as a result of a large number of experiments, had the most significant impact on improving the quality of our model compared to the others. We also describe extensions and applications of our model, including super resolution, inpainting, image editing, image-to-video generation, and a distilled version of Kandinsky 3.0 - Kandinsky 3.1, which does inference in 4 steps of the reverse process and 20 times faster without visual quality decrease. By side-by-side human preferences comparison, Kandinsky becomes better in text understanding and works better on specific domains. The code is available at https://github.com/ai-forever/Kandinsky-3

Submitted 28 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Project page: https://ai-forever.github.io/Kandinsky-3

arXiv:2311.13073 [pdf, other]

FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline

Authors: Vladimir Arkhipkin, Zein Shaheen, Viacheslav Vasilev, Elizaveta Dakhova, Andrey Kuznetsov, Denis Dimitrov

Abstract: Multimedia generation approaches occupy a prominent place in artificial intelligence research. Text-to-image models achieved high-quality results over the last few years. However, video synthesis methods have only recently started to develop. This paper presents a new two-stage latent diffusion text-to-video generation architecture based on the text-to-image diffusion model. The first stage concerns keyframes synthesis to figure the storyline of a video, while the second one is devoted to interpolation frames generation to make movements of the scene and objects smooth. We compare several temporal conditioning approaches for keyframes generation. The results show the advantage of using separate temporal blocks over temporal layers in terms of metrics reflecting video generation quality aspects and human preference. The design of our interpolation model significantly reduces computational costs compared to other masked frame interpolation approaches. Furthermore, we evaluate different configurations of MoVQ-based video decoding scheme to improve consistency and achieve higher PSNR, SSIM, MSE, and LPIPS scores. Finally, we compare our pipeline with existing solutions and achieve top-2 scores overall and top-1 among open-source solutions: CLIPSIM = 0.2976 and FVD = 433.054. Project page: https://ai-forever.github.io/kandinsky-video/

Submitted 20 December, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: Project page: https://ai-forever.github.io/kandinsky-video/

arXiv:2311.05928 [pdf, other]

The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models

Authors: Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

Abstract: In this study, we present an investigation into the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile in transformer decoders exhibits a distinct bell-shaped curve, with the highest anisotropy concentrations in the middle layers. This pattern diverges from the more uniformly distributed anisotropy observed in encoders. In addition, we found that the intrinsic dimension of embeddings increases in the initial phases of training, indicating an expansion into higher-dimensional space, which is then followed by a compression phase towards the end of training with dimensionality decrease, suggesting a refinement into more compact representations. Our results provide fresh insights to the understanding of encoders and decoders embedding properties.

Submitted 26 February, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

Comments: Accepted to EACL-2024

arXiv:2310.03502 [pdf, other]

Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

Authors: Anton Razzhigaev, Arseniy Shakhmatov, Anastasia Maltseva, Vladimir Arkhipkin, Igor Pavlov, Ilya Ryabov, Angelina Kuts, Alexander Panchenko, Andrey Kuznetsov, Denis Dimitrov

Abstract: Text-to-image generation is a significant domain in modern computer vision and has achieved substantial improvements through the evolution of generative architectures. Among these, there are diffusion-based models that have demonstrated essential quality enhancements. These models are generally split into two categories: pixel-level and latent-level approaches. We present Kandinsky, a novel exploration of latent diffusion architecture, combining the principles of the image prior models with latent diffusion techniques. The image prior model is trained separately to map text embeddings to image embeddings of CLIP. Another distinct feature of the proposed model is the modified MoVQ implementation, which serves as the image autoencoder component. Overall, the designed model contains 3.3B parameters. We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting. Additionally, we released the source code and checkpoints for the Kandinsky models. Experimental evaluations demonstrate a FID score of 8.03 on the COCO-30K dataset, marking our model as the top open-source performer in terms of measurable image generation quality.

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2309.06844 [pdf, other]

Gpachov at CheckThat! 2023: A Diverse Multi-Approach Ensemble for Subjectivity Detection in News Articles

Authors: Georgi Pachov, Dimitar Dimitrov, Ivan Koychev, Preslav Nakov

Abstract: The widespread use of social networks has given rise to subjective, misleading, and even false information on the Internet. Thus, subjectivity detection can play an important role in ensuring the objectiveness and the quality of a piece of information. This paper presents the solution built by the Gpachov team for the CLEF-2023 CheckThat! lab Task 2 on subjectivity detection. Three different research directions are explored. The first one is based on fine-tuning a sentence embeddings encoder model and dimensionality reduction. The second one explores a sample-efficient few-shot learning model. The third one evaluates fine-tuning a multilingual transformer on an altered dataset, using data from multiple languages. Finally, the three approaches are combined in a simple majority voting ensemble, resulting in 0.77 macro F1 on the test set and achieving 2nd place on the English subtask.

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2306.08172 [pdf, ps, other]

Sharp Hardy's Inequalities in Hilbert Spaces

Authors: Dimitar K. Dimitrov, Ivan Gadjev, Mourad E. H. Ismail

Abstract: We study the behavior of the smallest possible constants $d(a,b)$ and $d_n$ in Hardy's inequalities $$ \int_a^b\left(\frac{1}{x}\int_a^xf(t)dt\right)^2\,dx\leq d(a,b)\,\int_a^b [f(x)]^2 dx $$ and $$ \sum_{k=1}^{n}\Big(\frac{1}{k}\sum_{j=1}^{k}a_j\Big)^2\leq d_n\,\sum_{k=1}^{n}a_k^2. $$ The exact constant $d(a,b)$ and the precise rate of convergence of $d_n$ are established and the extremal function and the "almost extremal" sequence are found.

Submitted 5 February, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

Comments: 10 pages

MSC Class: Primary 26D10; 26D15; Secondary 33D45

arXiv:2306.03013 [pdf, other]

Hiding in Plain Sight: Disguising Data Stealing Attacks in Federated Learning

Authors: Kostadin Garov, Dimitar I. Dimitrov, Nikola Jovanović, Martin Vechev

Abstract: Malicious server (MS) attacks have enabled the scaling of data stealing in federated learning to large batch sizes and secure aggregation, settings previously considered private. However, many concerns regarding the client-side detectability of MS attacks were raised, questioning their practicality. In this work, for the first time, we thoroughly study client-side detectability. We first demonstrate that all prior MS attacks are detectable by principled checks, and formulate a necessary set of requirements that a practical MS attack must satisfy. Next, we propose SEER, a novel attack framework that satisfies these requirements. The key insight of SEER is the use of a secret decoder, jointly trained with the shared model. We show that SEER can steal user data from gradients of realistic networks, even for large batch sizes of up to 512 and under secure aggregation. Our work is a promising step towards assessing the true vulnerability of federated learning in real-world settings.

Submitted 15 April, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

ACM Class: I.2.11

arXiv:2304.05337 [pdf, ps, other]

An extremal problem and inequalities for entire functions of exponential type

Authors: Andrés Chirre, Dimitar K. Dimitrov, Emily Quesada-Herrera, Mateus Sousa

Abstract: We study two variations of the classical one-delta problem for entire functions of exponential type, known also as the Carathéodory--Fejér--Turán problem. The first variation imposes the additional requirement that the function is radially decreasing while the second one is a generalization which involves derivatives of the entire function. Various interesting inequalities, inspired by results due to Duffin and Schaeffer, Landau, and Hardy and Littlewood, are also established.

Submitted 30 October, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: 18 pages, 4 figures

MSC Class: 42A38; 30D15; 41A17

arXiv:2303.16531 [pdf, other]

RusTitW: Russian Language Text Dataset for Visual Text in-the-Wild Recognition

Authors: Igor Markov, Sergey Nesteruk, Andrey Kuznetsov, Denis Dimitrov

Abstract: Information surrounds people in modern life. Text is a very efficient type of information that people have used for communication for centuries. However, automated text-in-the-wild recognition remains a challenging problem. The major limitation for a DL system is the lack of training data. For competitive performance, the training set must contain many samples that replicate real-world cases. While there are many high-quality datasets for English text recognition, there are no available datasets for the Russian language. In this paper, we present a large-scale human-labeled dataset for Russian text recognition in-the-wild. We also publish a synthetic dataset and code to reproduce the generation process.

Submitted 29 March, 2023; originally announced March 2023.

Comments: 5 pages, 6 figures, 2 tables

arXiv:2212.11691 [pdf, other]

doi 10.1088/2053-1583/ac6fec

Resolving spin currents and spin densities generated by charge-spin interconversion in systems with reduced crystal symmetry

Authors: Lorenzo Camosi, Josef Svetlik, Marius V. Costache, Williams Savero Torres, Iván Fernández Aguirre, Vera Marinova, Dimitre Dimitrov, Marin Gospodinov, Juan F. Sierra, Sergio O. Valenzuela

Abstract: The ability to control the generation of spins in arbitrary directions is a long-sought goal in spintronics. Charge-to-spin interconversion (CSI) phenomena depend strongly on symmetry. Systems with reduced crystal symmetry allow anisotropic CSI with unconventional components, where charge and spin currents and the spin polarization are not mutually perpendicular to each other. Here, we demonstrate experimentally that the CSI in graphene-WTe2 induces spins with components in all three spatial directions. By performing multi-terminal nonlocal spin precession experiments, with specific magnetic field orientations, we discuss how to disentangle the CSI from the spin Hall and inverse spin galvanic effects.

Submitted 22 December, 2022; originally announced December 2022.

Journal ref: 2D Materials 9, 035014 (2022)

arXiv:2212.08851 [pdf, ps, other]

Green's Functions and Existence of Solutions of Nonlinear Fractional Implicit Difference Equations with Dirichlet Boundary Conditions

Authors: Alberto Cabada, Nikolay D. Dimitrov, Jagan Mohan Jonnalagadda

Abstract: This article is devoted to deducing the expression of the Green's function related to a general constant-coefficient fractional difference equation coupled to Dirichlet conditions. In this case, due to the points where some of the fractional operators are applied, we are in the presence of an implicit fractional difference equation. This property makes it more complicated to calculate and manage the expression of the Green's function. Contrary to the explicit case, where the expression follows from finite sums, here it is deduced from series with infinitely many terms. The expression is deduced from the Laplace transform on the time scale of the integers. Finally, we prove two existence results for nonlinear problems, via suitable fixed point theorems.

Submitted 17 December, 2022; originally announced December 2022.

arXiv:2210.07213 [pdf, other]

FARE: Provably Fair Representation Learning with Practical Certificates

Authors: Nikola Jovanović, Mislav Balunović, Dimitar I. Dimitrov, Martin Vechev

Abstract: Fair representation learning (FRL) is a popular class of methods aiming to produce fair classifiers via data preprocessing. Recent regulatory directives stress the need for FRL methods that provide practical certificates, i.e., provable upper bounds on the unfairness of any downstream classifier trained on preprocessed data, which directly provides assurance in a practical scenario. Creating such FRL methods is an important challenge that remains unsolved. In this work, we address that challenge and introduce FARE (Fairness with Restricted Encoders), the first FRL method with practical fairness certificates. FARE is based on our key insight that restricting the representation space of the encoder enables the derivation of practical guarantees, while still permitting favorable accuracy-fairness tradeoffs for suitable instantiations, such as one we propose based on fair trees. To produce a practical certificate, we develop and apply a statistical procedure that computes a finite sample high-confidence upper bound on the unfairness of any downstream classifier trained on FARE embeddings. In our comprehensive experimental evaluation, we demonstrate that FARE produces practical certificates that are tight and often even comparable with purely empirical results obtained by prior methods, which establishes the practical value of our approach.

Submitted 8 June, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: ICML 2023

arXiv:2210.01785 [pdf, other]

TabLeak: Tabular Data Leakage in Federated Learning

Authors: Mark Vero, Mislav Balunović, Dimitar I. Dimitrov, Martin Vechev

Abstract: While federated learning (FL) promises to preserve privacy, recent works in the image and text domains have shown that training updates leak private client data. However, most high-stakes applications of FL (e.g., in healthcare and finance) use tabular data, where the risk of data leakage has not yet been explored. A successful attack for tabular data must address two key challenges unique to the domain: (i) obtaining a solution to a high-variance mixed discrete-continuous optimization problem, and (ii) enabling human assessment of the reconstruction as unlike for image and text data, direct human inspection is not possible. In this work we address these challenges and propose TabLeak, the first comprehensive reconstruction attack on tabular data. TabLeak is based on two key contributions: (i) a method which leverages a softmax relaxation and pooled ensembling to solve the optimization problem, and (ii) an entropy-based uncertainty quantification scheme to enable human assessment. We evaluate TabLeak on four tabular datasets for both FedSGD and FedAvg training protocols, and show that it successfully breaks several settings previously deemed safe. For instance, we extract large subsets of private data at >90% accuracy even at the large batch size of 128. Our findings demonstrate that current high-stakes tabular FL is excessively vulnerable to leakage attacks.

Submitted 7 July, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

ACM Class: I.2.11

arXiv:2209.10597 [pdf, ps, other]

doi 10.1051/0004-6361/202244280

Planet-star interactions with precise transit timing. III. Entering the regime of dynamical tides

Authors: G. Maciejewski, M. Fernandez, A. Sota, P. J. Amado, D. Dimitrov, Y. Nikolov, J. Ohlert, M. Mugrauer, R. Bischoff, T. Heyne, F. Hildebrandt, W. Stenglein, A. A. Arevalo, S. Neira, L. A. Riesco, V. Sanchez Martinez, M. M. Verdugo

Abstract: Hot Jupiters on extremely short-period orbits are expected to be unstable to tidal dissipation and spiral toward their host stars. That is because they transfer the angular momentum of the orbital motion through tidal dissipation into the stellar interior. Although the magnitude of this phenomenon is related to the physical properties of a specific star-planet system, statistical studies show that tidal dissipation might shape the architecture of hot Jupiter systems during the stellar lifetime on the main sequence. The efficiency of tidal dissipation remains poorly constrained in star-planet systems. Stellar interior models show that the dissipation of dynamical tides in radiation zones could be the dominant mechanism driving planetary orbital decay. These theoretical predictions can be verified with the transit timing method. We acquired new precise transit mid-times for five planets. They were previously identified as the best candidates for which orbital decay might be detected. Analysis of the timing data allowed us to place tighter constraints on the orbital decay rate. No statistically significant changes in their orbital periods were detected for all five hot Jupiters in systems HAT-P-23, KELT-1, KELT-16, WASP-18, and WASP-103. For planets HAT-P-23 b, WASP-18 b, and WASP-103 b, observations show that the mechanism of the dynamical tides dissipation probably does not operate in their host stars, preventing them from rapid orbital decay. This finding aligns with the models of stellar interiors of F-type stars, in which dynamical tides are not fully damped due to convective cores. For KELT-16 b, the span of transit timing data was not long enough to verify the theoretical predictions. KELT-1 b was identified as a potential laboratory for studying the dissipative tidal interactions of inertial waves in a convective layer.

Submitted 21 September, 2022; originally announced September 2022.

Comments: Accepted for publication in A&A

Journal ref: A&A 667, A127 (2022)

arXiv:2208.00406 [pdf, other]

Eco2AI: carbon emissions tracking of machine learning models as the first step towards sustainable AI

Authors: Semen Budennyy, Vladimir Lazarev, Nikita Zakharenko, Alexey Korovin, Olga Plosskaya, Denis Dimitrov, Vladimir Arkhipkin, Ivan Oseledets, Ivan Barsola, Ilya Egorov, Aleksandra Kosterina, Leonid Zhukov

Abstract: The size and complexity of deep neural networks continue to grow exponentially, significantly increasing energy consumption for training and inference by these models. We introduce an open-source package eco2AI to help data scientists and researchers to track energy consumption and equivalent CO2 emissions of their models in a straightforward way. In eco2AI we put emphasis on accuracy of energy consumption tracking and correct regional CO2 emissions accounting. We encourage the research community to search for new optimal Artificial Intelligence (AI) architectures with a lower computational cost. The motivation also comes from the concept of an AI-based greenhouse gas sequestrating cycle with both Sustainable AI and Green AI pathways.

Submitted 3 August, 2022; v1 submitted 31 July, 2022; originally announced August 2022.

Comments: Source code for eco2AI package (energy consumption and carbon emission tracker of code in python) is available at: https://github.com/sb-ai-lab/Eco2AI , the package is also available at PyPi: https://pypi.org/project/eco2ai/

arXiv:2206.12395 [pdf, other]

Data Leakage in Federated Averaging

Authors: Dimitar I. Dimitrov, Mislav Balunović, Nikola Konstantinov, Martin Vechev

Abstract: Recent attacks have shown that user data can be recovered from FedSGD updates, thus breaking privacy. However, these attacks are of limited practical relevance as federated learning typically uses the FedAvg algorithm. Compared to FedSGD, recovering data from FedAvg updates is much harder as: (i) the updates are computed at unobserved intermediate network weights, (ii) a large number of batches are used, and (iii) labels and network weights vary simultaneously across client steps. In this work, we propose a new optimization-based attack which successfully attacks FedAvg by addressing the above challenges. First, we solve the optimization problem using automatic differentiation that forces a simulation of the client's update that generates the unobserved parameters for the recovered labels and inputs to match the received client update. Second, we address the large number of batches by relating images from different epochs with a permutation invariant prior. Third, we recover the labels by estimating the parameters of existing FedSGD attacks at every FedAvg step. On the popular FEMNIST dataset, we demonstrate that on average we successfully recover >45% of the client's images from realistic FedAvg updates computed on 10 local epochs of 10 batches each with 5 images, compared to only <10% using the baseline. Our findings show many real-world federated learning implementations based on FedAvg are vulnerable.

Submitted 1 November, 2022; v1 submitted 24 June, 2022; originally announced June 2022.

ACM Class: I.2.11

arXiv:2205.04274 [pdf, other]

Detecting and Understanding Harmful Memes: A Survey

Authors: Shivam Sharma, Firoj Alam, Md. Shad Akhtar, Dimitar Dimitrov, Giovanni Da San Martino, Hamed Firooz, Alon Halevy, Fabrizio Silvestri, Preslav Nakov, Tanmoy Chakraborty

Abstract: The automatic identification of harmful content online is of major concern for social media platforms, policymakers, and society. Researchers have studied textual, visual, and audio content, but typically in isolation. Yet, harmful content often combines multiple modalities, as in the case of memes, which are of particular interest due to their viral nature. With this in mind, here we offer a comprehensive survey with a focus on harmful memes. Based on a systematic analysis of recent literature, we first propose a new typology of harmful memes, and then we highlight and summarize the relevant state of the art. One interesting finding is that many types of harmful memes are not really studied, e.g., such featuring self-harm and extremism, partly due to the lack of suitable datasets. We further find that existing datasets mostly capture multi-class scenarios, which are not inclusive of the affective spectrum that memes can represent. Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual, blending different cultures. We conclude by highlighting several challenges related to multimodal semiotics, technological constraints, and non-trivial social engagement, and we present several open-ended aspects such as delineating online harm and empirically examining related frameworks and assistive interventions, which we believe will motivate and drive future research.

Submitted 29 May, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: Accepted at IJCAI-ECAI 2022 (Survey Track) - Editorial Feedback Revised, 9 pages (7 main + 2 reference pages)

arXiv:2203.16872 [pdf, ps, other]

Group Control for Procedural Rules: Parameterized Complexity and Consecutive Domains

Authors: Yongjie Yang, Dinko Dimitrov

Abstract: We consider Group Control by Adding Individuals (GCAI) in the setting of group identification for two procedural rules -- the consensus-start-respecting rule and the liberal-start-respecting rule. It is known that GCAI for both rules are NP-hard, but whether they are fixed-parameter tractable with respect to the number of distinguished individuals remained open. We resolve both open problems in the affirmative. In addition, we strengthen the NP-hardness of GCAI by showing that, with respect to the natural parameter the number of added individuals, GCAI for both rules are W[2]-hard. Notably, the W[2]-hardness for the liberal-start-respecting rule holds even when restricted to a very special case where the qualifications of individuals satisfy the so-called consecutive ones property. However, for the consensus-start-respecting rule, the problem becomes polynomial-time solvable in this special case. We also study a dual restriction where the disqualifications of individuals fulfill the consecutive ones property, and show that under this restriction GCAI for both rules turn out to be polynomial-time solvable. Our reductions for showing W[2]-hardness also imply several lower bounds concerning kernelization and exact algorithms.

Submitted 26 January, 2023; v1 submitted 31 March, 2022; originally announced March 2022.

arXiv:2202.10784 [pdf, other]

RuCLIP -- new models and experiments: a technical report

Authors: Alex Shonenkov, Andrey Kuznetsov, Denis Dimitrov, Tatyana Shavrina, Daniil Chesakov, Anastasia Maltseva, Alena Fenogenova, Igor Pavlov, Anton Emelyanov, Sergey Markov, Daria Bakshandaeva, Vera Shybaeva, Andrey Chertok

Abstract: In the report we propose six new implementations of the ruCLIP model trained on our 240M pairs. The accuracy results are compared with the original CLIP model with Ru-En translation (OPUS-MT) on 16 datasets from different domains. Our best implementations outperform the CLIP + OPUS-MT solution on most of the datasets in few-shot and zero-shot tasks. In the report we briefly describe the implementations and concentrate on the conducted experiments. Inference execution time comparison is also presented in the report.

Submitted 22 February, 2022; originally announced February 2022.

arXiv:2202.10435 [pdf, ps, other]

Survey on Large Scale Neural Network Training

Authors: Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont

Abstract: Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training. Hence, many models do not fit on one GPU device or can be trained using only a small per-GPU batch size. This survey provides a systematic overview of the approaches that enable more efficient DNN training. We analyze techniques that save memory and make good use of computation and communication resources on architectures with a single or several GPUs. We summarize the main categories of strategies and compare strategies within and across categories. Along with approaches proposed in the literature, we discuss available implementations.

Submitted 21 February, 2022; originally announced February 2022.

arXiv:2202.08827 [pdf, other]

LAMP: Extracting Text from Gradients with Language Model Priors

Authors: Mislav Balunović, Dimitar I. Dimitrov, Nikola Jovanović, Martin Vechev

Abstract: Recent work shows that sensitive user data can be reconstructed from gradient updates, breaking the key privacy promise of federated learning. While success was demonstrated primarily on image data, these methods do not directly transfer to other domains such as text. In this work, we propose LAMP, a novel attack tailored to textual data, that successfully reconstructs original text from gradients. Our attack is based on two key insights: (i) modeling prior text probability with an auxiliary language model, guiding the search towards more natural text, and (ii) alternating continuous and discrete optimization, which minimizes reconstruction loss on embeddings, while avoiding local minima by applying discrete text transformations. Our experiments demonstrate that LAMP is significantly more effective than prior work: it reconstructs 5x more bigrams and 23% longer subsequences on average. Moreover, we are the first to recover inputs from batch sizes larger than 1 for textual models. These findings indicate that gradient updates of models operating on textual data leak more information than previously thought.

Submitted 19 October, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

ACM Class: I.2.7; I.2.11

arXiv:2202.03046 [pdf, other]

A new face swap method for image and video domains: a technical report

Authors: Daniil Chesakov, Anastasia Maltseva, Alexander Groshev, Andrey Kuznetsov, Denis Dimitrov

Abstract: Deep fake technology has become a hot field of research in the last few years. Researchers investigate sophisticated Generative Adversarial Networks (GAN), autoencoders, and other approaches to establish precise and robust algorithms for face swapping. Achieved results show that the deep fake unsupervised synthesis task has problems in terms of the visual quality of generated data. These problems usually lead to high fake detection accuracy when an expert analyzes them. The first problem is that existing image-to-image approaches do not consider video domain specificity and frame-by-frame processing leads to face jittering and other clearly visible distortions. Another problem is the generated data resolution, which is low for many existing methods due to high computational complexity. The third problem appears when the source face has larger proportions (like bigger cheeks), and after replacement it becomes visible on the face border. Our main goal was to develop an approach that could solve these problems and outperform existing solutions on a number of key metrics. We introduce a new face swap pipeline that is based on the FaceShifter architecture and fixes the problems stated above. A new eye loss function, a super-resolution block, and Gaussian-based face mask generation lead to improvements in quality, which is confirmed during evaluation.

Submitted 7 February, 2022; originally announced February 2022.

arXiv:2202.00441 [pdf, other]

Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction

Authors: Georgii Novikov, Daniel Bershatsky, Julia Gusak, Alex Shonenkov, Denis Dimitrov, Ivan Oseledets

Abstract: Memory footprint is one of the main limiting factors for large neural network training. In backpropagation, one needs to store the input to each operation in the computational graph. Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and such operation induces additional memory costs which -- as we show -- can be significantly reduced by quantization of the gradients. We propose a systematic approach to compute optimal quantization of the retained gradients of the pointwise nonlinear functions with only a few bits per each element. We show that such approximation can be achieved by computing optimal piecewise-constant approximation of the derivative of the activation function, which can be done by dynamic programming. The drop-in replacements are implemented for all popular nonlinearities and can be used in any existing pipeline. We confirm the memory reduction and the same convergence on several open benchmarks.

Submitted 2 February, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

Comments: Submitted

arXiv:2112.07395 [pdf, other]

Handwritten text generation and strikethrough characters augmentation

Authors: Alex Shonenkov, Denis Karachev, Max Novopoltsev, Mark Potanin, Denis Dimitrov, Andrey Chertok

Abstract: We introduce two data augmentation techniques, which, used with a Resnet-BiLSTM-CTC network, significantly reduce Word Error Rate (WER) and Character Error Rate (CER) beyond best-reported results on handwriting text recognition (HTR) tasks. We apply a novel augmentation that simulates strikethrough text (HandWritten Blots) and a handwritten text generation method based on printed text (StackMix), which proved to be very effective in HTR tasks. StackMix uses a weakly supervised framework to get character boundaries. Because these data augmentation techniques are independent of the network used, they could also be applied to enhance the performance of other networks and approaches to HTR. Extensive experiments on ten handwritten text datasets show that HandWritten Blots augmentation and StackMix significantly improve the quality of HTR models.

Submitted 14 December, 2021; originally announced December 2021.

Comments: 16 pages, 15 figures. arXiv admin note: substantial text overlap with arXiv:2108.11667

MSC Class: 68-04 ACM Class: I.7.5; I.4.6

arXiv:2112.02448 [pdf, other]

Emojich -- zero-shot emoji generation using Russian language: a technical report

Authors: Alex Shonenkov, Daria Bakshandaeva, Denis Dimitrov, Aleksandr Nikolich

Abstract: This technical report presents a text-to-image neural network "Emojich" that generates emojis using captions in the Russian language as a condition. We aim to keep the generalization ability of the pretrained big model ruDALL-E Malevich (XL), with 1.3B parameters, at the fine-tuning stage, while giving a special style to the generated images. We present some engineering methods, the code implementation, all hyper-parameters for reproducing the results, and a Telegram bot where everyone can create their own customized sets of stickers. Some newly generated emojis obtained with the "Emojich" model are also demonstrated.

Submitted 12 January, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

Comments: 5 pages, 4 figures and big figure at appendix, technical report

arXiv:2111.10974 [pdf, other]

Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture

Authors: Daria Bakshandaeva, Denis Dimitrov, Vladimir Arkhipkin, Alex Shonenkov, Mark Potanin, Denis Karachev, Andrey Kuznetsov, Anton Voronov, Vera Davydova, Elena Tutubalina, Aleksandr Petiushko

Abstract: Supporting the current trend in the AI community, we present the AI Journey 2021 Challenge called Fusion Brain, the first competition aimed at creating a universal architecture that could process different modalities (in this case, images, texts, and code) and solve multiple tasks for vision and language. The Fusion Brain Challenge combines the following specific tasks: Code2code Translation, Handwritten Text Recognition, Zero-shot Object Detection, and Visual Question Answering. We have created datasets for each task to test the participants' submissions on it. Moreover, we have collected and made publicly available a new handwritten dataset in both English and Russian, which consists of 94,128 pairs of images and texts. We also propose a multimodal and multitask architecture - a baseline solution, in the center of which is a frozen foundation model and which has been trained in Fusion mode along with Single-task mode. The proposed Fusion approach proves to be competitive and more energy-efficient compared to the task-specific one.

Submitted 28 December, 2022; v1 submitted 21 November, 2021; originally announced November 2021.

arXiv:2111.04706 [pdf, other]

Bayesian Framework for Gradient Leakage

Authors: Mislav Balunović, Dimitar I. Dimitrov, Robin Staab, Martin Vechev

Abstract: Federated learning is an established method for training machine learning models without sharing training data. However, recent work has shown that it cannot guarantee data privacy as shared gradients can still leak sensitive information. To formalize the problem of gradient leakage, we propose a theoretical framework that enables, for the first time, analysis of the Bayes optimal adversary phrased as an optimization problem. We demonstrate that existing leakage attacks can be seen as approximations of this optimal adversary with different assumptions on the probability distributions of the input data and gradients. Our experiments confirm the effectiveness of the Bayes optimal adversary when it has knowledge of the underlying distribution. Further, our experimental evaluation shows that several existing heuristic defenses are not effective against stronger attacks, especially early in the training process. Thus, our findings indicate that the construction of more effective defenses and their evaluation remains an open problem.

Submitted 17 March, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

arXiv:2110.14712 [pdf, other]

Complete characterization of the minimal-ABC trees

Authors: Darko Dimitrov, Zhibin Du

Abstract: The problem of characterizing trees with minimal atom-bond-connectivity index (minimal-ABC trees) has a reputation as one of the most demanding recent open optimization problems in mathematical chemistry. First, we give an affirmative answer to the conjecture that sufficiently large minimal-ABC trees are comprised solely of a root vertex and so-called $D_z$- and $D_{z+1}$-branches. Based on the theoretical results presented here and some already known results, we obtain enough constraints to reduce the search space and solve the optimization problem, and thus determine exactly the minimal-ABC trees of a given arbitrary order.

Submitted 19 January, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

arXiv:2110.14294 [pdf, ps, other]

doi 10.1051/0004-6361/202142424

Revisiting TrES-5 b: departure from a linear ephemeris instead of short-period transit timing variation

Authors: G. Maciejewski, M. Fernandez, F. Aceituno, J. L. Ramos, D. Dimitrov, Z. Donchev, J. Ohlert

Abstract: The orbital motion of the transiting hot Jupiter TrES-5 b was reported to be perturbed by a planetary companion on a nearby orbit. Such compact systems do not frequently occur in nature, and learning their orbital architecture could shed some light on hot Jupiters' formation processes. We acquired fifteen new precise photometric time series for twelve transits of TrES-5 b between June 2019 and October 2020 using 0.9-2.0 m telescopes. The method of precise transit timing was employed to verify the deviation of the planet from Keplerian motion. Although our results show no detectable short-time variation in the orbital period of TrES-5 b and the existence of the additional nearby planet is not confirmed, the new transits were observed about two minutes earlier than expected. We conclude that the orbital period of the planet could vary on a long timescale. We found that the most likely explanation of the observations is the line-of-sight acceleration of the system's barycentre due to the orbital motion induced by a massive, wide-orbiting companion.

Submitted 27 October, 2021; originally announced October 2021.

Comments: Accepted for publication in A&A

Journal ref: A&A 656, A88 (2021)

arXiv:2110.00413 [pdf, other]

Detecting Harmful Memes and Their Targets

Authors: Shraman Pramanick, Dimitar Dimitrov, Rituparna Mukherjee, Shivam Sharma, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty

Abstract: Among the various modes of communication in social media, the use of Internet memes has emerged as a powerful means to convey political, psychological, and socio-cultural opinions. Although memes are typically humorous in nature, recent days have witnessed a proliferation of harmful memes targeted to abuse various social entities. As most harmful memes are highly satirical and abstruse without appropriate contexts, off-the-shelf multimodal models may not be adequate to understand their underlying semantics. In this work, we propose two novel problem formulations: detecting harmful memes and the social entities that these harmful memes target. To this end, we present HarMeme, the first benchmark dataset, containing 3,544 memes related to COVID-19. Each meme went through a rigorous two-stage annotation process. In the first stage, we labeled a meme as very harmful, partially harmful, or harmless; in the second stage, we further annotated the type of target(s) that each harmful meme points to: individual, organization, community, or society/general public/other. The evaluation results using ten unimodal and multimodal models highlight the importance of using multimodal signals for both tasks. We further discuss the limitations of these models and we argue that more research is needed to address these problems.

Submitted 24 September, 2021; originally announced October 2021.

Comments: harmful memes, multimodality, social media

MSC Class: 68T50 ACM Class: F.2.2; I.2.7

Journal ref: ACL-2021 (Findings)

arXiv:2109.08013 [pdf, other]

Detecting Propaganda Techniques in Memes

Authors: Dimitar Dimitrov, Bishr Bin Ali, Shaden Shaar, Firoj Alam, Fabrizio Silvestri, Hamed Firooz, Preslav Nakov, Giovanni Da San Martino

Abstract: Propaganda can be defined as a form of communication that aims to influence the opinions or the actions of people towards a specific goal; this is achieved by means of well-defined rhetorical and psychological devices. Propaganda, in the form we know it today, can be dated back to the beginning of the 17th century. However, it is with the advent of the Internet and social media that it has started to spread on a much larger scale than before, thus becoming a major societal and political issue. Nowadays, a large fraction of propaganda in social media is multimodal, mixing textual with visual content. With this in mind, here we propose a new multi-label multimodal task: detecting the type of propaganda techniques used in memes. We further create and release a new corpus of 950 memes, carefully annotated with 22 propaganda techniques, which can appear in the text, in the image, or in both. Our analysis of the corpus shows that understanding both modalities together is essential for detecting these techniques. This is further confirmed in our experiments with several state-of-the-art multimodal models.

Submitted 7 August, 2021; originally announced September 2021.

Comments: propaganda, disinformation, fake news, memes, multimodality. arXiv admin note: text overlap with arXiv:2105.09284

MSC Class: 68T50 ACM Class: I.2.7

Journal ref: ACL-2021

arXiv:2109.05184 [pdf, other]

MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets

Authors: Shraman Pramanick, Shivam Sharma, Dimitar Dimitrov, Md Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty

Abstract: Internet memes have become a powerful means to transmit political, psychological, and socio-cultural ideas. Although memes are typically humorous, recent days have witnessed an escalation of harmful memes used for trolling, cyberbullying, and abuse. Detecting such memes is challenging as they can be highly satirical and cryptic. Moreover, while previous work has focused on specific aspects of memes such as hate speech and propaganda, there has been little work on harm in general. Here, we aim to bridge this gap. We focus on two tasks: (i) detecting harmful memes, and (ii) identifying the social entities they target. We further extend the recently released HarMeme dataset, which covered COVID-19, with additional memes and a new topic: US politics. To solve these tasks, we propose MOMENTA (MultimOdal framework for detecting harmful MemEs aNd Their tArgets), a novel multimodal deep neural network that uses global and local perspectives to detect harmful memes. MOMENTA systematically analyzes the local and the global perspective of the input meme (in both modalities) and relates it to the background context. MOMENTA is interpretable and generalizable, and our experiments show that it outperforms several strong rivaling approaches.

Submitted 22 September, 2021; v1 submitted 11 September, 2021; originally announced September 2021.

Comments: The paper has been accepted in the Findings of Empirical Methods in Natural Language Processing (EMNLP), 2021

arXiv:2109.00542 [pdf, other]

Shared Certificates for Neural Network Verification

Authors: Marc Fischer, Christian Sprecher, Dimitar I. Dimitrov, Gagandeep Singh, Martin Vechev

Abstract: Existing neural network verifiers compute a proof that each input is handled correctly under a given perturbation by propagating a symbolic abstraction of reachable values at each layer. This process is repeated from scratch independently for each input (e.g., image) and perturbation (e.g., rotation), leading to an expensive overall proof effort when handling an entire dataset. In this work, we introduce a new method for reducing this verification cost without losing precision based on a key insight that abstractions obtained at intermediate layers for different inputs and perturbations can overlap or contain each other. Leveraging our insight, we introduce the general concept of shared certificates, enabling proof effort reuse across multiple inputs to reduce overall verification costs. We perform an extensive experimental evaluation to demonstrate the effectiveness of shared certificates in reducing the verification cost on a range of datasets and attack specifications on image classifiers including the popular patch and geometric perturbations. We release our implementation at https://github.com/eth-sri/proof-sharing.

Submitted 23 November, 2023; v1 submitted 1 September, 2021; originally announced September 2021.

Comments: Extended version of our CAV'22 paper

arXiv:2108.11667 [pdf, other]

StackMix and Blot Augmentations for Handwritten Text Recognition

Authors: Alex Shonenkov, Denis Karachev, Maxim Novopoltsev, Mark Potanin, Denis Dimitrov

Abstract: This paper proposes a handwritten text recognition (HTR) system that outperforms current state-of-the-art methods. The comparison was carried out on three of the datasets most frequently used in HTR tasks, namely Bentham, IAM, and Saint Gall. In addition, the results on two recently presented datasets, Peter the Great's manuscripts and the HKR Dataset, are provided. The paper describes the architecture of the neural network and two ways of increasing the volume of training data: an augmentation that simulates strikethrough text (HandWritten Blots) and a new text generation method (StackMix), which proved to be very effective in HTR tasks. StackMix can also be applied to the standalone task of generating handwritten text based on printed text.

Submitted 26 August, 2021; originally announced August 2021.

Comments: 17 pages, 9 figures

MSC Class: 68-04 ACM Class: I.7.5; I.4.6

arXiv:2106.13356 [pdf, other]

doi 10.1063/5.0060151

Monte Carlo Modeling of Spin-polarized Photoemission from p-doped GaAs Activated to Negative Electron Affinity

Authors: Oksana Chubenko, Siddharth Karkare, Dimitre Dimitrov, Jai Kwan Bae, Luca Cultrera, Ivan Bazarov, Andrei Afanasev

Abstract: The anticorrelation between quantum efficiency (QE) and electron spin polarization (ESP) from a p-doped GaAs activated to negative electron affinity (NEA) is studied in detail using an ensemble Monte Carlo approach. The photoabsorption, momentum and spin relaxation during transport, and tunnelling of electrons through the surface potential barrier are modeled to identify the fundamental mechanisms that limit the efficiency of GaAs spin-polarized electron sources. In particular, we study the response of QE and ESP to various parameters such as the photoexcitation energy, doping density, and electron affinity level. Our modeling results for various transport and emission characteristics are in good agreement with available experimental data. Our findings show that the behaviour of both QE and ESP at room temperature can be fully explained by the bulk relaxation mechanisms and the time that electrons spend in the material before being emitted.

Submitted 24 June, 2021; originally announced June 2021.

Journal ref: Journal of Applied Physics 130, 063101 (2021)

arXiv:2105.09284 [pdf, other]

SemEval-2021 Task 6: Detection of Persuasion Techniques in Texts and Images

Authors: Dimitar Dimitrov, Bishr Bin Ali, Shaden Shaar, Firoj Alam, Fabrizio Silvestri, Hamed Firooz, Preslav Nakov, Giovanni Da San Martino

Abstract: We describe SemEval-2021 task 6 on Detection of Persuasion Techniques in Texts and Images: the data, the annotation guidelines, the evaluation setup, the results, and the participating systems. The task focused on memes and had three subtasks: (i) detecting the techniques in the text, (ii) detecting the text spans where the techniques are used, and (iii) detecting techniques in the entire meme, i.e., both in the text and in the image. It was a popular task, attracting 71 registrations, and 22 teams that eventually made an official submission on the test set. The evaluation results for the third subtask confirmed the importance of both modalities, the text and the image. Moreover, some teams reported benefits when not just combining the two modalities, e.g., by using early or late fusion, but rather modeling the interaction between them in a joint model.

Submitted 25 April, 2021; originally announced May 2021.

Comments: propaganda, disinformation, misinformation, fake news, memes, multimodality

MSC Class: 68T50 ACM Class: F.2.2; I.2.7

Journal ref: SemEval-2021

arXiv:2103.12541 [pdf, other]

A Survey on Multimodal Disinformation Detection

Authors: Firoj Alam, Stefano Cresci, Tanmoy Chakraborty, Fabrizio Silvestri, Dimiter Dimitrov, Giovanni Da San Martino, Shaden Shaar, Hamed Firooz, Preslav Nakov

Abstract: Recent years have witnessed the proliferation of offensive content online such as fake news, propaganda, misinformation, and disinformation. While initially this was mostly about textual content, over time images and videos gained popularity, as they are much easier to consume, attract more attention, and spread further than text. As a result, researchers started leveraging different modalities and combinations thereof to tackle online multimodal offensive content. In this study, we offer a survey on the state-of-the-art on multimodal disinformation detection covering various combinations of modalities: text, images, speech, video, social media network structure, and temporal information. Moreover, while some studies focused on factuality, others investigated how harmful the content is. While these two components in the definition of disinformation, (i) factuality and (ii) harmfulness, are equally important, they are typically studied in isolation. Thus, we argue for the need to tackle disinformation detection by taking into account multiple modalities as well as both factuality and harmfulness, in the same framework. Finally, we discuss current challenges and future research directions.

Submitted 28 September, 2022; v1 submitted 13 March, 2021; originally announced March 2021.

Comments: Accepted at COLING-2022, disinformation, misinformation, factuality, harmfulness, fake news, propaganda, multimodality, text, images, videos, network structure, temporality

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2103.09354 [pdf, other]

Digital Peter: Dataset, Competition and Handwriting Recognition Methods

Authors: Mark Potanin, Denis Dimitrov, Alex Shonenkov, Vladimir Bataev, Denis Karachev, Maxim Novopoltsev

Abstract: This paper presents a new dataset of Peter the Great's manuscripts and describes a segmentation procedure that converts initial images of documents into lines. The new dataset may be useful for researchers to train handwriting text recognition models and as a benchmark for comparing different models. It consists of 9 694 images and text files corresponding to lines in historical documents. The open machine learning competition Digital Peter was held based on the considered dataset. The baseline solution for this competition as well as more advanced methods for handwritten text recognition are described in the article. The full dataset and all code are publicly available.

Submitted 27 August, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

Comments: 17 pages, 7 figures, submitted to ICDAR 2021

ACM Class: I.7.5; I.4.6

arXiv:2011.09798 [pdf]

Giant anisotropy of the magnetocaloric effect in the orthovanadate TbVO4 single crystals

Authors: Mohamed Balli, Sabeur Mansouri, Dimitre Z. Dimitrov, Patrick Fournier, Serge Jandl, Jenh-Yih Juang

Abstract: It is known that the Zircon-type orthovanadates RVO4 show promise in many different applications as catalysts and optical materials. In this work, we demonstrate that the TbVO4 compound can also be used as a magnetic refrigerant in efficient and ecofriendly cryocoolers due to its strong magnetocaloric effect in the low-temperature regime. The application of a relatively low magnetic field of 2 T along the easy magnetization axis (a) gives rise to a maximum entropy change of about 20 J/kg K at 4 K. More interestingly, under sufficiently high magnetic fields, the isothermal entropy change -ΔST remains approximately constant over a wide temperature range, which is highly appreciated from a practical point of view. In the magnetic field change of 7 T, -ΔST, which reaches roughly 22 J/kg K, remains practically unchanged between 0 and 34 K, leading to an outstanding refrigerant capacity of about 823 J/kg. On the other hand, the lowering of crystallographic symmetry from the tetragonal to the orthorhombic structure occurring close to 33 K, as confirmed by Raman scattering data, results in a strong magnetic anisotropy. Accordingly, strong thermal effects can also be obtained simply by spinning the TbVO4 single crystals between their hard and easy orientations in constant magnetic fields instead of the standard magnetization-demagnetization process. Such rotating magnetocaloric effects would open the way for the implementation of TbVO4 in a new generation of compact and simplified magnetic refrigerators that can be dedicated to the liquefaction of hydrogen and helium.

Submitted 19 November, 2020; originally announced November 2020.

Comments: A revised version of this paper is accepted for publication in Physical Review Materials, 2020

Showing 1–50 of 164 results for author: Dimitrov, D