
"Vorbeşti Româneşte?" A Recipe to Train Powerful Romanian LLMs with English Instructions

Authors: Mihai Masala, Denis C. Ilie-Ablachim, Alexandru Dima, Dragos Corlatescu, Miruna Zavelca, Ovio Olaru, Simina Terian, Andrei Terian, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian Rebedea

Abstract: In recent years, Large Language Models (LLMs) have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English; hence, their performance in English greatly exceeds other languages. To our knowledge, we are the first to collect and translate a large collection of texts, instructions, and benchmarks and train, evaluate, and release open-source LLMs tailored for Romanian. We evaluate our methods on four different categories, including academic benchmarks, MT-Bench (manually translated), and a professionally built historical, cultural, and social benchmark adapted to Romanian. We argue for the usefulness and high performance of RoLLMs by obtaining state-of-the-art results across the board. We publicly release all resources (i.e., data, training and evaluation code, models) to support and encourage research on Romanian LLMs while concurrently creating a generalizable recipe, adequate for other low or less-resourced languages.

Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2405.07703

arXiv:2405.07703 [pdf, other]

OpenLLM-Ro -- Technical Report on Open-source Romanian LLMs

Authors: Mihai Masala, Denis C. Ilie-Ablachim, Dragos Corlatescu, Miruna Zavelca, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian Rebedea

Abstract: In recent years, Large Language Models (LLMs) have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English. Hence, their performance in English greatly exceeds their performance in other languages. This document presents our approach to training and evaluating the first foundational and chat LLM specialized for Romanian.

Submitted 17 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2402.16197 [pdf]

Language Models for Code Completion: A Practical Evaluation

Authors: Maliheh Izadi, Jonathan Katzy, Tim van Dam, Marc Otten, Razvan Mihai Popescu, Arie van Deursen

Abstract: Transformer-based language models for automatic code completion have shown great promise so far, yet the evaluation of these models rarely uses real data. This study provides both quantitative and qualitative assessments of three public code language models when completing real-world code. We first developed an open-source IDE extension, Code4Me, for the online evaluation of the models. We collected real auto-completion usage data for over a year from more than 1200 users, resulting in over 600K valid completions. These models were then evaluated using six standard metrics across twelve programming languages. Next, we conducted a qualitative study of 1690 real-world completion requests to identify the reasons behind the poor model performance. A comparative analysis of the models' performance in online and offline settings was also performed, using benchmark synthetic datasets and two masking strategies. Our findings suggest that while developers utilize code completion across various languages, the best results are achieved for mainstream languages such as Python and Java. InCoder outperformed the other models across all programming languages, highlighting the significance of training data and objectives. Our study also revealed that offline evaluations do not accurately reflect real-world scenarios. Upon qualitative analysis of the model's predictions, we found that 66.3% of failures were due to the models' limitations, 24.4% occurred due to inappropriate model usage in a development context, and 9.3% were valid requests that developers overwrote. Given these findings, we propose several strategies to overcome the current limitations. These include refining training objectives, improving resilience to typographical errors, adopting hybrid approaches, and enhancing implementations and usability.

Submitted 25 February, 2024; originally announced February 2024.

Comments: To be published in the proceedings of the 46th IEEE/ACM International Conference on Software Engineering (ICSE 2024)

arXiv:2306.12041 [pdf, other]

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

Authors: Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Radu Tudor Ionescu, Marius Popescu, Fahad Shahbaz Khan, Mubarak Shah

Abstract: We propose an efficient abnormal event detection model based on a lightweight masked auto-encoder (AE) applied at the video frame level. The novelty of the proposed model is threefold. First, we introduce an approach to weight tokens based on motion gradients, thus shifting the focus from the static background scene to the foreground objects. Second, we integrate a teacher decoder and a student decoder into our architecture, leveraging the discrepancy between the outputs given by the two decoders to improve anomaly detection. Third, we generate synthetic abnormal events to augment the training videos, and task the masked AE model to jointly reconstruct the original frames (without anomalies) and the corresponding pixel-level anomaly maps. Our design leads to an efficient and effective model, as demonstrated by the extensive experiments carried out on four benchmarks: Avenue, ShanghaiTech, UBnormal and UCSD Ped2. The empirical results show that our model achieves an excellent trade-off between speed and accuracy, obtaining competitive AUC scores, while processing 1655 FPS. Hence, our model is between 8 and 70 times faster than competing methods. We also conduct an ablation study to justify our design. Our code is freely available at: https://github.com/ristea/aed-mae.

Submitted 9 March, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

Comments: Accepted at CVPR 2024

arXiv:2207.03477 [pdf, other]

VeriDark: A Large-Scale Benchmark for Authorship Verification on the Dark Web

Authors: Andrei Manolache, Florin Brad, Antonio Barbalau, Radu Tudor Ionescu, Marius Popescu

Abstract: The DarkWeb represents a hotbed for illicit activity, where users communicate on different market forums in order to exchange goods and services. Law enforcement agencies benefit from forensic tools that perform authorship analysis, in order to identify and profile users based on their textual content. However, authorship analysis has been traditionally studied using corpora featuring literary texts such as fragments from novels or fan fiction, which may not be suitable in a cybercrime context. Moreover, the few works that employ authorship analysis tools for cybercrime prevention usually employ ad-hoc experimental setups and datasets. To address these issues, we release VeriDark: a benchmark comprised of three large scale authorship verification datasets and one authorship identification dataset obtained from user activity from either Dark Web related Reddit communities or popular illicit Dark Web market forums. We evaluate competitive NLP baselines on the three datasets and perform an analysis of the predictions to better understand the limitations of such approaches. We make the datasets and baselines publicly available at https://github.com/bit-ml/VeriDark

Submitted 1 November, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

Comments: Accepted at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks. 21 pages, 4 figures, 11 tables

arXiv:2201.12216 [pdf, other]

Self-paced learning to improve text row detection in historical documents with missing labels

Authors: Mihaela Gaman, Lida Ghadamiyan, Radu Tudor Ionescu, Marius Popescu

Abstract: An important preliminary step of optical character recognition systems is the detection of text rows. To address this task in the context of historical data with missing labels, we propose a self-paced learning algorithm capable of improving the row detection performance. We conjecture that pages with more ground-truth bounding boxes are less likely to have missing annotations. Based on this hypothesis, we sort the training examples in descending order with respect to the number of ground-truth bounding boxes, and organize them into k batches. Using our self-paced learning method, we train a row detector over k iterations, progressively adding batches with less ground-truth annotations. At each iteration, we combine the ground-truth bounding boxes with pseudo-bounding boxes (bounding boxes predicted by the model itself) using non-maximum suppression, and we include the resulting annotations at the next training iteration. We demonstrate that our self-paced learning strategy brings significant performance gains on two data sets of historical documents, improving the average precision of YOLOv4 with more than 12% on one data set and 39% on the other.

Submitted 15 August, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

Comments: Accepted at ECCV Workshop on Text in Everything (TiE 2022)
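
The batching schedule described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `self_paced_batches` and `training_schedule` are hypothetical, the near-equal batch sizes are an assumption (the abstract only says the pages are organized into k batches), and the step that merges ground-truth boxes with pseudo-boxes via non-maximum suppression is left out.

```python
from typing import List, Sequence


def self_paced_batches(box_counts: Sequence[int], k: int) -> List[List[int]]:
    """Split page indices into k batches, most-annotated pages first.

    box_counts[i] is the number of ground-truth boxes on page i.
    Near-equal batch sizes are an assumption made for this sketch.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Sort page indices by descending number of ground-truth boxes.
    order = sorted(range(len(box_counts)),
                   key=lambda i: box_counts[i], reverse=True)
    base, extra = divmod(len(order), k)
    batches, start = [], 0
    for b in range(k):
        size = base + (1 if b < extra else 0)
        batches.append(order[start:start + size])
        start += size
    return batches


def training_schedule(batches: List[List[int]]) -> List[List[int]]:
    """Iteration t trains on batches 0..t, i.e. the training set grows."""
    seen: List[int] = []
    schedule = []
    for batch in batches:
        seen = seen + batch
        schedule.append(list(seen))
    return schedule


if __name__ == "__main__":
    counts = [3, 10, 7, 1, 5, 8]  # hypothetical boxes per page
    for t, pages in enumerate(training_schedule(self_paced_batches(counts, 3))):
        print(f"iteration {t}: train on pages {pages}")
```

Under these assumptions the detector sees the pages least likely to have missing annotations first, and each later iteration adds a batch with fewer ground-truth boxes.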

arXiv:2112.05125 [pdf, other]

Rethinking the Authorship Verification Experimental Setups

Authors: Florin Brad, Andrei Manolache, Elena Burceanu, Antonio Barbalau, Radu Ionescu, Marius Popescu

Abstract: One of the main drivers of the recent advances in authorship verification is the PAN large-scale authorship dataset. Despite generating significant progress in the field, inconsistent performance differences between the closed and open test sets have been reported. To this end, we improve the experimental setup by proposing five new public splits over the PAN dataset, specifically designed to isolate and identify biases related to the text topic and to the author's writing style. We evaluate several BERT-like baselines on these splits, showing that such models are competitive with authorship verification state-of-the-art methods. Furthermore, using explainable AI, we find that these baselines are biased towards named entities. We show that models trained without the named entities obtain better results and generalize better when tested on DarkReddit, our new dataset for authorship verification.

Submitted 1 November, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

Comments: Accepted as a short paper at the EMNLP 2022 conference. 10 pages, 5 figures, 9 tables

arXiv:2109.01745 [pdf, other]

A realistic approach to generate masked faces applied on two novel masked face recognition data sets

Authors: Tudor Mare, Georgian Duta, Mariana-Iuliana Georgescu, Adrian Sandru, Bogdan Alexe, Marius Popescu, Radu Tudor Ionescu

Abstract: The COVID-19 pandemic raises the problem of adapting face recognition systems to the new reality, where people may wear surgical masks to cover their noses and mouths. Traditional data sets (e.g., CelebA, CASIA-WebFace) used for training these systems were released before the pandemic, so they now seem unsuited due to the lack of examples of people wearing masks. We propose a method for enhancing data sets containing faces without masks by creating synthetic masks and overlaying them on faces in the original images. Our method relies on SparkAR Studio, a developer program made by Facebook that is used to create Instagram face filters. In our approach, we use 9 masks of different colors, shapes and fabrics. We employ our method to generate a number of 445,446 (90%) samples of masks for the CASIA-WebFace data set and 196,254 (96.8%) masks for the CelebA data set, releasing the mask images at https://github.com/securifai/masked_faces. We show that our method produces significantly more realistic training examples of masks overlaid on faces by asking volunteers to qualitatively compare it to other methods or data sets designed for the same task. We also demonstrate the usefulness of our method by evaluating state-of-the-art face recognition systems (FaceNet, VGG-face, ArcFace) trained on our enhanced data sets and showing that they outperform equivalent systems trained on original data sets (containing faces without masks) or competing data sets (containing masks generated by related methods), when the test benchmarks contain masked faces.

Submitted 25 October, 2021; v1 submitted 3 September, 2021; originally announced September 2021.

Comments: Accepted at NeurIPS 2021

arXiv:2107.05754 [pdf, other]

EvoBA: An Evolution Strategy as a Strong Baseline for Black-Box Adversarial Attacks

Authors: Andrei Ilie, Marius Popescu, Alin Stefanescu

Abstract: Recent work has shown how easily white-box adversarial attacks can be applied to state-of-the-art image classifiers. However, real-life scenarios resemble more the black-box adversarial conditions, lacking transparency and usually imposing natural, hard constraints on the query budget. We propose $\textbf{EvoBA}$, a black-box adversarial attack based on a surprisingly simple evolutionary search strategy. $\textbf{EvoBA}$ is query-efficient, minimizes $L_0$ adversarial perturbations, and does not require any form of training. $\textbf{EvoBA}$ shows efficiency and efficacy through results that are in line with much more complex state-of-the-art black-box attacks such as $\textbf{AutoZOOM}$. It is more query-efficient than $\textbf{SimBA}$, a simple and powerful baseline black-box attack, and has a similar level of complexity. Therefore, we propose it both as a new strong baseline for black-box adversarial attacks and as a fast and general tool for gaining empirical insight into how robust image classifiers are with respect to $L_0$ adversarial perturbations. There exist fast and reliable $L_2$ black-box attacks, such as $\textbf{SimBA}$, and $L_{\infty}$ black-box attacks, such as $\textbf{DeepSearch}$. We propose $\textbf{EvoBA}$ as a query-efficient $L_0$ black-box adversarial attack which, together with the aforementioned methods, can serve as a generic tool to assess the empirical robustness of image classifiers. The main advantages of such methods are that they run fast, are query-efficient, and can easily be integrated in image classifiers development pipelines. While our attack minimises the $L_0$ adversarial perturbation, we also report $L_2$, and notice that we compare favorably to the state-of-the-art $L_2$ black-box attack, $\textbf{AutoZOOM}$, and of the $L_2$ strong baseline, $\textbf{SimBA}$.

Submitted 12 July, 2021; originally announced July 2021.

arXiv:2011.07491 [pdf, other]

Anomaly Detection in Video via Self-Supervised and Multi-Task Learning

Authors: Mariana-Iuliana Georgescu, Antonio Barbalau, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah

Abstract: Anomaly detection in video is a challenging computer vision problem. Due to the lack of anomalous events at training time, anomaly detection requires the design of learning methods without full supervision. In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level. We first utilize a pre-trained detector to detect objects. Then, we train a 3D convolutional neural network to produce discriminative anomaly-specific information by jointly learning multiple proxy tasks: three self-supervised and one based on knowledge distillation. The self-supervised tasks are: (i) discrimination of forward/backward moving objects (arrow of time), (ii) discrimination of objects in consecutive/intermittent frames (motion irregularity) and (iii) reconstruction of object-specific appearance information. The knowledge distillation task takes into account both classification and detection information, generating large prediction discrepancies between teacher and student models when anomalies occur. To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture. Our lightweight architecture outperforms the state-of-the-art methods on three benchmarks: Avenue, ShanghaiTech and UCSD Ped2. Additionally, we perform an ablation study demonstrating the importance of integrating self-supervised learning and normality-specific distillation in a multi-task learning setting.

Submitted 10 September, 2021; v1 submitted 15 November, 2020; originally announced November 2020.

Comments: Accepted at CVPR 2021. Main paper and supplementary are both included

arXiv:2010.11158 [pdf, other]

Black-Box Ripper: Copying black-box models using generative evolutionary algorithms

Authors: Antonio Barbalau, Adrian Cosma, Radu Tudor Ionescu, Marius Popescu

Abstract: We study the task of replicating the functionality of black-box neural models, for which we only know the output class probabilities provided for a set of input images. We assume back-propagation through the black-box model is not possible and its training images are not available, e.g. the model could be exposed only through an API. In this context, we present a teacher-student framework that can distill the black-box (teacher) model into a student model with minimal accuracy loss. To generate useful data samples for training the student, our framework (i) learns to generate images on a proxy data set (with images and classes different from those used to train the black-box) and (ii) applies an evolutionary strategy to make sure that each generated data sample exhibits a high response for a specific class when given as input to the black box. Our framework is compared with several baseline and state-of-the-art methods on three benchmark data sets. The empirical evidence indicates that our model is superior to the considered baselines. Although our method does not back-propagate through the black-box network, it generally surpasses state-of-the-art methods that regard the teacher as a glass-box model. Our code is available at: https://github.com/antoniobarbalau/black-box-ripper.

Submitted 21 October, 2020; originally announced October 2020.

Comments: Accepted as Oral at NeurIPS 2020

arXiv:2010.11081 [pdf, other]

Anatomically-Informed Deep Learning on Contrast-Enhanced Cardiac MRI for Scar Segmentation and Clinical Feature Extraction

Authors: Haley G. Abramson, Dan M. Popescu, Rebecca Yu, Changxin Lai, Julie K. Shade, Katherine C. Wu, Mauro Maggioni, Natalia A. Trayanova

Abstract: Visualizing disease-induced scarring and fibrosis in the heart on cardiac magnetic resonance (CMR) imaging with contrast enhancement (LGE) is paramount in characterizing disease progression and quantifying pathophysiological substrates of arrhythmias. However, segmentation and scar/fibrosis identification from LGE-CMR is an intensive manual process prone to large inter-observer variability. Here, we present a novel fully-automated anatomically-informed deep learning solution for left ventricle (LV) and scar/fibrosis segmentation and clinical feature extraction from LGE-CMR. The technology involves three cascading convolutional neural networks that segment myocardium and scar/fibrosis from raw LGE-CMR images and constrain these segmentations within anatomical guidelines, thus facilitating seamless derivation of clinically-significant parameters. In addition to available LGE-CMR images, training used "LGE-like" synthetically enhanced cine scans. Results show excellent agreement with those of trained experts in terms of segmentation (balanced accuracy of $96\%$ and $75\%$ for LV and scar segmentation), clinical features ($2\%$ difference in mean scar-to-LV wall volume fraction), and anatomical fidelity. Our segmentation technology is extendable to other computer vision medical applications and to problems requiring guidelines adherence of predicted outputs.

Submitted 8 January, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

Comments: Haley G. Abramson and Dan M. Popescu contributed equally to this work

arXiv:2008.12328 [pdf, other]

doi 10.1109/TPAMI.2021.3074805

A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video

Authors: Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah

Abstract: Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years. The complexity of the task arises from the commonly-adopted definition of an abnormal event, that is, a rarely occurring event that typically depends on the surrounding context. Following the standard formulation of abnormal event detection as outlier detection, we propose a background-agnostic framework that learns from training videos containing only normal events. Our framework is composed of an object detector, a set of appearance and motion auto-encoders, and a set of classifiers. Since our framework only looks at object detections, it can be applied to different scenes, provided that normal events are defined identically across scenes and that the single main factor of variation is the background. To overcome the lack of abnormal data during training, we propose an adversarial learning strategy for the auto-encoders. We create a scene-agnostic set of out-of-domain pseudo-abnormal examples, which are correctly reconstructed by the auto-encoders before applying gradient ascent on the pseudo-abnormal examples. We further utilize the pseudo-abnormal examples to serve as abnormal examples when training appearance-based and motion-based binary classifiers to discriminate between normal and abnormal latent features and reconstructions. We compare our framework with the state-of-the-art methods on four benchmark data sets, using various evaluation metrics. Compared to existing methods, the empirical results indicate that our approach achieves favorable performance on all data sets. In addition, we provide region-based and track-based annotations for two large-scale abnormal event detection data sets from the literature, namely ShanghaiTech and Subway.

Submitted 6 April, 2023; v1 submitted 27 August, 2020; originally announced August 2020.

Comments: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence

arXiv:2006.03896 [pdf, other]

A Generic and Model-Agnostic Exemplar Synthetization Framework for Explainable AI

Authors: Antonio Barbalau, Adrian Cosma, Radu Tudor Ionescu, Marius Popescu

Abstract: With the growing complexity of deep learning methods adopted in practical applications, there is an increasing and stringent need to explain and interpret the decisions of such methods. In this work, we focus on explainable AI and propose a novel generic and model-agnostic framework for synthesizing input exemplars that maximize a desired response from a machine learning model. To this end, we use a generative model, which acts as a prior for generating data, and traverse its latent space using a novel evolutionary strategy with momentum updates. Our framework is generic because (i) it can employ any underlying generator, e.g. Variational Auto-Encoders (VAEs) or Generative Adversarial Networks (GANs), and (ii) it can be applied to any input data, e.g. images, text samples or tabular data. Since we use a zero-order optimization method, our framework is model-agnostic, in the sense that the machine learning model that we aim to explain is a black-box. We stress out that our novel framework does not require access or knowledge of the internal structure or the training data of the black-box model. We conduct experiments with two generative models, VAEs and GANs, and synthesize exemplars for various data formats, image, text and tabular, demonstrating that our framework is generic. We also employ our prototype synthetization framework on various black-box models, for which we only know the input and the output formats, showing that it is model-agnostic. Moreover, we compare our framework (available at https://github.com/antoniobarbalau/exemplar) with a model-dependent approach based on gradient descent, proving that our framework obtains equally-good exemplars in a shorter computational time.

Submitted 4 August, 2020; v1 submitted 6 June, 2020; originally announced June 2020.

Comments: Accepted at ECML-PKDD 2020

arXiv:2004.10605 [pdf, ps, other]

Self-Supervised Representation Learning on Document Images

Authors: Adrian Cosma, Mihai Ghidoveanu, Michael Panaitescu-Liess, Marius Popescu

Abstract: This work analyses the impact of self-supervised pre-training on document images in the context of document image classification. While previous approaches explore the effect of self-supervision on natural images, we show that patch-based pre-training performs poorly on document images because of their different structural properties and poor intra-sample semantic information. We propose two context-aware alternatives to improve performance on the Tobacco-3482 image classification task. We also propose a novel method for self-supervision, which makes use of the inherent multi-modality of documents (image and text), which performs better than other popular self-supervised methods, including supervised ImageNet pre-training, on document image classification scenarios with a limited amount of data.

Submitted 27 May, 2020; v1 submitted 18 April, 2020; originally announced April 2020.

Comments: 15 pages, 5 figures. Accepted at DAS 2020: IAPR International Workshop on Document Analysis Systems

MSC Class: 68T05

arXiv:1912.02259 [pdf, other]

Extending the Morphological Hit-or-Miss Transform to Deep Neural Networks

Authors: Muhammad Aminul Islam, Bryce Murray, Andrew Buck, Derek T. Anderson, Grant Scott, Mihail Popescu, James Keller

Abstract: While most deep learning architectures are built on convolution, alternative foundations like morphology are being explored for purposes like interpretability and its connection to the analysis and processing of geometric structures. The morphological hit-or-miss operation has the advantage that it takes into account both foreground and background information when evaluating target shape in an image. Herein, we identify limitations in existing hit-or-miss neural definitions and we formulate an optimization problem to learn the transform relative to deeper architectures. To this end, we model the semantically important condition that the intersection of the hit and miss structuring elements (SEs) should be empty and we present a way to express Don't Care (DNC), which is important for denoting regions of an SE that are not relevant to detecting a target pattern. Our analysis shows that convolution, in fact, acts like a hit-or-miss transform through semantic interpretation of its filter differences. On these premises, we introduce an extension that outperforms conventional convolution on benchmark data. Quantitative experiments are provided on synthetic and benchmark data, showing that the direct encoding hit-or-miss transform provides better interpretability on learned shapes consistent with objects, whereas our morphologically inspired generalized convolution yields higher classification accuracy. Last, qualitative hit and miss filter visualizations are provided relative to a single morphological layer.

Submitted 27 September, 2020; v1 submitted 4 December, 2019; originally announced December 2019.
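
For readers unfamiliar with the operation named in the abstract above, the following is a minimal sketch of the classical binary hit-or-miss transform, not the neural extension the paper proposes. The function name `hit_or_miss`, the representation of structuring elements as sets of `(row, column)` offsets, and the choice to treat pixels outside the image as background are all assumptions made for illustration. Offsets that appear in neither set act as Don't Care positions.

```python
def hit_or_miss(image, hit, miss):
    """Classical binary hit-or-miss transform on a 0/1 image (list of rows).

    `hit` and `miss` are sets of (d_row, d_col) offsets relative to the
    pixel under test. Offsets in neither set are Don't Care positions.
    """
    # The hit and miss structuring elements must not overlap.
    if set(hit) & set(miss):
        raise ValueError("hit and miss structuring elements must be disjoint")

    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        # Assumption: everything outside the image counts as background.
        return image[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    return [
        [
            int(
                all(pixel(r + dr, c + dc) == 1 for dr, dc in hit)
                and all(pixel(r + dr, c + dc) == 0 for dr, dc in miss)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Illustrative pattern: a foreground pixel with no 4-connected neighbours.
isolated_hit = {(0, 0)}
isolated_miss = {(-1, 0), (1, 0), (0, -1), (0, 1)}
```

With these two structuring elements, the output marks only foreground pixels whose four direct neighbours are all background; diagonal neighbours are ignored because they belong to neither set.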

arXiv:1905.08847 [pdf, ps, other]

doi 10.1016/j.ascom.2019.100356

Mega-Archive and the EURONEAR Tools for Datamining World Astronomical Images

Authors: Ovidiu Vaduvescu, Lucian Curelaru, Marcel Popescu

Abstract: The world astronomical image archives represent huge opportunities to time-domain astronomy sciences and other hot topics such as space defense, and astronomical observatories should improve this wealth and make it more accessible in the big data era. In 2010 we introduced the Mega-Archive database and the Mega-Precovery server for data mining images containing Solar system bodies, with focus on near Earth asteroids (NEAs). This paper presents the improvements and introduces some new related data mining tools developed during the last five years. Currently, the Mega-Archive has indexed 15 million images available from six major collections (CADC, ESO, ING, LCOGT, NVO and SMOKA) and other instrument archives and surveys. This meta-data index collection has been updated daily (since 2014) by a crawler which performs automated queries of five major collections. Since 2016, these data mining tools have run on the new dedicated EURONEAR server, and the database has migrated to an SQL engine which supports robust and fast queries. To constrain the area to search moving or fixed objects in images taken by large mosaic cameras, we built the graphical tools FindCCD and FindCCD for Fixed Objects which overlay the targets across one of seven mosaic cameras (Subaru-SuprimeCam, VST-OmegaCam, INT-WFC, VISTA-VIRCAM, CFHT-MegaCam, Blanco-DECam and Subaru-HSC), also plotting the uncertainty ellipse for poorly observed NEAs. In 2017 we improved Mega-Precovery, which now offers two options for computing the ephemerides and three options for the input (objects defined by designation, orbit or observations). Additionally, we developed Mega-Archive for Fixed Objects (MASFO) and Mega-Archive Search for Double Stars (MASDS). We believe that the huge potential of science imaging archives is still insufficiently exploited.

Submitted 21 May, 2019; originally announced May 2019.

Comments: Paper submitted to Astronomy and Computing (25 Mar 2019)

Report number: Accepted, available online 12 Dec 2019

Journal ref: Astronomy and Computing, 2019

arXiv:1804.10892 [pdf, other]

doi 10.1109/ACCESS.2019.2917266

Local Learning with Deep and Handcrafted Features for Facial Expression Recognition

Authors: Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Marius Popescu

Abstract: We present an approach that combines automatic features learned by convolutional neural networks (CNN) and handcrafted features computed by the bag-of-visual-words (BOVW) model in order to achieve state-of-the-art results in facial expression recognition. To obtain automatic features, we experiment with multiple CNN architectures, pre-trained models and training procedures, e.g. Dense-Sparse-Dense. After fusing the two types of features, we employ a local learning framework to predict the class label for each test image. The local learning framework is based on three steps. First, a k-nearest neighbors model is applied in order to select the nearest training samples for an input test image. Second, a one-versus-all Support Vector Machines (SVM) classifier is trained on the selected training samples. Finally, the SVM classifier is used to predict the class label only for the test image it was trained for. Although we have used local learning in combination with handcrafted features in our previous work, to the best of our knowledge, local learning has never been employed in combination with deep features. The experiments on the 2013 Facial Expression Recognition (FER) Challenge data set, the FER+ data set and the AffectNet data set demonstrate that our approach achieves state-of-the-art results. With a top accuracy of 75.42% on FER 2013, 87.76% on the FER+, 59.58% on AffectNet 8-way classification and 63.31% on AffectNet 7-way classification, we surpass the state-of-the-art methods by more than 1% on all data sets.

Submitted 12 March, 2020; v1 submitted 29 April, 2018; originally announced April 2018.

Comments: Accepted in IEEE Access

Journal ref: in IEEE Access, vol. 7, pp. 64827-64836, 2019
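
The three local learning steps listed in the abstract above can be sketched in a few lines. This is an illustrative outline only: to stay dependency-free it swaps the one-versus-all SVM for a nearest-centroid classifier, and the function name `local_predict`, the Euclidean distance, and the default `k=3` are assumptions rather than details taken from the paper.

```python
from collections import defaultdict
from math import dist


def local_predict(train_x, train_y, test_x, k=3):
    """Predict a label for one test sample with a locally trained classifier."""
    # Step 1: select the k training samples nearest to the test sample.
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], test_x))
    neighbours = order[:k]

    # Step 2: train a classifier on the selected samples only.
    # Assumption: a nearest-centroid model stands in for the SVM here.
    groups = defaultdict(list)
    for i in neighbours:
        groups[train_y[i]].append(train_x[i])
    centroids = {
        label: [sum(column) / len(points) for column in zip(*points)]
        for label, points in groups.items()
    }

    # Step 3: use that classifier for this one test sample and discard it.
    return min(centroids, key=lambda label: dist(centroids[label], test_x))
```

A new local model is built for every test sample, so the cost at prediction time grows with the number of test samples; keeping `k` small is what makes this practical.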

arXiv:1801.05030 [pdf, other]

Detecting abnormal events in video using Narrowed Normality Clusters

Authors: Radu Tudor Ionescu, Sorina Smeureanu, Marius Popescu, Bogdan Alexe

Abstract: We formulate the abnormal event detection problem as an outlier detection task and we propose a two-stage algorithm based on k-means clustering and one-class Support Vector Machines (SVM) to eliminate outliers. In the feature extraction stage, we propose to augment spatio-temporal cubes with deep appearance features extracted from the last convolutional layer of a pre-trained neural network. After extracting motion and appearance features from the training video containing only normal events, we apply k-means clustering to find clusters representing different types of normal motion and appearance features. In the first stage, we consider that clusters with fewer samples (with respect to a given threshold) contain mostly outliers, and we eliminate these clusters altogether. In the second stage, we shrink the borders of the remaining clusters by training a one-class SVM model on each cluster. To detect abnormal events in the test video, we analyze each test sample and consider its maximum normality score provided by the trained one-class SVM models, based on the intuition that a test sample can belong to only one cluster of normality. If the test sample does not fit well in any narrowed normality cluster, then it is labeled as abnormal. We compare our method with several state-of-the-art methods on three benchmark data sets. The empirical results indicate that our abnormal event detection framework can achieve better results in most cases, while processing the test video in real-time at 24 frames per second on a single CPU.

Submitted 16 November, 2018; v1 submitted 12 January, 2018; originally announced January 2018.

Comments: Accepted at WACV 2019. arXiv admin note: text overlap with arXiv:1705.08182

arXiv:1707.08349 [pdf, other]

Can string kernels pass the test of time in Native Language Identification?

Authors: Radu Tudor Ionescu, Marius Popescu

Abstract: We describe a machine learning approach for the 2017 shared task on Native Language Identification (NLI). The proposed approach combines several kernels using multiple kernel learning. While most of our kernels are based on character p-grams (also known as n-grams) extracted from essays or speech transcripts, we also use a kernel based on i-vectors, a low-dimensional representation of audio recordings, provided by the shared task organizers. For the learning stage, we choose Kernel Discriminant Analysis (KDA) over Kernel Ridge Regression (KRR), because the former classifier obtains better results than the latter one on the development set. In our previous work, we have used a similar machine learning approach to achieve state-of-the-art NLI results. The goal of this paper is to demonstrate that our shallow and simple approach based on string kernels (with minor improvements) can pass the test of time and reach state-of-the-art performance in the 2017 NLI shared task, despite the recent advances in natural language processing. We participated in all three tracks, in which the competitors were allowed to use only the essays (essay track), only the speech transcripts (speech track), or both (fusion track). Using only the data provided by the organizers for training our models, we have reached a macro F1 score of 86.95% in the closed essay track, a macro F1 score of 87.55% in the closed speech track, and a macro F1 score of 93.19% in the closed fusion track. With these scores, our team (UnibucKernel) ranked in the first group of teams in all three tracks, while attaining the best scores in the speech and the fusion tracks.

Submitted 4 August, 2017; v1 submitted 26 July, 2017; originally announced July 2017.

Comments: In Proceedings of the 12th Workshop on Building Educational Applications Using NLP, 2017
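
As a rough illustration of what a kernel over character p-grams looks like, the sketch below implements the p-spectrum kernel, one common string kernel. The abstract does not say which p-gram kernels the authors combine, so this variant, the function names, and the cosine-style normalization are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def pgram_counts(text, p):
    """Count every contiguous character p-gram in `text`."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s, t, p):
    """Sum, over shared p-grams, of the product of their counts in s and t."""
    counts_s, counts_t = pgram_counts(s, p), pgram_counts(t, p)
    return sum(n * counts_t[gram] for gram, n in counts_s.items())


def normalized_kernel(s, t, p):
    """Scale the kernel so that any non-empty string has similarity 1 with itself."""
    denom = sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))
    return spectrum_kernel(s, t, p) / denom if denom else 0.0
```

Because the kernel works directly on raw characters, it needs no tokenization or language-specific preprocessing, which is part of why such approaches are described as shallow and simple.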

arXiv:1705.08280 [pdf, other]

How hard can it be? Estimating the difficulty of visual search in an image

Authors: Radu Tudor Ionescu, Bogdan Alexe, Marius Leordeanu, Marius Popescu, Dim P. Papadopoulos, Vittorio Ferrari

Abstract: We address the problem of estimating image difficulty defined as the human response time for solving a visual search task. We collect human annotations of image difficulty for the PASCAL VOC 2012 data set through a crowd-sourcing platform. We then analyze what human interpretable image properties can have an impact on visual search difficulty, and how accurately those properties predict difficulty. Next, we build a regression model based on deep features learned with state-of-the-art convolutional neural networks and show better results for predicting the ground-truth visual search difficulty scores produced by human annotators. Our model is able to correctly rank about 75% of image pairs according to their difficulty score. We also show that our difficulty predictor generalizes well to new classes not seen during training. Finally, we demonstrate that our predicted difficulty scores are useful for weakly supervised object localization (8% improvement) and semi-supervised object classification (1% improvement).

Submitted 23 May, 2017; originally announced May 2017.

Comments: Published at CVPR 2016

Journal ref: In Proceedings of CVPR, pp. 2157-2166, 2016
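
One plausible way to read the "correctly rank about 75% of image pairs" figure in the abstract above is as a pairwise ranking accuracy: the share of image pairs whose predicted difficulty order matches the ground-truth order. The sketch below computes that quantity; the function name and the decision to skip pairs tied in the ground truth are assumptions, and the paper may define its measure differently.

```python
from itertools import combinations


def pairwise_ranking_accuracy(true_scores, predicted_scores):
    """Fraction of pairs that the predictions order the same way as the truth."""
    correct = total = 0
    for i, j in combinations(range(len(true_scores)), 2):
        true_diff = true_scores[i] - true_scores[j]
        if true_diff == 0:
            continue  # Assumption: ground-truth ties carry no ordering, so skip them.
        total += 1
        pred_diff = predicted_scores[i] - predicted_scores[j]
        if true_diff * pred_diff > 0:  # same sign means same order
            correct += 1
    return correct / total if total else 0.0
```

A predictor that orders pairs at random would score about 0.5 on this measure, which gives a baseline for interpreting a value near 0.75.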

arXiv:1705.08182 [pdf, other]

Unmasking the abnormal events in video

Authors: Radu Tudor Ionescu, Sorina Smeureanu, Bogdan Alexe, Marius Popescu

Abstract: We propose a novel framework for abnormal event detection in video that requires no training sequences. Our framework is based on unmasking, a technique previously used for authorship verification in text documents, which we adapt to our task. We iteratively train a binary classifier to distinguish between two consecutive video sequences while removing at each step the most discriminant features. Higher training accuracy rates of the intermediately obtained classifiers represent abnormal events. To the best of our knowledge, this is the first work to apply unmasking for a computer vision task. We compare our method with several state-of-the-art supervised and unsupervised methods on four benchmark data sets. The empirical results indicate that our abnormal event detection framework can achieve state-of-the-art results, while running in real-time at 20 frames per second.

Submitted 25 July, 2017; v1 submitted 23 May, 2017; originally announced May 2017.

Comments: Accepted at the 2017 International Conference on Computer Vision (ICCV 2017)

arXiv:1509.03591 [pdf]

High Performance Computer Acoustic Data Accelerator: A New System for Exploring Marine Mammal Acoustics for Big Data Applications

Authors: Peter Dugan, John Zollweg, Marian Popescu, Denise Risch, Herve Glotin, Yann LeCun, and Christopher Clark

Abstract: This paper presents a new software model, called DeLMA, designed for distributed sonic signal detection runtime using machine learning algorithms. A new algorithm, the Acoustic Data-mining Accelerator (ADA), is also presented. ADA is a robust yet scalable solution for efficiently processing big sound archives using distributed computing technologies. Together, DeLMA and the ADA algorithm provide a powerful tool currently being used by the Bioacoustics Research Program (BRP) at the Cornell Lab of Ornithology, Cornell University. This paper provides a high-level technical overview of the system, and discusses various aspects of the design. Basic runtime performance and project summary are presented. The DeLMA-ADA baseline performance comparing a desktop serial configuration to a 64 core distributed HPC system shows runtime execution as much as 44 times faster. Performance tests using 48 cores on the HPC show a 9x to 12x efficiency over a 4 core desktop solution. Project summary results for 19 east coast deployments show that the DeLMA-ADA solution has processed over three million channel hours of sound to date.

Submitted 11 September, 2015; originally announced September 2015.

Comments: Seven pages, submitted at International Conference on Machine Learning 2014, Workshop uLearnBio, unsupervised learning for bioacoustic applications

MSC Class: 68-04

arXiv:1307.0414 [pdf, other]

Challenges in Representation Learning: A report on three machine learning contests

Authors: Ian J. Goodfellow, Dumitru Erhan, Pierre Luc Carrier, Aaron Courville, Mehdi Mirza, Ben Hamner, Will Cukierski, Yichuan Tang, David Thaler, Dong-Hyun Lee, Yingbo Zhou, Chetan Ramaiah, Fangxiang Feng, Ruifan Li, Xiaojie Wang, Dimitris Athanasakis, John Shawe-Taylor, Maxim Milakov, John Park, Radu Ionescu, Marius Popescu, Cristian Grozea, James Bergstra, Jingjing Xie, Lukasz Romaszko, et al. (3 additional authors not shown)

Abstract: The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions.

Submitted 1 July, 2013; originally announced July 2013.

Comments: 8 pages, 2 figures

arXiv:1305.3635 [pdf]

Bioacoustic Signal Classification Based on Continuous Region Processing, Grid Masking and Artificial Neural Network

Authors: Mohammad Pourhomayoun, Peter Dugan, Marian Popescu, Christopher Clark

Abstract: In this paper, we develop a novel method based on machine-learning and image processing to identify North Atlantic right whale (NARW) up-calls in the presence of high levels of ambient and interfering noise. We apply a continuous region algorithm on the spectrogram to extract the regions of interest, and then use grid masking techniques to generate a small feature set that is then used in an artificial neural network classifier to identify the NARW up-calls. It is shown that the proposed technique is effective in detecting and capturing even very faint up-calls, in the presence of ambient and interfering noises. The method is evaluated on a dataset recorded in Massachusetts Bay, United States. The dataset includes 20000 sound clips for training, and 10000 sound clips for testing. The results show that the proposed technique can achieve a false positive rate (FPR) of less than 4.5% at a 90% true positive rate.

Submitted 17 June, 2013; v1 submitted 15 May, 2013; originally announced May 2013.

Comments: To be Submitted to "ICML 2013 Workshop on Machine Learning for Bioacoustics", 6 pages, 8 figures

arXiv:1305.3633 [pdf]

Classification for Big Dataset of Bioacoustic Signals Based on Human Scoring System and Artificial Neural Network

Authors: Mohammad Pourhomayoun, Peter Dugan, Marian Popescu, Denise Risch, Hal Lewis, Christopher Clark

Abstract: In this paper, we propose a method to improve sound classification performance by combining signal features, derived from the time-frequency spectrogram, with human perception. The method presented herein exploits an artificial neural network (ANN) and learns the signal features based on the human perception knowledge. The proposed method is applied to a large acoustic dataset containing 24 months of nearly continuous recordings. The results show a significant improvement in performance of the detection-classification system, yielding as much as 20% improvement in true positive rate for a given false positive rate.

Submitted 17 June, 2013; v1 submitted 15 May, 2013; originally announced May 2013.

Comments: To be Submitted to "ICML 2013 Workshop on Machine Learning for Bioacoustics", 6 pages, 4 figures

arXiv:1305.3250 [pdf]

Bioacoustical Periodic Pulse Train Signal Detection and Classification using Spectrogram Intensity Binarization and Energy Projection

Authors: Marian Popescu, Peter J. Dugan, Mohammad Pourhomayoun, Denise Risch, Harold W. Lewis III, Christopher W. Clark

Abstract: The following work outlines an approach for automatic detection and recognition of periodic pulse train signals using a multi-stage process based on spectrogram edge detection, energy projection and classification. The method has been implemented to automatically detect and recognize pulse train songs of minke whales. While the long-term goal of this work is to properly identify and detect minke songs from large multi-year datasets, this effort was developed using sounds off the coast of Massachusetts, in the Stellwagen Bank National Marine Sanctuary. The detection methodology is presented and evaluated on 232 continuous hours of acoustic recordings and a qualitative analysis of machine learning classifiers and their performance is described. The trained automatic detection and classification system is applied to 120 continuous hours, comprising various challenges such as broadband and narrowband noises, low SNR, and other pulse train signatures. This automatic system achieves a TPR of 63% for FPR of 0.6% (or 0.87 FP/h), at a Precision (PPV) of 84% and an F1 score of 71%.

Submitted 28 June, 2013; v1 submitted 14 May, 2013; originally announced May 2013.

Comments: ICML 2013 Workshop on Machine Learning for Bioacoustics, 2013, 6 pages
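
The F1 score quoted in the abstract above is the harmonic mean of precision and recall, where recall is the same quantity as the true positive rate (TPR). The short sketch below shows the calculation; the function name is hypothetical, and the inputs are the rounded percentages from the abstract, so the result only approximates the reported value.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded figures from the abstract: precision 84%, recall (TPR) 63%.
approx_f1 = f1_score(0.84, 0.63)
```

With these rounded inputs the result is about 0.72, which agrees with the reported 71% once the rounding of the published precision and recall is taken into account.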

Showing 1–27 of 27 results for author: Popescu, M