-
Computer Vision Model Compression Techniques for Embedded Systems: A Survey
Authors:
Alexandre Lopes,
Fernando Pereira dos Santos,
Diulhio de Oliveira,
Mauricio Schiezaro,
Helio Pedrini
Abstract:
Deep neural networks have consistently represented the state of the art in most computer vision problems. In these scenarios, larger and more complex models have demonstrated superior performance to smaller architectures, especially when trained with plenty of representative data. With the recent adoption of Vision Transformer (ViT) based architectures and advanced Convolutional Neural Networks (C…
▽ More
Deep neural networks have consistently represented the state of the art in most computer vision problems. In these scenarios, larger and more complex models have demonstrated superior performance to smaller architectures, especially when trained with plenty of representative data. With the recent adoption of Vision Transformer (ViT) based architectures and advanced Convolutional Neural Networks (CNNs), the total number of parameters of leading backbone architectures increased from 62M parameters in 2012 with AlexNet to 7B parameters in 2024 with AIM-7B. Consequently, deploying such deep architectures faces challenges in environments with processing and runtime constraints, particularly in embedded systems. This paper covers the main model compression techniques applied for computer vision tasks, enabling modern models to be used in embedded systems. We present the characteristics of compression subareas, compare different approaches, and discuss how to choose the best technique and expected variations when analyzing it on various embedded devices. We also share codes to assist researchers and new practitioners in overcoming initial implementation challenges for each subarea and present trends for Model Compression. Case studies for compression models are available at \href{https://github.com/venturusbr/cv-model-compression}{https://github.com/venturusbr/cv-model-compression}.
△ Less
Submitted 15 August, 2024;
originally announced August 2024.
-
MyoGestic: EMG Interfacing Framework for Decoding Multiple Spared Degrees of Freedom of the Hand in Individuals with Neural Lesions
Authors:
Raul C. Sîmpetru,
Dominik I. Braun,
Arndt U. Simon,
Michael März,
Vlad Cnejevici,
Daniela Souza de Oliveira,
Nico Weber,
Jonas Walter,
Jörg Franke,
Daniel Höglinger,
Cosima Prahm,
Matthias Ponfick,
Alessandro Del Vecchio
Abstract:
Restoring limb motor function in individuals with spinal cord injury (SCI), stroke, or amputation remains a critical challenge, one which affects millions worldwide. Recent studies show through surface electromyography (EMG) that spared motor neurons can still be voluntarily controlled, even without visible limb movement . These signals can be decoded and used for motor intent estimation; however,…
▽ More
Restoring limb motor function in individuals with spinal cord injury (SCI), stroke, or amputation remains a critical challenge, one which affects millions worldwide. Recent studies show through surface electromyography (EMG) that spared motor neurons can still be voluntarily controlled, even without visible limb movement . These signals can be decoded and used for motor intent estimation; however, current wearable solutions lack the necessary hardware and software for intuitive interfacing of the spared degrees of freedom after neural injuries. To address these limitations, we developed a wireless, high-density EMG bracelet, coupled with a novel software framework, MyoGestic. Our system allows rapid and tailored adaptability of machine learning models to the needs of the users, facilitating real-time decoding of multiple spared distinctive degrees of freedom. In our study, we successfully decoded the motor intent from two participants with SCI, two with spinal stroke , and three amputees in real-time, achieving several controllable degrees of freedom within minutes after wearing the EMG bracelet. We provide a proof-of-concept that these decoded signals can be used to control a digitally rendered hand, a wearable orthosis, a prosthesis, or a 2D cursor. Our framework promotes a participant-centered approach, allowing immediate feedback integration, thus enhancing the iterative development of myocontrol algorithms. The proposed open-source software framework, MyoGestic, allows researchers and patients to focus on the augmentation and training of the spared degrees of freedom after neural lesions, thus potentially bridging the gap between research and clinical application and advancing the development of intuitive EMG interfaces for diverse neural lesions.
△ Less
Submitted 14 August, 2024;
originally announced August 2024.
-
Curio: A Dataflow-Based Framework for Collaborative Urban Visual Analytics
Authors:
Gustavo Moreira,
Maryam Hosseini,
Carolina Veiga,
Lucas Alexandre,
Nicola Colaninno,
Daniel de Oliveira,
Nivan Ferreira,
Marcos Lage,
Fabio Miranda
Abstract:
Over the past decade, several urban visual analytics systems and tools have been proposed to tackle a host of challenges faced by cities, in areas as diverse as transportation, weather, and real estate. Many of these tools have been designed through collaborations with urban experts, aiming to distill intricate urban analysis workflows into interactive visualizations and interfaces. However, the d…
▽ More
Over the past decade, several urban visual analytics systems and tools have been proposed to tackle a host of challenges faced by cities, in areas as diverse as transportation, weather, and real estate. Many of these tools have been designed through collaborations with urban experts, aiming to distill intricate urban analysis workflows into interactive visualizations and interfaces. However, the design, implementation, and practical use of these tools still rely on siloed approaches, resulting in bespoke applications that are difficult to reproduce and extend. At the design level, these tools undervalue rich data workflows from urban experts, typically treating them only as data providers and evaluators. At the implementation level, they lack interoperability with other technical frameworks. At the practical use level, they tend to be narrowly focused on specific fields, inadvertently creating barriers to cross-domain collaboration. To address these gaps, we present Curio, a framework for collaborative urban visual analytics. Curio uses a dataflow model with multiple abstraction levels (code, grammar, GUI elements) to facilitate collaboration across the design and implementation of visual analytics components. The framework allows experts to intertwine data preprocessing, management, and visualization stages while tracking the provenance of code and visualizations. In collaboration with urban experts, we evaluate Curio through a diverse set of usage scenarios targeting urban accessibility, urban microclimate, and sunlight access. These scenarios use different types of data and domain methodologies to illustrate Curio's flexibility in tackling pressing societal challenges. Curio is available at https://urbantk.org/curio.
△ Less
Submitted 12 August, 2024;
originally announced August 2024.
-
A Survey on Cell Nuclei Instance Segmentation and Classification: Leveraging Context and Attention
Authors:
João D. Nunes,
Diana Montezuma,
Domingos Oliveira,
Tania Pereira,
Jaime S. Cardoso
Abstract:
Manually annotating nuclei from the gigapixel Hematoxylin and Eosin (H&E)-stained Whole Slide Images (WSIs) is a laborious and costly task, meaning automated algorithms for cell nuclei instance segmentation and classification could alleviate the workload of pathologists and clinical researchers and at the same time facilitate the automatic extraction of clinically interpretable features. But due t…
▽ More
Manually annotating nuclei from the gigapixel Hematoxylin and Eosin (H&E)-stained Whole Slide Images (WSIs) is a laborious and costly task, meaning automated algorithms for cell nuclei instance segmentation and classification could alleviate the workload of pathologists and clinical researchers and at the same time facilitate the automatic extraction of clinically interpretable features. But due to high intra- and inter-class variability of nuclei morphological and chromatic features, as well as H&E-stains susceptibility to artefacts, state-of-the-art algorithms cannot correctly detect and classify instances with the necessary performance. In this work, we hypothesise context and attention inductive biases in artificial neural networks (ANNs) could increase the generalization of algorithms for cell nuclei instance segmentation and classification. We conduct a thorough survey on context and attention methods for cell nuclei instance segmentation and classification from H&E-stained microscopy imaging, while providing a comprehensive discussion of the challenges being tackled with context and attention. Besides, we illustrate some limitations of current approaches and present ideas for future research. As a case study, we extend both a general instance segmentation and classification method (Mask-RCNN) and a tailored cell nuclei instance segmentation and classification model (HoVer-Net) with context- and attention-based mechanisms, and do a comparative analysis on a multi-centre colon nuclei identification and counting dataset. Although pathologists rely on context at multiple levels while paying attention to specific Regions of Interest (RoIs) when analysing and annotating WSIs, our findings suggest translating that domain knowledge into algorithm design is no trivial task, but to fully exploit these mechanisms, the scientific understanding of these methods should be addressed.
△ Less
Submitted 26 July, 2024;
originally announced July 2024.
-
Cryptocurrency Price Forecasting Using XGBoost Regressor and Technical Indicators
Authors:
Abdelatif Hafid,
Maad Ebrahim,
Ali Alfatemi,
Mohamed Rahouti,
Diogo Oliveira
Abstract:
The rapid growth of the stock market has attracted many investors due to its potential for significant profits. However, predicting stock prices accurately is difficult because financial markets are complex and constantly changing. This is especially true for the cryptocurrency market, which is known for its extreme volatility, making it challenging for traders and investors to make wise and profi…
▽ More
The rapid growth of the stock market has attracted many investors due to its potential for significant profits. However, predicting stock prices accurately is difficult because financial markets are complex and constantly changing. This is especially true for the cryptocurrency market, which is known for its extreme volatility, making it challenging for traders and investors to make wise and profitable decisions. This study introduces a machine learning approach to predict cryptocurrency prices. Specifically, we make use of important technical indicators such as Exponential Moving Average (EMA) and Moving Average Convergence Divergence (MACD) to train and feed the XGBoost regressor model. We demonstrate our approach through an analysis focusing on the closing prices of Bitcoin cryptocurrency. We evaluate the model's performance through various simulations, showing promising results that suggest its usefulness in aiding/guiding cryptocurrency traders and investors in dynamic market conditions.
△ Less
Submitted 16 July, 2024;
originally announced July 2024.
-
Full Iso-recursive Types
Authors:
Litao Zhou,
Qianyong Wan,
Bruno C. d. S. Oliveira
Abstract:
There are two well-known formulations of recursive types: iso-recursive and equi-recursive types. Abadi and Fiore [1996] have shown that iso- and equi-recursive types have the same expressive power. However, their encoding of equi-recursive types in terms of iso-recursive types requires explicit coercions. These coercions come with significant additional computational overhead, and complicate reas…
▽ More
There are two well-known formulations of recursive types: iso-recursive and equi-recursive types. Abadi and Fiore [1996] have shown that iso- and equi-recursive types have the same expressive power. However, their encoding of equi-recursive types in terms of iso-recursive types requires explicit coercions. These coercions come with significant additional computational overhead, and complicate reasoning about the equivalence of the two formulations of recursive types.
This paper proposes a generalization of iso-recursive types called full iso-recursive types. Full iso-recursive types allow encoding all programs with equi-recursive types without computational overhead. Instead of explicit term coercions, all type transformations are captured by computationally irrelevant casts, which can be erased at runtime without affecting the semantics of the program. Consequently, reasoning about the equivalence between the two approaches can be greatly simplified. We present a calculus called $λ^μ_{Fi}$, which extends the simply typed lambda calculus (STLC) with full iso-recursive types. The $λ^μ_{Fi}$ calculus is proved to be type sound, and shown to have the same expressive power as a calculus with equi-recursive types. We also extend our results to subtyping, and show that equi-recursive subtyping can be expressed in terms of iso-recursive subtyping with cast operators.
△ Less
Submitted 7 July, 2024; v1 submitted 30 June, 2024;
originally announced July 2024.
-
The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement
Authors:
Danilo de Oliveira,
Simon Welker,
Julius Richter,
Timo Gerkmann
Abstract:
To obtain improved speech enhancement models, researchers often focus on increasing performance according to specific instrumental metrics. However, when the same metric is used in a loss function to optimize models, it may be detrimental to aspects that the given metric does not see. The goal of this paper is to illustrate the risk of overfitting a speech enhancement model to the metric used for…
▽ More
To obtain improved speech enhancement models, researchers often focus on increasing performance according to specific instrumental metrics. However, when the same metric is used in a loss function to optimize models, it may be detrimental to aspects that the given metric does not see. The goal of this paper is to illustrate the risk of overfitting a speech enhancement model to the metric used for evaluation. For this, we introduce enhancement models that exploit the widely used PESQ measure. Our "PESQetarian" model achieves 3.82 PESQ on VB-DMD while scoring very poorly in a listening experiment. While the obtained PESQ value of 3.82 would imply "state-of-the-art" PESQ-performance on the VB-DMD benchmark, our examples show that when optimizing w.r.t. a metric, an isolated evaluation on the same metric may be misleading. Instead, other metrics should be included in the evaluation and the resulting performance predictions should be confirmed by listening.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges
Authors:
Daniel A. P. Oliveira,
Eugénio Ribeiro,
David Martins de Matos
Abstract:
Creating engaging narratives from visual data is crucial for automated digital media consumption, assistive technologies, and interactive entertainment. This survey covers methodologies used in the generation of these narratives, focusing on their principles, strengths, and limitations.
The survey also covers tasks related to automatic story generation, such as image and video captioning, and vi…
▽ More
Creating engaging narratives from visual data is crucial for automated digital media consumption, assistive technologies, and interactive entertainment. This survey covers methodologies used in the generation of these narratives, focusing on their principles, strengths, and limitations.
The survey also covers tasks related to automatic story generation, such as image and video captioning, and visual question answering, as well as story generation without visual inputs. These tasks share common challenges with visual story generation and have served as inspiration for the techniques used in the field. We analyze the main datasets and evaluation metrics, providing a critical perspective on their limitations.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Contrastive Pretraining for Visual Concept Explanations of Socioeconomic Outcomes
Authors:
Ivica Obadic,
Alex Levering,
Lars Pennig,
Dario Oliveira,
Diego Marcos,
Xiaoxiang Zhu
Abstract:
Predicting socioeconomic indicators from satellite imagery with deep learning has become an increasingly popular research direction. Post-hoc concept-based explanations can be an important step towards broader adoption of these models in policy-making as they enable the interpretation of socioeconomic outcomes based on visual concepts that are intuitive to humans. In this paper, we study the inter…
▽ More
Predicting socioeconomic indicators from satellite imagery with deep learning has become an increasingly popular research direction. Post-hoc concept-based explanations can be an important step towards broader adoption of these models in policy-making as they enable the interpretation of socioeconomic outcomes based on visual concepts that are intuitive to humans. In this paper, we study the interplay between representation learning using an additional task-specific contrastive loss and post-hoc concept explainability for socioeconomic studies. Our results on two different geographical locations and tasks indicate that the task-specific pretraining imposes a continuous ordering of the latent space embeddings according to the socioeconomic outcomes. This improves the model's interpretability as it enables the latent space of the model to associate concepts encoding typical urban and natural area patterns with continuous intervals of socioeconomic outcomes. Further, we illustrate how analyzing the model's conceptual sensitivity for the intervals of socioeconomic outcomes can shed light on new insights for urban studies.
△ Less
Submitted 13 June, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
ProvDeploy: Provenance-oriented Containerization of High Performance Computing Scientific Workflows
Authors:
Liliane Kunstmann,
Débora Pina,
Daniel de Oliveira,
Marta Mattoso
Abstract:
Many existing scientific workflows require High Performance Computing environments to produce results in a timely manner. These workflows have several software library components and use different environments, making the deployment and execution of the software stack not trivial. This complexity increases if the user needs to add provenance data capture services to the workflow. This manuscript i…
▽ More
Many existing scientific workflows require High Performance Computing environments to produce results in a timely manner. These workflows have several software library components and use different environments, making the deployment and execution of the software stack not trivial. This complexity increases if the user needs to add provenance data capture services to the workflow. This manuscript introduces ProvDeploy to assist the user in configuring containers for scientific workflows with integrated provenance data capture. ProvDeploy was evaluated with a Scientific Machine Learning workflow, exploring containerization strategies focused on provenance in two distinct HPC environments
△ Less
Submitted 25 March, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Ontologia para monitorar a deficiência mental em seus déficts no processamento da informação por declínio cognitivo e evitar agressões psicológicas e físicas em ambientes educacionais com ajuda da I.A*
Authors:
Bruna Araújo de Castro Oliveira
Abstract:
The intention of this article is to propose the use of artificial intelligence to detect through analysis by UFO ontology the emergence of verbal and physical aggression related to psychosocial deficiencies and their provoking agents, in an attempt to prevent catastrophic consequences within school environments.
The intention of this article is to propose the use of artificial intelligence to detect through analysis by UFO ontology the emergence of verbal and physical aggression related to psychosocial deficiencies and their provoking agents, in an attempt to prevent catastrophic consequences within school environments.
△ Less
Submitted 31 January, 2024;
originally announced March 2024.
-
Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing
Authors:
Adrian Höhl,
Ivica Obadic,
Miguel Ángel Fernández Torres,
Hiba Najjar,
Dario Oliveira,
Zeynep Akata,
Andreas Dengel,
Xiao Xiang Zhu
Abstract:
In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in Remote Sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the used explainable AI methods and their objectives, findings, and challenges in Remote Sensing applications is still mis…
▽ More
In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in Remote Sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the used explainable AI methods and their objectives, findings, and challenges in Remote Sensing applications is still missing. In this paper, we address this issue by performing a systematic review to identify the key trends of how explainable AI is used in Remote Sensing and shed light on novel explainable AI approaches and emerging directions that tackle specific Remote Sensing challenges. We also reveal the common patterns of explanation interpretation, discuss the extracted scientific insights in Remote Sensing, and reflect on the approaches used for explainable AI methods evaluation. Our review provides a complete summary of the state-of-the-art in the field. Further, we give a detailed outlook on the challenges and promising research directions, representing a basis for novel methodological development and a useful starting point for new researchers in the field of explainable AI in Remote Sensing.
△ Less
Submitted 21 February, 2024;
originally announced February 2024.
-
Are Fact-Checking Tools Helpful? An Exploration of the Usability of Google Fact Check
Authors:
Qiangeng Yang,
Tess Christensen,
Shlok Gilda,
Juliana Fernandes,
Daniela Oliveira,
Ronald Wilson,
Damon Woodard
Abstract:
Fact-checking-specific search engines such as Google Fact Check are a promising way to combat misinformation on social media, especially during significant events such as the COVID-19 pandemic and the U.S. presidential elections, but the usability of such an approach has not been thoroughly studied. We evaluated the performance of Google Fact Check by analyzing the retrieved fact-checking results…
▽ More
Fact-checking-specific search engines such as Google Fact Check are a promising way to combat misinformation on social media, especially during significant events such as the COVID-19 pandemic and the U.S. presidential elections, but the usability of such an approach has not been thoroughly studied. We evaluated the performance of Google Fact Check by analyzing the retrieved fact-checking results regarding 1,000 COVID-19-related false claims and found it able to retrieve the fact-checking results for 15.8% of the input claims, and the results are relatively reliable. We also found that the false claims receiving different fact-checking verdicts (i.e., "False," "Partly False," "True," and "Unratable") tend to reflect diverse emotional tones, and fact-checking sources tend to check the claims in different lengths and using dictionary words to various extents. Claim variations addressing the same issue yet described differently are likely to retrieve distinct fact-checking results. We suggested that the quantities of the retrieved fact-checking results could be optimized and that slightly adjusting input wording may be the best practice for users to retrieve more useful information. This study aims to contribute to the understanding of state-of-the-art fact-checking tools and information integrity.
△ Less
Submitted 15 August, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Evolution of urban areas and land surface temperature
Authors:
Sudipan Saha,
Tushar Verma,
Dario Augusto Borges Oliveira
Abstract:
With the global population on the rise, our cities have been expanding to accommodate the growing number of people. The expansion of cities generally leads to the engulfment of peripheral areas. However, such expansion of urban areas is likely to cause increment in areas with increased land surface temperature (LST). By considering each summer as a data point, we form LST multi-year time-series an…
▽ More
With the global population on the rise, our cities have been expanding to accommodate the growing number of people. The expansion of cities generally leads to the engulfment of peripheral areas. However, such expansion of urban areas is likely to cause increment in areas with increased land surface temperature (LST). By considering each summer as a data point, we form LST multi-year time-series and cluster it to obtain spatio-temporal pattern. We observe several interesting phenomena from these patterns, e.g., some clusters show reasonable similarity to the built-up area, whereas the locations with high temporal variation are seen more in the peripheral areas. Furthermore, the LST center of mass shifts over the years for cities with development activities tilted towards a direction. We conduct the above-mentioned studies for three different cities in three different continents.
△ Less
Submitted 5 January, 2024;
originally announced January 2024.
-
Foundation Models for Generalist Geospatial Artificial Intelligence
Authors:
Johannes Jakubik,
Sujit Roy,
C. E. Phillips,
Paolo Fraccaro,
Denys Godwin,
Bianca Zadrozny,
Daniela Szwarcman,
Carlos Gomes,
Gabby Nyirjesy,
Blair Edwards,
Daiki Kimura,
Naomi Simumba,
Linsong Chu,
S. Karthik Mukkavilli,
Devyani Lambhate,
Kamal Das,
Ranjini Bangalore,
Dario Oliveira,
Michal Muszynski,
Kumar Ankur,
Muthukumaran Ramasubramanian,
Iksha Gurung,
Sam Khallaghi,
Hanxi,
Li
, et al. (8 additional authors not shown)
Abstract:
Significant progress in the development of highly adaptable and reusable Artificial Intelligence (AI) models is expected to have a significant impact on Earth science and remote sensing. Foundation models are pre-trained on large unlabeled datasets through self-supervision, and then fine-tuned for various downstream tasks with small labeled datasets. This paper introduces a first-of-a-kind framewo…
▽ More
Significant progress in the development of highly adaptable and reusable Artificial Intelligence (AI) models is expected to have a significant impact on Earth science and remote sensing. Foundation models are pre-trained on large unlabeled datasets through self-supervision, and then fine-tuned for various downstream tasks with small labeled datasets. This paper introduces a first-of-a-kind framework for the efficient pre-training and fine-tuning of foundational models on extensive geospatial data. We have utilized this framework to create Prithvi, a transformer-based geospatial foundational model pre-trained on more than 1TB of multispectral satellite imagery from the Harmonized Landsat-Sentinel 2 (HLS) dataset. Our study demonstrates the efficacy of our framework in successfully fine-tuning Prithvi to a range of Earth observation tasks that have not been tackled by previous work on foundation models involving multi-temporal cloud gap imputation, flood mapping, wildfire scar segmentation, and multi-temporal crop segmentation. Our experiments show that the pre-trained model accelerates the fine-tuning process compared to leveraging randomly initialized weights. In addition, pre-trained Prithvi compares well against the state-of-the-art, e.g., outperforming a conditional GAN model in multi-temporal cloud imputation by up to 5pp (or 5.7%) in the structural similarity index. Finally, due to the limited availability of labeled data in the field of Earth observation, we gradually reduce the quantity of available labeled data for refining the model to evaluate data efficiency and demonstrate that data can be decreased significantly without affecting the model's accuracy. The pre-trained 100 million parameter model and corresponding fine-tuning workflows have been released publicly as open source contributions to the global Earth sciences community through Hugging Face.
△ Less
Submitted 8 November, 2023; v1 submitted 28 October, 2023;
originally announced October 2023.
-
MCU-Wide Timing Side Channels and Their Detection
Authors:
Johannes Müller,
Anna Lena Duque Antón,
Lucas Deutschmann,
Dino Mehmedagić,
Cristiano Rodrigues,
Daniel Oliveira,
Keerthikumara Devarajegowda,
Mohammad Rahmani Fadiheh,
Sandro Pinto,
Dominik Stoffel,
Wolfgang Kunz
Abstract:
Microarchitectural timing side channels have been thoroughly investigated as a security threat in hardware designs featuring shared buffers (e.g., caches) or parallelism between attacker and victim task execution. However, contradicting common intuitions, recent activities demonstrate that this threat is real even in microcontroller SoCs without such features. In this paper, we describe SoC-wide t…
▽ More
Microarchitectural timing side channels have been thoroughly investigated as a security threat in hardware designs featuring shared buffers (e.g., caches) or parallelism between attacker and victim task execution. However, contradicting common intuitions, recent activities demonstrate that this threat is real even in microcontroller SoCs without such features. In this paper, we describe SoC-wide timing side channels previously neglected by security analysis and present a new formal method to close this gap. In a case study on the RISC-V Pulpissimo SoC, our method detected a vulnerability to a previously unknown attack variant that allows an attacker to obtain information about a victim's memory access behavior. After implementing a conservative fix, we were able to verify that the SoC is now secure w.r.t. the considered class of timing side channels.
△ Less
Submitted 18 July, 2024; v1 submitted 22 September, 2023;
originally announced September 2023.
-
Distilling HuBERT with LSTMs via Decoupled Knowledge Distillation
Authors:
Danilo de Oliveira,
Timo Gerkmann
Abstract:
Much research effort is being applied to the task of compressing the knowledge of self-supervised models, which are powerful, yet large and memory consuming. In this work, we show that the original method of knowledge distillation (and its more recently proposed extension, decoupled knowledge distillation) can be applied to the task of distilling HuBERT. In contrast to methods that focus on distil…
▽ More
Much research effort is being applied to the task of compressing the knowledge of self-supervised models, which are powerful, yet large and memory consuming. In this work, we show that the original method of knowledge distillation (and its more recently proposed extension, decoupled knowledge distillation) can be applied to the task of distilling HuBERT. In contrast to methods that focus on distilling internal features, this allows for more freedom in the network architecture of the compressed model. We thus propose to distill HuBERT's Transformer layers into an LSTM-based distilled model that reduces the number of parameters even below DistilHuBERT and at the same time shows improved performance in automatic speech recognition.
△ Less
Submitted 18 September, 2023;
originally announced September 2023.
-
ProWis: A Visual Approach for Building, Managing, and Analyzing Weather Simulation Ensembles at Runtime
Authors:
Carolina Veiga Ferreira de Souza,
Suzanna Maria Bonnet,
Daniel de Oliveira,
Marcio Cataldi,
Fabio Miranda,
Marcos Lage
Abstract:
Weather forecasting is essential for decision-making and is usually performed using numerical modeling. Numerical weather models, in turn, are complex tools that require specialized training and laborious setup and are challenging even for weather experts. Moreover, weather simulations are data-intensive computations and may take hours to days to complete. When the simulation is finished, the expe…
▽ More
Weather forecasting is essential for decision-making and is usually performed using numerical modeling. Numerical weather models, in turn, are complex tools that require specialized training and laborious setup and are challenging even for weather experts. Moreover, weather simulations are data-intensive computations and may take hours to days to complete. When the simulation is finished, the experts face challenges analyzing its outputs, a large mass of spatiotemporal and multivariate data. From the simulation setup to the analysis of results, working with weather simulations involves several manual and error-prone steps. The complexity of the problem increases exponentially when the experts must deal with ensembles of simulations, a frequent task in their daily duties. To tackle these challenges, we propose ProWis: an interactive and provenance-oriented system to help weather experts build, manage, and analyze simulation ensembles at runtime. Our system follows a human-in-the-loop approach to enable the exploration of multiple atmospheric variables and weather scenarios. ProWis was built in close collaboration with weather experts, and we demonstrate its effectiveness by presenting two case studies of rainfall events in Brazil.
△ Less
Submitted 9 August, 2023;
originally announced August 2023.
-
IRQ Coloring and the Subtle Art of Mitigating Interrupt-generated Interference
Authors:
Diogo Costa,
Luca Cuomo,
Daniel Oliveira,
Ida Maria Savino,
Bruno Morelli,
José Martins,
Alessandro Biasci,
Sandro Pinto
Abstract:
Integrating workloads with differing criticality levels presents a formidable challenge in achieving the stringent spatial and temporal isolation requirements imposed by safety-critical standards such as ISO26262. The shift towards high-performance multicore platforms has been posing increasing issues to the so-called mixed-criticality systems (MCS) due to the reciprocal interference created by co…
▽ More
Integrating workloads with differing criticality levels presents a formidable challenge in achieving the stringent spatial and temporal isolation requirements imposed by safety-critical standards such as ISO26262. The shift towards high-performance multicore platforms has been posing increasing issues to the so-called mixed-criticality systems (MCS) due to the reciprocal interference created by consolidated subsystems vying for access to shared (microarchitectural) resources (e.g., caches, bus interconnect, memory controller). The research community has acknowledged all these challenges. Thus, several techniques, such as cache partitioning and memory throttling, have been proposed to mitigate such interference; however, these techniques have some drawbacks and limitations that impact performance, memory footprint, and availability. In this work, we look from a different perspective. Departing from the observation that safety-critical workloads are typically event- and thus interrupt-driven, we mask "colored" interrupts based on the \ac{QoS} assessment, providing fine-grain control to mitigate interference on critical workloads without entirely suspending non-critical workloads. We propose the so-called IRQ coloring technique. We implement and evaluate the IRQ Coloring on a reference high-performance multicore platform, i.e., Xilinx ZCU102. Results demonstrate negligible performance overhead, i.e., <1% for a 100 microseconds period, and reasonable throughput guarantees for medium-critical workloads. We argue that the IRQ coloring technique presents predictability and intermediate guarantees advantages compared to state-of-art mechanisms
△ Less
Submitted 2 August, 2023;
originally announced August 2023.
-
Kernels, Data & Physics
Authors:
Francesco Cagnetta,
Deborah Oliveira,
Mahalakshmi Sabanayagam,
Nikolaos Tsilivis,
Julia Kempe
Abstract:
Lecture notes from the course given by Professor Julia Kempe at the summer school "Statistical physics of Machine Learning" in Les Houches. The notes discuss the so-called NTK approach to problems in machine learning, which consists of gaining an understanding of generally unsolvable problems by finding a tractable kernel formulation. The notes are mainly focused on practical applications such as…
▽ More
Lecture notes from the course given by Professor Julia Kempe at the summer school "Statistical physics of Machine Learning" in Les Houches. The notes discuss the so-called NTK approach to problems in machine learning, which consists of gaining an understanding of generally unsolvable problems by finding a tractable kernel formulation. The notes are mainly focused on practical applications such as data distillation and adversarial robustness, examples of inductive bias are also discussed.
△ Less
Submitted 5 July, 2023;
originally announced July 2023.
-
On the Behavior of Intrusive and Non-intrusive Speech Enhancement Metrics in Predictive and Generative Settings
Authors:
Danilo de Oliveira,
Julius Richter,
Jean-Marie Lemercier,
Tal Peer,
Timo Gerkmann
Abstract:
Since its inception, the field of deep speech enhancement has been dominated by predictive (discriminative) approaches, such as spectral mapping or masking. Recently, however, novel generative approaches have been applied to speech enhancement, attaining good denoising performance with high subjective quality scores. At the same time, advances in deep learning also allowed for the creation of neur…
▽ More
Since its inception, the field of deep speech enhancement has been dominated by predictive (discriminative) approaches, such as spectral mapping or masking. Recently, however, novel generative approaches have been applied to speech enhancement, attaining good denoising performance with high subjective quality scores. At the same time, advances in deep learning also allowed for the creation of neural network-based metrics, which have desirable traits such as being able to work without a reference (non-intrusively). Since generatively enhanced speech tends to exhibit radically different residual distortions, its evaluation using instrumental speech metrics may behave differently compared to predictively enhanced speech. In this paper, we evaluate the performance of the same speech enhancement backbone trained under predictive and generative paradigms on a variety of metrics and show that intrusive and non-intrusive measures correlate differently for each paradigm. This analysis motivates the search for metrics that can together paint a complete and unbiased picture of speech enhancement performance, irrespective of the model's training process.
△ Less
Submitted 5 June, 2023;
originally announced June 2023.
-
Datasets for Portuguese Legal Semantic Textual Similarity: Comparing weak supervision and an annotation process approaches
Authors:
Daniel da Silva Junior,
Paulo Roberto dos S. Corval,
Aline Paes,
Daniel de Oliveira
Abstract:
The Brazilian judiciary has a large workload, resulting in a long time to finish legal proceedings. Brazilian National Council of Justice has established in Resolution 469/2022 formal guidance for document and process digitalization opening up the possibility of using automatic techniques to help with everyday tasks in the legal field, particularly in a large number of texts yielded on the routine…
▽ More
The Brazilian judiciary has a large workload, resulting in a long time to finish legal proceedings. Brazilian National Council of Justice has established in Resolution 469/2022 formal guidance for document and process digitalization opening up the possibility of using automatic techniques to help with everyday tasks in the legal field, particularly in a large number of texts yielded on the routine of law procedures. Notably, Artificial Intelligence (AI) techniques allow for processing and extracting useful information from textual data, potentially speeding up the process. However, datasets from the legal domain required by several AI techniques are scarce and difficult to obtain as they need labels from experts. To address this challenge, this article contributes with four datasets from the legal domain, two with documents and metadata but unlabeled, and another two labeled with a heuristic aiming at its use in textual semantic similarity tasks. Also, to evaluate the effectiveness of the proposed heuristic label process, this article presents a small ground truth dataset generated from domain expert annotations. The analysis of ground truth labels highlights that semantic analysis of domain text can be challenging even for domain experts. Also, the comparison between ground truth and heuristic labels shows that heuristic labels are useful.
△ Less
Submitted 29 May, 2023;
originally announced June 2023.
-
Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models
Authors:
Danilo de Oliveira,
Navin Raj Prabhu,
Timo Gerkmann
Abstract:
In large part due to their implicit semantic modeling, self-supervised learning (SSL) methods have significantly increased the performance of valence recognition in speech emotion recognition (SER) systems. Yet, their large size may often hinder practical implementations. In this work, we take HuBERT as an example of an SSL model and analyze the relevance of each of its layers for SER. We show tha…
▽ More
In large part due to their implicit semantic modeling, self-supervised learning (SSL) methods have significantly increased the performance of valence recognition in speech emotion recognition (SER) systems. Yet, their large size may often hinder practical implementations. In this work, we take HuBERT as an example of an SSL model and analyze the relevance of each of its layers for SER. We show that shallow layers are more important for arousal recognition while deeper layers are more important for valence. This observation motivates the importance of additional textual information for accurate valence recognition, as the distilled framework lacks the depth of its large-scale SSL teacher. Thus, we propose an audio-textual distilled SSL framework that, while having only ~20% of the trainable parameters of a large SSL model, achieves on par performance across the three emotion dimensions (arousal, valence, dominance) on the MSP-Podcast v1.10 dataset.
△ Less
Submitted 30 May, 2023;
originally announced May 2023.
-
An interpretable machine learning system for colorectal cancer diagnosis from pathology slides
Authors:
Pedro C. Neto,
Diana Montezuma,
Sara P. Oliveira,
Domingos Oliveira,
João Fraga,
Ana Monteiro,
João Monteiro,
Liliana Ribeiro,
Sofia Gonçalves,
Stefan Reinhard,
Inti Zlobec,
Isabel M. Pinto,
Jaime S. Cardoso
Abstract:
Considering the profound transformation affecting pathology practice, we aimed to develop a scalable artificial intelligence (AI) system to diagnose colorectal cancer from whole-slide images (WSI). For this, we propose a deep learning (DL) system that learns from weak labels, a sampling strategy that reduces the number of training samples by a factor of six without compromising performance, an app…
▽ More
Considering the profound transformation affecting pathology practice, we aimed to develop a scalable artificial intelligence (AI) system to diagnose colorectal cancer from whole-slide images (WSI). For this, we propose a deep learning (DL) system that learns from weak labels, a sampling strategy that reduces the number of training samples by a factor of six without compromising performance, an approach to leverage a small subset of fully annotated samples, and a prototype with explainable predictions, active learning features and parallelisation. Noting some problems in the literature, this study is conducted with one of the largest WSI colorectal samples dataset with approximately 10,500 WSIs. Of these samples, 900 are testing samples. Furthermore, the robustness of the proposed method is assessed with two additional external datasets (TCGA and PAIP) and a dataset of samples collected directly from the proposed prototype. Our proposed method predicts, for the patch-based tiles, a class based on the severity of the dysplasia and uses that information to classify the whole slide. It is trained with an interpretable mixed-supervision scheme to leverage the domain knowledge introduced by pathologists through spatial annotations. The mixed-supervision scheme allowed for an intelligent sampling strategy effectively evaluated in several different scenarios without compromising the performance. On the internal dataset, the method shows an accuracy of 93.44% and a sensitivity between positive (low-grade and high-grade dysplasia) and non-neoplastic samples of 0.996. On the external test samples varied with TCGA being the most challenging dataset with an overall accuracy of 84.91% and a sensitivity of 0.996.
△ Less
Submitted 30 April, 2024; v1 submitted 6 January, 2023;
originally announced January 2023.
-
Chronic pain patient narratives allow for the estimation of current pain intensity
Authors:
Diogo A. P. Nunes,
Joana Ferreira-Gomes,
Daniela Oliveira,
Carlos Vaz,
Sofia Pimenta,
Fani Neto,
David Martins de Matos
Abstract:
Chronic pain is a multi-dimensional experience, and pain intensity plays an important part, impacting the patients emotional balance, psychology, and behaviour. Standard self-reporting tools, such as the Visual Analogue Scale for pain, fail to capture this burden. Moreover, this type of tools is susceptible to a degree of subjectivity, dependent on the patients clear understanding of how to use it…
▽ More
Chronic pain is a multi-dimensional experience, and pain intensity plays an important part, impacting the patients emotional balance, psychology, and behaviour. Standard self-reporting tools, such as the Visual Analogue Scale for pain, fail to capture this burden. Moreover, this type of tools is susceptible to a degree of subjectivity, dependent on the patients clear understanding of how to use it, social biases, and their ability to translate a complex experience to a scale. To overcome these and other self-reporting challenges, pain intensity estimation has been previously studied based on facial expressions, electroencephalograms, brain imaging, and autonomic features. However, to the best of our knowledge, it has never been attempted to base this estimation on the patient narratives of the personal experience of chronic pain, which is what we propose in this work. Indeed, in the clinical assessment and management of chronic pain, verbal communication is essential to convey information to physicians that would otherwise not be easily accessible through standard reporting tools, since language, sociocultural, and psychosocial variables are intertwined. We show that language features from patient narratives indeed convey information relevant for pain intensity estimation, and that our computational models can take advantage of that. Specifically, our results show that patients with mild pain focus more on the use of verbs, whilst moderate and severe pain patients focus on adverbs, and nouns and adjectives, respectively, and that these differences allow for the distinction between these three pain classes.
△ Less
Submitted 17 November, 2022; v1 submitted 31 October, 2022;
originally announced October 2022.
-
Exploring Self-Attention for Crop-type Classification Explainability
Authors:
Ivica Obadic,
Ribana Roscher,
Dario Augusto Borges Oliveira,
Xiao Xiang Zhu
Abstract:
Automated crop-type classification using Sentinel-2 satellite time series is essential to support agriculture monitoring. Recently, deep learning models based on transformer encoders became a promising approach for crop-type classification. Using explainable machine learning to reveal the inner workings of these models is an important step towards improving stakeholders' trust and efficient agricu…
▽ More
Automated crop-type classification using Sentinel-2 satellite time series is essential to support agriculture monitoring. Recently, deep learning models based on transformer encoders became a promising approach for crop-type classification. Using explainable machine learning to reveal the inner workings of these models is an important step towards improving stakeholders' trust and efficient agriculture monitoring.
In this paper, we introduce a novel explainability framework that aims to shed a light on the essential crop disambiguation patterns learned by a state-of-the-art transformer encoder model. More specifically, we process the attention weights of a trained transformer encoder to reveal the critical dates for crop disambiguation and use domain knowledge to uncover the phenological events that support the model performance. We also present a sensitivity analysis approach to understand better the attention capability for revealing crop-specific phenological events.
We report compelling results showing that attention patterns strongly relate to key dates, and consequently, to the critical phenological events for crop-type classification. These findings might be relevant for improving stakeholder trust and optimizing agriculture monitoring processes. Additionally, our sensitivity analysis demonstrates the limitation of attention weights for identifying the important events in the crop phenology as we empirically show that the unveiled phenological events depend on the other crops in the data considered during training.
△ Less
Submitted 24 October, 2022;
originally announced October 2022.
-
Improving Data Quality with Training Dynamics of Gradient Boosting Decision Trees
Authors:
Moacir Antonelli Ponti,
Lucas de Angelis Oliveira,
Mathias Esteban,
Valentina Garcia,
Juan Martín Román,
Luis Argerich
Abstract:
Real world datasets contain incorrectly labeled instances that hamper the performance of the model and, in particular, the ability to generalize out of distribution. Also, each example might have different contribution towards learning. This motivates studies to better understanding of the role of data instances with respect to their contribution in good metrics in models. In this paper we propose…
▽ More
Real world datasets contain incorrectly labeled instances that hamper the performance of the model and, in particular, the ability to generalize out of distribution. Also, each example might have different contribution towards learning. This motivates studies to better understanding of the role of data instances with respect to their contribution in good metrics in models. In this paper we propose a method based on metrics computed from training dynamics of Gradient Boosting Decision Trees (GBDTs) to assess the behavior of each training example. We focus on datasets containing mostly tabular or structured data, for which the use of Decision Trees ensembles are still the state-of-the-art in terms of performance. Our methods achieved the best results overall when compared with confident learning, direct heuristics and a robust boosting algorithm. We show results on detecting noisy labels in order clean datasets, improving models' metrics in synthetic and real public datasets, as well as on a industry case in which we deployed a model based on the proposed solution.
△ Less
Submitted 22 February, 2024; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Transfer-learning for video classification: Video Swin Transformer on multiple domains
Authors:
Daniel Oliveira,
David Martins de Matos
Abstract:
The computer vision community has seen a shift from convolutional-based to pure transformer architectures for both image and video tasks. Training a transformer from zero for these tasks usually requires a lot of data and computational resources. Video Swin Transformer (VST) is a pure-transformer model developed for video classification which achieves state-of-the-art results in accuracy and effic…
▽ More
The computer vision community has seen a shift from convolutional-based to pure transformer architectures for both image and video tasks. Training a transformer from zero for these tasks usually requires a lot of data and computational resources. Video Swin Transformer (VST) is a pure-transformer model developed for video classification which achieves state-of-the-art results in accuracy and efficiency on several datasets. In this paper, we aim to understand if VST generalizes well enough to be used in an out-of-domain setting. We study the performance of VST on two large-scale datasets, namely FCVID and Something-Something using a transfer learning approach from Kinetics-400, which requires around 4x less memory than training from scratch. We then break down the results to understand where VST fails the most and in which scenarios the transfer-learning approach is viable. Our experiments show an 85\% top-1 accuracy on FCVID without retraining the whole model which is equal to the state-of-the-art for the dataset and a 21\% accuracy on Something-Something. The experiments also suggest that the performance of the VST decreases on average when the video duration increases which seems to be a consequence of a design choice of the model. From the results, we conclude that VST generalizes well enough to classify out-of-domain videos without retraining when the target classes are from the same type as the classes used to train the model. We observed this effect when we performed transfer-learning from Kinetics-400 to FCVID, where most datasets target mostly objects. On the other hand, if the classes are not from the same type, then the accuracy after the transfer-learning approach is expected to be poor. We observed this effect when we performed transfer-learning from Kinetics-400, where the classes represent mostly objects, to Something-Something, where the classes represent mostly actions.
△ Less
Submitted 18 October, 2022;
originally announced October 2022.
-
Image-Based Detection of Modifications in Gas Pump PCBs with Deep Convolutional Autoencoders
Authors:
Diulhio Candido de Oliveira,
Bogdan Tomoyuki Nassu,
Marco Aurelio Wehrmeister
Abstract:
In this paper, we introduce an approach for detecting modifications in assembled printed circuit boards based on photographs taken without tight control over perspective and illumination conditions. One instance of this problem is the visual inspection of gas pumps PCBs, which can be modified by fraudsters wishing to deceive costumers or evade taxes. Given the uncontrolled environment and the huge…
▽ More
In this paper, we introduce an approach for detecting modifications in assembled printed circuit boards based on photographs taken without tight control over perspective and illumination conditions. One instance of this problem is the visual inspection of gas pumps PCBs, which can be modified by fraudsters wishing to deceive costumers or evade taxes. Given the uncontrolled environment and the huge number of possible modifications, we address the problem as a case of anomaly detection, proposing an approach that is directed towards the characteristics of that scenario, while being well-suited for other similar applications. The proposed approach employs a deep convolutional autoencoder trained to reconstruct images of an unmodified board, but which remains unable to do the same for images showing modifications. By comparing the input image with its reconstruction, it is possible to segment anomalies and modifications in a pixel-wise manner. Experiments performed on a dataset built to represent real-world situations (and which we will make publicly available) show that our approach outperforms other state-of-the-art approaches for anomaly segmentation in the considered scenario, while producing comparable results on the popular MVTec-AD dataset for a more general object anomaly detection task.
△ Less
Submitted 6 October, 2022; v1 submitted 30 September, 2022;
originally announced October 2022.
-
A Systematic Literature Review on the Impact of Formatting Elements on Code Legibility
Authors:
Delano Oliveira,
Reydne Santos,
Fernanda Madeiral,
Hidehiko Masuhara,
Fernando Castor
Abstract:
Context: Software programs can be written in different but functionally equivalent ways. Even though previous research has compared specific formatting elements to find out which alternatives affect code legibility, seeing the bigger picture of what makes code more or less legible is challenging. Goal: We aim to find which formatting elements have been investigated in empirical studies and which a…
▽ More
Context: Software programs can be written in different but functionally equivalent ways. Even though previous research has compared specific formatting elements to find out which alternatives affect code legibility, seeing the bigger picture of what makes code more or less legible is challenging. Goal: We aim to find which formatting elements have been investigated in empirical studies and which alternatives were found to be more legible for human subjects. Method: We conducted a systematic literature review and identified 15 papers containing human-centric studies that directly compared alternative formatting elements. We analyzed and organized these formatting elements using a card-sorting method. Results: We identified 13 formatting elements (e.g., indentation) and 33 levels of formatting elements (e.g., two-space indentation), which are about formatting styles, spacing, block delimiters, long or complex code lines, and word boundary styles. While some levels were found to be statistically better than other equivalent ones in terms of code legibility, e.g., appropriate use of indentation with blocks, others were not, e.g., formatting layout. For identifier style, we found divergent results, where one study found a significant difference in favor of camel case, while another study found a positive result in favor of snake case. Conclusion: The number of identified papers, some of which are outdated, and the many null and contradictory results emphasize the relative lack of work in this area and underline the importance of more research. There is much to be understood about how formatting elements influence code legibility before the creation of guidelines and automated aids to help developers make their code more legible.
△ Less
Submitted 1 June, 2023; v1 submitted 25 August, 2022;
originally announced August 2022.
-
Learning crop type mapping from regional label proportions in large-scale SAR and optical imagery
Authors:
Laura E. C. La Rosa,
Dario A. B. Oliveira,
Pedram Ghamisi
Abstract:
The application of deep learning algorithms to Earth observation (EO) in recent years has enabled substantial progress in fields that rely on remotely sensed data. However, given the data scale in EO, creating large datasets with pixel-level annotations by experts is expensive and highly time-consuming. In this context, priors are seen as an attractive way to alleviate the burden of manual labelin…
▽ More
The application of deep learning algorithms to Earth observation (EO) in recent years has enabled substantial progress in fields that rely on remotely sensed data. However, given the data scale in EO, creating large datasets with pixel-level annotations by experts is expensive and highly time-consuming. In this context, priors are seen as an attractive way to alleviate the burden of manual labeling when training deep learning methods for EO. For some applications, those priors are readily available. Motivated by the great success of contrastive-learning methods for self-supervised feature representation learning in many computer-vision tasks, this study proposes an online deep clustering method using crop label proportions as priors to learn a sample-level classifier based on government crop-proportion data for a whole agricultural region. We evaluate the method using two large datasets from two different agricultural regions in Brazil. Extensive experiments demonstrate that the method is robust to different data types (synthetic-aperture radar and optical images), reporting higher accuracy values considering the major crop types in the target regions. Thus, it can alleviate the burden of large-scale image annotation in EO applications.
△ Less
Submitted 24 August, 2022;
originally announced August 2022.
-
Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge
Authors:
Alef Iury Siqueira Ferreira,
Gustavo dos Reis Oliveira
Abstract:
This paper presents our efforts to build a robust ASR model for the shared task Automatic Speech Recognition for spontaneous and prepared speech & Speech Emotion Recognition in Portuguese (SE&R 2022). The goal of the challenge is to advance the ASR research for the Portuguese language, considering prepared and spontaneous speech in different dialects. Our method consist on fine-tuning an ASR model…
▽ More
This paper presents our efforts to build a robust ASR model for the shared task Automatic Speech Recognition for spontaneous and prepared speech & Speech Emotion Recognition in Portuguese (SE&R 2022). The goal of the challenge is to advance the ASR research for the Portuguese language, considering prepared and spontaneous speech in different dialects. Our method consist on fine-tuning an ASR model in a domain-specific approach, applying gain normalization and selective noise insertion. The proposed method improved over the strong baseline provided on the test set in 3 of the 4 tracks available
△ Less
Submitted 28 July, 2022;
originally announced July 2022.
-
Efficient Transformer-based Speech Enhancement Using Long Frames and STFT Magnitudes
Authors:
Danilo de Oliveira,
Tal Peer,
Timo Gerkmann
Abstract:
The SepFormer architecture shows very good results in speech separation. Like other learned-encoder models, it uses short frames, as they have been shown to obtain better performance in these cases. This results in a large number of frames at the input, which is problematic; since the SepFormer is transformer-based, its computational complexity drastically increases with longer sequences. In this…
▽ More
The SepFormer architecture shows very good results in speech separation. Like other learned-encoder models, it uses short frames, as they have been shown to obtain better performance in these cases. This results in a large number of frames at the input, which is problematic; since the SepFormer is transformer-based, its computational complexity drastically increases with longer sequences. In this paper, we employ the SepFormer in a speech enhancement task and show that by replacing the learned-encoder features with a magnitude short-time Fourier transform (STFT) representation, we can use long frames without compromising perceptual enhancement performance. We obtained equivalent quality and intelligibility evaluation scores while reducing the number of operations by a factor of approximately 8 for a 10-second utterance.
△ Less
Submitted 23 June, 2022;
originally announced June 2022.
-
Learning from Positive and Negative Examples: New Proof for Binary Alphabets
Authors:
Jonas Lingg,
Mateus de Oliveira Oliveira,
Petra Wolf
Abstract:
One of the most fundamental problems in computational learning theory is the the problem of learning a finite automaton $A$ consistent with a finite set $P$ of positive examples and with a finite set $N$ of negative examples. By consistency, we mean that $A$ accepts all strings in $P$ and rejects all strings in $N$. It is well known that this problem is NP-complete. In the literature, it is stated…
▽ More
One of the most fundamental problems in computational learning theory is the the problem of learning a finite automaton $A$ consistent with a finite set $P$ of positive examples and with a finite set $N$ of negative examples. By consistency, we mean that $A$ accepts all strings in $P$ and rejects all strings in $N$. It is well known that this problem is NP-complete. In the literature, it is stated that this NP-hardness holds even in the case of a binary alphabet. As a standard reference for this theorem, the work of Gold from 1978 is either cited or adapted. But as a crucial detail, the work of Gold actually considered Mealy machines and not deterministic finite state automata (DFAs) as they are considered nowadays. As Mealy automata are equipped with an output function, they can be more compact than DFAs which accept the same language. We show that the adaptions of Gold's construction for Mealy machines stated in the literature have some issues and give a new construction for DFAs with a binary alphabet ourselves.
△ Less
Submitted 20 June, 2022;
originally announced June 2022.
-
On the Bug-proneness of Structures Inspired by Functional Programming in JavaScript Projects
Authors:
Fernando Alves,
Delano Oliveira,
Fernanda Madeiral,
Fernando Castor
Abstract:
Language constructs inspired by functional programming have made their way into most mainstream programming languages. Many researchers and developers consider that these constructs lead to programs that are more concise, reusable, and easier to understand. However, few studies investigate the implications of using them in mainstream programming languages. This paper quantifies the prevalence of f…
▽ More
Language constructs inspired by functional programming have made their way into most mainstream programming languages. Many researchers and developers consider that these constructs lead to programs that are more concise, reusable, and easier to understand. However, few studies investigate the implications of using them in mainstream programming languages. This paper quantifies the prevalence of four concepts typically associated with functional programming in JavaScript: recursion, immutability, lazy evaluation, and functions as values. We focus on JavaScript programs due to the availability of some of these concepts in the language since its inception, its inspiration from functional programming languages, and its popularity. We mine 91 GitHub repositories (22+ million LOC) written mostly in JavaScript (over 50% of the code), measuring the usage of these concepts from both static and temporal perspectives. We also measure the likelihood of bug-fixing commits removing uses of these concepts (which would hint at bug-proneness) and their association with the presence of code comments (which would hint at code that is hard to understand). We find that these concepts are in widespread use (1 for every 46.65 LOC, 43.59% of LOC). In addition, the usage of higher-order functions, immutability, and lazy evaluation-related structures has been growing throughout the years for the analyzed projects, while the usage of recursion and callbacks & promises has decreased. We also find statistical evidence that removing these structures, with the exception of the ones associated to immutability, is less common in bug-fixing commits than in other commits. In addition, their presence is not correlated with comment size. Our findings suggest that functional programming concepts are important for developers using a multi-paradigm language, and their usage does not make programs harder to understand.
△ Less
Submitted 17 June, 2022;
originally announced June 2022.
-
Unikernel Linux (UKL)
Authors:
Ali Raza,
Thomas Unger,
Matthew Boyd,
Eric Munson,
Parul Sohal,
Ulrich Drepper,
Richard Jones,
Daniel Bristot de Oliveira,
Larry Woodman,
Renato Mancuso,
Jonathan Appavoo,
Orran Krieger
Abstract:
This paper presents Unikernel Linux (UKL), a path toward integrating unikernel optimization techniques in Linux, a general purpose operating system. UKL adds a configuration option to Linux allowing for a single, optimized process to link with the kernel directly, and run at supervisor privilege. This UKL process does not require application source code modification, only a re-link with our, sligh…
▽ More
This paper presents Unikernel Linux (UKL), a path toward integrating unikernel optimization techniques in Linux, a general purpose operating system. UKL adds a configuration option to Linux allowing for a single, optimized process to link with the kernel directly, and run at supervisor privilege. This UKL process does not require application source code modification, only a re-link with our, slightly modified, Linux kernel and glibc. Unmodified applications show modest performance gains out of the box, and developers can further optimize applications for more significant gains (e.g. 26% throughput improvement for Redis). UKL retains support for co-running multiple user level processes capable of communicating with the UKL process using standard IPC. UKL preserves Linux's battle-tested codebase, community, and ecosystem of tools, applications, and hardware support. UKL runs both on bare-metal and virtual servers and supports multi-core execution. The changes to the Linux kernel are modest (1250 LOC).
△ Less
Submitted 22 June, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
From Width-Based Model Checking to Width-Based Automated Theorem Proving
Authors:
Mateus de Oliveira Oliveira,
Farhad Vadiee
Abstract:
In the field of parameterized complexity theory, the study of graph width measures has been intimately connected with the development of width-based model checking algorithms for combinatorial properties on graphs. In this work, we introduce a general framework to convert a large class of width-based model-checking algorithms into algorithms that can be used to test the validity of graph-theoretic…
▽ More
In the field of parameterized complexity theory, the study of graph width measures has been intimately connected with the development of width-based model checking algorithms for combinatorial properties on graphs. In this work, we introduce a general framework to convert a large class of width-based model-checking algorithms into algorithms that can be used to test the validity of graph-theoretic conjectures on classes of graphs of bounded width. Our framework is modular and can be applied with respect to several well-studied width measures for graphs, including treewidth and cliquewidth.
As a quantitative application of our framework, we show that for several long-standing graph-theoretic conjectures, there exists an algorithm that takes a number $k$ as input and correctly determines in time double-exponential in $k^{O(1)}$ whether the conjecture is valid on all graphs of treewidth at most $k$. This improves significantly on upper bounds obtained using previously available techniques.
△ Less
Submitted 29 May, 2022; v1 submitted 22 May, 2022;
originally announced May 2022.
-
Direct Foundations for Compositional Programming
Authors:
Andong Fan,
Xuejing Huang,
Han Xu,
Yaozhu Sun,
Bruno C. d. S. Oliveira
Abstract:
The recently proposed CP language adopts Compositional Programming: a new modular programming style that solves challenging problems such as the Expression Problem. CP is implemented on top of a polymorphic core language with disjoint intersection types called Fi+. The semantics of Fi+ employs an elaboration to a target language and relies on a sophisticated proof technique to prove the coherence…
▽ More
The recently proposed CP language adopts Compositional Programming: a new modular programming style that solves challenging problems such as the Expression Problem. CP is implemented on top of a polymorphic core language with disjoint intersection types called Fi+. The semantics of Fi+ employs an elaboration to a target language and relies on a sophisticated proof technique to prove the coherence of the elaboration. Unfortunately, the proof technique is technically challenging and hard to scale to many common features, including recursion or impredicative polymorphism. Thus, the original formulation of Fi+ does not support the two later features, which creates a gap between theory and practice, since CP fundamentally relies on them.
This paper presents a new formulation of Fi+ based on a type-directed operational semantics (TDOS). The TDOS approach was recently proposed to model the semantics of languages with disjoint intersection types (but without polymorphism). Our work shows that the TDOS approach can be extended to languages with disjoint polymorphism and model the full Fi+ calculus. Unlike the elaboration semantics, which gives the semantics to Fi+ indirectly via a target language, the TDOS approach gives a semantics to Fi+ directly. With a TDOS, there is no need for a coherence proof. Instead, we can simply prove that the semantics is deterministic. The proof of determinism only uses simple reasoning techniques, such as straightforward induction, and is able to handle problematic features such as recursion and impredicative polymorphism. This removes the gap between theory and practice and validates the original proofs of correctness for CP. We formalized the TDOS variant of the Fi+ calculus and all its proofs in the Coq proof assistant.
△ Less
Submitted 12 May, 2022;
originally announced May 2022.
-
QuFI: a Quantum Fault Injector to Measure the Reliability of Qubits and Quantum Circuits
Authors:
Daniel Oliveira,
Edoardo Giusto,
Emanuele Dri,
Nadir Casciola,
Betis Baheri,
Qiang Guan,
Bartolomeo Montrucchio,
Paolo Rech
Abstract:
Quantum computing is a new technology that is expected to revolutionize the computation paradigm in the next few years. Qubits exploit the quantum physics proprieties to increase the parallelism and speed of computation. Unfortunately, besides being intrinsically noisy, qubits have also been shown to be highly susceptible to external sources of faults, such as ionizing radiation. The latest discov…
▽ More
Quantum computing is a new technology that is expected to revolutionize the computation paradigm in the next few years. Qubits exploit the quantum physics proprieties to increase the parallelism and speed of computation. Unfortunately, besides being intrinsically noisy, qubits have also been shown to be highly susceptible to external sources of faults, such as ionizing radiation. The latest discoveries highlight a much higher radiation sensitivity of qubits than traditional transistors and identify a much more complex fault model than bit-flip. We propose a framework to identify the quantum circuits sensitivity to radiation-induced faults and the probability for a fault in a qubit to propagate to the output. Based on the latest studies and radiation experiments performed on real quantum machines, we model the transient faults in a qubit as a phase shift with a parametrized magnitude. Additionally, our framework can inject multiple qubit faults, tuning the phase shift magnitude based on the proximity of the qubit to the particle strike location. As we show in the paper, the proposed fault injector is highly flexible, and it can be used on both quantum circuit simulators and real quantum machines. We report the finding of more than 285M injections on the Qiskit simulator and 53K injections on real IBM machines. We consider three quantum algorithms and identify the faults and qubits that are more likely to impact the output. We also consider the fault propagation dependence on the circuit scale, showing that the reliability profile for some quantum algorithms is scale-dependent, with increased impact from radiation-induced faults as we increase the number of qubits. Finally, we also consider multi qubits faults, showing that they are much more critical than single faults. The fault injector and the data presented in this paper are available in a public repository to allow further analysis.
△ Less
Submitted 14 March, 2022;
originally announced March 2022.
-
More Software Analytics Patterns: Broad-Spectrum Diagnostic and Embedded Improvements
Authors:
Duarte Oliveira,
João Fidalgo,
Joelma Choma,
Eduardo Guerra,
Filipe Correia
Abstract:
Software analytics is a data-driven approach to decision making, which allows software practitioners to leverage valuable insights from data about software to achieve higher development process productivity and improve different aspects of software quality. In previous work, a set of patterns for adopting a lean software analytics process was identified through a literature review. This paper pres…
▽ More
Software analytics is a data-driven approach to decision making, which allows software practitioners to leverage valuable insights from data about software to achieve higher development process productivity and improve different aspects of software quality. In previous work, a set of patterns for adopting a lean software analytics process was identified through a literature review. This paper presents two patterns to add to the original set, forming a pattern language for adopting software analytics practices that aims to inform decision-making activities of software practitioners. The writing of these two patterns was informed by the solutions employed in the context of two case studies on software analytics practices, and the patterns were further validated by searching for their occurrence in the literature. The pattern Broad-Spectrum Diagnostic proposes to conduct more broad analysis based on common metrics when the team does not have the expertise to understand the kind of problems that software analytics can help to solve; and the pattern Embedded Improvements suggests adding improvement tasks as part of other routine activities.
△ Less
Submitted 10 January, 2022;
originally announced January 2022.
-
Gait Recognition Based on Deep Learning: A Survey
Authors:
Claudio Filipi Gonçalves dos Santos,
Diego de Souza Oliveira,
Leandro A. Passos,
Rafael Gonçalves Pires,
Daniel Felipe Silva Santos,
Lucas Pascotti Valem,
Thierry P. Moreira,
Marcos Cleison S. Santana,
Mateus Roder,
João Paulo Papa,
Danilo Colombo
Abstract:
In general, biometry-based control systems may not rely on individual expected behavior or cooperation to operate appropriately. Instead, such systems should be aware of malicious procedures for unauthorized access attempts. Some works available in the literature suggest addressing the problem through gait recognition approaches. Such methods aim at identifying human beings through intrinsic perce…
▽ More
In general, biometry-based control systems may not rely on individual expected behavior or cooperation to operate appropriately. Instead, such systems should be aware of malicious procedures for unauthorized access attempts. Some works available in the literature suggest addressing the problem through gait recognition approaches. Such methods aim at identifying human beings through intrinsic perceptible features, despite dressed clothes or accessories. Although the issue denotes a relatively long-time challenge, most of the techniques developed to handle the problem present several drawbacks related to feature extraction and low classification rates, among other issues. However, deep learning-based approaches recently emerged as a robust set of tools to deal with virtually any image and computer-vision related problem, providing paramount results for gait recognition as well. Therefore, this work provides a surveyed compilation of recent works regarding biometric detection through gait recognition with a focus on deep learning approaches, emphasizing their benefits, and exposing their weaknesses. Besides, it also presents categorized and characterized descriptions of the datasets, approaches, and architectures employed to tackle associated constraints.
△ Less
Submitted 10 January, 2022;
originally announced January 2022.
-
A Systematic Methodology to Compute the Quantum Vulnerability Factors for Quantum Circuits
Authors:
Daniel Oliveira,
Edoardo Giusto,
Betis Baheri,
Qiang Guan,
Bartolomeo Montrucchio,
Paolo Rech
Abstract:
Quantum computing is one of the most promising technology advances of the latest years. Once only a conceptual idea to solve physics simulations, quantum computation is today a reality, with numerous machines able to execute quantum algorithms. One of the hardest challenges in quantum computing is reliability. Qubits are highly sensitive to noise, which can make the output useless. Moreover, latel…
▽ More
Quantum computing is one of the most promising technology advances of the latest years. Once only a conceptual idea to solve physics simulations, quantum computation is today a reality, with numerous machines able to execute quantum algorithms. One of the hardest challenges in quantum computing is reliability. Qubits are highly sensitive to noise, which can make the output useless. Moreover, lately it has been shown that superconducting qubits are extremely susceptible to external sources of faults, such as ionizing radiation. When adopted in large scale, radiation-induced errors are expected to become a serious challenge for qubits reliability. In this paper, we propose an evaluation of the impact of transient faults in the execution of quantum circuits. Inspired by the Architectural and Program Vulnerability Factors, widely adopted to characterize the reliability of classical computing architectures and algorithms, we propose the Quantum Vulnerability Factor (QVF) as a metric to measure the impact that the corruption of a qubit has on the circuit output probability distribution. First, we model faults based on the latest studies on real machines and recently performed radiation experiments. Then, we design a quantum fault injector, built over Qiskit, and characterize the propagation of faults in quantum circuits. We report the finding of more than 15,000,000 fault injections, evaluating the reliability of three quantum circuits and identifying the faults and qubits that are more likely than others to impact the output. With our results, we give guidelines on how to map the qubits in the real quantum computer to reduce the output error and to reduce the probability of having a radiation-induced corruption to modify the output. Finally, we compare the simulation results with experiments on physical quantum computers.
△ Less
Submitted 13 November, 2021;
originally announced November 2021.
-
On the Complexity of Intersection Non-emptiness for Star-Free Language Classes
Authors:
Emmanuel Arrighi,
Henning Fernau,
Stefan Hoffmann,
Markus Holzer,
Ismaël Jecker,
Mateus de Oliveira Oliveira,
Petra Wolf
Abstract:
In the Intersection Non-Emptiness problem, we are given a list of finite automata $A_1,A_2,\dots,A_m$ over a common alphabet $Σ$ as input, and the goal is to determine whether some string $w\in Σ^*$ lies in the intersection of the languages accepted by the automata in the list. We analyze the complexity of the Intersection Non-Emptiness problem under the promise that all input automata accept a la…
▽ More
In the Intersection Non-Emptiness problem, we are given a list of finite automata $A_1,A_2,\dots,A_m$ over a common alphabet $Σ$ as input, and the goal is to determine whether some string $w\in Σ^*$ lies in the intersection of the languages accepted by the automata in the list. We analyze the complexity of the Intersection Non-Emptiness problem under the promise that all input automata accept a language in some level of the dot-depth hierarchy, or some level of the Straubing-Thérien hierarchy. Automata accepting languages from the lowest levels of these hierarchies arise naturally in the context of model checking. We identify a dichotomy in the dot-depth hierarchy by showing that the problem is already NP-complete when all input automata accept languages of the levels zero or one half and already PSPACE-hard when all automata accept a language from the level one. Conversely, we identify a tetrachotomy in the Straubing-Thérien hierarchy. More precisely, we show that the problem is in AC$^0$ when restricted to level zero; complete for LOGSPACE or NLOGSPACE, depending on the input representation, when restricted to languages in the level one half; NP-complete when the input is given as DFAs accepting a language in from level one or three half; and finally, PSPACE-complete when the input automata accept languages in level two or higher. Moreover, we show that the proof technique used to show containment in NP for DFAs accepting languages in the Straubing-Thérien hierarchy levels one ore three half does not generalize to the context of NFAs. To prove this, we identify a family of languages that provide an exponential separation between the state complexity of general NFAs and that of partially ordered NFAs. To the best of our knowledge, this is the first superpolynomial separation between these two models of computation.
△ Less
Submitted 4 October, 2021;
originally announced October 2021.
-
Evaluating Code Readability and Legibility: An Examination of Human-centric Studies
Authors:
Delano Oliveira,
Reydne Bruno,
Fernanda Madeiral,
Fernando Castor
Abstract:
Reading code is an essential activity in software maintenance and evolution. Several studies with human subjects have investigated how different factors, such as the employed programming constructs and naming conventions, can impact code readability, i.e., what makes a program easier or harder to read and apprehend by developers, and code legibility, i.e., what influences the ease of identifying e…
▽ More
Reading code is an essential activity in software maintenance and evolution. Several studies with human subjects have investigated how different factors, such as the employed programming constructs and naming conventions, can impact code readability, i.e., what makes a program easier or harder to read and apprehend by developers, and code legibility, i.e., what influences the ease of identifying elements of a program. These studies evaluate readability and legibility by means of different comprehension tasks and response variables. In this paper, we examine these tasks and variables in studies that compare programming constructs, coding idioms, naming conventions, and formatting guidelines, e.g., recursive vs. iterative code. To that end, we have conducted a systematic literature review where we found 54 relevant papers. Most of these studies evaluate code readability and legibility by measuring the correctness of the subjects' results (83.3%) or simply asking their opinions (55.6%). Some studies (16.7%) rely exclusively on the latter variable.There are still few studies that monitor subjects' physical signs, such as brain activation regions (5%). Moreover, our study shows that some variables are multi-faceted. For instance, correctness can be measured as the ability to predict the output of a program, answer questions about its behavior, or recall parts of it. These results make it clear that different evaluation approaches require different competencies from subjects, e.g., tracing the program vs. summarizing its goal vs. memorizing its text. To assist researchers in the design of new studies and improve our comprehension of existing ones, we model program comprehension as a learning activity by adapting a preexisting learning taxonomy. This adaptation indicates that some competencies are often exercised in these evaluations whereas others are rarely targeted.
△ Less
Submitted 2 October, 2021;
originally announced October 2021.
-
Recommending Code Understandability Improvements based on Code Reviews
Authors:
Delano Oliveira
Abstract:
Developers spend 70% of their time understanding code. Code that is easy to read can save time, while hard-to-read code can lead to the introduction of bugs. However, it is difficult to establish what makes code more understandable. Although there are guides and directives on improving code understandability, in some contexts, these practices can have a detrimental effect. Practical software devel…
▽ More
Developers spend 70% of their time understanding code. Code that is easy to read can save time, while hard-to-read code can lead to the introduction of bugs. However, it is difficult to establish what makes code more understandable. Although there are guides and directives on improving code understandability, in some contexts, these practices can have a detrimental effect. Practical software development projects often employ code review to improve code quality, including understandability. Reviewers are often senior developers who have contributed extensively to projects and have an in-depth understanding of the impacts of different solutions on code understandability. This paper is an early research proposal to recommend code understandability improvements based on code reviewer knowledge. The core of the proposal comprises a dataset of code understandability improvements extracted from code reviews. This dataset will serve as a basis to train machine learning systems to recommend understandability improvements.
△ Less
Submitted 2 October, 2021;
originally announced October 2021.
-
Toward Reusable Science with Readable Code and Reproducibility
Authors:
Layan Bahaidarah,
Ethan Hung,
Andreas F. De Melo Oliveira,
Jyotsna Penumaka,
Lukas Rosario,
Ana Trisovic
Abstract:
An essential part of research and scientific communication is researchers' ability to reproduce the results of others. While there have been increasing standards for authors to make data and code available, many of these files are hard to re-execute in practice, leading to a lack of research reproducibility. This poses a major problem for students and researchers in the same field who cannot lever…
▽ More
An essential part of research and scientific communication is researchers' ability to reproduce the results of others. While there have been increasing standards for authors to make data and code available, many of these files are hard to re-execute in practice, leading to a lack of research reproducibility. This poses a major problem for students and researchers in the same field who cannot leverage the previously published findings for study or further inquiry. To address this, we propose an open-source platform named RE3 that helps improve the reproducibility and readability of research projects involving R code. Our platform incorporates assessing code readability with a machine learning model trained on a code readability survey and an automatic containerization service that executes code files and warns users of reproducibility errors. This process helps ensure the reproducibility and readability of projects and therefore fast-track their verification and reuse.
△ Less
Submitted 21 September, 2021;
originally announced September 2021.
-
A Tutorial on Trusted and Untrusted non-3GPP Accesses in 5G Systems -- First Steps Towards a Unified Communications Infrastructure
Authors:
Mario Teixeira Lemes,
Antonio Marcos Alberti,
Cristiano Bonato Both,
Antonio C. de Oliveira Jr.,
Kleber Vieira Cardoso
Abstract:
Fifth-generation (5G) systems are designed to enable convergent access-agnostic service availability. This means that 5G services will be available over 5G New Radio air interface and also through other non-Third Generation Partnership Project (3GPP) access networks, e.g., IEEE 802.11 (Wi-Fi). 3GPP has recently published the Release 16 that includes trusted non-3GPP access network concept and wire…
▽ More
Fifth-generation (5G) systems are designed to enable convergent access-agnostic service availability. This means that 5G services will be available over 5G New Radio air interface and also through other non-Third Generation Partnership Project (3GPP) access networks, e.g., IEEE 802.11 (Wi-Fi). 3GPP has recently published the Release 16 that includes trusted non-3GPP access network concept and wireless wireline convergence. The main goal of this tutorial is to present an overview of access to 5G core via non-3GPP access networks specified by 3GPP until Release 16 (i.e., untrusted, trusted, and wireline access). The tutorial describes aspects of the convergence of a 5G system and these non-3GPP access networks, such as the authentication and authorization procedures and the data session establishment from the point of view of protocol stack and exchanged messages between the network functions. In order to illustrate several concepts and part of 3GPP specification, we present a basic but fully operational implementation of untrusted non-3GPP access using WLAN. We perform experiments that demonstrate how a Wi-Fi user is authorized in a 5G core and establishes user plane connectivity to a data network. Moreover, we evaluate the performance of this access in terms of time consumed, number of messages, and protocol overhead to established data sessions.
△ Less
Submitted 11 November, 2022; v1 submitted 18 September, 2021;
originally announced September 2021.
-
On the Compression of Neural Networks Using $\ell_0$-Norm Regularization and Weight Pruning
Authors:
Felipe Dennis de Resende Oliveira,
Eduardo Luiz Ortiz Batista,
Rui Seara
Abstract:
Despite the growing availability of high-capacity computational platforms, implementation complexity still has been a great concern for the real-world deployment of neural networks. This concern is not exclusively due to the huge costs of state-of-the-art network architectures, but also due to the recent push towards edge intelligence and the use of neural networks in embedded applications. In thi…
▽ More
Despite the growing availability of high-capacity computational platforms, implementation complexity still has been a great concern for the real-world deployment of neural networks. This concern is not exclusively due to the huge costs of state-of-the-art network architectures, but also due to the recent push towards edge intelligence and the use of neural networks in embedded applications. In this context, network compression techniques have been gaining interest due to their ability for reducing deployment costs while keeping inference accuracy at satisfactory levels. The present paper is dedicated to the development of a novel compression scheme for neural networks. To this end, a new form of $\ell_0$-norm-based regularization is firstly developed, which is capable of inducing strong sparseness in the network during training. Then, targeting the smaller weights of the trained network with pruning techniques, smaller yet highly effective networks can be obtained. The proposed compression scheme also involves the use of $\ell_2$-norm regularization to avoid overfitting as well as fine tuning to improve the performance of the pruned network. Experimental results are presented aiming to show the effectiveness of the proposed scheme as well as to make comparisons with competing approaches.
△ Less
Submitted 18 December, 2023; v1 submitted 10 September, 2021;
originally announced September 2021.
-
Second-Order Finite Automata
Authors:
Alexsander Andrade de Melo,
Mateus de Oliveira Oliveira
Abstract:
Traditionally, finite automata theory has been used as a framework for the representation of possibly infinite sets of strings. In this work, we introduce the notion of second-order finite automata, a formalism that combines finite automata with ordered decision diagrams, with the aim of representing possibly infinite {\em sets of sets} of strings. Our main result states that second-order finite a…
▽ More
Traditionally, finite automata theory has been used as a framework for the representation of possibly infinite sets of strings. In this work, we introduce the notion of second-order finite automata, a formalism that combines finite automata with ordered decision diagrams, with the aim of representing possibly infinite {\em sets of sets} of strings. Our main result states that second-order finite automata can be canonized with respect to the second-order languages they represent. Using this canonization result, we show that sets of sets of strings represented by second-order finite automata are closed under the usual Boolean operations, such as union, intersection, difference and even under a suitable notion of complementation. Additionally, emptiness of intersection and inclusion are decidable.
We provide two algorithmic applications for second-order automata. First, we show that several width/size minimization problems for deterministic and nondeterministic ODDs are solvable in fixed-parameter tractable time when parameterized by the width of the input ODD. In particular, our results imply FPT algorithms for corresponding width/size minimization problems for ordered binary decision diagrams (OBDDs) with a fixed variable ordering. Previously, only algorithms that take exponential time in the size of the input OBDD were known for width minimization, even for OBDDs of constant width. Second, we show that for each $k$ and $w$ one can count the number of distinct functions computable by ODDs of width at most $w$ and length $k$ in time $h(|Σ|,w)\cdot k^{O(1)}$, for a suitable $h:\mathbb{N}\times \mathbb{N}\rightarrow \mathbb{N}$. This improves exponentially on the time necessary to explicitly enumerate all such functions, which is exponential in both the width parameter $w$ and in the length $k$ of the ODDs.
△ Less
Submitted 29 August, 2021;
originally announced August 2021.
-
Early-exit deep neural networks for distorted images: providing an efficient edge offloading
Authors:
Roberto G. Pacheco,
Fernanda D. V. R. Oliveira,
Rodrigo S. Couto
Abstract:
Edge offloading for deep neural networks (DNNs) can be adaptive to the input's complexity by using early-exit DNNs. These DNNs have side branches throughout their architecture, allowing the inference to end earlier in the edge. The branches estimate the accuracy for a given input. If this estimated accuracy reaches a threshold, the inference ends on the edge. Otherwise, the edge offloads the infer…
▽ More
Edge offloading for deep neural networks (DNNs) can be adaptive to the input's complexity by using early-exit DNNs. These DNNs have side branches throughout their architecture, allowing the inference to end earlier in the edge. The branches estimate the accuracy for a given input. If this estimated accuracy reaches a threshold, the inference ends on the edge. Otherwise, the edge offloads the inference to the cloud to process the remaining DNN layers. However, DNNs for image classification deals with distorted images, which negatively impact the branches' estimated accuracy. Consequently, the edge offloads more inferences to the cloud. This work introduces expert side branches trained on a particular distortion type to improve robustness against image distortion. The edge detects the distortion type and selects appropriate expert branches to perform the inference. This approach increases the estimated accuracy on the edge, improving the offloading decisions. We validate our proposal in a realistic scenario, in which the edge offloads DNN inference to Amazon EC2 instances.
△ Less
Submitted 25 August, 2021; v1 submitted 20 August, 2021;
originally announced August 2021.