-
Waterfall: Framework for Robust and Scalable Text Watermarking
Authors:
Gregory Kang Ruey Lau,
Xinyuan Niu,
Hieu Dao,
Jiangwei Chen,
Chuan-Sheng Foo,
Bryan Kian Hsiang Low
Abstract:
Protecting intellectual property (IP) of text such as articles and code is increasingly important, especially as sophisticated attacks become possible, such as paraphrasing by large language models (LLMs) or even unauthorized training of LLMs on copyrighted text to infringe such IP. However, existing text watermarking methods are neither robust enough against such attacks nor scalable to millions of users for practical implementation. In this paper, we propose Waterfall, the first training-free framework for robust and scalable text watermarking applicable across multiple text types (e.g., articles, code) and languages supportable by LLMs, for general text and LLM data provenance. Waterfall comprises several key innovations, such as being the first to use LLMs as paraphrasers for watermarking along with a novel combination of techniques that are surprisingly effective in achieving robust verifiability and scalability. We empirically demonstrate that Waterfall achieves significantly better scalability, robust verifiability, and computational efficiency compared to SOTA article-text watermarking methods, and also show how it can be directly applied to the watermarking of code.
Submitted 5 July, 2024;
originally announced July 2024.
-
Data-Centric AI in the Age of Large Language Models
Authors:
Xinyi Xu,
Zhaoxuan Wu,
Rui Qiao,
Arun Verma,
Yao Shu,
Jingtan Wang,
Xinyuan Niu,
Zhenfeng He,
Jiangwei Chen,
Zijian Zhou,
Gregory Kang Ruey Lau,
Hieu Dao,
Lucas Agussurja,
Rachael Hwee Ling Sim,
Xiaoqiang Lin,
Wenyang Hu,
Zhongxiang Dai,
Pang Wei Koh,
Bryan Kian Hsiang Low
Abstract:
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionately little attention from the research community. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization. In each scenario, we underscore the importance of data, highlight promising research directions, and articulate the potential impacts on the research community and, where applicable, society as a whole. For instance, we advocate for a suite of data-centric benchmarks tailored to the scale and complexity of data for LLMs. These benchmarks can be used to develop new data curation methods and document research efforts and results, which can help promote openness and transparency in AI and LLM research.
Submitted 20 June, 2024;
originally announced June 2024.
-
The Prompt Report: A Systematic Survey of Prompting Techniques
Authors:
Sander Schulhoff,
Michael Ilie,
Nishant Balepur,
Konstantine Kahadze,
Amanda Liu,
Chenglei Si,
Yinheng Li,
Aayush Gupta,
HyoJung Han,
Sevien Schulhoff,
Pranav Sandeep Dulepet,
Saurav Vidyadhara,
Dayeon Ki,
Sweta Agrawal,
Chau Pham,
Gerson Kroiz,
Feileen Li,
Hudson Tao,
Ashay Srivastava,
Hevander Da Costa,
Saloni Gupta,
Megan L. Rogers,
Inna Goncearenco,
Giuseppe Sarli,
Igor Galynker,
et al. (6 additional authors not shown)
Abstract:
Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a prompt due to the area's nascency. This paper establishes a structured understanding of prompts by assembling a taxonomy of prompting techniques and analyzing their use. We present a comprehensive vocabulary of 33 terms, a taxonomy of 58 text-only prompting techniques, and 40 techniques for other modalities. We further present a meta-analysis of the entire literature on natural language prefix-prompting.
Submitted 14 July, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Deep Learning Calabi-Yau four folds with hybrid and recurrent neural network architectures
Authors:
H. L. Dao
Abstract:
In this work, we report the results of applying deep learning based on hybrid convolutional-recurrent and purely recurrent neural network architectures to the dataset of almost one million complete intersection Calabi-Yau four-folds (CICY4) to machine-learn their four Hodge numbers $h^{1,1}, h^{2,1}, h^{3,1}, h^{2,2}$. In particular, we explored and experimented with twelve different neural network models, nine of which are convolutional-recurrent (CNN-RNN) hybrids with the RNN unit being either GRU (Gated Recurrent Unit) or Long Short Term Memory (LSTM). The remaining four models are purely recurrent neural networks based on LSTM. In terms of the $h^{1,1}, h^{2,1}, h^{3,1}, h^{2,2}$ prediction accuracies, at 72% training ratio, our best performing individual model is CNN-LSTM-400, a hybrid CNN-LSTM with an LSTM hidden size of 400, which obtained 99.74%, 98.07%, 95.19%, and 81.01%. Our second best performing individual model is LSTM-448, an LSTM-based model with a hidden size of 448, which obtained 99.74%, 97.51%, 94.24%, and 78.63%. These results were improved by forming ensembles of the top two, three or even four models. Our best ensemble, consisting of the top four models, achieved accuracies of 99.84%, 98.71%, 96.26%, and 85.03%. At 80% training ratio, the top two performing models, LSTM-448 and LSTM-424, are both LSTM-based with hidden sizes of 448 and 424. Compared with the 72% training ratio, there is a significant improvement in accuracies, which reached 99.85%, 98.66%, 96.26%, and 84.77% for the best individual model and 99.90%, 99.03%, 97.97%, and 87.34% for the best ensemble.
Submitted 3 June, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
Asyn2F: An Asynchronous Federated Learning Framework with Bidirectional Model Aggregation
Authors:
Tien-Dung Cao,
Nguyen T. Vuong,
Thai Q. Le,
Hoang V. N. Dao,
Tram Truong-Huu
Abstract:
In federated learning, the models can be trained synchronously or asynchronously. Many research works have focused on developing an aggregation method for the server to aggregate multiple local models into the global model with improved performance. They ignore the heterogeneity of the training workers, which causes the delay in the training of the local models, leading to the obsolete information issue. In this paper, we design and develop Asyn2F, an Asynchronous Federated learning Framework with bidirectional model aggregation. By bidirectional model aggregation, Asyn2F, on one hand, allows the server to asynchronously aggregate multiple local models and results in a new global model. On the other hand, it allows the training workers to aggregate the new version of the global model into the local model, which is being trained even in the middle of a training epoch. We develop Asyn2F considering the practical implementation requirements such as using cloud services for model storage and message queuing protocols for communications. Extensive experiments with different datasets show that the models trained by Asyn2F achieve higher performance compared to the state-of-the-art techniques. The experiments also demonstrate the effectiveness, practicality, and scalability of Asyn2F, making it ready for deployment in real scenarios.
Submitted 3 March, 2024;
originally announced March 2024.
-
Chronicles of CI/CD: A Deep Dive into its Usage Over Time
Authors:
Hugo da Gião,
André Flores,
Rui Pereira,
Jácome Cunha
Abstract:
DevOps is a combination of methodologies and tools that improves the software development, build, deployment, and monitoring processes by shortening its lifecycle and improving software quality. Part of this process is CI/CD, which embodies mostly the first parts, right up to the deployment. Despite the many benefits of DevOps and CI/CD, it still presents many challenges promoted by the tremendous proliferation of different tools, languages, and syntaxes, which makes the field quite challenging to learn and keep up to date. Software repositories contain data regarding various software practices, tools, and uses. This data can help gather multiple insights that inform technical and academic decision-making. GitHub is currently the most popular software hosting platform and provides a search API that lets users query its repositories. Our goal with this paper is to gain insights into the technologies developers use for CI/CD by analyzing GitHub repositories. Using a list of the state-of-the-art CI/CD technologies, we use the GitHub search API to find repositories using each of these technologies. We also use the API to extract various insights regarding those repositories. We then organize and analyze the data collected. From our analysis, we provide an overview of the use of CI/CD technologies today, as well as of how that use has changed over the last 12 years. We also show that developers use several technologies simultaneously in the same project and that changing between technologies is quite common. From these insights, we find several research paths, from how to support the use of multiple technologies, in terms of both techniques and human-computer interaction, to aiding developers in evolving their CI/CD pipelines, again considering the various dimensions of the problem.
Submitted 27 February, 2024;
originally announced February 2024.
-
Auto Tuning for OpenMP Dynamic Scheduling applied to FWI
Authors:
Felipe H. S. da Silva,
João B. Fernandes,
Idalmis M. Sardina,
Tiago Barros,
Samuel Xavier-de-Souza,
Italo A. S. Assis
Abstract:
Because Full Waveform Inversion (FWI) works with a massive amount of data, its execution requires much time and computational resources, being restricted to large-scale computer systems such as supercomputers. Techniques such as FWI adapt well to parallel computing and can be parallelized in shared memory systems using the application programming interface (API) OpenMP. The management of parallel tasks can be performed through loop schedulers contained in OpenMP. The dynamic scheduler stands out for distributing fixed-size chunks of iterations to idle processing cores at runtime. It can better adapt to FWI, where data processing can be irregular. However, the relationship between the chunk size and the runtime is unknown. Optimization techniques can employ meta-heuristics to explore the parameter search space, avoiding testing all possible solutions. Here, we propose a strategy to use the Parameter Auto Tuning for Shared Memory Algorithms (PATSMA), with Coupled Simulated Annealing (CSA) as its optimization method, to automatically adjust the chunk size for the dynamic scheduling of wave propagation, one of the most expensive steps in FWI. Since testing each candidate chunk size in the complete FWI is impractical, our approach consists of running PATSMA where the objective function is the runtime of the first time iteration of the first seismic shot of the first FWI iteration. The resulting chunk size is then employed in all wave propagations involved in an FWI. We conducted tests to measure the runtime of an FWI using the proposed autotuning, varying the problem size and running on different computational environments, such as supercomputers and cloud computing instances. The results show that applying the proposed autotuning in an FWI reduces its runtime by up to 70.46% compared to standard OpenMP schedulers.
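The chunk-size trade-off described in this abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' method: it models dynamic scheduling in plain Python instead of OpenMP, the per-chunk `overhead` parameter and the function names are hypothetical, and an exhaustive search over a short candidate list stands in for the CSA metaheuristic used by PATSMA.

```python
import heapq


def simulate_dynamic_schedule(costs, n_workers, chunk_size, overhead=0.0):
    """Return the simulated makespan of a dynamically scheduled loop.

    Iterations are cut into consecutive chunks of `chunk_size`; each chunk is
    handed to whichever worker becomes idle first. `overhead` is a hypothetical
    per-chunk dispatch cost (an assumption, not a measured OpenMP quantity).
    """
    if chunk_size < 1 or n_workers < 1:
        raise ValueError("chunk_size and n_workers must be >= 1")
    # Min-heap holding the time at which each worker becomes idle.
    idle_at = [0.0] * n_workers
    heapq.heapify(idle_at)
    for start in range(0, len(costs), chunk_size):
        chunk_cost = sum(costs[start:start + chunk_size]) + overhead
        earliest = heapq.heappop(idle_at)
        heapq.heappush(idle_at, earliest + chunk_cost)
    return max(idle_at)


def best_chunk_size(costs, n_workers, candidates, overhead=0.0):
    """Pick the candidate chunk size with the smallest simulated makespan.

    Exhaustive search over `candidates`; a stand-in for a metaheuristic.
    """
    return min(
        candidates,
        key=lambda c: simulate_dynamic_schedule(costs, n_workers, c, overhead),
    )


if __name__ == "__main__":
    # Eight equally expensive iterations shared by two workers.
    costs = [1] * 8
    for chunk in (1, 4, 8):
        print(chunk, simulate_dynamic_schedule(costs, 2, chunk, overhead=0.5))
```

Small chunks balance the load but pay the dispatch overhead more often, while large chunks leave workers idle. That tension is why the best chunk size has to be searched for instead of fixed in advance.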
Submitted 26 February, 2024;
originally announced February 2024.
-
PATSMA: Parameter Auto-tuning for Shared Memory Algorithms
Authors:
Joao B. Fernandes,
Felipe H. S. da Silva,
Samuel Xavier-de-Souza,
Italo A. S. Assis
Abstract:
Programs with high levels of complexity often face challenges in adjusting execution parameters, particularly when these parameters vary based on the execution context. These dynamic parameters, such as loop granularity, significantly impact the program's performance and can vary depending on factors like the execution environment, program input, or the choice of compiler. Given the expensive nature of testing each case individually, one viable solution is to automate parameter adjustments using optimization methods. This article introduces PATSMA, a parameter auto-tuning tool that leverages Coupled Simulated Annealing (CSA) and Nelder-Mead (NM) optimization methods to fine-tune existing parameters in an application. We demonstrate how auto-tuning can contribute to the real-time optimization of parallel algorithms designed for shared memory systems. PATSMA is a C++ library readily available under the MIT license.
Submitted 14 June, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
A Surrogate-Assisted Extended Generative Adversarial Network for Parameter Optimization in Free-Form Metasurface Design
Authors:
Manna Dai,
Yang Jiang,
Feng Yang,
Joyjit Chattoraj,
Yingzhi Xia,
Xinxing Xu,
Weijiang Zhao,
My Ha Dao,
Yong Liu
Abstract:
Metasurfaces have widespread applications in fifth-generation (5G) microwave communication. Among the metasurface family, free-form metasurfaces excel in achieving intricate spectral responses compared to regular-shape counterparts. However, conventional numerical methods for free-form metasurfaces are time-consuming and demand specialized expertise. Alternatively, recent studies demonstrate that deep learning has great potential to accelerate and refine metasurface designs. Here, we present XGAN, an extended generative adversarial network (GAN) with a surrogate for high-quality free-form metasurface designs. The proposed surrogate provides a physical constraint to XGAN so that XGAN can accurately generate metasurfaces monolithically from input spectral responses. In comparative experiments involving 20000 free-form metasurface designs, XGAN achieves 0.9734 average accuracy and is 500 times faster than the conventional methodology. This method facilitates the metasurface library building for specific spectral responses and can be extended to various inverse design problems, including optical metamaterials, nanophotonic devices, and drug discovery.
Submitted 18 October, 2023;
originally announced January 2024.
-
Skin cancer diagnosis using NIR spectroscopy data of skin lesions in vivo using machine learning algorithms
Authors:
Flavio P. Loss,
Pedro H. da Cunha,
Matheus B. Rocha,
Madson Poltronieri Zanoni,
Leandro M. de Lima,
Isadora Tavares Nascimento,
Isabella Rezende,
Tania R. P. Canuto,
Luciana de Paula Vieira,
Renan Rossoni,
Maria C. S. Santos,
Patricia Lyra Frasson,
Wanderson Romão,
Paulo R. Filgueiras,
Renato A. Krohling
Abstract:
Skin lesions are classified as benign or malignant. Among the malignant ones, melanoma is a very aggressive cancer and the major cause of death. Early diagnosis of skin cancer is therefore highly desirable. In the last few years, there has been a growing interest in computer aided diagnosis (CAD) using mostly image and clinical data of the lesion. These sources of information present limitations due to their inability to provide information about the molecular structure of the lesion. NIR spectroscopy may provide an alternative source of information for automated CAD of skin lesions. The most commonly used techniques and classification algorithms in spectroscopy are Principal Component Analysis (PCA), Partial Least Squares - Discriminant Analysis (PLS-DA), and Support Vector Machines (SVM). Nonetheless, there is a growing interest in applying the modern techniques of machine and deep learning (MDL) to spectroscopy. One of the main limitations in applying MDL to spectroscopy is the lack of public datasets. Since there is no public dataset of NIR spectral data for skin lesions, as far as we know, an effort has been made and a new dataset named NIR-SC-UFES has been collected, annotated and analyzed, generating the gold standard for classification of NIR spectral data for skin cancer. Next, the machine learning algorithms XGBoost, CatBoost, LightGBM, and 1D-convolutional neural network (1D-CNN) were investigated to classify cancer and non-cancer skin lesions. Experimental results indicate that the best performance was obtained by LightGBM with pre-processing using standard normal variate (SNV) and feature extraction, providing values of 0.839 for balanced accuracy, 0.851 for recall, 0.852 for precision, and 0.850 for F-score. The obtained results represent the first steps in CAD of skin lesions, aiming at the automated triage of patients with skin lesions in vivo using NIR spectral data.
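Two quantities named in this abstract, the standard normal variate (SNV) pre-processing step and balanced accuracy, have simple textbook definitions, sketched below in plain Python. This is illustrative only and is not the authors' pipeline: the function names are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) for SNV is an assumption, since the abstract does not specify it.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)  # sample standard deviation (n - 1); an assumption
    if sigma == 0:
        raise ValueError("a constant spectrum cannot be SNV-scaled")
    return [(x - mu) / sigma for x in spectrum]


def balanced_accuracy(y_true, y_pred):
    """Balanced accuracy: the unweighted mean of per-class recall."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of the samples whose true label is `cls`.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy absorbance values, not real NIR data.
    print(snv([0.2, 0.4, 0.9, 0.5]))
    print(balanced_accuracy([1, 1, 1, 0], [1, 1, 0, 0]))
```

Because SNV is applied to each spectrum independently, it needs no statistics from the training set. Balanced accuracy weights each class equally, which matters when cancer and non-cancer samples are imbalanced.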
Submitted 2 January, 2024;
originally announced January 2024.
-
Exploring Crowd Dynamics: Simulating Structured Behaviors through Crowd Simulation Models
Authors:
Thiago Gomes Vidal de Mello,
Matheus Schreiner Homrich da Silva,
Gabriel Fonseca Silva,
Soraia Raupp Musse
Abstract:
This paper proposes the simulation of structured behaviors in a crowd of virtual agents by extending the BioCrowds simulation model. Three behaviors were simulated and evaluated: a queue as a generic case and two specific behaviors observed at rock concerts. The extended model incorporates new parameters and modifications to replicate these behaviors accurately. Experiments were conducted to analyze the impact of parameters on simulation results, and computational performance was considered. The results demonstrate the model's effectiveness in simulating structured behaviors and its potential for replicating complex social phenomena in diverse scenarios.
Submitted 11 December, 2023;
originally announced December 2023.
-
Detecting Events in Crowds Through Changes in Geometrical Dimensions of Pedestrians
Authors:
Matheus Schreiner Homrich da Silva,
Paulo Brossard de Souza Pinto Neto,
Rodolfo Migon Favaretto,
Soraia Raupp Musse
Abstract:
Security is an important topic in our contemporary world, and the ability to automate the detection of any events of interest that can take place in a crowd is of great interest to the general population. We hypothesize that the detection of events in videos is correlated with significant changes in pedestrian behaviors. In this paper, we examine three different scenarios of crowd behavior, covering both cases where an event triggers a change in the behavior of the crowd and two video sequences where the crowd and its motion remain mostly unchanged. With both the videos and the tracking of the individual pedestrians (performed in a pre-processing phase), we use GeoMind, software we developed to extract significant data about the scene, in particular the geometrical features, personalities, and emotions of each person. We then examine the output, seeking a significant change in the way each person acts as a function of time that could be used as a basis to identify events or to model realistic crowd actions. When applied to the games area, our method can use the detected events to find patterns that can then be used in agent simulation. Results indicate that our hypothesis seems valid in the sense that the visually observed events could be automatically detected using GeoMind.
Submitted 11 December, 2023;
originally announced December 2023.
-
Generalizable Neural Physics Solvers by Baldwinian Evolution
Authors:
Jian Cheng Wong,
Chin Chun Ooi,
Abhishek Gupta,
Pao-Hsiung Chiu,
Joshua Shao Zheng Low,
My Ha Dao,
Yew-Soon Ong
Abstract:
Physics-informed neural networks (PINNs) are at the forefront of scientific machine learning, making possible the creation of machine intelligence that is cognizant of physical laws and able to accurately simulate them. In this paper, the potential of discovering PINNs that generalize over an entire family of physics tasks is studied, for the first time, through a biological lens of the Baldwin effect. Drawing inspiration from the neurodevelopment of precocial species that have evolved to learn, predict and react quickly to their environment, we envision PINNs that are pre-wired with connection strengths inducing strong biases towards efficient learning of physics. To this end, evolutionary selection pressure (guided by proficiency over a family of tasks) is coupled with lifetime learning (to specialize on a smaller subset of those tasks) to produce PINNs that demonstrate fast and physics-compliant prediction capabilities across a range of empirically challenging problem instances. The Baldwinian approach achieves an order of magnitude improvement in prediction accuracy at a fraction of the computation cost compared to state-of-the-art results with PINNs meta-learned by gradient descent. This paper marks a leap forward in the meta-learning of PINNs as generalizable physics solvers.
Submitted 5 December, 2023;
originally announced December 2023.
-
Instance Segmentation under Occlusions via Location-aware Copy-Paste Data Augmentation
Authors:
Son Nguyen,
Mikel Lainsa,
Hung Dao,
Daeyoung Kim,
Giang Nguyen
Abstract:
Occlusion is a long-standing problem in computer vision, particularly in instance segmentation. ACM MMSports 2023 DeepSportRadar has introduced a dataset that focuses on segmenting human subjects within a basketball context and a specialized evaluation metric for occlusion scenarios. Given the modest size of the dataset and the highly deformable nature of the objects to be segmented, this challenge demands the application of robust data augmentation techniques and wisely chosen deep learning architectures. Our work (ranked 1st in the competition) first proposes a novel data augmentation technique, capable of generating more training samples with a wider distribution. Then, we adopt a new architecture, the Hybrid Task Cascade (HTC) framework with CBNetV2 as backbone and a MaskIoU head, to improve segmentation performance. Furthermore, we employ a Stochastic Weight Averaging (SWA) training strategy to improve the model's generalization. As a result, we achieve a remarkable occlusion score (OM) of 0.533 on the challenge dataset, securing the top-1 position on the leaderboard. Source code is available at https://github.com/nguyendinhson-kaist/MMSports23-Seg-AutoID.
Submitted 21 November, 2023; v1 submitted 27 October, 2023;
originally announced October 2023.
-
Types, equations, dimensions and the Pi theorem
Authors:
Nicola Botta,
Patrik Jansson,
Guilherme Horta Alvares Da Silva
Abstract:
The languages of mathematical physics and modelling are endowed with a rich "grammar of dimensions" that common abstractions of programming languages fail to represent. We propose a dependently typed domain-specific language (embedded in Idris) that captures this grammar. We apply it to explain basic notions of dimensional analysis and Buckingham's Pi theorem. We hope that the language makes mathematical physics more accessible to computer scientists and functional programming more palatable to modelers and physicists.
Submitted 4 September, 2023; v1 submitted 16 August, 2023;
originally announced August 2023.
-
ISP meets Deep Learning: A Survey on Deep Learning Methods for Image Signal Processing
Authors:
Matheus Henrique Marques da Silva,
Jhessica Victoria Santos da Silva,
Rodrigo Reis Arrais,
Wladimir Barroso Guedes de Araújo Neto,
Leonardo Tadeu Lopes,
Guilherme Augusto Bileki,
Iago Oliveira Lima,
Lucas Borges Rondon,
Bruno Melo de Souza,
Mayara Costa Regazio,
Rodolfo Coelho Dalapicola,
Claudio Filipi Gonçalves dos Santos
Abstract:
The entire Image Signal Processor (ISP) of a camera relies on several processes to transform the data from the Color Filter Array (CFA) sensor, such as demosaicing, denoising, and enhancement. These processes can be executed either in hardware or in software. In recent years, Deep Learning has emerged as one solution for some of them, or even as a way to replace the entire ISP with a single neural network. In this work, we investigate several recent pieces of research in this area and provide a deeper analysis and comparison among them, including results and possible points of improvement for future researchers.
Submitted 23 May, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Improving Items and Contexts Understanding with Descriptive Graph for Conversational Recommendation
Authors:
Huy Dao,
Dung D. Le,
Cuong Chu
Abstract:
State-of-the-art methods for conversational recommender systems (CRS) leverage external knowledge to enhance both items' and contextual words' representations to achieve high quality recommendations and response generation. However, the representations of the items and words are usually modeled in two separate semantic spaces, which leads to a misalignment issue between them. Consequently, this will cause the CRS to only achieve a sub-optimal ranking performance, especially when there is a lack of sufficient information from the user's input. To address the limitations of previous works, we propose a new CRS framework, KLEVER, which jointly models items and their associated contextual words in the same semantic space. Particularly, we construct an item descriptive graph from the items' rich textual features, such as item description and categories. Based on the constructed descriptive graph, KLEVER jointly learns the embeddings of the words and items, towards enhancing both recommender and dialog generation modules. Extensive experiments on CRS benchmark data demonstrate that KLEVER achieves superior performance, especially when the information from the users' responses is lacking.
Submitted 11 April, 2023;
originally announced April 2023.
-
LSA-PINN: Linear Boundary Connectivity Loss for Solving PDEs on Complex Geometry
Authors:
Jian Cheng Wong,
Pao-Hsiung Chiu,
Chinchun Ooi,
My Ha Dao,
Yew-Soon Ong
Abstract:
We present a novel loss formulation for efficient learning of complex dynamics from governing physics, typically described by partial differential equations (PDEs), using physics-informed neural networks (PINNs). In our experiments, existing versions of PINNs are seen to learn poorly in many problems, especially for complex geometries, as it becomes increasingly difficult to establish an appropriate sampling strategy in the near-boundary region. Overly dense sampling can impede training convergence if the local gradient behaviors are too complex to be adequately modelled by PINNs. On the other hand, if the samples are too sparse, existing PINNs tend to overfit the near-boundary region, leading to incorrect solutions. To prevent such issues, we propose a new Boundary Connectivity (BCXN) loss function which provides a linear local structure approximation (LSA) to the gradient behaviors at the boundary for PINNs. Our BCXN-loss implicitly imposes local structure during training, thus facilitating fast physics-informed learning across entire problem domains with an order of magnitude sparser training samples. This LSA-PINN method shows errors a few orders of magnitude smaller than existing methods in terms of the standard L2-norm metric, while using dramatically fewer training samples and iterations. Our proposed LSA-PINN does not impose any requirement on the differentiability of the networks, and we demonstrate its benefits and ease of implementation on both the multi-layer perceptron and convolutional neural network versions commonly used in the current PINN literature.
Submitted 2 March, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Graph Neural Network Based Surrogate Model of Physics Simulations for Geometry Design
Authors:
Jian Cheng Wong,
Chin Chun Ooi,
Joyjit Chattoraj,
Lucas Lestandi,
Guoying Dong,
Umesh Kizhakkinan,
David William Rosen,
Mark Hyunpong Jhon,
My Ha Dao
Abstract:
Computational Intelligence (CI) techniques have shown great potential as surrogate models of expensive physics simulations, with a demonstrated ability to make fast predictions, albeit at the expense of accuracy in some cases. For many scientific and engineering problems involving geometrical design, it is desirable for the surrogate models to precisely describe the change in geometry and predict the consequences. In that context, we develop graph neural networks (GNNs) as fast surrogate models for physics simulation, which allow us to directly train the models on 2/3D geometry designs that are represented by an unstructured mesh or point cloud, without the need for any explicit or hand-crafted parameterization. We utilize an encoder-processor-decoder-type architecture which can flexibly make predictions at both the node level and the graph level. The performance of our proposed GNN-based surrogate model is demonstrated on 2 example applications: feature designs in the domain of additive engineering and airfoil design in the domain of aerodynamics. The models show good accuracy in their predictions on a separate set of test geometries after training, with almost instant prediction speeds, as compared to O(hour) for the high-fidelity simulations required otherwise.
Submitted 1 February, 2023;
originally announced February 2023.
-
Robustness of Physics-Informed Neural Networks to Noise in Sensor Data
Authors:
Jian Cheng Wong,
Pao-Hsiung Chiu,
Chin Chun Ooi,
My Ha Dao
Abstract:
Physics-Informed Neural Networks (PINNs) have been shown to be an effective way of incorporating physics-based domain knowledge into neural network models for many important real-world systems. They have been particularly effective as a means of inferring system information based on data, even in cases where data is scarce. Most of the current work however assumes the availability of high-quality data. In this work, we further conduct a preliminary investigation of the robustness of physics-informed neural networks to the magnitude of noise in the data. Interestingly, our experiments reveal that the inclusion of physics in the neural network is sufficient to negate the impact of noise in data originating from hypothetical low quality sensors with high signal-to-noise ratios of up to 1. The resultant predictions for this test case are seen to still match the predictive value obtained for equivalent data obtained from high-quality sensors with potentially 10x less noise. This further implies the utility of physics-informed neural network modeling for making sense of data from sensor networks in the future, especially with the advent of Industry 4.0 and the increasing trend towards ubiquitous deployment of low-cost sensors which are typically noisier.
Submitted 22 November, 2022;
originally announced November 2022.
-
From Disfluency Detection to Intent Detection and Slot Filling
Authors:
Mai Hoang Dao,
Thinh Hung Truong,
Dat Quoc Nguyen
Abstract:
We present the first empirical study investigating the influence of disfluency detection on the downstream tasks of intent detection and slot filling. We perform this study for Vietnamese -- a low-resource language that has no previous study as well as no public dataset available for disfluency detection. First, we extend the fluent Vietnamese intent detection and slot filling dataset PhoATIS by manually adding contextual disfluencies and annotating them. Then, we conduct experiments using strong baselines for disfluency detection and joint intent detection and slot filling, which are based on pre-trained language models. We find that: (i) disfluencies have negative effects on the performance of the downstream intent detection and slot filling tasks, and (ii) in the disfluency context, the pre-trained multilingual language model XLM-R helps produce better intent detection and slot filling performance than the pre-trained monolingual language model PhoBERT, which is the opposite of what is generally found in the fluency context.
Submitted 17 September, 2022;
originally announced September 2022.
-
Towards Immediate Feedback for Security Relevant Code in Development Environments
Authors:
Markus Haug,
Ana Cristina Franco Da Silva,
Stefan Wagner
Abstract:
Nowadays, the correct use of cryptography libraries is essential to ensure the necessary information security in different kinds of applications. A common practice in software development is the use of static application security testing (SAST) tools to analyze code regarding security vulnerabilities. Most of these tools are designed to run separately from development environments. Their results are extensive lists of security notifications, which software developers have to inspect manually in a time-consuming follow-up step. To support developers in their tasks of developing secure code, we present an approach for providing them with continuous immediate feedback of SAST tools in integrated development environments (IDEs). Our approach also considers the understandability of security notifications and aims for a user-centered approach that leverages developers' feedback to build an adaptive system tailored to each individual developer.
Submitted 7 July, 2022;
originally announced July 2022.
-
AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider
Authors:
C. Fanelli,
Z. Papandreou,
K. Suresh,
J. K. Adkins,
Y. Akiba,
A. Albataineh,
M. Amaryan,
I. C. Arsene,
C. Ayerbe Gayoso,
J. Bae,
X. Bai,
M. D. Baker,
M. Bashkanov,
R. Bellwied,
F. Benmokhtar,
V. Berdnikov,
J. C. Bernauer,
F. Bock,
W. Boeglin,
M. Borysova,
E. Brash,
P. Brindza,
W. J. Briscoe,
M. Brooks,
S. Bueltmann
, et al. (258 additional authors not shown)
Abstract:
The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to leverage Artificial Intelligence (AI) already starting from the design and R&D phases. The EIC Comprehensive Chromodynamics Experiment (ECCE) is a consortium that proposed a detector design based on a 1.5T solenoid. The EIC detector proposal review concluded that the ECCE design will serve as the reference design for an EIC detector. Herein we describe a comprehensive optimization of the ECCE tracker using AI. The work required a complex parametrization of the simulated detector system. Our approach dealt with an optimization problem in a multidimensional design space driven by multiple objectives that encode the detector performance, while satisfying several mechanical constraints. We describe our strategy and show results obtained for the ECCE tracking system. The AI-assisted design is agnostic to the simulation framework and can be extended to other sub-detectors or to a system of sub-detectors to further optimize the performance of the EIC detector.
Submitted 19 May, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.
-
CAN-PINN: A Fast Physics-Informed Neural Network Based on Coupled-Automatic-Numerical Differentiation Method
Authors:
Pao-Hsiung Chiu,
Jian Cheng Wong,
Chinchun Ooi,
My Ha Dao,
Yew-Soon Ong
Abstract:
In this study, novel physics-informed neural network (PINN) methods that couple neighboring support points and their derivative terms, obtained by automatic differentiation (AD), are proposed to allow efficient training with improved accuracy. The differential operators required for PINN loss evaluation at collocation points are conventionally computed via AD. Although AD has the advantage of being able to compute exact gradients at any point, such PINNs can only achieve high accuracy with large numbers of collocation points; otherwise they are prone to optimizing towards unphysical solutions. To make PINN training fast, the dual ideas of using a numerical differentiation (ND)-inspired method and coupling it with AD are employed to define the loss function. The ND-based formulation of the training loss can strongly link neighboring collocation points to enable efficient training in sparse sample regimes, but its accuracy is restricted by the interpolation scheme. The proposed coupled-automatic-numerical differentiation framework, labeled can-PINN, unifies the advantages of AD and ND, providing more robust and efficient training than AD-based PINNs, while further improving accuracy by up to 1-2 orders of magnitude relative to ND-based PINNs. As a proof-of-concept demonstration of this can-scheme on fluid dynamics problems, two numerical-inspired instantiations of can-PINN schemes for the convection and pressure gradient terms were derived to solve the incompressible Navier-Stokes (N-S) equations. The superior performance of can-PINNs is demonstrated on several challenging problems, including flow mixing phenomena, lid-driven flow in a cavity, and channel flow over a backward-facing step. The results reveal that for challenging problems like these, can-PINNs can consistently achieve very good accuracy whereas conventional AD-based PINNs fail.
Submitted 27 March, 2022; v1 submitted 29 October, 2021;
originally announced October 2021.
-
ReINTEL Challenge 2020: A Comparative Study of Hybrid Deep Neural Network for Reliable Intelligence Identification on Vietnamese SNSs
Authors:
Hoang Viet Trinh,
Tung Tien Bui,
Tam Minh Nguyen,
Huy Quang Dao,
Quang Huu Pham,
Ngoc N. Tran,
Ta Minh Thanh
Abstract:
The overwhelming abundance of data has created a misinformation crisis. Unverified sensationalism that is designed to grab the readers' short attention span, when crafted with malice, has caused irreparable damage to our society's structure. As a result, determining the reliability of an article has become a crucial task. After various ablation studies, we propose a multi-input model that can effectively leverage both tabular metadata and post content for the task. Applying state-of-the-art finetuning techniques for the pretrained component and training strategies for our complete model, we have achieved a 0.9462 ROC-score on the VLSP private test set.
Submitted 26 September, 2021;
originally announced September 2021.
-
Exploring the Use of Static and Dynamic Analysis to Improve the Performance of the Mining Sandbox Approach for Android Malware Identification
Authors:
Francisco Handrick da Costa,
Ismael Medeiros,
Thales Menezes,
João Victor da Silva,
Ingrid Lorraine da Silva,
Rodrigo Bonifácio,
Krishna Narasimhan,
Márcio Ribeiro
Abstract:
The Android mining sandbox approach consists of running dynamic analysis tools on a benign version of an Android app and recording every call to sensitive APIs. Later, one can use this information to (a) prevent calls to other sensitive APIs (those not previously recorded) or (b) run the dynamic analysis tools again on a different version of the app -- in order to identify possible malicious behavior. Although the use of dynamic analysis for mining Android sandboxes has been empirically investigated before, little is known about the potential benefits of combining static analysis with the mining sandbox approach for identifying malicious behavior. As such, in this paper we present the results of two empirical studies. The first is a non-exact replication of previous research by Bao et al., which compares the performance of test case generation tools for mining Android sandboxes. The second is a new experiment to investigate the implications of using taint analysis algorithms to complement the mining sandbox approach in the task of identifying malicious behavior. Our study yields several findings. For instance, the first study reveals that a static analysis component of DroidFax (a tool used for instrumenting Android apps in the Bao et al. study) contributes substantially to the performance of the dynamic analysis tools explored in the previous work. The results of the second study show that taint analysis is also a practical complement to the mining sandbox approach, improving the performance of the latter strategy by up to 28.57%.
Submitted 14 September, 2021;
originally announced September 2021.
-
Projection-Based Reduced Order Model for Simulations of Nonlinear Flows with Multiple Moving Objects
Authors:
My Ha Dao
Abstract:
This paper presents a reduced order approach for transient modeling of multiple moving objects in nonlinear crossflows. The Proper Orthogonal Decomposition method and the Galerkin projection are used to construct a reduced version of the nonlinear Navier-Stokes equations. The Galerkin projection, implemented in the OpenFOAM platform, allows accurate imposition of arbitrary time-dependent boundary conditions at the moving boundaries. A modelling technique based on moving domain and immersed boundary techniques is proposed to overcome the challenge of handling moving boundaries due to the movements of the multiple objects. The model is shown to be capable of capturing the complex flow fields past one and two oscillating cylinders and the forces acting on the cylinders. Simulation time could be reduced by more than three orders of magnitude for a small case on a fine mesh as compared to an existing method, and the reduction could be greater for large cases. In general, the simulation time of the reduced model is of the order of seconds, as compared to hours for the full order Computational Fluid Dynamics models.
Submitted 4 June, 2021;
originally announced June 2021.
-
Projection-Based Reduced Order Model and Machine Learning Closure for Transient Simulations of High-Re Flows
Authors:
My Ha Dao,
Hoang Huy Nguyen
Abstract:
The paper presents a Projection-Based Reduced-Order Model (PBROM) for simulations of high Reynolds number turbulent flows. The PBROM is enhanced by incorporating various models of turbulent viscosity and residual closures to model the effects of interactions among the modes and energy dissipation. Remarkable improvements in prediction accuracy are achieved with a suitable turbulent viscosity model and a residual closure. The enhanced PBROM models are demonstrated for high-Re flows past a cylinder in two and three dimensions. These enhancements are shown to be capable of capturing complex flow features and removing unnecessary ones, while not affecting the efficiency of the overall model.
Submitted 23 May, 2021;
originally announced May 2021.
-
Improved Surrogate Modeling of Fluid Dynamics with Physics-Informed Neural Networks
Authors:
Jian Cheng Wong,
Chinchun Ooi,
Pao-Hsiung Chiu,
My Ha Dao
Abstract:
Physics-Informed Neural Networks (PINNs) have recently shown great promise as a way of incorporating physics-based domain knowledge, including fundamental governing equations, into neural network models for many complex engineering systems. They have been particularly effective in the area of inverse problems, where boundary conditions may be ill-defined, and in data-absent scenarios, where typical supervised learning approaches will fail. Here, we further explore the use of this modeling methodology for surrogate modeling of a fluid dynamical system, and demonstrate additional, previously undiscussed advantages of such a modeling methodology over conventional data-driven approaches: 1) improved predictive performance of the model even with an incomplete description of the underlying physics; 2) improved robustness of the model to noise in the dataset; 3) reduced effort to convergence during optimization for a new, previously unseen scenario by transfer optimization of a pre-existing model. We find that the inclusion of a physics-based regularization term can substantially improve the equivalent data-driven surrogate model in many ways, including an order of magnitude improvement in test error when the dataset is very noisy, and a 2-3x improvement when only partial physics is included. In addition, we propose a novel transfer optimization scheme for use in such surrogate modeling scenarios and demonstrate an approximately 3x improvement in speed to convergence and an order of magnitude improvement in predictive performance over conventional Xavier initialization for training of new scenarios.
Submitted 4 May, 2021;
originally announced May 2021.
-
Network Coding in Photonic-land: Three Commandments for Future-proof Optical Core Networks
Authors:
Hai Dao
Abstract:
The digital transformation is underway, creating digital shadows of (almost) all physical entities and moving them to the Internet. The era of the Internet of Everything has therefore begun, giving rise to unprecedented traffic growth. In this context, optical core networks, which form the backbone of the Internet infrastructure, are approaching the capacity limit of conventional fiber, a phenomenon widely referred to as the capacity crunch. For many years, the many-fold increases in fiber capacity have come from exploiting physical dimensions for multiplexing optical signals, such as wavelength, polarization, time and, lately, space-division multiplexing using multi-core fibers. This route seems to be coming to an end, as almost all known dimensions have been exploited. This necessitates a departure from traditional approaches in order to use the fiber capacity more efficiently and thereby improve economies of scale. This paper lays out a new perspective on integrating network coding (NC) functions into optical networks to achieve greater capacity efficiency by upgrading the functionalities of intermediate nodes. In addition to reviewing recent proposals on new research problems enabled by NC operation in optical networks, we also report state-of-the-art findings in the literature in an effort to renew interest in NC in optical networks, and discuss three critical points for advancing its applicability and practicality: i) NC as a new dimension for multiplexing optical signals; ii) algorithmic aspects of NC-enabled optical network design; and iii) NC as an entirely fresh way of securing optical signals at the physical layer.
Submitted 27 September, 2021; v1 submitted 3 May, 2021;
originally announced May 2021.
-
COVID-19 Named Entity Recognition for Vietnamese
Authors:
Thinh Hung Truong,
Mai Hoang Dao,
Dat Quoc Nguyen
Abstract:
The current COVID-19 pandemic has led to the creation of many corpora that facilitate NLP research and downstream applications to help fight the pandemic. However, most of these corpora are exclusively for English. As the pandemic is a global problem, it is worth creating COVID-19 related datasets for languages other than English. In this paper, we present the first manually-annotated COVID-19 domain-specific dataset for Vietnamese. In particular, our dataset is annotated for the named entity recognition (NER) task with newly-defined entity types that can be used in other future epidemics. Our dataset also contains the largest number of entities compared to existing Vietnamese NER datasets. We conduct experiments using strong baselines on our dataset, and find that automatic Vietnamese word segmentation helps improve the NER results and that the highest performance is obtained by fine-tuning pre-trained language models, where the monolingual model PhoBERT for Vietnamese (Nguyen and Nguyen, 2020) produces higher results than the multilingual model XLM-R (Conneau et al., 2020). We publicly release our dataset at: https://github.com/VinAIResearch/PhoNER_COVID19
Submitted 8 April, 2021;
originally announced April 2021.
-
Intent Detection and Slot Filling for Vietnamese
Authors:
Mai Hoang Dao,
Thinh Hung Truong,
Dat Quoc Nguyen
Abstract:
Intent detection and slot filling are important tasks in spoken and natural language understanding. However, Vietnamese is a low-resource language in these research topics. In this paper, we present the first public intent detection and slot filling dataset for Vietnamese. In addition, we also propose a joint model for intent detection and slot filling, that extends the recent state-of-the-art JointBERT+CRF model with an intent-slot attention layer to explicitly incorporate intent context information into slot filling via "soft" intent label embedding. Experimental results on our Vietnamese dataset show that our proposed model significantly outperforms JointBERT+CRF. We publicly release our dataset and the implementation of our model at: https://github.com/VinAIResearch/JointIDSF
Submitted 9 June, 2021; v1 submitted 5 April, 2021;
originally announced April 2021.
-
Interpreting the Latent Space of Generative Adversarial Networks using Supervised Learning
Authors:
Toan Pham Van,
Tam Minh Nguyen,
Ngoc N. Tran,
Hoai Viet Nguyen,
Linh Bao Doan,
Huy Quang Dao,
Thanh Ta Minh
Abstract:
With the great progress in the development of Generative Adversarial Networks (GANs) in recent years, the quest for insights into understanding and manipulating the latent space of GANs has gained more and more attention due to its wide range of applications. While most research on this task has focused on unsupervised learning methods, which induce difficulties in training and limitations in results, our work takes another direction, encoding human prior knowledge to discover more about the hidden space of GANs. With this supervised approach, we produce promising results, demonstrated by accurate manipulation of generated images. Even though our model is more suitable for task-specific problems, we hope that its ease of implementation, precision, robustness, and support for a richer set of properties (compared to other approaches) for image manipulation can enhance the results of many current applications.
Submitted 24 February, 2021;
originally announced February 2021.
-
WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets
Authors:
Dat Quoc Nguyen,
Thanh Vu,
Afshin Rahimi,
Mai Hoang Dao,
Linh The Nguyen,
Long Doan
Abstract:
In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets. We describe how we construct a corpus of 10K Tweets and organize the development and evaluation phases for this task. In addition, we also present a brief summary of results obtained from the final system evaluation submissions of 55 teams, finding that (i) many systems obtain very high performance, up to 0.91 F1 score, (ii) the majority of the submissions achieve substantially higher results than the baseline fastText (Joulin et al., 2017), and (iii) fine-tuning pre-trained language models on relevant language data followed by supervised training performs well in this task.
Submitted 16 October, 2020;
originally announced October 2020.
-
A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese
Authors:
Anh Tuan Nguyen,
Mai Hoang Dao,
Dat Quoc Nguyen
Abstract:
Semantic parsing is an important NLP task. However, Vietnamese is a low-resource language in this research area. In this paper, we present the first public large-scale Text-to-SQL semantic parsing dataset for Vietnamese. We extend and evaluate two strong semantic parsing baselines, EditSQL (Zhang et al., 2019) and IRNet (Guo et al., 2019), on our dataset. We compare the two baselines with key configurations and find that: automatic Vietnamese word segmentation improves the parsing results of both baselines; the normalized pointwise mutual information (NPMI) score (Bouma, 2009) is useful for schema linking; latent syntactic features extracted from a neural dependency parser for Vietnamese also improve the results; and the monolingual language model PhoBERT for Vietnamese (Nguyen and Nguyen, 2020) helps produce higher performance than the recent best multilingual language model XLM-R (Conneau et al., 2020).
Submitted 5 October, 2020;
originally announced October 2020.
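The NPMI score mentioned in the abstract above has a standard closed form: pointwise mutual information divided by the negative log of the joint probability, which bounds the score to [-1, 1]. The following is a minimal sketch of that general formula only. The function name `npmi` and its probability arguments are illustrative assumptions, and the abstract does not say how the authors estimate the probabilities or apply the score to schema linking.

```python
import math


def npmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Normalized pointwise mutual information of two events.

    p_xy is the joint probability, p_x and p_y the marginals.
    Hypothetical helper for illustration; not the paper's code.
    """
    if p_xy <= 0.0:
        # Events never co-occur: the conventional limit value is -1.
        return -1.0
    if p_xy >= 1.0:
        # Both events always occur together; avoid dividing by log(1) = 0.
        return 1.0
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


# Independent events score 0; events that only occur together score 1.
independent = npmi(0.25, 0.5, 0.5)
together = npmi(0.2, 0.2, 0.2)
```

A score near 1 indicates that the two events almost always occur together, 0 indicates independence, and -1 indicates that they never co-occur.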
-
A machine learning approach for detecting CNAME cloaking-based tracking on the Web
Authors:
Ha Dao,
Kensuke Fukuda
Abstract:
Various in-browser privacy protection techniques have been designed to protect end-users from third-party tracking. In an arms race against these counter-measures, the tracking providers developed a new technique called CNAME cloaking based tracking to avoid issues with browsers that block third-party cookies and requests. To detect this tracking technique, browser extensions require on-demand DNS lookup APIs. This feature is however only supported by the Firefox browser.
In this paper, we propose a supervised machine learning-based method to detect CNAME cloaking-based tracking without the on-demand DNS lookup. Our goal is to detect both sites and requests linked to CNAME cloaking-related tracking. We crawl a list of target sites and store all HTTP/HTTPS requests with their attributes. Then we label all instances automatically by looking up the CNAME record of each subdomain and applying wildcard matching based on well-known tracking filter lists. After extracting features, we build a supervised classification model to distinguish sites and requests related to CNAME cloaking-based tracking. Our evaluation shows that the proposed approach outperforms well-known tracking filter lists, with F1 scores of 0.790 for sites and 0.885 for requests. By analyzing the feature permutation importance, we demonstrate that the number of scripts and the proportion of XMLHttpRequests are discriminative for detecting sites, and the length of the URL request is helpful in detecting requests. Finally, we analyze concept drift by using the 2018 dataset to train a model and obtain a reasonable performance on the 2020 dataset for detecting both sites and requests using CNAME cloaking-based tracking.
Submitted 29 September, 2020;
originally announced September 2020.
-
Self-Supervised Gait Encoding with Locality-Aware Attention for Person Re-Identification
Authors:
Haocong Rao,
Siqi Wang,
Xiping Hu,
Mingkui Tan,
Huang Da,
Jun Cheng,
Bin Hu
Abstract:
Gait-based person re-identification (Re-ID) is valuable for safety-critical applications, and using only 3D skeleton data to extract discriminative gait features for person Re-ID is an emerging open topic. Existing methods either adopt hand-crafted features or learn gait features by traditional supervised learning paradigms. Unlike previous methods, we propose, for the first time, a generic gait encoding approach that can utilize unlabeled skeleton data to learn gait representations in a self-supervised manner. Specifically, we first propose to introduce self-supervision by learning to reconstruct input skeleton sequences in reverse order, which facilitates learning richer high-level semantics and better gait representations. Second, inspired by the fact that motion's continuity endows temporally adjacent skeletons with higher correlations ("locality"), we propose a locality-aware attention mechanism that encourages learning larger attention weights for temporally adjacent skeletons when reconstructing the current skeleton, so as to learn locality when encoding gait. Finally, we propose Attention-based Gait Encodings (AGEs), which are built using context vectors learned by locality-aware attention, as final gait representations. AGEs are directly utilized to realize effective person Re-ID. Our approach typically improves existing skeleton-based methods by 10-20% Rank-1 accuracy, and it achieves comparable or even superior performance to multi-modal methods with extra RGB or depth information. Our code is available at https://github.com/Kali-Hac/SGE-LA.
Submitted 21 August, 2020;
originally announced August 2020.
-
FCN+RL: A Fully Convolutional Network followed by Refinement Layers to Offline Handwritten Signature Segmentation
Authors:
Celso A. M. Lopes Junior,
Matheus Henrique M. da Silva,
Byron Leite Dantas Bezerra,
Bruno Jose Torres Fernandes,
Donato Impedovo
Abstract:
Although centuries old, the handwritten signature is one of the most reliable biometric methods used by most countries. In the last ten years, the application of technology for verification of handwritten signatures has evolved strongly, including forensic aspects. Some factors, such as the complexity of the background and the small size of the region of interest (the signature pixels), increase the difficulty of the targeting task. Other factors that make it challenging are the many variations present in handwritten signatures, such as location, type of ink, color and type of pen, and type of stroke. In this work, we propose an approach to locate and extract the pixels of handwritten signatures on identification documents, without any prior information on the location of the signatures. The technique used is based on a fully convolutional encoder-decoder network combined with a block of refinement layers for the alpha channel of the predicted image. The experimental results demonstrate that the technique outputs a clean signature with higher fidelity in the lines than the traditional approaches and preserves the characteristics pertinent to the signer's spelling. To evaluate the quality of our proposal, we use the following image similarity metrics: SSIM, SIFT, and Dice Coefficient. The qualitative and quantitative results show a significant improvement in comparison with the baseline system.
Submitted 28 May, 2020;
originally announced May 2020.
-
Recognizing Families through Images with Pretrained Encoder
Authors:
Tuan-Duy H. Nguyen,
Huu-Nghia H. Nguyen,
Hieu Dao
Abstract:
Kinship verification and kinship retrieval are emerging tasks in computer vision. Kinship verification aims at determining whether two facial images are from related people or not, while kinship retrieval is the task of retrieving, from a gallery of images, facial images that are possibly related to a given person. They introduce unique challenges because of the hidden relations and features that carry inherent characteristics between the facial images. We employ three methods, FaceNet, Siamese VGG-Face, and a combination of FaceNet and VGG-Face models as feature extractors, achieving 9th place for kinship verification and 5th place for kinship retrieval in the Recognizing Family in The Wild 2020 competition. We then further experimented using StyleGAN2 as another encoder, with no improvement in the result.
Submitted 24 May, 2020;
originally announced May 2020.
-
Entropy as a measure of attractiveness and socioeconomic complexity in Rio de Janeiro metropolitan area
Authors:
Maxime Lenormand,
Horacio Samaniego,
Julio C. Chaves,
Vinicius F. Vieira,
Moacyr A. H. B. da Silva,
Alexandre G. Evsukoff
Abstract:
Defining and measuring spatial inequalities across the urban environment remains a complex and elusive task that has been facilitated by the increasing availability of large geolocated databases. In this study, we rely on a mobile phone dataset and an entropy-based metric to measure the attractiveness of a location in the Rio de Janeiro Metropolitan Area (Brazil) as the diversity of visitors' location of residence. The results show that the attractiveness of a given location measured by entropy is an important descriptor of the socioeconomic status of the location, and can thus be used as a proxy for complex socioeconomic indicators.
Submitted 23 March, 2020;
originally announced March 2020.
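The entropy-based attractiveness idea in the abstract above can be illustrated with plain Shannon entropy over the residence areas of a location's visitors. This is a sketch under stated assumptions: the abstract does not give the exact metric, so the function `visitor_entropy`, its optional normalization, and the sample area labels are all hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def visitor_entropy(residences: Iterable[Hashable], normalize: bool = False) -> float:
    """Shannon entropy (in nats) of visitors' residence areas.

    Each element of `residences` is the home area of one visitor.
    Hypothetical helper; the paper's exact definition may differ.
    """
    counts = Counter(residences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        share = count / total
        entropy -= share * math.log(share)
    if normalize and len(counts) > 1:
        # Divide by the maximum entropy for this number of areas.
        entropy /= math.log(len(counts))
    return entropy


# A location visited from one area only versus one visited evenly from four.
single_origin = visitor_entropy(["A", "A", "A", "A"])
mixed_origin = visitor_entropy(["A", "B", "C", "D"])
```

Under this reading, a location that draws visitors evenly from many residence areas has high entropy, while a location visited from a single area has zero entropy.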
-
Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models
Authors:
Edresson Casanova,
Arnaldo Candido Junior,
Christopher Shulby,
Frederico Santos de Oliveira,
Lucas Rafael Stefanel Gris,
Hamilton Pereira da Silva,
Sandra Maria Aluisio,
Moacir Antonelli Ponti
Abstract:
In this paper we present an efficient method for training models for speaker recognition using small or under-resourced datasets. This method requires less data than other SOTA (State-Of-The-Art) methods, e.g. the Angular Prototypical and GE2E loss functions, while achieving similar results to those methods. This is done using the knowledge of the reconstruction of a phoneme in the speaker's voice. For this purpose, a new dataset was built, composed of 40 male speakers, who read sentences in Portuguese, totaling approximately 3h. We compare the three best architectures trained using our method to select the best one, which is the one with a shallow architecture. Then, we compared this model with the SOTA method for the speaker recognition task: the Fast ResNet-34 trained with approximately 2,000 hours, using the loss functions Angular Prototypical and GE2E. Three experiments were carried out with datasets in different languages. Among these three experiments, our model achieved the second best result in two experiments and the best result in one of them. This highlights the importance of our method, which proved to be a great competitor to SOTA speaker recognition models, with 500x less data and a simpler approach.
Submitted 18 June, 2021; v1 submitted 25 February, 2020;
originally announced February 2020.