Showing 1–41 of 41 results for author: Dao, H

Searching in archive cs.
  1. arXiv:2407.04411  [pdf, other]

    cs.CR cs.AI cs.CL

    Waterfall: Framework for Robust and Scalable Text Watermarking

    Authors: Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan-Sheng Foo, Bryan Kian Hsiang Low

    Abstract: Protecting intellectual property (IP) of text such as articles and code is increasingly important, especially as sophisticated attacks become possible, such as paraphrasing by large language models (LLMs) or even unauthorized training of LLMs on copyrighted text to infringe such IP. However, existing text watermarking methods are not robust enough against such attacks nor scalable to millions of u…

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2406.14473  [pdf, other]

    cs.LG cs.CL

    Data-Centric AI in the Age of Large Language Models

    Authors: Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, Jingtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, Lucas Agussurja, Rachael Hwee Ling Sim, Xiaoqiang Lin, Wenyang Hu, Zhongxiang Dai, Pang Wei Koh, Bryan Kian Hsiang Low

    Abstract: This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  3. arXiv:2406.06608  [pdf, other]

    cs.CL cs.AI

    The Prompt Report: A Systematic Survey of Prompting Techniques

    Authors: Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni Gupta, Megan L. Rogers, Inna Goncearenco, Giuseppe Sarli, Igor Galynker, et al. (6 additional authors not shown)

    Abstract: Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a p…

    Submitted 14 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  4. arXiv:2405.17406  [pdf, other]

    hep-th cs.LG math.AG

    Deep Learning Calabi-Yau four folds with hybrid and recurrent neural network architectures

    Authors: H. L. Dao

    Abstract: In this work, we report the results of applying deep learning based on hybrid convolutional-recurrent and purely recurrent neural network architectures to the dataset of almost one million complete intersection Calabi-Yau four-folds (CICY4) to machine-learn their four Hodge numbers $h^{1,1}, h^{2,1}, h^{3,1}, h^{2,2}$. In particular, we explored and experimented with twelve different neural networ…

    Submitted 3 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: v2: new (improved) results added, references added, typos corrected

  5. arXiv:2403.01417  [pdf, other]

    cs.LG cs.DC

    Asyn2F: An Asynchronous Federated Learning Framework with Bidirectional Model Aggregation

    Authors: Tien-Dung Cao, Nguyen T. Vuong, Thai Q. Le, Hoang V. N. Dao, Tram Truong-Huu

    Abstract: In federated learning, the models can be trained synchronously or asynchronously. Many research works have focused on developing an aggregation method for the server to aggregate multiple local models into the global model with improved performance. They ignore the heterogeneity of the training workers, which causes the delay in the training of the local models, leading to the obsolete information…

    Submitted 3 March, 2024; originally announced March 2024.

  6. arXiv:2402.17588  [pdf, other]

    cs.SE

    Chronicles of CI/CD: A Deep Dive into its Usage Over Time

    Authors: Hugo da Gião, André Flores, Rui Pereira, Jácome Cunha

    Abstract: DevOps is a combination of methodologies and tools that improves the software development, build, deployment, and monitoring processes by shortening its lifecycle and improving software quality. Part of this process is CI/CD, which embodies mostly the first parts, right up to the deployment. Despite the many benefits of DevOps and CI/CD, it still presents many challenges promoted by the tremendous…

    Submitted 27 February, 2024; originally announced February 2024.

  7. arXiv:2402.16728  [pdf, other]

    cs.DC

    Auto Tuning for OpenMP Dynamic Scheduling applied to FWI

    Authors: Felipe H. S. da Silva, João B. Fernandes, Idalmis M. Sardina, Tiago Barros, Samuel Xavier-de-Souza, Italo A. S. Assis

    Abstract: Because Full Waveform Inversion (FWI) works with a massive amount of data, its execution requires much time and computational resources, being restricted to large-scale computer systems such as supercomputers. Techniques such as FWI adapt well to parallel computing and can be parallelized in shared memory systems using the application programming interface (API) OpenMP. The management of parallel…

    Submitted 26 February, 2024; originally announced February 2024.

  8. PATSMA: Parameter Auto-tuning for Shared Memory Algorithms

    Authors: Joao B. Fernandes, Felipe H. S. da Silva, Samuel Xavier-de-Souza, Italo A. S. Assis

    Abstract: Programs with high levels of complexity often face challenges in adjusting execution parameters, particularly when these parameters vary based on the execution context. These dynamic parameters significantly impact the program's performance, such as loop granularity, which can vary depending on factors like the execution environment, program input, or the choice of compiler. Given the expensive na…

    Submitted 14 June, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Journal ref: SoftwareX, Volume 27, 2024, 101789

  9. arXiv:2401.02961  [pdf, other]

    cs.LG cs.CV eess.IV physics.optics

    A Surrogate-Assisted Extended Generative Adversarial Network for Parameter Optimization in Free-Form Metasurface Design

    Authors: Manna Dai, Yang Jiang, Feng Yang, Joyjit Chattoraj, Yingzhi Xia, Xinxing Xu, Weijiang Zhao, My Ha Dao, Yong Liu

    Abstract: Metasurfaces have widespread applications in fifth-generation (5G) microwave communication. Among the metasurface family, free-form metasurfaces excel in achieving intricate spectral responses compared to regular-shape counterparts. However, conventional numerical methods for free-form metasurfaces are time-consuming and demand specialized expertise. Alternatively, recent studies demonstrate that…

    Submitted 18 October, 2023; originally announced January 2024.

  10. arXiv:2401.01200  [pdf, other]

    cs.CV cs.AI

    Skin cancer diagnosis using NIR spectroscopy data of skin lesions in vivo using machine learning algorithms

    Authors: Flavio P. Loss, Pedro H. da Cunha, Matheus B. Rocha, Madson Poltronieri Zanoni, Leandro M. de Lima, Isadora Tavares Nascimento, Isabella Rezende, Tania R. P. Canuto, Luciana de Paula Vieira, Renan Rossoni, Maria C. S. Santos, Patricia Lyra Frasson, Wanderson Romão, Paulo R. Filgueiras, Renato A. Krohling

    Abstract: Skin lesions are classified in benign or malignant. Among the malignant, melanoma is a very aggressive cancer and the major cause of deaths. So, early diagnosis of skin cancer is very desired. In the last few years, there is a growing interest in computer aided diagnostic (CAD) using most image and clinical data of the lesion. These sources of information present limitations due to their inability…

    Submitted 2 January, 2024; originally announced January 2024.

  11. arXiv:2312.06549  [pdf, other]

    cs.GR

    Exploring Crowd Dynamics: Simulating Structured Behaviors through Crowd Simulation Models

    Authors: Thiago Gomes Vidal de Mello, Matheus Schreiner Homrich da Silva, Gabriel Fonseca Silva, Soraia Raupp Musse

    Abstract: This paper proposes the simulation of structured behaviors in a crowd of virtual agents by extending the BioCrowds simulation model. Three behaviors were simulated and evaluated, a queue as a generic case and two specific behaviors observed at rock concerts. The extended model incorporates new parameters and modifications to replicate these behaviors accurately. Experiments were conducted to ana…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Paper presented as Final project of Computer Science Undergraduate Course at PUCRS

  12. arXiv:2312.06495  [pdf, other]

    cs.CV

    Detecting Events in Crowds Through Changes in Geometrical Dimensions of Pedestrians

    Authors: Matheus Schreiner Homrich da Silva, Paulo Brossard de Souza Pinto Neto, Rodolfo Migon Favaretto, Soraia Raupp Musse

    Abstract: Security is an important topic in our contemporary world, and the ability to automate the detection of any events of interest that can take place in a crowd is of great interest to a population. We hypothesize that the detection of events in videos is correlated with significant changes in pedestrian behaviors. In this paper, we examine three different scenarios of crowd behavior, containing both…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: SBGames 2019

  13. arXiv:2312.03243  [pdf, other]

    cs.NE cs.CE cs.LG

    Generalizable Neural Physics Solvers by Baldwinian Evolution

    Authors: Jian Cheng Wong, Chin Chun Ooi, Abhishek Gupta, Pao-Hsiung Chiu, Joshua Shao Zheng Low, My Ha Dao, Yew-Soon Ong

    Abstract: Physics-informed neural networks (PINNs) are at the forefront of scientific machine learning, making possible the creation of machine intelligence that is cognizant of physical laws and able to accurately simulate them. In this paper, the potential of discovering PINNs that generalize over an entire family of physics tasks is studied, for the first time, through a biological lens of the Baldwin ef…

    Submitted 5 December, 2023; originally announced December 2023.

  14. arXiv:2310.17949  [pdf, other]

    cs.CV

    Instance Segmentation under Occlusions via Location-aware Copy-Paste Data Augmentation

    Authors: Son Nguyen, Mikel Lainsa, Hung Dao, Daeyoung Kim, Giang Nguyen

    Abstract: Occlusion is a long-standing problem in computer vision, particularly in instance segmentation. ACM MMSports 2023 DeepSportRadar has introduced a dataset that focuses on segmenting human subjects within a basketball context and a specialized evaluation metric for occlusion scenarios. Given the modest size of the dataset and the highly deformable nature of the objects to be segmented, this challeng…

    Submitted 21 November, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

  15. arXiv:2308.09481  [pdf, ps, other]

    cs.PL cs.LO

    Types, equations, dimensions and the Pi theorem

    Authors: Nicola Botta, Patrik Jansson, Guilherme Horta Alvares Da Silva

    Abstract: The languages of mathematical physics and modelling are endowed with a rich "grammar of dimensions" that common abstractions of programming languages fail to represent. We propose a dependently typed domain-specific language (embedded in Idris) that captures this grammar. We apply it to explain basic notions of dimensional analysis and Buckingham's Pi theorem. We hope that the language makes mathe…

    Submitted 4 September, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: Submitted for publication in the "Journal of Functional Programming" in August 2023

  16. arXiv:2305.11994  [pdf, other]

    cs.LG eess.IV

    ISP meets Deep Learning: A Survey on Deep Learning Methods for Image Signal Processing

    Authors: Matheus Henrique Marques da Silva, Jhessica Victoria Santos da Silva, Rodrigo Reis Arrais, Wladimir Barroso Guedes de Araújo Neto, Leonardo Tadeu Lopes, Guilherme Augusto Bileki, Iago Oliveira Lima, Lucas Borges Rondon, Bruno Melo de Souza, Mayara Costa Regazio, Rodolfo Coelho Dalapicola, Claudio Filipi Gonçalves dos Santos

    Abstract: The entire Image Signal Processor (ISP) of a camera relies on several processes to transform the data from the Color Filter Array (CFA) sensor, such as demosaicing, denoising, and enhancement. These processes can be executed either by some hardware or via software. In recent years, Deep Learning has emerged as one solution for some of them or even to replace the entire ISP using a single neural ne…

    Submitted 23 May, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

  17. arXiv:2304.09093  [pdf, other]

    cs.IR cs.CL cs.LG

    Improving Items and Contexts Understanding with Descriptive Graph for Conversational Recommendation

    Authors: Huy Dao, Dung D. Le, Cuong Chu

    Abstract: State-of-the-art methods on conversational recommender systems (CRS) leverage external knowledge to enhance both items' and contextual words' representations to achieve high quality recommendations and responses generation. However, the representations of the items and words are usually modeled in two separated semantic spaces, which leads to misalignment issue between them. Consequently, this wil…

    Submitted 11 April, 2023; originally announced April 2023.

    Comments: 14 pages, 3 figures, 9 tables

  18. arXiv:2302.01518  [pdf, other]

    cs.LG cs.CE physics.flu-dyn

    LSA-PINN: Linear Boundary Connectivity Loss for Solving PDEs on Complex Geometry

    Authors: Jian Cheng Wong, Pao-Hsiung Chiu, Chinchun Ooi, My Ha Dao, Yew-Soon Ong

    Abstract: We present a novel loss formulation for efficient learning of complex dynamics from governing physics, typically described by partial differential equations (PDEs), using physics-informed neural networks (PINNs). In our experiments, existing versions of PINNs are seen to learn poorly in many problems, especially for complex geometries, as it becomes increasingly difficult to establish appropriate…

    Submitted 2 March, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: 11 pages, 7 figures

    Journal ref: 2023 International Joint Conference on Neural Networks (IJCNN)

  19. Graph Neural Network Based Surrogate Model of Physics Simulations for Geometry Design

    Authors: Jian Cheng Wong, Chin Chun Ooi, Joyjit Chattoraj, Lucas Lestandi, Guoying Dong, Umesh Kizhakkinan, David William Rosen, Mark Hyunpong Jhon, My Ha Dao

    Abstract: Computational Intelligence (CI) techniques have shown great potential as a surrogate model of expensive physics simulation, with demonstrated ability to make fast predictions, albeit at the expense of accuracy in some cases. For many scientific and engineering problems involving geometrical design, it is desirable for the surrogate models to precisely describe the change in geometry and predict th…

    Submitted 1 February, 2023; originally announced February 2023.

    Comments: 7 pages, 5 figures, 2022 IEEE Symposium Series on Computational Intelligence

  20. arXiv:2211.12042  [pdf, other]

    cs.LG physics.comp-ph

    Robustness of Physics-Informed Neural Networks to Noise in Sensor Data

    Authors: Jian Cheng Wong, Pao-Hsiung Chiu, Chin Chun Ooi, My Ha Dao

    Abstract: Physics-Informed Neural Networks (PINNs) have been shown to be an effective way of incorporating physics-based domain knowledge into neural network models for many important real-world systems. They have been particularly effective as a means of inferring system information based on data, even in cases where data is scarce. Most of the current work however assumes the availability of high-quality…

    Submitted 22 November, 2022; originally announced November 2022.

  21. From Disfluency Detection to Intent Detection and Slot Filling

    Authors: Mai Hoang Dao, Thinh Hung Truong, Dat Quoc Nguyen

    Abstract: We present the first empirical study investigating the influence of disfluency detection on downstream tasks of intent detection and slot filling. We perform this study for Vietnamese -- a low-resource language that has no previous study as well as no public dataset available for disfluency detection. First, we extend the fluent Vietnamese intent detection and slot filling dataset PhoATIS by manua…

    Submitted 17 September, 2022; originally announced September 2022.

    Comments: In Proceedings of INTERSPEECH 2022

  22. arXiv:2207.03225  [pdf, other]

    cs.SE cs.CR

    Towards Immediate Feedback for Security Relevant Code in Development Environments

    Authors: Markus Haug, Ana Cristina Franco Da Silva, Stefan Wagner

    Abstract: Nowadays, the correct use of cryptography libraries is essential to ensure the necessary information security in different kinds of applications. A common practice in software development is the use of static application security testing (SAST) tools to analyze code regarding security vulnerabilities. Most of these tools are designed to run separately from development environments. Their results a…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: submitted to the 16th Symposium and Summer School On Service-Oriented Computing 2022

  23. arXiv:2205.09185  [pdf, other]

    physics.ins-det cs.LG hep-ex nucl-ex physics.comp-ph

    AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider

    Authors: C. Fanelli, Z. Papandreou, K. Suresh, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann, et al. (258 additional authors not shown)

    Abstract: The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to…

    Submitted 19 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: 16 pages, 18 figures, 2 appendices, 3 tables

  24. arXiv:2110.15832  [pdf]

    cs.LG cs.CE math.NA physics.comp-ph physics.flu-dyn

    CAN-PINN: A Fast Physics-Informed Neural Network Based on Coupled-Automatic-Numerical Differentiation Method

    Authors: Pao-Hsiung Chiu, Jian Cheng Wong, Chinchun Ooi, My Ha Dao, Yew-Soon Ong

    Abstract: In this study, novel physics-informed neural network (PINN) methods for coupling neighboring support points and their derivative terms which are obtained by automatic differentiation (AD), are proposed to allow efficient training with improved accuracy. The computation of differential operators required for PINNs loss evaluation at collocation points are conventionally obtained via AD. Although AD…

    Submitted 27 March, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

    Comments: 25 pages, 20 figures

    Journal ref: Computer Methods in Applied Mechanics and Engineering, Volume 395, 15 May 2022, 114909

  25. arXiv:2109.12777  [pdf, other]

    cs.LG cs.CL

    ReINTEL Challenge 2020: A Comparative Study of Hybrid Deep Neural Network for Reliable Intelligence Identification on Vietnamese SNSs

    Authors: Hoang Viet Trinh, Tung Tien Bui, Tam Minh Nguyen, Huy Quang Dao, Quang Huu Pham, Ngoc N. Tran, Ta Minh Thanh

    Abstract: The overwhelming abundance of data has created a misinformation crisis. Unverified sensationalism that is designed to grab the readers' short attention span, when crafted with malice, has caused irreparable damage to our society's structure. As a result, determining the reliability of an article has become a crucial task. After various ablation studies, we propose a multi-input model that can effe…

    Submitted 26 September, 2021; originally announced September 2021.

    Journal ref: Proceedings of the 7th International Workshop on Vietnamese Language and Speech Processing (VLSP), Hanoi, Vietnam, 2020, pp. 6-12

  26. arXiv:2109.06613  [pdf, other]

    cs.CR cs.SE

    Exploring the Use of Static and Dynamic Analysis to Improve the Performance of the Mining Sandbox Approach for Android Malware Identification

    Authors: Francisco Handrick da Costa, Ismael Medeiros, Thales Menezes, João Victor da Silva, Ingrid Lorraine da Silva, Rodrigo Bonifácio, Krishna Narasimhan, Márcio Ribeiro

    Abstract: The Android mining sandbox approach consists in running dynamic analysis tools on a benign version of an Android app and recording every call to sensitive APIs. Later, one can use this information to (a) prevent calls to other sensitive APIs (those not previously recorded) or (b) run the dynamic analysis tools again in a different version of the app -- in order to identify possible malicious behav…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: 31 pages, 6 figures. Paper accepted for publication in The Journal of Systems & Software

  27. arXiv:2106.02338  [pdf]

    physics.flu-dyn cs.CE

    Projection-Based Reduced Order Model for Simulations of Nonlinear Flows with Multiple Moving Objects

    Authors: My Ha Dao

    Abstract: This paper presents a reduced order approach for transient modeling of multiple moving objects in nonlinear crossflows. The Proper Orthogonal Decomposition method and the Galerkin projection are used to construct a reduced version of the nonlinear Navier-Stokes equations. The Galerkin projection implemented in OpenFOAM platform allows accurate impositions of arbitrary time-dependent boundary condi…

    Submitted 4 June, 2021; originally announced June 2021.

  28. arXiv:2105.10854  [pdf]

    cs.CE

    Projection-Based Reduced Order Model and Machine Learning Closure for Transient Simulations of High-Re Flows

    Authors: My Ha Dao, Hoang Huy Nguyen

    Abstract: The paper presents a Projection-Based Reduced-Order Model for simulations of high Reynolds turbulent flows. The PBROM are enhanced by incorporating various models of turbulent viscosity and residual closures to model the effects of interactions among the modes and energy dissipations. Remarkable improvements in prediction accuracies are achieved with a suitable turbulent viscosity model and a resi…

    Submitted 23 May, 2021; originally announced May 2021.

  29. arXiv:2105.01838  [pdf]

    cs.LG physics.comp-ph physics.flu-dyn

    Improved Surrogate Modeling of Fluid Dynamics with Physics-Informed Neural Networks

    Authors: Jian Cheng Wong, Chinchun Ooi, Pao-Hsiung Chiu, My Ha Dao

    Abstract: Physics-Informed Neural Networks (PINNs) have recently shown great promise as a way of incorporating physics-based domain knowledge, including fundamental governing equations, into neural network models for many complex engineering systems. They have been particularly effective in the area of inverse problems, where boundary conditions may be ill-defined, and data-absent scenarios, where typical s…

    Submitted 4 May, 2021; originally announced May 2021.

    Comments: No comment

  30. arXiv:2105.01194  [pdf, ps, other]

    cs.NI

    Network Coding in Photonic-land: Three Commandments for Future-proof Optical Core Networks

    Authors: Hai Dao

    Abstract: The digital transformation has been underway, creating digital shadows of (almost) all physical entities and moving them to the Internet. The era of Internet of Everything has therefore started to come into play, giving rise to unprecedented traffic growths. In this context, optical core networks forming the backbone of Internet infrastructure have been under critical issues of reaching the capaci…

    Submitted 27 September, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: 6 pages, 6 figures, 4 tables, accepted version to IEEE Workshop on Microwave Theory and Techniques in Wireless Communications (MTTW 2021)

  31. arXiv:2104.03879  [pdf, other]

    cs.CL

    COVID-19 Named Entity Recognition for Vietnamese

    Authors: Thinh Hung Truong, Mai Hoang Dao, Dat Quoc Nguyen

    Abstract: The current COVID-19 pandemic has led to the creation of many corpora that facilitate NLP research and downstream applications to help fight the pandemic. However, most of these corpora are exclusively for English. As the pandemic is a global problem, it is worth creating COVID-19 related datasets for languages other than English. In this paper, we present the first manually-annotated COVID-19 do…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: To appear in Proceedings of NAACL 2021

  32. arXiv:2104.02021  [pdf, other]

    cs.CL

    Intent Detection and Slot Filling for Vietnamese

    Authors: Mai Hoang Dao, Thinh Hung Truong, Dat Quoc Nguyen

    Abstract: Intent detection and slot filling are important tasks in spoken and natural language understanding. However, Vietnamese is a low-resource language in these research topics. In this paper, we present the first public intent detection and slot filling dataset for Vietnamese. In addition, we also propose a joint model for intent detection and slot filling, that extends the recent state-of-the-art Joi…

    Submitted 9 June, 2021; v1 submitted 5 April, 2021; originally announced April 2021.

    Comments: To appear in Proceedings of INTERSPEECH 2021; The first two authors contributed equally to this work

  33. Interpreting the Latent Space of Generative Adversarial Networks using Supervised Learning

    Authors: Toan Pham Van, Tam Minh Nguyen, Ngoc N. Tran, Hoai Viet Nguyen, Linh Bao Doan, Huy Quang Dao, Thanh Ta Minh

    Abstract: With great progress in the development of Generative Adversarial Networks (GANs), in recent years, the quest for insights in understanding and manipulating the latent space of GAN has gained more and more attention due to its wide range of applications. While most of the researches on this task have focused on unsupervised learning method, which induces difficulties in training and limitation in r…

    Submitted 24 February, 2021; originally announced February 2021.

    Comments: Published in 2020 International Conference on Advanced Computing and Applications (ACOMP)

    Journal ref: 2020 International Conference on Advanced Computing and Applications (ACOMP), Quy Nhon, Vietnam, 2020, pp. 49-54

  34. arXiv:2010.08232  [pdf, other]

    cs.CL

    WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets

    Authors: Dat Quoc Nguyen, Thanh Vu, Afshin Rahimi, Mai Hoang Dao, Linh The Nguyen, Long Doan

    Abstract: In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets. We describe how we construct a corpus of 10K Tweets and organize the development and evaluation phases for this task. In addition, we also present a brief summary of results obtained from the final system evaluation submissions of 55 teams, finding that (i) many systems…

    Submitted 16 October, 2020; originally announced October 2020.

    Comments: In Proceedings of the 6th Workshop on Noisy User-generated Text

  35. arXiv:2010.01891  [pdf, other]

    cs.CL cs.AI

    A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese

    Authors: Anh Tuan Nguyen, Mai Hoang Dao, Dat Quoc Nguyen

    Abstract: Semantic parsing is an important NLP task. However, Vietnamese is a low-resource language in this research area. In this paper, we present the first public large-scale Text-to-SQL semantic parsing dataset for Vietnamese. We extend and evaluate two strong semantic parsing baselines EditSQL (Zhang et al., 2019) and IRNet (Guo et al., 2019) on our dataset. We compare the two baselines with key config…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020 (Findings)

  36. arXiv:2009.14330  [pdf, ps, other]

    cs.CR cs.CY

    A machine learning approach for detecting CNAME cloaking-based tracking on the Web

    Authors: Ha Dao, Kensuke Fukuda

    Abstract: Various in-browser privacy protection techniques have been designed to protect end-users from third-party tracking. In an arms race against these counter-measures, the tracking providers developed a new technique called CNAME cloaking based tracking to avoid issues with browsers that block third-party cookies and requests. To detect this tracking technique, browser extensions require on-demand DNS…

    Submitted 29 September, 2020; originally announced September 2020.

    Comments: This paper is going to be published in IEEE Globecom 2020

  37. Self-Supervised Gait Encoding with Locality-Aware Attention for Person Re-Identification

    Authors: Haocong Rao, Siqi Wang, Xiping Hu, Mingkui Tan, Huang Da, Jun Cheng, Bin Hu

    Abstract: Gait-based person re-identification (Re-ID) is valuable for safety-critical applications, and using only 3D skeleton data to extract discriminative gait features for person Re-ID is an emerging open topic. Existing methods either adopt hand-crafted features or learn gait features by traditional supervised learning paradigms. Unlike previous methods, we for the first time propose a generic gait enc…

    Submitted 21 August, 2020; originally announced August 2020.

    Comments: Accepted at IJCAI 2020 Main Track. Sole copyright holder is IJCAI. Codes are available at https://github.com/Kali-Hac/SGE-LA

    Journal ref: In IJCAI, pages 898-905, 2020

  38. arXiv:2005.14229  [pdf, other]

    cs.CV

    FCN+RL: A Fully Convolutional Network followed by Refinement Layers to Offline Handwritten Signature Segmentation

    Authors: Celso A. M. Lopes Junior, Matheus Henrique M. da Silva, Byron Leite Dantas Bezerra, Bruno Jose Torres Fernandes, Donato Impedovo

    Abstract: Although secular, handwritten signature is one of the most reliable biometric methods used by most countries. In the last ten years, the application of technology for verification of handwritten signatures has evolved strongly, including forensic aspects. Some factors, such as the complexity of the background and the small size of the region of interest - signature pixels - increase the difficulty…

    Submitted 28 May, 2020; originally announced May 2020.

    Comments: 7 pages, 6 figures, Accepted at IJCNN 2020: International Joint Conference on Neural Networks

  39. arXiv:2005.11811  [pdf, other]

    cs.CV cs.LG eess.IV

    Recognizing Families through Images with Pretrained Encoder

    Authors: Tuan-Duy H. Nguyen, Huu-Nghia H. Nguyen, Hieu Dao

    Abstract: Kinship verification and kinship retrieval are emerging tasks in computer vision. Kinship verification aims at determining whether two facial images are from related people or not, while kinship retrieval is the task of retrieving possible related facial images to a person from a gallery of images. They introduce unique challenges because of the hidden relations and features that carry inherent ch…

    Submitted 24 May, 2020; originally announced May 2020.

    Comments: Will appear as part of RFIW2020 in the Proceedings of 2020 International Conference on Automatic Face and Gesture Recognition (IEEE AMFG)

  40. arXiv:2003.10340  [pdf, other]

    physics.soc-ph cs.CY

    Entropy as a measure of attractiveness and socioeconomic complexity in Rio de Janeiro metropolitan area

    Authors: Maxime Lenormand, Horacio Samaniego, Julio C. Chaves, Vinicius F. Vieira, Moacyr A. H. B. da Silva, Alexandre G. Evsukoff

    Abstract: Defining and measuring spatial inequalities across the urban environment remains a complex and elusive task that has been facilitated by the increasing availability of large geolocated databases. In this study, we rely on a mobile phone dataset and an entropy-based metric to measure the attractiveness of a location in the Rio de Janeiro Metropolitan Area (Brazil) as the diversity of visitors' loca…

    Submitted 23 March, 2020; originally announced March 2020.

    Comments: 11 pages, 8 figures + Appendix

    Journal ref: Entropy 22, 368 (2020)

  41. arXiv:2002.11213  [pdf, other]

    cs.CL cs.SD eess.AS

    Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models

    Authors: Edresson Casanova, Arnaldo Candido Junior, Christopher Shulby, Frederico Santos de Oliveira, Lucas Rafael Stefanel Gris, Hamilton Pereira da Silva, Sandra Maria Aluisio, Moacir Antonelli Ponti

    Abstract: In this paper we present an efficient method for training models for speaker recognition using small or under-resourced datasets. This method requires less data than other SOTA (State-Of-The-Art) methods, e.g. the Angular Prototypical and GE2E loss functions, while achieving similar results to those methods. This is done using the knowledge of the reconstruction of a phoneme in the speaker's voice…

    Submitted 18 June, 2021; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: Submitted to BRACIS