
Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection

Authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Hyeong Seok Kim, Juneho Yi

Abstract: In unsupervised anomaly detection (UAD) research, while state-of-the-art models have reached a saturation point with extensive studies on public benchmark datasets, they adopt large-scale tailor-made neural networks (NN) for detection performance or pursue unified models for various tasks. Towards edge computing, it is necessary to develop a computationally efficient and scalable solution that avoids large-scale complex NNs. Motivated by this, we aim to optimize the UAD performance with minimal changes to NN settings. Thus, we revisit the reconstruction-by-inpainting approach and rethink how to improve it by analyzing its strengths and weaknesses. The strength of the SOTA methods is a single deterministic masking approach that addresses the challenges of random multiple masking, namely inference latency and output inconsistency. Nevertheless, the failure to provide a mask that completely covers anomalous regions remains a weakness. To mitigate this issue, we propose Feature Attenuation of Defective Representation (FADeR), which employs only two MLP layers that attenuate feature information of anomaly reconstruction during decoding. By leveraging FADeR, features of unseen anomaly patterns are reconstructed into seen normal patterns, reducing false alarms. Experimental results demonstrate that FADeR achieves enhanced performance compared to similar-scale NNs. Furthermore, our approach exhibits scalability in performance enhancement when integrated with other single deterministic masking methods in a plug-and-play manner.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 11 pages, 6 figures, 5 tables

arXiv:2406.09246 [pdf, other]

OpenVLA: An Open-Source Vision-Language-Action Model

Authors: Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn

Abstract: Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tune such vision-language-action (VLA) models to obtain robust, generalizable policies for visuomotor control. Yet, widespread adoption of VLAs for robotics has been challenging as 1) existing VLAs are largely closed and inaccessible to the public, and 2) prior work fails to explore methods for efficiently fine-tuning VLAs for new tasks, a key component for adoption. Addressing these challenges, we introduce OpenVLA, a 7B-parameter open-source VLA trained on a diverse collection of 970k real-world robot demonstrations. OpenVLA builds on a Llama 2 language model combined with a visual encoder that fuses pretrained features from DINOv2 and SigLIP. As a product of the added data diversity and new model components, OpenVLA demonstrates strong results for generalist manipulation, outperforming closed models such as RT-2-X (55B) by 16.5% in absolute task success rate across 29 tasks and multiple robot embodiments, with 7x fewer parameters. We further show that we can effectively fine-tune OpenVLA for new settings, with especially strong generalization results in multi-task environments involving multiple objects and strong language grounding abilities, and outperform expressive from-scratch imitation learning methods such as Diffusion Policy by 20.4%. We also explore compute efficiency; as a separate contribution, we show that OpenVLA can be fine-tuned on consumer GPUs via modern low-rank adaptation methods and served efficiently via quantization without a hit to downstream success rate. Finally, we release model checkpoints, fine-tuning notebooks, and our PyTorch codebase with built-in support for training VLAs at scale on Open X-Embodiment datasets.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Website: https://openvla.github.io/

arXiv:2405.02499 [pdf, other]

DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands

Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

Abstract: The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerabilities. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses this gap by presenting more rigorous findings on the microarchitectures of commodity DRAM chips and their impacts on the characteristics of activate-induced bitflips (AIBs), such as RowHammer and RowPress. Previous studies have also attempted to understand the DRAM microarchitectures and associated behaviors, but we have found some of their results to be misled by inaccurate address mapping and internal data swizzling, or by the lack of a deeper understanding of the modern DRAM cell structure. For accurate and efficient reverse-engineering, we use three tools: AIBs, retention time test, and RowCopy, which can be cross-validated. With these three tools, we first take a macroscopic view of modern DRAM chips to uncover the size, structure, and operation of their subarrays, memory array tiles (MATs), and rows. Then, we analyze AIB characteristics based on the microscopic view of the DRAM microarchitecture, such as 6F^2 cell layout, through which we rectify misunderstandings regarding AIBs and discover a new data pattern that accelerates AIBs. Lastly, based on our findings at both macroscopic and microscopic levels, we identify previously unknown AIB vulnerabilities and propose a simple yet effective protection solution.

Submitted 3 May, 2024; originally announced May 2024.

Comments: To appear at the 51st IEEE/ACM International Symposium on Computer Architecture (ISCA)

arXiv:2403.08302 [pdf, other]

Online Multi-Contact Feedback Model Predictive Control for Interactive Robotic Tasks

Authors: Seo Wook Han, Maged Iskandar, Jinoh Lee, Min Jun Kim

Abstract: In this paper, we propose a model predictive control (MPC) that accomplishes interactive robotic tasks, in which multiple contacts may occur at unknown locations. To address such scenarios, we build an explicit contact feedback loop into the MPC framework. An algorithm called Multi-Contact Particle Filter with Exploration Particle (MCP-EP) is employed to establish real-time feedback of multi-contact information. Then the interaction locations and forces are accommodated in the MPC framework via a spring contact model. Moreover, we achieved real-time control for a 7-degree-of-freedom robot without any simplifying assumptions by employing a Differential-Dynamic-Programming algorithm. We achieved 6.8kHz, 1.9kHz, and 1.8kHz update rates of the MPC for 0, 1, and 2 contacts, respectively. This allows the robot to handle unexpected contacts in real time. Real-world experiments show the effectiveness of the proposed method in various scenarios.

Submitted 13 March, 2024; originally announced March 2024.

Comments: This paper has been accepted for publication at the IEEE International Conference on Robotics and Automation (ICRA), Yokohama, 2024

arXiv:2403.08187 [pdf, other]

Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children

Authors: Taekyung Ahn, Yeonjung Hong, Younggon Im, Do Hyung Kim, Dayoung Kang, Joo Won Jeong, Jae Won Kim, Min Jung Kim, Ah-ra Cho, Dae-Hyun Jang, Hosung Nam

Abstract: This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily map input speech to real words, employing a well-known high-performance ASR model for evaluating pronunciation in children with SSDs is impractical. We fine-tuned the wav2vec 2.0 XLS-R model to recognize speech as pronounced rather than as existing words. The model was fine-tuned with a speech dataset from 137 children with inadequate speech production pronouncing 73 Korean words selected for actual clinical diagnosis. The model's predictions of the pronunciations of the words matched the human annotations with about 90% accuracy. While the model still requires improvement in recognizing unclear pronunciation, this study demonstrates that ASR models can streamline complex pronunciation error diagnostic procedures in clinical fields.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 12 pages, 2 figures

ACM Class: I.2.7

arXiv:2402.16785 [pdf, other]

CARTE: Pretraining and Transfer for Tabular Learning

Authors: Myung Jun Kim, Léo Grinsztajn, Gaël Varoquaux

Abstract: Pretrained deep-learning models are the go-to solution for images or text. However, for tabular data the standard is still to train tree-based models. Indeed, transfer learning on tables hits the challenge of data integration: finding correspondences, correspondences in the entries (entity matching) where different words may denote the same entity, correspondences across columns (schema matching), which may come in different orders, names... We propose a neural architecture that does not need such correspondences. As a result, we can pretrain it on background data that has not been matched. The architecture -- CARTE for Context Aware Representation of Table Entries -- uses a graph representation of tabular (or relational) data to process tables with different columns, string embedding of entries and column names to model an open vocabulary, and a graph-attentional network to contextualize entries with column names and neighboring entries. An extensive benchmark shows that CARTE facilitates learning, outperforming a solid set of baselines including the best tree-based models. CARTE also enables joint learning across tables with unmatched columns, enhancing a small table with bigger ones. CARTE opens the door to large pretrained models for tabular data.

Submitted 31 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2312.09634 [pdf, other]

Vectorizing string entries for data processing on tables: when are larger language models better?

Authors: Léo Grinsztajn, Edouard Oyallon, Myung Jun Kim, Gaël Varoquaux

Abstract: There are increasingly efficient data processing pipelines that work on vectors of numbers, for instance most machine learning models, or vector databases for fast similarity search. These require converting the data to numbers. While this conversion is easy for simple numerical and categorical entries, databases are rife with text entries, such as names or descriptions. In the age of large language models, what are the best strategies to vectorize table entries, bearing in mind that larger models entail more operational complexity? We study the benefits of language models in 14 analytical tasks on tables while varying the training size, as well as for a fuzzy join benchmark. We introduce a simple characterization of a column that reveals two settings: 1) a dirty categories setting, where strings share many similarities across entries, and conversely 2) a diverse entries setting. For dirty categories, pretrained language models bring little-to-no benefit compared to simpler string models. For diverse entries, we show that larger language models improve data processing. For these we investigate the complexity-performance tradeoffs and show that they reflect those of classic text embedding: larger models tend to perform better, but it is useful to fine-tune them for embedding purposes.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2310.08864 [pdf, other]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.

Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Project website: https://robotics-transformer-x.github.io

arXiv:2310.04044 [pdf, other]

Graph-based 3D Collision-distance Estimation Network with Probabilistic Graph Rewiring

Authors: Minjae Song, Yeseung Kim, Min Jun Kim, Daehyung Park

Abstract: We aim to solve the problem of data-driven collision-distance estimation given 3-dimensional (3D) geometries. Conventional algorithms suffer from low accuracy due to their reliance on limited representations, such as point clouds. In contrast, our previous graph-based model, GraphDistNet, achieves high accuracy using edge information but incurs higher message-passing costs with growing graph size, limiting its applicability to 3D geometries. To overcome these challenges, we propose GDN-R, a novel 3D graph-based estimation network. GDN-R employs a layer-wise probabilistic graph-rewiring algorithm leveraging the differentiable Gumbel-top-K relaxation. Our method accurately infers minimum distances through iterative graph rewiring and updating relevant embeddings. The probabilistic rewiring enables fast and robust embedding with respect to unforeseen categories of geometries. Through 41,412 random benchmark tasks with 150 pairs of 3D objects, we show GDN-R outperforms state-of-the-art baseline methods in terms of accuracy and generalizability. We also show that the proposed rewiring improves the update performance, reducing the size of the estimation model. We finally show its batch prediction and auto-differentiation capabilities for trajectory optimization in both simulated and real-world scenarios.

Submitted 10 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: 7 pages, 6 figures
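
The abstract above mentions a Gumbel-top-K relaxation for probabilistic graph rewiring. As a rough illustration only, the sketch below shows the plain (non-differentiable) Gumbel-top-k sampling trick that such relaxations build on: adding independent Gumbel noise to scores and keeping the k largest yields k distinct indices sampled without replacement. The function name `gumbel_top_k`, the `edge_scores` values, and the idea of scoring candidate edges are assumptions made for this example; they are not taken from the GDN-R paper or its code.

```python
import math
import random


def gumbel_top_k(scores, k, rng):
    """Pick k distinct indices by perturbing scores with Gumbel(0, 1) noise.

    This is the standard sampling trick, not the paper's differentiable
    relaxation; it only illustrates the underlying idea.
    """
    perturbed = []
    for index, score in enumerate(scores):
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() is in [0, 1)
            u = rng.random()
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((score + gumbel_noise, index))
    # Sort by perturbed score, largest first, and keep the top k indices.
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]


# Hypothetical unnormalized scores for five candidate edges.
edge_scores = [2.0, 0.5, -1.0, 1.5, 0.0]
chosen = gumbel_top_k(edge_scores, k=2, rng=random.Random(0))
all_indices = gumbel_top_k(edge_scores, k=5, rng=random.Random(1))
```

Higher-scoring candidates are selected more often, but every candidate keeps a nonzero chance, which is what makes the selection probabilistic rather than a fixed top-k cut.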

arXiv:2310.04010 [pdf, other]

Excision And Recovery: Visual Defect Obfuscation Based Self-Supervised Anomaly Detection Strategy

Authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Yeonho Lee, Hyeong Seok Kim, Juneho Yi

Abstract: Due to the scarcity of anomaly situations in the early manufacturing stage, an unsupervised anomaly detection (UAD) approach is widely adopted which only uses normal samples for training. This approach is based on the assumption that the trained UAD model will accurately reconstruct normal patterns but struggle with unseen anomalous patterns. To enhance the UAD performance, reconstruction-by-inpainting based methods have recently been investigated, especially on the masking strategy of suspected defective regions. However, there are still issues to overcome: 1) time-consuming inference due to multiple masking, 2) output inconsistency by random masking strategy, and 3) inaccurate reconstruction of normal patterns when the masked area is large. Motivated by this, we propose a novel reconstruction-by-inpainting method, dubbed Excision And Recovery (EAR), that features single deterministic masking based on the ImageNet pre-trained DINO-ViT and visual obfuscation for hint-providing. Experimental results on the MVTec AD dataset show that deterministic masking by pre-trained attention effectively cuts out suspected defective regions and resolves the aforementioned issues 1 and 2. Also, hint-providing by mosaicing enhances the UAD performance more than emptying those regions by binary masking, thereby overcoming issue 3. Our approach achieves a high UAD performance without any change of the neural network structure. Thus, we suggest that EAR be adopted in various manufacturing industries as a practically deployable solution.

Submitted 9 November, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: 10 pages, 5 figures, 5 tables
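
The abstract above contrasts "hint-providing by mosaicing" with emptying regions by binary masking. The following is a minimal sketch of that general idea, assuming a generic block-average pixelation: each masked pixel is replaced by the mean of its tile, so coarse information survives instead of being zeroed out. The function `mosaic_masked`, the tile size, and the toy image are hypothetical and are not taken from the EAR paper, whose exact mosaic procedure the abstract does not specify.

```python
def mosaic_masked(image, mask, block=2):
    """Replace each masked pixel with the mean of its block x block tile.

    image: 2D list of floats; mask: 2D list of 0/1 with the same shape.
    Unmasked pixels are returned unchanged.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input stays untouched
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            tile = [image[r][c] for r in rows for c in cols]
            tile_mean = sum(tile) / len(tile)
            for r in rows:
                for c in cols:
                    if mask[r][c]:
                        out[r][c] = tile_mean
    return out


# Toy 4x4 grayscale image; the top-left 2x2 tile is the suspected region.
image = [[0.0, 2.0, 4.0, 4.0],
         [2.0, 4.0, 4.0, 4.0],
         [1.0, 1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0, 1.0]]
mask = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
result = mosaic_masked(image, mask, block=2)
```

Here the masked tile collapses to its mean value of 2.0, whereas a binary mask would set those pixels to zero and discard the hint entirely.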

arXiv:2308.14595 [pdf, other]

Neural Network Training Strategy to Enhance Anomaly Detection Performance: A Perspective on Reconstruction Loss Amplification

Authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Hyeonho Jeong, Hyunkyu Park, Hyeong Seok Kim, Juneho Yi

Abstract: Unsupervised anomaly detection (UAD) is a widely adopted approach in industry due to rare anomaly occurrences and data imbalance. A desirable characteristic of a UAD model is contained generalization ability, which excels in the reconstruction of seen normal patterns but struggles with unseen anomalies. Recent studies have sought to contain the generalization capability of their UAD models in reconstruction from different perspectives, such as design of neural network (NN) structure and training strategy. In contrast, we note that containment of generalization ability in reconstruction can also be obtained simply from a steep-shaped loss landscape. Motivated by this, we propose a loss landscape sharpening method by amplifying the reconstruction loss, dubbed Loss AMPlification (LAMP). LAMP deforms the loss landscape into a steep shape so the reconstruction error on unseen anomalies becomes greater. Accordingly, the anomaly detection performance is improved without any change of the NN architecture. Our findings suggest that LAMP can be easily applied to any reconstruction error metrics in UAD settings where the reconstruction model is trained with anomaly-free samples only.

Submitted 28 August, 2023; originally announced August 2023.

Comments: 5 pages, 4 figures, 2 tables

arXiv:2308.12952 [pdf, other]

BridgeData V2: A Dataset for Robot Learning at Scale

Authors: Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, Sergey Levine

Abstract: We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains, and institutions, making the dataset a useful resource for a broad range of researchers. Additionally, the dataset is compatible with a wide variety of open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions. In our experiments, we train 6 state-of-the-art imitation learning and offline reinforcement learning methods on our dataset, and find that they succeed on a suite of tasks requiring varying amounts of generalization. We also demonstrate that the performance of these methods improves with more data and higher capacity models, and that training on a greater variety of skills leads to improved generalization. By publicly sharing BridgeData V2 and our pre-trained models, we aim to accelerate research in scalable robot learning methods. Project page at https://rail-berkeley.github.io/bridgedata

Submitted 17 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: 9 pages

arXiv:2307.05959 [pdf, other]

Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations

Authors: Moo Jin Kim, Jiajun Wu, Chelsea Finn

Abstract: Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation. However, for robotic imitation, it is still expensive to have a human teleoperator collect large amounts of expert demonstrations with a real robot. Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation and can be quickly captured in a wide range of scenarios. Therefore, human video demonstrations are a promising data source for learning generalizable robotic manipulation policies at scale. In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies. Although a clear visual domain gap exists between human and robot data, our framework does not need to employ any explicit domain adaptation method, as we leverage the partial observability of eye-in-hand cameras as well as a simple fixed image masking scheme. On a suite of eight real-world tasks involving both 3-DoF and 6-DoF robot arm control, our method improves the success rates of eye-in-hand manipulation policies by 58% (absolute) on average, enabling robots to generalize to both new environment configurations and new tasks that are unseen in the robot demonstration data. See video results at https://giving-robots-a-hand.github.io/.

Submitted 12 July, 2023; originally announced July 2023.

Comments: 21 pages, 7 figures, project webpage at https://giving-robots-a-hand.github.io/

arXiv:2307.04298 [pdf, other]

Edge Storage Management Recipe with Zero-Shot Data Compression for Road Anomaly Detection

Authors: YeongHyeon Park, Uju Gim, Myung Jin Kim

Abstract: Recent studies show edge computing-based road anomaly detection systems which may also conduct data collection simultaneously. However, edge computers have small data storage, while the collected audio samples need to be stored for a long time in order to update existing models or develop a novel method. Therefore, we should consider efficient storage management methods that preserve high-fidelity audio. A hardware-perspective approach, such as using a low-resolution microphone, is an intuitive way to reduce file size but is not recommended because it fundamentally cuts off high-frequency components. On the other hand, a computational file compression approach that encodes collected high-resolution audio into a compact code should be recommended because it also provides a corresponding decoding method. Motivated by this, we propose a simple yet effective pre-trained autoencoder-based data compression method. The pre-trained autoencoder is trained for the purpose of audio super-resolution so it can be utilized to encode or decode any arbitrary sampling rate. Moreover, it will reduce the communication cost for data transmission from the edge to the central server. Via the comparative experiments, we confirm that the zero-shot audio compression and decompression highly preserve anomaly detection performance while enhancing storage and transmission efficiency.

Submitted 26 August, 2023; v1 submitted 9 July, 2023; originally announced July 2023.

Comments: 5 pages, 3 figures, 4 tables

arXiv:2306.03366 [pdf, other]

doi 10.1109/LCA.2023.3296153

X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands

Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

Abstract: The demand for accurate information about the internal structure and characteristics of dynamic random-access memory (DRAM) has been on the rise. Recent studies have explored the structure and characteristics of DRAM to improve processing in memory, enhance reliability, and mitigate a vulnerability known as rowhammer. However, DRAM manufacturers only disclose limited information through official documents, making it difficult to find specific information about actual DRAM devices. This paper presents reliable findings on the internal structure and characteristics of DRAM using activate-induced bitflips (AIBs), retention time test, and row-copy operation. While previous studies have attempted to understand the internal behaviors of DRAM devices, they have only shown results without identifying the causes or have analyzed DRAM modules rather than individual chips. We first uncover the size, structure, and operation of DRAM subarrays and verify our findings on the characteristics of DRAM. Then, we correct misunderstood information related to AIBs and demonstrate experimental results supporting the cause of rowhammer. We expect that the information we uncover about the structure, behavior, and characteristics of DRAM will help future DRAM research.

Submitted 12 August, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 4 pages, 7 figures, accepted at IEEE Computer Architecture Letters

arXiv:2305.16466 [pdf, other]

Hierarchical Whole-body Control of the cable-Suspended Aerial Manipulator endowed with Winch-based Actuation

Authors: Yuri Sarkisov, Andre Coelho, Maihara Santos, Min Jun Kim, Dzmitry Tsetserukou, Christian Ott, Konstantin Kondak

Abstract: During operation, aerial manipulation systems are affected by various disturbances. Among them is a gravitational torque caused by the weight of the robotic arm. Common propeller-based actuation is ineffective against such disturbances because of possible overheating and high power consumption. To overcome this issue, in this paper we propose a winch-based actuation for the crane-stationed cable-suspended aerial manipulator. Three winch-controlled suspension rigging cables produce a desired cable tension distribution to generate a wrench that reduces the effect of gravitational torque. In order to coordinate the robotic arm and the winch-based actuation, a model-based hierarchical whole-body controller is adapted. It resolves two tasks: keeping the robotic arm end-effector at the desired pose and shifting the system center of mass to the location with zero gravitational torque. The performance of the introduced actuation system as well as control strategy is validated through experimental studies.

Submitted 25 May, 2023; originally announced May 2023.

Comments: accepted to IEEE International Conference on Robotics and Automation (ICRA) 2023

arXiv:2305.07844 [pdf, other]

Thompson Sampling for Parameterized Markov Decision Processes with Uninformative Actions

Authors: Michael Gimelfarb, Michael Jong Kim

Abstract: We study parameterized MDPs (PMDPs) in which the key parameters of interest are unknown and must be learned using Bayesian inference. One key defining feature of such models is the presence of "uninformative" actions that provide no information about the unknown parameters. We contribute a set of assumptions for PMDPs under which Thompson sampling guarantees an asymptotically optimal expected regret bound of $O(T^{-1})$, which are easily verified for many classes of problems such as queuing, inventory control, and dynamic pricing.

Submitted 13 May, 2023; originally announced May 2023.

arXiv:2303.10567 [pdf, other]

doi 10.1109/ICRA48891.2023.10160334

Passivity-based Decentralized Control for Collaborative Grasping of Under-Actuated Aerial Manipulators

Authors: Jinyeong Jeong, Min Jun Kim

Abstract: This paper proposes a decentralized passive impedance control scheme for collaborative grasping using under-actuated aerial manipulators (AMs). The AM system is formulated, using a proper coordinate transformation, as an inertially decoupled dynamics with which a passivity-based control design is conducted. Since the interaction for grasping can be interpreted as a feedback interconnection of passive systems, an arbitrary number of AMs can be modularly combined, leading to a decentralized control scheme. Another interesting consequence of the passivity property is that the AMs automatically converge to a certain configuration to accomplish the grasping. Collaborative grasping using 10 AMs is presented in simulation.

Submitted 19 March, 2023; originally announced March 2023.

Comments: IEEE International Conference on Robotics and Automation (ICRA) 2023

arXiv:2303.03903 [pdf, other]

doi 10.1109/ICRA48891.2023.10161173

Proprioceptive Sensor-Based Simultaneous Multi-Contact Point Localization and Force Identification for Robotic Arms

Authors: Seo Wook Han, Min Jun Kim

Abstract: In this paper, we propose an algorithm that estimates contact point and force simultaneously. We consider a collaborative robot equipped with proprioceptive sensors, in particular, joint torque sensors (JTSs) and a base force/torque (F/T) sensor. The proposed method has the following advantages. First, fast computation is achieved by proper preprocessing of robot meshes. Second, multi-contact can be identified with the aid of the base F/T sensor, while this is challenging when the robot is equipped with only JTSs. The proposed method is a modification of the standard particle filter to cope with mesh preprocessing and with available sensor data. In simulation validation, for a 7 degree-of-freedom robot, the algorithm runs at 2200Hz with 99.96% success rate for the single-contact case. In terms of the run-time, the proposed method was >=3.5X faster compared to the existing methods. Dual and triple contacts are also reported in the manuscript.

Submitted 7 March, 2023; originally announced March 2023.

Comments: 2023 International Conference on Robotics and Automation (ICRA)

arXiv:2303.03825 [pdf, other]

doi 10.1109/ICRA48891.2023.10160294

A Reachability Tree-Based Algorithm for Robot Task and Motion Planning

Authors: Kanghyun Kim, Daehyung Park, Min Jun Kim

Abstract: This paper presents a novel algorithm for robot task and motion planning (TAMP) problems by utilizing a reachability tree. While tree-based algorithms are known for their speed and simplicity in motion planning (MP), they are not well-suited for TAMP problems that involve both abstracted and geometrical state variables. To address this challenge, we propose a hierarchical sampling strategy, which first generates an abstracted task plan using Monte Carlo tree search (MCTS) and then fills in the details with a geometrically feasible motion trajectory. Moreover, we show that the performance of the proposed method can be significantly enhanced by selecting an appropriate reward for MCTS and by using a pre-generated goal state that is guaranteed to be geometrically feasible. A comparative study using TAMP benchmark problems demonstrates the effectiveness of the proposed approach.

Submitted 7 March, 2023; originally announced March 2023.

Comments: IEEE International Conference on Robotics and Automation (ICRA) 2023

arXiv:2301.08556 [pdf, other]

NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis

Authors: Allan Zhou, Moo Jin Kim, Lirui Wang, Pete Florence, Chelsea Finn

Abstract: Expert demonstrations are a rich source of supervision for training visual robotic manipulation policies, but imitation learning methods often require either a large number of demonstrations or expensive online expert supervision to learn reactive closed-loop behaviors. In this work, we introduce SPARTN (Synthetic Perturbations for Augmenting Robot Trajectories via NeRF): a fully-offline data augmentation scheme for improving robot policies that use eye-in-hand cameras. Our approach leverages neural radiance fields (NeRFs) to synthetically inject corrective noise into visual demonstrations, using NeRFs to generate perturbed viewpoints while simultaneously calculating the corrective actions. This requires no additional expert supervision or environment interaction, and distills the geometric information in NeRFs into a real-time reactive RGB-only policy. In a simulated 6-DoF visual grasping benchmark, SPARTN improves success rates by 2.8$\times$ over imitation learning without the corrective augmentations and even outperforms some methods that use online supervision. It additionally closes the gap between RGB-only and RGB-D success rates, eliminating the previous need for depth sensors. In real-world 6-DoF robotic grasping experiments from limited human demonstrations, our method improves absolute success rates by $22.5\%$ on average, including objects that are traditionally challenging for depth-based methods. See video results at https://bland.website/spartn.

Submitted 18 January, 2023; originally announced January 2023.

arXiv:2211.01705 [pdf]

A speech corpus for chronic kidney disease

Authors: Jihyun Mun, Sunhee Kim, Myeong Ju Kim, Jiwon Ryu, Sejoong Kim, Minhwa Chung

Abstract: In this study, we present a speech corpus of patients with chronic kidney disease (CKD) that will be used for research on pathological voice analysis, automatic illness identification, and severity prediction. This paper introduces the steps involved in creating this corpus, including the choice of speech-related parameters and speech lists as well as the recording technique. The speakers in this corpus, 289 CKD patients with varying degrees of severity who were categorized based on estimated glomerular filtration rate (eGFR), delivered sustained vowels, sentence, and paragraph stimuli. This study compared and analyzed the voice characteristics of CKD patients with those of the control group; the results revealed differences in voice quality, phoneme-level pronunciation, prosody, glottal source, and aerodynamic parameters.

Submitted 3 November, 2022; originally announced November 2022.

arXiv:2210.11068 [pdf, other]

Frequency of Interest-based Noise Attenuation Method to Improve Anomaly Detection Performance

Authors: YeongHyeon Park, Myung Jin Kim, Won Seok Park

Abstract: Accurately extracting driving events is the way to maximize computational efficiency and anomaly detection performance in the tire frictional noise-based anomaly detection task. This study proposes a concise and highly useful method for improving the precision of the event extraction that is hindered by extra noise such as wind noise, which is difficult to characterize clearly due to its randomness. The core of the proposed method is based on the identification of the road friction sound corresponding to the frequency of interest and removing the opposite characteristics with several frequency filters. Our method enables precision maximization of driving event extraction while improving anomaly detection performance by an average of 8.506%. Therefore, we conclude our method is a practical solution suitable for road surface anomaly detection purposes in outdoor edge computing environments.

Submitted 2 December, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

Comments: 5 pages, 4 figures, 4 tables

arXiv:2203.12677 [pdf, other]

Vision-Based Manipulators Need to Also See from Their Hands

Authors: Kyle Hsu, Moo Jin Kim, Rafael Rafailov, Jiajun Wu, Chelsea Finn

Abstract: We study how the choice of visual perspective affects learning and generalization in the context of physical manipulation from raw sensor observations. Compared with the more commonly used global third-person perspective, a hand-centric (eye-in-hand) perspective affords reduced observability, but we find that it consistently improves training efficiency and out-of-distribution generalization. These benefits hold across a variety of learning algorithms, experimental settings, and distribution shifts, and for both simulated and real robot apparatuses. However, this is only the case when hand-centric observability is sufficient; otherwise, including a third-person perspective is necessary for learning, but also harms out-of-distribution generalization. To mitigate this, we propose to regularize the third-person information stream via a variational information bottleneck. On six representative manipulation tasks with varying hand-centric observability adapted from the Meta-World benchmark, this results in a state-of-the-art reinforcement learning agent operating from both perspectives improving its out-of-distribution generalization on every task. While some practitioners have long put cameras in the hands of robots, our work systematically analyzes the benefits of doing so and provides simple and broadly applicable insights for improving end-to-end learned vision-based robotic manipulation.

Submitted 15 March, 2022; originally announced March 2022.

Comments: First two authors contributed equally. ICLR 2022 (oral) camera-ready. 30 pages, 20 figures. Project website: https://sites.google.com/view/seeing-from-hands

arXiv:2201.06699 [pdf, other]

AESPA: Accuracy Preserving Low-degree Polynomial Activation for Fast Private Inference

Authors: Jaiyoung Park, Michael Jaemin Kim, Wonkyung Jung, Jung Ho Ahn

Abstract: Hybrid private inference (PI) protocol, which synergistically utilizes both multi-party computation (MPC) and homomorphic encryption, is one of the most prominent techniques for PI. However, even the state-of-the-art PI protocols are bottlenecked by the non-linear layers, especially the activation functions. Although a standard non-linear activation function can generate higher model accuracy, it must be processed via a costly garbled-circuit MPC primitive. A polynomial activation can be processed via Beaver's multiplication triples MPC primitive but has been incurring severe accuracy drops so far. In this paper, we propose an accuracy preserving low-degree polynomial activation function (AESPA) that exploits the Hermite expansion of the ReLU and basis-wise normalization. We apply AESPA to popular ML models, such as VGGNet, ResNet, and pre-activation ResNet, to show an inference accuracy comparable to those of the standard models with ReLU activation, achieving superior accuracy over prior low-degree polynomial studies. When applied to the all-ReLU baseline on the state-of-the-art Delphi PI protocol, AESPA shows up to 42.1x and 28.3x lower online latency and communication cost.

Submitted 18 February, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

Comments: 11 pages, 5 figures

arXiv:2112.15479 [pdf, other]

doi 10.1145/3470496.3527415

BTS: An Accelerator for Bootstrappable Fully Homomorphic Encryption

Authors: Sangpyo Kim, Jongmin Kim, Michael Jaemin Kim, Wonkyung Jung, Minsoo Rhu, John Kim, Jung Ho Ahn

Abstract: Homomorphic encryption (HE) enables the secure offloading of computations to the cloud by providing computation on encrypted data (ciphertexts). HE is based on noisy encryption schemes in which noise accumulates as more computations are applied to the data. The limited number of operations applicable to the data prevents practical applications from exploiting HE. Bootstrapping enables an unlimited number of operations or fully HE (FHE) by refreshing the ciphertext. Unfortunately, bootstrapping requires a significant amount of additional computation and memory bandwidth as well. Prior works have proposed hardware accelerators for computation primitives of FHE. However, to the best of our knowledge, this is the first to propose a hardware FHE accelerator that supports bootstrapping as a first-class citizen. In particular, we propose BTS - Bootstrappable, Technology-driven, Secure accelerator architecture for FHE. We identify the challenges of supporting bootstrapping in the accelerator and analyze the off-chip memory bandwidth and computation required. In particular, given the limitations of modern memory technology, we identify the HE parameter sets that are efficient for FHE acceleration. Based on the insights gained from our analysis, we propose BTS, which effectively exploits the parallelism innate in HE operations by arranging a massive number of processing elements in a grid. We present the design and microarchitecture of BTS, including a network-on-chip design that exploits a deterministic communication pattern. BTS shows 5,556x and 1,306x improved execution time on ResNet-20 and logistic regression over a CPU, with a chip area of 373.6mm^2 and up to 163.2W of power.

Submitted 28 April, 2022; v1 submitted 31 December, 2021; originally announced December 2021.

Comments: 15 pages, 10 figures

arXiv:2112.07214 [pdf, other]

Noise Reduction and Driving Event Extraction Method for Performance Improvement on Driving Noise-based Surface Anomaly Detection

Authors: YeongHyeon Park, JoonSung Lee, Myung Jin Kim, Wonseok Park

Abstract: Foreign substances on the road surface, such as rainwater or black ice, reduce the friction between the tire and the surface. This reduces braking performance and makes it difficult to control the vehicle body posture. In that case, there is a possibility of property damage at the least. In the worst case, personal injury will occur. To avoid this problem, a road anomaly detection model has been proposed based on vehicle driving noise. However, the prior proposal does not consider the extra noise mixed with the driving noise or skipping calculations for moments without vehicle driving. In this paper, we propose a simple driving event extraction method and noise reduction method for improving computational efficiency and anomaly detection performance.

Submitted 14 December, 2021; originally announced December 2021.

Comments: 3 pages, 3 figures, 2 tables

arXiv:2108.06703 [pdf, other]

Mithril: Cooperative Row Hammer Protection on Commodity DRAM Leveraging Managed Refresh

Authors: Michael Jaemin Kim, Jaehyun Park, Yeonhong Park, Wanju Doh, Namhoon Kim, Tae Jun Ham, Jae W. Lee, Jung Ho Ahn

Abstract: Since its public introduction in the mid-2010s, the Row Hammer (RH) phenomenon has drawn significant attention from the research community due to its security implications. Although many RH-protection schemes have been proposed by processor vendors, DRAM manufacturers, and academia, they still have shortcomings. Solutions implemented in the memory controller (MC) incur increasingly higher costs due to their conservative design for the worst case in terms of the number of DRAM banks and RH threshold to support. Meanwhile, DRAM-side implementation either has a limited time margin for RH-protection measures or requires extensive modifications to the standard DRAM interface. Recently, a new command for RH-protection has been introduced in the DDR5/LPDDR5 standards, referred to as refresh management (RFM). RFM enables the separation of the tasks for RH-protection to both MC and DRAM by having the former generate an RFM command at a specific activation frequency and the latter take proper RH-protection measures within a given time window. Although promising, no existing study presents and analyzes RFM-based solutions for RH-protection. In this paper, we propose Mithril, the first RFM interface-compatible, DRAM-MC cooperative RH-protection scheme providing deterministic protection guarantees. Mithril has minimal energy overheads for common use cases without adversarial memory access patterns. We also introduce Mithril+, an optional extension to provide minimal performance overheads at the expense of a tiny modification to the MC, while utilizing existing DRAM commands.

Submitted 24 December, 2021; v1 submitted 15 August, 2021; originally announced August 2021.

Comments: 16 pages, to appear in HPCA 2022

arXiv:2107.10167 [pdf, other]

Enumeration of Polyominoes & Polycubes Composed of Magnetic Cubes

Authors: Yitong Lu, Anuruddha Bhattacharjee, Daniel Biediger, Min Jun Kim, Aaron T. Becker

Abstract: This paper examines a family of designs for magnetic cubes and counts how many configurations are possible for each design as a function of the number of modules. Magnetic modular cubes are cubes with magnets arranged on their faces. The magnets are positioned so that each face has either magnetic south or north pole outward. Moreover, we require that the net magnetic moment of the cube passes through the center of opposing faces. These magnetic arrangements enable coupling when cube faces with opposite polarity are brought in close proximity and enable moving the cubes by controlling the orientation of a global magnetic field. This paper investigates the 2D and 3D shapes that can be constructed by magnetic modular cubes, and describes all possible magnet arrangements that obey these rules. We select ten magnetic arrangements and assign a "color" to each of them for ease of visualization and reference. We provide a method to enumerate the number of unique polyominoes and polycubes that can be constructed from a given set of colored cubes. We use this method to enumerate all arrangements for up to 20 modules in 2D and 16 modules in 3D. We provide a motion planner for 2D assembly and through simulations compare which arrangements require fewer movements to generate and which arrangements are more common. Hardware demonstrations explore the self-assembly and disassembly of these modules in 2D and 3D.

Submitted 21 July, 2021; originally announced July 2021.

Comments: 8 pages, 9 figures, 2 tables

arXiv:2105.12342 [pdf, ps, other]

A data-driven approach to beating SAA out-of-sample

Authors: Jun-ya Gotoh, Michael Jong Kim, Andrew E. B. Lim

Abstract: While solutions of Distributionally Robust Optimization (DRO) problems can sometimes have a higher out-of-sample expected reward than the Sample Average Approximation (SAA), there is no guarantee. In this paper, we introduce a class of Distributionally Optimistic Optimization (DOO) models, and show that it is always possible to "beat" SAA out-of-sample if we consider not just worst-case (DRO) models but also best-case (DOO) ones. We also show, however, that this comes at a cost: Optimistic solutions are more sensitive to model error than either worst-case or SAA optimizers, and hence are less robust and calibrating the worst- or best-case model to outperform SAA may be difficult when data is limited.

Submitted 11 June, 2023; v1 submitted 26 May, 2021; originally announced May 2021.

Comments: 25 pages, 2 page bibliography, 2 Figures, 12 page Appendix

MSC Class: 90C17; 90C31; 93B35; 90C47; 90B50; 62G35; 62K25;

arXiv:2003.11509 [pdf, other]

Visual-Inertial Telepresence for Aerial Manipulation

Authors: Jongseok Lee, Ribin Balachandran, Yuri S. Sarkisov, Marco De Stefano, Andre Coelho, Kashmira Shinde, Min Jun Kim, Rudolph Triebel, Konstantin Kondak

Abstract: This paper presents a novel telepresence system for enhancing aerial manipulation capabilities. It involves not only a haptic device, but also a virtual reality that provides a 3D visual feedback to a remotely-located teleoperator in real-time. We achieve this by utilizing onboard visual and inertial sensors, an object tracking algorithm and a pre-generated object database. As the virtual reality has to closely match the real remote scene, we propose an extension of a marker tracking algorithm with visual-inertial odometry. Both indoor and outdoor experiments show benefits of our proposed system in achieving advanced aerial manipulation tasks, namely grasping, placing, force exertion and peg-in-hole insertion.

Submitted 20 June, 2020; v1 submitted 25 March, 2020; originally announced March 2020.

Comments: Accepted to International Conference on Robotics and Automation (ICRA) 2020, IEEE copyright, 8 pages, 10 figures

arXiv:2003.00472 [pdf, other]

Optimal Oscillation Damping Control of cable-Suspended Aerial Manipulator with a Single IMU Sensor

Authors: Yuri S. Sarkisov, Min Jun Kim, Andre Coelho, Dzmitry Tsetserukou, Christian Ott, Konstantin Kondak

Abstract: This paper presents a design of oscillation damping control for the cable-Suspended Aerial Manipulator (SAM). The SAM is modeled as a double pendulum, and it can generate a body wrench as a control action. The main challenge is the fact that there is only one onboard IMU sensor which does not provide full information on the system state. To overcome this difficulty, we design a controller motivated by a simplified SAM model. The proposed controller is very simple yet robust to model uncertainties. Moreover, we propose a gain tuning rule by formulating the proposed controller in the form of output feedback linear quadratic regulation problem. Consequently, it is possible to quickly dampen oscillations with minimal energy consumption. The proposed approach is validated through simulations and experiments.

Submitted 1 March, 2020; originally announced March 2020.

arXiv:1907.00553 [pdf, other]

Model-free Friction Observers for Flexible Joint Robots with Torque Measurements

Authors: Min Jun Kim, Fabian Beck, Christian Ott, Alin Albu-Schaeffer

Abstract: This paper tackles a friction compensation problem without using a friction model. The unique feature of the proposed friction observer is that the nominal motor-side signal is fed back into the controller instead of the measured signal. By doing so, asymptotic stability and passivity of the controller are maintained. Another advantage of the proposed observer is that it provides a clear understanding of stiction compensation, which is hard to capture in model-free approaches. This allows the design of observers that do not overcompensate for the stiction. The proposed scheme is validated through simulations and experiments.

Submitted 1 July, 2019; originally announced July 2019.

Comments: IEEE Transactions on Robotics

arXiv:1904.08833 [pdf, other]

A Passivity-based Nonlinear Admittance Control with Application to Powered Upper-limb Control under Unknown Environmental Interactions

Authors: Min Jun Kim, Woongyong Lee, Jae Yeon Choi, Goobong Chung, Kyung-Lyong Han, Il Seop Choi, Christian Ott, Wan Kyun Chung

Abstract: This paper presents an admittance controller based on the passivity theory for a powered upper-limb exoskeleton robot which is governed by the nonlinear equation of motion. Passivity allows us to include a human operator and environmental interaction in the control loop. The robot interacts with the human operator via F/T sensor and interacts with the environment mainly via end-effectors. Although the environmental interaction cannot be detected by any sensors (hence unknown), passivity allows us to have natural interaction. An analysis shows that the behavior of the actual system mimics that of a nominal model as the control gain goes to infinity, which implies that the proposed approach is an admittance controller. However, because the control gain cannot grow infinitely in practice, the performance limitation according to the achievable control gain is also analyzed. The result of this analysis indicates that the performance in the sense of infinite norm increases linearly with the control gain. In the experiments, the proposed properties were verified using 1 degree-of-freedom testbench, and an actual powered upper-limb exoskeleton was used to lift and maneuver the unknown payload.

Submitted 18 April, 2019; originally announced April 2019.

Comments: Accepted in IEEE/ASME Transactions on Mechatronics (T-MECH)

arXiv:1903.02426 [pdf, other]

Development of SAM: cable-Suspended Aerial Manipulator

Authors: Yuri S. Sarkisov, Min Jun Kim, Davide Bicego, Dzmitry Tsetserukou, Christian Ott, Antonio Franchi, Konstantin Kondak

Abstract: High risk of a collision between rotor blades and the obstacles in a complex environment imposes restrictions on the aerial manipulators. To solve this issue, a novel system cable-Suspended Aerial Manipulator (SAM) is presented in this paper. Instead of attaching a robotic manipulator directly to an aerial carrier, it is mounted on an active platform which is suspended on the carrier by means of a cable. As a result, higher safety can be achieved because the aerial carrier can keep a distance from the obstacles. For self-stabilization, the SAM is equipped with two actuation systems: winches and propulsion units. This paper presents an overview of the SAM including the concept behind, hardware realization, control strategy, and the first experimental results.

Submitted 6 March, 2019; originally announced March 2019.

Comments: Accepted to International Conference on Robotics and Automation (ICRA) 2019, IEEE copyright, 7 pages, 14 figures

arXiv:1901.07213 [pdf]

doi 10.1109/ACCESS.2019.2960371

Reducing the Model Variance of a Rectal Cancer Segmentation Network

Authors: Joohyung Lee, Ji Eun Oh, Min Ju Kim, Bo Yun Hur, Dae Kyung Sohn

Abstract: In preoperative imaging, the demarcation of rectal cancer with magnetic resonance images provides an important basis for cancer staging and treatment planning. Recently, deep learning has greatly improved the state-of-the-art method in automatic segmentation. However, limitations in data availability in the medical field can cause large variance and consequent overfitting to medical image segmentation networks. In this study, we propose methods to reduce the model variance of a rectal cancer segmentation network by adding a rectum segmentation task and performing data augmentation; the geometric correlation between the rectum and rectal cancer motivated the former approach. Moreover, we propose a method to perform a bias-variance analysis within an arbitrary region-of-interest (ROI) of a segmentation network, which we applied to assess the efficacy of our approaches in reducing model variance. As a result, adding a rectum segmentation task reduced the model variance of the rectal cancer segmentation network within tumor regions by a factor of 0.90; data augmentation further reduced the variance by a factor of 0.89. These approaches also reduced the training duration by a factor of 0.96 and a further factor of 0.78, respectively. Our approaches will improve the quality of rectal cancer staging by increasing the accuracy of its automatic demarcation and by providing rectum boundary information since rectal cancer staging requires the demarcation of both rectum and rectal cancer. Besides such clinical benefits, our method also enables segmentation networks to be assessed with bias-variance analysis within an arbitrary ROI, such as a cancerous region.

Submitted 30 December, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

Comments: published at IEEE ACCESS

Journal ref: IEEE Access, vol. 7, Issue. 1, pp. 182725-182733, 2019
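
Note on the figures quoted in the abstract above: the following is a small arithmetic sketch, assuming each reported "factor" is a multiplicative ratio of the new value to the previous one (the abstract does not state this explicitly). Under that assumption, the two steps compound as shown. The variable names are illustrative only.

```python
import math

# Factors quoted in the abstract; treating them as multiplicative
# ratios (new / old) is an assumption, not something the abstract states.
variance_factors = [0.90, 0.89]   # rectum task, then data augmentation
training_factors = [0.96, 0.78]   # same two steps, training duration

# Compounded effect of applying both steps in sequence.
combined_variance = math.prod(variance_factors)
combined_training = math.prod(training_factors)

print(f"variance ratio after both steps: {combined_variance:.3f}")
print(f"training-time ratio after both steps: {combined_training:.4f}")
```

Read this way, both steps together would leave roughly 80% of the original model variance and roughly 75% of the original training duration.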

arXiv:1812.05015 [pdf, ps, other]

McNie2-Gabidulin: An improvement of McNie public key encryption using Gabidulin code

Authors: Jon-Lark Kim, Young-Sik Kim, Lucky Galvez, Myeong Jae Kim

Abstract: McNie is a code-based public key encryption scheme submitted as a candidate to the NIST Post-Quantum Cryptography standardization. In this paper, we present McNie2-Gabidulin, an improvement of McNie. By using Gabidulin code, we eliminate the decoding failure, which is one of the limitations of the McNie public key cryptosystem that uses LRPC codes. We prove that this new cryptosystem is IND-CPA secure. Suggested parameters are also given which provides low key sizes compared to other known code based cryptosystems with zero decryption failure probability.

Submitted 12 December, 2018; originally announced December 2018.

MSC Class: 94B05; 94A60

arXiv:1812.05008 [pdf, ps, other]

McNie: A code-based public-key cryptosystem

Authors: Jon-Lark Kim, Young-Sik Kim, Lucky Galvez, Myeong Jae Kim, Nari Lee

Abstract: In this paper, we suggest a code-based public key encryption scheme, called McNie. McNie is a hybrid version of the McEliece and Niederreiter cryptosystems and its security is reduced to the hard problem of syndrome decoding. The public key involves a random generator matrix which is also used to mask the code used in the secret key. This makes the system safer against known structural attacks. In particular, we apply rank-metric codes to McNie.

Submitted 26 January, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

MSC Class: 94B05; 94A60

arXiv:1809.08819 [pdf, other]

Oscillation Damping Control of Pendulum-like Manipulation Platform using Moving Masses

Authors: Min Jun Kim, Jianjie Lin, Konstantin Kondak, Dongheui Lee, Christian Ott

Abstract: This paper presents an approach to damp out the oscillatory motion of the pendulum-like hanging platform on which a robotic manipulator is mounted. To this end, moving masses were installed on top of the platform. In this paper, asymptotic stability of the platform (which implies oscillation damping) is achieved by designing reference acceleration of the moving masses properly. A main feature of this work is that we can achieve asymptotic stability of not only the platform, but also the moving masses, which may be challenging due to the under-actuation nature. The proposed scheme is validated by the simulation studies.

Submitted 24 September, 2018; originally announced September 2018.

Comments: IFAC Symposium on Robot Control (SYROCO) 2018

arXiv:1808.03037 [pdf, other]

Passive Compliance Control of Aerial Manipulators

Authors: Min Jun Kim, Ribin Balachandran, Marco De Stefano, Konstantin Kondak, Christian Ott

Abstract: This paper presents a passive compliance control for aerial manipulators to achieve stable environmental interactions. The main challenge is the absence of actuation along body-planar directions of the aerial vehicle which might be required during the interaction to preserve passivity. The controller proposed in this paper guarantees passivity of the manipulator through a proper choice of end-effector coordinates, and that of vehicle fuselage is guaranteed by exploiting time domain passivity technique. Simulation studies validate the proposed approach.

Submitted 9 August, 2018; originally announced August 2018.

Comments: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018

arXiv:1207.3127 [pdf, other]

Tracking Tetrahymena Pyriformis Cells using Decision Trees

Authors: Quan Wang, Yan Ou, A. Agung Julius, Kim L. Boyer, Min Jun Kim

Abstract: Matching cells over time has long been the most difficult step in cell tracking. In this paper, we approach this problem by recasting it as a classification problem. We construct a feature set for each cell, and compute a feature difference vector between a cell in the current frame and a cell in a previous frame. Then we determine whether the two cells represent the same cell over time by training decision trees as our binary classifiers. With the output of decision trees, we are able to formulate an assignment problem for our cell association task and solve it using a modified version of the Hungarian algorithm.

Submitted 12 July, 2012; originally announced July 2012.

Comments: 21st International Conference on Pattern Recognition, 2012
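
Illustration of the association step described in the abstract above: a minimal sketch, not the authors' implementation. The function `same_cell_score` is a hand-written, hypothetical stand-in for the trained decision-tree classifiers, the cell features (`x`, `y`, `area`) are assumed for the example, and an exhaustive search over permutations stands in for the modified Hungarian algorithm (practical only for a handful of cells).

```python
from itertools import permutations

def same_cell_score(prev, curr):
    """Hypothetical stand-in for the decision-tree output.

    Builds a feature difference vector between a cell in the previous
    frame and a cell in the current frame, then maps it to a score in
    (0, 1]; higher means "more likely the same cell".
    """
    dx = curr["x"] - prev["x"]
    dy = curr["y"] - prev["y"]
    d_area = curr["area"] - prev["area"]
    distance = (dx ** 2 + dy ** 2) ** 0.5
    return 1.0 / (1.0 + distance + 0.1 * abs(d_area))

def associate(prev_cells, curr_cells):
    """Return matching[i] = index of the current cell assigned to prev cell i.

    Brute-force solution of the assignment problem: try every
    one-to-one matching and keep the one with the highest total score.
    """
    if len(prev_cells) != len(curr_cells):
        raise ValueError("this sketch assumes equal cell counts in both frames")
    best, best_total = None, float("-inf")
    for perm in permutations(range(len(curr_cells))):
        total = sum(
            same_cell_score(prev_cells[i], curr_cells[j])
            for i, j in enumerate(perm)
        )
        if total > best_total:
            best, best_total = perm, total
    return list(best)

# Toy data: three cells that each move by about one pixel between frames,
# listed in a different order in the current frame.
prev_cells = [
    {"x": 0, "y": 0, "area": 50},
    {"x": 10, "y": 10, "area": 80},
    {"x": 20, "y": 0, "area": 60},
]
curr_cells = [
    {"x": 21, "y": 1, "area": 61},
    {"x": 1, "y": 0, "area": 50},
    {"x": 10, "y": 11, "area": 79},
]

matching = associate(prev_cells, curr_cells)
print(matching)  # [1, 2, 0]
```

In a real pipeline the score would come from a trained classifier and the search would use a polynomial-time assignment solver; the sketch only shows how pairwise "same cell" scores feed a one-to-one matching.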

Showing 1–41 of 41 results for author: Kim, M J