-
Extraction of Maternal and fetal ECG in a non-invasive way from abdominal ECG recordings using modified Progressive FastICA Peel-off
Authors:
Yao Li,
Xuanyu Luo,
Haowen Zhao,
Jiawen Cui,
Yangfan She,
Dongfang Li,
Lai Jiang,
Xu Zhang
Abstract:
The abdominal electrocardiogram (AECG) offers a non-invasive way to monitor fetal well-being during pregnancy. Due to the overlap with maternal ECG (MECG) as well as potential noise from other sources, it is challenging to extract weak fetal ECG (FECG) using surface electrodes. This paper presents a novel framework for FECG extraction from AECG recordings. It combines the precise source separation capability of the FastICA approach and its constrained version specific to FECG, the weak source extraction capability of the peel-off strategy, and the FECG waveform reconstruction ability of the singular value decomposition (SVD) method. Specifically, a periodic constrained FastICA (pcFastICA) was developed to improve the precision of examining and correcting FECG source signals, based on the statistical characteristics of continuous and repetitive ECG emissions. Additionally, a successive judgement algorithm is designed to select the optimal maternal and fetal ECG. The performance of the proposed method was examined on public datasets, synthetic data and clinical data, with F1-scores for FECG extraction of 99.71% and 99.36% on the ADFECG and NIFECGA datasets, 98.77% on synthetic data at the highest noise level, and 98.09% on clinical data, all superior to the comparative methods. The results indicate that the proposed method can separate weak FECG from multichannel AECG with high precision under high-noise conditions, which is of vital importance for ensuring the safety of both the fetus and the mother, as well as for the advancement of artificially intelligent clinical monitoring.
Submitted 3 June, 2024;
originally announced June 2024.
-
MAGIC: Modular Auto-encoder for Generalisable Model Inversion with Bias Corrections
Authors:
Yihang She,
Clement Atzberger,
Andrew Blake,
Adriano Gualandi,
Srinivasan Keshav
Abstract:
Scientists often model physical processes to understand the natural world and uncover the causation behind observations. Due to unavoidable simplification, discrepancies often arise between model predictions and actual observations, in the form of systematic biases, whose impact varies with model completeness. Classical model inversion methods such as Bayesian inference or regressive neural networks tend either to overlook biases or make assumptions about their nature during data preprocessing, potentially leading to implausible results. Inspired by recent work in inverse graphics, we replace the decoder stage of a standard autoencoder with a physical model followed by a bias-correction layer. This generalisable approach simultaneously inverts the model and corrects its biases in an end-to-end manner without making strong assumptions about the nature of the biases. We demonstrate the effectiveness of our approach using two physical models from disparate domains: a complex radiative transfer model from remote sensing; and a volcanic deformation model from geodesy. Our method matches or surpasses results from classical approaches without requiring biases to be explicitly filtered out, suggesting an effective pathway for understanding the causation of various physical processes.
Submitted 29 May, 2024;
originally announced May 2024.
-
LeTac-MPC: Learning Model Predictive Control for Tactile-reactive Grasping
Authors:
Zhengtong Xu,
Yu She
Abstract:
Grasping is a crucial task in robotics, necessitating tactile feedback and reactive grasping adjustments for robust grasping of objects under various conditions and with differing physical properties. In this paper, we introduce LeTac-MPC, a learning-based model predictive control (MPC) for tactile-reactive grasping. Our approach enables the gripper to grasp objects with different physical properties on dynamic and force-interactive tasks. We utilize a vision-based tactile sensor, GelSight, which is capable of perceiving high-resolution tactile feedback that contains information about the physical properties and states of the grasped object. LeTac-MPC incorporates a differentiable MPC layer designed to model the embeddings extracted by a neural network (NN) from tactile feedback. This design facilitates convergent and robust grasping control at a frequency of 25 Hz. We propose a fully automated data collection pipeline and collect a dataset using only standardized blocks with different physical properties. Nevertheless, our trained controller can generalize to daily objects with different sizes, shapes, materials, and textures. Experimental results demonstrate the effectiveness and robustness of the proposed approach. We compare LeTac-MPC with two purely model-based tactile-reactive controllers (MPC and PD) and open-loop grasping. Our results show that LeTac-MPC has the best performance on dynamic and force-interactive tasks and the best generalization ability. We release our code and dataset at https://github.com/ZhengtongXu/LeTac-MPC.
Submitted 7 March, 2024;
originally announced March 2024.
-
From Spectra to Biophysical Insights: End-to-End Learning with a Biased Radiative Transfer Model
Authors:
Yihang She,
Clement Atzberger,
Andrew Blake,
Srinivasan Keshav
Abstract:
Advances in machine learning have boosted the use of Earth observation data for climate change research. Yet, the interpretability of machine-learned representations remains a challenge, particularly in understanding forests' biophysical reactions to climate change. Traditional methods in remote sensing that invert radiative transfer models (RTMs) to retrieve biophysical variables from spectral data often fail to account for biases inherent in the RTM, especially for complex forests. We propose to integrate RTMs into an auto-encoder architecture, creating an end-to-end learning approach. Our method not only corrects biases in RTMs but also outperforms traditional techniques for variable retrieval like neural network regression. Furthermore, our framework has potential generally for inverting biased physical models. The code is available on https://github.com/yihshe/ai-refined-rtm.git.
Submitted 5 March, 2024;
originally announced March 2024.
-
LeTO: Learning Constrained Visuomotor Policy with Differentiable Trajectory Optimization
Authors:
Zhengtong Xu,
Yu She
Abstract:
This paper introduces LeTO, a method for learning constrained visuomotor policy via differentiable trajectory optimization. Our approach uniquely integrates a differentiable optimization layer into the neural network. By formulating the optimization layer as a trajectory optimization problem, we enable the model to generate actions end-to-end in a safe and controlled fashion without extra modules. Our method allows constraint information to be introduced during the training process, thereby balancing the training objectives of satisfying constraints, smoothing the trajectories, and minimizing errors with demonstrations. This "gray box" method marries optimization-based safety and interpretability with the powerful representational abilities of neural networks. We quantitatively evaluate LeTO in simulation and on a real robot. In simulation, LeTO achieves a success rate comparable to state-of-the-art imitation learning methods, but the generated trajectories exhibit less uncertainty, higher quality, and greater smoothness. In real-world experiments, we deployed LeTO to handle constraint-critical tasks. The results show the effectiveness of LeTO compared with state-of-the-art imitation learning approaches. We release our code at https://github.com/ZhengtongXu/LeTO.
Submitted 18 March, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Robot Tape Manipulation for 3D Printing
Authors:
Nahid Tushar,
Rencheng Wu,
Yu She,
Wenchao Zhou,
Wan Shou
Abstract:
3D printing has enabled various applications using different forms of materials, such as filaments, sheets, and inks. Typically, during 3D printing, feedstocks are transformed into discrete building blocks and placed or deposited in a designated location similar to the manipulation and assembly of discrete objects. However, 3D printing of continuous and flexible tape (with the geometry between filaments and sheets) without breaking or transformation remains underexplored and challenging. Here, we report the design and implementation of a customized end-effector, i.e., tape print module (TPM), to realize robot tape manipulation for 3D printing by leveraging the tension formed on the tape between two endpoints. We showcase the feasibility of manufacturing representative 2D and 3D structures while utilizing conductive copper tape for various electronic applications, such as circuits and sensors. We believe this manipulation strategy could unlock the potential of other tape materials for manufacturing, including packaging tape and carbon fiber prepreg tape, and inspire new mechanisms for robot manipulation, 3D printing, and packaging.
Submitted 17 January, 2024;
originally announced January 2024.
-
Transfer Learning across Different Chemical Domains: Virtual Screening of Organic Materials with Deep Learning Models Pretrained on Small Molecule and Chemical Reaction Data
Authors:
Chengwei Zhang,
Yushuang Zhai,
Ziyang Gong,
Hongliang Duan,
Yuan-Bin She,
Yun-Fang Yang,
An Su
Abstract:
Machine learning is becoming a preferred method for the virtual screening of organic materials due to its cost-effectiveness over traditional computationally demanding techniques. However, the scarcity of labeled data for organic materials poses a significant challenge for training advanced machine learning models. This study showcases the potential of utilizing databases of drug-like small molecules and chemical reactions to pretrain the BERT model, enhancing its performance in the virtual screening of organic materials. By fine-tuning the BERT models with data from five virtual screening tasks, the version pretrained with the USPTO-SMILES dataset achieved R2 scores exceeding 0.94 for three tasks and over 0.81 for two others. This performance surpasses that of models pretrained on the small molecule or organic materials databases and outperforms three traditional machine learning models trained directly on virtual screening data. The success of the USPTO-SMILES pretrained BERT model can be attributed to the diverse array of organic building blocks in the USPTO database, offering a broader exploration of the chemical space. The study further suggests that accessing a reaction database with a wider range of reactions than the USPTO could further enhance model performance. Overall, this research validates the feasibility of applying transfer learning across different chemical domains for the efficient virtual screening of organic materials.
Submitted 5 March, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
-
Slow Kill for Big Data Learning
Authors:
Yiyuan She,
Jianhui Shen,
Adrian Barbu
Abstract:
Big-data applications often involve a vast number of observations and features, creating new challenges for variable selection and parameter estimation. This paper presents a novel technique called ``slow kill,'' which utilizes nonconvex constrained optimization, adaptive $\ell_2$-shrinkage, and increasing learning rates. The fact that the problem size can decrease during the slow kill iterations makes it particularly effective for large-scale variable screening. The interaction between statistics and optimization provides valuable insights into controlling quantiles, stepsize, and shrinkage parameters in order to relax the regularity conditions required to achieve the desired level of statistical accuracy. Experimental results on real and synthetic data show that slow kill outperforms state-of-the-art algorithms in various situations while being computationally efficient for large-scale data.
Submitted 2 May, 2023;
originally announced May 2023.
-
From Images to Features: Unbiased Morphology Classification via Variational Auto-Encoders and Domain Adaptation
Authors:
Quanfeng Xu,
Shiyin Shen,
Rafael S. de Souza,
Mi Chen,
Renhao Ye,
Yumei She,
Zhu Chen,
Emille E. O. Ishida,
Alberto Krone-Martins,
Rupesh Durgesh
Abstract:
We present a novel approach for the dimensionality reduction of galaxy images by leveraging a combination of variational auto-encoders (VAE) and domain adaptation (DA). We demonstrate the effectiveness of this approach using a sample of low redshift galaxies with detailed morphological type labels from the Galaxy-Zoo DECaLS project. We show that 40-dimensional latent variables can effectively reproduce most morphological features in galaxy images. To further validate the effectiveness of our approach, we utilised a classical random forest (RF) classifier on the 40-dimensional latent variables to make detailed morphology feature classifications. This approach performs similarly to a direct neural network application on galaxy images. We further enhance our model by tuning the VAE network via DA using galaxies in the overlapping footprint of DECaLS and BASS+MzLS, enabling the unbiased application of our model to galaxy images in both surveys. We observed that DA led to even better morphological feature extraction and classification performance. Overall, this combination of VAE and DA can be applied to achieve image dimensionality reduction, defect image identification, and morphology classification in large optical surveys.
Submitted 13 October, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Visuotactile Affordances for Cloth Manipulation with Local Control
Authors:
Neha Sunil,
Shaoxiong Wang,
Yu She,
Edward Adelson,
Alberto Rodriguez
Abstract:
Cloth in the real world is often crumpled, self-occluded, or folded in on itself such that key regions, such as corners, are not directly graspable, making manipulation difficult. We propose a system that leverages visual and tactile perception to unfold the cloth via grasping and sliding on edges. By doing so, the robot is able to grasp two adjacent corners, enabling subsequent manipulation tasks like folding or hanging. As components of this system, we develop tactile perception networks that classify whether an edge is grasped and estimate the pose of the edge. We use the edge classification network to supervise a visuotactile edge grasp affordance network that can grasp edges with a 90% success rate. Once an edge is grasped, we demonstrate that the robot can slide along the cloth to the adjacent corner using tactile pose estimation/control in real time. See http://nehasunil.com/visuotactile/visuotactile.html for videos.
Submitted 9 December, 2022;
originally announced December 2022.
-
Fast Hierarchical Learning for Few-Shot Object Detection
Authors:
Yihang She,
Goutam Bhat,
Martin Danelljan,
Fisher Yu
Abstract:
Transfer-learning-based approaches have recently achieved promising results on the few-shot detection task. These approaches, however, suffer from the "catastrophic forgetting" issue due to finetuning of the base detector, leading to sub-optimal performance on the base classes. Furthermore, the slow convergence rate of stochastic gradient descent (SGD) results in high latency and consequently restricts real-time applications. We tackle both issues in this work. We pose few-shot detection as a hierarchical learning problem, where the novel classes are treated as the child classes of existing base classes and the background class. The detection heads for the novel classes are then trained using a specialized optimization strategy, leading to significantly lower training times compared to SGD. Our approach obtains competitive novel class performance on the few-shot MS-COCO benchmark, while completely retaining the performance of the initial model on the base classes. We further demonstrate the application of our approach to a new class-refined few-shot detection task.
Submitted 10 October, 2022;
originally announced October 2022.
-
Unsupervised multi-branch Capsule for Hyperspectral and LiDAR classification
Authors:
Quanfeng Xu,
Yi Tang,
Yumei She
Abstract:
With remote sensing data readily available, how to build models that interpret complex remote sensing data has attracted wide attention. In remote sensing data, hyperspectral images (HSI) contain spectral information and LiDAR contains elevation information. Hence, more exploration is warranted to better fuse the features of data from different sources. In this paper, we introduce semantic understanding to dynamically fuse data from two different sources, extract features of HSI and LiDAR through different capsule network branches, and extend the self-supervised loss and random rigid rotation of Canonical Capsule to the high-dimensional setting. Canonical Capsule computes the capsule decomposition of objects by permutation-equivariant attention, and the process is self-supervised by training on pairs of randomly rotated objects. After fusing the features of HSI and LiDAR with semantic understanding, the unsupervised extraction of spectral-spatial-elevation fusion features is achieved. On two real-world examples of fused HSI and LiDAR data, the experimental results show that the proposed multi-branch high-dimensional canonical capsule algorithm can be effective for semantic understanding of HSI and LiDAR. The results indicate that the model can extract HSI and LiDAR data features effectively, in contrast to existing models for unsupervised extraction of multi-source RS data.
Submitted 8 November, 2022; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Supervised Multivariate Learning with Simultaneous Feature Auto-grouping and Dimension Reduction
Authors:
Yiyuan She,
Jiahui Shen,
Chao Zhang
Abstract:
Modern high-dimensional methods often adopt the "bet on sparsity" principle, while in supervised multivariate learning statisticians may face "dense" problems with a large number of nonzero coefficients. This paper proposes a novel clustered reduced-rank learning (CRL) framework that imposes two joint matrix regularizations to automatically group the features in constructing predictive factors. CRL is more interpretable than low-rank modeling and relaxes the stringent sparsity assumption in variable selection. In this paper, new information-theoretical limits are presented to reveal the intrinsic cost of seeking clusters, as well as the blessing from dimensionality in multivariate learning. Moreover, an efficient optimization algorithm is developed, which performs subspace learning and clustering with guaranteed convergence. The obtained fixed-point estimators, though not necessarily globally optimal, enjoy the desired statistical accuracy beyond the standard likelihood setup under some regularity conditions. In addition, a new kind of information criterion, as well as its scale-free form, is proposed for cluster and rank selection, and has rigorous theoretical support without assuming an infinite sample size. Extensive simulations and real-data experiments demonstrate the statistical accuracy and interpretability of the proposed method.
Submitted 9 February, 2022; v1 submitted 17 December, 2021;
originally announced December 2021.
-
Learning Generalizable Vision-Tactile Robotic Grasping Strategy for Deformable Objects via Transformer
Authors:
Yunhai Han,
Kelin Yu,
Rahul Batra,
Nathan Boyd,
Chaitanya Mehta,
Tuo Zhao,
Yu She,
Seth Hutchinson,
Ye Zhao
Abstract:
Reliable robotic grasping, especially with deformable objects such as fruits, remains a challenging task due to underactuated contact interactions with a gripper and unknown object dynamics and geometries. In this study, we propose a Transformer-based robotic grasping framework for rigid grippers that leverages tactile and visual information for safe object grasping. Specifically, the Transformer models learn physical feature embeddings with sensor feedback by performing two pre-defined explorative actions (pinching and sliding) and predict a grasping outcome through a multilayer perceptron (MLP) with a given grasping strength. Using these predictions, the gripper predicts a safe grasping strength via inference. Compared with convolutional recurrent networks, the Transformer models can capture the long-term dependencies across the image sequences and process spatial-temporal features simultaneously. We first benchmark the Transformer models on a public dataset for slip detection. Following that, we show that the Transformer models outperform a CNN+LSTM model in terms of grasping accuracy and computational efficiency. We also collect a new fruit grasping dataset and conduct online grasping experiments using the proposed framework for both seen and unseen fruits. In addition, we extend our model to objects with different shapes and demonstrate the effectiveness of our pre-trained model trained on our large-scale fruit dataset. Our code and dataset are public on GitHub.
Submitted 23 July, 2023; v1 submitted 12 December, 2021;
originally announced December 2021.
-
Segmentation of Roads in Satellite Images using specially modified U-Net CNNs
Authors:
Jonas Bokstaller,
Yihang She,
Zhehan Fu,
Tommaso Macrì
Abstract:
The image classification problem has been deeply investigated by the research community, with computer vision algorithms and with the help of Neural Networks. The aim of this paper is to build an image classifier for satellite images of urban scenes that identifies the portions of the images in which a road is located, separating these portions from the rest. Unlike conventional computer vision algorithms, convolutional neural networks (CNNs) provide accurate and reliable results on this task. Our novel approach uses a sliding window to extract patches out of the whole image, data augmentation for generating more training/testing data and lastly a series of specially modified U-Net CNNs. This proposed technique outperforms all other baselines tested in terms of mean F-score metric.
Submitted 29 September, 2021;
originally announced September 2021.
-
GelSight Wedge: Measuring High-Resolution 3D Contact Geometry with a Compact Robot Finger
Authors:
Shaoxiong Wang,
Yu She,
Branden Romero,
Edward Adelson
Abstract:
Vision-based tactile sensors have the potential to provide important contact geometry to localize the target object under visual occlusion. However, it is challenging to measure high-resolution 3D contact geometry for a compact robot finger while simultaneously meeting optical and mechanical constraints. In this work, we present the GelSight Wedge sensor, which is optimized to have a compact shape for robot fingers, while achieving high-resolution 3D reconstruction. We evaluate the 3D reconstruction under different lighting configurations, and extend the method from 3 lights to 1 or 2 lights. We demonstrate the flexibility of the design by shrinking the sensor to the size of a human finger for fine manipulation tasks. We also show the effectiveness and potential of the reconstructed 3D geometry for pose tracking in 3D space.
Submitted 16 June, 2021;
originally announced June 2021.
-
Digital Taxonomist: Identifying Plant Species in Community Scientists' Photographs
Authors:
Riccardo de Lutio,
Yihang She,
Stefano D'Aronco,
Stefania Russo,
Philipp Brun,
Jan D. Wegner,
Konrad Schindler
Abstract:
Automatic identification of plant specimens from amateur photographs could improve species range maps, thus supporting ecosystems research as well as conservation efforts. However, classifying plant specimens based on image data alone is challenging: some species exhibit large variations in visual appearance, while at the same time different species are often visually similar; additionally, species observations follow a highly imbalanced, long-tailed distribution due to differences in abundance as well as observer biases. On the other hand, most species observations are accompanied by side information about the spatial, temporal and ecological context. Moreover, biological species are not an unordered list of classes but are embedded in a hierarchical taxonomic structure. We propose a multimodal deep learning model that takes into account these additional cues in a unified framework. Our Digital Taxonomist is able to identify plant species in photographs better than a classifier trained on the image content alone; the performance gain is over 6 percentage points in terms of accuracy.
Submitted 5 October, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
-
Hierarchical Classification of Pulmonary Lesions: A Large-Scale Radio-Pathomics Study
Authors:
Jiancheng Yang,
Mingze Gao,
Kaiming Kuang,
Bingbing Ni,
Yunlang She,
Dong Xie,
Chang Chen
Abstract:
Diagnosis of pulmonary lesions from computed tomography (CT) is important but challenging for clinical decision making in lung-cancer-related diseases. Deep learning has achieved great success in computer-aided diagnosis (CADx) for lung cancer, but it suffers from label ambiguity due to the difficulty of radiological diagnosis. Considering that invasive pathological analysis serves as the clinical gold standard of lung cancer diagnosis, in this study we address the label ambiguity issue via a large-scale radio-pathomics dataset containing 5,134 radiological CT images with pathologically confirmed labels, including cancers (e.g., invasive/non-invasive adenocarcinoma, squamous carcinoma) and non-cancer diseases (e.g., tuberculosis, hamartoma). This retrospective dataset, named Pulmonary-RadPath, enables development and validation of accurate deep learning systems to predict invasive pathological labels with a non-invasive procedure, i.e., radiological CT scans. A three-level hierarchical classification system for pulmonary lesions is developed, which covers most diseases in cancer-related diagnosis. We explore several techniques for hierarchical classification on this dataset, and propose a Leaky Dense Hierarchy approach with proven effectiveness in experiments. Our study significantly outperforms prior art in terms of data scale (6x larger), disease comprehensiveness and hierarchies. The promising results suggest the potential to facilitate precision medicine.
Submitted 8 October, 2020;
originally announced October 2020.
-
Network Pruning via Annealing and Direct Sparsity Control
Authors:
Yangzi Guo,
Yiyuan She,
Adrian Barbu
Abstract:
Artificial neural networks (ANNs), especially deep convolutional networks, are very popular these days and have proven to offer quite reliable solutions to many vision problems. However, the use of deep neural networks is widely impeded by their intensive computational and memory cost. In this paper, we propose a novel efficient network pruning method that is suitable for both non-structured and structured channel-level pruning. Our proposed method tightens a sparsity constraint by gradually removing network parameters or filter channels based on a criterion and a schedule. The attractive fact that the network size keeps dropping throughout the iterations makes it suitable for the pruning of any untrained or pre-trained network. Because our method uses an $L_0$ constraint instead of the $L_1$ penalty, it does not introduce any bias in the training parameters or filter channels. Furthermore, the $L_0$ constraint makes it easy to directly specify the desired sparsity level during the network pruning process. Finally, experimental validation on extensive synthetic and real vision datasets shows that the proposed method obtains better or competitive performance compared to other state-of-the-art network pruning methods.
Submitted 26 July, 2020; v1 submitted 11 February, 2020;
originally announced February 2020.
-
Cable Manipulation with a Tactile-Reactive Gripper
Authors:
Yu She,
Shaoxiong Wang,
Siyuan Dong,
Neha Sunil,
Alberto Rodriguez,
Edward Adelson
Abstract:
Cables are complex, high-dimensional, and dynamic objects. Standard approaches to manipulate them often rely on conservative strategies that involve long series of very slow and incremental deformations, or various mechanical fixtures such as clamps, pins or rings. We are interested in manipulating freely moving cables, in real time, with a pair of robotic grippers, and with no added mechanical constraints. The main contribution of this paper is a perception and control framework that moves in that direction, and uses real-time tactile feedback to accomplish the task of following a dangling cable. The approach relies on a vision-based tactile sensor, GelSight, that estimates the pose of the cable in the grip, and the friction forces during cable sliding. We achieve the behavior by combining two tactile-based controllers: 1) Cable grip controller, where a PD controller combined with a leaky integrator regulates the gripping force to maintain the frictional sliding forces close to a suitable value; and 2) Cable pose controller, where an LQR controller based on a learned linear model of the cable sliding dynamics keeps the cable centered and aligned on the fingertips to prevent the cable from falling from the grip. This behavior is made possible by a reactive gripper fitted with GelSight-based high-resolution tactile sensors. The robot can follow one meter of cable in random configurations within 2-3 hand regrasps, adapting to cables of different materials and thicknesses. We demonstrate a robot grasping a headphone cable, sliding the fingers to the jack connector, and inserting it. To the best of our knowledge, this is the first implementation of real-time cable following without the aid of mechanical fixtures.
Submitted 23 June, 2020; v1 submitted 2 October, 2019;
originally announced October 2019.
-
Exoskeleton-covered soft finger with vision-based proprioception and tactile sensing
Authors:
Yu She,
Sandra Q. Liu,
Peiyu Yu,
Edward Adelson
Abstract:
Soft robots offer significant advantages in adaptability, safety, and dexterity compared to conventional rigid-body robots. However, it is challenging to equip soft robots with accurate proprioception and tactile sensing due to their high flexibility and elasticity. In this work, we describe the development of a vision-based proprioceptive and tactile sensor for soft robots called GelFlex, which is inspired by previous GelSight sensing techniques. More specifically, we develop a novel exoskeleton-covered soft finger with embedded cameras and deep learning methods that enable high-resolution proprioceptive sensing and rich tactile sensing. To do so, we design features along the axial direction of the finger, which enable high-resolution proprioceptive sensing, and incorporate a reflective ink coating on the surface of the finger to enable rich tactile sensing. We design a highly underactuated exoskeleton with a tendon-driven mechanism to actuate the finger. Finally, we assemble two of the fingers together to form a robotic gripper and successfully perform a bar stock classification task, which requires both shape and tactile information. We train neural networks for proprioception and shape (box versus cylinder) classification using data from the embedded sensors. The proprioception CNN had over 99% accuracy on our testing set (all six joint angles were within 1 degree of error) and had an average accumulative distance error of 0.77 mm during live testing, which is better than human finger proprioception. These proposed techniques offer soft robots the high-level ability to simultaneously perceive their proprioceptive state and peripheral environment, providing potential solutions for soft robots to solve everyday manipulation tasks. We believe the methods developed in this work can be widely applied to different designs and applications.
Submitted 23 June, 2020; v1 submitted 2 October, 2019;
originally announced October 2019.
-
Logic could be learned from images
Authors:
Qian Guo,
Yuhua Qian,
Xinyan Liang,
Yanhong She,
Deyu Li,
Jiye Liang
Abstract:
Logic reasoning is a significant ability of human intelligence and also an important task in artificial intelligence. Existing logic reasoning methods quite often require some reasoning patterns to be designed beforehand. This leads to an interesting question: can logic reasoning patterns be learned directly from given data? We term this problem data concept logic. In this study, a task of learning logic from images, called the LiLi task, is proposed for the first time. The task is to learn and reason about the logic relation from images, without presetting any reasoning patterns. As a preliminary exploration, we design six LiLi data sets (Bitwise And, Bitwise Or, Bitwise Xor, Addition, Subtraction and Multiplication), in which each image is embedded with an n-digit number. It is worth noting that the learning model does not know beforehand the meaning of the n-digit numbers embedded in the images or the relation between the input images and the output image. To tackle the task, we apply many typical neural network models and obtain fruitful results. However, these models perform poorly on the more difficult logic tasks. To further address this task, we design a novel network framework, called a divide and conquer model, that adds some label information and achieves a high testing accuracy.
Submitted 29 June, 2021; v1 submitted 5 August, 2019;
originally announced August 2019.
-
Artificial Noise Injection for Securing Single-Antenna Systems
Authors:
Biao He,
Yechao She,
Vincent K. N. Lau
Abstract:
We propose a novel artificial noise (AN) injection scheme for wireless systems over quasi-static fading channels, in which a single-antenna transmitter sends confidential messages to a half-duplex receiver in the presence of an eavesdropper. Different from classical AN injection schemes, which rely on a multi-antenna transmitter or external helpers, our proposed scheme is applicable to the scenario where the legitimate transceivers are very simple. We analyze the performance of the proposed scheme and optimize the design of the transmission. Our results highlight that perfect secrecy is always achievable by properly designing the AN injection scheme.
Submitted 8 May, 2017;
originally announced May 2017.
-
Feature Selection with Annealing for Computer Vision and Big Data Learning
Authors:
Adrian Barbu,
Yiyuan She,
Liangjing Ding,
Gary Gramajo
Abstract:
Many computer vision and medical imaging problems are faced with learning from large-scale datasets, with millions of observations and features. In this paper we propose a novel efficient learning scheme that tightens a sparsity constraint by gradually removing variables based on a criterion and a schedule. The attractive fact that the problem size keeps dropping throughout the iterations makes it particularly suitable for big data learning. Our approach applies generically to the optimization of any differentiable loss function, and finds applications in regression, classification and ranking. The resultant algorithms build variable screening into estimation and are extremely simple to implement. We provide theoretical guarantees of convergence and selection consistency. In addition, one-dimensional piecewise linear response functions are used to account for nonlinearity, and a second-order prior is imposed on these functions to avoid overfitting. Experiments on real and synthetic data show that the proposed method compares very well with other state-of-the-art methods in regression, classification and ranking while being computationally very efficient and scalable.
Submitted 17 March, 2016; v1 submitted 10 October, 2013;
originally announced October 2013.
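The abstract above describes tightening a sparsity constraint by gradually removing variables according to a criterion and a schedule. A minimal sketch of that idea for least-squares regression might look as follows. The function name `annealed_selection`, the coefficient-magnitude criterion, the linear schedule, and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def annealed_selection(X, y, k, epochs=60, lr=0.1):
    """Gradient descent on squared loss while shrinking the kept feature set to k."""
    n, p = X.shape
    beta = np.zeros(p)
    keep = np.arange(p)  # indices of features still in the model
    for e in range(1, epochs + 1):
        Xk = X[:, keep]
        grad = Xk.T @ (Xk @ beta - y) / n
        beta = beta - lr * grad
        # Hypothetical linear schedule: reach k features halfway through training.
        frac = max(0.0, 1.0 - 2.0 * e / epochs)
        m = k + int(round((p - k) * frac))
        if m < len(keep):
            # Assumed criterion: keep the m coefficients of largest magnitude.
            top = np.sort(np.argsort(-np.abs(beta))[:m])
            keep, beta = keep[top], beta[top]
    return keep, beta

# Synthetic check: 3 informative features out of 20.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
true_beta = np.zeros(20)
true_beta[[0, 1, 2]] = [3.0, -3.0, 4.0]
y = X @ true_beta + 0.1 * rng.standard_normal(200)
selected, coef = annealed_selection(X, y, k=3)
```

Because the working matrix `Xk` shrinks as features are dropped, each later gradient step is cheaper than the previous ones, which is the property the abstract highlights for large problems.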
-
Approximating Higher-Order Distances Using Random Projections
Authors:
Ping Li,
Michael W. Mahoney,
Yiyuan She
Abstract:
We provide a simple method and relevant theoretical analysis for efficiently estimating higher-order lp distances. While the analysis mainly focuses on l4, our methodology extends naturally to p = 6, 8, 10, ... (i.e., when p is even). Distance-based methods are popular in machine learning. In large-scale applications, storing, computing, and retrieving the distances can be both space and time prohibitive. Efficient algorithms exist for estimating lp distances if 0 < p <= 2. The task for p > 2 is known to be difficult. Our work partially fills this gap.
Submitted 15 March, 2012;
originally announced March 2012.
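The abstract above does not say why even p is the tractable case. One way to see it, offered as an illustrative assumption rather than the authors' stated derivation, is that for even p the p-th power of the lp distance expands by the binomial theorem into marginal norms plus inner products of element-wise powers, and inner products are a quantity that random projections can estimate. For p = 4 and vectors x, y with D entries:

```latex
\[
\|x - y\|_4^4 = \sum_{i=1}^{D} (x_i - y_i)^4
= \sum_{i=1}^{D} x_i^4 + \sum_{i=1}^{D} y_i^4
+ 6 \sum_{i=1}^{D} x_i^2 y_i^2
- 4 \sum_{i=1}^{D} x_i^3 y_i
- 4 \sum_{i=1}^{D} x_i y_i^3 .
\]
```

The first two sums depend on a single vector and can be computed exactly in one pass; only the three cross terms involve both vectors.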
-
Outlier Detection Using Nonconvex Penalized Regression
Authors:
Yiyuan She,
Art B. Owen
Abstract:
This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the $n$ data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual $L_1$ penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The $L_1$ penalty corresponds to soft thresholding. We introduce a thresholding (denoted by $Θ$) based iterative procedure for outlier detection ($Θ$-IPOD). A version based on hard thresholding correctly identifies outliers on some hard test problems. We find that $Θ$-IPOD is much faster than iteratively reweighted least squares for large data because each iteration costs at most $O(np)$ (and sometimes much less) avoiding an $O(np^2)$ least squares estimate. We describe the connection between $Θ$-IPOD and $M$-estimators. Our proposed method has one tuning parameter with which to both identify outliers and estimate regression coefficients. A data-dependent choice can be made based on BIC. The tuned $Θ$-IPOD shows outstanding performance in identifying outliers in various situations in comparison to other existing approaches. This methodology extends to high-dimensional modeling with $p\gg n$, if both the coefficient vector and the outlier pattern are sparse.
Submitted 16 October, 2011; v1 submitted 13 June, 2010;
originally announced June 2010.
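The abstract above describes a mean-shift model with one parameter per data point and an iterative procedure built on a thresholding rule. A minimal sketch of that idea, assuming the model y = Xβ + γ + noise and a simple alternation between a least-squares fit and hard thresholding of the residuals, could read as follows. The function names, the exact update order, the stopping rule, and the value of `lam` are assumptions made for illustration, and the least-squares refit used here is not the cheaper O(np) update the abstract mentions.

```python
import numpy as np

def hard_threshold(r, lam):
    """Keep entries whose magnitude exceeds lam; zero the rest."""
    return np.where(np.abs(r) > lam, r, 0.0)

def ipod_sketch(X, y, lam, iters=100, tol=1e-8):
    """Alternate a least-squares fit on y - gamma with thresholding of the residuals."""
    gamma = np.zeros(len(y))  # one mean-shift parameter per observation
    for _ in range(iters):
        beta = np.linalg.lstsq(X, y - gamma, rcond=None)[0]
        new_gamma = hard_threshold(y - X @ beta, lam)
        converged = np.max(np.abs(new_gamma - gamma)) < tol
        gamma = new_gamma
        if converged:
            break
    beta = np.linalg.lstsq(X, y - gamma, rcond=None)[0]
    return beta, gamma

# Synthetic check: a line with two gross outliers at indices 5 and 40.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones(50), x])
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(50)
y[[5, 40]] += 10.0
beta_hat, gamma_hat = ipod_sketch(X, y, lam=3.0)
outliers = np.flatnonzero(gamma_hat).tolist()
```

Observations with a nonzero entry in `gamma_hat` are the ones flagged as outliers, and `beta_hat` is the coefficient estimate after their shifts have been removed, so a single threshold drives both outputs.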
-
Yet Another Pacman 3D Adventures
Authors:
Serguei A. Mokhov,
Yingying She
Abstract:
This game is meant to be an extension of the overly-beaten pacman-style game (code-named "Yet Another Pacman 3D Adventures", or YAP3DAD), drawing on the proposed ideas and other projects, with advanced visual and computer graphics features, including a game-in-a-game approach. The project is open source and published on SourceForge.net for possible future development and extension.
Submitted 26 October, 2009;
originally announced October 2009.