Search | arXiv e-print repository

Scaling Law of Sim2Real Transfer Learning in Expanding Computational Materials Databases for Real-World Predictions

Authors: Shunya Minami, Yoshihiro Hayashi, Stephen Wu, Kenji Fukumizu, Hiroki Sugisawa, Masashi Ishii, Isao Kuwajima, Kazuya Shiratori, Ryo Yoshida

Abstract: To address the challenge of limited experimental materials data, extensive physical property databases are being developed based on high-throughput computational experiments, such as molecular dynamics simulations. Previous studies have shown that fine-tuning a predictor pretrained on a computational database to a real system can result in models with outstanding generalization capabilities compar… ▽ More To address the challenge of limited experimental materials data, extensive physical property databases are being developed based on high-throughput computational experiments, such as molecular dynamics simulations. Previous studies have shown that fine-tuning a predictor pretrained on a computational database to a real system can result in models with outstanding generalization capabilities compared to learning from scratch. This study demonstrates the scaling law of simulation-to-real (Sim2Real) transfer learning for several machine learning tasks in materials science. Case studies of three prediction tasks for polymers and inorganic materials reveal that the prediction error on real systems decreases according to a power-law as the size of the computational data increases. Observing the scaling behavior offers various insights for database development, such as determining the sample size necessary to achieve a desired performance, identifying equivalent sample sizes for physical and computational experiments, and guiding the design of data production protocols for downstream real-world tasks. △ Less

Submitted 7 August, 2024; originally announced August 2024.

Comments: 22 pages, 6 figures

arXiv:2405.17842 [pdf, other]

Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation

Authors: Akio Hayakawa, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

Abstract: In this study, we aim to construct an audio-video generative model with minimal computational cost by leveraging pre-trained single-modal generative models for audio and video. To achieve this, we propose a novel method that guides each single-modal model to cooperatively generate well-aligned samples across modalities. Specifically, given two pre-trained base diffusion models, we train a lightwei… ▽ More In this study, we aim to construct an audio-video generative model with minimal computational cost by leveraging pre-trained single-modal generative models for audio and video. To achieve this, we propose a novel method that guides each single-modal model to cooperatively generate well-aligned samples across modalities. Specifically, given two pre-trained base diffusion models, we train a lightweight joint guidance module to adjust scores separately estimated by the base models to match the score of joint distribution over audio and video. We theoretically show that this guidance can be computed through the gradient of the optimal discriminator distinguishing real audio-video pairs from fake ones independently generated by the base models. On the basis of this analysis, we construct the joint guidance module by training this discriminator. Additionally, we adopt a loss function to make the gradient of the discriminator work as a noise estimator, as in standard diffusion models, stabilizing the gradient of the discriminator. Empirical evaluations on several benchmark datasets demonstrate that our method improves both single-modal fidelity and multi-modal alignment with a relatively small number of parameters. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.14598 [pdf, other]

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Authors: Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

Abstract: In recent years, with the realistic generation results and a wide range of personalized applications, diffusion-based generative models gain huge attention in both visual and audio generation areas. Compared to the considerable advancements of text2image or text2audio generation, research in audio2visual or visual2audio generation has been relatively slow. The recent audio-visual generation method… ▽ More In recent years, with the realistic generation results and a wide range of personalized applications, diffusion-based generative models gain huge attention in both visual and audio generation areas. Compared to the considerable advancements of text2image or text2audio generation, research in audio2visual or visual2audio generation has been relatively slow. The recent audio-visual generation methods usually resort to huge large language model or composable diffusion models. Instead of designing another giant model for audio-visual generation, in this paper we take a step back showing a simple and lightweight generative transformer, which is not fully investigated in multi-modal generation, can achieve excellent results on image2audio generation. The transformer operates in the discrete audio and visual Vector-Quantized GAN space, and is trained in the mask denoising manner. After training, the classifier-free guidance could be deployed off-the-shelf achieving better performance, without any extra training or modification. Since the transformer model is modality symmetrical, it could also be directly deployed for audio2image generation and co-generation. In the experiments, we show that our simple method surpasses recent image2audio generation methods. Generated audio samples can be found at https://docs.google.com/presentation/d/1ZtC0SeblKkut4XJcRaDsSTuCRIXB3ypxmSi7HTY3IyQ/ △ Less

Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: 10 pages

arXiv:2402.11840 [pdf, other]

An Endoscopic Chisel: Intraoperative Imaging Carves 3D Anatomical Models

Authors: Jan Emily Mangulabnan, Roger D. Soberanis-Mukul, Timo Teufel, Manish Sahu, Jose L. Porras, S. Swaroop Vedula, Masaru Ishii, Gregory Hager, Russell H. Taylor, Mathias Unberath

Abstract: Purpose: Preoperative imaging plays a pivotal role in sinus surgery where CTs offer patient-specific insights of complex anatomy, enabling real-time intraoperative navigation to complement endoscopy imaging. However, surgery elicits anatomical changes not represented in the preoperative model, generating an inaccurate basis for navigation during surgery progression. Methods: We propose a first v… ▽ More Purpose: Preoperative imaging plays a pivotal role in sinus surgery where CTs offer patient-specific insights of complex anatomy, enabling real-time intraoperative navigation to complement endoscopy imaging. However, surgery elicits anatomical changes not represented in the preoperative model, generating an inaccurate basis for navigation during surgery progression. Methods: We propose a first vision-based approach to update the preoperative 3D anatomical model leveraging intraoperative endoscopic video for navigated sinus surgery where relative camera poses are known. We rely on comparisons of intraoperative monocular depth estimates and preoperative depth renders to identify modified regions. The new depths are integrated in these regions through volumetric fusion in a truncated signed distance function representation to generate an intraoperative 3D model that reflects tissue manipulation. Results: We quantitatively evaluate our approach by sequentially updating models for a five-step surgical progression in an ex vivo specimen. We compute the error between correspondences from the updated model and ground-truth intraoperative CT in the region of anatomical modification. The resulting models show a decrease in error during surgical progression as opposed to increasing when no update is employed. Conclusion: Our findings suggest that preoperative 3D anatomical models can be updated using intraoperative endoscopy video in navigated sinus surgery. Future work will investigate improvements to monocular depth estimation as well as removing the need for external navigation systems. The resulting ability to continuously update the patient model may provide surgeons with a more precise understanding of the current anatomical state and paves the way toward a digital twin paradigm for sinus surgery. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2401.11052 [pdf, other]

doi 10.1080/27660400.2024.2356506

Mining experimental data from Materials Science literature with Large Language Models: an evaluation study

Authors: Luca Foppiano, Guillaume Lambard, Toshiyuki Amagasa, Masashi Ishii

Abstract: This study is dedicated to assessing the capabilities of large language models (LLMs) such as GPT-3.5-Turbo, GPT-4, and GPT-4-Turbo in extracting structured information from scientific documents in materials science. To this end, we primarily focus on two critical tasks of information extraction: (i) a named entity recognition (NER) of studied materials and physical properties and (ii) a relation… ▽ More This study is dedicated to assessing the capabilities of large language models (LLMs) such as GPT-3.5-Turbo, GPT-4, and GPT-4-Turbo in extracting structured information from scientific documents in materials science. To this end, we primarily focus on two critical tasks of information extraction: (i) a named entity recognition (NER) of studied materials and physical properties and (ii) a relation extraction (RE) between these entities. Due to the evident lack of datasets within Materials Informatics (MI), we evaluated using SuperMat, based on superconductor research, and MeasEval, a generic measurement evaluation corpus. The performance of LLMs in executing these tasks is benchmarked against traditional models based on the BERT architecture and rule-based approaches (baseline). We introduce a novel methodology for the comparative analysis of intricate material expressions, emphasising the standardisation of chemical formulas to tackle the complexities inherent in materials science information assessment. For NER, LLMs fail to outperform the baseline with zero-shot prompting and exhibit only limited improvement with few-shot prompting. However, a GPT-3.5-Turbo fine-tuned with the appropriate strategy for RE outperforms all models, including the baseline. Without any fine-tuning, GPT-4 and GPT-4-Turbo display remarkable reasoning and relationship extraction capabilities after being provided with merely a couple of examples, surpassing the baseline. Overall, the results suggest that although LLMs demonstrate relevant reasoning skills in connecting concepts, specialised models are currently a better choice for tasks requiring extracting complex domain-specific entities like materials. These insights provide initial guidance applicable to other materials science sub-domains in future work. △ Less

Submitted 30 May, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

Comments: 40 pages: 5 figures and 1 table in the body. 32 Tables in the Appendix / Supplementary materials

Journal ref: Science and Technology of Advanced Materials: Methods (2024)

arXiv:2312.04779 [pdf, other]

Image Synthesis-based Late Stage Cancer Augmentation and Semi-Supervised Segmentation for MRI Rectal Cancer Staging

Authors: Saeko Sasuga, Akira Kudo, Yoshiro Kitamura, Satoshi Iizuka, Edgar Simo-Serra, Atsushi Hamabe, Masayuki Ishii, Ichiro Takemasa

Abstract: Rectal cancer is one of the most common diseases and a major cause of mortality. For deciding rectal cancer treatment plans, T-staging is important. However, evaluating the index from preoperative MRI images requires high radiologists' skill and experience. Therefore, the aim of this study is to segment the mesorectum, rectum, and rectal cancer region so that the system can predict T-stage from se… ▽ More Rectal cancer is one of the most common diseases and a major cause of mortality. For deciding rectal cancer treatment plans, T-staging is important. However, evaluating the index from preoperative MRI images requires high radiologists' skill and experience. Therefore, the aim of this study is to segment the mesorectum, rectum, and rectal cancer region so that the system can predict T-stage from segmentation results. Generally, shortage of large and diverse dataset and high quality annotation are known to be the bottlenecks in computer aided diagnostics development. Regarding rectal cancer, advanced cancer images are very rare, and per-pixel annotation requires high radiologists' skill and time. Therefore, it is not feasible to collect comprehensive disease patterns in a training dataset. To tackle this, we propose two kinds of approaches of image synthesis-based late stage cancer augmentation and semi-supervised learning which is designed for T-stage prediction. In the image synthesis data augmentation approach, we generated advanced cancer images from labels. The real cancer labels were deformed to resemble advanced cancer labels by artificial cancer progress simulation. Next, we introduce a T-staging loss which enables us to train segmentation models from per-image T-stage labels. The loss works to keep inclusion/invasion relationships between rectum and cancer region consistent to the ground truth T-stage. The verification tests show that the proposed method obtains the best sensitivity (0.76) and specificity (0.80) in distinguishing between over T3 stage and underT2. In the ablation studies, our semi-supervised learning approach with the T-staging loss improved specificity by 0.13. Adding the image synthesis-based data augmentation improved the DICE score of invasion cancer area by 0.08 from baseline. △ Less

Submitted 7 December, 2023; originally announced December 2023.

Comments: 10 pages, 7 figures, Accepted to Data Augmentation, Labeling, and Imperfections (DALI) at MICCAI 2022

arXiv:2310.14364 [pdf, other]

A Quantitative Evaluation of Dense 3D Reconstruction of Sinus Anatomy from Monocular Endoscopic Video

Authors: Jan Emily Mangulabnan, Roger D. Soberanis-Mukul, Timo Teufel, Isabela Hernández, Jonas Winter, Manish Sahu, Jose L. Porras, S. Swaroop Vedula, Masaru Ishii, Gregory Hager, Russell H. Taylor, Mathias Unberath

Abstract: Generating accurate 3D reconstructions from endoscopic video is a promising avenue for longitudinal radiation-free analysis of sinus anatomy and surgical outcomes. Several methods for monocular reconstruction have been proposed, yielding visually pleasant 3D anatomical structures by retrieving relative camera poses with structure-from-motion-type algorithms and fusion of monocular depth estimates.… ▽ More Generating accurate 3D reconstructions from endoscopic video is a promising avenue for longitudinal radiation-free analysis of sinus anatomy and surgical outcomes. Several methods for monocular reconstruction have been proposed, yielding visually pleasant 3D anatomical structures by retrieving relative camera poses with structure-from-motion-type algorithms and fusion of monocular depth estimates. However, due to the complex properties of the underlying algorithms and endoscopic scenes, the reconstruction pipeline may perform poorly or fail unexpectedly. Further, acquiring medical data conveys additional challenges, presenting difficulties in quantitatively benchmarking these models, understanding failure cases, and identifying critical components that contribute to their precision. In this work, we perform a quantitative analysis of a self-supervised approach for sinus reconstruction using endoscopic sequences paired with optical tracking and high-resolution computed tomography acquired from nine ex-vivo specimens. Our results show that the generated reconstructions are in high agreement with the anatomy, yielding an average point-to-mesh error of 0.91 mm between reconstructions and CT segmentations. However, in a point-to-point matching scenario, relevant for endoscope tracking and navigation, we found average target registration errors of 6.58 mm. We identified that pose and depth estimation inaccuracies contribute equally to this error and that locally consistent sequences with shorter trajectories generate more accurate reconstructions. These results suggest that achieving global consistency between relative camera poses and estimated depths with the anatomy is essential. In doing so, we can ensure proper synergy between all components of the pipeline for improved reconstructions that will facilitate clinical application of this innovative technology. △ Less

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2309.10923 [pdf, other]

doi 10.1080/27660400.2023.2286219

Semi-automatic staging area for high-quality structured data extraction from scientific literature

Authors: Luca Foppiano, Tomoya Mato, Kensei Terashima, Pedro Ortiz Suarez, Taku Tou, Chikako Sakai, Wei-Sheng Wang, Toshiyuki Amagasa, Yoshihiko Takano, Masashi Ishii

Abstract: We propose a semi-automatic staging area for efficiently building an accurate database of experimental physical properties of superconductors from literature, called SuperCon2, to enrich the existing manually-built superconductor database SuperCon. Here we report our curation interface (SuperCon2 Interface) and a workflow managing the state transitions of each examined record, to validate the data… ▽ More We propose a semi-automatic staging area for efficiently building an accurate database of experimental physical properties of superconductors from literature, called SuperCon2, to enrich the existing manually-built superconductor database SuperCon. Here we report our curation interface (SuperCon2 Interface) and a workflow managing the state transitions of each examined record, to validate the dataset of superconductors from PDF documents collected using Grobid-superconductors in a previous work. This curation workflow allows both automatic and manual operations, the former contains ``anomaly detection'' that scans new data identifying outliers, and a ``training data collector'' mechanism that collects training data examples based on manual corrections. Such training data collection policy is effective in improving the machine-learning models with a reduced number of examples. For manual operations, the interface (SuperCon2 interface) is developed to increase efficiency during manual correction by providing a smart interface and an enhanced PDF document viewer. We show that our interface significantly improves the curation quality by boosting precision and recall as compared with the traditional ``manual correction''. Our semi-automatic approach would provide a solution for achieving a reliable database with text-data mining of scientific documents. △ Less

Submitted 16 November, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: 5 tables, 6 figures, 18 pages

arXiv:2309.03395 [pdf, other]

The Quiet Eye Phenomenon in Minimally Invasive Surgery

Authors: Alaa Eldin Abdelaal, Rachelle Van Rumpt, Sayem Nazmuz Zaman, Irene Tong, Anthony Jarc, Gary L. Gallia, Masaru Ishii, Gregory D. Hager, Septimiu E. Salcudean

Abstract: In this paper, we report our discovery of a gaze behavior called Quiet Eye (QE) in minimally invasive surgery. The QE behavior has been extensively studied in sports training and has been associated with higher level of expertise in multiple sports. We investigated the QE behavior in two independently collected data sets of surgeons performing tasks in a sinus surgery setting and a robotic surgery… ▽ More In this paper, we report our discovery of a gaze behavior called Quiet Eye (QE) in minimally invasive surgery. The QE behavior has been extensively studied in sports training and has been associated with higher level of expertise in multiple sports. We investigated the QE behavior in two independently collected data sets of surgeons performing tasks in a sinus surgery setting and a robotic surgery setting, respectively. Our results show that the QE behavior is more likely to occur in successful task executions and in performances of surgeons of high level of expertise. These results open the door to use the QE behavior in both training and skill assessment in minimally invasive surgery. △ Less

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2303.15780 [pdf, other]

Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion

Authors: Hiromichi Kamata, Yuiko Sakuma, Akio Hayakawa, Masato Ishii, Takuya Narihira

Abstract: We propose a high-quality 3D-to-3D conversion method, Instruct 3D-to-3D. Our method is designed for a novel task, which is to convert a given 3D scene to another scene according to text instructions. Instruct 3D-to-3D applies pretrained Image-to-Image diffusion models for 3D-to-3D conversion. This enables the likelihood maximization of each viewpoint image and high-quality 3D generation. In additi… ▽ More We propose a high-quality 3D-to-3D conversion method, Instruct 3D-to-3D. Our method is designed for a novel task, which is to convert a given 3D scene to another scene according to text instructions. Instruct 3D-to-3D applies pretrained Image-to-Image diffusion models for 3D-to-3D conversion. This enables the likelihood maximization of each viewpoint image and high-quality 3D generation. In addition, our proposed method explicitly inputs the source 3D scene as a condition, which enhances 3D consistency and controllability of how much of the source 3D scene structure is reflected. We also propose dynamic scaling, which allows the intensity of the geometry transformation to be adjusted. We performed quantitative and qualitative evaluations and showed that our proposed method achieves higher quality 3D-to-3D conversions than baseline methods. △ Less

Submitted 28 March, 2023; originally announced March 2023.

Comments: Project page: https://sony.github.io/Instruct3Dto3D-doc/

arXiv:2303.13121 [pdf, other]

DetOFA: Efficient Training of Once-for-All Networks for Object Detection Using Path Filter

Authors: Yuiko Sakuma, Masato Ishii, Takuya Narihira

Abstract: We address the challenge of training a large supernet for the object detection task, using a relatively small amount of training data. Specifically, we propose an efficient supernet-based neural architecture search (NAS) method that uses search space pruning. The search space defined by the supernet is pruned by removing candidate models that are predicted to perform poorly. To effectively remove… ▽ More We address the challenge of training a large supernet for the object detection task, using a relatively small amount of training data. Specifically, we propose an efficient supernet-based neural architecture search (NAS) method that uses search space pruning. The search space defined by the supernet is pruned by removing candidate models that are predicted to perform poorly. To effectively remove the candidates over a wide range of resource constraints, we particularly design a performance predictor for supernet, called path filter, which is conditioned by resource constraints and can accurately predict the relative performance of the models that satisfy similar resource constraints. Hence, supernet training is more focused on the best-performing candidates. Our path filter handles prediction for paths with different resource budgets. Compared to once-for-all, our proposed method reduces the computational cost of the optimal network architecture by 30% and 63%, while yielding better accuracy-floating point operations Pareto front (0.85 and 0.45 points of improvement on average precision for Pascal VOC and COCO, respectively). △ Less

Submitted 19 October, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted to ICCV workshop 2023

arXiv:2212.02024 [pdf, other]

Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models

Authors: Naoki Matsunaga, Masato Ishii, Akio Hayakawa, Kenji Suzuki, Takuya Narihira

Abstract: Our goal is to develop fine-grained real-image editing methods suitable for real-world applications. In this paper, we first summarize four requirements for these methods and propose a novel diffusion-based image editing framework with pixel-wise guidance that satisfies these requirements. Specifically, we train pixel-classifiers with a few annotated data and then infer the segmentation map of a t… ▽ More Our goal is to develop fine-grained real-image editing methods suitable for real-world applications. In this paper, we first summarize four requirements for these methods and propose a novel diffusion-based image editing framework with pixel-wise guidance that satisfies these requirements. Specifically, we train pixel-classifiers with a few annotated data and then infer the segmentation map of a target image. Users then manipulate the map to instruct how the image will be edited. We utilize a pre-trained diffusion model to generate edited images aligned with the user's intention with pixel-wise guidance. The effective combination of proposed guidance and other techniques enables highly controllable editing with preserving the outside of the edited area, which results in meeting our requirements. The experimental results demonstrate that our proposal outperforms the GAN-based method for editing quality and speed. △ Less

Submitted 31 May, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

Comments: Accepted by AI for Content Creation (AI4CC) workshop at CVPR 2023

arXiv:2210.15600 [pdf, other]

doi 10.1080/27660400.2022.2153633

Automatic extraction of materials and properties from superconductors scientific literature

Authors: Luca Foppiano, Pedro Baptista de Castro, Pedro Ortiz Suarez, Kensei Terashima, Yoshihiko Takano, Masashi Ishii

Abstract: The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials Informatics). In this paper, we discuss Grobid-superconductors, our solution for automatically extracting superconductor material names and respective properties from text. Built as a Grobid module, it combines machine learning and heuristic a… ▽ More The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials Informatics). In this paper, we discuss Grobid-superconductors, our solution for automatically extracting superconductor material names and respective properties from text. Built as a Grobid module, it combines machine learning and heuristic approaches in a multi-step architecture that supports input data as raw text or PDF documents. Using Grobid-superconductors, we built SuperCon2, a database of 40324 materials and properties records from 37700 papers. The material (or sample) information is represented by name, chemical formula, and material class, and is characterized by shape, doping, substitution variables for components, and substrate as adjoined information. The properties include the Tc superconducting critical temperature and, when available, applied pressure with the Tc measurement method. △ Less

Submitted 22 November, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

Comments: 20 pages, 11 figures, 8 tables

Journal ref: STAM:M, 2023, VOL. 3, NO. 1, 2153633

arXiv:2209.12130 [pdf, other]

Scalable adaptive algorithms for next-generation multiphase flow simulations

Authors: Kumar Saurabh, Masado Ishii, Makrand A. Khanwale, Hari Sundar, Baskar Ganapathysubramanian

Abstract: High-fidelity flow simulations are indispensable when analyzing systems exhibiting multiphase flow phenomena. The accuracy of multiphase flow simulations is strongly contingent upon the finest mesh resolution used to represent the fluid-fluid interfaces. However, the increased resolution comes at a higher computational cost. In this work, we propose algorithmic advances that aim to reduce the comp… ▽ More High-fidelity flow simulations are indispensable when analyzing systems exhibiting multiphase flow phenomena. The accuracy of multiphase flow simulations is strongly contingent upon the finest mesh resolution used to represent the fluid-fluid interfaces. However, the increased resolution comes at a higher computational cost. In this work, we propose algorithmic advances that aim to reduce the computational cost without compromising on the physics by selectively detecting key regions of interest (droplets/filaments) that require significantly higher resolution. The framework uses an adaptive octree-based meshing framework that is integrated with PETSc's linear algebra solvers. We demonstrate scaling of the framework up to 114,688 processes on TACC's Frontera. Finally, we deploy the framework to simulate one of the most resolved simulations of primary jet atomization. This simulation -- equivalent to 35 trillion grid points on a uniform grid -- is 64 times larger than the current state-of-the-art simulations and provides unprecedented insights into an important flow physics problem with a diverse array of engineering applications. △ Less

Submitted 3 April, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

Comments: 12 pages, 9 figures; Accepted for publication in Proceedings of 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

arXiv:2202.09487 [pdf, other]

SAGE: SLAM with Appearance and Geometry Prior for Endoscopy

Authors: Xingtong Liu, Zhaoshuo Li, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath

Abstract: In endoscopy, many applications (e.g., surgical navigation) would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. To this end, we develop a Simultaneous Localization and Mapping system by combining the learning-based appearance and optimizable geometry priors and factor grap… ▽ More In endoscopy, many applications (e.g., surgical navigation) would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. To this end, we develop a Simultaneous Localization and Mapping system by combining the learning-based appearance and optimizable geometry priors and factor graph optimization. The appearance and geometry priors are explicitly learned in an end-to-end differentiable training pipeline to master the task of pair-wise image alignment, one of the core components of the SLAM system. In our experiments, the proposed SLAM system is shown to robustly handle the challenges of texture scarceness and illumination variation that are commonly seen in endoscopy. The system generalizes well to unseen endoscopes and subjects and performs favorably compared with a state-of-the-art feature-based SLAM system. The code repository is available at https://github.com/lppllppl920/SAGE-SLAM.git. △ Less

Submitted 22 February, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

Comments: Accepted to ICRA 2022

arXiv:2111.10646 [pdf]

doi 10.20965/jaciii.2021.p0944

Frailty Care Robot for Elderly and Its Application for Physical and Psychological Support

Authors: Yoichi Yamazaki, Masayuki Ishii, Takahiro Ito, Takuya Hashimoto

Abstract: To achieve continuous frail care in the daily lives of the elderly, we propose AHOBO, a frail care robot for the elderly at home. Two types of support systems by AHOBO were implemented to support the elderly in both physical health and psychological aspects. For physical health frailty care, we focused on blood pressure and developed a support system for blood pressure measurement with AHOBO. For… ▽ More To achieve continuous frail care in the daily lives of the elderly, we propose AHOBO, a frail care robot for the elderly at home. Two types of support systems by AHOBO were implemented to support the elderly in both physical health and psychological aspects. For physical health frailty care, we focused on blood pressure and developed a support system for blood pressure measurement with AHOBO. For psychological frailty care, we implemented reminiscent coloring with the AHOBO as a recreational activity with the robot. The usability of the system was evaluated based on the assumption of continuous use in daily life. For the support system in blood pressure measurement, we performed a qualitative evaluation using a questionnaire for 16 subjects, including elderly people under blood pressure measurement by the system. The results confirmed that the proposed robot does not affect the blood pressure readings and is acceptable in terms of ease of use based on subjective evaluation. For the reminiscent coloring interaction, a subjective evaluation was conducted on two elderly people under the verbal fluency task, and it has been confirmed that the interaction can be used continuously in daily life. The widespread use of the proposed robot as an interface for AI that supports daily life will lead to a society in which AI robots support people from the cradle to the grave. △ Less

Submitted 20 November, 2021; originally announced November 2021.

Comments: 9 pages, 15 figures, J. Adv. Comput. Intell. Intell. Inform.(JACIII)

Journal ref: J. Adv. Comput. Intell. Intell. Inform., Vol.25, No.6, pp. 944-952, 2021

arXiv:2111.09353 [pdf, other]

Case study of SARS-CoV-2 transmission risk assessment in indoor environments using cloud computing resources

Authors: Kumar Saurabh, Santi Adavani, Kendrick Tan, Masado Ishii, Boshun Gao, Adarsh Krishnamurthy, Hari Sundar, Baskar Ganapathysubramanian

Abstract: Complex flow simulations are conventionally performed on HPC clusters. However, the limited availability of HPC resources and steep learning curve of executing on traditional supercomputer infrastructure has drawn attention towards deploying flow simulation software on the cloud. We showcase how a complex computational framework -- that can evaluate COVID-19 transmission risk in various indoor cla… ▽ More Complex flow simulations are conventionally performed on HPC clusters. However, the limited availability of HPC resources and steep learning curve of executing on traditional supercomputer infrastructure has drawn attention towards deploying flow simulation software on the cloud. We showcase how a complex computational framework -- that can evaluate COVID-19 transmission risk in various indoor classroom scenarios -- can be abstracted and deployed on cloud services. The availability of such cloud-based personalized planning tools can enable educational institutions, medical institutions, public sector workers (courthouses, police stations, airports, etc.), and other entities to comprehensively evaluate various in-person interaction scenarios for transmission risk. We deploy the simulation framework on the Azure cloud framework, utilizing the Dendro-kT mesh generation tool and PETSc solvers. The cloud abstraction is provided by RocketML cloud infrastructure. We compare the performance of the cloud machines with state-of-the-art HPC machine TACC Frontera. Our results suggest that cloud-based HPC resources are a viable strategy for a diverse array of end-users to rapidly and efficiently deploy simulation software. △ Less

Submitted 17 November, 2021; originally announced November 2021.

Comments: Accepted for publication at SuperCompCloud: 5th Workshop on Interoperability of Supercomputing and Cloud Technologies

arXiv:2108.03757 [pdf, other]

doi 10.1145/3458817.3476220

Scalable adaptive PDE solvers in arbitrary domains

Authors: Kumar Saurabh, Masado Ishii, Milinda Fernando, Boshun Gao, Kendrick Tan, Ming-Chen Hsu, Adarsh Krishnamurthy, Hari Sundar, Baskar Ganapathysubramanian

Abstract: Efficiently and accurately simulating partial differential equations (PDEs) in and around arbitrarily defined geometries, especially with high levels of adaptivity, has significant implications for different application domains. A key bottleneck in the above process is the fast construction of a `good' adaptively-refined mesh. In this work, we present an efficient novel octree-based adaptive discr… ▽ More Efficiently and accurately simulating partial differential equations (PDEs) in and around arbitrarily defined geometries, especially with high levels of adaptivity, has significant implications for different application domains. A key bottleneck in the above process is the fast construction of a `good' adaptively-refined mesh. In this work, we present an efficient novel octree-based adaptive discretization approach capable of carving out arbitrarily shaped void regions from the parent domain: an essential requirement for fluid simulations around complex objects. Carving out objects produces an $\textit{incomplete}$ octree. We develop efficient top-down and bottom-up traversal methods to perform finite element computations on $\textit{incomplete}$ octrees. We validate the framework by (a) showing appropriate convergence analysis and (b) computing the drag coefficient for flow past a sphere for a wide range of Reynolds numbers ($\mathcal{O}(1-10^6)$) encompassing the drag crisis regime. Finally, we deploy the framework on a realistic geometry on a current project to evaluate COVID-19 transmission risk in classrooms. △ Less

Submitted 8 August, 2021; originally announced August 2021.

Comments: 16 pages. Accepted for publication at Supercomputing '21: The International Conference for High Performance Computing, Networking, Storage, and Analysis

arXiv:2105.07660 [pdf]

Global Wheat Head Dataset 2021: more diversity to improve the benchmarking of wheat head localization methods

Authors: Etienne David, Mario Serouart, Daniel Smith, Simon Madec, Kaaviya Velumani, Shouyang Liu, Xu Wang, Francisco Pinto Espinosa, Shahameh Shafiee, Izzat S. A. Tahir, Hisashi Tsujimoto, Shuhei Nasuda, Bangyou Zheng, Norbert Kichgessner, Helge Aasen, Andreas Hund, Pouria Sadhegi-Tehran, Koichi Nagasawa, Goro Ishikawa, Sébastien Dandrifosse, Alexis Carlier, Benoit Mercatoris, Ken Kuroki, Haozhou Wang, Masanori Ishii , et al. (10 additional authors not shown)

Abstract: The Global Wheat Head Detection (GWHD) dataset was created in 2020 and has assembled 193,634 labelled wheat heads from 4,700 RGB images acquired from various acquisition platforms and 7 countries/institutions. With an associated competition hosted in Kaggle, GWHD has successfully attracted attention from both the computer vision and agricultural science communities. From this first experience in 2… ▽ More The Global Wheat Head Detection (GWHD) dataset was created in 2020 and has assembled 193,634 labelled wheat heads from 4,700 RGB images acquired from various acquisition platforms and 7 countries/institutions. With an associated competition hosted in Kaggle, GWHD has successfully attracted attention from both the computer vision and agricultural science communities. From this first experience in 2020, a few avenues for improvements have been identified, especially from the perspective of data size, head diversity and label reliability. To address these issues, the 2020 dataset has been reexamined, relabeled, and augmented by adding 1,722 images from 5 additional countries, allowing for 81,553 additional wheat heads to be added. We now release a new version of the Global Wheat Head Detection (GWHD) dataset in 2021, which is bigger, more diverse, and less noisy than the 2020 version. The GWHD 2021 is now publicly available at http://www.global-wheat.com/ and a new data challenge has been organized on AIcrowd to make use of this updated dataset. △ Less

Submitted 3 June, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: 8 pages, 2 figures, 1 table

arXiv:2103.08193 [pdf, other]

Semi-supervised learning by selective training with pseudo labels via confidence estimation

Authors: Masato Ishii

Abstract: We propose a novel semi-supervised learning (SSL) method that adopts selective training with pseudo labels. In our method, we generate hard pseudo-labels and also estimate their confidence, which represents how likely each pseudo-label is to be correct. Then, we explicitly select which pseudo-labeled data should be used to update the model. Specifically, assuming that loss on incorrectly pseudo-la… ▽ More We propose a novel semi-supervised learning (SSL) method that adopts selective training with pseudo labels. In our method, we generate hard pseudo-labels and also estimate their confidence, which represents how likely each pseudo-label is to be correct. Then, we explicitly select which pseudo-labeled data should be used to update the model. Specifically, assuming that loss on incorrectly pseudo-labeled data sensitively increase against data augmentation, we select the data corresponding to relatively small loss after applying data augmentation. The confidence is used not only for screening candidates of pseudo-labeled data to be selected but also for automatically deciding how many pseudo-labeled data should be selected within a mini-batch. Since accurate estimation of the confidence is crucial in our method, we also propose a new data augmentation method, called MixConf, that enables us to obtain confidence-calibrated models even when the number of training data is small. Experimental results with several benchmark datasets validate the advantage of our SSL method as well as MixConf. △ Less

Submitted 15 March, 2021; originally announced March 2021.

arXiv:2103.04037 [pdf, other]

Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision

Authors: Andrew Shin, Masato Ishii, Takuya Narihira

Abstract: Transformer architectures have brought about fundamental changes to computational linguistic field, which had been dominated by recurrent neural networks for many years. Its success also implies drastic changes in cross-modal tasks with language and vision, and many researchers have already tackled the issue. In this paper, we review some of the most critical milestones in the field, as well as ov… ▽ More Transformer architectures have brought about fundamental changes to computational linguistic field, which had been dominated by recurrent neural networks for many years. Its success also implies drastic changes in cross-modal tasks with language and vision, and many researchers have already tackled the issue. In this paper, we review some of the most critical milestones in the field, as well as overall trends on how transformer architecture has been incorporated into visuolinguistic cross-modal tasks. Furthermore, we discuss its current limitations and speculate upon some of the prospects that we find imminent. △ Less

Submitted 9 November, 2021; v1 submitted 6 March, 2021; originally announced March 2021.

Comments: Accepted for publication by International Journal of Computer Vision (IJCV)

arXiv:2102.06725 [pdf, other]

Neural Network Libraries: A Deep Learning Framework Designed from Engineers' Perspectives

Authors: Takuya Narihira, Javier Alonsogarcia, Fabien Cardinaux, Akio Hayakawa, Masato Ishii, Kazunori Iwaki, Thomas Kemp, Yoshiyuki Kobayashi, Lukas Mauch, Akira Nakamura, Yukio Obuchi, Andrew Shin, Kenji Suzuki, Stephen Tiedmann, Stefan Uhlich, Takuya Yashima, Kazuki Yoshiyama

Abstract: While there exist a plethora of deep learning tools and frameworks, the fast-growing complexity of the field brings new demands and challenges, such as more flexible network design, speedy computation on distributed setting, and compatibility between different tools. In this paper, we introduce Neural Network Libraries (https://nnabla.org), a deep learning framework designed from engineer's perspe… ▽ More While there exist a plethora of deep learning tools and frameworks, the fast-growing complexity of the field brings new demands and challenges, such as more flexible network design, speedy computation on distributed setting, and compatibility between different tools. In this paper, we introduce Neural Network Libraries (https://nnabla.org), a deep learning framework designed from engineer's perspective, with emphasis on usability and compatibility as its core design principles. We elaborate on each of our design principles and its merits, and validate our attempts via experiments. △ Less

Submitted 21 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: https://nnabla.org

arXiv:2101.10842 [pdf, other]

Source-free Domain Adaptation via Distributional Alignment by Matching Batch Normalization Statistics

Authors: Masato Ishii, Masashi Sugiyama

Abstract: In this paper, we propose a novel domain adaptation method for the source-free setting. In this setting, we cannot access source data during adaptation, while unlabeled target data and a model pretrained with source data are given. Due to lack of source data, we cannot directly match the data distributions between domains unlike typical domain adaptation algorithms. To cope with this problem, we p… ▽ More In this paper, we propose a novel domain adaptation method for the source-free setting. In this setting, we cannot access source data during adaptation, while unlabeled target data and a model pretrained with source data are given. Due to lack of source data, we cannot directly match the data distributions between domains unlike typical domain adaptation algorithms. To cope with this problem, we propose utilizing batch normalization statistics stored in the pretrained model to approximate the distribution of unobserved source data. Specifically, we fix the classifier part of the model during adaptation and only fine-tune the remaining feature encoder part so that batch normalization statistics of the features extracted by the encoder match those stored in the fixed classifier. Additionally, we also maximize the mutual information between the features and the classifier's outputs to further boost the classification performance. Experimental results with several benchmark datasets show that our method achieves competitive performance with state-of-the-art domain adaptation methods even though it does not require access to source data. △ Less

Submitted 19 January, 2021; originally announced January 2021.

arXiv:2010.11741 [pdf, other]

Ultra-low power on-chip learning of speech commands with phase-change memories

Authors: Venkata Pavan Kumar Miriyala, Masatoshi Ishii

Abstract: Embedding artificial intelligence at the edge (edge-AI) is an elegant solution to tackle the power and latency issues in the rapidly expanding Internet of Things. As edge devices typically spend most of their time in sleep mode and only wake-up infrequently to collect and process sensor data, non-volatile in-memory computing (NVIMC) is a promising approach to design the next generation of edge-AI… ▽ More Embedding artificial intelligence at the edge (edge-AI) is an elegant solution to tackle the power and latency issues in the rapidly expanding Internet of Things. As edge devices typically spend most of their time in sleep mode and only wake-up infrequently to collect and process sensor data, non-volatile in-memory computing (NVIMC) is a promising approach to design the next generation of edge-AI devices. Recently, we proposed an NVIMC-based neuromorphic accelerator using the phase change memories (PCMs), which we call as Raven. In this work, we demonstrate the ultra-low-power on-chip training and inference of speech commands using Raven. We showed that Raven can be trained on-chip with power consumption as low as 30~uW, which is suitable for edge applications. Furthermore, we showed that at iso-accuracies, Raven needs 70.36x and 269.23x less number of computations to be performed than a deep neural network (DNN) during inference and training, respectively. Owing to such low power and computational requirements, Raven provides a promising pathway towards ultra-low-power training and inference at the edge. △ Less

Submitted 21 October, 2020; originally announced October 2020.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2008.12321 [pdf, other]

Learning Representations of Endoscopic Videos to Detect Tool Presence Without Supervision

Authors: David Z. Li, Masaru Ishii, Russell H. Taylor, Gregory D. Hager, Ayushi Sinha

Abstract: In this work, we explore whether it is possible to learn representations of endoscopic video frames to perform tasks such as identifying surgical tool presence without supervision. We use a maximum mean discrepancy (MMD) variational autoencoder (VAE) to learn low-dimensional latent representations of endoscopic videos and manipulate these representations to distinguish frames containing tools from… ▽ More In this work, we explore whether it is possible to learn representations of endoscopic video frames to perform tasks such as identifying surgical tool presence without supervision. We use a maximum mean discrepancy (MMD) variational autoencoder (VAE) to learn low-dimensional latent representations of endoscopic videos and manipulate these representations to distinguish frames containing tools from those without tools. We use three different methods to manipulate these latent representations in order to predict tool presence in each frame. Our fully unsupervised methods can identify whether endoscopic video frames contain tools with average precision of 71.56, 73.93, and 76.18, respectively, comparable to supervised methods. Our code is available at https://github.com/zdavidli/tool-presence/ △ Less

Submitted 27 August, 2020; originally announced August 2020.

Comments: 10 pages, 4 figures, CLIP 2020

arXiv:2003.08502 [pdf, other]

Reconstructing Sinus Anatomy from Endoscopic Video -- Towards a Radiation-free Approach for Quantitative Longitudinal Assessment

Authors: Xingtong Liu, Maia Stiber, Jindan Huang, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath

Abstract: Reconstructing accurate 3D surface models of sinus anatomy directly from an endoscopic video is a promising avenue for cross-sectional and longitudinal analysis to better understand the relationship between sinus anatomy and surgical outcomes. We present a patient-specific, learning-based method for 3D reconstruction of sinus surface anatomy directly and only from endoscopic videos. We demonstrate… ▽ More Reconstructing accurate 3D surface models of sinus anatomy directly from an endoscopic video is a promising avenue for cross-sectional and longitudinal analysis to better understand the relationship between sinus anatomy and surgical outcomes. We present a patient-specific, learning-based method for 3D reconstruction of sinus surface anatomy directly and only from endoscopic videos. We demonstrate the effectiveness and accuracy of our method on in and ex vivo data where we compare to sparse reconstructions from Structure from Motion, dense reconstruction from COLMAP, and ground truth anatomy from CT. Our textured reconstructions are watertight and enable measurement of clinically relevant parameters in good agreement with CT. The source code is available at https://github.com/lppllppl920/DenseReconstruction-Pytorch. △ Less

Submitted 2 July, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

Comments: Accepted to MICCAI 2020

arXiv:2003.00619 [pdf, other]

Extremely Dense Point Correspondences using a Learned Feature Descriptor

Authors: Xingtong Liu, Yiping Zheng, Benjamin Killeen, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath

Abstract: High-quality 3D reconstructions from endoscopy video play an important role in many clinical applications, including surgical navigation where they enable direct video-CT registration. While many methods exist for general multi-view 3D reconstruction, these methods often fail to deliver satisfactory performance on endoscopic video. Part of the reason is that local descriptors that establish pair-w… ▽ More High-quality 3D reconstructions from endoscopy video play an important role in many clinical applications, including surgical navigation where they enable direct video-CT registration. While many methods exist for general multi-view 3D reconstruction, these methods often fail to deliver satisfactory performance on endoscopic video. Part of the reason is that local descriptors that establish pair-wise point correspondences, and thus drive reconstruction, struggle when confronted with the texture-scarce surface of anatomy. Learning-based dense descriptors usually have larger receptive fields enabling the encoding of global information, which can be used to disambiguate matches. In this work, we present an effective self-supervised training scheme and novel loss design for dense descriptor learning. In direct comparison to recent local and dense descriptors on an in-house sinus endoscopy dataset, we demonstrate that our proposed dense descriptor can generalize to unseen patients and scopes, thereby largely improving the performance of Structure from Motion (SfM) in terms of model density and completeness. We also evaluate our method on a public dense optical flow dataset and a small-scale SfM public dataset to further demonstrate the effectiveness and generality of our method. The source code is available at https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch. △ Less

Submitted 27 March, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

Comments: The work has been accepted for publication in CVPR 2020

arXiv:1909.03101 [pdf, other]

Self-supervised Dense 3D Reconstruction from Monocular Endoscopic Video

Authors: Xingtong Liu, Ayushi Sinha, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath

Abstract: We present a self-supervised learning-based pipeline for dense 3D reconstruction from full-length monocular endoscopic videos without a priori modeling of anatomy or shading. Our method only relies on unlabeled monocular endoscopic videos and conventional multi-view stereo algorithms, and requires neither manual interaction nor patient CT in both training and application phases. In a cross-patient… ▽ More We present a self-supervised learning-based pipeline for dense 3D reconstruction from full-length monocular endoscopic videos without a priori modeling of anatomy or shading. Our method only relies on unlabeled monocular endoscopic videos and conventional multi-view stereo algorithms, and requires neither manual interaction nor patient CT in both training and application phases. In a cross-patient study using CT scans as groundtruth, we show that our method is able to produce photo-realistic dense 3D reconstructions with submillimeter mean residual errors from endoscopic videos from unseen patients and scopes. △ Less

Submitted 6 September, 2019; originally announced September 2019.

arXiv:1908.02750 [pdf, other]

doi 10.1016/j.ijheatmasstransfer.2022.122919

A physics-informed reinforcement learning approach for the interfacial area transport in two-phase flow

Authors: Zhuoran Dang, Mamoru Ishii

Abstract: The prediction of interfacial structure in two-phase flow systems is difficult and challenging. In this paper, a novel physics-informed reinforcement learning-aided framework (PIRLF) for the interfacial area transport is proposed. A Markov Decision Process that describes the bubble transport is established by assuming that the development of two-phase flow is a stochastic process with Markov prope… ▽ More The prediction of interfacial structure in two-phase flow systems is difficult and challenging. In this paper, a novel physics-informed reinforcement learning-aided framework (PIRLF) for the interfacial area transport is proposed. A Markov Decision Process that describes the bubble transport is established by assuming that the development of two-phase flow is a stochastic process with Markov property. The framework aims to capture the complexity of two-phase flow using the advantage of reinforcement learning (RL) in discovering complex patterns with the help of the physical model (Interfacial Area Transport Equation) as reference. The details of the framework design are described including the design of the environment and the algorithm used in solving the RL problem. The performance of the PIRLF is tested through experiments using the experimental database for vertical upward bubbly air-water flows. The result shows a good performance of PIRLF with rRMSE of 6.556%. The case studies on the PIRLF performance also show that the type of reward function that is related to the physical model can affect the framework performance. Based on the study, the optimal reward function is established. The approaches to extending the capability of PIRLF are discussed, which can be a reference for the further development of this methodology. △ Less

Submitted 4 October, 2020; v1 submitted 6 August, 2019; originally announced August 2019.

Journal ref: International Journal of Heat and Mass Transfer, Volume 192, 15 August 2022, 122919

arXiv:1904.00291 [pdf, other]

Two-phase flow regime prediction using LSTM based deep recurrent neural network

Authors: Zhuoran Dang, Mamoru Ishii

Abstract: Long short-term memory (LSTM) and recurrent neural network (RNN) has achieved great successes on time-series prediction. In this paper, a methodology of using LSTM-based deep-RNN for two-phase flow regime prediction is proposed, motivated by previous research on constructing deep RNN. The method is featured with fast response and accuracy. The built RNN networks are trained and tested with time-se… ▽ More Long short-term memory (LSTM) and recurrent neural network (RNN) has achieved great successes on time-series prediction. In this paper, a methodology of using LSTM-based deep-RNN for two-phase flow regime prediction is proposed, motivated by previous research on constructing deep RNN. The method is featured with fast response and accuracy. The built RNN networks are trained and tested with time-series void fraction data collected using impedance void meter. The result shows that the prediction accuracy depends on the depth of network and the number of layer cells. However, deeper and larger network consumes more time in predicting. △ Less

Submitted 30 March, 2019; originally announced April 2019.

arXiv:1903.05312 [pdf, other]

Zero-shot Domain Adaptation Based on Attribute Information

Authors: Masato Ishii, Takashi Takenouchi, Masashi Sugiyama

Abstract: In this paper, we propose a novel domain adaptation method that can be applied without target data. We consider the situation where domain shift is caused by a prior change of a specific factor and assume that we know how the prior changes between source and target domains. We call this factor an attribute, and reformulate the domain adaptation problem to utilize the attribute prior instead of tar… ▽ More In this paper, we propose a novel domain adaptation method that can be applied without target data. We consider the situation where domain shift is caused by a prior change of a specific factor and assume that we know how the prior changes between source and target domains. We call this factor an attribute, and reformulate the domain adaptation problem to utilize the attribute prior instead of target data. In our method, the source data are reweighted with the sample-wise weight estimated by the attribute prior and the data themselves so that they are useful in the target domain. We theoretically reveal that our method provides more precise estimation of sample-wise transferability than a straightforward attribute-based reweighting approach. Experimental results with both toy datasets and benchmark datasets show that our method can perform well, though it does not use any target data. △ Less

Submitted 13 March, 2019; originally announced March 2019.

Comments: 14 pages

arXiv:1902.07766 [pdf, other]

Dense Depth Estimation in Monocular Endoscopy with Self-supervised Learning Methods

Authors: Xingtong Liu, Ayushi Sinha, Masaru Ishii, Gregory D. Hager, Austin Reiter, Russell H. Taylor, Mathias Unberath

Abstract: We present a self-supervised approach to training convolutional neural networks for dense depth estimation from monocular endoscopy data without a priori modeling of anatomy or shading. Our method only requires monocular endoscopic videos and a multi-view stereo method, e.g., structure from motion, to supervise learning in a sparse manner. Consequently, our method requires neither manual labeling… ▽ More We present a self-supervised approach to training convolutional neural networks for dense depth estimation from monocular endoscopy data without a priori modeling of anatomy or shading. Our method only requires monocular endoscopic videos and a multi-view stereo method, e.g., structure from motion, to supervise learning in a sparse manner. Consequently, our method requires neither manual labeling nor patient computed tomography (CT) scan in the training and application phases. In a cross-patient experiment using CT scans as groundtruth, the proposed method achieved submillimeter mean residual error. In a comparison study to recent self-supervised depth estimation methods designed for natural video on in vivo sinus endoscopy data, we demonstrate that the proposed approach outperforms the previous methods by a large margin. The source code for this work is publicly available online at https://github.com/lppllppl920/EndoscopyDepthEstimation-Pytorch. △ Less

Submitted 29 October, 2019; v1 submitted 20 February, 2019; originally announced February 2019.

Comments: Accepted to IEEE Transactions on Medical Imaging

arXiv:1806.10748 [pdf, other]

Towards automatic initialization of registration algorithms using simulated endoscopy images

Authors: Ayushi Sinha, Masaru Ishii, Russell H. Taylor, Gregory D. Hager, Austin Reiter

Abstract: Registering images from different modalities is an active area of research in computer aided medical interventions. Several registration algorithms have been developed, many of which achieve high accuracy. However, these results are dependent on many factors, including the quality of the extracted features or segmentations being registered as well as the initial alignment. Although several methods… ▽ More Registering images from different modalities is an active area of research in computer aided medical interventions. Several registration algorithms have been developed, many of which achieve high accuracy. However, these results are dependent on many factors, including the quality of the extracted features or segmentations being registered as well as the initial alignment. Although several methods have been developed towards improving segmentation algorithms and automating the segmentation process, few automatic initialization algorithms have been explored. In many cases, the initial alignment from which a registration is initiated is performed manually, which interferes with the clinical workflow. Our aim is to use scene classification in endoscopic procedures to achieve coarse alignment of the endoscope and a preoperative image of the anatomy. In this paper, we show using simulated scenes that a neural network can predict the region of anatomy (with respect to a preoperative image) that the endoscope is located in by observing a single endoscopic video frame. With limited training and without any hyperparameter tuning, our method achieves an accuracy of 76.53 (+/-1.19)%. There are several avenues for improvement, making this a promising direction of research. Code is available at https://github.com/AyushiSinha/AutoInitialization. △ Less

Submitted 27 June, 2018; originally announced June 2018.

Comments: 4 pages, 4 figures

ACM Class: J.2; J.3; I.2.6; I.2.10; I.3.3; I.3.7

arXiv:1806.09521 [pdf, other]

doi 10.1007/978-3-030-01201-4_15

Self-supervised Learning for Dense Depth Estimation in Monocular Endoscopy

Authors: Xingtong Liu, Ayushi Sinha, Mathias Unberath, Masaru Ishii, Gregory Hager, Russell H. Taylor, Austin Reiter

Abstract: We present a self-supervised approach to training convolutional neural networks for dense depth estimation from monocular endoscopy data without a priori modeling of anatomy or shading. Our method only requires sequential data from monocular endoscopic videos and a multi-view stereo reconstruction method, e.g. structure from motion, that supervises learning in a sparse but accurate manner. Consequ… ▽ More We present a self-supervised approach to training convolutional neural networks for dense depth estimation from monocular endoscopy data without a priori modeling of anatomy or shading. Our method only requires sequential data from monocular endoscopic videos and a multi-view stereo reconstruction method, e.g. structure from motion, that supervises learning in a sparse but accurate manner. Consequently, our method requires neither manual interaction, such as scaling or labeling, nor patient CT in the training and application phases. We demonstrate the performance of our method on sinus endoscopy data from two patients and validate depth prediction quantitatively using corresponding patient CT scans where we found submillimeter residual errors. △ Less

Submitted 26 July, 2018; v1 submitted 25 June, 2018; originally announced June 2018.

Comments: 11 pages, 5 figures

arXiv:1806.03997 [pdf, other]

doi 10.1007/978-3-030-00937-3_8

Endoscopic navigation in the absence of CT imaging

Authors: Ayushi Sinha, Xingtong Liu, Austin Reiter, Masaru Ishii, Gregory D. Hager, Russell H. Taylor

Abstract: Clinical examinations that involve endoscopic exploration of the nasal cavity and sinuses often do not have a reference image to provide structural context to the clinician. In this paper, we present a system for navigation during clinical endoscopic exploration in the absence of computed tomography (CT) scans by making use of shape statistics from past CT scans. Using a deformable registration al… ▽ More Clinical examinations that involve endoscopic exploration of the nasal cavity and sinuses often do not have a reference image to provide structural context to the clinician. In this paper, we present a system for navigation during clinical endoscopic exploration in the absence of computed tomography (CT) scans by making use of shape statistics from past CT scans. Using a deformable registration algorithm along with dense reconstructions from video, we show that we are able to achieve submillimeter registrations in in-vivo clinical data and are able to assign confidence to these registrations using confidence criteria established using simulated data. △ Less

Submitted 7 June, 2018; originally announced June 2018.

Comments: 8 pages, 3 figures, MICCAI 2018

ACM Class: G.3; I.4.m; J.3

arXiv:1805.11178 [pdf, other]

Towards computational fluorescence microscopy: Machine learning-based integrated prediction of morphological and molecular tumor profiles

Authors: Alexander Binder, Michael Bockmayr, Miriam Hägele, Stephan Wienert, Daniel Heim, Katharina Hellweg, Albrecht Stenzinger, Laura Parlow, Jan Budczies, Benjamin Goeppert, Denise Treue, Manato Kotani, Masaru Ishii, Manfred Dietel, Andreas Hocke, Carsten Denkert, Klaus-Robert Müller, Frederick Klauschen

Abstract: Recent advances in cancer research largely rely on new developments in microscopic or molecular profiling techniques offering high level of detail with respect to either spatial or molecular features, but usually not both. Here, we present a novel machine learning-based computational approach that allows for the identification of morphological tissue features and the prediction of molecular proper… ▽ More Recent advances in cancer research largely rely on new developments in microscopic or molecular profiling techniques offering high level of detail with respect to either spatial or molecular features, but usually not both. Here, we present a novel machine learning-based computational approach that allows for the identification of morphological tissue features and the prediction of molecular properties from breast cancer imaging data. This integration of microanatomic information of tumors with complex molecular profiling data, including protein or gene expression, copy number variation, gene methylation and somatic mutations, provides a novel means to computationally score molecular markers with respect to their relevance to cancer and their spatial associations within the tumor microenvironment. △ Less

Submitted 28 May, 2018; originally announced May 2018.

arXiv:1610.07931 [pdf, other]

doi 10.1007/978-3-319-46726-9_16

Anatomically Constrained Video-CT Registration via the V-IMLOP Algorithm

Authors: Seth D. Billings, Ayushi Sinha, Austin Reiter, Simon Leonard, Masaru Ishii, Gregory D. Hager, Russell H. Taylor

Abstract: Functional endoscopic sinus surgery (FESS) is a surgical procedure used to treat acute cases of sinusitis and other sinus diseases. FESS is fast becoming the preferred choice of treatment due to its minimally invasive nature. However, due to the limited field of view of the endoscope, surgeons rely on navigation systems to guide them within the nasal cavity. State of the art navigation systems rep… ▽ More Functional endoscopic sinus surgery (FESS) is a surgical procedure used to treat acute cases of sinusitis and other sinus diseases. FESS is fast becoming the preferred choice of treatment due to its minimally invasive nature. However, due to the limited field of view of the endoscope, surgeons rely on navigation systems to guide them within the nasal cavity. State of the art navigation systems report registration accuracy of over 1mm, which is large compared to the size of the nasal airways. We present an anatomically constrained video-CT registration algorithm that incorporates multiple video features. Our algorithm is robust in the presence of outliers. We also test our algorithm on simulated and in-vivo data, and test its accuracy against degrading initializations. △ Less

Submitted 25 October, 2016; originally announced October 2016.

Comments: 8 pages, 4 figures, MICCAI

Journal ref: Medical Image Computing and Computer-Assisted Intervention -- MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part III. Vol. 9902, pp. 133-141

arXiv:1610.04276 [pdf, ps, other]

Perspectives on Surgical Data Science

Authors: S. Swaroop Vedula, Masaru Ishii, Gregory D. Hager

Abstract: The availability of large amounts of data together with advances in analytical techniques afford an opportunity to address difficult challenges in ensuring that healthcare is safe, effective, efficient, patient-centered, equitable, and timely. Surgical care and training stand to tremendously gain through surgical data science. Herein, we discuss a few perspectives on the scope and objectives for s… ▽ More The availability of large amounts of data together with advances in analytical techniques afford an opportunity to address difficult challenges in ensuring that healthcare is safe, effective, efficient, patient-centered, equitable, and timely. Surgical care and training stand to tremendously gain through surgical data science. Herein, we discuss a few perspectives on the scope and objectives for surgical data science. △ Less

Submitted 13 October, 2016; originally announced October 2016.

Comments: Workshop on Surgical Data Science, Heidelberg, Germany, June 20, 2016

arXiv:1412.6163 [pdf]

Automated Objective Surgical Skill Assessment in the Operating Room Using Unstructured Tool Motion

Authors: Piyush Poddar, Narges Ahmidi, S. Swaroop Vedula, Lisa Ishii, Gregory D. Hager, Masaru Ishii

Abstract: Previous work on surgical skill assessment using intraoperative tool motion in the operating room (OR) has focused on highly-structured surgical tasks such as cholecystectomy. Further, these methods only considered generic motion metrics such as time and number of movements, which are of limited instructive value. In this paper, we developed and evaluated an automated approach to the surgical skil… ▽ More Previous work on surgical skill assessment using intraoperative tool motion in the operating room (OR) has focused on highly-structured surgical tasks such as cholecystectomy. Further, these methods only considered generic motion metrics such as time and number of movements, which are of limited instructive value. In this paper, we developed and evaluated an automated approach to the surgical skill assessment of nasal septoplasty in the OR. The obstructed field of view and highly unstructured nature of septoplasty precludes trainees from efficiently learning the procedure. We propose a descriptive structure of septoplasty consisting of two types of activity: (1) brushing activity directed away from the septum plane characterizing the consistency of the surgeon's wrist motion and (2) activity along the septal plane characterizing the surgeon's coverage pattern. We derived features related to these two activity types that classify a surgeon's level of training with an average accuracy of about 72%. The features we developed provide surgeons with personalized, actionable feedback regarding their tool motion. △ Less

Submitted 18 December, 2014; originally announced December 2014.

arXiv:cs/0111027 [pdf]

Upgrade of Spring-8 Beamline Network with Vlan Technology Over Gigabit Ethernet

Authors: M. Ishii, T. Fukui, Y. Furukawa, T. Nakatani, T. Ohata, R. Tanaka

Abstract: The beamline network system at SPring-8 consists of three LANs; a BL-LAN for beamline component control, a BL-USER-LAN for beamline experimental users and an OA-LAN for the information services. These LANs are interconnected by a firewall system. Since the network traffic and the number of beamlines have increased, we upgraded the backbone of BL-USER-LAN from Fast Ethernet to Gigabit Ethernet. A… ▽ More The beamline network system at SPring-8 consists of three LANs; a BL-LAN for beamline component control, a BL-USER-LAN for beamline experimental users and an OA-LAN for the information services. These LANs are interconnected by a firewall system. Since the network traffic and the number of beamlines have increased, we upgraded the backbone of BL-USER-LAN from Fast Ethernet to Gigabit Ethernet. And then, to establish the independency of a beamline and to raise flexibility of every beamline, we also introduced the IEEE802.1Q Virtual LAN (VLAN) technology into the BL-USER-LAN. We discuss here a future plan to build the firewall system with hardware load balancers. △ Less

Submitted 17 December, 2001; v1 submitted 9 November, 2001; originally announced November 2001.

Comments: 3 pages, 2 figure, 8th International Conference on Accelerator and Large Experimental Physics Control Systems (PSN TUAP056), San Jose, CA, USA, November 27-30

ACM Class: C.2.1

Journal ref: eConf C011127 (2001) TUAP056

Showing 1–40 of 40 results for author: Ishii, M