-
Scaling Law of Sim2Real Transfer Learning in Expanding Computational Materials Databases for Real-World Predictions
Authors:
Shunya Minami,
Yoshihiro Hayashi,
Stephen Wu,
Kenji Fukumizu,
Hiroki Sugisawa,
Masashi Ishii,
Isao Kuwajima,
Kazuya Shiratori,
Ryo Yoshida
Abstract:
To address the challenge of limited experimental materials data, extensive physical property databases are being developed based on high-throughput computational experiments, such as molecular dynamics simulations. Previous studies have shown that fine-tuning a predictor pretrained on a computational database to a real system can result in models with outstanding generalization capabilities compared to learning from scratch. This study demonstrates the scaling law of simulation-to-real (Sim2Real) transfer learning for several machine learning tasks in materials science. Case studies of three prediction tasks for polymers and inorganic materials reveal that the prediction error on real systems decreases according to a power-law as the size of the computational data increases. Observing the scaling behavior offers various insights for database development, such as determining the sample size necessary to achieve a desired performance, identifying equivalent sample sizes for physical and computational experiments, and guiding the design of data production protocols for downstream real-world tasks.
Submitted 7 August, 2024;
originally announced August 2024.
-
Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
Authors:
Akio Hayakawa,
Masato Ishii,
Takashi Shibuya,
Yuki Mitsufuji
Abstract:
In this study, we aim to construct an audio-video generative model with minimal computational cost by leveraging pre-trained single-modal generative models for audio and video. To achieve this, we propose a novel method that guides each single-modal model to cooperatively generate well-aligned samples across modalities. Specifically, given two pre-trained base diffusion models, we train a lightweight joint guidance module to adjust scores separately estimated by the base models to match the score of joint distribution over audio and video. We theoretically show that this guidance can be computed through the gradient of the optimal discriminator distinguishing real audio-video pairs from fake ones independently generated by the base models. On the basis of this analysis, we construct the joint guidance module by training this discriminator. Additionally, we adopt a loss function to make the gradient of the discriminator work as a noise estimator, as in standard diffusion models, stabilizing the gradient of the discriminator. Empirical evaluations on several benchmark datasets demonstrate that our method improves both single-modal fidelity and multi-modal alignment with a relatively small number of parameters.
Submitted 28 May, 2024;
originally announced May 2024.
-
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
Authors:
Shiqi Yang,
Zhi Zhong,
Mengjie Zhao,
Shusuke Takahashi,
Masato Ishii,
Takashi Shibuya,
Yuki Mitsufuji
Abstract:
In recent years, owing to realistic generation results and a wide range of personalized applications, diffusion-based generative models have gained huge attention in both visual and audio generation. Compared to the considerable advancements in text2image or text2audio generation, research in audio2visual or visual2audio generation has been relatively slow. Recent audio-visual generation methods usually resort to huge large language models or composable diffusion models. Instead of designing another giant model for audio-visual generation, in this paper we take a step back and show that a simple and lightweight generative transformer, which has not been fully investigated in multi-modal generation, can achieve excellent results on image2audio generation. The transformer operates in the discrete audio and visual Vector-Quantized GAN space, and is trained in a mask denoising manner. After training, classifier-free guidance can be deployed off-the-shelf to achieve better performance, without any extra training or modification. Since the transformer model is modality symmetrical, it can also be directly deployed for audio2image generation and co-generation. In the experiments, we show that our simple method surpasses recent image2audio generation methods. Generated audio samples can be found at https://docs.google.com/presentation/d/1ZtC0SeblKkut4XJcRaDsSTuCRIXB3ypxmSi7HTY3IyQ/
Submitted 24 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
An Endoscopic Chisel: Intraoperative Imaging Carves 3D Anatomical Models
Authors:
Jan Emily Mangulabnan,
Roger D. Soberanis-Mukul,
Timo Teufel,
Manish Sahu,
Jose L. Porras,
S. Swaroop Vedula,
Masaru Ishii,
Gregory Hager,
Russell H. Taylor,
Mathias Unberath
Abstract:
Purpose: Preoperative imaging plays a pivotal role in sinus surgery where CTs offer patient-specific insights of complex anatomy, enabling real-time intraoperative navigation to complement endoscopy imaging. However, surgery elicits anatomical changes not represented in the preoperative model, generating an inaccurate basis for navigation during surgery progression.
Methods: We propose a first vision-based approach to update the preoperative 3D anatomical model leveraging intraoperative endoscopic video for navigated sinus surgery where relative camera poses are known. We rely on comparisons of intraoperative monocular depth estimates and preoperative depth renders to identify modified regions. The new depths are integrated in these regions through volumetric fusion in a truncated signed distance function representation to generate an intraoperative 3D model that reflects tissue manipulation.
Results: We quantitatively evaluate our approach by sequentially updating models for a five-step surgical progression in an ex vivo specimen. We compute the error between correspondences from the updated model and ground-truth intraoperative CT in the region of anatomical modification. The resulting models show a decrease in error during surgical progression as opposed to increasing when no update is employed.
Conclusion: Our findings suggest that preoperative 3D anatomical models can be updated using intraoperative endoscopy video in navigated sinus surgery. Future work will investigate improvements to monocular depth estimation as well as removing the need for external navigation systems. The resulting ability to continuously update the patient model may provide surgeons with a more precise understanding of the current anatomical state and paves the way toward a digital twin paradigm for sinus surgery.
Submitted 19 February, 2024;
originally announced February 2024.
-
Mining experimental data from Materials Science literature with Large Language Models: an evaluation study
Authors:
Luca Foppiano,
Guillaume Lambard,
Toshiyuki Amagasa,
Masashi Ishii
Abstract:
This study is dedicated to assessing the capabilities of large language models (LLMs) such as GPT-3.5-Turbo, GPT-4, and GPT-4-Turbo in extracting structured information from scientific documents in materials science. To this end, we primarily focus on two critical tasks of information extraction: (i) a named entity recognition (NER) of studied materials and physical properties and (ii) a relation extraction (RE) between these entities. Due to the evident lack of datasets within Materials Informatics (MI), we evaluated using SuperMat, based on superconductor research, and MeasEval, a generic measurement evaluation corpus. The performance of LLMs in executing these tasks is benchmarked against traditional models based on the BERT architecture and rule-based approaches (baseline). We introduce a novel methodology for the comparative analysis of intricate material expressions, emphasising the standardisation of chemical formulas to tackle the complexities inherent in materials science information assessment. For NER, LLMs fail to outperform the baseline with zero-shot prompting and exhibit only limited improvement with few-shot prompting. However, a GPT-3.5-Turbo fine-tuned with the appropriate strategy for RE outperforms all models, including the baseline. Without any fine-tuning, GPT-4 and GPT-4-Turbo display remarkable reasoning and relationship extraction capabilities after being provided with merely a couple of examples, surpassing the baseline. Overall, the results suggest that although LLMs demonstrate relevant reasoning skills in connecting concepts, specialised models are currently a better choice for tasks requiring extracting complex domain-specific entities like materials. These insights provide initial guidance applicable to other materials science sub-domains in future work.
Submitted 30 May, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
Image Synthesis-based Late Stage Cancer Augmentation and Semi-Supervised Segmentation for MRI Rectal Cancer Staging
Authors:
Saeko Sasuga,
Akira Kudo,
Yoshiro Kitamura,
Satoshi Iizuka,
Edgar Simo-Serra,
Atsushi Hamabe,
Masayuki Ishii,
Ichiro Takemasa
Abstract:
Rectal cancer is one of the most common diseases and a major cause of mortality. T-staging is important for deciding rectal cancer treatment plans. However, evaluating the index from preoperative MRI images requires considerable radiologist skill and experience. Therefore, the aim of this study is to segment the mesorectum, rectum, and rectal cancer region so that the system can predict T-stage from segmentation results. Generally, shortages of large and diverse datasets and of high-quality annotation are known to be the bottlenecks in computer-aided diagnostics development. Regarding rectal cancer, advanced cancer images are very rare, and per-pixel annotation requires considerable radiologist skill and time. Therefore, it is not feasible to collect comprehensive disease patterns in a training dataset. To tackle this, we propose two approaches: image synthesis-based late stage cancer augmentation and semi-supervised learning designed for T-stage prediction. In the image synthesis data augmentation approach, we generated advanced cancer images from labels. The real cancer labels were deformed to resemble advanced cancer labels by artificial cancer progress simulation. Next, we introduce a T-staging loss which enables us to train segmentation models from per-image T-stage labels. The loss works to keep inclusion/invasion relationships between rectum and cancer region consistent with the ground truth T-stage. The verification tests show that the proposed method obtains the best sensitivity (0.76) and specificity (0.80) in distinguishing between over T3 stage and under T2. In the ablation studies, our semi-supervised learning approach with the T-staging loss improved specificity by 0.13. Adding the image synthesis-based data augmentation improved the DICE score of invasion cancer area by 0.08 from baseline.
Submitted 7 December, 2023;
originally announced December 2023.
-
A Quantitative Evaluation of Dense 3D Reconstruction of Sinus Anatomy from Monocular Endoscopic Video
Authors:
Jan Emily Mangulabnan,
Roger D. Soberanis-Mukul,
Timo Teufel,
Isabela Hernández,
Jonas Winter,
Manish Sahu,
Jose L. Porras,
S. Swaroop Vedula,
Masaru Ishii,
Gregory Hager,
Russell H. Taylor,
Mathias Unberath
Abstract:
Generating accurate 3D reconstructions from endoscopic video is a promising avenue for longitudinal radiation-free analysis of sinus anatomy and surgical outcomes. Several methods for monocular reconstruction have been proposed, yielding visually pleasant 3D anatomical structures by retrieving relative camera poses with structure-from-motion-type algorithms and fusion of monocular depth estimates. However, due to the complex properties of the underlying algorithms and endoscopic scenes, the reconstruction pipeline may perform poorly or fail unexpectedly. Further, acquiring medical data conveys additional challenges, presenting difficulties in quantitatively benchmarking these models, understanding failure cases, and identifying critical components that contribute to their precision. In this work, we perform a quantitative analysis of a self-supervised approach for sinus reconstruction using endoscopic sequences paired with optical tracking and high-resolution computed tomography acquired from nine ex-vivo specimens. Our results show that the generated reconstructions are in high agreement with the anatomy, yielding an average point-to-mesh error of 0.91 mm between reconstructions and CT segmentations. However, in a point-to-point matching scenario, relevant for endoscope tracking and navigation, we found average target registration errors of 6.58 mm. We identified that pose and depth estimation inaccuracies contribute equally to this error and that locally consistent sequences with shorter trajectories generate more accurate reconstructions. These results suggest that achieving global consistency between relative camera poses and estimated depths with the anatomy is essential. In doing so, we can ensure proper synergy between all components of the pipeline for improved reconstructions that will facilitate clinical application of this innovative technology.
Submitted 22 October, 2023;
originally announced October 2023.
-
Semi-automatic staging area for high-quality structured data extraction from scientific literature
Authors:
Luca Foppiano,
Tomoya Mato,
Kensei Terashima,
Pedro Ortiz Suarez,
Taku Tou,
Chikako Sakai,
Wei-Sheng Wang,
Toshiyuki Amagasa,
Yoshihiko Takano,
Masashi Ishii
Abstract:
We propose a semi-automatic staging area for efficiently building an accurate database of experimental physical properties of superconductors from literature, called SuperCon2, to enrich the existing manually-built superconductor database SuperCon. Here we report our curation interface (SuperCon2 Interface) and a workflow managing the state transitions of each examined record, to validate the dataset of superconductors from PDF documents collected using Grobid-superconductors in a previous work. This curation workflow allows both automatic and manual operations. The former contains "anomaly detection", which scans new data to identify outliers, and a "training data collector" mechanism that collects training data examples based on manual corrections. Such a training data collection policy is effective in improving the machine-learning models with a reduced number of examples. For manual operations, the interface (SuperCon2 Interface) is developed to increase efficiency during manual correction by providing a smart interface and an enhanced PDF document viewer. We show that our interface significantly improves the curation quality by boosting precision and recall as compared with the traditional "manual correction". Our semi-automatic approach would provide a solution for achieving a reliable database with text-data mining of scientific documents.
Submitted 16 November, 2023; v1 submitted 19 September, 2023;
originally announced September 2023.
-
The Quiet Eye Phenomenon in Minimally Invasive Surgery
Authors:
Alaa Eldin Abdelaal,
Rachelle Van Rumpt,
Sayem Nazmuz Zaman,
Irene Tong,
Anthony Jarc,
Gary L. Gallia,
Masaru Ishii,
Gregory D. Hager,
Septimiu E. Salcudean
Abstract:
In this paper, we report our discovery of a gaze behavior called Quiet Eye (QE) in minimally invasive surgery. The QE behavior has been extensively studied in sports training and has been associated with higher level of expertise in multiple sports. We investigated the QE behavior in two independently collected data sets of surgeons performing tasks in a sinus surgery setting and a robotic surgery setting, respectively. Our results show that the QE behavior is more likely to occur in successful task executions and in performances of surgeons of high level of expertise. These results open the door to use the QE behavior in both training and skill assessment in minimally invasive surgery.
Submitted 6 September, 2023;
originally announced September 2023.
-
Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion
Authors:
Hiromichi Kamata,
Yuiko Sakuma,
Akio Hayakawa,
Masato Ishii,
Takuya Narihira
Abstract:
We propose a high-quality 3D-to-3D conversion method, Instruct 3D-to-3D. Our method is designed for a novel task, which is to convert a given 3D scene to another scene according to text instructions. Instruct 3D-to-3D applies pretrained Image-to-Image diffusion models for 3D-to-3D conversion. This enables the likelihood maximization of each viewpoint image and high-quality 3D generation. In addition, our proposed method explicitly inputs the source 3D scene as a condition, which enhances 3D consistency and controllability of how much of the source 3D scene structure is reflected. We also propose dynamic scaling, which allows the intensity of the geometry transformation to be adjusted. We performed quantitative and qualitative evaluations and showed that our proposed method achieves higher quality 3D-to-3D conversions than baseline methods.
Submitted 28 March, 2023;
originally announced March 2023.
-
DetOFA: Efficient Training of Once-for-All Networks for Object Detection Using Path Filter
Authors:
Yuiko Sakuma,
Masato Ishii,
Takuya Narihira
Abstract:
We address the challenge of training a large supernet for the object detection task, using a relatively small amount of training data. Specifically, we propose an efficient supernet-based neural architecture search (NAS) method that uses search space pruning. The search space defined by the supernet is pruned by removing candidate models that are predicted to perform poorly. To effectively remove the candidates over a wide range of resource constraints, we particularly design a performance predictor for supernet, called path filter, which is conditioned by resource constraints and can accurately predict the relative performance of the models that satisfy similar resource constraints. Hence, supernet training is more focused on the best-performing candidates. Our path filter handles prediction for paths with different resource budgets. Compared to once-for-all, our proposed method reduces the computational cost of the optimal network architecture by 30% and 63%, while yielding better accuracy-floating point operations Pareto front (0.85 and 0.45 points of improvement on average precision for Pascal VOC and COCO, respectively).
Submitted 19 October, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models
Authors:
Naoki Matsunaga,
Masato Ishii,
Akio Hayakawa,
Kenji Suzuki,
Takuya Narihira
Abstract:
Our goal is to develop fine-grained real-image editing methods suitable for real-world applications. In this paper, we first summarize four requirements for these methods and propose a novel diffusion-based image editing framework with pixel-wise guidance that satisfies these requirements. Specifically, we train pixel-classifiers with a few annotated data and then infer the segmentation map of a target image. Users then manipulate the map to instruct how the image will be edited. We utilize a pre-trained diffusion model to generate edited images aligned with the user's intention with pixel-wise guidance. The effective combination of proposed guidance and other techniques enables highly controllable editing with preserving the outside of the edited area, which results in meeting our requirements. The experimental results demonstrate that our proposal outperforms the GAN-based method for editing quality and speed.
Submitted 31 May, 2023; v1 submitted 4 December, 2022;
originally announced December 2022.
-
Automatic extraction of materials and properties from superconductors scientific literature
Authors:
Luca Foppiano,
Pedro Baptista de Castro,
Pedro Ortiz Suarez,
Kensei Terashima,
Yoshihiko Takano,
Masashi Ishii
Abstract:
The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials Informatics). In this paper, we discuss Grobid-superconductors, our solution for automatically extracting superconductor material names and respective properties from text. Built as a Grobid module, it combines machine learning and heuristic approaches in a multi-step architecture that supports input data as raw text or PDF documents. Using Grobid-superconductors, we built SuperCon2, a database of 40324 materials and properties records from 37700 papers. The material (or sample) information is represented by name, chemical formula, and material class, and is characterized by shape, doping, substitution variables for components, and substrate as adjoined information. The properties include the Tc superconducting critical temperature and, when available, applied pressure with the Tc measurement method.
Submitted 22 November, 2022; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Scalable adaptive algorithms for next-generation multiphase flow simulations
Authors:
Kumar Saurabh,
Masado Ishii,
Makrand A. Khanwale,
Hari Sundar,
Baskar Ganapathysubramanian
Abstract:
High-fidelity flow simulations are indispensable when analyzing systems exhibiting multiphase flow phenomena. The accuracy of multiphase flow simulations is strongly contingent upon the finest mesh resolution used to represent the fluid-fluid interfaces. However, the increased resolution comes at a higher computational cost. In this work, we propose algorithmic advances that aim to reduce the computational cost without compromising on the physics by selectively detecting key regions of interest (droplets/filaments) that require significantly higher resolution. The framework uses an adaptive octree-based meshing framework that is integrated with PETSc's linear algebra solvers. We demonstrate scaling of the framework up to 114,688 processes on TACC's Frontera. Finally, we deploy the framework to simulate one of the most resolved simulations of primary jet atomization. This simulation -- equivalent to 35 trillion grid points on a uniform grid -- is 64 times larger than the current state-of-the-art simulations and provides unprecedented insights into an important flow physics problem with a diverse array of engineering applications.
Submitted 3 April, 2023; v1 submitted 24 September, 2022;
originally announced September 2022.
-
SAGE: SLAM with Appearance and Geometry Prior for Endoscopy
Authors:
Xingtong Liu,
Zhaoshuo Li,
Masaru Ishii,
Gregory D. Hager,
Russell H. Taylor,
Mathias Unberath
Abstract:
In endoscopy, many applications (e.g., surgical navigation) would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. To this end, we develop a Simultaneous Localization and Mapping system by combining the learning-based appearance and optimizable geometry priors and factor graph optimization. The appearance and geometry priors are explicitly learned in an end-to-end differentiable training pipeline to master the task of pair-wise image alignment, one of the core components of the SLAM system. In our experiments, the proposed SLAM system is shown to robustly handle the challenges of texture scarceness and illumination variation that are commonly seen in endoscopy. The system generalizes well to unseen endoscopes and subjects and performs favorably compared with a state-of-the-art feature-based SLAM system. The code repository is available at https://github.com/lppllppl920/SAGE-SLAM.git.
Submitted 22 February, 2022; v1 submitted 18 February, 2022;
originally announced February 2022.
-
Frailty Care Robot for Elderly and Its Application for Physical and Psychological Support
Authors:
Yoichi Yamazaki,
Masayuki Ishii,
Takahiro Ito,
Takuya Hashimoto
Abstract:
To achieve continuous frail care in the daily lives of the elderly, we propose AHOBO, a frail care robot for the elderly at home. Two types of support systems by AHOBO were implemented to support the elderly in both physical health and psychological aspects. For physical health frailty care, we focused on blood pressure and developed a support system for blood pressure measurement with AHOBO. For psychological frailty care, we implemented reminiscent coloring with the AHOBO as a recreational activity with the robot. The usability of the system was evaluated based on the assumption of continuous use in daily life. For the support system in blood pressure measurement, we performed a qualitative evaluation using a questionnaire for 16 subjects, including elderly people under blood pressure measurement by the system. The results confirmed that the proposed robot does not affect the blood pressure readings and is acceptable in terms of ease of use based on subjective evaluation. For the reminiscent coloring interaction, a subjective evaluation was conducted on two elderly people under the verbal fluency task, and it has been confirmed that the interaction can be used continuously in daily life. The widespread use of the proposed robot as an interface for AI that supports daily life will lead to a society in which AI robots support people from the cradle to the grave.
Submitted 20 November, 2021;
originally announced November 2021.
-
Case study of SARS-CoV-2 transmission risk assessment in indoor environments using cloud computing resources
Authors:
Kumar Saurabh,
Santi Adavani,
Kendrick Tan,
Masado Ishii,
Boshun Gao,
Adarsh Krishnamurthy,
Hari Sundar,
Baskar Ganapathysubramanian
Abstract:
Complex flow simulations are conventionally performed on HPC clusters. However, the limited availability of HPC resources and the steep learning curve of executing on traditional supercomputer infrastructure have drawn attention towards deploying flow simulation software on the cloud. We showcase how a complex computational framework -- that can evaluate COVID-19 transmission risk in various indoor classroom scenarios -- can be abstracted and deployed on cloud services. The availability of such cloud-based personalized planning tools can enable educational institutions, medical institutions, public sector workers (courthouses, police stations, airports, etc.), and other entities to comprehensively evaluate various in-person interaction scenarios for transmission risk. We deploy the simulation framework on the Azure cloud framework, utilizing the Dendro-kT mesh generation tool and PETSc solvers. The cloud abstraction is provided by RocketML cloud infrastructure. We compare the performance of the cloud machines with the state-of-the-art HPC machine TACC Frontera. Our results suggest that cloud-based HPC resources are a viable strategy for a diverse array of end-users to rapidly and efficiently deploy simulation software.
Submitted 17 November, 2021;
originally announced November 2021.
-
Scalable adaptive PDE solvers in arbitrary domains
Authors:
Kumar Saurabh,
Masado Ishii,
Milinda Fernando,
Boshun Gao,
Kendrick Tan,
Ming-Chen Hsu,
Adarsh Krishnamurthy,
Hari Sundar,
Baskar Ganapathysubramanian
Abstract:
Efficiently and accurately simulating partial differential equations (PDEs) in and around arbitrarily defined geometries, especially with high levels of adaptivity, has significant implications for different application domains. A key bottleneck in the above process is the fast construction of a `good' adaptively-refined mesh. In this work, we present an efficient novel octree-based adaptive discretization approach capable of carving out arbitrarily shaped void regions from the parent domain: an essential requirement for fluid simulations around complex objects. Carving out objects produces an $\textit{incomplete}$ octree. We develop efficient top-down and bottom-up traversal methods to perform finite element computations on $\textit{incomplete}$ octrees. We validate the framework by (a) showing appropriate convergence analysis and (b) computing the drag coefficient for flow past a sphere for a wide range of Reynolds numbers ($\mathcal{O}(1-10^6)$) encompassing the drag crisis regime. Finally, we deploy the framework on a realistic geometry on a current project to evaluate COVID-19 transmission risk in classrooms.
Submitted 8 August, 2021;
originally announced August 2021.
-
Global Wheat Head Dataset 2021: more diversity to improve the benchmarking of wheat head localization methods
Authors:
Etienne David,
Mario Serouart,
Daniel Smith,
Simon Madec,
Kaaviya Velumani,
Shouyang Liu,
Xu Wang,
Francisco Pinto Espinosa,
Shahameh Shafiee,
Izzat S. A. Tahir,
Hisashi Tsujimoto,
Shuhei Nasuda,
Bangyou Zheng,
Norbert Kichgessner,
Helge Aasen,
Andreas Hund,
Pouria Sadhegi-Tehran,
Koichi Nagasawa,
Goro Ishikawa,
Sébastien Dandrifosse,
Alexis Carlier,
Benoit Mercatoris,
Ken Kuroki,
Haozhou Wang,
Masanori Ishii
, et al. (10 additional authors not shown)
Abstract:
The Global Wheat Head Detection (GWHD) dataset was created in 2020 and has assembled 193,634 labelled wheat heads from 4,700 RGB images acquired from various acquisition platforms and 7 countries/institutions. With an associated competition hosted in Kaggle, GWHD has successfully attracted attention from both the computer vision and agricultural science communities. From this first experience in 2020, a few avenues for improvements have been identified, especially from the perspective of data size, head diversity and label reliability. To address these issues, the 2020 dataset has been reexamined, relabeled, and augmented by adding 1,722 images from 5 additional countries, allowing for 81,553 additional wheat heads to be added. We now release a new version of the Global Wheat Head Detection (GWHD) dataset in 2021, which is bigger, more diverse, and less noisy than the 2020 version. The GWHD 2021 is now publicly available at http://www.global-wheat.com/ and a new data challenge has been organized on AIcrowd to make use of this updated dataset.
Submitted 3 June, 2021; v1 submitted 17 May, 2021;
originally announced May 2021.
-
Semi-supervised learning by selective training with pseudo labels via confidence estimation
Authors:
Masato Ishii
Abstract:
We propose a novel semi-supervised learning (SSL) method that adopts selective training with pseudo labels. In our method, we generate hard pseudo-labels and also estimate their confidence, which represents how likely each pseudo-label is to be correct. Then, we explicitly select which pseudo-labeled data should be used to update the model. Specifically, assuming that the loss on incorrectly pseudo-labeled data increases sensitively under data augmentation, we select the data with relatively small loss after applying data augmentation. The confidence is used not only for screening candidates of pseudo-labeled data to be selected but also for automatically deciding how many pseudo-labeled data should be selected within a mini-batch. Since accurate estimation of the confidence is crucial in our method, we also propose a new data augmentation method, called MixConf, that enables us to obtain confidence-calibrated models even when the number of training data is small. Experimental results with several benchmark datasets validate the advantage of our SSL method as well as MixConf.
Submitted 15 March, 2021;
originally announced March 2021.
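The selection step described in the abstract above (screen candidates by confidence, then keep the pseudo-labeled samples with the smallest loss after augmentation) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_pseudo_labeled`, the `conf_threshold` parameter, and the rule that the number of kept samples equals the floored sum of candidate confidences are all assumptions made for the example.

```python
import numpy as np


def select_pseudo_labeled(losses_after_aug, confidences, conf_threshold=0.5):
    """Return indices of pseudo-labeled samples to use for a model update.

    losses_after_aug: per-sample loss against the hard pseudo-label,
        computed on augmented inputs.
    confidences: estimated probability that each pseudo-label is correct.
    conf_threshold: hypothetical screening threshold (an assumption).
    """
    losses = np.asarray(losses_after_aug, dtype=float)
    conf = np.asarray(confidences, dtype=float)

    # Step 1: screen candidates by their estimated confidence.
    candidates = np.flatnonzero(conf >= conf_threshold)
    if candidates.size == 0:
        return candidates

    # Step 2 (assumed rule): the expected number of correct pseudo-labels
    # among the candidates decides how many samples are kept.
    k = int(np.floor(conf[candidates].sum()))
    k = max(0, min(k, candidates.size))

    # Step 3: keep the k candidates with the smallest loss after augmentation.
    order = np.argsort(losses[candidates], kind="stable")
    return candidates[order[:k]]


if __name__ == "__main__":
    chosen = select_pseudo_labeled(
        losses_after_aug=[0.1, 2.0, 0.3, 0.05],
        confidences=[0.9, 0.8, 0.95, 0.2],
    )
    print(chosen)
```

In this toy batch the last sample is screened out by its low confidence despite its small loss, and the high-loss candidate is dropped because only two of the three candidates are expected to be correct.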
-
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Authors:
Andrew Shin,
Masato Ishii,
Takuya Narihira
Abstract:
Transformer architectures have brought about fundamental changes to the field of computational linguistics, which had been dominated by recurrent neural networks for many years. Their success also implies drastic changes in cross-modal tasks with language and vision, and many researchers have already tackled the issue. In this paper, we review some of the most critical milestones in the field, as well as overall trends on how transformer architecture has been incorporated into visuolinguistic cross-modal tasks. Furthermore, we discuss its current limitations and speculate upon some of the prospects that we find imminent.
Submitted 9 November, 2021; v1 submitted 6 March, 2021;
originally announced March 2021.
-
Neural Network Libraries: A Deep Learning Framework Designed from Engineers' Perspectives
Authors:
Takuya Narihira,
Javier Alonsogarcia,
Fabien Cardinaux,
Akio Hayakawa,
Masato Ishii,
Kazunori Iwaki,
Thomas Kemp,
Yoshiyuki Kobayashi,
Lukas Mauch,
Akira Nakamura,
Yukio Obuchi,
Andrew Shin,
Kenji Suzuki,
Stephen Tiedmann,
Stefan Uhlich,
Takuya Yashima,
Kazuki Yoshiyama
Abstract:
While there exists a plethora of deep learning tools and frameworks, the fast-growing complexity of the field brings new demands and challenges, such as more flexible network design, speedy computation in distributed settings, and compatibility between different tools. In this paper, we introduce Neural Network Libraries (https://nnabla.org), a deep learning framework designed from an engineer's perspective, with emphasis on usability and compatibility as its core design principles. We elaborate on each of our design principles and its merits, and validate our attempts via experiments.
Submitted 21 June, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Source-free Domain Adaptation via Distributional Alignment by Matching Batch Normalization Statistics
Authors:
Masato Ishii,
Masashi Sugiyama
Abstract:
In this paper, we propose a novel domain adaptation method for the source-free setting. In this setting, we cannot access source data during adaptation, while unlabeled target data and a model pretrained with source data are given. Due to lack of source data, we cannot directly match the data distributions between domains unlike typical domain adaptation algorithms. To cope with this problem, we propose utilizing batch normalization statistics stored in the pretrained model to approximate the distribution of unobserved source data. Specifically, we fix the classifier part of the model during adaptation and only fine-tune the remaining feature encoder part so that batch normalization statistics of the features extracted by the encoder match those stored in the fixed classifier. Additionally, we also maximize the mutual information between the features and the classifier's outputs to further boost the classification performance. Experimental results with several benchmark datasets show that our method achieves competitive performance with state-of-the-art domain adaptation methods even though it does not require access to source data.
Submitted 19 January, 2021;
originally announced January 2021.
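The core idea in the abstract above, comparing the statistics of target features with the batch normalization statistics stored in the pretrained model, might be sketched as below. The abstract does not state which discrepancy measure is used, so the per-channel Gaussian KL divergence here is an assumption, as are the function name `bn_statistics_discrepancy` and the `eps` stabilizer. In an actual adaptation loop this quantity would be minimized with respect to the encoder parameters; the sketch only evaluates it.

```python
import numpy as np


def bn_statistics_discrepancy(features, stored_mean, stored_var, eps=1e-5):
    """Mean per-channel KL( N(stored) || N(batch) ) between Gaussians.

    features: array of shape (batch, channels) produced by the encoder.
    stored_mean, stored_var: running statistics kept in a batch-norm layer
        of the fixed, pretrained part of the model.
    The Gaussian-KL choice is an illustrative assumption.
    """
    features = np.asarray(features, dtype=float)
    mu_t = features.mean(axis=0)
    var_t = features.var(axis=0) + eps
    mu_s = np.asarray(stored_mean, dtype=float)
    var_s = np.asarray(stored_var, dtype=float) + eps

    # Closed-form KL divergence between two univariate Gaussians, per channel.
    kl = 0.5 * (np.log(var_t / var_s) + (var_s + (mu_s - mu_t) ** 2) / var_t - 1.0)
    return float(kl.mean())


if __name__ == "__main__":
    feats = np.array([[-1.0, 0.0], [1.0, 2.0]])  # batch mean [0, 1], variance [1, 1]
    print(bn_statistics_discrepancy(feats, [0.0, 1.0], [1.0, 1.0]))  # matched statistics
    print(bn_statistics_discrepancy(feats, [3.0, 1.0], [1.0, 1.0]))  # shifted mean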
-
Ultra-low power on-chip learning of speech commands with phase-change memories
Authors:
Venkata Pavan Kumar Miriyala,
Masatoshi Ishii
Abstract:
Embedding artificial intelligence at the edge (edge-AI) is an elegant solution to tackle the power and latency issues in the rapidly expanding Internet of Things. As edge devices typically spend most of their time in sleep mode and only wake up infrequently to collect and process sensor data, non-volatile in-memory computing (NVIMC) is a promising approach to design the next generation of edge-AI devices. Recently, we proposed an NVIMC-based neuromorphic accelerator using phase change memories (PCMs), which we call Raven. In this work, we demonstrate the ultra-low-power on-chip training and inference of speech commands using Raven. We showed that Raven can be trained on-chip with power consumption as low as 30 µW, which is suitable for edge applications. Furthermore, we showed that at iso-accuracies, Raven needs 70.36x and 269.23x fewer computations than a deep neural network (DNN) during inference and training, respectively. Owing to such low power and computational requirements, Raven provides a promising pathway towards ultra-low-power training and inference at the edge.
Submitted 21 October, 2020;
originally announced October 2020.
-
Learning Representations of Endoscopic Videos to Detect Tool Presence Without Supervision
Authors:
David Z. Li,
Masaru Ishii,
Russell H. Taylor,
Gregory D. Hager,
Ayushi Sinha
Abstract:
In this work, we explore whether it is possible to learn representations of endoscopic video frames to perform tasks such as identifying surgical tool presence without supervision. We use a maximum mean discrepancy (MMD) variational autoencoder (VAE) to learn low-dimensional latent representations of endoscopic videos and manipulate these representations to distinguish frames containing tools from those without tools. We use three different methods to manipulate these latent representations in order to predict tool presence in each frame. Our fully unsupervised methods can identify whether endoscopic video frames contain tools with average precision of 71.56, 73.93, and 76.18, respectively, comparable to supervised methods. Our code is available at https://github.com/zdavidli/tool-presence/
Submitted 27 August, 2020;
originally announced August 2020.
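The maximum mean discrepancy (MMD) named in the abstract above is a distance between two sets of samples, which in an MMD-VAE is typically applied between encoded latents and samples from the prior. A minimal sketch of the quantity itself follows; the Gaussian kernel, the `bandwidth` value, and the use of the simple biased estimator are assumptions for illustration and are not taken from the paper or its repository.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between sample sets x and y.

    x, y: arrays of shape (n, d) and (m, d).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


if __name__ == "__main__":
    latents = np.array([[0.0, 0.0], [1.0, 1.0]])
    print(mmd_squared(latents, latents))        # identical sets
    print(mmd_squared(latents, latents + 5.0))  # well-separated sets
```

Identical sample sets give a value of zero, while well-separated sets give a clearly positive value.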
-
Reconstructing Sinus Anatomy from Endoscopic Video -- Towards a Radiation-free Approach for Quantitative Longitudinal Assessment
Authors:
Xingtong Liu,
Maia Stiber,
Jindan Huang,
Masaru Ishii,
Gregory D. Hager,
Russell H. Taylor,
Mathias Unberath
Abstract:
Reconstructing accurate 3D surface models of sinus anatomy directly from an endoscopic video is a promising avenue for cross-sectional and longitudinal analysis to better understand the relationship between sinus anatomy and surgical outcomes. We present a patient-specific, learning-based method for 3D reconstruction of sinus surface anatomy directly and only from endoscopic videos. We demonstrate the effectiveness and accuracy of our method on in and ex vivo data where we compare to sparse reconstructions from Structure from Motion, dense reconstruction from COLMAP, and ground truth anatomy from CT. Our textured reconstructions are watertight and enable measurement of clinically relevant parameters in good agreement with CT. The source code is available at https://github.com/lppllppl920/DenseReconstruction-Pytorch.
Submitted 2 July, 2020; v1 submitted 18 March, 2020;
originally announced March 2020.
-
Extremely Dense Point Correspondences using a Learned Feature Descriptor
Authors:
Xingtong Liu,
Yiping Zheng,
Benjamin Killeen,
Masaru Ishii,
Gregory D. Hager,
Russell H. Taylor,
Mathias Unberath
Abstract:
High-quality 3D reconstructions from endoscopy video play an important role in many clinical applications, including surgical navigation where they enable direct video-CT registration. While many methods exist for general multi-view 3D reconstruction, these methods often fail to deliver satisfactory performance on endoscopic video. Part of the reason is that local descriptors that establish pair-wise point correspondences, and thus drive reconstruction, struggle when confronted with the texture-scarce surface of anatomy. Learning-based dense descriptors usually have larger receptive fields enabling the encoding of global information, which can be used to disambiguate matches. In this work, we present an effective self-supervised training scheme and novel loss design for dense descriptor learning. In direct comparison to recent local and dense descriptors on an in-house sinus endoscopy dataset, we demonstrate that our proposed dense descriptor can generalize to unseen patients and scopes, thereby largely improving the performance of Structure from Motion (SfM) in terms of model density and completeness. We also evaluate our method on a public dense optical flow dataset and a small-scale SfM public dataset to further demonstrate the effectiveness and generality of our method. The source code is available at https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch.
Submitted 27 March, 2020; v1 submitted 1 March, 2020;
originally announced March 2020.
-
Self-supervised Dense 3D Reconstruction from Monocular Endoscopic Video
Authors:
Xingtong Liu,
Ayushi Sinha,
Masaru Ishii,
Gregory D. Hager,
Russell H. Taylor,
Mathias Unberath
Abstract:
We present a self-supervised learning-based pipeline for dense 3D reconstruction from full-length monocular endoscopic videos without a priori modeling of anatomy or shading. Our method only relies on unlabeled monocular endoscopic videos and conventional multi-view stereo algorithms, and requires neither manual interaction nor patient CT in both training and application phases. In a cross-patient study using CT scans as groundtruth, we show that our method is able to produce photo-realistic dense 3D reconstructions with submillimeter mean residual errors from endoscopic videos from unseen patients and scopes.
Submitted 6 September, 2019;
originally announced September 2019.
-
A physics-informed reinforcement learning approach for the interfacial area transport in two-phase flow
Authors:
Zhuoran Dang,
Mamoru Ishii
Abstract:
The prediction of interfacial structure in two-phase flow systems is difficult and challenging. In this paper, a novel physics-informed reinforcement learning-aided framework (PIRLF) for the interfacial area transport is proposed. A Markov Decision Process that describes the bubble transport is established by assuming that the development of two-phase flow is a stochastic process with Markov property. The framework aims to capture the complexity of two-phase flow using the advantage of reinforcement learning (RL) in discovering complex patterns with the help of the physical model (Interfacial Area Transport Equation) as reference. The details of the framework design are described including the design of the environment and the algorithm used in solving the RL problem. The performance of the PIRLF is tested through experiments using the experimental database for vertical upward bubbly air-water flows. The result shows a good performance of PIRLF with rRMSE of 6.556%. The case studies on the PIRLF performance also show that the type of reward function that is related to the physical model can affect the framework performance. Based on the study, the optimal reward function is established. The approaches to extending the capability of PIRLF are discussed, which can be a reference for the further development of this methodology.
Submitted 4 October, 2020; v1 submitted 6 August, 2019;
originally announced August 2019.
-
Two-phase flow regime prediction using LSTM based deep recurrent neural network
Authors:
Zhuoran Dang,
Mamoru Ishii
Abstract:
Long short-term memory (LSTM) and recurrent neural networks (RNNs) have achieved great success in time-series prediction. In this paper, a methodology of using an LSTM-based deep RNN for two-phase flow regime prediction is proposed, motivated by previous research on constructing deep RNNs. The method features fast response and accuracy. The built RNN networks are trained and tested with time-series void fraction data collected using an impedance void meter. The result shows that the prediction accuracy depends on the depth of the network and the number of layer cells. However, a deeper and larger network consumes more time in predicting.
Submitted 30 March, 2019;
originally announced April 2019.
-
Zero-shot Domain Adaptation Based on Attribute Information
Authors:
Masato Ishii,
Takashi Takenouchi,
Masashi Sugiyama
Abstract:
In this paper, we propose a novel domain adaptation method that can be applied without target data. We consider the situation where domain shift is caused by a prior change of a specific factor and assume that we know how the prior changes between source and target domains. We call this factor an attribute, and reformulate the domain adaptation problem to utilize the attribute prior instead of target data. In our method, the source data are reweighted with the sample-wise weight estimated by the attribute prior and the data themselves so that they are useful in the target domain. We theoretically reveal that our method provides more precise estimation of sample-wise transferability than a straightforward attribute-based reweighting approach. Experimental results with both toy datasets and benchmark datasets show that our method can perform well, though it does not use any target data.
Submitted 13 March, 2019;
originally announced March 2019.
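For orientation, the "straightforward attribute-based reweighting approach" that the abstract above uses as a point of comparison might look like the sketch below: each source sample is weighted by the ratio of the target to the source prior of its attribute. This shows only that baseline as interpreted here, not the authors' proposed estimator, which also uses the data themselves. The function name `attribute_prior_weights` and the assumption of a discrete attribute are hypothetical.

```python
import numpy as np


def attribute_prior_weights(source_attributes, target_prior):
    """Weight each source sample by p_target(a) / p_source(a).

    source_attributes: discrete attribute value of every source sample.
    target_prior: dict mapping attribute value -> known target-domain prior.
    The source prior is estimated empirically from the source samples.
    """
    attrs = np.asarray(source_attributes)
    values, counts = np.unique(attrs, return_counts=True)
    source_prior = dict(zip(values.tolist(), (counts / attrs.size).tolist()))
    return np.array(
        [target_prior.get(a, 0.0) / source_prior[a] for a in attrs.tolist()]
    )


if __name__ == "__main__":
    # Source data: attribute 0 is over-represented (75%); the target prior is uniform.
    weights = attribute_prior_weights([0, 0, 0, 1], {0: 0.5, 1: 0.5})
    print(weights)
```

Samples carrying the under-represented attribute receive a larger weight, and the weights average to one over the source set when every target attribute value appears in the source data.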
-
Dense Depth Estimation in Monocular Endoscopy with Self-supervised Learning Methods
Authors:
Xingtong Liu,
Ayushi Sinha,
Masaru Ishii,
Gregory D. Hager,
Austin Reiter,
Russell H. Taylor,
Mathias Unberath
Abstract:
We present a self-supervised approach to training convolutional neural networks for dense depth estimation from monocular endoscopy data without a priori modeling of anatomy or shading. Our method only requires monocular endoscopic videos and a multi-view stereo method, e.g., structure from motion, to supervise learning in a sparse manner. Consequently, our method requires neither manual labeling nor patient computed tomography (CT) scan in the training and application phases. In a cross-patient experiment using CT scans as groundtruth, the proposed method achieved submillimeter mean residual error. In a comparison study to recent self-supervised depth estimation methods designed for natural video on in vivo sinus endoscopy data, we demonstrate that the proposed approach outperforms the previous methods by a large margin. The source code for this work is publicly available online at https://github.com/lppllppl920/EndoscopyDepthEstimation-Pytorch.
Submitted 29 October, 2019; v1 submitted 20 February, 2019;
originally announced February 2019.
-
Towards automatic initialization of registration algorithms using simulated endoscopy images
Authors:
Ayushi Sinha,
Masaru Ishii,
Russell H. Taylor,
Gregory D. Hager,
Austin Reiter
Abstract:
Registering images from different modalities is an active area of research in computer aided medical interventions. Several registration algorithms have been developed, many of which achieve high accuracy. However, these results are dependent on many factors, including the quality of the extracted features or segmentations being registered as well as the initial alignment. Although several methods have been developed towards improving segmentation algorithms and automating the segmentation process, few automatic initialization algorithms have been explored. In many cases, the initial alignment from which a registration is initiated is performed manually, which interferes with the clinical workflow. Our aim is to use scene classification in endoscopic procedures to achieve coarse alignment of the endoscope and a preoperative image of the anatomy. In this paper, we show using simulated scenes that a neural network can predict the region of anatomy (with respect to a preoperative image) that the endoscope is located in by observing a single endoscopic video frame. With limited training and without any hyperparameter tuning, our method achieves an accuracy of 76.53 (+/-1.19)%. There are several avenues for improvement, making this a promising direction of research. Code is available at https://github.com/AyushiSinha/AutoInitialization.
Submitted 27 June, 2018;
originally announced June 2018.
-
Self-supervised Learning for Dense Depth Estimation in Monocular Endoscopy
Authors:
Xingtong Liu,
Ayushi Sinha,
Mathias Unberath,
Masaru Ishii,
Gregory Hager,
Russell H. Taylor,
Austin Reiter
Abstract:
We present a self-supervised approach to training convolutional neural networks for dense depth estimation from monocular endoscopy data without a priori modeling of anatomy or shading. Our method only requires sequential data from monocular endoscopic videos and a multi-view stereo reconstruction method, e.g. structure from motion, that supervises learning in a sparse but accurate manner. Consequently, our method requires neither manual interaction, such as scaling or labeling, nor patient CT in the training and application phases. We demonstrate the performance of our method on sinus endoscopy data from two patients and validate depth prediction quantitatively using corresponding patient CT scans where we found submillimeter residual errors.
Submitted 26 July, 2018; v1 submitted 25 June, 2018;
originally announced June 2018.
-
Endoscopic navigation in the absence of CT imaging
Authors:
Ayushi Sinha,
Xingtong Liu,
Austin Reiter,
Masaru Ishii,
Gregory D. Hager,
Russell H. Taylor
Abstract:
Clinical examinations that involve endoscopic exploration of the nasal cavity and sinuses often do not have a reference image to provide structural context to the clinician. In this paper, we present a system for navigation during clinical endoscopic exploration in the absence of computed tomography (CT) scans by making use of shape statistics from past CT scans. Using a deformable registration algorithm along with dense reconstructions from video, we show that we are able to achieve submillimeter registrations in in-vivo clinical data and are able to assign confidence to these registrations using confidence criteria established using simulated data.
Submitted 7 June, 2018;
originally announced June 2018.
-
Towards computational fluorescence microscopy: Machine learning-based integrated prediction of morphological and molecular tumor profiles
Authors:
Alexander Binder,
Michael Bockmayr,
Miriam Hägele,
Stephan Wienert,
Daniel Heim,
Katharina Hellweg,
Albrecht Stenzinger,
Laura Parlow,
Jan Budczies,
Benjamin Goeppert,
Denise Treue,
Manato Kotani,
Masaru Ishii,
Manfred Dietel,
Andreas Hocke,
Carsten Denkert,
Klaus-Robert Müller,
Frederick Klauschen
Abstract:
Recent advances in cancer research largely rely on new developments in microscopic or molecular profiling techniques that offer a high level of detail with respect to either spatial or molecular features, but usually not both. Here, we present a novel machine learning-based computational approach that allows for the identification of morphological tissue features and the prediction of molecular properties from breast cancer imaging data. This integration of microanatomic information of tumors with complex molecular profiling data, including protein or gene expression, copy number variation, gene methylation, and somatic mutations, provides a novel means to computationally score molecular markers with respect to their relevance to cancer and their spatial associations within the tumor microenvironment.
Submitted 28 May, 2018;
originally announced May 2018.
-
Anatomically Constrained Video-CT Registration via the V-IMLOP Algorithm
Authors:
Seth D. Billings,
Ayushi Sinha,
Austin Reiter,
Simon Leonard,
Masaru Ishii,
Gregory D. Hager,
Russell H. Taylor
Abstract:
Functional endoscopic sinus surgery (FESS) is a surgical procedure used to treat acute cases of sinusitis and other sinus diseases. FESS is fast becoming the preferred choice of treatment due to its minimally invasive nature. However, due to the limited field of view of the endoscope, surgeons rely on navigation systems to guide them within the nasal cavity. State-of-the-art navigation systems report registration accuracy of over 1 mm, which is large compared to the size of the nasal airways. We present an anatomically constrained video-CT registration algorithm that incorporates multiple video features. Our algorithm is robust in the presence of outliers. We also test our algorithm on simulated and in-vivo data, and test its accuracy against degrading initializations.
Submitted 25 October, 2016;
originally announced October 2016.
-
Perspectives on Surgical Data Science
Authors:
S. Swaroop Vedula,
Masaru Ishii,
Gregory D. Hager
Abstract:
The availability of large amounts of data, together with advances in analytical techniques, affords an opportunity to address difficult challenges in ensuring that healthcare is safe, effective, efficient, patient-centered, equitable, and timely. Surgical care and training stand to gain tremendously through surgical data science. Herein, we discuss a few perspectives on the scope and objectives of surgical data science.
Submitted 13 October, 2016;
originally announced October 2016.
-
Automated Objective Surgical Skill Assessment in the Operating Room Using Unstructured Tool Motion
Authors:
Piyush Poddar,
Narges Ahmidi,
S. Swaroop Vedula,
Lisa Ishii,
Gregory D. Hager,
Masaru Ishii
Abstract:
Previous work on surgical skill assessment using intraoperative tool motion in the operating room (OR) has focused on highly structured surgical tasks such as cholecystectomy. Further, these methods only considered generic motion metrics such as time and number of movements, which are of limited instructive value. In this paper, we developed and evaluated an automated approach to the surgical skill assessment of nasal septoplasty in the OR. The obstructed field of view and highly unstructured nature of septoplasty preclude trainees from efficiently learning the procedure. We propose a descriptive structure of septoplasty consisting of two types of activity: (1) brushing activity directed away from the septum plane, characterizing the consistency of the surgeon's wrist motion, and (2) activity along the septal plane, characterizing the surgeon's coverage pattern. We derived features related to these two activity types that classify a surgeon's level of training with an average accuracy of about 72%. The features we developed provide surgeons with personalized, actionable feedback regarding their tool motion.
Submitted 18 December, 2014;
originally announced December 2014.
-
Upgrade of Spring-8 Beamline Network with Vlan Technology Over Gigabit Ethernet
Authors:
M. Ishii,
T. Fukui,
Y. Furukawa,
T. Nakatani,
T. Ohata,
R. Tanaka
Abstract:
The beamline network system at SPring-8 consists of three LANs: a BL-LAN for beamline component control, a BL-USER-LAN for beamline experimental users, and an OA-LAN for information services. These LANs are interconnected by a firewall system. Because network traffic and the number of beamlines have increased, we upgraded the backbone of the BL-USER-LAN from Fast Ethernet to Gigabit Ethernet. To make each beamline independent and more flexible, we also introduced IEEE 802.1Q Virtual LAN (VLAN) technology into the BL-USER-LAN. We also discuss a future plan to build the firewall system with hardware load balancers.
Submitted 17 December, 2001; v1 submitted 9 November, 2001;
originally announced November 2001.