-
Conceptual Design on the Field of View of Celestial Navigation Systems for Maritime Autonomous Surface Ships
Authors:
Kouki Wakita,
Fuyuki Hane,
Takeshi Sekiguchi,
Shigehito Shimizu,
Shinji Mitani,
Youhei Akimoto,
Atsuo Maki
Abstract:
In order to understand the appropriate field of view (FOV) size of celestial automatic navigation systems for surface ships, we investigate the variations of measurement accuracy of star position and probability of successful star identification with respect to FOV, focusing on the decreasing number of observable star magnitudes and the presence of physically covered stars in marine environments. The results revealed that, although a larger FOV reduces the measurement accuracy of star positions, it increases the number of observable objects and thus improves the probability of star identification using subgraph isomorphism-based methods. It was also found that, although at least four objects need to be observed for accurate identification, four objects may not be sufficient for wider FOVs. On the other hand, from the point of view of celestial navigation systems, a decrease in the measurement accuracy leads to a decrease in positioning accuracy. Therefore, it was found that maximizing the FOV is required for celestial automatic navigation systems as long as the desired positioning accuracy can be ensured. Furthermore, it was found that algorithms incorporating more than four observed celestial objects are required to achieve highly accurate star identification over a wider FOV.
Submitted 28 August, 2024;
originally announced August 2024.
-
MELD-ST: An Emotion-aware Speech Translation Dataset
Authors:
Sirou Chen,
Sakiko Yahata,
Shuichiro Shimizu,
Zhengdong Yang,
Yihang Li,
Chenhui Chu,
Sadao Kurohashi
Abstract:
Emotion plays a crucial role in human conversation. This paper underscores the significance of considering emotion in speech translation. We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs. Each language pair includes about 10,000 utterances annotated with emotion labels from the MELD dataset. Baseline experiments using the SeamlessM4T model on the dataset indicate that fine-tuning with emotion labels can enhance translation performance in some settings, highlighting the need for further research in emotion-aware speech translation systems.
Submitted 21 May, 2024;
originally announced May 2024.
-
Gaze-Based Intention Recognition for Human-Robot Collaboration
Authors:
Valerio Belcamino,
Miwa Takase,
Mariya Kilina,
Alessandro Carfì,
Akira Shimada,
Sota Shimizu,
Fulvio Mastrogiovanni
Abstract:
This work aims to tackle the intent recognition problem in Human-Robot Collaborative assembly scenarios. Precisely, we consider an interactive assembly of a wooden stool where the robot fetches the pieces in the correct order and the human builds the parts following the instruction manual. The intent recognition is limited to the idle state estimation and it is needed to ensure a better synchronization between the two agents. We carried out a comparison between two distinct solutions involving wearable sensors and eye tracking integrated into the perception pipeline of a flexible planning architecture based on Hierarchical Task Networks. At runtime, the wearable sensing module exploits the raw measurements from four 9-axis Inertial Measurement Units positioned on the wrists and hands of the user as an input for a Long Short-Term Memory Network. On the other hand, the eye tracking relies on a Head Mounted Display and Unreal Engine.
We tested the effectiveness of the two approaches with 10 participants, each of whom explored both options in alternate order. We collected explicit metrics about the attractiveness and efficiency of the two techniques through User Experience Questionnaires as well as implicit criteria regarding the classification time and the overall assembly time.
The results of our work show that the two methods can reach comparable performances both in terms of effectiveness and user preference. Future development could aim at joining the two approaches to allow the recognition of more complex activities and to anticipate the user actions.
Submitted 13 May, 2024;
originally announced May 2024.
-
Counterfactual Explanations of Black-box Machine Learning Models using Causal Discovery with Applications to Credit Rating
Authors:
Daisuke Takahashi,
Shohei Shimizu,
Takuma Tanaka
Abstract:
Explainable artificial intelligence (XAI) has helped elucidate the internal mechanisms of machine learning algorithms, bolstering their reliability by demonstrating the basis of their predictions. Several XAI models consider causal relationships to explain models by examining the input-output relationships of prediction models and the dependencies between features. The majority of these models have based their explanations on counterfactual probabilities, assuming that the causal graph is known. However, this assumption complicates the application of such models to real data, given that the causal relationships between features are unknown in most cases. Thus, this study proposed a novel XAI framework that relaxed the constraint that the causal graph is known. This framework leveraged counterfactual probabilities and additional prior information on causal structure, facilitating the integration of a causal graph estimated through causal discovery methods and a black-box classification model. Furthermore, explanatory scores were estimated based on counterfactual probabilities. Numerical experiments conducted employing artificial data confirmed the possibility of estimating the explanatory score more accurately than in the absence of a causal graph. Finally, as an application to real data, we constructed a classification model of credit ratings assigned by Shiga Bank, Shiga prefecture, Japan. We demonstrated the effectiveness of the proposed method in cases where the causal graph is unknown.
Submitted 26 April, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
Integrating Large Language Models in Causal Discovery: A Statistical Causal Approach
Authors:
Masayuki Takayama,
Tadahisa Okuda,
Thong Pham,
Tatsuyoshi Ikenoue,
Shingo Fukuma,
Shohei Shimizu,
Akiyoshi Sannai
Abstract:
In practical statistical causal discovery (SCD), embedding domain expert knowledge as constraints into the algorithm is significant for creating consistent meaningful causal models, despite the challenges in systematic acquisition of the background knowledge. To overcome these challenges, this paper proposes a novel methodology for causal inference, in which SCD methods and knowledge based causal inference (KBCI) with a large language model (LLM) are synthesized through "statistical causal prompting (SCP)" for LLMs and prior knowledge augmentation for SCD. Experiments have revealed that GPT-4 can cause the output of the LLM-KBCI and the SCD result with prior knowledge from LLM-KBCI to approach the ground truth, and that the SCD result can be further improved, if GPT-4 undergoes SCP. Furthermore, by using an unpublished real-world dataset, we have demonstrated that the background knowledge provided by the LLM can improve SCD on this dataset, even if this dataset has never been included in the training data of the LLM. The proposed approach can thus address challenges such as dataset biases and limitations, illustrating the potential of LLMs to improve data-driven causal inference across diverse scientific domains.
Submitted 21 May, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition
Authors:
Hao Wang,
Shuhei Kurita,
Shuichiro Shimizu,
Daisuke Kawahara
Abstract:
Audio-visual speech recognition (AVSR) is a multimodal extension of automatic speech recognition (ASR), using video as a complement to audio. In AVSR, considerable efforts have been directed at datasets for facial features such as lip-readings, while they often fall short in evaluating the image comprehension capabilities in broader contexts. In this paper, we construct SlideAVSR, an AVSR dataset using scientific paper explanation videos. SlideAVSR provides a new benchmark where models transcribe speech utterances with texts on the slides on the presentation recordings. As technical terminologies that are frequent in paper explanations are notoriously challenging to transcribe without reference texts, our SlideAVSR dataset spotlights a new aspect of AVSR problems. As a simple yet effective baseline, we propose DocWhisper, an AVSR model that can refer to textual information from slides, and confirm its effectiveness on SlideAVSR.
Submitted 2 July, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Use of Prior Knowledge to Discover Causal Additive Models with Unobserved Variables and its Application to Time Series Data
Authors:
Takashi Nicholas Maeda,
Shohei Shimizu
Abstract:
This paper proposes two methods for causal additive models with unobserved variables (CAM-UV). CAM-UV assumes that the causal functions take the form of generalized additive models and that latent confounders are present. First, we propose a method that leverages prior knowledge for efficient causal discovery. Then, we propose an extension of this method for inferring causality in time series data. The original CAM-UV algorithm differs from other existing causal function models in that it does not seek the causal order between observed variables, but rather aims to identify the causes for each observed variable. Therefore, the first proposed method in this paper utilizes prior knowledge, such as understanding that certain variables cannot be causes of specific others. Moreover, by incorporating the prior knowledge that causes precede their effects in time, we extend the first algorithm to the second method for causal discovery in time series data. We validate the first proposed method by using simulated data to demonstrate that the accuracy of causal discovery increases as more prior knowledge is accumulated. Additionally, we test the second proposed method by comparing it with existing time series causal discovery methods, using both simulated data and real-world data.
Submitted 17 January, 2024; v1 submitted 14 January, 2024;
originally announced January 2024.
-
Scalable Counterfactual Distribution Estimation in Multivariate Causal Models
Authors:
Thong Pham,
Shohei Shimizu,
Hideitsu Hino,
Tam Le
Abstract:
We consider the problem of estimating the counterfactual joint distribution of multiple quantities of interest (e.g., outcomes) in a multivariate causal model extended from the classical difference-in-difference design. Existing methods for this task either ignore the correlation structures among dimensions of the multivariate outcome by considering univariate causal models on each dimension separately and hence produce incorrect counterfactual distributions, or scale poorly even for moderate-size datasets when directly dealing with such a multivariate causal model. We propose a method that alleviates both issues simultaneously by leveraging a robust latent one-dimensional subspace of the original high-dimensional space and exploiting the efficient estimation from the univariate causal model on such a space. Since the construction of the one-dimensional subspace uses information from all the dimensions, our method can capture the correlation structures and produce good estimates of the counterfactual distribution. We demonstrate the advantages of our approach over existing methods on both synthetic and real-world data.
Submitted 1 November, 2023;
originally announced November 2023.
-
Video-Helpful Multimodal Machine Translation
Authors:
Yihang Li,
Shuichiro Shimizu,
Chenhui Chu,
Sadao Kurohashi,
Wei Li
Abstract:
Existing multimodal machine translation (MMT) datasets consist of images and video captions or instructional video subtitles, which rarely contain linguistic ambiguity, making visual information ineffective in generating appropriate translations. Recent work has constructed an ambiguous subtitles dataset to alleviate this problem but is still limited to the problem that videos do not necessarily contribute to disambiguation. We introduce EVA (Extensive training set and Video-helpful evaluation set for Ambiguous subtitles translation), an MMT dataset containing 852k Japanese-English (Ja-En) parallel subtitle pairs, 520k Chinese-English (Zh-En) parallel subtitle pairs, and corresponding video clips collected from movies and TV episodes. In addition to the extensive training set, EVA contains a video-helpful evaluation set in which subtitles are ambiguous, and videos are guaranteed helpful for disambiguation. Furthermore, we propose SAFA, an MMT model based on the Selective Attention model with two novel methods: Frame attention loss and Ambiguity augmentation, aiming to use videos in EVA for disambiguation fully. Experiments on EVA show that visual information and the proposed methods can boost translation performance, and our model performs significantly better than existing MMT models. The EVA dataset and the SAFA model are available at: https://github.com/ku-nlp/video-helpful-MMT.git.
Submitted 31 October, 2023;
originally announced October 2023.
-
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification
Authors:
Harunori Kawano,
Sota Shimizu
Abstract:
Wav2vec2 has achieved success in applying Transformer architecture and self-supervised learning to speech recognition. Recently, these have come to be used not only for speech recognition but also for speech processing as a whole. This paper introduces an effective end-to-end speaker identification model that applies a Transformer-based contextual model. We explored the relationship between the hyper-parameters and the performance in order to discern the structure of an effective model. Furthermore, we propose a pooling method, Temporal Gate Pooling, with powerful learning ability for speaker identification. We applied Conformer as encoder and BEST-RQ for pre-training and conducted an evaluation utilizing the speaker identification of VoxCeleb1. The proposed method has achieved an accuracy of 87.1% with 28.5M parameters, demonstrating comparable precision to wav2vec2 with 317.7M parameters. Code is available at https://github.com/HarunoriKawano/speaker-identification-with-tgp.
Submitted 10 September, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
Joint learning of images and videos with a single Vision Transformer
Authors:
Shuki Shimizu,
Toru Tamaki
Abstract:
In this study, we propose a method for joint learning of images and videos using a single model. In general, images and videos are often trained by separate models. We propose in this paper a method that takes a batch of images as input to Vision Transformer IV-ViT, and also a set of video frames with temporal aggregation by late fusion. Experimental results on two image datasets and two action recognition datasets are presented.
Submitted 21 August, 2023;
originally announced August 2023.
-
Causal-learn: Causal Discovery in Python
Authors:
Yujia Zheng,
Biwei Huang,
Wei Chen,
Joseph Ramsey,
Mingming Gong,
Ruichu Cai,
Shohei Shimizu,
Peter Spirtes,
Kun Zhang
Abstract:
Causal discovery aims at revealing causal relations from observational data, which is a fundamental task in science and engineering. We describe $\textit{causal-learn}$, an open-source Python library for causal discovery. This library focuses on bringing a comprehensive collection of causal discovery methods to both practitioners and researchers. It provides easy-to-use APIs for non-specialists, modular building blocks for developers, detailed documentation for learners, and comprehensive methods for all. Different from previous packages in R or Java, $\textit{causal-learn}$ is fully developed in Python, which could be more in tune with the recent preference shift in programming languages within related communities. The library is available at https://github.com/py-why/causal-learn.
Submitted 31 July, 2023;
originally announced July 2023.
-
Biological Organisms as End Effectors
Authors:
Josephine Galipon,
Shoya Shimizu,
Kenjiro Tadakuma
Abstract:
In robotics, an end effector is a device at the end of a robotic arm that is designed to physically interact with objects in the environment or with the environment itself. Effectively, it serves as the hand of the robot, carrying out tasks on behalf of humans. But could we turn this concept on its head and consider using living organisms themselves as end effectors? This paper introduces a novel idea of using whole living organisms as end effectors for robotics. We showcase this by demonstrating that pill bugs and chitons -- types of small, harmless creatures -- can be utilized as functional grippers. Crucially, this method does not harm these creatures, enabling their release back into nature after use. How this concept may be expanded to other organisms and applications is also discussed.
Submitted 12 June, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
Towards Speech Dialogue Translation Mediating Speakers of Different Languages
Authors:
Shuichiro Shimizu,
Chenhui Chu,
Sheng Li,
Sadao Kurohashi
Abstract:
We present a new task, speech dialogue translation mediating speakers of different languages. We construct the SpeechBSD dataset for the task and conduct baseline experiments. Furthermore, we consider context to be an important aspect that needs to be addressed in this task and propose two ways of utilizing context, namely monolingual context and bilingual context. We conduct cascaded speech translation experiments using Whisper and mBART, and show that bilingual context performs better in our settings.
Submitted 22 May, 2023; v1 submitted 16 May, 2023;
originally announced May 2023.
-
Neural Architecture Search for Improving Latency-Accuracy Trade-off in Split Computing
Authors:
Shoma Shimizu,
Takayuki Nishio,
Shota Saito,
Yoichi Hirose,
Chen Yen-Hsiu,
Shinichi Shirakawa
Abstract:
This paper proposes a neural architecture search (NAS) method for split computing. Split computing is an emerging machine-learning inference technique that addresses the privacy and latency challenges of deploying deep learning in IoT systems. In split computing, neural network models are separated and cooperatively processed using edge servers and IoT devices via networks. Thus, the architecture of the neural network model significantly impacts the communication payload size, model accuracy, and computational load. In this paper, we address the challenge of optimizing neural network architecture for split computing. To this end, we propose NASC, which jointly explores optimal model architecture and a split point to achieve higher accuracy while meeting latency requirements (i.e., smaller total latency of computation and communication than a certain threshold). NASC employs a one-shot NAS that does not require repeating model training for a computationally efficient architecture search. Our performance evaluation using hardware (HW)-NAS-Bench of benchmark data demonstrates that the proposed NASC can improve the "communication latency and model accuracy" trade-off, i.e., reduce the latency by approximately 40-60% from the baseline, with slight accuracy degradation.
Submitted 29 August, 2022;
originally announced August 2022.
-
AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider
Authors:
C. Fanelli,
Z. Papandreou,
K. Suresh,
J. K. Adkins,
Y. Akiba,
A. Albataineh,
M. Amaryan,
I. C. Arsene,
C. Ayerbe Gayoso,
J. Bae,
X. Bai,
M. D. Baker,
M. Bashkanov,
R. Bellwied,
F. Benmokhtar,
V. Berdnikov,
J. C. Bernauer,
F. Bock,
W. Boeglin,
M. Borysova,
E. Brash,
P. Brindza,
W. J. Briscoe,
M. Brooks,
S. Bueltmann
, et al. (258 additional authors not shown)
Abstract:
The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to leverage Artificial Intelligence (AI) already starting from the design and R&D phases. The EIC Comprehensive Chromodynamics Experiment (ECCE) is a consortium that proposed a detector design based on a 1.5T solenoid. The EIC detector proposal review concluded that the ECCE design will serve as the reference design for an EIC detector. Herein we describe a comprehensive optimization of the ECCE tracker using AI. The work required a complex parametrization of the simulated detector system. Our approach dealt with an optimization problem in a multidimensional design space driven by multiple objectives that encode the detector performance, while satisfying several mechanical constraints. We describe our strategy and show results obtained for the ECCE tracking system. The AI-assisted design is agnostic to the simulation framework and can be extended to other sub-detectors or to a system of sub-detectors to further optimize the performance of the EIC detector.
Submitted 19 May, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.
-
VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation
Authors:
Yihang Li,
Shuichiro Shimizu,
Weiqi Gu,
Chenhui Chu,
Sadao Kurohashi
Abstract:
Existing multimodal machine translation (MMT) datasets consist of images and video captions or general subtitles, which rarely contain linguistic ambiguity, making visual information not so effective to generate appropriate translations. We introduce VISA, a new dataset that consists of 40k Japanese-English parallel sentence pairs and corresponding video clips with the following key features: (1) the parallel sentences are subtitles from movies and TV episodes; (2) the source subtitles are ambiguous, which means they have multiple possible translations with different meanings; (3) we divide the dataset into Polysemy and Omission according to the cause of ambiguity. We show that VISA is challenging for the latest MMT system, and we hope that the dataset can facilitate MMT research. The VISA dataset is available at: https://github.com/ku-nlp/VISA.
Submitted 26 May, 2022; v1 submitted 20 January, 2022;
originally announced January 2022.
-
Discovery of Causal Additive Models in the Presence of Unobserved Variables
Authors:
Takashi Nicholas Maeda,
Shohei Shimizu
Abstract:
Causal discovery from data affected by unobserved variables is an important but difficult problem to solve. The effects that unobserved variables have on the relationships between observed variables are more complex in nonlinear cases than in linear cases. In this study, we focus on causal additive models in the presence of unobserved variables. Causal additive models exhibit structural equations that are additive in the variables and error terms. We take into account the presence of not only unobserved common causes but also unobserved intermediate variables. Our theoretical results show that, when the causal relationships are nonlinear and there are unobserved variables, it is not possible to identify all the causal relationships between observed variables through regression and independence tests. However, our theoretical results also show that it is possible to avoid incorrect inferences. We propose a method to identify all the causal relationships that are theoretically possible to identify without being biased by unobserved variables. The empirical results using artificial data and simulated functional magnetic resonance imaging (fMRI) data show that our method effectively infers causal structures in the presence of unobserved variables.
Submitted 3 June, 2021;
originally announced June 2021.
-
Causal Discovery with Multi-Domain LiNGAM for Latent Factors
Authors:
Yan Zeng,
Shohei Shimizu,
Ruichu Cai,
Feng Xie,
Michio Yamamoto,
Zhifeng Hao
Abstract:
Discovering causal structures among latent factors from observed data is a particularly challenging problem. Despite some efforts for this problem, existing methods focus on the single-domain data only. In this paper, we propose Multi-Domain Linear Non-Gaussian Acyclic Models for Latent Factors (MD-LiNA), where the causal structure among latent factors of interest is shared for all domains, and we provide its identification results. The model enriches the causal representation for multi-domain data. We propose an integrated two-phase algorithm to estimate the model. In particular, we first locate the latent factors and estimate the factor loading matrix. Then to uncover the causal structure among shared latent factors of interest, we derive a score function based on the characterization of independence relations between external influences and the dependence relations between multi-domain latent factors and latent factors of interest. We show that the proposed method provides locally consistent estimators. Experimental results on both synthetic and real-world data demonstrate the efficacy and robustness of our approach.
Submitted 22 April, 2022; v1 submitted 19 September, 2020;
originally announced September 2020.
-
Hybrid Scheme of Kinematic Analysis and Lagrangian Koopman Operator Analysis for Short-term Precipitation Forecasting
Authors:
Shitao Zheng,
Takashi Miyamoto,
Koyuru Iwanami,
Shingo Shimizu,
Ryohei Kato
Abstract:
With the accumulation of meteorological big data, data-driven models for short-term precipitation forecasting have shown increasing promise. We focus on Koopman operator analysis, which is a data-driven scheme to discover governing laws in observed data. We propose a method to apply this scheme to phenomena accompanying advection currents such as precipitation. The proposed method decomposes time evolutions of the phenomena between advection currents under a velocity field and changes in physical quantities under Lagrangian coordinates. The advection currents are estimated by kinematic analysis, and the changes in physical quantities are estimated by Koopman operator analysis. The proposed method is applied to actual precipitation distribution data, and the results show that the development and decay of precipitation are properly captured relative to conventional methods and that stable predictions over long periods are possible.
Submitted 3 June, 2020;
originally announced June 2020.
-
Causal discovery of linear non-Gaussian acyclic models in the presence of latent confounders
Authors:
Takashi Nicholas Maeda,
Shohei Shimizu
Abstract:
Causal discovery from data affected by latent confounders is an important and difficult challenge. Causal functional model-based approaches have not been used to present variables whose relationships are affected by latent confounders, while some constraint-based methods can present them. This paper proposes a causal functional model-based method called repetitive causal discovery (RCD) to discover the causal structure of observed variables affected by latent confounders. RCD repeats inferring the causal directions between a small number of observed variables and determines whether the relationships are affected by latent confounders. RCD finally produces a causal graph where a bi-directed arrow indicates the pair of variables that have the same latent confounders, and a directed arrow indicates the causal direction of a pair of variables that are not affected by the same latent confounder. The results of experimental validation using simulated data and real-world data confirmed that RCD is effective in identifying latent confounders and causal directions between observed variables.
Submitted 4 November, 2020; v1 submitted 13 January, 2020;
originally announced January 2020.
-
A Maximum Edge-Weight Clique Extraction Algorithm Based on Branch-and-Bound
Authors:
Satoshi Shimizu,
Kazuaki Yamaguchi,
Sumio Masuda
Abstract:
The maximum edge-weight clique problem is to find, in a given edge-weighted undirected graph, a clique whose total edge weight is maximum. The problem is NP-hard, and several branch-and-bound algorithms have been proposed. In this paper, we propose a new exact algorithm based on branch-and-bound. It assigns edge weights to vertices and calculates upper bounds using vertex coloring. Computational experiments confirmed that our algorithm is faster than previous algorithms.
Submitted 24 October, 2018;
originally announced October 2018.
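The branch-and-bound idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the paper derives its upper bounds from vertex coloring, whereas this sketch substitutes a much cruder bound (the current weight plus every edge weight that the remaining candidates could still contribute) and assumes non-negative weights. The function name `max_edge_weight_clique` and the dictionary-based graph encoding are hypothetical choices made for illustration.

```python
from itertools import combinations

def max_edge_weight_clique(n, weights):
    """Exact search over vertices 0..n-1.

    weights maps frozenset({u, v}) to a non-negative edge weight;
    a missing key means the two vertices are not adjacent.
    """
    def w(u, v):
        return weights.get(frozenset((u, v)))

    best = {"weight": 0.0, "clique": []}

    def upper_bound(clique, cands, current):
        # Crude optimistic bound (NOT the paper's coloring bound):
        # pretend every candidate joins and all of its edges count.
        bound = current
        for u in cands:
            bound += sum(w(u, c) for c in clique)
        for u, v in combinations(cands, 2):
            if w(u, v) is not None:
                bound += w(u, v)
        return bound

    def expand(clique, cands, current):
        if current > best["weight"]:
            best["weight"], best["clique"] = current, list(clique)
        # Prune when even the optimistic bound cannot beat the incumbent.
        if not cands or upper_bound(clique, cands, current) <= best["weight"]:
            return
        for i, u in enumerate(cands):
            # Every candidate is adjacent to all clique members by construction.
            gain = sum(w(u, c) for c in clique)
            new_cands = [v for v in cands[i + 1:] if w(u, v) is not None]
            expand(clique + [u], new_cands, current + gain)

    expand([], list(range(n)), 0.0)
    return best["weight"], best["clique"]

# Small illustrative graph: a triangle {0, 1, 2} plus one heavy edge {2, 3}.
example_weights = {
    frozenset((0, 1)): 3,
    frozenset((0, 2)): 2,
    frozenset((1, 2)): 4,
    frozenset((2, 3)): 10,
}
best_weight, best_clique = max_edge_weight_clique(4, example_weights)
```

In this toy graph the single heavy edge outweighs the whole triangle, which shows why the objective differs from finding the largest clique. A tighter bound, such as the coloring-based one the abstract mentions, would prune more of the search tree.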
-
Optimal Satellite Constellation Spare Strategy Using Multi-Echelon Inventory Control
Authors:
Pauline C. M. Jakob,
Seiichi Shimizu,
Shoji Yoshikawa,
Koki Ho
Abstract:
The recent growing trend to develop large-scale satellite constellations (i.e., mega-constellation) with low-cost small satellites has brought the need for an efficient and scalable maintenance strategy decision plan. Traditional spare strategies for satellite constellations cannot handle these mega-constellations due to their limited scalability in number of satellites and/or frequency of failures. In this paper, we propose a novel spare strategy using an inventory management approach. We consider a set of parking orbits at a lower altitude than the constellation for spare storage, and model satellite constellation spare strategy problem using a multi-echelon (s,Q)-type inventory policy, viewing Earth's ground as a supplier, parking orbits as warehouses, and in-plane spare stocks as retailers. This inventory model is unique in that the parking orbits (warehouses) drift away from the orbital planes over time due to orbital mechanics' effects, and the in-plane spare stocks (retailers) would receive the resupply from the closest (i.e., minimum waiting time) available warehouse at the time of delivery. The parking orbits (warehouses) are also resupplied from the ground (supplier) with stochastic lead time caused by the order processing and launch opportunities, leveraging the cost saving effects by launching many satellites in one rocket (i.e., batch launch discount). The proposed analytical model is validated against simulations using Latin Hypercube Sampling. Furthermore, based on the proposed model, an optimization formulation is introduced to identify the optimal spare strategy, comprising the parking orbits characteristics and all locations policies, to minimize the maintenance cost of the system given performance requirements. The proposed model and optimization method are applied to a real-world case study of satellite mega-constellation to demonstrate their value.
Submitted 5 June, 2019; v1 submitted 7 July, 2018;
originally announced July 2018.
-
Analysis of cause-effect inference by comparing regression errors
Authors:
Patrick Blöbaum,
Dominik Janzing,
Takashi Washio,
Shohei Shimizu,
Bernhard Schölkopf
Abstract:
We address the problem of inferring the causal direction between two variables by comparing the least-squares errors of the predictions in both possible directions. Under the assumption of an independence between the function relating cause and effect, the conditional noise distribution, and the distribution of the cause, we show that the errors are smaller in causal direction if both variables are equally scaled and the causal relation is close to deterministic. Based on this, we provide an easily applicable algorithm that only requires a regression in both possible causal directions and a comparison of the errors. The performance of the algorithm is compared with various related causal inference methods in different artificial and real-world data sets.
Submitted 24 January, 2019; v1 submitted 19 February, 2018;
originally announced February 2018.
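The procedure summarized in the abstract above (regress in both directions, then compare the least-squares errors) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the polynomial regressor, its degree, the min-max rescaling used to put both variables on an equal scale, and the function names `rescale`, `regression_mse`, and `infer_direction` are all assumptions introduced here.

```python
import numpy as np

def rescale(v):
    """Map values linearly onto [0, 1] so both variables share one scale."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

def regression_mse(a, b, degree=3):
    """Mean squared error of a least-squares polynomial fit predicting b from a."""
    coeffs = np.polyfit(a, b, degree)
    return float(np.mean((np.polyval(coeffs, a) - b) ** 2))

def infer_direction(x, y, degree=3):
    """Prefer the direction whose regression error is smaller."""
    x, y = rescale(x), rescale(y)
    err_xy = regression_mse(x, y, degree)  # predict y from x
    err_yx = regression_mse(y, x, degree)  # predict x from y
    direction = "x -> y" if err_xy < err_yx else "y -> x"
    return direction, err_xy, err_yx

# Synthetic, nearly deterministic example in which x causes y.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 2000)
y = x ** 3 + 0.01 * rng.normal(size=x.size)
direction, err_xy, err_yx = infer_direction(x, y)
```

The comparison is only meaningful under the conditions the abstract states: both variables equally scaled and a relation that is close to deterministic. With strong noise or a poorly chosen regressor, the error ordering need not reflect the causal direction.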
-
Combining Linear Non-Gaussian Acyclic Model with Logistic Regression Model for Estimating Causal Structure from Mixed Continuous and Discrete Data
Authors:
Chao Li,
Shohei Shimizu
Abstract:
Estimating causal models from observational data is a crucial task in data analysis. For continuous-valued data, Shimizu et al. have proposed a linear acyclic non-Gaussian model to understand the data generating process, and have shown that their model is identifiable when the number of data points is sufficiently large. However, situations in which continuous and discrete variables coexist in the same problem are common in practice. Most existing causal discovery methods either ignore the discrete data and apply a continuous-valued algorithm or discretize all the continuous data and then apply a discrete Bayesian network approach. These methods may lose important information when discrete data are ignored, or introduce approximation error through discretization. In this paper, we define a novel hybrid causal model which consists of both continuous and discrete variables. The model assumes: (1) the value of a continuous variable is a linear function of its parent variables plus a non-Gaussian noise, and (2) each discrete variable is a logistic variable whose distribution parameters depend on the values of its parent variables. In addition, we derive the BIC scoring function for model selection. The new discovery algorithm can learn causal structures from mixed continuous and discrete data without discretization. We empirically demonstrate the power of our method through thorough simulations.
Submitted 16 February, 2018;
originally announced February 2018.
-
Error Asymmetry in Causal and Anticausal Regression
Authors:
Patrick Blöbaum,
Takashi Washio,
Shohei Shimizu
Abstract:
It is generally difficult to make any statements about the expected prediction error in a univariate setting without further knowledge about how the data were generated. Recent work showed that knowledge about the real underlying causal structure of a data generation process has implications for various machine learning settings. Assuming additive noise and independence between the data-generating mechanism and its input, we draw a novel connection between the intrinsic causal relationship of two variables and the expected prediction error. We formulate the theorem that the expected error of the true data-generating function as a prediction model is generally smaller when the effect is predicted from its cause and, on the contrary, greater when the cause is predicted from its effect. The theorem implies an asymmetry in the error depending on the prediction direction. This is further corroborated with empirical evaluations on artificial and real-world data sets.
Submitted 17 April, 2017; v1 submitted 11 October, 2016;
originally announced October 2016.
-
Coalgebraic Trace Semantics for Buechi and Parity Automata
Authors:
Natsuki Urabe,
Shunsuke Shimizu,
Ichiro Hasuo
Abstract:
Despite its success in producing numerous general results on state-based dynamics, the theory of coalgebra has struggled to accommodate the Buechi acceptance condition---a basic notion in the theory of automata for infinite words or trees. In this paper we present a clean answer to the question that builds on the "maximality" characterization of infinite traces (by Jacobs and Cirstea): the accepted language of a Buechi automaton is characterized by two commuting diagrams, one for a least homomorphism and the other for a greatest, much like in a system of (least and greatest) fixed-point equations. This characterization works uniformly for the nondeterministic branching and the probabilistic one; and for words and trees alike. We present our results in terms of the parity acceptance condition that generalizes Buechi's.
Submitted 30 June, 2016;
originally announced June 2016.
-
Lattice-Theoretic Progress Measures and Coalgebraic Model Checking (with Appendices)
Authors:
Ichiro Hasuo,
Shunsuke Shimizu,
Corina Cirstea
Abstract:
In the context of formal verification in general and model checking in particular, parity games serve as a mighty vehicle: many problems are encoded as parity games, which are then solved by the seminal algorithm by Jurdzinski. In this paper we identify the essence of this workflow to be the notion of progress measure, and formalize it in general, possibly infinitary, lattice-theoretic terms. Our view on progress measures is that they are to nested/alternating fixed points what invariants are to safety/greatest fixed points, and what ranking functions are to liveness/least fixed points. That is, progress measures are a combination of the latter two notions (invariant and ranking function), both of which have been extensively studied in the context of (program) verification.
We then apply our theory of progress measures to a general model-checking framework, where systems are categorically presented as coalgebras. The framework's theoretical robustness is witnessed by a smooth transfer from the branching-time setting to the linear-time one. Although the framework can be used to derive some decision procedures for finite settings, we also expect the proposed framework to form a basis for sound proof methods for some undecidable/infinitary problems.
Submitted 9 January, 2016; v1 submitted 1 November, 2015;
originally announced November 2015.
-
A direct method for estimating a causal ordering in a linear non-Gaussian acyclic model
Authors:
Shohei Shimizu,
Aapo Hyvarinen,
Yoshinobu Kawahara
Abstract:
Structural equation models and Bayesian networks have been widely used to analyze causal relations between continuous variables. In such frameworks, linear acyclic models are typically used to model the data-generating process of variables. Recently, it was shown that the use of non-Gaussianity identifies a causal ordering of variables in a linear acyclic model without using any prior knowledge of the network structure, which is not the case with conventional methods. However, existing estimation methods are based on iterative search algorithms and may not converge to a correct solution in a finite number of steps. In this paper, we propose a new direct method to estimate a causal ordering based on non-Gaussianity. In contrast to the previous methods, our algorithm requires no algorithmic parameters and is guaranteed to converge to the right solution within a small fixed number of steps if the data strictly follow the model.
Submitted 9 August, 2014;
originally announced August 2014.
-
Causal Discovery in a Binary Exclusive-or Skew Acyclic Model: BExSAM
Authors:
Takanori Inazumi,
Takashi Washio,
Shohei Shimizu,
Joe Suzuki,
Akihiro Yamamoto,
Yoshinobu Kawahara
Abstract:
Discovering causal relations among observed variables in a given data set is a major objective in studies of statistics and artificial intelligence. Recently, some techniques to discover a unique causal model have been explored based on non-Gaussianity of the observed data distribution. However, most of these are limited to continuous data. In this paper, we present a novel causal model for binary data and propose an efficient new approach to deriving the unique causal model governing a given binary data set under skew distributions of external binary noises. Experimental evaluation shows excellent performance for both artificial and real world data sets.
Submitted 22 January, 2014;
originally announced January 2014.
-
XPath Satisfiability with Parent Axes or Qualifiers Is Tractable under Many of Real-World DTDs
Authors:
Yasunori Ishihara,
Nobutaka Suzuki,
Kenji Hashimoto,
Shogo Shimizu,
Toru Fujiwara
Abstract:
This paper aims at finding a subclass of DTDs that covers many of the real-world DTDs while offering a polynomial-time complexity for deciding the XPath satisfiability problem. In our previous work, we proposed RW-DTDs, which cover most of the real-world DTDs (26 out of 27 real-world DTDs and 1406 out of 1407 DTD rules). However, under RW-DTDs, XPath satisfiability with only child, descendant-or-self, and sibling axes is tractable. In this paper, we propose MRW-DTDs, which are slightly smaller than RW-DTDs but have tractability on XPath satisfiability with parent axes or qualifiers. MRW-DTDs are a proper superclass of duplicate-free DTDs proposed by Montazerian et al., and cover 24 out of the 27 real-world DTDs and 1403 out of the 1407 DTD rules. Under MRW-DTDs, we show that XPath satisfiability problems with (1) child, parent, and sibling axes, and (2) child and sibling axes and qualifiers are both tractable, which are known to be intractable under RW-DTDs.
Submitted 3 August, 2013;
originally announced August 2013.
-
Discovery of non-gaussian linear causal models using ICA
Authors:
Shohei Shimizu,
Aapo Hyvarinen,
Yutaka Kano,
Patrik O. Hoyer
Abstract:
In recent years, several methods have been proposed for the discovery of causal structure from non-experimental data (Spirtes et al. 2000; Pearl 2000). Such methods make various assumptions on the data generating process to facilitate its identification from purely observational data. Continuing this line of research, we show how to discover the complete causal structure of continuous-valued data, under the assumptions that (a) the data generating process is linear, (b) there are no unobserved confounders, and (c) disturbance variables have non-gaussian distributions of non-zero variances. The solution relies on the use of the statistical method known as independent component analysis (ICA), and does not require any pre-specified time-ordering of the variables. We provide a complete Matlab package for performing this LiNGAM analysis (short for Linear Non-Gaussian Acyclic Model), and demonstrate the effectiveness of the method using artificially generated data.
Submitted 4 July, 2012;
originally announced July 2012.
-
Causal discovery of linear acyclic models with arbitrary distributions
Authors:
Patrik O. Hoyer,
Aapo Hyvarinen,
Richard Scheines,
Peter L. Spirtes,
Joseph Ramsey,
Gustavo Lacerda,
Shohei Shimizu
Abstract:
An important task in data analysis is the discovery of causal relationships between observed variables. For continuous-valued data, linear acyclic causal models are commonly used to model the data-generating process, and the inference of such models is a well-studied problem. However, existing methods have significant limitations. Methods based on conditional independencies (Spirtes et al. 1993; Pearl 2000) cannot distinguish between independence-equivalent models, whereas approaches purely based on Independent Component Analysis (Shimizu et al. 2006) are inapplicable to data which is partially Gaussian. In this paper, we generalize and combine the two approaches, to yield a method able to learn the model structure in many cases for which the previous methods provide answers that are either incorrect or are not as informative as possible. We give exact graphical conditions for when two distinct models represent the same family of distributions, and empirically demonstrate the power of our method through thorough simulations.
Submitted 13 June, 2012;
originally announced June 2012.
-
Discovering causal structures in binary exclusive-or skew acyclic models
Authors:
Takanori Inazumi,
Takashi Washio,
Shohei Shimizu,
Joe Suzuki,
Akihiro Yamamoto,
Yoshinobu Kawahara
Abstract:
Discovering causal relations among observed variables in a given data set is a main topic in studies of statistics and artificial intelligence. Recently, some techniques to discover an identifiable causal structure have been explored based on non-Gaussianity of the observed data distribution. However, most of these are limited to continuous data. In this paper, we present a novel causal model for binary data and propose a new approach to derive an identifiable causal structure governing the data based on skew Bernoulli distributions of external noise. Experimental evaluation shows excellent performance for both artificial and real world data sets.
Submitted 14 February, 2012;
originally announced February 2012.
-
Principle of Virtual Use Method in Common Gateway Interface Program on the DACS Scheme
Authors:
Kazuya Odagiri,
Shogo Shimizu,
Naohiro Ishii,
Makoto Takizawa
Abstract:
On the Internet, Web servers such as Apache and Internet Information Server (IIS) were developed to exchange information among client computers running different operating systems. By themselves, they only display static information, such as HTML files and image files, in the Web browser. When that information changes, the administrator must update it manually. Because the same information sometimes has to be updated in several places, the workload can become higher than expected, and update errors and omissions may occur. These problems were solved by Common Gateway Interface (CGI) programs such as bulletin board systems and blog systems. However, such programs, when opened to the Internet, often have no user authentication mechanism and no access control mechanism. That is, any user can access them freely simply by obtaining the URL and entering it in a Web browser. In this paper, we therefore show a method for adding user authentication and access control mechanisms to them. It is called the virtual use method of CGI and is realized when the Destination Addressing Control System (DACS) Scheme, a kind of Policy Based Network Management (PBNM) scheme, is introduced. As a result, this kind of CGI program can be used within an organization with both of these functions.
Submitted 8 February, 2012;
originally announced February 2012.
-
GroupLiNGAM: Linear non-Gaussian acyclic models for sets of variables
Authors:
Yoshinobu Kawahara,
Kenneth Bollen,
Shohei Shimizu,
Takashi Washio
Abstract:
Finding the structure of a graphical model has received much attention in many fields. Recently, it has been reported that the non-Gaussianity of data enables us to identify the structure of a directed acyclic graph without any prior knowledge of the structure. In this paper, we propose a novel non-Gaussianity-based algorithm for a more general type of model: chain graphs. The algorithm finds an ordering of the disjoint subsets of variables by iteratively evaluating the independence between the variable subset and the residuals when the remaining variables are regressed on those. However, its computational cost grows exponentially with the number of variables. Therefore, we further discuss an efficient approximate approach for applying the algorithm to large graphs. We illustrate the algorithm with artificial and real-world datasets.
Submitted 24 June, 2010;
originally announced June 2010.
-
Estimation of linear, non-gaussian causal models in the presence of confounding latent variables
Authors:
Patrik O. Hoyer,
Shohei Shimizu,
Antti J. Kerminen
Abstract:
The estimation of linear causal models (also known as structural equation models) from data is a well-known problem which has received much attention in the past. Most previous work has, however, made an explicit or implicit assumption of gaussianity, limiting the identifiability of the models. We have recently shown (Shimizu et al, 2005; Hoyer et al, 2006) that for non-gaussian distributions the full causal model can be estimated in the no hidden variables case. In this contribution, we discuss the estimation of the model when confounding latent variables are present. Although in this case uniqueness is no longer guaranteed, there is at most a finite set of models which can fit the data. We develop an algorithm for estimating this set, and describe numerical simulations which confirm the theoretical arguments and demonstrate the practical viability of the approach. Full Matlab code is provided for all simulations.
Submitted 22 May, 2006; v1 submitted 9 March, 2006;
originally announced March 2006.