-
Multi-label Learning with Random Circular Vectors
Authors:
Ken Nishida,
Kojiro Machi,
Kazuma Onishi,
Katsuhiko Hayashi,
Hidetaka Kamigaito
Abstract:
The extreme multi-label classification (XMC) task involves learning a classifier that can predict from a large label set the most relevant subset of labels for a data instance. While deep neural networks (DNNs) have demonstrated remarkable success in XMC problems, the task is still challenging because it must deal with a large number of output labels, which make the DNN training computationally expensive. This paper addresses the issue by exploring the use of random circular vectors, where each vector component is represented as a complex amplitude. In our framework, we can develop an output layer and loss function of DNNs for XMC by representing the final output layer as a fully connected layer that directly predicts a low-dimensional circular vector encoding a set of labels for a data instance. We conducted experiments on synthetic datasets to verify that circular vectors have better label encoding capacity and retrieval ability than normal real-valued vectors. Then, we conducted experiments on actual XMC datasets and found that these appealing properties of circular vectors contribute to significant improvements in task performance compared with a previous model using random real-valued vectors, while reducing the size of the output layers by up to 99%.
Submitted 8 July, 2024;
originally announced July 2024.
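The abstract describes encoding a whole label set as one low-dimensional circular vector whose components are unit-modulus complex numbers. The sketch below illustrates that idea under our own assumptions and is not the paper's exact model: each label gets a random phase vector, a label set is encoded by summing (superposing) its members' vectors, and labels are retrieved by the real part of the Hermitian inner product. The helper names `random_circular_vector`, `encode_label_set`, and `label_scores`, as well as the sizes and the label set, are hypothetical.

```python
import cmath
import math
import random

def random_circular_vector(dim, rng):
    # Each component is exp(i * theta) with theta drawn uniformly from [0, 2*pi).
    return [cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) for _ in range(dim)]

def encode_label_set(label_ids, codebook):
    # Superpose (add) the circular vectors of every label in the set.
    dim = len(codebook[0])
    encoded = [0j] * dim
    for lid in label_ids:
        for k, z in enumerate(codebook[lid]):
            encoded[k] += z
    return encoded

def label_scores(encoded, codebook):
    # Similarity of each label to the encoding: Re(<label, encoded>) / dim.
    dim = len(encoded)
    return [sum((z.conjugate() * e).real for z, e in zip(vec, encoded)) / dim
            for vec in codebook]

rng = random.Random(0)
num_labels, dim = 100, 512  # illustrative sizes
codebook = [random_circular_vector(dim, rng) for _ in range(num_labels)]
true_set = {3, 17, 42}
encoded = encode_label_set(true_set, codebook)
scores = label_scores(encoded, codebook)
top3 = sorted(range(num_labels), key=lambda i: scores[i], reverse=True)[:3]
print(sorted(top3))
```

Members of the set score close to 1 (each contributes |z|^2 = 1 per component), while non-members score near 0 because random phases cancel on average; this is why a single low-dimensional vector can stand in for a set of labels.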
-
Unified Interpretation of Smoothing Methods for Negative Sampling Loss Functions in Knowledge Graph Embedding
Authors:
Xincan Feng,
Hidetaka Kamigaito,
Katsuhiko Hayashi,
Taro Watanabe
Abstract:
Knowledge Graphs (KGs) are fundamental resources in knowledge-intensive tasks in NLP. Due to the limitation of manually creating KGs, KG Completion (KGC) has an important role in automatically completing KGs by scoring their links with KG Embedding (KGE). To handle many entities in training, KGE relies on Negative Sampling (NS) loss that can reduce the computational cost by sampling. Since the appearance frequencies for each link are at most one in KGs, sparsity is an essential and inevitable problem. The NS loss is no exception. As a solution, the NS loss in KGE relies on smoothing methods like Self-Adversarial Negative Sampling (SANS) and subsampling. However, it is uncertain what kind of smoothing method is suitable for this purpose due to the lack of theoretical understanding. This paper provides theoretical interpretations of the smoothing methods for the NS loss in KGE and induces a new NS loss, Triplet Adaptive Negative Sampling (TANS), that can cover the characteristics of the conventional smoothing methods. Experimental results of TransE, DistMult, ComplEx, RotatE, HAKE, and HousE on FB15k-237, WN18RR, and YAGO3-10 datasets and their sparser subsets show the soundness of our interpretation and performance improvement by our TANS.
Submitted 5 July, 2024;
originally announced July 2024.
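SANS is one of the smoothing methods the abstract names. As a hedged illustration, the sketch below uses the commonly cited SANS formulation with a margin-based NS loss; it is not the paper's TANS. Each negative is weighted by a softmax over its score with temperature `alpha`, and the `margin` and `alpha` values, as well as the example scores, are illustrative assumptions.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def sans_weights(neg_scores, alpha):
    # Self-adversarial weights: softmax of alpha * score over the negatives.
    # In training these weights are usually treated as constants (no gradient).
    m = max(alpha * s for s in neg_scores)
    exps = [math.exp(alpha * s - m) for s in neg_scores]
    total = sum(exps)
    return [e / total for e in exps]

def ns_loss_with_sans(pos_score, neg_scores, margin=9.0, alpha=1.0):
    # Scores are "higher is more plausible" (e.g. a negated distance).
    # Positive term: -log sigmoid(margin + score);
    # negative term: weighted -log sigmoid(-(margin + score)).
    weights = sans_weights(neg_scores, alpha)
    pos_term = -log_sigmoid(margin + pos_score)
    neg_term = -sum(w * log_sigmoid(-(margin + s))
                    for w, s in zip(weights, neg_scores))
    return pos_term + neg_term

# Usage: a plausible positive and three negatives, the last one "hard".
negatives = [-12.0, -11.0, -6.0]
loss = ns_loss_with_sans(pos_score=-2.0, neg_scores=negatives)
w = sans_weights(negatives, alpha=1.0)
```

Setting `alpha = 0` makes the weights uniform, recovering the plain NS loss; larger `alpha` concentrates the loss on high-scoring (hard) negatives.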
-
Artwork Explanation in Large-scale Vision Language Models
Authors:
Kazuki Hayashi,
Yusuke Sakai,
Hidetaka Kamigaito,
Katsuhiko Hayashi,
Taro Watanabe
Abstract:
Large-scale vision-language models (LVLMs) output text from images and instructions, demonstrating advanced capabilities in text generation and comprehension. However, it has not been clarified to what extent LVLMs understand the knowledge necessary for explaining images, the complex relationships between various pieces of knowledge, and how they integrate these understandings into their explanations. To address this issue, we propose a new task: the artwork explanation generation task, along with its evaluation dataset and metric for quantitatively assessing the understanding and utilization of knowledge about artworks. This task is apt for image description based on the premise that LVLMs are expected to have pre-existing knowledge of artworks, which are often subjects of wide recognition and documented information. It consists of two parts: generating explanations from both images and titles of artworks, and generating explanations using only images, thus evaluating the LVLMs' language-based and vision-based knowledge. Alongside, we release a training dataset for LVLMs to learn explanations that incorporate knowledge about artworks. Our findings indicate that LVLMs not only struggle with integrating language and visual information but also exhibit a more pronounced limitation in acquiring knowledge from images alone. The datasets (ExpArt=Explain Artworks) are available at https://huggingface.co/datasets/naist-nlp/ExpArt.
Submitted 29 February, 2024;
originally announced March 2024.
-
Extended Flow Matching: a Method of Conditional Generation with Generalized Continuity Equation
Authors:
Noboru Isobe,
Masanori Koyama,
Jinzhe Zhang,
Kohei Hayashi,
Kenji Fukumizu
Abstract:
The task of conditional generation is one of the most important applications of generative models, and numerous methods have been developed to date based on the celebrated flow-based models. However, many flow-based models in use today are not built to allow one to introduce an explicit inductive bias to how the conditional distribution to be generated changes with respect to conditions. This can result in unexpected behavior in the task of style transfer, for example. In this research, we introduce extended flow matching (EFM), a direct extension of flow matching that learns a "matrix field" corresponding to the continuous map from the space of conditions to the space of distributions. We show that we can introduce inductive bias to the conditional generation through the matrix field and demonstrate this fact with MMOT-EFM, a version of EFM that aims to minimize the Dirichlet energy or the sensitivity of the distribution with respect to conditions. We will present our theory along with experimental results that support the competitiveness of EFM in conditional generation.
Submitted 5 July, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Evaluating Image Review Ability of Vision Language Models
Authors:
Shigeki Saito,
Kazuki Hayashi,
Yusuke Ide,
Yusuke Sakai,
Kazuma Onishi,
Toma Suzuki,
Seiji Gobara,
Hidetaka Kamigaito,
Katsuhiko Hayashi,
Taro Watanabe
Abstract:
Large-scale vision language models (LVLMs) are language models that are capable of processing images and text inputs by a single model. This paper explores the use of LVLMs to generate review texts for images. The ability of LVLMs to review images is not fully understood, highlighting the need for a methodical evaluation of their review abilities. Unlike image captions, review texts can be written from various perspectives such as image composition and exposure. This diversity of review perspectives makes it difficult to uniquely determine a single correct review for an image. To address this challenge, we introduce an evaluation method based on rank correlation analysis, in which review texts are ranked by both humans and LVLMs and the correlation between these rankings is then measured. We further validate this approach by creating a benchmark dataset aimed at assessing the image review ability of recent LVLMs. Our experiments with the dataset reveal that LVLMs, particularly those with proven superiority in other evaluative contexts, excel at distinguishing between high-quality and substandard image reviews.
Submitted 19 February, 2024;
originally announced February 2024.
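The evaluation above compares a human ranking of review texts with an LVLM ranking. The abstract does not name the correlation coefficient, so the sketch below assumes Spearman's rho for tie-free rankings; `ranks`, `spearman_rho`, and the example scores are hypothetical.

```python
def ranks(values):
    # Rank 1 = best (highest value); assumes no ties for simplicity.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [0] * len(values)
    for position, idx in enumerate(order, start=1):
        r[idx] = position
    return r

def spearman_rho(rank_a, rank_b):
    # Spearman's rho for two tie-free rankings of the same n items:
    # 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)).
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

human = [1, 2, 3, 4, 5]                    # human ranking of five candidate reviews
model = ranks([0.9, 0.7, 0.8, 0.3, 0.1])   # hypothetical LVLM quality scores
rho = spearman_rho(human, model)
```

Here the model swaps only the second and third reviews, so rho is high but below 1; a fully reversed ranking gives -1.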
-
Randomized Algorithms for Symmetric Nonnegative Matrix Factorization
Authors:
Koby Hayashi,
Sinan G. Aksoy,
Grey Ballard,
Haesun Park
Abstract:
Symmetric Nonnegative Matrix Factorization (SymNMF) is a technique in data analysis and machine learning that approximates a symmetric matrix with a product of a nonnegative, low-rank matrix and its transpose. To design faster and more scalable algorithms for SymNMF, we develop two randomized algorithms for its computation. The first algorithm uses randomized matrix sketching to compute an initial low-rank input matrix and proceeds to use this input to rapidly compute a SymNMF. The second algorithm uses randomized leverage score sampling to approximately solve constrained least squares problems. Many successful methods for SymNMF rely on (approximately) solving sequences of constrained least squares problems. We prove theoretically that leverage score sampling can approximately solve nonnegative least squares problems to a chosen accuracy with high probability. Finally, we demonstrate that both methods work well in practice by applying them to graph clustering tasks on large real-world data sets. These experiments show that our methods approximately maintain solution quality and achieve significant speedups for both large dense and large sparse problems.
Submitted 12 February, 2024;
originally announced February 2024.
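To make the second idea concrete, the sketch below shows generic leverage score sampling for a tall nonnegative least squares (NNLS) problem: rows are sampled with probability proportional to their leverage scores, rescaled, and the much smaller NNLS problem is solved. It is a simplified illustration, not the paper's algorithm; the projected-gradient solver is an illustrative stand-in for a proper NNLS routine, and all helper names and data are hypothetical.

```python
import numpy as np

def leverage_scores(A):
    # Row leverage scores: squared row norms of an orthonormal basis of range(A).
    Q, _ = np.linalg.qr(A)
    return np.sum(Q * Q, axis=1)

def nnls_projected_gradient(A, b, iters=500):
    # Minimise 0.5 * ||A x - b||^2 subject to x >= 0 (simple, not optimised).
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = np.maximum(0.0, x - step * grad)
    return x

def sampled_nnls(A, b, num_samples, rng):
    # Sample rows proportionally to leverage and rescale so the sketched
    # objective is an unbiased estimate of the full one.
    lev = leverage_scores(A)
    p = lev / lev.sum()
    idx = rng.choice(A.shape[0], size=num_samples, replace=True, p=p)
    scale = 1.0 / np.sqrt(num_samples * p[idx])
    return nnls_projected_gradient(A[idx] * scale[:, None], b[idx] * scale)

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 5))
x_true = np.array([1.0, 2.0, 0.0, 3.0, 0.5])
b = A @ x_true + 0.01 * rng.standard_normal(2000)
x_hat = sampled_nnls(A, b, num_samples=200, rng=rng)  # uses 200 of 2000 rows
```

Only 10% of the rows enter the solver here, yet the recovered `x_hat` stays close to `x_true`, which is the kind of saving the abstract reports at scale.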
-
CFTM: Continuous time fractional topic model
Authors:
Kei Nakagawa,
Kohei Hayashi,
Yugo Fujimoto
Abstract:
In this paper, we propose the Continuous Time Fractional Topic Model (cFTM), a new method for dynamic topic modeling. This approach incorporates fractional Brownian motion (fBm) to effectively identify positive or negative correlations in topic and word distributions over time, revealing long-term dependency or roughness. Our theoretical analysis shows that the cFTM can capture this long-term dependency or roughness in both topic and word distributions, mirroring the main characteristics of fBm. Moreover, we prove that the parameter estimation process for the cFTM is on par with that of LDA, a traditional topic model. To demonstrate these properties of the cFTM, we conduct an empirical study using economic news articles. The results from these tests support the model's ability to identify and track long-term dependency or roughness in topics over time.
Submitted 6 February, 2024; v1 submitted 29 January, 2024;
originally announced February 2024.
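The abstract relies on a standard property of fBm: its unit-step increments are positively correlated when the Hurst exponent H exceeds 1/2 (long-term dependency) and negatively correlated when H is below 1/2 (roughness), with H = 1/2 giving ordinary Brownian motion. The sketch below checks this property directly from the increment autocovariance formula; it illustrates fBm itself, not the cFTM, and `fgn_autocovariance` is a hypothetical helper name.

```python
def fgn_autocovariance(k, hurst):
    # Autocovariance at lag k of unit-step fBm increments (fractional Gaussian noise):
    # 0.5 * (|k + 1|^(2H) - 2|k|^(2H) + |k - 1|^(2H)).
    h2 = 2.0 * hurst
    return 0.5 * (abs(k + 1) ** h2 - 2.0 * abs(k) ** h2 + abs(k - 1) ** h2)

for H in (0.3, 0.5, 0.8):
    print(H, [round(fgn_autocovariance(k, H), 4) for k in range(1, 4)])
```

For H = 0.3 the lag-1 value is negative, for H = 0.5 it is zero, and for H = 0.8 it is positive and decays slowly with the lag.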
-
SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge
Authors:
Dimitrios Psychogyios,
Emanuele Colleoni,
Beatrice Van Amsterdam,
Chih-Yang Li,
Shu-Yu Huang,
Yuchong Li,
Fucang Jia,
Baosheng Zou,
Guotai Wang,
Yang Liu,
Maxence Boels,
Jiayu Huo,
Rachel Sparks,
Prokar Dasgupta,
Alejandro Granados,
Sebastien Ourselin,
Mengya Xu,
An Wang,
Yanan Wu,
Long Bai,
Hongliang Ren,
Atsushi Yamada,
Yuriko Harai,
Yuto Ishikawa,
Kazuyuki Hayashi,
et al. (25 additional authors not shown)
Abstract:
Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segmentation algorithms are often trained and make predictions in isolation from each other, without exploiting potential cross-task relationships. With the EndoVis 2022 SAR-RARP50 challenge, we release the first multimodal, publicly available, in vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP). The aim of the challenge is twofold. First, to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain. Second, to further explore the potential of multitask-based learning approaches and determine their comparative advantage against their single-task counterparts. A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation. The complete SAR-RARP50 dataset is available at: https://rdr.ucl.ac.uk/projects/SARRARP50_Segmentation_of_surgical_instrumentation_and_Action_Recognition_on_Robot-Assisted_Radical_Prostatectomy_Challenge/191091
Submitted 23 January, 2024; v1 submitted 31 December, 2023;
originally announced January 2024.
-
Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion?
Authors:
Yusuke Sakai,
Hidetaka Kamigaito,
Katsuhiko Hayashi,
Taro Watanabe
Abstract:
Knowledge graphs (KGs) consist of links that describe relationships between entities. Due to the difficulty of manually enumerating all relationships between entities, automatically completing them is essential for KGs. Knowledge Graph Completion (KGC) is a task that infers unseen relationships between entities in a KG. Traditional embedding-based KGC methods, such as RESCAL, TransE, DistMult, ComplEx, RotatE, HAKE, HousE, etc., infer missing links using only the knowledge from training data. In contrast, the recent Pre-trained Language Model (PLM)-based KGC utilizes knowledge obtained during pre-training. Therefore, PLM-based KGC can estimate missing links between entities by reusing memorized knowledge from pre-training without inference. This approach is problematic because building KGC models aims to infer unseen links between entities. However, conventional evaluations in KGC do not consider inference and memorization abilities separately. Thus, a PLM-based KGC method, which achieves high performance in current KGC evaluations, may be ineffective in practical applications. To address this issue, we analyze whether PLM-based KGC methods make inferences or merely access memorized knowledge. For this purpose, we propose a method for constructing synthetic datasets designed for this analysis and conclude that PLMs acquire the inference abilities required for KGC through pre-training, even though the performance improvements mostly come from textual information of entities and relations.
Submitted 6 June, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Model-based Subsampling for Knowledge Graph Completion
Authors:
Xincan Feng,
Hidetaka Kamigaito,
Katsuhiko Hayashi,
Taro Watanabe
Abstract:
Subsampling is effective in Knowledge Graph Embedding (KGE) for reducing overfitting caused by the sparsity in Knowledge Graph (KG) datasets. However, current subsampling approaches consider only frequencies of queries that consist of entities and their relations. Thus, the existing subsampling potentially underestimates the appearance probabilities of infrequent queries even if the frequencies of their entities or relations are high. To address this problem, we propose Model-based Subsampling (MBS) and Mixed Subsampling (MIX) to estimate their appearance probabilities through predictions of KGE models. Evaluation results on datasets FB15k-237, WN18RR, and YAGO3-10 showed that our proposed subsampling methods actually improved the KG completion performance of popular KGE models: RotatE, TransE, HAKE, ComplEx, and DistMult.
Submitted 17 September, 2023;
originally announced September 2023.
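The abstract contrasts MBS with existing subsampling that "considers only frequencies of queries". As a hedged sketch of that frequency-based baseline (an assumed form; exact details vary between implementations), each triple (h, r, t) is weighted by 1/sqrt of the combined counts of its queries (h, r) and (t, r^-1). MBS would replace these counts with probabilities estimated by a KGE model, which is not shown here. `frequency_subsampling_weights`, the `smoothing` parameter, and the toy triples are hypothetical.

```python
import math
from collections import Counter

def frequency_subsampling_weights(triples, smoothing=0):
    # Count each (head, relation) query and each (tail, inverse relation) query.
    counts = Counter()
    for h, r, t in triples:
        counts[(h, r)] += 1
        counts[(t, ("inv", r))] += 1
    # Weight = 1 / sqrt(query frequency): triples with frequent queries are down-weighted.
    return [1.0 / math.sqrt(counts[(h, r)] + counts[(t, ("inv", r))] + smoothing)
            for h, r, t in triples]

triples = [("a", "likes", "b"), ("a", "likes", "c"),
           ("a", "likes", "d"), ("x", "likes", "y")]
weights = frequency_subsampling_weights(triples)
```

The three triples sharing the query (a, likes) receive smaller weights than the isolated (x, likes, y), even though each individual triple occurs only once.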
-
Implicit ZCA Whitening Effects of Linear Autoencoders for Recommendation
Authors:
Katsuhiko Hayashi,
Kazuma Onishi
Abstract:
Recently, in the field of recommendation systems, linear regression (autoencoder) models have been investigated as a way to learn item similarity. In this paper, we show a connection between a linear autoencoder model and ZCA whitening for recommendation data. In particular, we show that the dual form solution of a linear autoencoder model actually has ZCA whitening effects on feature vectors of items, while items are considered as input features in the primal problem of the autoencoder/regression model. We also show the correctness of applying a linear autoencoder to low-dimensional item vectors obtained using embedding methods such as Item2vec to estimate item-item similarities. Our experiments provide preliminary results indicating the effectiveness of whitening low-dimensional item embeddings.
Submitted 15 August, 2023;
originally announced August 2023.
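For readers unfamiliar with ZCA whitening, the sketch below computes the ZCA transform W = U diag(1/sqrt(lambda + eps)) U^T from the eigendecomposition of the feature covariance and applies it to synthetic, correlated "item embeddings". It shows ZCA whitening itself, not the paper's linear-autoencoder derivation; `zca_whitening_matrix`, `eps`, and the data are assumptions for illustration.

```python
import numpy as np

def zca_whitening_matrix(X, eps=1e-8):
    # X: (n_samples, n_features). With Sigma = U diag(lambda) U^T the feature
    # covariance, ZCA uses the symmetric W = U diag(1 / sqrt(lambda + eps)) U^T.
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / X.shape[0]
    eigvals, U = np.linalg.eigh(sigma)
    return U @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ U.T

rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.5]])
item_vectors = rng.standard_normal((1000, 3)) @ mixing  # hypothetical correlated embeddings
W = zca_whitening_matrix(item_vectors)
Z = (item_vectors - item_vectors.mean(axis=0)) @ W      # whitened embeddings
```

Unlike PCA whitening, the ZCA matrix is symmetric, so whitened vectors stay as close as possible to the original coordinates while their covariance becomes the identity.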
-
Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics
Authors:
Kenta Oono,
Nontawat Charoenphakdee,
Kotatsu Bito,
Zhengyan Gao,
Yoshiaki Ota,
Shoichiro Yamaguchi,
Yohei Sugawara,
Shin-ichi Maeda,
Kunihiko Miyoshi,
Yuki Saito,
Koki Tsuda,
Hiroshi Maruyama,
Kohei Hayashi
Abstract:
Identifying the relationship between healthcare attributes, lifestyles, and personality is vital for understanding and improving physical and mental conditions. Machine learning approaches are promising for modeling their relationships and offering actionable suggestions. In this paper, we propose Virtual Human Generative Model (VHGM), a machine learning model for estimating attributes about healthcare, lifestyles, and personalities. VHGM is a deep generative model trained with masked modeling to learn the joint distribution of attributes conditioned on known ones. Using heterogeneous tabular datasets, VHGM learns more than 1,800 attributes efficiently. We numerically evaluate the performance of VHGM and its training techniques. As a proof-of-concept of VHGM, we present several applications demonstrating user scenarios, such as virtual measurements of healthcare attributes and hypothesis verifications of lifestyles.
Submitted 14 August, 2023; v1 submitted 18 June, 2023;
originally announced June 2023.
-
Using Wikipedia Editor Information to Build High-performance Recommender Systems
Authors:
Katsuhiko Hayashi
Abstract:
Wikipedia has high-quality articles on a variety of topics and has been used in diverse research areas. In this study, a method is presented for using Wikipedia's editor information to build recommender systems in various domains that outperform content-based systems.
Submitted 14 June, 2023;
originally announced June 2023.
-
Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models
Authors:
Hidetaka Kamigaito,
Katsuhiko Hayashi,
Taro Watanabe
Abstract:
In this paper, we propose a table and image generation task to verify how the knowledge about entities acquired from natural language is retained in Vision & Language (V&L) models. This task consists of two parts: the first is to generate a table containing knowledge about an entity and its related image, and the second is to generate an image from an entity with a caption and a table containing related knowledge of the entity. In both tasks, the model must know the entities used to perform the generation properly. We created the Wikipedia Table and Image Generation (WikiTIG) dataset from about 200,000 infoboxes in English Wikipedia articles to perform the proposed tasks. We evaluated the performance on the tasks with respect to the above research question using the V&L model OFA, which has achieved state-of-the-art results in multiple tasks. Experimental results show that OFA forgets part of its entity knowledge by pre-training as a complement to improve the performance of image-related tasks.
Submitted 25 July, 2023; v1 submitted 3 June, 2023;
originally announced June 2023.
-
Neural Fourier Transform: A General Approach to Equivariant Representation Learning
Authors:
Masanori Koyama,
Kenji Fukumizu,
Kohei Hayashi,
Takeru Miyato
Abstract:
Symmetry learning has proven to be an effective approach for extracting the hidden structure of data, with the concept of equivariance relation playing the central role. However, most of the current studies are built on architectural theory and corresponding assumptions on the form of data. We propose Neural Fourier Transform (NFT), a general framework of learning the latent linear action of the group without assuming explicit knowledge of how the group acts on data. We present the theoretical foundations of NFT and show that the existence of a linear equivariant feature, which has been assumed ubiquitously in equivariance learning, is equivalent to the existence of a group invariant kernel on the dataspace. We also provide experimental results to demonstrate the application of NFT in typical scenarios with varying levels of knowledge about the acting group.
Submitted 14 February, 2024; v1 submitted 29 May, 2023;
originally announced May 2023.
-
TabRet: Pre-training Transformer-based Tabular Models for Unseen Columns
Authors:
Soma Onishi,
Kenta Oono,
Kohei Hayashi
Abstract:
We present TabRet, a pre-trainable Transformer-based model for tabular data. TabRet is designed to work on a downstream task that contains columns not seen in pre-training. Unlike other methods, TabRet has an extra learning step before fine-tuning called retokenizing, which calibrates feature embeddings based on the masked autoencoding loss. In experiments, we pre-trained TabRet with a large collection of public health surveys and fine-tuned it on classification tasks in healthcare, and TabRet achieved the best AUC performance on four datasets. In addition, an ablation study shows that retokenizing and random shuffle augmentation of columns during pre-training contributed to performance gains. The code is available at https://github.com/pfnet-research/tabret.
Submitted 15 April, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Subsampling for Knowledge Graph Embedding Explained
Authors:
Hidetaka Kamigaito,
Katsuhiko Hayashi
Abstract:
In this article, we explain the recent advances in subsampling methods for knowledge graph embedding (KGE), starting from the original method used in word2vec.
Submitted 13 September, 2022;
originally announced September 2022.
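For reference, the word2vec subsampling heuristic that the article takes as its starting point discards each occurrence of a word w with probability 1 - sqrt(t / f(w)), where f(w) is the word's relative frequency and t is a small threshold (commonly around 1e-5). This is the formula as stated in the original word2vec paper; released implementations use slightly different variants. `discard_probability` is a hypothetical helper name.

```python
import math

def discard_probability(freq, threshold=1e-5):
    # A token with relative frequency `freq` is dropped with probability
    # max(0, 1 - sqrt(threshold / freq)); rare tokens (freq <= threshold) are always kept.
    return max(0.0, 1.0 - math.sqrt(threshold / freq))

for f in (1e-6, 1e-5, 1e-3, 5e-2):
    print(f, round(discard_probability(f), 4))
```

Very frequent tokens are mostly discarded while rare ones are never dropped, which counteracts the skew of frequency distributions; KGE subsampling adapts the same idea from words to queries.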
-
Comprehensive Analysis of Negative Sampling in Knowledge Graph Representation Learning
Authors:
Hidetaka Kamigaito,
Katsuhiko Hayashi
Abstract:
Negative sampling (NS) loss plays an important role in learning knowledge graph embedding (KGE) to handle a huge number of entities. However, the performance of KGE degrades if hyperparameters such as the margin term and the number of negative samples in the NS loss are not appropriately selected. Currently, empirical hyperparameter tuning addresses this problem at the cost of computational time. To solve this problem, we theoretically analyzed the NS loss to assist hyperparameter tuning and to better understand how to use the NS loss in KGE learning. Our theoretical analysis showed that scoring methods with restricted value ranges, such as TransE and RotatE, require the margin term or the number of negative samples to be adjusted differently from scoring methods without restricted value ranges, such as RESCAL, ComplEx, and DistMult. We also propose, on theoretical grounds, subsampling methods specialized for the NS loss in KGE. Our empirical analysis on the FB15k-237, WN18RR, and YAGO3-10 datasets showed that the results of actually trained models agree with our theoretical findings.
Submitted 6 July, 2022; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Finding Hall blockers by matrix scaling
Authors:
Koyo Hayashi,
Hiroshi Hirai,
Keiya Sakabe
Abstract:
For a given nonnegative matrix $A=(A_{ij})$, the matrix scaling problem asks whether $A$ can be scaled to a doubly stochastic matrix $D_1AD_2$ for some positive diagonal matrices $D_1,D_2$. The Sinkhorn algorithm is a simple iterative algorithm, which repeats row-normalization $A_{ij} \leftarrow A_{ij}/\sum_{j}A_{ij}$ and column-normalization $A_{ij} \leftarrow A_{ij}/\sum_{i}A_{ij}$ alternately. By this algorithm, $A$ converges to a doubly stochastic matrix in the limit if and only if the bipartite graph associated with $A$ has a perfect matching. This property can decide the existence of a perfect matching in a given bipartite graph $G$, which is identified with the $0,1$-matrix $A_G$. Linial, Samorodnitsky, and Wigderson showed that $O(n^2 \log n)$ iterations for $A_G$ decide whether $G$ has a perfect matching. Here $n$ is the number of vertices in one of the color classes of $G$. In this paper, we show an extension of this result: if $G$ has no perfect matching, then a polynomial number of the Sinkhorn iterations identifies a Hall blocker, i.e., a vertex subset $X$ whose neighborhood $\Gamma(X)$ satisfies $|X| > |\Gamma(X)|$. Specifically, we show that $O(n^2 \log n)$ iterations can identify one Hall blocker, and that further polynomial iterations can also identify all parametric Hall blockers $X$ maximizing $(1-\lambda) |X| - \lambda |\Gamma(X)|$ for $\lambda \in [0,1]$. The former result is based on an interpretation of the Sinkhorn algorithm as alternating minimization for geometric programming. The latter is based on an interpretation as alternating minimization for KL-divergence (Csiszár and Tusnády 1984, Gietl and Reffel 2013) and its limiting behavior for a nonscalable matrix (Aas 2014). We also relate the Sinkhorn limit with parametric network flow, principal partition of polymatroids, and the Dulmage-Mendelsohn decomposition of a bipartite graph.
Submitted 15 June, 2023; v1 submitted 15 April, 2022;
originally announced April 2022.
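A minimal Python/NumPy sketch of the alternating row and column normalization described above, applied to two small 0/1 matrices. The helper names (`sinkhorn`, `row_sum_deviation`) and the fixed iteration count are illustrative choices. They do not implement the $O(n^2 \log n)$ bound from the paper, and the sketch does not extract a Hall blocker. It only shows that the iterates become doubly stochastic when a perfect matching exists and stay far from doubly stochastic otherwise.

```python
import numpy as np

def sinkhorn(A, iters=200):
    """Alternately normalize the rows and columns of a nonnegative matrix.

    Assumes every row and every column of A has at least one positive
    entry, so that no normalization divides by zero.
    """
    A = np.asarray(A, dtype=float).copy()
    for _ in range(iters):
        A /= A.sum(axis=1, keepdims=True)  # row normalization
        A /= A.sum(axis=0, keepdims=True)  # column normalization
    return A

def row_sum_deviation(A):
    """After a column step the columns sum to 1, so the row sums measure
    how far the current iterate is from being doubly stochastic."""
    return float(np.abs(A.sum(axis=1) - 1.0).max())

# 0/1 matrix of a bipartite graph that has a perfect matching.
A_G = np.array([[1, 1, 0],
                [0, 1, 1],
                [1, 0, 1]])

# No perfect matching: columns {1, 2} are adjacent only to row 2,
# so X = {columns 1, 2} is a Hall blocker (|X| = 2 > |Gamma(X)| = 1).
B_G = np.array([[1, 0, 0],
                [1, 0, 0],
                [1, 1, 1]])

print(row_sum_deviation(sinkhorn(A_G)))  # ~0: scalable
print(row_sum_deviation(sinkhorn(B_G)))  # stays near 1: not scalable
```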
-
Two flags in a semimodular lattice generate an antimatroid
Authors:
Koyo Hayashi,
Hiroshi Hirai
Abstract:
A basic property in a modular lattice is that any two flags generate a distributive sublattice. It is shown (Abels 1991, Herscovic 1998) that two flags in a semimodular lattice no longer generate such a good sublattice, whereas shortest galleries connecting them form a relatively good join-sublattice. In this note, we sharpen this investigation to establish an analogue of the two-flag generation theorem for a semimodular lattice. We consider the notion of a modular convex subset, which is a subset closed under the join and meet only for modular pairs, and show that the modular convex hull of two flags in a semimodular lattice of rank $n$ is isomorphic to a union-closed family on $[n]$. This family uniquely determines an antimatroid, which coincides with the join-sublattice of shortest galleries of the two flags.
Submitted 6 April, 2022;
originally announced April 2022.
-
An Independently Learnable Hierarchical Model for Bilateral Control-Based Imitation Learning Applications
Authors:
Kazuki Hayashi,
Sho Sakaino,
Toshiaki Tsuji
Abstract:
Recently, motion generation by machine learning has been actively researched to automate various tasks. Imitation learning is one such method that learns motions from data collected in advance. However, executing long-term tasks remains challenging. Therefore, a novel framework for imitation learning is proposed to solve this problem. The proposed framework comprises upper and lower layers: the upper layer model, whose timescale is long, and the lower layer model, whose timescale is short, can be trained independently. In this model, the upper layer learns long-term task planning, and the lower layer learns motion primitives. The proposed method was experimentally compared with hierarchical RNN-based methods to validate its effectiveness. The proposed method achieved a success rate equal to or greater than that of conventional methods. In addition, it required less than 1/20 of the training time of conventional methods. Moreover, it succeeded in executing unlearned tasks by reusing the trained lower layer.
Submitted 16 March, 2022;
originally announced March 2022.
-
Skew-Symmetric Adjacency Matrices for Clustering Directed Graphs
Authors:
Koby Hayashi,
Sinan G. Aksoy,
Haesun Park
Abstract:
Cut-based directed graph (digraph) clustering often focuses on finding dense within-cluster or sparse between-cluster connections, similar to cut-based undirected graph clustering methods. In contrast, in flow-based clusterings the edges between clusters tend to be oriented in one direction; such clusterings have been found in migration data, food webs, and trade data. In this paper, we introduce a spectral algorithm for finding flow-based clusterings. The proposed algorithm is based on recent work that uses complex-valued Hermitian matrices to represent digraphs. By establishing an algebraic relationship between a complex-valued Hermitian representation and an associated real-valued, skew-symmetric matrix, the proposed algorithm produces clusterings while remaining entirely in the real field. Our algorithm uses less memory and asymptotically less computation while provably preserving solution quality. We also show that the algorithm can be easily implemented using standard computational building blocks, possesses better numerical properties, and lends itself to a natural interpretation via an objective function relaxation argument.
Submitted 2 March, 2022;
originally announced March 2022.
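The algebraic link between the two representations can be checked numerically. This sketch assumes the Hermitian representation $H = i(A - A^\top)$, one common choice in Hermitian spectral clustering of digraphs. The abstract does not spell out which representation the paper uses. The sketch shows only that the eigenvalue magnitudes of $H$ equal the singular values of the real skew-symmetric matrix $S = A - A^\top$. It does not implement the full clustering algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = (rng.random((n, n)) < 0.4).astype(float)  # random digraph adjacency matrix
np.fill_diagonal(A, 0.0)                      # no self-loops

S = A - A.T   # real skew-symmetric: S.T == -S
H = 1j * S    # complex Hermitian:   H.conj().T == H

eig_H = np.linalg.eigvalsh(H)              # real eigenvalues of the Hermitian matrix
sv_S = np.linalg.svd(S, compute_uv=False)  # computed entirely in real arithmetic

# S is normal, so its singular values are the magnitudes of its (purely imaginary)
# eigenvalues; multiplying by 1j only rotates them onto the real axis.
print(np.allclose(np.sort(np.abs(eig_H)), np.sort(sv_S)))  # True
```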
-
Fractional SDE-Net: Generation of Time Series Data with Long-term Memory
Authors:
Kohei Hayashi,
Kei Nakagawa
Abstract:
In this paper, we focus on the generation of time-series data using neural networks. It is often the case that input time-series data have only one realized (and usually irregularly sampled) path, which makes it difficult to extract time-series characteristics, and its noise structure is more complicated than i.i.d. noise. Time-series data, especially from hydrology, telecommunications, economics, and finance, exhibit long-term memory, also called long-range dependency (LRD). The main purpose of this paper is to artificially generate time series with the help of neural networks, taking the LRD of paths into account. We propose fSDE-Net: neural fractional Stochastic Differential Equation Network. It generalizes the neural stochastic differential equation model by using fractional Brownian motion with a Hurst index larger than one half, which exhibits the LRD property. We derive the solver of fSDE-Net and theoretically analyze the existence and uniqueness of the solution to fSDE-Net. Our experiments with artificial and real time-series data demonstrate that the fSDE-Net model can replicate distributional properties well.
Submitted 23 August, 2022; v1 submitted 16 January, 2022;
originally announced January 2022.
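A sketch of the two ingredients behind fSDE-Net: sampling fractional Brownian motion with Hurst index $H > 1/2$, and a forward Euler step for an SDE driven by that motion. Everything here is illustrative. The Cholesky sampler is exact but $O(n^3)$. The drift and diffusion are fixed Python functions standing in for the paper's neural networks. The paper's own solver may differ.

```python
import numpy as np

def fbm_cov(t, s, hurst):
    """Covariance E[B_H(t) B_H(s)] of fractional Brownian motion."""
    return 0.5 * (np.abs(t) ** (2 * hurst) + np.abs(s) ** (2 * hurst)
                  - np.abs(t - s) ** (2 * hurst))

def sample_fbm(n_steps, hurst, T=1.0, seed=0):
    """Exact fBm path on a uniform grid via Cholesky (O(n^3): fine for a sketch)."""
    t = np.linspace(T / n_steps, T, n_steps)  # t = 0 excluded: B_H(0) = 0
    L = np.linalg.cholesky(fbm_cov(t[:, None], t[None, :], hurst))
    z = np.random.default_rng(seed).standard_normal(n_steps)
    return np.concatenate([[0.0], t]), np.concatenate([[0.0], L @ z])

def euler_fsde(x0, drift, diffusion, t, B):
    """Forward Euler for dX = drift(X) dt + diffusion(X) dB_H.

    For H > 1/2 the integral against fBm can be understood pathwise, which is
    what makes this simple scheme reasonable; drift/diffusion stand in for the
    neural networks of fSDE-Net (hypothetical placeholders here).
    """
    x = np.empty_like(B)
    x[0] = x0
    for k in range(len(t) - 1):
        x[k + 1] = (x[k] + drift(x[k]) * (t[k + 1] - t[k])
                    + diffusion(x[k]) * (B[k + 1] - B[k]))
    return x

def lag1_increment_corr(hurst):
    """Correlation of consecutive fBm increments: 2**(2H - 1) - 1 (> 0 iff H > 1/2)."""
    return 2 ** (2 * hurst - 1) - 1

t, B = sample_fbm(100, hurst=0.7)
X = euler_fsde(1.0, drift=lambda x: -0.5 * x, diffusion=lambda x: 0.2, t=t, B=B)
```

The positive lag-1 increment correlation for $H > 1/2$ is the simplest symptom of the long-range dependency the abstract refers to.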
-
A Scaling Law for Synthetic-to-Real Transfer: How Much Is Your Pre-training Effective?
Authors:
Hiroaki Mikami,
Kenji Fukumizu,
Shogo Murai,
Shuji Suzuki,
Yuta Kikuchi,
Taiji Suzuki,
Shin-ichi Maeda,
Kohei Hayashi
Abstract:
Synthetic-to-real transfer learning is a framework in which a synthetically generated dataset is used to pre-train a model to improve its performance on real vision tasks. The most significant advantage of using synthetic images is that the ground-truth labels are automatically available, enabling unlimited expansion of the data size without human cost. However, synthetic data may have a huge domain gap, in which case increasing the data size does not improve the performance. How can we tell whether this is the case? In this study, we derive a simple scaling law that predicts the performance from the amount of pre-training data. By estimating the parameters of the law, we can judge whether we should increase the data or change the setting of image synthesis. Further, we analyze the theory of transfer learning by considering learning dynamics and confirm that the derived generalization bound is consistent with our empirical findings. We empirically validated our scaling law in various experimental settings of benchmark tasks, model sizes, and complexities of synthetic images.
Submitted 8 October, 2021; v1 submitted 24 August, 2021;
originally announced August 2021.
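The abstract does not state the form of the law. This sketch assumes the common power-law-plus-floor form $\mathrm{err}(n) \approx D n^{-\alpha} + C$, where $C$ would be the error that more synthetic data cannot remove. The fitting routine (grid search over $C$ followed by log-log least squares) is a simple illustrative choice, and `fit_scaling_law` is a hypothetical name. The routine is checked here only on synthetic numbers, not on results from the paper.

```python
import numpy as np

def fit_scaling_law(n, err, num_c=2000):
    """Fit err(n) ~ D * n**(-alpha) + C (assumed functional form).

    Grid search over the floor C in [0, min(err)); for each C, fit log(err - C)
    against log(n) by least squares and keep the C with the smallest squared error.
    """
    n, err = np.asarray(n, float), np.asarray(err, float)
    best = None
    for C in np.linspace(0.0, err.min(), num_c, endpoint=False):
        slope, intercept = np.polyfit(np.log(n), np.log(err - C), 1)
        pred = np.exp(intercept) * n ** slope + C
        sse = float(np.sum((pred - err) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(np.exp(intercept)), float(-slope), float(C))
    _, D, alpha, C = best
    return D, alpha, C

# Synthetic check with known parameters (not data from the paper).
n = np.array([1e3, 3e3, 1e4, 3e4, 1e5, 3e5, 1e6])
err = 2.0 * n ** -0.5 + 0.1
D, alpha, C = fit_scaling_law(n, err)
```

Under this assumed form, a fitted $C$ close to the current error would suggest changing the image synthesis rather than adding more data.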
-
Unified Interpretation of Softmax Cross-Entropy and Negative Sampling: With Case Study for Knowledge Graph Embedding
Authors:
Hidetaka Kamigaito,
Katsuhiko Hayashi
Abstract:
In knowledge graph embedding, the theoretical relationship between the softmax cross-entropy and negative sampling loss functions has not been investigated. This makes it difficult to fairly compare the results of the two different loss functions. We attempted to solve this problem by using the Bregman divergence to provide a unified interpretation of the softmax cross-entropy and negative sampling loss functions. Under this interpretation, we can derive theoretical findings for fair comparison. Experimental results on the FB15k-237 and WN18RR datasets show that the theoretical findings are valid in practical settings.
Submitted 16 March, 2022; v1 submitted 14 June, 2021;
originally announced June 2021.
-
A New Autoregressive Neural Network Model with Command Compensation for Imitation Learning Based on Bilateral Control
Authors:
Kazuki Hayashi,
Ayumu Sasagawa,
Sho Sakaino,
Toshiaki Tsuji
Abstract:
In the near future, robots are expected to work with humans or operate alone and may replace human workers in various fields such as homes and factories. In a previous study, we proposed bilateral control-based imitation learning that enables robots to utilize force information and operate almost simultaneously with an expert's demonstration. In addition, we recently proposed an autoregressive neural network model (SM2SM) for bilateral control-based imitation learning to obtain long-term inferences. In the SM2SM model, both master and slave states must be input, but the master states are obtained from the previous outputs of the SM2SM model, resulting in destabilized estimation under large environmental variations. Hence, a new autoregressive neural network model (S2SM) is proposed in this study. This model requires only the slave state as input and its outputs are the next slave and master states, thereby improving the task success rates. In addition, a new feedback controller that utilizes the error between the responses and estimates of the slave is proposed, which shows better reproducibility.
Submitted 16 March, 2021;
originally announced March 2021.
-
Reconstructing Sparse Signals via Greedy Monte-Carlo Search
Authors:
Kao Hayashi,
Tomoyuki Obuchi,
Yoshiyuki Kabashima
Abstract:
We propose a Monte-Carlo-based method for reconstructing sparse signals in the formulation of sparse linear regression in a high-dimensional setting. The basic idea of this algorithm is to explicitly select variables or covariates to represent a given data vector or responses and to accept randomly generated updates of that selection if and only if the energy or cost function decreases. This algorithm is called the greedy Monte-Carlo (GMC) search algorithm. Its performance is examined via numerical experiments, which suggest that in the noiseless case, GMC can achieve perfect reconstruction in undersampling situations of a reasonable level: it can outperform the $\ell_1$ relaxation but does not reach the algorithmic limit of MC-based methods theoretically clarified by an earlier analysis. The necessary computational time is also examined and compared with that of an algorithm using simulated annealing. Additionally, experiments on the noisy case are conducted on synthetic datasets and on a real-world dataset, supporting the practicality of GMC.
Submitted 29 January, 2021; v1 submitted 7 August, 2020;
originally announced August 2020.
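A minimal sketch of the greedy rule described above, which accepts a random update only if it lowers the energy. The sketch assumes that the support size $k$ is known and that the energy is the least-squares residual on the selected columns. The proposal (swap one selected column for one unselected column, uniformly at random) is also an assumption. `greedy_mc_search` is a hypothetical name.

```python
import numpy as np

def energy(A, y, support):
    """Residual sum of squares of the least-squares fit using only columns in `support`."""
    As = A[:, sorted(support)]
    coef, *_ = np.linalg.lstsq(As, y, rcond=None)
    r = y - As @ coef
    return float(r @ r)

def greedy_mc_search(A, y, k, n_iter=2000, seed=0):
    """Greedy Monte-Carlo search over supports of fixed size k.

    Each step proposes swapping one selected column for one unselected column,
    chosen uniformly at random, and accepts the swap iff the energy strictly decreases.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    support = set(int(i) for i in rng.choice(n, size=k, replace=False))
    e = energy(A, y, support)
    history = [e]
    for _ in range(n_iter):
        leave = int(rng.choice(sorted(support)))
        enter = int(rng.choice(sorted(set(range(n)) - support)))
        proposal = (support - {leave}) | {enter}
        e_new = energy(A, y, proposal)
        if e_new < e:  # greedy: only strictly downhill moves are accepted
            support, e = proposal, e_new
        history.append(e)
    return support, e, history

# Noiseless toy problem: 40 measurements, 100 variables, 5 nonzeros.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[3, 17, 42, 66, 90]] = rng.standard_normal(5)
y = A @ x_true
support, e, history = greedy_mc_search(A, y, k=5)
```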
-
A System for Worldwide COVID-19 Information Aggregation
Authors:
Akiko Aizawa,
Frederic Bergeron,
Junjie Chen,
Fei Cheng,
Katsuhiko Hayashi,
Kentaro Inui,
Hiroyoshi Ito,
Daisuke Kawahara,
Masaru Kitsuregawa,
Hirokazu Kiyomaru,
Masaki Kobayashi,
Takashi Kodama,
Sadao Kurohashi,
Qianying Liu,
Masaki Matsubara,
Yusuke Miyao,
Atsuyuki Morishima,
Yugo Murawaki,
Kazumasa Omura,
Haiyue Song,
Eiichiro Sumita,
Shinji Suzuki,
Ribeka Tanaka,
Yu Tanaka,
Masashi Toyoda,
et al. (4 additional authors not shown)
Abstract:
The global COVID-19 pandemic has led the public to pay close attention to related news, covering various domains such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 situation differs greatly among countries (e.g., in policies and in the development of the epidemic), so citizens may be interested in news from foreign countries. We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages, sorted by topic. Our dataset of reliable COVID-19-related websites, collected through crowdsourcing, ensures the quality of the articles. A neural machine translation module translates articles in other languages into Japanese and English. A BERT-based topic classifier trained on our article-topic pair dataset helps users efficiently find the information they are interested in by sorting articles into different categories.
Submitted 11 October, 2020; v1 submitted 27 July, 2020;
originally announced August 2020.
-
Hypergraph Random Walks, Laplacians, and Clustering
Authors:
Koby Hayashi,
Sinan G. Aksoy,
Cheong Hee Park,
Haesun Park
Abstract:
We propose a flexible framework for clustering hypergraph-structured data based on recently proposed random walks utilizing edge-dependent vertex weights. When incorporating edge-dependent vertex weights (EDVW), a weight is associated with each vertex-hyperedge pair, yielding a weighted incidence matrix of the hypergraph. Such weightings have been utilized in term-document representations of text data sets. We explain how random walks with EDVW serve to construct different hypergraph Laplacian matrices, and then develop a suite of clustering methods that use these incidence matrices and Laplacians for hypergraph clustering. Using several data sets from real-life applications, we compare the performance of these clustering algorithms experimentally against a variety of existing hypergraph clustering methods. We show that the proposed methods produce higher-quality clusters and conclude by highlighting avenues for future work.
Submitted 27 October, 2020; v1 submitted 29 June, 2020;
originally announced June 2020.
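A sketch of the random-walk transition matrix with edge-dependent vertex weights. It uses the two-step form commonly used for EDVW walks: first choose an incident hyperedge by its weight, then choose a vertex inside it by its edge-dependent weight. The layout of `R` and `w` is an assumption for illustration. The Laplacians and clustering methods that the paper builds on top of this matrix are not reproduced.

```python
import numpy as np

def edvw_transition_matrix(R, w):
    """Transition matrix of a random walk on a hypergraph with edge-dependent vertex weights.

    R: (|E|, |V|) array with R[e, v] = gamma_e(v) > 0 if vertex v lies in hyperedge e, else 0.
    w: (|E|,) array of hyperedge weights omega(e).
    One step from v: pick a hyperedge e containing v with probability omega(e) / d(v),
    then pick a vertex u in e with probability gamma_e(u) / delta(e).
    """
    R = np.asarray(R, dtype=float)
    w = np.asarray(w, dtype=float)
    W = (R > 0).T * w        # (|V|, |E|): W[v, e] = omega(e) if v is in e
    d_v = W.sum(axis=1)      # vertex degree d(v) = total weight of incident hyperedges
    delta_e = R.sum(axis=1)  # delta(e) = total vertex weight inside e
    return (W / d_v[:, None]) @ (R / delta_e[:, None])

# Two hyperedges on four vertices: e0 = {0, 1, 2}, e1 = {2, 3}.
R = np.array([[2.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 3.0]])
w = np.array([1.0, 2.0])
P = edvw_transition_matrix(R, w)
# Vertex 0 lies only in e0, so its row is gamma_e0 / delta(e0) = [0.5, 0.25, 0.25, 0].
```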
-
Weisfeiler-Lehman Embedding for Molecular Graph Neural Networks
Authors:
Katsuhiko Ishiguro,
Kenta Oono,
Kohei Hayashi
Abstract:
A graph neural network (GNN) is a good choice for predicting the chemical properties of molecules. Compared with other deep networks, however, the current performance of a GNN is limited owing to the "curse of depth." Inspired by long-established feature engineering in the field of chemistry, we expanded an atom representation using Weisfeiler-Lehman (WL) embedding, which is designed to capture local atomic patterns dominating the chemical properties of a molecule. In terms of representability, we show that WL embedding can replace the first two layers of a ReLU GNN (a normal embedding and a hidden GNN layer) with a smaller weight norm. We then demonstrate that WL embedding consistently improves the empirical performance over multiple GNN architectures and several molecular graph datasets.
Submitted 17 August, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
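A minimal sketch of one Weisfeiler-Lehman refinement step on atoms, where each atom is relabeled by its own element plus the sorted elements of its neighbors. This ignores bond types and further WL iterations, which the paper's WL embedding may include. The resulting integer ids would index an embedding table in place of plain atom types. `wl_relabel` is a hypothetical name.

```python
def wl_relabel(atoms, bonds):
    """One Weisfeiler-Lehman refinement step on a molecular graph.

    atoms: list of element symbols; bonds: list of undirected (i, j) index pairs.
    Each atom's new label is (own symbol, sorted tuple of neighbor symbols).
    """
    neighbors = [[] for _ in atoms]
    for i, j in bonds:
        neighbors[i].append(atoms[j])
        neighbors[j].append(atoms[i])
    return [(a, tuple(sorted(nb))) for a, nb in zip(atoms, neighbors)]

# Methanol (CH3-OH) with explicit hydrogens; atom order is arbitrary.
atoms = ["C", "O", "H", "H", "H", "H"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]
labels = wl_relabel(atoms, bonds)

# Map each distinct WL label to an integer id that would index an embedding table.
vocab = {lab: idx for idx, lab in enumerate(sorted(set(labels)))}
ids = [vocab[lab] for lab in labels]
print(ids)  # [0, 3, 1, 1, 1, 2]: methyl H and hydroxyl H now get different ids
```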
-
Binarized Canonical Polyadic Decomposition for Knowledge Graph Completion
Authors:
Koki Kishimoto,
Katsuhiko Hayashi,
Genki Akai,
Masashi Shimbo
Abstract:
Methods based on vector embeddings of knowledge graphs have been actively pursued as a promising approach to knowledge graph completion. However, embedding models generate storage-inefficient representations, particularly when the number of entities and relations and the dimensionality of the real-valued embedding vectors are large. We present a binarized CANDECOMP/PARAFAC (CP) decomposition algorithm, which we refer to as B-CP, where real-valued parameters are replaced by binary values to reduce model size. Moreover, we show that a fast score computation technique can be developed with bitwise operations. We prove that B-CP is fully expressive by deriving a bound on the size of its embeddings. Experimental results on several benchmark datasets demonstrate that the proposed method successfully reduces model size by more than an order of magnitude while maintaining task performance at the same level as the real-valued CP model.
Submitted 4 December, 2019;
originally announced December 2019.
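The bitwise score computation mentioned above can be illustrated for $\pm 1$ vectors and the trilinear CP score. This is a sketch of the general XOR-and-popcount trick, not B-CP's actual implementation. Any scaling of the binary values that B-CP may apply is omitted, and the helper names are hypothetical.

```python
import random

def pack(signs):
    """Pack a +1/-1 vector into an int: bit k is set iff signs[k] == -1."""
    bits = 0
    for k, s in enumerate(signs):
        if s == -1:
            bits |= 1 << k
    return bits

def cp_score(s, r, o):
    """Naive trilinear CP score: sum_k s[k] * r[k] * o[k]."""
    return sum(a * b * c for a, b, c in zip(s, r, o))

def cp_score_bitwise(s_bits, r_bits, o_bits, dim):
    """The same score for +/-1 vectors using XOR and a popcount.

    A product of three +/-1 values is -1 exactly when an odd number of them
    are -1, i.e. when the XOR of their sign bits is 1. So the score is
    (#positive terms) - (#negative terms) = dim - 2 * popcount(s ^ r ^ o).
    """
    negatives = bin(s_bits ^ r_bits ^ o_bits).count("1")
    return dim - 2 * negatives

rng = random.Random(0)
dim = 64
s, r, o = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(3)]
print(cp_score(s, r, o) == cp_score_bitwise(pack(s), pack(r), pack(o), dim))  # True
```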
-
Neuro-SERKET: Development of Integrative Cognitive System through the Composition of Deep Probabilistic Generative Models
Authors:
Tadahiro Taniguchi,
Tomoaki Nakamura,
Masahiro Suzuki,
Ryo Kuniyasu,
Kaede Hayashi,
Akira Taniguchi,
Takato Horii,
Takayuki Nagai
Abstract:
This paper describes a framework for the development of an integrative cognitive system based on probabilistic generative models (PGMs) called Neuro-SERKET. Neuro-SERKET is an extension of SERKET, which can compose elemental PGMs developed in a distributed manner and provide a scheme that allows the composed PGMs to learn throughout the system in an unsupervised way. In addition to the head-to-tail connection supported by SERKET, Neuro-SERKET supports tail-to-tail and head-to-head connections, as well as neural network-based modules, i.e., deep generative models. As an example of a Neuro-SERKET application, an integrative model was developed by composing a variational autoencoder (VAE), a Gaussian mixture model (GMM), latent Dirichlet allocation (LDA), and automatic speech recognition (ASR). The model is called VAE+GMM+LDA+ASR. The performance of VAE+GMM+LDA+ASR and the validity of Neuro-SERKET were demonstrated through a multimodal categorization task using image data and a speech signal of numerical digits.
Submitted 29 January, 2020; v1 submitted 20 October, 2019;
originally announced October 2019.
-
A Non-commutative Bilinear Model for Answering Path Queries in Knowledge Graphs
Authors:
Katsuhiko Hayashi,
Masashi Shimbo
Abstract:
Bilinear diagonal models for knowledge graph embedding (KGE), such as DistMult and ComplEx, balance expressiveness and computational efficiency by representing relations as diagonal matrices. Although they perform well in predicting atomic relations, composite relations (relation paths) cannot be modeled naturally by the product of relation matrices, as the product of diagonal matrices is commutative and hence invariant to the order of relations. In this paper, we propose a new bilinear KGE model, called BlockHolE, based on block circulant matrices. In BlockHolE, relation matrices can be non-commutative, allowing composite relations to be modeled by matrix product. The model is parameterized in a way that covers a spectrum ranging from diagonal to full relation matrices. A fast computation technique is developed on the basis of the duality of the Fourier transform of circulant matrices.
Submitted 4 September, 2019;
originally announced September 2019.
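A sketch of the Fourier duality for block circulant matrices. It assumes that "block circulant" means a circulant arrangement of $b \times b$ blocks. With $b = 1$ this is an ordinary circulant matrix, which is diagonal in the Fourier basis and hence commutative. With a single block it is a full matrix, which matches the spectrum described above. The abstract does not confirm that BlockHolE uses exactly this parametrization.

```python
import numpy as np

def block_circulant(blocks):
    """Dense block-circulant matrix whose (i, j) block is blocks[(i - j) % m]."""
    m = len(blocks)
    return np.block([[blocks[(i - j) % m] for j in range(m)] for i in range(m)])

def block_circulant_matvec(blocks, x):
    """Multiply by the block-circulant matrix without forming it.

    blocks: (m, b, b) array; x: vector of length m * b. A DFT over the block
    index turns the product into m independent b x b matrix-vector products.
    """
    m, b, _ = blocks.shape
    Bf = np.fft.fft(blocks, axis=0)             # (m, b, b)
    xf = np.fft.fft(x.reshape(m, b), axis=0)    # (m, b)
    yf = np.einsum("kij,kj->ki", Bf, xf)        # one b x b matvec per frequency
    return np.real(np.fft.ifft(yf, axis=0)).reshape(m * b)

def commutes(M, N):
    return np.allclose(M @ N, N @ M)

rng = np.random.default_rng(0)
# b = 1: ordinary circulant matrices, diagonal in the Fourier basis, hence commutative.
C1, C2 = (block_circulant(rng.standard_normal((5, 1, 1))) for _ in range(2))
# b = 2: the Fourier-domain blocks are 2 x 2 matrices, which generally do not commute.
G1, G2 = (block_circulant(rng.standard_normal((4, 2, 2))) for _ in range(2))
print(commutes(C1, C2), commutes(G1, G2))  # True False
```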
-
PLANC: Parallel Low Rank Approximation with Non-negativity Constraints
Authors:
Srinivas Eswar,
Koby Hayashi,
Grey Ballard,
Ramakrishnan Kannan,
Michael A. Matheson,
Haesun Park
Abstract:
We consider the problem of low-rank approximation of massive dense non-negative tensor data, for example to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes and performing efficient and scalable parallel algorithms to compute the low-rank approximation. We present a software package called PLANC (Parallel Low Rank Approximation with Non-negativity Constraints), which implements our solution and allows for extension in terms of data (dense or sparse, matrices or tensors of any order), algorithm (e.g., from multiplicative updating techniques to alternating direction method of multipliers), and architecture (we exploit GPUs to accelerate the computation in this work). We describe our parallel distributions and algorithms, which are careful to avoid unnecessary communication and computation, show how to extend the software to include new algorithms and/or constraints, and report efficiency and scalability results for both synthetic and real-world data sets.
Submitted 30 August, 2019;
originally announced September 2019.
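The multiplicative updates mentioned above can be sketched on a single node for the matrix case. This sketch uses the classical Lee-Seung updates for the Frobenius loss. It does none of PLANC's data distribution, communication, or GPU offloading, and it is not PLANC's API; `nmf_multiplicative` is a hypothetical name.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Single-node NMF, X ~ W @ H with W, H >= 0, via Lee-Seung multiplicative
    updates for the Frobenius loss. eps guards against division by zero."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
        errors.append(float(np.linalg.norm(X - W @ H)))
    return W, H, errors

rng = np.random.default_rng(1)
X = rng.random((30, 4)) @ rng.random((4, 20))  # nonnegative, exactly rank 4
W, H, errors = nmf_multiplicative(X, rank=4)
```

Because each update multiplies by a ratio of nonnegative quantities, nonnegativity is preserved without an explicit projection.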
-
Einconv: Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks
Authors:
Kohei Hayashi,
Taiki Yamaguchi,
Yohei Sugawara,
Shin-ichi Maeda
Abstract:
Tensor decomposition methods are widely used for model compression and fast inference in convolutional neural networks (CNNs). Although many decompositions are conceivable, only CP decomposition and a few others have been applied in practice, and no extensive comparisons have been made between available methods. Previous studies have not determined how many decompositions are available, nor which of them is optimal. In this study, we first characterize a decomposition class specific to CNNs by adopting a flexible graphical notation. The class includes such well-known CNN modules as depthwise separable convolution layers and bottleneck layers, but also previously unknown modules with nonlinear activations. We also experimentally compare the tradeoff between prediction accuracy and time/space complexity for modules found by enumerating all possible decompositions, or by using a neural architecture search. We find some nonlinear decompositions outperform existing ones.
Submitted 27 November, 2019; v1 submitted 12 August, 2019;
originally announced August 2019.
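To make the decomposition view concrete, this sketch shows that a depthwise separable convolution corresponds to a particular factorization of the full kernel, $K_{ochw} = P_{oc} D_{chw}$, with far fewer parameters. It illustrates a single known module only. The paper's graphical notation and its enumeration of decompositions are not reproduced, and the channel and kernel sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
c_in, c_out, k = 16, 32, 3

D = rng.standard_normal((c_in, k, k))   # depthwise factor: one k x k filter per input channel
P = rng.standard_normal((c_out, c_in))  # pointwise factor: 1 x 1 channel mixing

# The equivalent full kernel is K[o, c, h, w] = P[o, c] * D[c, h, w].
K = np.einsum("oc,chw->ochw", P, D)

# At one output position, the full convolution and the two-stage module agree.
patch = rng.standard_normal((c_in, k, k))      # input receptive field
y_full = np.einsum("ochw,chw->o", K, patch)
y_sep = P @ np.einsum("chw,chw->c", D, patch)  # depthwise, then pointwise

full_params = K.size                  # 32 * 16 * 3 * 3 = 4608
separable_params = D.size + P.size    # 16 * 3 * 3 + 32 * 16 = 656
```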
-
Data Interpolating Prediction: Alternative Interpretation of Mixup
Authors:
Takuya Shimada,
Shoichiro Yamaguchi,
Kohei Hayashi,
Sosuke Kobayashi
Abstract:
Data augmentation by mixing samples, such as Mixup, has been widely used, typically for classification tasks. However, this strategy is not always effective because of the gap between augmented samples used for training and original samples used for testing. This gap may prevent a classifier from learning the optimal decision boundary and increase the generalization error. To overcome this problem, we propose an alternative framework called Data Interpolating Prediction (DIP). Unlike common data augmentations, we encapsulate the sample-mixing process in the hypothesis class of a classifier so that training and test samples are treated equally. We derive the generalization bound and show that DIP helps to reduce the original Rademacher complexity. We also empirically demonstrate that DIP can outperform the existing Mixup method.
Submitted 19 June, 2019;
originally announced June 2019.
-
Binarized Knowledge Graph Embeddings
Authors:
Koki Kishimoto,
Katsuhiko Hayashi,
Genki Akai,
Masashi Shimbo,
Kazunori Komatani
Abstract:
Tensor factorization has become an increasingly popular approach to knowledge graph completion (KGC), which is the task of automatically predicting missing facts in a knowledge graph. However, even with a simple model like CANDECOMP/PARAFAC (CP) tensor decomposition, KGC on existing knowledge graphs is impractical in resource-limited environments, as a large amount of memory is required to store parameters represented as 32-bit or 64-bit floating-point numbers. This limitation is expected to become more stringent as existing knowledge graphs, which are already huge, keep growing steadily in scale. To reduce the memory requirement, we present a method for binarizing the parameters of the CP tensor decomposition by introducing a quantization function to the optimization problem. This method replaces floating-point parameters with binary ones after training, which drastically reduces the model size at run time. We investigate the trade-off between the quality and size of tensor factorization models for several KGC benchmark datasets. In our experiments, the proposed method successfully reduced the model size by more than an order of magnitude while maintaining the task performance. Moreover, a fast score computation technique can be developed with bitwise operations.
Submitted 8 February, 2019;
originally announced February 2019.
-
On Random Subsampling of Gaussian Process Regression: A Graphon-Based Analysis
Authors:
Kohei Hayashi,
Masaaki Imaizumi,
Yuichi Yoshida
Abstract:
In this paper, we study random subsampling of Gaussian process regression, one of the simplest approximation baselines, from a theoretical perspective. Although subsampling discards a large part of the training data, we show provable guarantees on the accuracy of the predictive mean/variance and its generalization ability. For analysis, we consider embedding kernel matrices into graphons, which encapsulate the difference in sample size and enable us to evaluate the approximation and generalization errors in a unified manner. The experimental results show that the subsampling approximation achieves a better trade-off between accuracy and runtime than the Nyström and random Fourier expansion methods.
Submitted 28 January, 2019;
originally announced January 2019.
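A sketch of the baseline analyzed above: fit an exact GP on a random subset of the training points. The RBF kernel, its lengthscale, the noise level, and the toy sine data are arbitrary choices for illustration, and the helper names are hypothetical. The graphon-based analysis is not reproduced.

```python
import numpy as np

def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel matrix for 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)

def gp_mean(x_train, y_train, x_test, noise=1e-2):
    """Exact GP posterior mean with a zero prior mean; costs O(len(x_train)^3)."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    return rbf(x_test, x_train) @ np.linalg.solve(K, y_train)

def subsampled_gp_mean(x, y, x_test, m, seed=0, noise=1e-2):
    """Random-subsampling baseline: exact GP on m points drawn without replacement."""
    idx = np.random.default_rng(seed).choice(len(x), size=m, replace=False)
    return gp_mean(x[idx], y[idx], x_test, noise)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 2000)
y = np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(x.size)
x_test = np.linspace(0.05, 0.95, 50)
pred = subsampled_gp_mean(x, y, x_test, m=200)  # 200 x 200 solve instead of 2000 x 2000
```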
-
Trainable Projected Gradient Detector for Massive Overloaded MIMO Channels: Data-driven Tuning Approach
Authors:
Satoshi Takabe,
Masayuki Imanishi,
Tadashi Wadayama,
Ryo Hayakawa,
Kazunori Hayashi
Abstract:
This paper presents a deep learning-aided iterative detection algorithm for massive overloaded multiple-input multiple-output (MIMO) systems in which the number of transmit antennas $n$ is larger than the number of receive antennas $m$. Since the proposed algorithm is based on the projected gradient descent method with trainable parameters, it is named the trainable projected gradient-detector (TPG-detector). The trainable internal parameters, such as the step-size parameter, can be optimized with standard deep learning techniques, i.e., the backpropagation and stochastic gradient descent algorithms. This approach is referred to as data-driven tuning, and it ensures fast convergence during parameter estimation in the proposed scheme. The TPG-detector mainly consists of matrix-vector product operations whose computational cost is proportional to $mn$ per iteration. In addition, the number of trainable parameters in the TPG-detector is independent of the number of antennas. These features result in a fast and stable training process and reasonable scalability for large systems. Numerical simulations show that the proposed detector achieves detection performance comparable to that of existing algorithms for massive overloaded MIMO channels, e.g., the state-of-the-art IW-SOAV detector, at a lower computational cost.
Submitted 9 July, 2019; v1 submitted 25 December, 2018;
originally announced December 2018.
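A minimal NumPy sketch of a projected-gradient detector in the spirit of the TPG-detector follows. The linear filter `W = H^T (H H^T + alpha I)^{-1}`, the tanh soft projection onto {-1, +1}, and the hand-set schedules for `gammas` and `thetas` are assumptions made for illustration; in the paper the per-iteration parameters are trained with backpropagation and stochastic gradient descent, which this sketch omits.

```python
import numpy as np

def tpg_detect(y, H, gammas, thetas, alpha=1.0):
    # Projected-gradient detection of a BPSK vector s in {-1, +1}^n from y = H s + noise.
    # gammas (step sizes) and thetas (projection softness) are per-iteration parameters;
    # a TPG-detector would learn them by backpropagation, here they are plain inputs.
    m, n = H.shape
    # Assumed (hypothetical) linear filter: W = H^T (H H^T + alpha I)^{-1}.
    W = H.T @ np.linalg.inv(H @ H.T + alpha * np.eye(m))
    s = np.zeros(n)
    for gamma, theta in zip(gammas, thetas):
        r = s + gamma * (W @ (y - H @ s))  # gradient-like step: O(mn) matrix-vector products
        s = np.tanh(r / theta)             # soft projection toward {-1, +1}
    return np.sign(s), s

# Usage: an overloaded system with n = 40 transmit and m = 32 receive antennas.
rng = np.random.default_rng(1)
n, m, T = 40, 32, 20
H = rng.standard_normal((m, n)) / np.sqrt(m)
s_true = rng.choice([-1.0, 1.0], size=n)
y = H @ s_true + 0.05 * rng.standard_normal(m)
gammas = np.full(T, 1.0)            # 2T parameters in total, independent of n and m
thetas = np.linspace(1.0, 0.1, T)
s_hat, s_soft = tpg_detect(y, H, gammas, thetas)
bit_errors = int(np.sum(s_hat != s_true))
```

Each iteration touches `H` and `W` only through matrix-vector products, and the parameter count depends on the number of iterations T rather than on the antenna counts. `bit_errors` counts hard decisions that differ from the transmitted symbols.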
-
Reduction of Parameter Redundancy in Biaffine Classifiers with Symmetric and Circulant Weight Matrices
Authors:
Tomoki Matsuno,
Katsuhiko Hayashi,
Takahiro Ishihara,
Hitoshi Manabe,
Yuji Matsumoto
Abstract:
Currently, the biaffine classifier is attracting attention as a method for introducing an attention mechanism into the modeling of binary relations. For instance, in dependency parsing, the Deep Biaffine Parser by Dozat and Manning has achieved state-of-the-art performance as a graph-based dependency parser on the English Penn Treebank and the CoNLL 2017 shared task. On the other hand, it has been reported that parameter redundancy in the weight matrix of biaffine classifiers, which has $O(n^2)$ parameters ($n$ is the number of dimensions), results in overfitting. In this paper, we attempt to reduce this parameter redundancy by assuming that the weight matrices are either symmetric or circulant. In our experiments on the CoNLL 2017 shared task dataset, our model achieved better or comparable accuracy on most of the treebanks with more than 16% parameter reduction.
Submitted 18 October, 2018;
originally announced October 2018.
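To see where the parameter savings come from, the following NumPy sketch computes only the bilinear term x^T W y of a biaffine score under the two structural assumptions named above. A circulant W is determined by its first column c, so x^T W y can be evaluated with the FFT in O(n log n); a symmetric W is determined by its upper triangle. The function names are hypothetical, and the linear and bias terms of a full biaffine classifier are omitted.

```python
import numpy as np

def circulant(c):
    # C[i, j] = c[(i - j) mod n]: the full n x n matrix is determined by n numbers.
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]

def bilinear_circulant(x, y, c):
    # x^T C y in O(n log n): C y is the circular convolution of c and y,
    # which the FFT turns into an element-wise product.
    Cy = np.fft.ifft(np.fft.fft(c) * np.fft.fft(y)).real
    return x @ Cy

def symmetric_from_params(p, n):
    # Symmetric W built from its upper triangle: n(n+1)/2 free parameters.
    U = np.zeros((n, n))
    U[np.triu_indices(n)] = p
    return U + U.T - np.diag(np.diag(U))

rng = np.random.default_rng(2)
n = 8
x, y, c = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
p = rng.standard_normal(n * (n + 1) // 2)
W_sym = symmetric_from_params(p, n)
score_fft = bilinear_circulant(x, y, c)
score_dense = x @ circulant(c) @ y
params = {"full": n * n, "symmetric": len(p), "circulant": len(c)}
```

For n = 8 the dense, symmetric, and circulant parameterizations need 64, 36, and 8 parameters, respectively, and the FFT route gives the same score as the explicit circulant matrix.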
-
Data-dependent Learning of Symmetric/Antisymmetric Relations for Knowledge Base Completion
Authors:
Hitoshi Manabe,
Katsuhiko Hayashi,
Masashi Shimbo
Abstract:
Embedding-based methods for knowledge base completion (KBC) learn representations of entities and relations in a vector space, along with a scoring function that estimates the likelihood of relations between entities. The learnable class of scoring functions is designed to be expressive enough to cover a variety of real-world relations, but this expressiveness comes at the cost of an increased number of parameters. In particular, parameters in these methods are superfluous for relations that are either symmetric or antisymmetric. To mitigate this problem, we propose a new L1 regularizer for Complex Embeddings, one of the state-of-the-art embedding-based methods for KBC. This regularizer promotes symmetry or antisymmetry of the scoring function on a relation-by-relation basis, in accordance with the observed data. Our empirical evaluation shows that the proposed method outperforms the original Complex Embeddings and other baseline methods on the FB15k dataset.
Submitted 25 August, 2018;
originally announced August 2018.
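The symmetric and antisymmetric cases mentioned above can be read off the ComplEx scoring function directly: with a purely real relation vector the score is symmetric in subject and object, and with a purely imaginary one it is antisymmetric. The NumPy sketch below checks this numerically. The penalty `sym_antisym_l1` is a hypothetical L1 term on the real and imaginary parts, included only to show how an L1 penalty can push a relation toward one case; it is not the regularizer defined in the paper.

```python
import numpy as np

def complex_score(e_s, w_r, e_o):
    # ComplEx score: Re(sum_k e_s[k] * w_r[k] * conj(e_o[k])).
    return np.real(np.sum(e_s * w_r * np.conj(e_o)))

def sym_antisym_l1(w_r, lam_re, lam_im):
    # Hypothetical per-relation L1 penalty on Re(w_r) and Im(w_r). A large lam_im pushes
    # Im(w_r) to zero (symmetric relation); a large lam_re pushes Re(w_r) to zero
    # (antisymmetric relation). Illustration only, not the paper's exact regularizer.
    return lam_re * np.sum(np.abs(w_r.real)) + lam_im * np.sum(np.abs(w_r.imag))

rng = np.random.default_rng(3)
d = 5
e_s = rng.standard_normal(d) + 1j * rng.standard_normal(d)
e_o = rng.standard_normal(d) + 1j * rng.standard_normal(d)
w_sym = rng.standard_normal(d) + 0j     # purely real relation vector
w_anti = 1j * rng.standard_normal(d)    # purely imaginary relation vector
```

Swapping `e_s` and `e_o` conjugates the sum inside `Re(...)` up to the relation vector, which is why a real `w_r` leaves the score unchanged and an imaginary `w_r` flips its sign.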
-
Deep Learning-Aided Projected Gradient Detector for Massive Overloaded MIMO Channels
Authors:
Satoshi Takabe,
Masayuki Imanishi,
Tadashi Wadayama,
Kazunori Hayashi
Abstract:
This paper presents a deep learning-aided iterative detection algorithm for massive overloaded MIMO systems. Since the proposed algorithm is based on the projected gradient descent method with trainable parameters, it is named the trainable projected gradient-detector (TPG-detector). The trainable internal parameters can be optimized with standard deep learning techniques such as backpropagation and stochastic gradient descent. This approach, referred to as data-driven tuning, brings notable advantages to the proposed scheme, such as fast convergence. Numerical experiments show that the TPG-detector achieves detection performance comparable to that of known algorithms for massive overloaded MIMO channels, at a lower computational cost.
Submitted 25 December, 2018; v1 submitted 28 June, 2018;
originally announced June 2018.
-
Parallel Nonnegative CP Decomposition of Dense Tensors
Authors:
Grey Ballard,
Koby Hayashi,
Ramakrishnan Kannan
Abstract:
The CP tensor decomposition is a low-rank approximation of a tensor. We present a distributed-memory parallel algorithm and implementation of an alternating optimization method for computing a CP decomposition of dense tensor data that can enforce nonnegativity of the computed low-rank factors. The principal task is to parallelize the matricized-tensor times Khatri-Rao product (MTTKRP), the bottleneck subcomputation. The algorithm is computationally efficient, using dimension trees to avoid redundant computation across MTTKRPs within the alternating method. Our approach is also communication-efficient, using a data distribution and parallel algorithm across a multidimensional processor grid that can be tuned to minimize communication. We benchmark our software on synthetic data as well as on hyperspectral image and neuroscience dynamic functional connectivity data, demonstrating that our algorithm scales well to hundreds of nodes (up to 4096 cores) and is faster and more general than currently available parallel software.
Submitted 19 June, 2018;
originally announced June 2018.
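For a small dense 3-way tensor, nonnegative CP and its MTTKRP bottleneck can be sketched in a few lines of NumPy. This sketch swaps in Lee-Seung-style multiplicative updates as the alternating method (an assumption; the paper's solvers, dimension trees, and distributed-memory parallelization are not shown) and computes each MTTKRP with `np.einsum`.

```python
import numpy as np

def mttkrp(X, factors, mode):
    # Matricized-tensor times Khatri-Rao product for a 3-way tensor, written with einsum.
    A, B, C = factors
    if mode == 0:
        return np.einsum("ijk,jr,kr->ir", X, B, C)
    if mode == 1:
        return np.einsum("ijk,ir,kr->jr", X, A, C)
    return np.einsum("ijk,ir,jr->kr", X, A, B)

def reconstruct(factors):
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def ncp_multiplicative(X, rank, iters=200, eps=1e-12, seed=0):
    # Nonnegative CP via alternating multiplicative updates (Lee-Seung style),
    # a simple substitute for the alternating-optimization solvers in the paper.
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(iters):
        for mode in range(3):
            P, Q = [factors[o] for o in range(3) if o != mode]
            gram = (P.T @ P) * (Q.T @ Q)  # Hadamard product of the other factors' Gram matrices
            factors[mode] *= mttkrp(X, factors, mode) / (factors[mode] @ gram + eps)
    return factors

def relative_error(X, factors):
    return np.linalg.norm(X - reconstruct(factors)) / np.linalg.norm(X)

# Usage: a nonnegative tensor of exact rank 3.
rng = np.random.default_rng(4)
X = reconstruct([rng.random((dim, 3)) for dim in (6, 5, 4)])
err_init = relative_error(X, ncp_multiplicative(X, 3, iters=0))
factors_final = ncp_multiplicative(X, 3, iters=200)
err_final = relative_error(X, factors_final)
```

Multiplicative updates keep every factor entry nonnegative as long as the data and the initialization are nonnegative, so no explicit projection step is needed; the MTTKRP call dominates the cost of each update.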
-
A polynomial time algorithm to compute geodesics in CAT(0) cubical complexes
Authors:
Koyo Hayashi
Abstract:
This paper presents the first polynomial-time algorithm to compute geodesics in a CAT(0) cubical complex in general dimension. The algorithm is a simple iterative method that updates the breakpoints of a path joining two points, using Miller, Owen and Provan's algorithm (2015) as a subroutine. Our algorithm is applicable to any CAT(0) space in which geodesics between two close points can be computed, not only to CAT(0) cubical complexes.
Submitted 29 June, 2018; v1 submitted 26 October, 2017;
originally announced October 2017.
-
Think Globally, Embed Locally --- Locally Linear Meta-embedding of Words
Authors:
Danushka Bollegala,
Kohei Hayashi,
Ken-ichi Kawarabayashi
Abstract:
Distributed word embeddings have shown superior performance in numerous Natural Language Processing (NLP) tasks. However, their performance varies significantly across tasks, implying that the word embeddings learnt by those methods capture complementary aspects of lexical semantics. Therefore, we believe that it is important to combine the existing word embeddings to produce more accurate and complete \emph{meta-embeddings} of words. For this purpose, we propose an unsupervised locally linear meta-embedding learning method that takes pre-trained word embeddings as input and produces more accurate meta-embeddings. Unlike previously proposed meta-embedding learning methods that learn a global projection over all words in a vocabulary, our proposed method is sensitive to the differences in local neighbourhoods of the individual source word embeddings. Moreover, we show that vector concatenation, a previously proposed, highly competitive baseline approach for integrating word embeddings, can be derived as a special case of the proposed method. Experimental results on semantic similarity, word analogy, relation classification, and short-text classification tasks show that our meta-embeddings significantly outperform prior methods on several benchmark datasets, establishing a new state of the art for meta-embeddings.
Submitted 19 September, 2017;
originally announced September 2017.
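A rough NumPy sketch of the locally linear idea is given below. It computes standard LLE reconstruction weights for each word from its k nearest neighbours in each source embedding, then finds low-dimensional vectors that the weights reconstruct well. Averaging the per-source weight matrices is a hypothetical combination rule chosen for brevity, not the paper's objective, and the code assumes that no two words share an identical vector within a source.

```python
import numpy as np

def lle_weights(V, k, reg=1e-3):
    # Reconstruct each row V[i] from its k nearest neighbours (Euclidean) with
    # weights that sum to one, as in locally linear embedding (LLE).
    n = V.shape[0]
    W = np.zeros((n, n))
    d2 = np.sum((V[:, None, :] - V[None, :, :]) ** 2, axis=-1)
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]          # index 0 is the word itself
        Z = V[nbrs] - V[i]                          # neighbours centred on V[i]
        G = Z @ Z.T
        G = G + reg * np.trace(G) * np.eye(k)       # regularize the local Gram matrix
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()
    return W

def meta_embed(sources, k, dim):
    # Hypothetical combination rule: average the per-source weight matrices, then
    # find vectors U minimizing sum_i ||U[i] - sum_j W[i, j] U[j]||^2 (LLE embedding step).
    W = np.mean([lle_weights(V, k) for V in sources], axis=0)
    n = W.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, eigvecs = np.linalg.eigh(M)                  # eigenvalues in ascending order
    return eigvecs[:, 1:dim + 1]                    # drop the bottom (constant) eigenvector

# Usage with two random "source embeddings" of different dimensionality.
rng = np.random.default_rng(8)
n_words = 30
sources = [rng.standard_normal((n_words, 10)), rng.standard_normal((n_words, 15))]
U = meta_embed(sources, k=5, dim=4)
W0 = lle_weights(sources[0], k=5)
```

Because each row of the averaged weight matrix sums to one, the constant vector is always in the null space of `I - W`, which is why the first eigenvector is discarded.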
-
Shared Memory Parallelization of MTTKRP for Dense Tensors
Authors:
Koby Hayashi,
Grey Ballard,
Jeffrey Jiang,
Michael Tobia
Abstract:
The matricized-tensor times Khatri-Rao product (MTTKRP) is the computational bottleneck for algorithms computing CP decompositions of tensors. In this paper, we develop shared-memory parallel algorithms for MTTKRP involving dense tensors. The algorithms cast nearly all of the computation as matrix operations in order to use optimized BLAS subroutines, and they avoid reordering tensor entries in memory. We benchmark sequential and parallel performance of our implementations, demonstrating high sequential performance and efficient parallel scaling. We use our parallel implementation to compute a CP decomposition of a neuroimaging data set and achieve a speedup of up to $7.4\times$ over existing parallel software.
Submitted 29 August, 2017;
originally announced August 2017.
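The idea of casting MTTKRP as matrix operations without reordering tensor entries can be illustrated for a 3-way tensor stored in row-major (C) order. In this NumPy sketch, modes 0 and 2 reduce to a single matrix multiply on a reshaped view of the tensor, while mode 1 is handled slice by slice. It is an illustration under these storage assumptions, not the authors' implementation.

```python
import numpy as np

def khatri_rao(B, C):
    # Column-wise Kronecker product: row j*K + k holds B[j, :] * C[k, :].
    J, R = B.shape
    K = C.shape[0]
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def mttkrp_blas(X, A, B, C, mode):
    # X is a C-contiguous I x J x K array; the reshapes below are views of the
    # same memory, so no tensor entries are permuted before the matrix multiplies.
    I, J, K = X.shape
    if mode == 0:
        return X.reshape(I, J * K) @ khatri_rao(B, C)      # one GEMM
    if mode == 2:
        return X.reshape(I * J, K).T @ khatri_rao(A, B)    # one GEMM on a transposed view
    # Mode 1: contract each J x K slice with C, weight by A[i, :], and accumulate.
    M = np.zeros((J, A.shape[1]))
    for i in range(I):
        M += (X[i] @ C) * A[i]
    return M

# Usage: compare against a direct einsum evaluation.
rng = np.random.default_rng(5)
I, J, K, R = 4, 3, 5, 2
X = rng.standard_normal((I, J, K))
A, B, C = rng.standard_normal((I, R)), rng.standard_normal((J, R)), rng.standard_normal((K, R))
results = [mttkrp_blas(X, A, B, C, mode) for mode in range(3)]
```

For a C-contiguous `X`, `np.shares_memory(X.reshape(I, J * K), X)` is `True`, so the only data movement happens inside the matrix multiplies.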
-
On the Equivalence of Holographic and Complex Embeddings for Link Prediction
Authors:
Katsuhiko Hayashi,
Masashi Shimbo
Abstract:
We show the equivalence of two state-of-the-art link prediction/knowledge graph completion methods: Nickel et al.'s holographic embedding and Trouillon et al.'s complex embedding. We first consider a spectral version of the holographic embedding, exploiting the frequency domain of the Fourier transform for efficient computation. The analysis of the resulting method reveals that it can be viewed as an instance of the complex embedding with certain constraints imposed on the initial vectors used for training. Conversely, any complex embedding can be converted to an equivalent holographic embedding.
Submitted 22 September, 2017; v1 submitted 17 February, 2017;
originally announced February 2017.
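The equivalence can be checked numerically. The NumPy sketch below computes the HolE score r^T (h ⋆ t) directly from the definition of circular correlation, again via the FFT identity F(h ⋆ t) = conj(F(h)) ⊙ F(t), and finally as a ComplEx score with e_s = F(h), w_r = F(r)/n and e_o = F(t), where F is the unnormalized DFT used by `np.fft.fft`. The mapping follows from Parseval's theorem; the variable names are illustrative.

```python
import numpy as np

def circular_correlation(h, t):
    # [h ⋆ t]_k = sum_i h_i * t_{(i + k) mod n}  (direct O(n^2) definition)
    n = len(h)
    return np.array([sum(h[i] * t[(i + k) % n] for i in range(n)) for k in range(n)])

def hole_score(r, h, t):
    # Holographic embedding score: r^T (h ⋆ t)
    return r @ circular_correlation(h, t)

def hole_score_fft(r, h, t):
    # Spectral form in O(n log n): F(h ⋆ t) = conj(F(h)) * F(t) for real h, t.
    return r @ np.fft.ifft(np.conj(np.fft.fft(h)) * np.fft.fft(t)).real

def complex_score(e_s, w_r, e_o):
    # ComplEx score: Re(sum_k e_s[k] * w_r[k] * conj(e_o[k]))
    return np.real(np.sum(e_s * w_r * np.conj(e_o)))

rng = np.random.default_rng(6)
n = 8
r, h, t = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
s_direct = hole_score(r, h, t)
s_fft = hole_score_fft(r, h, t)
# Map the real HolE vectors to complex ones with the DFT.
s_complex = complex_score(np.fft.fft(h), np.fft.fft(r) / n, np.fft.fft(t))
```

The DFT of a real vector is conjugate-symmetric (entry n - k is the conjugate of entry k), so complex vectors obtained this way form a constrained subset of all complex embeddings.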
-
Minimizing Quadratic Functions in Constant Time
Authors:
Kohei Hayashi,
Yuichi Yoshida
Abstract:
A sampling-based optimization method for quadratic functions is proposed. Our method approximately solves the following $n$-dimensional quadratic minimization problem in constant time, independent of $n$: $z^*=\min_{\mathbf{v} \in \mathbb{R}^n}\langle\mathbf{v}, A \mathbf{v}\rangle + n\langle\mathbf{v}, \mathrm{diag}(\mathbf{d})\mathbf{v}\rangle + n\langle\mathbf{b}, \mathbf{v}\rangle$, where $A \in \mathbb{R}^{n \times n}$ is a matrix and $\mathbf{d},\mathbf{b} \in \mathbb{R}^n$ are vectors. Our theoretical analysis specifies the number of samples $k(\delta, \epsilon)$ such that the approximate solution $z$ satisfies $|z - z^*| = O(\epsilon n^2)$ with probability $1-\delta$. The empirical performance (accuracy and runtime) is confirmed by numerical experiments.
Submitted 25 August, 2016;
originally announced August 2016.
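The sampling idea can be sketched in NumPy: draw k indices at random, solve the k-dimensional problem of the same form (with n replaced by k), and rescale the minimum by n^2/k^2. Sampling without replacement, this rescaling, and the positive-definiteness assumption that makes the closed-form solve valid are assumptions of this sketch; it does not reproduce the paper's sample-size bound k(δ, ε).

```python
import numpy as np

def quad_min(A, d, b):
    # Exact z = min_v <v, A v> + n <v, diag(d) v> + n <b, v>, assuming that the
    # symmetric part of A plus n * diag(d) is positive definite.
    n = len(b)
    M = 0.5 * (A + A.T) + n * np.diag(d)
    c = n * b
    v = np.linalg.solve(M, -0.5 * c)   # stationary point of v^T M v + c^T v: 2 M v + c = 0
    return v @ M @ v + c @ v

def quad_min_sampled(A, d, b, k, rng):
    # Sampling-based estimate: solve the same problem restricted to k random
    # indices (so n becomes k), then rescale by n^2 / k^2.
    n = len(b)
    S = rng.choice(n, size=k, replace=False)
    z_k = quad_min(A[np.ix_(S, S)], d[S], b[S])
    return (n**2 / k**2) * z_k

# Usage on a random instance where the positive-definiteness assumption holds.
rng = np.random.default_rng(7)
n = 500
A = rng.uniform(-1.0, 1.0, size=(n, n))
d = rng.uniform(1.0, 2.0, size=n)
b = rng.uniform(-1.0, 1.0, size=n)
z_exact = quad_min(A, d, b)
z_est = quad_min_sampled(A, d, b, k=100, rng=rng)
```

With k = n the sample is a permutation of all indices and the estimate coincides with the exact minimum; for k < n the cost of the solve depends only on k, and the error is naturally measured relative to n^2, matching the $O(\epsilon n^2)$ guarantee above.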
-
A Tractable Fully Bayesian Method for the Stochastic Block Model
Authors:
Kohei Hayashi,
Takuya Konishi,
Tatsuro Kawamoto
Abstract:
The stochastic block model (SBM) is a generative model revealing macroscopic structures in graphs. Bayesian methods are used for (i) cluster assignment inference and (ii) model selection for the number of clusters. In this paper, we study the behavior of Bayesian inference in the SBM in the large sample limit. Combining variational approximation and Laplace's method, a consistent criterion of the fully marginalized log-likelihood is established. Based on that, we derive a tractable algorithm that solves tasks (i) and (ii) concurrently, obviating the need for an outer loop to check all model candidates. Our empirical and theoretical results demonstrate that our method is scalable in computation, accurate in approximation, and concise in model selection.
Submitted 6 February, 2016;
originally announced February 2016.
-
Multiuser Detection by MAP Estimation with Sum-of-Absolute-Values Relaxation
Authors:
Hampei Sasahara,
Kazunori Hayashi,
Masaaki Nagahara
Abstract:
In this article, we consider multiuser detection that copes with multiple access interference arising in star-topology machine-to-machine (M2M) communications. We assume that the transmitted signals are discrete-valued (e.g., binary signals taking values of $\pm 1$), which is taken into account as prior information in detection. We formulate the detection problem as maximum a posteriori (MAP) estimation, which is relaxed to a convex optimization problem called sum-of-absolute-values (SOAV) optimization. The SOAV optimization can be efficiently solved by a proximal splitting algorithm, for which we give the proximity operator in closed form. Numerical simulations illustrate the effectiveness of the proposed approach compared with the linear minimum mean-square-error (LMMSE) and least absolute shrinkage and selection operator (LASSO) methods.
Submitted 25 October, 2015;
originally announced October 2015.
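In its usual form, the SOAV regularizer for a discrete alphabet {r_1, ..., r_L} with weights q_l is sum_l q_l |x - r_l|, applied element-wise. Because it is separable and piecewise linear, its proximity operator can be evaluated one element at a time. The NumPy sketch below does this generically by checking the stationary point of each linear piece and each breakpoint; it does not reproduce the paper's closed form, and the proximal splitting loop that would call it is omitted.

```python
import numpy as np

def prox_soav_scalar(u, lam, points, weights):
    # argmin_x  lam * sum_l q_l |x - r_l|  +  0.5 * (x - u)^2
    # The objective is convex and piecewise quadratic: check the stationary point
    # inside each linear piece plus every breakpoint r_l, and keep the best.
    order = np.argsort(points)
    r = np.asarray(points, dtype=float)[order]
    q = np.asarray(weights, dtype=float)[order]

    def objective(x):
        return lam * np.sum(q * np.abs(x - r)) + 0.5 * (x - u) ** 2

    candidates = list(r)
    edges = np.concatenate(([-np.inf], r, [np.inf]))
    for j in range(len(r) + 1):
        # On (edges[j], edges[j+1]) the slope of sum_l q_l |x - r_l| is
        # (weight of points left of x) - (weight of points right of x).
        slope = q[:j].sum() - q[j:].sum()
        x = u - lam * slope
        if edges[j] < x < edges[j + 1]:
            candidates.append(x)
    return min(candidates, key=objective)

def prox_soav(u_vec, lam, points, weights):
    # The SOAV regularizer is separable, so its prox acts element-wise.
    return np.array([prox_soav_scalar(u, lam, points, weights) for u in u_vec])

# Usage: binary symbols {-1, +1} with equal weights.
u = np.array([2.0, 1.2, 0.3, -5.0])
x = prox_soav(u, lam=0.5, points=[-1.0, 1.0], weights=[0.5, 0.5])
```

For binary symbols with equal weights and λ = 0.5, the inputs 2.0, 1.2, 0.3, and -5.0 map to 1.5, 1.0, 0.3, and -4.5: values outside [-1, 1] are pulled toward the nearest symbol, and values inside are left unchanged.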