-
Learning from Complementary Features
Authors:
Kosuke Sugiyama,
Masato Uchida
Abstract:
While precise data observation is essential for the learning processes of predictive models, it can be challenging owing to factors such as insufficient observation accuracy, high collection costs, and privacy constraints. In this paper, we examine cases where some qualitative features are available not as precise information indicating "what it is," but only as complementary information indicating "what it is not." We refer to features defined by precise information as ordinary features (OFs) and those defined by complementary information as complementary features (CFs). We then formulate a new learning scenario termed Complementary Feature Learning (CFL), where predictive models are constructed using instances consisting of OFs and CFs. The simplest formalization of CFL applies conventional supervised learning directly using the observed values of CFs. However, this approach does not resolve the ambiguity associated with CFs, making learning challenging and complicating the interpretation of the predictive model's specific predictions. Therefore, we derive an objective function from an information-theoretic perspective to estimate the OF values corresponding to CFs and to predict output labels based on these estimations. Based on this objective function, we propose a theoretically guaranteed graph-based estimation method, along with its practical approximation, for estimating OF values corresponding to CFs. The results of numerical experiments conducted with real-world data demonstrate that our proposed method effectively estimates OF values corresponding to CFs and predicts output labels.
Submitted 27 August, 2024;
originally announced August 2024.
-
Balancing Embedding Spectrum for Recommendation
Authors:
Shaowen Peng,
Kazunari Sugiyama,
Xin Liu,
Tsunenori Mine
Abstract:
Modern recommender systems heavily rely on high-quality representations learned from high-dimensional sparse data. While significant efforts have been invested in designing powerful algorithms for extracting user preferences, the factors contributing to good representations have remained relatively unexplored. In this work, we shed light on an issue in the existing pair-wise learning paradigm (i.e., the embedding collapse problem), that the representations tend to span a subspace of the whole embedding space, leading to a suboptimal solution and reducing the model capacity. Specifically, optimization on observed interactions is equivalent to a low pass filter causing users/items to have the same representations and resulting in a complete collapse. While negative sampling acts as a high pass filter to alleviate the collapse by balancing the embedding spectrum, its effectiveness is only limited to certain losses, which still leads to an incomplete collapse. To tackle this issue, we propose a novel method called DirectSpec, acting as a reliable all pass filter to balance the spectrum distribution of the embeddings during training, ensuring that users/items effectively span the entire embedding space. Additionally, we provide a thorough analysis of DirectSpec from a decorrelation perspective and propose an enhanced variant, DirectSpec+, which employs self-paced gradients to optimize irrelevant samples more effectively. Moreover, we establish a close connection between DirectSpec+ and uniformity, demonstrating that contrastive learning (CL) can alleviate the collapse issue by indirectly balancing the spectrum. Finally, we implement DirectSpec and DirectSpec+ on two popular recommender models: MF and LightGCN. Our experimental results demonstrate its effectiveness and efficiency over competitive baselines.
Submitted 17 June, 2024;
originally announced June 2024.
-
How Powerful is Graph Filtering for Recommendation
Authors:
Shaowen Peng,
Xin Liu,
Kazunari Sugiyama,
Tsunenori Mine
Abstract:
It has been shown that the effectiveness of graph convolutional networks (GCNs) for recommendation is attributable to spectral graph filtering. Most GCN-based methods consist of a graph filter, optionally followed by a low-rank mapping optimized through supervised training. However, we show two limitations suppressing the power of graph filtering: (1) Lack of generality. Due to the varied noise distribution, graph filters fail to denoise sparse data where noise is scattered across all frequencies, while supervised training results in worse performance on dense data where noise is concentrated in middle frequencies that can be removed by graph filters without training. (2) Lack of expressive power. We theoretically show that linear GCN (LGCN), which is effective for collaborative filtering (CF), cannot generate arbitrary embeddings, implying that the optimal data representation might be unreachable.
To tackle the first limitation, we show a close relation between the noise distribution and the sharpness of the spectrum: a sharper spectral distribution is more desirable because it makes data noise separable from important features without training. Based on this observation, we propose a generalized graph normalization (G^2N) that adjusts the sharpness of the spectral distribution so that data noise is redistributed and can be removed by graph filtering without training. As for the second limitation, we propose an individualized graph filter (IGF) that adapts to the different confidence levels of the user preference that interactions can reflect, and which is proved to be able to generate arbitrary embeddings. By simplifying LGCN, we further propose a simplified graph filtering method (SGFCF) that requires only the top-K singular values for recommendation. Finally, experimental results on four datasets with different density settings demonstrate the effectiveness and efficiency of our proposed methods.
Submitted 13 June, 2024;
originally announced June 2024.
-
Shortest Beer Path Queries based on Graph Decomposition
Authors:
Tesshu Hanaka,
Hirotaka Ono,
Kunihiko Sadakane,
Kosuke Sugiyama
Abstract:
Given a directed edge-weighted graph $G=(V, E)$ with beer vertices $B\subseteq V$, a beer path between two vertices $u$ and $v$ is a path between $u$ and $v$ that visits at least one beer vertex in $B$, and the beer distance between two vertices is the shortest length of beer paths. We consider \emph{indexing problems} on beer paths, that is, a graph is given a priori, and we construct some data structures (called indexes) for the graph. Later, we are given two vertices, and we find the beer distance or beer path between them using the data structure. For such a scheme, efficient algorithms using indexes for the beer distance and beer path queries have been proposed for outerplanar graphs and interval graphs. For example, Bacic et al. (2021) present indexes with size $O(n)$ for outerplanar graphs and an algorithm using them that answers the beer distance between two given vertices in $O(\alpha(n))$ time, where $\alpha(\cdot)$ is the inverse Ackermann function; the performance is shown to be optimal. This paper proposes indexing data structures and algorithms for beer path queries on general graphs based on two types of graph decomposition: the tree decomposition and the triconnected component decomposition. We propose indexes with size $O(m+nr^2)$ based on the triconnected component decomposition, where $r$ is the size of the largest triconnected component. For a given query $u,v\in V$, our algorithm using the indexes can output the beer distance in query time $O(\alpha(m))$. In particular, our indexing data structures and algorithms achieve the optimal performance (the space and the query time) for series-parallel graphs, which form a wider class than outerplanar graphs.
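To make the definition of beer distance concrete, the following is a minimal index-free baseline sketch, not the indexing scheme proposed in the paper. It assumes non-negative edge weights and that a beer path may revisit vertices, so the beer distance equals the minimum over beer vertices $b$ of $d(u,b)+d(b,v)$. The adjacency format and the names `dijkstra`, `reverse_graph`, and `beer_distance` are illustrative assumptions.

```python
import heapq

INF = float("inf")


def dijkstra(adj, source):
    """Shortest distances from source; adj maps u -> list of (v, weight)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def reverse_graph(adj):
    """Reverse every edge so that distances *to* a vertex can be computed."""
    rev = {}
    for u, edges in adj.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    return rev


def beer_distance(adj, beer, u, v):
    """min over beer vertices b of d(u, b) + d(b, v); inf if no beer path exists."""
    from_u = dijkstra(adj, u)
    to_v = dijkstra(reverse_graph(adj), v)
    return min((from_u.get(b, INF) + to_v.get(b, INF) for b in beer), default=INF)


# Hypothetical example: the direct route a -> b -> c costs 2,
# but the only beer vertex d forces the detour a -> d -> c.
graph = {"a": [("b", 1), ("d", 2)], "b": [("c", 1)], "d": [("c", 2)]}
detour = beer_distance(graph, {"d"}, "a", "c")
```

Each query here runs two Dijkstra searches, which is the cost that a precomputed index is meant to avoid.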
Submitted 10 July, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Solving Distance-constrained Labeling Problems for Small Diameter Graphs via TSP
Authors:
Tesshu Hanaka,
Hirotaka Ono,
Kosuke Sugiyama
Abstract:
In this paper, we give a simple polynomial-time reduction of L(p)-Labeling on graphs with a small diameter to Metric (Path) TSP, which enables us to use numerous results on (Metric) TSP. On the practical side, we can utilize various high-performance heuristics for TSP, such as Concorde and LKH, to solve our problem. On the theoretical side, we can see that the problem for any p under this framework is 1.5-approximable, and it can be solved by the Held-Karp algorithm in O(2^n n^2) time, where n is the number of vertices, and so on.
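The O(2^n n^2) bound quoted above is that of the classic Held-Karp bitmask dynamic program. As a rough sketch only, here is the path variant (minimum-weight Hamiltonian path with free endpoints) on an explicit distance matrix. The reduction from L(p)-Labeling to TSP is not shown, and the function name `held_karp_path` and the matrix input format are illustrative assumptions.

```python
def held_karp_path(dist):
    """Minimum total weight of a path visiting every vertex exactly once.

    dist is an n x n matrix (n >= 1) of pairwise distances.
    Runs in O(2^n * n^2) time and O(2^n * n) space.
    """
    n = len(dist)
    inf = float("inf")
    full = (1 << n) - 1
    # best[mask][j]: cheapest path that visits exactly the vertices in mask and ends at j
    best = [[inf] * n for _ in range(1 << n)]
    for j in range(n):
        best[1 << j][j] = 0  # a path consisting of a single vertex costs nothing
    for mask in range(1 << n):
        for j in range(n):
            cur = best[mask][j]
            if cur == inf:
                continue
            for k in range(n):
                if mask & (1 << k):
                    continue  # k is already on the path
                nxt = mask | (1 << k)
                cand = cur + dist[j][k]
                if cand < best[nxt][k]:
                    best[nxt][k] = cand
    return min(best[full])


# Hypothetical example: four points on a line at positions 3, 0, 6, 1.
positions = [3, 0, 6, 1]
matrix = [[abs(p - q) for q in positions] for p in positions]
shortest = held_karp_path(matrix)
```

Visiting the points in sorted order is optimal in this example, so the path length equals the span of the positions.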
Submitted 2 March, 2023;
originally announced March 2023.
-
SVD-GCN: A Simplified Graph Convolution Paradigm for Recommendation
Authors:
Shaowen Peng,
Kazunari Sugiyama,
Tsunenori Mine
Abstract:
With the tremendous success of Graph Convolutional Networks (GCNs), they have been widely applied to recommender systems and have shown promising performance. However, most GCN-based methods rigorously stick to a common GCN learning paradigm and suffer from two limitations: (1) the limited scalability due to the high computational cost and slow training convergence; (2) the notorious over-smoothing issue, which reduces performance as graph convolution layers are stacked. We argue that the above limitations are due to the lack of a deep understanding of GCN-based methods. To this end, we first investigate what design makes GCN effective for recommendation. By simplifying LightGCN, we show the close connection between GCN-based and low-rank methods such as Singular Value Decomposition (SVD) and Matrix Factorization (MF), where stacking graph convolution layers is to learn a low-rank representation by emphasizing (suppressing) components with larger (smaller) singular values. Based on this observation, we replace the core design of GCN-based methods with a flexible truncated SVD and propose a simplified GCN learning paradigm dubbed SVD-GCN, which only exploits $K$-largest singular vectors for recommendation. To alleviate the over-smoothing issue, we propose a renormalization trick to adjust the singular value gap, resulting in significant improvement. Extensive experiments on three real-world datasets show that our proposed SVD-GCN not only significantly outperforms state-of-the-art methods but also achieves over 100x and 10x speedups over LightGCN and MF, respectively.
Submitted 3 September, 2022; v1 submitted 26 August, 2022;
originally announced August 2022.
-
Less is More: Reweighting Important Spectral Graph Features for Recommendation
Authors:
Shaowen Peng,
Kazunari Sugiyama,
Tsunenori Mine
Abstract:
As much as Graph Convolutional Networks (GCNs) have shown tremendous success in recommender systems and collaborative filtering (CF), the mechanism of how they, especially the core components (i.e., neighborhood aggregation), contribute to recommendation has not been well studied. To unveil the effectiveness of GCNs for recommendation, we first analyze them from a spectral perspective and discover two important findings: (1) only a small portion of spectral graph features that emphasize the neighborhood smoothness and difference contribute to the recommendation accuracy, whereas most graph information can be considered as noise that even reduces the performance, and (2) repetition of the neighborhood aggregation emphasizes smoothed features and filters out noise information in an ineffective way. Based on the two findings above, we propose a new GCN learning scheme for recommendation by replacing neighborhood aggregation with a simple yet effective Graph Denoising Encoder (GDE), which acts as a band pass filter to capture important graph features. We show that our proposed method alleviates the over-smoothing and is comparable to an indefinite-layer GCN that can take any-hop neighborhood into consideration. Finally, we dynamically adjust the gradients over the negative samples to expedite model training without introducing additional complexity. Extensive experiments on five real-world datasets show that our proposed method not only outperforms state-of-the-art methods but also achieves 12x speedup over LightGCN.
Submitted 24 April, 2022;
originally announced April 2022.
-
More Powerful and General Selective Inference for Stepwise Feature Selection using the Homotopy Continuation Approach
Authors:
Kazuya Sugiyama,
Vo Nguyen Le Duy,
Ichiro Takeuchi
Abstract:
Conditional selective inference (SI) has been actively studied as a new statistical inference framework for data-driven hypotheses. The basic idea of conditional SI is to make inferences conditional on the selection event characterized by a set of linear and/or quadratic inequalities. Conditional SI has been mainly studied in the context of feature selection such as stepwise feature selection (SFS). The main limitation of the existing conditional SI methods is the loss of power due to over-conditioning, which is required for computational tractability. In this study, we develop a more powerful and general conditional SI method for SFS using the homotopy method which enables us to overcome this limitation. The homotopy-based SI is especially effective for more complicated feature selection algorithms. As an example, we develop a conditional SI method for forward-backward SFS with AIC-based stopping criteria and show that it is not adversely affected by the increased complexity of the algorithm. We conduct several experiments to demonstrate the effectiveness and efficiency of the proposed method.
Submitted 21 April, 2021; v1 submitted 25 December, 2020;
originally announced December 2020.
-
FANG: Leveraging Social Context for Fake News Detection Using Graph Representation
Authors:
Van-Hoang Nguyen,
Kazunari Sugiyama,
Preslav Nakov,
Min-Yen Kan
Abstract:
We propose Factual News Graph (FANG), a novel graphical social context representation and learning framework for fake news detection. Unlike previous contextual models that have targeted performance, our focus is on representation learning. Compared to transductive models, FANG is scalable in training as it does not have to maintain all nodes, and it is efficient at inference time, without the need to re-process the entire graph. Our experimental results show that FANG is better at capturing the social context into a high fidelity representation, compared to recent graphical and non-graphical models. In particular, FANG yields significant improvements for the task of fake news detection, and it is robust in the case of limited training data. We further demonstrate that the representations learned by FANG generalize to related tasks, such as predicting the factuality of reporting of a news medium.
Submitted 8 October, 2020; v1 submitted 18 August, 2020;
originally announced August 2020.
-
Neural Multi-Task Learning for Citation Function and Provenance
Authors:
Xuan Su,
Animesh Prasad,
Min-Yen Kan,
Kazunari Sugiyama
Abstract:
Citation function and provenance are two cornerstone tasks in citation analysis. Given a citation, the former task determines its rhetorical role, while the latter locates the text in the cited paper that contains the relevant cited information. We hypothesize that these two tasks are synergistically related, and build a model that validates this claim. For both tasks, we show that a single-layer convolutional neural network (CNN) outperforms existing state-of-the-art baselines. More importantly, we show that the two tasks are indeed synergistic: by jointly training both of the tasks in a multi-task learning setup, we demonstrate additional performance gains. Altogether, our models improve on the current state of the art by up to 2%, with statistical significance for both citation function and provenance prediction tasks.
Submitted 28 January, 2019; v1 submitted 18 November, 2018;
originally announced November 2018.
-
Abstractive Meeting Summarization Using Dependency Graph Fusion
Authors:
Siddhartha Banerjee,
Prasenjit Mitra,
Kazunari Sugiyama
Abstract:
Automatic summarization techniques on meeting conversations developed so far have been primarily extractive, resulting in poor summaries. To improve this, we propose an approach to generate abstractive summaries by fusing important content from several utterances. Any meeting generally comprises several discussion topic segments. For each topic segment within a meeting conversation, we aim to generate a one-sentence summary from the most important utterances using an integer linear programming-based sentence fusion approach. Experimental results show that our method can generate more informative summaries than the baselines.
Submitted 22 September, 2016;
originally announced September 2016.
-
Multi-document abstractive summarization using ILP based multi-sentence compression
Authors:
Siddhartha Banerjee,
Prasenjit Mitra,
Kazunari Sugiyama
Abstract:
Abstractive summarization is an ideal form of summarization since it can synthesize information from multiple documents to create concise informative summaries. In this work, we aim at developing an abstractive summarizer. First, our proposed approach identifies the most important document in the multi-document set. The sentences in the most important document are aligned to sentences in other documents to generate clusters of similar sentences. Second, we generate K-shortest paths from the sentences in each cluster using a word-graph structure. Finally, we select sentences from the set of shortest paths generated from all the clusters employing a novel integer linear programming (ILP) model with the objective of maximizing information content and readability of the final summary. Our ILP model represents the shortest paths as binary variables and considers the length of the path, information score and linguistic quality score in the objective function. Experimental results on the DUC 2004 and 2005 multi-document summarization datasets show that our proposed approach outperforms all the baselines and state-of-the-art extractive summarizers as measured by the ROUGE scores. Our method also outperforms a recent abstractive summarization technique. In manual evaluation, our approach also achieves promising results on informativeness and readability.
Submitted 22 September, 2016;
originally announced September 2016.
-
Generating Abstractive Summaries from Meeting Transcripts
Authors:
Siddhartha Banerjee,
Prasenjit Mitra,
Kazunari Sugiyama
Abstract:
Summaries of meetings are very important as they convey the essential content of discussions in a concise form. Generally, it is time-consuming to read and understand whole documents. Therefore, summaries play an important role as the readers are interested in only the important context of discussions. In this work, we address the task of meeting document summarization. Automatic summarization systems on meeting conversations developed so far have been primarily extractive, resulting in unacceptable summaries that are hard to read. The extracted utterances contain disfluencies that affect the quality of the extractive summaries. To make summaries much more readable, we propose an approach to generating abstractive summaries by fusing important content from several utterances. We first separate meeting transcripts into various topic segments, and then identify the important utterances in each segment using a supervised learning approach. The important utterances are then combined to generate a one-sentence summary. In the text generation step, the dependency parses of the utterances in each segment are combined to create a directed graph. The most informative and well-formed sub-graph obtained by integer linear programming (ILP) is selected to generate a one-sentence summary for each topic segment. The ILP formulation reduces disfluencies by leveraging grammatical relations that are more prominent in non-conversational style of text, and therefore generates summaries that are comparable to human-written abstractive summaries. Experimental results show that our method can generate more informative summaries than the baselines. In addition, readability assessments by human judges as well as log-likelihood estimates obtained from the dependency parser show that our generated summaries are significantly readable and well-formed.
Submitted 22 September, 2016;
originally announced September 2016.
-
Motivating Smartphone Collaboration in Data Acquisition and Distributed Computing
Authors:
Lingjie Duan,
Takeshi Kubo,
Kohei Sugiyama,
Jianwei Huang,
Teruyuki Hasegawa,
Jean Walrand
Abstract:
This paper analyzes and compares different incentive mechanisms for a master to motivate the collaboration of smartphone users on both data acquisition and distributed computing applications. To collect massive sensitive data from users, we propose a reward-based collaboration mechanism, where the master announces a total reward to be shared among collaborators, and the collaboration is successful if there are enough users wanting to collaborate. We show that if the master knows the users' collaboration costs, then he can choose to involve only users with the lowest costs. However, without knowing users' private information, he needs to offer a larger total reward to attract enough collaborators. Users will benefit from knowing their costs before the data acquisition. Perhaps surprisingly, the master may benefit as the variance of users' cost distribution increases.
To utilize smartphones' computation resources to solve complex computing problems, we study how the master can design an optimal contract by specifying different task-reward combinations for different user types. Under complete information, we show that the master involves a user type as long as the master's preference characteristic outweighs that type's unit cost. All collaborators achieve a zero payoff in this case. If the master does not know users' private cost information, however, he will conservatively target a smaller group of users with small costs, and has to give most of the benefits to the collaborators.
Submitted 26 January, 2014;
originally announced January 2014.
-
An MDS code associated to an elliptic curve
Authors:
Ken-ichi Sugiyama
Abstract:
We will construct an MDS (maximum distance separable) code $C$ which admits a decomposition such that every factor is still MDS. An effective way of decoding will also be discussed.
Submitted 12 October, 2013;
originally announced October 2013.
-
Product Review Summarization based on Facet Identification and Sentence Clustering
Authors:
Duy Khang Ly,
Kazunari Sugiyama,
Ziheng Lin,
Min-Yen Kan
Abstract:
Product reviews have become an important source of information, not only for customers to find opinions about products easily and share their reviews with peers, but also for product manufacturers to get feedback on their products. As the number of product reviews grows, it becomes difficult for users to search and utilize these resources in an efficient way. In this work, we build a product review summarization system that can automatically process a large collection of reviews and aggregate them to generate a concise summary. More importantly, the drawback of existing product summarization systems is that they cannot provide the underlying reasons to justify users' opinions. In our method, we solve this problem by applying clustering prior to selecting representative candidates for summarization.
Submitted 7 October, 2011;
originally announced October 2011.
-
On a linear code from a configuration of affine lines
Authors:
Ken-ichi Sugiyama
Abstract:
We will show how to obtain a linear code from a configuration of affine lines in general position and a suitable set of rational points. We will also explain a new decoding algorithm based on the configuration, which seems to be quite effective.
Submitted 21 August, 2007;
originally announced August 2007.