Search | arXiv e-print repository

Polyak Meets Parameter-free Clipped Gradient Descent

Authors: Yuki Takezawa, Han Bao, Ryoma Sato, Kenta Niwa, Makoto Yamada

Abstract: Gradient descent and its variants are de facto standard algorithms for training machine learning models. As gradient descent is sensitive to its hyperparameters, we need to tune the hyperparameters carefully using a grid search, but it is time-consuming, especially when multiple hyperparameters exist. Recently, parameter-free methods that adjust the hyperparameters on the fly have been studied. Ho… ▽ More Gradient descent and its variants are de facto standard algorithms for training machine learning models. As gradient descent is sensitive to its hyperparameters, we need to tune the hyperparameters carefully using a grid search, but it is time-consuming, especially when multiple hyperparameters exist. Recently, parameter-free methods that adjust the hyperparameters on the fly have been studied. However, the existing work only studied parameter-free methods for the stepsize, and parameter-free methods for other hyperparameters have not been explored. For instance, the gradient clipping threshold is also a crucial hyperparameter in addition to the stepsize to prevent gradient explosion issues, but none of the existing studies investigated the parameter-free methods for clipped gradient descent. In this work, we study the parameter-free methods for clipped gradient descent. Specifically, we propose Inexact Polyak Stepsize, which converges to the optimal solution without any hyperparameters tuning, and its convergence rate is asymptotically independent of L under L-smooth and $(L_0, L_1)$-smooth assumptions of the loss function as that of clipped gradient descent with well-tuned hyperparameters. We numerically validated our convergence results using a synthetic function and demonstrated the effectiveness of our proposed methods using LSTM, Nano-GPT, and T5. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2404.19288 [pdf, other]

Training-free Graph Neural Networks and the Power of Labels as Features

Authors: Ryoma Sato

Abstract: We propose training-free graph neural networks (TFGNNs), which can be used without training and can also be improved with optional training, for transductive node classification. We first advocate labels as features (LaF), which is an admissible but not explored technique. We show that LaF provably enhances the expressive power of graph neural networks. We design TFGNNs based on this analysis. In… ▽ More We propose training-free graph neural networks (TFGNNs), which can be used without training and can also be improved with optional training, for transductive node classification. We first advocate labels as features (LaF), which is an admissible but not explored technique. We show that LaF provably enhances the expressive power of graph neural networks. We design TFGNNs based on this analysis. In the experiments, we confirm that TFGNNs outperform existing GNNs in the training-free setting and converge with much fewer training iterations than traditional GNNs. △ Less

Submitted 15 August, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: TMLR 2024

arXiv:2404.11049 [pdf, other]

Stepwise Alignment for Constrained Language Model Policy Optimization

Authors: Akifumi Wachi, Thien Q. Tran, Rei Sato, Takumi Tanabe, Youhei Akimoto

Abstract: Safety and trustworthiness are indispensable requirements for real-world applications of AI systems using large language models (LLMs). This paper formulates human value alignment as an optimization problem of the language model policy to maximize reward under a safety constraint, and then proposes an algorithm, Stepwise Alignment for Constrained Policy Optimization (SACPO). One key idea behind SA… ▽ More Safety and trustworthiness are indispensable requirements for real-world applications of AI systems using large language models (LLMs). This paper formulates human value alignment as an optimization problem of the language model policy to maximize reward under a safety constraint, and then proposes an algorithm, Stepwise Alignment for Constrained Policy Optimization (SACPO). One key idea behind SACPO, supported by theory, is that the optimal policy incorporating reward and safety can be directly obtained from a reward-aligned policy. Building on this key idea, SACPO aligns LLMs step-wise with each metric while leveraging simple yet powerful alignment algorithms such as direct preference optimization (DPO). SACPO offers several advantages, including simplicity, stability, computational efficiency, and flexibility of algorithms and datasets. Under mild assumptions, our theoretical analysis provides the upper bounds on optimality and safety constraint violation. Our experimental results show that SACPO can fine-tune Alpaca-7B better than the state-of-the-art method in terms of both helpfulness and harmlessness. △ Less

Submitted 22 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2403.15757 [pdf, other]

User-Side Realization

Authors: Ryoma Sato

Abstract: Users are dissatisfied with services. Since the service is not tailor-made for a user, it is natural for dissatisfaction to arise. The problem is, that even if users are dissatisfied, they often do not have the means to resolve their dissatisfaction. The user cannot alter the source code of the service, nor can they force the service provider to change. The user has no choice but to remain dissati… ▽ More Users are dissatisfied with services. Since the service is not tailor-made for a user, it is natural for dissatisfaction to arise. The problem is, that even if users are dissatisfied, they often do not have the means to resolve their dissatisfaction. The user cannot alter the source code of the service, nor can they force the service provider to change. The user has no choice but to remain dissatisfied or quit the service. User-side realization offers proactive solutions to this problem by providing general algorithms to deal with common problems on the user's side. These algorithms run on the user's side and solve the problems without having the service provider change the service itself. △ Less

Submitted 23 March, 2024; originally announced March 2024.

Comments: Doctoral Thesis

arXiv:2312.04068 [pdf, other]

Making Translators Privacy-aware on the User's Side

Authors: Ryoma Sato

Abstract: We propose PRISM to enable users of machine translation systems to preserve the privacy of data on their own initiative. There is a growing demand to apply machine translation systems to data that require privacy protection. While several machine translation engines claim to prioritize privacy, the extent and specifics of such protection are largely ambiguous. First, there is often a lack of clari… ▽ More We propose PRISM to enable users of machine translation systems to preserve the privacy of data on their own initiative. There is a growing demand to apply machine translation systems to data that require privacy protection. While several machine translation engines claim to prioritize privacy, the extent and specifics of such protection are largely ambiguous. First, there is often a lack of clarity on how and to what degree the data is protected. Even if service providers believe they have sufficient safeguards in place, sophisticated adversaries might still extract sensitive information. Second, vulnerabilities may exist outside of these protective measures, such as within communication channels, potentially leading to data leakage. As a result, users are hesitant to utilize machine translation engines for data demanding high levels of privacy protection, thereby missing out on their benefits. PRISM resolves this problem. Instead of relying on the translation service to keep data safe, PRISM provides the means to protect data on the user's side. This approach ensures that even machine translation engines with inadequate privacy measures can be used securely. For platforms already equipped with privacy safeguards, PRISM acts as an additional protection layer, reinforcing their security furthermore. PRISM adds these privacy features without significantly compromising translation accuracy. Our experiments demonstrate the effectiveness of PRISM using real-world translators, T5 and ChatGPT (GPT-3.5-turbo), and the datasets with two languages. PRISM effectively balances privacy protection with translation accuracy. △ Less

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.00044 [pdf]

Advancing AI Audits for Enhanced AI Governance

Authors: Arisa Ema, Ryo Sato, Tomoharu Hase, Masafumi Nakano, Shinji Kamimura, Hiromu Kitamura

Abstract: As artificial intelligence (AI) is integrated into various services and systems in society, many companies and organizations have proposed AI principles, policies, and made the related commitments. Conversely, some have proposed the need for independent audits, arguing that the voluntary principles adopted by the developers and providers of AI services and systems insufficiently address risk. This… ▽ More As artificial intelligence (AI) is integrated into various services and systems in society, many companies and organizations have proposed AI principles, policies, and made the related commitments. Conversely, some have proposed the need for independent audits, arguing that the voluntary principles adopted by the developers and providers of AI services and systems insufficiently address risk. This policy recommendation summarizes the issues related to the auditing of AI services and systems and presents three recommendations for promoting AI auditing that contribute to sound AI governance. Recommendation1.Development of institutional design for AI audits. Recommendation2.Training human resources for AI audits. Recommendation3. Updating AI audits in accordance with technological progress. In this policy recommendation, AI is assumed to be that which recognizes and predicts data with the last chapter outlining how generative AI should be audited. △ Less

Submitted 26 November, 2023; originally announced December 2023.

arXiv:2310.20430 [pdf, ps, other]

Borrowable Fractional Ownership Types for Verification

Authors: Takashi Nakayama, Yusuke Matsushita, Ken Sakayori, Ryosuke Sato, Naoki Kobayashi

Abstract: Automated verification of functional correctness of imperative programs with references (a.k.a. pointers) is challenging because of reference aliasing. Ownership types have recently been applied to address this issue, but the existing approaches were limited in that they are effective only for a class of programs whose reference usage follows a certain style. To relax the limitation, we combine th… ▽ More Automated verification of functional correctness of imperative programs with references (a.k.a. pointers) is challenging because of reference aliasing. Ownership types have recently been applied to address this issue, but the existing approaches were limited in that they are effective only for a class of programs whose reference usage follows a certain style. To relax the limitation, we combine the approaches of ConSORT (based on fractional ownership) and RustHorn (based on borrowable ownership), two recent approaches to automated program verification based on ownership types, and propose the notion of borrowable fractional ownership types. We formalize a new type system based on the borrowable fractional ownership types and show how we can use it to automatically reduce the program verification problem for imperative programs with references to that for functional programs without references. We also show the soundness of our type system and the translation, and conduct experiments to confirm the effectiveness of our approach. △ Less

Submitted 31 October, 2023; originally announced October 2023.

Comments: An extended version of the paper to appear in Proceedings of VMCAI 2024

arXiv:2310.08920 [pdf, other]

Embarrassingly Simple Text Watermarks

Authors: Ryoma Sato, Yuki Takezawa, Han Bao, Kenta Niwa, Makoto Yamada

Abstract: We propose Easymark, a family of embarrassingly simple yet effective watermarks. Text watermarking is becoming increasingly important with the advent of Large Language Models (LLM). LLMs can generate texts that cannot be distinguished from human-written texts. This is a serious problem for the credibility of the text. Easymark is a simple yet effective solution to this problem. Easymark can inject… ▽ More We propose Easymark, a family of embarrassingly simple yet effective watermarks. Text watermarking is becoming increasingly important with the advent of Large Language Models (LLM). LLMs can generate texts that cannot be distinguished from human-written texts. This is a serious problem for the credibility of the text. Easymark is a simple yet effective solution to this problem. Easymark can inject a watermark without changing the meaning of the text at all while a validator can detect if a text was generated from a system that adopted Easymark or not with high credibility. Easymark is extremely easy to implement so that it only requires a few lines of code. Easymark does not require access to LLMs, so it can be implemented on the user-side when the LLM providers do not offer watermarked LLMs. In spite of its simplicity, it achieves higher detection accuracy and BLEU scores than the state-of-the-art text watermarking methods. We also prove the impossibility theorem of perfect watermarking, which is valuable in its own right. This theorem shows that no matter how sophisticated a watermark is, a malicious user could remove it from the text, which motivate us to use a simple watermark such as Easymark. We carry out experiments with LLM-generated texts and confirm that Easymark can be detected reliably without any degradation of BLEU and perplexity, and outperform state-of-the-art watermarks in terms of both quality and reliability. △ Less

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.00833 [pdf, other]

Necessary and Sufficient Watermark for Large Language Models

Authors: Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, Makoto Yamada

Abstract: In recent years, large language models (LLMs) have achieved remarkable performances in various NLP tasks. They can generate texts that are indistinguishable from those written by humans. Such remarkable performance of LLMs increases their risk of being used for malicious purposes, such as generating fake news articles. Therefore, it is necessary to develop methods for distinguishing texts written… ▽ More In recent years, large language models (LLMs) have achieved remarkable performances in various NLP tasks. They can generate texts that are indistinguishable from those written by humans. Such remarkable performance of LLMs increases their risk of being used for malicious purposes, such as generating fake news articles. Therefore, it is necessary to develop methods for distinguishing texts written by LLMs from those written by humans. Watermarking is one of the most powerful methods for achieving this. Although existing watermarking methods have successfully detected texts generated by LLMs, they significantly degrade the quality of the generated texts. In this study, we propose the Necessary and Sufficient Watermark (NS-Watermark) for inserting watermarks into generated texts without degrading the text quality. More specifically, we derive minimum constraints required to be imposed on the generated texts to distinguish whether LLMs or humans write the texts. Then, we formulate the NS-Watermark as a constrained optimization problem and propose an efficient algorithm to solve it. Through the experiments, we demonstrate that the NS-Watermark can generate more natural texts than existing watermarking methods and distinguish more accurately between texts written by LLMs and those written by humans. Especially in machine translation tasks, the NS-Watermark can outperform the existing watermarking method by up to 30 BLEU scores. △ Less

Submitted 1 October, 2023; originally announced October 2023.

arXiv:2305.11420 [pdf, other]

Beyond Exponential Graph: Communication-Efficient Topologies for Decentralized Learning via Finite-time Convergence

Authors: Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, Makoto Yamada

Abstract: Decentralized learning has recently been attracting increasing attention for its applications in parallel computation and privacy preservation. Many recent studies stated that the underlying network topology with a faster consensus rate (a.k.a. spectral gap) leads to a better convergence rate and accuracy for decentralized learning. However, a topology with a fast consensus rate, e.g., the exponen… ▽ More Decentralized learning has recently been attracting increasing attention for its applications in parallel computation and privacy preservation. Many recent studies stated that the underlying network topology with a faster consensus rate (a.k.a. spectral gap) leads to a better convergence rate and accuracy for decentralized learning. However, a topology with a fast consensus rate, e.g., the exponential graph, generally has a large maximum degree, which incurs significant communication costs. Thus, seeking topologies with both a fast consensus rate and small maximum degree is important. In this study, we propose a novel topology combining both a fast consensus rate and small maximum degree called the Base-$(k + 1)$ Graph. Unlike the existing topologies, the Base-$(k + 1)$ Graph enables all nodes to reach the exact consensus after a finite number of iterations for any number of nodes and maximum degree k. Thanks to this favorable property, the Base-$(k + 1)$ Graph endows Decentralized SGD (DSGD) with both a faster convergence rate and more communication efficiency than the exponential graph. We conducted experiments with various topologies, demonstrating that the Base-$(k + 1)$ Graph enables various decentralized learning methods to achieve higher accuracy with better communication efficiency than the existing topologies. △ Less

Submitted 15 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2303.00231 [pdf, ps, other]

Polyhedral Clinching Auctions for Indivisible Goods

Authors: Hiroshi Hirai, Ryosuke Sato

Abstract: In this study, we propose the polyhedral clinching auction for indivisible goods, which has so far been studied for divisible goods. As in the divisible setting by Goel et al. (2015), our mechanism enjoys incentive compatibility, individual rationality, and Pareto optimality, and works with polymatroidal environments. A notable feature for the indivisible setting is that the whole procedure can be… ▽ More In this study, we propose the polyhedral clinching auction for indivisible goods, which has so far been studied for divisible goods. As in the divisible setting by Goel et al. (2015), our mechanism enjoys incentive compatibility, individual rationality, and Pareto optimality, and works with polymatroidal environments. A notable feature for the indivisible setting is that the whole procedure can be conducted in time polynomial of the number of buyers and goods. Moreover, we show additional efficiency guarantees, recently established by Sato for the divisible setting: The liquid welfare (LW) of our mechanism achieves more than 1/2 of the optimal LW, and that the social welfare is more than the optimal LW. △ Less

Submitted 18 October, 2023; v1 submitted 28 February, 2023; originally announced March 2023.

MSC Class: 91B26

arXiv:2302.03458 [pdf, ps, other]

Polyhedral Clinching Auctions with a Single Sample

Authors: Ryosuke Sato

Abstract: In this study, we investigate auctions in two-sided markets with budget constraints on buyers. Our goal is to propose an efficient mechanism that satisfies dominant strategy incentive compatibility (DSIC), individual rationality (IR), and weak budget balance (WBB). To avoid several known impossibility theorems for each of two-sided markets and budget constraints, we assume prior information on sel… ▽ More In this study, we investigate auctions in two-sided markets with budget constraints on buyers. Our goal is to propose an efficient mechanism that satisfies dominant strategy incentive compatibility (DSIC), individual rationality (IR), and weak budget balance (WBB). To avoid several known impossibility theorems for each of two-sided markets and budget constraints, we assume prior information on sellers' valuations and investigate the efficiency of mechanisms by means of liquid welfare (LW), an efficiency measure used for budget-constrained auctions, in addition to social welfare (SW). Our first result is to improve the efficiency guarantees of the Polyhedral Clinching Auction by Hirai and Sato (2022), proposed for two-sided markets in which each seller is assumed to be truthful. We show that even under budget constraints and polymatroid constraints, their mechanism achieves an LW of more than 1/2 of the optimal LW, and an SW of more than the optimal LW. This significantly strengthens the efficiency guarantees of the existing studies on clinching auctions. Our second result is the extension of the Polyhedral Clinching Auction to a single-sample mechanism. Under the assumption that a single sample is provided from each seller's distribution on her valuation, we utilize the first result and propose an efficient mechanism that satisfies DSIC, IR, and WBB. Our mechanism achieves an LW of more than 1/4 of the optimal LW, and an SW of more than 1/2 of the optimal LW, in expectation. This result can be viewed as a budget extension of D"{u}tting et al. (2021). △ Less

Submitted 7 February, 2023; originally announced February 2023.

MSC Class: 91B26

arXiv:2301.13343 [pdf, other]

doi 10.1109/IJCNN55064.2022.9892464

Few-Shot Image-to-Semantics Translation for Policy Transfer in Reinforcement Learning

Authors: Rei Sato, Kazuto Fukuchi, Jun Sakuma, Youhei Akimoto

Abstract: We investigate policy transfer using image-to-semantics translation to mitigate learning difficulties in vision-based robotics control agents. This problem assumes two environments: a simulator environment with semantics, that is, low-dimensional and essential information, as the state space, and a real-world environment with images as the state space. By learning mapping from images to semantics,… ▽ More We investigate policy transfer using image-to-semantics translation to mitigate learning difficulties in vision-based robotics control agents. This problem assumes two environments: a simulator environment with semantics, that is, low-dimensional and essential information, as the state space, and a real-world environment with images as the state space. By learning mapping from images to semantics, we can transfer a policy, pre-trained in the simulator, to the real world, thereby eliminating real-world on-policy agent interactions to learn, which are costly and risky. In addition, using image-to-semantics mapping is advantageous in terms of the computational efficiency to train the policy and the interpretability of the obtained policy over other types of sim-to-real transfer strategies. To tackle the main difficulty in learning image-to-semantics mapping, namely the human annotation cost for producing a training dataset, we propose two techniques: pair augmentation with the transition function in the simulator environment and active learning. We observed a reduction in the annotation cost without a decline in the performance of the transfer, and the proposed approach outperformed the existing approach without annotation. △ Less

Submitted 30 January, 2023; originally announced January 2023.

Comments: The 2022 International Joint Conference on Neural Networks (IJCNN2022)

arXiv:2301.10956 [pdf, other]

Graph Neural Networks can Recover the Hidden Features Solely from the Graph Structure

Authors: Ryoma Sato

Abstract: Graph Neural Networks (GNNs) are popular models for graph learning problems. GNNs show strong empirical performance in many practical tasks. However, the theoretical properties have not been completely elucidated. In this paper, we investigate whether GNNs can exploit the graph structure from the perspective of the expressive power of GNNs. In our analysis, we consider graph generation processes t… ▽ More Graph Neural Networks (GNNs) are popular models for graph learning problems. GNNs show strong empirical performance in many practical tasks. However, the theoretical properties have not been completely elucidated. In this paper, we investigate whether GNNs can exploit the graph structure from the perspective of the expressive power of GNNs. In our analysis, we consider graph generation processes that are controlled by hidden (or latent) node features, which contain all information about the graph structure. A typical example of this framework is kNN graphs constructed from the hidden features. In our main results, we show that GNNs can recover the hidden node features from the input graph alone, even when all node features, including the hidden features themselves and any indirect hints, are unavailable. GNNs can further use the recovered node features for downstream tasks. These results show that GNNs can fully exploit the graph structure by themselves, and in effect, GNNs can use both the hidden and explicit node features for downstream tasks. In the experiments, we confirm the validity of our results by showing that GNNs can accurately recover the hidden features using a GNN architecture built based on our theoretical analysis. △ Less

Submitted 23 March, 2024; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: ICML 2023

arXiv:2212.04984 [pdf, other]

Transformer-based normative modelling for anomaly detection of early schizophrenia

Authors: Pedro F Da Costa, Jessica Dafflon, Sergio Leonardo Mendes, João Ricardo Sato, M. Jorge Cardoso, Robert Leech, Emily JH Jones, Walter H. L. Pinaya

Abstract: Despite the impact of psychiatric disorders on clinical health, early-stage diagnosis remains a challenge. Machine learning studies have shown that classifiers tend to be overly narrow in the diagnosis prediction task. The overlap between conditions leads to high heterogeneity among participants that is not adequately captured by classification models. To address this issue, normative approaches h… ▽ More Despite the impact of psychiatric disorders on clinical health, early-stage diagnosis remains a challenge. Machine learning studies have shown that classifiers tend to be overly narrow in the diagnosis prediction task. The overlap between conditions leads to high heterogeneity among participants that is not adequately captured by classification models. To address this issue, normative approaches have surged as an alternative method. By using a generative model to learn the distribution of healthy brain data patterns, we can identify the presence of pathologies as deviations or outliers from the distribution learned by the model. In particular, deep generative models showed great results as normative models to identify neurological lesions in the brain. However, unlike most neurological lesions, psychiatric disorders present subtle changes widespread in several brain regions, making these alterations challenging to identify. In this work, we evaluate the performance of transformer-based normative models to detect subtle brain changes expressed in adolescents and young adults. We trained our model on 3D MRI scans of neurotypical individuals (N=1,765). Then, we obtained the likelihood of neurotypical controls and psychiatric patients with early-stage schizophrenia from an independent dataset (N=93) from the Human Connectome Project. Using the predicted likelihood of the scans as a proxy for a normative score, we obtained an AUROC of 0.82 when assessing the difference between controls and individuals with early-stage schizophrenia. Our approach surpassed recent normative methods based on brain age and Gaussian Process, showing the promising use of deep generative models to help in individualised analyses. △ Less

Submitted 8 December, 2022; originally announced December 2022.

Comments: 10 pages, 2 figures, 2 tables, presented at NeurIPS22@PAI4MH

arXiv:2211.03413 [pdf, other]

Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

Authors: Takumi Tanabe, Rei Sato, Kazuto Fukuchi, Jun Sakuma, Youhei Akimoto

Abstract: In the field of reinforcement learning, because of the high cost and risk of policy training in the real world, policies are trained in a simulation environment and transferred to the corresponding real-world environment. However, the simulation environment does not perfectly mimic the real-world environment, lead to model misspecification. Multiple studies report significant deterioration of poli… ▽ More In the field of reinforcement learning, because of the high cost and risk of policy training in the real world, policies are trained in a simulation environment and transferred to the corresponding real-world environment. However, the simulation environment does not perfectly mimic the real-world environment, lead to model misspecification. Multiple studies report significant deterioration of policy performance in a real-world environment. In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set. The aim is to optimize the worst-case performance on the uncertainty parameter set to guarantee the performance in the corresponding real-world environment. To obtain a policy for the optimization, we propose an off-policy actor-critic approach called the Max-Min Twin Delayed Deep Deterministic Policy Gradient algorithm (M2TD3), which solves a max-min optimization problem using a simultaneous gradient ascent descent approach. Experiments in multi-joint dynamics with contact (MuJoCo) environments show that the proposed method exhibited a worst-case performance superior to several baseline approaches. △ Less

Submitted 11 January, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

Comments: Neural Information Processing Systems 2022 (NeurIPS '22)

ACM Class: I.2.6

arXiv:2210.08205 [pdf, other]

Active Learning from the Web

Authors: Ryoma Sato

Abstract: Labeling data is one of the most costly processes in machine learning pipelines. Active learning is a standard approach to alleviating this problem. Pool-based active learning first builds a pool of unlabelled data and iteratively selects data to be labeled so that the total number of required labels is minimized, keeping the model performance high. Many effective criteria for choosing data from t… ▽ More Labeling data is one of the most costly processes in machine learning pipelines. Active learning is a standard approach to alleviating this problem. Pool-based active learning first builds a pool of unlabelled data and iteratively selects data to be labeled so that the total number of required labels is minimized, keeping the model performance high. Many effective criteria for choosing data from the pool have been proposed in the literature. However, how to build the pool is less explored. Specifically, most of the methods assume that a task-specific pool is given for free. In this paper, we advocate that such a task-specific pool is not always available and propose the use of a myriad of unlabelled data on the Web for the pool for which active learning is applied. As the pool is extremely large, it is likely that relevant data exist in the pool for many tasks, and we do not need to explicitly design and build the pool for each task. The challenge is that we cannot compute the acquisition scores of all data exhaustively due to the size of the pool. We propose an efficient method, Seafaring, to retrieve informative data in terms of active learning from the Web using a user-side information retrieval algorithm. In the experiments, we use the online Flickr environment as the pool for active learning. This pool contains more than ten billion images and is several orders of magnitude larger than the existing pools in the literature for active learning. We confirm that our method performs better than existing approaches of using a small unlabelled pool. △ Less

Submitted 10 February, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

Comments: WWW 2023

arXiv:2209.15505 [pdf, other]

Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data

Authors: Yuki Takezawa, Han Bao, Kenta Niwa, Ryoma Sato, Makoto Yamada

Abstract: SGD with momentum is one of the key components for improving the performance of neural networks. For decentralized learning, a straightforward approach using momentum is Distributed SGD (DSGD) with momentum (DSGDm). However, DSGDm performs worse than DSGD when the data distributions are statistically heterogeneous. Recently, several studies have addressed this issue and proposed methods with momen… ▽ More SGD with momentum is one of the key components for improving the performance of neural networks. For decentralized learning, a straightforward approach using momentum is Distributed SGD (DSGD) with momentum (DSGDm). However, DSGDm performs worse than DSGD when the data distributions are statistically heterogeneous. Recently, several studies have addressed this issue and proposed methods with momentum that are more robust to data heterogeneity than DSGDm, although their convergence rates remain dependent on data heterogeneity and deteriorate when the data distributions are heterogeneous. In this study, we propose Momentum Tracking, which is a method with momentum whose convergence rate is proven to be independent of data heterogeneity. More specifically, we analyze the convergence rate of Momentum Tracking in the setting where the objective function is non-convex and the stochastic gradient is used. Then, we identify that it is independent of data heterogeneity for any momentum coefficient $β\in [0, 1)$. Through experiments, we demonstrate that Momentum Tracking is more robust to data heterogeneity than the existing decentralized learning methods with momentum and can consistently outperform these existing methods when the data distributions are heterogeneous. △ Less

Submitted 24 September, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: Transactions on Machine Learning Research 2023

arXiv:2208.09864 [pdf, other]

Towards Principled User-side Recommender Systems

Authors: Ryoma Sato

Abstract: Traditionally, recommendation algorithms have been designed for service developers. However, recently, a new paradigm called user-side recommender systems has been proposed and they enable web service users to construct their own recommender systems without access to trade-secret data. This approach opens the door to user-defined fair systems even if the official recommender system of the service… ▽ More Traditionally, recommendation algorithms have been designed for service developers. However, recently, a new paradigm called user-side recommender systems has been proposed and they enable web service users to construct their own recommender systems without access to trade-secret data. This approach opens the door to user-defined fair systems even if the official recommender system of the service is not fair. While existing methods for user-side recommender systems have addressed the challenging problem of building recommender systems without using log data, they rely on heuristic approaches, and it is still unclear whether constructing user-side recommender systems is a well-defined problem from theoretical point of view. In this paper, we provide theoretical justification of user-side recommender systems. Specifically, we see that hidden item features can be recovered from the information available to the user, making the construction of user-side recommender system well-defined. However, this theoretically grounded approach is not efficient. To realize practical yet theoretically sound recommender systems, we propose three desirable properties of user-side recommender systems and propose an effective and efficient user-side recommender system, \textsc{Consul}, based on these foundations. We prove that \textsc{Consul} satisfies all three properties, whereas existing user-side recommender systems lack at least one of them. In the experiments, we empirically validate the theory of feature recovery via numerical experiments. We also show that our proposed method achieves an excellent trade-off between effectiveness and efficiency and demonstrate via case studies that the proposed method can retrieve information that the provider's official recommender system cannot. △ Less

Submitted 21 August, 2022; originally announced August 2022.

Comments: CIKM 2022

arXiv:2208.09862 [pdf, other]

Twin Papers: A Simple Framework of Causal Inference for Citations via Coupling

Authors: Ryoma Sato, Makoto Yamada, Hisashi Kashima

Abstract: The research process includes many decisions, e.g., how to entitle and where to publish the paper. In this paper, we introduce a general framework for investigating the effects of such decisions. The main difficulty in investigating the effects is that we need to know counterfactual results, which are not available in reality. The key insight of our framework is inspired by the existing counterfac… ▽ More The research process includes many decisions, e.g., how to entitle and where to publish the paper. In this paper, we introduce a general framework for investigating the effects of such decisions. The main difficulty in investigating the effects is that we need to know counterfactual results, which are not available in reality. The key insight of our framework is inspired by the existing counterfactual analysis using twins, where the researchers regard twins as counterfactual units. The proposed framework regards a pair of papers that cite each other as twins. Such papers tend to be parallel works, on similar topics, and in similar communities. We investigate twin papers that adopted different decisions, observe the progress of the research impact brought by these studies, and estimate the effect of decisions by the difference in the impacts of these studies. We release our code and data, which we believe are highly beneficial owing to the scarcity of the dataset on counterfactual studies. △ Less

Submitted 21 August, 2022; originally announced August 2022.

Comments: CIKM 2022 short paper

arXiv:2206.12116 [pdf, other]

Approximating 1-Wasserstein Distance with Trees

Authors: Makoto Yamada, Yuki Takezawa, Ryoma Sato, Han Bao, Zornitsa Kozareva, Sujith Ravi

Abstract: Wasserstein distance, which measures the discrepancy between distributions, shows efficacy in various types of natural language processing (NLP) and computer vision (CV) applications. One of the challenges in estimating Wasserstein distance is that it is computationally expensive and does not scale well for many distribution comparison tasks. In this paper, we aim to approximate the 1-Wasserstein… ▽ More Wasserstein distance, which measures the discrepancy between distributions, shows efficacy in various types of natural language processing (NLP) and computer vision (CV) applications. One of the challenges in estimating Wasserstein distance is that it is computationally expensive and does not scale well for many distribution comparison tasks. In this paper, we aim to approximate the 1-Wasserstein distance by the tree-Wasserstein distance (TWD), where TWD is a 1-Wasserstein distance with tree-based embedding and can be computed in linear time with respect to the number of nodes on a tree. More specifically, we propose a simple yet efficient L1-regularized approach to learning the weights of the edges in a tree. To this end, we first show that the 1-Wasserstein approximation problem can be formulated as a distance approximation problem using the shortest path distance on a tree. We then show that the shortest path distance can be represented by a linear model and can be formulated as a Lasso-based regression problem. Owing to the convex formulation, we can obtain a globally optimal solution efficiently. Moreover, we propose a tree-sliced variant of these methods. Through experiments, we demonstrated that the weighted TWD can accurately approximate the original 1-Wasserstein distance. △ Less

Submitted 24 June, 2022; originally announced June 2022.

arXiv:2206.08521 [pdf, other]

CLEAR: A Fully User-side Image Search System

Authors: Ryoma Sato

Abstract: We use many search engines on the Internet in our daily lives. However, they are not perfect. Their scoring function may not model our intent or they may accept only text queries even though we want to carry out a similar image search. In such cases, we need to make a compromise: We continue to use the unsatisfactory service or leave the service. Recently, a new solution, user-side search systems,… ▽ More We use many search engines on the Internet in our daily lives. However, they are not perfect. Their scoring function may not model our intent or they may accept only text queries even though we want to carry out a similar image search. In such cases, we need to make a compromise: We continue to use the unsatisfactory service or leave the service. Recently, a new solution, user-side search systems, has been proposed. In this framework, each user builds their own search system that meets their preference with a user-defined scoring function and user-defined interface. Although the concept is appealing, it is still not clear if this approach is feasible in practice. In this demonstration, we show the first fully user-side image search system, CLEAR, which realizes a similar-image search engine for Flickr. The challenge is that Flickr does not provide an official similar image search engine or corresponding API. Nevertheless, CLEAR realizes it fully on a user-side. CLEAR does not use a backend server at all nor store any images or build search indices. It is in contrast to traditional search algorithms that require preparing a backend server and building a search index. Therefore, each user can easily deploy their own CLEAR engine, and the resulting service is custom-made and privacy-preserving. The online demo is available at https://clear.joisino.net. The source code is available at https://github.com/joisino/clear. △ Less

Submitted 16 June, 2022; originally announced June 2022.

arXiv:2205.01954 [pdf, other]

Word Tour: One-dimensional Word Embeddings via the Traveling Salesman Problem

Authors: Ryoma Sato

Abstract: Word embeddings are one of the most fundamental technologies used in natural language processing. Existing word embeddings are high-dimensional and consume considerable computational resources. In this study, we propose WordTour, unsupervised one-dimensional word embeddings. To achieve the challenging goal, we propose a decomposition of the desiderata of word embeddings into two parts, completenes… ▽ More Word embeddings are one of the most fundamental technologies used in natural language processing. Existing word embeddings are high-dimensional and consume considerable computational resources. In this study, we propose WordTour, unsupervised one-dimensional word embeddings. To achieve the challenging goal, we propose a decomposition of the desiderata of word embeddings into two parts, completeness and soundness, and focus on soundness in this paper. Owing to the single dimensionality, WordTour is extremely efficient and provides a minimal means to handle word embeddings. We experimentally confirmed the effectiveness of the proposed method via user study and document classification. △ Less

Submitted 4 May, 2022; originally announced May 2022.

Comments: NAACL 2022

arXiv:2203.08402 [pdf, ps, other]

Gradual Tensor Shape Checking

Authors: Momoko Hattori, Naoki Kobayashi, Ryosuke Sato

Abstract: Tensor shape mismatch is a common source of bugs in deep learning programs. We propose a new type-based approach to detect tensor shape mismatches. One of the main features of our approach is the best-effort shape inference. As the tensor shape inference problem is undecidable in general, we allow static type/shape inference to be performed only in a best-effort manner. If the static inference can… ▽ More Tensor shape mismatch is a common source of bugs in deep learning programs. We propose a new type-based approach to detect tensor shape mismatches. One of the main features of our approach is the best-effort shape inference. As the tensor shape inference problem is undecidable in general, we allow static type/shape inference to be performed only in a best-effort manner. If the static inference cannot guarantee the absence of the shape inconsistencies, dynamic checks are inserted into the program. Another main feature is gradual typing, where users can improve the precision of the inference by adding appropriate type annotations to the program. We formalize our approach and prove that it satisfies the criteria of gradual typing proposed by Siek et al. in 2015. We have implemented a prototype shape checking tool based on our approach and evaluated its effectiveness by applying it to some deep neural network programs. △ Less

Submitted 25 March, 2023; v1 submitted 16 March, 2022; originally announced March 2022.

Comments: 48 pages

ACM Class: D.2.4

arXiv:2203.07601 [pdf, ps, other]

Automatic HFL(Z) Validity Checking for Program Verification

Authors: Naoki Kobayashi, Kento Tanahashi, Ryosuke Sato, Takeshi Tsukada

Abstract: We propose an automated method for checking the validity of a formula of HFL(Z), a higher-order logic with fixpoint operators and integers. Combined with Kobayashi et al.'s reduction from higher-order program verification to HFL(Z) validity checking, our method yields a fully automated, uniform verification method for arbitrary temporal properties of higher-order functional programs expressible in… ▽ More We propose an automated method for checking the validity of a formula of HFL(Z), a higher-order logic with fixpoint operators and integers. Combined with Kobayashi et al.'s reduction from higher-order program verification to HFL(Z) validity checking, our method yields a fully automated, uniform verification method for arbitrary temporal properties of higher-order functional programs expressible in the modal mu-calculus, including termination, non-termination, fair termination, fair non-termination, and also branching-time properties. We have implemented our method and obtained promising experimental results. △ Less

Submitted 8 December, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: A long version of the paper published in Proceedings of POPL 2023

arXiv:2112.14921 [pdf, other]

Retrieving Black-box Optimal Images from External Databases

Authors: Ryoma Sato

Abstract: Suppose we have a black-box function (e.g., deep neural network) that takes an image as input and outputs a value that indicates preference. How can we retrieve optimal images with respect to this function from an external database on the Internet? Standard retrieval problems in the literature (e.g., item recommendations) assume that an algorithm has full access to the set of items. In other words… ▽ More Suppose we have a black-box function (e.g., deep neural network) that takes an image as input and outputs a value that indicates preference. How can we retrieve optimal images with respect to this function from an external database on the Internet? Standard retrieval problems in the literature (e.g., item recommendations) assume that an algorithm has full access to the set of items. In other words, such algorithms are designed for service providers. In this paper, we consider the retrieval problem under different assumptions. Specifically, we consider how users with limited access to an image database can retrieve images using their own black-box functions. This formulation enables a flexible and finer-grained image search defined by each user. We assume the user can access the database through a search query with tight API limits. Therefore, a user needs to efficiently retrieve optimal images in terms of the number of queries. We propose an efficient retrieval algorithm Tiara for this problem. In the experiments, we confirm that our proposed method performs better than several baselines under various settings. △ Less

Submitted 29 December, 2021; originally announced December 2021.

Comments: WSDM 2022

arXiv:2109.03431 [pdf, other]

Fixed Support Tree-Sliced Wasserstein Barycenter

Authors: Yuki Takezawa, Ryoma Sato, Zornitsa Kozareva, Sujith Ravi, Makoto Yamada

Abstract: The Wasserstein barycenter has been widely studied in various fields, including natural language processing, and computer vision. However, it requires a high computational cost to solve the Wasserstein barycenter problem because the computation of the Wasserstein distance requires a quadratic time with respect to the number of supports. By contrast, the Wasserstein distance on a tree, called the t… ▽ More The Wasserstein barycenter has been widely studied in various fields, including natural language processing, and computer vision. However, it requires a high computational cost to solve the Wasserstein barycenter problem because the computation of the Wasserstein distance requires a quadratic time with respect to the number of supports. By contrast, the Wasserstein distance on a tree, called the tree-Wasserstein distance, can be computed in linear time and allows for the fast comparison of a large number of distributions. In this study, we propose a barycenter under the tree-Wasserstein distance, called the fixed support tree-Wasserstein barycenter (FS-TWB) and its extension, called the fixed support tree-sliced Wasserstein barycenter (FS-TSWB). More specifically, we first show that the FS-TWB and FS-TSWB problems are convex optimization problems and can be solved by using the projected subgradient descent. Moreover, we propose a more efficient algorithm to compute the subgradient and objective function value by using the properties of tree-Wasserstein barycenter problems. Through real-world experiments, we show that, by using the proposed algorithm, the FS-TWB and FS-TSWB can be solved two orders of magnitude faster than the original Wasserstein barycenter. △ Less

Submitted 11 February, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: AISTATS 2022

arXiv:2109.00311 [pdf, ps, other]

Termination Analysis for the $π$-Calculus by Reduction to Sequential Program Termination

Authors: Tsubasa Shoshi, Takuma Ishikawa, Naoki Kobayashi, Ken Sakayori, Ryosuke Sato, Takeshi Tsukada

Abstract: We propose an automated method for proving termination of $π$-calculus processes, based on a reduction to termination of sequential programs: we translate a $π$-calculus process to a sequential program, so that the termination of the latter implies that of the former. We can then use an off-the-shelf termination verification tool to check termination of the sequential program. Our approach has bee… ▽ More We propose an automated method for proving termination of $π$-calculus processes, based on a reduction to termination of sequential programs: we translate a $π$-calculus process to a sequential program, so that the termination of the latter implies that of the former. We can then use an off-the-shelf termination verification tool to check termination of the sequential program. Our approach has been partially inspired by Deng and Sangiorgi's termination analysis for the $π$-calculus, and checks that there is no infinite chain of communications on replicated input channels, by converting such a chain of communications to a chain of recursive function calls in the target sequential program. We have implemented an automated tool based on the proposed method and confirmed its effectiveness. △ Less

Submitted 1 September, 2021; originally announced September 2021.

Comments: A shorter version will appear in Proceedings of APLAS 2021

arXiv:2108.07642 [pdf, ps, other]

Symbolic Automatic Relations and Their Applications to SMT and CHC Solving

Authors: Takumi Shimoda, Naoki Kobayashi, Ken Sakayori, Ryosuke Sato

Abstract: Despite the recent advance of automated program verification, reasoning about recursive data structures remains as a challenge for verification tools and their backends such as SMT and CHC solvers. To address the challenge, we introduce the notion of symbolic automatic relations (SARs), which combines symbolic automata and automatic relations, and inherits their good properties such as the closure… ▽ More Despite the recent advance of automated program verification, reasoning about recursive data structures remains as a challenge for verification tools and their backends such as SMT and CHC solvers. To address the challenge, we introduce the notion of symbolic automatic relations (SARs), which combines symbolic automata and automatic relations, and inherits their good properties such as the closure under Boolean operations. We consider the satisfiability problem for SARs, and show that it is undecidable in general, but that we can construct a sound (but incomplete) and automated satisfiability checker by a reduction to CHC solving. We discuss applications to SMT and CHC solving on data structures, and show the effectiveness of our approach through experiments. △ Less

Submitted 17 August, 2021; originally announced August 2021.

Comments: A shorter version will appear in Proceedings of SAS 2021

arXiv:2105.14423 [pdf, other]

Enumerating Fair Packages for Group Recommendations

Authors: Ryoma Sato

Abstract: Package-to-group recommender systems recommend a set of unified items to a group of people. Different from conventional settings, it is not easy to measure the utility of group recommendations because it involves more than one user. In particular, fairness is crucial in group recommendations. Even if some members in a group are substantially satisfied with a recommendation, it is undesirable if ot… ▽ More Package-to-group recommender systems recommend a set of unified items to a group of people. Different from conventional settings, it is not easy to measure the utility of group recommendations because it involves more than one user. In particular, fairness is crucial in group recommendations. Even if some members in a group are substantially satisfied with a recommendation, it is undesirable if other members are ignored to increase the total utility. Many methods for evaluating and applying the fairness of group recommendations have been proposed in the literature. However, all these methods maximize the score and output only one package. This is in contrast to conventional recommender systems, which output several (e.g., top-$K$) candidates. This can be problematic because a group can be dissatisfied with the recommended package owing to some unobserved reasons, even if the score is high. To address this issue, we propose a method to enumerate fair packages efficiently. Our method furthermore supports filtering queries, such as top-$K$ and intersection, to select favorite packages when the list is long. We confirm that our algorithm scales to large datasets and can balance several aspects of the utility of the packages. △ Less

Submitted 27 December, 2021; v1 submitted 30 May, 2021; originally announced May 2021.

Comments: WSDM 2022

arXiv:2105.14403 [pdf, other]

Re-evaluating Word Mover's Distance

Authors: Ryoma Sato, Makoto Yamada, Hisashi Kashima

Abstract: The word mover's distance (WMD) is a fundamental technique for measuring the similarity of two documents. As the crux of WMD, it can take advantage of the underlying geometry of the word space by employing an optimal transport formulation. The original study on WMD reported that WMD outperforms classical baselines such as bag-of-words (BOW) and TF-IDF by significant margins in various datasets. In… ▽ More The word mover's distance (WMD) is a fundamental technique for measuring the similarity of two documents. As the crux of WMD, it can take advantage of the underlying geometry of the word space by employing an optimal transport formulation. The original study on WMD reported that WMD outperforms classical baselines such as bag-of-words (BOW) and TF-IDF by significant margins in various datasets. In this paper, we point out that the evaluation in the original study could be misleading. We re-evaluate the performances of WMD and the classical baselines and find that the classical baselines are competitive with WMD if we employ an appropriate preprocessing, i.e., L1 normalization. In addition, we introduce an analogy between WMD and L1-normalized BOW and find that not only the performance of WMD but also the distance values resemble those of BOW in high dimensional spaces. △ Less

Submitted 15 June, 2022; v1 submitted 29 May, 2021; originally announced May 2021.

Comments: ICML 2022

arXiv:2105.13954 [pdf, other]

A Gradient Method for Multilevel Optimization

Authors: Ryo Sato, Mirai Tanaka, Akiko Takeda

Abstract: Although application examples of multilevel optimization have already been discussed since the 1990s, the development of solution methods was almost limited to bilevel cases due to the difficulty of the problem. In recent years, in machine learning, Franceschi et al. have proposed a method for solving bilevel optimization problems by replacing their lower-level problems with the $T$ steepest desce… ▽ More Although application examples of multilevel optimization have already been discussed since the 1990s, the development of solution methods was almost limited to bilevel cases due to the difficulty of the problem. In recent years, in machine learning, Franceschi et al. have proposed a method for solving bilevel optimization problems by replacing their lower-level problems with the $T$ steepest descent update equations with some prechosen iteration number $T$. In this paper, we have developed a gradient-based algorithm for multilevel optimization with $n$ levels based on their idea and proved that our reformulation asymptotically converges to the original multilevel problem. As far as we know, this is one of the first algorithms with some theoretical guarantee for multilevel optimization. Numerical experiments show that a trilevel hyperparameter learning model considering data poisoning produces more stable prediction results than an existing bilevel hyperparameter learning model in noisy data settings. △ Less

Submitted 26 October, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

Comments: NeurIPS 2021 camera-ready, 27 pages

arXiv:2105.12353 [pdf, other]

Private Recommender Systems: How Can Users Build Their Own Fair Recommender Systems without Log Data?

Authors: Ryoma Sato

Abstract: Fairness is a crucial property in recommender systems. Although some online services have adopted fairness aware systems recently, many other services have not adopted them yet. In this work, we propose methods to enable the users to build their own fair recommender systems. Our methods can generate fair recommendations even when the service does not (or cannot) provide fair recommender systems. T… ▽ More Fairness is a crucial property in recommender systems. Although some online services have adopted fairness aware systems recently, many other services have not adopted them yet. In this work, we propose methods to enable the users to build their own fair recommender systems. Our methods can generate fair recommendations even when the service does not (or cannot) provide fair recommender systems. The key challenge is that a user does not have access to the log data of other users or the latent representations of items. This restriction prohibits us from adopting existing methods designed for service providers. The main idea is that a user has access to unfair recommendations shown by the service provider. Our methods leverage the outputs of an unfair recommender system to construct a new fair recommender system. We empirically validate that our proposed method improves fairness substantially without harming much performance of the original unfair system. △ Less

Submitted 19 January, 2022; v1 submitted 26 May, 2021; originally announced May 2021.

Comments: SDM 2022

arXiv:2101.11520 [pdf, other]

Supervised Tree-Wasserstein Distance

Authors: Yuki Takezawa, Ryoma Sato, Makoto Yamada

Abstract: To measure the similarity of documents, the Wasserstein distance is a powerful tool, but it requires a high computational cost. Recently, for fast computation of the Wasserstein distance, methods for approximating the Wasserstein distance using a tree metric have been proposed. These tree-based methods allow fast comparisons of a large number of documents; however, they are unsupervised and do not… ▽ More To measure the similarity of documents, the Wasserstein distance is a powerful tool, but it requires a high computational cost. Recently, for fast computation of the Wasserstein distance, methods for approximating the Wasserstein distance using a tree metric have been proposed. These tree-based methods allow fast comparisons of a large number of documents; however, they are unsupervised and do not learn task-specific distances. In this work, we propose the Supervised Tree-Wasserstein (STW) distance, a fast, supervised metric learning method based on the tree metric. Specifically, we rewrite the Wasserstein distance on the tree metric by the parent-child relationships of a tree and formulate it as a continuous optimization problem using a contrastive loss. Experimentally, we show that the STW distance can be computed fast, and improves the accuracy of document classification tasks. Furthermore, the STW distance is formulated by matrix multiplications, runs on a GPU, and is suitable for batch processing. Therefore, we show that the STW distance is extremely efficient when comparing a large number of documents. △ Less

Submitted 23 July, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

arXiv:2012.08053 [pdf, other]

A Quantitative Study of Security Bug Fixes of GitHub Repositories

Authors: Daito Nakano, Mingyang Yin, Ryosuke Sato, Abram Hindle, Yasutaka Kamei, Naoyasu Ubayashi

Abstract: Software is prone to bugs and failures. Security bugs are those that expose or share privileged information and access in violation of the software's requirements. Given the seriousness of security bugs, there are centralized mechanisms for supporting and tracking these bugs across multiple products, one such mechanism is the Common Vulnerabilities and Exposures (CVE) ID description. When a bug ge… ▽ More Software is prone to bugs and failures. Security bugs are those that expose or share privileged information and access in violation of the software's requirements. Given the seriousness of security bugs, there are centralized mechanisms for supporting and tracking these bugs across multiple products, one such mechanism is the Common Vulnerabilities and Exposures (CVE) ID description. When a bug gets a CVE, it is referenced by its CVE ID. Thus we explore thousands of Free/Libre Open Source Software (FLOSS) projects, on Github, to determine if developers reference or discuss CVEs in their code, commits, and issues. CVEs will often refer to 3rd party software dependencies of a project and thus the bug will not be in the actual product itself. We study how many of these references are intentional CVE references, and how many are relevant bugs within the projects themselves. We investigate how the bugs that reference CVEs are fixed and how long it takes to fix these bugs. The results of our manual classification for 250 bug reports show that 88 (35%), 32 (13%), and 130 (52%) are classified into "Version Update", "Fixing Code", and "Discussion". To understand how long it takes to fix those bugs, we compare two periods, Reporting Period, a period between the disclosure date of vulnerability information in CVE repositories and the creation date of the bug report in a project, and Fixing Period, a period between the creation date of the bug report and the fixing date of the bug report. We find that 44% of bug reports that are classified into "Version Update" or "Fixing Code" have longer Reporting Period than Fixing Period. This suggests that those who submit CVEs should notify affected projects more directly. △ Less

Submitted 14 December, 2020; originally announced December 2020.

arXiv:2012.06138 [pdf, other]

AdvantageNAS: Efficient Neural Architecture Search with Credit Assignment

Authors: Rei Sato, Jun Sakuma, Youhei Akimoto

Abstract: Neural architecture search (NAS) is an approach for automatically designing a neural network architecture without human effort or expert knowledge. However, the high computational cost of NAS limits its use in commercial applications. Two recent NAS paradigms, namely one-shot and sparse propagation, which reduce the time and space complexities, respectively, provide clues for solving this problem.… ▽ More Neural architecture search (NAS) is an approach for automatically designing a neural network architecture without human effort or expert knowledge. However, the high computational cost of NAS limits its use in commercial applications. Two recent NAS paradigms, namely one-shot and sparse propagation, which reduce the time and space complexities, respectively, provide clues for solving this problem. In this paper, we propose a novel search strategy for one-shot and sparse propagation NAS, namely AdvantageNAS, which further reduces the time complexity of NAS by reducing the number of search iterations. AdvantageNAS is a gradient-based approach that improves the search efficiency by introducing credit assignment in gradient estimation for architecture updates. Experiments on the NAS-Bench-201 and PTB dataset show that AdvantageNAS discovers an architecture with higher performance under a limited time budget compared to existing sparse propagation NAS. To further reveal the reliabilities of AdvantageNAS, we investigate it theoretically and find that it monotonically improves the expected loss and thus converges. △ Less

Submitted 9 March, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

arXiv:2010.09157 [pdf, other]

doi 10.1016/j.joi.2022.101283

Poincare: Recommending Publication Venues via Treatment Effect Estimation

Authors: Ryoma Sato, Makoto Yamada, Hisashi Kashima

Abstract: Choosing a publication venue for an academic paper is a crucial step in the research process. However, in many cases, decisions are based solely on the experience of researchers, which often leads to suboptimal results. Although there exist venue recommender systems for academic papers, they recommend venues where the paper is expected to be published. In this study, we aim to recommend publicatio… ▽ More Choosing a publication venue for an academic paper is a crucial step in the research process. However, in many cases, decisions are based solely on the experience of researchers, which often leads to suboptimal results. Although there exist venue recommender systems for academic papers, they recommend venues where the paper is expected to be published. In this study, we aim to recommend publication venues from a different perspective. We estimate the number of citations a paper will receive if the paper is published in each venue and recommend the venue where the paper has the most potential impact. However, there are two challenges to this task. First, a paper is published in only one venue, and thus, we cannot observe the number of citations the paper would receive if the paper were published in another venue. Secondly, the contents of a paper and the publication venue are not statistically independent; that is, there exist selection biases in choosing publication venues. In this paper, we formulate the venue recommendation problem as a treatment effect estimation problem. We use a bias correction method to estimate the potential impact of choosing a publication venue effectively and to recommend venues based on the potential impact of papers in each venue. We highlight the effectiveness of our method using paper data from computer science conferences. △ Less

Submitted 2 September, 2022; v1 submitted 18 October, 2020; originally announced October 2020.

Comments: Journal of Informetrics

arXiv:2006.02703 [pdf, ps, other]

Fast Unbalanced Optimal Transport on a Tree

Authors: Ryoma Sato, Makoto Yamada, Hisashi Kashima

Abstract: This study examines the time complexities of the unbalanced optimal transport problems from an algorithmic perspective for the first time. We reveal which problems in unbalanced optimal transport can/cannot be solved efficiently. Specifically, we prove that the Kantorovich Rubinstein distance and optimal partial transport in the Euclidean metric cannot be computed in strongly subquadratic time und… ▽ More This study examines the time complexities of the unbalanced optimal transport problems from an algorithmic perspective for the first time. We reveal which problems in unbalanced optimal transport can/cannot be solved efficiently. Specifically, we prove that the Kantorovich Rubinstein distance and optimal partial transport in the Euclidean metric cannot be computed in strongly subquadratic time under the strong exponential time hypothesis. Then, we propose an algorithm that solves a more general unbalanced optimal transport problem exactly in quasi-linear time on a tree metric. The proposed algorithm processes a tree with one million nodes in less than one second. Our analysis forms a foundation for the theoretical study of unbalanced optimal transport algorithms and opens the door to the applications of unbalanced optimal transport to million-scale datasets. △ Less

Submitted 7 January, 2021; v1 submitted 4 June, 2020; originally announced June 2020.

Comments: Accepted to NeurIPS 2020

arXiv:2005.12123 [pdf, other]

Feature Robust Optimal Transport for High-dimensional Data

Authors: Mathis Petrovich, Chao Liang, Ryoma Sato, Yanbin Liu, Yao-Hung Hubert Tsai, Linchao Zhu, Yi Yang, Ruslan Salakhutdinov, Makoto Yamada

Abstract: Optimal transport is a machine learning problem with applications including distribution comparison, feature selection, and generative adversarial networks. In this paper, we propose feature-robust optimal transport (FROT) for high-dimensional data, which solves high-dimensional OT problems using feature selection to avoid the curse of dimensionality. Specifically, we find a transport plan with di… ▽ More Optimal transport is a machine learning problem with applications including distribution comparison, feature selection, and generative adversarial networks. In this paper, we propose feature-robust optimal transport (FROT) for high-dimensional data, which solves high-dimensional OT problems using feature selection to avoid the curse of dimensionality. Specifically, we find a transport plan with discriminative features. To this end, we formulate the FROT problem as a min--max optimization problem. We then propose a convex formulation of the FROT problem and solve it using a Frank--Wolfe-based optimization algorithm, whereby the subproblem can be efficiently solved using the Sinkhorn algorithm. Since FROT finds the transport plan from selected features, it is robust to noise features. To show the effectiveness of FROT, we propose using the FROT algorithm for the layer selection problem in deep neural networks for semantic correspondence. By conducting synthetic and benchmark experiments, we demonstrate that the proposed method can find a strong correspondence by determining important layers. We show that the FROT algorithm achieves state-of-the-art performance in real-world semantic correspondence datasets. △ Less

Submitted 29 September, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

arXiv:2003.04078 [pdf, ps, other]

A Survey on The Expressive Power of Graph Neural Networks

Authors: Ryoma Sato

Abstract: Graph neural networks (GNNs) are effective machine learning models for various graph learning problems. Despite their empirical successes, the theoretical limitations of GNNs have been revealed recently. Consequently, many GNN models have been proposed to overcome these limitations. In this survey, we provide a comprehensive overview of the expressive power of GNNs and provably powerful variants o… ▽ More Graph neural networks (GNNs) are effective machine learning models for various graph learning problems. Despite their empirical successes, the theoretical limitations of GNNs have been revealed recently. Consequently, many GNN models have been proposed to overcome these limitations. In this survey, we provide a comprehensive overview of the expressive power of GNNs and provably powerful variants of GNNs. △ Less

Submitted 16 October, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

Comments: 42 pages

arXiv:2002.03155 [pdf, ps, other]

Random Features Strengthen Graph Neural Networks

Authors: Ryoma Sato, Makoto Yamada, Hisashi Kashima

Abstract: Graph neural networks (GNNs) are powerful machine learning models for various graph learning tasks. Recently, the limitations of the expressive power of various GNN models have been revealed. For example, GNNs cannot distinguish some non-isomorphic graphs and they cannot learn efficient graph algorithms. In this paper, we demonstrate that GNNs become powerful just by adding a random feature to eac… ▽ More Graph neural networks (GNNs) are powerful machine learning models for various graph learning tasks. Recently, the limitations of the expressive power of various GNN models have been revealed. For example, GNNs cannot distinguish some non-isomorphic graphs and they cannot learn efficient graph algorithms. In this paper, we demonstrate that GNNs become powerful just by adding a random feature to each node. We prove that the random features enable GNNs to learn almost optimal polynomial-time approximation algorithms for the minimum dominating set problem and maximum matching problem in terms of approximation ratios. The main advantage of our method is that it can be combined with off-the-shelf GNN models with slight modifications. Through experiments, we show that the addition of random features enables GNNs to solve various problems that normal GNNs, including the graph convolutional networks (GCNs) and graph isomorphism networks (GINs), cannot solve. △ Less

Submitted 18 January, 2021; v1 submitted 8 February, 2020; originally announced February 2020.

Comments: Accepted to SDM 2021

arXiv:2002.01615 [pdf, ps, other]

Fast and Robust Comparison of Probability Measures in Heterogeneous Spaces

Authors: Ryoma Sato, Marco Cuturi, Makoto Yamada, Hisashi Kashima

Abstract: Comparing two probability measures supported on heterogeneous spaces is an increasingly important problem in machine learning. Such problems arise when comparing for instance two populations of biological cells, each described with its own set of features, or when looking at families of word embeddings trained across different corpora/languages. For such settings, the Gromov Wasserstein (GW) dista… ▽ More Comparing two probability measures supported on heterogeneous spaces is an increasingly important problem in machine learning. Such problems arise when comparing for instance two populations of biological cells, each described with its own set of features, or when looking at families of word embeddings trained across different corpora/languages. For such settings, the Gromov Wasserstein (GW) distance is often presented as the gold standard. GW is intuitive, as it quantifies whether one measure can be isomorphically mapped to the other. However, its exact computation is intractable, and most algorithms that claim to approximate it remain expensive. Building on \cite{memoli-2011}, who proposed to represent each point in each distribution as the 1D distribution of its distances to all other points, we introduce in this paper the Anchor Energy (AE) and Anchor Wasserstein (AW) distances, which are respectively the energy and Wasserstein distances instantiated on such representations. Our main contribution is to propose a sweep line algorithm to compute AE \emph{exactly} in log-quadratic time, where a naive implementation would be cubic. This is quasi-linear w.r.t. the description of the problem itself. Our second contribution is the proposal of robust variants of AE and AW that uses rank statistics rather than the original distances. We show that AE and AW perform well in various experimental settings at a fraction of the computational cost of popular GW approximations. Code is available at \url{https://github.com/joisino/anchor-energy}. △ Less

Submitted 10 February, 2021; v1 submitted 4 February, 2020; originally announced February 2020.

arXiv:1910.06155 [pdf]

GeoSES -- um Índice Socioeconômico para Estudos de Saúde no Brasil

Authors: Ligia Vizeu Barrozo, Michel Fornaciali, Carmen Diva Saldiva de André, Guilherme Augusto Zimeo Morais, Giselle Mansur, William Cabral-Miranda, João Ricardo Sato, Edson Amaro Júnior

Abstract: Objective: to define an index that summarizes the main dimensions of the socioeconomic context for research purposes, evaluation and monitoring health inequalities. Methods: the index was created from the 2010 Brazilian Demographic Census, whose variables selection was guided by theoretical references for health studies, including seven socioeconomic dimensions: education, mobility, poverty, wealt… ▽ More Objective: to define an index that summarizes the main dimensions of the socioeconomic context for research purposes, evaluation and monitoring health inequalities. Methods: the index was created from the 2010 Brazilian Demographic Census, whose variables selection was guided by theoretical references for health studies, including seven socioeconomic dimensions: education, mobility, poverty, wealth, income, segregation and deprivation of resources and services. The index was developed using principal component analysis, and was evaluated for its construct, content and applicability components. Results: GeoSES-BR dimensions showed good association with HDI-M (above 0.85). The model with the poverty dimension best explained the relative risk of avoidable cause mortality in Brazil. In the intraurban scale, the model with GeoSES-IM was the one that best explained the relative risk of mortality from circulatory system diseases. Conclusion: GeoSES showed significant explanatory potential in the studied scales. △ Less

Submitted 9 October, 2019; originally announced October 2019.

Comments: in Portuguese

arXiv:1905.10261 [pdf, ps, other]

Approximation Ratios of Graph Neural Networks for Combinatorial Problems

Authors: Ryoma Sato, Makoto Yamada, Hisashi Kashima

Abstract: In this paper, from a theoretical perspective, we study how powerful graph neural networks (GNNs) can be for learning approximation algorithms for combinatorial problems. To this end, we first establish a new class of GNNs that can solve a strictly wider variety of problems than existing GNNs. Then, we bridge the gap between GNN theory and the theory of distributed local algorithms. We theoretical… ▽ More In this paper, from a theoretical perspective, we study how powerful graph neural networks (GNNs) can be for learning approximation algorithms for combinatorial problems. To this end, we first establish a new class of GNNs that can solve a strictly wider variety of problems than existing GNNs. Then, we bridge the gap between GNN theory and the theory of distributed local algorithms. We theoretically demonstrate that the most powerful GNN can learn approximation algorithms for the minimum dominating set problem and the minimum vertex cover problem with some approximation ratios with the aid of the theory of distributed local algorithms. We also show that most of the existing GNNs such as GIN, GAT, GCN, and GraphSAGE cannot perform better than with these ratios. This paper is the first to elucidate approximation ratios of GNNs for combinatorial problems. Furthermore, we prove that adding coloring or weak-coloring to each node feature improves these approximation ratios. This indicates that preprocessing and feature engineering theoretically strengthen model capabilities. △ Less

Submitted 8 November, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

Comments: Accepted to NeurIPS 2019

arXiv:1902.09700 [pdf, ps, other]

Learning to Sample Hard Instances for Graph Algorithms

Authors: Ryoma Sato, Makoto Yamada, Hisashi Kashima

Abstract: Hard instances, which require a long time for a specific algorithm to solve, help (1) analyze the algorithm for accelerating it and (2) build a good benchmark for evaluating the performance of algorithms. There exist several efforts for automatic generation of hard instances. For example, evolutionary algorithms have been utilized to generate hard instances. However, they generate only finite numb… ▽ More Hard instances, which require a long time for a specific algorithm to solve, help (1) analyze the algorithm for accelerating it and (2) build a good benchmark for evaluating the performance of algorithms. There exist several efforts for automatic generation of hard instances. For example, evolutionary algorithms have been utilized to generate hard instances. However, they generate only finite number of hard instances. The merit of such methods is limited because it is difficult to extract meaningful patterns from small number of instances. We seek for a probabilistic generator of hard instances. Once the generative distribution of hard instances is obtained, we can sample a variety of hard instances to build a benchmark, and we can extract meaningful patterns of hard instances from sampled instances. The existing methods for modeling the hard instance distribution rely on parameters or rules that are found by domain experts; however, they are specific to the problem. Hence, it is challenging to model the distribution for general cases. In this paper, we focus on graph problems. We propose HiSampler, the hard instance sampler, to model the hard instance distribution of graph algorithms. HiSampler makes it possible to obtain the distribution of hard instances without hand-engineered features. To the best of our knowledge, this is the first method to learn the distribution of hard instances using machine learning. Through experiments, we demonstrate that our proposed method can generate instances that are a few to several orders of magnitude harder than the random-based approach in many settings. In particular, our method outperforms rule-based algorithms in the 3-coloring problem. △ Less

Submitted 3 October, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

Comments: 16 pages, 4 figures, accepted by ACML 2019

arXiv:1901.07868 [pdf, ps, other]

doi 10.1145/3502733

Constant Time Graph Neural Networks

Authors: Ryoma Sato, Makoto Yamada, Hisashi Kashima

Abstract: The recent advancements in graph neural networks (GNNs) have led to state-of-the-art performances in various applications, including chemo-informatics, question-answering systems, and recommender systems. However, scaling up these methods to huge graphs, such as social networks and Web graphs, remains a challenge. In particular, the existing methods for accelerating GNNs either are not theoretical… ▽ More The recent advancements in graph neural networks (GNNs) have led to state-of-the-art performances in various applications, including chemo-informatics, question-answering systems, and recommender systems. However, scaling up these methods to huge graphs, such as social networks and Web graphs, remains a challenge. In particular, the existing methods for accelerating GNNs either are not theoretically guaranteed in terms of the approximation error or incur at least a linear time computation cost. In this study, we reveal the query complexity of the uniform node sampling scheme for Message Passing Neural Networks, including GraphSAGE, graph attention networks (GATs), and graph convolutional networks (GCNs). Surprisingly, our analysis reveals that the complexity of the node sampling method is completely independent of the number of the nodes, edges, and neighbors of the input and depends only on the error tolerance and confidence probability while providing a theoretical guarantee for the approximation error. To the best of our knowledge, this is the first paper to provide a theoretical guarantee of approximation for GNNs within constant time. Through experiments with synthetic and real-world datasets, we investigated the speed and precision of the node sampling scheme and validated our theoretical results. △ Less

Submitted 29 March, 2022; v1 submitted 23 January, 2019; originally announced January 2019.

Comments: TKDD 2022

Journal ref: ACM Trans. Knowl. Discov. Data. 16, 5, Article 92 (March 2022)

arXiv:1708.04881 [pdf, other]

Polyhedral Clinching Auctions for Two-sided Markets

Authors: Hiroshi Hirai, Ryosuke Sato

Abstract: In this paper, we present a new model and two mechanisms for auctions in two-sided markets of buyers and sellers, where budget constraints are imposed on buyers. Our model incorporates polymatroidal environments, and is applicable to a wide variety of models that include multiunit auctions, matching markets and reservation exchange markets. Our mechanisms are build on polymatroidal network flow mo… ▽ More In this paper, we present a new model and two mechanisms for auctions in two-sided markets of buyers and sellers, where budget constraints are imposed on buyers. Our model incorporates polymatroidal environments, and is applicable to a wide variety of models that include multiunit auctions, matching markets and reservation exchange markets. Our mechanisms are build on polymatroidal network flow model by Lawler and Martel, and enjoy various nice properties such as incentive compatibility of buyers, individual rationality, pareto optimality, strong budget balance. The first mechanism is a simple "reduce-to-recover" algorithm that reduces the market to be one-sided, applies the polyhedral clinching auction by Goel et al, and lifts the resulting allocation to the original two-sided market via polymatroidal network flow. The second mechanism is a two-sided generalization of the polyhedral clinching auction, which improves the first mechanism in terms of the fairness of revenue sharing on sellers. Both mechanisms are implemented by polymatroid algorithms. We demonstrate how our framework is applied to internet display ad auctions. △ Less

Submitted 13 September, 2018; v1 submitted 14 August, 2017; originally announced August 2017.

MSC Class: 91B26; 91-08

arXiv:1208.2976 [pdf, other]

Discriminating different classes of biological networks by analyzing the graphs spectra distribution

Authors: Daniel Yasumasa Takahashi, João Ricardo Sato, Carlos Eduardo Ferreira, André Fujita

Abstract: The brain's structural and functional systems, protein-protein interaction, and gene networks are examples of biological systems that share some features of complex networks, such as highly connected nodes, modularity, and small-world topology. Recent studies indicate that some pathologies present topological network alterations relative to norms seen in the general population. Therefore, methods… ▽ More The brain's structural and functional systems, protein-protein interaction, and gene networks are examples of biological systems that share some features of complex networks, such as highly connected nodes, modularity, and small-world topology. Recent studies indicate that some pathologies present topological network alterations relative to norms seen in the general population. Therefore, methods to discriminate the processes that generate the different classes of networks (e.g., normal and disease) might be crucial for the diagnosis, prognosis, and treatment of the disease. It is known that several topological properties of a network (graph) can be described by the distribution of the spectrum of its adjacency matrix. Moreover, large networks generated by the same random process have the same spectrum distribution, allowing us to use it as a "fingerprint". Based on this relationship, we introduce and propose the entropy of a graph spectrum to measure the "uncertainty" of a random graph and the Kullback-Leibler and Jensen-Shannon divergences between graph spectra to compare networks. We also introduce general methods for model selection and network model parameter estimation, as well as a statistical procedure to test the nullity of divergence between two classes of complex networks. Finally, we demonstrate the usefulness of the proposed methods by applying them on (1) protein-protein interaction networks of different species and (2) on networks derived from children diagnosed with Attention Deficit Hyperactivity Disorder (ADHD) and typically developing children. We conclude that scale-free networks best describe all the protein-protein interactions. Also, we show that our proposed measures succeeded in the identification of topological changes in the network while other commonly used measures (number of edges, clustering coefficient, average path length) failed. △ Less

Submitted 14 August, 2012; originally announced August 2012.

Showing 1–48 of 48 results for author: Sato, R