Showing 1–13 of 13 results for author: Takezawa, Y

  1. arXiv:2405.15010  [pdf, other]

    cs.LG math.OC

    Polyak Meets Parameter-free Clipped Gradient Descent

    Authors: Yuki Takezawa, Han Bao, Ryoma Sato, Kenta Niwa, Makoto Yamada

    Abstract: Gradient descent and its variants are de facto standard algorithms for training machine learning models. As gradient descent is sensitive to its hyperparameters, we need to tune the hyperparameters carefully using a grid search, but it is time-consuming, especially when multiple hyperparameters exist. Recently, parameter-free methods that adjust the hyperparameters on the fly have been studied. Ho…

    Submitted 23 May, 2024; originally announced May 2024.

  2. arXiv:2405.14650  [pdf, other]

    cs.LG

    PhiNets: Brain-inspired Non-contrastive Learning Based on Temporal Prediction Hypothesis

    Authors: Satoki Ishikawa, Makoto Yamada, Han Bao, Yuki Takezawa

    Abstract: SimSiam is a prominent self-supervised learning method that achieves impressive results in various vision tasks under static environments. However, it has two critical issues: high sensitivity to hyperparameters, especially weight decay, and unsatisfactory performance in online and continual learning, where neuroscientists believe that powerful memory functions are necessary, as in brains. In this…

    Submitted 23 May, 2024; originally announced May 2024.

  3. arXiv:2310.10143  [pdf, other]

    stat.ML cs.LG

    An Empirical Study of Self-supervised Learning with Wasserstein Distance

    Authors: Makoto Yamada, Yuki Takezawa, Guillaume Houry, Kira Michaela Dusterwald, Deborah Sulem, Han Zhao, Yao-Hung Hubert Tsai

    Abstract: In this study, we delve into the problem of self-supervised learning (SSL) utilizing the 1-Wasserstein distance on a tree structure (a.k.a., Tree-Wasserstein distance (TWD)), where TWD is defined as the L1 distance between two tree-embedded vectors. In SSL methods, the cosine similarity is often utilized as an objective function; however, it has not been well studied when utilizing the Wasserstein…

    Submitted 5 February, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  4. arXiv:2310.08920  [pdf, other]

    cs.LG cs.AI cs.CR

    Embarrassingly Simple Text Watermarks

    Authors: Ryoma Sato, Yuki Takezawa, Han Bao, Kenta Niwa, Makoto Yamada

    Abstract: We propose Easymark, a family of embarrassingly simple yet effective watermarks. Text watermarking is becoming increasingly important with the advent of Large Language Models (LLM). LLMs can generate texts that cannot be distinguished from human-written texts. This is a serious problem for the credibility of the text. Easymark is a simple yet effective solution to this problem. Easymark can inject…

    Submitted 13 October, 2023; originally announced October 2023.

  5. arXiv:2310.00833  [pdf, other]

    cs.CL cs.LG

    Necessary and Sufficient Watermark for Large Language Models

    Authors: Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, Makoto Yamada

    Abstract: In recent years, large language models (LLMs) have achieved remarkable performances in various NLP tasks. They can generate texts that are indistinguishable from those written by humans. Such remarkable performance of LLMs increases their risk of being used for malicious purposes, such as generating fake news articles. Therefore, it is necessary to develop methods for distinguishing texts written…

    Submitted 1 October, 2023; originally announced October 2023.

  6. arXiv:2305.11420  [pdf, other]

    cs.LG cs.DC stat.ML

    Beyond Exponential Graph: Communication-Efficient Topologies for Decentralized Learning via Finite-time Convergence

    Authors: Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, Makoto Yamada

    Abstract: Decentralized learning has recently been attracting increasing attention for its applications in parallel computation and privacy preservation. Many recent studies stated that the underlying network topology with a faster consensus rate (a.k.a. spectral gap) leads to a better convergence rate and accuracy for decentralized learning. However, a topology with a fast consensus rate, e.g., the exponen…

    Submitted 15 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  7. arXiv:2209.15505  [pdf, other]

    cs.LG

    Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data

    Authors: Yuki Takezawa, Han Bao, Kenta Niwa, Ryoma Sato, Makoto Yamada

    Abstract: SGD with momentum is one of the key components for improving the performance of neural networks. For decentralized learning, a straightforward approach using momentum is Distributed SGD (DSGD) with momentum (DSGDm). However, DSGDm performs worse than DSGD when the data distributions are statistically heterogeneous. Recently, several studies have addressed this issue and proposed methods with momen…

    Submitted 24 September, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: Transactions on Machine Learning Research 2023

  8. arXiv:2206.12116  [pdf, other]

    stat.ML cs.AI cs.LG

    Approximating 1-Wasserstein Distance with Trees

    Authors: Makoto Yamada, Yuki Takezawa, Ryoma Sato, Han Bao, Zornitsa Kozareva, Sujith Ravi

    Abstract: Wasserstein distance, which measures the discrepancy between distributions, shows efficacy in various types of natural language processing (NLP) and computer vision (CV) applications. One of the challenges in estimating Wasserstein distance is that it is computationally expensive and does not scale well for many distribution comparison tasks. In this paper, we aim to approximate the 1-Wasserstein…

    Submitted 24 June, 2022; originally announced June 2022.

  9. arXiv:2205.11979  [pdf, other]

    math.OC cs.LG

    Theoretical Analysis of Primal-Dual Algorithm for Non-Convex Stochastic Decentralized Optimization

    Authors: Yuki Takezawa, Kenta Niwa, Makoto Yamada

    Abstract: In recent years, decentralized learning has emerged as a powerful tool not only for large-scale machine learning, but also for preserving privacy. One of the key challenges in decentralized learning is that the data distribution held by each node is statistically heterogeneous. To address this challenge, the primal-dual algorithm called the Edge-Consensus Learning (ECL) was proposed and was experi…

    Submitted 22 September, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

  10. arXiv:2205.03779  [pdf, other]

    cs.LG

    Communication Compression for Decentralized Learning with Operator Splitting Methods

    Authors: Yuki Takezawa, Kenta Niwa, Makoto Yamada

    Abstract: In decentralized learning, operator splitting methods using a primal-dual formulation (e.g., the Edge-Consensus Learning (ECL)) have been shown to be robust to heterogeneous data and have attracted significant attention in recent years. However, in the ECL, a node needs to exchange dual variables with its neighbors. These exchanges incur significant communication costs. For the Gossip-based algorith…

    Submitted 8 May, 2022; originally announced May 2022.

  11. arXiv:2110.07031  [pdf, other]

    cs.AI cs.CL cs.HC

    Improving the Robustness to Variations of Objects and Instructions with a Neuro-Symbolic Approach for Interactive Instruction Following

    Authors: Kazutoshi Shinoda, Yuki Takezawa, Masahiro Suzuki, Yusuke Iwasawa, Yutaka Matsuo

    Abstract: An interactive instruction following task has been proposed as a benchmark for learning to map natural language instructions and first-person vision into sequences of actions to interact with objects in 3D environments. We found that an existing end-to-end neural model for this task tends to fail to interact with objects of unseen attributes and follow various instructions. We assume that this pro…

    Submitted 15 November, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: Accepted to the 29th International Conference on MultiMedia Modeling (MMM 2023)

  12. arXiv:2109.03431  [pdf, other]

    cs.AI cs.LG

    Fixed Support Tree-Sliced Wasserstein Barycenter

    Authors: Yuki Takezawa, Ryoma Sato, Zornitsa Kozareva, Sujith Ravi, Makoto Yamada

    Abstract: The Wasserstein barycenter has been widely studied in various fields, including natural language processing and computer vision. However, it requires a high computational cost to solve the Wasserstein barycenter problem because the computation of the Wasserstein distance requires quadratic time with respect to the number of supports. By contrast, the Wasserstein distance on a tree, called the t…

    Submitted 11 February, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: AISTATS 2022

  13. arXiv:2101.11520  [pdf, other]

    cs.LG stat.ML

    Supervised Tree-Wasserstein Distance

    Authors: Yuki Takezawa, Ryoma Sato, Makoto Yamada

    Abstract: To measure the similarity of documents, the Wasserstein distance is a powerful tool, but it requires a high computational cost. Recently, for fast computation of the Wasserstein distance, methods for approximating the Wasserstein distance using a tree metric have been proposed. These tree-based methods allow fast comparisons of a large number of documents; however, they are unsupervised and do not…

    Submitted 23 July, 2021; v1 submitted 27 January, 2021; originally announced January 2021.