Search | arXiv e-print repository

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Authors: LLM-jp, Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano, et al. (57 additional authors not shown)

Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2303.08909

doi 10.1007/978-3-031-44223-0_6

Latent-Conditioned Policy Gradient for Multi-Objective Deep Reinforcement Learning

Authors: Takuya Kanazawa, Chetan Gupta

Abstract: Sequential decision making in the real world often requires finding a good balance of conflicting objectives. In general, there exist a plethora of Pareto-optimal policies that embody different patterns of compromises between objectives, and it is technically challenging to obtain them exhaustively using deep neural networks. In this work, we propose a novel multi-objective reinforcement learning (MORL) algorithm that trains a single neural network via policy gradient to approximately obtain the entire Pareto set in a single run of training, without relying on linear scalarization of objectives. The proposed method works in both continuous and discrete action spaces with no design change of the policy network. Numerical experiments in benchmark environments demonstrate the practicality and efficacy of our approach in comparison to standard MORL baselines.
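The abstract contrasts obtaining the whole Pareto set with linear scalarization of objectives. The Python sketch below is not the latent-conditioned policy gradient method; it only illustrates, on hypothetical two-objective returns, what Pareto optimality means and why a weighted sum of objectives can miss some Pareto-optimal trade-offs. The function names and return vectors are assumptions chosen for illustration.

```python
from typing import List, Sequence, Set


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if return vector a Pareto-dominates b (all objectives maximized):
    a is no worse in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(returns: List[Sequence[float]]) -> List[int]:
    """Indices of the non-dominated return vectors (brute force, O(n^2))."""
    return [
        i
        for i, r in enumerate(returns)
        if not any(dominates(other, r) for j, other in enumerate(returns) if j != i)
    ]


def scalarized_best(returns: List[Sequence[float]], weight: float) -> int:
    """Index maximizing the linear scalarization weight*r[0] + (1 - weight)*r[1]."""
    return max(
        range(len(returns)),
        key=lambda i: weight * returns[i][0] + (1.0 - weight) * returns[i][1],
    )


# Hypothetical returns of four policies on two objectives.
returns = [(0.0, 1.0), (1.0, 0.0), (0.4, 0.4), (0.3, 0.2)]
front = pareto_front(returns)
# Sweep the scalarization weight over [0, 1] and record which policies ever win.
reachable: Set[int] = {scalarized_best(returns, k / 100) for k in range(101)}
print("Pareto set:", front)
print("Reachable by linear scalarization:", sorted(reachable))
```

Policy 2, which balances both objectives, is Pareto-optimal, but it lies below the line through policies 0 and 1, so no choice of weight selects it. Recovering such points in non-convex regions of the front is what a scalarization-free method is after.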

Submitted 15 March, 2023; originally announced March 2023.

Comments: 23 pages, 16 figures

Journal ref: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning -- ICANN 2023. Lecture Notes in Computer Science, vol. 14259, pp. 63-76. Springer, Cham

arXiv:2209.08418

doi 10.5220/0011546800003332

Sample-based Uncertainty Quantification with a Single Deterministic Neural Network

Authors: Takuya Kanazawa, Chetan Gupta

Abstract: Development of an accurate, flexible, and numerically efficient uncertainty quantification (UQ) method is one of the fundamental challenges in machine learning. Previously, a UQ method called DISCO Nets has been proposed (Bouchacourt et al., 2016), which trains a neural network by minimizing the energy score. In this method, a random noise vector in $\mathbb{R}^{10\text{--}100}$ is concatenated with the original input vector in order to produce a diverse ensemble forecast despite using a single neural network. While this method has shown promising performance on a hand pose estimation task in computer vision, it remained unexplored whether this method works as well for regression on tabular data, and how it competes with more recent advanced UQ methods such as NGBoost. In this paper, we propose an improved neural architecture of DISCO Nets that admits faster and more stable training while only using a compact noise vector of dimension $\sim \mathcal{O}(1)$. We benchmark this approach on miscellaneous real-world tabular datasets and confirm that it is competitive with or even superior to standard UQ baselines. Moreover, we observe that it exhibits better point forecast performance than a neural network of the same size trained with the conventional mean squared error. As another advantage of the proposed method, we show that local feature importance computation methods such as SHAP can be easily applied to any subregion of the predictive distribution. A new elementary proof for the validity of using the energy score to learn predictive distributions is also provided.
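The energy score that DISCO Nets minimize can be estimated directly from an ensemble of samples. The Python sketch below assumes the standard form ES(F, y) = E||X - y|| - (1/2) E||X - X'||, estimated over all pairs of m samples; whether the paper uses this estimator or an unbiased m(m-1) normalization is not stated in the abstract. It computes only the loss, not the noise-conditioned network, and the toy numbers are hypothetical.

```python
import numpy as np


def energy_score(samples: np.ndarray, y: np.ndarray) -> float:
    """Sample estimate of the energy score of an ensemble forecast (lower is better).

    samples: shape (m, d), m forecast samples for one target
             (e.g. network outputs for m different noise vectors).
    y:       shape (d,), the observed target.
    Uses ES = mean_i ||x_i - y|| - 1/(2 m^2) * sum_{i,j} ||x_i - x_j||.
    """
    samples = np.asarray(samples, dtype=float)
    y = np.asarray(y, dtype=float)
    # Accuracy term: average distance from each sample to the observation.
    term_obs = np.linalg.norm(samples - y, axis=-1).mean()
    # Spread term: average pairwise distance between samples (rewards diversity).
    diffs = samples[:, None, :] - samples[None, :, :]
    term_spread = 0.5 * np.linalg.norm(diffs, axis=-1).mean()
    return float(term_obs - term_spread)


y = np.array([0.0])
point = np.array([[1.0]])           # a single (deterministic) forecast, off by 1
spread = np.array([[-1.0], [1.0]])  # two samples straddling the target
print(energy_score(point, y))   # 1.0
print(energy_score(spread, y))  # 0.5
```

With a single sample the spread term vanishes and the score reduces to the absolute error, so a well-placed spread of samples can score better than a point forecast, as the toy numbers show. In one dimension this quantity coincides with the CRPS.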

Submitted 3 November, 2022; v1 submitted 17 September, 2022; originally announced September 2022.

Comments: 16 pages, 17 figures, 2 tables. Accepted by the 14th International Conference on Neural Computation Theory and Applications (NCTA 2022) held as part of IJCCI 2022, October 24-26, 2022, Valletta, Malta

Journal ref: Proceedings of the 14th International Joint Conference on Computational Intelligence - Volume 1: NCTA, pp. 292-304, 2022, Valletta, Malta

arXiv:2207.13730

doi 10.1109/IJCNN55064.2022.9892771

Distributional Actor-Critic Ensemble for Uncertainty-Aware Continuous Control

Authors: Takuya Kanazawa, Haiyan Wang, Chetan Gupta

Abstract: Uncertainty quantification is one of the central challenges for machine learning in real-world applications. In reinforcement learning, an agent confronts two kinds of uncertainty, called epistemic uncertainty and aleatoric uncertainty. Disentangling and evaluating these uncertainties simultaneously stands a chance of improving the agent's final performance, accelerating training, and facilitating quality assurance after deployment. In this work, we propose an uncertainty-aware reinforcement learning algorithm for continuous control tasks that extends the Deep Deterministic Policy Gradient algorithm (DDPG). It exploits epistemic uncertainty to accelerate exploration and aleatoric uncertainty to learn a risk-sensitive policy. We conduct numerical experiments showing that our variant of DDPG outperforms vanilla DDPG without uncertainty estimation in benchmark tasks on robotic control and power-grid optimization.

Submitted 27 July, 2022; originally announced July 2022.

Comments: 10 pages, 6 figures. Accepted to International Joint Conference on Neural Networks (IJCNN 2022), July 18-23, Padua, Italy

Journal ref: 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 2022, pp. 1-10

arXiv:2104.12363

doi 10.1109/IJCNN55064.2022.9892219

One-parameter family of acquisition functions for efficient global optimization

Authors: Takuya Kanazawa

Abstract: Bayesian optimization (BO) with Gaussian processes is a powerful methodology to optimize an expensive black-box function with as few function evaluations as possible. The expected improvement (EI) and probability of improvement (PI) are among the most widely used schemes for BO. There is a plethora of other schemes that outperform EI and PI, but most of them are numerically far more expensive than EI and PI. In this work, we propose a new one-parameter family of acquisition functions for BO that unifies EI and PI. The proposed method is numerically inexpensive, is easy to implement, can be easily parallelized, and on benchmark tasks shows a performance superior to EI and GP-UCB. Its generalization to BO with Student-t processes is also presented.
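For reference, the two acquisition functions that the proposed family unifies have standard closed forms when the Gaussian-process posterior at a candidate point has mean mu and standard deviation sigma and the best value observed so far is best (maximization). The Python sketch below implements only these two textbook formulas; the one-parameter family itself is not reproduced because its form is not given in the abstract. The candidate values are hypothetical.

```python
import math


def _norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu: float, sigma: float, best: float) -> float:
    """PI = P(f(x) > best) = Phi((mu - best) / sigma) under a Gaussian posterior."""
    if sigma <= 0.0:
        return 1.0 if mu > best else 0.0
    return _norm_cdf((mu - best) / sigma)


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """EI = E[max(f(x) - best, 0)] = sigma * (z * Phi(z) + phi(z)), z = (mu - best) / sigma."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return sigma * (z * _norm_cdf(z) + _norm_pdf(z))


best = 0.9
safe = (1.0, 0.1)   # hypothetical candidate: confident, small expected gain
risky = (0.8, 1.0)  # hypothetical candidate: uncertain, large potential gain
for name, (mu, sigma) in [("safe", safe), ("risky", risky)]:
    print(name,
          round(probability_of_improvement(mu, sigma, best), 4),
          round(expected_improvement(mu, sigma, best), 4))
```

On these two candidates PI ranks the confident one higher, while EI prefers the uncertain one because it weighs the size of the possible improvement, not just its probability. That difference in behavior is the gap a family interpolating between the two has room to exploit.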

Submitted 26 April, 2021; originally announced April 2021.

Comments: 13 pages, 6 figures. Accepted to IJCNN 2022

Journal ref: 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 2022, pp. 1-8

arXiv:2103.09434

Efficient Bayesian Optimization using Multiscale Graph Correlation

Authors: Takuya Kanazawa

Abstract: Bayesian optimization is a powerful tool to optimize a black-box function, the evaluation of which is time-consuming or costly. In this paper, we propose a new approach to Bayesian optimization called GP-MGC, which maximizes multiscale graph correlation with respect to the global maximum to determine the next query point. We present our evaluation of GP-MGC in applications involving both synthetic benchmark functions and real-world datasets and demonstrate that GP-MGC performs as well as or even better than state-of-the-art methods such as max-value entropy search and GP-UCB.

Submitted 17 March, 2021; originally announced March 2021.

Comments: 12 pages, 2 figures

arXiv:2102.08993

Using Distance Correlation for Efficient Bayesian Optimization

Authors: Takuya Kanazawa

Abstract: We propose a novel approach for Bayesian optimization, called $\textsf{GP-DC}$, which combines Gaussian processes with distance correlation. It balances exploration and exploitation automatically, and requires no manual parameter tuning. We evaluate $\textsf{GP-DC}$ on a number of benchmark functions and observe that it outperforms state-of-the-art methods such as $\textsf{GP-UCB}$ and max-value entropy search, as well as the classical expected improvement heuristic. We also apply $\textsf{GP-DC}$ to optimize sequential integral observations with a variable integration range and verify its empirical efficiency on both synthetic and real-world datasets.
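Distance correlation, the dependence measure GP-DC builds on, can be computed from double-centered pairwise distance matrices. The abstract does not say how GP-DC turns it into an acquisition rule, so the Python sketch below shows only the standard sample statistic itself; the test data are hypothetical.

```python
import numpy as np


def _double_centered_distances(v) -> np.ndarray:
    """Pairwise Euclidean distance matrix, double-centered (row, column, grand means)."""
    v = np.asarray(v, dtype=float)
    v = v.reshape(len(v), -1)  # treat a 1-D input as n scalar observations
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y) -> float:
    """Sample distance correlation in [0, 1]; returns 0 if either input is constant."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()                   # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()  # product of squared distance variances
    if dvar2 <= 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


x = np.linspace(-1.0, 1.0, 21)
print(distance_correlation(x, 2.0 * x + 1.0))  # exact linear dependence
print(np.corrcoef(x, x ** 2)[0, 1])            # Pearson misses the symmetric quadratic
print(distance_correlation(x, x ** 2))         # distance correlation detects it
```

Pearson correlation is essentially zero for the symmetric quadratic, while distance correlation is clearly positive. In the population, distance correlation vanishes only under independence, which is what makes it attractive as a general dependence measure.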

Submitted 17 February, 2021; originally announced February 2021.

Comments: 10 pages

arXiv:1908.09102

doi 10.1088/2515-7639/ab3c45

Accelerating small-angle scattering experiments with simulation-based machine learning

Authors: Takuya Kanazawa, Akinori Asahara, Hidekazu Morita

Abstract: Making material experiments more efficient is a high priority for materials scientists who seek to discover new materials with desirable properties. In this paper, we investigate how to optimize the laborious sequential measurements of materials properties with data-driven methods, taking the small-angle neutron scattering (SANS) experiment as a test case. We propose two methods for optimizing sequential data sampling. These methods iteratively suggest the best target for the next measurement by performing a statistical analysis of the already acquired data, so that maximal information is gained at each step of an experiment. We conducted numerical simulations of SANS experiments for virtual materials and confirmed that the proposed methods significantly outperform baselines.

Submitted 24 August, 2019; originally announced August 2019.

Comments: 19 pages, 9 figures. Accepted for publication in Journal of Physics: Materials

Journal ref: J. Phys. Mater. 3 (2019) 015001

arXiv:1312.6916

doi 10.1109/CDC.2013.6760408

Game Theoretic Approach to the Stabilization of Heterogeneous Multiagent Systems Using Subsidy

Authors: Takuya Morimoto, Takafumi Kanazawa, Toshimitsu Ushio

Abstract: We consider a multiagent system consisting of selfish and heterogeneous agents. Its behavior is modeled by multipopulation replicator dynamics, where payoff functions of populations are different from each other. In general, there exist several equilibrium points in the replicator dynamics. In order to stabilize a desirable equilibrium point, we introduce a controller called a government which controls the behaviors of agents by offering them subsidies. In previous work, it is assumed that the government determines the subsidies based on the populations the agents belong to. In general, however, the government cannot identify the members of each population. In this paper, we assume that the government observes the action of each agent and determines the subsidies based on the observed action profile. Then, we model the controlled behaviors of the agents using replicator dynamics with feedback. We derive a stabilization condition of the target equilibrium point in the replicator dynamics.
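The multipopulation replicator dynamics mentioned in the abstract update each population's action shares by dx_i/dt = x_i (f_i(x) - sum_j x_j f_j(x)), where f_i is the payoff of action i in that population. The Python sketch below integrates these dynamics with Euler steps for a hypothetical two-population coordination game with two stable pure equilibria. The payoff matrices are assumptions, and the constant subsidy on action 0 is an open-loop stand-in chosen for illustration; it is not the paper's feedback law based on the observed action profile.

```python
import numpy as np

# Hypothetical heterogeneous payoffs for a 2-population, 2-action coordination game.
# A[i, j]: payoff to population 1 when it plays i and population 2 plays j.
# B[i, j]: payoff to population 2 in the same situation.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])


def replicator_step(x1, x2, dt, s1, s2):
    """One Euler step of two-population replicator dynamics with additive subsidies s1, s2."""
    f1 = A @ x2 + s1    # expected payoff of each action in population 1
    f2 = B.T @ x1 + s2  # expected payoff of each action in population 2
    x1 = x1 + dt * x1 * (f1 - x1 @ f1)
    x2 = x2 + dt * x2 * (f2 - x2 @ f2)
    return x1 / x1.sum(), x2 / x2.sum()  # renormalize against round-off


def simulate(subsidy: float, steps: int = 5000, dt: float = 0.01):
    """Integrate from a fixed initial state; the subsidy is paid on action 0 in both populations."""
    x1 = np.array([0.3, 0.7])
    x2 = np.array([0.3, 0.7])
    s = np.array([subsidy, 0.0])
    for _ in range(steps):
        x1, x2 = replicator_step(x1, x2, dt, s, s)
    return x1, x2


print(simulate(subsidy=0.0))  # settles where both populations play action 1
print(simulate(subsidy=1.0))  # the subsidy steers both populations to action 0
```

Without a subsidy, the trajectory from this initial state settles on the equilibrium where both populations play action 1. With the subsidy, action 0 becomes strictly better for population 1 whenever any of population 2 plays it, which pulls both populations to the other equilibrium.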

Submitted 24 December, 2013; originally announced December 2013.

Comments: 6 pages, IEEE Conference on Decision and Control, 2013

Showing 1–9 of 9 results for author: Kanazawa, T