-
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
Authors:
LLM-jp,
Akiko Aizawa,
Eiji Aramaki,
Bowen Chen,
Fei Cheng,
Hiroyuki Deguchi,
Rintaro Enomoto,
Kazuki Fujii,
Kensuke Fukumoto,
Takuya Fukushima,
Namgi Han,
Yuto Harada,
Chikara Hashimoto,
Tatsuya Hiraoka,
Shohei Hisada,
Sosuke Hosokawa,
Lu Jie,
Keisuke Kamata,
Teruhito Kanazawa,
Hiroki Kanezashi,
Hiroshi Kataoka,
Satoru Katsumata,
Daisuke Kawahara,
Seiya Kawano, et al. (57 additional authors not shown)
Abstract:
This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop strong, open-source Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together toward this goal. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.
Submitted 4 July, 2024;
originally announced July 2024.
-
Compositionality-Aware Graph2Seq Learning
Authors:
Takeshi D. Itoh,
Takatomi Kubo,
Kazushi Ikeda
Abstract:
Graphs are a highly expressive data structure, but it is often difficult for humans to find patterns in a complex graph. Hence, generating human-interpretable sequences from graphs, called graph2seq learning, has gained interest. In many graph2seq tasks, the compositionality of a graph is expected to be associated with the compositionality of the output sequence. Therefore, applying a compositionality-aware GNN architecture should improve model performance. In this study, we adopt the multi-level attention pooling (MLAP) architecture, which can aggregate graph representations from multiple levels of information locality. As a real-world example, we take up the extreme source code summarization task, in which a model estimates the name of a program function from its source code. We demonstrate that the model with the MLAP architecture outperforms the previous state-of-the-art model while using less than one-seventh as many parameters.
Submitted 28 January, 2022;
originally announced January 2022.
-
Mutualized oblivious DNS ($μ$ODNS): Hiding a tree in the wild forest
Authors:
Jun Kurihara,
Takeshi Kubo
Abstract:
The traditional Domain Name System (DNS) lacks fundamental security and privacy features in its design. As privacy concerns on the Internet have grown, security and privacy enhancements to DNS have been actively investigated and deployed. In particular, for user privacy in DNS queries, several relay-based anonymization schemes have recently been introduced; however, they are vulnerable to collusion between a relay and a full-service resolver, i.e., users' identities cannot be hidden from the resolver. This paper introduces a new concept of multiple-relay-based DNS for user anonymity in DNS queries, called mutualized oblivious DNS ($μ$ODNS), which extends the concept of existing relay-based schemes. $μ$ODNS introduces a small and reasonable assumption: each user has at least one trusted/dedicated relay in the network and mutually shares it with others. The user simply sets the dedicated relay as the next hop (the first relay), which conveys the user's queries to the resolver, and randomly chooses zero or more subsequent relays shared by other entities. Under this small assumption, the user's identity is concealed from a target resolver in $μ$ODNS even if a certain (unknown) subset of relays colludes with the resolver. That is, in $μ$ODNS, users can preserve their privacy and anonymity simply by paying the small cost of sharing their resources. Moreover, we present a PoC implementation of $μ$ODNS that is publicly available on the Internet. We also show, by measuring the round-trip time of queries, that our PoC implementation of $μ$ODNS achieves performance comparable to existing relay-based schemes.
Submitted 7 June, 2021; v1 submitted 28 April, 2021;
originally announced April 2021.
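The relay-selection step described in the μODNS abstract (dedicated relay first, then zero or more randomly chosen shared relays, then the target resolver) can be sketched as follows. This is a minimal illustration, not the authors' PoC: the function name `build_relay_path`, the parameter `max_extra`, and the choice of a uniformly random number of extra relays are hypothetical assumptions, and query encryption is omitted entirely.

```python
import random


def build_relay_path(dedicated_relay, shared_relays, resolver, max_extra, rng=None):
    """Hypothetical sketch of muODNS path selection (not the paper's code).

    Returns an ordered path: the user's dedicated relay first, then between
    0 and max_extra distinct shared relays chosen at random, then the resolver.
    Assumption: the number of extra relays is drawn uniformly at random.
    """
    if rng is None:
        rng = random.Random()
    # Exclude the user's own dedicated relay so it cannot appear twice.
    candidates = [r for r in shared_relays if r != dedicated_relay]
    # Number of subsequent relays; zero is allowed, as in the abstract.
    k = rng.randint(0, min(max_extra, len(candidates)))
    extra = rng.sample(candidates, k)  # distinct relays, random order
    return [dedicated_relay] + extra + [resolver]


# Usage with made-up relay names and a fixed seed for reproducibility.
path = build_relay_path("relay-A", ["relay-A", "relay-B", "relay-C"],
                        "resolver-X", max_extra=2, rng=random.Random(0))
print(path)
```

Fixing the dedicated relay as the first hop mirrors the abstract's trust assumption; randomizing the remaining hops is what makes the set of colluding relays hard to target.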
-
Multi-Level Attention Pooling for Graph Neural Networks: Unifying Graph Representations with Multiple Localities
Authors:
Takeshi D. Itoh,
Takatomi Kubo,
Kazushi Ikeda
Abstract:
Graph neural networks (GNNs) have been widely used to learn vector representations of graph-structured data and have achieved better task performance than conventional methods. The foundation of GNNs is the message passing procedure, which propagates the information at each node to its neighbors. Since this procedure proceeds one step per layer, the range of information propagation among nodes is small in the lower layers and expands toward the higher layers. Therefore, a GNN model has to be deep enough to capture global structural information in a graph. On the other hand, deep GNN models are known to suffer from performance degradation because, through many message passing steps, they lose nodes' local information, which may be essential for good model performance. In this study, we propose multi-level attention pooling (MLAP) for graph-level classification tasks, which can adapt to both local and global structural information in a graph. It has an attention pooling layer for each message passing step and computes the final graph representation by unifying the layer-wise graph representations. The MLAP architecture allows models to utilize the structural information of graphs at multiple levels of locality because it preserves layer-wise information before that information is lost to oversmoothing. Our experimental results show that the MLAP architecture improves graph classification performance compared to the baseline architectures. In addition, analyses of the layer-wise graph representations suggest that aggregating information from multiple levels of locality indeed has the potential to improve the discriminability of learned graph representations.
Submitted 31 October, 2021; v1 submitted 2 March, 2021;
originally announced March 2021.
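The MLAP readout described above (one attention pooling per message passing step, then a unification of the layer-wise graph vectors) might look roughly like the pure-Python sketch below. Assumptions, not taken from the abstract: attention scores are dot products of node embeddings with a per-layer weight vector `w` (learned in a real model, fixed here), and the layer-wise vectors are unified by plain summation. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(node_embs, w):
    """Pool one layer's node embeddings into a graph vector.

    Each node i gets weight softmax_i(h_i . w); the graph vector is the
    attention-weighted sum of node embeddings.
    """
    scores = [sum(h_j * w_j for h_j, w_j in zip(h, w)) for h in node_embs]
    alphas = softmax(scores)
    dim = len(node_embs[0])
    return [sum(a * h[d] for a, h in zip(alphas, node_embs)) for d in range(dim)]


def mlap_readout(layer_embs, layer_ws):
    """Apply one attention pooling per message passing layer, then unify
    the layer-wise graph vectors by summation (an assumed unification)."""
    pooled = [attention_pool(h, w) for h, w in zip(layer_embs, layer_ws)]
    dim = len(pooled[0])
    return [sum(g[d] for g in pooled) for d in range(dim)]


# Two layers of embeddings for a 2-node graph; zero weights give mean pooling.
layers = [[[1.0, 0.0], [3.0, 2.0]], [[0.0, 2.0], [2.0, 2.0]]]
print(mlap_readout(layers, [[0.0, 0.0], [0.0, 0.0]]))
```

Because each layer contributes its own pooled vector, information from shallow (local) layers reaches the final representation directly instead of only through the deepest layer.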
-
Evaluation of User Dynamics Created by Weak Ties among Divided Communities
Authors:
Takahiro Kubo,
Chisa Takano,
Masaki Aida
Abstract:
Flaming phenomena represent the divergence in the strength of user dynamics created by user interactions in online social networks (OSNs). Although it is known that flaming phenomena occur when the Laplacian matrix of the OSN has non-real eigenvalues, it was recently shown that they may occur even if all the eigenvalues are real. This effect appears only when some eigenvalues are degenerate and a special unitary transformation is applied to the equations representing the user dynamics; whether actual OSNs satisfy this condition has not been fully discussed. In this paper, we clarify that the user dynamics caused by the degeneracy of eigenvalue 0 is one specific example of this condition. We also investigate the mechanism and characteristics of flaming phenomena generated by degenerate eigenvalues. Furthermore, we demonstrate through numerical simulations that the degeneracy of eigenvalues can cause divergence.
Submitted 11 January, 2021;
originally announced January 2021.
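A standard fact helps connect "divided communities" to the degeneracy of eigenvalue 0: for an undirected graph with non-negative edge weights, the multiplicity of eigenvalue 0 of the Laplacian L = D − A equals the number of connected components. The sketch below counts components with union-find to obtain that multiplicity. It assumes an undirected network and is offered only as a way to check the degeneracy on a given graph; it is not the paper's analysis method, and the function name is hypothetical.

```python
def zero_eigenvalue_multiplicity(n, edges):
    """Multiplicity of eigenvalue 0 of the Laplacian L = D - A of an
    undirected graph on nodes 0..n-1, computed as the number of connected
    components (a standard spectral graph theory result)."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components
    return len({find(x) for x in range(n)})


# Two disconnected triangles: eigenvalue 0 is doubly degenerate.
two_communities = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
print(zero_eigenvalue_multiplicity(6, two_communities))
```

Adding a single tie between the two triangles merges them into one component, so the zero eigenvalue is no longer degenerate.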
-
Detecting Unknown Behaviors by Pre-defined Behaviours: An Bayesian Non-parametric Approach
Authors:
Jin Watanabe,
Takatomi Kubo,
Fan Yang,
Kazushi Ikeda
Abstract:
An automatic mouse behavior recognition system can considerably reduce the workload of experimenters and facilitate the analysis process. Typically, supervised, unsupervised, and semi-supervised approaches are applied to behavior recognition in a setting where all behaviors are predefined. In real situations, however, mice can exhibit various types of behavior, so many undefined behaviors occur in addition to the predefined behaviors that we want to analyze. Neither supervised approaches nor conventional semi-supervised approaches can identify these undefined behaviors. Although unsupervised approaches can detect them, post-hoc labeling is required. In this paper, we propose a semi-supervised infinite Gaussian mixture model (SsIGMM) that incorporates both labeled and unlabeled data in the learning process while accounting for undefined behaviors. It also models the distributions of the predefined and undefined behaviors as Gaussian mixtures, which can be used for further analysis. Our experiments confirmed the superiority of SsIGMM in segmenting and labeling mouse-behavior videos.
Submitted 11 December, 2019; v1 submitted 25 November, 2019;
originally announced November 2019.
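Infinite Gaussian mixtures are commonly built on a Dirichlet process, whose Chinese restaurant process (CRP) prior lets a new observation join an existing cluster k with weight proportional to its size n_k, or open a new cluster with weight proportional to a concentration α. The sketch below computes one assignment step for a scalar feature: clusters seeded by labeled data could stand for predefined behaviors, while the "new cluster" option leaves room for undefined ones. This is an assumed simplification, not the SsIGMM inference from the paper: cluster means and standard deviations are fixed, and the new-cluster likelihood uses a hypothetical broad Gaussian (`new_mean`, `new_std`) in place of a proper prior predictive.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def assignment_posterior(x, clusters, alpha, new_mean, new_std):
    """CRP-weighted assignment probabilities for a scalar observation x.

    clusters: list of (count, mean, std) for existing clusters.
    Returns normalized probabilities; the last entry is the new cluster.
    """
    total = sum(count for count, _, _ in clusters) + alpha
    # Existing cluster k: prior n_k / (N + alpha) times its likelihood.
    weights = [count / total * gaussian_pdf(x, m, s) for count, m, s in clusters]
    # New cluster: prior alpha / (N + alpha) times an assumed broad likelihood.
    weights.append(alpha / total * gaussian_pdf(x, new_mean, new_std))
    z = sum(weights)
    return [w / z for w in weights]


# Two clusters seeded from labeled frames (made-up numbers).
clusters = [(3, 0.0, 1.0), (1, 5.0, 1.0)]
print(assignment_posterior(20.0, clusters, alpha=1.0, new_mean=20.0, new_std=5.0))
```

An observation far from every existing cluster ends up with most of its probability on the new cluster, which is the mechanism that lets a nonparametric mixture flag previously unseen behaviors.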
-
Analyzing Insect-Plant Predation Data By Bayesian Nonparametrics
Authors:
Fan Yang,
Takatomi Kubo,
Kazushi Ikeda
Abstract:
From the perspective of ecology and biology, studying insect-plant predation can contribute considerably to pest control, benefit agriculture and afforestation, and help people better understand insect-plant co-evolution. Motivated by this, we address two tasks in this study. The first is to cluster insect-plant predation so that unobserved predation can be estimated. The second is to explore the connection between predation and biotaxonomy; we find that insects exhibit greater divergence than plants during insect-plant co-evolution.
Submitted 11 December, 2019; v1 submitted 25 November, 2019;
originally announced November 2019.
-
A Hierarchical Mixture Density Network
Authors:
Fan Yang,
Jaymar Soriano,
Takatomi Kubo,
Kazushi Ikeda
Abstract:
The relationship among three correlated variables can be very complex; as a result, we may not be able to find their hidden causality and model their relationship explicitly. However, we can still make a best guess about possible mappings among these variables based on the observed relationship. One such complicated relationship among three correlated variables is a two-layer hierarchical many-to-many mapping. In this paper, we propose a Hierarchical Mixture Density Network (HMDN) to model this two-layer hierarchical many-to-many mapping. We apply HMDN to an indoor positioning problem and show its benefits.
Submitted 23 October, 2019;
originally announced October 2019.
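A mixture density network handles many-to-many mappings by predicting, for each input, the weights, means, and standard deviations of a Gaussian mixture over the output, and it is trained by minimizing the negative log-likelihood (NLL) of the observed target under that mixture. The abstract does not specify how HMDN stacks two such layers, so the sketch below covers only the standard single-layer MDN loss for a scalar target. It uses log-sum-exp for numerical stability, and the function name `mixture_nll` is hypothetical.

```python
import math


def mixture_nll(y, weights, means, stds):
    """Negative log-likelihood of scalar y under sum_k w_k * N(y; mu_k, sigma_k^2).

    weights must be positive and sum to 1 (e.g., a softmax output);
    stds must be positive (e.g., an exp output).
    """
    log_terms = []
    for w, mu, s in zip(weights, means, stds):
        log_norm = -0.5 * math.log(2 * math.pi) - math.log(s)
        log_terms.append(math.log(w) + log_norm - 0.5 * ((y - mu) / s) ** 2)
    m = max(log_terms)
    # log sum_k exp(t_k), computed stably by factoring out the maximum.
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))


# A bimodal prediction: one input, two plausible outputs near -3 and +3.
w, mu, sd = [0.5, 0.5], [-3.0, 3.0], [1.0, 1.0]
print(mixture_nll(3.0, w, mu, sd), mixture_nll(0.0, w, mu, sd))
```

Targets near either mode get a low loss while the midpoint is penalized, which is why a mixture output suits one-to-many mappings better than a single regression output.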
-
Towards Generation of Visual Attention Map for Source Code
Authors:
Takeshi D. Itoh,
Takatomi Kubo,
Kiyoka Ikeda,
Yuki Maruno,
Yoshiharu Ikutani,
Hideaki Hata,
Kenichi Matsumoto,
Kazushi Ikeda
Abstract:
Program comprehension is a dominant process in software development and maintenance. Experts are thought to comprehend source code efficiently by directing their gaze, or attention, to its important components. However, reflecting the importance of components remains an open issue in gaze behavior analysis for source code comprehension. Here we present a conceptual framework for comparing the quantified importance of source code components with the gaze behavior of programmers. We use "attention" in attention models (e.g., code2vec) as the importance indices for source code components and evaluate programmers' gaze locations based on the quantified importance. In this report, we introduce our gaze behavior analysis using the attention map and present the results of a preliminary experiment.
Submitted 13 August, 2019; v1 submitted 14 July, 2019;
originally announced July 2019.
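One way to compare a model's attention map with programmers' gaze, as the framework above proposes, is a rank correlation between per-token attention weights and per-token gaze measures such as fixation duration. The abstract does not name a comparison metric, so Spearman's rank correlation is an assumed choice here, and the token-level data in the usage line are made up. Ties receive average ranks, and inputs must not be constant (otherwise the denominator is zero).

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-token attention weights vs. fixation durations (ms).
print(spearman([0.1, 0.5, 0.2, 0.9], [100, 300, 150, 400]))
```

A rank-based measure is used because attention weights and fixation durations live on different scales; only their orderings over tokens are compared.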
-
Toward Imitating Visual Attention of Experts in Software Development Tasks
Authors:
Yoshiharu Ikutani,
Nishanth Koganti,
Hideaki Hata,
Takatomi Kubo,
Kenichi Matsumoto
Abstract:
Expert programmers' eye movements during source code reading are valuable sources of information that are considered to be associated with their domain expertise. We advocate a vision of new intelligent systems that incorporate experts' knowledge into software development tasks such as issue localization, comment generation, and code generation. We present a conceptual framework of neural autonomous agents based on imitation learning (IL), which enables agents to mimic the visual attention of an expert through their eye movements. In this framework, an autonomous agent is constructed as a context-based attention model that consists of an encoder/decoder network and is trained with state-action sequences generated from expert demonstrations. This paper discusses the challenges of implementing an IL-based autonomous agent specialized for software development tasks.
Submitted 14 March, 2019;
originally announced March 2019.
-
A Fundamental Inequality for Lower-bounding the Error Probability for Classical and Quantum Multiple Access Channels and Its Applications
Authors:
Takuya Kubo,
Hiroshi Nagaoka
Abstract:
In the study of the capacity problem for multiple access channels (MACs), a lower bound on the error probability obtained by Han plays a crucial role in the converse parts of several kinds of channel coding theorems in the information-spectrum framework. Recently, Yagi and Oohama showed a bound tighter than the Han bound by means of Polyanskiy's converse. In this paper, we give a new bound that generalizes and strengthens the Yagi-Oohama bound, and we demonstrate that the bound plays a fundamental role in deriving extensions of several known bounds. In particular, the Yagi-Oohama bound is generalized in two different directions, i.e., to general input distributions and to general encoders. In addition, we extend these bounds to quantum MACs and apply them to the converse problems for several information-spectrum settings.
Submitted 24 March, 2015;
originally announced March 2015.
-
Motivating Smartphone Collaboration in Data Acquisition and Distributed Computing
Authors:
Lingjie Duan,
Takeshi Kubo,
Kohei Sugiyama,
Jianwei Huang,
Teruyuki Hasegawa,
Jean Walrand
Abstract:
This paper analyzes and compares different incentive mechanisms for a master to motivate the collaboration of smartphone users in both data acquisition and distributed computing applications. To collect large amounts of sensitive data from users, we propose a reward-based collaboration mechanism, in which the master announces a total reward to be shared among collaborators, and the collaboration succeeds if enough users want to collaborate. We show that if the master knows the users' collaboration costs, then he can choose to involve only the users with the lowest costs. However, without knowing users' private information, he needs to offer a larger total reward to attract enough collaborators. Users benefit from knowing their costs before the data acquisition. Perhaps surprisingly, the master may benefit as the variance of users' cost distribution increases.
To utilize smartphones' computation resources to solve complex computing problems, we study how the master can design an optimal contract by specifying different task-reward combinations for different user types. Under complete information, we show that the master involves a user type as long as the master's preference characteristic outweighs that type's unit cost. All collaborators achieve zero payoff in this case. If the master does not know users' private cost information, however, he will conservatively target a smaller group of users with low costs and has to give most of the benefits to the collaborators.
Submitted 26 January, 2014;
originally announced January 2014.
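Both complete-information results in this abstract can be illustrated with small calculations. The modeling choices here are assumptions for illustration, not the paper's exact models. For the reward-based mechanism, the total reward R is split equally among the `required` collaborators, so involving the lowest-cost users requires R / required to cover the largest of their costs. For the contract, the master is assumed to derive utility β·ln(1 + x) from a task of size x, and a user of unit cost θ bears cost θ·x. Maximizing β·ln(1 + x) − θ·x gives x* = β/θ − 1, which is positive exactly when β > θ, and the reward equals the user's cost, so the payoff is zero. Both function names are hypothetical.

```python
def min_total_reward(costs, required):
    """Equal-sharing reward mechanism under complete information (assumed model).

    Picks the `required` lowest-cost users; each receives R / required, so R
    must equal required times the largest cost among them.
    Returns (total_reward, chosen_costs).
    """
    if required <= 0 or required > len(costs):
        raise ValueError("required must be between 1 and len(costs)")
    chosen = sorted(costs)[:required]
    return required * chosen[-1], chosen


def first_best_contract(beta, unit_cost):
    """Complete-information contract for one user type (assumed log utility).

    Master maximizes beta*ln(1+x) - unit_cost*x; setting the derivative
    beta/(1+x) - unit_cost to zero gives x* = beta/unit_cost - 1.
    Returns (task_size, reward); reward equals the user's cost (zero payoff).
    """
    if beta <= unit_cost:
        return 0.0, 0.0  # type not involved: preference does not outweigh cost
    task = beta / unit_cost - 1
    return task, unit_cost * task


print(min_total_reward([5, 1, 3, 2], required=2))
print(first_best_contract(beta=4.0, unit_cost=2.0))
```

The zero-payoff outcome follows directly from complete information: the master can set each reward equal to the user's cost. The abstract notes that with private costs the master instead has to concede information rents to collaborators.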