-
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
Authors:
LLM-jp,
:,
Akiko Aizawa,
Eiji Aramaki,
Bowen Chen,
Fei Cheng,
Hiroyuki Deguchi,
Rintaro Enomoto,
Kazuki Fujii,
Kensuke Fukumoto,
Takuya Fukushima,
Namgi Han,
Yuto Harada,
Chikara Hashimoto,
Tatsuya Hiraoka,
Shohei Hisada,
Sosuke Hosokawa,
Lu Jie,
Keisuke Kamata,
Teruhito Kanazawa,
Hiroki Kanezashi,
Hiroshi Kataoka,
Satoru Katsumata,
Daisuke Kawahara,
Seiya Kawano
, et al. (57 additional authors not shown)
Abstract:
This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its…
▽ More
This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
Authors:
Jesse Atuhurra,
Iqra Ali,
Tatsuya Hiraoka,
Hidetaka Kamigaito,
Tomoya Iwakura,
Taro Watanabe
Abstract:
Large language models (LLMs) have increased interest in vision language models (VLMs), which process image-text pairs as input. Studies investigating the visual understanding ability of VLMs have been proposed, but such studies are still preliminary because existing datasets do not permit a comprehensive evaluation of the fine-grained visual linguistic abilities of VLMs across multiple languages.…
▽ More
Large language models (LLMs) have increased interest in vision language models (VLMs), which process image-text pairs as input. Studies investigating the visual understanding ability of VLMs have been proposed, but such studies are still preliminary because existing datasets do not permit a comprehensive evaluation of the fine-grained visual linguistic abilities of VLMs across multiple languages. To further explore the strengths of VLMs, such as GPT-4V \cite{openai2023GPT4}, we developed new datasets for the systematic and qualitative analysis of VLMs. Our contribution is four-fold: 1) we introduced nine vision-and-language (VL) tasks (including object recognition, image-text matching, and more) and constructed multilingual visual-text datasets in four languages: English, Japanese, Swahili, and Urdu through utilizing templates containing \textit{questions} and prompting GPT4-V to generate the \textit{answers} and the \textit{rationales}, 2) introduced a new VL task named \textit{unrelatedness}, 3) introduced rationales to enable human understanding of the VLM reasoning process, and 4) employed human evaluation to measure the suitability of proposed datasets for VL tasks. We show that VLMs can be fine-tuned on our datasets. Our work is the first to conduct such analyses in Swahili and Urdu. Also, it introduces \textit{rationales} in VL analysis, which played a vital role in the evaluation.
△ Less
Submitted 29 March, 2024;
originally announced June 2024.
-
Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences
Authors:
Takuya Hiraoka,
Guanquan Wang,
Takashi Onishi,
Yoshimasa Tsuruoka
Abstract:
In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance. Information about the influence of these experiences is valuable for various purposes, such as identifying experiences that negatively influence poorly performing RL agents. One method for estimating the influence of experiences is the leave-one-out (LOO) method. Howev…
▽ More
In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance. Information about the influence of these experiences is valuable for various purposes, such as identifying experiences that negatively influence poorly performing RL agents. One method for estimating the influence of experiences is the leave-one-out (LOO) method. However, this method is usually computationally prohibitive. In this paper, we present Policy Iteration with Turn-over Dropout (PIToD), which efficiently estimates the influence of experiences. We evaluate how accurately PIToD estimates the influence of experiences and its efficiency compared to LOO. We then apply PIToD to amend poorly performing RL agents, i.e., we use PIToD to estimate negatively influential experiences for the RL agents and to delete the influence of these experiences. We show that RL agents' performance is significantly improved via amendments with PIToD.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
Authors:
Marco Cognetta,
Tatsuya Hiraoka,
Naoaki Okazaki,
Rico Sennrich,
Yuval Pinter
Abstract:
We explore threshold vocabulary trimming in Byte-Pair Encoding subword tokenization, a postprocessing step that replaces rare subwords with their component subwords. The technique is available in popular tokenization libraries but has not been subjected to rigorous scientific scrutiny. While the removal of rare subwords is suggested as best practice in machine translation implementations, both as…
▽ More
We explore threshold vocabulary trimming in Byte-Pair Encoding subword tokenization, a postprocessing step that replaces rare subwords with their component subwords. The technique is available in popular tokenization libraries but has not been subjected to rigorous scientific scrutiny. While the removal of rare subwords is suggested as best practice in machine translation implementations, both as a means to reduce model size and for improving model performance through robustness, our experiments indicate that, across a large space of hyperparameter settings, vocabulary trimming fails to improve performance, and is even prone to incurring heavy degradation.
△ Less
Submitted 30 March, 2024;
originally announced April 2024.
-
Knowledge of Pretrained Language Models on Surface Information of Tokens
Authors:
Tatsuya Hiraoka,
Naoaki Okazaki
Abstract:
Do pretrained language models have knowledge regarding the surface information of tokens? We examined the surface information stored in word or subword embeddings acquired by pretrained language models from the perspectives of token length, substrings, and token constitution. Additionally, we evaluated the ability of models to generate knowledge regarding token surfaces. We focused on 12 pretraine…
▽ More
Do pretrained language models have knowledge regarding the surface information of tokens? We examined the surface information stored in word or subword embeddings acquired by pretrained language models from the perspectives of token length, substrings, and token constitution. Additionally, we evaluated the ability of models to generate knowledge regarding token surfaces. We focused on 12 pretrained language models that were mainly trained on English and Japanese corpora. Experimental results demonstrate that pretrained language models have knowledge regarding token length and substrings but not token constitution. Additionally, the results imply that there is a bottleneck on the decoder side in terms of effectively utilizing acquired knowledge.
△ Less
Submitted 22 February, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Efficient Sparse-Reward Goal-Conditioned Reinforcement Learning with a High Replay Ratio and Regularization
Authors:
Takuya Hiraoka
Abstract:
Reinforcement learning (RL) methods with a high replay ratio (RR) and regularization have gained interest due to their superior sample efficiency. However, these methods have mainly been developed for dense-reward tasks. In this paper, we aim to extend these RL methods to sparse-reward goal-conditioned tasks. We use Randomized Ensemble Double Q-learning (REDQ) (Chen et al., 2021), an RL method wit…
▽ More
Reinforcement learning (RL) methods with a high replay ratio (RR) and regularization have gained interest due to their superior sample efficiency. However, these methods have mainly been developed for dense-reward tasks. In this paper, we aim to extend these RL methods to sparse-reward goal-conditioned tasks. We use Randomized Ensemble Double Q-learning (REDQ) (Chen et al., 2021), an RL method with a high RR and regularization. To apply REDQ to sparse-reward goal-conditioned tasks, we make the following modifications to it: (i) using hindsight experience replay and (ii) bounding target Q-values. We evaluate REDQ with these modifications on 12 sparse-reward goal-conditioned tasks of Robotics (Plappert et al., 2018), and show that it achieves about $2 \times$ better sample efficiency than previous state-of-the-art (SoTA) RL methods. Furthermore, we reconsider the necessity of specific components of REDQ and simplify it by removing unnecessary ones. The simplified REDQ with our modifications achieves $\sim 8 \times$ better sample efficiency than the SoTA methods in 4 Fetch tasks of Robotics.
△ Less
Submitted 10 December, 2023;
originally announced December 2023.
-
HumanMimic: Learning Natural Locomotion and Transitions for Humanoid Robot via Wasserstein Adversarial Imitation
Authors:
Annan Tang,
Takuma Hiraoka,
Naoki Hiraoka,
Fan Shi,
Kento Kawaharazuka,
Kunio Kojima,
Kei Okada,
Masayuki Inaba
Abstract:
Transferring human motion skills to humanoid robots remains a significant challenge. In this study, we introduce a Wasserstein adversarial imitation learning system, allowing humanoid robots to replicate natural whole-body locomotion patterns and execute seamless transitions by mimicking human motions. First, we present a unified primitive-skeleton motion retargeting to mitigate morphological diff…
▽ More
Transferring human motion skills to humanoid robots remains a significant challenge. In this study, we introduce a Wasserstein adversarial imitation learning system, allowing humanoid robots to replicate natural whole-body locomotion patterns and execute seamless transitions by mimicking human motions. First, we present a unified primitive-skeleton motion retargeting to mitigate morphological differences between arbitrary human demonstrators and humanoid robots. An adversarial critic component is integrated with Reinforcement Learning (RL) to guide the control policy to produce behaviors aligned with the data distribution of mixed reference motions. Additionally, we employ a specific Integral Probabilistic Metric (IPM), namely the Wasserstein-1 distance with a novel soft boundary constraint to stabilize the training process and prevent mode collapse. Our system is evaluated on a full-sized humanoid JAXON in the simulator. The resulting control policy demonstrates a wide range of locomotion patterns, including standing, push-recovery, squat walking, human-like straight-leg walking, and dynamic running. Notably, even in the absence of transition motions in the demonstration dataset, robots showcase an emerging ability to transit naturally between distinct locomotion patterns as desired speed changes.
△ Less
Submitted 23 April, 2024; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Enhancing the Driver's Comprehension of ADS's System Limitations: An HMI for Providing Request-to-Intervene Trigger Information
Authors:
Ryuji Matsuo,
Hailong Liu,
Toshihiro Hiraoka,
Takahiro Wada
Abstract:
Level 3 automated driving systems (ADS) have attracted significant attention and are being commercialized. A Level 3 ADS prompts the driver to take control by requesting to intervene (RtI) when its operational design domain (ODD) or system limitations are exceeded. However, complex traffic situations may lead drivers to perceive multiple potential triggers of RtI simultaneously, causing hesitation…
▽ More
Level 3 automated driving systems (ADS) have attracted significant attention and are being commercialized. A Level 3 ADS prompts the driver to take control by requesting to intervene (RtI) when its operational design domain (ODD) or system limitations are exceeded. However, complex traffic situations may lead drivers to perceive multiple potential triggers of RtI simultaneously, causing hesitation or confusion during take-over. Therefore, drivers must clearly understand the ADS's system limitations to understand the triggers of RtI and ensure safe take-over. In this study, we propose a voice-based HMI for providing RtI trigger cues to help drivers understand ADS's system limitations. The results of a between-group experiment using a driving simulator showed that incorporating effective trigger cues into the RtI enabled drivers to comprehend the ADS's system limitations better and reduce collisions. It also improved the subjective evaluations of drivers, such as the comprehensibility of system limitations, hesitation in response to RtI, and acceptance of ADS behaviors when encountering RtI while using the ADS. Therefore, enhanced comprehension resulting from trigger cues is essential for promoting a safer and better user experience using ADS during RtI.
△ Less
Submitted 2 June, 2023;
originally announced June 2023.
-
Unsupervised Discovery of Continuous Skills on a Sphere
Authors:
Takahisa Imagawa,
Takuya Hiraoka,
Yoshimasa Tsuruoka
Abstract:
Recently, methods for learning diverse skills to generate various behaviors without external rewards have been actively studied as a form of unsupervised reinforcement learning. However, most of the existing methods learn a finite number of discrete skills, and thus the variety of behaviors that can be exhibited with the learned skills is limited. In this paper, we propose a novel method for learn…
▽ More
Recently, methods for learning diverse skills to generate various behaviors without external rewards have been actively studied as a form of unsupervised reinforcement learning. However, most of the existing methods learn a finite number of discrete skills, and thus the variety of behaviors that can be exhibited with the learned skills is limited. In this paper, we propose a novel method for learning potentially an infinite number of different skills, which is named discovery of continuous skills on a sphere (DISCS). In DISCS, skills are learned by maximizing mutual information between skills and states, and each skill corresponds to a continuous value on a sphere. Because the representations of skills in DISCS are continuous, infinitely diverse skills could be learned. We examine existing methods and DISCS in the MuJoCo Ant robot control environments and show that DISCS can learn much more diverse skills than the other methods.
△ Less
Submitted 25 May, 2023; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Tokenization Preference for Human and Machine Learning Model: An Annotation Study
Authors:
Tatsuya Hiraoka,
Tomoya Iwakura
Abstract:
Is preferred tokenization for humans also preferred for machine-learning (ML) models? This study examines the relations between preferred tokenization for humans (appropriateness and readability) and one for ML models (performance on an NLP task). The question texts of the Japanese commonsense question-answering dataset are tokenized with six different tokenizers, and the performances of human ann…
▽ More
Is preferred tokenization for humans also preferred for machine-learning (ML) models? This study examines the relations between preferred tokenization for humans (appropriateness and readability) and one for ML models (performance on an NLP task). The question texts of the Japanese commonsense question-answering dataset are tokenized with six different tokenizers, and the performances of human annotators and ML models were compared. Furthermore, we analyze relations among performance of answers by human and ML model, the appropriateness of tokenization for human, and response time to questions by human. This study provides a quantitative investigation result that shows that preferred tokenizations for humans and ML models are not necessarily always the same. The result also implies that existing methods using language models for tokenization could be a good compromise both for human and ML models.
△ Less
Submitted 16 February, 2024; v1 submitted 21 April, 2023;
originally announced April 2023.
-
Downstream Task-Oriented Neural Tokenizer Optimization with Vocabulary Restriction as Post Processing
Authors:
Tatsuya Hiraoka,
Tomoya Iwakura
Abstract:
This paper proposes a method to optimize tokenization for the performance improvement of already trained downstream models. Our method generates tokenization results attaining lower loss values of a given downstream model on the training data for restricting vocabularies and trains a tokenizer reproducing the tokenization results. Therefore, our method can be applied to variety of tokenization met…
▽ More
This paper proposes a method to optimize tokenization for the performance improvement of already trained downstream models. Our method generates tokenization results attaining lower loss values of a given downstream model on the training data for restricting vocabularies and trains a tokenizer reproducing the tokenization results. Therefore, our method can be applied to variety of tokenization methods, while existing work cannot due to the simultaneous learning of the tokenizer and the downstream model. This paper proposes an example of the BiLSTM-based tokenizer with vocabulary restriction, which can capture wider contextual information for the tokenization process than non-neural-based tokenization methods used in existing work. Experimental results on text classification in Japanese, Chinese, and English text classification tasks show that the proposed method improves performance compared to the existing methods for tokenization optimization.
△ Less
Submitted 21 April, 2023;
originally announced April 2023.
-
Which Experiences Are Influential for Your Agent? Policy Iteration with Turn-over Dropout
Authors:
Takuya Hiraoka,
Takashi Onishi,
Yoshimasa Tsuruoka
Abstract:
In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance. Information about the influence is valuable for various purposes, including experience cleansing and analysis. One method for estimating the influence of individual experiences is agent comparison, but it is prohibitively expensive when there is a large number of exper…
▽ More
In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance. Information about the influence is valuable for various purposes, including experience cleansing and analysis. One method for estimating the influence of individual experiences is agent comparison, but it is prohibitively expensive when there is a large number of experiences. In this paper, we present PI+ToD as a method for efficiently estimating the influence of experiences. PI+ToD is a policy iteration that efficiently estimates the influence of experiences by utilizing turn-over dropout. We demonstrate the efficiency of PI+ToD with experiments in MuJoCo environments.
△ Less
Submitted 22 May, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.
-
MaxMatch-Dropout: Subword Regularization for WordPiece
Authors:
Tatsuya Hiraoka
Abstract:
We present a subword regularization method for WordPiece, which uses a maximum matching algorithm for tokenization. The proposed method, MaxMatch-Dropout, randomly drops words in a search using the maximum matching algorithm. It realizes finetuning with subword regularization for popular pretrained language models such as BERT-base. The experimental results demonstrate that MaxMatch-Dropout improv…
▽ More
We present a subword regularization method for WordPiece, which uses a maximum matching algorithm for tokenization. The proposed method, MaxMatch-Dropout, randomly drops words in a search using the maximum matching algorithm. It realizes finetuning with subword regularization for popular pretrained language models such as BERT-base. The experimental results demonstrate that MaxMatch-Dropout improves the performance of text classification and machine translation tasks as well as other subword regularization methods. Moreover, we provide a comparative analysis of subword regularization methods: subword regularization with SentencePiece (Unigram), BPE-Dropout, and MaxMatch-Dropout.
△ Less
Submitted 9 September, 2022;
originally announced September 2022.
-
Single Model Ensemble for Subword Regularized Models in Low-Resource Machine Translation
Authors:
Sho Takase,
Tatsuya Hiraoka,
Naoaki Okazaki
Abstract:
Subword regularizations use multiple subword segmentations during training to improve the robustness of neural machine translation models. In previous subword regularizations, we use multiple segmentations in the training process but use only one segmentation in the inference. In this study, we propose an inference strategy to address this discrepancy. The proposed strategy approximates the margin…
▽ More
Subword regularizations use multiple subword segmentations during training to improve the robustness of neural machine translation models. In previous subword regularizations, we use multiple segmentations in the training process but use only one segmentation in the inference. In this study, we propose an inference strategy to address this discrepancy. The proposed strategy approximates the marginalized likelihood by using multiple segmentations including the most plausible segmentation and several sampled segmentations. Because the proposed strategy aggregates predictions from several segmentations, we can regard it as a single model ensemble that does not require any additional cost for training. Experimental results show that the proposed strategy improves the performance of models trained with subword regularization in low-resource machine translation tasks.
△ Less
Submitted 25 March, 2022;
originally announced March 2022.
-
Dropout Q-Functions for Doubly Efficient Reinforcement Learning
Authors:
Takuya Hiraoka,
Takahisa Imagawa,
Taisei Hashimoto,
Takashi Onishi,
Yoshimasa Tsuruoka
Abstract:
Randomized ensembled double Q-learning (REDQ) (Chen et al., 2021b) has recently achieved state-of-the-art sample efficiency on continuous-action reinforcement learning benchmarks. This superior sample efficiency is made possible by using a large Q-function ensemble. However, REDQ is much less computationally efficient than non-ensemble counterparts such as Soft Actor-Critic (SAC) (Haarnoja et al.,…
▽ More
Randomized ensembled double Q-learning (REDQ) (Chen et al., 2021b) has recently achieved state-of-the-art sample efficiency on continuous-action reinforcement learning benchmarks. This superior sample efficiency is made possible by using a large Q-function ensemble. However, REDQ is much less computationally efficient than non-ensemble counterparts such as Soft Actor-Critic (SAC) (Haarnoja et al., 2018a). To make REDQ more computationally efficient, we propose a method of improving computational efficiency called DroQ, which is a variant of REDQ that uses a small ensemble of dropout Q-functions. Our dropout Q-functions are simple Q-functions equipped with dropout connection and layer normalization. Despite its simplicity of implementation, our experimental results indicate that DroQ is doubly (sample and computationally) efficient. It achieved comparable sample efficiency with REDQ, much better computational efficiency than REDQ, and comparable computational efficiency with that of SAC.
△ Less
Submitted 16 March, 2022; v1 submitted 5 October, 2021;
originally announced October 2021.
-
Relation Analysis between Hotel Review Rating Scores and Sentiment Analysis of Reviews by Chinese Tourists Visiting Japan
Authors:
Elisa Claire Alemán Carreón,
Hirofumi Nonaka,
Toru Hiraoka
Abstract:
In current times, the importance of online hotel review sites has become more and more apparent. Users of these sites reference of reviews strongly influences their purchase behavior and as such, reviews are important to companies and researchers alike. The majority of review sites offer both text reviews and numerical hotel ratings, and both information sources are widely used by researchers as a…
▽ More
In current times, the importance of online hotel review sites has become more and more apparent. Users of these sites reference of reviews strongly influences their purchase behavior and as such, reviews are important to companies and researchers alike. The majority of review sites offer both text reviews and numerical hotel ratings, and both information sources are widely used by researchers as a representation of a customer's sentiment and opinion. However, an opinion is a difficult concept to measure, and as such, depending on the relation these two sources have, it would be apparent whether or not it is safe to consider them equally in research. In this study we utilize an entropy-based Support Vector Machine to classify positive and negative sentiments in hotel reviews from the site Ctrip, then calculating the ratio of positive and negative sentiment in each review and examine their correlation with said review's rating score using Spearman and Kendall Correlation coefficients and Maximal Information Coefficient (MIC).
△ Less
Submitted 2 October, 2021;
originally announced October 2021.
-
Differences in Chinese and Western tourists faced with Japanese hospitality: A natural language processing approach
Authors:
Elisa Claire Alemán Carreón,
Hugo Alberto Mendoza España,
Hirofumi Nonaka,
Toru Hiraoka
Abstract:
Since culture influences expectations, perceptions, and satisfaction, a cross-culture study is necessary to understand the differences between Japan's biggest tourist populations, Chinese and Western tourists. However, with ever-increasing customer populations, this is hard to accomplish without extensive customer base studies. There is a need for an automated method for identifying these expectat…
▽ More
Since culture influences expectations, perceptions, and satisfaction, a cross-culture study is necessary to understand the differences between Japan's biggest tourist populations, Chinese and Western tourists. However, with ever-increasing customer populations, this is hard to accomplish without extensive customer base studies. There is a need for an automated method for identifying these expectations at a large scale. For this, we used a data-driven approach to our analysis. Our study analyzed their satisfaction factors comparing soft attributes, such as service, with hard attributes, such as location and facilities, and studied different price ranges. We collected hotel reviews and extracted keywords to classify the sentiment of sentences with an SVC. We then used dependency parsing and part-of-speech tagging to extract nouns tied to positive adjectives. We found that Chinese tourists consider room quality more than hospitality, whereas Westerners are delighted more by staff behavior. Furthermore, the lack of a Chinese-friendly environment for Chinese customers and cigarette smell for Western ones can be disappointing factors of their stay. As one of the first studies in the tourism field to use the high-standard Japanese hospitality environment for this analysis, our cross-cultural study contributes to both the theoretical understanding of satisfaction and suggests practical applications and strategies for hotel managers.
△ Less
Submitted 30 July, 2021;
originally announced July 2021.
-
Exploration of increasing drivers trust in a semi-autonomous vehicle through real time visualizations of collaborative driving dynamic
Authors:
A. Koegel,
C. Furet,
T. Suzuki,
Y. Klebanov,
J. Hu,
T. Kappeler,
D. Okazaki,
K. Matsui,
T. Hiraoka,
K. Shimono,
K. Nakano,
K. Honma,
M. Pennington
Abstract:
The Thinking Wave is an ongoing development of visualization concepts showing the real-time effort and confidence of semi-autonomous vehicle (AV) systems. Offering drivers access to this information can inform their decision making, and enable them to handle the situation accordingly and takeover when necessary. Two different visualizations have been designed, Concept one, Tidal, demonstrates the…
▽ More
The Thinking Wave is an ongoing development of visualization concepts showing the real-time effort and confidence of semi-autonomous vehicle (AV) systems. Offering drivers access to this information can inform their decision making, and enable them to handle the situation accordingly and takeover when necessary. Two different visualizations have been designed, Concept one, Tidal, demonstrates the AV systems effort through intensified activity of a simple graphic which fluctuates in speed and frequency. Concept two, Tandem, displays the effort of the AV system as well as the handling dynamic and shared responsibility between the driver and the vehicle system. Working collaboratively with mobility research teams at the University of Tokyo, we are prototyping and refining the Thinking Wave and its embodiments as we work towards building a testable version integrated into a driving simulator. The development of the thinking wave aims to calibrate trust by increasing the drivers knowledge and understanding of vehicle handling capacity. By enabling transparent communication of the AV systems capacity, we hope to empower AV-skeptic drivers and keep over-trusting drivers on alert in the case of an emergency takeover situation, in order to create a safer autonomous driving experience.
△ Less
Submitted 4 July, 2021;
originally announced July 2021.
-
How can design help enhance trust calibration in public autonomous vehicles?
Authors:
Yuri Klebanov,
Romi Mikulinsky,
Tom Reznikov,
Miles Pennington,
Yoshihiro Suda,
Toshihiro Hiraoka,
Shoichi Kanzaki
Abstract:
Trust is a multilayered concept with critical relevance when it comes to introducing new technologies. Understanding how humans will interact with complex vehicle systems and preparing for the functional, societal and psychological aspects of autonomous vehicles' entry into our cities is a pressing concern. Design tools can help calibrate the adequate and affordable level of trust needed for a saf…
▽ More
Trust is a multilayered concept with critical relevance when it comes to introducing new technologies. Understanding how humans will interact with complex vehicle systems and preparing for the functional, societal and psychological aspects of autonomous vehicles' entry into our cities is a pressing concern. Design tools can help calibrate the adequate and affordable level of trust needed for a safe and positive experience. This study focuses on passenger interactions capable of enhancing the system trustworthiness and data accuracy in future shared public transportation.
△ Less
Submitted 30 June, 2021;
originally announced June 2021.
-
Joint Optimization of Tokenization and Downstream Model
Authors:
Tatsuya Hiraoka,
Sho Takase,
Kei Uchiumi,
Atsushi Keyaki,
Naoaki Okazaki
Abstract:
Since traditional tokenizers are isolated from a downstream task and model, they cannot output an appropriate tokenization depending on the task and model, although recent studies imply that the appropriate tokenization improves the performance. In this paper, we propose a novel method to find an appropriate tokenization to a given downstream model by jointly optimizing a tokenizer and the model.…
▽ More
Since traditional tokenizers are isolated from a downstream task and model, they cannot output an appropriate tokenization depending on the task and model, although recent studies imply that the appropriate tokenization improves the performance. In this paper, we propose a novel method to find an appropriate tokenization to a given downstream model by jointly optimizing a tokenizer and the model. The proposed method has no restriction except for using loss values computed by the downstream model to train the tokenizer, and thus, we can apply the proposed method to any NLP task. Moreover, the proposed method can be used to explore the appropriate tokenization for an already trained model as post-processing. Therefore, the proposed method is applicable to various situations. We evaluated whether our method contributes to improving performance on text classification in three languages and machine translation in eight language pairs. Experimental results show that our proposed method improves the performance by determining appropriate tokenizations.
△ Less
Submitted 26 May, 2021;
originally announced May 2021.
-
Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces
Authors:
Takahisa Imagawa,
Takuya Hiraoka,
Yoshimasa Tsuruoka
Abstract:
Meta-reinforcement learning (RL) addresses the problem of sample inefficiency in deep RL by using experience obtained in past tasks for a new task to be solved.
However, most meta-RL methods require partially or fully on-policy data, i.e., they cannot reuse the data collected by past policies, which hinders the improvement of sample efficiency.
To alleviate this problem, we propose a novel off…
▽ More
Meta-reinforcement learning (RL) addresses the problem of sample inefficiency in deep RL by using experience obtained in past tasks for a new task to be solved.
However, most meta-RL methods require partially or fully on-policy data, i.e., they cannot reuse the data collected by past policies, which hinders the improvement of sample efficiency.
To alleviate this problem, we propose a novel off-policy meta-RL method, embedding learning and evaluation of uncertainty (ELUE).
An ELUE agent is characterized by the learning of a feature embedding space shared among tasks.
It learns a belief model over the embedding space and a belief-conditional policy and Q-function.
Then, for a new task, it collects data by the pretrained policy, and updates its belief based on the belief model.
Thanks to the belief update, the performance can be improved with a small amount of data.
In addition, it updates the parameters of the neural networks to adjust the pretrained relationships when there are enough data.
We demonstrate that ELUE outperforms state-of-the-art meta RL methods through experiments on meta-RL benchmarks.
△ Less
Submitted 6 January, 2021;
originally announced January 2021.
-
Named Entity Recognition and Relation Extraction using Enhanced Table Filling by Contextualized Representations
Authors:
Youmi Ma,
Tatsuya Hiraoka,
Naoaki Okazaki
Abstract:
In this study, a novel method for extracting named entities and relations from unstructured text based on the table representation is presented. By using contextualized word embeddings, the proposed method computes representations for entity mentions and long-range dependencies without complicated hand-crafted features or neural-network architectures. We also adapt a tensor dot-product to predict…
▽ More
In this study, a novel method for extracting named entities and relations from unstructured text based on the table representation is presented. By using contextualized word embeddings, the proposed method computes representations for entity mentions and long-range dependencies without complicated hand-crafted features or neural-network architectures. We also adapt a tensor dot-product to predict relation labels all at once without resorting to history-based predictions or search strategies. These advances significantly simplify the model and algorithm for the extraction of named entities and relations. Despite its simplicity, the experimental results demonstrate that the proposed method outperforms the state-of-the-art methods on the CoNLL04 and ACE05 English datasets. We also confirm that the proposed method achieves a comparable performance with the state-of-the-art NER models on the ACE05 datasets when multiple sentences are provided for context aggregation.
△ Less
Submitted 26 January, 2022; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Meta-Model-Based Meta-Policy Optimization
Authors:
Takuya Hiraoka,
Takahisa Imagawa,
Voot Tangkaratt,
Takayuki Osa,
Takashi Onishi,
Yoshimasa Tsuruoka
Abstract:
Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarante…
▽ More
Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.
△ Less
Submitted 11 October, 2021; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Modeling temporal networks with bursty activity patterns of nodes and links
Authors:
Takayuki Hiraoka,
Naoki Masuda,
Aming Li,
Hang-Hyun Jo
Abstract:
The concept of temporal networks provides a framework to understand how the interaction between system components changes over time. In empirical communication data, we often detect non-Poissonian, so-called bursty behavior in the activity of nodes as well as in the interaction between nodes. However, such reconciliation between node burstiness and link burstiness cannot be explained if the intera…
▽ More
The concept of temporal networks provides a framework to understand how the interaction between system components changes over time. In empirical communication data, we often detect non-Poissonian, so-called bursty behavior in the activity of nodes as well as in the interaction between nodes. However, such reconciliation between node burstiness and link burstiness cannot be explained if the interaction processes on different links are independent of each other. This is because the activity of a node is the superposition of the interaction processes on the links incident to the node and the superposition of independent bursty point processes is not bursty in general. Here we introduce a temporal network model based on bursty node activation and show that it leads to heavy-tailed inter-event time distributions for both node dynamics and link dynamics. Our analysis indicates that activation processes intrinsic to nodes give rise to dynamical correlations across links. Our framework offers a way to model competition and correlation between links, which is key to understanding dynamical processes in various systems.
△ Less
Submitted 24 December, 2019;
originally announced December 2019.
-
Optimistic Proximal Policy Optimization
Authors:
Takahisa Imagawa,
Takuya Hiraoka,
Yoshimasa Tsuruoka
Abstract:
Reinforcement Learning, a machine learning framework for training an autonomous agent based on rewards, has shown outstanding results in various domains. However, it is known that learning a good policy is difficult in a domain where rewards are rare. We propose a method, optimistic proximal policy optimization (OPPO) to alleviate this difficulty. OPPO considers the uncertainty of the estimated to…
▽ More
Reinforcement Learning, a machine learning framework for training an autonomous agent based on rewards, has shown outstanding results in various domains. However, it is known that learning a good policy is difficult in a domain where rewards are rare. We propose a method, optimistic proximal policy optimization (OPPO) to alleviate this difficulty. OPPO considers the uncertainty of the estimated total return and optimistically evaluates the policy based on that amount. We show that OPPO outperforms the existing methods in a tabular task.
△ Less
Submitted 25 June, 2019;
originally announced June 2019.
-
Explicit behaviors affected by driver's trust in a driving automation system
Authors:
Hailong Liu,
Toshihiro Hiraoka,
Seiya Tanaka
Abstract:
As various driving automation system (DAS) are commonly used in the vehicle, the over-trust in the DAS may put the driver in the risk. In order to prevent the over-trust while driving, the trust state of the driver should be recognized. However, description variables of the trust state are not distinct. This paper assumed that the outward expressions of a driver can represent the trust state of hi…
▽ More
As various driving automation system (DAS) are commonly used in the vehicle, the over-trust in the DAS may put the driver in the risk. In order to prevent the over-trust while driving, the trust state of the driver should be recognized. However, description variables of the trust state are not distinct. This paper assumed that the outward expressions of a driver can represent the trust state of him/her-self. The explicit behaviors when driving with DAS is seen as those outward expressions. In the experiment, a driving simulator with a driver monitoring system was used for simulating a vehicle with the adaptive cruise control (ACC) and observing the motion information of the driver. Results show that if the driver completely trusted in the ACC, then 1) the participants were likely to put their feet far away from the pedals; 2) the operational intervention of the driver will delay in dangerous situations. In the future, a machine learning model will be tried to predict the trust state by using the motion information of the driver.
△ Less
Submitted 10 June, 2019;
originally announced June 2019.
-
Learning Robust Options by Conditional Value at Risk Optimization
Authors:
Takuya Hiraoka,
Takahisa Imagawa,
Tatsuya Mori,
Takashi Onishi,
Yoshimasa Tsuruoka
Abstract:
Options are generally learned by using an inaccurate environment model (or simulator), which contains uncertain model parameters. While there are several methods to learn options that are robust against the uncertainty of model parameters, these methods only consider either the worst case or the average (ordinary) case for learning options. This limited consideration of the cases often produces op…
▽ More
Options are generally learned by using an inaccurate environment model (or simulator), which contains uncertain model parameters. While there are several methods to learn options that are robust against the uncertainty of model parameters, these methods only consider either the worst case or the average (ordinary) case for learning options. This limited consideration of the cases often produces options that do not work well in the unconsidered case. In this paper, we propose a conditional value at risk (CVaR)-based method to learn options that work well in both the average and worst cases. We extend the CVaR-based policy gradient method proposed by Chow and Ghavamzadeh (2014) to deal with robust Markov decision processes and then apply the extended method to learning robust options. We conduct experiments to evaluate our method in multi-joint robot control tasks (HopperIceBlock, Half-Cheetah, and Walker2D). Experimental results show that our method produces options that 1) give better worst-case performance than the options learned only to minimize the average-case loss, and 2) give better average-case performance than the options learned only to minimize the worst-case loss.
△ Less
Submitted 31 October, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.
-
Saliency difference based objective evaluation method for a superimposed screen of the HUD with various background
Authors:
Hailong Liu,
Toshihiro Hiraoka,
Takatsugu Hirayama,
Dongmin Kim
Abstract:
The head-up display (HUD) is an emerging device which can project information on a transparent screen. The HUD has been used in airplanes and vehicles, and it is usually placed in front of the operator's view. In the case of the vehicle, the driver can see not only various information on the HUD but also the backgrounds (driving environment) through the HUD. However, the projected information on t…
▽ More
The head-up display (HUD) is an emerging device which can project information on a transparent screen. The HUD has been used in airplanes and vehicles, and it is usually placed in front of the operator's view. In the case of the vehicle, the driver can see not only various information on the HUD but also the backgrounds (driving environment) through the HUD. However, the projected information on the HUD may interfere with the colors in the background because the HUD is transparent. For example, a red message on the HUD will be less noticeable when there is an overlap between it and the red brake light from the front vehicle. As the first step to solve this issue, how to evaluate the mutual interference between the information on the HUD and backgrounds is important. Therefore, this paper proposes a method to evaluate the mutual interference based on saliency. It can be evaluated by comparing the HUD part cut from a saliency map of a measured image with the HUD image.
△ Less
Submitted 13 May, 2019;
originally announced May 2019.
-
Emotional Contribution Analysis of Online Reviews
Authors:
Elisa Claire Alemán Carreón,
Hirofumi Nonaka,
Toru Hiraoka,
Minoru Kumano,
Takao Ito,
Masaharu Hirota
Abstract:
In response to the constant increase in population and tourism worldwide, there is a need for the development of cross-language market research tools that are more cost and time effective than surveys or interviews. Focusing on the Chinese tourism boom and the hotel industry in Japan, we extracted the most influential keywords in emotional judgement from Chinese online reviews of Japanese hotels i…
▽ More
In response to the constant increase in population and tourism worldwide, there is a need for the development of cross-language market research tools that are more cost and time effective than surveys or interviews. Focusing on the Chinese tourism boom and the hotel industry in Japan, we extracted the most influential keywords in emotional judgement from Chinese online reviews of Japanese hotels in the portal site Ctrip. Using an entropy based mathematical model and a machine learning algorithm, we determined the words that most closely represent the demands and emotions of this customer base.
△ Less
Submitted 1 May, 2019;
originally announced May 2019.
-
Analysis of Chinese Tourists in Japan by Text Mining of a Hotel Portal Site
Authors:
Elisa Claire Alemán Carreón,
Hirofumi Nonaka,
Toru Hiraoka
Abstract:
With an increasingly large number of Chinese tourists in Japan, the hotel industry is in need of an affordable market research tool that does not rely on expensive and time-consuming surveys or interviews. Because this problem is real and relevant to the hotel industry in Japan, and otherwise completely unexplored in other studies, we have extracted a list of potential keywords from Chinese review…
▽ More
With an increasingly large number of Chinese tourists in Japan, the hotel industry is in need of an affordable market research tool that does not rely on expensive and time-consuming surveys or interviews. Because this problem is real and relevant to the hotel industry in Japan, and otherwise completely unexplored in other studies, we have extracted a list of potential keywords from Chinese reviews of Japanese hotels in the hotel portal site Ctrip1 using a mathematical model to then use them in a sentiment analysis with a machine learning classifier. While most studies that use information collected from the internet use pre-existing data analysis tools, in our study, we designed the mathematical model to have the highest possible performing results in classification, while also exploring on the potential business implications these may have.
△ Less
Submitted 1 May, 2019; v1 submitted 24 April, 2019;
originally announced April 2019.
-
Topic Classification Method for Analyzing Effect of eWOM on Consumer Game Sales
Authors:
Yoshiki Horii,
Hirofumi Nonaka,
Elisa Claire Alemán Carreón,
Hiroki Horino,
Toru Hiraoka
Abstract:
Electronic word-of-mouth (eWOM) has become an important resource for the analysis of marketing research. In this study, in order to analyze user needs for consumer game software, we focus on tweet data. And we proposed topic extraction method using entropy-based feature selection based feature expansion. We also applied it to the classification of the data extracted from tweet data by using SVM. A…
▽ More
Electronic word-of-mouth (eWOM) has become an important resource for the analysis of marketing research. In this study, in order to analyze user needs for consumer game software, we focus on tweet data. And we proposed topic extraction method using entropy-based feature selection based feature expansion. We also applied it to the classification of the data extracted from tweet data by using SVM. As a result, we achieved a 0.63 F-measure.
△ Less
Submitted 23 April, 2019;
originally announced April 2019.
-
Community Detection and Growth Potential Prediction Using the Stochastic Block Model and the Long Short-term Memory from Patent Citation Networks
Authors:
Kensei Nakai,
Hirofumi Nonaka,
Asahi Hentona,
Yuki Kanai,
Takeshi Sakumoto,
Shotaro Kataoka,
Elisa Claire Alemán Carreón,
Toru Hiraoka
Abstract:
Scoring patent documents is very useful for technology management. However, conventional methods are based on static models and, thus, do not reflect the growth potential of the technology cluster of the patent. Because even if the cluster of a patent has no hope of growing, we recognize the patent is important if PageRank or other ranking score is high. Therefore, there arises a necessity of deve…
▽ More
Scoring patent documents is very useful for technology management. However, conventional methods are based on static models and, thus, do not reflect the growth potential of the technology cluster of the patent. Because even if the cluster of a patent has no hope of growing, we recognize the patent is important if PageRank or other ranking score is high. Therefore, there arises a necessity of developing citation network clustering and prediction of future citations. In our research, clustering of patent citation networks by Stochastic Block Model was done with the aim of enabling corporate managers and investors to evaluate the scale and life cycle of technology. As a result, we confirmed nested SBM is appropriate for graph clustering of patent citation networks. Also, a high MAPE value was obtained and the direction accuracy achieved a value greater than 50% when predicting growth potential for each cluster by using LSTM.
△ Less
Submitted 23 April, 2019;
originally announced April 2019.
-
Community Detection and Growth Potential Prediction from Patent Citation Networks
Authors:
Asahi Hentona,
Takeshi Sakumoto,
Hugo Alberto Mendoza España,
Hirofumi Nonaka,
Shotaro Kataoka,
Toru Hiraoka,
Kensei Nakai,
Elisa Claire Alemán Carreón,
Masaharu Hirota
Abstract:
The scoring of patents is useful for technology management analysis. Therefore, a necessity of developing citation network clustering and prediction of future citations for practical patent scoring arises. In this paper, we propose a community detection method using the Node2vec. And in order to analyze growth potential we compare three ''time series analysis methods'', the Long Short-Term Memory…
▽ More
The scoring of patents is useful for technology management analysis. Therefore, a necessity of developing citation network clustering and prediction of future citations for practical patent scoring arises. In this paper, we propose a community detection method using the Node2vec. And in order to analyze growth potential we compare three ''time series analysis methods'', the Long Short-Term Memory (LSTM), ARIMA model, and Hawkes Process. The results of our experiments, we could find common technical points from those clusters by Node2vec. Furthermore, we found that the prediction accuracy of the ARIMA model was higher than that of other models.
△ Less
Submitted 23 April, 2019;
originally announced April 2019.
-
Causal relationship between eWOM topics and profit of rural tourism at Japanese Roadside Stations "MICHINOEKI"
Authors:
Elisa Claire Alemán Carreón,
Tetsuro Ito,
Hirofumi Nonaka,
Minoru Kumano,
Toru Hiraoka,
Masaharu Hirota
Abstract:
Affected by urbanization, centralization and the decrease of overall population, Japan has been making efforts to revitalize the rural areas across the country. One particular effort is to increase tourism to these rural areas via regional branding, using local farm products as tourist attractions across Japan. Particularly, a program subsidized by the government called Michinoeki, which stands fo…
▽ More
Affected by urbanization, centralization and the decrease of overall population, Japan has been making efforts to revitalize the rural areas across the country. One particular effort is to increase tourism to these rural areas via regional branding, using local farm products as tourist attractions across Japan. Particularly, a program subsidized by the government called Michinoeki, which stands for 'roadside station', was created 20 years ago and it strives to provide a safe and comfortable space for cultural interaction between road travelers and the local community, as well as offering refreshment, and relevant information to travelers. However, despite its importance in the revitalization of the Japanese economy, studies with newer technologies and methodologies are lacking. Using sales data from establishments in the Kyushu area of Japan, we used Support Vector to classify content from Twitter into relevant topics and studied their causal relationship to the sales for each establishment using LiNGAM, a linear non-gaussian acyclic model built for causal structure analysis, to perform an improved market analysis considering more than just correlation. Under the hypotheses stated by the LiNGAM model, we discovered a positive causal relationship between the number of tweets mentioning those establishments, specially mentioning deserts, a need for better access and traf^ic options, and a potentially untapped customer base in motorcycle biker groups.
△ Less
Submitted 1 May, 2019; v1 submitted 25 April, 2019;
originally announced April 2019.
-
Development of an Entropy-Based Feature Selection Method and Analysis of Online Reviews on Real Estate
Authors:
Hiroki Horino,
Hirofumi Nonaka,
Elisa Claire Alemán Carreón,
Toru Hiraoka
Abstract:
In recent years, data posted about real estate on the Internet is currently increasing. In this study, in order to analyze user needs for real estate, we focus on "Mansion Community" which is a Japanese bulletin board system (hereinafter referred to as BBS) about Japanese real estate. In our study, extraction of keywords is performed based on the calculation of the entropy value of each word, and…
▽ More
In recent years, data posted about real estate on the Internet is currently increasing. In this study, in order to analyze user needs for real estate, we focus on "Mansion Community" which is a Japanese bulletin board system (hereinafter referred to as BBS) about Japanese real estate. In our study, extraction of keywords is performed based on the calculation of the entropy value of each word, and we used them as features in a machine learning classifier to analyze 6 million posts at "Mansion Community". As a result, we achieved a 0.69 F-measure and found that the customers are particularly concerned about the facility of apartment, access, and price of an apartment.
△ Less
Submitted 23 April, 2019;
originally announced April 2019.
-
Driving behavior model considering driver's over-trust in driving automation system
Authors:
Hailong Liu,
Toshihiro Hiraoka
Abstract:
Levels one to three of driving automation systems~(DAS) are spreading fast. However, as the DAS functions become more and more sophisticated, not only the driver's driving skills will reduce, but also the problem of over-trust will become serious. If a driver has over-trust in the DAS, he/she will become not aware of hazards in time. To prevent the driver's over-trust in the DAS, this paper discus…
▽ More
Levels one to three of driving automation systems~(DAS) are spreading fast. However, as the DAS functions become more and more sophisticated, not only the driver's driving skills will reduce, but also the problem of over-trust will become serious. If a driver has over-trust in the DAS, he/she will become not aware of hazards in time. To prevent the driver's over-trust in the DAS, this paper discusses the followings: 1) the definition of over-trust in the DAS, 2) a hypothesis of occurrence condition and occurrence process of over-trust in the DAS, and 3) a driving behavior model based on the trust in the DAS, the risk homeostasis theory, and the over-trust prevention human-machine interface.
△ Less
Submitted 19 June, 2019; v1 submitted 21 December, 2018;
originally announced December 2018.
-
Optimization of Information-Seeking Dialogue Strategy for Argumentation-Based Dialogue System
Authors:
Hisao Katsumi,
Takuya Hiraoka,
Koichiro Yoshino,
Kazeto Yamamoto,
Shota Motoura,
Kunihiko Sadamasa,
Satoshi Nakamura
Abstract:
Argumentation-based dialogue systems, which can handle and exchange arguments through dialogue, have been widely researched. It is required that these systems have sufficient supporting information to argue their claims rationally; however, the systems often do not have enough of such information in realistic situations. One way to fill in the gap is acquiring such missing information from dialogu…
▽ More
Argumentation-based dialogue systems, which can handle and exchange arguments through dialogue, have been widely researched. It is required that these systems have sufficient supporting information to argue their claims rationally; however, the systems often do not have enough of such information in realistic situations. One way to fill in the gap is acquiring such missing information from dialogue partners (information-seeking dialogue). Existing information-seeking dialogue systems are based on handcrafted dialogue strategies that exhaustively examine missing information. However, the proposed strategies are not specialized in collecting information for constructing rational arguments. Moreover, the number of system's inquiry candidates grows in accordance with the size of the argument set that the system deal with. In this paper, we formalize the process of information-seeking dialogue as Markov decision processes (MDPs) and apply deep reinforcement learning (DRL) for automatically optimizing a dialogue strategy. By utilizing DRL, our dialogue strategy can successfully minimize objective functions, the number of turns it takes for our system to collect necessary information in a dialogue. We conducted dialogue experiments using two datasets from different domains of argumentative dialogue. Experimental results show that the proposed formalization based on MDP works well, and the policy optimized by DRL outperformed existing heuristic dialogue strategies.
△ Less
Submitted 26 November, 2018;
originally announced November 2018.
-
Refining Manually-Designed Symbol Grounding and High-Level Planning by Policy Gradients
Authors:
Takuya Hiraoka,
Takashi Onishi,
Takahisa Imagawa,
Yoshimasa Tsuruoka
Abstract:
Hierarchical planners that produce interpretable and appropriate plans are desired, especially in its application to supporting human decision making. In the typical development of the hierarchical planners, higher-level planners and symbol grounding functions are manually created, and this manual creation requires much human effort. In this paper, we propose a framework that can automatically ref…
▽ More
Hierarchical planners that produce interpretable and appropriate plans are desired, especially in its application to supporting human decision making. In the typical development of the hierarchical planners, higher-level planners and symbol grounding functions are manually created, and this manual creation requires much human effort. In this paper, we propose a framework that can automatically refine symbol grounding functions and a high-level planner to reduce human effort for designing these modules. In our framework, symbol grounding and high-level planning, which are based on manually-designed knowledge bases, are modeled with semi-Markov decision processes. A policy gradient method is then applied to refine the modules, in which two terms for updating the modules are considered. The first term, called a reinforcement term, contributes to updating the modules to improve the overall performance of a hierarchical planner to produce appropriate plans. The second term, called a penalty term, contributes to keeping refined modules consistent with the manually-designed original modules. Namely, it keeps the planner, which uses the refined modules, producing interpretable plans. We perform preliminary experiments to solve the Mountain car problem, and its results show that a manually-designed high-level planner and symbol grounding function were successfully refined by our framework.
△ Less
Submitted 29 September, 2018;
originally announced October 2018.
-
Monte Carlo Tree Search with Scalable Simulation Periods for Continuously Running Tasks
Authors:
Seydou Ba,
Takuya Hiraoka,
Takashi Onishi,
Toru Nakata,
Yoshimasa Tsuruoka
Abstract:
Monte Carlo Tree Search (MCTS) is particularly adapted to domains where the potential actions can be represented as a tree of sequential decisions. For an effective action selection, MCTS performs many simulations to build a reliable tree representation of the decision space. As such, a bottleneck to MCTS appears when enough simulations cannot be performed between action selections. This is partic…
▽ More
Monte Carlo Tree Search (MCTS) is particularly adapted to domains where the potential actions can be represented as a tree of sequential decisions. For an effective action selection, MCTS performs many simulations to build a reliable tree representation of the decision space. As such, a bottleneck to MCTS appears when enough simulations cannot be performed between action selections. This is particularly highlighted in continuously running tasks, for which the time available to perform simulations between actions tends to be limited due to the environment's state constantly changing. In this paper, we present an approach that takes advantage of the anytime characteristic of MCTS to increase the simulation time when allowed. Our approach is to effectively balance the prospect of selecting an action with the time that can be spared to perform MCTS simulations before the next action selection. For that, we considered the simulation time as a decision variable to be selected alongside an action. We extended the Hierarchical Optimistic Optimization applied to Tree (HOOT) method to adapt our approach to environments with a continuous decision space. We evaluated our approach for environments with a continuous decision space through OpenAI gym's Pendulum and Continuous Mountain Car environments and for environments with discrete action space through the arcade learning environment (ALE) platform. The evaluation results show that, with variable simulation times, the proposed approach outperforms the conventional MCTS in the evaluated continuous decision space tasks and improves the performance of MCTS in most of the ALE tasks.
△ Less
Submitted 7 September, 2018;
originally announced September 2018.
-
Correlated bursts in temporal networks slow down spreading
Authors:
Takayuki Hiraoka,
Hang-Hyun Jo
Abstract:
Spreading dynamics has been considered to take place in temporal networks, where temporal interaction patterns between nodes show non-Poissonian bursty nature. The effects of inhomogeneous interevent times (IETs) on the spreading have been extensively studied in recent years, yet little is known about the effects of correlations between IETs on the spreading. In order to investigate those effects,…
▽ More
Spreading dynamics has been considered to take place in temporal networks, where temporal interaction patterns between nodes show non-Poissonian bursty nature. The effects of inhomogeneous interevent times (IETs) on the spreading have been extensively studied in recent years, yet little is known about the effects of correlations between IETs on the spreading. In order to investigate those effects, we study two-step deterministic susceptible-infected (SI) and probabilistic SI dynamics when the interaction patterns are modeled by inhomogeneous and correlated IETs, i.e., correlated bursts. By analyzing the transmission time statistics in a single-link setup and by simulating the spreading in Bethe lattices and random graphs, we conclude that the positive correlation between IETs slows down the spreading. We also argue that the shortest transmission time from one infected node to its susceptible neighbors can successfully explain our numerical results.
△ Less
Submitted 6 July, 2018;
originally announced July 2018.
-
Deep Reinforcement Learning for Inquiry Dialog Policies with Logical Formula Embeddings
Authors:
Takuya Hiraoka,
Masaaki Tsuchida,
Yotaro Watanabe
Abstract:
This paper is the first attempt to learn the policy of an inquiry dialog system (IDS) by using deep reinforcement learning (DRL). Most IDS frameworks represent dialog states and dialog acts with logical formulae. In order to make learning inquiry dialog policies more effective, we introduce a logical formula embedding framework based on a recursive neural network. The results of experiments to eva…
▽ More
This paper is the first attempt to learn the policy of an inquiry dialog system (IDS) by using deep reinforcement learning (DRL). Most IDS frameworks represent dialog states and dialog acts with logical formulae. In order to make learning inquiry dialog policies more effective, we introduce a logical formula embedding framework based on a recursive neural network. The results of experiments to evaluate the effect of 1) the DRL and 2) the logical formula embedding framework show that the combination of the two are as effective or even better than existing rule-based methods for inquiry dialog policies.
△ Less
Submitted 2 August, 2017;
originally announced August 2017.