Showing 1–2 of 2 results for author: Yu, Z Z

Search v0.5.6 released 2020-02-24

arXiv:2305.06176 [pdf]

cs.CL cs.AI cs.LG

Fine-tuning Language Models with Generative Adversarial Reward Modelling

Authors: Zhang Ze Yu, Lau Jia Jaw, Zhang Hui, Bryan Kian Hsiang Low

Abstract: Reinforcement Learning with Human Feedback (RLHF) has been demonstrated to significantly enhance the performance of large language models (LLMs) by aligning their outputs with desired human values through instruction tuning. However, RLHF is constrained by the expertise and productivity limitations of human evaluators. A response to this downside is to fall back to supervised fine-tuning (SFT) with additional carefully selected expert demonstrations. However, while this method has been proven to be effective, it invariably also leads to increased human-in-the-loop overhead. In this study, we propose an alternative to RLHF and SFT: Reinforcement Learning with Generative Adversarial Feedback (RLGAF), which uses a generative adversarial training style to enable the LLMs to learn useful human expert demonstrations without being directly exposed to the training examples, thus enabling good generalization capabilities while preserving sample efficiency. Our preliminary findings indicate that RLGAF can help align LLM outputs with competitive performance against RLHF and SFT, while not suffering from their respective inherent restrictions, suggesting promising avenues for further research on automating AI alignment.

Submitted 5 March, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: 22 pages, 9 figures, 12 tables

arXiv:2305.05953 [pdf]

quant-ph cs.DS cs.ET

Novel Quantum Information Processing Methods and Investigation

Authors: Zhang Ze Yu

Abstract: Quantum information processing and its subfield, quantum image processing, are rapidly growing fields as a result of advancements in the practicality of quantum mechanics. In this paper, we propose a quantum algorithm for processing information, such as one-dimensional time series and two-dimensional images, in the frequency domain. The information of interest is encoded into the magnitude of probability amplitude or the coefficient of each basis state. The oracle for filtering operates based on postselection results, and its explicit circuit design is presented. This oracle is versatile enough to perform all basic filtering, including high pass, low pass, band pass, band stop, and many other processing techniques. Finally, we present two novel schemes for transposing matrices in this paper. They use similar encoding rules but with deliberate choices in terms of selecting basis states. These schemes could potentially be useful for other quantum information processing tasks, such as edge detection. The proposed techniques are implemented on the IBM Qiskit quantum simulator. Some results are compared with traditional information processing results to verify their correctness and are presented in this paper.

Submitted 10 May, 2023; originally announced May 2023.

Comments: 12 pages, 53 figures
