Search | arXiv e-print repository

PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning

Authors: Shiva Kumar Pentyala, Zhichao Wang, Bin Bi, Kiran Ramnath, Xiang-Bo Mao, Regunathan Radhakrishnan, Sitaram Asur, Na Cheng

Abstract: Large language models (LLMs) have shown remarkable abilities in diverse natural language processing (NLP) tasks. LLMs generally undergo supervised fine-tuning (SFT) followed by preference alignment to be usable in downstream applications. However, this sequential training pipeline leads to an alignment tax that degrades LLM performance. This paper introduces PAFT, a new PArallel training paradigm for effective LLM Fine-Tuning, which independently performs SFT and preference alignment (e.g., DPO and ORPO) with the same pre-trained model on respective datasets. The model produced by SFT and the model from preference alignment are then merged into a final model by parameter fusing for use in downstream applications. This work reveals an important finding: preference alignment such as DPO naturally results in a sparse model, while SFT leads to a naturally dense model that needs to be sparsified for effective model merging. This paper introduces an effective interference resolution that reduces redundancy by sparsifying the delta parameters. The LLM resulting from the new training paradigm achieved Rank #1 on the HuggingFace Open LLM Leaderboard. Comprehensive evaluation shows the effectiveness of the parallel training paradigm.

Submitted 25 June, 2024; originally announced June 2024.
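A minimal sketch of the parallel-merge idea, for illustration only. It assumes both fine-tuned models share one pre-trained base and that weights are flattened into plain Python lists. It approximates the "interference resolution" step by keeping only the largest-magnitude SFT deltas, in the style of a TIES trim. The abstract does not specify the exact sparsification rule, so `keep_ratio`, `sparsify_top_k`, and the simple additive fusion are hypothetical choices, not the paper's method.

```python
def delta(finetuned, base):
    """Delta parameters: fine-tuned weights minus the shared pre-trained weights."""
    return [f - b for f, b in zip(finetuned, base)]


def sparsify_top_k(d, keep_ratio):
    """Zero all but the largest-magnitude fraction `keep_ratio` of a delta vector.

    Hypothetical stand-in for the paper's interference-resolution step.
    """
    k = max(1, round(len(d) * keep_ratio))
    keep = set(sorted(range(len(d)), key=lambda i: abs(d[i]), reverse=True)[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(d)]


def parallel_merge(base, sft, pref, keep_ratio=0.5):
    """Fuse independently trained SFT and preference-aligned models.

    Only the (dense) SFT delta is sparsified; the preference delta is assumed
    to be naturally sparse, as the abstract reports for DPO.
    """
    d_sft = sparsify_top_k(delta(sft, base), keep_ratio)
    d_pref = delta(pref, base)
    return [b + s + p for b, s, p in zip(base, d_sft, d_pref)]


base = [0.0, 0.0, 0.0, 0.0]
sft = [0.4, -0.1, 0.05, 0.3]  # dense update from SFT (hypothetical values)
pref = [0.0, 0.2, 0.0, 0.0]   # sparse update from DPO (hypothetical values)
print(parallel_merge(base, sft, pref))  # [0.4, 0.2, 0.0, 0.3]
```

Trimming the SFT delta before adding the preference delta means the two updates overlap on fewer coordinates. This is the intuition behind sparsifying for merging; real merges would operate per tensor over billions of parameters.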

arXiv:2109.01569

Deep Metric Learning for Ground Images

Authors: Raaghav Radhakrishnan, Jan Fabian Schmid, Randolf Scholz, Lars Schmidt-Thieme

Abstract: Ground-texture-based localization methods are promising candidates for low-cost, high-accuracy self-localization of robots. These methods estimate the pose of a given query image, i.e., the current observation of the ground from a downward-facing camera, with respect to a set of reference images whose poses in the application area are known. In this work, we deal with the initial localization task, in which we have no prior knowledge of the robot's current position. In this situation, the localization method would have to consider all available reference images. However, to reduce computational effort and the risk of receiving a wrong result, we would like to consider only those reference images that actually overlap with the query image. For this purpose, we propose a deep metric learning approach that retrieves the reference images most similar to the query image. Compared with existing image retrieval approaches for ground images, our approach achieves significantly better recall and improves the localization performance of a state-of-the-art ground-texture-based localization method.

Submitted 3 September, 2021; originally announced September 2021.
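One common way to realize deep metric learning for retrieval works in two steps. First, train an embedding with a triplet margin loss. Then rank reference images by cosine similarity to the query embedding. The abstract does not name its loss or similarity measure, so the sketch below assumes both. The 2-D vectors are hypothetical stand-ins for network outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss on Euclidean distances.

    Penalizes the model unless an overlapping (positive) reference is closer
    to the query than a non-overlapping (negative) one by at least `margin`.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def retrieve(query_emb, reference_embs, k):
    """Indices of the k reference embeddings most similar to the query."""
    scores = [cosine(query_emb, r) for r in reference_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


references = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(retrieve([1.0, 0.0], references, k=2))  # [0, 2]
```

Only the top-k retrieved references would then be passed to the (more expensive) pose estimation stage, which is the computational saving the abstract describes.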

arXiv:2106.14815

Feature Importance Guided Attack: A Model Agnostic Adversarial Attack

Authors: Gilad Gressel, Niranjan Hegde, Archana Sreekumar, Rishikumar Radhakrishnan, Kalyani Harikumar, Anjali S., Krishnashree Achuthan

Abstract: Research in adversarial learning has primarily focused on homogeneous unstructured datasets, which often map into the problem space naturally. Inverting a feature-space attack on heterogeneous datasets into the problem space is much more challenging, particularly the task of finding the perturbation to perform. This work presents a formal search strategy, the 'Feature Importance Guided Attack' (FIGA), which finds perturbations in the feature space of heterogeneous tabular datasets to produce evasion attacks. We first demonstrate FIGA in the feature space and then in the problem space. FIGA assumes no prior knowledge of the defending model's learning algorithm and does not require any gradient information. It does assume knowledge of the feature representation and of the mean feature values of the defending model's dataset. FIGA leverages feature importance rankings by perturbing the most important features of the input in the direction of the target class. While FIGA is conceptually similar to other work that uses feature selection processes (e.g., mimicry attacks), we formalize an attack algorithm with three tunable parameters and investigate the strength of FIGA on tabular datasets. We demonstrate the effectiveness of FIGA by evading phishing detection models trained on four different tabular phishing datasets and one financial dataset, with an average success rate of 94%. We extend FIGA to the phishing problem space by limiting the possible perturbations to those that are valid and feasible in the phishing domain.
We generate valid adversarial phishing sites that are visually identical to their unperturbed counterparts and use them to attack six tabular ML models, achieving a 13.05% average success rate.

Submitted 13 January, 2023; v1 submitted 28 June, 2021; originally announced June 2021.
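The core FIGA move, nudging the most important features toward the target class, can be sketched as follows. The paper formalizes three tunable parameters that the abstract does not name. Here `n_features` (how many top-ranked features to change) and `step` (fraction of the gap to the target-class mean) are hypothetical stand-ins, and the importance scores are assumed to be given.

```python
def figa_perturb(x, importances, target_mean, n_features, step):
    """Move the n_features most important features of x toward target_mean.

    n_features and step are hypothetical parameter names; step=1.0 would
    copy the target-class mean into those features.
    """
    # Rank feature indices from most to least important.
    ranked = sorted(range(len(x)), key=lambda i: importances[i], reverse=True)
    x_adv = list(x)  # leave the original sample untouched
    for i in ranked[:n_features]:
        x_adv[i] = x[i] + step * (target_mean[i] - x[i])
    return x_adv


sample = [0.0, 10.0, 5.0]
importance = [0.1, 0.7, 0.2]
benign_mean = [1.0, 0.0, 9.0]  # mean of the target (e.g. benign) class
print(figa_perturb(sample, importance, benign_mean, n_features=2, step=0.5))
# [0.0, 5.0, 7.0]
```

No gradients or knowledge of the model's learning algorithm are used, matching the black-box setting the abstract describes. The attacker only needs a ranking and per-class means.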

arXiv:1908.09207

Demystifying the MLPerf Benchmark Suite

Authors: Snehil Verma, Qinzhe Wu, Bagus Hanindhito, Gunjan Jha, Eugene B. John, Ramesh Radhakrishnan, Lizy K. John

Abstract: MLPerf, an emerging machine learning benchmark suite, strives to cover a broad range of machine learning applications. We present a study of its characteristics and of how the MLPerf benchmarks differ from previous deep learning benchmarks such as DAWNBench and DeepBench. We find that application benchmarks such as MLPerf (although rich in kernels) exhibit different features compared to kernel benchmarks such as DeepBench. The MLPerf benchmark suite contains a diverse set of models, which allows various bottlenecks in the system to be unveiled. Based on our findings, a dedicated low-latency interconnect between GPUs in multi-GPU systems is required for optimal distributed deep learning training. We also observe variation in scaling efficiency across the MLPerf models. The variation exhibited by the different models highlights the importance of smart scheduling strategies for multi-GPU training. Another observation is that CPU utilization increases as the number of GPUs used for training increases. Corroborating prior work, we also observe and quantify the improvements possible through compiler optimizations, mixed-precision training, and the use of Tensor Cores.

Submitted 24 August, 2019; originally announced August 2019.
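Scaling efficiency is conventionally measured as achieved multi-GPU throughput divided by ideal linear scaling of single-GPU throughput. The abstract does not give its exact definition, so this conventional formula is an assumption. The throughput numbers in the example are hypothetical, not measurements from the paper.

```python
def speedup(throughput_1, throughput_n):
    """Speedup of an n-GPU run over a single-GPU run."""
    return throughput_n / throughput_1


def scaling_efficiency(throughput_1, throughput_n, n_gpus):
    """Achieved speedup as a fraction of ideal linear speedup (1.0 = perfect)."""
    return speedup(throughput_1, throughput_n) / n_gpus


# Hypothetical throughputs in samples/s, not MLPerf results.
single = 100.0
multi = {2: 190.0, 4: 340.0, 8: 600.0}
for n, t in multi.items():
    print(n, round(scaling_efficiency(single, t, n), 3))
```

An efficiency that falls as GPUs are added is the kind of per-model variation the abstract links to interconnect bandwidth and scheduling choices.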

arXiv:1609.08583

Survey of Inter-satellite Communication for Small Satellite Systems: Physical Layer to Network Layer View

Authors: Radhika Radhakrishnan, William Edmonson, Fatemeh Afghah, R. Rodriguez-Osorio, Frank Pinto, Scott Burleigh

Abstract: Small satellite systems enable a whole new class of missions for navigation, communications, remote sensing, and scientific research for both civilian and military purposes. As individual spacecraft are limited by size, mass, and power constraints, mass-produced small satellites in large constellations or clusters could be useful in many science missions such as gravity mapping, tracking of forest fires, and finding water resources. Constellations of satellites provide improved spatial and temporal resolution of the target. Small satellite constellations open the door to innovative applications by replacing a single asset with several very capable spacecraft. With increasing levels of autonomy, there will be a need for remote communication networks to enable communication between spacecraft. These space-based networks will need to configure and maintain dynamic routes, manage intermediate nodes, and reconfigure themselves to achieve mission objectives. Hence, inter-satellite communication is a key aspect when satellites fly in formation. In this paper, we present the various research efforts being conducted in the small satellite community to implement inter-satellite communications based on the Open Systems Interconnection (OSI) model. This paper also reviews the various design parameters applicable to the first three layers of the OSI model, i.e., the physical, data link, and network layers.
Based on the survey, we also present a comprehensive list of design parameters useful for achieving inter-satellite communications for multiple small satellite missions. Specific topics include proposed solutions for some of the challenges faced by small satellite systems, enabling operations using a network of small satellites, and some examples of small satellite missions involving formation flying aspects.

Submitted 27 September, 2016; v1 submitted 27 September, 2016; originally announced September 2016.

Comments: 51 pages, 21 Figures, 11 Tables, accepted in IEEE Communications Surveys and Tutorials
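Free-space path loss is one concrete physical-layer design parameter for an inter-satellite link. It grows with link distance d and carrier frequency f: FSPL(dB) = 20 log10(d) + 20 log10(f) + 20 log10(4π/c). This sketch is not taken from the survey. The 1000 km separation and 2.4 GHz carrier are hypothetical inputs.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB between isotropic antennas."""
    return (
        20 * math.log10(distance_m)
        + 20 * math.log10(freq_hz)
        + 20 * math.log10(4 * math.pi / C)
    )


# Hypothetical crosslink: 1000 km separation at 2.4 GHz.
print(f"{fspl_db(1.0e6, 2.4e9):.1f} dB")
```

Because the loss scales with 20 log10(d), doubling the spacing between satellites in a formation costs about 6 dB of link margin. This is one reason constellation geometry and physical-layer choices are designed together.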

arXiv:1411.3071

EMEEDP: Enhanced Multi-hop Energy Efficient Distributed Protocol for Heterogeneous Wireless Sensor Network

Authors: Sunil Kumar, Priya Ranjan, R. Radhakrishnan

Abstract: In a WSN (Wireless Sensor Network), every sensor node senses data and transmits it to the CH (Cluster Head) or BS (Base Station). Sensors are randomly deployed in unreachable areas where battery replacement or recharging is not possible. For this reason, energy conservation is an important design goal when developing routing and distributed protocols that increase the lifetime of a WSN. In this paper, an enhanced energy-efficient distributed protocol for heterogeneous WSNs is reported. EMEEDP is proposed for heterogeneous WSNs to increase the lifetime of the network. An efficient algorithm is presented in the form of a flowchart, and, based on various clustering equations, it is shown that the proposed work achieves a longer lifetime with improved QoS parameters compared to MEEP. A WSN was implemented and tested using Raspberry Pi devices as the base station, temperature sensors as nodes, and xively.com as the cloud. Users access the data on xively.com over the internet for decision-making or business purposes.

Submitted 14 November, 2014; v1 submitted 12 November, 2014; originally announced November 2014.

Comments: 6 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:1409.1412 by other authors
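Clustering protocols in this family are usually evaluated with the first-order radio energy model. In that model, transmit energy has a fixed electronics cost per bit plus an amplifier cost that scales with d² below a crossover distance d0 and with d⁴ above it. The abstract does not state EMEEDP's energy model. Both the model and the constants below (commonly used textbook values) are assumptions.

```python
import math

E_ELEC = 50e-9       # J/bit, transmitter/receiver electronics (assumed value)
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier (assumed value)
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier (assumed value)
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance, about 87.7 m


def tx_energy(k_bits, d):
    """Energy (J) to transmit k_bits over distance d metres."""
    if d < D0:
        return E_ELEC * k_bits + EPS_FS * k_bits * d ** 2
    return E_ELEC * k_bits + EPS_MP * k_bits * d ** 4


def rx_energy(k_bits):
    """Energy (J) to receive k_bits."""
    return E_ELEC * k_bits


# A node 50 m from its cluster head sends a 4000-bit packet.
print(tx_energy(4000, 50), rx_energy(4000))
```

The steep d⁴ term beyond d0 is why clustering pays off. Nodes send short hops to a nearby CH, and only the CH makes the long, expensive hop to the BS.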

arXiv:1001.2250

An Efficient Inter Carrier Interference Cancellation Schemes for OFDM Systems

Authors: B. Sathish Kumar, K. R. Shankar Kumar, R. Radhakrishnan

Abstract: Orthogonal Frequency Division Multiplexing (OFDM) has recently been widely used in wireless communication systems. OFDM is very effective in combating intersymbol interference and can achieve high data rates in frequency-selective channels. In OFDM communication systems, frequency offsets in mobile radio channels distort the orthogonality between subcarriers, resulting in Inter Carrier Interference (ICI). ICI causes power leakage among subcarriers, thus degrading system performance. A well-known problem of OFDM is its sensitivity to frequency offset between the transmitted and received carrier frequencies. Frequency offset has two deleterious effects: the first is a reduction of signal amplitude at the output of the filters matched to each of the carriers, and the second is the introduction of ICI from the other carriers. This research work investigates three effective methods for combating the effects of ICI: ICI Self Cancellation (SC), Maximum Likelihood (ML) estimation, and the Extended Kalman Filter (EKF) method. These three methods are compared in terms of bit error rate performance and bandwidth efficiency. Through simulations, it is shown that the three techniques are effective in mitigating ICI, and that, for the modulation schemes considered, the ML and EKF methods perform better than the SC method.

Submitted 13 January, 2010; originally announced January 2010.

Comments: 8 pages IEEE format, International Journal of Computer Science and Information Security, IJCSIS December 2009, ISSN 1947 5500, http://sites.google.com/site/ijcsis/

Report number: Volume 6, No. 3, ISSN 1947 5500

Journal ref: International Journal of Computer Science and Information Security, IJCSIS, Vol. 6, No. 3, pp. 141-148, December 2009, USA
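Carrier-to-interference ratios (CIR) show why self-cancellation helps. The comparison uses the widely used ICI-coefficient model S(l) = sin(π(l+ε)) / (N sin(π(l+ε)/N)) · exp(jπ(1 − 1/N)(l+ε)). Here S(l) is the leakage from subcarrier k+l onto subcarrier k for a normalized frequency offset ε and N subcarriers. The self-cancellation variant assumed here maps each symbol onto an adjacent subcarrier pair with opposite sign (X[k+1] = −X[k]) and combines the pair at the receiver. This is a generic sketch, not the paper's simulation setup. N = 16 and ε = 0.1 are hypothetical values.

```python
import cmath
import math


def ici_coeff(l, eps, n):
    """Complex leakage S(l) from subcarrier k+l onto subcarrier k."""
    x = l + eps
    magnitude = math.sin(math.pi * x) / (n * math.sin(math.pi * x / n))
    return magnitude * cmath.exp(1j * math.pi * (1 - 1 / n) * x)


def cir_standard(eps, n):
    """Plain OFDM: desired power over leakage from all other subcarriers."""
    def s(l):
        return ici_coeff(l, eps, n)
    return abs(s(0)) ** 2 / sum(abs(s(l)) ** 2 for l in range(1, n))


def cir_self_cancel(eps, n):
    """Adjacent-pair mapping (X[k+1] = -X[k]) with pairwise combining."""
    def s(l):
        return ici_coeff(l, eps, n)
    desired = abs(-s(-1) + 2 * s(0) - s(1)) ** 2
    interference = sum(
        abs(-s(l - 1) + 2 * s(l) - s(l + 1)) ** 2 for l in range(2, n - 1, 2)
    )
    return desired / interference


for name, fn in [("standard", cir_standard), ("self-cancellation", cir_self_cancel)]:
    print(f"{name}: {10 * math.log10(fn(0.1, 16)):.1f} dB")
```

Pairwise combining turns each interference term into a second difference of slowly varying ICI coefficients, which is small, so the CIR rises. The price is that each symbol occupies two subcarriers, halving bandwidth efficiency. That is the trade-off the abstract's bandwidth-efficiency comparison addresses.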

Showing 1–7 of 7 results for author: Radhakrishnan, R