Showing 1–22 of 22 results for author: Dang, X

Searching in archive cs.
  1. arXiv:2407.13739  [pdf, other]

    cs.AI cs.CL cs.SE

    Scaling Granite Code Models to 128K Context

    Authors: Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, Bridget McGinn, Tim Bula, Mayank Mishra, Adriana Meza Soria, Gaoyuan Zhang, Aditya Prasad, Yikang Shen, Saptha Surendran, Shanmukha Guttula, Hima Patel, Parameswaran Selvam, Xuan-Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda

    Abstract: This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling context length of Granite 3B/8B code models from 2K/4K to 128K consists of a light-weight continual pretraining by gradually increasing its RoPE base frequency with repository-level file packing and length-upsampled long-context data. Additionally, we also re…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.03205  [pdf, other]

    cs.CV

    Category-Aware Dynamic Label Assignment with High-Quality Oriented Proposal

    Authors: Mingkui Feng, Hancheng Yu, Xiaoyu Dang, Ming Zhou

    Abstract: Objects in aerial images are typically embedded in complex backgrounds and exhibit arbitrary orientations. When employing oriented bounding boxes (OBB) to represent arbitrary oriented objects, the periodicity of angles could lead to discontinuities in label regression values at the boundaries, inducing abrupt fluctuations in the loss function. To address this problem, an OBB representation based o…

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2405.04324  [pdf, other]

    cs.AI cs.CL cs.SE

    Granite Code Models: A Family of Open Foundation Models for Code Intelligence

    Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, et al. (21 additional authors not shown)

    Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

  4. arXiv:2404.07919  [pdf, other]

    cs.LG cs.AI

    Low-rank Adaptation for Spatio-Temporal Forecasting

    Authors: Weilin Ruan, Wei Chen, Xilin Dang, Jianxiang Zhou, Weichuang Li, Xu Liu, Yuxuan Liang

    Abstract: Spatio-temporal forecasting is crucial in real-world dynamic systems, predicting future changes using historical data from diverse locations. Existing methods often prioritize the development of intricate neural networks to capture the complex dependencies of the data, yet their accuracy fails to show sustained improvement. Besides, these methods also overlook node heterogeneity, hindering customi…

    Submitted 11 April, 2024; originally announced April 2024.

  5. arXiv:2402.18510  [pdf, other]

    cs.LG cs.CL stat.ML

    RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

    Authors: Kaiyue Wen, Xingyu Dang, Kaifeng Lyu

    Abstract: This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems. We focus on understanding whether RNNs, known for their memory efficiency in handling long sequences, can match the performance of Transformers, particularly when enhanced with Chain-of-Thought (CoT) prompting. Our theoretical analysis reveal…

    Submitted 10 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 40 pages, 5 figures, fix a bug in hybrid model training

  6. arXiv:2401.02701  [pdf, ps, other]

    cs.IT eess.SP

    Joint User Association and Power Control for Cell-Free Massive MIMO

    Authors: Chongzheng Hao, Tung Thanh Vu, Hien Quoc Ngo, Minh N. Dao, Xiaoyu Dang, Chenghua Wang, Michail Matthaiou

    Abstract: This work proposes novel approaches that jointly design user equipment (UE) association and power control (PC) in a downlink user-centric cell-free massive multiple-input multiple-output (CFmMIMO) network, where each UE is only served by a set of access points (APs) for reducing the fronthaul signalling and computational complexity. In order to maximize the sum spectral efficiency (SE) of the UEs,…

    Submitted 20 May, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: minor revision of the previous version

  7. arXiv:2311.03485  [pdf, other]

    cs.RO cs.AI

    CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations

    Authors: Xuzhe Dang, Stefan Edelkamp, Nicolas Ribault

    Abstract: This paper presents a novel method for learning reward functions for robotic motions by harnessing the power of a CLIP-based model. Traditional reward function design often hinges on manual feature engineering, which can struggle to generalize across an array of tasks. Our approach circumvents this challenge by capitalizing on CLIP's capability to process both state features and image inputs effec…

    Submitted 6 November, 2023; originally announced November 2023.

  8. arXiv:2310.01232  [pdf, other]

    cs.LG

    Modality-aware Transformer for Financial Time series Forecasting

    Authors: Hajar Emami, Xuan-Hong Dang, Yousaf Shah, Petros Zerfos

    Abstract: Time series forecasting presents a significant challenge, particularly when its accuracy relies on external data sources rather than solely on historical values. This issue is prevalent in the financial sector, where the future behavior of time series is often intricately linked to information derived from various textual reports and a multitude of economic indicators. In practice, the key challen…

    Submitted 20 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

  9. arXiv:2306.00978  [pdf, other]

    cs.CL

    AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

    Authors: Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han

    Abstract: Large language models (LLMs) have transformed numerous AI applications. On-device LLM is becoming increasingly important: running LLMs locally on edge devices can reduce the cloud computing cost and protect users' privacy. However, the astronomical model size and the limited hardware resource pose significant deployment challenges. We propose Activation-aware Weight Quantization (AWQ), a hardware-…

    Submitted 18 July, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: MLSys 2024 Best Paper Award. Code available at: https://github.com/mit-han-lab/llm-awq

  10. arXiv:2306.00262  [pdf, other]

    cs.CV cs.LG

    Maximal Domain Independent Representations Improve Transfer Learning

    Authors: Adrian Shuai Li, Elisa Bertino, Xuan-Hong Dang, Ankush Singla, Yuhai Tu, Mark N Wegman

    Abstract: The most effective domain adaptation (DA) involves the decomposition of data representation into a domain independent representation (DIRep), and a domain dependent representation (DDRep). A classifier is trained by using the DIRep of the labeled source images. Since the DIRep is domain invariant, the classifier can be "transferred" to make predictions for the target domain with no (or few) labels…

    Submitted 6 June, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

  11. arXiv:2212.01635  [pdf, other]

    cs.SE cs.AI

    AI-driven Mobile Apps: an Explorative Study

    Authors: Yinghua Li, Xueqi Dang, Haoye Tian, Tiezhu Sun, Zhijie Wang, Lei Ma, Jacques Klein, Tegawendé F. Bissyandé

    Abstract: The integration of artificial intelligence (AI) into mobile applications has significantly transformed various domains, enhancing user experiences and providing personalized services through advanced machine learning (ML) and deep learning (DL) technologies. AI-driven mobile apps typically refer to applications that leverage ML/DL technologies to perform key tasks such as image recognition and nat…

    Submitted 8 June, 2024; v1 submitted 3 December, 2022; originally announced December 2022.

  12. Survivable Free Space Optical Mesh Network using High-Altitude Platforms

    Authors: Dieu Linh Truong, Xuan Vuong Dang, The Ngoc Dang

    Abstract: Free space optical (FSO) communication refers to the information transmission technology based on the propagation of optical signals in space. FSO communication requires that the transmitter and receiver directly see each other. High-altitude platforms (HAPs) have been proposed for carrying FSO transceivers in the stratosphere. A multihop HAP network with FSO links can relay traffic between ground…

    Submitted 14 February, 2022; originally announced February 2022.

    ACM Class: C.2.1

  13. arXiv:2102.12347  [pdf, other]

    cs.LG cs.AI

    AutoAI-TS: AutoAI for Time Series Forecasting

    Authors: Syed Yousaf Shah, Dhaval Patel, Long Vu, Xuan-Hong Dang, Bei Chen, Peter Kirchner, Horst Samulowitz, David Wood, Gregory Bramble, Wesley M. Gifford, Giridhar Ganapavarapu, Roman Vaculin, Petros Zerfos

    Abstract: A large number of time series forecasting models including traditional statistical models, machine learning models and more recently deep learning have been proposed in the literature. However, choosing the right model along with good parameter values that performs well on a given data is still challenging. Automatically providing a good set of models to users for a given dataset saves both time a…

    Submitted 8 March, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: Accepted for publication at ACM SIGMOD 2021 Industry Track

  14. arXiv:2004.03437  [pdf, other]

    eess.AS cs.CL cs.SD

    Homophone-based Label Smoothing in End-to-End Automatic Speech Recognition

    Authors: Yi Zheng, Xianjie Yang, Xuyong Dang

    Abstract: A new label smoothing method that makes use of prior knowledge of a language at human level, homophone, is proposed in this paper for automatic speech recognition (ASR). Compared with its forerunners, the proposed method uses pronunciation knowledge of homophones in a more complex way. End-to-end ASR models that learn acoustic model and language model jointly and modelling units of characters are…

    Submitted 14 May, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

  15. arXiv:1912.10858  [pdf, other]

    cs.CL cs.LG q-fin.ST stat.ML

    "The Squawk Bot": Joint Learning of Time Series and Text Data Modalities for Automated Financial Information Filtering

    Authors: Xuan-Hong Dang, Syed Yousaf Shah, Petros Zerfos

    Abstract: Multimodal analysis that uses numerical time series and textual corpora as input data sources is becoming a promising approach, especially in the financial industry. However, the main focus of such analysis has been on achieving high prediction accuracy while little effort has been spent on the important task of understanding the association between the two data modalities. Performance on the time…

    Submitted 20 December, 2019; originally announced December 2019.

  16. arXiv:1909.08525  [pdf, other]

    cs.LG stat.ML

    Measure Contribution of Participants in Federated Learning

    Authors: Guan Wang, Charlie Xiaoqian Dang, Ziye Zhou

    Abstract: Federated Machine Learning (FML) creates an ecosystem for multiple parties to collaborate on building models while protecting data privacy for the participants. A measure of the contribution for each party in FML enables fair credits allocation. In this paper we develop simple but powerful techniques to fairly calculate the contributions of multiple parties in FML, in the context of both horizonta…

    Submitted 17 September, 2019; originally announced September 2019.

    Comments: arXiv admin note: text overlap with arXiv:1905.04519

  17. Estimating Feature-Label Dependence Using Gini Distance Statistics

    Authors: Silu Zhang, Xin Dang, Dao Nguyen, Dawn Wilkins, Yixin Chen

    Abstract: Identifying statistical dependence between the features and the label is a fundamental problem in supervised learning. This paper presents a framework for estimating dependence between numerical features and a categorical label using generalized Gini distance, an energy distance in reproducing kernel Hilbert spaces (RKHS). Two Gini distance based dependence measures are explored: Gini distance cov…

    Submitted 5 June, 2019; originally announced June 2019.

  18. arXiv:1812.04448  [pdf, other]

    cs.LG stat.ML

    seq2graph: Discovering Dynamic Dependencies from Multivariate Time Series with Multi-level Attention

    Authors: Xuan-Hong Dang, Syed Yousaf Shah, Petros Zerfos

    Abstract: Discovering temporal lagged and inter-dependencies in multivariate time series data is an important task. However, in many real-world applications, such as commercial cloud management, manufacturing predictive maintenance, and portfolios performance analysis, such dependencies can be non-linear and time-variant, which makes it more challenging to extract such dependencies through traditional metho…

    Submitted 7 December, 2018; originally announced December 2018.

  19. Robust and Efficient Boosting Method using the Conditional Risk

    Authors: Zhi Xiao, Zhe Luo, Bo Zhong, Xin Dang

    Abstract: Well-known for its simplicity and effectiveness in classification, AdaBoost, however, suffers from overfitting when class-conditional distributions have significant overlap. Moreover, it is very sensitive to noise that appears in the labels. This article tackles the above limitations simultaneously via optimizing a modified loss function (i.e., the conditional risk). The proposed approach has the…

    Submitted 21 June, 2018; originally announced June 2018.

    Comments: 14 pages, 2 figures and 5 tables

  20. arXiv:1610.00054  [pdf, other]

    cs.AI cs.LG

    Outlier Detection from Network Data with Subnetwork Interpretation

    Authors: Xuan-Hong Dang, Arlei Silva, Ambuj Singh, Ananthram Swami, Prithwish Basu

    Abstract: Detecting a small number of outliers from a set of data observations is always challenging. This problem is more difficult in the setting of multiple network samples, where computing the anomalous degree of a network sample is generally not sufficient. In fact, explaining why the network is exceptional, expressed in the form of subnetwork, is also equally important. In this paper, we develop a nov…

    Submitted 30 September, 2016; originally announced October 2016.

  21. arXiv:1602.03320  [pdf, other]

    cs.DS cs.SI

    Graph Wavelets via Sparse Cuts: Extended Version

    Authors: Arlei Silva, Xuan-Hong Dang, Prithwish Basu, Ambuj K Singh, Ananthram Swami

    Abstract: Modeling information that resides on vertices of large graphs is a key problem in several real-life applications, ranging from social networks to the Internet-of-things. Signal Processing on Graphs and, in particular, graph wavelets can exploit the intrinsic smoothness of these datasets in order to represent them in a both compact and accurate manner. However, how to discover wavelet bases that ca…

    Submitted 12 June, 2016; v1 submitted 10 February, 2016; originally announced February 2016.

  22. arXiv:1512.06173  [pdf, ps, other]

    cs.LG

    Discriminative Subnetworks with Regularized Spectral Learning for Global-state Network Data

    Authors: Xuan Hong Dang, Ambuj K. Singh, Petko Bogdanov, Hongyuan You, Bayyuan Hsu

    Abstract: Data mining practitioners are facing challenges from data with network structure. In this paper, we address a specific class of global-state networks which comprises of a set of network instances sharing a similar structure yet having different values at local nodes. Each instance is associated with a global state which indicates the occurrence of an event. The objective is to uncover a small set…

    Submitted 18 December, 2015; originally announced December 2015.

    Comments: manuscript for the ECML 2014 paper