Showing 1–20 of 20 results for author: Syed, U

Searching in archive cs.
  1. arXiv:2408.08302  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors

    Authors: Usman Syed, Ethan Light, Xingang Guo, Huan Zhang, Lianhui Qin, Yanfeng Ouyang, Bin Hu

    Abstract: In this paper, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, GPT-4o, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, Llama 3, and Llama 3.1 in solving some selected undergraduate-level transportation engineering problems. We introduce TransportBench, a benchmark dataset that includes a sample of transportation engineering problems on a wide range of…

    Submitted 15 August, 2024; originally announced August 2024.

  2. arXiv:2407.12108  [pdf, other]

    cs.LG cs.CL cs.CR

    Private prediction for large-scale synthetic text generation

    Authors: Kareem Amin, Alex Bie, Weiwei Kong, Alexey Kurakin, Natalia Ponomareva, Umar Syed, Andreas Terzis, Sergei Vassilvitskii

    Abstract: We present an approach for generating differentially private synthetic text using large language models (LLMs), via private prediction. In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees. This is in contrast to approaches that train a generative model on potentially sensitive user-supplied source data and seek to ensure the mod…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 12 pages main text + 15 pages appendix

  3. arXiv:2404.03647  [pdf, other]

    math.OC cs.AI cs.LG

    Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra

    Authors: Darioush Kevian, Usman Syed, Xingang Guo, Aaron Havens, Geir Dullerud, Peter Seiler, Lianhui Qin, Bin Hu

    Abstract: In this paper, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra in solving undergraduate-level control problems. Controls provides an interesting case study for LLM reasoning due to its combination of mathematical theory and engineering design. We introduce ControlBench, a benchmark dataset tailored to reflect the bread…

    Submitted 4 April, 2024; originally announced April 2024.

  4. arXiv:2403.15607  [pdf, other]

    cs.CR cs.CY

    Assessing Web Fingerprinting Risk

    Authors: Enrico Bacis, Igor Bilogrevic, Robert Busa-Fekete, Asanka Herath, Antonio Sartori, Umar Syed

    Abstract: Modern Web APIs allow developers to provide extensively customized experiences for website visitors, but the richness of the device information they provide also makes them vulnerable to being abused to construct browser fingerprints, device-specific identifiers that enable covert tracking of users even when cookies are disabled. Previous research has established entropy, a measure of information…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: A version of this report to appear in the proceedings of The Web Conference (WWW) 2024. This version contains additional material in the appendix

  5. arXiv:2307.05608  [pdf, other]

    cs.CR

    DP-Auditorium: a Large Scale Library for Auditing Differential Privacy

    Authors: William Kong, Andrés Muñoz Medina, Mónica Ribero, Umar Syed

    Abstract: New regulations and increased awareness of data privacy have led to the deployment of new and more efficient differentially private mechanisms across public institutions and industries. Ensuring the correctness of these mechanisms is therefore crucial to ensure the proper protection of data. However, since differential privacy is a property of the mechanism itself, and not of an individual output,…

    Submitted 18 December, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

  6. arXiv:2306.01684  [pdf, other]

    cs.LG cs.CR

    Harnessing large-language models to generate private synthetic text

    Authors: Alexey Kurakin, Natalia Ponomareva, Umar Syed, Liam MacDermed, Andreas Terzis

    Abstract: Differentially private training algorithms like DP-SGD protect sensitive training data by ensuring that trained models do not reveal private information. An alternative approach, which this paper studies, is to use a sensitive dataset to generate synthetic data that is differentially private with respect to the original data, and then non-privately training a model on the synthetic data. Doing so…

    Submitted 10 January, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: 31 pages; 7 figures; compared to previous version added result of LoRa-finetuning

  7. arXiv:2305.18585  [pdf, other]

    cs.CL cs.AI

    Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models

    Authors: Pranath Reddy Kumbam, Sohaib Uddin Syed, Prashanth Thamminedi, Suhas Harish, Ian Perera, Bonnie J. Dorr

    Abstract: The advent of social media has given rise to numerous ethical challenges, with hate speech among the most significant concerns. Researchers are attempting to tackle this problem by leveraging hate-speech detection and employing language models to automatically moderate content and promote civil discourse. Unfortunately, recent studies have revealed that hate-speech detection systems can be misled…

    Submitted 29 May, 2023; originally announced May 2023.

  8. arXiv:2305.07751  [pdf, other]

    cs.LG cs.CR cs.IT math.ST

    Private and Communication-Efficient Algorithms for Entropy Estimation

    Authors: Gecia Bravo-Hermsdorff, Róbert Busa-Fekete, Mohammad Ghavamzadeh, Andres Muñoz Medina, Umar Syed

    Abstract: Modern statistical estimation is often performed in a distributed setting where each sample belongs to a single user who shares their data with a central server. Users are typically concerned with preserving the privacy of their samples, and also with minimizing the amount of data they must transmit to the server. We give improved private and communication-efficient algorithms for estimating sever…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: Originally published at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022). This version corrects some errors in the original version

  9. arXiv:2201.12306  [pdf, other]

    cs.DS cs.CR cs.CY cs.DB stat.CO

    Statistical anonymity: Quantifying reidentification risks without reidentifying users

    Authors: Gecia Bravo-Hermsdorff, Robert Busa-Fekete, Lee M. Gunderson, Andrés Muñoz Medina, Umar Syed

    Abstract: Data anonymization is an approach to privacy-preserving data release aimed at preventing participant reidentification, and it is an important alternative to differential privacy in applications that cannot tolerate noisy data. Existing algorithms for enforcing $k$-anonymity in the released data assume that the curator performing the anonymization has complete access to the original data. Reasons…

    Submitted 28 January, 2022; originally announced January 2022.

  10. arXiv:2110.02159  [pdf, other]

    cs.LG cs.CR cs.DS cs.IT

    Label differential privacy via clustering

    Authors: Hossein Esfandiari, Vahab Mirrokni, Umar Syed, Sergei Vassilvitskii

    Abstract: We present new mechanisms for \emph{label differential privacy}, a relaxation of differentially private machine learning that only protects the privacy of the labels in the training set. Our mechanisms cluster the examples in the training set using their (non-private) feature vectors, randomly re-sample each label from examples in the same cluster, and output a training set with noisy labels as we…

    Submitted 5 October, 2021; originally announced October 2021.

  11. arXiv:2007.01181  [pdf, other]

    cs.LG cs.CR stat.ML

    Private Optimization Without Constraint Violations

    Authors: Andrés Muñoz Medina, Umar Syed, Sergei Vassilvitskii, Ellen Vitercik

    Abstract: We study the problem of differentially private optimization with linear constraints when the right-hand-side of the constraints depends on private data. This type of problem appears in many applications, especially resource allocation. Previous research provided solutions that retained privacy but sometimes violated the constraints. In many settings, however, the constraints cannot be violated und…

    Submitted 3 November, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

  12. arXiv:1906.06781  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory

    Authors: Bin Hu, Usman Ahmed Syed

    Abstract: In this paper, we provide a unified analysis of temporal difference learning algorithms with linear function approximators by exploiting their connections to Markov jump linear systems (MJLS). We tailor the MJLS theory developed in the control community to characterize the exact behaviors of the first and second order moments of a large family of temporal difference learning algorithms. For both t…

    Submitted 4 November, 2019; v1 submitted 16 June, 2019; originally announced June 2019.

    Comments: To appear in NeurIPS 2019

  13. arXiv:1703.03111  [pdf, other]

    cs.GT cs.LG

    Statistical Cost Sharing

    Authors: Eric Balkanski, Umar Syed, Sergei Vassilvitskii

    Abstract: We study the cost sharing problem for cooperative games in situations where the cost function $C$ is not available via oracle queries, but must instead be derived from data, represented as tuples $(S, C(S))$, for different subsets $S$ of players. We formalize this approach, which we call statistical cost sharing, and consider the computation of the core and the Shapley value, when the tuples are d…

    Submitted 8 March, 2017; originally announced March 2017.

  14. arXiv:1504.01117  [pdf, ps, other]

    cs.DB

    An $\tilde{O}(\frac{1}{\sqrt{T}})$-error online algorithm for retrieving heavily perturbated statistical databases in the low-dimensional querying mode

    Authors: Krzysztof Choromanski, Afshin Rostamizadeh, Umar Syed

    Abstract: We give the first $\tilde{O}(\frac{1}{\sqrt{T}})$-error online algorithm for reconstructing noisy statistical databases, where $T$ is the number of (online) sample queries received. The algorithm, which requires only $O(\log T)$ memory, aims to learn a hidden database-vector $w^{*} \in \mathbb{R}^{D}$ in order to accurately answer a stream of queries regarding the hidden database, which arrive in…

    Submitted 5 April, 2015; originally announced April 2015.

  15. arXiv:1407.8320  [pdf]

    cs.SE

    An Implementation of Web Services for Inter-Connectivity of Information Systems

    Authors: Aftab Ahmed Chandio, Dingju Zhu, Ali Hassan Sodhro, Muhammad Umer Syed

    Abstract: As educational institutions and their departments rapidly increase, communication between their end-users becomes more and more difficult in traditional online management systems (OMS). However, the end-users, i.e., employees, teaching staff, and students, are associated with different sub-domains and use different subsystems that are executed on different platforms following different administra…

    Submitted 31 July, 2014; originally announced July 2014.

    Comments: 7 pages, 5 figures, (Accepted for the Int. J. Com. Dig. Sys. Vol. 3, No. 3, ISSN. 2210-142X)

    Journal ref: International Journal of Computing and Digital Systems (IJCDS), Vol. 3, No. 3, pp. (2014)

  16. arXiv:1407.1466  [pdf]

    cs.CY

    The Smart Shower

    Authors: Umair Atique Syed, Uma Kandan Muniandy

    Abstract: The smart shower is an intelligent device that saves water during a shower. It uses indicator lamps that inform the user of the amount of water. Like a traffic signal, it has three sets of lamps (green, yellow, and red), each indicating the amount of time spent. The device's brain is the Siemens Logo PLC.

    Submitted 6 July, 2014; originally announced July 2014.

    Comments: 2 Pages, 3 Figures

  17. arXiv:1311.6838  [pdf, ps, other]

    cs.LG cs.GT

    Learning Prices for Repeated Auctions with Strategic Buyers

    Authors: Kareem Amin, Afshin Rostamizadeh, Umar Syed

    Abstract: Inspired by real-time ad exchanges for online display advertising, we consider the problem of inferring a buyer's value distribution for a good when the buyer is repeatedly interacting with a seller through a posted-price mechanism. We model the buyer as a strategic agent, whose goal is to maximize her long-term surplus, and we are interested in mechanisms that maximize the seller's long-term reve…

    Submitted 26 November, 2013; originally announced November 2013.

    Comments: Neural Information Processing Systems (NIPS 2013)

  18. arXiv:1206.5290  [pdf]

    cs.LG cs.AI stat.ML

    Imitation Learning with a Value-Based Prior

    Authors: Umar Syed, Robert E. Schapire

    Abstract: The goal of imitation learning is for an apprentice to learn how to behave in a stochastic environment by observing a mentor demonstrating the correct behavior. Accurate prior knowledge about the correct behavior can reduce the need for demonstrations from the mentor. We present a novel approach to encoding prior knowledge about the correct behavior, where we assume that this prior knowledge takes…

    Submitted 20 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)

    Report number: UAI-P-2007-PG-384-391

  19. arXiv:1202.3782  [pdf]

    cs.LG cs.AI stat.ML

    Graphical Models for Bandit Problems

    Authors: Kareem Amin, Michael Kearns, Umar Syed

    Abstract: We introduce a rich class of graphical models for multi-armed bandit problems that permit both the state or context space and the action space to be very large, yet succinctly specify the payoffs for any context-action pair. Our main result is an algorithm for such models whose regret is bounded by the number of parameters and whose running time depends only on the treewidth of the graph substruct…

    Submitted 14 February, 2012; originally announced February 2012.

    Report number: UAI-P-2011-PG-1-10

  20. arXiv:1007.3799  [pdf, ps, other]

    cs.LG

    Adapting to the Shifting Intent of Search Queries

    Authors: Umar Syed, Aleksandrs Slivkins, Nina Mishra

    Abstract: Search engines today present results that are often oblivious to abrupt shifts in intent. For example, the query `independence day' usually refers to a US holiday, but the intent of this query abruptly changed during the release of a major film by that name. While no studies exactly quantify the magnitude of intent-shifting traffic, studies suggest that news events, seasonal topics, pop culture, e…

    Submitted 22 July, 2010; originally announced July 2010.

    Comments: This is the full version of the paper in NIPS'09