
Showing 51–100 of 111 results for author: Ray, B

  1. arXiv:2112.10893

    cs.SE cs.LG

    VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements

    Authors: Yangruibo Ding, Sahil Suneja, Yunhui Zheng, Jim Laredo, Alessandro Morari, Gail Kaiser, Baishakhi Ray

    Abstract: Automatically locating vulnerable statements in source code is crucial to assure software security and alleviate developers' debugging efforts. This becomes even more important in today's software ecosystem, where vulnerable code can flow easily and unwittingly within and across software repositories like GitHub. Across such millions of lines of code, traditional static and dynamic approaches stru…

    Submitted 12 January, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: Camera Ready for Research Track of 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2022)

  2. arXiv:2112.00964

    cs.SE cs.AI cs.LG cs.RO

    A Survey on Scenario-Based Testing for Automated Driving Systems in High-Fidelity Simulation

    Authors: Ziyuan Zhong, Yun Tang, Yuan Zhou, Vania de Oliveira Neves, Yang Liu, Baishakhi Ray

    Abstract: Automated Driving Systems (ADSs) have seen rapid progress in recent years. To ensure the safety and reliability of these systems, extensive testing is being conducted before their future mass deployment. Testing the system on the road is the most realistic and desirable approach, but it is incredibly costly. Also, it is infeasible to cover rare corner cases using such real-world testing.…

    Submitted 1 December, 2021; originally announced December 2021.

  3. arXiv:2110.03868

    cs.PL cs.AI cs.LG cs.SE

    Towards Learning (Dis)-Similarity of Source Code from Program Contrasts

    Authors: Yangruibo Ding, Luca Buratti, Saurabh Pujar, Alessandro Morari, Baishakhi Ray, Saikat Chakraborty

    Abstract: Understanding the functional (dis)-similarity of source code is important for code modeling tasks such as software vulnerability and code clone detection. We present DISCO (DIS-similarity of COde), a novel self-supervised model focusing on identifying (dis)similar functionalities of source code. Unlike existing works, our approach does not require a huge amount of randomly collected datas…

    Submitted 20 March, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: ACL 2022 Camera-Ready

  4. arXiv:2109.06404

    cs.RO cs.AI cs.LG cs.SE

    Detecting Multi-Sensor Fusion Errors in Advanced Driver-Assistance Systems

    Authors: Ziyuan Zhong, Zhisheng Hu, Shengjian Guo, Xinyang Zhang, Zhenyu Zhong, Baishakhi Ray

    Abstract: Advanced Driver-Assistance Systems (ADAS) have been thriving and widely deployed in recent years. In general, these systems receive sensor data, compute driving decisions, and output control signals to the vehicles. To smooth out the uncertainties brought by sensor outputs, they usually leverage multi-sensor fusion (MSF) to fuse the sensor outputs and produce a more reliable understanding of the s…

    Submitted 25 May, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

  5. arXiv:2109.06126

    cs.SE cs.LG cs.NE cs.RO

    Neural Network Guided Evolutionary Fuzzing for Finding Traffic Violations of Autonomous Vehicles

    Authors: Ziyuan Zhong, Gail Kaiser, Baishakhi Ray

    Abstract: Self-driving cars and trucks, autonomous vehicles (AVs), should not be accepted by regulatory bodies and the public until they have much higher confidence in their safety and reliability -- which can most practically and convincingly be achieved by testing. But existing testing methods are inadequate for checking the end-to-end behaviors of AV controllers against complex, real-world corner cases i…

    Submitted 21 July, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

  6. arXiv:2108.11601

    cs.SE cs.CL

    Retrieval Augmented Code Generation and Summarization

    Authors: Md Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

    Abstract: Software developers write a lot of source code and documentation during software development. Intrinsically, developers often recall parts of source code or code summaries that they had written in the past while implementing software or documenting them. To mimic developers' code or summary generation behavior, we propose a retrieval augmented framework, REDCODER, that retrieves relevant code or s…

    Submitted 10 September, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

    Comments: accepted in EMNLP-Findings 2021

  7. arXiv:2108.06645

    cs.SE cs.LG cs.PL

    On Multi-Modal Learning of Editing Source Code

    Authors: Saikat Chakraborty, Baishakhi Ray

    Abstract: In recent years, Neural Machine Translation (NMT) has shown promise in automatically editing source code. A typical NMT-based code editor considers only the code that needs to be changed as input and presents developers with a ranked list of patched code to choose from, where the correct one may not always be at the top of the list. While NMT-based code editing systems generate a broad spectrum of p…

    Submitted 14 August, 2021; originally announced August 2021.

    Comments: Accepted for publication at the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE 2021)

  8. arXiv:2103.06333

    cs.CL cs.PL

    Unified Pre-training for Program Understanding and Generation

    Authors: Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

    Abstract: Code summarization and generation empower conversion between programming language (PL) and natural language (NL), while code translation facilitates the migration of legacy code from one PL to another. This paper introduces PLBART, a sequence-to-sequence model capable of performing a broad spectrum of program and language understanding and generation tasks. PLBART is pre-trained on an extensive collect…

    Submitted 10 April, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

    Comments: NAACL 2021 (camera ready)

  9. arXiv:2012.08680

    cs.CR cs.LG cs.SE

    Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity

    Authors: Kexin Pei, Zhou Xuan, Junfeng Yang, Suman Jana, Baishakhi Ray

    Abstract: Detecting semantically similar functions -- a crucial analysis capability with broad real-world security usages including vulnerability detection, malware lineage, and forensics -- requires understanding function behaviors and intentions. This task is challenging as semantically similar functions can be implemented differently, run on different architectures, and compiled with diverse compiler opt…

    Submitted 26 April, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

  10. arXiv:2010.06080

    stat.ML cs.LG q-bio.QM

    Point Process Modeling of Drug Overdoses with Heterogeneous and Missing Data

    Authors: Xueying Liu, Jeremy Carter, Brad Ray, George Mohler

    Abstract: Opioid overdose rates have increased in the United States over the past decade and reflect a major public health crisis. Modeling and prediction of drug and opioid hotspots, where a high percentage of events fall in a small percentage of space-time, could help better focus limited social and health services. In this work we present a spatial-temporal point process model for drug overdose clusterin…

    Submitted 12 October, 2020; originally announced October 2020.

  11. arXiv:2010.06061

    cs.SE eess.SY

    CADET: Debugging and Fixing Misconfigurations using Counterfactual Reasoning

    Authors: Rahul Krishna, Md Shahriar Iqbal, Mohammad Ali Javidian, Baishakhi Ray, Pooyan Jamshidi

    Abstract: Modern computing platforms are highly configurable, with thousands of interacting configurations. However, configuring these systems is challenging. Erroneous configurations can cause unexpected non-functional faults. This paper proposes CADET (short for Causal Debugging Toolkit), which enables users to identify, explain, and fix the root cause of non-functional faults early and in a principled fashi…

    Submitted 8 March, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

  12. arXiv:2010.04821

    cs.SE cs.AI cs.CV cs.LG

    Understanding Local Robustness of Deep Neural Networks under Natural Variations

    Authors: Ziyuan Zhong, Yuchi Tian, Baishakhi Ray

    Abstract: Deep Neural Networks (DNNs) are being deployed in a wide range of settings today, from safety-critical applications like autonomous driving to commercial applications involving image classification. However, recent research has shown that DNNs can be brittle to even slight variations of the input data. Therefore, rigorous testing of DNNs has gained widespread attention. While DNN robustness und…

    Submitted 22 January, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

  13. arXiv:2009.11658

    physics.flu-dyn

    Effects of surface topography on low Reynolds number droplet/bubble flow through constricted passage

    Authors: Aditya Singla, Bahni Ray

    Abstract: This paper is an attempt to study the effects of surface topography on the flow of a droplet (or a bubble) in a low Reynolds number flow regime. Multiphase flows through a constricted passage find many interesting applications in chemistry and biology. The main parameters which determine the flow properties such as flow rate and pressure drop, and govern the complex multiphase phenomena such as dr…

    Submitted 28 November, 2020; v1 submitted 24 September, 2020; originally announced September 2020.

    Comments: 29 pages, 21 figures

  14. arXiv:2009.08525

    cs.SE cs.AI cs.LG

    Deep Learning & Software Engineering: State of Research and Future Directions

    Authors: Prem Devanbu, Matthew Dwyer, Sebastian Elbaum, Michael Lowry, Kevin Moran, Denys Poshyvanyk, Baishakhi Ray, Rishabh Singh, Xiangyu Zhang

    Abstract: Given the current transformative potential of research that sits at the intersection of Deep Learning (DL) and Software Engineering (SE), an NSF-sponsored community workshop was conducted in co-location with the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE'19) in San Diego, California. The goal of this workshop was to outline high priority areas for cross-cutting r…

    Submitted 17 September, 2020; originally announced September 2020.

    Comments: Community Report from the 2019 NSF Workshop on Deep Learning & Software Engineering, 37 pages

  15. arXiv:2009.07235

    cs.SE

    Deep Learning based Vulnerability Detection: Are We There Yet?

    Authors: Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, Baishakhi Ray

    Abstract: Automated detection of software vulnerabilities is a fundamental problem in software security. Existing program analysis techniques suffer from either high false-positive or high false-negative rates. Recent progress in Deep Learning (DL) has resulted in a surge of interest in applying DL for automated vulnerability detection. Several recent studies have demonstrated promising results achieving an accuracy…

    Submitted 3 September, 2020; originally announced September 2020.

    Comments: Under review at IEEE Transactions on Software Engineering

  16. arXiv:2008.10707

    cs.SE cs.LG cs.PL

    Patching as Translation: the Data and the Metaphor

    Authors: Yangruibo Ding, Baishakhi Ray, Premkumar Devanbu, Vincent J. Hellendoorn

    Abstract: Machine Learning models from other fields, like Computational Linguistics, have been transplanted to Software Engineering tasks, often quite successfully. Yet a transplanted model's initial success at a given task does not necessarily mean it is well-suited for the task. In this work, we examine a common example of this phenomenon: the conceit that "software patching is like language translation".…

    Submitted 31 August, 2020; v1 submitted 24 August, 2020; originally announced August 2020.

  17. arXiv:2008.07779

    cs.LG stat.ML

    Predicting Future Sales of Retail Products using Machine Learning

    Authors: Devendra Swami, Alay Dilipbhai Shah, Subhrajeet K B Ray

    Abstract: Techniques for making future predictions based on present and past data have always had direct applications to various real-life problems. This paper discusses one such problem. The problem statement is provided by Kaggle, where it also runs as an ongoing competition. In this project, we worked with a challenging time-series dataset consisting o…

    Submitted 18 August, 2020; originally announced August 2020.

    Comments: 6 pages, 4 images

  18. Effect of doping on SGS and weak half-metallic properties of inverse Heusler Alloys

    Authors: R. Dhakal, S. Nepal, R. B. Ray, R. Paudel, G. C. Kaphle

    Abstract: Heusler alloys with Mn and Co have been found to exhibit interesting electronic and magnetic properties. Mn$_2$CoAl is a well-known spin-gapless semiconductor (SGS) compound, while Mn$_2$CoGa has weak half-metallic character. Using the plane-wave pseudopotential method, we studied the effect of Fe and Cr doping on the half-metallicity and magnetism of these compounds. The doping destroys the SGS nature of Mn$_2$CoAl while the small-s…

    Submitted 11 August, 2020; originally announced August 2020.

    Journal ref: Journal of Magnetism and Magnetic Materials 503 (2020), 166588

  19. arXiv:2007.07236

    cs.CV cs.CR cs.LG

    Multitask Learning Strengthens Adversarial Robustness

    Authors: Chengzhi Mao, Amogh Gupta, Vikram Nitin, Baishakhi Ray, Shuran Song, Junfeng Yang, Carl Vondrick

    Abstract: Although deep networks achieve strong accuracy on a range of computer vision benchmarks, they remain vulnerable to adversarial attacks, where imperceptible input perturbations fool the network. We present both theoretical and empirical analyses that connect the adversarial robustness of a model to the number of tasks that it is trained on. Experiments on two datasets show that attack difficulty in…

    Submitted 10 September, 2020; v1 submitted 14 July, 2020; originally announced July 2020.

  20. MTFuzz: Fuzzing with a Multi-Task Neural Network

    Authors: Dongdong She, Rahul Krishna, Lu Yan, Suman Jana, Baishakhi Ray

    Abstract: Fuzzing is a widely used technique for detecting software bugs and vulnerabilities. Most popular fuzzers generate new inputs using an evolutionary search to maximize code coverage. Essentially, these fuzzers start with a set of seed inputs, mutate them to generate new inputs, and identify the promising inputs using an evolutionary fitness function for further mutation. Despite their success, evolu…

    Submitted 11 September, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

    Comments: ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 2020

  21. arXiv:2005.11498

    cs.SE cs.LG

    Pythia: Grammar-Based Fuzzing of REST APIs with Coverage-guided Feedback and Learning-based Mutations

    Authors: Vaggelis Atlidakis, Roxana Geambasu, Patrice Godefroid, Marina Polishchuk, Baishakhi Ray

    Abstract: This paper introduces Pythia, the first fuzzer that augments grammar-based fuzzing with coverage-guided feedback and a learning-based mutation strategy for stateful REST API fuzzing. Pythia uses a statistical model to learn common usage patterns of a target REST API from structurally valid seed inputs. It then generates learning-based mutations by injecting a small amount of noise deviating from c…

    Submitted 23 May, 2020; originally announced May 2020.

  22. arXiv:2005.00653

    cs.SE cs.AI cs.LG stat.ML

    A Transformer-based Approach for Source Code Summarization

    Authors: Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

    Abstract: Generating a readable summary that describes the functionality of a program is known as source code summarization. In this task, learning code representation by modeling the pairwise relationship between code tokens to capture their long-range dependencies is crucial. To learn code representation for summarization, we explore the Transformer model that uses a self-attention mechanism and has shown…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: Accepted at ACL 2020

  23. arXiv:1911.07393

    cs.SE cs.PL

    Rebuttal to Berger et al., TOPLAS 2019

    Authors: Baishakhi Ray, Prem Devanbu, Vladimir Filkov

    Abstract: Berger et al., published in TOPLAS 2019, is a critique of our 2014 FSE conference abstract and its archival version, the 2017 CACM paper: A Large-Scale Study of Programming Languages and Code Quality in Github. In their paper Berger et al. make academic claims about the veracity of our work. Here, we respond to their technical and scientific critiques aimed at our work, attempting to stick with sc…

    Submitted 17 November, 2019; originally announced November 2019.

    Comments: 12 pages

  24. Towards the Avoidance of Counterfeit Memory: Identifying the DRAM Origin

    Authors: B. M. S. Bahar Talukder, Vineetha Menon, Biswajit Ray, Tempestt Neal, Md Tauhidur Rahman

    Abstract: Due to the globalization of the semiconductor supply chain, counterfeit dynamic random-access memory (DRAM) chips/modules have been spreading worldwide at an alarming rate. Deploying counterfeit DRAM modules into an electronic system can have severe security and reliability consequences because of their sub-standard quality, poor performance, and shorter life span. Besides, studies sugg…

    Submitted 8 November, 2019; originally announced November 2019.

    Journal ref: IEEE Hardware-Oriented Security and Trust Symposium (HOST), 2020

  25. arXiv:1910.09644

    eess.SY cs.LG cs.SE

    ConEx: Efficient Exploration of Big-Data System Configurations for Better Performance

    Authors: Rahul Krishna, Chong Tang, Kevin Sullivan, Baishakhi Ray

    Abstract: Configuration space complexity makes big-data software systems hard to configure well. Consider Hadoop: with over nine hundred parameters, developers often just use the default configurations provided with Hadoop distributions. The opportunity costs in lost performance are significant. Popular learning-based approaches to auto-tuning software do not scale well for big-data systems because of t…

    Submitted 22 June, 2020; v1 submitted 17 October, 2019; originally announced October 2019.

  26. arXiv:1910.02354

    cs.CV cs.LG eess.IV

    AdvSPADE: Realistic Unrestricted Attacks for Semantic Segmentation

    Authors: Guangyu Shen, Chengzhi Mao, Junfeng Yang, Baishakhi Ray

    Abstract: Due to the inherent robustness of segmentation models, traditional norm-bounded attack methods show limited effect on such models. In this paper, we focus on generating unrestricted adversarial examples for semantic segmentation models. We demonstrate a simple and effective method to generate unrestricted adversarial examples using conditional generative adversarial networks (CGAN) without…

    Submitted 18 November, 2019; v1 submitted 5 October, 2019; originally announced October 2019.

  27. arXiv:1909.00900

    cs.LG cs.CR cs.CV cs.IR stat.ML

    Metric Learning for Adversarial Robustness

    Authors: Chengzhi Mao, Ziyuan Zhong, Junfeng Yang, Carl Vondrick, Baishakhi Ray

    Abstract: Deep networks are well-known to be fragile to adversarial attacks. We conduct an empirical analysis of deep representations under the state-of-the-art attack method called PGD, and find that the attack causes the internal representation to shift closer to the "false" class. Motivated by this observation, we propose to regularize the representation space under attack with metric learning to produce…

    Submitted 27 October, 2019; v1 submitted 2 September, 2019; originally announced September 2019.

  28. Coexisting 1T/2H polymorphs, reentrant resistivity behavior, and charge distribution in MoS2-hBN 2D/2D composite thin films

    Authors: Swati Parmar, Abhijit Biswas, Sachin Kumar Singh, Bishakha Ray, Saurabh Parmar, Suresh Gosavi, Vasant Sathe, Ram Janay Choudhary, Suwarna Datar, Satishchandra Ogale

    Abstract: In view of their immensely intriguing properties, two-dimensional materials are being intensely researched in search of novel phenomena and diverse application interests. However, studies on the realization of nanocomposites in the application-worthy thin-film platform are rare. Here we have grown MoS2-hBN composite thin films on different substrates by the pulsed laser deposition technique and ma…

    Submitted 31 July, 2019; originally announced July 2019.

    Comments: 9 Figures, Published in Physical Review Materials

    Journal ref: Phys. Rev. Materials 3, 074007 (2019)

  29. Symmetry-breaking signatures of multiple Majorana zero modes in one-dimensional spin-triplet superconductors

    Authors: Arnab Barman Ray, Jay D. Sau, Ipsita Mandal

    Abstract: We study the effects of various symmetry-breaking perturbations on the experimentally measurable signatures (such as conductance and Josephson response) of quasi-one-dimensional (quasi-1D) spin-triplet superconductors. In the first part of the paper, we numerically compute the zero and nonzero temperature conductances of the quasi-1D nanowires that host multiple Majorana zero modes. Following the…

    Submitted 30 September, 2021; v1 submitted 24 July, 2019; originally announced July 2019.

    Comments: journal version published in PRB

    Journal ref: Phys. Rev. B 104, 104513 (2021)

  30. arXiv:1907.03756

    cs.CR

    Neutaint: Efficient Dynamic Taint Analysis with Neural Networks

    Authors: Dongdong She, Yizheng Chen, Abhishek Shah, Baishakhi Ray, Suman Jana

    Abstract: Dynamic taint analysis (DTA) is widely used by various applications to track information flow during runtime execution. Existing DTA techniques use rule-based taint-propagation, which is neither accurate (i.e., high false positive) nor efficient (i.e., large runtime overhead). It is hard to specify taint rules for each operation while covering all corner cases correctly. Moreover, the overtaint an…

    Submitted 3 September, 2019; v1 submitted 8 July, 2019; originally announced July 2019.

    Comments: To appear in the 41st IEEE Symposium on Security and Privacy, May 18--20, 2020, San Francisco, CA, USA

  31. arXiv:1905.07831

    cs.SE cs.CV cs.LG

    Testing DNN Image Classifiers for Confusion & Bias Errors

    Authors: Yuchi Tian, Ziyuan Zhong, Vicente Ordonez, Gail Kaiser, Baishakhi Ray

    Abstract: Image classifiers are an important component of today's software, from consumer and business applications to safety-critical domains. The advent of Deep Neural Networks (DNNs) is the key catalyst behind such widespread success. However, wide adoption comes with serious concerns about the robustness of software systems dependent on DNNs for image classification, as several severe erroneous behavio…

    Submitted 11 February, 2020; v1 submitted 19 May, 2019; originally announced May 2019.

  32. arXiv:1811.09862

    cs.LG cs.CV stat.ML

    On Periodic Functions as Regularizers for Quantization of Neural Networks

    Authors: Maxim Naumov, Utku Diril, Jongsoo Park, Benjamin Ray, Jedrzej Jablonski, Andrew Tulloch

    Abstract: Deep learning models have been successfully used in computer vision and many other fields. We propose an unorthodox algorithm for performing quantization of the model parameters. In contrast with popular quantization schemes based on thresholds, we use a novel technique based on periodic functions, such as continuous trigonometric sine or cosine as well as non-continuous hat functions. We apply th…

    Submitted 24 November, 2018; originally announced November 2018.

    Comments: 11 pages, 7 figures

    MSC Class: 68T05 ACM Class: I.2.6; I.5.0

  33. CODIT: Code Editing with Tree-Based Neural Models

    Authors: Saikat Chakraborty, Yangruibo Ding, Miltiadis Allamanis, Baishakhi Ray

    Abstract: The way developers edit day-to-day code tends to be repetitive, often using existing code elements. Many researchers have tried to automate repetitive code changes by learning from specific change templates which are applied to limited scope. The advancement of deep neural networks and the availability of vast open-source evolutionary data open up the possibility of automatically learning those t…

    Submitted 25 August, 2020; v1 submitted 30 September, 2018; originally announced October 2018.

    Report number: 9181462

    Journal ref: IEEE Transactions on Software Engineering, 2022, Volume 48, Number 4

  34. arXiv:1809.08520

    physics.ins-det

    State-of-the-Art Flash Chips for Dosimetry Applications

    Authors: Preeti Kumari, Levi Davies, Narayana P. Bhat, En Xia Zhang, Michael W. McCurdy, Daniel M. Fleetwood, Biswajit Ray

    Abstract: In this paper we show that state-of-the-art commercial off-the-shelf Flash memory chip technology (20 nm technology node with multi-level cells) is quite sensitive to ionizing radiation. We find that the fail-bit count in these Flash chips starts to increase monotonically with gamma or X-ray dose at 100 rad(SiO2). Significantly more fail bits are observed in X-ray irradiated devices, most likely d…

    Submitted 22 September, 2018; originally announced September 2018.

  35. arXiv:1808.02911

    cs.SE cs.IR

    A Case Study on the Impact of Similarity Measure on Information Retrieval based Software Engineering Tasks

    Authors: Md Masudur Rahman, Saikat Chakraborty, Gail Kaiser, Baishakhi Ray

    Abstract: Information Retrieval (IR) plays a pivotal role in diverse Software Engineering (SE) tasks, e.g., bug localization and triaging, code retrieval, requirements analysis, etc. The choice of similarity measure is the core component of an IR technique. The performance of any IR method critically depends on selecting an appropriate similarity measure for the given application domain. Since different SE…

    Submitted 8 August, 2018; originally announced August 2018.

    Comments: 22 pages, under submission

  36. PreLatPUF: Exploiting DRAM Latency Variations for Generating Robust Device Signatures

    Authors: B. M. S. Bahar Talukder, Biswajit Ray, Domenic Forte, Md Tauhidur Rahman

    Abstract: Physically Unclonable Functions (PUFs) are potential security blocks to generate unique and more secure keys in low-cost cryptographic applications. Dynamic random-access memory (DRAM) has been proposed as one of the promising candidates for generating robust keys. Unfortunately, the existing techniques of generating device signatures from DRAM are very slow, destructive (destroy the current data),…

    Submitted 31 July, 2019; v1 submitted 7 August, 2018; originally announced August 2018.

    Journal ref: IEEE Access, vol. 7, pp. 81106-81120, 2019

  37. Exploiting DRAM Latency Variations for Generating True Random Numbers

    Authors: B. M. S. Bahar Talukder, Joseph Kerns, Biswajit Ray, Thomas Morris, Md Tauhidur Rahman

    Abstract: True random number generators (TRNGs) play a vital role in a variety of security applications and protocols. The security and privacy of an asset rely on encryption, which depends solely on the quality of random numbers. Memory chips are widely used for generating random numbers because of their prevalence in modern electronic systems. Unfortunately, existing Dynamic Random-access Memory (DRAM)…

    Submitted 7 November, 2018; v1 submitted 6 August, 2018; originally announced August 2018.

  38. arXiv:1807.05620

    cs.CR cs.LG

    NEUZZ: Efficient Fuzzing with Neural Program Smoothing

    Authors: Dongdong She, Kexin Pei, Dave Epstein, Junfeng Yang, Baishakhi Ray, Suman Jana

    Abstract: Fuzzing has become the de facto standard technique for finding software vulnerabilities. However, even state-of-the-art fuzzers are not very efficient at finding hard-to-trigger software bugs. Most popular fuzzers use evolutionary guidance to generate inputs that can trigger different bugs. Such evolutionary algorithms, while fast and simple to implement, often get stuck in fruitless sequences of…

    Submitted 12 July, 2019; v1 submitted 15 July, 2018; originally announced July 2018.

    Comments: To appear in the 40th IEEE Symposium on Security and Privacy, May 20--22, 2019, San Francisco, CA, USA

  39. arXiv:1806.02432

    cs.SE cs.CR

    Obfuscation Resilient Search through Executable Classification

    Authors: Fang-Hsiang Su, Jonathan Bell, Gail Kaiser, Baishakhi Ray

    Abstract: Android applications are usually obfuscated before release, making it difficult to analyze them for malware presence or intellectual property violations. Obfuscators might hide the true intent of code by renaming variables and/or modifying program structures. It is challenging to search for executables relevant to an obfuscated application for developers to analyze efficiently. Prior approaches to…

    Submitted 11 June, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

    Comments: MAPL, 2018 (Workshop co-located with PLDI 2018)

  40. arXiv:1805.04836

    cs.CL

    Building Language Models for Text with Named Entities

    Authors: Md Rizwan Parvez, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

    Abstract: Text in many domains involves a significant amount of named entities. Predicting the entity names is often challenging for a language model as they appear less frequently in the training corpus. In this paper, we propose a novel and effective approach to building a discriminative language model which can learn the entity names by leveraging their entity type information. We also introduce two benc…

    Submitted 13 May, 2018; originally announced May 2018.

  41. arXiv:1803.08612

    cs.SE cs.IR

    Evaluating How Developers Use General-Purpose Web-Search for Code Retrieval

    Authors: Md Masudur Rahman, Jed Barson, Sydney Paul, Joshua Kayan, Federico Andres Lois, Sebastian Fernandez Quezada, Christopher Parnin, Kathryn T. Stolee, Baishakhi Ray

    Abstract: Search is an integral part of a software development process. Developers often use search engines to look for information during development, including reusable code snippets, API understanding, and reference examples. Developers tend to prefer general-purpose search engines like Google, which are often not optimized for code-related documents and use search strategies and ranking techniques that…

    Submitted 22 March, 2018; originally announced March 2018.

    Comments: Accepted at MSR-2018

  42. arXiv:1802.06947

    cs.SE

    Entropy Guided Spectrum Based Bug Localization Using Statistical Language Model

    Authors: Saikat Chakraborty, Yujian Li, Matt Irvine, Ripon Saha, Baishakhi Ray

    Abstract: Locating bugs is one of the most important yet challenging activities in the software development and maintenance phase, because there are no definite rules to identify all types of bugs. Existing automatic bug localization tools use various heuristics based on test coverage, pre-determined buggy patterns, or textual similarity with bug reports, to rank suspicious program elements. However, since these t…

    Submitted 19 February, 2018; originally announced February 2018.

    Comments: 13 pages

  43. arXiv:1712.04982  [pdf, ps, other

    cs.LO

    Interpreted Formalisms for Configurations

    Authors: Chong Tang, Kevin Sullivan, Jian Xiang, Trent Weiss, Baishakhi Ray

    Abstract: Imprecise and incomplete specification of system configurations threatens safety, security, functionality, and other critical system properties and uselessly enlarges the configuration spaces to be searched by configuration engineers and auto-tuners. To address these problems, this paper introduces interpreted formalisms based on real-world types for configurations. Configuration…

    Submitted 15 December, 2017; v1 submitted 13 December, 2017; originally announced December 2017.

  44. arXiv:1708.08559  [pdf, other

    cs.SE cs.AI cs.LG

    DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars

    Authors: Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray

    Abstract: Recent advances in Deep Neural Networks (DNNs) have led to the development of DNN-driven autonomous cars that, using sensors like camera, LiDAR, etc., can drive without any human intervention. Most major manufacturers including Tesla, GM, Ford, BMW, and Waymo/Google are working on building and testing different types of autonomous vehicles. The lawmakers of several US states including California,…

    Submitted 20 March, 2018; v1 submitted 28 August, 2017; originally announced August 2017.

  45. arXiv:1707.04947  [pdf, other

    physics.flu-dyn

    Pressure Drop and Flow development in the Entrance Region of Micro-Channels with Second Order Slip Boundary Conditions and the Requirement for Development Length

    Authors: Baibhab Ray, Franz Durst, Subhashis Ray

    Abstract: In the present investigation, the development of axial velocity profile, the requirement for development length ($L^*_{fd}=L/D_{h}$) and the pressure drop in the entrance region of circular and parallel plate micro-channels have been critically analysed for a large range of operating conditions ($10^{-2}\le Re\le 10^{4}$, $10^{-4}\le Kn\le 0.2$ and $0\le C_2\le 0.5$). For this purpose, the convent…

    Submitted 10 June, 2018; v1 submitted 16 July, 2017; originally announced July 2017.

  46. arXiv:1703.00397  [pdf, ps, other

    cs.IR

    Combating the Cold Start User Problem in Model Based Collaborative Filtering

    Authors: Sampoorna Biswas, Laks V. S. Lakshmanan, Senjuti Basu Ray

    Abstract: For tackling the well-known cold-start user problem in model-based recommender systems, one approach is to recommend a few items to a cold-start user and use the feedback to learn a profile. The learned profile can then be used to make good recommendations to the cold user. In the absence of a good initial profile, the recommendations are like random probes, but if not chosen judiciously, both bad…

    Submitted 17 February, 2017; originally announced March 2017.

  47. arXiv:1603.04906  [pdf

    q-bio.NC

    Evaluation and Ensembling of Methods for Reverse Engineering of Brain Connectivity from Imaging Data

    Authors: Bisakha Ray, Alexander V. Alekseyenko, Sisi Ma, Alexander Statnikov, Constantin Aliferis

    Abstract: Brain science is an evolving research area inviting great enthusiasm with its potential for providing insights and thereby preventing and treating multiple neuronal disorders affecting millions of patients. Discovery of relationships, such as brain connectivity, is a major goal in basic, translational, and clinical science. Algorithms for causal discovery are used in diverse fields for tackling…

    Submitted 15 March, 2016; originally announced March 2016.

  48. arXiv:1506.01159  [pdf, other

    cs.SE

    On the "Naturalness" of Buggy Code

    Authors: Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane, Zhaopeng Tu, Alberto Bacchelli, Premkumar Devanbu

    Abstract: Real software, the kind working programmers produce by the kLOC to solve real-world problems, tends to be "natural", like speech or natural language; it tends to be highly repetitive and predictable. Researchers have captured this naturalness of software through statistical models and used them to good effect in suggestion engines, porting tools, coding standards checkers, and idiom miners. This s…

    Submitted 10 September, 2015; v1 submitted 3 June, 2015; originally announced June 2015.

    Comments: 12 pages

    MSC Class: 68N30

  49. Design & Implementation Approach for Error Free Clinical Data Repository for the Medical Practitioners

    Authors: Kisor Ray, Santanu Ghosh, Mridul Das, Bhaswati Ray

    Abstract: The modern treatment of any disease is heavily dependent on the medical diagnosis. Clinical data obtained through the diagnostic tests need to be collected and entered into the computer database in order to make a clinical data repository. In most cases, manual entry is an absolute necessity. However, manual entry can also cause errors, leading to wrong diagnosis. This paper explains how d…

    Submitted 30 March, 2015; originally announced March 2015.

    Comments: 04 pages, 04 Figures, International Journal of Computer Trends and Technology, Volume 21, Number 2, 2015, ISSN 2231-2803

    ACM Class: H.4.0

  50. arXiv:1411.1566  [pdf

    cond-mat.mtrl-sci

    Effect of thermal and cryogenic conditioning on flexural behavior of thermally shocked Cu-Al2O3 micro- and nano-composites

    Authors: Khushbu Dash, Sujata Panda, Bankim Chandra Ray

    Abstract: This investigation has used flexural test to explore the effects of thermal treatments, i.e., high-temperature and cryogenic environments on the mechanical property of alumina particulate-reinforced Cu metal matrix micro and nanocomposites in ex-situ and in-situ conditions. Cu-5 vol. pct alumina micro (10 micron)- and nanocomposites (<50 nm) fabricated by powder metallurgy route were subjected to…

    Submitted 6 November, 2014; originally announced November 2014.

    Journal ref: Metallurgical and Materials Transactions A, Volume 45, Issue 3, pp 1567-1578, 2014