Search | arXiv e-print repository

LLM-Based Open-Domain Integrated Task and Knowledge Assistants with Programmable Policies

Authors: Harshit Joshi, Shicheng Liu, James Chen, Robert Weigle, Monica S. Lam

Abstract: Programming LLM-based knowledge and task assistants that faithfully conform to developer-provided policies is challenging. These agents must retrieve and provide consistent, accurate, and relevant information to address users' queries and needs. Yet such agents generate unfounded responses ("hallucinate"). Traditional dialogue trees can only handle a limited number of conversation flows, making them inherently brittle. To this end, we present KITA, a programmable framework for creating task-oriented conversational agents that are designed to handle complex user interactions. Unlike LLMs, KITA provides reliable grounded responses, with controllable agent policies through its expressive specification, KITA Worksheet. In contrast to dialogue trees, it is resilient to diverse user queries, helpful with knowledge sources, and offers ease of programming policies through its declarative paradigm. Through a real-user study involving 62 participants, we show that KITA beats the GPT-4 with function calling baseline by 26.1, 22.5, and 52.4 points on execution accuracy, dialogue act accuracy, and goal completion rate, respectively. We also release 22 real-user conversations with KITA manually corrected to ensure accuracy.

Submitted 8 July, 2024; originally announced July 2024.

Comments: preprint

arXiv:2311.06212 [pdf, other]

Differentiable VQ-VAE's for Robust White Matter Streamline Encodings

Authors: Andrew Lizarraga, Brandon Taraku, Edouardo Honig, Ying Nian Wu, Shantanu H. Joshi

Abstract: Given the complex geometry of white matter streamlines, autoencoders have been proposed as a dimension-reduction tool to simplify the analysis of streamlines in a low-dimensional latent space. However, despite these recent successes, the majority of encoder architectures only perform dimension reduction on single streamlines as opposed to a full bundle of streamlines. This is a severe limitation of the encoder architecture, which completely disregards the global geometric structure of streamlines in favor of individual fibers. Moreover, the latent space may not be well structured, which casts doubt on its interpretability. In this paper we propose a novel Differentiable Vector Quantized Variational Autoencoder, which is engineered to ingest entire bundles of streamlines as a single data point and provides reliable, trustworthy encodings that can later be used to analyze streamlines in the latent space. Comparisons with several state-of-the-art autoencoders demonstrate superior performance in both encoding and synthesis.

Submitted 18 November, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

Comments: 5 pages, 4 figures, 1 table

arXiv:2301.13779 [pdf, other]

FLAME: A small language model for spreadsheet formulas

Authors: Harshit Joshi, Abishai Ebenezer, José Cambronero, Sumit Gulwani, Aditya Kanade, Vu Le, Ivan Radiček, Gust Verbruggen

Abstract: Spreadsheets are a vital tool for end-user data management. Using large language models for formula authoring assistance in these environments can be difficult, as these models are expensive to train and challenging to deploy due to their size (up to billions of parameters). We present FLAME, a transformer-based model trained exclusively on Excel formulas that leverages domain insights to achieve competitive performance while being substantially smaller (60M parameters) and training on two orders of magnitude less data. We curate a training dataset using sketch deduplication, introduce an Excel-specific formula tokenizer, and use domain-specific versions of masked span prediction and noisy auto-encoding as pre-training objectives. We evaluate FLAME on formula repair, formula completion, and similarity-based formula retrieval. FLAME can outperform much larger models, such as the Davinci (175B) and Cushman (12B) variants of Codex and CodeT5 (220M), in 10 of 14 evaluation settings for the repair and completion tasks. For formula retrieval, FLAME outperforms CodeT5, CodeBERT, and GraphCodeBERT.

Submitted 19 December, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

Comments: Accepted to AAAI 2024

arXiv:2209.01498 [pdf, other]

StreamNet: A WAE for White Matter Streamline Analysis

Authors: Andrew Lizarraga, Katherine L. Narr, Kirsten A. Donald, Shantanu H. Joshi

Abstract: We present StreamNet, an autoencoder architecture for the analysis of the highly heterogeneous geometry of large collections of white matter streamlines. This proposed framework takes advantage of geometry-preserving properties of the Wasserstein-1 metric in order to achieve direct encoding and reconstruction of entire bundles of streamlines. We show that the model not only accurately captures the distributive structures of streamlines in the population, but is also able to achieve superior reconstruction performance between real and synthetic streamlines. Experimental model performance is evaluated on white matter streamlines resulting from T1-weighted diffusion imaging of 40 healthy controls using a recent state-of-the-art bundle comparison metric that measures fiber-shape similarities.

Submitted 19 October, 2022; v1 submitted 3 September, 2022; originally announced September 2022.

arXiv:2208.11640 [pdf, other]

Repair Is Nearly Generation: Multilingual Program Repair with LLMs

Authors: Harshit Joshi, José Cambronero, Sumit Gulwani, Vu Le, Ivan Radicek, Gust Verbruggen

Abstract: Most programmers make mistakes when writing code. Some of these mistakes are small and require few edits to the original program -- a class of errors recently termed last mile mistakes. These errors break the flow for experienced developers and can stump novice programmers. Existing automated repair techniques targeting this class of errors are language-specific and do not easily carry over to new languages. Transferring symbolic approaches requires substantial engineering and neural approaches require data and retraining. We introduce RING, a multilingual repair engine powered by a large language model trained on code (LLMC) such as Codex. Such a multilingual engine enables a flipped model for programming assistance, one where the programmer writes code and the AI assistance suggests fixes, compared to traditional code suggestion technology. Taking inspiration from the way programmers manually fix bugs, we show that a prompt-based strategy that conceptualizes repair as localization, transformation, and candidate ranking, can successfully repair programs in multiple languages with minimal effort. We present the first results for such a multilingual repair engine by evaluating on 6 different languages and comparing performance to language-specific repair engines. We show that RING can outperform language-specific repair engines for three of these languages.

Submitted 5 December, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

Comments: 13 pages, Accepted at AAAI 2023

arXiv:2207.11765 [pdf, other]

Neurosymbolic Repair for Low-Code Formula Languages

Authors: Rohan Bavishi, Harshit Joshi, José Pablo Cambronero Sánchez, Anna Fariha, Sumit Gulwani, Vu Le, Ivan Radicek, Ashish Tiwari

Abstract: Most users of low-code platforms, such as Excel and PowerApps, write programs in domain-specific formula languages to carry out nontrivial tasks. Often users can write most of the program they want, but introduce small mistakes that yield broken formulas. These mistakes, which can be both syntactic and semantic, are hard for low-code users to identify and fix, even though they can be resolved with just a few edits. We formalize the problem of producing such edits as the last-mile repair problem. To address this problem, we developed LaMirage, a LAst-MIle RepAir-engine GEnerator that combines symbolic and neural techniques to perform last-mile repair in low-code formula languages. LaMirage takes a grammar and a set of domain-specific constraints/rules, which jointly approximate the target language, and uses these to generate a repair engine that can fix formulas in that language. To tackle the challenges of localizing the errors and ranking the candidate repairs, LaMirage leverages neural techniques, whereas it relies on symbolic methods to generate candidate repairs. This combination allows LaMirage to find repairs that satisfy the provided grammar and constraints, and then pick the most natural repair. We compare LaMirage to state-of-the-art neural and symbolic approaches on 400 real Excel and PowerFx formulas, where LaMirage outperforms all baselines. We release these benchmarks to encourage subsequent work in low-code domains.

Submitted 24 July, 2022; originally announced July 2022.

arXiv:2204.00054 [pdf, ps, other]

Distributed Robust Geocast Multicast Routing for Inter-Vehicle Communication

Authors: Harshvardhan P. Joshi, Mihail L. Sichitiu, Maria Kihl

Abstract: Numerous protocols for geocast have been proposed in the literature. It has been shown that explicit route setup approaches perform poorly with VANETs due to limited route lifetime and frequent network fragmentation. Broadcast-based approaches have considerable redundancy and add significantly to the overhead of the protocol. This paper presents a completely distributed and robust geocast approach that is resilient to frequent topology changes and network fragmentation. A distance-based backoff algorithm is used to reduce the number of hops, and a novel mechanism to reduce redundant broadcasts is introduced. The performance of the proposed protocol is evaluated for various scenarios and compared with simple flooding and a protocol based on explicit route setup.

Submitted 31 March, 2022; originally announced April 2022.

Comments: 12 pages

Journal ref: Proceedings of WEIRD workshop on WiMax, Wireless and Mobility, 2007

arXiv:2203.12793 [pdf, other]

doi 10.1109/ICC45855.2022.9838876

A Reinforcement Approach for Detecting P2P Botnet Communities in Dynamic Communication Graphs

Authors: Harshvardhan P. Joshi, Rudra Dutta

Abstract: Peer-to-peer (P2P) botnets use decentralized command and control networks that make them resilient to disruptions. The P2P botnet overlay networks manifest structures in mutual-contact graphs, also called communication graphs, formed using network traffic information. It has been shown that these structures can be detected using community detection techniques from graph theory. These previous works, however, treat the communication graphs and the P2P botnet structures as static. In reality, communication graphs are dynamic as they represent the continuously changing network traffic flows. Similarly, the P2P botnets also evolve with time, as new bots join and existing bots leave either temporarily or permanently. In this paper we address the problem of detecting such evolving P2P botnet communities in dynamic communication graphs. We propose a reinforcement-based approach, suitable for large communication graphs, that improves precision and recall of P2P botnet community detection in dynamic communication graphs.

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2108.03697 [pdf, other]

Alignment of Tractography Streamlines using Deformation Transfer via Parallel Transport

Authors: Andrew Lizarraga, David Lee, Antoni Kubicki, Ashish Sahib, Elvis Nunez, Katherine Narr, Shantanu H. Joshi

Abstract: We present a geometric framework for aligning white matter fiber tracts. By registering fiber tracts between brains, one expects to see overlap of anatomical structures that often provide meaningful comparisons across subjects. However, the geometry of white matter tracts is highly heterogeneous, and finding direct tract-correspondence across multiple individuals remains a challenging problem. We present a novel deformation metric between tracts that allows one to compare tracts while simultaneously obtaining a registration. To accomplish this, fiber tracts are represented by an intrinsic mean along with the deformation fields represented by tangent vectors from the mean. In this setting, one can determine a parallel transport between tracts and then register corresponding tangent vectors. We present the results of bundle alignment on a population of 43 healthy adult subjects.

Submitted 8 August, 2021; originally announced August 2021.

arXiv:2104.13449 [pdf, other]

SrvfNet: A Generative Network for Unsupervised Multiple Diffeomorphic Shape Alignment

Authors: Elvis Nunez, Andrew Lizarraga, Shantanu H. Joshi

Abstract: We present SrvfNet, a generative deep learning framework for the joint multiple alignment of large collections of functional data comprising square-root velocity functions (SRVF) to their templates. Our proposed framework is fully unsupervised and is capable of aligning to a predefined template as well as jointly predicting an optimal template from data while simultaneously achieving alignment. Our network is constructed as a generative encoder-decoder architecture comprising fully-connected layers capable of producing a distribution space of the warping functions. We demonstrate the strength of our framework by validating it on synthetic data as well as diffusion profiles from magnetic resonance imaging (MRI) data.

Submitted 27 April, 2021; originally announced April 2021.

arXiv:2101.04427 [pdf, other]

Quantum Internet- Applications, Functionalities, Enabling Technologies, Challenges, and Research Directions

Authors: Amoldeep Singh, Kapal Dev, Harun Siljak, Hem Dutt Joshi, Maurizio Magarini

Abstract: The advanced notebooks, mobile phones, and internet applications that we use in today's world are all entrenched in classical communication bits of zeros and ones. The classical internet has its foundation in the amalgamation of mathematics and Claude Shannon's theory of information. But today's internet technology is a playground for eavesdroppers. This poses a serious challenge to various applications that rely on classical internet technology, and has motivated researchers to switch to new technologies that are fundamentally more secure. Exploring quantum effects, researchers have paved the way to quantum networks that provide security, privacy, and a range of capabilities such as quantum computation, communication, and metrology. The realization of the quantum internet requires quantum communication between various remote nodes through quantum channels guarded by quantum cryptographic protocols. Such networks rely upon quantum bits (qubits) that can simultaneously take the value of zeros and ones. Extraordinary properties of qubits such as entanglement, teleportation, and superposition give quantum networks an edge over traditional networks in many ways. But at the same time, transmitting qubits over long distances is a formidable task, and extensive research is ongoing on quantum teleportation over such distances, which would be a breakthrough in physically realizing the quantum internet in the near future.
In this paper, quantum internet functionalities, technologies, applications, and open challenges have been extensively surveyed to help readers gain a basic understanding of the infrastructure required for the development of a global quantum internet.

Submitted 1 June, 2021; v1 submitted 12 January, 2021; originally announced January 2021.

Comments: This survey paper is submitted in IEEE Communications Surveys and Tutorials and revised on 27th May 2021. It includes 31 pages, 14 figures, and 5 tables

arXiv:2011.15103 [pdf, other]

Automating Artifact Detection in Video Games

Authors: Parmida Davarmanesh, Kuanhao Jiang, Tingting Ou, Artem Vysogorets, Stanislav Ivashkevich, Max Kiehn, Shantanu H. Joshi, Nicholas Malaya

Abstract: In spite of advances in gaming hardware and software, gameplay is often tainted with graphics errors, glitches, and screen artifacts. This proof-of-concept study presents a machine learning approach for automated detection of graphics corruptions in video games. Based on a sample of representative screen corruption examples, the model was able to identify 10 of the most commonly occurring screen artifacts with reasonable accuracy. Feature representation of the data included discrete Fourier transforms, histograms of oriented gradients, and graph Laplacians. Various combinations of these features were used to train machine learning models that identify individual classes of graphics corruptions and that later were assembled into a single mixed experts "ensemble" classifier. The ensemble classifier was tested on held-out test sets, and produced an accuracy of 84% on the games it had seen before, and 69% on games it had never seen before.

Submitted 30 November, 2020; originally announced November 2020.

arXiv:2005.08011 [pdf, ps, other]

Decision Fusion in Space-Time Spreading aided Distributed MIMO WSNs

Authors: I. Dey, H. Joshi, N. Marchetti

Abstract: In this letter, we propose space-time spreading (STS) of local sensor decisions before reporting them over a wireless multiple access channel (MAC), in order to achieve a flexible balance between diversity and multiplexing gain as well as eliminate any chance of intrinsic interference inherent in MAC scenarios. Spreading of the sensor decisions using dispersion vectors exploits the benefits of multi-slot decision to improve low-complexity diversity gain and opportunistic throughput. On the other hand, at the receive side of the reporting channel, we formulate and compare optimum and sub-optimum fusion rules for arriving at a reliable conclusion. Simulation results demonstrate a gain in performance with STS-aided transmission from a minimum of 3 times to a maximum of 6 times over performance without STS.

Submitted 16 May, 2020; originally announced May 2020.

Comments: 5 pages, 5 figures

arXiv:2002.01792 [pdf]

Experiments with Different Indexing Techniques for Text Retrieval tasks on Gujarati Language using Bag of Words Approach

Authors: Dr. Jyoti Pareek, Hardik Joshi, Krunal Chauhan, Rushikesh Patel

Abstract: This paper presents results of various experiments carried out to improve text retrieval of Gujarati text documents. Text retrieval involves searching and ranking of text documents for a given set of query terms. We have tested various retrieval models that use the bag-of-words approach. The bag-of-words approach is a traditional approach, still in use today, in which a text document is represented as a collection of words. Measures like frequency count, inverse document frequency, etc. are used to signify and rank relevant documents for user queries. Different ranking models have been used to quantify ranking performance using the metric of mean average precision (MAP). Because Gujarati is a morphologically rich language, we have compared techniques like stop word removal, stemming, and frequent case generation against a baseline to measure the improvements in information retrieval tasks. Most of the techniques are language dependent and require the development of language-specific tools. Using a plain unprocessed word index as the baseline, we have seen significant improvements in MAP values after applying different indexing techniques.

Submitted 5 February, 2020; originally announced February 2020.
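
The abstract above compares indexing techniques by mean average precision (MAP). As a minimal sketch of how that metric is commonly computed (the function names, query IDs, and document IDs below are illustrative assumptions and do not come from the paper):

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query: precision@k summed at each relevant hit."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    # Divide by the total number of relevant documents, so relevant
    # documents that were never retrieved lower the score.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """MAP: the mean of per-query average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    total = sum(average_precision(r, qrels.get(q, set())) for q, r in runs.items())
    return total / len(runs)


# Hypothetical toy data: ranked results per query and relevance judgments.
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d5"}}
map_score = mean_average_precision(runs, qrels)
```

Under this definition, an indexing change such as stemming would be judged by re-running the same queries and comparing the resulting MAP against the unprocessed-index baseline.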

arXiv:2001.08085 [pdf]

Experiments on Manual Thesaurus based Query Expansion for Ad-hoc Monolingual Gujarati Information Retrieval Tasks

Authors: Hardik Joshi, Jyoti Pareek

Abstract: In this paper, we present the experimental work done on Query Expansion (QE) for retrieval tasks of Gujarati text documents. In information retrieval, it is very difficult to estimate the exact user need; query expansion adds terms to the original query, which provides more information about the user need. There are various approaches to query expansion. In our work, manual thesaurus based query expansion was performed to evaluate the performance of widely used information retrieval models for Gujarati text documents. Results show that query expansion improves the recall of text documents.

Submitted 18 January, 2020; originally announced January 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1209.0126
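
The abstract above describes query expansion with a manually built thesaurus. A minimal sketch of the general idea follows, assuming a simple term-to-synonyms dictionary; the `expand_query` helper and the English stand-in entries are hypothetical and are not the authors' thesaurus or implementation:

```python
from typing import Dict, List


def expand_query(terms: List[str], thesaurus: Dict[str, List[str]]) -> List[str]:
    """Add thesaurus entries after each query term, preserving order and dropping duplicates."""
    expanded: List[str] = []
    seen = set()
    for term in terms:
        # The original term comes first, followed by its related terms (if any).
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


# Hypothetical English stand-in for a hand-built thesaurus.
thesaurus = {"car": ["automobile", "vehicle"], "price": ["cost"]}
expanded = expand_query(["car", "price", "india"], thesaurus)
```

Widening the query in this way lets documents that use only a related term match, which is in line with the recall improvement the abstract reports; any effect on precision would have to be measured separately.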

arXiv:1912.05255 [pdf, other]

Novel Deep Learning Framework for Wideband Spectrum Characterization at Sub-Nyquist Rate

Authors: Shivam Chandhok, Himani Joshi, A V Subramanyam, Sumit J. Darak

Abstract: The introduction of spectrum sharing in 5G and subsequent generation networks demands base stations with the capability to characterize the wideband spectrum spanned over licensed, shared, and unlicensed non-contiguous frequency bands. Spectrum characterization involves the identification of vacant bands along with the center frequency and parameters (energy, modulation, etc.) of occupied bands. Such characterization at Nyquist sampling is area- and power-hungry due to the need for high-speed digitization. Though sub-Nyquist sampling (SNS) offers an excellent alternative when the spectrum is sparse, it suffers from poor performance at low signal-to-noise ratio (SNR) and demands careful design and integration of digital reconstruction, tunable channelizer, and characterization algorithms. In this paper, we propose a novel deep-learning framework that uses a single unified pipeline to accomplish two tasks: 1) reconstruction of the signal directly from sub-Nyquist samples, and 2) wideband spectrum characterization. The proposed approach eliminates the need for complex signal conditioning between reconstruction and characterization and does not need complex tunable channelizers. We extensively compare the performance of our framework for a wide range of modulation schemes, SNR, and channel conditions. We show that the proposed framework outperforms existing SNS-based approaches and that its characterization performance approaches that of a Nyquist-sampling-based framework as SNR increases.
Ease of design and integration, along with a single unified deep learning framework, makes the proposed architecture a good candidate for reconfigurable platforms.

Submitted 7 May, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

arXiv:1512.05726 [pdf, other]

Semi-supervised Question Retrieval with Gated Convolutions

Authors: Tao Lei, Hrishikesh Joshi, Regina Barzilay, Tommi Jaakkola, Katerina Tymoshenko, Alessandro Moschitti, Lluis Marquez

Abstract: Question answering forums are rapidly growing in size with no effective automated ability to refer to and reuse answers already available for previously posted questions. In this paper, we develop a methodology for finding semantically related questions. The task is difficult since 1) key pieces of information are often buried in extraneous details in the question body and 2) available annotations on similar questions are scarce and fragmented. We design a recurrent and convolutional model (gated convolution) to effectively map questions to their semantic representations. The models are pre-trained within an encoder-decoder framework (from body to title) on the basis of the entire raw corpus, and fine-tuned discriminatively from limited annotations. Our evaluation demonstrates that our model yields substantial gains over a standard IR baseline and various neural network architectures (including CNNs, LSTMs and GRUs).

Submitted 3 April, 2016; v1 submitted 17 December, 2015; originally announced December 2015.

Comments: NAACL 2016

arXiv:1406.6840 [pdf]

From Citation count to Argumentation count: a new metric to indicate the usefulness of an article

Authors: Hardik Joshi

Abstract: Citation count is a quantifiable measure to indicate the number of times an article is cited by other articles. It is believed that if an article is cited often then it must be an important or influential article; however, there is no guarantee that the most cited articles are good in quality. In this paper, the author suggests argumentation count, a new metric for citation analysis. The proposed metric, argumentation count, is a triplet of quantities for each concept of an article that helps in providing a quantifiable measure of the usefulness of an article.

Submitted 26 June, 2014; originally announced June 2014.

Comments: Technical Conference cum Workshop on Digital Library Using DSpace hosted by Gujarat National Law University on 21-23 March, 2013

arXiv:1301.4337 [pdf]

doi 10.5120/9900-4481

A Novel Digital Watermarking Algorithm using Random Matrix Image

Authors: Mahimn Pandya, Hiren Joshi, Ashish Jani

Abstract: The availability of bandwidth for internet access is sufficient to communicate digital assets. These digital assets are subjected to various types of threats [19]. As a result, a protection mechanism for digital assets is a research priority. The threat of current focus is unauthorized copying of digital assets, which gives a boost to piracy. This is illegal under the copyright act, and a robust mechanism is required to curb this kind of unauthorized copying. To safeguard copyrighted digital assets, a robust digital watermarking technique is needed. The existing digital watermarking techniques protect digital assets by embedding a digital watermark into a host digital image. This embedding does induce slight distortion in the host image, but the distortion is usually too small to be noticed. At the same time, the embedded watermark must be robust enough to withstand deliberate attacks. There are various techniques of digital watermarking, but researchers are making constant efforts to increase the robustness of the watermark image. The layered approach of watermarking based on Huffman coding [5] can soon increase the robustness of the digital watermark [11], ultimately increasing the security of copyright protection. The proposed work is in a similar direction, wherein RMI (Random Matrix Image) is used in place of Huffman coding. This innovative algorithm has considerably increased the robustness of the digital watermark while also enhancing security of production.

Submitted 22 January, 2013; v1 submitted 18 January, 2013; originally announced January 2013.

Comments: 4 pages, 8 figures

Journal ref: International Journal of Computer Applications, Volume 61, Number 2, pp. 18-12, 2013

arXiv:1209.0126 [pdf]

Evaluation of some Information Retrieval models for Gujarati Ad hoc Monolingual Tasks

Authors: Hardik J. Joshi, Pareek Jyoti

Abstract: This paper describes the work towards the Gujarati Ad hoc Monolingual Retrieval task for widely used Information Retrieval (IR) models. We present an indexing baseline for the Gujarati Language represented by Mean Average Precision (MAP) values. Our objective is to obtain a relative picture of a better IR model for Gujarati Language. Results show that classical IR models like Term Frequency Inverse Document Frequency (TF_IDF) perform better when compared to a few recent probabilistic IR models. The experiments helped to identify the outperforming IR models for Gujarati Language.

Submitted 1 September, 2012; originally announced September 2012.

Comments: 6 pages, Some text in Gujarati Language

Journal ref: VNSGU Journal of Science and Technology,3,2,176-181,2012

Showing 1–20 of 20 results for author: Joshi, H