-
Iterative Graph Alignment
Authors:
Fangyuan Yu,
Hardeep Singh Arora,
Matt Johnson
Abstract:
By compressing diverse narratives, LLMs go beyond memorization, achieving intelligence by capturing generalizable causal relationships. However, they suffer from local 'representation gaps' due to insufficient training data diversity, limiting their real-world utility, especially in tasks requiring strict alignment to rules. Traditional alignment methods relying on heavy human annotations are inefficient and unscalable. Recent self-alignment techniques also fall short, as they often depend on self-selection based prompting and memorization-based learning. To address these issues, we introduce Iterative Graph Alignment (IGA), an annotation-free rule-based alignment algorithm. A teacher model (VLM) employs Iterative Graph Prompting (IGP) to create logical graphs and reference answers. The student model (LLM) identifies local knowledge gaps by attempting to align its responses with these references, collaborating with helper models to generate diverse answers. These aligned responses are then used for iterative supervised fine-tuning (SFT). Our evaluations across five rule-based scenarios demonstrate IGP's effectiveness, with a 73.12% alignment improvement in Claude Sonnet 3.5, and Llama3-8B-Instruct achieving an 86.20% improvement, outperforming Claude Sonnet 3.5 in rule-based alignment.
Submitted 29 August, 2024;
originally announced August 2024.
-
AstroMLab 1: Who Wins Astronomy Jeopardy!?
Authors:
Yuan-Sen Ting,
Tuan Dung Nguyen,
Tirthankar Ghosal,
Rui Pan,
Hardik Arora,
Zechang Sun,
Tijmen de Haan,
Nesar Ramachandra,
Azton Wells,
Sandeep Madireddy,
Alberto Accomazzi
Abstract:
We present a comprehensive evaluation of proprietary and open-weights large language models using the first astronomy-specific benchmarking dataset. This dataset comprises 4,425 multiple-choice questions curated from the Annual Review of Astronomy and Astrophysics, covering a broad range of astrophysical topics. Our analysis examines model performance across various astronomical subfields and assesses response calibration, crucial for potential deployment in research environments. Claude-3.5-Sonnet outperforms competitors by up to 4.6 percentage points, achieving 85.0% accuracy. For proprietary models, we observed a universal reduction in cost every 3-to-12 months to achieve a similar score on this particular astronomy benchmark. Open-source models have rapidly improved, with LLaMA-3-70b (80.6%) and Qwen-2-72b (77.7%) now competing with some of the best proprietary models. We identify performance variations across topics, with non-English-focused models generally struggling more in exoplanet-related fields, stellar astrophysics, and instrumentation-related questions. These challenges likely stem from less abundant training data, limited historical context, and rapid recent developments in these areas. This pattern is observed across both open-weights and proprietary models, with regional dependencies evident, highlighting the impact of training data diversity on model performance in specialized scientific domains. Top-performing models demonstrate well-calibrated confidence, with correlations above 0.9 between confidence and correctness, though they tend to be slightly underconfident. The development of fast, low-cost inference for open-weights models presents new opportunities for affordable deployment in astronomy. The rapid progress observed suggests that LLM-driven research in astronomy may become feasible in the near future.
Submitted 15 July, 2024;
originally announced July 2024.
-
TrICy: Trigger-guided Data-to-text Generation with Intent aware Attention-Copy
Authors:
Vibhav Agarwal,
Sourav Ghosh,
Harichandana BSS,
Himanshu Arora,
Barath Raj Kandur Raja
Abstract:
Data-to-text (D2T) generation is a crucial task in many natural language understanding (NLU) applications and forms the foundation of task-oriented dialog systems. In the context of conversational AI solutions that can work directly with local data on the user's device, architectures utilizing large pre-trained language models (PLMs) are impractical for on-device deployment due to a high memory footprint. To this end, we propose TrICy, a novel lightweight framework for an enhanced D2T task that generates text sequences based on the intent in context and may further be guided by user-provided triggers. We leverage an attention-copy mechanism to predict out-of-vocabulary (OOV) words accurately. Performance analyses on E2E NLG dataset (BLEU: 66.43%, ROUGE-L: 70.14%), WebNLG dataset (BLEU: Seen 64.08%, Unseen 52.35%), and our Custom dataset related to text messaging applications, showcase our architecture's effectiveness. Moreover, we show that by leveraging an optional trigger input, data-to-text generation quality increases significantly and achieves the new SOTA score of 69.29% BLEU for E2E NLG. Furthermore, our analyses show that TrICy achieves at least 24% and 3% improvement in BLEU and METEOR respectively over LLMs like GPT-3, ChatGPT, and Llama 2. We also demonstrate that in some scenarios, performance improvement due to triggers is observed even when they are absent in training.
Submitted 25 January, 2024;
originally announced February 2024.
-
Automated Material Properties Extraction For Enhanced Beauty Product Discovery and Makeup Virtual Try-on
Authors:
Fatemeh Taheri Dezaki,
Himanshu Arora,
Rahul Suresh,
Amin Banitalebi-Dehkordi
Abstract:
The multitude of makeup products available can make it challenging to find the ideal match for desired attributes. An intelligent approach for product discovery is required to make the makeup shopping experience more convenient and satisfying. However, enabling accurate and efficient product discovery requires extracting detailed attributes like color and finish type. Our work introduces an automated pipeline that utilizes multiple customized machine learning models to extract essential material attributes from makeup product images. Our pipeline is versatile and capable of handling various makeup products. To showcase the efficacy of our pipeline, we conduct extensive experiments on eyeshadow products (both single and multi-shade ones), a challenging makeup product known for its diverse range of shapes, colors, and finish types. Furthermore, we demonstrate the applicability of our approach by successfully extending it to other makeup categories like lipstick and foundation, showcasing its adaptability and effectiveness across different beauty products. Additionally, we conduct ablation experiments to demonstrate the superiority of our machine learning pipeline over human labeling methods in terms of reliability. Our proposed method showcases its effectiveness in cross-category product discovery, specifically in recommending makeup products that perfectly match a specified outfit. Lastly, we also demonstrate the application of these material attributes in enabling virtual try-on experiences, which make the makeup shopping experience significantly more engaging.
Submitted 1 December, 2023;
originally announced December 2023.
-
Unsupervised Scene Sketch to Photo Synthesis
Authors:
Jiayun Wang,
Sangryul Jeon,
Stella X. Yu,
Xi Zhang,
Himanshu Arora,
Yu Lou
Abstract:
Sketches make an intuitive and powerful visual expression as they are fast executed freehand drawings. We present a method for synthesizing realistic photos from scene sketches. Without the need for sketch and photo pairs, our framework directly learns from readily available large-scale photo datasets in an unsupervised manner. To this end, we introduce a standardization module that provides pseudo sketch-photo pairs during training by converting photos and sketches to a standardized domain, i.e. the edge map. The reduced domain gap between sketch and photo also allows us to disentangle them into two components: holistic scene structures and low-level visual styles such as color and texture. Taking this advantage, we synthesize a photo-realistic image by combining the structure of a sketch and the visual style of a reference photo. Extensive experimental results on perceptual similarity metrics and human perceptual studies show the proposed method could generate realistic photos with high fidelity from scene sketches and outperform state-of-the-art photo synthesis baselines. We also demonstrate that our framework facilitates a controllable manipulation of photo synthesis by editing strokes of corresponding sketches, delivering more fine-grained details than previous approaches that rely on region-level editing.
Submitted 6 September, 2022;
originally announced September 2022.
-
Structured Graph Variational Autoencoders for Indoor Furniture layout Generation
Authors:
Aditya Chattopadhyay,
Xi Zhang,
David Paul Wipf,
Himanshu Arora,
Rene Vidal
Abstract:
We present a structured graph variational autoencoder for generating the layout of indoor 3D scenes. Given the room type (e.g., living room or library) and the room layout (e.g., room elements such as floor and walls), our architecture generates a collection of objects (e.g., furniture items such as sofa, table and chairs) that is consistent with the room type and layout. This is a challenging problem because the generated scene should satisfy multiple constraints, e.g., each object must lie inside the room and two objects cannot occupy the same volume. To address these challenges, we propose a deep generative model that encodes these relationships as soft constraints on an attributed graph (e.g., the nodes capture attributes of room and furniture elements, such as class, pose and size, and the edges capture geometric relationships such as relative orientation). The architecture consists of a graph encoder that maps the input graph to a structured latent space, and a graph decoder that generates a furniture graph, given a latent code and the room graph. The latent space is modeled with auto-regressive priors, which facilitates the generation of highly structured scenes. We also propose an efficient training procedure that combines matching and constrained learning. Experiments on the 3D-FRONT dataset show that our method produces scenes that are diverse and are adapted to the room layout.
Submitted 22 July, 2022; v1 submitted 11 April, 2022;
originally announced April 2022.
-
VoiceMoji: A Novel On-Device Pipeline for Seamless Emoji Insertion in Dictation
Authors:
Sumit Kumar,
Harichandana B S S,
Himanshu Arora
Abstract:
Most of the speech recognition systems recover only words in the speech and fail to capture emotions. Users have to manually add emoji(s) in text for adding tone and making communication fun. Though there is much work done on punctuation addition on transcribed speech, the area of emotion addition is untouched. In this paper, we propose a novel on-device pipeline to enrich the voice input experience. It involves, given a blob of transcribed text, intelligently processing and identifying structure where emoji insertion makes sense. Moreover, it includes semantic text analysis to predict emoji for each of the sub-parts for which we propose a novel architecture Attention-based Char Aware (ACA) LSTM which handles Out-Of-Vocabulary (OOV) words as well. All these tasks are executed completely on-device and hence can aid on-device dictation systems. To the best of our knowledge, this is the first work that shows how to add emoji(s) in the transcribed text. We demonstrate that our components achieve comparable results to previous neural approaches for punctuation addition and emoji prediction with 80% fewer parameters. Overall, our proposed model has a very small memory footprint of a mere 4MB to suit on-device deployment.
Submitted 22 December, 2021;
originally announced December 2021.
-
LIDSNet: A Lightweight on-device Intent Detection model using Deep Siamese Network
Authors:
Vibhav Agarwal,
Sudeep Deepak Shivnikar,
Sourav Ghosh,
Himanshu Arora,
Yashwant Saini
Abstract:
Intent detection is a crucial task in any Natural Language Understanding (NLU) system and forms the foundation of a task-oriented dialogue system. To build high-quality real-world conversational solutions for edge devices, there is a need for deploying intent detection model on device. This necessitates a light-weight, fast, and accurate model that can perform efficiently in a resource-constrained environment. To this end, we propose LIDSNet, a novel lightweight on-device intent detection model, which accurately predicts the message intent by utilizing a Deep Siamese Network for learning better sentence representations. We use character-level features to enrich the sentence-level representations and empirically demonstrate the advantage of transfer learning by utilizing pre-trained embeddings. Furthermore, to investigate the efficacy of the modules in our architecture, we conduct an ablation study and arrive at our optimal model. Experimental results prove that LIDSNet achieves state-of-the-art competitive accuracy of 98.00% and 95.97% on SNIPS and ATIS public datasets respectively, with under 0.59M parameters. We further benchmark LIDSNet against fine-tuned BERTs and show that our model is at least 41x lighter and 30x faster during inference than MobileBERT on Samsung Galaxy S20 device, justifying its efficiency on resource-constrained edge devices.
Submitted 6 October, 2021;
originally announced October 2021.
-
ABO: Dataset and Benchmarks for Real-World 3D Object Understanding
Authors:
Jasmine Collins,
Shubham Goel,
Kenan Deng,
Achleshwar Luthra,
Leon Xu,
Erhan Gundogdu,
Xi Zhang,
Tomas F. Yago Vicente,
Thomas Dideriksen,
Himanshu Arora,
Matthieu Guillaumin,
Jitendra Malik
Abstract:
We introduce Amazon Berkeley Objects (ABO), a new large-scale dataset designed to help bridge the gap between real and virtual 3D worlds. ABO contains product catalog images, metadata, and artist-created 3D models with complex geometries and physically-based materials that correspond to real, household objects. We derive challenging benchmarks that exploit the unique properties of ABO and measure the current limits of the state-of-the-art on three open problems for real-world 3D object understanding: single-view 3D reconstruction, material estimation, and cross-domain multi-view object retrieval.
Submitted 24 June, 2022; v1 submitted 12 October, 2021;
originally announced October 2021.
-
RoomStructNet: Learning to Rank Non-Cuboidal Room Layouts From Single View
Authors:
Xi Zhang,
Chun-Kai Wang,
Kenan Deng,
Tomas Yago-Vicente,
Himanshu Arora
Abstract:
In this paper, we present a new approach to estimate the layout of a room from a single image. While recent approaches for this task use robust features learnt from data, they resort to optimization for detecting the final layout. In addition to using learnt robust features, our approach learns an additional ranking function to estimate the final layout instead of using optimization. To learn this ranking function, we propose a framework to train a CNN using max-margin structure cost. Also, while most approaches aim at detecting cuboidal layouts, our approach detects non-cuboidal layouts for which we explicitly estimate layout complexity parameters. We use these parameters to propose layout candidates in a novel way. Our approach shows state-of-the-art results on standard datasets with mostly cuboidal layouts and also performs well on a dataset containing rooms with non-cuboidal layouts.
Submitted 1 October, 2021;
originally announced October 2021.
-
Multimodal Shape Completion via IMLE
Authors:
Himanshu Arora,
Saurabh Mishra,
Shichong Peng,
Ke Li,
Ali Mahdavi-Amiri
Abstract:
Shape completion is the problem of completing partial input shapes such as partial scans. This problem finds important applications in computer vision and robotics due to issues such as occlusion or sparsity in real-world data. However, most of the existing research related to shape completion has been focused on completing shapes by learning a one-to-one mapping which limits the diversity and creativity of the produced results. We propose a novel multimodal shape completion technique that is effectively able to learn a one-to-many mapping and generates diverse complete shapes. Our approach is based on the conditional Implicit Maximum Likelihood Estimation (IMLE) technique wherein we condition our inputs on partial 3D point clouds. We extensively evaluate our approach by comparing it to various baselines both quantitatively and qualitatively. We show that our method is superior to alternatives in terms of completeness and diversity of shapes.
Submitted 7 July, 2021; v1 submitted 30 June, 2021;
originally announced June 2021.
-
Affordance-based Reinforcement Learning for Urban Driving
Authors:
Tanmay Agarwal,
Hitesh Arora,
Jeff Schneider
Abstract:
Traditional autonomous vehicle pipelines that follow a modular approach have been very successful in the past both in academia and industry, which has led to autonomy deployed on road. Though this approach provides ease of interpretation, its generalizability to unseen environments is limited and hand-engineering of numerous parameters is required, especially in the prediction and planning systems. Recently, deep reinforcement learning has been shown to learn complex strategic games and perform challenging robotic tasks, which provides an appealing framework for learning to drive. In this work, we propose a deep reinforcement learning framework to learn an optimal control policy using waypoints and low-dimensional visual representations, also known as affordances. We demonstrate that our agents, when trained from scratch, learn the tasks of lane-following, driving around intersections, as well as stopping in front of other actors or traffic lights, even in the dense traffic setting. We note that our method achieves comparable or better performance than the baseline methods on the original and NoCrash benchmarks on the CARLA simulator.
Submitted 15 January, 2021;
originally announced January 2021.
-
A character representation enhanced on-device Intent Classification
Authors:
Sudeep Deepak Shivnikar,
Himanshu Arora,
Harichandana B S S
Abstract:
Intent classification is an important task in natural language understanding systems. Existing approaches have achieved perfect scores on the benchmark datasets. However, they are not suitable for deployment on low-resource devices like mobiles, tablets, etc. due to their massive model size. Therefore, in this paper, we present a novel light-weight architecture for intent classification that can run efficiently on a device. We use character features to enrich the word representation. Our experiments prove that our proposed model outperforms existing approaches and achieves state-of-the-art results on benchmark datasets. We also report that our model has a tiny memory footprint of ~5 MB and a low inference time of ~2 milliseconds, which proves its efficiency in a resource-constrained environment.
Submitted 12 January, 2021;
originally announced January 2021.
-
Contextual Diversity for Active Learning
Authors:
Sharat Agarwal,
Himanshu Arora,
Saket Anand,
Chetan Arora
Abstract:
The requirement of large annotated datasets restricts the use of deep convolutional neural networks (CNNs) for many practical applications. The problem can be mitigated by using active learning (AL) techniques which, under a given annotation budget, allow selecting a subset of data that yields maximum accuracy upon fine tuning. State of the art AL approaches typically rely on measures of visual diversity or prediction uncertainty, which are unable to effectively capture the variations in spatial context. On the other hand, modern CNN architectures make heavy use of spatial context for achieving highly accurate predictions. Since the context is difficult to evaluate in the absence of ground-truth labels, we introduce the notion of contextual diversity that captures the confusion associated with spatially co-occurring classes. Contextual Diversity (CD) hinges on a crucial observation that the probability vector predicted by a CNN for a region of interest typically contains information from a larger receptive field. Exploiting this observation, we use the proposed CD measure within two AL frameworks: (1) a core-set based strategy and (2) a reinforcement learning based policy, for active frame selection. Our extensive empirical evaluation establishes state of the art results for active learning on benchmark datasets of Semantic Segmentation, Object Detection and Image Classification. Our ablation studies show clear advantages of using contextual diversity for active learning. The source code and additional results are available at https://github.com/sharat29ag/CDAL.
Submitted 13 August, 2020;
originally announced August 2020.
-
Complex Network Analysis of Indian Railway Zones
Authors:
Nikhil Kumar Rajput,
Piyush Badola,
Harshit Arora,
Bhavya Ahuja Grover
Abstract:
Indian Railway Network has been analyzed on the basis of number of trains directly linking two railway zones. The network has been displayed as a weighted graph where the weights denote the number of trains between the zones. It may be pointed out that each zone is a complex network in itself and may depict different characteristic features. The zonal network therefore can be considered as a network of complex networks. In this paper, self links, in-degree and out-degree of each zone have been computed which provides information about the inter and intra zonal connectivity. Degree passenger correlation which gives an idea about number of trains and passengers originating from a particular zone which might play a role in policy making decisions has also been studied. Some other complex network parameters like betweenness, clustering coefficient and cliques have been obtained to get more insight about the complex Indian zonal network.
Submitted 8 April, 2020;
originally announced April 2020.
-
Iteratively Composing Statically Verified Traits
Authors:
Isaac Oscar Gariano,
Marco Servetto,
Alex Potanin,
Hrshikesh Arora
Abstract:
Static verification relying on an automated theorem prover can be very slow and brittle: since static verification is undecidable, correct code may not pass a particular static verifier. In this work we use metaprogramming to generate code that is correct by construction. A theorem prover is used only to verify initial "traits": units of code that can be used to compose bigger programs.
In our work, meta-programming is done by trait composition, which starting from correct code, is guaranteed to produce correct code. We do this by extending conventional traits with pre- and post-conditions for the methods; we also extend the traditional trait composition (+) operator to check the compatibility of contracts. In this way, there is no need to re-verify the produced code.
We show how our approach can be applied to the standard "power" function example, where metaprogramming generates optimised, and correct, versions when the exponent is known in advance.
Submitted 20 August, 2019; v1 submitted 25 February, 2019;
originally announced February 2019.
-
Checking Observational Purity of Procedures
Authors:
Himanshu Arora,
Raghavan Komondoor,
G. Ramalingam
Abstract:
Verifying whether a procedure is observationally pure is useful in many software engineering scenarios. An observationally pure procedure always returns the same value for the same argument, and thus mimics a mathematical function. The problem is challenging when procedures use private mutable global variables, e.g., for memoization of frequently returned answers, and when they involve recursion.
We present a novel verification approach for this problem. Our approach involves encoding the procedure's code as a formula that is a disjunction of path constraints, with the recursive calls being replaced in the formula with references to a mathematical function symbol. Then, a theorem prover is invoked to check whether the formula that has been constructed agrees with the function symbol referred to above in terms of input-output behavior for all arguments.
We evaluate our approach on a set of realistic examples, using the Boogie intermediate language and theorem prover. Our evaluation shows that the invariants are easy to construct manually, and that our approach is effective at verifying observationally pure procedures.
Submitted 14 February, 2019;
originally announced February 2019.
-
Separating Use and Reuse to Improve Both
Authors:
Hrshikesh Arora,
Marco Servetto,
Bruno C. D. S. Oliveira
Abstract:
Context: Trait composition has inspired new research in the area of code reuse for object-oriented (OO) languages. One of the main advantages of this kind of composition is that it makes it possible to separate subtyping from subclassing, which is good for code reuse, design and reasoning. However, handling of state within traits is difficult, verbose or inelegant. Inquiry: We identify the this-leaking problem as the fundamental limitation that prevents the separation of subtyping from subclassing in conventional OO languages. We explain that the concept of trait composition addresses this problem by distinguishing code designed for use (as a type) from code designed for reuse (i.e. inherited). We are aware of at least 3 concrete, independently designed research languages following this methodology: TraitRecordJ, Package Templates and DeepFJig. Approach: In this paper, we design $42_μ$, a new language, where we improve use and reuse and support the This type and family polymorphism by distinguishing code designed for use from code designed for reuse. In this way $42_μ$ synthesises the 3 approaches above, and improves them with abstract state operations: a new elegant way to handle state composition in trait-based languages. Knowledge and Grounding: Using case studies, we show that $42_μ$'s model of traits with abstract state operations is more usable and compact than prior work. We formalise our work and prove that type errors cannot arise from composing well-typed code. Importance: This work is the logical core of the programming language 42. This shows that the ideas presented in this paper can be applicable to a full general-purpose language. This form of composition is very flexible and could be used in many new languages.
Submitted 1 February, 2019;
originally announced February 2019.
-
Multi-task Learning for Continuous Control
Authors:
Himani Arora,
Rajath Kumar,
Jason Krone,
Chong Li
Abstract:
Reliable and effective multi-task learning is a prerequisite for the development of robotic agents that can quickly learn to accomplish related, everyday tasks. However, in the reinforcement learning domain, multi-task learning has not exhibited the same level of success as in other domains, such as computer vision. In addition, most reinforcement learning research on multi-task learning has focused on discrete action spaces, which are not used for robotic control in the real world. In this work, we apply multi-task learning methods to continuous action spaces and benchmark their performance on a series of simulated continuous control tasks. Most notably, we show that multi-task learning outperforms our baselines and alternative knowledge sharing methods.
Submitted 3 February, 2018;
originally announced February 2018.
-
Lip2AudSpec: Speech reconstruction from silent lip movements video
Authors:
Hassan Akbari,
Himani Arora,
Liangliang Cao,
Nima Mesgarani
Abstract:
In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos. We use an auditory spectrogram as the spectral representation of speech, together with its corresponding sound generation method, resulting in more natural-sounding reconstructed speech. Our proposed network consists of an autoencoder that extracts bottleneck features from the auditory spectrogram, which are then used as the target for our main lip reading network, comprising CNN, LSTM and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with a 98% correlation and also improves the quality of reconstructed speech from the main lip reading network. Our model, trained jointly on different speakers, is able to extract individual speaker characteristics and gives promising results in reconstructing intelligible speech with superior word recognition accuracy.
Submitted 26 October, 2017;
originally announced October 2017.
-
Computing Egomotion with Local Loop Closures for Egocentric Videos
Authors:
Suvam Patra,
Himanshu Aggarwal,
Himani Arora,
Chetan Arora,
Subhashis Banerjee
Abstract:
Finding the camera pose is an important step in many egocentric video applications. It has been widely reported that, state of the art SLAM algorithms fail on egocentric videos. In this paper, we propose a robust method for camera pose estimation, designed specifically for egocentric videos. In an egocentric video, the camera views the same scene point multiple times as the wearer's head sweeps ba…
▽ More
Finding the camera pose is an important step in many egocentric video applications. It has been widely reported that state-of-the-art SLAM algorithms fail on egocentric videos. In this paper, we propose a robust method for camera pose estimation, designed specifically for egocentric videos. In an egocentric video, the camera views the same scene point multiple times as the wearer's head sweeps back and forth. We use this specific motion profile to perform short loop closures aligned with the wearer's footsteps. For egocentric videos, depth estimation is usually noisy. In an important departure, we use 2D computations for rotation averaging, which do not rely upon depth estimates. The two modifications result in a much more stable algorithm, as is evident from our experiments on various egocentric video datasets for different egocentric applications. The proposed algorithm resolves a long-standing problem in egocentric vision and unlocks new usage scenarios for future applications.
Submitted 17 January, 2017;
originally announced January 2017.