-
Banking Turn of High-DOF Dynamic Morphing Wing Flight by Shifting Structure Response Using Optimization
Authors:
Bibek Gupta,
Yogi Shah,
Taoran Liu,
Eric Sihite,
Alireza Ramezani
Abstract:
The 3D flight control of a flapping wing robot is a very challenging problem. The robot stabilizes and controls its pose through the aerodynamic forces acting on the wing membrane, which has complex dynamics, and it is difficult to develop a control method to interact with such a complex system. Bats, in particular, are capable of performing highly agile aerial maneuvers such as tight banking and bounding flight solely using their highly flexible wings. In this work, we develop a control method for a bio-inspired bat robot, the Aerobat, using small low-powered actuators to manipulate the flapping gait and the resulting aerodynamic forces. We implemented a controller based on a collocation approach to track a desired roll and perform a banking maneuver to be used in a trajectory tracking controller. This controller is implemented in a simulation to show its performance and feasibility.
Submitted 8 May, 2024;
originally announced May 2024.
-
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Authors:
Mayank Mishra,
Matt Stallone,
Gaoyuan Zhang,
Yikang Shen,
Aditya Prasad,
Adriana Meza Soria,
Michele Merler,
Parameswaran Selvam,
Saptha Surendran,
Shivdeep Singh,
Manish Sethi,
Xuan-Hong Dang,
Pengyuan Li,
Kun-Lung Wu,
Syed Zawad,
Andrew Coleman,
Matthew White,
Mark Lewis,
Raju Pavuluri,
Yan Koyfman,
Boris Lublinsky,
Maximilien de Bayser,
Ibrahim Abdelaziz,
Kinjal Basu,
Mayank Agarwal
, et al. (21 additional authors not shown)
Abstract:
Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only code models for code generative tasks, trained with code written in 116 programming languages. The Granite Code model family consists of models ranging in size from 3 to 34 billion parameters, suitable for applications ranging from complex application modernization tasks to on-device memory-constrained use cases. Evaluation on a comprehensive set of tasks demonstrates that Granite Code models consistently reach state-of-the-art performance among available open-source code LLMs. The Granite Code model family was optimized for enterprise software development workflows and performs well across a range of coding tasks (e.g. code generation, fixing and explanation), making it a versatile all-around code model. We release all our Granite Code models under an Apache 2.0 license for both research and commercial use.
Submitted 7 May, 2024;
originally announced May 2024.
-
SegmentAnyBone: A Universal Model that Segments Any Bone at Any Location on MRI
Authors:
Hanxue Gu,
Roy Colglazier,
Haoyu Dong,
Jikai Zhang,
Yaqian Chen,
Zafer Yildiz,
Yuwen Chen,
Lin Li,
Jichen Yang,
Jay Willhite,
Alex M. Meyer,
Brian Guo,
Yashvi Atul Shah,
Emily Luo,
Shipra Rajput,
Sally Kuehn,
Clark Bulleit,
Kevin A. Wu,
Jisoo Lee,
Brandon Ramirez,
Darui Lu,
Jay M. Levin,
Maciej A. Mazurowski
Abstract:
Magnetic Resonance Imaging (MRI) is pivotal in radiology, offering non-invasive and high-quality insights into the human body. Precise segmentation of MRIs into different organs and tissues would be highly beneficial since it would allow for a higher level of understanding of the image content and enable important measurements, which are essential for accurate diagnosis and effective treatment planning. Specifically, segmenting bones in MRI would allow for more quantitative assessments of musculoskeletal conditions, while such assessments are largely absent in current radiological practice. The difficulty of bone MRI segmentation is illustrated by the fact that limited algorithms are publicly available for use, and those contained in the literature typically address a specific anatomic area. In our study, we propose a versatile, publicly available deep-learning model for bone segmentation in MRI across multiple standard MRI locations. The proposed model can operate in two modes: fully automated segmentation and prompt-based segmentation. Our contributions include (1) collecting and annotating a new MRI dataset across various MRI protocols, encompassing over 300 annotated volumes and 8485 annotated slices across diverse anatomic regions; (2) investigating several standard network architectures and strategies for automated segmentation; (3) introducing SegmentAnyBone, an innovative foundational model-based approach that extends Segment Anything Model (SAM); (4) comparative analysis of our algorithm and previous approaches; and (5) generalization analysis of our algorithm across different anatomical locations and MRI sequences, as well as an external dataset. We publicly release our model at https://github.com/mazurowski-lab/SegmentAnyBone.
Submitted 23 January, 2024;
originally announced January 2024.
-
Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation
Authors:
Yuzhe Lu,
Sungmin Hong,
Yash Shah,
Panpan Xu
Abstract:
Writing radiology reports from medical images requires a high level of domain expertise. It is time-consuming even for trained radiologists and can be error-prone for inexperienced radiologists. It would be appealing to automate this task by leveraging generative AI, which has shown drastic progress in vision and language understanding. In particular, Large Language Models (LLMs) have demonstrated impressive capabilities recently and continue to set new state-of-the-art performance on almost all natural language tasks. While many have proposed architectures to combine vision models with LLMs for multimodal tasks, few have explored practical fine-tuning strategies. In this work, we propose a simple yet effective two-stage fine-tuning protocol to align visual features to the LLM's text embedding space as soft visual prompts. Our framework with OpenLLaMA-7B achieved state-of-the-art level performance without domain-specific pretraining. Moreover, we provide detailed analyses of soft visual prompts and attention mechanisms, shedding light on future research directions.
Submitted 3 December, 2023;
originally announced December 2023.
-
App for Resume-Based Job Matching with Speech Interviews and Grammar Analysis: A Review
Authors:
Tanmay Kulkarni,
Yuvraj Pardeshi,
Yash Shah,
Vaishnvi Sakat,
Sapana Bhirud
Abstract:
Through the advancement in natural language processing (NLP), specifically in speech recognition, fully automated complex systems functioning on voice input have started proliferating in areas such as home automation. These systems have been termed Automatic Speech Recognition Systems (ASR). In this review paper, we explore the feasibility of an end-to-end system providing speech and text based natural language processing for job interview preparation as well as recommendation of relevant job postings. We also explore existing recommender-based systems and note their limitations. This literature review would help us identify the approaches and limitations of the various similar use-cases of NLP technology for our upcoming project.
Submitted 20 November, 2023;
originally announced November 2023.
-
Modality-aware Transformer for Financial Time series Forecasting
Authors:
Hajar Emami,
Xuan-Hong Dang,
Yousaf Shah,
Petros Zerfos
Abstract:
Time series forecasting presents a significant challenge, particularly when its accuracy relies on external data sources rather than solely on historical values. This issue is prevalent in the financial sector, where the future behavior of time series is often intricately linked to information derived from various textual reports and a multitude of economic indicators. In practice, the key challenge lies in constructing a reliable time series forecasting model capable of harnessing data from diverse sources and extracting valuable insights to predict the target time series accurately. In this work, we tackle this challenging problem and introduce a novel multimodal transformer-based model named the Modality-aware Transformer. Our model excels in exploring the power of both categorical text and numerical time series to forecast the target time series effectively while providing insights through its neural attention mechanism. To achieve this, we develop feature-level attention layers that encourage the model to focus on the most relevant features within each data modality. By incorporating the proposed feature-level attention, we develop a novel Intra-modal multi-head attention (MHA), Inter-modal MHA and Target-modal MHA in a way that both feature and temporal attentions are incorporated in MHAs. This enables the MHAs to generate temporal attentions with consideration of modality and feature importance, which leads to more informative embeddings. The proposed modality-aware structure enables the model to effectively exploit information within each modality as well as foster cross-modal understanding. Our extensive experiments on financial datasets demonstrate that Modality-aware Transformer outperforms existing methods, offering a novel and practical solution to the complex challenges of multi-modal financial time series forecasting.
Submitted 20 March, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Real-time Recognition of Yoga Poses using computer Vision for Smart Health Care
Authors:
Abhishek Sharma,
Yash Shah,
Yash Agrawal,
Prateek Jain
Abstract:
Nowadays, yoga has become a part of life for many people. Technological assistance for exercise and sports is applied here to yoga pose identification. In this work, a self-assistance based yoga posture identification technique is developed, which helps users to perform yoga with a real-time correction feature. The work also presents yoga hand mudra (hand gesture) identification. The YOGI dataset has been developed, which includes 10 yoga postures with around 400-900 images of each pose and also contains 5 mudras for the identification of mudra postures, with around 500 images of each mudra. Features are extracted by fitting a skeleton to the body for yoga poses and to the hand for mudra poses. Two different algorithms are used for creating the skeleton: one for yoga poses and a second for hand mudras. The angles of the joints are extracted as features for different machine learning and deep learning models. Among all the models, XGBoost with RandomSearch CV is the most accurate and gives 99.2% accuracy. The complete design framework is described in the present paper.
Submitted 19 January, 2022;
originally announced January 2022.
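The joint-angle features mentioned in the abstract above can be illustrated with a minimal sketch. It assumes 2D keypoints from some skeleton estimator; the function name `joint_angle` and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at joint b formed by the segments b->a and b->c."""
    bax, bay = a[0] - b[0], a[1] - b[1]
    bcx, bcy = c[0] - b[0], c[1] - b[1]
    dot = bax * bcx + bay * bcy
    norm = math.hypot(bax, bay) * math.hypot(bcx, bcy)
    if norm == 0:
        raise ValueError("joint coincides with a neighbouring keypoint")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical keypoints (shoulder, elbow, wrist) in image coordinates.
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
elbow_angle = joint_angle(shoulder, elbow, wrist)
straight_arm = joint_angle((0.0, 0.0), (1.0, 0.0), (2.0, 0.0))
```

A vector of such angles, one per joint, is the kind of feature that could then be passed to a classifier; which joints and which classifier settings the authors used is not stated in the abstract.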
-
AutoAI-TS: AutoAI for Time Series Forecasting
Authors:
Syed Yousaf Shah,
Dhaval Patel,
Long Vu,
Xuan-Hong Dang,
Bei Chen,
Peter Kirchner,
Horst Samulowitz,
David Wood,
Gregory Bramble,
Wesley M. Gifford,
Giridhar Ganapavarapu,
Roman Vaculin,
Petros Zerfos
Abstract:
A large number of time series forecasting models, including traditional statistical models, machine learning models and, more recently, deep learning models, have been proposed in the literature. However, choosing the right model along with good parameter values that perform well on given data is still challenging. Automatically providing a good set of models to users for a given dataset saves both the time and the effort of trial-and-error approaches with a wide variety of available models along with parameter optimization. We present AutoAI for Time Series Forecasting (AutoAI-TS), which provides users with a zero configuration (zero-conf) system to efficiently train, optimize and choose the best forecasting model among various classes of models for the given dataset. With its flexible zero-conf design, AutoAI-TS automatically performs all the data preparation, model creation, parameter optimization, training and model selection for users and provides a trained model that is ready to use. For given data, AutoAI-TS utilizes a wide variety of models including classical statistical models, Machine Learning (ML) models, statistical-ML hybrid models and deep learning models along with various transformations to create forecasting pipelines. It then evaluates and ranks pipelines using the proposed T-Daub mechanism to choose the best pipeline. The paper describes in detail all the technical aspects of AutoAI-TS along with extensive benchmarking on a variety of real-world data sets for various use cases. Benchmark results show that AutoAI-TS, with no manual configuration from the user, automatically trains and selects pipelines that on average outperform existing state-of-the-art time series forecasting toolkits.
Submitted 8 March, 2021; v1 submitted 24 February, 2021;
originally announced February 2021.
-
Certificate and Signature Free Anonymity for V2V Communications
Authors:
Vipin Singh Sehrawat,
Yogendra Shah,
Vinod Kumar Choyi,
Alec Brusilovsky,
Samir Ferdi
Abstract:
Anonymity is a desirable feature for vehicle-to-vehicle (V2V) communications, but it conflicts with other requirements such as non-repudiation and revocation. Existing pseudonym-based V2V communications schemes rely on certificate generation and signature verification. These schemes require cumbersome key management, frequent updating of certificate chains and other costly procedures such as cryptographic pairings. In this paper, we present novel V2V communications schemes that provide authentication, authorization, anonymity, non-repudiation, replay protection, pseudonym revocation, and forward secrecy without relying on traditional certificate generation and signature verification. Security and privacy of our schemes rely on hard problems in number theory. Furthermore, our schemes guarantee security and privacy in the presence of subsets of colluding malicious parties, provided that the cardinality of such sets is below a fixed threshold.
Submitted 16 August, 2020;
originally announced August 2020.
-
"The Squawk Bot": Joint Learning of Time Series and Text Data Modalities for Automated Financial Information Filtering
Authors:
Xuan-Hong Dang,
Syed Yousaf Shah,
Petros Zerfos
Abstract:
Multimodal analysis that uses numerical time series and textual corpora as input data sources is becoming a promising approach, especially in the financial industry. However, the main focus of such analysis has been on achieving high prediction accuracy, while little effort has been spent on the important task of understanding the association between the two data modalities. Performance on the time series hence receives little explanation even though human-understandable textual information is available. In this work, we address the following problem: given a numerical time series and a general corpus of textual stories collected over the same period, discover in a timely manner a succinct set of textual stories associated with that time series. Towards this goal, we propose a novel multi-modal neural model called MSIN that jointly learns both numerical time series and categorical text articles in order to unearth the association between them. Through multiple steps of data interrelation between the two data modalities, MSIN learns to focus on a small subset of text articles that best align with the performance in the time series. This succinct set is discovered in a timely manner and presented as recommended documents, acting as automated information filtering for the given time series. We empirically evaluate the performance of our model on discovering relevant news articles for two stock time series, from Apple and Google, along with the daily news articles collected from Thomson Reuters over a period of seven consecutive years. The experimental results demonstrate that MSIN achieves up to 84.9% and 87.2% recall of the ground truth articles for the two examined time series, respectively, far superior to state-of-the-art algorithms that rely on conventional attention mechanisms in deep learning.
Submitted 20 December, 2019;
originally announced December 2019.
-
Stem-driven Language Models for Morphologically Rich Languages
Authors:
Yash Shah,
Ishan Tarunesh,
Harsh Deshpande,
Preethi Jyothi
Abstract:
Neural language models (LMs) have been shown to benefit significantly from enhancing word vectors with subword-level information, especially for morphologically rich languages. This has been mainly tackled by providing subword-level information as an input; using subword units in the output layer has been far less explored. In this work, we propose LMs that are cognizant of the underlying stems in each word. We derive stems for words using a simple unsupervised technique for stem identification. We experiment with different architectures involving multi-task learning and mixture models over words and stems. We focus on four morphologically complex languages -- Hindi, Tamil, Kannada and Finnish -- and observe significant perplexity gains using our stem-driven LMs when compared with other competitive baseline models.
Submitted 25 October, 2019;
originally announced October 2019.
-
seq2graph: Discovering Dynamic Dependencies from Multivariate Time Series with Multi-level Attention
Authors:
Xuan-Hong Dang,
Syed Yousaf Shah,
Petros Zerfos
Abstract:
Discovering temporal lagged and inter-dependencies in multivariate time series data is an important task. However, in many real-world applications, such as commercial cloud management, manufacturing predictive maintenance, and portfolio performance analysis, such dependencies can be non-linear and time-variant, which makes it more challenging to extract them through traditional methods such as Granger causality or clustering. In this work, we present a novel deep learning model that uses multiple layers of customized gated recurrent units (GRUs) for discovering both time lagged behaviors as well as inter-timeseries dependencies in the form of directed weighted graphs. We introduce a key component, a dual-purpose recurrent neural network, that decodes information in the temporal domain to discover lagged dependencies within each time series, and encodes them into a set of vectors which, collected from all component time series, form the informative inputs to discover inter-dependencies. Though the discovery of the two types of dependencies is separated at different hierarchical levels, they are tightly connected and jointly trained in an end-to-end manner. With this joint training, learning of one type of dependency immediately impacts the learning of the other one, leading to overall accurate dependency discovery. We empirically test our model on synthetic time series data in which the exact form of (non-linear) dependencies is known. We also evaluate its performance on two real-world applications: (i) performance monitoring data from a commercial cloud provider, which exhibit highly dynamic, non-linear, and volatile behavior, and (ii) sensor data from a manufacturing plant. We further show how our approach is able to capture these dependency behaviors via intuitive and interpretable dependency graphs and use them to generate highly accurate forecasts.
Submitted 7 December, 2018;
originally announced December 2018.
-
Secure Operations on Tree-Formed Verification Data
Authors:
Andreas U. Schmidt,
Andreas Leicher,
Yogendra Shah,
Inhyok Cha
Abstract:
We define secure operations with tree-formed, protected verification data registers. Functionality is conceptually added to Trusted Platform Modules (TPMs) to handle Platform Configuration Registers (PCRs) which represent roots of hash trees protecting the integrity of tree-formed Stored Measurement Logs (SMLs). This enables verification and update of an inner node of an SML and even attestation to its value with the same security level as for ordinary PCRs. As an important application, it is shown how certification of SML subtrees enables attestation of platform properties.
Submitted 19 August, 2010;
originally announced August 2010.
-
Tree-formed Verification Data for Trusted Platforms
Authors:
Andreas U. Schmidt,
Andreas Leicher,
Yogendra Shah,
Inhyok Cha
Abstract:
The establishment of trust relationships to a computing platform relies on validation processes. Validation allows an external entity to build trust in the expected behaviour of the platform based on provided evidence of the platform's configuration. In a process like remote attestation, the 'trusted' platform submits verification data created during a start up process. These data consist of hardware-protected values of platform configuration registers, containing nested measurement values, e.g., hash values, of loaded or started components. Commonly, the register values are created in linear order by a hardware-secured operation. Fine-grained diagnosis of components, based on the linear order of verification data and associated measurement logs, is not optimal. We propose a method to use tree-formed verification data to validate a platform. Component measurement values represent leaves, and protected registers represent roots of a hash tree. We describe the basic mechanism of validating a platform using tree-formed measurement logs and root registers and show a logarithmic speed-up for the search of faults. Secure creation of a tree is possible using a limited number of hardware-protected registers and a single protected operation. In this way, the security of tree-formed verification data is maintained.
Submitted 18 October, 2012; v1 submitted 5 July, 2010;
originally announced July 2010.
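The logarithmic fault search described in the abstract above can be sketched with an ordinary binary hash tree. This is a minimal illustration, not the authors' TPM mechanism: SHA-256, the power-of-two leaf count, and the names `build_tree` and `find_fault` are assumptions of this sketch, and it presumes the verifier holds a full reference tree to compare against.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return the tree as a list of levels: levels[0] are leaves, levels[-1] is [root]."""
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("leaf count must be a power of two in this sketch")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def find_fault(reference, reported):
    """Walk down from the root, one comparison per level, to a differing leaf index."""
    if reference[-1][0] == reported[-1][0]:
        return None  # roots match, so no leaf differs
    idx = 0
    for depth in range(len(reference) - 2, -1, -1):
        left = 2 * idx
        # If the left child differs, descend left; otherwise the right child must differ.
        idx = left if reference[depth][left] != reported[depth][left] else left + 1
    return idx


# Hypothetical component measurements; component 5 is modified on the reported platform.
measurements = [h(f"component-{i}".encode()) for i in range(8)]
tampered = list(measurements)
tampered[5] = h(b"modified component")

reference = build_tree(measurements)
reported = build_tree(tampered)
fault = find_fault(reference, reported)
```

With 8 leaves the search makes 3 child comparisons after the root check, whereas a linear log would need up to 8. When several leaves differ, this sketch reports only the leftmost one.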
-
Information-theoretically Secret Key Generation for Fading Wireless Channels
Authors:
Chunxuan Ye,
Suhas Mathur,
Alex Reznik,
Yogendra Shah,
Wade Trappe,
Narayan Mandayam
Abstract:
The multipath-rich wireless environment associated with typical wireless usage scenarios is characterized by a fading channel response that is time-varying, location-sensitive, and uniquely shared by a given transmitter-receiver pair. The complexity associated with a richly scattering environment implies that the short-term fading process is inherently hard to predict and best modeled stochastically, with rapid decorrelation properties in space, time and frequency. In this paper, we demonstrate how the channel state between a wireless transmitter and receiver can be used as the basis for building practical secret key generation protocols between two entities. We begin by presenting a scheme based on level crossings of the fading process, which is well-suited for the Rayleigh and Rician fading models associated with a richly scattering environment. Our level crossing algorithm is simple, and incorporates a self-authenticating mechanism to prevent adversarial manipulation of message exchanges during the protocol. Since the level crossing algorithm is best suited for fading processes that exhibit symmetry in their underlying distribution, we present a second and more powerful approach that is suited for more general channel state distributions. This second approach is motivated by observations from quantizing jointly Gaussian processes, but exploits empirical measurements to set quantization boundaries and a heuristic log likelihood ratio estimate to achieve an improved secret key generation rate. We validate both proposed protocols through experimentations using a customized 802.11a platform, and show for the typical WiFi channel that reliable secret key establishment can be accomplished at rates on the order of 10 bits/second.
Submitted 26 October, 2009;
originally announced October 2009.
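The idea of extracting key bits from level crossings, as mentioned in the abstract above, can be illustrated with a simplified sketch. It is not the authors' protocol: the thresholds (mean plus or minus `alpha` standard deviations), the minimum run length `m`, the single-sample confirmation rule, and all function names are assumptions of this sketch, and the self-authenticating mechanism is omitted entirely.

```python
import statistics


def thresholds(samples, alpha):
    """Upper and lower levels around the mean (an assumed rule for this sketch)."""
    mean = statistics.fmean(samples)
    spread = alpha * statistics.pstdev(samples)
    return mean + spread, mean - spread


def classify(x, upper, lower):
    """1 above the upper level, 0 below the lower level, None in between."""
    if x > upper:
        return 1
    if x < lower:
        return 0
    return None


def excursion_centers(samples, alpha, m):
    """Centre indices of runs of at least m consecutive samples beyond the same level."""
    upper, lower = thresholds(samples, alpha)
    labels = [classify(x, upper, lower) for x in samples]
    centers, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] is not None and i - start >= m:
                centers.append((start + i - 1) // 2)
            start = i
    return centers


def confirm(samples, centers, alpha):
    """Keep the proposed indices at which this party also observes an excursion."""
    upper, lower = thresholds(samples, alpha)
    return [c for c in centers if classify(samples[c], upper, lower) is not None]


def key_bits(samples, indices, alpha):
    """One bit per agreed index, taken from this party's own measurements."""
    upper, lower = thresholds(samples, alpha)
    return [classify(samples[i], upper, lower) for i in indices]


# Hypothetical, highly correlated channel estimates at the two ends of the link.
alice = [0, 5, 5, 5, 0, -5, -5, -5, 0, 0]
bob = [0.1, 4.8, 5.2, 4.9, -0.2, -5.1, -4.7, -5.3, 0.3, -0.1]

proposed = excursion_centers(alice, alpha=0.5, m=3)
agreed = confirm(bob, proposed, alpha=0.5)
alice_key = key_bits(alice, agreed, alpha=0.5)
bob_key = key_bits(bob, agreed, alpha=0.5)
```

Only the indices are exchanged in this sketch; each party derives the bits locally from its own samples, so an eavesdropper with an uncorrelated channel learns where excursions occurred but not their sign.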
-
Heterogeneous Relational Databases for a Grid-enabled Analysis Environment
Authors:
Arshad Ali,
Ashiq Anjum,
Tahir Azim,
Julian Bunn,
Saima Iqbal,
Richard McClatchey,
Harvey Newman,
S. Yousaf Shah,
Tony Solomonides,
Conrad Steenberg,
Michael Thomas,
Frank van Lingen,
Ian Willers
Abstract:
Grid based systems require a database access mechanism that can provide seamless homogeneous access to the requested data through a virtual data access system, i.e. a system which can take care of tracking the data that is stored in geographically distributed heterogeneous databases. This system should provide an integrated view of the data that is stored in the different repositories by using a virtual data access mechanism, i.e. a mechanism which can hide the heterogeneity of the backend databases from the client applications. This paper focuses on accessing data stored in disparate relational databases through a web service interface, and exploits the features of a Data Warehouse and Data Marts. We present a middleware that enables applications to access data stored in geographically distributed relational databases without being aware of their physical locations and underlying schema. A web service interface is provided to enable applications to access this middleware in a language and platform independent way. A prototype implementation was created based on Clarens [4], Unity [7] and POOL [8]. This ability to access the data stored in the distributed relational databases transparently is likely to be a very powerful one for Grid users, especially the scientific community wishing to collate and analyze data distributed over the Grid.
Submitted 10 April, 2005;
originally announced April 2005.