-
Which Part of the Heap is Useful? Improving Heap Liveness Analysis
Authors:
Vini Kanvar,
Uday P. Khedker
Abstract:
With the growing sizes of data structures allocated in the heap, understanding the actual use of heap memory is critically important for minimizing cache misses and reclaiming unused memory. A static analysis aimed at this is difficult because the heap locations are unnamed. Using allocation sites to name them creates very few distinctions, making it difficult to identify allocated heap locations that are not used. Heap liveness analysis using access graphs solves this problem by (a) using a storeless model of heap memory by naming the locations with access paths, and (b) representing the unbounded sets of access paths (which are regular languages) as finite automata.
We improve the scalability and efficiency of heap liveness analysis, and reduce the amount of computed heap liveness information by using deterministic automata and by minimizing the inclusion of aliased access paths in the language. Practically, our field-, flow-, and context-sensitive liveness analysis on SPEC CPU2006 benchmarks scales to 36 kLoC (existing analysis scales to 10.5 kLoC) and improves efficiency by up to 99%. For some of the benchmarks, our technique shows a multifold reduction in the computed liveness information, ranging from 2 to 100 times (in terms of the number of live access paths), without compromising on soundness.
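As an illustration of point (b), the following is a minimal sketch of how an unbounded set of access paths can be held in a deterministic finite automaton. It is not the paper's access graph construction: the class `AccessAutomaton`, the helper `list_traversal_liveness`, the state names, and the choice to treat every state as accepting are assumptions made for this example only.

```python
class AccessAutomaton:
    """Hypothetical DFA over field names; every state is accepting (assumption)."""

    def __init__(self, root):
        self.root = root          # root variable, also used as the start state
        self.transitions = {}     # (state, field) -> next state

    def add_edge(self, src, field, dst):
        # Determinism: at most one successor per (state, field) pair.
        self.transitions[(src, field)] = dst

    def accepts(self, access_path):
        # An access path is written as "x.f.g": a root variable, then fields.
        root, *fields = access_path.split(".")
        if root != self.root:
            return False
        state = self.root
        for f in fields:
            if (state, f) not in self.transitions:
                return False
            state = self.transitions[(state, f)]
        return True


def list_traversal_liveness():
    # Finite representation of the infinite set
    # { x, x.next, x.next.next, ..., x.next.data, x.next.next.data, ... }.
    dfa = AccessAutomaton("x")
    dfa.add_edge("x", "next", "n")
    dfa.add_edge("n", "next", "n")   # self-loop encodes arbitrarily long chains
    dfa.add_edge("n", "data", "d")
    return dfa


if __name__ == "__main__":
    dfa = list_traversal_liveness()
    print(dfa.accepts("x.next.next.next.data"))
    print(dfa.accepts("x.data"))
```

The self-loop on the hypothetical state `n` is what lets three transitions stand for infinitely many access paths; a path that is not in the language, such as `x.data` here, would be a candidate for memory that is never used.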
Submitted 23 August, 2024;
originally announced August 2024.
-
Enabling Communication via APIs for Mainframe Applications
Authors:
Vini Kanvar,
Srikanth Tamilselvam,
Keerthi Narayan Raghunath
Abstract:
For decades, mainframe systems have been vital in enterprise computing, supporting essential applications across industries like banking, retail, and healthcare. To harness these legacy applications and facilitate their reuse, there is increasing interest in using Application Programming Interfaces (APIs) to expose their data and functionalities, enabling the creation of new applications. However, identifying and exposing APIs for various business use cases presents significant challenges, including understanding legacy code, separating dependent components, introducing new artifacts, and making changes without disrupting functionality or compromising key Service Level Agreements (SLAs) like Turnaround Time (TAT).
We address these challenges by proposing a novel framework for creating APIs for legacy mainframe applications. Our approach involves identifying APIs by compiling artifacts such as transactions, screens, control flow blocks, inter-microservice calls, business rules, and data accesses. We use static analyses like liveness and reaching definitions to traverse the code and automatically compute API signatures, which include request/response fields.
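The idea of deriving request/response fields from liveness information can be sketched as follows. This is only an illustrative approximation over straight-line code, not the framework's actual analysis: the function `api_signature`, the statement encoding as (defs, uses) pairs, and the variable names are all hypothetical.

```python
def api_signature(statements, live_out):
    """Sketch: derive request/response fields for a straight-line code block.

    statements: list of (defs, uses) pairs of variable-name sets, in program order.
    live_out:   variables that are still needed after the block.
    """
    # Backward liveness: live_in = (live_out - defs) | uses, statement by statement.
    live = set(live_out)
    for defs, uses in reversed(statements):
        live = (live - defs) | uses
    request = live  # values the block reads before defining them

    # Variables defined inside the block that callers still need afterwards.
    defined = set()
    for defs, _ in statements:
        defined |= defs
    response = defined & set(live_out)
    return request, response


if __name__ == "__main__":
    block = [
        ({"rate"}, {"cust_id"}),            # rate = lookup(cust_id)
        ({"premium"}, {"rate", "amount"}),  # premium = rate * amount
        ({"audit"}, {"premium"}),           # audit = record(premium)
    ]
    request, response = api_signature(block, live_out={"premium"})
    print(sorted(request), sorted(response))
```

In this toy block, `rate` is consumed internally and `audit` is never needed by the caller, so neither appears in the signature. A real analysis would also have to handle branches, loops, and calls.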
To evaluate our framework, we conducted a qualitative survey with nine mainframe developers, averaging 15 years of experience. This survey helped identify candidate APIs and estimate development time for coding these APIs on a public mainframe application, GENAPP, and two industry mainframe applications. The results showed that our framework effectively identified more candidate APIs and reduced implementation time. The API signature computation is integrated into IBM Watsonx Code Assistant for Z Refactoring Assistant. We verified the correctness of the identified APIs by executing them on an IBM Z mainframe system, demonstrating the practical viability of our approach.
Submitted 8 August, 2024;
originally announced August 2024.
-
Handling Communication via APIs for Microservices
Authors:
Vini Kanvar,
Ridhi Jain,
Srikanth Tamilselvam
Abstract:
Enterprises, in their journey to the cloud, want to decompose their monolith applications into microservices to maximize cloud benefits. Current research focuses largely on how to partition the monolith into smaller clusters that perform well across standard metrics like coupling, cohesion, etc. However, there is little research on taking the partitions, identifying the dependencies between the microservices, exploring ways to further reduce the dependencies, and making appropriate code changes to enable robust communication without modifying the application behaviour.
In this work, we discuss the challenges with the conventional techniques of communication using JSON and propose an alternative way of ID-passing via APIs. We also devise an algorithm to reduce the number of APIs. For this, we construct subgraphs of methods and their associated variables in each class and relocate them to their more functionally aligned microservices. Our quantitative and qualitative studies on five public Java applications clearly demonstrate that our refactored microservices using ID have decidedly better time and memory complexities than JSON. Our automation reduces manual refactoring effort by 40-60%.
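The contrast between JSON-based communication and ID-passing might look roughly like the sketch below. It assumes that ID-passing means the owning microservice keeps the object and hands out an opaque identifier, with other services reading fields through an API call. That reading, and every name here (`OwnerService`, `register`, `get_field`, `payload_json`, `payload_id`), is an assumption for illustration rather than the paper's design.

```python
import json
from itertools import count


class OwnerService:
    """Hypothetical microservice that owns objects and exposes them by ID."""

    def __init__(self):
        self._objects = {}
        self._ids = count(1)

    def register(self, obj):
        # Keep the object local and hand out an opaque identifier.
        oid = next(self._ids)
        self._objects[oid] = obj
        return oid

    def get_field(self, oid, name):
        # Stand-in for an API endpoint that other services would call remotely.
        return self._objects[oid][name]


def payload_json(obj):
    # Conventional approach: serialize the whole object for the receiver.
    return json.dumps(obj)


def payload_id(owner, obj):
    # ID-passing: send only a handle; the receiver asks for what it needs.
    return json.dumps({"id": owner.register(obj)})


if __name__ == "__main__":
    owner = OwnerService()
    order = {"customer": "c-17", "items": list(range(1000)), "total": 99.5}
    print(len(payload_json(order)), len(payload_id(owner, order)))
```

The trade-off in this sketch is that the ID payload stays small regardless of object size, at the cost of an extra call each time the receiver needs a field.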
Submitted 2 August, 2023;
originally announced August 2023.
-
Generalizing the Liveness Based Points-to Analysis
Authors:
Uday P. Khedker,
Vini Kanvar
Abstract:
The original liveness-based flow- and context-sensitive points-to analysis (LFCPA) is restricted to scalar pointer variables and scalar pointees on stack and static memory. In this paper, we extend it to support heap memory and pointer expressions involving structures, unions, arrays, and pointer arithmetic. The key idea behind these extensions involves constructing bounded names for locations in terms of compile-time constants (names and fixed offsets), and introducing sound approximations when it is not possible to do so. We achieve this by defining a grammar for pointer expressions, suitable memory models and location naming conventions, and some key evaluations of pointer expressions that compute the named locations. These extensions preserve the spirit of the original LFCPA, as evidenced by the fact that although the lattices and flow functions change, the overall data flow equations remain unchanged.
Submitted 21 November, 2014; v1 submitted 19 November, 2014;
originally announced November 2014.
-
Heap Abstractions for Static Analysis
Authors:
Vini Kanvar,
Uday P. Khedker
Abstract:
Heap data is potentially unbounded and seemingly arbitrary. As a consequence, unlike stack and static memory, heap memory cannot be abstracted directly in terms of a fixed set of source variable names appearing in the program being analysed. This makes it an interesting topic of study and there is an abundance of literature employing heap abstractions. Although most studies have addressed similar concerns, their formulations and formalisms often seem dissimilar and sometimes even unrelated. Thus, the insights gained in one description of heap abstraction may not directly carry over to some other description. This survey is a result of our quest for a unifying theme in the existing descriptions of heap abstractions. In particular, our interest lies in the abstractions and not in the algorithms that construct them.
In our search for a unifying theme, we view a heap abstraction as consisting of two features: a heap model to represent the heap memory and a summarization technique for bounding the heap representation. We classify the models as storeless, store based, and hybrid. We describe various summarization techniques based on k-limiting, allocation sites, patterns, variables, other generic instrumentation predicates, and higher-order logics. This approach allows us to compare the insights of a large number of seemingly dissimilar heap abstractions and also paves the way for creating new abstractions by mixing and matching models and summarization techniques.
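Of the summarization techniques listed, k-limiting is perhaps the simplest to sketch in a storeless model. The example below is one possible reading: access paths longer than k fields are truncated and marked with a summary symbol. The helper names `k_limit` and `summarize`, and the use of `*` as the summary marker, are assumptions for this illustration.

```python
def k_limit(access_path, k):
    """Keep at most k field dereferences; summarize anything longer with '*'."""
    root, *fields = access_path.split(".")
    if len(fields) <= k:
        return access_path
    # '*' stands for "any further sequence of fields" (hypothetical marker).
    return ".".join([root, *fields[:k], "*"])


def summarize(access_paths, k):
    # Distinct long paths with the same k-prefix collapse into one summary,
    # which bounds the size of the abstraction.
    return {k_limit(p, k) for p in access_paths}


if __name__ == "__main__":
    paths = {
        "x.next",
        "x.next.next.data",
        "x.next.next.next.data",
        "x.next.next.next.next.data",
    }
    print(sorted(summarize(paths, 2)))
```

Here the three longer paths collapse into a single summarized name, which bounds the representation at the cost of no longer distinguishing locations beyond depth k.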
Submitted 13 May, 2015; v1 submitted 19 March, 2014;
originally announced March 2014.