
Compiler Design

Helmut Seidl Reinhard Wilhelm


Sebastian Hack

Compiler Design
Analysis and Transformation


Sebastian Hack
Programming Group
Universität des Saarlandes
Saarbrücken, Germany

Helmut Seidl
Fakultät für Informatik
Technische Universität München
Garching, Germany

Reinhard Wilhelm
Compiler Research Group
Universität des Saarlandes
Saarbrücken, Germany

ISBN 978-3-642-17547-3        ISBN 978-3-642-17548-0 (eBook)
DOI 10.1007/978-3-642-17548-0

Springer Heidelberg New York Dordrecht London


Library of Congress Control Number: 2012940955
ACM Codes: D.1, D.3, D.2
© Springer-Verlag Berlin Heidelberg 2012
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed. Exempted from this legal reservation are brief
excerpts in connection with reviews or scholarly analysis or material supplied specifically for the
purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the
work. Duplication of this publication or parts thereof is permitted only under the provisions of
the Copyright Law of the Publisher's location, in its current version, and permission for use must always
be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright
Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Compilers for programming languages should translate source-language programs correctly into target-language programs, often programs of a machine language.
But not only that; they should often generate target-machine code that is as efficient as possible. This book deals with this problem, namely the methods to
improve the efficiency of target programs by a compiler.
The history of this particular subarea of compilation dates back to the early days
of computer science. In the 1950s, a team at IBM led by John Backus implemented
a first compiler for the programming language FORTRAN. The target machine
was the IBM 704, which was, according to today's standards, an incredibly small
and incredibly slow machine. This motivated the team to think about a translation
that would efficiently exploit the very modest machine resources. This was the
birth of optimizing compilers.
FORTRAN is an imperative programming language designed for numerical
computations. It offers arrays as data structures to store mathematical objects such
as vectors and matrices, and it offers loops to formulate iterative algorithms on
these objects. Arrays in FORTRAN, as well as in ALGOL 60, are very close to the
mathematical objects that are to be stored in them.
The descriptional comfort enjoyed by the numerical analyst was at odds with
the requirement of run-time efficiency of generated target programs. Several
sources for this clash were recognized, and methods to deal with them were
discovered. Elements of a multidimensional array are selected through sequences
of integer-valued expressions, which may lead to complex and expensive computations. Some numerical computations use the same or similar index expressions
at different places in the program. Translating them naively may lead to repeatedly
computing the same values. Loops often step through arrays with a constant
increment or decrement. This may allow us to improve the efficiency by computing the next address using the address used in the last step instead of computing
the address anew. By now, it should be clear that arrays and loops represent many
challenges if the compiler is to improve a program's efficiency compared to a
straightforward translation.


Already the first FORTRAN compiler implemented several efficiency-improving program transformations, called optimizing transformations. They
should, however, be carefully applied. Otherwise, they would change the
semantics of the program. Most such transformations have applicability conditions, which when satisfied guarantee the preservation of the semantics. These
conditions, in general, depend on nonlocal properties of the program, which have
to be determined by a static analysis of the program performed by the compiler.
This led to the development of data-flow analysis. This name was probably
chosen to express that it determines the flow of properties of program variables
through programs. The underlying theory was developed in the 1970s when the
semantics of programming languages had been put on a solid mathematical basis.
Two doctoral dissertations had the greatest impact on this field; they were written
by Gary A. Kildall (1972) and by Patrick Cousot (1978). Kildall clarified the
lattice-theoretic foundations of data-flow analysis. Cousot established the relation
between the semantics of a programming language and static analyses of programs
written in this language. He therefore called such a semantics-based program
analysis abstract interpretation. This relation to the language semantics allows for
a correctness proof of static analyses and even for the design of analyses that are
correct by construction. Static program analysis in this book always means sound
static analysis. This means that the results of such a static analysis can be trusted.
A property of a program determined by a static analysis holds for all executions of
the program.
The origins of data-flow analysis and abstract interpretation thus lie in the area
of compilation. However, static analysis has emancipated itself from its origins and
has become an important verification method. Static analyses are routinely used in
industry to prove safety properties of programs such as the absence of run-time
errors. Soundness of the analyses is mandatory here as well. If a sound static
analysis determines that a certain run-time error will never occur at a program
point, this holds for all executions of the program. However, it may be that a
certain run-time error can never happen at a program point, but the analysis is
unable to determine this fact. Such analyses thus are sound, but may be incomplete. This is in contrast with bug-chasing static analysis, which may fail to detect
some errors and may warn about errors that will never occur. These analyses may
be unsound and incomplete.
Static analyses are also used to prove partial correctness of programs and to
check synchronization properties of concurrent programs. Finally, they are used to
determine execution-time bounds for embedded real-time systems. Static analyses
have become an indispensable tool for the development of reliable software.
This book treats the compilation phase that attempts to improve the efficiency
of programs by semantics-preserving transformations. It introduces the necessary
theory of static program analysis and describes in a precise way both particular
static analyses and program transformations. The basis for both is a simple programming language, for which an operational semantics is presented.
The volume Wilhelm and Seidl: Compiler Design: Virtual Machines treats
several programming paradigms. This volume, therefore, describes analyses and


transformations for imperative and functional programs. Functional languages are


based on the λ-calculus and are equipped with a highly developed theory of
program transformation.
Several colleagues and students contributed to the improvement of this book.
We would particularly like to mention Jörg Herter and Iskren Chernev, who
carefully read a draft of this translation and pointed out quite a number of
problems.
We wish the reader an enjoyable and profitable reading.
München and Saarbrücken, November 2011

Helmut Seidl
Reinhard Wilhelm
Sebastian Hack

General literature

The list of monographs that give an overview of static program analysis and abstract
interpretation is surprisingly short. The book by Matthew S. Hecht [Hec77],
summarizing the classical knowledge about data-flow analysis, is still worth reading.
The anthology edited by Steven S. Muchnick and Neil D. Jones [MJ81], which was
published only a few years later, contains many original and influential articles about
the foundations of static program analysis and, in particular, the static analysis of
recursive procedures and dynamically allocated data structures. A similar collection
of articles about the static analysis of declarative programs was edited by Samson
Abramsky and Chris Hankin [AH87]. A comprehensive and modern introduction is
offered by Flemming Nielson, Hanne Riis Nielson and Chris Hankin [NNH99].
Several comprehensive treatments of compilation contain chapters about static
analysis [AG04, CT04, ALSU07]. Steven S. Muchnick's monograph Advanced
Compiler Design and Implementation [Muc97] contains an extensive treatment. The Compiler Design Handbook, edited by Y.N. Srikant and Priti Shankar
[SS03], offers a chapter about shape analysis and about techniques to analyze object-oriented programs.
Ongoing attempts to prove compiler correctness [Ler09, TL09] have led to an
increased interest in the correctness proofs of optimizing program transformations.
Techniques for the systematic derivation of correct program transformations are
described by Patrick and Radhia Cousot [CC02]. Automated correctness proofs of
optimizing program transformations are described by Sorin Lerner [LMC03,
LMRC05, KTL09].


Contents

1  Foundations and Intraprocedural Optimization                          1
   1.1   Introduction                                                    1
   1.2   Avoiding Redundant Computations                                 7
   1.3   Background: An Operational Semantics                            8
   1.4   Elimination of Redundant Computations                          11
   1.5   Background: Complete Lattices                                  16
   1.6   Least Solution or MOP Solution?                                27
   1.7   Removal of Assignments to Dead Variables                       32
   1.8   Removal of Assignments Between Variables                       40
   1.9   Constant Folding                                               43
   1.10  Interval Analysis                                              54
   1.11  Alias Analysis                                                 67
   1.12  Fixed-Point Algorithms                                         83
   1.13  Elimination of Partial Redundancies                            89
   1.14  Application: Moving Loop-Invariant Code                        97
   1.15  Removal of Partially Dead Assignments                         102
   1.16  Exercises                                                     108
   1.17  Literature                                                    114

2  Interprocedural Optimization                                        115
   2.1   Programs with Procedures                                      115
   2.2   Extended Operational Semantics                                117
   2.3   Inlining                                                      121
   2.4   Tail-Call Optimization                                        123
   2.5   Interprocedural Analysis                                      124
   2.6   The Functional Approach                                       125
   2.7   Interprocedural Reachability                                  131
   2.8   Demand-Driven Interprocedural Analysis                        132
   2.9   The Call-String Approach                                      135
   2.10  Exercises                                                     137
   2.11  Literature                                                    139

3  Optimization of Functional Programs                                 141
   3.1   A Simple Functional Programming Language                      142
   3.2   Some Simple Optimizations                                     143
   3.3   Inlining                                                      146
   3.4   Specialization of Recursive Functions                         147
   3.5   An Improved Value Analysis                                    149
   3.6   Elimination of Intermediate Data Structures                   155
   3.7   Improving the Evaluation Order: Strictness Analysis           159
   3.8   Exercises                                                     166
   3.9   Literature                                                    170

References                                                             171

Index                                                                  175
Chapter 1

Foundations and Intraprocedural Optimization

1.1 Introduction
This section presents basic techniques to improve the quality of compiler-generated
code. The quality metric need not be a priori fixed. It could be the execution time, the
required space, or the consumed energy. This book, however, is primarily concerned
with methods to improve the execution time of programs.
We now give several examples of how to improve the execution time of programs.
One strategy to improve the efficiency of programs is to avoid superfluous computations. A computation may be superfluous when it has already been performed, and
when a repetition would provably always produce the same result. The compiler can
avoid this recomputation of the same result if it takes care to store the result of the
first computation. The recomputation can then be avoided by accessing this stored
value.
The execution time of a program can be also reduced if some of the computations
can already be done at compile time. Constant folding replaces expressions whose
value is already known at compile time by this value. This optimization supports the
development of generic programs, often called program families. These are parametrized in a number of variables and thus can be instantiated to many different
variants by supplying different combinations of parameter values. This is good and
effective development practice, for instance, in the embedded-systems industry. One
generic power-train control program may be instantiated to many different versions
for different car engines. Constant folding eliminates the loss in efficiency that could
result from such a programming style.
Checks for run-time errors can be eliminated if it is clear that they would never
fail, that is, if these errors would provably never happen. A good example is the check
for index out of bounds. It checks the indices of arrays against their lower and upper
bounds. These checks can be avoided if the indices provably always lie within these
bounds.
Another idea to improve the efficiency of programs is to move computations
from more frequently executed program parts into less frequently executed parts.


An example of this kind of optimization is to move loop-invariant computations out of loops.
Some operations are more costly in execution time than others. For example,
multiplication is more expensive than addition. Multiplication can be defined, and
this means also replaced by, repeated addition. An optimization, called reduction in
operator strength would, under certain conditions, replace a multiplication occurring
in a loop by an addition.
Finally, procedure inlining, i.e., replacing a procedure call by an appropriately
instantiated body of the procedure, eliminates the procedure-call overhead and often
opens the way to new optimizations.
The following example shows how big the impact of optimizations on the quality
of generated code can be:
Example 1.1.1 Consider a program that should sort an array a written in an imperative programming language. This program would use the following function swap:
void swap (int i, int j) {
    int t;
    if (a[i] > a[j]) {
        t ← a[j];
        a[j] ← a[i];
        a[i] ← t;
    }
}
The inefficiencies of this implementation are apparent. The addresses of a[i] and
a[ j] are computed three times. This leads to 6 address computations altogether.
However, two should be sufficient. In addition, the values of a[i] and a[ j] are loaded
twice, resulting in four memory accesses where two should be sufficient.
These inefficiencies can be removed by an implementation as suggested by the
array concept of the C programming language. The idea is to access array elements
through pointers. Another idea is to store addresses that are used multiple times.
void swap (int ∗p, int ∗q) {
    int t, ai, aj;
    ai ← ∗p; aj ← ∗q;
    if (ai > aj) {
        t ← aj;
        ∗q ← ai;
        ∗p ← t;
    }
}
Looking more closely at this new code reveals that the temporary variable t can be
eliminated as well.
This second version is apparently more efficient, while the original version was
much more intuitive. High-level programming languages are designed to allow intuitive formulations of algorithms. It is then the duty of the compiler to generate efficient
target programs.


Optimizing program transformations ought to preserve the semantics of the program,
as defined through the semantics of the programming language in which the program
is written.
Example 1.1.2 Consider the transformation:
y ← f() + f();      ⟹      y ← 2 ∗ f();

The idea behind the optimization is to save the evaluation of the second call of the
function f. However, the program resulting from this transformation is only equivalent
to the original program if the second call to f is guaranteed to produce the same result
and if the call does not produce a side effect. This last condition is not immediately
clear for functions written in an imperative language.


So-called program optimizations are not correct if they change the semantics of the
program. Therefore, most optimizing transformations have an associated applicability condition. This is a sufficient condition for the preservation of the semantics of
programs. Checking the satisfaction of these applicability conditions is the duty of
static program analysis. Such analyses need to be automatic, that is, run without user
intervention, as they will have to be performed by the compiler.
A careful treatment of the issue of semantics preservation needs several proofs.
First, a proof is needed that the applicability condition is, in fact, a sufficient condition
for semantics preservation. A second proof is needed that the analysis that is to
determine the applicability is correct, that is, will never give wrong answers to the question
posed by the applicability condition. Both proofs refer to an operational semantics
as their basis.
Several optimizations are effective across several classes of programming languages. However, each programming language and also each class of programming
languages additionally require specific optimizations, designed to improve the efficiency of particular language constructs. One such example is the compile-time
removal of dynamic method invocations in object-oriented programs. A static method
call, which replaces a dynamic call, can be inlined and thus opens the door for further
optimizations. This is very effective since methods in object-oriented programs are
often rather small. In Fortran, on the other hand, inlining does not play a comparably large role. For Fortran, the parallelization or vectorization of nested loops has
greater impact.
The programming language, in particular its semantics, also has a strong influence on the efficiency and the effectiveness of program analyses. The programming
language may enforce restrictions whose validation would otherwise require an enormous effort. A major problem in the analysis of imperative programs is the determination of dependencies between the statements in programs. Such dependencies
restrict the compiler's possibility to reorder statements to better exploit the resources


of the target machine. The unrestricted use of pointers, as in the C programming language, makes this analysis of dependencies difficult due to the alias-problem created
through pointers. The more restricted use of pointers in Java eases the corresponding
analysis.
Example 1.1.3 Let us look at the programming language Java. Inherently inefficient language constructs are the mandatory checks for indices out of array bounds,
dynamic method invocation, and storage management for objects. The absence of
pointer arithmetic and of pointers into the stack increases the analyzability of Java
programs. On the other hand, dynamic loading of classes may ruin the precision of
Java analyses due to the lack of information about their semantics and their implementation. Further tough challenges for an automatic static analysis are offered by
language constructs such as exceptions, concurrency, and reflection, which still may
be useful for the Java programmer.
We have stressed in the preface that sound static program analysis has become
a verification technology. It is therefore interesting to draw the connection to the
problem of proving the correctness of Java programs. Any correctness proof needs
a formally specified semantics of the programming language. Quite some effort went
into the development of such a semantics for Java. Still, Java programs with a formal
correctness proof are rather rare, not because of a principal impossibility, but due to
the sheer size of the necessary effort. Java just has too many language constructs,
each with its non-trivial semantics.


For this reason, we will not use Java as our example language. Instead we use a
small subset of an imperative programming language. This subset is, on the one
hand, simple enough to limit the descriptional effort, and is, on the other hand, realistic enough to include essential problems of actual compilers. This programming-language fragment can be seen as an intermediate language into which source programs are translated. The int variables of the program can be seen as virtual registers.
The compiler backend will, during register allocation, assign physical registers to
them as far as such physical registers are available. Such variables can also be used
to store addresses for indirect memory accesses. Arithmetic expressions represent
computations of, in our fragment, int values. Finally, the fragment contains an arbitrarily large array M, into which int values can be stored and from which they can
be retrieved. This array can be imagined as the whole (virtual) memory allocated to
a program by the operating system.
The separation between variables and memory may, at first glance, look somewhat
artificial. It is motivated by the wish to avoid the alias problem. Both a variable x
and a memory-access expression M[e] denote containers for values. The identity of a
memory cell denoted by M[e] is not directly visible because it depends on the value of
the expression e. In general, it is even undecidable whether M[e1 ] and M[e2 ] denote
the same memory cell. This is different for variables: A variable name x is the only
name by which the container associated with x can be accessed. This is important for
many program analyses: If the analysis is unable to derive the identity of the memory
cell denoted by M[e] in a write access then no assumptions can be made about the


contents of the rest of memory. The analysis loses much precision. The derivation
of assumptions about the contents of containers associated with variables is easier
since no indirect access to their containers is possible.
Our language fragment has the following constructs:
variables:                    x
arithmetic expressions:       e
assignments:                  x ← e
reading access to memory:     x ← M[e]
writing access to memory:     M[e1] ← e2
conditional statement:        if (e) s1 else s2
unconditional jump:           goto L

Note that we have not included explicit loop constructs. These can be realized by
conditional and unconditional jumps to labeled program points. Also missing so far
are functions and procedures. This chapter is therefore restricted to the analysis and
optimization of single functions.
Example 1.1.4 Let us again consider the function swap() of Example 1.1.1. How
would a compiler translate the body of this function into our language fragment?
The array a can be allocated into some section of the memory M. Accesses to array
components need to be translated into explicit address calculations. The result of a
schematic, nonoptimized translation could be:
0:   A1 ← A0 + 1 ∗ i;      //  A0 = &a[0]
1:   R1 ← M[A1];           //  R1 = a[i]
2:   A2 ← A0 + 1 ∗ j;
3:   R2 ← M[A2];           //  R2 = a[j]
4:   if (R1 > R2) {
5:       A3 ← A0 + 1 ∗ j;
6:       t  ← M[A3];
7:       A4 ← A0 + 1 ∗ j;
8:       A5 ← A0 + 1 ∗ i;
9:       R3 ← M[A5];
10:      M[A4] ← R3;
11:      A6 ← A0 + 1 ∗ i;
12:      M[A6] ← t;
13:  }
We assume that variable A0 holds the start address of the array a. Note that this
code makes explicit the inherent inefficiencies discussed in Example 1.1.1. Which
optimizations are applicable to this code?
Optimization 1:  1 ∗ R  ⟹  R
The scaling factor generated by an automatic (and schematic) translation of array
indexing can be dispensed with if this factor is 1 as is the case in the example.


Optimization 2: Reuse of values calculated for (sub)expressions


A closer look at the example shows that the variables A1, A5, and A6 have the same values, as is the case for the variables A2, A3, and A4:

A1 = A5 = A6          A2 = A3 = A4

In addition, the memory accesses M[A1] and M[A5] as well as the accesses M[A2] and M[A3] will deliver the same values:

M[A1] = M[A5]         M[A2] = M[A3]

Therefore, the variables R1 and R3, as well as the variables R2 and t, also contain the same values:

R1 = R3               R2 = t

If a variable x already contains the value of an expression e whose value is required, then x's value can be used instead of reevaluating the expression e. The program can be greatly simplified by exploiting all this information:

A1 ← A0 + i;
R1 ← M[A1];
A2 ← A0 + j;
R2 ← M[A2];
if (R1 > R2) {
    M[A2] ← R1;
    M[A1] ← R2;
}
The temporary variable t as well as the variables A3 , A4 , A5 , and R3 are now superfluous and can be eliminated from the program.
The following table lists the achieved savings:

          Before   After
+            6       2
∗            6       0
load         4       2
store        2       2
>            1       1
←            6       2





The optimizations applied to the function swap by hand should, of course, be done in an automated way. The following sections will introduce the necessary
analyses and transformations.

1.2 Avoiding Redundant Computations


This chapter presents a number of techniques to save computations that the program
would otherwise needlessly perform. We start with an optimization that avoids redundant computations, that is, multiple evaluations of the same expression guaranteed
to produce the same result. This first example is also used to exemplify fundamentals
of the approach. In particular, an operational semantics of our language fragment
is introduced in a concise way, and the necessary lattice-theoretic foundations are
discussed.
A frequently used trick to speed up algorithms is to trade time against space, more
precisely, to invest some additional space in order to speed up the program's execution.
The additional space is used to save some computed values. These values are then
later retrieved instead of recomputed. This technique is often called memoization.
Let us consider the profitability of such a transformation replacing a recomputation
by an access to a stored value. Additional space is needed for the storage of this
value. The recomputation does not disappear completely, but is replaced by an access
to the stored value. This access is cheap if the value is kept in a register, but it
can also be expensive if the value has to be kept in memory. In the latter case,
recomputing the value may, in fact, be cheaper. To keep things simple, we will
ignore such considerations of the costs and benefits, which are highly architecture-dependent. Instead, we assume that accessing a stored value is always cheaper than
recomputing it.
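As a small general illustration of memoization (our own sketch in Python, not written in the language fragment of Sect. 1.1; the names memoize and fib are invented), a computed value is stored under its argument and reused on later calls:

def memoize(f):
    cache = {}                        # stores results of earlier computations
    def wrapper(x):
        if x not in cache:            # recompute only if no stored value exists
            cache[x] = f(x)
        return cache[x]               # otherwise, access the stored value
    return wrapper

@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(40))                        # fast, because intermediate results are reused

Here the additional space is the dictionary cache; the trade-off discussed above is exactly whether such a table access is cheaper than recomputing f.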
The computations we consider here are evaluations of expressions. The first
problem is to recognize potential recomputations.
Example 1.2.1 Consider the following program fragment:

     z ← 1;
     y ← M[5];
A:   x1 ← y + z;
     . . .
B:   x2 ← y + z;

It seems like at program point B, the expression y + z will be evaluated a second time
yielding the same value. This is true under the following conditions: The occurrence
of y + z at program point B is always evaluated after the one at program point A,
and the variables y and z have the same values before B that they had
before A.




Our conclusion from the example is that for a systematic treatment of this optimization we need to be able to answer the following questions:
Will one evaluation of an expression always be executed before another one?
Does a variable always have the same value at a given program point that it had at
another program point?
To answer these types of questions, we need several things: an operational semantics,
which defines what happens when a program is executed, and a method that identifies
redundant computations in programs. Note that we are not so ambitious as to attempt
to identify all redundant computations. This problem is undecidable. In practice,
the method to be developed should at least find some redundant computations and
should never classify as redundant a computation that, in fact, is not redundant.

1.3 Background: An Operational Semantics


Small-step operational semantics have been found to be quite adequate for correctness
proofs of program analyses and transformations. Such a semantics formalizes what
a step in a computation is. A computation is then a sequence of such steps.
We start by choosing a suitable program representation, control-flow graphs. The
vertices of these graphs correspond to program points; we will therefore refer to
these vertices as program points. Program execution traverses these vertices. The
edges of the graph correspond to steps of the computation. They are labeled with
the corresponding program actions, that is, with conditions, assignments, loads and
stores from and to memory, or with the empty statement, ;. Program point start
represents the entry point of the program, and stop the exit point.
Possible edge labels are:
test:                  NonZero (e) or Zero (e)
assignment:            x ← e
load:                  x ← M[e]
store:                 M[e1] ← e2
empty statement:       ;

A section of the control-flow graph for the body of the function swap is shown in
Fig. 1.1. Sometimes, we omit an edge label ;. A conditional statement with condition
e in a program has two corresponding edges in the control-flow graph. The one
labeled with NonZero(e) is taken if the condition e is satisfied. This is the case
when e evaluates to some value not equal to 0. The edge labeled with Zero is taken
if the condition is not satisfied, i.e., when e evaluates to 0.
Computations are performed when paths of the control-flow graph are traversed.
They transform the program state. Program states can be represented as pairs
s = (ρ, μ)


start
A1 ← A0 + 1 ∗ i
R1 ← M[A1]
A2 ← A0 + 1 ∗ j
R2 ← M[A2]
Zero (R1 > R2) → stop          NonZero (R1 > R2)
A3 ← A0 + 1 ∗ j
. . .

Fig. 1.1 A section of the control-flow graph for swap()

The function ρ maps each program variable to its actual value, and the function μ
maps each memory address to the actual contents of the corresponding memory cell.
For simplicity, we assume that the values of variables and memory cells are integers.
The types of the functions ρ and μ, thus, are:

ρ : Vars → int     value of variables
μ : N → int        memory contents

An edge k = (u, lab, v) with source vertex u, target vertex v and label lab defines
a transformation [[k]] of the state before the execution of the action labeling the edge
to a state after the execution of the action. We call this transformation the effect of the
edge. The edge effect need not be a total function. It could also be a partial function.
Program execution in a state s will not execute the action associated with an edge
if the edge effect is undefined for s. There may be two reasons for an edge being
undefined: The edge may be labeled with a condition that is not satisfied in all states,
or the action labeling the edge may cause a memory access outside of a legal range.
The edge effect [[k]] of the edge k = (u, lab, v) only depends on its label lab:
[[k]] = [[lab]]
The edge effects [[lab]] are defined as follows:
[[;]] (ρ, μ)              =  (ρ, μ)
[[NonZero(e)]] (ρ, μ)     =  (ρ, μ)                                if [[e]] ρ ≠ 0
[[Zero(e)]] (ρ, μ)        =  (ρ, μ)                                if [[e]] ρ = 0
[[x ← e]] (ρ, μ)          =  (ρ ⊕ {x ↦ [[e]] ρ}, μ)
[[x ← M[e]]] (ρ, μ)       =  (ρ ⊕ {x ↦ μ([[e]] ρ)}, μ)
[[M[e1] ← e2]] (ρ, μ)     =  (ρ, μ ⊕ {[[e1]] ρ ↦ [[e2]] ρ})


An empty statement does not change the state. Conditions NonZero(e) and Zero(e)
represent partial identities; the associated edge effects are only defined if the conditions are satisfied, that is, if the expression e evaluates to a value not equal to 0 or equal to 0, respectively. They do, however, not change the state. Expressions e are evaluated by an auxiliary function [[e]], which takes a variable binding ρ of the program's variables and calculates e's value in the valuation ρ. As usual, this function is defined by induction over the structure of expressions. This is now shown for some examples:

[[x + y]] {x ↦ 7, y ↦ −1} = 6
[[¬(x = 4)]] {x ↦ 5} = ¬0 = 1

The operator ¬ denotes logical negation.
An assignment x ← e modifies the ρ-component of the state. The resulting ρ′
holds the value [[e]] ρ for variable x, that is, the value obtained by evaluating e
in the old variable binding ρ. The memory M remains unchanged by this assignment. The formal definition of the change to ρ uses an operator ⊕. This operator modifies a given function such that it maps a given argument to a given new value:

(ρ ⊕ {x ↦ d})(y)  =  d       if y ≡ x
                     ρ(y)    otherwise
A load action, x ← M[e], is similar to an assignment with the difference that the
new value of variable x is determined by first calculating a memory address and then
loading the value stored at this address from memory.
The store operation, M[e1] ← e2, has the most complex semantics. Values of
variables do not change. The following sequence of steps is performed: The values
of the expressions e1, e2 are computed. e1's value is the address of a memory cell at
which the value of e2 is stored.
We assume for both load and store operations that the address expressions deliver
legal addresses, i.e., values > 0.
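These definitions translate almost directly into executable form. The following Python sketch is our own illustration (the representation of labels as tuples and the modeling of expressions as functions from a variable binding to an int are our choices, not part of the formalism above); an undefined edge effect is modeled by the result None:

def effect(lab):
    # Concrete effect of an edge label; a partial function on states (rho, mu).
    kind = lab[0]
    if kind == ';':
        return lambda rho, mu: (rho, mu)
    if kind == 'NonZero':
        e = lab[1]
        return lambda rho, mu: (rho, mu) if e(rho) != 0 else None
    if kind == 'Zero':
        e = lab[1]
        return lambda rho, mu: (rho, mu) if e(rho) == 0 else None
    if kind == 'assign':                                    # x <- e
        x, e = lab[1], lab[2]
        return lambda rho, mu: ({**rho, x: e(rho)}, mu)
    if kind == 'load':                                      # x <- M[e]
        x, e = lab[1], lab[2]
        return lambda rho, mu: ({**rho, x: mu[e(rho)]}, mu)
    if kind == 'store':                                     # M[e1] <- e2
        e1, e2 = lab[1], lab[2]
        return lambda rho, mu: (rho, {**mu, e1(rho): e2(rho)})

# The assignment x <- x + 1 in the binding {x -> 5} (cf. Example 1.3.1 below):
step = effect(('assign', 'x', lambda rho: rho['x'] + 1))
print(step({'x': 5}, {}))                                   # ({'x': 6}, {})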
Example 1.3.1 An assignment x ← x + 1 in a variable binding {x ↦ 5} results in:

[[x ← x + 1]] ({x ↦ 5}, μ) = (ρ, μ)

where:

ρ = {x ↦ 5} ⊕ {x ↦ [[x + 1]] {x ↦ 5}}
  = {x ↦ 5} ⊕ {x ↦ 6}
  = {x ↦ 6}



We have now established what happens when edges of the control-flow graph are
traversed. A computation is (the traversal of) a path in the control-flow graph leading
from a starting point u to an endpoint v. Such a path is a sequence π = k1 . . . kn
of edges ki = (ui, labi, ui+1) of the control-flow graph (i = 1, . . . , n), where
u1 = u and un+1 = v. The state transformation [[π]] corresponding to π is obtained as
the composition of the edge effects of the edges of π:

[[π]] = [[kn]] ∘ . . . ∘ [[k1]]

Note that, again, the function [[π]] need not be defined for all states. A computation
along π starting in state s is only possible if [[π]] is defined for s.
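Continuing the sketch from above (again our own illustration, reusing the effect function introduced there), the effect of a path is obtained by applying its edge effects in sequence and is undefined as soon as one of them is:

def run_path(path, state):
    # path is a list of edges (u, lab, v); state is a pair (rho, mu) or None.
    for (_, lab, _) in path:
        if state is None:                  # an earlier effect was undefined
            return None
        state = effect(lab)(*state)        # apply the next edge effect
    return state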

1.4 Elimination of Redundant Computations


Let us return to our starting point, the attempt to find an analysis that determines for
each program point whether an expression has to be newly evaluated or whether it has
an already computed value. The method to do this is to identify expressions available
in variables. An expression e is available in variable x at some program point if it
has been evaluated before, the resulting value has been assigned to x, and neither x
nor any of the variables in e have been modified in between. Consider an assignment
x ← e such that x ∉ Vars(e), that is, x does not occur in e. Let π = k1 . . . kn be a
path from the entry point of the program to a program point v. The expression e is
available in x at v if the two following conditions hold:

The path π contains an edge ki labeled with the assignment x ← e.
No edge ki+1, . . . , kn is labeled with an assignment to one of the variables in
Vars(e) ∪ {x}.

For simplicity, we say in this case that the assignment x ← e is available at v.
Otherwise, we call e or x ← e, respectively, not available in x at v. We assume that no
assignment is available at the entry point of the program. So, none are available at
the end of an empty path π = ε.
Regard an edge k = (u, lab, v) and assume we knew the set A of assignments
available at u, i.e., at the source of k. The action labeling this edge determines which
assignments are added to or removed from the availability set A. We look for a
function [[k]]♯ such that the set of assignments available at v, i.e., at the target of k, is
obtained by applying [[k]]♯ to A. This function [[k]]♯ should only depend on the label
of k. It is called the abstract edge effect, in contrast to the concrete edge effect of the
operational semantics. We now define the abstract edge effects [[k]]♯ = [[lab]]♯ for
different types of actions.
Let Ass be the set of all assignments of the form x ← e in the program, with
the constraint that x ∉ Vars(e). An assignment violating this constraint cannot be
considered as available at the subsequent program point and is therefore excluded
from the set Ass. Let us assume that A ⊆ Ass is available at the source u of the edge


k = (u, lab, v). The set of assignments available at the target of k is determined
according to:
[[;]]♯ A             =  A
[[NonZero(e)]]♯ A    =  [[Zero(e)]]♯ A  =  A
[[x ← e]]♯ A         =  (A \ Occ(x)) ∪ {x ← e}     if x ∉ Vars(e)
                        A \ Occ(x)                 otherwise
[[x ← M[e]]]♯ A      =  A \ Occ(x)
[[M[e1] ← e2]]♯ A    =  A
where Occ(x) denotes the set of all assignments in which x occurs either on the left or
in the expression on the right side. An empty statement and a condition do not change
the set of available assignments. Executing an assignment to x means evaluating the
expression on the right side and assigning the resulting value to x. Therefore, all
assignments that contain an occurrence of x are removed from the available-set.
Following this, the actual assignment is added to the available-set provided x does
not occur in the right side. The abstract edge effect for loads from memory looks
similar. Storing into memory does not change the value of any variable, hence, A
remains unchanged.
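The abstract edge effects can be coded just as directly. In the following Python sketch (our own; an assignment x ← e is represented as a pair (x, vars_e), where vars_e is the set of variables occurring in e, and occ is an invented helper for Occ(x)):

def occ(x, A):
    # All assignments in A in which x occurs on the left or on the right side.
    return {a for a in A if a[0] == x or x in a[1]}

def abstract_effect(lab):
    kind = lab[0]
    if kind in (';', 'NonZero', 'Zero', 'store'):
        return lambda A: A                         # no variable is modified
    if kind == 'assign':                           # x <- e
        x, vars_e = lab[1], lab[2]
        def f(A):
            kept = A - occ(x, A)                   # remove all occurrences of x
            return kept | {(x, vars_e)} if x not in vars_e else kept
        return f
    if kind == 'load':                             # x <- M[e]
        x = lab[1]
        return lambda A: A - occ(x, A)

# The effect of y <- 1 on the empty set of available assignments:
print(abstract_effect(('assign', 'y', frozenset()))(set()))   # {('y', frozenset())}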
The abstract effects, which were just defined for each type of label, are composed
to an abstract effect [[π]]♯ for a path π = k1 . . . kn in the following way:

[[π]]♯ = [[kn]]♯ ∘ . . . ∘ [[k1]]♯

The set of assignments available at the end of a path π from the entry point of the
program to program point v is therefore obtained as:

[[π]]♯ ∅ = [[kn]]♯ (. . . ([[k1]]♯ ∅) . . .)
Applying such a function associated with a path can be used to determine which
assignments are available along the path. However, a program will typically have
several paths leading to a program point v. Which of these paths will actually be
taken at program execution may depend on program input and is therefore unknown
at analysis time. We define an assignment x ← e to be definitely available at a
program point v if it is available along all paths leading from the entry node of the
program to v. Otherwise, x ← e is possibly not available at v. Thus, the set of
assignments definitely available at a program point v is:

A*[v]  =  ⋂ {[[π]]♯ ∅ | π : start →* v}

where start →* v denotes the set of all paths from the entry point start of the
program to the program point v. The sets A*[v] are called the merge-over-all-paths
(MOP) solution of the analysis problem. We temporarily postpone the question of


A1 ← A + 7              A1 ← A + 7
B1 ← M[A1]              B1 ← M[A1]
B2 ← B1 − 1             B2 ← B1 − 1
A2 ← A + 7              A2 ← A1
M[A2] ← B2              M[A2] ← B2

Fig. 1.2 Transformation RE applied to the code for a[7]−−;

how to compute these sets. Instead, we discuss how the analysis information can be
used for optimizing the program.
Transformation RE:
An assignment x ← e is replaced by an assignment x ← y, if y ← e is definitely
available at program point u just before this assignment, i.e., y ← e is contained in
the set A*[u]. This is formally described by the following graph rewrite rule:

u :  x ← e      ⟹      u :  x ← y        provided that  y ← e ∈ A*[u]

Analogous rules describe the replacement of expressions by variable accesses in
conditions, in loads from, and in stores into memory.
The transformation RE is called redundancy elimination. The transformation
appears quite simple. It may, however, require quite some effort to compute the
program properties necessary to ascertain the applicability of the transformation.
Example 1.4.1 Regard the following program fragment:

x ← y + 3;
x ← 7;
z ← y + 3;

The assignment x ← y + 3 is not available before, but it is available after the first
statement. The second assignment overwrites the value of x. So, the third assignment
cannot be simplified using rule RE.


Example 1.4.2 Consider the C statement a[7]--; as implemented in our
language fragment. Assume that the start address of the array a is contained in variable A. Figure 1.2 shows the original control-flow graph of the program fragment
together with the application of transformation rule RE. The right side, A + 7, of the


assignment A2 ← A + 7 can be replaced by the variable A1 since the assignment
A1 ← A + 7 is definitely available just before the assignment A2 ← A + 7.
According to transformation RE, the evaluation of an expression is not always
replaced by a variable look-up when the evaluation is definitely repeated. In addition, the result of the last evaluation must still be available in a variable, see
Example 1.4.1. In order to increase applicability of the transformation, a compiler
therefore could introduce a dedicated variable for each expression occurring in the
program. To develop a corresponding transformation is the task of Exercise 5.
To decide when the application of the transformation RE is profitable can be nontrivial. Storing values of subexpressions costs storage resources. Access to stored
values will be fast if the compiler succeeds to keep the values in registers. However,
registers are scarce. Spending one of them for temporary storage may cause more
costs somewhere else. Storing the value in memory, on the other hand, will result in
long access times in which case it may be cheaper to reevaluate the expression.
Let us turn to the correctness proof of the described transformation. It can be split
into two parts:
1. The proof of correctness of the abstract edge effects [[k]]♯ with respect to the
definition of availability;
2. The proof of correctness of the replacement of definitely available expressions
by accesses to variables.
We only treat the second part. Note that availability of expressions has been
introduced by means of semantic terms, namely the evaluation of expressions and
the assignment of their values to variables. In order to formulate the analysis, we
then secretly switched to syntactic terms, namely labeled edges of the control-flow
graph, paths in this graph, and occurrences of variables on the left and right side of
assignments or in conditions. The proof thus has to connect syntax with semantics.
Let π be a path leading from the entry point of the program to a program point u,
and let s = (ρ, μ) be the state after the execution of the path π. Let y ← e be an
assignment such that y ∉ Vars(e) holds and that y ← e is available at u. It can be
shown by induction over the length of executed paths that the value of y in state
s is equal to the value of the expression e when evaluated in the valuation ρ, i.e.,
ρ(y) = [[e]] ρ.
Assume that program point u has an outgoing edge k labeled with the assignment
x ← e, and that y ← e is contained in A*[u], i.e., definitely available. y ← e is in
particular available at the end of path π. Therefore, ρ(y) = [[e]] ρ holds. Under this
condition, the assignment x ← e can be replaced by x ← y.
The proof guarantees the correctness of the analysis and the associated transformation. But what about the precision of the analysis? Does a compiler realizing this
analysis miss some opportunity to remove redundant computations, and if so, why?
There are, in fact, several reasons why this can happen. The first reason is caused by
infeasible paths. We have seen in Sect. 1.3 that a path may not be executable in all
states, or not even in any state at all. In the latter case, such a path is called infeasible.
The composition of the concrete edge effects of such a path is not defined anywhere.


Control-flow graph of the factorial program, given by its edges (source, label, target):

(0, y ← 1, 1)    (1, NonZero(x > 1), 2)    (1, Zero(x > 1), 5)
(2, y ← x ∗ y, 3)    (3, x ← x − 1, 4)    (4, ; , 1)

The corresponding system of inequalities:

A[0] ⊆ ∅
A[1] ⊆ (A[0] \ Occ(y)) ∪ {y ← 1}
A[1] ⊆ A[4]
A[2] ⊆ A[1]
A[3] ⊆ A[2] \ Occ(y)
A[4] ⊆ A[3] \ Occ(x)
A[5] ⊆ A[1]

Fig. 1.3 The system of inequalities for the factorial function

The abstract edge effects of our analysis, however, are total functions. They do not
know about infeasibility. Such a path would be considered in forming the intersection
in the definition of definite availability and may pollute the information if this path
does not contain an assignment available on all other paths.
A second reason is the following: Assume that the assignment x ← y + z is
available at program point u, and that there exists an edge k = (u, y ← e, v) leaving
u. Assume further that the value of e is always the one that y has at u. In this case,
the transformation replacing y + z by x would still be correct although x ← y + z
would no longer be recognized as available at v.
An important question remains: How are the sets A*[u] computed? The main idea
is to derive from the program a system of inequalities that characterizes these values:

A[start] ⊆ ∅
A[v]     ⊆ [[k]]♯ (A[u])      for an edge k = (u, lab, v)

The first inequality expresses the assumption that no assignments are available at
the entry of the program. Further, each edge k leading from a node u to a node v
generates an inequality of the second kind. [[k]]♯ (A[u]) are the assignments that
are propagated as available along the edge k = (u, lab, v), either since they were
available at u and survived [[k]]♯ or since they were made available by [[k]]♯. This set
is at most available at v since other edges may target v along which these assignments
might not be available.
Example 1.4.3 Let us consider the program implementing the factorial function as
in Fig. 1.3. We see that the system of inequalities can be produced from the control-flow graph and the abstract edge transformers in a straightforward way. The only
assignment whose left-side variable does not also occur on the right side is y ← 1.
The complete lattice for the analysis of available assignments therefore consists of
only two elements, ∅ and {y ← 1}. Correspondingly, Occ(y) = {y ← 1} and
Occ(x) = ∅ hold.
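The greatest solution of this small system can also be computed mechanically. The following sketch (our own, reusing abstract_effect from the sketch above; the encoding of the edges of Fig. 1.3 is our assumption) starts with the full set at every program point except the entry and shrinks the values until all inequalities hold; it reproduces the solution shown in Fig. 1.4:

# Edges (source, label, target) of the control-flow graph of Fig. 1.3.
edges = [
    (0, ('assign', 'y', frozenset()), 1),               # y <- 1
    (1, ('NonZero',), 2), (1, ('Zero',), 5),             # condition x > 1
    (2, ('assign', 'y', frozenset({'x', 'y'})), 3),       # y <- x * y
    (3, ('assign', 'x', frozenset({'x'})), 4),            # x <- x - 1
    (4, (';',), 1),
]
Ass = {('y', frozenset())}                                # the only candidate assignment

A = {u: (set() if u == 0 else set(Ass)) for u in range(6)}
changed = True
while changed:                                            # round-robin iteration
    changed = False
    for (u, lab, v) in edges:
        smaller = A[v] & abstract_effect(lab)(A[u])       # enforce A[v] <= effect(A[u])
        if smaller != A[v]:
            A[v], changed = smaller, True
print(A)          # every A[u] ends up empty, as in Fig. 1.4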



A[0] = A[1] = A[2] = A[3] = A[4] = A[5] = ∅

Fig. 1.4 A trivial solution of the system of inequalities of Example 1.4.3

Figure 1.4 shows a trivial solution of this system of inequalities. In this case, this
is the only solution. In general, there could be several solutions. In the available-assignment analysis, we are interested in largest sets. The larger the sets, the more
assignments have been shown to be available, and the more optimizations can be
performed. In consequence, we consider an analysis more precise that identifies
more assignments as available.
In this case, the largest solution is the best solution. The question is, does a best
solution always exist? If yes, can it be efficiently computed? We generalize the
problem a bit to be able to systematically answer the question as to the existence
of best solutions of systems of inequalities and as to their efficient computation.
This general treatment will provide us universal algorithms for solving virtually all
program-analysis problems in this book.
The first observation is that the set of possible values for the unknowns A[v]
forms a partial order with respect to the subset relation ⊆. The same holds for the
superset relation ⊇. These partial orders have the additional property that each set X
of such values has a least upper bound and a greatest lower bound, namely the union and the
intersection, respectively, of the sets in X. Such a partial order is called a complete
lattice.
A further observation is that the abstract edge transformers [[k]]♯ are monotonic
functions, that is, they preserve the ordering relation between values:

[[k]]♯ (B1) ⊆ [[k]]♯ (B2)     if     B1 ⊆ B2

1.5 Background: Complete Lattices


This section presents fundamental notions and theorems about complete lattices,
solutions of systems of inequalities, and the essentials of methods to compute least
solutions. The reader should not be confused about best solutions being least solutions, although in the available-assignments analysis the largest solution was claimed
to be the best solution. The following treatment is in terms of partial orders ⊑, where
less is, by convention, always more precise. In the case of available assignments, we
therefore take the liberty to set ⊑ = ⊇. We start with definitions of partial orders
and complete lattices.
A set D together with a relation ⊑ on D × D is called a partial order if for all
a, b, c ∈ D it holds that:


a ⊑ a                                    reflexivity
a ⊑ b ∧ b ⊑ a  ⟹  a = b                  antisymmetry
a ⊑ b ∧ b ⊑ c  ⟹  a ⊑ c                  transitivity

The sets we consider in this book consist of information at program points about
potential or definite program behaviors. In our running example, such a piece of
information at a program point is a set of available assignments. The ordering relation
⊑ indicates precision. By convention, less means more precise. More precise in the
context of program optimizations should mean enabling more optimizations. For the
available-assignments analysis, more available assignments means potentially more
enabled optimizations. So, the ordering relation ⊑ is the superset relation ⊇.
We give some examples of partial orders, representing lattices graphically as
directed graphs. Vertices are the lattice elements. Edges are directed upwards and
represent the ⊑ relation. Vertices not connected by a sequence of edges are incomparable by ⊑.
1. The set 2^{a,b,c} of all subsets of the set {a, b, c} together with the relation ⊆.
   (The Hasse diagram has {a, b, c} at the top, below it the sets {a, b}, {a, c},
   {b, c}, then the singletons, and ∅ at the bottom.)
2. The set of all integer numbers Z together with the relation ≤.
3. The set of all integer numbers Z⊥ = Z ∪ {⊥}, extended by an additional element ⊥,
   together with the order that places ⊥ below every number and leaves distinct
   numbers incomparable.

An element d ∈ D is called an upper bound for a subset X ⊆ D if

x ⊑ d     for all x ∈ X

An element d is called a least upper bound of X if

1. d is an upper bound of X, and
2. d ⊑ y holds for each upper bound y of X.


Not every subset of a partially ordered set has an upper bound, let alone a least upper
bound. The set {0, 2, 4} has the upper bounds 4, 5, . . . in the partially ordered set Z
of integer numbers, with the natural order ≤, while the set {0, 2, 4, . . .} of all even
numbers has no upper bound.
A partial order D is a complete lattice if each subset X ⊆ D possesses a least
upper bound. This least upper bound is represented as ⊔X. Forming the least upper
bound of a set of elements is an important operation in program analysis. Let us
consider the situation that several edges of the control-flow graph have the same target
node v. The abstract edge effects associated with these edges propagate different
information towards v. The least upper bound operator then can be applied to combine
the incoming information in a sound way to a value at v.
Each element is an upper bound of the empty set of elements of D. The least
upper bound of the empty set, therefore, is less than or equal to any other element
of the complete lattice. This least element is called the bottom element ⊥ of the lattice.
The set of all elements of a complete lattice also possesses an upper bound. Each
complete lattice therefore also has a greatest element, ⊤, called the top element. Let
us consider the partial orders of our examples. We have:
1. The set D = 2^{a,b,c} of all subsets of the basic set {a, b, c} and, in general, of
   each base set together with the subset relation is a complete lattice.
2. The set Z of the integer numbers with the partial order ≤ is not a complete lattice.
3. The set Z together with the equality relation = is also not a complete lattice.
   A complete lattice, however, is obtained if an extra least element, ⊥, and an
   extra greatest element, ⊤, are added.

This lattice Z⊥⊤ = Z ∪ {⊥, ⊤} contains only a minimum of pairs in the ordering
relation. Such lattices are called flat.
In analogy to upper and least upper bounds, one can define lower and greatest lower
bounds for subsets of partially ordered sets. For a warm-up, we prove the following
theorem:
Theorem 1.5.1 Each subset X of a complete lattice D has a greatest lower bound ⊓X.


Proof Let U = {u ∈ D | ∀x ∈ X : u ⊑ x} be the set of all lower bounds of the set
X. The set U has a least upper bound g := ⊔U since D is a complete lattice. We
claim that g is the desired greatest lower bound of X.
We first show that g is a lower bound of the set X. For this, we take an arbitrary
element x ∈ X. It holds that u ⊑ x for each u ∈ U, since each u ∈ U is even a lower
bound for the whole set X. Therefore, x is an upper bound of the set U, and therefore
1.5 Background: Complete Lattices

19

Fig. 1.5 The least upper bound and the greatest lower bound for a subset X

greater than or equal to the least upper bound of U, i.e., g ⊑ x. Since x was an
arbitrary element, g is indeed a lower bound of X.
Since g is an upper bound of U and therefore greater than or equal to each element
in U, i.e., u ⊑ g for all u ∈ U, g is the greatest lower bound of X, which completes
the proof.


Figure 1.5 shows a complete lattice, a subset, and its greatest lower and least upper
bounds. That each of its subsets has a least upper bound makes a complete lattice
out of a partially ordered set. Theorem 1.5.1 says that each subset also has a greatest
lower bound.
Back to our search for ways to determine solutions for systems of inequalities!
Recall that the unknowns in the inequalities for the analysis of available assignments
are the sets A[u] for all program points u. The complete lattice D of values for
these unknowns is the powerset lattice 2^Ass, where the partial order is the superset
relation ⊇.
All inequalities for the same unknown v can be combined into one inequality by
applying the least upper bound operator to the right sides of the original inequalities.
This leads to the form:

A[start] ⊆ ∅
A[v]     ⊆ ⋂ {[[k]]♯ (A[u]) | k = (u, lab, v) edge}      for v ≠ start

This reformulation does not change the set of solutions due to

x ⊒ d1 ∧ . . . ∧ x ⊒ dk     iff     x ⊒ ⊔ {d1, . . . , dk}


As a result, we obtain the generic form of a system of inequalities specifying a
program-analysis problem:

xi ⊒ fi (x1, . . . , xn)      i = 1, . . . , n

The functions fi : D^n → D describe how the unknowns xi depend on other
unknowns. One essential property of the functions fi that define the right sides
of the inequalities is their monotonicity. This property guarantees that an increase
of values on the right-hand sides may have no impact on the values on the left-hand
sides or may increase them as well. A function f : D1 → D2 between the two partial
orders D1, D2 is monotonic if a ⊑ b implies f(a) ⊑ f(b). For simplicity, the two
partial orders in D1 and in D2 have been represented by the same symbol, ⊑.
Example 1.5.1 For a set U, let D1 = D2 = 2^U be the powerset lattice with the partial order ⊆. Each function f defined by f x = (x ∩ a) ∪ b for a, b ⊆ U is monotonic. A function g defined by g x = a \ x for a ≠ ∅, however, is not monotonic.
The functions inc and dec defined by inc x = x + 1 and dec x = x − 1 are monotonic on D1 = D2 = Z together with the partial order ≤.
The function inv defined by inv x = −x is not monotonic. ⊓⊔


If the functions f1 : D1 → D2 and f2 : D2 → D3 are monotonic, so is their composition f2 ∘ f1 : D1 → D3.
If D2 is a complete lattice, then the set [D1 → D2] of monotonic functions f : D1 → D2 forms a complete lattice, where

f ⊑ g   iff   f x ⊑ g x for all x ∈ D1

holds. In particular, for F ⊆ [D1 → D2] the function f defined by f x = ⊔{g x | g ∈ F} is again monotonic, and it is the least upper bound of the set F.
Let us consider the case D1 = D2 = 2^U. For functions fi x = (ai ∩ x) ∪ bi, where ai, bi ⊆ U, the operations ∘, ⊔, and ⊓ can be described by operations on the sets ai, bi:

(f2 ∘ f1) x = (a1 ∩ a2 ∩ x) ∪ (a2 ∩ b1) ∪ b2                  composition
(f1 ⊔ f2) x = ((a1 ∪ a2) ∩ x) ∪ b1 ∪ b2                        union
(f1 ⊓ f2) x = (((a1 ∪ b1) ∩ (a2 ∪ b2)) ∩ x) ∪ (b1 ∩ b2)        intersection

Functions of this form occur often in so-called bit-vector frameworks.
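The following Python sketch (our own illustration, not part of the book) encodes such functions as pairs (a, b) with f x = (a ∩ x) ∪ b and implements composition, least upper bound, and greatest lower bound according to the identities above; all names are hypothetical.

# A function f x = (a & x) | b over subsets of a base set U is represented
# by the pair (a, b) of frozensets.
def apply_fn(f, x):
    a, b = f
    return (a & x) | b

def compose(f2, f1):                 # (f2 . f1) x = f2(f1(x))
    (a1, b1), (a2, b2) = f1, f2
    return (a1 & a2, (a2 & b1) | b2)

def join(f1, f2):                    # pointwise union of the results
    (a1, b1), (a2, b2) = f1, f2
    return (a1 | a2, b1 | b2)

def meet(f1, f2):                    # pointwise intersection of the results
    (a1, b1), (a2, b2) = f1, f2
    return ((a1 | b1) & (a2 | b2), b1 & b2)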
Our goal is to find a least solution in a complete lattice D for the system of inequalities

xi ⊒ fi (x1 , . . . , xn ),   i = 1, . . . , n    (∗)

where the functions fi : D^n → D that define the right sides of the inequalities are monotonic. We exploit that D^n is a complete lattice if D is one. We combine the n functions fi into one function f : D^n → D^n to simplify the presentation of the underlying problem. This function f is defined by f(x1, . . . , xn) = (y1, . . . , yn), where yi = fi(x1, . . . , xn). It turns out that this construction leads from monotonic component functions to a monotonic combined function. This transformation has reduced our problem to one of finding a least solution for a single inequality x ⊒ f x, however in the slightly more complex complete lattice D^n.
The search proceeds in the following way: It starts with an element d that is as small as possible, for instance, with d = ⊥ = (⊥, . . . , ⊥), the least element in D^n. In case d ⊒ f d holds, a solution has been found. Otherwise, d is replaced by f d and tested for being a solution. If not, f is applied to f d, and so on.
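The iteration just described can be written down directly; the following Python sketch is only an illustration of the idea (the function and parameter names are ours, not the book's) and terminates only if the ascending chain ⊥, f ⊥, f² ⊥, . . . eventually stabilizes.

def naive_fixpoint(f, bottom, leq):
    """Iterate d, f d, f(f d), ... starting from the least element of D^n.

    f      : maps a tuple of lattice values to a tuple of lattice values
    bottom : the least element of D^n, e.g. (bot, ..., bot)
    leq    : the partial order on single lattice values
    """
    d = bottom
    while True:
        fd = f(d)
        # d solves x >= f x as soon as every component of f d lies below d.
        if all(leq(y, x) for x, y in zip(d, fd)):
            return d
        d = fd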
Example 1.5.2 Consider the complete lattice D = 2^{a,b,c} with the partial order ⊑ = ⊆ and the system of inequalities:

x1 ⊇ {a} ∪ x3
x2 ⊇ x3 ∩ {a, b}
x3 ⊇ x1 ∪ {c}

The iterative search for a least solution produces the results for the different iteration steps as they are listed in the following table:

       1     2       3       4
x1    {a}   {a, c}  {a, c}  ditto
x2    ∅     ∅       {a}
x3    {c}   {a, c}  {a, c}

We observe that at least one value for the unknowns increases in each iteration until
finally a solution is found.
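Using the naive_fixpoint sketch from above, the system of Example 1.5.2 can be solved mechanically; the encoding below is our own and reproduces the final column of the table.

# Right-hand sides of the three inequalities over D = 2^{a,b,c}.
def f(d):
    x1, x2, x3 = d
    return (frozenset('a') | x3,
            x3 & frozenset('ab'),
            x1 | frozenset('c'))

least = naive_fixpoint(f, (frozenset(),) * 3, frozenset.issubset)
# least == (frozenset('ac'), frozenset('a'), frozenset('ac'))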


We convince ourselves of the fact that this is the case for any complete lattice
given that right sides of equations are monotonic. More precisely, we show:
Theorem 1.5.2 Let D be a complete lattice and f : D → D be a monotonic function. Then the following two claims hold:
1. The sequence ⊥, f ⊥, f^2 ⊥, . . . is an ascending chain, i.e., it holds that f^{i−1} ⊥ ⊑ f^i ⊥ for all i ≥ 1.
2. If d = f^{n−1} ⊥ = f^n ⊥ then d is the least element d′ satisfying d′ ⊒ f(d′).

Proof The first claim is proved by induction: For i = 1, the first claim holds since f^{1−1} ⊥ = f^0 ⊥ = ⊥ is the least element of the complete lattice and therefore less than or equal to f^1 ⊥ = f ⊥. Assume that the claim holds for i − 1 ≥ 1, i.e., f^{i−2} ⊥ ⊑ f^{i−1} ⊥ holds. The monotonicity of the function f implies:

f^{i−1} ⊥ = f(f^{i−2} ⊥) ⊑ f(f^{i−1} ⊥) = f^i ⊥

We conclude that the claim also holds for i. Therefore, the claim holds for all i ≥ 1.
Let us now regard the second claim. Assume that

d = f^{n−1} ⊥ = f^n ⊥

Then d is a solution of the inequality x ⊒ f x. Let us further assume we have another solution d′ of the same inequality. Thus, d′ ⊒ f d′ holds. It suffices to show that f^i ⊥ ⊑ d′ holds for all i ≥ 0. This is again shown by induction. It is the case for i = 0. Let i > 0 and f^{i−1} ⊥ ⊑ d′. The monotonicity of f implies

f^i ⊥ = f(f^{i−1} ⊥) ⊑ f d′ ⊑ d′

since d′ is a solution. This proves the claim for all i. ⊓⊔




Theorem 1.5.2 supplies us with a method to determine not only a solution, but even the least solution of an inequality, assuming that the ascending chain f^i ⊥ eventually stabilizes, i.e., becomes constant at some i. It is therefore sufficient for the termination of our search for a fixed point that all ascending chains in D eventually stabilize. This is always the case in finite lattices.
The solution found by the iterative method is the least solution not only of the inequality x ⊒ f x, but is also the least solution of the equality x = f x, i.e., it is the least fixed point of f. What happens if not all ascending chains of the complete lattice eventually stabilize? Then the iteration may not always terminate. Nonetheless, a least solution is guaranteed to exist.
Theorem 1.5.3 (Knaster–Tarski) Each monotonic function f : D → D on a complete lattice D has a least fixed point d0, which is also the least solution of the inequality x ⊒ f x.

Proof A solution of the inequality x ⊒ f x is also called a post-fixed point of f. Let P = {d ∈ D | d ⊒ f d} be the set of post-fixed points of f. We claim that the greatest lower bound d0 of the set P is the least fixed point of f.
We first prove that d0 is an element of P, i.e., is a post-fixed point of f. It is clear that f d0 ⊑ f d ⊑ d for each post-fixed point d ∈ P. Thus f d0 is a lower bound of P and is therefore less than or equal to the greatest lower bound, i.e., f d0 ⊑ d0.
d0 is a lower bound of P, and it is an element of P. It is thus the least post-fixed point of f. It remains to prove that d0 also is a fixed point of f and therefore the least fixed point of f.
We know already that f d0 ⊑ d0 holds. Let us consider the other direction: The monotonicity of f implies f(f d0) ⊑ f d0. Therefore, f d0 is a post-fixed point of f, i.e., f d0 ∈ P. Since d0 is a lower bound of P, the inequality d0 ⊑ f d0 follows. ⊓⊔


Theorem 1.5.3 guarantees that each monotonic function f on a complete lattice has a least fixed point, which coincides with the least solution of the inequality x ⊒ f x.

Example 1.5.3 Let us consider the complete lattice of the natural numbers augmented by ∞, i.e., D = N ∪ {∞} together with the partial order ≤. The function inc defined by inc x = x + 1 is monotonic. We have:

inc^i ⊥ = inc^i 0 = i < i + 1 = inc^{i+1} ⊥

Therefore, this function has a least fixed point, namely, ∞. This fixed point will not be reached after finitely many iteration steps.


Theorem 1.5.3 can be applied to the complete lattice with the dual partial order ⊒ (instead of ⊑). Thus, we obtain that each monotonic function not only has a least, but also a greatest fixed point.
Example 1.5.4 Let us consider again the powerset lattice D = 2^U for a base set U and a function f with f x = (x ∩ a) ∪ b. This function is monotonic. It therefore has a least and a greatest fixed point. Fixed-point iteration delivers for f:

k    f^k ⊥    f^k ⊤
0    ∅        U
1    b        a ∪ b
2    b        a ∪ b


With this newly acquired knowledge, we return to our application, which is to solve
systems of inequalities
xi ⊒ fi (x1 , . . . , xn ),   i = 1, . . . , n    (∗)

over a complete lattice D for monotonic functions fi : D^n → D. Now we know that such a system of inequalities always has a least solution, which coincides with the least solution of the associated system of equations

xi = fi (x1 , . . . , xn ),   i = 1, . . . , n

In the instances of static program analysis considered in this book, we will frequently meet complete lattices where ascending chains eventually stabilize. In these cases, the iterative procedure of repeated evaluation of right-hand sides according to Theorem 1.5.2 is able to compute the required solution. This naive fixed-point iteration, however, is often quite inefficient.
Example 1.5.5 Let us consider again the factorial program in Example 1.4.3. The
fixed-point iteration to compute the least solution of the system of inequalities for
available assignments is shown in Fig. 1.6. The values for the unknowns stabilize
only after four iterations.




Fig. 1.6 Naive fixed-point iteration for the program in Example 1.4.3
Fig. 1.7 Round-robin iteration for the program in Example 1.4.3

How can naive fixed-point iteration be improved? A significant improvement is


already achieved by round-robin iteration. In round-robin iteration, the computation
of a value in a new round does not necessarily use the values computed in the previous round but, for each variable xi, the most recent value that has been computed for xi. In the description
of the algorithm, we must distinguish between the unknowns xi and their values.
For that purpose we introduce an array D that is indexed with the unknowns. The
array component D[xi ] always holds the value of the unknown xi . The array D is
successively updated until it finally contains the resulting variable assignment.
for (i ← 1; i ≤ n; i++) D[xi] ← ⊥;
do {
    finished ← true;
    for (i ← 1; i ≤ n; i++) {
        new ← fi(D[x1], . . . , D[xn]);
        if (¬(D[xi] ⊒ new)) {
            finished ← false;
            D[xi] ← D[xi] ⊔ new;
        }
    }
} while (¬finished)
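A direct Python transcription of this algorithm might look as follows; it is a sketch under our own encoding (a list of right-hand-side functions and explicit leq and lub operations), not code from the book.

def round_robin(fs, bottom, leq, lub):
    """Round-robin iteration for the system x_i >= f_i(x_1, ..., x_n).

    fs     : list of functions; fs[i] maps the list D of current values
             to the new value for the unknown x_i
    bottom : least element of the lattice D
    leq    : partial order on D
    lub    : least upper bound on D
    """
    n = len(fs)
    D = [bottom] * n
    finished = False
    while not finished:
        finished = True
        for i in range(n):
            new = fs[i](D)
            if not leq(new, D[i]):
                finished = False
                D[i] = lub(D[i], new)   # accumulate old and new value
    return D

For the subset-ordered powerset lattice, for instance, one would pass set inclusion as leq and set union as lub.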
Example 1.5.6 Let us consider again the system of inequalities for available assignments for the factorial program in Example 1.4.3. Figure 1.7 shows the corresponding round-robin iteration. It appears that three iteration rounds suffice.



Let us have a closer look at round-robin iteration. The assignment D[xi] ← D[xi] ⊔ new; in our implementation does not just overwrite the old value of xi, but replaces it by the least upper bound of the old and the new value. We say that the algorithm
accumulates the solution for xi during the iteration. In the case of a monotonic
function f i , the least upper bound of old and new values for xi is equal to the new
value. For a non-monotonic function f i , this need not be the case. The algorithm
is robust enough to compute an ascending chain of values for each unknown xi
even in the non-monotonic case. Thus, it still returns some solution of the system of
inequalities whenever it terminates.
The run time of the algorithm depends on the number of times the do-while loop
is executed. Let h be the maximum of the lengths of all proper ascending chains, i.e.,
one with no repetitions,

⊥ ⊏ d1 ⊏ d2 ⊏ . . . ⊏ dh
in the complete lattice D. This number is called the height of the complete lattice D.
Let n be the number of unknowns in the system of inequalities. Round-robin iteration
needs at most h n rounds of the do-while loop until the values of all unknowns for
the least solution are determined and possibly one more round to detect termination.
The bound h · n can be improved to n if the complete lattice is of the form 2^U for some base set U and if all functions fi are constructed from constant sets and variables using only the operations ∪ and ∩. The reason for this is the following: whether an element u ∈ U is in the result set for the unknown xi is independent of whether any other element u′ is contained in these sets. For which variables xi a given element u is in the result sets can be determined in n iterations over the complete lattice 2^{u} of height 1. Round-robin iteration for all u ∈ U is performed in parallel by using the complete lattice 2^U instead of the lattice 2^{u}. These bounds concern the
worst case. The least solution is often found in far fewer iterations if the variables
are ordered appropriately.
Will this new iteration strategy also find the least solution if naive fixed-point iteration would have found the least solution? To answer this question at least in the monotonic case, we assume again that all functions fi are monotonic. Let yi(d) be the ith component of F^d ⊥, for the combined function F of the fi, and let xi(d) be the value of D[xi] after the dth execution of the do-while loop of round-robin iteration. For all i = 1, . . . , n and d ≥ 0 we prove the following claims:

1. yi(d) ⊑ xi(d) ⊑ zi for each solution (z1 , . . . , zn) of the system of inequalities;
2. if the round-robin iteration terminates, then the variables x1 , . . . , xn will, after termination, contain the least solution of the system of inequalities;
3. yi(d) ⊑ xi(d).
Claim 1 is shown by induction. It implies that all approximations xi(d) lie below the value of the unknown xi in the least solution. Let us assume that the round-robin iteration terminates after round d. The values xi(d) therefore satisfy the system of inequalities and thereby are a solution. Because of claim 1, they also form a least solution. This implies claim 2.

Fig. 1.8 A favorable and an unfavorable order of unknowns (two numberings of the program points of the factorial program)

Fig. 1.9 Round-robin iteration for the unfavorable order of Fig. 1.8

Claim 1 also entails that after d rounds the round-robin iteration computes values
at least as large as the naive fixed-point iteration. If the naive fixed-point iteration
terminates after round d, then the round-robin iteration terminates after at most d
rounds.
We conclude that round-robin iteration is never slower than naive fixed-point
iteration. Nevertheless, round-robin iteration can be performed more or less cleverly. Its efficiency substantially depends on the order in which the variables are reevaluated. It is favorable to reevaluate a variable xi, on which another variable xj depends, before that variable xj. For an acyclic system of inequalities, this strategy leads to termination with the least solution after just one execution of the do-while loop.
Example 1.5.7 Let us consider again the system of inequalities for the determination
of available assignments for the factorial program in Example 1.4.3. Figure 1.8 shows
a favorable and an unfavorable order of unknowns.
In the unfavorable case, iteration needs four rounds for this program, as shown in
Fig. 1.9.



1.6 Least Solution or MOP Solution?


Section 1.5 presented methods to determine least solutions of systems of inequalities.
Let us now apply these techniques for solving program analysis problems such as
availability of expressions in variables. Assume we are given a control-flow graph.
The analysis problem consists in computing one piece of information for each program point, i.e., for each node v in the control-flow graph. A specification of the analysis then consists of the following items:
• a complete lattice D of possible results for the program points;
• a start value d0 ∈ D for the entry point start of the program; together with
• a monotonic function [[k]]♯ : D → D for each edge k of the control-flow graph. These functions are also called the abstract edge effects for the control-flow graph.
Each such specification constitutes an instance of the monotonic analysis framework.
For availability of expressions in variables we provided such a specification, and we
will see more instances of this framework in the coming sections.
Given an instance of the monotonic analysis framework, we can define for each program point v the value

I*[v] = ⊔{[[π]]♯ d0 | π : start →* v}

The mapping I* is called the merge over all paths solution (in short: MOP solution)
of the analysis problem. On the other hand, we can put up a system of inequalities
which locally describes how information is propagated between nodes along the
edges of the control-flow graph:
I[start] ⊒ d0
I[v]     ⊒ [[k]]♯ (I[u])   for each edge k = (u, lab, v)

According to the theorems of the last section, this system has a least solution. And if
the complete lattice D has finite height, this least solution can be computed by means,
e.g., of round-robin iteration. The following theorem clarifies the relation between
the least solution of the inequalities and the MOP solution of the analysis.
Theorem 1.6.1 (Kam and Ullman 1975) Let I* denote the MOP solution of an instance of the monotonic framework and I the least solution of the corresponding system of inequalities. Then for each program point v,

I[v] ⊒ I*[v]

holds. This means that for each path π from program entry to v, we have:

I[v] ⊒ [[π]]♯ d0.    (∗∗)

Proof We prove the claim (∗∗) by induction over the length of π. For the empty path, i.e., π = ε, we have:

[[π]]♯ d0 = [[ε]]♯ d0 = d0 ⊑ I[start]

Otherwise π is of the form π = π′ k for an edge k = (u, lab, v). According to the induction hypothesis, the claim holds for the shorter path π′, that is, [[π′]]♯ d0 ⊑ I[u]. It follows that:

[[π]]♯ d0 = [[k]]♯ ([[π′]]♯ d0)
          ⊑ [[k]]♯ (I[u])    since [[k]]♯ is monotonic
          ⊑ I[v]             since I is a solution

This proves the claim. ⊓⊔

Theorem 1.6.1 is somewhat disappointing. We would have hoped that the least solution was the same as the MOP solution. Instead, the theorem tells us that the least solution is only an upper bound of the MOP solution. This means that, in general, the least solution may be less precise than the MOP solution and thus exhibit fewer opportunities for optimization than the MOP. Still, in many practical cases the two solutions agree. This is, in particular, the case if all functions [[k]]♯ are distributive.
A function f : D1 → D2 is called
• distributive, if f(⊔X) = ⊔{f x | x ∈ X} holds for all nonempty subsets X ⊆ D1;
• strict, if f ⊥ = ⊥;
• totally distributive, if f is distributive and strict.
Example 1.6.1 Let us consider the complete lattice D = N ∪ {∞} with the canonical order ≤. The function inc defined by inc x = x + 1 is distributive, but not strict.
As another example, let us look at the function

add : (N ∪ {∞})^2 → (N ∪ {∞})

where add (x1, x2) = x1 + x2, and where the complete lattice (N ∪ {∞})^2 is component-wise ordered. We have:

add ⊥ = add (0, 0) = 0 + 0 = 0

Therefore, this function is strict. But it is not distributive, as the following counterexample shows:

add ((1, 4) ⊔ (4, 1)) = add (4, 4) = 8 ≠ 5 = add (1, 4) ⊔ add (4, 1)



Example 1.6.2 Let us again consider the powerset lattice D = 2^U with the partial order ⊆. For all a, b ⊆ U the function f defined by f x = (x ∩ a) ∪ b is distributive since

f(⋃X) = ((⋃X) ∩ a) ∪ b = ⋃{x ∩ a | x ∈ X} ∪ b
      = ⋃{(x ∩ a) ∪ b | x ∈ X}
      = ⋃{f x | x ∈ X}

for each nonempty subset X ⊆ D. The function f is, however, strict only if b = ∅ holds.
Functions f of the form f x = (x ∪ a) ∩ b have similar properties on the powerset lattice D = 2^U with the reversed order ⊇. For this partial order, distributivity means that for each nonempty subset X ⊆ 2^U it holds that f(⋂X) = ⋂{f x | x ∈ X}. ⊓⊔
There exists a precise characterization of all distributive functions if their domain is an atomic lattice. Let A be a complete lattice. An element a ∈ A is called atomic if a ≠ ⊥ holds and the only elements a′ ∈ A with a′ ⊑ a are the elements a′ = ⊥ and a′ = a. A complete lattice A is called atomic if each element d ∈ A is the least upper bound of all atomic elements a ⊑ d in A.
In the complete lattice N ∪ {∞} of Example 1.6.1, 1 is the only atomic element. Therefore, this lattice is not atomic. In the powerset lattice 2^U, ordered by the subset relation ⊆, the atomic elements are the singleton sets {u}, u ∈ U. In the powerset lattice with the same base set, but the reversed order ⊇, the atomic elements are the sets U \ {u}, u ∈ U. The next theorem states that for atomic lattices distributive functions are uniquely determined by their values for the least element and for the atomic elements.
Theorem 1.6.2 Let A and D be complete lattices where A is atomic. Let 𝒜 ⊆ A be the set of atomic elements of A. It holds that
1. Two distributive functions f, g : A → D are equal if and only if f(⊥) = g(⊥) and f(a) = g(a) for all a ∈ 𝒜.
2. Each pair (d, h) with d ∈ D and h : 𝒜 → D defines a distributive function f_{d,h} : A → D by:

f_{d,h}(x) = d ⊔ ⊔{h(a) | a ∈ 𝒜, a ⊑ x},   x ∈ A

Proof We only prove the first claim. If the functions f and g are equal they agree on ⊥ and the atomic elements of A. For the opposite direction, we regard an arbitrary element x ∈ A. For x = ⊥, f(x) = g(x) holds according to our assumption. For x ≠ ⊥, the set 𝒜_x = {a ∈ 𝒜 | a ⊑ x} is not empty. It follows that:

f(x) = f(⊔𝒜_x)
     = ⊔{f(a) | a ∈ 𝒜, a ⊑ x}
     = ⊔{g(a) | a ∈ 𝒜, a ⊑ x} = g(x)

which was to be proved. ⊓⊔




Note that each distributive function f : D1 → D2 is also monotonic: a ⊑ b holds if and only if a ⊔ b = b holds. If a ⊑ b holds we have:

f b = f(a ⊔ b) = f a ⊔ f b

Consequently, we have f a ⊑ f b, which was to be shown. ⊓⊔




There is an important theorem for program analyses with distributive edge effects:
Theorem 1.6.3 (Kildall 1972) Assume that every program point v is reachable from the program's entry point. Assume further that all edge effects [[k]]♯ : D → D are distributive. The least solution I of the system of inequalities agrees with the MOP solution I*, i.e.,

I*[v] = I[v]

for all program points v.
Proof Because of Theorem 1.6.1 it suffices to show that I[v] ⊑ I*[v] holds for all v. Since I is the least solution of the system of inequalities it suffices to show that, under the given circumstances, I* is a solution, that is, satisfies all inequalities. For the entry point start of the program, we have:

I*[start] = ⊔{[[π]]♯ d0 | π : start →* start} ⊒ [[ε]]♯ d0 = d0

For each edge k = (u, lab, v) we check that

I*[v] = ⊔{[[π]]♯ d0 | π : start →* v}
      ⊒ ⊔{[[π′ k]]♯ d0 | π′ : start →* u}
      = ⊔{[[k]]♯ ([[π′]]♯ d0) | π′ : start →* u}
      = [[k]]♯ (⊔{[[π′]]♯ d0 | π′ : start →* u})
      = [[k]]♯ (I*[u])

The next to last equality holds since the set {π′ | π′ : start →* u} of all paths from the entry point start of the program to u is not empty, and since the abstract edge effect [[k]]♯ is distributive. We conclude that I* satisfies all inequalities. This proves the claim. ⊓⊔
The following example shows that in Theorem 1.6.3, the assumption that all
program points are reachable is necessary.
Example 1.6.3 Regard the control-flow graph of Fig. 1.10. As complete lattice we choose D = N ∪ {∞} with the canonical order ≤. As single edge effect we choose

Fig. 1.10 A control-flow graph showing the consequences of unreachability (the entry point is 0; program point 2 is only reached via an edge labeled inc from program point 1, and neither 1 nor 2 is reachable from 0)

the distributive function inc. For an arbitrary starting value at the entry point, we
have
I[2] = inc (I[1]) = inc 0 = 1

On the other hand, we have:

I*[2] = ⊔∅ = 0
since there is no path from the entry point 0 to program point 2. It follows that the
MOP solution is different from the least solution.


It is not critical to assume that all program points are reachable. Unreachable program
points can be easily identified and then removed without changing the semantics of
the program.
Conclusion 1.6.1 We gather all the observations about monotonic analysis frameworks:
• The MOP solution of a monotonic analysis framework is always less than or equal to the least solution of the corresponding system of inequalities.
• If all edge effects are distributive and every program point is reachable from the entry point of the program, the MOP solution coincides with the least solution.
• Round-robin iteration can be used to determine the least solution if all ascending chains in the complete lattice have finite lengths.
Let us apply these observations to the analysis of the availability of expressions in variables. In this analysis, the complete lattice D = 2^Ass is a finite powerset lattice with the order ⊇. The value for the entry point of the program is d0 = ∅, and the abstract edge effects [[k]]♯ are functions f of the form

f x = (x ∪ a) \ b = (x ∪ a) ∩ b̄

for b̄ = Ass \ b. Example 1.6.2 shows that all such functions are distributive. Thus, round-robin iteration for the corresponding systems of inequalities computes the MOP solution, provided that all program points are reachable from the entry point of the program.


We conclude the section about the removal of redundant computations. The transformation we presented has several disadvantages:

The analysis of availability of expressions in variables may fail to discover a redundant computation because it requires an available expression, i.e., an expression whose reevaluation could be avoided, to be available in the same variable along all paths. It also fails to identify expressions as available which occur in conditions or index expressions, because their values are not available in variables. At the expense of introducing extra auxiliary variables, the compiler could transform the program beforehand in order to make this program analysis more effective.
This transformation introduces unique temporary variables Te for selected expressions e and inserts an assignment of e to Te at each occurrence of e (see Exercise 5). An assignment x ← e thus is decomposed into the sequence

Te ← e; x ← Te

which is the evaluation of the right-hand side, followed by a variable-to-variable assignment. Most of these variable-to-variable assignments, though, turn out to be superfluous, and thus should better be removed. Transformations doing that are provided in Sect. 1.8.

1.7 Removal of Assignments to Dead Variables


So far, we have only met a single optimizing transformation. It replaces the recomputation of an expression by accessing a previously computed value, provided that this value is guaranteed to be available in a variable. To present this transformation and, in particular, to prove properties like correctness and termination of the associated static program analysis, we introduced an operational semantics of our language, complete lattices as domains of analysis information, and abstractions of the operational semantics to statically analyze programs. This foundational background enables us now to introduce more optimizing transformations and their associated program analyses quite easily.
Example 1.7.1 Let us regard the following example:
0: x ← y + 2;
1: y ← 5;
2: x ← y + 3;

The value of program variable x at program points 0 and 1 is not of any importance: it is overwritten before being used. We therefore call variable x dead at these program points. The first assignment to x can be removed because the value of x before the
second assignment is irrelevant for the semantics of the program. We also call this
assignment dead. These notions are now made precise.



Fig. 1.11 Example for liveness of variables (the control-flow graph of the program of Example 1.7.1)

Let us assume that, after program execution, the values of the variables from some set X ⊆ Vars are still needed. This set X can be empty, in case all variables are only
used within the program under analysis. However, the analysis to be presented can
also be applied to individual procedure bodies. Returning from a procedure does not
necessarily mean leaving the whole program. This means that accesses to globally
visible variables may still happen. The set X should, in this case, be defined as the
set of global variables.
The following definitions use the terms definition and use, well known in the
compiler literature. A definition of a variable x is a statement which may change the
value of x. In our small example language, the only definitions are assignments and
loads whose left side is x. A use of a variable x is an occurrence where the value
of x is read. The sets of variables used and defined at an edge in the control-flow
graph can be derived from the statement labeling the edge. For a label lab, they are
determined by:
Lab            Used                     Defined
;              ∅                        ∅
NonZero(e)     Vars(e)                  ∅
Zero(e)        Vars(e)                  ∅
x ← e          Vars(e)                  {x}
x ← M[e]       Vars(e)                  {x}
M[e1] ← e2     Vars(e1) ∪ Vars(e2)      ∅
where Vars(e) denotes the set of program variables that occur in e.
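These sets can be computed mechanically from an edge label. The sketch below (ours, not the book's) uses a hypothetical encoding of labels as tuples and a simple helper vars_of that extracts identifiers from an expression string.

import re

def vars_of(e: str) -> set:
    """Variables occurring in an expression given as a string, e.g. 'x*y+1'."""
    return set(re.findall(r'[A-Za-z_]\w*', e))

def used_defined(label):
    """Used and defined variable sets of an edge label, per the table above."""
    kind = label[0]
    if kind == 'skip':                       # ;
        return set(), set()
    if kind in ('NonZero', 'Zero'):
        return vars_of(label[1]), set()
    if kind in ('assign', 'load'):           # x <- e  or  x <- M[e]
        _, x, e = label
        return vars_of(e), {x}
    if kind == 'store':                      # M[e1] <- e2
        _, e1, e2 = label
        return vars_of(e1) | vars_of(e2), set()
    raise ValueError(label)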


We call a variable x live (relative to X) along a path π to program exit, if x ∈ X and π contains no definition of x, or if there exists at least one use of x in π, and the first use of x does not follow a definition of x. π can, in this case, be decomposed into π = π1 k π2 such that the edge k contains a use of variable x and the prefix π1 contains no definition of x. We will in the future omit the restriction "relative to X" and tacitly assume a set X being given.
A variable x that is not live along π is called dead along π. A variable x is called (possibly) live at a program point v if x is live along at least one path from v to the program exit stop. Otherwise, we call x dead at program point v.
Whether a variable is possibly live or (definitely) dead at a program point depends on the possible continuations of program execution, that is, on the future. This is in contrast to the availability of assignments at a program point, which depends on the history of program execution before this program point is reached.
Example 1.7.2 Let us regard the simple program of Fig. 1.11. In this example, we
assume that all variables are dead at the end of the program. There is only one path
from each program point to the program exit. So, the sets of live and dead variables

at each program point are easily determined. For the program points of the example,
they are:
     Live      Dead
0    {y}       {x}
1    {x, y}    ∅
2    {y}       {x}
3    ∅         {x, y}


How can the set of live variables at each program point be computed? In principle, we proceed in the same way as we did for available assignments. The domain of possible values is L = 2^Vars. Instead of providing a value for the entry point of the program, however, we now provide a value for the exit point, namely, the set X of variables which are live when exiting the program. Also, we provide for each edge an abstract edge effect. Since the liveness of variables at a program point does not depend on the history but on the future, the abstract effect of the edge k = (u, lab, v) is a function [[k]]♯ that determines the set of variables possibly live at u, given a set of variables possibly live at v. Again, the abstract edge effect only depends on the label lab of the edge. This means that [[k]]♯ = [[lab]]♯, where
[[;]]♯ L           = L
[[NonZero(e)]]♯ L  = [[Zero(e)]]♯ L = L ∪ Vars(e)
[[x ← e]]♯ L       = (L \ {x}) ∪ Vars(e)
[[x ← M[e]]]♯ L    = (L \ {x}) ∪ Vars(e)
[[M[e1] ← e2]]♯ L  = L ∪ Vars(e1) ∪ Vars(e2)
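Coded over Python sets and the label encoding of the previous sketch (again our own illustration, not the book's), these edge effects read as follows.

def live_effect(label, L):
    """Abstract edge effect for liveness: the set of variables live before
    the edge, given the set L of variables live after it."""
    kind = label[0]
    if kind == 'skip':                     # ;
        return set(L)
    if kind in ('NonZero', 'Zero'):        # conditions only add Vars(e)
        return L | vars_of(label[1])
    if kind in ('assign', 'load'):         # x <- e  or  x <- M[e]
        _, x, e = label
        return (L - {x}) | vars_of(e)
    if kind == 'store':                    # M[e1] <- e2
        _, e1, e2 = label
        return L | vars_of(e1) | vars_of(e2)
    raise ValueError(label)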

The abstract effects [[k]]♯ of edges k on a path π = k1 . . . kr can be composed to form the abstract effect of this path. We define:

[[π]]♯ = [[k1]]♯ ∘ . . . ∘ [[kr]]♯

The sequence of the edges is maintained by this function composition (and not reversed as for the analysis of expressions available in variables). The reason is that the abstract effect [[π]]♯ of the path is to describe how a set of variables L live at the end of π is propagated through the path to compute the set of variables possibly live at the beginning of π.
The set of program variables possibly live at a program point v is obtained as the union of the sets of variables that are live along at least one program path from v to the program exit, that is, as the union of the sets [[π]]♯ X. Correspondingly, we define:

L*[v] = ⋃{[[π]]♯ X | π : v →* stop}

where π : v →* stop ranges over all paths from v to the program exit stop. As partial order on the set L we choose the subset relation ⊆. Intuitively, smaller sets of

live variables mean larger sets of dead variables, which means more opportunities for
optimization. The function L* represents the MOP solution of our analysis problem.
Program analyses are called forward analyses if the value at a program point
depends on the paths reaching this program point from program entry. Program
analyses are called backward analyses if the value at a program point depends on the
paths leaving that program point and reaching program exit. Liveness of variables
therefore is a backward analysis. Available assignments, in contrast, is a forward
analysis.
Transformation DE:
Let us assume that we are given the MOP solution L*. At each program point v, this is the set L*[v]. Its complement contains only variables that are definitely dead at v, i.e., dead along all program paths starting at v. Assignments to these variables are superfluous and can be removed by the following transformation DE, for Dead variable assignment Elimination: an edge (u, x ← e, v) with x ∉ L*[v] is replaced by an edge (u, ;, v), i.e., the assignment is turned into an empty statement.

Memory accesses whose results are not needed could also be removed in analogy to the removal of assignments. This could, however, change the semantics of the program if it were to remove an illegal access to memory, which, depending on the semantics of the programming language, may produce a side effect such as raising an exception. Useless accesses to memory are therefore not removed.
Transformation DE is called dead-code elimination. Correctness of this transformation is again shown in two steps:
1. The abstract edge effects are shown to correctly implement the definition of
liveness.
2. The transformation is shown to preserve the semantics of programs.
We again consider the second step only. An important insight is that to show
semantics preservation it is not necessary to show that the value of each variable at
each program point remains invariant under the transformation. In fact, this is not
the case here. For the applicability of the transformation, however, it suffices that the
observable behavior of the original and the transformed program are the same. The
only question is: what is potentially observable? Here, we demand that the program
points traversed by program execution are the same, and that in each step the contents
of memory coincide as well as the values of the variables in the set X at the end of
program execution. Claim 2 then is that the value of a variable, dead at some program
point v, does not influence the observable behavior of the program. To prove this, we
consider the state s at v and the computation along a path π starting at v and reaching program exit. Remember, the state is a pair consisting of a concrete variable binding and a memory. We show by induction over the length of program paths π:

(L) Let s′ be a state that differs from s only in the values of dead variables. The state transformation [[π]] is also defined for s′, and the states [[π′]] s and [[π′]] s′ agree up to the values of dead variables for all prefixes π′ of π.

The invariant (L) entails that two states at a program point v definitely lead to the same program behavior if they only disagree in the values of variables that are dead at v. To prove the correctness of the transformation it suffices to show that only the values of dead variables may be different during the execution of the original and the transformed program.
The computation of the set L*[u] of variables possibly live at program point u works analogously to the way sets of definitely available assignments were computed. We set up an appropriate system of inequalities. Recall that, as opposed to the available-assignments analysis, where we fixed a start value at program entry start, we now fix a value X at program exit stop which consists of all variables that are live at program exit. Also, each edge k = (u, lab, v) has an associated inequality that delivers the set of possibly live variables at u, given the set of possibly live variables at v. Remember that the inequalities in the available-assignments analysis were oriented the other way around. The resulting system of inequalities is:
L[stop] ⊇ X
L[u]    ⊇ [[k]]♯ (L[v])   for each edge k = (u, lab, v)

Thus, the difference between the systems of inequalities for forward and backward analyses consists only in the exchange of start and stop and in the reversed orientation of the edges.
The complete lattice in which the system of inequalities will be solved is finite.
This means that all ascending chains will eventually stabilize. The abstract edge
effects are monotonic. Round-robin iteration can thus be used to determine the least
solution L. In fact, the abstract edge effects are even distributive, which means that this least solution is the same as the MOP solution L*, provided that the program exit stop is reachable from each program point (cf. Theorem 1.6.3).
Example 1.7.3 We consider again the program for the factorial function assuming that it obtains its input and returns its output through memory cells, more precisely through the cells M[I] and M[R]. No variables are assumed to be live at program exit. The control-flow graph and the system of inequalities derived from it are
shown in Fig. 1.12. The system of inequalities closely corresponds to the control-flow
graph. After all, the system of equations was extracted from the control-flow graph.
However, it does not need to be explicitly constructed. Instead, fixed-point iteration
could traverse the control-flow graph, executing the abstract edge effects associated
with the traversed edges. The fixed-point iterator would be something like a driver.
Another analysis can be conducted by executing it with the edge effects of this new analysis.
Round-robin iteration delivers the solution after only one round, given the right
ordering of the unknowns:

The control-flow graph of the factorial program has the edges (0, x ← M[I], 1), (1, y ← 1, 2), (2, NonZero(x > 1), 3), (2, Zero(x > 1), 6), (3, y ← x ∗ y, 4), (4, x ← x − 1, 5), (5, ;, 2), and (6, M[R] ← y, 7). The corresponding system of inequalities is:

L[0] ⊇ (L[1] \ {x}) ∪ {I}
L[1] ⊇ L[2] \ {y}
L[2] ⊇ (L[6] ∪ {x}) ∪ (L[3] ∪ {x})
L[3] ⊇ (L[4] \ {y}) ∪ {x, y}
L[4] ⊇ (L[5] \ {x}) ∪ {x}
L[5] ⊇ L[2]
L[6] ⊇ L[7] ∪ {y, R}
L[7] ⊇ ∅

Fig. 1.12 The system of inequalities for possibly live variables for the factorial program

       1            2
7      ∅
6      {y, R}
2      {x, y, R}    ditto
5      {x, y, R}
4      {x, y, R}
3      {x, y, R}
1      {x, R}
0      {I, R}

We notice that no assignment in the factorial program has a dead left side. Therefore,
transformation DE does not modify this program.


The removal of assignments to dead variables can make other variables dead. This
is witnessed in Fig. 1.13. This example shows a weakness of the analysis for dead
variables: It may classify variables as live due to later uses in assignments to dead
variables. A removal of such an assignment and a subsequent reanalysis would discover new dead variables. This iterative application of transformation and analysis is
rather inefficient. In the example of live-variable analysis the repeated analysis can be
avoided by strengthening the analysis. Strengthening leads to possibly smaller sets
of possibly live variables. The new analysis works with a more restricted condition
for liveness. The new notion, true liveness, relies on the notion of a true use of a variable on a path starting at a program point. A use in an assignment to a dead variable is not considered a true use. This renders the definition of true liveness recursive: true liveness depends on true use, which depends on true liveness.
Let us assume again that the values of variables in a set X are still used at program exit. We call a variable x truly live along a path π to program exit if x ∈ X and π contains no definition of x, or if π contains a true use of x, which occurs before any


Fig. 1.13 Repeated application of transformation DE (the program x ← y + 1; z ← 2 ∗ x; M[R] ← y: removing the assignment to the dead variable z makes x dead as well, so a second application can also remove the assignment to x)

Fig. 1.14 Truly live variables (for the program of Fig. 1.13, the set of truly live variables is {y, R} at every program point)

definition of x, i.e., π can be decomposed into π = π1 k π2, such that π1 contains no definition of x, and k contains a true use of x relative to π2. The true use of variables at edge k = (u, lab, v) is defined by:

Lab            y truly used
;              false
NonZero(e)     y ∈ Vars(e)
Zero(e)        y ∈ Vars(e)
x ← e          y ∈ Vars(e) ∧ x is truly live at v
x ← M[e]       y ∈ Vars(e) ∧ x is truly live at v
M[e1] ← e2     y ∈ Vars(e1) ∨ y ∈ Vars(e2)
The additional condition that the assignment's left side must be truly live is the only difference to plain liveness.
Example 1.7.4 Consider the program in Fig. 1.14. Variable z is not live (nor truly
live) at program point 2. Therefore, the variables on the right side of the corresponding
assignment, i.e. x, are not truly used. Thus, x is not truly live at program point 1
since x is not truly used at the edge to program point 2.



The abstract edge effects for true liveness are as follows:

[[;]]♯ L           = L
[[NonZero(e)]]♯ L  = [[Zero(e)]]♯ L = L ∪ Vars(e)
[[x ← e]]♯ L       = (L \ {x}) ∪ ((x ∈ L) ? Vars(e) : ∅)
[[x ← M[e]]]♯ L    = (L \ {x}) ∪ ((x ∈ L) ? Vars(e) : ∅)
[[M[e1] ← e2]]♯ L  = L ∪ Vars(e1) ∪ Vars(e2)

For an element x and sets a, b, c, the conditional expression (x ∈ a) ? b : c denotes the set:

(x ∈ a) ? b : c  =  b   if x ∈ a
                    c   if x ∉ a
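In code, the only change compared with the plain liveness effect sketched earlier is the conditional term for assignments and loads; the fragment below (ours, reusing vars_of from the earlier sketch) makes this explicit.

def true_live_effect_assign(x, e, L):
    """True-liveness effect of an edge x <- e (and analogously x <- M[e]):
    the right-hand side contributes only if x itself is truly live after."""
    return (L - {x}) | (vars_of(e) if x in L else set())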
The abstract edge effects for true liveness are thus more complex than those for plain liveness. However, they are still distributive! This follows from the fact that the new conditional operator is distributive provided that c ⊆ b holds. To convince ourselves of this property, we consider an arbitrary powerset domain D = 2^U together with the partial order ⊆ and the function:

f y = (x ∈ y) ? b : c

For an arbitrary nonempty set Y ⊆ 2^U, we calculate:

f(⋃Y) = (x ∈ ⋃Y) ? b : c
      = (∃ y ∈ Y : x ∈ y) ? b : c
      = c ∪ ⋃{(x ∈ y) ? b : c | y ∈ Y}
      = c ∪ ⋃{f y | y ∈ Y}

which equals ⋃{f y | y ∈ Y}, since Y is nonempty and c ⊆ f y for every y.

Theorem 1.6.2 has a more general implication:

Theorem 1.7.1 Let U be a finite set and f : 2^U → 2^U be a function.
1. f(x1 ∪ x2) = f(x1) ∪ f(x2) for all x1, x2 ⊆ U holds if and only if f can be represented in the following form:

f(x) = b0 ∪ ((u1 ∈ x) ? b1 : ∅) ∪ . . . ∪ ((ur ∈ x) ? br : ∅)

for appropriate ui ∈ U and bi ⊆ U.
2. f(x1 ∩ x2) = f(x1) ∩ f(x2) for all x1, x2 ⊆ U holds if and only if f can be represented in the form:

f(x) = b0 ∩ ((u1 ∈ x) ? U : b1) ∩ . . . ∩ ((ur ∈ x) ? U : br)

for appropriate ui ∈ U and bi ⊆ U.




Fig. 1.15 True liveness in loops (a variable x that is only used and modified inside the loop, via x ← x − 1: plain liveness computes the set {x}, true liveness computes ∅)

Fig. 1.16 A program with copying instructions: T ← x + 1; y ← T; M[R] ← y, together with two transformed versions in which the store uses T instead of y and the assignment y ← T is then replaced by the empty statement

Note that the functions of Theorem 1.7.1 are closed under composition, least upper
bounds, and greatest lower bounds (Exercise 11).
The least solutions of the systems of inequalities for true liveness agree with the MOP solutions, due to the distributivity of the abstract edge effects. We must, however, require that the program exit stop is reachable from each program point.
It is interesting to note that the analysis of true liveness discovers more superfluous
assignments than repeated analysis of plain liveness and dead-code elimination.
Example 1.7.5 Figure 1.15 shows a loop in which a variable is modified that is only
used in the loop. Plain liveness analysis cannot determine that this variable is dead,
while true-liveness analysis is able to do so.



1.8 Removal of Assignments Between Variables


Programs often contain assignments that simply copy values from one variable into
another variable. These copy instructions may be the result of other optimizations or
of conversions of one program representation into a different form.
Example 1.8.1 Consider the program in Fig. 1.16. Storing a value in variable T is
useless in the given case, since the value of the expression is used exactly once.
Variable T can be used directly instead of variable y since T is guaranteed to contain

Fig. 1.17 Variables in Example 1.8.1 having the same value as T: the set is {T} after the assignment T ← x + 1 and {y, T} after the copy y ← T

the same value. This renders variable y dead at program point 2, such that the compiler
can eliminate the assignment to y. The resulting program still contains variable T ,
but variable y is eliminated.


For this kind of transformation, the compiler needs to know how the value of
an expression is propagated by copy actions between variables. Such an analysis,
therefore, is called copy propagation. Consider a variable x. The analysis maintains
at each program point a set of variables guaranteed to contain the actual value of this
variable. The use of a variable containing a copy of x can be replaced by a use of x.
Let V = {V ⊆ Vars | x ∈ V} be the complete lattice of all sets of program variables containing x, ordered by the superset relation ⊇. It is intuitively clear that
larger sets of variables guaranteed to contain the value of x will offer a greater chance
for this optimization.
At program entry, only variable x is guaranteed to contain its own value. The start
value of our analysis at program entry, therefore, is V0 = {x}. The abstract edge
effects again only depend on the edge labels. We define:
[[x ← e]]♯ V      = {x}
[[x ← M[e]]]♯ V   = {x}
[[z ← y]]♯ V      = V ∪ {z}   if y ∈ V
                    V \ {z}   if y ∉ V
[[z ← r]]♯ V      = V \ {z}

where, in the last two cases, z ≠ x and r ∉ Vars.

No other variable besides x definitely contains the value of x following an assignment x ← e or reading from memory x ← M[e]. The other two cases treat assignments to variables z different from x. The abstract edge effects of all other edge labels do not change the incoming analysis information.
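The edge effects for copy propagation can be sketched in the same style; the encoding below is hypothetical (labels as tuples, a fixed variable x, and right-hand sides that are either a variable name or an arbitrary expression), not the book's implementation.

def copy_effect(label, V, x):
    """Abstract edge effect for copy propagation of a fixed variable x:
    V is the set of variables guaranteed to hold x's current value."""
    kind = label[0]
    if kind in ('assign', 'load'):           # z <- e  or  z <- M[e]
        z = label[1]
        if z == x:
            return {x}                       # only x itself holds its new value
        rhs = label[2]
        if kind == 'assign' and rhs in V:    # copy z <- y with y known to hold x
            return V | {z}
        return V - {z}                       # any other assignment kills z
    return set(V)                            # other labels leave V unchanged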
The result of the analysis for the program of Example 1.8.1 and the variable T
is shown in Fig. 1.17. Note that the information is propagated through the program
control-flow graph in a forward direction. Due to Theorem 1.7.1, all abstract edge
effects are known to be distributive. This means that also for this problem the least solution of the corresponding system of inequalities coincides with the MOP solution. Let Vx be this solution. By construction it follows from z ∈ Vx[u] that z contains

the same value as x. The compiler may, therefore, replace accesses to z by accesses to x. We introduce the substitution V[u] to define the corresponding transformation:

V[u] z = x   if z ∈ Vx[u]
         z   otherwise

The transformation then is given by the following rules:


Transformation CE:

An edge (u, NonZero(e), v) is replaced by (u, NonZero(V[u](e)), v). An analogous rule is applied to edges labeled by Zero(e), and the substitution is likewise applied to the expressions occurring in assignments and memory accesses.

Here, V[u](e) denotes the application of the substitution V[u] to the expression e.
Example 1.8.2 It is time to have a look at a slightly bigger example to observe the
cooperation of the different transformations.
In Example 1.4.2 we considered the implementation of the statement a[7]--; in our example language and showed how the second computation of the expression A + 7 could be replaced by an access to variable A1.
A + 7 could be replaced by an access to variable A1 . Figure 1.18 shows on the left
side the results of transformation RE. The application of transformation CE replaces
the use of variable A2 by a use of variable A1 . The result of the transformation
is the control-flow graph in the middle. The application of the transformation CE

Fig. 1.18 The transformations CE and DE for the implementation of a[7]--;: the left graph (after RE) contains A1 ← A + 7; B1 ← M[A1]; B2 ← B1 − 1; A2 ← A1; M[A2] ← B2; in the middle graph the store uses A1 instead of A2; in the right graph the now dead assignment to A2 has been replaced by the empty statement

Fig. 1.19 An example for constant folding: the program x ← 7; if (x > 0) M[A] ← B; and its control-flow graph

renders variable A2 dead. So, an application of transformation DE in the last step can
eliminate the assignment to A2 . The inserted empty statement can later be removed
in some clean-up step.



1.9 Constant Folding


The goal of constant folding is to move parts of the computation from run time to
compile time.
Example 1.9.1 Consider the program of Fig. 1.19. Variable x always has the value 7 at program point 2. Therefore, the condition x > 0 at the edges emanating from program point 2 will always evaluate to 1, so that the access to memory will always be executed. A compiler can therefore eliminate the condition following program point 2 (Fig. 1.20). The else part becomes unreachable by eliminating the condition.




Fig. 1.20 An optimization of the example program of Fig. 1.19: the conditional edges are removed, and the memory access M[A] ← B is always executed

The question is, do such inefficiencies occur in real programs? The answer is yes.
There are several reasons why constant folding might find much to optimize. It is good
programming style to use named constants to make programs easily modifiable. The
compiler then is expected to propagate the constants through the program and fold
expressions away where possible. Often, a program is written for many configurations of parameters. The automotive industry develops such generic programs, also
called program families, which can be instantiated for many different types of cars,
e.g. those with four or those with five gears, just by setting the named constants to
different values.
Also, many programs are not written by programmers, but are generated from
other programs. These generated programs tend to contain such inefficiencies. The
compiler itself may produce constant subexpressions during the translation process,
e.g., when it translates accesses to data structures such as arrays, or as the result of
other program transformations.
Constant folding is a special case of partial evaluation of programs, which is
the essence of compilation according to A. Ershov, one of the pioneers of compiler
design. Partial evaluation performs computations on statically known parts of the
program's state already at compile time. In this book, we are only concerned with
constant folding. Our goal is to develop an analysis that computes for each program
point v the following information: Which value does each program variable have
when program control reaches v? It should be clear that a variable, in general, will
have different values at a program point for different executions of the program or
even during the same execution of a program, when control reaches this program point
several times. The analysis can, at best, find the cases in which for every execution
of the program and every time control reaches the program point a variable has the
same value. We call this analysis constant propagation. As a side effect, this analysis
also determines whether each program point is potentially reachable.
We construct the complete lattice for this analysis in two steps. In the first step,
we design a partial order for the possible values of variables. To do this, we extend
the set of integer numbers by an element ⊤, which represents an unknown value:

Z⊤ = Z ∪ {⊤}   and   x ⊑ y iff y = ⊤ or x = y

Fig. 1.21 The partial order Z⊤ for values of variables

Figure 1.21 shows this partial order. The partial order Z⊤ by itself is not yet a complete lattice, since it lacks a least element. In a second step, we construct the complete
lattice of abstract variable bindings by defining
D = (Vars → Z⊤)⊥ = (Vars → Z⊤) ∪ {⊥},

i.e., D is the set of all functions mapping variables to abstract values, extended by an additional value ⊥ as the unique least element. We say an abstract binding D ≠ ⊥ knows the value of a variable x if D x ∈ Z. If, however, D x ∉ Z, that is, if D x = ⊤, then the value of x is unknown. The value ⊤ for x means that the analysis could not determine one single value for x, perhaps because x was found to have several distinct values in the course of execution.
The new element ⊥ is associated with every program point that, according to the current fixed-point iteration, is not yet known to be reachable. If the solution found by the fixed-point iteration still has program points with associated value ⊥, these points cannot be reached by any program execution. We define an order on this set of abstract states by:

D1 ⊑ D2   iff   D1 = ⊥ or D1 x ⊑ D2 x for all x ∈ Vars

The abstract variable binding ⊥ denoting "not yet reachable" is considered as smaller than any other abstract state. The idea is that later, the corresponding program point still may turn out to be reachable and thus receive an abstract variable binding ≠ ⊥. An abstract binding D1 ≠ ⊥ is possibly better, i.e., less than or equal to another binding D2 ≠ ⊥, if it agrees with D2 on the values of all variables that D2 knows, but possibly knows more values of variables than D2. Intuitively, an abstract variable binding that knows values of more variables may lead to more optimizations and thus is better information. Going up in the partially ordered set D = (Vars → Z⊤)⊥ thus means forgetting values of variables.
We want to show that D together with this order is a complete lattice. Consider a subset X ⊆ D. Without loss of generality, we may assume that ⊥ ∉ X. We have then X ⊆ (Vars → Z⊤).
From X = ∅ follows ⊔X = ⊥ ∈ D. Therefore, D has a least upper bound for X. For X ≠ ∅, the least upper bound ⊔X = D is given by:

D x = ⊔{f x | f ∈ X} = z   if f x = z for all f ∈ X
                       ⊤   otherwise

This shows that every subset X of D has a least upper bound and that D is a complete lattice. For each edge k = (u, lab, v) we construct an abstract edge effect [[k]]♯ = [[lab]]♯ : D → D, which simulates the concrete computation. Since unreachability should be preserved by all abstract edge effects, we define all abstract effects as strict, i.e., [[lab]]♯ ⊥ = ⊥ holds for all edge labels lab.
Now let D ≠ ⊥ be an abstract variable binding. We need an abstract evaluation function for expressions to define the abstract edge effects. This function determines the value of an expression as far as possible for the given information in D. The abstract evaluation has to handle the situation that the precise value of a given expression cannot be determined in the given abstract variable binding D. This means that the expression should be evaluated to ⊤. The abstract evaluation of expressions works like the concrete evaluation of expressions as long as all operands of the operators in the expression are concrete values. To handle the case of an operand ⊤, the concrete arithmetic, Boolean, and comparison operators, □, are replaced by the corresponding abstract operators, □♯, which are also able to deal with ⊤ operands.
For binary operators □, we define:

a □♯ b = ⊤       if a = ⊤ or b = ⊤
         a □ b   otherwise

The result of the abstract evaluation of an expression shall be unknown, that is ⊤, whenever at least one of the operands is unknown.
This definition of the abstract operators is quite natural. Still, better information
can be obtained for some combinations of operators and operand values by exploiting
algebraic laws. For instance, knowing that one operand of a multiplication is 0 can
be exploited to infer that the result is 0 no matter what the other operand is. More of
these algebraic identities can be used to refine and improve the abstract evaluation.
Let us assume that we have defined an abstract operator □♯ on abstract values for each concrete operator □. We then define the abstract evaluation

[[e]]♯ : (Vars → Z⊤) → Z⊤

of an expression e by:

[[c]]♯ D = c
[[x]]♯ D = D x
[[□ e]]♯ D = □♯ [[e]]♯ D                       for unary operators □
[[e1 □ e2]]♯ D = [[e1]]♯ D □♯ [[e2]]♯ D         for binary operators □
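As an illustration of this abstract expression evaluation (our own sketch, with expressions encoded as nested tuples and ⊤ represented by the marker TOP, neither of which is the book's notation), one might write:

TOP = 'T'   # the abstract value "unknown"

def abs_binop(op, a, b):
    """Abstract version of a concrete binary operator op."""
    if a is TOP or b is TOP:
        return TOP
    return op(a, b)

def abs_eval(e, D):
    """Abstract evaluation under the abstract binding D (a dict).

    Expressions are assumed to be tuples: ('const', c), ('var', x),
    or ('bin', op, e1, e2), where op is a Python function on integers.
    """
    kind = e[0]
    if kind == 'const':
        return e[1]
    if kind == 'var':
        return D[e[1]]
    if kind == 'bin':
        _, op, e1, e2 = e
        return abs_binop(op, abs_eval(e1, D), abs_eval(e2, D))
    raise ValueError(e)

With D = {'x': 2, 'y': TOP}, evaluating x + 7 yields 9, and any operation involving y yields TOP.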
Example 1.9.2 Consider the abstract variable binding

D = {x ↦ 2, y ↦ ⊤}

We get:

Fig. 1.22 Solution of the system of inequalities of Fig. 1.19:
1: {x ↦ ⊤}
2: {x ↦ 7}
3: {x ↦ 7}
4: {x ↦ 7}
5: {x ↦ 7}

[[x + 7]]♯ D = [[x]]♯ D +♯ [[7]]♯ D = 2 +♯ 7 = 9
[[x − y]]♯ D = 2 −♯ ⊤ = ⊤


Next, we define the abstract edge effects [[k]]♯ = [[lab]]♯. We set [[lab]]♯ ⊥ = ⊥, and for D ≠ ⊥, we define:

[[;]]♯ D            = D
[[NonZero(e)]]♯ D   = ⊥   if 0 = [[e]]♯ D        (e is definitely 0)
                      D   otherwise
[[Zero(e)]]♯ D      = ⊥   if 0 ⋢ [[e]]♯ D        ([[e]]♯ D cannot be zero)
                      D   if 0 ⊑ [[e]]♯ D        ([[e]]♯ D could be zero)
[[x ← e]]♯ D        = D ⊕ {x ↦ [[e]]♯ D}
[[x ← M[e]]]♯ D     = D ⊕ {x ↦ ⊤}
[[M[e1] ← e2]]♯ D   = D

The operator ⊕ changes a function at a given argument to a given value.
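Reusing abs_eval and TOP from the sketch above, and representing the unreachable binding ⊥ by None, the abstract edge effects might be coded as follows (again a hypothetical sketch, not the book's implementation).

BOT = None  # the abstract binding "unreachable"

def edge_effect(label, D):
    """Abstract edge effect for constant propagation; D is either BOT or a
    dict mapping variable names to ints or TOP."""
    if D is BOT:
        return BOT                           # all effects are strict
    kind = label[0]
    if kind == 'skip':                       # ;
        return dict(D)
    if kind == 'NonZero':
        v = abs_eval(label[1], D)
        return BOT if v == 0 else dict(D)    # e is definitely 0: unreachable
    if kind == 'Zero':
        v = abs_eval(label[1], D)
        return dict(D) if (v is TOP or v == 0) else BOT
    if kind == 'assign':                     # x <- e
        _, x, e = label
        return {**D, x: abs_eval(e, D)}
    if kind == 'load':                       # x <- M[e]
        _, x, e = label
        return {**D, x: TOP}
    if kind == 'store':                      # M[e1] <- e2
        return dict(D)
    raise ValueError(label)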
We assume that no values of variables are known at the start of program execution. Therefore, we select the abstract variable binding D⊤ = {x ↦ ⊤ | x ∈ Vars} for the program point start.


The abstract edge effects [[k]]♯ can, as usual, be composed to the abstract effects of paths π = k1 . . . kr by:

[[π]]♯ = [[kr]]♯ ∘ . . . ∘ [[k1]]♯ : D → D

Example 1.9.3 The least solution of the system of inequalities of our introductory
example is shown in Fig. 1.22.



How do we show the correctness of the computed information? This proof is


based on the theory of abstract interpretation as developed by Patrick and Radhia
Cousot in the late 1970s. We present this theory in a slightly simplified form. The
main idea is to work with abstract values, which are descriptions of (sets of) concrete
values. These descriptions are elements of a partial order D. A description relation, Δ, relates concrete and abstract values. For x Δ a we say that x is described by a.
This relation should have the following properties:

x Δ a1 ∧ a1 ⊑ a2   ⟹   x Δ a2

If a1 is a description of a concrete value x, and a1 ⊑ a2 holds, then a2 is also a description of x. We can define a concretization, γ, for such a description relation. It maps each abstract value a ∈ D to the set of all concrete values described by a:
description of x. We can define a concretization, , for such a description relation.
It maps each abstract value a D to the set of all concrete values described by a:
a = {x | x a}
An abstract value a2 that is greater than another abstract value a1 describes a superset
of the set of concrete values described by a1 . The greater abstract value, a2 , is
therefore less precise information than the smaller value, a1 :
    a1 ⊑ a2   ⟹   γ(a1) ⊆ γ(a2)

The description relation for constant propagation is built up in several steps. We


start with a description relation Δ ⊆ Z × Z⊤ on the values of program variables and
define:

    z Δ a   iff   z = a ∨ a = ⊤

This description relation has the concretization:

    γ a = {a}    if a ≠ ⊤
          Z      if a = ⊤

We extend the description relation for values of program variables to one between
concrete and abstract variable bindings. For simplicity, it gets the same name, Δ.
This description relation Δ ⊆ (Vars → Z) × (Vars → Z⊤)⊥ is defined by:

    ρ Δ D   iff   D ≠ ⊥  ∧  ρ x Δ D x   (for all x ∈ Vars)

This definition of Δ implies that there exists no concrete variable binding ρ such that
ρ Δ ⊥. Therefore, the concretization γ maps ⊥ to the empty set. γ maps each abstract
variable binding D ≠ ⊥ to the set of all concrete variable bindings that know for
each variable x either the value D x ∈ Z or an arbitrary value, if D x = ⊤:

    γ D = {ρ | ∀x : (ρ x) Δ (D x)}


We have, for instance:

    {x ↦ 1, y ↦ 7}  Δ  {x ↦ ⊤, y ↦ 7}
The simple constant propagation we consider here ignores the values in memory.
We can therefore describe program states (ρ, μ) just by abstract variable bindings,
which only describe ρ. Overall, the description relation is defined by:

    (ρ, μ) Δ D   iff   ρ Δ D
The concretization returns:

    γ D = ∅                     if D = ⊥
          {(ρ, μ) | ρ Δ D}      otherwise

We want to prove that each path π of the control-flow graph maintains the description
relation Δ between concrete and abstract states. The claim is:

(K)  If s Δ D holds, and if [[π]] s is defined, then ([[π]] s) Δ ([[π]]♯ D) holds.

The following diagram visualizes this claim:

             [[π]]
        s  ────────►  s1
        Δ             Δ
        D  ────────►  D1
             [[π]]♯
Claim (K) implies in particular that

    [[π]] s ∈ γ ([[π]]♯ D),

whenever s ∈ γ(D) holds. Property (K) is formulated for arbitrary paths. It is
sufficient for the proof to show (K) for a single edge k. The claim is then proved by
induction on the length of paths since the claim trivially holds for paths of length 0.
It therefore suffices to show for each edge k and s Δ D that ([[k]] s) Δ ([[k]]♯ D)
holds whenever [[k]] s is defined.
The essential step in the proof of property (K) for an edge consists in showing
for each expression e:

    ([[e]] ρ) Δ ([[e]]♯ D),    if ρ Δ D.                    (∗)
To prove claim (∗), we show for each operator □:


    (x □ y) Δ (x♯ □♯ y♯),    if x Δ x♯ ∧ y Δ y♯
Claim (∗) then follows by induction over the structure of expressions e. The claim
about the relation between concrete and abstract operators has to be shown for each
operator individually. For constant propagation with our simple definition of □♯, this
is certainly fulfilled.
Overall, we wanted to prove that each edge k = (u, lab, v) maintains the description relation between concrete and abstract states. Stated differently, this means
that the concrete and the abstract edge effects are compatible with the description
relation. Let us return to this proof. The proof goes by case distinction according to
the label lab of the edge.
We assume that s = (ρ, μ) Δ D and that D ≠ ⊥.
Assignment, x ← e: We have:

    [[x ← e]] s   = (ρ1, μ)    where  ρ1 = ρ ⊕ {x ↦ [[e]] ρ}
    [[x ← e]]♯ D  = D1         where  D1 = D ⊕ {x ↦ [[e]]♯ D}

The claim (ρ1, μ) Δ D1 follows from the compatibility of the concrete and the
abstract expression evaluation with the description relation Δ.
Read, x ← M[e]: We have:

    [[x ← M[e]]] s   = (ρ1, μ)    where  ρ1 = ρ ⊕ {x ↦ μ ([[e]] ρ)}
    [[x ← M[e]]]♯ D  = D1         where  D1 = D ⊕ {x ↦ ⊤}

The claim (ρ1, μ) Δ D1 follows since ρ1 x Δ ⊤ holds.


Store, M[e1] ← e2:
The claim holds since neither the concrete nor the abstract edge effect modifies
the variable binding.
Condition, Zero(e):
Let [[Zero(e)]] s be defined. We have 0 = ([[e]] ρ) Δ ([[e]]♯ D).
Therefore, [[Zero(e)]]♯ D = D ≠ ⊥ holds, and the claim is shown.
Condition, NonZero(e):
Let [[NonZero(e)]] s be defined. We have 0 ≠ ([[e]] ρ) Δ ([[e]]♯ D).
It follows that [[e]]♯ D ≠ 0, and we have [[NonZero(e)]]♯ D = D, which implies
the claim.
Altogether we conclude that the invariant (K) holds.
The MOP solution for constant propagation at a program point v is the least upper
bound of all information contributed by all paths from the entry point to v for the
initial binding D⊤:

    D∗[v] = ⊔ {[[π]]♯ D⊤ | π : start →∗ v},


[Fig. 1.23 Constant propagation for the factorial program]

where D⊤ x = ⊤ for all x ∈ Vars holds. Invariant (K) implies for all initial states s
and all paths π reaching program point v:

    ([[π]] s) Δ (D∗[v])
Solving the associated system of inequalities leads to an approximation of the MOP
solution.
Example 1.9.4 Consider the factorial program, this time with a given initial value for
variable x. Figure 1.23 shows the result of the analysis. We know that, with a given
initial value, the whole computation of the factorial of that value could be executed at
compile time. Our static analysis, on the other hand, does not identify this possibility.
The reason is that constant propagation determines values of variables at program
points that are the same each time execution reaches that program point. The values
of the variables x and y, however, change within the loop.


In conclusion, we note that constant propagation computes with concrete values
as far as they can be determined statically. Expressions consisting of only known
values can be evaluated by the compiler. In general, though, constant propagation
will only be able to determine a subset of the concrete variable bindings. The fixed-point
iteration to determine the least solution of the system of inequalities always
terminates. With n program points and m variables it takes at most O(m · n) rounds.
Example 1.9.4 shows that the iteration often terminates faster. There is one caveat,
though: the edge effects for constant propagation are not all distributive. As a
counterexample consider the abstract edge effect for the assignment x ← x + y
together with the two variable bindings:


    D1 = {x ↦ 2, y ↦ 3}    and    D2 = {x ↦ 3, y ↦ 2}

On the one hand, we have:

    [[x ← x + y]]♯ D1 ⊔ [[x ← x + y]]♯ D2 = {x ↦ 5, y ↦ 3} ⊔ {x ↦ 5, y ↦ 2}
                                          = {x ↦ 5, y ↦ ⊤}

On the other hand, it holds that:

    [[x ← x + y]]♯ (D1 ⊔ D2) = [[x ← x + y]]♯ {x ↦ ⊤, y ↦ ⊤}
                             = {x ↦ ⊤, y ↦ ⊤}

Therefore

    [[x ← x + y]]♯ D1 ⊔ [[x ← x + y]]♯ D2  ≠  [[x ← x + y]]♯ (D1 ⊔ D2)

violating the distributivity property.
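The counterexample can be replayed directly; the following Python snippet (same ad hoc encoding as before, with None for ⊤) checks both sides of the equation:

# A quick check of the non-distributivity example (a sketch; None stands for ⊤).

TOP = None

def join_val(a, b):
    return a if a == b else TOP

def join(D1, D2):
    return {x: join_val(D1[x], D2[x]) for x in D1}

def effect(D):                       # abstract effect of  x <- x + y
    D1 = dict(D)
    D1['x'] = TOP if (D['x'] is TOP or D['y'] is TOP) else D['x'] + D['y']
    return D1

D1 = {'x': 2, 'y': 3}
D2 = {'x': 3, 'y': 2}

lhs = join(effect(D1), effect(D2))   # {x -> 5, y -> TOP}
rhs = effect(join(D1, D2))           # {x -> TOP, y -> TOP}
assert lhs == {'x': 5, 'y': TOP}
assert rhs == {'x': TOP, 'y': TOP}
assert lhs != rhs                    # distributivity is violated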
The least solution D of the system of inequalities thus in general delivers only an
upper approximation of the MOP solution. This means that

    D∗[v] ⊑ D[v]

for each program point v. Being an upper approximation, D[v] still describes the
result of each computation along a path π that ends in v:

    ([[π]] (ρ, μ)) Δ D[v],

whenever [[π]] (ρ, μ) is defined. Therefore, the least solution is safe information,
which the compiler can use to check the applicability of program transformations.
Transformation CF:
The first use of the information D consists in removing program points that are
identified as unreachable. The following transformation rule performs the removal
of dead code: a program point u with D[u] = ⊥ is removed from the control-flow
graph, together with the edges incident to it.

Furthermore, the compiler may remove all condition edges that might lead to a
reachable node, but whose abstract edge effect delivers ⊥: an edge (u, lab, v) with
[[lab]]♯ (D[u]) = ⊥ is removed.
The next two rules simplify condition edges whose conditions deliver a definite,
i.e., non-⊤ value. Having a definite value means that this edge will be taken in all
executions:

    an edge (u, Zero (e), v) with D[u] = D ≠ ⊥ and [[e]]♯ D = 0 is replaced by
    an edge (u, ;, v);
    an edge (u, NonZero (e), v) with D[u] = D ≠ ⊥ and [[e]]♯ D ∉ {0, ⊤} is
    replaced by an edge (u, ;, v).

Finally, the compiler uses the information D to evaluate program expressions at
compile time whenever this is shown to be possible. For assignments, we obtain:

    an edge (u, x ← e, v) with D[u] = D ≠ ⊥ is replaced by an edge (u, x ← e′, v)

where the expression e′ results from evaluating the expression e in the abstract
variable binding D:

    e′ = c    if [[e]]♯ D = c ≠ ⊤
    e′ = e    if [[e]]♯ D = ⊤
The simplification of expressions at other edges works similarly.
Constant folding as explained so far is always applied to maximal expressions in
statements. It can also be extended to subexpressions:

    x + (3 · y)    with {x ↦ ⊤, y ↦ 5}    becomes    x + 15
    y · (x + 3)    with {x ↦ ⊤, y ↦ 5}    becomes    5 · (x + 3)

Our analysis can be improved to better exploit the information contained in conditions.
Example 1.9.5 Consider the following example program:

    if (x = 7)
        y ← x + 3;

Without knowing the value of x before the if statement, the analysis can derive that
x has the value 7 when control enters the then part.

[Fig. 1.24 Exploiting the information in conditions]


Conditions testing the equality of variables with values can be exploited particularly
well:

    [[NonZero (x = e)]]♯ D = ⊥     if [[x = e]]♯ D = 0
                             D1    otherwise

where we define:

    D1 = D ⊕ {x ↦ (D x ⊓ [[e]]♯ D)}

We can choose an analogous abstract edge effect for Zero (x ≠ e).
Figure 1.24 shows the improvement that the compiler achieves for Example 1.9.5.

1.10 Interval Analysis


Constant propagation attempts to determine, for each program point v, the values
that the variables have every time execution reaches v. Often, a variable
has different values at a program point v when execution reaches v several times.
Interval analysis makes the best of this situation by computing an interval enclosing
all possible values that the variable may have when execution reaches v.
Example 1.10.1 Consider the following program:
    for (i ← 0; i < 42; i++) a[i] ← i;
Programming languages such as Java require that array indices always lie within
the declared bounds of the array. Let the int array a begin at address A, and let it have
the bounds 0 and 41. The code generated for the program above can look as follows:

[Fig. 1.25 The order on intervals [l1, u1] ⊑ [l2, u2]]

    i ← 0;
    B : if (i < 42) {
            if (0 ≤ i ∧ i < 42) {
                A1 ← A + i;
                M[A1] ← i;
                i ← i + 1;
            } else goto error;
            goto B;
        }
The condition of the outer loop makes the inner bounds check superfluous. It will
never trigger the jump to program point error. The inner bounds check can therefore
be eliminated.


Interval analysis generalizes constant propagation by replacing the domain Z⊤ for
the values of variables by a domain of intervals. The set of all intervals is given by:

    I = {[l, u] | l ∈ Z ∪ {−∞}, u ∈ Z ∪ {+∞}, l ≤ u}

l stands for lower and u for upper bound. According to this definition, each interval
represents a nonempty set of integers. There exists a natural order ⊑ on intervals:

    [l1, u1] ⊑ [l2, u2]    iff    l2 ≤ l1 ∧ u1 ≤ u2

Figure 1.25 represents the geometric intuition behind this definition.


The least upper bound and the greatest lower bound of intervals are defined as
follows:

    [l1, u1] ⊔ [l2, u2] = [min{l1, l2}, max{u1, u2}]
    [l1, u1] ⊓ [l2, u2] = [max{l1, l2}, min{u1, u2}],   provided that max{l1, l2} ≤ min{u1, u2}

The geometric intuition for these operations is illustrated in Fig. 1.26: the least upper
bound is depicted on top of the two given intervals, the greatest lower bound below
them.

[Fig. 1.26 Two intervals, in the middle, and their least upper bound, on top, and their greatest lower
bound, on the bottom]

Like Z⊤, the set I together with the partial order ⊑ is a partially ordered set, but not
a complete lattice. It has no least element since the empty set is explicitly excluded.
Least upper bounds therefore only exist for nonempty sets of intervals. Also, the
greatest lower bound operation is only defined if the intervals overlap. There is one

important difference, though, between the partial order Z⊤ and the partial order I.
The partial order Z⊤ has only finite strictly ascending chains, while I has ascending
chains that never stabilize, for instance, the following:

    [0, 0] ⊑ [0, 1] ⊑ [−1, 1] ⊑ [−1, 2] ⊑ . . .
The natural description relation between integer values and integer intervals is
given by:

    z Δ [l, u]    iff    l ≤ z ≤ u

This description relation leads to the following concretization function:

    γ [l, u] = {z ∈ Z | l ≤ z ≤ u}

Example 1.10.2 We have:

    γ [0, 7] = {0, . . . , 7}
    γ [0, +∞] = {0, 1, 2, . . .}


Interval analysis needs to calculate with intervals. These calculations are expressed
in terms of abstract versions of the arithmetic, Boolean, and comparison operators.
The sum of two intervals should contain all values that result when any two values
from the argument intervals are added. We therefore define:

    [l1, u1] +♯ [l2, u2] = [l1 + l2, u1 + u2]    where
    −∞ + _ = −∞
    +∞ + _ = +∞

Note that the value −∞ + (+∞) never needs to be computed.

Negation on intervals is defined as:

    −♯ [l, u] = [−u, −l]
To define multiplication on intervals is more difficult. The smallest interval must be
determined that contains all products of values taken from two argument intervals.
A rather simple definition that saves many case distinctions is the following:
    [l1, u1] ·♯ [l2, u2] = [a, b]    where

    a = min{l1·l2, l1·u2, u1·l2, u1·u2}
    b = max{l1·l2, l1·u2, u1·l2, u1·u2}
Example 1.10.3 We check the plausibility of this definition of interval multiplication
by inspecting a few examples.
    [0, 2] ·♯ [3, 4] = [0, 8]
    [−1, 2] ·♯ [3, 4] = [−4, 8]
    [−1, 2] ·♯ [−3, 4] = [−6, 8]
    [−1, 2] ·♯ [−4, −3] = [−8, 4]


To define division on intervals is really problematic! Let [l1, u1] /♯ [l2, u2] = [a, b].
If 0 is not contained in the denominator interval we can define:

    a = min{l1/l2, l1/u2, u1/l2, u1/u2}
    b = max{l1/l2, l1/u2, u1/l2, u1/u2}

However, if 0 is contained in the denominator interval, that is, if l2 ≤ 0 ≤ u2, a
run-time error cannot be excluded. The semantics of our example language does not
state what happens in the case of such a run-time error. We assume for simplicity
that any value is a legal result. We therefore define for this case:

    [a, b] = [−∞, +∞]
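The interval operators defined so far can be sketched compactly in Python; the encoding of intervals as pairs and of ±∞ as float('inf') is our own, and the multiplication below assumes finite bounds so that the problematic product 0 · ∞ does not arise:

# Interval arithmetic as defined above (a sketch). Intervals are pairs (l, u),
# with float('-inf') and float('inf') representing the infinite bounds.

from math import inf

def add(i1, i2):
    (l1, u1), (l2, u2) = i1, i2
    return (l1 + l2, u1 + u2)                 # -inf + finite = -inf, inf + finite = inf

def neg(i):
    l, u = i
    return (-u, -l)

def mul(i1, i2):
    (l1, u1), (l2, u2) = i1, i2               # assumes finite bounds here,
    p = [l1 * l2, l1 * u2, u1 * l2, u1 * u2]  # so that 0 * inf cannot occur
    return (min(p), max(p))

def div(i1, i2):
    (l1, u1), (l2, u2) = i1, i2
    if l2 <= 0 <= u2:                         # 0 in the denominator interval:
        return (-inf, inf)                    # a run-time error cannot be excluded
    q = [l1 / l2, l1 / u2, u1 / l2, u1 / u2]
    return (min(q), max(q))

assert mul((0, 2), (3, 4)) == (0, 8)          # the cases of Example 1.10.3
assert mul((-1, 2), (3, 4)) == (-4, 8)
assert mul((-1, 2), (-3, 4)) == (-6, 8)
assert mul((-1, 2), (-4, -3)) == (-8, 4)
assert add((0, 2), (1, inf)) == (1, inf)
assert div((1, 2), (-1, 1)) == (-inf, inf)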
Besides abstract versions of the arithmetic operators, we need abstract versions of
comparison operators. The abstract version of the comparison for equality is quite
different from the natural equality of intervals. Abstract comparisons of intervals
can have the values true, false, or an unknown Boolean value, which describes both true
and false. According to the semantics of our example language, the value false is
represented by 0, while the value true (when returned by a Boolean operator) should
be represented by 1. The corresponding intervals are [0, 0] and [1, 1]. Accordingly,


the unknown Boolean value true ⊔ false is represented by the interval [0, 0] ⊔ [1, 1] =
[0, 1].
The value true results for two identical singleton intervals. The result must be
false for two disjoint intervals because the comparison can never deliver true for any
pair of values from the argument intervals. true ⊔ false must be the result in the case
of nondisjoint, nonsingleton intervals because there are pairs of identical values in
the argument intervals, but also pairs of nonidentical values.

    [l1, u1] =♯ [l2, u2] = [1, 1]    if l1 = u1 = l2 = u2
                           [0, 0]    if u1 < l2 ∨ u2 < l1
                           [0, 1]    otherwise

Example 1.10.4 We use some examples to convince ourselves that this definition
makes sense:
    [42, 42] =♯ [42, 42] = [1, 1]
    [1, 2] =♯ [3, 4] = [0, 0]
    [0, 7] =♯ [0, 7] = [0, 1]


We now treat just one more comparison operator, namely the operation <. We have:

    [l1, u1] <♯ [l2, u2] = [1, 1]    if u1 < l2
                           [0, 0]    if u2 ≤ l1
                           [0, 1]    otherwise

Example 1.10.5 Some example calculations illustrate the abstract comparison
operator:

    [1, 2] <♯ [9, 42] = [1, 1]
    [0, 7] <♯ [0, 7] = [0, 1]
    [3, 4] <♯ [1, 3] = [0, 0]
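A corresponding sketch of the abstract comparisons, again with intervals as Python pairs, reproduces the examples above:

# Abstract comparison of intervals (a sketch). The result is again an interval:
# (1, 1) for "definitely true", (0, 0) for "definitely false", (0, 1) for "unknown".

def abs_eq(i1, i2):
    (l1, u1), (l2, u2) = i1, i2
    if l1 == u1 == l2 == u2:
        return (1, 1)                # two identical singleton intervals
    if u1 < l2 or u2 < l1:
        return (0, 0)                # disjoint intervals can never be equal
    return (0, 1)

def abs_lt(i1, i2):
    (l1, u1), (l2, u2) = i1, i2
    if u1 < l2:
        return (1, 1)
    if u2 <= l1:
        return (0, 0)
    return (0, 1)

assert abs_eq((42, 42), (42, 42)) == (1, 1)
assert abs_eq((1, 2), (3, 4)) == (0, 0)
assert abs_eq((0, 7), (0, 7)) == (0, 1)
assert abs_lt((1, 2), (9, 42)) == (1, 1)
assert abs_lt((3, 4), (1, 3)) == (0, 0)
assert abs_lt((0, 7), (0, 7)) == (0, 1)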


Starting with the partial order (I, ⊑), we construct a complete lattice for abstract
variable bindings. This procedure is analogous to the construction of the complete
lattice for constant propagation:

    DI = (Vars → I)⊥ = (Vars → I) ∪ {⊥}

for a new element ⊥, which is the least element and again denotes unreachability.
We define a description relation Δ between concrete and abstract variable bindings in
the natural way by:


    ρ Δ D    iff    D ≠ ⊥  ∧  ∀x ∈ Vars : (ρ x) Δ (D x)

This leads to a corresponding description relation between concrete states (ρ, μ)
and abstract variable bindings:

    (ρ, μ) Δ D    iff    ρ Δ D

The abstract evaluation of expressions is also defined in analogy to the abstract
evaluation for constant propagation. It holds for all expressions e that

    ([[e]] ρ) Δ ([[e]]♯ D)    if    ρ Δ D

Next to define are the abstract edge effects for interval analysis. They also look quite
like the ones for constant propagation, apart from the fact that they now calculate
over interval domains:

    [[;]]♯ D             = D
    [[x ← e]]♯ D         = D ⊕ {x ↦ [[e]]♯ D}
    [[x ← M[e]]]♯ D      = D ⊕ {x ↦ ⊤}
    [[M[e1] ← e2]]♯ D    = D

    [[NonZero (e)]]♯ D   = ⊥    if [0, 0] = [[e]]♯ D
                           D    otherwise
    [[Zero (e)]]♯ D      = ⊥    if [0, 0] ⋢ [[e]]♯ D
                           D    if [0, 0] ⊑ [[e]]♯ D

if D ≠ ⊥. Here, ⊤ denotes the interval [−∞, +∞].
We assume, like in the case of constant propagation, that nothing is known about
the values of variables at the entry to the program. This is expressed by associating
the largest lattice element, ⊤ = {x ↦ [−∞, +∞] | x ∈ Vars}, with the program entry.


For the proof of correctness of interval analysis we formulate an invariant that is
very similar to the invariant (K) of constant propagation, the only difference being
that all computations are on intervals instead of on Z⊤. The proof uses the same
argumentation, so we omit it here.
Conditions are an essential source of information for interval analysis, even more
so than they are for constant propagation. Comparisons of variables with constants
can be very fruitfully exploited. Let us assume that e is of the form x □ e1 for
comparison operators □ ∈ {=, <, >}. We define:

    [[NonZero (e)]]♯ D = ⊥     if [0, 0] = [[e]]♯ D
                         D1    otherwise

where


[Fig. 1.27 The least solution of the interval analysis of Example 1.10.1]

    D1 =  D ⊕ {x ↦ (D x) ⊓ ([[e1]]♯ D)}       if e ≡ (x = e1)
          D ⊕ {x ↦ (D x) ⊓ [−∞, u − 1]}       if e ≡ (x < e1), [[e1]]♯ D = [_, u]
          D ⊕ {x ↦ (D x) ⊓ [l + 1, +∞]}       if e ≡ (x > e1), [[e1]]♯ D = [l, _]
A condition NonZero(x < e1) allows cutting the interval [u, +∞] from the interval
for x, where u is the largest possible value in the interval for e1. We define correspondingly:

    [[Zero (e)]]♯ D = ⊥     if [0, 0] ⋢ [[e]]♯ D
                      D1    otherwise

where

    D1 =  D ⊕ {x ↦ (D x) ⊓ [−∞, u]}           if e ≡ (x > e1), [[e1]]♯ D = [_, u]
          D ⊕ {x ↦ (D x) ⊓ [l, +∞]}           if e ≡ (x < e1), [[e1]]♯ D = [l, _]
          D                                   if e ≡ (x = e1)
Note that greatest lower bounds of intervals are used here. These greatest lower
bounds are defined in this context, because otherwise the abstract evaluation of the
condition would have returned an interval not subsuming [0, 0].
Let us regard the program of Example 1.10.1. Its control-flow graph and the least
solution of the system of inequalities for the interval analysis of variable i are shown
in Fig. 1.27.
The partial order I has ascending chains that never stabilize. It is therefore not
clear how to determine the least solution of the system of inequalities for interval
analysis. In our example, fixed-point iteration would terminate, but only after 43
rounds. Other programs, though, can be constructed where round-robin iteration for
interval analysis would not terminate.


Apparently, we need new techniques to deal with complete lattices that have
infinite ascending chains. The inventors of abstract interpretation, Patrick and Radhia
Cousot, also invented the necessary techniques, widening and narrowing. Their first
publication presented interval analysis with widening as an example.
The idea of widening is to speed up fixed-point iteration, albeit at the cost of a
possibly reduced precision. The measure to speed up the iteration guarantees that
each abstract value of an unknown can only undergo finitely many changes.
One idea for widening in interval analysis is not to allow arbitrary enlargements of
intervals. No enlargements from finite to finite intervals are admitted. An admissible
ascending chain of intervals could look like the following:

    [3, 17] ⊑ [3, +∞] ⊑ [−∞, +∞]
Let us formalize the general approach of widening. Let

    xi ⊒ fi (x1, . . . , xn),    i = 1, . . . , n

again be a system of inequalities over a complete lattice D. We consider the accumulating
system of equations associated with this system of inequalities:

    xi = xi ⊔ fi (x1, . . . , xn),    i = 1, . . . , n

A tuple x = (x1, . . . , xn) ∈ Dⁿ is a solution of the system of inequalities if and
only if it is a solution of the associated accumulating system of equations. The
reformulation of the system of inequalities as an accumulating system of equations
alone does not solve our problem: fixed-point iteration for the accumulating system,
as for the original system, may not necessarily terminate. Therefore, we replace the
operator ⊔ of the accumulating system by a widening operator, ⊔̄, which can be used
to enforce termination. As a result, we obtain the system of equations:

    xi = xi ⊔̄ fi (x1, . . . , xn),    i = 1, . . . , n

The new operator ⊔̄ must satisfy:

    v1 ⊔ v2 ⊑ v1 ⊔̄ v2

The values accumulated for an unknown xi during a fixed-point iteration for the
system with widening therefore grow at least as fast as the values for the fixed-point
iteration for the system without widening. Round-robin iteration for the modified
system, if it terminates, still computes a solution of the accumulating system of
equations and therefore also for the original system of inequalities.
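The following sketch shows one way round-robin iteration over such a system of equations with widening might be organized; the function and parameter names are ours, and the tiny interval example at the end only serves to demonstrate termination:

# Round-robin iteration for the system  xi = xi widen fi(x1, ..., xn)  (a sketch).
# 'rhs' is a list of right-hand-side functions over the whole vector of unknowns,
# 'bottom' the least element, and 'widen(old, new)' a widening operator.

from math import inf

def solve_with_widening(rhs, bottom, widen):
    n = len(rhs)
    x = [bottom] * n
    changed = True
    while changed:
        changed = False
        for i in range(n):
            new = widen(x[i], rhs[i](x))
            if new != x[i]:
                x[i] = new
                changed = True
    return x

def widen_interval(old, new):
    if old is None: return new
    if new is None: return old
    (l1, u1), (l2, u2) = old, new
    return (l1 if l1 <= l2 else -inf, u1 if u1 >= u2 else inf)

def join(a, b):
    if a is None: return b
    if b is None: return a
    return (min(a[0], b[0]), max(a[1], b[1]))

# One unknown that keeps growing: x0 must contain [0, 0] and x0 + [1, 1].
def f0(x):
    v = x[0]
    inc = None if v is None else (v[0] + 1, v[1] + 1)
    return join((0, 0), inc)

assert solve_with_widening([f0], None, widen_interval) == [(0, inf)]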
We apply the general method of widening to interval analysis and the complete
lattice DI = (Vars → I)⊥. A widening operator ⊔̄ for this complete lattice is
defined by:

    ⊥ ⊔̄ D = D ⊔̄ ⊥ = D

and for D1 ≠ ⊥ ≠ D2 by:

    (D1 ⊔̄ D2) x = (D1 x) ⊔̄ (D2 x)

where  [l1, u1] ⊔̄ [l2, u2] = [l, u]  such that

    l = l1     if l1 ≤ l2            u = u1     if u1 ≥ u2
        −∞     otherwise                 +∞     otherwise
The widening operator for variable bindings is based on a widening operator for
intervals. During fixed-point iteration, the left operand is the old value while
the right operand is the new value. Therefore, the operator treats its two arguments
differently and is, thus, not commutative.
Example 1.10.6 Here are some iteration steps:
    [0, 2] ⊔̄ [1, 2] = [0, 2]
    [1, 2] ⊔̄ [0, 2] = [−∞, 2]
    [1, 5] ⊔̄ [3, 7] = [1, +∞]
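In Python, the widening operator on intervals and its lift to abstract variable bindings might be sketched as follows (None stands for ⊥, float('inf') for ∞; the checked values are those of Example 1.10.6):

# The interval widening operator and its lift to abstract variable bindings
# (a sketch; None represents the unreachable binding, float('inf') represents ∞).

from math import inf

def widen_interval(old, new):
    (l1, u1), (l2, u2) = old, new
    l = l1 if l1 <= l2 else -inf     # the lower bound may only jump to -inf
    u = u1 if u1 >= u2 else inf      # the upper bound may only jump to +inf
    return (l, u)

def widen_binding(D1, D2):
    if D1 is None: return D2         # bottom widened with D gives D
    if D2 is None: return D1
    return {x: widen_interval(D1[x], D2[x]) for x in D1}

# The examples of Example 1.10.6; note that the operator is not commutative.
assert widen_interval((0, 2), (1, 2)) == (0, 2)
assert widen_interval((1, 2), (0, 2)) == (-inf, 2)
assert widen_interval((1, 5), (3, 7)) == (1, inf)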


The widening operator, in general, does not deliver the least upper bound, but only
some upper bound. The values of the unknowns therefore may grow faster. A practical widening operator should be chosen in a way that guarantees that the resulting
ascending chains eventually stabilize so that fixed-point iteration terminates. The
widening operator that we have presented guarantees that each interval can grow at
most two times. Therefore, the number of iteration steps of round-robin iteration
for a program with n program points is bounded by O(n · #Vars).
In general, the starting point is a complete lattice with infinite ascending chains
together with a system of inequalities over this lattice. In order to determine some
(hopefully nontrivial) solution for this system, we first rewrite it into an equivalent
accumulating system of equations. Then we replace the least-upper bound operation
of the accumulation with a widening operator. This operator speeds up iteration and
enforces termination of the fixed-point iteration by admitting only a finite number of
changes to the values of the unknowns.
The design of such widening operators is black magic. On the one hand, the
widening operator needs to radically lose information in order to guarantee termination.
On the other hand, it should keep enough relevant information such that the
results of the analysis still have some value. Figure 1.28 shows round-robin iteration
for the program of Example 1.10.1. The iteration terminates rather quickly as we
have hoped, but with a disappointing result. The analysis loses all knowledge of
upper bounds. An elimination of the index out of bounds check is not possible.
Apparently, information is thrown away too generously. We therefore need to
improve on this naive procedure. Some thinking reveals that the widening operator
should be applied more economically. It is not necessary to apply it for each unknown


[Fig. 1.28 Accelerated round-robin iteration for Example 1.10.1]

[Fig. 1.29 Feedback vertex sets for the control-flow graph of Example 1.10.1: I1 = {1} or I2 = {2}]

at each program point and still guarantee termination. It suffices to apply widening
at least once in each cycle of the control-flow graph.
A set I of nodes in a directed graph G is called a feedback vertex set if it contains
at least one node of each directed cycle in G. Round-robin iteration still terminates
if widening is applied only at the nodes of a feedback vertex set of the control-flow
graph.
Example 1.10.7 This idea is tried out on our program from Example 1.10.1.
Figure 1.29 shows example sets I1 and I2 of nodes, each of which contains one


node in the (unique) directed cycle of the program. For widening placed at node
1, round-robin iteration yields:
[Iteration table: widening at node 1 yields i ∈ [0, +∞] at program point 1, i ∈ [0, 41] at
points 2 to 5, i ∈ [1, 42] at point 6, and i ∈ [42, +∞] at point 8]

In fact, it is almost the least solution that is obtained. The only information lost
is the upper bound for loop variable i at program points 1 and 8.
For widening placed at the node 2, we obtain:
[Iteration table: widening at node 2 yields i ∈ [0, 42] at point 1 and i ∈ [42, 42] at point 8,
but only i ∈ [0, +∞] at point 2]

The analysis using this feedback vertex set obtains better information about variable
i at program points 1 and 8, but loses so much information at program point 2 that
it can no longer derive the nonreachability of program point 7.


This example shows that the restriction of widening to some relevant program points
may improve the precision of the analysis considerably. The example also shows that
it is not always clear where to apply widening to obtain the most precise information.
A complementary technique is now presented, narrowing.
Narrowing is a technique to gradually improve a possibly too imprecise solution.
As for widening, we first develop the general approach for arbitrary systems of
inequalities and then turn to how narrowing can be applied to interval analysis.
Let x be some solution to the system of inequalities

    xi ⊒ fi (x1, . . . , xn),    i = 1, . . . , n


Let us assume further that the right-hand sides fi are all monotonic and that F is the
associated function Dⁿ → Dⁿ. The monotonicity of F implies:

    x ⊒ F x ⊒ F² x ⊒ . . . ⊒ Fᵏ x ⊒ . . .
This iteration is called narrowing. Narrowing has the property that all tuples F i x that
are obtained after some iteration steps are solutions to the system of inequalities. This
also holds for narrowing by round-robin iteration. Termination is not a problem any
more: iteration can be stopped whenever the obtained information is good enough.
Example 1.10.8 Consider again the program of Example 1.10.1 where narrowing is
applied to the result produced by naive widening. We obtain:
[Iteration table: the narrowing iteration recovers the least solution of the interval analysis
shown in Fig. 1.27]



In fact, the optimal solution is obtained!

In our example, the narrowing following the widening completely compensates for
the widening's loss of information. This cannot always be expected. It is also possible
that narrowing needs a long time. It even may not terminate, namely if the lattice
has infinite descending chains. This is the case for the interval lattice. Termination,
though, can be enforced by accelerated narrowing. Let us assume that we are given
some solution of the system of inequalities

    xi ⊒ fi (x1, . . . , xn),    i = 1, . . . , n

We consider the following system of equations:

    xi = xi ⊓ fi (x1, . . . , xn),    i = 1, . . . , n
We start with a possibly too large solution. To improve, that is, to shrink, the values
for the unknowns, the contributions of the right-hand sides are used.
Let H : Dⁿ → Dⁿ be the function defined by H (x1, . . . , xn) = (y1, . . . , yn)
such that yi = xi ⊓ fi (x1, . . . , xn). If all fi are monotonic we have:

    Hⁱ x = Fⁱ x    for all i ≥ 0.


Now the operator ⊓ in the system of equations is replaced by a new operator ⊓̄,
which possesses the following property:

    a2 ⊑ a1    ⟹    a2 ⊑ (a1 ⊓̄ a2) ⊑ a1

We call the new operator the narrowing operator. The new operator does not necessarily
reduce the values as quickly as the greatest-lower-bound operator, but at least
returns values which are less than or equal to the old values.
In the case of interval analysis, a narrowing operator is obtained by allowing
interval bounds only to be improved by replacing infinite bounds with finite bounds.
This way, each interval can be improved at most twice. For variable bindings
D we define:

    ⊥ ⊓̄ D = D ⊓̄ ⊥ = ⊥

and for D1 ≠ ⊥ ≠ D2:

    (D1 ⊓̄ D2) x = (D1 x) ⊓̄ (D2 x)

where  [l1, u1] ⊓̄ [l2, u2] = [l, u]  with

    l = l2     if l1 = −∞            u = u2     if u1 = +∞
        l1     otherwise                 u1     otherwise

In the applications of the narrowing operator, the left operand is the value of the last
iteration step, while the right operand is the newly computed value. Therefore, the
narrowing operator does not treat both its operands in the same way and thus is not
necessarily commutative.
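A matching sketch of the accelerated narrowing operator, in the same encoding as the widening sketch above, could look like this:

# The accelerated narrowing operator for intervals and bindings (a sketch;
# None represents the unreachable binding, float('inf') represents ∞).

from math import inf

def narrow_interval(old, new):
    (l1, u1), (l2, u2) = old, new
    l = l2 if l1 == -inf else l1     # only an infinite bound may be improved
    u = u2 if u1 == inf else u1
    return (l, u)

def narrow_binding(D1, D2):
    if D1 is None or D2 is None:     # bottom narrowed with anything stays bottom
        return None
    return {x: narrow_interval(D1[x], D2[x]) for x in D1}

# The old (left) value comes from the previous iteration step, the new (right)
# value from re-evaluating the right-hand side.
assert narrow_interval((0, inf), (0, 41)) == (0, 41)   # infinite bound improved
assert narrow_interval((0, 50), (0, 41)) == (0, 50)    # finite bound kept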
Example 1.10.9 Let us apply the accelerated narrowing with round-robin iteration
to the program of Example 1.10.1. We obtain:
[Iteration table: accelerated narrowing likewise recovers the least solution of Fig. 1.27]
We observe that no information is lost despite the application of accelerated
narrowing.




Widening, in principle, also works for nonmonotonic right-hand sides in the system of
inequalities. However, narrowing requires monotonicity. Accelerated narrowing is
guaranteed to terminate if the narrowing operator admits only descending chains of
bounded length. In the case of interval analysis, our operator ⊓̄ is defined in a way
that each interval can be modified at most twice. This means that round-robin
iteration using this narrowing operator takes at most O(n · #Vars) rounds, where n
is the number of program points.

1.11 Alias Analysis


The analyses and transformations presented so far were concerned with variables.
The memory component M of our programming language was considered as one
large statically allocated array. This view is sufficient for analysis problems that
deal with variables and expressions only. Many programming languages, however,
offer dynamic allocation of anonymous objects and constructs to indirectly access
anonymous data objects through pointers (references). This section treats analyses
to deal with pointers and dynamically allocated memory. Therefore, we extend our
programming language by pointers, which point to the beginning of dynamically
allocated blocks of memory. We use small letters for int variables to distinguish
them from pointer variables, for which we use capital letters. The generic name z
can denote both int variables and pointer variables. There is one pointer constant,
null. As new statements in our language we introduce:
A statement R ← new(e) for an expression e and a pointer variable R. The
operator new() allocates a new block in memory and returns in R a pointer to
the beginning of this block. The size of this block is given by the value of the
expression e.
A statement z ← R[e] for a pointer variable R, an expression e, and a variable z.
The value of e is used as an index into the block to which R points and selects one
cell in this block. This index is assumed to be within the range of 0 and the size
of the block. The value in the indexed cell is then assigned to z.
A statement R[e1] ← e2 with a pointer variable R and expressions e1 and e2.
Expression e2's value is stored in the cell whose index is the value of e1 in the
block pointed to by R. Again, the index is assumed to lie within the range of 0 and
the size of the block pointed at by R.
We do not allow pointer arithmetic, that is, arithmetic operations on pointer values.
We also do not allow pointers to variables. To keep the presentation simple, we do not
introduce a type system that would distinguish int variables from pointer variables.
We just assume that, during runtime, int variables will only hold int values and pointer
variables only pointer values, and that for indexing and in arithmetic operations only
int values are used.
Pointer variables R1 and R2 are aliases of each other in some state if they have
the same value, that is, point to the same block in memory in the given state. We

68

1 Foundations and Intraprocedural Optimization

also say that R1 is an alias for R2 and vice versa. An important question about
programs written in a language with dynamic memory allocation is whether two
pointer variables possibly have the same value at some program point, that is, whether
the program may be in a state in which the two pointers are aliases. This problem
is called the may-alias problem. Another question is whether two pointer variables
always have the same value at a program point. This problem correspondingly is
called the must-alias problem.

Use of Alias Information


Here is an example of the use of alias information. The compiler may want to optimize
the statement x ← R[0] + 2 in the following code fragment:

    R[0] ← 0;
    S[0] ← 1;
    x ← R[0] + 2;
There are three different cases:
• Program analysis has found out that S and R cannot be aliases. In this case it may
  transform the assignment to x into x ← 2.
• It has found out that S and R are must-aliases. The compiler can transform the
  assignment into x ← 3.
• It is unknown whether S and R are aliases. In this case, the compiler cannot do
  any optimization.
The most important use of may-alias information is in dependence analysis. This
analysis determines information about the flow of values from definitions to uses
also in presence of dynamically allocated memory. Several optimizations attempt to
improve the efficiency of programs by reordering the statements of the program. One
such optimization, performed on the machine program by the compiler back-end, is
instruction scheduling, which tries to exploit the parallel processing capabilities of
modern processors. Reordering the statements or instructions of a program must not
change its semantics. A sufficient condition for semantics preservation in reordering
transformations is that the flow of values from definitions to uses is not changed.
Dependence analysis determines several types of dependences:
True dependence: A use of a resource, e.g., a variable or a memory cell, follows a
definition of the same resource without an intervening redefinition.
Output dependence: A definition of a resource follows another definition of the
same resource without intervening use or definition of that resource.
Antidependence: A definition of a resource follows a use of the same resource
without an intervening redefinition.
Any reordering of statements changing such dependences would be forbidden.
In a language with pointers, writing accesses to resources (definitions) and reading


accesses to resources (uses) can be performed indirectly through pointers. May-alias
information can then be interpreted as "could be the same resource" in the above
definitions of dependences.
Must-alias information allows optimizations to exploit the fact that the
access through a pointer R goes to a particular memory block. Must-alias
information for pointer variables R and R′ can then be used to infer that the access
through R goes to the same memory block as the access through R′. If we additionally know
that the corresponding index expressions are equal, we can infer that the accesses
even go to the identical memory cell. An extension of redundancy elimination to
memory operations is considered in Exercise 27.

Background: Over- and Underapproximations


There is an interesting observation we can make about may- and must-alias analysis.
Both analyses attempt to determine whether there exist memory cells to which two
pointers point. May-alias analysis computes an overapproximation, that is, a superset
of the set of existing alias relationships. This means it detects all cases of aliases that
happen in some execution, but it may also report some aliases that never occur.
A safe use of this information is the use of its complement. If two pointers are not in
the may-alias set they will never be aliased. Must-alias analysis, on the other hand,
computes an underapproximation, that is, a subset of the set of actually occurring
alias relationships. It will only report aliases at a program point that definitely exist
every time execution reaches this program point. It may, however, miss some aliases
occurring during program execution.
Now that we have met these two notions, we can also try to classify the analyses
we have met so far as either over- or underapproximations. Available-assignments
analysis computes an underapproximation of the set of assignments that are available
along all program execution paths. Live-variable analysis computes an overapproximation of the set of variables whose values are used later on. Constant propagation,
on the other hand, again determines an underapproximation of the set of invariant
bindings of variables to constants. Interval analysis computes an overapproximation
of the set of values a variable may have.
Formally, however, the abstract domains and their partial orders, denoted by ⊑,
are arranged in such a way that our analyses always compute overapproximations.
So, in the example of may-alias analysis, the partial order ⊑ of the lattice is the
subset relation, ⊆, and the lattice element ⊤, which represents no information, is the
set of all alias relationships. In the case of the must-alias analysis, ⊑ is the superset
relation, ⊇, and ⊤ is the empty set of alias relationships.

Some Programs Using Pointers


Example 1.11.1 A first example of a program using pointers is shown in Fig. 1.30.


[Fig. 1.30 A simple pointer-manipulating program and its control-flow graph:
    X ← new(2);
    Y ← new(2);
    X[0] ← Y;
    Y[1] ← 7; ]

[Fig. 1.31 Program state after the execution of the program of Fig. 1.30]

[Fig. 1.32 A program that reverses a list, and its control-flow graph:
    R ← null;
    A : if (T ≠ null) {
            H ← T;
            T ← T[0];
            H[0] ← R;
            R ← H;
            goto A;
        } ]

The program allocates two blocks. A pointer to the second block is stored at
address 0 of the first block. Value 7 is stored at address 1 of the second block.
Figure 1.31 shows the state of memory after the execution of this program.


Example 1.11.2 A somewhat more complex example is the program in Fig. 1.32,
which reverses a list pointed to by pointer variable T .


Although this program is short, it is by no means easy to convince oneself of


its correctness. It demonstrates that even short programs doing nontrivial pointer
manipulations are hard to understand and are prone to subtle errors.



Extension of the Operational Semantics


We modify the operational semantics of our programming language to serve as the
basis for the definition of a may-alias analysis. The memory component is no longer
a single potentially infinite array of memory cells, but a potentially infinite array of
blocks, each consisting of an array of memory cells.1 Each execution of the statement
new() makes available a new block. The size of these blocks is only known when the
program is executed. Each block consists of as many memory cells as are indicated
in the new() statement, where we assume for the semantics that during each program
execution, only those cells are accessed which have been allocated.
    Addr_h   = {null} ∪ {ref a | a ∈ {0, . . . , h − 1}}        addresses
    Val_h    = Addr_h ∪ Z                                       values
    Store_h  = (Addr_h × N0) → Val_h                            memory
    State_h  = (Vars → Val_h) × {h} × Store_h                   states
    State    = ⋃_{h ≥ 0} State_h

The program state has an integer component, h, which keeps track of how many
blocks have already been allocated, that is, how often the statement new() has been
executed. The set of values also contains, in addition to the integer numbers, addresses
of memory blocks. In each state this is an address between ref 0 and ref h 1, i.e.,
the ith allocated block is associated with address ref i 1. Addresses are values of
pointer variables. Recall that pointers may only point to the beginning of memory
blocks and not inside blocks. A program state consists of a variable binding and a
memory. The memory associates a value with each cell in each allocated block.
Let (ρ, h, μ) ∈ State be a program state. The concrete edge effects for the new
statements are:

    [[R ← new(e)]] (ρ, h, μ)   = (ρ ⊕ {R ↦ ref h}, h + 1,
                                  μ ⊕ {(ref h, i) ↦ null | i < [[e]] ρ})
    [[z ← R[e]]] (ρ, h, μ)     = (ρ ⊕ {z ↦ μ (ρ R, [[e]] ρ)}, h, μ)
    [[R[e1] ← e2]] (ρ, h, μ)   = (ρ, h, μ ⊕ {(ρ R, [[e1]] ρ) ↦ [[e2]] ρ})
The most complex operation is the operation new(). According to our semantics, it
performs the following steps:
1. It computes the size of the new block;
2. It provides the new block by incrementing h;
3. It initializes all memory cells in the new block with null (or any other value we
   could have selected);
4. It returns, in R, the address of the new block.

[Footnote 1: Note that this roughly corresponds to a change from considering the memory component
as a contiguous array containing directly addressed objects to a managed memory component where
new blocks are dynamically allocated.]
This semantics is very detailed since it works with absolute addresses. For some
purposes, it may be even too detailed. Consider, for example, the two following
program fragments:
    X ← new(4);                 Y ← new(4);
    Y ← new(4);                 X ← new(4);

After executing the left program fragment, X and Y have received the values
ref 1 and ref 2 while after executing the right program fragment, X and Y have
received the values ref 2 and ref 1. The two fragments, therefore, cannot be considered
as equivalent. In many cases, though, the semantics of a program is meant to be
independent of the precise values of addresses. In these cases, program states should
be considered as equal if they are equal up to some permutation of the addresses
appearing in the program states.

A Flow-Sensitive Points-to Analysis


A pointer variable may contain several different values at a program point when
program execution reaches this program point several times. We design a points-to
analysis, which determines for each pointer variable a superset of these values, that
is, all the addresses that the pointer variable may contain. After these supersets are
computed one can check whether two pointer variables have a nonempty intersection
of their possible values. Those for that this is the case may be aliases of each other.
Starting with the concrete semantics we define an analysis for this problem. The
analysis has to deal with potentially infinitely many concrete addresses created by
executing new operators in loops. It needs a way to abstract this potentially infinite
set to a set of bounded size. Our analysis uses allocation sites, that is, statements
in which a new operator occurs to partition the potentially infinite set into finitely
many sets, represented by abstract addresses. It also does not distinguish the contents
of the different cells within a block, but manages for each block a set of addresses
possibly contained in any one of its cells.
The analysis describes all addresses created by executing an edge (u, R ←
new(e), v) by one abstract address, which is identified with the starting point u
of the edge. We define:
    Addr♯   = Nodes                              abstract addresses = creation points
    Val♯    = 2^Addr♯                            abstract values
    Store♯  = Addr♯ → Val♯                       abstract memory
    State♯  = (Pointers → Val♯) × Store♯         abstract states


[Fig. 1.33 The abstract states for the program of Example 1.11.1]

Pointers ⊆ Vars is the set of pointer variables. The abstract states ignore all int
values and the special pointer constant null. We will use the generic name (D, M)
for abstract states. Abstract states have a canonical partial order, derived from set
inclusion:

    (D1, M1) ⊑ (D2, M2)   iff   (∀R ∈ Pointers. D1(R) ⊆ D2(R))
                                ∧ (∀u ∈ Addr♯. M1(u) ⊆ M2(u))
Example 1.11.3 Let us regard again the program of Example 1.11.1. Figure 1.33
shows the abstract states for the different program points. The analysis does not lose
any information in this example since each edge allocating a new block is only visited
once, and since each block is assigned an address only once.


We have seen above that the points-to analysis we are about to design will, in
general, have imprecise information about where a pointer variable or a pointer
expression point to. Let us consider how this lack of information propagates: It
propagates from the right side of an assignment to the left side, from a pointer
component in memory to the left side of a read if the analysis has already collected
several possible values for this pointer component. It may increase when the analysis
accumulates information for all possible target addresses for a write to memory.
The abstract edge effects for the points-to analysis are:

    [[(_, R1 ← R2, _)]]♯ (D, M)      = (D ⊕ {R1 ↦ D R2}, M)
    [[(u, R ← new(e), _)]]♯ (D, M)   = (D ⊕ {R ↦ {u}}, M)
    [[(_, R1 ← R2[e], _)]]♯ (D, M)   = (D ⊕ {R1 ↦ ⋃ {M a | a ∈ D R2}}, M)
    [[(_, R1[e1] ← R2, _)]]♯ (D, M)  = (D, M ⊕ {a ↦ (M a) ∪ (D R2) | a ∈ D R1})

All other statements do not change the abstract state.


The edge effects for those edge labels that allocate new blocks now depend on
the whole edge. Assignments to a variable overwrite the corresponding entry in the
variable binding D. This was what we had before considering pointers. Overwriting


the entry for a variable in an abstract variable binding is called destructive update.
Destructive update, although it may sound negative, leads to more precise information. In the presence of pointers, we resort to nondestructive updates since a pointer
variable or pointer expression at some program point may point to different memory cells when the program point is reached several times. Non-destructive update
accumulates all possibilities that cannot be excluded and may therefore lead to less
precise information.
For a read from a block in memory, the address is not necessarily known. To be
on the safe side, the new value of a pointer variable on the left side is defined as the
union of the contributions of all blocks whose abstract addresses the analysis has
collected for the right side.
For a write to memory, we need to take care of the case of multiple abstract
target addresses a, which may each correspond to a set of concrete target addresses.
Writing to memory can therefore not be recorded destructively, i.e., by overwriting.
Instead, the set of addresses forming the potential new abstract address is added to the
sets M a.
Without initializing new blocks the analysis would have to assume for each block
that it may contain any possible value. Only since the operation new() returns initialized blocks can the analysis produce any meaningful information about memory
contents. Alternatively, we could assume that a correct program execution would
never use the contents of an uninitialized memory cell as address. Program behavior
is exactly as in the case of a program where each cell of a newly allocated block is
initialized to null before the block is used.
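A possible Python rendering of these abstract edge effects, with sets of allocation sites as abstract addresses and a pair of dictionaries as the abstract state, is sketched below; the function names and the replay of Example 1.11.1 are our own illustration (for the replay we start from empty sets, which the new edges overwrite destructively anyway):

# Abstract edge effects of the flow-sensitive points-to analysis (a sketch).
# An abstract state is a pair (D, M): D maps pointer variables to sets of
# abstract addresses (allocation sites), M maps abstract addresses to such sets.

def assign(D, M, R1, R2):                 # R1 <- R2
    D1 = dict(D); D1[R1] = set(D[R2])
    return D1, M

def new(D, M, R, u):                      # R <- new(e) at source node u
    D1 = dict(D); D1[R] = {u}
    return D1, M

def load(D, M, R1, R2):                   # R1 <- R2[e]
    D1 = dict(D)
    D1[R1] = set().union(*[M.get(a, set()) for a in D[R2]])
    return D1, M

def store(D, M, R1, R2):                  # R1[e1] <- R2   (non-destructive!)
    M1 = dict(M)
    for a in D[R1]:
        M1[a] = M1.get(a, set()) | D[R2]
    return D, M1

# Replaying the program of Example 1.11.1 (abstract addresses 0 and 1
# are the source nodes of the two new edges):
D, M = {'X': set(), 'Y': set()}, {}
D, M = new(D, M, 'X', 0)
D, M = new(D, M, 'Y', 1)
D, M = store(D, M, 'X', 'Y')              # X[0] <- Y
assert D == {'X': {0}, 'Y': {1}} and M == {0: {1}}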

A System of Inequalities
A system of inequalities is derived from the control-flow graph of a program, based on
the abstract domain State♯ and the abstract edge effects. Nothing is known about the
values of pointer variables before program execution. No blocks are yet allocated.
The initial abstract state is therefore (D, M) with

    D x = ∅,     D R = Addr♯,     M a = ∅

for all int variables x, all pointer variables R, and all abstract addresses a.
Let P[v] for all program points v be the least solution of the system of inequalities.
This least solution associates with each program point v an abstract state P[v] =
(D, M) that delivers for each pointer variable R a superset of the abstract addresses
of memory blocks to which R may point when v is reached. In consequence, R is
not an alias of any other pointer variable R′ if (D R) ∩ (D R′) = ∅.
Note that we ignore the concrete value null in the abstract state. The error of
dereferencing a null pointer can therefore not be detected nor the absence of such an
error be proved.


We would like to prove the correctness of the analysis. Disappointingly, this proof
is not possible with respect to the operational semantics we started with. This is due
to the fact that different program executions may perform the hth allocation of a
block at different edges. However, we have already complained about our operational
semantics being too detailed as it uses absolute addresses. The number h of an
allocation should have no semantical significance. One way out of this problem
is to use an auxiliary semantics, which is instrumented with additional, helpful
information. The auxiliary semantics does not just use the values ref h, h ∈ N0,
as concrete addresses. Instead, it uses:

    Addr = {ref (u, h) | u ∈ Nodes, h ∈ N0}
Thus, the instrumented concrete semantics keeps track of the source node of the edge
at which a new block is allocated. The addresses grouped at the edges this way can
be easily mapped to abstract addresses. First, a proof of correctness with respect to
the instrumented semantics needs to be performed for the analysis. Then the equivalence
of the original and the instrumented semantics needs to be shown. Exercise 23 gives
the reader the opportunity to produce these two proofs.

A Flow-Insensitive May-Alias Analysis


The points-to analysis described so far keeps one abstract memory for each program
point. It may be quite expensive if there are many abstract addresses. On the other
hand, the abstract edge effects do not employ destructive operators on the abstract
memory. Therefore the abstract memories at all program points within a loop are the
same! In order to reduce the complexity of the analysis, we therefore may prefer to
compute just one abstract state (D, M) and hope not to lose too much information.
The single abstract state then describes the concrete states at all program points. This
is an example of flow-insensitive analysis.
Example 1.11.4 Let us consider again the program of Example 1.11.1. The expected
result of the analysis is shown in Fig. 1.34. No loss of information is encountered
since each program variable and each memory cell receives a value only once. 


An Efficient Implementation
The implementation of the flow-insensitive analysis merits some more consideration. We introduce one unknown P[R] per program variable R and one unknown
P[a] per abstract address a instead of considering the one global abstract state as a
whole.


[Fig. 1.34 The result of the flow-insensitive analysis of the program of Example 1.11.1]

An edge (u, lab, v) of the control-flow graph leads to the following inequalities:

    Lab               Inequalities
    R1 ← R2           P[R1] ⊇ P[R2]
    R ← new(e)        P[R] ⊇ {u}
    R1 ← R2[e]        P[R1] ⊇ ⋃ {P[a] | a ∈ P[R2]}
    R1[e] ← R2        P[a] ⊇ (a ∈ P[R1]) ? P[R2] : ∅      for all a ∈ Addr♯
All other edges do not have an effect. In this system of inequalities, the inequalities
for assignments to pointer variables and read operations are no longer destructive.
We assume that all pointer variables are initialized with null at program start to be
able to compute nontrivial information for pointer variables. Alternatively, we could
assume that the first access will only happen after an initialization. The system of
inequalities has a least solution P1[R], R ∈ Pointers, P1[a], a ∈ Addr♯, since the
right-hand sides of the inequalities are monotonic functions over the set of addresses. This
least solution can again be determined by round-robin iteration.
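One way to organize this round-robin iteration is sketched below; the edge encoding and the constraint kinds ('copy', 'new', 'load', 'store') are our own naming, not the book's:

# Round-robin solving of the flow-insensitive points-to constraints (a sketch).
# Unknowns are P[R] for pointer variables and P[a] for abstract addresses,
# all valued in sets of abstract addresses (allocation-site nodes).

def solve(edges, pointers, addrs):
    P = {z: set() for z in list(pointers) + list(addrs)}
    changed = True
    while changed:
        changed = False
        for (u, lab) in edges:
            kind = lab[0]
            if kind == 'copy':            # R1 <- R2 :   P[R1] >= P[R2]
                _, r1, r2 = lab; new = P[r1] | P[r2]
            elif kind == 'new':           # R <- new(e): P[R] >= {u}
                _, r1 = lab; new = P[r1] | {u}
            elif kind == 'load':          # R1 <- R2[e]: P[R1] >= union of P[a], a in P[R2]
                _, r1, r2 = lab
                new = P[r1].union(*[P[a] for a in P[r2]])
            else:                         # 'store', R1[e] <- R2: update all P[a], a in P[R1]
                _, r1, r2 = lab
                for a in P[r1]:
                    if not P[r2] <= P[a]:
                        P[a] |= P[r2]; changed = True
                continue
            if new != P[r1]:
                P[r1] = new; changed = True
    return P

# The program of Example 1.11.1, edges written as (source node, label):
edges = [(0, ('new', 'X')), (1, ('new', 'Y')), (2, ('store', 'X', 'Y'))]
P = solve(edges, ['X', 'Y'], [0, 1])
assert P['X'] == {0} and P['Y'] == {1} and P[0] == {1}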
In order to prove the correctness of a solution s♯ ∈ State♯ of the system of
inequalities, it suffices to show for each edge k of the control-flow graph that the
following diagram commutes:

[Diagram: the concrete edge effect [[k]] and the abstract edge effect [[k]]♯ are compatible
with the description relation Δ]

where Δ is a description relation between concrete and abstract values. The system
of inequalities has the size O(k · n), if k is the number of needed abstract addresses
and n is the number of edges in the control-flow graph. The values which the
fixed-point algorithm computes are sets of a cardinality less than or equal to k. Therefore,


[Fig. 1.35 The equivalence classes of the relation ≡ for the program of Example 1.11.1]

the values of each of the unknowns P1[R] and P1[a] can change at most k times.
Given the low precision of the flow-insensitive analysis, this method is still rather
expensive. Also, for may-alias analysis, one is not interested in the sets P1[R] or
P1[a] themselves, but in whether or not their pairwise intersection is nonempty.
In order to do so, we consider two radical modifications of the flow-insensitive
points-to analysis. First, we replace the set Addr♯ of abstract addresses with the set
of all expressions R[], R a pointer variable. The abstract address R[] then represents
all memory blocks possibly pointed at by R. Second, we no longer consider inclusions
of abstract values but equivalences. Let Z = {R, R[] | R ∈ Pointers}. Two
elements from Z should be regarded as equivalent if they may represent variables or memory
blocks which may contain the same address. Accordingly, the idea is to compute an
equivalence relation ≡ on the set Z of variables and abstract addresses.
Example 1.11.5 Consider again the trivial program of Example 1.11.1. Figure 1.35
shows an equivalence relation for this example. The equivalence relation directly
indicates which pointer expressions possibly evaluate to the same addresses different
from null.


Let E be the set of equivalence relations over Z. We regard an equivalence relation
≡1 as less than or equal to another equivalence relation ≡2 if ≡2 contains more
equivalences than ≡1, that is, if ≡1 ⊆ ≡2. E is a complete lattice with respect to this
order.
Like the preceding points-to analysis, the new alias analysis is flow-insensitive,
that is, one equivalence relation is computed for the whole program. As any equivalence
relation, ≡ can be represented as the partition π = {P1, . . . , Pm} of pointer
variables and abstract addresses that are considered as equivalent. Let ≡1 and ≡2 be
equivalence relations and π1 and π2 be the associated partitions. Then ≡1 ⊆ ≡2
holds if and only if the partition π1 is a refinement of the partition π2, that is, if each
equivalence class P1 ∈ π1 is contained in an equivalence class P2 ∈ π2.
An individual equivalence class P ⊆ Z of an equivalence relation should be
identified by a representative p ∈ P. For simplicity, we choose this representative
in Pointers whenever P ∩ Pointers ≠ ∅. Let π = {P1, . . . , Pr} be a partition and pi

be the representative of the equivalence class Pi. The analysis we aim at needs the
following two operations on π:

    Pointers   find (π, p)            returns the representative of the class Pi where p ∈ Pi
    Partition  union (π, pi1, pi2)    returns {Pi1 ∪ Pi2} ∪ {Pj | i1 ≠ j ≠ i2},
                                      i.e., forms the union of the two represented classes.
If R1, R2 ∈ Pointers are equivalent then we regard R1[] and R2[] as equivalent.
Therefore, the operation union will be applied recursively:
    Partition union♯ (π, q1, q2) {
        pi1 ← find (π, q1);
        pi2 ← find (π, q2);
        if (pi1 = pi2) return π;
        else {
            π ← union (π, pi1, pi2);
            if (pi1, pi2 ∈ Pointers) return union♯ (π, pi1[], pi2[]);
            else return π;
        }
    }
The operation union as well as the derived operation union♯ are monotonic on
partitions. The alias analysis using these operations iterates exactly once over the
edges of the control-flow graph and unifies the left and the right side when it
encounters an edge at which pointers are changed:

    π ← {{R}, {R[]} | R ∈ Pointers};
    forall ((_, lab, _) edge)  π ← [[lab]]♯ π;
Thereby, we have:

    [[R1 ← R2]]♯ π      = union♯ (π, R1, R2)
    [[R1 ← R2[e]]]♯ π   = union♯ (π, R1, R2[])
    [[R1[e] ← R2]]♯ π   = union♯ (π, R1[], R2)
    [[lab]]♯ π          = π                          otherwise

Example 1.11.6 Consider again the program of Example 1.11.1. Figure 1.36 shows
the steps of the new analysis for this program.


Example 1.11.7 Let us also look at the result of the flow-insensitive alias analysis for
the program of Example 1.11.2 to reverse lists in Fig. 1.37. The result of the analysis
is not very precise: All pointer variables and all blocks may be aliases of each
other.




Fig. 1.36 The flow-insensitive alias analysis for the program of Example 1.11.1: starting from the partition {{X}, {Y}, {X[]}, {Y[]}}, the edges (0, 1), (1, 2), and (3, 4) leave the partition unchanged, while the edge (2, 3) for X[0] ← Y merges the classes of Y and X[], yielding {{X}, {Y, X[]}, {Y[]}}

Fig. 1.37 Result of the analysis for Example 1.11.2: the final partition is {{H, T, R, H[], R[], T[]}}, i.e., all pointer variables and all blocks may be mutual aliases

The alias analysis iterates once over the edges. This is no accident. A second
iteration would not change the partition, see Exercise 24. This method computes the
least solution of the system of inequalities over partitions:
π ⊒ [[lab]]♯ π,    (_, lab, _) an edge of the control-flow graph
The correctness proof again assumes that all accesses to cells only happen after these
have been initialized. Let us now estimate the needed effort for the alias analysis.
Let k be the number of pointer variables and n be the number of edges in the control-flow graph. Each edge is considered exactly once. For each edge, there is at most one call to the function union♯. Each call to union♯ performs two calls of the function find. The operation union, and possibly also recursively the function union♯, are only called if these calls to find return representatives of two different equivalence classes.
At the beginning, there are 2k equivalence classes. Each call to union decreases the


Fig. 1.38 The partition π = {{0, 1, 2, 3}, {4}, {5, 6, 7}} of the set {0, . . . , 7} represented by parent links

number of equivalence classes. So, at most 2k − 1 calls of the operation union are possible and therefore at most O(n + k) calls of the operation find.
We need an efficient data structure to support the operations union and find. Such
union-find data structures are well-known in the literature. We present a particularly
simple implementation, invented by Robert E. Tarjan. A partition of a finite base set
U is represented as a directed forest:
• For each u ∈ U there exists a parent link F[u].
• An element u is a root in the directed forest if the parent link points from u to u, i.e., if F[u] = u.
All nodes that may indirectly reach the same root through their parent links form an
equivalence class, whose representative is the root.
Figure 1.38 shows the partition {{0, 1, 2, 3}, {4}, {5, 6, 7}} of the base set U =
{0, . . . , 7}. The lower part shows the representation by an array F with parent links,
which are visualized above the array.
The operations find and union can be easily implemented in this representation.
find : To find the representative of the equivalence class of an element u it suffices
to follow the parent links starting at u until an element u  is found whose parent
link points to u  .
union : To form the union of the equivalence classes of two representatives u 1 and
u 2 the only action needed is to make the parent link of one of the elements point
to the other element. The result of applying the union operation to the example
partition of Fig. 1.38 is shown in Fig. 1.39.
The operation union only requires O(1) steps. The costs of the operation find,
however, are proportional to the length of the path from the element at the start of the
search to the root of the associated tree. This path can be very long in the worst case.
An idea to prevent long paths is to always hang the smaller tree below the bigger
one. Using this strategy in the example of Fig. 1.38, the operation union would set


Fig. 1.39 The result of applying the operation union(π, 4, 7) to the partition of Fig. 1.38

Fig. 1.40 The result of applying the operation union(π, 4, 7) to the partition of Fig. 1.38, taking the size of the involved equivalence classes into account

the parent link of the element 4 to 7 and not the other way round (Fig. 1.40). The
algorithm needs to account for the size of the equivalence classes in order to know
which class is the smaller and which is the bigger. This makes the costs of the union
operation slightly more expensive. Let n be the number of union operations that are
applied to the initial partition π0 = {{u} | u ∈ U}. The length of the paths to a
root is then at most O(log(n)). Accordingly, each find operation has costs at most
O(log(n)).
Amazingly enough, this data structure can be improved even further. To do this,
the algorithm redirects the parent links of all visited elements directly to the root
of the associated tree during a find operation. This increases the costs of each find
operation by a small constant factor, but decreases the costs of later find operations. Figure 1.41 shows how the paths to the root are shortened when this idea
is used.


Fig. 1.41 Path compression by the find operation for 6

The left tree has paths of length up to 4. A find inquiry for node 6 turns nodes
3, 7, 5 and 6 into direct successors of root 1. This shortens the paths in the example
to lengths of at most 2.
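As a concrete illustration (ours, with invented names, not the book's code), a union–find structure with both union by size and path compression can be written in Python as follows:

class UnionFind:
    def __init__(self, elements):
        self.parent = {u: u for u in elements}    # the parent links F[u]
        self.size = {u: 1 for u in elements}

    def find(self, u):
        root = u
        while self.parent[root] != root:          # follow parent links up to the root
            root = self.parent[root]
        while self.parent[u] != root:             # path compression: redirect visited nodes to the root
            nxt = self.parent[u]
            self.parent[u] = root
            u = nxt
        return root

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return ru
        if self.size[ru] < self.size[rv]:         # hang the smaller tree below the bigger one
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        return ru

uf = UnionFind(range(8))
for a, b in [(0, 1), (1, 2), (2, 3), (5, 6), (6, 7)]:
    uf.union(a, b)
uf.union(4, 7)                                    # the smaller class {4} is hung below the bigger one
print(uf.find(4) == uf.find(5))                   # True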
This implementation of a union-find data structure has the property that n union operations and m find operations together only have costs O((n + m) · log∗(n)), where log∗ is the inverse of the iterated exponentiation function: log∗(n) is the least number k such that n ≤ 2^2^···^2 for a tower of exponentiations of height k. The function log∗ therefore is an incredibly slowly growing function, which has a value of at most 5 for all realistic inputs n. A proof of the upper bound can be found in textbooks about data structures and algorithms, such as the book by Cormen, Leiserson, Rivest and Stein (2009).
Conclusion 1.11.1 This section has presented methods to analyze programs using
pointers and dynamically allocated blocks of storage. We started with a flow-sensitive
points-to analysis, which computes individual information for each program point. It
uses destructive updating of analysis information for assignments to pointer variables,
but accumulates the possible values at accesses for dynamically allocated storage.
At the cost of losing the destructive update for program variables, we developed a
possibly more efficient flow-insensitive points-to analysis, which produces only one
analysis information describing all program states occurring during program execution. In case we are only interested in alias information, a flow-insensitive analysis can be used that partitions pointer variables and abstract addresses of blocks into equivalence classes of possible aliases. This latter analysis is based on a union-find
data structure and is very fast, but may be very imprecise on programs with complex
pointer manipulations.




1.12 Fixed-Point Algorithms


The last section described our search for an alias analysis that is as efficient as possible. This leads to the question of how one computes, in general, (if possible, least) solutions of systems of inequalities over complete lattices. The only practical procedure we have met so far to determine solutions of systems of inequalities

xi ⊒ fi(x1, . . . , xn),    i = 1, . . . , n

is round-robin iteration. It is easily implementable and can nicely be simulated manually. However, this procedure has its disadvantages. First, it needs a whole round to detect termination of the iteration. Second, it reevaluates all right sides fi for the unknowns xi anew, although only the value of one variable might have changed since the last round. Last, the runtime depends heavily on the chosen order of the variables.
A more efficient algorithm is the worklist algorithm. This procedure administers
the set of variables xi whose values might no longer satisfy their inequality in a data
structure W , the worklist. For a variable xi taken out of the worklist, the value of
its right side is computed using the actual values of the unknowns. The old value of
xi is replaced by a new value that subsumes the previous and the newly computed
value of xi if the newly computed value is not subsumed by the previous value. The
worklist has been shortened by taking out one element. In the case that the value of xi has grown, the inequalities whose right sides depend directly on the value of xi might no longer be satisfied. The left sides of these possibly violated inequalities are remembered for recomputation, i.e., inserted into the worklist W.
The implementation of this procedure uses for each variable xi the set I [xi ] of all
variables whose right side possibly depends directly on xi . These direct dependences
between variables are easily identified in the examples of program analyses presented
so far: In a forward analysis, the value for a program point u influences the value at
a program point v directly if there is an edge from u to v in the program control-flow
graph. Analogously, the value at v influences the value at u in a backwards analysis
if there is an edge from u to v. The precise determination of dependences may not
always be that easy. It may be more difficult if the right sides of constraints are only
given as semantic functions f i whose implementation is unknown.
In the description of the algorithm, we again distinguish between the unknowns
xi and their values. The values of variables are stored in the array D, which is
indexed with variables. The worklist W , on the other hand, administers unknowns
and not values. In our formulations of generic systems of inequalities, we have always
assumed that the right sides fi are functions of type Dⁿ → D, i.e., may possibly depend on all variables. We now want to take into account that evaluating the right side of a variable may access the values only of some other variables. Therefore, we now consider right sides fi of the functionality

fi : (X → D) → D


where X = {x1 , . . . , xn } is the set of unknowns of the system of inequalities. Such a


function f i expects a binding of the unknowns to values and returns a value. When
the function accesses the value of a variable x j , this value is obtained by applying
the variable binding to x j . Since the actual values of variables are stored in the array
D, the actual binding of the unknowns is delivered by the function eval:
D eval(x j ) { return D[x j ]; }
The implementation of the worklist iteration looks as follows:
W ← ∅;
forall (xi ∈ X) { D[xi] ← ⊥; W ← W ∪ {xi}; }
while (exists xi ∈ W) {
    W ← W \ {xi};
    t ← fi eval;
    t ← D[xi] ⊔ t;
    if (t ≠ D[xi]) {
        D[xi] ← t;
        W ← W ∪ I[xi];
    }
}
The set W of variables whose right sides need to be reevaluated can be administered
in a simple list structure where insertions and extractions are performed last in first
out, i.e., which behaves like a stack. Note that the last line of the body of the while
loop indicates that elements from I [xi ] need only be inserted into W if they are not
yet in there. Another array of Boolean flags can be used to maintain this membership
information and thus to avoid double insertions into the worklist.
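As an illustration (ours, not the book's code), the worklist iteration can be phrased in Python roughly as follows; the right sides are functions that receive an eval binding, and the system of Example 1.5.2 below serves as test input.

def solve_worklist(unknowns, rhs, deps, bottom, join):
    """Worklist fixed-point iteration.
    rhs[x]  : function mapping an eval binding to the value of the right side of x
    deps[x] : the set I[x] of unknowns whose right side depends on x"""
    D = {x: bottom for x in unknowns}
    W = list(unknowns)
    in_W = {x: True for x in unknowns}
    while W:
        x = W.pop()                        # extractions are last in, first out
        in_W[x] = False
        t = join(D[x], rhs[x](lambda y: D[y]))
        if t != D[x]:
            D[x] = t
            for y in deps[x]:              # reschedule the possibly violated inequalities
                if not in_W[y]:
                    W.append(y)
                    in_W[y] = True
    return D

# The system of Example 1.5.2: x1 ⊇ {a} ∪ x3, x2 ⊇ x3 ∩ {a, b}, x3 ⊇ x1 ∪ {c}
rhs = {"x1": lambda ev: {"a"} | ev("x3"),
       "x2": lambda ev: ev("x3") & {"a", "b"},
       "x3": lambda ev: ev("x1") | {"c"}}
deps = {"x1": {"x3"}, "x2": set(), "x3": {"x1", "x2"}}
print(solve_worklist(["x1", "x2", "x3"], rhs, deps, frozenset(), lambda a, b: a | b))
# least solution: x1 = x3 = {a, c}, x2 = {a}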
Example 1.12.1 To illustrate the worklist algorithm, we have again a look at the system of inequalities of Example 1.5.2:

x1 ⊇ {a} ∪ x3
x2 ⊇ x3 ∩ {a, b}
x3 ⊇ x1 ∪ {c}

The right sides in this system are given by expressions which explicitly expose the variable dependences:

I[x1] = {x3}
I[x2] = ∅
I[x3] = {x1, x2}

Fig. 1.42 The worklist-based fixed-point iteration for Example 1.5.2: the table lists, for each step, the current worklist together with the values D[x1], D[x2], D[x3]

The steps of the worklist algorithm applied to this system of inequalities are shown in Fig. 1.42. The next variable xi to be taken out of the worklist is emphasized in the current worklist. Altogether, six evaluations of right sides suffice. This would not be beaten by a round-robin iteration.


The next theorem collects our observations about the worklist algorithm. To specify its precise runtime, we recall that the height h of a complete lattice D is defined as the maximal length of any strictly ascending chain of elements in D. The size |fi| of a right side fi is defined as the number of variables that are possibly accessed during the evaluation of fi. The sum of the sizes of all right sides therefore is given by:

∑_{xi ∈ X} |fi| = ∑_{xj ∈ X} #I[xj]

This equality results from the fact that each variable dependence xj → xi is counted exactly once in the sum on the left and also exactly once in the sum on the right side. Accordingly, the size of the system of inequalities over a set of unknowns X is defined as the sum ∑_{xi ∈ X} (1 + #I[xi]). Using this definition we find:
Theorem 1.12.1 Let S be a system of inequalities of size N over the complete lattice
D of height h > 0. We have:
1. The worklist algorithm terminates after at most h · N evaluations of right sides.
2. The worklist algorithm produces a solution. It delivers the least solution if all
f i are monotonic.
Proof To prove the first claim, we observe that each variable xi can only change
its value at most h times. This means that the list I [xi ] of variables depending on
xi is added to the worklist at most h times. Therefore, the number of evaluations is
bounded from above by:
n + ∑_{i=1}^{n} h · #I[xi]  =  n + h · ∑_{i=1}^{n} #I[xi]  ≤  h · ∑_{i=1}^{n} (1 + #I[xi])  =  h · N


Of the second claim we only consider the statement about monotonic right sides. Let D0 be an array which represents the least solution of the system of inequalities. We first prove that we have at any time:

D0[xi] ⊒ D[xi]    for all unknowns xi.

Finally, we convince ourselves that after executing the body of the while loop, all variables xi for which the corresponding inequality is actually violated are contained in the worklist. This worklist is empty when the algorithm terminates. Hence on termination, all inequalities must be satisfied, and the array D therefore represents a solution. By the invariant, the least solution of the system of inequalities is an upper bound of this solution; since the least solution is, at the same time, a lower bound of every solution, the found solution must be equal to the least solution.


According to Theorem 1.12.1, the worklist algorithm finds a solution also in case
of nonmonotonic right sides. This solution is not necessarily a least solution. It is
just some solution. A similar behavior has been observed for round-robin iteration.
The worklist algorithm can be simplified if all right sides are monotonic. The accumulation at the recomputation of the values can then be replaced with overwriting:

t ← D[xi] ⊔ t;    ==>    ;

For iterations using widening, accumulation works again differently: In this case, the widening operator ⊔– is applied to the old and the new value instead of the least upper bound:

t ← D[xi] ⊔ t;    ==>    t ← D[xi] ⊔– t;

In case of narrowing we have:

t ← D[xi] ⊔ t;    ==>    t ← D[xi] ⊓– t;

where the iteration of the while loop does not start with the value ⊥ for each variable but with a previously computed solution of the system of inequalities.
In practice, the worklist algorithm has proved very efficient. It still has two disadvantages:
The algorithm needs the direct dependences between the unknowns, that is, the
sets I [xi ]. These dependences were quite obvious in the examples so far. This,
however, is not the case in all applications.
The actual value of a required variable xi is accessed when the right side of an
unknown is evaluated, no matter if this is still a trivial value or an already computed
nontrivial value.
A better strategy would be to first try to compute a reasonably good value for a variable xj before its value is accessed. To improve further, we extend the function eval by monitoring: before function eval delivers the value of a variable xj, it keeps track of the variable xi for whose right side the value of xj is needed, that is, the function eval adds xi to the set I[xj]. Function eval therefore receives the variable xi as a first argument. Function eval should not return the actual value of xj, but the best possible value for xj. Therefore, the computation of an as-good-as-possible value for xj is triggered even before the variable dependence between xj and xi is recorded. Altogether, function eval turns into:

D eval(xi)(xj) {
    solve(xj);
    I[xj] ← {xi} ∪ I[xj];
    return D[xj];
}
Function eval together with procedure solve recursively computes a solution. To prevent an infinite recursion, procedure solve manages a set stable. This set stable contains all variables for which an evaluation of the corresponding right sides has already been triggered and not yet finished, together with those variables for which (relative to the actual values of the variables in the set stable) the fixed point has already been reached. For the variables in the set stable, procedure solve will not do anything.
Set stable is initialized with the empty set at the start of the fixed-point iteration.
The program performing the fixed-point iteration looks as follows:
stable ← ∅;
forall (xi ∈ X) D[xi] ← ⊥;
forall (xi ∈ X) solve(xi);
where the procedure solve is defined as follows:
void solve(xi) {
    if (xi ∉ stable) {
        stable ← stable ∪ {xi};
        t ← fi (eval(xi));
        t ← D[xi] ⊔ t;
        if (t ≠ D[xi]) {
            D[xi] ← t;
            W ← I[xi];  I[xi] ← ∅;
            stable ← stable \ W;
            forall (xi ∈ W) solve(xi);
        }
    }
}
The call of procedure solve(xi) directly terminates if the variable xi is already stable. Otherwise, the variable xi is added to the set stable. After that, the right side fi of xi is evaluated. Instead of the actual variable binding, procedure solve uses the function eval partially applied to xi, which computes, when applied to another unknown xj, the best possible value for xj, adds xi to the I set of variable xj, and only then delivers the value of xj.

Fig. 1.43 An execution of the recursive fixed-point algorithm (the figure traces the nested calls of solve and eval together with the resulting updates of the values D[xi] and the dependence sets I[xi])
Let t be the least upper bound of the value of xi and the value delivered by the evaluation of the right side of xi. If this value t is subsumed by the previous value of xi, the call of solve immediately returns. Otherwise, the value of xi is set to t.
The change of the value of variable xi then is propagated to all variables whose last
evaluation accessed a smaller value of xi . This means that their right sides must be
scheduled for reevaluation. The set W of these variables is given by the set I [xi ].
The old value of the set I [xi ] is no longer needed and is therefore reset to the
empty set: the reevaluation of the right sides of the variables from the set W will
reconstruct the variable dependences, should these be needed. The variables of the
set W can no longer be regarded as stable. They are therefore removed from the set
stable. After that, procedure solve is called for all variables in the set W .
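The following Python sketch (ours, not the book's code) mirrors this recursive algorithm: solve, the partially applied eval, the sets I[x], and the set stable appear literally; the system of Example 1.5.2 is reused as input.

def solve_recursive(unknowns, rhs, bottom, join):
    D = {x: bottom for x in unknowns}
    infl = {x: set() for x in unknowns}     # the sets I[x]
    stable = set()

    def solve(x):
        if x in stable:
            return
        stable.add(x)

        def eval_(y):                        # eval partially applied to x
            solve(y)                         # first compute a good value for y ...
            infl[y].add(x)                   # ... then record that x depends on y
            return D[y]

        t = join(D[x], rhs[x](eval_))
        if t != D[x]:
            D[x] = t
            work = infl[x]
            infl[x] = set()
            stable.difference_update(work)   # the dependent variables are no longer stable
            for y in work:
                solve(y)

    for x in unknowns:
        solve(x)
    return D

rhs = {"x1": lambda ev: {"a"} | ev("x3"),
       "x2": lambda ev: ev("x3") & {"a", "b"},
       "x3": lambda ev: ev("x1") | {"c"}}
print(solve_recursive(["x1", "x2", "x3"], rhs, frozenset(), lambda a, b: a | b))
# again the least solution x1 = x3 = {a, c}, x2 = {a}

Calling solve only for a single unknown of interest already yields the local behavior discussed at the end of this section.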
Example 1.12.2 We consider again the system of inequalities of Example 1.5.2 and
Example 1.12.1:
x1 ⊇ {a} ∪ x3
x2 ⊇ x3 ∩ {a, b}
x3 ⊇ x1 ∪ {c}
An execution of the recursive fixed-point algorithm is shown in Fig. 1.43. A recursive
descent is always shown to the right. The column of a call to procedure solve contains


the computed new entries D[xi ] and I [xi ] as well as the calls of procedure solve
to treat variables in the set W . An ok in this column signals that a reevaluation of
the right side does not require a change of the actual value of the variable. A stable!
indicates that the variable for which the last call to solve was performed is stable,
such that the call should directly terminate. The column of function eval indicates
changes of the set I [x j ] and the returned values. The algorithm evaluates fewer right
sides than the worklist algorithm, although this example is very small.


The recursive fixed-point algorithm can be elegantly implemented in a programming
language such as OCaml, which has assignments on one side, and partial applications
of higher-order functions on the other side.
The recursive fixed-point algorithm is more complicated than the worklist algorithm. It executes, in general, fewer evaluations of right sides. It does not need a precomputation of the variable dependences, and, even better, it also works when variable
dependences change during the fixed-point iteration. In addition, it has a property
that we will later exploit for interprocedural analysis in Sect. 2.5: The algorithm can
be modified such that not the values of all unknowns are computed. Rather, the evaluation of an unknown of interest, xi , can be started. Then only those unknowns whose
values are needed to evaluate the unknown xi are themselves evaluated. Fixed-point
iterators having this property are called local.

1.13 Elimination of Partial Redundancies


We return to our question of how the compiler can speed up program execution by preventing the execution of redundant computations. In Sect. 1.2, we described how the repeated evaluation of an expression e along an edge u → v can be avoided if the value of e is definitely available in a variable x. This is the case if the assignment x ← e has been executed on all paths from the entry point of the program to program point u and no variable occurring in the assignment has been changed in between. This optimization replaces a redundant occurrence of e by the variable x. So far, this substitution of e by x at an edge from u to v is only possible if there are occurrences of x ← e on all paths from program entry to u. The optimization now described attempts to replace e by x if x ← e is only available on some paths to u. This availability on some, but not all, paths is called partial redundancy. We will use redundancy and availability interchangeably in this section.
Example 1.13.1 Regard the program on the left side of Fig. 1.44.
The assignment T ← x + 1 is evaluated on every path, on the path to the right even twice with identical result. The compiler cannot simply replace the occurrence of x + 1 at the edge from 5 to 6 by an access to T, although the value of x + 1 that is computed along the edge from 3 to 4 is stored in the variable T. However, the compiler may move the occurrence of the assignment T ← x + 1 at the edge from 5 to 6 to the edge from 2 to 5 and thus avoid a redundant computation on the right path. This program transformation results in the program on the right of Fig. 1.44.

Fig. 1.44 A removal of a partial redundancy

Fig. 1.45 x ← e is partially redundant at 5, as well as very busy


We look for a transformation that places assignments x ← e at points in the program such that
• variable x is guaranteed to contain the value of e whenever the program executes the assignment x ← e the next time, and
• the insertion of new redundant computations is avoided.
Consider Fig. 1.13. In Fig. 1.45, x ← e is available along some, but not all, paths, synonymously called partially redundant at program point 5. The compiler, therefore, cannot eliminate the two occurrences of x ← e on the paths starting at 5. We observe, however, that x ← e is going to be executed on every computation starting at program point 5, before the variable x is used or any variable occurring in this statement is overwritten. We say that the assignment x ← e is very busy at program point 5. This also is true for program points 1, 2, and 4. At program point 4, however, x ← e is already available and therefore need not be computed once more. Instead, the compiler may insert the assignment x ← e before program point 1. After this insertion, x ← e is redundant at program points 2 and 5 as well as at the program points 6 and 8

Fig. 1.46 x ← e is totally redundant at 5, but not very busy


Fig. 1.47 x ← e is neither partially redundant at 5, nor very busy

and thus allows the compiler to remove the assignments there. The situation looks different in Fig. 1.46. There, the assignment x ← e is already redundant at program point 5 as well as at program point 8. Therefore, plain redundancy elimination already succeeds in removing the assignment x ← e at the edge from 8 to 9. Finally, in Fig. 1.47, the assignment x ← e is neither redundant nor very busy at program point 5. Therefore, no optimization is possible.
The transformation therefore inserts an assignment x ← e at the end of an edge with endpoint v if two conditions are satisfied: First, the assignment should not already be available along this edge. Second, it should be very busy at v. This means that the assignment x ← e is executed on all outgoing paths from v before the left side of the assignment, x, is used, and before any of the variables of the statement is modified.

An Analysis for Very Busy Assignments


A new program analysis is required for determining very busy assignments. An assignment x ← e is called busy along a path π to program exit if π has the form π = π1 k π2, where k is an assignment x ← e and π1 contains no use of the left side x and no definition of any variable of x ← e, that is, of any variable in {x} ∪ Vars(e). The assignment x ← e is very busy at program point v if it is busy along every path from v to program exit. The analysis for very busy assignments thus is a backwards analysis. Abstract values in this analysis are sets of assignments x ← e where x ∉ Vars(e), like in the available-assignments analysis. The complete lattice therefore is


B = 2^Ass

where the ordering again is given by the superset relation ⊇. No assignment is very busy at program exit. The abstract edge effect [[k]]♯ = [[lab]]♯ for an edge k = (u, lab, v) depends only on the edge label and is given by:

[[;]]♯ B              = B
[[NonZero(e)]]♯ B     = [[Zero(e)]]♯ B = B \ Ass(e)
[[x ← e]]♯ B          = B \ (Occ(x) ∪ Ass(e)) ∪ {x ← e}    if x ∉ Vars(e)
                      = B \ (Occ(x) ∪ Ass(e))              if x ∈ Vars(e)
[[x ← M[e]]]♯ B       = B \ (Occ(x) ∪ Ass(e))
[[M[e1] ← e2]]♯ B     = B \ (Ass(e1) ∪ Ass(e2))
The set Occ(x) denotes the set of all assignments containing an occurrence of x. The
abbreviation Ass(e) for an expression e denotes the set of all assignments whose left
side occurs in e. The analysis is supposed to determine very busy assignments, i.e.,
assignments busy along all outgoing paths. It must, therefore, form the intersection
of the contributions of all paths from v to program exit. Since the partial order on the set 2^Ass is the superset relation, the MOP solution for a program point v is

B∗[v] = ⋂ {[[π]]♯ ∅ | π : v →∗ stop}

where [[π]]♯ is the effect of the path π like in the other backwards analyses. Thus,

[[π]]♯ = [[k1]]♯ ∘ · · · ∘ [[km]]♯

for π = k1 . . . km. The abstract edge effects [[ki]]♯ are all distributive. The least solution, B, with respect to the chosen partial order is, therefore, equal to the MOP solution, provided that the program exit is reachable from any program point.
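As a sketch (ours, with a hand-rolled encoding of assignments and edge labels, not the book's code), the abstract edge effects and a backward round-robin iteration can be written in Python like this:

# An assignment is a pair (lhs, frozenset of the variables of its right side).
def occ(x, Ass):                      # Occ(x): assignments containing an occurrence of x
    return {a for a in Ass if x == a[0] or x in a[1]}

def ass(vars_e, Ass):                 # Ass(e): assignments whose left side occurs in e
    return {a for a in Ass if a[0] in vars_e}

def effect(label, B, Ass):            # [[lab]]# for the very-busy analysis
    kind = label[0]
    if kind == "skip":
        return B
    if kind == "cond":                 # Zero(e) / NonZero(e); label[1] = Vars(e)
        return B - ass(label[1], Ass)
    if kind == "assign":               # x <- e; label[1] = the assignment
        a = label[1]
        x, rhs_vars = a
        B = B - (occ(x, Ass) | ass(rhs_vars, Ass))
        return B | {a} if x not in rhs_vars else B
    if kind == "store":                # M[e1] <- e2; label[1], label[2] = Vars(e1), Vars(e2)
        return B - (ass(label[1], Ass) | ass(label[2], Ass))
    raise ValueError(kind)

def very_busy(nodes, edges, exit_node, Ass):
    B = {v: frozenset(Ass) for v in nodes}     # start from the full set (least w.r.t. the superset order)
    B[exit_node] = frozenset()
    changed = True
    while changed:                              # round-robin iteration, backwards
        changed = False
        for u in nodes:
            if u == exit_node:
                continue
            outs = [frozenset(effect(lab, B[v], Ass)) for (s, lab, v) in edges if s == u]
            if outs:
                new = frozenset.intersection(*outs)
                if new != B[u]:
                    B[u], changed = new, True
    return B

# A tiny diamond-shaped program (our own example): T <- x+1 on the entry edge
# and again on one of the two branches.
a = ("T", frozenset({"x"}))
edges = [(0, ("assign", a), 1), (1, ("cond", {"x"}), 2), (1, ("cond", {"x"}), 3),
         (2, ("assign", a), 3)]
print(very_busy([0, 1, 2, 3], edges, 3, {a}))
# T <- x+1 is very busy at 0 and 2, but not at 1 (it is not busy on the direct path to 3).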
Example 1.13.2 Figure 1.48 shows the sets of very busy assignments for the program
of Example 1.13.1. The control-flow graph is acyclic in this case. This means that
round-robin iteration can compute these sets in one round.


Note that the reachability of program exit is of great importance for the result of the
backwards analysis for very busy assignments. Let us assume that program point v
does not reach program exit. The analysis would start with the set of all assignments
at all nodes without outgoing edges. Going backwards, it would remove only those
with a new definition of a variable in Vars(e) ∪ {x} until it reaches v. The remaining
assignments would all be considered very busy at v.
Example 1.13.3 Consider the program of Fig. 1.49.


Fig. 1.48 The very busy assignments of Example 1.13.1 (B[2] = B[3] = B[5] = {T ← x + 1}; all other program points have the empty set)

Fig. 1.49 A program whose exit is not reachable

The program admits only one, infinite computation. Program exit, node 4, is not
reachable from program point 1. Therefore, any assignment not containing variable
x is very busy at 1, even assignments that do not occur in the program.



The Placement Phase


The compiler has now at its disposal an analysis for partially available (partially redundant) assignments and an analysis for very busy assignments. Assignments recognized as very busy at some program point v are called movable at v, and the set of all occurrences of the assignment that make it very busy at v are called its business sites. Note that an assignment x ← e with x ∈ Vars(e) at some edge from u to v is never movable. Let us now turn to the optimization phase.
There are two different versions in which the transformation can be described: Assume that the assignment x ← e is movable at v. In the insertion version, the compiler inserts copies of the movable assignment y ← e onto all paths reaching


v on which it was previously not available. This insertion makes the business sites
redundant such that they can be eliminated. In the motion view of the optimization,
the compiler takes the business sites and moves copies backwards into the paths on
which the movable assignment was not previously available.
The remaining open question for both views is at which edges to place the copies of the movable assignment y ← e. The answer is that a placement at the new positions should establish that y ← e is definitely available at all positions at which it has been very busy before. In the control-flow graphs that we consider here, y ← e is always very busy at the source program point of an edge labeled with y ← e.
The optimization uses the strategy to place assignments as early as possible, maybe even before the program entry point. This placement is constrained by correctness and by efficiency considerations: Correctness reasons inhibit the movement of the assignment y ← e over an edge leading from u to v at which y or a variable occurring in e receives a new value. After such a move, y might have a different value at v, or the evaluation of e in the moved assignment might result in a wrong value. For efficiency reasons, the optimization may not move y ← e onto a path on which it originally would not have been executed. Otherwise, an execution of this path would lead to a potential runtime increase.
There is another argument why we are so cautious not to move an assignment onto a path that did not contain that assignment before. Depending on the semantics of the programming language, the assignment may have side effects, e.g., throwing an exception at a division by zero. Throwing an exception on a path of the optimized program where the corresponding path of the original program did not throw an exception violates the requirement of semantics preservation.
We now go through the different cases for the placement of movable assignments.
Let us first consider potential insertions at the entry point, start, of the program.
We introduce a new entry point and insert all assignments from B[start] before
start to make all assignments from B[start] definitely available at start. This is
realized by the first transformation rule.
Transformation PRE for the Start Node:

(diagram: a new entry node is introduced, with an edge labeled B[v] leading into the original entry point v)

We have taken the liberty to annotate an edge with a set of mutually independent assignments, which can be executed in any order. To assure independence of two assignments y1 ← e1 and y2 ← e2 ∈ B[u], one checks that y1 ≠ y2 and that neither y1 occurs in e2 nor y2 in e1.
Next, we consider a program point u which is nonbranching. This means that u has exactly one outgoing edge whose label s is either a skip operation, an assignment, or a memory access. This edge is assumed to lead to a program point v. All assignments from B[v] should be placed at this edge besides the ones that are still


very busy at u, i.e., could be moved further, and the ones that were already available at v, i.e., need not be placed here. This set is given by:

ss = B[v] \ ([[s]]♯B(B[v]) ∪ [[s]]♯A(A[u]))


As in Sect. 1.2, A[u] denotes the set of assignments definitely available at program
point u. We have used the indices A and B to differentiate the abstract edge effects of
available assignments from those for very busy assignments. Let us make this case
concrete.
Transformation PRE for Empty Statements and for Movable Assignments:

(diagram: an edge from u to v labeled with the empty statement ';' or with a movable assignment x ← e is relabeled with the sequence ss)
An edge labeled with the empty statement can receive the set ss of assignments where ss is defined as: B[v] \ (B[v] ∪ A[u]) = ∅. No additional assignment needs to be placed at this edge.
A movable assignment x ← e with x ∉ Vars(e) is moved to another place, and the set ss of assignments is placed at its position, where ss is obtained from the definition of ss above by substituting the definitions of the abstract edge effects for their names:

ss = B[v] \ (B[v] \ (Occ(x) ∪ Ass(e)) ∪ A[u] \ Occ(x) ∪ {x ← e})
   = (B[v] ∩ Occ(x) \ {x ← e}) ∪ (B[v] ∩ Ass(e) \ A[u])
Transformation PRE for Nonmovable Statements:
An edge labeled with a nonmovable statement s is replaced by a sequence of two
edges labeled with s and ss:

(diagram: the edge from u to v labeled s is split into an edge labeled s leading to a new intermediate point, followed by an edge labeled ss to v)

The new label ss for an edge from u to v labeled with the nonmovable assignment x ← e with x ∈ Vars(e) is:

ss = B[v] ∩ (Occ(x) ∪ Ass(e)) \ (A[u] \ Occ(x))
   = (B[v] ∩ Occ(x)) ∪ (B[v] ∩ Ass(e) \ A[u])
The set of assignments ss to place at a read operation x ← M[e] is defined analogously. For a write operation M[e1] ← e2 we obtain:

ss = B[v] ∩ (Ass(e1) ∪ Ass(e2)) \ A[u]


It remains to consider a program point u with more than one outgoing edge, i.e., a
branching on some condition b.
Transformation PRE for Conditional Branches:
Let v1 and v2 be its successor nodes for the cases of 0 and not 0, respectively.
Assignments in A[u] need not be placed at any outgoing edge since they are available
at their target nodes already before the transformation. Of the other assignments in
B[v1 ] those assignments need to be placed that cannot be moved over u. These are
the ones that modify variables of the condition b or that are not contained in B[v2 ].
The edge to v2 is handled analogously. Therefore, we have:

(diagram: the two condition edges Zero(b) and NonZero(b) from u to v1 and v2 are each followed by a new edge labeled ss1 and ss2, respectively)

where

ss1 = (B[v1] ∩ Ass(b) \ A[u]) ∪ (B[v1] \ (B[v2] ∪ A[u]))
ss2 = (B[v2] ∩ Ass(b) \ A[u]) ∪ (B[v2] \ (B[v1] ∪ A[u]))

The given transformation rules for PRE make each assignment x ← e available at all program points at which x ← e was very busy before the transformation. Therefore, an assignment x ← e is, in particular, available at all program points where it would have been computed in the original program.
Example 1.13.4 Figure 1.50 shows the analysis information for the program of
Example 1.13.1 together with the result of the optimization. In fact, one partially
redundant computation could be removed.


Let ss be the set of assignments that are very busy at v. To prove correctness, one shows for all execution paths π of the program from the original entry point of the program to a program point v and all program states σ before program execution that

[[ss]] ([[π]] σ) = [[k0 π̄]] σ

where k0 is the new edge leading to the original entry point of the program, and [[π]] and [[π̄]] are the semantics of the program path before and after the application of the transformation. The validity of the claim can be proved by induction. For the empty program path π = ε it is trivially true. For a nonempty program path π = π′ k, it follows from the induction hypothesis with a case distinction according to the label of the last edge k of the path.
As the example indicates, the number of executions of the assignment x e has
increased on no path, but may have been decreased on some paths. It would be nice


Fig. 1.50 The result of the transformation PRE for Example 1.13.1 together with the necessary analyses

to prove this property of nonpessimization for all programs. However, we will not do this here. The intuition on which the proof is based is that the assignment x ← e becomes available at all program points in the transformed program at which it was previously very busy and is now subject to removal. Each copy of a movable assignment can thus be associated with at least one subsequent occurrence that is eliminated.
The elimination of partially redundant assignments also removes some totally redundant assignments as a side effect. More assignments may become partially available by the application of transformation PRE. It may, therefore, pay off to apply the transformation PRE repeatedly. Similar methods can be used to save on memory accesses. An alias analysis then must be used to refine the dependences between the memory accesses. Such analyses are described in Sect. 1.11.
An important aspect of the transformation PRE is that it supports the removal of
loop-invariant code from loops. This will be considered in the next section.

1.14 Application: Moving Loop-Invariant Code


One important instance of a superfluous, repeated computation is an assignment that
occurs in a loop and computes the same value on each iteration of the loop.
Example 1.14.1 Let us consider the following program:
for (i ← 0; i < n; i++) {
    T ← b + 3;
    a[i] ← T;
}


Fig. 1.51 A loop containing invariant code and an inadequate transformation

Figure 1.51 shows the control-flow graph of this program. The loop contains the assignment T ← b + 3, which computes the same value on each iteration of the loop and stores it in the variable T. Note that the assignment T ← b + 3 cannot be moved before the loop, as indicated in Fig. 1.51, because it would then be executed in executions that do not enter the loop.


This problem does not occur with do-while loops. In a do-while loop, loop-invariant code can be placed just in front of the loop. One way to avoid the placement problem with invariant code in while loops therefore is to transform them into do-while loops beforehand. The corresponding transformation is called loop inversion.
Example 1.14.2 Let us regard again the while loop of Example 1.14.1. The following
Fig. 1.52 shows the inverted loop on the left side. The inverted loop has an extra
occurrence of the loop condition guarding the first entry into the loop. Another
occurrence of the loop condition is at the end of the loop. Figure 1.53 shows the
analysis information for partial redundancy elimination. Figure 1.52 on the right
shows the application of transformation PRE. The loop-invariant code has been
moved to before the do-while loop.


We conclude that transformation PRE is able to move loop-invariant code out
of do-while loops. To treat while loops, these need to be transformed into do-while
loops. This is straightforward in most imperative and object-oriented programming
languages if the source code of the program is available. In C or Java, for example,
the while loop:
while (b) stmt
can be replaced by:
if (b) do stmt while (b);


Fig. 1.52 The inverted loop of Example 1.14.1 and the result of moving invariant code out of the loop

Fig. 1.53 The partial-redundancy analysis information for the inverted loop of Fig. 1.52

However, often programs and programming-language constructs are intermediately


represented in the more flexible form of control-flow graphs, in particular, if optimizations are applied which transform the control-flow graph in such a way that
it cannot easily be mapped back to control constructs of the language. We should
therefore identify loops by graph properties. We only consider the case of loops with
a unique loop head and use the predominator relation between program points.
We say that a program point u predominates a program point v if each path starting at the entry point of the program and reaching v passes through u. We then write u ⇒ v. The relation ⇒ is reflexive, transitive, and antisymmetric and therefore defines a partial order over the set of program points. This relation allows the compiler to discover back edges in the control-flow graph. An edge k = (u, _, v) is called a back edge if the target node v predominates the start node u of the edge.


Fig. 1.54 The predominator sets for a simple control-flow graph: P[0] = {0}, P[1] = {0, 1}, P[2] = {0, 1, 2}, P[3] = {0, 1, 2, 3}, P[4] = {0, 1, 2, 3, 4}, P[5] = {0, 1, 5}

Example 1.14.3 Regard the example of the while loop on the left side of Fig. 1.51.
Each program point of the loop body is predominated by program point 1. Edge
(6, ; , 1) therefore is a back edge.


We design a simple analysis to determine the set of predominators at the program
points of programs. It collects the set of program points traversed along each path.
The set of predominators for a program point v is obtained as the intersection of
these sets. As complete lattice we therefore choose:

P = 2^Nodes,    with the partial order ⊇

We define as abstract edge effects:

[[(u, lab, v)]]♯ P = P ∪ {v}

for all edges (u, lab, v). Note that the labels of the edges play no role. Instead, the analysis collects the endpoints of the edges. These abstract edge effects lead to the following set P∗[v] of predominators at program point v:

P∗[v] = ⋂ {[[π]]♯ {start} | π : start →∗ v}

All abstract edge effects are distributive, such that these sets can be determined as the least solution of the associated system of inequalities.
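A Python sketch (ours, not the book's code) of this analysis, together with the back-edge test from the previous paragraph, might look as follows; the node numbering of the loop of Example 1.14.1 is our own encoding.

def predominators(nodes, edges, start):
    P = {v: set(nodes) for v in nodes}        # greatest value w.r.t. the ⊇ ordering
    P[start] = {start}
    changed = True
    while changed:                             # round-robin iteration
        changed = False
        for (u, _lab, v) in edges:
            new = P[v] & (P[u] | {v})          # abstract edge effect P ∪ {v}, combined by intersection
            if new != P[v]:
                P[v], changed = new, True
    return P

def back_edges(edges, P):
    # an edge (u, lab, v) is a back edge if its target v predominates its source u
    return [e for e in edges if e[2] in P[e[0]]]

# The while loop of Example 1.14.1 in our own node numbering:
edges = [(0, "i <- 0", 1), (1, "NonZero(i<n)", 2), (2, "T <- b+3", 3),
         (3, "M[A+i] <- T", 4), (4, "i <- i+1", 1), (1, "Zero(i<n)", 5)]
P = predominators(list(range(6)), edges, 0)
print(P[4])                    # {0, 1, 2, 3, 4}
print(back_edges(edges, P))    # [(4, 'i <- i+1', 1)]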
Example 1.14.4 Regard the control-flow graph for the example program of Example
1.14.1. Figure 1.54 shows the associated predominator sets. Figure 1.55 shows the
associated partial order ⇒.
As usual in the representation of partial orders, only the direct relations are represented. Transitivity stays implicit. The result apparently is a tree! This is by no
means an accident, as the following theorem shows.




Fig. 1.55 The predominator relation ⇒ for the control-flow graph of Fig. 1.54. The direction goes from top to bottom

Theorem 1.14.1 Each program point v has at most one immediate predominator.
Proof Assume that a program point v had two different direct predominators u1, u2. Neither u1 ⇒ u2 nor u2 ⇒ u1 can hold. Therefore, neither every path from the entry point of the program to u1 nor every path from u1 to v needs to contain program point u2. Thus, there exists a path from the entry point to v that does not contain u2, and u2 cannot be a predominator of v. This is a contradiction to the assumption above. So, v has at most one direct predominator. This proves the theorem.


The entry condition of the while loop is represented by a program point v with two outgoing condition edges. v predominates all nodes in the loop body, in particular the source node u of the back edge. Loop inversion consists in creating a new node with copies of the condition edges emanating from it to the targets of the original condition edges, and then redirecting the back edge from u towards the new node. This is the next transformation:
Transformation LR:
(diagram: for a back edge from u to v with v ∈ P[u], where v carries the condition edges Zero(e) and NonZero(e), a new node is introduced that carries copies of these condition edges, and the back edge from u is redirected to this new node)
Loop inversion works for all while loops. There are, however, loops that cannot be
inverted in this way. One such, somewhat unusual, loop is shown in Fig. 1.56. Unfortunately, there exist quite normal loops that cannot be inverted by transformation LR.
Figure 1.57 shows one such loop. One would have to copy the whole path from the
back edge to the condition together with the condition edges to invert such a loop.



Fig. 1.56 A non-reversible loop

Fig. 1.57 A normal loop that cannot easily be reversed

This kind of loop can originate when a complex condition is evaluated in several
steps.

1.15 Removal of Partially Dead Assignments


The removal of partial redundancy can be understood as a generalization of the
optimization to avoid redundant computations. We now show how the removal of
assignments to dead variables can also be generalized to a removal of assignments
to partially dead variables.
Example 1.15.1 Consider the program of Fig. 1.58. The assignment T ← x + 1 needs to be computed only on one of the two paths, since variable T is dead along the other path. Such a variable is called partially dead.
The goal of the transformation is to delay the assignment T ← x + 1 as long as possible, that is, to move it forward along the control-flow paths until the assignment is completely dead or certainly necessary. It is completely dead at a program point if the variable on the left side is dead at this program point. The desired result of this transformation is shown in Fig. 1.59.




Fig. 1.58 An assignment to a partially dead variable


Fig. 1.59 The desired optimization for the program of Fig. 1.58

Delaying an assignment x ← e must not change the semantics of the program.


A sufficient condition for semantics preservation constrains the edges over which
the assignment is shifted forward. None of those edges must change any variables
occurring in the assignment; neither x nor any variable in Vars(e), and none of their
labels may depend on the variable x. To guarantee profitability of the transformation
we additionally require that when a merge point of the control flow is reached,
i.e., when two edges meet, the assignment must be shifted over both edges to this
merge point. To formalize these two requirements we define an analysis of delayable
assignments. Like in the analysis of very busy assignments we use the complete
lattice 2^Ass of all assignments x ← e, where x does not occur in e. The abstract edge
effects remove those assignments that cannot be delayed over the edge and add the
one that is computed at the edge if any. This latter assignment is a newly delayable
assignment. The abstract edge effects are defined by:
[[x ← e]]♯ D  =  D \ (Ass(e) ∪ Occ(x)) ∪ {x ← e}    if x ∉ Vars(e)
              =  D \ (Ass(e) ∪ Occ(x))              if x ∈ Vars(e)

where Ass(e) is the set of assignments to variables occurring in e, and Occ(x) is the set of assignments in which x occurs. Using these conventions we define the rest of the abstract edge effects:


[[x ← M[e]]]♯ D      = D \ (Ass(e) ∪ Occ(x))
[[M[e1] ← e2]]♯ D    = D \ (Ass(e1) ∪ Ass(e2))
[[Zero(e)]]♯ D       = [[NonZero(e)]]♯ D = D \ Ass(e)
There are no delayable assignments at the entry point of the program. The initial value for the analysis therefore is D0 = ∅. As partial order we choose the superset relation ⊇, since an assignment can only be delayed up to a program point if it can be shifted there over all paths reaching that program point.
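A small Python sketch (ours, not the book's code) of these edge effects; the pair encoding of assignments and the example values are illustration assumptions.

# An assignment is a pair (lhs, frozenset of right-side variables); Ass is the universe.
def occ(x, Ass):
    return {a for a in Ass if x == a[0] or x in a[1]}

def ass(vars_e, Ass):
    return {a for a in Ass if a[0] in vars_e}

def delay_assign(D, a, Ass):                  # [[x <- e]]#
    x, rhs_vars = a
    D = D - (ass(rhs_vars, Ass) | occ(x, Ass))
    return D | {a} if x not in rhs_vars else D

def delay_store(D, vars_e1, vars_e2, Ass):    # [[M[e1] <- e2]]#
    return D - (ass(vars_e1, Ass) | ass(vars_e2, Ass))

# The situation of Fig. 1.58: T <- x+1 becomes delayable after its edge,
# and stops being delayable at the store M[x] <- T, which uses T.
t = ("T", frozenset({"x"}))
D1 = delay_assign(set(), t, {t})
print(D1)                                     # {('T', frozenset({'x'}))}
print(delay_store(D1, {"x"}, {"T"}, {t}))     # set()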
For the transformation rules to come we assume that D[ . ] and L[ . ] are the least
solutions of the systems of inequalities for delayable assignments and the liveness
of variables. Since we need the abstract edge effects of both analyses for the formulation of the applicability condition of the transformations, they are identified by the
subscripts D and L, respectively.
Transformation PDE for the Empty Statement:
(diagram: an edge from u to v labeled ';' is relabeled with the sequence ss)

This edge receives all assignments that cannot be moved beyond its target node v, but whose left side is live at v. The sequence of assignments ss to move there consists of all x ← e ∈ D[u] \ D[v] with x ∈ L[v].
Transformation PDE for Assignments:
The assignment y ← e cannot be moved if y ∈ Vars(e). In this case, the transformation is:

(diagram: the edge from u to v labeled y ← e is split into the sequence ss1, followed by y ← e, followed by ss2)

The sequence ss1 collects those useful assignments that cannot be delayed beyond y ← e. The sequence ss2 collects those useful assignments that can be delayed along this edge, but not beyond its target node. Therefore, ss1 is a sequence consisting of the assignments x ← e ∈ D[u] ∩ (Ass(e) ∪ Occ(y)) with x ∈ (L[v] \ {y}) ∪ Vars(e). Furthermore, ss2 is a sequence consisting of the assignments x ← e ∈ D[u] \ (Ass(e) ∪ Occ(y) ∪ D[v]) with x ∈ L[v].
An assignment y ← e satisfying y ∉ Vars(e) can be delayed by the transformation:

(diagram: the edge from u to v labeled y ← e is replaced by the sequence ss1 followed by the sequence ss2)

The sequence ss1 is defined in the same way as in the case of nondelayable assignments. The sequence ss2 is defined analogously: It collects all useful assignments that can be delayed along the edge, but not beyond its target node. Possibly, the sequence ss2 could contain an occurrence of y ← e.
This means that ss1 is a sequence formed out of the assignments x ← e ∈ D[u] ∩ (Ass(e) ∪ Occ(y)) with x ∈ (L[v] \ {y}) ∪ Vars(e). Furthermore, ss2 is a sequence of assignments x ← e ∈ (D[u] \ (Ass(e) ∪ Occ(y)) ∪ {y ← e}) \ D[v] with x ∈ L[v].
Transformation PDE for Conditional Branches:
(diagram: the two condition edges Zero(b) and NonZero(b) from u to v1 and v2 are preceded by an edge labeled ss0 and followed by edges labeled ss1 and ss2, respectively)

The sequence ss0 consists of all useful assignments that are delayable at u, but that cannot be delayed beyond the condition edges. The sequences ssi, i = 1, 2, on the other hand, consist of all useful assignments that can be delayed beyond the condition edges, but not beyond the target node vi.
This means that the sequence ss0 consists of all assignments x ← e ∈ D[u] with x ∈ Vars(b), and the sequences ssi for i = 1, 2, consist of all assignments x ← e ∈ D[u] \ (Ass(b) ∪ D[vi]) with x ∈ L[vi].
Transformation PDE for Load Operations:
(diagram: the edge from u to v labeled y ← M[e] is split into the sequence ss1, followed by y ← M[e], followed by ss2)

We do not present transformations that would delay load operations. Instead, we treat them like nondelayable assignments. This means that the sequence ss1 consists of the assignments x ← e ∈ D[u] ∩ (Ass(e) ∪ Occ(y)) with x ∈ (L[v] \ {y}) ∪ Vars(e). Furthermore, the sequence ss2 consists of all assignments x ← e ∈ D[u] \ (Occ(y) ∪ Ass(e) ∪ D[v]) with x ∈ L[v].
Transformation PDE for Write Operations:
The next transformation rule treats edges labeled with memory-write operations.
These operations are not delayed.
(diagram: the edge from u to v labeled M[e1] ← e2 is split into the sequence ss1, followed by M[e1] ← e2, followed by ss2)

Again, sequences ss1 and ss2 of assignments are required that are placed before and after the original statement, respectively. The sequence ss1 consists of the assignments x ← e ∈ D[u] ∩ (Ass(e1) ∪ Ass(e2)), and the sequence ss2 of the assignments x ← e ∈ D[u] \ (Ass(e1) ∪ Ass(e2) ∪ D[v]) with x ∈ L[v].
According to our assumption, a set X of variables is live at program exit. The last
rule treats the case of a nonempty set X of live variables. Assignments to variables
in X that are delayable at program exit need to be placed just before program exit.
This is done by the following transformation rule:
Transformation PDE for Program Exit:
Let u be the exit point of the original program. Then a new program exit point is
introduced:
(diagram: a new edge labeled ss leads from the original exit point u to the new program exit)

Here ss is the set of assignments x ← e ∈ D[u] with x ∈ X.


Example 1.15.2 Let us return to our introductory example, and let us assume that
no variable is live at program exit. No new program exit needs to be introduced in
this case. The analyses for live variables and for delayable assignments result in:
      L          D
0   {x}        ∅
1   {x, T}     {T ← x + 1}
2   {x, T}     {T ← x + 1}
3   ∅          ∅

The application of transformation PDE transforms the control-flow graph of Fig. 1.58
into the control-flow graph of Fig. 1.59.




Fig. 1.60 A loop without delayable assignments


Fig. 1.61 The inverted loop and the removal of partially dead code

When we want to argue the correctness of transformation PDE we must take


into account that some assignments are removed from the control-flow graph. The
removed assignments are, however, only lost if their left sides are dead at the subsequent program point. Otherwise, they are remembered in the corresponding analysis
information and are moved along the control-flow edges and reinserted into the
control-flow graph when they can no longer be delayed.
An application of transformation PDE may remove some assignments and thus
open new chances to remove newly partially dead code. Like with transformation
PRE it can pay to apply this transformation repeatedly. One question is whether
the transformation may not sometimes decrease the efficiency of the program, for
example, by moving an assignment into a loop.
Example 1.15.3 Consider the loop of Fig. 1.60. The assignment T ← x + 1 is not delayable at any program point. This is different after the loop has been reversed, as shown in Fig. 1.61. The assignment can now be moved past the loop-entry edge. This removes partially dead code.

Transformation PDE did not deteriorate the efficiency of the example program. In fact, it can be proved that it never decreases the efficiency of programs.
Conclusion 1.15.2 We have by now seen a number of optimizing program transformations. Several of these transformations may make another transformation applicable. For instance, transformation RE (removal of redundancies) may introduce inefficiencies, which may be removed by a subsequent application of transformation CE (copy elimination), followed by an application of DE (removal of assignments to dead variables). It is an interesting question in which order to apply optimizing transformations. Here is a meaningful order of the optimizing transformations we have described:

LR   Loop inversion
     Alias analysis
CF   Constant folding
     Interval analysis
RE   Removal of redundant computations
CE   Copy propagation
DE   Elimination of dead assignments
PRE  Removal of partially redundant assignments
PDE  Removal of partially dead assignments

1.16 Exercises
1. Available assignments
Regard the control-flow graph of the function swap of the introduction.
(a) Determine for each program point u the set A[u] of assignments available at u.
(b) Apply transformation RE to remove redundant assignments.
2. Complete lattices
Consider the complete lattice M of monotonic boolean functions with two variables:
(Hasse diagram: 1 is the greatest element, above x ∨ y; x ∨ y lies above both x and y; x and y lie above x ∧ y, which lies above the least element 0)

(a) Determine the set of all monotonic functions that map M to the complete
lattice 2 = {0, 1} with 0 < 1.
(b) Determine the order on these functions.
3. Complete lattices
Show the following claims:
(a) If D1, D2 are complete lattices, then so is
    D1 × D2 = {(x, y) | x ∈ D1, y ∈ D2}
    where (x1, y1) ⊑ (x2, y2) if and only if x1 ⊑ x2 and y1 ⊑ y2.


(b) A function f : D1 × D2 → D is monotonic if and only if the functions
    fx : D2 → D,  fx(y) = f(x, y)   (x ∈ D1)
    fy : D1 → D,  fy(x) = f(x, y)   (y ∈ D2)
    are monotonic.
4. Complete lattices
For a complete lattice D let h(D) = n be the maximal length of a proper ascending chain ⊥ ⊏ d1 ⊏ · · · ⊏ dn. Show that for complete lattices D1, D2 it holds that:
(a) h(D1 × D2) = h(D1) + h(D2)
(b) h(D^k) = k · h(D)
(c) h([D1 → D2]) = #D1 · h(D2), where [D1 → D2] is the set of functions f : D1 → D2 and #D1 is the cardinality of D1.
5. Introduction of temporary variables for expressions
Introduce temporary variables Te for given expressions e, such that the value of
e is stored in Te after each evaluation of e.
(a) Define a program transformation that introduces these temporary variables.
Argue that this transformation does not change the semantics of programs.
(b) What influence does this transformation have on the removal of redundant
computations RE?
(c) For which expressions is this introduction of temporary variables profitable?
How can the number of variable-to-variable copies be decreased by a subsequent transformation?
(d) How can the transformation PRE profit from this introduction of temporary
variables?
6. Available memory reads
Extend the analysis and transformation of available assignments such that the
new analysis also determines the availability of memory reads x ← M[e].
7. Complete lattices
Let U be a finite set and D = 2^U the powerset over U, ordered by ⊑ = ⊆.
Let F be the set of all functions f : D → D of the form
    f x = (x ∩ a) ∪ b
with a, b ∈ D. Show:
(a) F contains the identity function, and it has a least and a greatest element;
(b) F is closed under composition, ⊔, and ⊓;
(c) A (postfix) operation * can be defined on F by:
    f* x = ⊔ { f^j x | j ≥ 0 }


8. Fixed-point iteration
Consider a system of inequalities of the form:
    xi ⊒ fi(xi+1), where fi is monotonic, for i = 1, . . . , n
Show:
(a) The fixed-point iteration terminates after at most n iterations.
(b) One round of a round-robin iteration suffices, given a suitable order of the variables.
9. Dead variables
Define a program analysis that determines for each program point the set of dead
variables directly, that is, not as the complement of the set of live variables.
(a) Define the associated lattice.
(b) Define the associated abstract edge effects.
(c) Extend the analysis to an analysis of definite deadness.
How could one prove the correctness of the analysis?
10. Distributivity I
Let f 1 , f 2 : D D be two distributive functions over a complete lattice D.
Show:
(a) f1 ∘ f2 is also distributive;
(b) f1 ⊔ f2 is also distributive.
11. Distributivity II
Prove Theorem 1.7.1.
12. Distributivity III
Consider the function
    f(X) = (a ∈ X) ? A : B
with A, B ⊆ U for some universe U.
(a) Show that f is distributive both w.r.t. the ordering ⊆ and the ordering ⊇ on
the set 2^U, whenever B ⊆ A.
(b) Show for the ordering ⊆ that f is completely distributive, if B = ∅.
(c) Show for the ordering ⊇ that f is completely distributive, if A = U.
13. Optimization of function swap
Apply first the optimization RE, then the optimizations CE and DE to the example program swap!
14. Constant propagation: signs
Simplify constant propagation such that it only considers the signs of values.
(a) Define an appropriate partial order for this analysis.
(b) Define the description relation Δ.
(c) Define appropriate abstract operators on the abstract values.


(d) Prove that your operators respect the description relation Δ.
(e) Define abstract edge effects for the condition edges. Argue that these are
correct.
15. Constant propagation: excluded values
Extend constant propagation in such a way that the new analysis determines not
only definite values for variables, but also definitely excluded values.
Consider, e.g., a conditional:
if (x = 3) y ← x;
else z ← x;
Variable x can definitely not have the value 3 in the else part.
(a) Define an adequate partial order on values.
(b) Define the associated description relation Δ.
(c) Define meaningful abstract operators on the values.
(d) Prove that the abstract operators respect the description relation Δ.
(e) Strengthen the abstract edge effects for particular conditions and prove their
correctness.

16. Constant propagation: memory cells


Extend constant propagation such that the contents of some memory cells are
also tracked.
(a) Define the new abstract states.
(b) Define the new abstract edge effects for edges with load and store operations.
(c) Argue for the correctness of these new abstract edge effects.
17. Stripes
A generalization of constant propagation is obtained when sets of integers with
more than one element are not abstracted just to the unknown value ⊤, but to
a common linear progression. Such a progression is called a stripe. The single
value 3 then could be described by the linear progression 3 + 0·λ, while the
elements from the set {1, 3, 7} all could be described by the linear progression
1 + 2·λ.
In general, the elements of the stripes domain are given by:
    {(a, b) | 0 ≤ a < b} ∪ {(a, 0) | a ∈ Z}
where the description relation Δ between integers and stripes is given by
    z Δ (a, b)  iff  z = a + λ·b for some λ ∈ Z.
(a) Define a natural ordering ⊑ on stripes such that z Δ (a, b) implies
z Δ (a′, b′) whenever (a, b) ⊑ (a′, b′).
(b) Show that the partial order of stripes has no infinite strictly ascending chains.
(c) Define a least upper bound operation on stripes and show that every nonempty
set of stripes has a least upper bound.


(d) Define abstract versions of the arithmetic operators + and for stripes.
(e) Use that to create an analysis in the style of constant propagation of the
stripes.
18. Interval operators
Define the abstract operations ! (negation) and ≠ (inequality).
19. Description relation for intervals
Prove that the abstract multiplication for intervals respects the description relation Δ, i.e., show that z1 Δ I1 and z2 Δ I2 together imply that
    (z1 · z2) Δ (I1 ·♯ I2)
20. Interval analysis: refined widening
Define a partial order on intervals that makes it possible to modify the lower as
well as the upper bounds at most r times.
Define the description relation for this abstract domain. Define a new widening.
21. Interval analysis: termination
Give an example program for which the interval analysis does not terminate
without widening.
22. Alias analysis
Consider the following program:
for (i 0; i < 3; i++) {
R new();
R[1] i;
R[2] l;
l x;
}
Apply the point-to analysis and the alias analysis of Sect. 1.11 to this program.
23. Alias analysis: semantics
In Sect. 1.11, we introduced an instrumented operational semantics to prove the
correctness of the points-to analyses. Prove that this instrumented semantics is
equivalent to the natural operational semantics for programs with dynamic
allocation of memory blocks. To do this, formalize an equivalence notion that
relates the different sets of addresses employed by the two semantics, though
not globally, but for each concrete program execution.
24. Alias analysis: number of iterations
Show that the least fixed-point is already reached in exactly one iteration over
all edges in the equality-based alias analysis of Sect. 1.11.
25. Points-to analysis with Stripes
The precision of points-to analyses can be improved by an accompanying analysis of stripes (see Exercise 17), considering blocks identified through abstract
addresses not monolithically, but distinguishing an access A[e1] from an access
A[e2] if the stripes corresponding to the index expressions e1 and e2 do not intersect. Assume, e.g., that the stripe corresponding to e1 is (1, 2), while the stripe
corresponding to e2 is (0, 4). Since 1 + 2·λ1 ≠ 0 + 4·λ2 for all λ1, λ2 ∈ Z, these
accesses definitely go to different memory locations.
Assume for the following that for every program point v, an assignment S[v]
from the int variables of the program to stripes is given. The goal is to design a
refined points-to analysis which, for each abstract address l, maintains a modulus
b together with a mapping:
    l : {0, . . . , b − 1} → 2^Val
(a) Define a partial ordering on such descriptions of heaps. How can a description relation be defined? What is the least upper bound operation between
two such abstract heaps?
(b) Define abstract effects of the various statements. How do you interpret
the new statement? What about reads from and writes to memory? Design
accesses in such a way that the modulus b for an abstract location l is the
gcd of all possible accesses to l.
(c) Use your analysis to infer more precise may alias information for expressions
A[e] and A [e ] occurring at different program points.
26. Worklist iteration
Perform worklist iteration for the computation of available assignments for the
factorial program. Determine the number of executed evaluations of right sides.
27. Available memory look-ups
Assume that we are given an equivalence relation on pointer variables which
equates variables R1 , R2 , if they may point to the same block in memory.
(a) Provide a refinement of availability analysis of expressions in variables to
memory reads and memory writes, which takes the alias information into
account.
(b) Use this analysis to extend redundancy elimination to memory operations.
(c) Apply your transformation to the body of the swap function from the introduction!
(d) Can your idea be extended also to partial redundancy elimination? How?
(e) Would your transformation benefit from stripes information (see Exercise
25)? Why?
28. Loop-invariant code
Perform code motion of loop-invariant code in the following program:
for (i 0; i < n; i++) {
b a + 2;
T b + i;
M[T ] i;
if ( j > i) break;
}


Could the loop-invariant code also be moved if the condition if (j > i) . . . is
located at the beginning of the loop body? Justify your answer.
29. Loop-dominated programs
A program is called loop dominated if each loop has exactly one entry point,
i.e., one program point that dominates all program points in the loop.
(a) Prove that in loop-dominated programs the set of entry points of loops is a
feedback vertex set of the control-flow graph.
(b) Transform the loop in the example program for interval analysis into a do-while loop.
(c) Perform interval analysis without narrowing on the transformed program.
Compare the result with the result of Sect. 1.10.

1.17 Literature
The foundations of abstract interpretation were laid by Cousot and Cousot in 1977.
Interval analysis is described for the first time in Cousot and Cousot (1976). This
article also describes the widening and narrowing techniques. A precise interval analysis without widening and narrowing is described in Gawlitza and Seidl
(2007). Monotone analysis frameworks are introduced by Kam and Ullman in 1976,
1977. Program analyses with distributive edge effects are considered by Kildall in
1973. Giegerich et al. develop the strengthening of liveness analysis to true liveness
(1981).
The recursive fixed-point algorithm was formally proven correct in Coq (Hofmann et al. 2010a, b). The presentation of partial-redundancy elimination and partial
dead-code elimination follows work by Knoop, Rüthing, and Steffen (Knoop et al.
1994a, b; Knoop 1998).
Karr (1976) and Granger (1991) present generalizations of constant propagation
to the analysis of linear equalities. Their approaches are extended to interprocedural
versions in Müller-Olm and Seidl (2004, 2005, 2007). Cousot and Halbwachs (1978)
introduce an analysis of linear inequalities between variables. Practical applications
in the analysis of C programs are extensively discussed by Simon in 2008.
Our rather brief overview only discusses very simple analyses of dynamic data
structures. The alias-analysis problem for the programming language C is a real
challenge, and is even more difficult if pointer arithmetic is considered. The simple
methods described here follow the works of Steensgaard (1996) and Anderson et al.
(2002). Fähndrich et al. present interprocedural extensions (2000), as do Liang et al.
(2001). Ramalingam (2002) deals extensively with loop structures in control-flow
graphs. He gives axiomatic and constructive definitions of loop structures. Sagiv et
al. develop elaborate techniques for the analysis of programs with dynamic memory
allocation and linked structures in the heap (Sagiv et al. 1999, 2002). These analyses
are of high complexity, but very powerful. They may automatically derive statements
about the shape of linked data structures.

Chapter 2

Interprocedural Optimization

2.1 Programs with Procedures


In this chapter, we extend our programming language by procedures. Procedures
have declarations and calls. Procedure calls have a complex semantics: The actual
computation is interrupted at the call and resumed when the called procedure has
terminated. The body of the procedure often forms a new scope for local names, i.e.,
names that are only visible within the body. Global names are possibly hidden by
local names. Upon a procedure call, actual parameters are passed to formal parameters
and, upon termination, results are passed back to the caller. Two different names for
a variable coexist if the variable is passed to a reference parameter.
Our programming language has global and local variables. In this chapter,
we use the naming convention that names of local variables begin with uppercase letters, while names of global variables begin with lowercase letters. Our
interprocedural optimizations can only be applied if the called procedures can
be statically determined. Therefore, we assume that at each call site the called
procedure is explicitly given. In more expressive programming languages such
as C, it is not always possible to determine statically which procedure is called,
because the call goes indirectly through a function pointer. The same holds for object-oriented languages such as Java or C#, where the dynamic type of an object decides
which method is actually called. For these languages, an extension of the proposed
techniques is required.
For simplicity, our programming language does not provide parameters for procedures or mechanisms for returning results. This is not as restrictive as it might
seem, since call-by-value parameters as well as the passing of result values can be
simulated through local and global variables, see Exercise 1. Since only procedures
without parameters are considered, the only extension of the programming language
regarding statements is the procedure call, which has the form f().
A procedure f has a declaration:
f() { stmt }
Fig. 2.1 The factorial program with its procedures

Each procedure f has one entry point, startf , to which control is passed from the
caller, and one exit point, stopf , from which control is passed back to the caller. We
forbid outgoing edges from exit points of procedures to ensure deterministic program
execution. Program execution starts with the call to a dedicated procedure main ().
Example 2.1.1 We consider the factorial function, implemented by a procedure f
and a procedure main calling f after setting the global variable b to 3.
main() {
    b ← 3;
    f();
    M[17] ← ret;
}

f() {
    A ← b;
    if (A ≤ 1) ret ← 1;
    else {
        b ← A − 1;
        f();
        ret ← A · ret;
    }
}
The global variables b and ret hold the actual parameter and the return value of f,
respectively. The formal parameter of f is simulated by the local variable A, which
as first action in the body of procedure f receives the value of the actual parameter
b.

Programs in our extended programming language can be represented by sets of
control-flow graphs, one control-flow graph per procedure. Figure 2.1 shows the two
control-flow graphs for the program of Example 2.1.1.


2.2 Extended Operational Semantics


In order to argue about the correctness of programs and of program transformations,
we need to extend our semantics of control-flow graphs to a semantics of control-flow
graphs containing procedure calls. In the presence of procedure calls, executions
of programs can no longer be described by paths, but need nested paths for their
description. The actual nesting corresponds to the sequence of procedure calls. Each
procedure call corresponds to an opening parenthesis, the return to the call site corresponds to the corresponding closing parenthesis. The sequence of already opened,
but not yet closed parentheses can be represented by a stack, the call stack. Call
stacks, therefore, form the foundation of the operational semantics of our programming
language with procedures. The operational semantics is defined by a one-step computation relation, ⊢, between configurations. A configuration is a triple consisting of a
call stack, stack, a binding of the global variables, globals, and a contents of memory,
store. A call stack consists of a sequence of stack frames, one for each entered but
not yet terminated procedure. Each stack frame consists of a program point; this is
the one in the associated procedure to which the computation has progressed, and a
local state, that is, a binding of the local variables of the procedure:
configuration  ==  stack × globals × store
globals        ==  Glob → Z
store          ==  N → Z
stack          ==  frame · frame*
frame          ==  point × locals
locals         ==  Loc → Z

Glob and Loc denote the set of global and local variables of the program, and point
denotes the set of program points.
Stacks grow from the bottom upwards in our graphical representation. The stack
frame of the actual procedure is always on top. Sequences of stacks representing
progress of a computation are listed from left to right. The stack of a caller is immediately to the left of the stack of the callee. The execution of the factorial program
of Example 2.1.1 yields a sequence of call stacks (see Fig. 2.2).

Computations
The steps of a computation always refer to the actual procedure. In addition to the
steps already known from programs without procedures, we need the following new
cases:


Fig. 2.2 A sequence of call stacks for the program of Example 2.1.1

call k = (u, f(), v):
    (σ · (u, ρLoc), ρGlob, μ)  ⊢  (σ · (v, ρLoc) · (startf, ρf), ρGlob, μ)      startf entry point of f

return from a call:
    (σ · (v, ρLoc) · (stopf, _), ρGlob, μ)  ⊢  (σ · (v, ρLoc), ρGlob, μ)        stopf exit point of f
Mapping ρf binds the local variables to their values at the entry to procedure f.
Program execution would in principle be nondeterministic if these values were left
undefined and therefore could be arbitrary. To avoid this complication, we assume
that local variables are always initialized to 0, i.e., ρf = {x ↦ 0 | x ∈ Loc}. This
variable binding is given the name 0.
These two new transitions can be expressed by the two functions enter and
combine, whose abstractions are the basis for interprocedural analysis. Both are
applied to execution states of the program consisting of
1. a function ρLoc binding local variables to their values,
2. a function ρGlob binding global variables to their values, and
3. a function μ describing the contents of memory.
The effect of calling a procedure is described by the function enter. It computes the
state after entry into the procedure from the state before the call. It is defined by:
    enter(ρLoc, ρGlob, μ) = (0, ρGlob, μ)
Its arguments in an application are the execution state before the call.
The function combine determines the new state after the call by combining the
relevant parts of the state before the call, namely the binding of the caller's local
variables, with the effect the called procedure had on the global variables and the
memory.





    combine((ρLoc, ρGlob, μ), (ρ′Loc, ρ′Glob, μ′)) = (ρLoc, ρ′Glob, μ′)
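The two functions can also be pictured directly in code. The following OCaml fragment is only an illustrative sketch and not part of the book's semantics; the record type state and all function names are our own assumptions. It models the three components of an execution state and shows how enter and combine rearrange them.

    (* sketch: an execution state with locals, globals and memory *)
    type vars  = string
    type state = {
      locals  : vars -> int;   (* rho_Loc  *)
      globals : vars -> int;   (* rho_Glob *)
      store   : int -> int;    (* mu       *)
    }

    (* the variable binding called 0 in the text: every local variable is 0 *)
    let zero_locals : vars -> int = fun _ -> 0

    (* enter: the callee starts with fresh locals but the caller's globals and memory *)
    let enter (s : state) : state = { s with locals = zero_locals }

    (* combine: locals of the caller, globals and memory as left by the callee *)
    let combine (before_call : state) (after_body : state) : state =
      { locals  = before_call.locals;
        globals = after_body.globals;
        store   = after_body.store }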

A sequence of computation steps, i.e., a computation, can equivalently be represented
by the sequence of edges as in the intraprocedural case, into which the subcomputations of calls are embedded. In order to indicate the nesting structure, we add two
kinds of new labels, ⟨f⟩ for the call to procedure f, and ⟨/f⟩ for the exit from this
procedure. In the example, we have the following sequence:

⟨main⟩ 0, 1 ⟨f⟩ 5, 6, 7 ⟨f⟩ 5, 6, 7 ⟨f⟩ 5, 6, 7 ⟨f⟩ 5, 9, 10 ⟨/f⟩ 8, 10 ⟨/f⟩ 8, 10 ⟨/f⟩ 8, 10 ⟨/f⟩ 2, 3 ⟨/main⟩
where we have listed only the traversed program points instead of the full edges
(u, lab, v) to improve readability.

Same-Level Computations
A computation that leads from a configuration ((u, ρLoc), ρGlob, μ) to a configuration ((v, ρ′Loc), ρ′Glob, μ′) is called same-level since in such a computation each
entered procedure is also exited, which means that the height of the stack at the
end of the computation is exactly the height it had at the beginning. This means
that the nested computation path contains for every opening parenthesis ⟨f⟩ also the
corresponding closing parenthesis ⟨/f⟩. Since the same-level computation does not
consult any stack frame in the stack below the initial one, it gives rise to computations leading from a configuration (σ · (u, ρLoc), ρGlob, μ) to the configuration
(σ · (v, ρ′Loc), ρ′Glob, μ′) for every call stack σ.
Let us assume that a same-level computation sequence π leads from a configuration ((u, ρLoc), ρGlob, μ) to a configuration ((v, ρ′Loc), ρ′Glob, μ′). This means that
this computation starts at program point u and reaches program point v, possibly with intervening procedure calls, if the variables have the values as given by
ρ = ρLoc ⊎ ρGlob and the memory's content is described by μ. Here, ⊎ denotes the
disjoint union of two functions. The values of the variables and the content of memory
after the computation are described by ρ′ = ρ′Loc ⊎ ρ′Glob and μ′. The same-level computation π therefore defines a partial function [[π]], which transforms (ρLoc, ρGlob, μ)
into (ρ′Loc, ρ′Glob, μ′). This transformation can be determined by induction over the
structure of the computation π:


[[π k]] = [[k]] ∘ [[π]]                            for a normal edge k
[[π1 ⟨f⟩ π2 ⟨/f⟩]] = H([[π2]]) ∘ [[π1]]            for a procedure f
where the state transformation caused by the computation effected by a procedure's
body is translated into one for the caller of that procedure by the operator H(·):
    H(g)(ρLoc, ρGlob, μ) = combine((ρLoc, ρGlob, μ), g(enter(ρLoc, ρGlob, μ)))
Besides the concept of same-level computations we also need the notion of u-reaching
computations. These start in a call of procedure main and lead to a call stack whose
topmost stack frame contains program point u. Such computations can again be
described as sequences of edges that contain labels ⟨f⟩ and ⟨/f⟩ for procedure entry
and exit. Each such sequence is of the form:
    π = ⟨main⟩ π0 ⟨f1⟩ π1 . . . ⟨fk⟩ πk
for procedures f1, . . . , fk and same-level computations π0, π1, . . . , πk. Each such
sequence π causes the transformation [[π]] of a triple (ρLoc, ρGlob, μ) consisting of
variable bindings and a memory contents before the computation into a triple after
the execution of π. We have
    [[π ⟨f⟩ π′]] = [[π′]] ∘ enter ∘ [[π]]
for a procedure f and a same-level computation sequence π′.

Implementation of Procedures
The given operational semantics is closely related to an actual implementation of
procedures. Therefore, the semantics is useful to estimate the effort needed by a
procedure call.
Steps to be done before entering a procedure body:

• allocation of a stack frame;
• saving the local variables;
• saving the continuation address;
• jump to the code for the procedure body.
Steps to be taken upon termination of a procedure call:
• release of the stack frame;
• restoration of the local variables;
• jump to the continuation address.
Saving and restoring local variables is easy for a stack-based implementation. Realistic implementations on real machines, however, make use of the registers of the


machine to have fast access to the values of local variables. The values of local variables of the caller may have to be stored to memory when control switches from the
caller to the callee, and then reloaded into registers when control returns from the
callee to the caller. While access to the values of local variables during the execution
of a procedure body is fast, saving and restoring the values may require considerable
effort.

2.3 Inlining
The first idea to reduce the overhead of procedure calls is to place a copy of the
procedures body at the call site. This optimization is called inlining.
Example 2.3.1 Consider the program:
abs() {
    b1 ← b;
    b2 ← −b;
    max();
}

max() {
    if (b1 < b2) ret ← b2;
    else ret ← b1;
}

Inlining the body of the procedure max () results in the program:


abs() {
    b1 ← b;
    b2 ← −b;
    if (b1 < b2) ret ← b2;
    else ret ← b1;
}

The transformation Inlining for procedures is made simple by the simulation of
parameter passing through the use of local and global variables. However, even the
treatment of parameterless procedures offers some problems.
Inlining can only be done at call sites where the called procedure is statically
known. This is not always the case as exemplified by programming languages such as
C allowing indirect calls through function pointers. Object-oriented languages offer
the problem that the method to call depends on the run-time type of the corresponding
object. Additional analyses may be required to restrict the set of potentially called
functions or methods. Inlining is only possible if the set of functions that can be
called at this call site can be constrained to exactly one function.
Furthermore, it must be ensured that the body of the inlined procedure does not
modify the local variables of the calling procedure. This can be achieved by renaming


Fig. 2.3 The call graphs of Examples 2.1.1 and 2.3.1

the local variables of the called procedure. Still, there exists the threat of code
explosion if a procedure is called multiple times. Even worse, complete inlining
for recursive procedures does not terminate. Recursion must therefore be identified
in procedure calls before inlining can be applied. This can be done using the call
graph of the program. The nodes in this graph are the procedures of the program. An
edge leads from a procedure p to a procedure q if the body of p contains a call to
procedure q.
Example 2.3.2 The call graphs for the programs of Examples 2.1.1 and 2.3.1 are
quite simple (see Fig. 2.3). In the first example, the call graph consists of the nodes
main and f, and procedure main calls procedure f, which calls itself. In the second
example, the call graph consists of the two nodes abs and max and the edge leading
from abs to max.

There are various strategies to decide for a procedure call whether to inline the
procedure or not:
• Inline only leaf procedures; these are procedures without any calls.
• Inline at all nonrecursive call sites; these are call sites that are not part of strongly
  connected components of the call graph (a sketch of such a recursion check is given below).
• Inline up to a certain fixed depth of nested calls.
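The recursion check needed for the second strategy can be sketched as follows. This OCaml fragment is only an illustration under our own assumptions (the call graph is given as an adjacency list over procedure names); it tests whether a callee can reach itself in the call graph and therefore may not be inlined completely.

    module CG = Map.Make (String)

    type call_graph = string list CG.t    (* procedure -> procedures it calls *)

    (* all procedures reachable from p by following at least one call edge *)
    let reachable (cg : call_graph) (p : string) : string list =
      let succs q = try CG.find q cg with Not_found -> [] in
      let rec dfs visited q =
        if List.mem q visited then visited
        else List.fold_left dfs (q :: visited) (succs q)
      in
      List.fold_left dfs [] (succs p)

    (* a procedure is recursive iff it can call itself, directly or indirectly *)
    let is_recursive (cg : call_graph) (f : string) : bool =
      List.mem f (reachable cg f)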
Other criteria are possible as well. Once a procedure call is selected for inlining the
following transformation is performed:
Transformation PI:
The call edge (u, f(), v) is replaced by assignments A ← 0 for all local variables A ∈ Loc of the inlined procedure, followed by a copy of the body of f, followed by an edge labeled with the empty statement ; that leads to v.


The edge labeled with the empty statement can be avoided if the stop node of f has
no outgoing edges. Initializations of the local variables are inserted into the inlined
procedure because our semantics requires these to receive initial values.

2.4 Tail-Call Optimization


Consider the following example:
f() {
    if (b2 ≤ 1) ret ← b1;
    else {
        b1 ← b1 · b2;
        b2 ← b2 − 1;
        f();
    }
}
The last action to be performed in the body of the procedure is a call. Such a call
is known as a last call. Last calls do not need a separate frame on the call stack.
Instead, they can be evaluated in the same frame as the caller. For that, the local
variables of the caller must be replaced by the local variables of the called procedure
f. Technically, this means that the call of procedure f is replaced by an unconditional
jump to the beginning of the body of f. If the last call is a recursive call, then this tail
recursion is transformed into a loop, i.e., into iteration. In the example, this looks like:
f() {
    _f : if (b2 ≤ 1) ret ← b1;
         else {
             b1 ← b1 · b2;
             b2 ← b2 − 1;
             goto _f;
         }
}
Transformation LC:
The last-call edge (u, f(), v) is replaced by assignments A ← 0 for all local variables A ∈ Loc, followed by an unconditional jump to the entry point of procedure f.

According to our semantics, the local variables must be initialized with 0 before the
procedure's body can be entered.
Tail-call optimization is particularly important for declarative programming languages, which typically do not offer loops. Iterative computations have to be


expressed by recursion, which is more expensive than iteration because of the allocation and deallocation of stack frames. An advantage of tail-call optimization over
inlining is that it avoids code duplication. Last-call optimization can also be applied
to nonrecursive tail calls. A somewhat disturbing feature of tail-call optimization
is that it introduces jumps out of one procedure into another one, a concept that is
shunned in modern high-level programming languages.
The reuse of the actual stack frame for the procedure called in a tail call is only
possible if the local variables of the caller are no longer accessible. This is the case in
our programming language. The programming language C, however, permits access
to local variables at arbitrary positions in the call stack through pointers. Similar
effects are possible if the programming language provides parameter passing by reference. To apply tail-call optimization, the compiler then has to ensure that the
local variables of the caller are indeed no longer accessible. An analysis determining
sets of possibly accessible variables is the subject of Exercise 2.

2.5 Interprocedural Analysis


The analyses presented so far can only analyze single procedures. Applying such
analyses to each procedure in a program has the advantage that complexity only grows
linearly with the number of procedures. These techniques also work on separately
compiled pieces of programs. The price to be paid is the limited precision that can be
achieved. The analyses have little information at procedure boundaries and, therefore,
must assume the worst. This means that typically no information about variables and
data structures that the procedure might possibly access is available. For constant
propagation, it means that only the values of local variables can be propagated. These
are definitely not accessible from the outside.
However, with separate compilation, several related procedures may be compiled
together since, for instance, they are defined in the same file or are parts of the
same module or class. For this reason techniques are required that allow to analyze
programs consisting of several procedures. We will, as an example, generalize an
intraprocedural analysis to an interprocedural analysis. The considered example is
copy propagation (described in Sect. 1.8). This analysis determines for a given variable x at each program point the set of variables that definitely contain the value last
assigned to x. Such an analysis makes sense also across procedure boundaries.
Example 2.5.1 Consider the following example program:
main() {
    A ← M[0];
    if (A) print();
    b ← A;
    work();
    ret ← 1 − ret;
}

work() {
    A ← b;
    if (A) work();
    ret ← A;
}


Fig. 2.4 The control-flow graphs for Example 2.5.1

Figure 2.4 shows the control-flow graphs associated with this program. Copying
the value of global variable b into local variable A inside procedure work can be
avoided.

The problem with the generalization of the intraprocedural approach to copy propagation is that the interprocedural analysis no longer works on one control-flow graph,
but needs to cope with the effects of possibly recursively called procedures.

2.6 The Functional Approach


Let us assume that we are given a complete lattice D of abstract states as potential
analysis information at program points. The analysis performs a step along a normal
edge k by applying to this information the abstract edge effect [[k]]♯ : D → D
corresponding to the edge label in the analysis. Now we have to deal with edges
labeled with procedure calls.
The essential idea of the functional approach is to view the abstract edge effect of
such a call also as a transformation of analysis information. The abstract edge effect
of a call edge k is, thus, described by a function [[k]]♯ : D → D. The difference to
the abstract edge effects of normal edges is that this function is not known before the
analysis is performed. It is only determined during the analysis of the body of the
procedure.


To realize the analysis we need abstractions of the operations enter and combine
of the operational semantics that are specific for the analysis to be designed. The
abstract operation enter♯ initializes the abstract value for the entry point of the
procedure using the analysis information available at the program point before
the procedure call. The operation combine♯ combines the information obtained
at the exit of the procedure body, its second argument, with the information available before the procedure call, its first argument. These operations therefore have the
functionality:
    enter♯   : D → D
    combine♯ : D² → D
The overall abstract effect of a call edge k then is given by:
    [[k]]♯ D = combine♯(D, [[f]]♯ (enter♯ D))
if [[f]]♯ is the transformation associated with the body of procedure f.
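As a tiny illustration, the abstract effect of a call edge can be written as a higher-order function. The sketch below is ours and merely restates the formula above, with enter♯, combine♯, and the body effect g = [[f]]♯ passed in as parameters; it is also exactly the operator H♯ introduced later in this section.

    (* sketch: abstract effect of a call edge, parameterized by the
       analysis-specific functions enter# and combine# and by the abstract
       effect g of the callee's body *)
    let call_effect ~enter ~combine (g : 'd -> 'd) (d : 'd) : 'd =
      combine d (g (enter d))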

Interprocedural Copy Propagation


Let us consider how the functions enter♯ and combine♯ look in the analysis for
copy propagation. The case of all variables being global is particularly simple: The
function enter♯ is the identity function, that is, it returns its argument. The function
combine♯ returns its second argument.
    enter♯ V = V          combine♯(V1, V2) = V2

Let us now consider an analysis of programs with local variables. The analysis
should determine for each program point the set of variables containing the value
last assigned to the global variable x.
During program execution, a copy of this value may be stored in a local variable
of the caller of a procedure. This local variable is not visible inside the callee. If the
callee assigns a new value to x the analysis cannot assume for any local variable of
the caller that it (still) contains xs value. To determine whether such a recomputation
may take place we add a new local variable ; xs value before the call is recorded in
it. is not modified during the call. The analysis checks whether after return from the
callee, is guaranteed to still hold the value that x had before the call. Technically
it means that function enter  adds the local variable to the set of global variables
that had the value of x before the call. After return from the callee, the local variables
of the caller still contain the last computed value of x if is contained in the set
returned by the analysis of the callee. We define:
    enter♯ V = (V ∩ Glob) ∪ {•}
    combine♯(V1, V2) = (V2 ∩ Glob) ∪ ((• ∈ V2) ? V1 ∩ Loc• : ∅)


where Loc• = Loc ∪ {•}. The complete lattice used for the analysis of interprocedural
copy propagation for a global variable x is:
    V = {V ⊆ Vars• | x ∈ V}
ordered by the superset relation ⊇, where Vars• = Vars ∪ {•}.
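For concreteness, here is a small OCaml sketch of enter♯ and combine♯ for this analysis. It is our own illustration, not code from the book: variables are represented as strings, Glob and Loc are passed in as sets, and the auxiliary variable • is encoded by the reserved name "@dot".

    module VS = Set.Make (String)

    let dot = "@dot"        (* encodes the auxiliary local variable • *)

    (* enter# V = (V ∩ Glob) ∪ {•} *)
    let enter_sharp ~glob (v : VS.t) : VS.t =
      VS.add dot (VS.inter v glob)

    (* combine# (V1, V2) = (V2 ∩ Glob) ∪ ((• ∈ V2) ? V1 ∩ Loc• : ∅) *)
    let combine_sharp ~glob ~loc (v1 : VS.t) (v2 : VS.t) : VS.t =
      let loc_dot     = VS.add dot loc in
      let from_callee = VS.inter v2 glob in
      if VS.mem dot v2
      then VS.union from_callee (VS.inter v1 loc_dot)
      else from_callee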
Abstract Edge Effects of Call Edges
In analogy to the concrete semantics we define, as a first step, for each complete
lattice D, for monotonic edge effects [[k]]♯, and for monotonic functions enter♯ and
combine♯ the abstract transformation effected by a same-level computation:
    [[π k]]♯ = [[k]]♯ ∘ [[π]]♯                          for a normal edge k
    [[π1 ⟨f⟩ π2 ⟨/f⟩]]♯ = H♯([[π2]]♯) ∘ [[π1]]♯          for a procedure f
where the transformation
    H♯ : (D → D) → D → D
is defined by
    H♯ g d = combine♯(d, g(enter♯(d)))

The abstract effect [[f]]♯ for a procedure f should be an upper bound for the abstract
effects [[π]]♯ of each same-level computation π for f, that is, each same-level computation leading from the entry point of f to its exit point. We use a system of inequalities
over the lattice of all monotonic functions in D → D to determine or to approximate
the effects [[f]]♯.
    [[startf]]♯ ⊒ Id                           startf entry point of procedure f
    [[v]]♯ ⊒ H♯([[f]]♯) ∘ [[u]]♯               k = (u, f(), v) call edge
    [[v]]♯ ⊒ [[k]]♯ ∘ [[u]]♯                   k = (u, lab, v) normal edge
    [[f]]♯ ⊒ [[stopf]]♯                        stopf exit point of f

The function [[v]]♯ : D → D for a program point v of a procedure f describes the
effects of all same-level computations that lead from the entry point of f to v.
The expressions on the right side of the inequalities describe monotonic functions.
Thus, the system of inequalities has a least solution. To prove the correctness of the
approach one proves the following generalization of Theorem 1.6.1:
Theorem 2.6.1 Let [[ · ]]♯ be the least solution of the interprocedural system of
inequalities. We then have:
1. [[v]]♯ ⊒ [[π]]♯ for each same-level computation π leading from startf to v, if v is
a program point of procedure f;

128

2 Interprocedural Optimization

2. [[f]]♯ ⊒ [[π]]♯ for each same-level computation π of procedure f.

Theorem 2.6.1 can be proved by induction over the structure of same-level computations π. The theorem guarantees that each solution of the system of inequalities
can be used to approximate the abstract effects of procedure calls.
Problems of the Functional Approach
There are two fundamental problems connected to this approach. First, the monotonic
functions in D → D need to be effectively represented. However, functions occurring
here do not always have simple representations. In the case of finite complete lattices
D all occurring functions can, at least in principle, be represented by their value
tables. This is no longer possible if the complete lattice D is infinite as is the case
for constant propagation. In this case a further complication may arise, namely that
of infinite ascending chains, which never stabilize.
Let us return to our example of interprocedural copy propagation. In this case,
the following observation helps: The complete lattice V is atomic. The set of atomic
elements in V is given by:
    {Vars• \ {z} | z ∈ Vars• \ {x}}
Furthermore, all occurring abstract edge effects are not only monotonic, but are even
distributive with respect to ∩. Instead of considering all monotonic functions in
V → V it suffices to calculate in the sublattice of distributive functions.
Distributive functions over an atomic lattice D have compact representations. Let
A ⊆ D be the set of atomic elements in D. According to Theorem 1.7.1 in Sect. 1.7
each distributive function g : D → D can be represented as
    g(V) = b ⊔ ⊔{h(a) | a ∈ A, a ⊑ V}
for a function h : A → D and an element b ∈ D.

Each distributive function g for the propagation of copies can therefore be represented by at most k sets where k is the number of variables of the program. Thereby,
the height of the complete lattice of distributive functions is bounded by k 2 .
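This representation can be made concrete as follows. The sketch below is our own illustration for the copy-propagation lattice (sets of variables ordered by ⊇, so the join is set intersection): a distributive function is stored as the constant part b together with its value table on the atoms, and applying it joins the table entries of all atoms above the argument.

    module VS = Set.Make (String)

    type dist_fun = {
      b : VS.t;                  (* value of g on the least element, i.e. on Vars•  *)
      h : (VS.t * VS.t) list;    (* pairs (atom a, g a) for all atoms a             *)
    }

    (* g(V) = b ⊔ ⊔{ h(a) | a atomic, a ⊑ V }; in this lattice x ⊑ y means
       x ⊇ y, and the join is set intersection *)
    let apply (g : dist_fun) (v : VS.t) : VS.t =
      List.fold_left
        (fun acc (a, ha) -> if VS.subset v a then VS.inter acc ha else acc)
        g.b g.h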
Example 2.6.1 Let us consider the program of Example 2.5.1. The set Vars• is given
as {A, b, ret, •}. Assume that we want to analyze copy propagation for the global
variable b. The functions
    [[A ← b]]♯ C = C ∪ {A}                                      =: g1(C)
    [[ret ← A]]♯ C = (A ∈ C) ? (C ∪ {ret}) : (C \ {ret})        =: g2(C)
correspond to the assignments A ← b and ret ← A. The two abstract edge effects
g1, g2 can be represented by the two pairs (h1, Vars•) and (h2, Vars•) with


        {b, ret, •}     {b, A, •}     {b, A, ret}
h1      Vars•           {b, A, •}     {b, A, ret}
h2      {b, •}          Vars•         {b, A, ret}
In a first round of a round-robin iteration, program point 7 is associated with the
identity function; program points 8, 9, and 10 with the function g1; and program
point 11 with the composition g2 ∘ g1. This last function is given by:
    g2(g1(C)) = C ∪ {A, ret} =: g3(C)
It delivers the first approximation for the body of function work. Thereby, we obtain
for a call to function work:
    combine♯(C, g3(enter♯(C))) = C ∪ {ret} =: g4(C)
In the second round of a round-robin iteration, the new value at program point 10 is
the function g1 ⊔ g4 ∘ g1 where
    g4(g1(C)) = C ∪ {A, ret}
and consequently:
    g1(C) ⊔ g4(g1(C)) = C ∪ {A} = g1(C)
as in the last iteration. In this example, the fixed point is reached in the first
iteration.


An Interprocedural Coincidence Theorem


A coincidence theorem similar to 1.6.3 can be proved for analyses with distributive
edge effects. An additional condition needed for this generalization is that the edge
effect [[k]]♯ of a call edge k = (u, f(), v) is distributive. This leads to the following
theorem.
Theorem 2.6.2 Let us assume that for each procedure f and each program point v
of f there exists at least one same-level computation from the entry point startf of the
procedure f to v. Let us further assume that all edge effects [[k]]♯ of normal edges as
well as the transformations H♯ are distributive. That is, in particular,
    H♯(⊔ F) = ⊔{H♯(g) | g ∈ F}
for each nonempty set F of distributive functions. We have then for each procedure
f and each program point v of f,
    [[v]]♯ = ⊔{[[π]]♯ | π ∈ Tv}


Here Tv is the set of all same-level computations from the entry point start f of
procedure f to v.
The proof of this theorem is a generalization of the proof of the corresponding theorem for intraprocedural analyses. To easily apply Theorem 2.6.2, we need a
class of functions enter♯ and combine♯ as large as possible with the property that
the transformation H♯ becomes distributive. We observe:
Theorem 2.6.3 Let enter♯ : D → D be distributive and combine♯ : D² → D have
the form:
    combine♯(x1, x2) = h1(x1) ⊔ h2(x2)
for two distributive functions h1, h2 : D → D. Then H♯ is distributive, that is, it
holds that
    H♯(⊔ F) = ⊔{H♯(g) | g ∈ F}
for each nonempty set F of distributive functions.
Proof Let F be a nonempty set of distributive functions. We then have:
    H♯(⊔ F) = h1 ⊔ h2 ∘ (⊔ F) ∘ enter♯
            = h1 ⊔ h2 ∘ (⊔{g ∘ enter♯ | g ∈ F})
            = h1 ⊔ (⊔{h2 ∘ g ∘ enter♯ | g ∈ F})
            = ⊔{h1 ⊔ h2 ∘ g ∘ enter♯ | g ∈ F}
            = ⊔{H♯(g) | g ∈ F}

The second equation holds because composition is distributive in its first argument,
and the third equation holds since composition is distributive in its second argument,
provided the first argument is a distributive function.

Let us recall the definitions of the two functions enter♯ and combine♯ for the propagation of copies. We had:
    enter♯ V = (V ∩ Glob) ∪ {•}
    combine♯(V1, V2) = (V2 ∩ Glob) ∪ ((• ∈ V2) ? V1 ∩ Loc• : ∅)
                     = ((V1 ∩ Loc•) ∪ Glob) ∩ ((V2 ∩ Glob) ∪ Loc•) ∩ (Glob ∪ ((• ∈ V2) ? Vars• : ∅))
The function enter♯ therefore is distributive. Likewise, the function combine♯ can
be represented as the intersection of a distributive function of the first argument
with a distributive function of the second argument. Theorem 2.6.3 can therefore be
applied for the order ⊇. We conclude that the transformation H♯ for copy propagation
is distributive for distributive functions. Therefore, the interprocedural coincidence
theorem (Theorem 2.6.2) holds.


2.7 Interprocedural Reachability


Let us assume that we have determined, in a first step, the abstract effects [[f]]♯ of
the bodies of procedures f, or at least safely approximated them. In a second step,
we would like to compute for each program point u a property D[u] ∈ D that is
guaranteed to hold whenever program execution reaches the program point u. We
construct a system of inequalities for this analysis problem:
D[startmain] ⊒ enter♯(d0)
D[startf]    ⊒ enter♯(D[u])                             (u, f(), v) call edge
D[v]         ⊒ combine♯(D[u], [[f]]♯(enter♯(D[u])))     (u, f(), v) call edge
D[v]         ⊒ [[k]]♯(D[u])                             k = (u, lab, v) normal edge

where d0 ∈ D is the analysis information assumed before program execution.


All right sides are monotonic. So, the system of inequalities has a least solution.
To prove the correctness of our approach we define the abstract effect [[π]]♯ : D → D
for each v-reaching computation π. This is in analogy to the concrete semantics. The
following theorem relates the abstract effects of the v-reaching computations to the
abstract values D[v] obtained by solving the system of inequalities.
Theorem 2.7.1 Let D[ . ] be the least solution of the interprocedural system of
inequalities given above. We then have:
    D[v] ⊒ [[π]]♯ d0
for each v-reaching computation π.

Example 2.7.1 Let us regard again the program of Example 2.5.1. We obtain for
program points 0 to 11:
0   {b}              7   {b, •}
1   {b}              8   {b, A, •}
2   {b}              9   {b, A, •}
3   {b}              10  {b, A, •}
4   {b}              11  {b, A, •, ret}
5   {b, ret}
6   {b}

We conclude that the global variable b can be used instead of the local variable A
inside procedure work.

If all program points are reachable, the second phase of the interprocedural analysis
also satisfies a coincidence theorem.
Theorem 2.7.2 Let us assume that for each program point v there exists at least
one v-reaching computation. Let us further assume that all effects [[k]]♯ : D → D


of normal edges as well as the transformation H♯ are distributive. Then we have for
each program point v,
    D[v] = ⊔{[[π]]♯ d0 | π ∈ Pv}
Here, Pv is the set of all v-reaching computations.

2.8 Demand-Driven Interprocedural Analysis


In many practical cases, the complete lattice D is finite, and the required monotonic
functions in D D are compactly representable. This holds in particular for the
interprocedural analysis of available assignments, very busy assignments (see Exercises 4 and 5, respectively) or copy propagation. However, one would like to have
interprocedural analyses for relevant problems where either the complete lattice is
infinite, or the abstract effects do not have compact representations. In these cases, an
observation can be exploited that was made very early by Patrick Cousot as well as by Micha
Sharir and Amir Pnueli: procedures are often only called in constellations that can be
classified into a few abstract constellations. In these cases it may suffice to analyze
only a few abstract calls of each procedure whose values are really needed.
To elaborate this idea, we introduce unknowns D[f, a] (f a procedure, a ∈ D)
which should receive the value of the abstract effect of the body of the procedure
f, given that it has been entered in the abstract state a. Thus, it corresponds to the
value [[f]]♯ a. In order to determine the value of the unknown D[f, a], we introduce
a distinct unknown D[v, a] for every program point v of the procedure f and, at least
conceptually, set up the following system of constraints:
D[v, a] ⊒ a                                                   v entry point of f
D[v, a] ⊒ combine♯(D[u, a], D[f, enter♯(D[u, a])])            (u, f(), v) call edge
D[v, a] ⊒ [[lab]]♯(D[u, a])                                   k = (u, lab, v) normal edge
D[f, a] ⊒ D[stopf, a]                                         stopf exit point of f
The unknown D[v, a] denotes the abstract state when the program point v within a
procedure is reached that was called in the abstract state a. It corresponds to the value
[[v]]♯ a that our analysis of abstract effects of procedures would compute for program
point v. Note that in this system of inequalities the modeling of procedure calls creates
nested unknowns; the value of the inner unknown influences for which second component b the value of the variable D[f, b] is queried. This indirect addressing causes
the variable dependences to change dynamically during the fixed-point iteration. The
variable dependences are, thus, no longer statically known.
This system of inequalities is, in general, very large. Each program point is copied
as often as there are elements in D! On the other hand, we do not want to solve it
completely. Only the values for those calls should be determined that really occur,


that is, whose values are demanded during the analysis. Technically, this means that
the analysis should constrain itself to those unknowns whose values are accessed
during the fixed-point iteration when computing the value D[main, enter♯(d0)].
The demand-driven evaluation of a system of inequalities needs an adequate fixed-point algorithm. The local fixed-point algorithm of Sect. 1.12 can be applied here! It
explores the variable space according to the possibly dynamic dependences between
variables. Assume that the initial abstract state for program entry is given by d0 ∈ D.
When started with an initial query for the value of D[main, enter♯ d0], the local
fixed-point iteration will explore every procedure f only for that subset of abstract
states which are queried when trying to compute the value for the procedure exit of
main, when called with the abstract value enter♯ d0.
Let us demonstrate the demand-driven variant of the functional approach with an
example analysis. We choose the interprocedural constant propagation. The complete
lattice for this analysis is as before:
    D = (Vars → Z⊤)⊥
This complete lattice is of finite height, but not finite. Interprocedural constant propagation might, therefore, not terminate. The functions enter♯ and combine♯ for
constant propagation are given by:


enter♯ D          =  ⊥                                  if D = ⊥
                     D ⊕ {A ↦ ⊤ | A local}              otherwise

combine♯(D1, D2)  =  ⊥                                  if D1 = ⊥ or D2 = ⊥
                     D1 ⊕ {b ↦ D2(b) | b global}        otherwise

where D ⊕ {. . .} denotes the variable binding D updated at the given variables.
enter D

Together with the intraprocedural abstract edge effects for constant propagation we
obtain an analysis that terminates for lattices of finite height D if and only if the
fixed-point algorithm needs to consider only finitely many unknowns D[v, a] and
D[ f, a].
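A small OCaml sketch of these two functions may help; it is our own illustration (variable maps as string maps, a predicate is_local supplied by the caller, and both maps are assumed to bind exactly the program's variables), not the book's implementation.

    module VarMap = Map.Make (String)

    type value  = Top | Const of int
    type astate = Bot | Env of value VarMap.t

    (* enter#: globals are kept, locals become Top *)
    let enter_sharp ~is_local (d : astate) : astate =
      match d with
      | Bot   -> Bot
      | Env m -> Env (VarMap.mapi (fun x v -> if is_local x then Top else v) m)

    (* combine#: locals keep the caller's value, globals take the callee's value *)
    let combine_sharp ~is_local (d1 : astate) (d2 : astate) : astate =
      match d1, d2 with
      | Bot, _ | _, Bot -> Bot
      | Env m1, Env m2  ->
          Env (VarMap.mapi
                 (fun x v -> if is_local x then v else VarMap.find x m2)
                 m1)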
Example 2.8.1 Let us consider a slight modification of the program of Example 2.5.1
with the control-flow graphs of Fig. 2.5. Let d0 be the variable binding:
    d0 = {A ↦ ⊤, b ↦ ⊤, ret ↦ ⊤}
This leads to the following sequence of evaluations:


Fig. 2.5 The control-flow graphs for Example 2.8.1

            A   b   ret
0, d0       ⊤   ⊤   ⊤
1, d0       0   ⊤   ⊤
2, d0       ⊥
3, d0       0   ⊤   ⊤
4, d0       0   0   ⊤
7, d1       ⊤   0   ⊤
8, d1       0   0   ⊤
9, d1       ⊥
10, d1      0   0   ⊤
11, d1      0   0   0
5, d0       0   0   0
6, d0       0   0   1
main, d0    0   0   1

for d1 = {A ↦ ⊤, b ↦ 0, ret ↦ ⊤}. The right side of each unknown is evaluated
at most once in this example.

In the example, the analysis terminates after only one iteration. Only one copy
needs to be considered for each program point. In general, the analysis needs to
calculate with several copies. The lattice does not have infinite ascending chains.
Thus, the analysis terminates if each procedure is called only with finitely many
arguments during the iteration.

Fig. 2.6 The interprocedural supergraph for the example of Fig. 2.4

2.9 The Call-String Approach


An alternative approach for interprocedural static analysis uses an abstraction of a
stack-based operational semantics. The goal is to determine properties of the procedures behaviors differentiated by the set of call strings corresponding to reachable
run-time stacks. Here, the call string of a run-time stack retains the sequence of procedures but ignores the values of locals. In general, the set of all reachable run-time
stacks as well as the set of potential call strings will be infinite. The trick is to keep
distinct information associated with call strings up to a fixed depth d and summarize
the property for call strings underneath the top d elements. This idea was presented
in Sharir and Pnueli's seminal work.
The complexity of this approach increases drastically with the depth d. In practical
applications, call strings of length 1 or even 0 are often used. Using stack depth 0
means to approximate the entry into a procedure f as an unconditional jump to
the beginning of procedure f and the exit of procedure f as a return jump to the
continuation address, that is, the target of the call edge. Connected to the return jump
is a restoration of the values that the local variables of the caller had before the call.
Example 2.9.1 Let us consider again the program of Example 2.8.1. The introduction of the jump edges corresponding to procedure entry and return leads to the
graph of Fig. 2.6. The graph constructed in this way is called the interprocedural
supergraph.

Let D be the complete lattice for the analysis with the abstract edge effects [[lab]]♯
and the functions enter♯ : D → D and combine♯ : D² → D for the treatment of
procedures. We set up the following system of inequalities to determine the invariants

Fig. 2.7 An infeasible path in the interprocedural supergraph of Fig. 2.6

D[v] at each program point v:


D[startmain] ⊒ enter♯(d0)
D[startf]    ⊒ enter♯(D[u])                 (u, f(), v) call edge
D[v]         ⊒ combine♯(D[u], D[f])         (u, f(), v) call edge
D[v]         ⊒ [[lab]]♯(D[u])               k = (u, lab, v) normal edge
D[f]         ⊒ D[stopf]

Example 2.9.2 Let us again regard the program of Example 2.8.1. The inequalities
at the program points 5, 7, and 10 are:
D[5]  ⊒ combine♯(D[4], D[work])
D[7]  ⊒ enter♯(D[4])
D[7]  ⊒ enter♯(D[9])
D[10] ⊒ combine♯(D[9], D[work])

Correctness of this analysis is proved with respect to the operational semantics. Constant propagation finds, in this example, the same results as full constant propagation.
The interprocedural supergraph, however, contains additional paths that the program
can, in fact, never take, so-called infeasible paths. These infeasible paths can impair
the precision of the results. Such an infeasible path is shown in Fig. 2.7.
Only one abstract value is computed for each program point. Termination of the
analysis is thus guaranteed if the used complete lattice has only ascending chains
that eventually stabilize.


For a comparison, let us have a look at the constraint system for call strings of
length 1. This system has the unknowns D[v, γ] where v is a program point and γ
is a call string of length at most 1. Then we define:
D[startmain, ε] ⊒ enter♯(d0)
D[startf, g]    ⊒ enter♯(D[u, γ])                      (u, f(), v) call edge in g
D[v, γ]         ⊒ combine♯(D[u, γ], D[f, g])           (u, f(), v) call edge in g
D[v, γ]         ⊒ [[lab]]♯(D[u, γ])                    k = (u, lab, v) normal edge
D[f, γ]         ⊒ D[stopf, γ]

Example 2.9.3 Let us regard again the program of Example 2.8.1. The inequalities
at the program points 5, 7, 10 are:
D[5, ε]     ⊒ combine♯(D[4, ε], D[work, main])
D[7, main]  ⊒ enter♯(D[4, ε])
D[7, work]  ⊒ enter♯(D[9, work])
D[10, main] ⊒ combine♯(D[9, main], D[work, work])
D[10, work] ⊒ combine♯(D[9, work], D[work, work])

As for call strings of length 0, the number of unknowns of the constraint system is
independent of the complete lattice D of the analysis. Termination of the analysis is,
thus, guaranteed if the used complete lattice has only ascending chains that eventually
stabilize. Since the number of contexts which are distinguished is larger than for the
constraint system for call string 0, the analysis result is potentially more precise.
For constant propagation, e.g., it may find more interprocedural constants than an
analysis with call string 0.

2.10 Exercises
1. Parameter passing
Describe a general method using global variables for passing call-by-value parameters.
Show that global variables can also be used for returning results of a procedure
call.
2. Reference parameter
Extend the example programming language by the possibility to store the address
& A of a local variable A in another variable or in memory.
(a). Design a simple analysis to determine a superset of the local variables whose
address is taken and stored in another variable or in memory.


(b). Improve the precision of your analysis by additionally determining sets of


local variables A whose address & A is taken, but only assigned to local
variables
(c). Explain how your analyses can be used in last-call optimization.
3. Removal of recursion, inlining
Regard the program:
f1() {
    if (n ≤ 1) z ← y;
    else {
        n ← n − 1;
        z ← x + y;
        x ← y;
        y ← z;
        f1();
    }
}

f() {
    x ← 1;
    y ← 1;
    f1();
}

main() {
    n ← M[17];
    f();
    M[42] ← z;
}

Remove the recursion of function f1 . Perform inlining.


4. Interprocedural available assignments
Design an interprocedural elimination of redundant assignments.
• Design an interprocedural analysis of available assignments. For that, take
  into account that there are global as well as local variables. How should the
  abstract functions enter♯ and combine♯ be defined?
• Is your operator H♯ distributive? Is it possible to describe all abstract effects
  of edges in the control-flow graph by means of functions of the form
  f x = (x ∩ a) ∪ b for suitable sets a, b?
• Use the information computed by your availability analysis to remove certain
  redundancies.
5. Interprocedural partial redundancies
Design an interprocedural elimination of partially redundant assignments.
(a). Design an interprocedural analysis of very busy assignments.
How should the abstract functions enter♯ and combine♯ be defined? For
that, take into account that the analysis should be backward! Is your operator
H♯ distributive? Is it possible to describe all abstract effects of edges in the
control-flow graph by means of functions of the form f x = (x ∩ a) ∪ b for
suitable sets a, b?
(b). Use the information computed by your very-busy-assignments analysis to remove
partial redundancies.


2.11 Literature
A first approach to the static analysis of programs with procedures is contained
in the article by Cousot and Cousot (1977). Independent of this work, Sharir and
Pnueli present the functional and the call-string approaches in 1981. This article contains an interprocedural coincidence theorem for procedures without local variables.
A generalization for procedures with local variables is contained in Knoop and Steffen (1992). Fecht and Seidl present in 1999 a discussion of several local fixed-point
algorithms as they are applicable to Prolog programs.
An application of interprocedural analysis methods for the precise analysis of
loops is described by Martin et al. (1998).
An important optimization of object-oriented programs attempts to identify
data objects of fixed size that do not escape from a given method call. These do
not need to be allocated on the heap, but can be allocated directly on the stack
(Choi et al. 1999).
Inlining is very important for object-oriented programs because they often have
many small functions. These consist of only a few statements such that the effort
needed for a call is greater than for the method itself. However, aggressive inlining
finds its limits at dynamic method calls. The static type of an object may deviate
from the run-time type of the object. This means that the method actually applied
may be statically unknown. An interprocedural analysis is used to determine the
exact dynamic type of the object at the call site. For a fast static analysis, as used in
just-in-time compilers, simple, context-insensitive analyses such as Rapid Type
Analysis (Bacon 1997) are used. This analysis only considers the call sites of the program and
ignores all variables and assignments. The precision can be improved at the cost of
efficiency by additionally considering assignments to variables and their types (Sundaresan
et al. 2000).
Our list of program optimizations is by no means complete. We have not discussed
the reduction in operator strength (Paige and Schwartz 1977; Paige 1990; Sheldon
et al. 2003), nor methods dedicated to programs working with arrays or to concurrent
programs.

Chapter 3

Optimization of Functional Programs

In a somewhat naive view, functional programs are imperative programs without
assignments.
Example 3.0.1 Consider the following program fragment written in the functional
language OCaml.
let rec fac2 x y = if y ≤ 1 then x
                   else fac2 (x ∗ y) (y − 1)
in let fac x = fac2 1 x
Some concepts known from imperative languages are missing. There is no sequential
control flow and there are no loops. On the other hand, almost all functions are recursive.
Besides recursive functions, functional languages such as OCaml, Scala, and Haskell offer
further concepts that are rarely provided by imperative languages, such as pattern
matching on structured values, partial application of higher-order functions, and lazy
evaluation of function arguments. The type systems often provide polymorphic types,
and the implementation attempts to determine types by type inference.
To increase portability of the compiler, some implementations of functional programming languages first compile to an imperative language. The Glasgow Haskell
compiler ghc, for example, offers the compilation to C as an option. Any compiler for
C can then be used to produce executable code. Other compilers for functional languages compile directly to some suitable virtual machine. The compiler for Scala
compiles to the Java Virtual Machine, while the compiler for F# generates .NET
instructions. One possibility to optimize functional programs that are compiled
to an imperative intermediate language is to exploit the optimizations offered by the
compiler for that intermediate language. This strategy is not so bad, considering that
compilers for functional languages typically generate nontrivial control flow by
translating sequences of let definitions into sequences of assignments, and
tail calls into unconditional jumps. Both calls of fac2 in our example program, for
example, are tail calls. Ignoring the allocation of all values on the heap, including the
int values, the imperative program generated for the function fac could look like this:
int fac(int x) {
   int a, a1, b, b1;
   a ← 1; b ← x;
   fac2 : if (b ≤ 1) return a;
          else {
             a1 ← a ∗ b; b1 ← b − 1;
             a ← a1; b ← b1;
             goto fac2;
          }
}
The intraprocedural optimizations for imperative programs described so far can,
therefore, also be used to improve functional programs. Assignments to dead variables
can be removed. Constants or copies can be propagated. In the example, the
temporary variables a1, b1 can be eliminated; these variables were introduced for the
evaluation of the arguments of the recursive applications of fac2.
In general, the control flow resulting from the translation of functional programs
into imperative programs is quite confusing, both for human readers and for static
analyses. Better results in the analysis and optimization of functional programs can be
obtained if the specific properties and sources of inefficiency of functional programs
are taken into account.

3.1 A Simple Functional Programming Language


As in the book Compiler Design: Virtual Machines (Wilhelm and Seidl), we restrict
ourselves to a small fragment of the functional programming language OCaml. We
consider expressions e and patterns p according to the following grammar:
e ::= b | (e1, . . . , ek) | c e1 . . . ek | fun x → e
    | (e1 e2) | (□1 e) | (e1 □2 e2)
    | let x1 = e1 in e0
    | let rec x1 = e1 and . . . and xk = ek in e
    | match e0 with p1 → e1 | . . . | pk → ek
    | if e0 then e1 else e2

p ::= b | x | c x1 . . . xk | (x1, . . . , xk)

where b denotes a value of a base type, x a variable, c a data constructor, and □i an
i-place operator that returns values of a base type. Note that all functions are unary.
However, OCaml provides tuples (e1, . . . , ek) of arbitrary length k ≥ 0, which can
be used to implement multiargument functions. Formal parameters x1, . . . , xk are not
listed on the left sides of function definitions. Instead, functional abstraction is always
used: fun x1 → fun x2 → . . . → fun xk → . . .. We also omit function definitions by
cases since these can be equivalently formulated by match expressions. Furthermore,
we assume that all programs are well-typed.
A function max computing the maximum of two numbers looks as follows:

let max = fun x1 → fun x2 → if x1 < x2 then x2
                            else x1

3.2 Some Simple Optimizations


This section presents some simple optimizations for functional programs. The basic
idea behind all of them is to move evaluations from run-time to compile-time.
A function application (fun x → e0) e1 can be rewritten into the let expression
let x = e1 in e0. A case distinction can be optimized if the expression to
which the patterns are compared is already partly known at compile-time. Consider the
expression

match c e1 . . . ek with . . . c x1 . . . xk → e . . .

where all patterns to the left of c x1 . . . xk start with a constructor different
from c. The compiler then knows that only the alternative for c x1 . . . xk may match.
The expression can, therefore, be transformed into

let x1 = e1 in . . . in let xk = ek in e

Both transformations are semantics-preserving and replace more complex program
constructs by let expressions.
A let expression let x = e1 in e0 can be rewritten into e0[e1/x], that is, into the
main expression e0, in which each free occurrence of x is replaced by the expression
e1. This transformation corresponds to β-reduction in the λ-calculus. Care must,
however, be taken that none of the variables occurring free in e1 is bound by the
substitution into e0.
Example 3.2.1 Consider the expression:
let x = 17
in let f = fun y → x + y
in let x = 4
in f x
The variable x that is visible in the definition of f represents the value 17, while the
variable x that is visible in the application f x represents the value 4. The expression,
therefore, evaluates to 21.

The application of the let optimization to the second let returns the expression:
let x = 17
in let x = 4
in (fun y → x + y) x
The variable x, now visible in the function as well as in its argument, represents the
value 4. The expression evaluates to the value 8.

There exist several possibilities to solve this problem. The simplest one,
which we will also use, consists in renaming the variables that are bound in e0 in
such a way that their new names are different from those of the variables occurring free
in e1. Renaming of this kind is called α-conversion.
Example 3.2.2 Consider again the expression of Example 3.2.1. The free variable
x of the function fun y → x + y occurs as a bound variable in the expression into
which the function is to be substituted. Renaming this occurrence of x results in the
expression:

let x = 17
in let f = fun y → x + y
in let x′ = 4
in f x′

The substitution of fun y → x + y for f produces:

let x = 17
in let x′ = 4
in (fun y → x + y) x′
Evaluation of this expression produces the correct result, 21.

No renaming of variables is necessary if the free variables of the expression e1 do not
occur bound in e0. This is in particular the case if e1 does not have any free variables.
The transformation of let expressions is only an improvement if its application
does not lead to additional evaluations. This is definitely the case in the following
three special situations:
• Variable x does not occur in e0. In this case the evaluation of e1 is completely
  avoided by applying the transformation.
• Variable x occurs exactly once in e0. In this case the evaluation of e1 is just moved
  to another position.
• Expression e1 is just a variable z. In this case all accesses to variable x in e0 are
  replaced by accesses to variable z.
Attention, though: the application of the let transformation, including α-conversion,
preserves the semantics only if the functional language prescribes lazy evaluation
for let expressions, as is the case in Haskell. With lazy evaluation, the argument e1

in the application (fun x → e0) e1 will only be evaluated if and when the value of
x is accessed during the evaluation of e0.
Eager evaluation of let expressions, as in OCaml, will evaluate the expression
e1 in any case. The evaluation of the whole let expression then does not terminate if the
evaluation of e1 does not terminate. If x does not occur in the main expression e0,
or occurs only in a subexpression that is not evaluated, then evaluation of the transformed
expression may still terminate although the evaluation of the original expression
would not.
Example 3.2.3 Consider the program:
let rec f = fun x → 1 + f x
in let y = f 0
in 42
With lazy evaluation, the program returns the value 42. With eager evaluation, the
program does not terminate since the evaluation of f 0 is started before the value 42
is returned. The variable y does not occur in the main expression. The application of
the let optimization would remove the evaluation of f 0. Consequently, the evaluation
of the optimized program would terminate and return 42.

One is perhaps not worried about an improved termination behavior. The situation is
different if the evaluation of e1 has side effects that are required. This cannot happen
in our small example language. Full OCaml expressions, however, may well raise exceptions
or interact with their environment, independently of whether their return value is
accessed or not. In this case the application of the transformation must be restricted
to expressions e1 that neither directly nor indirectly cause side effects. This is definitely
the case for variables and for expressions that directly represent values, such as
functions.
Further optimizations become possible when let definitions are moved in front of
the evaluation of an expression.

((let x = e in e0) e1)          = (let x = e in e0 e1),
                                  if x is not free in e1
(let y = e1 in let x = e in e0) = (let x = e in let y = e1 in e0),
                                  if x is not free in e1 and y is not free in e
(let y = let x = e in e1 in e0) = (let x = e in let y = e1 in e0),
                                  if x is not free in e0

The applicability of these rules is not restricted if no side effects are involved. Even
the termination behavior does not change. The application of these rules may move
a let definition further to the outside, creating chances, e.g., for the application of
the inlining transformation presented in the next section. Further movements of let
definitions are discussed in Exercises 1, 2, and 3.

3.3 Inlining
As for imperative programs, inlining for functional programs is performed to save
the costs associated with function application. Inlining of a function f means that the
body of f is copied to the place of application. This is quite analogous to inlining
in imperative languages as treated in Sect. 2.3. A notable difference between the
two transformations is that our imperative core language simulated parameter and
result passing by copying values between global and local variables and possibly
also memory. We therefore assumed procedures to have no parameters. Under this
assumption, procedure inlining just means replacing the call to a procedure by a
copy of its body.
With functional languages, we must make the passing of parameters explicit. Let us
assume that a function f is defined as let f = fun x → e0. Inlining replaces the
application f e1 by:

let x = e1 in e0
Example 3.3.1 Consider the program fragment:
let fmax = fun f → fun x → fun y →
                if x > y then f x
                else f y
in let max = fmax (fun z → z)

Applying inlining to the definition of max results in:

let max = let f = fun z → z
          in fun x → fun y → if x > y then f x
                             else f y

Inlining of f then delivers:

let max = let f = fun z → z
          in fun x → fun y → if x > y then let z = x
                                           in z
                             else let z = y
                                  in z

Applying the let optimizations for variables and useless constant definitions yields:

let max = fun x → fun y → if x > y then x
                          else y

The inlining transformation can be understood as a combination of a restricted
case of the let optimization of the preceding section with the optimization of function
application. The let optimization is partly applied to let expressions of the form
3.3 Inlining

147

let f = fun x → e0 in e, where the functional value fun x → e0 is only copied to
those occurrences in e at which f is applied to an argument. Subsequently, the optimization
of function applications is performed at these places. Inlining requires, as did the
previous optimizations, some care to guarantee correctness and termination of the
transformation. α-conversion of the bound variables in the expression e before the
application of inlining must guarantee that no free variable in e0 will be bound by
the transformation.
As in the case of imperative languages, inlining is only applied to nonrecursive
functions. In our core language, these are the let-defined functions. In itself, this does
not suffice to guarantee termination of the transformation in all functional languages.
Example 3.3.2 Consider the program fragment:

let w = fun f → fun y → f (y f y)
in let fix = fun f → w f w

Neither w nor fix is recursive. We may apply inlining to the body w f w of the
function fix. With the definition of w, this yields for fix the function:

fun f → let f = f in let y = w in f (y f y),

which can be simplified to:

fun f → f (w f w).

Inlining can be repeated. After k repetitions this results in:

fun f → f^k (w f w),

and inlining can again be applied.

Nontermination as in our example can happen in untyped languages like Lisp. The
function w is, however, rejected as not typeable in typed languages like OCaml and
Haskell. In these languages, inlining of let-defined functions always terminates and
yields uniquely determined normal forms up to the renaming of bound variables. Still,
in untyped languages, the problem of potential nontermination can be pragmatically
solved: whenever inlining takes too long, it may simply be stopped.

3.4 Specialization of Recursive Functions


Inlining is a technique to improve the efficiency of applications of nonrecursive functions.
What can be done to improve the efficiency of recursive functions? Let us look
at one particular programming technique often used in functional programming languages.
It uses recursive polymorphic higher-order functions such as the function

map. Such functions distill the algorithmic essence of a programming technique and
are instantiated for a particular application by supplying the required parameters,
including functional parameters.
Example 3.4.1 Consider the following program fragment:
let f = fun x → x ∗ x
in let rec map = fun g → fun y → match y
                                 with [ ] → [ ]
                                 | x1 :: z → g x1 :: map g z
in map f list

The actual parameter of the function application map f is the function fun x →
x ∗ x. The expression map f list thus computes the list of the squares of all
elements of the list list. Note that we have, as usual, written the list constructor :: as an
infix operator between its two arguments.

Let f be a recursive function and f v an application of f to an expression v that
represents a value, that is, either another function or a constant. Our goal is to introduce
a new function h for the expression f v. This optimization is called function
specialization.
Let f be defined by let rec f = fun x → e. We define h by:

let h = let x = v in e
Example 3.4.2 Let us consider the program fragment of Example 3.4.1 again.

let h = let g = fun x → x ∗ x
        in fun y → match y
                   with [ ] → [ ]
                   | x1 :: z → g x1 :: map g z

The function map is recursive. Therefore, the body of the function h contains another
application of map. Specialization of map for this application would introduce a
function h1 with the same definition (up to renaming of bound variables) as h. Instead
of introducing this new function h1, the application map g is replaced by the function
h. This replacement of the right side of a definition by its left side is called function
folding. Function folding in the example yields:

let rec h = let g = fun x → x ∗ x
            in fun y → match y
                       with [ ] → [ ]
                       | x1 :: z → g x1 :: h z

The definition of h no longer contains any explicit application of the function
map and is itself recursive. Inlining of the function g yields:

let rec h = let g = fun x → x ∗ x
            in fun y → match y
                       with [ ] → [ ]
                       | x1 :: z → ( let x = x1
                                     in x ∗ x ) :: h z

The removal of superfluous definitions and variable-to-variable bindings finally
yields:

let rec h = fun y → match y
                    with [ ] → [ ]
                    | x1 :: z → x1 ∗ x1 :: h z
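
As a quick cross-check, the specialized function can also be written down in plain OCaml
and compared with map f on a sample list. The following lines are only a sketch that
restates the definitions of Examples 3.4.1 and 3.4.2 in ASCII syntax:

let f x = x * x

(* the generic map of Example 3.4.1 *)
let rec map g y = match y with
  | [] -> []
  | x1 :: z -> g x1 :: map g z

(* the specialized and simplified function h of Example 3.4.2 *)
let rec h y = match y with
  | [] -> []
  | x1 :: z -> x1 * x1 :: h z

(* both compute the list of squares *)
let () = assert (map f [1; 2; 3; 4] = h [1; 2; 3; 4])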

In general, we cannot expect that the recursive calls of the function to be specialized
can immediately be folded to an already generated specialization. Worse, it
may happen that continued specialization leads to an infinite number of auxiliary
functions and thus to nontermination. A pragmatic remedy is again to stop the
creation of new functions when the number of auxiliary functions exceeds a
given threshold.

3.5 An Improved Value Analysis


Inlining and function specialization optimize function applications f e where we
assume that the function f is defined in an enclosing let- or letrec-expression. Functional languages, however, allow functions to be passed as arguments or be returned
as results. The applicability of inlining and function specialization therefore relies
on an analysis that for each variable determines a superset of its potential run-time
values. This is the goal of the next analysis.
In a first step, the analysis identifies the subset E of the set of expressions occurring
in a given program whose values should be determined. This set E consists of all
subexpressions of the program that are variables, function applications, let-, letrec-,
match-, or if-expressions.
Example 3.5.1 Consider the following program:
let rec from = fun i → i :: from (i + 1)
and first = fun l → match l with x :: xs → x
in first (from 2)

The set E consists of the expressions:

E = {from, i, from (i + 1), first, l, x, from 2, first (from 2),
     match l with . . . , let rec from = . . .}

Let V be the set of the remaining subexpressions of the program. The expressions in
V, thus, are either values such as function abstractions or constants, or they provide
at least the outermost constructor or outermost operator application. In the program
of Example 3.5.1, the set V therefore is:

V = {fun i → . . . , i :: from (i + 1), i + 1, fun l → . . . , 2}
Each subexpression e in V can be decomposed in a unique way into an upper part, in
which only constructors, values, or operators occur, and the maximal subexpressions
e1, . . . , ek from the set E below those.
The upper part is represented by a k-place pattern t, that is, a term in which
the pattern variables •1, . . . , •k at the leaves stand for the expressions e1, . . . , ek.
The expression e has the form e ≡ t[e1/•1, . . . , ek/•k], or in shorter form,
e ≡ t[e1, . . . , ek].
In our example, the expression e ≡ (i :: from (i + 1)) can be decomposed into
e ≡ t[i, from (i + 1)] for the expressions i, from (i + 1) from E and the pattern t ≡
(•1 :: •2).
Our goal consists in identifying, for each expression e ∈ E, a subset of expressions
from V into which e may develop. We first explain in which sense a relation G ⊆
E × V defines, for each expression from E, a set of value expressions, and then present
a method to compute such subsets of V. A value expression is an expression v
that is formed according to the following grammar:

v ::= b | fun x → e | c v1 . . . vk | (v1, . . . , vk) | □1 v | v1 □2 v2

for basic values b, arbitrary expressions e, constructors c, and unary and binary
operators □1, □2, respectively, that return basic values.
Let G ⊆ E × V be a relation between expressions from E and V. Expressions
e ∈ E are associated with the set [[e]]G of all value expressions that can be derived
from e using G. Each pair (e, t[e1, . . . , ek]) ∈ G for expressions e, e1, . . . , ek ∈ E
and patterns t can be seen as the inequality

[[e]]G ⊇ t[[[e1]]G, . . . , [[ek]]G]

Here, we interpret the application of the pattern t to sets V1, . . . , Vk as the set

t[V1, . . . , Vk] = {t[v1, . . . , vk] | vi ∈ Vi}

The sets [[e]]G, e ∈ E, are defined as the least solution of this system of inequalities.

Example 3.5.2 Let G be the relation

{(i, 2), (i, i + 1)} .

The set [[i]]G consists of all expressions of the form 2 or (. . . (2 + 1) . . .) + 1. Note that
in this analysis, operator applications are treated in the same way as data constructors.
The set [[from (i + 1)]]G′, on the other hand, is empty for

G′ = {(from (i + 1), i :: from (i + 1))}
A relation G can be seen as a regular tree grammar with the set of nonterminals
E, with constants and function abstractions as 0-ary terminal symbols, and with operators
and constructors as multiplace terminal symbols. For an expression e ∈ E, i.e., a
nonterminal, the set [[e]]G denotes the set of terminal expressions derivable from e
according to the grammar (see Exercises 4 and 5). The sets [[e]]G are, in general,
infinite. The relation G, however, is a finite representation of these sets, which makes
it possible to decide simple properties of the sets [[e]]G. The most important question
is whether [[e]]G contains a certain term v. This question can be easily answered if
v is a function abstraction fun x → e′. Each pair (e, u) from G has a right side u
that is either a constant, a function, or the application of a constructor or an operator.
Therefore, (fun x → e′) ∈ [[e]]G holds if and only if (e, fun x → e′) ∈ G.
Further examples of properties that can be easily decided are:
• Is [[e]]G nonempty?
• Is [[e]]G finite, and if yes, of which elements does this set consist?
The goal of our value analysis is to construct a relation G for a program such that,
for each expression e of the program, the set [[e]]G contains all values into which
e may develop during run time, relative to the bindings of the free variables in e.
The relation G ⊆ E × V is defined by means of axioms and derivation rules. For
convenience, we will not define the relation G itself but the relation ⇒, which is
the relation G extended with all pairs (v, v) for v ∈ V. Also for convenience, we
write the relation ⇒ in infix notation. Axioms represent those relationships that hold
without preconditions:

v ⇒ v    (v ∈ V)

i.e., expressions from the set V are related to themselves. We supply rules for each
program construct:
Function application. Let e ≡ (e1 e2). We have the rules:

    e1 ⇒ fun x → e0     e0 ⇒ v          e1 ⇒ fun x → e0     e2 ⇒ v
    --------------------------          --------------------------
              e ⇒ v                               x ⇒ v

If the function expression e1 of a function application evaluates to a function
fun x → e0 and the main expression e0 of the function evaluates to a value
v, then the function application may develop into v. If, on the other hand, the
argument of the function application evaluates to a value v′, then v′ is a potential
value of the formal parameter x of the function.
let-Definition. Let e ≡ let x1 = e1 in e0. We then have the rules:

    e0 ⇒ v          e1 ⇒ v
    ----------      ----------
     e ⇒ v           x1 ⇒ v

If the main expression e0 of a let-expression evaluates to a value v, then so
does the whole let-expression. Each value for the expression e1 represents a
potential value for the local variable x1. A similar consideration justifies the rules
for letrec-expressions.

letrec-Definition. For e ≡ let rec x1 = e1 and . . . and xk = ek in e0, we have:

    e0 ⇒ v          ei ⇒ v
    ----------      ----------
     e ⇒ v           xi ⇒ v

Case distinctions. Let e ≡ match e0 with p1 → e1 | . . . | pm → em.
If pi is a basic value we have the rules:

    ei ⇒ v
    ----------
     e ⇒ v

If, on the other hand, pi ≡ c y1 . . . yk, we have:

    e0 ⇒ c e′1 . . . e′k     ei ⇒ v        e0 ⇒ c e′1 . . . e′k     e′j ⇒ v
    -------------------------------        -------------------------------  (j = 1, . . . , k)
                e ⇒ v                                  yj ⇒ v

If, finally, pi is a variable y, we have:

    ei ⇒ v          e0 ⇒ v
    ----------      ----------
     e ⇒ v           y ⇒ v
If an alternative evaluates to a value, then the whole case distinction may evaluate
to that value, as long as the corresponding pattern cannot be statically excluded
for the evaluation of the expression e0. The analysis does not track the exact
values of operator applications. Therefore, basic values are always assumed to
be possible. This is different for patterns c y1 . . . yk. Such a pattern matches the
value of e0 only if e0 ⇒ v holds, where c is the outermost constructor of v. In this

153

case v has the form v ≡ c e′1 . . . e′k, and the values for the e′i are potential values for
the variables yi.
Conditional expressions. Let e ≡ if e0 then e1 else e2. For i = 1, 2 we have:

    ei ⇒ v
    ----------
     e ⇒ v
These rules are similar to the rules for match-expressions where basic values are
used as patterns.
Example 3.5.3 Consider again the program

let rec from = fun i → i :: from (i + 1)
and first = fun l → match l with x :: xs → x
in first (from 2)

This program terminates only with lazy evaluation, as in Haskell. In OCaml,
with eager evaluation, the application from 2, and with it the whole program,
does not terminate. One possible derivation of the relation x ⇒ 2 would look as
follows:

from ⇒ fun i → i :: from (i + 1)
first ⇒ fun l → . . .
from 2 ⇒ i :: from (i + 1)
l ⇒ i :: from (i + 1)
from ⇒ fun i → i :: from (i + 1)
i ⇒ 2
x ⇒ 2

We have left out occurrences of the axioms v ⇒ v. For e ∈ E, let G(e) be the set of all
expressions v ∈ V for which e ⇒ v can be derived.
The analysis of this program delivers for the variables and function applications:

G(from)             = {fun i → i :: from (i + 1)}
G(from (i + 1))     = {i :: from (i + 1)}
G(from 2)           = {i :: from (i + 1)}
G(i)                = {2, i + 1}
G(first)            = {fun l → match l . . .}
G(l)                = {i :: from (i + 1)}
G(x)                = {2, i + 1}
G(xs)               = {i :: from (i + 1)}
G(first (from 2))   = {2, i + 1}
We conclude that the evaluation of the expressions from 2 and from (i + 1) never
delivers a finite value. On the other hand, the variable i will potentially be bound to
expressions with values 2, 2 + 1, 2 + 1 + 1, . . .. According to the analysis, the main
expression evaluates to one of the values 2, 2 + 1, 2 + 1 + 1, . . ..


The sets G(e) can be computed by fixed-point iteration. A more clever implementation
does not calculate with whole sets of expressions, but propagates expressions v ∈ V
individually. When v is added to a set G(e), the applicability conditions of further
rules might become satisfied, which may add more expressions v′ to sets G(e′). This is
the idea of the algorithm of Heintze (1994).
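
The following OCaml fragment sketches this propagation scheme in a generic form. It is
not the algorithm of Heintze (1994) itself; expressions are simply represented by strings,
and the hypothetical parameter consequences stands for the rule instances of the program
under analysis: given the current table and a newly derived pair e ⇒ v, it returns the
pairs that become derivable by one further rule application.

module StrSet = Set.Make (String)

(* Worklist-based propagation of individual pairs e => v. *)
let propagate consequences axioms =
  let g : (string, StrSet.t) Hashtbl.t = Hashtbl.create 64 in
  let lookup e = try Hashtbl.find g e with Not_found -> StrSet.empty in
  let worklist = Queue.create () in
  (* Add a pair to G and to the worklist unless it is already known. *)
  let add (e, v) =
    if not (StrSet.mem v (lookup e)) then begin
      Hashtbl.replace g e (StrSet.add v (lookup e));
      Queue.add (e, v) worklist
    end
  in
  List.iter add axioms;
  while not (Queue.is_empty worklist) do
    let pair = Queue.pop worklist in
    (* Only the rules whose premises mention the new pair are re-examined. *)
    List.iter add (consequences g pair)
  done;
  g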
Correctness of this analysis can be shown using an operational semantics for programs
with delayed evaluation. We do not present this proof here, but would like to mention
that the derivation rules for the relation e ⇒ v are quite analogous to the corresponding
rules of the operational semantics, with the following notable exceptions:
• Operators □1, □2 on base types are not further evaluated;
• At case distinctions depending on basic values, all possibilities are explored nondeterministically;
• In case distinctions, the order of the patterns is not taken into account;
• The computation of the return value of a function is decoupled from the determination
  of the potential actual parameters of the function.
The analysis described is also correct for programs of a programming language with
eager expression evaluation. For these, the precision of the analysis can be increased
by requiring additional constraints for rule applications:
• The set [[e2]]G should not be empty at function applications with argument e2;
• At let- and letrec-expressions, the sets [[ei]]G for the right sides ei of locally
  introduced variables should not be empty;
• At conditional expressions, the set [[e0]]G for the condition e0 should not be empty;
• Analogously, at case distinctions match e0 . . ., the set [[e0]]G should not be empty;
• In the rules for patterns c y1 . . . yk, the sets [[e′j]]G for j = 1, . . . , k for the value
  c e′1 . . . e′k in [[e0]]G should not be empty.


In the example, it holds that:

[[l]]G = [[x]]G = [[xs]]G = [[match l . . .]]G = [[first (from 2)]]G = ∅

The analysis therefore finds out that eager evaluation of the application first (from 2)
does not terminate.
The value analysis just presented produces amazingly precise results. It can be
extended to an analysis of the exceptions possibly thrown during the evaluation of an
expression (see Exercise 7) and to an analysis of the set of side effects possibly occurring
during expression evaluation (see Exercise 8). Imprecision, however, can creep in
since, in the analysis of functions, the approximation of the potential parameter values
is decoupled from the determination of the return values. In imperative languages,
this corresponds to an interprocedural analysis that uses call strings of length 0. In
the analysis of polymorphic functions this means that arguments of different types
are not differentiated.

3.6 Elimination of Intermediate Data Structures


One of the most important data structures offered by functional programming languages
is the list. Functional programs compute results from the values in lists, collect
intermediate results in lists, and apply functions to all elements of lists. Program
libraries contain higher-order functions on list arguments supporting this programming
style. Examples of such higher-order functions are:

map = fun f → fun l → match l
                      with [ ] → [ ]
                      | h :: t → f h :: map f t

filter = fun p → fun l → match l
                         with [ ] → [ ]
                         | h :: t → if p h then h :: filter p t
                                    else filter p t

fold_left = fun f → fun a → fun l → match l with [ ] → a
                                    | h :: t → fold_left f (f a h) t

Functions can be composed by function composition:

comp = fun f → fun g → fun x → f (g x)
The next example shows how quite complex functions can be constructed out of
these few components.
Example 3.6.1 The following program fragment supplies functions to compute the
sum of all elements of a list, to determine the length of a list, and to compute the
standard deviation of the elements of the list.

let sum       = fold_left (+) 0
in let length = comp sum (map (fun x → 1))
in let der    = fun l →
       let s1      = sum l
       in let n    = length l
       in let mean = s1/n
       in let s2   = sum (map (fun x → x ∗ x)
                            (map (fun x → x − mean) l))
       in s2/n

Here, (+) denotes the function fun x → fun y → x + y. The definition above
does not show recursion explicitly. However, it is implicitly used in the definitions of
the functions map and fold_left. The definition of length does not need an explicit
functional abstraction fun . . . . On the one hand, this implementation is clearer. On the other
hand, this programming style leads to programs that create data structures for inter-

mediate results, which actually could be avoided. Function length can directly be
implemented by:

let length = fold_left (fun a → fun y → a + 1) 0

This implementation avoids the creation of the auxiliary list, which contains one 1
for each element of the input.

The following rules allow for the elimination of some apparently superfluous auxiliary
data structures.

comp (map f) (map g)            = map (comp f g)
comp (fold_left f a) (map g)    = fold_left (fun a → comp (f a) g) a
comp (filter p1) (filter p2)    = filter (fun x → if p2 x then p1 x
                                                  else false)
comp (fold_left f a) (filter p) = fold_left (fun a → fun x → if p x then f a x
                                                             else a) a

The evaluation of the left sides always needs an auxiliary data structure, while the
evaluation of the right sides does not. Applying such rules for eliminating intermediate
data structures is called deforestation. Deforestation allows us to optimize the
function length of Example 3.6.1. However, the left and right sides are no
longer equivalent under all circumstances. In fact, these rules may only be applied
if the functions f, g, p1, p2 that occur have no side effects. Another problem of this
optimization consists in recognizing when it can be applied. Programmers often do
not use explicit function composition to sequentially apply functions. Instead, they
use directly nested function applications. An example is the definition of the function
der in Example 3.6.1. For this case, the transformation rules should be written as:

map f (map g l)            = map (fun z → f (g z)) l
fold_left f a (map g l)    = fold_left (fun a → fun z → f a (g z)) a l
filter p1 (filter p2 l)    = filter (fun x → if p2 x then p1 x
                                             else false) l
fold_left f a (filter p l) = fold_left (fun a → fun x → if p x then f a x
                                                        else a) a l
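
As a small sanity check of the second rule, both sides can be evaluated on a concrete
list using the map and fold_left functions of OCaml's List module, which behave like
the definitions given above on finite lists; the names f, g, and the sample list are, of
course, arbitrary:

let l = [1; 2; 3; 4]
let f a x = a + x                                     (* combine: sum        *)
let g x = x * x                                       (* element-wise square *)

let lhs = List.fold_left f 0 (List.map g l)           (* builds [1; 4; 9; 16] first *)
let rhs = List.fold_left (fun a z -> f a (g z)) 0 l   (* no intermediate list       *)

let () = assert (lhs = rhs)                           (* both yield 30 *)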
Example 3.6.2 The application of these rules to the definition of function der of
Example 3.6.1 leads to:

3.6 Elimination of Intermediate Data Structures

157

let sum       = fold_left (+) 0
in let length = fold_left (fun a → fun z → a + 1) 0
in let der    = fun l →
       let s1      = sum l
       in let n    = length l
       in let mean = s1/n
       in let s2   = fold_left (fun a → fun z →
                          (+) a ((fun x → x ∗ x)
                                   ((fun x → x − mean) z))) 0 l
       in s2/n

Applying the optimization of function application repeatedly and performing the let
optimization leads to:

let sum       = fold_left (+) 0
in let length = fold_left (fun a → fun z → a + 1) 0
in let der    = fun l →
       let s1      = sum l
       in let n    = length l
       in let mean = s1/n
       in let s2   = fold_left (fun a → fun z →
                          let x = z − mean
                          in let y = x ∗ x
                          in a + y) 0 l
       in s2/n

All intermediate data structures have disappeared. Only applications of the function
fold_left remain. The function fold_left is tail-recursive, such that the compiler can
generate code that is as efficient as loops in imperative languages.

Sometimes, a first list of intermediate results is produced by tabulation of a function.
Tabulation of n values of a function f : int → ′b produces the list:

[f 0; . . . ; f (n − 1)]

A function tabulate to compute this list can be defined in OCaml by:

let tabulate = fun n → fun f →
       let rec tab = fun j → if j ≥ n then [ ]
                             else (f j) :: tab (j + 1)
       in tab 0
Under the conditions that all occurring functions terminate and have no side effects
it holds that:

map f (tabulate n g)         = tabulate n (comp f g)
                             = tabulate n (fun j → f (g j))
fold_left f a (tabulate n g) = loop n (fun a → comp (f a) g) a
                             = loop n (fun a → fun j → f a (g j)) a

Here we have:

let loop = fun n → fun f → fun a →
       let rec doit = fun a → fun j → if j ≥ n then a
                                      else doit (f a j) (j + 1)
       in doit a 0

The tail-recursive function loop corresponds to a for loop: the local data are collected
in the accumulating parameter a, while the functional parameter f determines how
the new value of a after the jth iteration is computed from the old values of a and
j. The function loop computes its result without a list as auxiliary data structure.
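
The gain can again be illustrated with a small, self-contained OCaml fragment; tabulate
and loop below simply restate the definitions from above in ASCII syntax, and the
functions f and g are arbitrary placeholders:

let tabulate n f =
  let rec tab j = if j >= n then [] else f j :: tab (j + 1) in
  tab 0

let loop n f a =
  let rec doit a j = if j >= n then a else doit (f a j) (j + 1) in
  doit a 0

let g j = j * j
let f a x = a + x

let lhs = List.fold_left f 0 (tabulate 10 g)   (* allocates the list of squares *)
let rhs = loop 10 (fun a j -> f a (g j)) 0     (* plain loop, no list           *)
let () = assert (lhs = rhs)                    (* both compute 285 *)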
The applicability of the rules essentially depends on whether compositions of the
functions fold_left f a, map f, and filter p are recognized. This structure, however, may
occur rather indirectly in a given program. Subexpressions may be contained in let
definitions or may be passed as parameters to the appropriate position. Application
of the let optimizations of Sect. 3.2 may help here, or, in more complex situations, the
value analysis of Sect. 3.5.
The principle of deforestation can be generalized in different directions:
• Other functions on lists can be considered besides the ones treated so far, such as
  the function rev, which reverses the order of the elements of a list, the tail-recursive
  version rev_map of the function map, and the function fold_right (see Exercise 10).
• The suppression of intermediate data structures is also possible for index-dependent
  versions of the functions map and fold_left (see Exercise 11).
  Let l denote the list [x0; . . . ; xn−1] of type ′b list. The index-dependent version
  of map receives as argument a function f of type int → ′b → ′c and returns for
  l the list:

  [f 0 x0; . . . ; f (n − 1) xn−1]

  In analogy, the index-dependent version of the function fold_left receives as argument
  a function f of type int → ′a → ′b → ′a and an initial value a of type ′a, and
  computes the value

  f (n − 1) (. . . (f 1 (f 0 a x0) x1) . . .) xn−1

• The functions map and fold_left can, in full generality, be defined for user-defined
  data types, although this is not too frequently done. At least in principle, the
  same optimizations can be applied as we presented them for lists (see Exercise 12).

3.7 Improving the Evaluation Order: Strictness Analysis


Functional programming languages such as Haskell delay the evaluation of expressions
until their evaluation is strictly necessary. They evaluate the defining expressions
of let-defined variables in the same way as the actual parameters of functions,
namely only when their values are accessed. Such lazy evaluation allows
for an elegant treatment of (potentially) infinite data structures. Each execution of a
program working on such potentially infinite data structures will, in fact, only use
a finite part of them for the computation of its result. The lazy evaluation of an expression
e causes additional costs, since a closure for e has to be constructed, allowing for a
later evaluation of e.
Example 3.7.1 Consider the following program:

let rec from = fun n → n :: from (n + 1)
and take = fun k → fun s → if k ≤ 0 then [ ]
                           else match s with [ ] → [ ]
                                | h :: t → h :: take (k − 1) t

Lazy evaluation of the expression take 5 (from 0) produces the list [0; 1; 2; 3; 4],
while eager evaluation using call-by-value passing of parameters causes nontermination.

Lazy evaluation, however, has its disadvantages. Even tail-recursive functions may
no longer run in a constant amount of space.
Example 3.7.2 Consider the following program fragment:

let rec fac2 = fun x → fun a → if x ≤ 0 then a
                               else fac2 (x − 1) (a ∗ x)

Lazy evaluation creates one closure for each multiplication in the accumulating
parameter. The nested sequence of closures built up by an application fac2 x 1
is only evaluated when the recursion has terminated. It would be much more efficient
to immediately evaluate the multiplication.

It is often more efficient to eagerly evaluate an expression, thereby avoiding the
construction of a closure, instead of delaying its evaluation. This is the aim of the following
optimization.
For simplicity, we start with programs that use neither composed data structures
nor higher-order functions. In addition, we assume that all functions are defined
at the top level. To describe the transformation, we introduce a construct

let# x = e1 in e0

that forces the evaluation of the expression e1 whenever the value of e0 is needed.
The goal of the optimization is to replace as many let-expressions as possible by let#-expressions
without changing the termination properties of the program. Strictness
analysis determines the necessary information about the termination properties of
expressions. A k-place function f is called strict in its jth argument, 1 ≤ j ≤ k, if the
evaluation of the expression f e1 . . . ek does not terminate whenever the evaluation of
ej does not terminate. The evaluation of the jth argument ej can be forced without
changing the termination behavior if the function is strict in its jth argument. The
compiler may then replace f e1 . . . ek by

let# x = ej in f e1 . . . ej−1 x ej+1 . . . ek

Analogously, the compiler can replace a let-expression let x = e1 in e0 by the
expression

let# x = e1 in e0

if the evaluation of e0 does not terminate whenever the computation of e1 does not
terminate.
The simplest form of a strictness analysis only distinguishes whether the evaluation
of an expression definitely does not terminate or may terminate and deliver
a value. Let 2 be the finite lattice consisting of the two values 0 and 1, where 0 < 1.
The value 0 is associated with an expression whose evaluation definitely does not
terminate. The value 1 denotes possible termination. A k-place function f is described
by an abstract k-place function:

[[f]]♯ : 2 × . . . × 2 → 2

The fact that [[f]]♯ 1 . . . 1 0 1 . . . 1 = 0 (with the 0 in the jth argument) allows us to derive that
an application of the function f definitely does not terminate if the evaluation of the jth
argument does not terminate. The function f, therefore, is strict in its jth argument.
We construct a system of equations to determine abstract descriptions [[f]]♯ for all
functions f of the program. This requires the abstract evaluation of expressions as an
auxiliary function. This abstract evaluation is defined with respect to a value binding ρ
for the free variables of base types and a mapping φ of functions to their current
abstract descriptions:
[[b]]♯ ρ                     = 1
[[x]]♯ ρ                     = ρ(x)
[[□1 e]]♯ ρ                  = [[e]]♯ ρ
[[e1 □2 e2]]♯ ρ              = [[e1]]♯ ρ ∧ [[e2]]♯ ρ
[[if e0 then e1 else e2]]♯ ρ = [[e0]]♯ ρ ∧ ([[e1]]♯ ρ ∨ [[e2]]♯ ρ)
[[f e1 . . . ek]]♯ ρ         = (φ f) ([[e1]]♯ ρ) . . . ([[ek]]♯ ρ)
[[let x1 = e1 in e]]♯ ρ      = [[e]]♯ (ρ ⊕ {x1 ↦ [[e1]]♯ ρ})
[[let# x1 = e1 in e]]♯ ρ     = ([[e1]]♯ ρ) ∧ ([[e]]♯ (ρ ⊕ {x1 ↦ 1}))

The abstract evaluation function [[·]]♯ interprets constants as the value 1. Variables
are looked up in ρ. Unary operators □1 are approximated by the identity function,
since the evaluation of an application does not terminate whenever the evaluation of
the argument does not terminate. Binary operators are, analogously, interpreted as
conjunction. The abstract evaluation of an if-expression is given by b0 ∧ (b1 ∨ b2),
where b0 represents the abstract value of the condition and b1, b2 represent the
abstract values of the two alternatives. The intuition behind this definition is that,
when evaluating a conditional expression, the condition needs to be evaluated in any
case, while only one of the two alternatives must be evaluated. For a function application,
the current abstract value of the function is looked up in the function binding φ and
applied to the values determined recursively for the argument expressions. For a
let-defined variable x in an expression e0, first the abstract value for x is determined
and then the value of the main expression e0 with respect to this value. If the variable
x is let#-defined, it must additionally be ensured that the overall expression obtains the abstract
value 0 when the value obtained for x is 0.
Example 3.7.3 Consider the expression e, given by

if x ≤ 0 then a
else fac2 (x − 1) (a ∗ x)

For values b1, b2 ∈ 2, let ρ be the variable binding ρ = {x ↦ b1, a ↦ b2}. The mapping φ
associates the function fac2 with the abstract function fun x → fun a → x ∧ a.
The abstract evaluation of e produces the value:

[[e]]♯ ρ = (b1 ∧ 1) ∧ (b2 ∨ (φ fac2) (b1 ∧ 1) (b2 ∧ b1))
         = b1 ∧ (b2 ∨ (b1 ∧ b2))
         = b1 ∧ b2

The abstract expression evaluation constructs, for each function f = fun x1 →
. . . → fun xk → e defined in the program, the equations

(φ f) b1 . . . bk = [[e]]♯ {xj ↦ bj | j = 1, . . . , k}

for all b1, . . . , bk ∈ 2. The right side depends monotonically on the abstract values
φ(f). Therefore, this system of equations possesses a least solution, denoted by [[f]]♯.

Example 3.7.4 For the function fac2 of Example 3.7.2 one obtains the equations

[[fac2]]♯ b1 b2 = b1 ∧ (b2 ∨ [[fac2]]♯ b1 (b1 ∧ b2))

Fixed-point iteration successively delivers for [[fac2]]♯ the abstract functions:

0    fun x → fun a → 0
1    fun x → fun a → x ∧ a
2    fun x → fun a → x ∧ a

Note that the occurring abstract functions have been represented by boolean expressions instead of by their value tables. We conclude that function fac2 is strict in both
its arguments. The definition of fac2 can thus be transformed into
let rec fac2 = fun x fun a if x 0 then a
else
let# x  = x 1
in let# a  = x a
in
fac2 x  a 

The analysis produces precise results for this example. Correctness of the analysis
follows since the abstract expression evaluation is an abstraction of the concrete
expression evaluation as provided by the denotational semantics of our functional
language. The denotational semantics uses a partial order on the integer numbers
that consists of the set Z of numbers together with a special value ⊥ that represents a
nonterminating evaluation, where the order relation is given by ⊥ ⊑ z for
all z ∈ Z. The abstract denotational semantics that we have constructed, however,
interprets basic values and operators over the lattice 2. Our description relation Δ
between concrete and abstract values is given by

⊥ Δ 0    and    z Δ 1 for z ∈ Z

The proof that the given analysis always delivers correct values is by induction over
the fixed-point iteration.
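
The fixed-point iteration of Example 3.7.4 can also be played through mechanically.
The following OCaml sketch represents abstract values of the lattice 2 by bool and
computes the least fixed point of the equation for [[fac2]]♯; the names step, equal, and
lfp are ad hoc and not taken from the text:

(* One iteration step: [[fac2]] b1 b2 = b1 /\ (b2 \/ [[fac2]] b1 (b1 /\ b2)). *)
let step fac2 = fun b1 b2 -> b1 && (b2 || fac2 b1 (b1 && b2))

(* Two abstract functions are equal iff they agree on all four inputs. *)
let equal f g =
  List.for_all (fun (b1, b2) -> f b1 b2 = g b1 b2)
    [ (false, false); (false, true); (true, false); (true, true) ]

(* Iterate, starting from the least element, the constant-0 function. *)
let rec lfp f = let f' = step f in if equal f f' then f else lfp f'

let fac2_abs = lfp (fun _ _ -> false)

(* fac2 is strict in an argument if a 0 there forces the result to 0. *)
let () =
  assert (not (fac2_abs false true));   (* strict in the first argument  *)
  assert (not (fac2_abs true false))    (* strict in the second argument *)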
In the following, the analysis is extended beyond values of base types to structured
data. So far, the only distinction made is whether a function's argument is totally
needed or not needed at all for computing the result. With structured data, functions
may access their arguments to different depths.

Example 3.7.5 The function

let hd = fun l → match l with h :: t → h

only visits the topmost list constructor of its argument and returns its first element.
The function length of Example 3.6.1, in contrast, needs all list constructors and the
empty list at the end of the argument list to compute its result.

We now consider programs that, besides basic values, manipulate lists and tuples.
Accordingly, we extend the syntax of expressions by admitting the corresponding
constructors for structured data.

e ::= . . . | [ ] | e1 :: e2 | match e0 with [ ] → e1 | h :: t → e2
    | (e1, . . . , ek) | match e0 with (x1, . . . , xk) → e1
The first question we would like to answer is whether functions access the topmost constructors
of their arguments or not. A function f is root-strict in its ith argument if the topmost
constructor of the ith argument is needed to compute the topmost constructor of an
application of the function f. For basic values, root-strictness agrees with strictness, as
considered so far. As with strictness for basic values, we use a construct let# x =
e1 in e0 that evaluates the value of x up to the root constructor before the
root constructor of e0 is evaluated.
As with strictness properties of functions on basic values, we describe root-strictness
using boolean functions. The value 0 represents only the concrete value ⊥
(a nonterminating computation); the value 1, in contrast, represents all other values,
for instance, the list [1; 2] as well as the partially computed lists [1; ⊥] and 1 :: ⊥.
Beyond the strictness analysis for basic values, we extend the abstract evaluation
function [[e]]♯ by rules for lists, tuples, and case distinction:

[[match e0 with [ ] → e1 | h :: t → e2]]♯ ρ =
        [[e0]]♯ ρ ∧ ([[e1]]♯ ρ ∨ [[e2]]♯ (ρ ⊕ {h, t ↦ 1}))
[[match e0 with (x1, . . . , xk) → e1]]♯ ρ =
        [[e0]]♯ ρ ∧ [[e1]]♯ (ρ ⊕ {x1, . . . , xk ↦ 1})
[[[ ]]]♯ ρ = [[e1 :: e2]]♯ ρ = [[(e1, . . . , ek)]]♯ ρ = 1
The abstract evaluation of an expression returns the abstract value 1 if the expression
already provides the topmost constructor of the result. A match-expression for lists
is abstractly evaluated in analogy to an if-expression. In the case of a composed list,
the analysis does not know anything about the values of the two newly introduced
variables. These are, therefore, described by the value 1. The abstract evaluation of
a match-expression for tuples corresponds to the conjunction of the abstract value
of the expression e0 with the value for the body e1 , where the newly introduced
variables are bound to 1.
Example 3.7.6 Let us check our analysis with the example function app, which
concatenates two lists:

let rec app = fun x → fun y → match x with [ ] → y
                              | h :: t → h :: app t y

Abstract interpretation establishes the equations:

[[app]]♯ b1 b2 = b1 ∧ (b2 ∨ 1)
              = b1

for values b1, b2 ∈ 2. We conclude that the root constructor of the first argument is
definitely needed for the computation of the root constructor of the result.

In many applications, not only the root constructor of the result value is needed,
but the whole value. A strictness analysis concentrating on that property tries to find
out which arguments of a function are totally needed if the result of the function is
totally needed. This generalization of strictness on basic values to structured values
is called total strictness. The abstract value 0 now describes all concrete values that
definitely contain a ⊥, while 1 still describes all values.
The total-strictness properties of functions are again described by boolean functions.
The rules for the evaluation of strictness for expressions without structured
data are extended to constructors for tuples and lists and to match-expressions:
[[match e0 with [ ] → e1 | h :: t → e2]]♯ ρ = let b = [[e0]]♯ ρ
        in b ∧ [[e1]]♯ ρ
           ∨ [[e2]]♯ (ρ ⊕ {h ↦ b, t ↦ 1})
           ∨ [[e2]]♯ (ρ ⊕ {h ↦ 1, t ↦ b})

[[match e0 with (x1, . . . , xk) → e1]]♯ ρ = let b = [[e0]]♯ ρ
        in [[e1]]♯ (ρ ⊕ {x1 ↦ b, x2, . . . , xk ↦ 1})
           ∨ . . . ∨ [[e1]]♯ (ρ ⊕ {x1, . . . , xk−1 ↦ 1, xk ↦ b})

[[[ ]]]♯ ρ               = 1
[[e1 :: e2]]♯ ρ          = [[e1]]♯ ρ ∧ [[e2]]♯ ρ
[[(e1, . . . , ek)]]♯ ρ  = [[e1]]♯ ρ ∧ . . . ∧ [[ek]]♯ ρ
[[let# x1 = e1 in e]]♯ ρ = [[e]]♯ (ρ ⊕ {x1 ↦ [[e1]]♯ ρ})
In the analysis of total strictness, constructor applications need to be treated differently
from how they were treated in the analysis of root-strictness. The application
of a data constructor is now interpreted as the conjunction of the abstract values
obtained for the components. For the evaluation of let#-expressions, we recall that
the eagerly evaluated expression is only evaluated up to its root constructor. The value
obtained this way might still contain ⊥ without causing nontermination of the evaluation
of the whole expression. The abstract evaluation of a let#-expression is, therefore, not
different from the abstract evaluation of a let-expression. The decomposition of the
value of an expression by applying a match-construct has also changed: the abstract
value of the expression e0, to which the pattern is matched, is not 0 if it evaluates to
the empty list. This case thus corresponds to the conjunction of the abstract value
of e0 with the abstract value of the expression for the case of the empty list. Two cases
must be considered if the expression e0 evaluates to a composed list.
If the expression e0 produces the value 1, only this value 1 can be assumed for all
components of the list.
If the abstract evaluation of e0 returns 0, either the first element or the rest of the
list must contain ⊥, so either the local variable h or the local variable t must obtain
the value 0. Let b be the value of the expression e0. These two cases can be compactly
combined by a disjunction of the results obtained through abstract evaluation of the
expression for composed lists, where for the newly introduced local variables h, t

3.7 Improving the Evaluation Order: Strictness Analysis

165

the values b, 1 and 1, b, respectively, are substituted. A similar disjunction is employed in
the abstract evaluation of match-expressions for tuples. If a tuple is described by 0,
then it contains ⊥, so at least one of its components must also
contain ⊥. This component can be described by 0. If a tuple is described by 1, then
nothing is known about its components, and therefore all of them must be described
by 1.
Example 3.7.7 We test our approach to the analysis of total strictness again with the
function app of Example 3.7.6. Abstract interpretation establishes the equations:

[[app]]♯ b1 b2 = b1 ∧ b2 ∨ b1 ∧ [[app]]♯ 1 b2 ∨ 1 ∧ [[app]]♯ b1 b2
              = b1 ∧ b2 ∨ b1 ∧ [[app]]♯ 1 b2 ∨ [[app]]♯ b1 b2

for b1, b2 ∈ 2. Fixed-point iteration produces the following approximations of the
least fixed point:

0    fun x → fun y → 0
1    fun x → fun y → x ∧ y
2    fun x → fun y → x ∧ y

We conclude that both arguments are totally needed if the result is totally needed.
Whether the value of an expression is totally needed depends on the context of the
expression. For a function f that possibly occurs in such a context, a variant f#
is required that computes its result possibly more efficiently than f. For simplicity,
we only consider functions f that totally need the values of all their arguments to
compute their result totally. For the implementation of the function f#, we then assume
that its arguments have already been totally evaluated. The implementation must
then guarantee that this also holds for all recursive applications of any variants g#
and that the result is also already totally evaluated.
For the function app the variant app# can be implemented in the following way:
let rec app# = fun x → fun y → match x with [ ] → y
                               | h :: t →
                                    let# t1 = app# t y
                                    in let# r = h :: t1
                                    in r
We assume here that no closures are constructed for variables. A general transformation that systematically exploits information about total strictness is the subject
of Exercise 13.
The programming language fragment for which we have developed several strictness analyses is very restricted. In the following, we briefly sketch how to at least
partly lift these restrictions.
A first assumption was that all functions are defined on the topmost level. Each
program of our OCaml-fragment can be transformed such that this property holds,
see Exercise 15. Alternatively, we associate all free variables of a local function with

the value 1 (don't know) in the analysis of this local function. This may produce
imprecise information, but at least gives correct results.
Further, we constrained ourselves to k-place functions without functional arguments
and results, and without partial applications. This is because the complete
lattice of the k-place monotonic abstract functions 2 × . . . × 2 → 2 possesses strictly
ascending chains whose length is exponential in k, that is, in the number of arguments.
This is still acceptable, since the number of arguments is often not very large. The
number of elements of this lattice, however, is even doubly exponential in k. And
this number matters if higher-order functions are to be analyzed. One way out of this
dilemma consists in abstracting the abstract function domains radically by smaller
complete lattices. For example, one could use the boolean lattice 2 for function
domains, where 0 represents the constant 0-function. This strong abstraction will not lead
to precise strictness information for programs that systematically employ higher-order
functions. Some of the higher-order functions, however, may be removed by
function specialization as shown in Sect. 3.4, thereby improving the chances of
strictness analysis.
Strictness analysis, as we have considered it, is only applicable to monomorphically typed functions and monomorphic instances of polymorphically typed functions.
In general, the programmer might have a hard time finding out when the compiler is able to determine and exploit strictness information, and when it fails. The
programming language Haskell therefore provides annotations, which allow the
programmer to force the evaluation of expressions whenever he considers it important
for efficiency reasons.

3.8 Exercises
1. let-Optimization
Consider the following equation:
(fun y → let x = e in e1) = (let x = e in fun y → e1)
if y does not occur free in e.
(a) Give conditions under which the expressions on both sides are semantically
equivalent.
(b) Give conditions under which the application of this equation from left to right
may contribute to an increase in efficiency of evaluation.
2. letrec-Optimization
This exercise tries to extend the optimizations of let-expressions to optimizations
of letrec-expressions.
(a) Give rules how let-definitions can be moved out of letrec-expressions.


(b) Give constraints under which your transformations are semantics preserving.
(c) Give constraints under which your transformations lead to an improvement
in efficiency.
(d) Test your optimizations with some example programs.
3. let-Optimization of if-expressions
What do you think of the following rules?
(if let x = e in e0 then e1 else e2 ) = (let x = e in if e0 then e1 else e2 )
(if e0 then let x = e in e1 else e2 ) = (let x = e in if e0 then e1 else e2 )
where x does not occur free in the expressions e0 , e1 , and e2 .
4. Regular tree grammar
A regular tree grammar is a tuple G = (N , T, P), where N is a finite set of
nonterminal symbols, T is a finite set of terminal constructors, and P is a set
of rules of the form A → t, where t is a term that is built from nonterminal
symbols in N using constructors in T . The language LG (A) of the regular tree
grammar G for a nonterminal A is the set of all terminal expressions t derivable
from nonterminal A using rules from P. An expression is called terminal if no
nonterminal occurs in it.
Give regular tree grammars for the following sets of trees:
(a) all lists (inner nodes: ::) with an even number of elements from {0, 1, 2};
(b) all lists with elements from {0,1,2} such that the sum of the elements is
even;
(c) all terms with inner nodes :: and leaves {0,1,2} or [ ] that are of type list list
int.
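For experimenting with such grammars, one possible OCaml representation is sketched below. The type and constructor names are ours; the example grammar merely generates all lists over {0, 1, 2} and is thus only a starting point for part (a).

(* Terms built from nonterminals and constructors. *)
type term =
  | Nonterm of string                  (* a nonterminal symbol from N *)
  | Constr of string * term list       (* a constructor from T applied to subterms *)

(* A rule A -> t and a grammar G = (N, T, P). *)
type rule = string * term

type grammar = {
  nonterminals : string list;          (* N *)
  constructors : (string * int) list;  (* T, each constructor with its arity *)
  rules : rule list;                   (* P *)
}

(* All lists (constructors [] and ::) with elements from {0, 1, 2}. *)
let lists : grammar = {
  nonterminals = [ "L"; "E" ];
  constructors = [ ("[]", 0); ("::", 2); ("0", 0); ("1", 0); ("2", 0) ];
  rules = [
    ("L", Constr ("[]", []));
    ("L", Constr ("::", [ Nonterm "E"; Nonterm "L" ]));
    ("E", Constr ("0", []));
    ("E", Constr ("1", []));
    ("E", Constr ("2", []));
  ];
}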
5. Tree grammar (cont.)
Let G be a regular tree grammar of size n and A a nonterminal symbol of G.
Show:
(a) LG(A) ≠ ∅ if and only if t ∈ LG(A) for some t of depth ≤ n;
(b) LG(A) is infinite if and only if t ∈ LG(A) for some t of depth d where n ≤ d < 2n.
(In particular, define the size of a grammar in such a way that these claims
hold.)
6. Value analysis: case distinction
Modify the algorithm for the value analysis in such a way that it respects the
given order of the patterns in a case distinction.
7. Value analysis: exceptions
Consider the functional core language, extended by the constructs:
e ::= ... | raise e | (try e with p1 → e1 | ... | pk → ek)

The expression raise e throws an exception with value e. A try-expression evaluates the main expression e; if this evaluation ends with an exception carrying a value v, the try-expression catches the exception if one of the patterns pi matches v, and throws the same exception otherwise.
How should one modify the value analysis to identify the set of exceptions which
the evaluation of an expression may possibly throw?
8. Value analysis: references
Consider the functional core language, extended by destructively modifiable
references:
e ::= ... | ref e | (e1 := e2) | !e
Extend the value analysis to apply it to this extended language. How could one use this analysis to find out whether an expression is pure, i.e., whether its evaluation does not modify any references?
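As a small OCaml illustration of the notion of purity meant here (the reference r and both expressions are ours): reading a reference leaves all references unchanged, whereas an assignment does not.

let r = ref 0

(* Pure in the sense of the exercise: the evaluation reads r but
   modifies no reference. *)
let pure = !r + 1

(* Not pure: the evaluation updates the contents of r. *)
let impure = (r := !r + 1; !r)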
9. Simplification rules for the identity
Let id = fun x → x. Give a set of rules to simplify expressions containing id!
10. Simplification rules for rev
The list of functions considered by deforestation can be further extended.
(a) Define functions rev, fold_right, rev_map, rev_tabulate and rev_loop,
where rev reverses a list. The following equalities hold for the other functions:
fold_right f a = comp (fold_left f a) rev
rev_map f = comp (map f) rev
rev_tabulate n = comp rev tabulate n
rev_loop n should behave like loop n, but the iteration should run from
n − 1 to 0 instead of the other way round.
(b) Design rules for the composition of these functions and for these functions
with map, fold_left, filter, and tabulate. Exploit that comp rev rev is the
identity.
(c) Explain under which circumstances these rules are applicable, and why they
improve the efficiency.
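A quick sanity check of the equality for rev_map from part (a) and of the fact used in part (b), under textbook-style definitions of comp and rev (sketched here; they need not be the exact definitions asked for in the exercise):

let comp f g = fun x -> f (g x)
let rev l = List.fold_left (fun acc x -> x :: acc) [] l

(* rev_map f = comp (map f) rev, i.e., rev_map f l = map f (rev l). *)
let rev_map f = comp (List.map f) rev

let () =
  assert (rev_map (fun x -> x * x) [ 1; 2; 3 ] = [ 9; 4; 1 ]);
  (* comp rev rev is the identity on lists. *)
  assert (comp rev rev [ 1; 2; 3 ] = [ 1; 2; 3 ])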
11. Simplification rules for index-dependent functions
Define index-dependent variants of the functions map and fold_left. Design
simplification rules and argue under which conditions these are applicable!
How do the new functions behave under composition with map, fold_left, filter, and
tabulate?
12. Simplification rules for general data structures
Elimination of intermediate data structures may not only pay off for lists.
(a) Design simplification rules for functions map and fold_left on tree-like data
structures. The function map is to apply a functional argument to all data elements contained in the data structure; fold_left is to combine all elements in the data structure into one value using its functional argument. (A tree-like data structure suitable for this purpose is sketched after this exercise.)
(b) Give examples for your proposed functions and discuss their applicability.
What could a generalization of tabulate from lists to tree-like data structures
look like?
(c) Is there an analogue of the list function filter? Define functions to_list and from_list that convert your data structure into a list and reconstruct a data structure from a list, respectively. What simplification rules exist for these functions?
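As a possible starting point for part (a), consider binary trees with values at the inner nodes; the type and the argument order of fold_left chosen here are our own:

type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

(* Apply f to every element stored in the tree. *)
let rec map f = function
  | Leaf -> Leaf
  | Node (l, x, r) -> Node (map f l, f x, map f r)

(* Combine all elements, from left to right, into one value. *)
let rec fold_left f acc = function
  | Leaf -> acc
  | Node (l, x, r) -> fold_left f (f (fold_left f acc l) x) r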
13. Optimization for total strictness
Develop a transformation that optimizes an expression whose value is definitely
completely needed.
14. Combination of total and root strictness
The strictness analyses that we have designed are instances of a neededness analysis, which determines more precisely how much of the arguments of a function is needed to satisfy various demands on the function's result.
(a) Define a strictness analysis that simultaneously determines total-strictness
and root-strictness information. Use a complete lattice 3 = {0 < 1 < 2}.
(b) Define a description relation between concrete values and abstract values
from 3 and define the necessary abstract expression evaluation.
(c) Test your analysis on the function app.
(d) Generalize your analysis to an analysis that determines, for a given k ≥ 1, up to which depth ≤ k − 1 the arguments need to be evaluated, or whether they need to be evaluated completely, if the result needs to be evaluated up to a depth 0 ≤ j ≤ k − 1 or completely.
15. Moving local functions to the outermost level
Transform a given OCaml program in such a way that all functions are defined
on the outermost level.
16. Monotone functions over 2
Construct the following complete lattices of monotonic functions:
(a) 2 → 2;
(b) 2 → 2 → 2;
(c) (2 → 2) → 2 → 2.
17. Strictness analysis for higher-order functions
Analyze total strictness properties of monomorphic instances of the functions
map and fold_left with types:
map : (int → int) → list int → list int
fold_left : (int → int → int) → int → list int → int

3.9 Literature
The λ-calculus with β-reduction and α-conversion is the theoretical basis for functional programming languages. It is based on work on the foundations of mathematics by Church and Kleene from the 1930s. The book by Barendregt (1984) is
still the standard textbook presenting important properties of the λ-calculus.
Jones and Santos (1998) gives a survey of optimizations implemented in the
Haskell compiler. This description also treats optimizations of nested
let-expressions (Jones and Santos 1996).
fold/unfold transformations were first treated in Burstall and Darlington (1977).
Inlining and function specialization are simple forms of partial evaluation of programs (Secher and Sørensen 1999). Our value analysis follows the approach taken
by Heintze (1994). A type-based analysis of side effects is proposed by Amtoft et al.
(1997).
The idea to systematically suppress intermediate data structures was proposed by
Wadler (1990). An extension to programs with higher-order functions is described
in Seidl and Sørensen (1998). The particularly simple variant for programs with
lists presented here was introduced by Gill et al. (1993). Generalizations to arbitrary
algebraic data structures are studied by Takano and Meijer (1995).
The idea to use strictness analysis to support a conversion from CBN to CBV
originates with Mycroft (1980). A generalization to monomorphic programs with
higher-order functions is presented by Burn et al. (1986). The method for the analysis
of total strictness for programs with structured data described here is a simplification
of the method of Sekar et al. (1990).
Not treated in this chapter are optimizations that are based on the representation
of functional programs in continuation-passing style. An extensive presentation of
this technique is given by Appel (2007).

References

P. Anderson, D. Binkley, G. Rosay, T. Teitelbaum, Flow insensitive points-to sets. Inf. Softw. Technol. 44(13), 743–754 (2002)
A.W. Appel, M. Ginsburg, Modern Compiler Implementation in C (Cambridge University Press, Cambridge, 2004)
S. Abramsky, C. Hankin (eds.), Abstract Interpretation of Declarative Languages (Ellis Horwood, Chichester, 1987)
A.V. Aho, M.S. Lam, R. Sethi, J.D. Ullman, Compilers: Principles, Techniques, & Tools, 2nd revised edn. (Addison-Wesley, New York, 2007)
T. Amtoft, F. Nielson, H.R. Nielson, Type and behaviour reconstruction for higher-order concurrent programs. J. Funct. Program. 7(3), 321–347 (1997)
A.W. Appel, Compiling with Continuations (Cambridge University Press, Cambridge, 2007)
D.F. Bacon, Fast and effective optimization of statically typed object-oriented languages. Ph.D. thesis, Berkeley, 1997
H.P. Barendregt, The Lambda Calculus: Its Syntax and Semantics, Volume 103 of Studies in Logic and the Foundations of Mathematics, revised edition (North Holland, Amsterdam, 1984)
R.M. Burstall, J. Darlington, A transformation system for developing recursive programs. J. ACM 24(1), 44–67 (1977)
G.L. Burn, C. Hankin, S. Abramsky, Strictness analysis for higher-order functions. Sci. Comput. Program. 7(3), 249–278 (1986)
P. Cousot, R. Cousot, Static determination of dynamic properties of programs, in 2nd International Symposium on Programming, pp. 106–130. Dunod, Paris, France, 1976
P. Cousot, R. Cousot, Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints, in 4th ACM Symposium on Principles of Programming Languages (POPL), pp. 238–252, 1977a
P. Cousot, R. Cousot, Static determination of dynamic properties of recursive procedures, in IFIP Conference on Formal Description of Programming Concepts, ed. by E.J. Neuhold (North Holland, Amsterdam, 1977b), pp. 237–277
P. Cousot, R. Cousot, Systematic design of program transformation frameworks by abstract interpretation, in 29th ACM Symposium on Principles of Programming Languages (POPL), pp. 178–190, 2002
J.-D. Choi, M. Gupta, M. Serrano, V.C. Sreedhar, S. Midkiff, Escape analysis for Java. SIGPLAN Not. 34(10), 1–19 (1999)
P. Cousot, N. Halbwachs, Automatic discovery of linear restraints among variables of a program, in 5th ACM Symposium on Principles of Programming Languages (POPL), pp. 84–97, 1978

T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, 3rd edn. (MIT Press, Cambridge, 2009)
K.D. Cooper, L. Torczon, Engineering a Compiler (Morgan Kaufmann, Massachusetts, 2004)
M. Fähndrich, J. Rehof, M. Das, Scalable context-sensitive flow analysis using instantiation constraints. SIGPLAN Not. 35(5), 253–263 (2000)
C. Fecht, H. Seidl, A faster solver for general systems of equations. Sci. Comput. Program. (SCP) 35(2), 137–161 (1999)
A.J. Gill, J. Launchbury, S.L.P. Jones, A short cut to deforestation, in Functional Programming and Computer Architecture (FPCA), pp. 223–232, 1993
R. Giegerich, U. Möncke, R. Wilhelm, Invariance of approximate semantics with respect to program transformations, in GI Jahrestagung, pp. 1–10, 1981
P. Granger, Static analysis of linear congruence equalities among variables of a program, in International Joint Conference on Theory and Practice of Software Development (TAPSOFT), pp. 169–192. LNCS 493 (Springer, Heidelberg, 1991)
T. Gawlitza, H. Seidl, Precise fixpoint computation through strategy iteration, in European Symposium on Programming (ESOP), pp. 300–315. LNCS 4421 (Springer, Heidelberg, 2007)
M.S. Hecht, Flow Analysis of Computer Programs (North Holland, Amsterdam, 1977)
N. Heintze, Set-based analysis of ML programs. SIGPLAN Lisp Pointers VII(3), 306–317 (1994)
M. Hofmann, A. Karbyshev, H. Seidl, Verifying a local generic solver in Coq, in Static Analysis, ed. by R. Cousot, M. Martel, Volume 6337 of Lecture Notes in Computer Science (Springer, Heidelberg, 2010), pp. 340–355
M. Hofmann, A. Karbyshev, H. Seidl, What is a pure functional? in ICALP (2), ed. by S. Abramsky, C. Gavoille, C. Kirchner, F. Meyer auf der Heide, P.G. Spirakis, Volume 6199 of Lecture Notes in Computer Science (Springer, Heidelberg, 2010), pp. 199–210
S.L.P. Jones, W. Partain, A. Santos, Let-floating: moving bindings to give faster programs, in International Conference on Functional Programming (ICFP), pp. 1–12, 1996
S.L.P. Jones, A.L.M. Santos, A transformation-based optimiser for Haskell. Sci. Comput. Program. 32(1–3), 3–47 (1998)
M. Karr, Affine relationships among variables of a program. Acta Inf. 6, 133–151 (1976)
G.A. Kildall, A unified approach to global program optimization, in ACM Symposium on Principles of Programming Languages (POPL), pp. 194–206, 1973
J. Knoop, Optimal Interprocedural Program Optimization, A New Framework and Its Application, LNCS 1428 (Springer, Berlin, 1998)
J. Knoop, O. Rüthing, B. Steffen, Optimal code motion: theory and practice. ACM Trans. Program. Lang. Syst. 16(4), 1117–1155 (1994)
J. Knoop, O. Rüthing, B. Steffen, Partial dead code elimination, in ACM Conference on Programming Languages Design and Implementation (PLDI), pp. 147–158, 1994
J. Knoop, B. Steffen, The interprocedural coincidence theorem, in 4th International Conference on Compiler Construction (CC), pp. 125–140. LNCS 541 (Springer, Heidelberg, 1992)
S. Kundu, Z. Tatlock, S. Lerner, Proving optimizations correct using parameterized program equivalence, in ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2009
J.B. Kam, J.D. Ullman, Global data flow analysis and iterative algorithms. J. ACM 23(1), 158–171 (1976)
J.B. Kam, J.D. Ullman, Monotone data flow analysis frameworks. Acta Inf. 7, 305–317 (1977)
X. Leroy, Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009)
S. Lerner, T.D. Millstein, C. Chambers, Automatically proving the correctness of compiler optimizations, in ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 220–231, 2003
S. Lerner, T. Millstein, E. Rice, C. Chambers, Automated soundness proofs for dataflow analyses and transformations via local rules, in 32nd ACM Symposium on Principles of Programming Languages (POPL), pp. 364–377, 2005

D. Liang, M. Pennings, M.J. Harrold, Extending and evaluating flow-insensitive and context-insensitive points-to analyses for Java, in ACM SIGPLAN-SIGSOFT Workshop on Program Analysis For Software Tools and Engineering (PASTE), pp. 73–79, 2001
F. Martin, M. Alt, R. Wilhelm, C. Ferdinand, Analysis of loops, in 7th International Conference on Compiler Construction (CC), pp. 80–94. LNCS 1383 (Springer, Heidelberg, 1998)
S.S. Muchnick, N.D. Jones (eds.), Program Flow Analysis: Theory and Application (Prentice Hall, Englewood Cliffs, 1981)
M. Müller-Olm, H. Seidl, Precise interprocedural analysis through linear algebra, in 31st ACM Symposium on Principles of Programming Languages (POPL), pp. 330–341, 2004
M. Müller-Olm, H. Seidl, A generic framework for interprocedural analysis of numerical properties, in Static Analysis, 12th International Symposium (SAS), pp. 235–250. LNCS 3672 (Springer, Heidelberg, 2005)
M. Müller-Olm, H. Seidl, Analysis of modular arithmetic. ACM Trans. Program. Lang. Syst. 29(5) (2007)
S.S. Muchnick, Advanced Compiler Design and Implementation (Morgan Kaufmann, Massachusetts, 1997)
A. Mycroft, The theory and practice of transforming call-by-need into call-by-value, in Symposium on Programming: Fourth Colloque International sur la Programmation, pp. 269–281. LNCS 83 (Springer, Heidelberg, 1980)
F. Nielson, H.R. Nielson, C. Hankin, Principles of Program Analysis (Springer, Heidelberg, 1999)
R. Paige, Symbolic finite differencing, part I, in 3rd European Symposium on Programming (ESOP), pp. 36–56. LNCS 432 (Springer, Heidelberg, 1990)
R. Paige, J.T. Schwartz, Reduction in strength of high level operations, in 4th ACM Symposium on Principles of Programming Languages (POPL), pp. 58–71, 1977
G. Ramalingam, On loops, dominators, and dominance frontiers. ACM Trans. Program. Lang. Syst. (TOPLAS) 24(5), 455–490 (2002)
V. Sundaresan, L. Hendren, C. Razafimahefa, R. Vallée-Rai, P. Lam, E. Gagnon, C. Godin, Practical virtual method call resolution for Java, in 15th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pp. 264–280, 2000
A. Simon, Value-Range Analysis of C Programs: Towards Proving the Absence of Buffer Overflow Vulnerabilities (Springer, Heidelberg, 2008)
J. Sheldon, W. Lee, B. Greenwald, S.P. Amarasinghe, Strength reduction of integer division and modulo operations, in Languages and Compilers for Parallel Computing, 14th International Workshop (LCPC), Revised Papers, pp. 254–273. LNCS 2624 (Springer, Heidelberg, 2003)
M. Sharir, A. Pnueli, Two approaches to interprocedural data flow analysis, in Program Flow Analysis: Theory and Application, ed. by S.S. Muchnick, N.D. Jones (Prentice Hall, Englewood Cliffs, 1981), pp. 189–234
R.C. Sekar, S. Pawagi, I.V. Ramakrishnan, Small domains spell fast strictness analysis, in ACM Symposium on Principles of Programming Languages (POPL), pp. 169–183, 1990
M. Sagiv, T.W. Reps, R. Wilhelm, Parametric shape analysis via 3-valued logic, in 26th ACM Symposium on Principles of Programming Languages (POPL), pp. 105–118, 1999
M. Sagiv, T.W. Reps, R. Wilhelm, Parametric shape analysis via 3-valued logic. ACM Trans. Program. Lang. Syst. (TOPLAS) 24(3), 217–298 (2002)
H. Seidl, M.H. Sørensen, Constraints to stop deforestation. Sci. Comput. Program. 32(1–3), 73–107 (1998)
J.P. Secher, M.H. Sørensen, On perfect supercompilation, in 3rd Int. Andrei Ershov Memorial Conference: Perspectives of System Informatics (PSI), pp. 113–127. LNCS 1755 (Springer, Heidelberg, 1999)
Y.N. Srikant, P. Shankar (eds.), The Compiler Design Handbook: Optimizations and Machine Code Generation (CRC Press, Boca Raton, 2003)

B. Steensgaard, Points-to analysis in almost linear time, in 23rd ACM Symposium on Principles of Programming Languages (POPL), pp. 32–41, 1996
J.-B. Tristan, X. Leroy, Verified validation of lazy code motion, in ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 316–326, 2009
A. Takano, E. Meijer, Shortcut deforestation in calculational form, in SIGPLAN-SIGARCH-WG2.8 Conference on Functional Programming Languages and Computer Architecture (FPCA), pp. 306–313, 1995
P. Wadler, Deforestation: transforming programs to eliminate trees. Theor. Comput. Sci. 73(2), 231–248 (1990)

Index

A
Abstract interpretation, 47
Algorithm, 23, 83, 87
recursive fixed-point, 88, 89, 114
round-robin, 23, 25, 27
worklist
Alias, 67
may, 67
must, 67
Alias analysis, 66
α-conversion, 4
Analysis, 72
distributive framework, 30
flow-insensitive, 75
interprocedural, 124
monotonic framework, 26
points-to, 72
Antisymmetry, 17
Application
partial, 1
Approach
call string, 135
functional, 133
Array-bounds check, 4
Assignment
available, 138
between variables, 40
dead, 32
delayable, 103
partially dead, 102
partially redundant, 108
very busy, 132

B
Back edge, 99
Backward analysis, 34

β-reduction, 143
Bottom, 18
Bound
greatest lower, 16, 18
least upper, 16, 17
upper, 17

C
C, 141, 144
Call, 117, 122, 135
last, 123
Call graph, 122
Call stack, 117, 124
Call string, 135
Chain
ascending, 166
descending, 65
stable ascending, 88
Closure, 159
Code
loop invariant, 97
Compilation
separate, 124
Computation
reaching, 120
redundant, 8
same level, 119
step, 8
Concretization, 47
Constant folding, 42
Constant propagation, 42, 44
interprocedural, 133
Control-flow graph, 8
interprocedural, 116
Copy propagation
interprocedural, 40, 124, 126

Correctness
of constant propagation, 47
of interprocedural analysis, 127
of interval analysis, 59
of points-to analysis, 74
of the transformation PRE, 97
of transformation RE, 42
of transformation DE, 37

D
Data structure
intermediate, 155
union-find-, 79
Data-flow analysis, vi, 35
Dead-code elimination, 35
Deforestation, 15
Dependence, 68
Dependence analysis, 68
Description relation, 47
.NET instructions, 1

E
Edge effect
abstract, 11
concrete, 9
Element
atomic, 29
greatest, 18
least, 18
Equivalence class, 77
Evaluation
eager, 5, 19
lazy, 141, 144, 159
multiple, 7
partial, 44
Expression
abstract evaluation of, 45
available, 11
concrete evaluation of, 10

F
Feedback vertex set, 62
Fixed point
greatest, 23
least, 22
post-, 22
Fixed-point iteration
accumulating, 24
local, 89
naive, 23

recursive, 87
round-robin, 23
worklist, 83
Fortran, 3
Forward analysis, 34
F#, 1
Function
distributive, 28
folding, 8
inlining, 6
monotonic, 20
specialization, 7
strict, 28
totally distributive, 28
Functional abstraction, 2

H
Haskell, 141, 144, 147

I
Inequalities, 15
system of
Inlining
of functions, 6
procedure, 121
Interval analysis, 53
Interval arithmetic, 56

J
Java, 4

L
λ-calculus, 143
Lattice
atomic, 29
complete, 16, 17
flat, 18
height of, 25
powerset-, 18
Lisp, 7
List constructor, 8
Loop inversion, 98

M
Memoization, 7
Memory
dynamically allocated, 67
Memory cell
uninitialized, 74

N
Narrowing, 63

O
Operator, 65
OCaml, 141, 145
Order
dual, 23
partial, 16

P
Partial order, 16
Partition
refinement of, 77
Path
infeasible, 14
Pattern, 10
Pattern matching, 141
Pointer, 67
Pointer arithmetic, 67
Polymorphism, 141
Predominator, 98
Program
well-typed, 3
Program optimization semantics, 3
Program point, 8
Program state, 8

R
Recursion
tail, 123
Redundancy
partial, 89
Redundancy elimination, 13
Reflexivity, 17
Register
virtual, 4
Register allocation, 4
Root-strictness, 23
Round-robin iteration
correctness, 25

S
Scala, 141
Semantics
denotational, 162

instrumented, 74
operational, 3, 154
small-step operational, 8
Side effects, 5
Solution, 149
merge-over-all-paths, 27
Stack frame, 117
Strictness
root, 23
total, 24
Substitution, 3
Supergraph
interprocedural, 135
System of inequalities
size of, 85

T
Termination, 145
Top, 18
Total strictness, 164
Transitivity, 17
Tree grammar
regular, 151
type inference, 141
type system, 67

V
Value analysis, 149
expression, 150
Variable
binding, 9, 10
abstract, 44
dead, 32
definition of, 32
global, 32
live, 33
order, 25
partially dead, 102
renaming of a, 144
true use, 37
truly live, 37
use of, 32
Verification, 4

W
Widening, 60
-operator, 61
