Cloning-Based Context-Sensitive Pointer Alias Analysis using BDDs:

Cloning-Based Context-Sensitive Pointer Alias Analysis using BDDs John Whaley
Monica Lam
Stanford University June 10, 2004

Unification vs. Inclusion:

Unification vs. Inclusion Earlier scalable pointer analysis was context-insensitive unification-based [Steensgaard ’96]
Pointers are either unaliased or point to the same set of objects.
Near-linear, but VERY imprecise.
Inclusion-based pointer analysis
Can point to overlapping sets of objects.
Closure calculation is O(n3)
Various optimizations [Fahndrich,Su,Heintze,…]
BDD formulation, simple, scalable [Berndl,Zhu]

Context Sensitivity:

Context Sensitivity Context sensitivity is important for precision.
Unrealizable paths. Object id(Object x) { return x; } a = id(b); c = id(d);

Context Sensitivity:

Object id(Object x) { return x; } Object id(Object x) { return x; } Context Sensitivity Context sensitivity is important for precision.
Unrealizable paths.
Conceptually give each caller its own copy. a = id(b); c = id(d);

Summary-Based Analysis:

Summary-Based Analysis Popular method for context sensitivity.
Two phases:
Bottom-up: Summarize effects of methods.
Top-down: Propagate information down.
Problems:
Difficult to summarize pointer analysis.
Summary-based analysis using BDD: not shown to scale [Zhu’02]
Queries (e.g. which context points to x) require expanding an exponential number of contexts.

Cloning-Based Analysis:

Cloning-Based Analysis Simple brute force technique.
Clone every path through the call graph.
Run context-insensitive algorithm on expanded call graph.
The catch: exponential blowup

Cloning is exponential!:

Cloning is exponential!

Recursion:

Recursion Actually, cloning is unbounded in the presence of recursive cycles.
Technique: We treat all methods within a strongly-connected component as a single node.

Recursion:

Recursion A G B C D E F

Top 20 Sourceforge Java Apps:

Top 20 Sourceforge Java Apps 1016
1012
108
104
100

Cloning is infeasible (?):

Cloning is infeasible (?) Typical large program has ~1014 paths
If you need 1 byte to represent a clone:
Would require 256 terabytes of storage
Registered ECC 1GB DIMMs: $98.6 million
Power: 96.4 kilowatts = Power for 128 homes
300 GB hard disks: 939 x $250 = $234,750
Time to read (sequential): 70.8 days
Seems unreasonable!

BDD comes to the rescue:

BDD comes to the rescue There are many similarities across contexts.
Many copies of nearly-identical results.
BDDs can represent large sets of redundant data efficiently.
Need a BDD encoding that exploits the similarities.

Contribution (1):

Contribution (1) Can represent context-sensitive call graph efficiently with BDDs and a clever context numbering scheme
Inclusion-based pointer analysis
1014 contexts, 19 minutes
Generates all answers

Contribution (2) :

Contribution (2) BDD hacking is complicated
bddbddb (BDD-based deductive database)
Pointer algorithm in 6 lines of Datalog
Automatic translate into efficient BDD implementation
10x performance over hand-tuned solver (2164 lines of Java)

Contribution (3):

Contribution (3) bddbddb: General Datalog solver
Supports simple declarative queries
Easy use of context-sensitive pointer results
Simple context-sensitive analyses:
Escape analysis
Type refinement
Side effect analysis
Many more presented in the paper

Context-sensitive call graphs in BDD:

Context-sensitive call graphs in BDD

Call graph relation:

Call graph relation Call graph expressed as a relation.
Five edges:
Calls(A,B)
Calls(A,C)
Calls(A,D)
Calls(B,D)
Calls(C,D) B D C A

Call graph relation:

Call graph relation Relation expressed as a binary function.
A=00, B=01, C=10, D=11 B D C A 00 10 01 11

Binary Decision Diagrams Size is correlated to amount of redundancy, NOT size of relation.
As the set gets larger, the number of don’t-care bits increases, leading to fewer necessary nodes.

Expanded Call Graph:

Expanded Call Graph A D B C E F G H 0 1 2 0 1 2 2 1 0

Numbering Clones:

Numbering Clones A D B C E F G H A D B C E F G H E E F F G G H H H H H 0 0 0 0 1 2 0-2 0-2 0-2 3-5 0 0 0 0 1 2 0 1 2 2 1 0 0 1 2 3 4 5 0

Pointer Analysis:

Pointer Analysis

Pointer Analysis Example:

Pointer Analysis Example
h1: v1 = new Object();
h2: v2 = new Object();
v1.f = v2;
v3 = v1.f; Input Relations
vPointsTo(v1,h1)
vPointsTo(v2,h2)
Store(v1,f,v2)
Load(v1,f,v3)
Output Relations
hPointsTo(h1,f,h2)
vPointsTo(v3,h2) v1 h1 v2 h2 f v3

Context-sensitive pointer analysis Compute call graph with context-insensitive pointer analysis.
Datalog rules for:
assignments, loads, stores
discover call targets, bind parameters
type filtering
Apply rules until fix-point reached.
Compute expanded call graph relation.
Apply context-insensitive algorithm to expanded call graph.

bddbddb:BDD-Based Deductive DataBase:

bddbddb: BDD-Based Deductive DataBase

Datalog:

Datalog Declarative logic programming language designed for databases
Horn clauses
Operates on relations
Datalog is expressive
Relational algebra:
Explicitly specify relational join, project, rename.
Relational calculus:
Specify relations between variables; operations are implicit.
Datalog:
Allows recursively-defined relations.

Datalog BDD:

Datalog BDD Join, project, rename are directly mapped to built-in BDD operations
Automatically optimizes:
Rule application order
Incrementalization
Variable ordering
BDD parameter tuning
Many more…

Experimental Results:

Experimental Results

Experimental Results:

Experimental Results Top 20 Java projects on SourceForge
Real programs with 100K+ users each
Using automatic bddbddb solver
Each analysis only a few lines of code
Easy to try new algorithms, new queries
Test system:
Pentium 4 2.2GHz, 1GB RAM
RedHat Fedora Core 1, JDK 1.4.2_04, javabdd library, Joeq compiler

Slide38:

Slide39:

Multi-type variables:

Multi-type variables A variable is multi-type if it can point to objects of different types.
Measure of analysis precision
One line in Datalog
Two ways of handling context sensitivity:
Projected: Merge all contexts together
Full: Keep each context separate

Slide41:

Related Work:

Related Work Context-insensitive pointer analysis
Steensgaard: Unification-based (POPL’96)
Andersen: Inclusion-based (’94)
Optimizations: too many to list
Berndl: formulate in BDD (PLDI’03)
Das: one-level-flow (PLDI’00)
Hybrid unification/inclusion

Related Work:

Related Work Scalable context-sensitive pointer analysis
Fähndrich etal, instantiation constraints (PLDI’00)
CFL-reachability
Unification-based: Imprecise.
Handles recursion well.
Computes on-demand.
GOLF: Das etal. (SAS’01)
One level of context sensitivity.
Foster, Fahndrich, Aiken (SAS’00)
Throws away information.
Wilson andamp; Lam: PTF (PLDI’95)
Doesn't really scale (especially complexity)

Related Work:

Related Work Whaley andamp; Rinard: Escape analysis (OOPSLA’99)
Compositional summaries: only weak updates.
Achieves scalability by collapsing escaped nodes.
Emami andamp; Hendren: Invocation graphs (PLDI’94)
Only shown to scale to 8K lines.
Zhu andamp; Calman: (PLDI’04)
To be presented next in this session.
More complete coverage in the paper.

Conclusion:

Conclusion The first scalable context-sensitive inclusion-based pointer analysis.
Achieves context sensitivity by cloning.
bddbddb: Datalog efficient BDD
Easy to query results, develop new analyses.
Very efficient!
andlt;19 minutes, andlt;600mb on largest benchmark.
Complete system is publically available at: http://suif.stanford.edu/bddbddb

You do not have the permission to view this presentation. In order to view it, please
contact the author of the presentation.

Send to Blogs and Networks

Processing ....

Premium member

Use HTTPs

HTTPS (Hypertext Transfer Protocol Secure) is a protocol used by Web servers to transfer and display Web content securely. Most web browsers block content or generate a “mixed content” warning when users access web pages via HTTPS that contain embedded content loaded via HTTP. To prevent users from facing this, Use HTTPS option.