Points-to Analysis: Intro
Points-to Analysis is widely used, but it’s tricky to understand. There’s a wide variety of notations used, leading to confusion. And if that wasn’t bad enough, the most popular points-to analysis tool SVF, operates on LLVM IR, while most descriptions of points-to analysis algorithms use examples from C code. This makes it difficult for readers to map the familiar C source code based examples to the functioning of SVF.
This, and the following blog articles will describe things that I have learnt during my experiences with SVF and pointer analysis.
Andersen’s Algorithm
It is an iterative, subset-based points-to analysis algorithm. I’ll describe the different ways to think about the constraints and derivation rules, that I’ve found to be helpful.
-
p = &q:
This is the Address-of Constraint.
Here the address of q is stored to p. So q ∈ p. Where p is the pts-to set for p. -
p = q:
This is the Copy Constraint.
Here both p and q are pointers (or pointers to pointers, or pointers to pointers to pointers … and so on). This is fairly intuitive. It means, the points to set of q is a subset of the points-to set of p.
In other words, p might* point to whatever q might point to, in addition to whatever else, p might already be pointing to.
In other words, because q is a subset, we can denote p ⊇ q, denoting p is a superset of q. Some texts use this notation.
In other words, p ⊇ q, lx ∈ q, then, lx ∈ p.
In other words, the points-to set of q, is copied into the points-to set of p.
Note*: I use might because points-to analysis is imprecise and overapproximated.
Note: the alphabet is used to denote both the pointer / object as well as its points-to set. The meaning is clear depending on the context. -
*p = q:
This is the Assign Constraint.
Here q is a pointer (or a pointer to a pointer or so on ..) and p is a pointer-to-a-pointer (or a pointer to a pointer to a pointer and so on …). Intuitively, it means that the points-to set of q is a subset of each of the points-to sets of every element that is in points-to set for p.
In other words, every element that p might point to, might also point to whatever q might point to, in addition to whatever they might already be pointing to.
In other words, for every element e such that, p → e, e q.
In other words, *p ⊇ q, lr p, lx ∈ q , then, lx ∈ r.
In other words, the points-to set of q, is copied into every element in the points-to set of p. -
p = *q:
This the Deref Constraint.
Here q is pointer-to-a-pointer (or a pointer-to-a-pointer-to-a-pointer and so on), and p is a pointer (or a pointer-to-a-pointer and so on). Intuitively it means that the points-to set of every element that q could point to, is a subset of the points-to set of p.
In other words, p might point to whatever, every element that is in the points-to set for every element that might be pointed to by q.
In other words, for every element e such that, q → e, p ⊇ e.
In other words, p ⊇ *q, lr ∈ q, lx ∈ r , then, lx ∈ p.
In other words, the points-to set of every element that q might point to, is copied into the points-to set of p.
The Andersen’s Algorithm, and practically all static analysis algorithms are language-independent. For example, all languages that supports pointers and allows for pointers to be assigned and dereferenced, will work with Andersen’s Analysis.
Also, note that C programs can contain more complex statements like **p = ***q, or whatever. These have to be broken down into simpler statements via temporary variables in order to do the analysis.
Enjoy Reading This Article?
Here are some more articles you might like to read next: