Tool User Guide
pymwp (“pai m-w-p”) is a tool for automatically performing static analysis on programs written in a subset of the C language. It analyzes resource usage and determines if program variables’ growth rates are no more than polynomially related to their inputs sizes.
The theoretical foundations are described in paper the “mwp-Analysis Improvement and Implementation: Realizing Implicit Computational Complexity” (cf. “Learn More” for additional references and links). The technique is generic and applicable to any imperative language. pymwp is an implementation demonstrating this technique concretely on C programs. The technique is originally inspired by “A Flow Calculus of mwp-Bounds for Complexity Analysis”.
This guide explains pymwp usage and behavior through several examples.
For an imperative program, the goal is to discover a polynomially bounded data-flow relation, between the initial values of the variables (denoted X1,…,Xn
) and its final values (denoted X1',…,Xn'
).
For a program written in C language, this property can be presented as follows.
void main(int X1, int X2, int X3){
// initial values ↑
/*
* various commands involving
* variables X1, X2, X3
*/
// X1', X2', X3' (final values)
}
Question: \(\forall n\), is \(\texttt{X}_n \rightsquigarrow \texttt{X}_n^\prime\) polynomially bounded in inputs?
We answer this question using the mwp-flow analysis, implemented in the pymwp static analyzer.
The mwp-flow analysis works to establish a polynomial growth bound for input variables by applying inference rules to program’s commands.
Internally, the analysis tracks coefficients representing dependencies between program’s variables. These coefficients (or “flows”) are \(0, m, w, p\) and \(\infty\). They characterize how data flows between variables.
with ordering: \(0 < m < w < p < \infty\). The analysis name also comes from these coefficients.
After analysis, two outcomes are possible. (A) The program’s variables values can be bounded by a polynomial in the input’s values, or (B) the analysis determines it is impossible to establish such a bound. Due to non-determinism, many derivation paths need to be explored to determine this result.
The analysis succeeds if – for some derivation – no pair of variables is characterized by \(\infty\)-flow. That is, obtaining an \(\infty\)-free derivation implies existence of a polynomial growth bound; i.e., the program has the property of interest, or we can say that the program is derivable. The soundness theorem of the mwp-calculus guarantees that if such derivation exists, the program variables’ value growth is polynomially bounded in inputs.
Program fails the analysis if every derivation contains an \(\infty\) coefficient. Then it is not possible to establish polynomial growth bound. For these programs, pymwp reports an \(\infty\)-result.
If the analysis is successful, i.e., polynomial growth bound exists, it is represented using mwp-bounds.
An mwp-bound is a number theoretic expression of form: \(\text{max}(\vec x, poly_1(\vec y)) + poly_2(\vec z)\).
Disjoint variable lists \(\vec x\), \(\vec y\) and \(\vec z\) capture dependencies of an input variable. Dependencies characterized by \(m\)-flow are in \(\vec x\), \(w\)-flow in \(\vec y\), and \(p\)-flow in \(\vec z\). The polynomials \(poly_1\) and \(poly_2\) are built up from constants, variables, and operators \(+\) and \(\times\). Each variable list may be empty and \(poly_1\) and \(poly_2\) may not be present.
For multiple input variables, the result is a conjunction of mwp-bounds, one for each input variable.
Example 1. Assume program has one input variable named \(\texttt{X}\), and we have obtained a bound: \(\texttt{X}' \leq \texttt{X}\).
The bound expression means the final value \(\texttt{X}'\) depends only on its own initial value \(\texttt{X}\).
Example 2. Assume program has two inputs, \(\texttt{X}\) and \(\texttt{Y}\), and we obtained a bound: \(\texttt{X}' \leq \texttt{X} \land \texttt{Y}' \leq \text{max}(\texttt{X}, 0) + \texttt{Y}\).
This section explains how to get started using pymwp. We recommend installing pymwp locally for an interactive tutorial experience.
pymwp requires Python runtime 3.7 – 3.11 (current latest).
To check currently installed version, run:
python3 --version
On systems that default to Python 3 runtime, use python
instead of python3
.
Instructions for installing Python for different operating systems can be found at python.org ↗.
Install pymwp from Python Package Index:
pip3 install pymwp==0.4.2
Download a set of examples.
wget https://github.com/statycc/pymwp/releases/download/0.4.2/examples.zip
Alternatively, download the set of examples using a web browser:
https://github.com/statycc/pymwp/releases/download/0.4.2/examples.zip
Unzip examples.zip
using your preferred approach.
This completes the setup.
We now demonstrate use of pymwp on several examples. To ease the presentation, we will use multiple command line arguments.
--fin
— always run analysis to completion (even on failure)--info
— reduces amount of terminal output to info level--no_time
— omits timestamps from output logFor a complete list of available command arguments, run:
pymwp --help
This example shows that assigning a compounded expression to a variable results in correct analysis.
int foo(int y1, int y2){
y2 = y1 + y1; }
It is straightforward to observe that this program has a polynomial growth bound. The precise value of that bound is \(\texttt{y1}^\prime = \texttt{y1} \land \texttt{y2}^\prime \leq 2 * \texttt{y1}\). Although the program is simple, it is interesting because binary operations introduce complexity in program analysis.
The current working directory should be the location of unzipped examples from Installation Step 4.
pymwp basics/assign_expression.c --fin --info --no_time
Output:
INFO (result): Bound: y1' ≤ y1 ∧ y2' ≤ y1
INFO (result): Bounds: 3
INFO (result): Total time: 0.0 s (0 ms)
INFO (file_io): saved result in output/assign_expression.json
The analysis correctly assigns a polynomial bound to the program. The bound obtained by the analyzer is \(\texttt{y1}^\prime \leq \texttt{y1} \land \texttt{y2}^\prime \leq \texttt{y1}\). Comparing to the precise value determined earlier, this bound is correct, because we omit constants in the analysis results.
Due to non-determinism, the analyzer finds three different derivations that yield a bound. From the .json
file, that captures the analysis result in more technical detail, it is possible to determine these three bounds are:
They all simplify to \(\texttt{y1}^\prime \leq \texttt{y1} \land \texttt{y2}^\prime \leq \texttt{y1}\). This concludes the obtained result matches the expected result.
A program computing the exponentiation returns an infinite coefficient, no matter the derivation strategy chosen.
int foo(int base, int exp, int i, int result){
while (i < exp){
result = result * base;1;
i = i +
} }
This program’s variable \(\texttt{result}\) grows exponentially. It is impossible to find a polynomial growth bound, and the analysis is expected to report \(\infty\)-result. This example demonstrates how pymwp arrives to that conclusion.
pymwp infinite/exponent_2.c --fin --info --no_time
Output:
INFO (result): foo is infinite
INFO (result): Possibly problematic flows:
INFO (result): base ➔ result ‖ exp ➔ result ‖ i ➔ result ‖ result ➔ result
INFO (result): Total time: 0.0 s (2 ms)
INFO (file_io): saved result in output/exponent_2.json
The output shows that the analyzer correctly detects that no bound can be established, and we obtain \(\infty\)-result. The output also gives a list of problematic flows. This list indicates all variable pairs, that along some derivation paths, cause \(\infty\) coefficients to occur. The arrow direction means data flows from source ➔ target
. We can see the problem with this program is the data flowing to \(\texttt{result}\) variable. This clearly indicates the origin of the problem, and allows programmer to determine if the issue can be repaired, to improve program’s complexity properties.
A program that shows infinite coefficients for some derivations.
int foo(int X0, int X1, int X2, int X3){
if (X1 == 1){
X1 = X2+X1;
X2 = X3+X2;
}while(X0<10){
X0 = X1+X2;
} }
This program contains decision logic, iteration, and multiple variables. Determining if a polynomial growth bound exists is not immediate by inspection. It is therefore an interesting candidate for analysis with pymwp!
pymwp not_infinite/notinfinite_3.c --fin --info --no_time
Output:
INFO (result): Bound: X0' ≤ max(X0,X1)+X2*X3 ∧ X1' ≤ X1+X2 ∧ X2' ≤ X2+X3 ∧ X3' ≤ X3
INFO (result): Bounds: 9
INFO (result): Total time: 0.0 s (3 ms)
INFO (file_io): saved result in output/notinfinite_3.json
Compared to previous examples, the analysis is now getting more complicated. We can observe this in the number of discovered bounds and the form of the bound expression. The number of times the loop iterates, or which branch of the if
statement is taken, is not a barrier to determining the result.
From the bound expression, we can determine the following.
\(\texttt{X0}\) has the most complicated dependency relation. Its mwp-bound combines the impact of the if
statement, the while
loop, and the chance that the loop may not execute.
\(\texttt{X1}\) and \(\texttt{X2}\) have fairly simple growth dependencies, originating from the if
statement.
\(\texttt{X3}\) is the simplest case – it never changes. Therefore, it only depends on itself.
Overall, the analysis concludes the program has a polynomial growth bound.
A program that shows infinite coefficients for all choices.
int foo(int X1, int X2, int X3){
if (X1 == 1){
X1 = X2+X1;
X2 = X3+X2;
}while(X1<10){
X1 = X2+X1;
} }
If you studied the previous example carefully, you might notice that this example is very similar. There is a subtle differences: variable \(\texttt{X0}\) has been removed and its usages changed to \(\texttt{X1}\). This example demonstrates how this seemingly small change impacts the analysis result.
pymwp infinite/infinite_3.c --fin --info --no_time
Output:
INFO (result): foo is infinite
INFO (result): Possibly problematic flows:
INFO (result): X1 ➔ X1 ‖ X2 ➔ X1 ‖ X3 ➔ X1
INFO (result): Total time: 0.0 s (2 ms)
INFO (file_io): saved result in output/infinite_3.json
We can observe the result is \(\infty\). Thus, even a small change can change the analysis result entirely.
The output reveals the problem arises from how data flows from source variables \(\texttt{X1}\), \(\texttt{X2}\), and \(\texttt{X3}\), to target variable \(\texttt{X1}\). Observe that even though there is no direct assignment from \(\texttt{X3}\) to \(\texttt{X1}\), the analysis correctly identifies this dependency relation, that occurs through \(\texttt{X2}\).
From the output, we have identified the point and source of failure. Conversely, we know other variable pairs are not problematic. By focusing on how to avoid “too strong” dependencies targeting variable \(\texttt{X1}\), programmer may be able to refactor and improve the program’s complexity properties.
Try to guess the analysis outcome before determining the result with pymwp.
int foo(int X0, int X1, int X2){
if (X0) {
X2 = X0 + X1;
}else{
X2 = X2 + X1;
}
X0 = X2 + X1;
X1 = X0 + X2;while(X2){
X2 = X1 + X0;
} }
After seeing the various preceding examples – with and without polynomial bounds – we present the following challenge. By inspection, try to determine if this program is polynomially bounded w.r.t. its input values.
It is unknown which if
branch will be taken, and whether the while
loop will terminate, but this is not a problem for determining the result.
pymwp other/dense_loop.c --fin --no_time --info
Output:
INFO (result): Bound: X0' ≤ max(X0,X2)+X1 ∧ X1' ≤ X0*X1*X2 ∧ X2' ≤ max(X0,X2)+X1
INFO (result): Bounds: 81
INFO (result): Total time: 0.0 s (29 ms)
INFO (file_io): saved result in output/dense_loop.json
Even with just 3 variables we can see—in the obtained bound expression and the number of bounds—that this is a complicated derivation problem. The analyzer determines the program has a polynomial growth bound. Let us reason informally and intuitively why this obtained result is correct.
We can observe in the bound expression, that all three variables have complicated dependencies on one another; this corresponds to what is also observable in the input program.
Regarding variables \(\texttt{X0}\) and \(\texttt{X1}\), observe there is no command, with either as a target variable, that would give rise to exponential value growth (need iteration). Therefore, they must have polynomial growth bounds.
Variable \(\texttt{X2}\) is more complicated. The program has a while
loop performing assignments to \(\texttt{X2}\) (potentially problematic), and the while
loop may or may not execute.
Case 1: loop condition is initially false. Then, final value of \(\texttt{X2}\) depends on the if
statement, and in either branch, it will have polynomially bounded growth.
Case 2: loop condition true – at least one iteration will occur. The program iteratively assigns values to \(\texttt{X2}\) inside the while
loop. However, notice the command is loop invariant. No matter how many times the loop iterates, the final value of \(\texttt{X2}\) is \(\texttt{X1} + \texttt{X0}\). We already know those two variables have polynomial growth bounds. Therefore, \(\texttt{X2}\) also grows polynomially w.r.t. its input values.
This reasoning concurs with the result determined by pymwp.
This guide has offered only a very short introduction to mwp-analysis and pymwp.
If you wish to explore more, have a look at:
“mwp-Analysis Improvement and Implementation: Realizing Implicit Computational Complexity”
Describes theoretical foundations behind pymwp.
“A Flow Calculus of mwp-Bounds for Complexity Analysis”
Original mwp-flow analysis technique.
Documentation – Includes information about supported C language features, etc.
Online demo – Run pymwp online on more than 40 examples.
Source code – pymwp is open source.
License – pymwp is licensed under GPLv3.