Previous: 9. Distribution of Dimension Up: 2. Constitution of a Next: 3. Results for Satisfiability

10. Evaluation of POLYBENCH and Future Work

In this section, we discuss the advantages of our approach, its limitations, and the future of POLYBENCH. First, we summarize the main advantages as follows:

Alongside the above advantages, our approach also has limitations:

Since we intend to remove these limitations, we discuss them here in more detail before presenting the experimental part.

The diversity of algorithms and implementations is the most important limitation of our framework. This diversity covers the availability of algorithms, algorithmic differences, implementation differences, etc. As a consequence, the comparisons are sometimes not homogeneous.

For example, many integer algorithms have not yet been designed because of their complexity, e.g. the convex hull operator; and JANUS [SOG] was originally implemented with C built-in arithmetic, so only $32$-bit computation [*] is available, whereas most polyhedral implementations support $64$-bit integer arithmetic. Some libraries use GNU multi-precision arithmetic, e.g. New POLKA [JEA,JEA00], while others do not, e.g. $C^3$ [PIPa,CAI00]. Therefore, although in our view the question of 64-bit versus GNU multi-precision computation is more interesting than that of 32-bit versus 64-bit computation, the corresponding implementations are not available.

In particular, CDD [FUK] implements both C built-in floating-point and GNU multi-precision arithmetic, and it sometimes uses both at the same time, although one usually uses only one of them.

The second most important limitation of our framework is that it does not yet support memory-consumption comparisons, owing to its already complicated structure. We intend to implement this feature in the future.

In order to analyze a new algorithm or implementation, a conversion from our format to its internal representation is required. This is not a simple task: errors or incompatibilities may appear. The conversion time is included in the comparisons, and it is not negligible[*]. Some available implementations are not tested, due to time constraints and bugs. Furthermore, in order to compare the results computed by different implementations, each result in its own internal format must be converted back to our format. This conversion differs from the one mentioned above and thus requires additional work. For this reason, in the case of New POLKA, the output is not yet verified.

In our framework, results are represented in terms of run times and polyhedral criteria, such as the number of dimensions, the number of constraints, etc. Thus, for operators that take several arguments, e.g. the convex hull operator, the representation is not straightforward, since run times depend on all the arguments. We could, for instance, take the criteria of the largest argument; in the convex hull evaluation, we actually take the first polyhedron's criteria.

Another limitation of our framework is that, while a great number of tests is generated in order to increase the accuracy of the experiments, this requires days of experimentation. If we deploy powerful computers, we can reduce this time, but then the duration of each execution may drop below the timer resolution, which is based on scheduling quanta, and the comparisons become imprecise.

Conversely, if we use a small set of tests on a slow computer, we obtain more precision in the timing comparison. A sampling mechanism is therefore implemented in our framework. For execution times that are too small, a repetition mechanism for time measurement is implemented, since it is the cheapest solution. We plan to implement an adaptive scheme in the future.

Our database creation scheme depends strongly on PIPS's infrastructure, in order to place the evaluations in the context of program analysis and transformation. Thus, these databases may contain polyhedra that are not suitable in other contexts. Although there is redundancy in our databases, we do not consider it a problem, since the databases are constructed and used independently for each operator, and since it reflects the reality of static analyzers. We could test a memoization scheme for this, as well as for head-to-head comparisons, to avoid several executions of the same implementation, but we defer these to future work because of their lower priority.


Nguyen Que Duong
2006-09-16