PROMPT : A Mapping environment for telecom applications on "system-on-a-chip"
Michel Barreteau, Juliette Mattioli
Corinne Ancourt, François Irigoin
Thierry GRANDPIERRE, Christophe LAVARENNE, Yves SOREL
Philippe BONNOT, Philippe KAJFASZ
ABSTRACT - Increasing of computation needs and improving of processor integration make the mapping of embedded real time applications more and more expensive. PROMPT provides a new software design methodology based on the "Adéquation Algorithme Architecture" for telecom applications on Systems-On-a-Chip.
In the next decade, "Systems-On-a-Chip" (SOC) will play a crucial role in telecom applications (UMTS, Wide Band, …). Using deep sub-micron technologies, they are designed to give an answer to the high level of integration. Moreover, they integrate heterogeneous computing engines (SIMD, DSP, micro-controllers, …). They will have to deal with all the various levels of digital processing, from signal processing (SP), in digital beam-forming applications for instance, to protocol stacks.
The architecture of these SOC will be fundamentally heterogeneous. It will integrate various computing engines devoted to specific functions like number crunching (filtering, FFT, …), digital signal processing, data processing and/or decision and supervision. These engines could be number crunchers, which take into account data parallelism and are optimally designed using homogeneous SIMD structures, interconnected with VLIW DSP, RISC cores or micro-controllers. For instance, MEFISTO  is based on a SIMD computation unit (MARAÑON), a floating point unit (mAgicFPU) and a general-purpose processor (ARM).
Unfortunately, programming of such SOC becomes harder and harder. Indeed, mapping and performance estimation of real time applications on this kind of engines, closely interconnected on a single silicon die, is not trivial at all. Moreover, complexity of software development (processing and communication) is drastically increasing mainly because on-chip inter-engines buses are not accessible and also because each engine uses its own tools (Assembler, C/C++ compilers, optimizer, debugger, …).
This paper focuses on a new approach, and the environment which is based on, for helping users to map telecom applications onto SOC. This methodology is currently validated through a prototype environment. This article is articulated as follows. The next section gives an overview of some development tools for mapping and optimizing embedded real time applications onto multi-processors to rapid prototyping. Section 3 describes AAA and PLC2 methodologies that compose the generic software design approach of the PROMPT project and how they are combined to this end. Section 4 focuses on the new global approach itself. It is illustrated in section 5.
This section does not provide a state-of-the-art but only gives an overview of some development tools that aim at mapping and optimizing embedded real time applications onto multi-processors to rapid prototyping. Each tool has its own features but they all simulate an heterogeneous system and generate some code.
Ptolemy  (University of Berkeley) is a simulation, modeling and code generation environment for SP heterogeneous systems; it can use several models in the same simulation to take into account several aspects of an embedded system.
SynDEx  (INRIA) is a system-level CAD software which supports the "Algorithm Architecture Adequation" methodology. It enables to quickly scan the solution space to extract an efficient solution and generate a real time distributed executive without deadlock for multicomponents architectures.
GEDAE  (Lockheed Martin ATL) is a graphical programming and automated code generation tool designed for multiprocessing. It can simulate while controlling execution.
Our approach is based on a co-operation between two main methodologies and use of their associated tools:
The goal of the AAA methodology is to find the best matching between an algorithm and an architecture, while satisfying constraints. "Adequation" is a French word meaning an efficient matching. Note that it is different from the English word "adequacy" which involves a sufficient matching.
AAA is based on graph models to exhibit both the potential parallelism of the algorithm and the available parallelism of the multi-component. The implementation consists in distributing and scheduling the algorithm data-flow graph on the multicomponent hyper-graph while satisfying real time constraints. This is formalized in terms of graph transformations. Heuristics are used to optimize the real time performances and resources allocation of real time applications embedded on the SOC.
The result of graph transformations is an optimized Synchronized Distributed Executive, automatically built from a library of architecture-dependent executive primitives composing the executive kernel. There is one executive kernel for each supported processor. These primitives support boot-loading, memory allocation, inter-engine communications, sequentialization of user supplied computation functions and of intra and inter-SOC communications, and inter-sequences synchronization.
For the homogeneous parts, the PLC2 methodology focuses on mapping of systematic signal processing (SSP) on parallel computation units (SIMD or SPMD) using Concurrent Constraints Programming Languages. This approach takes into account all the constraints inherent to real time implementation on the targeted SOC: available hardware (local memories, number of processors, processors communications), application latency, …
It is based on formalizations of the architectural, applicative and mapping models by constraints and takes as input the specification of different models such as the targeted SOC, communication costs, the application, partitioning, data alignment, memory allocation and the scheduling models.
The result is a fine grain scheduling of computations, their distribution onto processors and memory allocations. Computations are distributed in a block-cyclic way onto processors. Moreover, communications are overlapped with computations when possible. The memory model is precise: only the amount of memory useful to the computation is allocated.
Some comparison items are required to be able to combine AAA and PLC2 methodologies. Three items have been chosen: applications domain, target architectures and features of the tools which support these methodologies.
AAA and PLC2 methodologies share the same kind of model to specify the algorithm: a Data Flow Graph (DFG) between computation blocks. They are regular and repetitive parts on one or several dimensions. Benefits of this common model are not only its declarative aspect (close to mathematical expressing of SP algorithms) but also and above all the focus (using data dependencies) on the partial order between operations, so their potential parallelism which allows efficient implantations on miscellaneous parallel architectures. About the PROMPT project, the main benefit is to use the DFG as a well-defined (i.e. without ambiguity) bridge between both methodologies.
But functional description of the algorithms are not similar. For instance, their applications domain: PLC2 one copes with SSP where operations are executed without any condition. AAA applications domain rather tackles complex applications with control, command, SP and image processing that may include some SSP parts but with a bigger grain than the one specified using PLC2.
PLC2 considers homogeneous architectures (SIMD or MIMD using SPMD programming model). AAA target machines are MIMD architectures that may have SIMD parts.
"Homogeneous" means that considered architectures are architectures where processors are similar each other (same memory capacity, same duration of computation and communication tasks). Hence, processors are fully-connected. In contrary with AAA, topology is implicitly defined.
Respectively, "heterogeneous" means that each processor may have different features (several types of memory or bus, different throughputs, …); the processor graph (described in extension) has to be connected but not necessarily fully.
About the Human Machine Interface, EPHORAT inputs mainly are the architectural parameters (number of processors, latency, memory capacity, …), the application to be read and a partial solution if necessary. SynDEx offers an graphical interface to enter the algorithm and architecture graphs.
SynDEx provides libraries of portable operations to specify the application, EPHORAT does not; but it may use libraries of a specification language of SP applications: Array-OL .
PLC2 can manipulate several granularity levels according to the objects that have been used into the models. It is very useful to simultaneously consider them during the problem modeling. PLC2 accepts fine-grain (data level). SynDEx granularity is at macro-operation level.
Taking into account constraints is difficult not only owing to the number of constraints but also to the type of constraints: global and elementary constraints. The former are complicated to manage because they have to be globally respected (PLC2 models). The latter are only intricate to manage because of their number; such constraints are the operation durations. PLC2 automatically manages global constraints (e.g. latency, memory capacity) thanks to a constraints solver.
Minimizing search space of solutions is one of the main goal of such a tool. SynDEx is based on an optimization heuristics about the duration of critical path. In EPHORAT, optimization criterion (memory, latency, efficiency, …) is model-independent.
SynDEx and EPHORAT are not simulation but validation tools. They only provide a temporal view that shows a prediction of the real time behavior without simulating the actual computation executions. PLC2 is not restricted to the validation of a given mapping since it is ensured by conception. It can enumerate multiple solutions while verifying given constraints. The user can manipulate a solution without violating the whole constraints set by the global system.
Let us remember that the main goal is to run the optimized application on a real machine. So code generation is a major problem. SynDEx output can be connected with any language dedicated to the target architecture because an executive is generated using an intermediate macro-code. EPHORAT generates placement directives. Its generic features enable to suit some architectures families. As sets in PLC2 are described with formulas, it extends decision making and allows to handle a great number of tasks and processors because enumerative methods are too slow in that case. Sets in SynDEx are fully-described. So irreversible decisions are made but simple enumerative techniques yield an optimal and executable code that is chosen in a finite set of solutions.
Distribution and scheduling of computation and communication operations (and their local synchronizations) are the main strengths of AAA. PLC2 one is the interleaved scheduling of parallel computations in order to minimize local memory or latency.
PLC2 handles homogeneous, fine-grain, strongly repetitive and unconditional algorithms on homogeneous SIMD (or MIMD using SPMD programming model) architectures, whereas AAA focuses on heterogeneous, irregular, medium-grain and conditional algorithms on heterogeneous MIMD architectures.
Our new approach consists of simultaneously taking into account heterogeneous and homogeneous aspects of telecom applications using AAA; it is as true at the algorithm and SOC/multi-SOC architecture levels as at the optimization and optimized implementation level (automatic and easy portable code generation).
Usual approaches help the application designer for compiling his source codes and providing a dynamic resource (memory, CPU time, communication routing) allocation; the programmer is in charge of optimization of computation distribution, coding and debugging of the computation and communication sequences.
Our innovative global approach would improve productivity through rapid prototyping. Moreover, debugging of distributed executives is no more a costly task; this approach enables evaluation and performance optimization of algorithms without any physical available architecture, and easy re-targeting of algorithms on other architectures.
Figure 1 - Combining SynDEx and EPHORAT
Figure 2 - PROMPT environment (SynDEx)
Figure 3 - PROMPT environment (EPHORAT)
Figure 4 - SynDEx
Figure 5 - EPHORAT