There are two published works [JEO+94] [KM96] on fine-grain hardware/software partitioning which use dynamic programming. In both of these works, the target architecture contains a single microprocessor and a single hardware chip. The authors try to find the best combination of non-overlapping sequences of fine-grain ``basic scheduling blocks'' which fit the available hardware (ASIC or FPGA) and result in maximum speedup by moving the scheduling blocks from software to hardware). The problem is similar to the knapsack problem, and the dynamic programming formulation is used only to avoid repeated computations in their iterative procedure.
There have been a number of research publications on coarse-grain HW/SW partitioning which handle task graphs with only end/begin type communication [Wol95] [DH94] [Ben96] [NG94]. In these task graphs, the total time used by a process is simply the summation of the time used to do the computation and time used to do the communication. These works use greedy heuristics [Wol95], branch and bound search [DH94], or MILP solvers [PP92] [Ben96] as their optimization techniques. In [BEH95] and [GM92], the authors allow midway communication in a fine-grain HW/SW environment. The form of communication allowed is however not as general as the ones proposed in the present work and not at the coarse-level.
The work reported in [DIJ95] for coarse-grain system synthesis, separates the synthesis of computational and communication processes into two distinct stages. In this case, it is very difficult to apply a timing constraint (deadline) on the system because part of the time in the critical path is used to do the computation whereas another part is used to do the communication. In [Wol95], the gradient search method on the solution space is used. In each iteration, the authors perform a generate and test operation commonly used in AI. That is, in each iteration, they try to relocate one process from a CPU to another, relocate a message (communication process) from one bus to another, do the rescheduling on the CPUs and buses, and calculate the change on the cost. If the timing constraints on CPUs or buses are violated, they add one more CPU or bus to fix the problem. The synthesis of computational and communication processes can thus be considered to be performed simultaneously during each iteration of the search on the solution space. The algorithm is, however, greedy and non-optimal. In our work, the timing constraint is applied to all of the computation and communication subprocesses in all critical paths and thus the synthesis of the two kinds of processes is performed simultaneously. Since our algorithm is based on dynamic programming it produces the optimal solution.
In summary, previous works do not address the problem of coarse-grain mapping of communicating processes. This problem is the subject of the present work.