Article ID: | iaor2005185 |
Country: | United Kingdom |
Volume: | 42 |
Issue: | 14 |
Start Page Number: | 2877 |
End Page Number: | 2898 |
Publication Date: | Jan 2004 |
Journal: | International Journal of Production Research |
Authors: | Girault A., Kalla H., Sorel Y. |
Keywords: | heuristics, simulation: applications |
Hardware fault tolerance is an important consideration in critical distributed real-time embedded systems and has been extensively researched. In these systems, critical real-time constraints must be satisfied even in the presence of hardware component failures. Our goal is to propose a solution that automatically produces a fault-tolerant distributed schedule of a given algorithm onto a given distributed architecture, according to real-time constraints. The distributed architectures we consider have bidirectional point-to-point communication links. Our solution is a list scheduling heuristic, based on disjoint paths, that tolerates a fixed number of arbitrary processor and communication link failures. Because resources are limited in embedded systems, our heuristic implements a software solution based on the active replication technique, in which each operation of the algorithm is replicated on different processors. With a detailed example, we show the techniques used to satisfy the real-time constraints and tolerate failures of processors and communication links. Simulations show the efficiency of our method compared with other heuristics found in the literature.
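
To make the idea of active replication inside a list scheduler concrete, the following is a minimal sketch and not the authors' heuristic. It assumes that every operation takes the same time on every processor, that communication is free, and that only processor failures are considered, so the disjoint-path handling of link failures described in the abstract is not modelled. The function name `schedule_with_replication` and its parameters are hypothetical. Each operation is placed on `k + 1` distinct processors, which is the replication level needed for at least one replica to survive `k` processor failures.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def schedule_with_replication(deps, durations, processors, k):
    """Greedy list scheduling with active replication (illustrative only).

    deps:       dict mapping each operation to the set of its predecessors
    durations:  dict mapping each operation to its execution time
                (assumed identical on every processor)
    processors: list of processor names
    k:          number of processor failures to tolerate
    Returns a dict mapping each operation to a list of k + 1 replicas,
    each replica being a (processor, start, finish) tuple.
    """
    if len(processors) < k + 1:
        raise ValueError("need at least k + 1 processors")

    free_at = {p: 0 for p in processors}  # time at which each processor is idle
    schedule = {}

    # Visit operations in an order that respects data dependencies.
    for op in TopologicalSorter(deps).static_order():
        # Conservative assumption: a replica waits until every replica of
        # every predecessor has finished (communication cost is ignored).
        ready = max(
            (finish for pred in deps.get(op, ())
             for (_, _, finish) in schedule[pred]),
            default=0,
        )
        # Pick the k + 1 distinct processors offering the earliest start.
        chosen = sorted(processors, key=lambda p: max(ready, free_at[p]))[:k + 1]
        replicas = []
        for p in chosen:
            start = max(ready, free_at[p])
            finish = start + durations[op]
            free_at[p] = finish
            replicas.append((p, start, finish))
        schedule[op] = replicas
    return schedule
```

A real scheduler for this problem would also route each data dependency over enough disjoint paths to survive link failures and would account for communication delays when choosing processors; both are left out here to keep the replication step visible.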