The idea of multi-level checkpointing is that checkpoints
are taken for each level of faults, but at different periods.
Intuitively, the less frequent the faults, the longer the checkpointing
period: the risk of a failure striking is lower at higher levels,
so the expected re-execution time is lower too, and one can
safely checkpoint less frequently, thereby reducing failure-free
overhead (checkpointing is useless in the absence of faults).
There are several
natural approaches to implement multi-level checkpointing.
The first option is to use independent checkpointing periods for
each level. This option raises several difficulties, the most
prominent one being overlapping checkpoints. Typically,
we need to checkpoint different levels in sequence (e.g., writing
into memory before writing onto disk), so we would
need to delay some checkpoints, which might not be possible
in some environments, and which would introduce irregular
periods. The second option is to synchronize all checkpoint
levels by nesting them inside a periodic pattern that repeats
over time, as illustrated in Fig. 1a. In this figure, the pattern
has five computational segments, each followed by a level-1
checkpoint. A segment is a chunk of work between two
checkpoints, and a pattern consists of segments and checkpoints.
The second and fifth level-1 checkpoints are followed
by a level-2 checkpoint. Finally, the pattern ends with a
level-3 checkpoint. When using patterns, a checkpoint at
level ℓ is always preceded by checkpoints at all lower levels 1
to ℓ - 1, which makes good sense in practice (e.g., with two
levels, main memory and disk, one writes the data into memory
before transferring it to disk).
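
To make the nesting rule concrete, the pattern described above (five segments, level-2 checkpoints after the second and fifth segments, a level-3 checkpoint at the end) can be sketched as a flat list of events. This is only an illustrative sketch: the function name build_pattern, the list FIG_1A and the event encoding are assumptions introduced here, not notation from the source.

```python
from typing import List, Tuple

Event = Tuple[str, int]


def build_pattern(top_level_after: List[int]) -> List[Event]:
    """Expand a pattern into its sequence of events.

    top_level_after[i] is the highest checkpoint level taken after
    segment i + 1 (hypothetical encoding). A checkpoint at level l is
    always preceded by checkpoints at levels 1 to l - 1.
    """
    events: List[Event] = []
    for seg_index, top_level in enumerate(top_level_after, start=1):
        events.append(("segment", seg_index))
        # Nested checkpoints: write every lower level first.
        for level in range(1, top_level + 1):
            events.append(("checkpoint", level))
    return events


# Pattern as described in the text: level 1 after every segment,
# level 2 after segments 2 and 5, level 3 at the end of the pattern.
FIG_1A = [1, 2, 1, 1, 3]
pattern = build_pattern(FIG_1A)

for kind, value in pattern:
    print(kind, value)
```

Encoding only the highest level per segment guarantees by construction that no level-ℓ checkpoint appears without the lower levels before it, which is the property the pattern-based option relies on.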