Crash recovery

In systems software, crash recovery is fairly self-explanatory: it refers to techniques that let a system recover from crashes while preserving failure atomicity, the guarantee that each operation is either fully executed or not executed at all.

There are a few techniques available to ensure failure atomicity at the software level:

- Shadow copies: operate on a working copy of the data to be modified, which is then atomically exchanged for the original at commit time.
- Write-ahead logging (WAL), which includes:
  - Undo logging
  - Redo logging
Assumption: each put operation carries a timestamp that is unique across all clients.
Stable storage: storage that never fails; in practice it is approximated with redundancy schemes like RAID.
Flushed: data in DRAM has been copied to storage, so the two copies are now in sync.
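To make "flushed" concrete, here is a minimal Python sketch of forcing a write out of DRAM and onto the storage device. The helper name `durable_write` is hypothetical; the calls it makes (`flush`, `os.fsync`) are standard Python.

```python
import os
import tempfile

def durable_write(path: str, data: bytes) -> None:
    """Hypothetical helper: write data to path and force it from DRAM onto storage."""
    with open(path, "wb") as f:
        f.write(data)         # data sits in Python's userspace buffer
        f.flush()             # hand the buffer to the kernel (page cache, still in DRAM)
        os.fsync(f.fileno())  # block until the kernel has written the file's data to storage

# Usage: once durable_write returns, the file's contents have been flushed.
# (For a newly created file, POSIX systems also need an fsync of the parent
# directory for the directory entry itself to survive a crash; omitted here.)
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "record.dat")
durable_write(path, b"balance=100")
```

Note the two separate steps: `flush` only moves data from the program into the kernel, which is still volatile DRAM; it is `fsync` that gives the "now they're synced" guarantee.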
Failure atomicity
- If an operation doesn't complete, we must revert its partial effects.
- This requires some way to make the commit itself atomic on disk.
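Shadow copy (much like what vim does)
- Commit point: a single atomic "swap" system call that replaces the original with the new copy.
- Pre-commit: the new version is saved to disk as a separate copy; a crash here leaves the original intact.
- Crash at commit: recovery completes the commit afterwards.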
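A minimal Python sketch of the shadow-copy steps above, assuming POSIX `rename()` semantics (exposed in Python as `os.replace`) play the role of the atomic "swap" system call. The function names and the `.shadow` file suffix are hypothetical choices for illustration.

```python
import os
import tempfile

def shadow_update(path: str, new_contents: bytes) -> None:
    """Hypothetical helper: replace path's contents using a shadow copy."""
    shadow = path + ".shadow"  # hypothetical naming scheme for the working copy
    # Pre-commit: write the complete new version to a separate file and flush it.
    with open(shadow, "wb") as f:
        f.write(new_contents)
        f.flush()
        os.fsync(f.fileno())
    # Commit point: atomically swap the shadow copy into place (rename on POSIX).
    # A crash before this line leaves the original untouched; after it,
    # readers see only the new version, never a mix of the two.
    os.replace(shadow, path)

def recover(path: str) -> None:
    """Hypothetical recovery: a leftover shadow copy means we crashed pre-commit."""
    shadow = path + ".shadow"
    if os.path.exists(shadow):
        os.remove(shadow)  # the update never committed, so drop it

# Usage
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "config.txt")
with open(path, "wb") as f:
    f.write(b"v1")
shadow_update(path, b"v2")
recover(path)  # no-op here: the commit completed
```

Because the swap is a single atomic step, the whole update inherits its atomicity. To make the swap itself durable, the parent directory would also need to be fsynced; this sketch skips that.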
Undo logging
- In the example, an arrow indicates a dependency: the change must be logged (with the old value) before it is installed in place.
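The ordering rule above can be sketched as a toy in-memory key-value store in Python. `UndoStore` is a hypothetical class; its `disk` dict and `log` list stand in for stable storage, and appending to `log` is assumed to be durable immediately.

```python
from dataclasses import dataclass, field

@dataclass
class UndoStore:
    """Hypothetical toy undo-logged store; `disk` and `log` model stable storage."""
    disk: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, txid: int, key: str, value: int) -> None:
        # 1. Log the OLD value first: this must reach storage before step 2.
        self.log.append(("update", txid, key, self.disk.get(key)))
        # 2. Only then install the new value in place.
        self.disk[key] = value

    def commit(self, txid: int) -> None:
        # Every in-place write is already on disk, so the commit record goes last.
        self.log.append(("commit", txid))

    def recover(self) -> None:
        """Roll back every transaction that has no commit record."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        # Walk backwards so the oldest logged value is the one left in place.
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] not in committed:
                _, _, key, old = rec
                if old is None:
                    self.disk.pop(key, None)  # key didn't exist before the txn
                else:
                    self.disk[key] = old

# Usage: txn 1 commits, txn 2 crashes mid-way after installing its change.
store = UndoStore(disk={"A": 100, "B": 50})
store.write(1, "A", 90)
store.write(1, "B", 60)
store.commit(1)
store.write(2, "A", 0)  # crash before txn 2 commits
store.recover()         # A is rolled back to 90
```

If the install happened before the log record was written, a crash in between would leave a change on disk with no old value to restore, which is exactly what the dependency arrow rules out.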
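Redo logging
- It's OK to crash during recovery: recovery can simply be rerun from the start.
- This works because replaying redo records is idempotent: installing the same logged new value twice gives the same result as installing it once.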
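A toy Python sketch of redo logging that illustrates the idempotence argument. `RedoStore` and its methods are hypothetical; `disk` and `log` stand in for stable storage, while `pending` models volatile memory that a crash wipes out.

```python
from dataclasses import dataclass, field

@dataclass
class RedoStore:
    """Hypothetical toy redo-logged store; `disk` and `log` model stable storage."""
    disk: dict = field(default_factory=dict)
    log: list = field(default_factory=list)
    pending: dict = field(default_factory=dict)  # volatile: lost on a crash

    def write(self, txid: int, key: str, value: int) -> None:
        # Log the NEW value; the on-disk copy is not touched yet.
        self.log.append(("update", txid, key, value))
        self.pending.setdefault(txid, []).append((key, value))

    def commit(self, txid: int) -> None:
        # The commit record is the commit point: once it is logged,
        # recovery will redo the transaction even if install() never runs.
        self.log.append(("commit", txid))

    def install(self, txid: int) -> None:
        for key, value in self.pending.pop(txid, []):
            self.disk[key] = value

    def recover(self) -> None:
        """Replay the new values of every committed transaction, in log order."""
        self.pending.clear()
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:
            if rec[0] == "update" and rec[1] in committed:
                _, _, key, value = rec
                self.disk[key] = value  # blind overwrite, so replaying is idempotent

# Usage: txn 1 commits but crashes before installing; txn 2 never commits.
store = RedoStore(disk={"A": 100, "B": 50})
store.write(1, "A", 90)
store.write(1, "B", 60)
store.write(2, "C", 7)
store.commit(1)
# ...crash here, before install(1)...
store.recover()
store.recover()  # crashing during recovery and rerunning it is harmless
```

Each redo record just overwrites a key with a fixed value, so running recovery twice (or crashing partway and restarting) converges to the same state. Uncommitted updates are simply never installed, so no undo pass is needed.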
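Logging costs
- "Log writes almost come for free because they're sequential": appending to the end of a log is much cheaper than scattered in-place writes.
- Battery-backed RAM: the log can be kept in RAM, and we only need to write it to disk when a crash (e.g. power loss) occurs.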

jszhn
