In systems software, crash recovery refers to the techniques that let a system recover from a crash while preserving failure atomicity: every operation either executes completely or not at all.
There are a few techniques for ensuring failure atomicity at the software level:
- Shadow copying: operate on a working copy of the data to be modified, then atomically swap it in at commit time.
- Write-ahead logging (WAL): record each change in a log on stable storage before applying it to the data itself.
- Assumptions
  - every put operation has a unique timestamp across all clients
  - stable storage: storage never fails; in practice this is approximated with redundancy schemes such as RAID
- Flushed: data in DRAM has been copied to storage, so the two copies are in sync
- Failure atomicity
  - if an operation does not complete, we must revert its partial effects
  - open question: how do we make the commit itself atomic on disk?
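- Shadow copy (much like what vim does)
  - commit point: a single atomic "swap" system call (for example, rename on POSIX)
  - crash before the commit point: the new version was only being saved to the shadow copy on disk, so the original is intact and the shadow copy can be discarded
  - crash at the commit point: recovery completes the commit afterwards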
- Undo logging
  - example (diagram): an arrow indicates a dependency; the log record for a change must be written before the change is installed in place
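- Redo logging
  - a crash during recovery is fine: recovery simply runs again from the start
  - this works because replaying redo records is idempotent: reapplying a logged new value leaves the data in the same state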
- Logging costs
  - "log writes almost come for free because they're sequential"
  - battery-backed RAM: the log can stay in RAM and is written to disk only when a crash (e.g., power loss) occurs
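The shadow-copy notes can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: it assumes a POSIX system, uses `os.replace` (which wraps rename and atomically replaces the target on the same filesystem) as the "swap" commit point, and the function name `shadow_write` is hypothetical.

```python
import os
import tempfile


def shadow_write(path: str, data: bytes) -> None:
    """Replace the contents of `path` with `data` via a shadow copy (hypothetical helper).

    The original file is never modified in place: the new version goes to a
    separate file in the same directory, which is then swapped in.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Pre-commit: write the complete new version to a shadow file.
    fd, shadow_path = tempfile.mkstemp(dir=directory, prefix=".shadow-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # "flushed": push the OS cache to the device
        # Commit point: atomically swap the shadow file into place.
        os.replace(shadow_path, path)
    except BaseException:
        # Failure before the commit point: the original is untouched,
        # so just discard the shadow copy.
        if os.path.exists(shadow_path):
            os.remove(shadow_path)
        raise
    # Make the rename itself durable by syncing the directory entry
    # (POSIX-specific; opening a directory like this fails on Windows).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A reader of `path` sees either the old contents or the new contents, never a mix, because the only step that changes what `path` names is the single `os.replace` call.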
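To make the undo-logging dependency concrete, here is a toy sketch. It is an illustration under stated assumptions only: the class name `UndoLogStore`, the tuple record format, and the use of an in-memory dict and list to stand in for stable storage are all hypothetical. Each update first logs the old value and only then installs the new value in place. Recovery rolls back every transaction that has no COMMIT record.

```python
from typing import Any

_MISSING = object()  # marks "key did not exist before this update"


class UndoLogStore:
    """Toy key-value store with undo logging (hypothetical names).

    `disk` and `log` stand in for stable storage that survives a crash.
    """

    def __init__(self) -> None:
        self.disk: dict[str, Any] = {}
        self.log: list[tuple] = []

    def write(self, txid: int, key: str, value: Any) -> None:
        # The dependency arrow: the undo record (holding the OLD value)
        # must be in the log before the new value is installed.
        self.log.append(("UPDATE", txid, key, self.disk.get(key, _MISSING)))
        self.disk[key] = value  # install in place

    def commit(self, txid: int) -> None:
        # All installs for txid happened before this record is written.
        self.log.append(("COMMIT", txid))

    def recover(self) -> None:
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        # Scan backwards so the oldest old value is restored last.
        for rec in reversed(self.log):
            if rec[0] == "UPDATE" and rec[1] not in committed:
                _, _, key, old = rec
                if old is _MISSING:
                    self.disk.pop(key, None)
                else:
                    self.disk[key] = old
```

For example, after a committed transaction sets `x` to 10 and an uncommitted one sets `x` to 20 and `y` to 5, `recover()` leaves the store with only `x` equal to 10.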
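The redo-logging notes can be illustrated with a similar toy sketch. Again, the class name `RedoLogStore`, the record format, and the dicts standing in for disk and volatile memory are hypothetical. A real system would also flush the log to stable storage before treating COMMIT as durable. Here the log holds NEW values, nothing is installed before commit, and recovery replays committed updates forward. Because each replayed record just sets a key to a logged value, running recovery twice gives the same result as running it once.

```python
from typing import Any


class RedoLogStore:
    """Toy key-value store with redo logging (hypothetical names).

    `disk` and `log` survive a crash; `pending` models volatile memory.
    """

    def __init__(self) -> None:
        self.disk: dict[str, Any] = {}
        self.log: list[tuple] = []
        self.pending: dict[int, dict[str, Any]] = {}

    def write(self, txid: int, key: str, value: Any) -> None:
        # Log the NEW value; the disk is not touched before commit.
        self.log.append(("UPDATE", txid, key, value))
        self.pending.setdefault(txid, {})[key] = value

    def commit(self, txid: int) -> None:
        # Commit point: once this record is logged, the transaction counts.
        self.log.append(("COMMIT", txid))
        # Install afterwards; a crash here is repaired by recover().
        for key, value in self.pending.pop(txid, {}).items():
            self.disk[key] = value

    def recover(self) -> None:
        self.pending.clear()  # volatile state is lost in a crash
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        # Scan forward, re-applying new values of committed transactions.
        # Idempotent: setting a key to the same value again changes nothing.
        for rec in self.log:
            if rec[0] == "UPDATE" and rec[1] in committed:
                _, _, key, value = rec
                self.disk[key] = value
```

The contrast with undo logging is the ordering: undo logging installs before COMMIT and rolls back on recovery, while redo logging installs after COMMIT and rolls forward.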