Difference between revisions of "Architecture Tactics"
From Suhrid.net Wiki
Revision as of 06:03, 28 March 2012
Intro
- How can we design an architecture that will achieve the desired quality attributes?
- Sources of architecture:
  - Theft: from previous systems and the literature.
  - Method: systematic and conscious, derived from requirements via transformations and heuristics.
  - Intuition: the ability to conceive a design without conscious reasoning. Increased reliance on intuition increases risk.
- The mix of these three sources varies with the architect's experience and the novelty of the system.
- What is a tactic? A tactic is a design decision that influences the control of a quality attribute response.
- A collection of tactics is an architectural strategy.
- Each tactic is a design option for the architect.
Availability Tactics
- All approaches to maintaining availability involve some type of redundancy, some type of health monitoring and some type of recovery when a failure is detected.
- Availability tactics fall into three groups: fault detection, fault recovery and fault prevention.
Fault Detection
- Ping/echo and heartbeat generally operate among distinct processes, whereas the exception tactic operates within a single process.
Ping/Echo
- One component issues a ping to a component to be checked and expects to receive back an echo within a predefined time.
- Response time allows performance to be assessed.
- If bandwidth consumption of pings is an issue, then the ping/echo detectors can be organized in a hierarchy.
- The lowest-level detectors ping the processes themselves, and higher-level fault detectors ping the lower-level detectors.
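The ping/echo check above can be sketched roughly as follows. This is a minimal illustration, not a networked implementation: the `ping` function, the `"echo"` reply string and the injectable `clock` parameter are all assumptions made for the example. The call is not interrupted at the deadline; a reply that arrives late is simply treated as a failure.

```python
import time
from typing import Callable


def ping(
    echo: Callable[[], str],
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[bool, float]:
    """Send one ping; return (healthy, response_time_in_seconds).

    `echo` stands in for whatever call reaches the checked component
    (hypothetical; in a real system this would be a network request).
    """
    start = clock()
    try:
        reply = echo()
    except Exception:
        # No echo at all: the component is considered faulty.
        return False, clock() - start
    elapsed = clock() - start
    # Healthy only if the echo is correct and arrived within the predefined time.
    return (reply == "echo" and elapsed <= timeout_s), elapsed
```

The returned response time is what allows performance to be assessed alongside plain liveness.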
Heartbeat
- One component emits a heartbeat message periodically and another component listens for it.
- If the heartbeat is absent, the originating component is assumed to have failed.
- Heartbeat messages can be combined with useful data.
Exceptions
- A fault is recognized when an exception is raised during execution.
- Exception handler is invoked which typically executes in the same process that introduced the exception.
Fault Recovery
- Fault recovery consists of preparing for recovery and making the actual system repair.
Voting
- Processes running on redundant processors each take equivalent input and compute a simple output value that is sent to a voter.
- If the voter detects deviant behaviour from a single processor, it fails that processor.
- The voting algorithm can vary, e.g. "majority wins" or "preferred component".
- Often used in control systems to correct faulty algorithms or processors.
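A "majority wins" voter could be sketched as below. The function name, the dictionary of processor outputs and the choice to raise an error when there is no strict majority are assumptions for illustration; a "preferred component" voter would apply a different rule.

```python
from collections import Counter


def majority_vote(outputs: dict[str, int]) -> tuple[int, list[str]]:
    """Return (agreed_value, deviant_processors) under a 'majority wins' rule.

    `outputs` maps a processor name to the simple value it computed.
    """
    counts = Counter(outputs.values())
    value, votes = counts.most_common(1)[0]
    if votes <= len(outputs) // 2:
        # No value is backed by more than half of the processors.
        raise RuntimeError("no majority among processor outputs")
    # Processors that disagree with the majority are the ones to be failed.
    deviants = sorted(name for name, out in outputs.items() if out != value)
    return value, deviants
```

For example, `majority_vote({"p1": 42, "p2": 42, "p3": 17})` accepts 42 and flags `p3` as the processor to fail.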
Active Redundancy (Hot restart)
- There are N redundant components - all of which respond to events in parallel.
- Only the response from one component is used; the rest are discarded.
- Downtime is minimal, because backups are current and time to recover is only the switching time.
- E.g. a LAN with a number of parallel paths, with each redundant component placed on a separate path.
- Synchronization is done by ensuring that all messages to any component are sent to all redundant components, so a reliable transmission protocol may be required.
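The following in-process sketch illustrates active redundancy. `Replica`, `ActiveGroup` and the running-total "service" are hypothetical names for the example, and message delivery is assumed to be reliable (here it is just a method call).

```python
class Replica:
    """One redundant component; keeps a running total as its state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.total = 0
        self.alive = True

    def handle(self, amount: int) -> int:
        self.total += amount
        return self.total


class ActiveGroup:
    """Sends every event to all live replicas and uses only one response."""

    def __init__(self, replicas: list[Replica]) -> None:
        self.replicas = list(replicas)

    def dispatch(self, amount: int) -> int:
        # Every live replica processes the event, so all of them stay current.
        results = [r.handle(amount) for r in self.replicas if r.alive]
        if not results:
            raise RuntimeError("all replicas have failed")
        # Only the first response is used; the others are discarded.
        return results[0]
```

Because every replica has seen every event, recovering from the loss of one replica is only a matter of switching to another response.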
Passive Redundancy (Warm restart)
- One component (the primary) responds to events and informs the other components (the standbys) of status updates.
- When a fault occurs, the system must ensure that the backup state on the standby is sufficiently fresh before resuming services.
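One way to picture passive redundancy is the sketch below, in which the primary pushes status updates to a standby only periodically. The `Primary` and `Standby` classes, the `sync_every` parameter and the update log are assumptions for illustration; in particular, the log is assumed to survive a failure of the primary.

```python
class Standby:
    """Holds a copy of the state that may lag behind the primary."""

    def __init__(self) -> None:
        self.state: dict[str, int] = {}
        self.applied = 0  # number of log entries already applied

    def catch_up(self, log: list[tuple[str, int]]) -> None:
        """Apply any log entries not yet seen, making the state fresh."""
        for key, value in log[self.applied:]:
            self.state[key] = value
        self.applied = len(log)


class Primary:
    """Responds to events and informs the standby every `sync_every` updates."""

    def __init__(self, standby: Standby, sync_every: int = 2) -> None:
        self.state: dict[str, int] = {}
        self.log: list[tuple[str, int]] = []  # assumed durable
        self.standby = standby
        self.sync_every = sync_every

    def handle(self, key: str, value: int) -> None:
        self.state[key] = value
        self.log.append((key, value))
        if len(self.log) % self.sync_every == 0:
            self.standby.catch_up(self.log)


def fail_over(standby: Standby, log: list[tuple[str, int]]) -> Standby:
    """Refresh the standby before it resumes service as the new primary."""
    standby.catch_up(log)
    return standby
```

The less often the primary synchronizes, the more work `fail_over` has to do, which is the trade-off behind the "warm" label.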
Spare
- Standby spare platform.
- It must be rebooted to the appropriate software configuration, and its state must be initialized to the point where the failure occurred.
- Therefore checkpoints of the system state must be made regularly.
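Checkpointing for a spare could be sketched as follows. The JSON file format, the `seq` sequence number and the log of state changes replayed after the checkpoint are assumptions made for this example, not something the notes above prescribe.

```python
import json
import os


def save_checkpoint(path: str, checkpoint: dict) -> None:
    """Write the checkpoint to a temp file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(checkpoint, f)
    os.replace(tmp, path)  # avoids leaving a half-written checkpoint behind


def load_checkpoint(path: str) -> dict:
    with open(path) as f:
        return json.load(f)


def restore(checkpoint: dict, log: list[tuple[int, str, int]]) -> dict:
    """Rebuild state on the spare: start from the checkpoint, then replay
    every logged change (seq, key, value) made after it was taken."""
    state = dict(checkpoint["state"])
    for seq, key, value in log:
        if seq > checkpoint["seq"]:
            state[key] = value
    return state
```

More frequent checkpoints shorten the replay step, at the cost of more writes during normal operation.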