Revision as of 06:19, 28 March 2012

Intro

How can we design an architecture that achieves the desired quality attributes?
Sources of architecture
- Theft: From previous systems, literature
- Method: Systematic and conscious, derived from requirements via transformations and heuristics.
- Intuition: Ability to conceive without conscious reasoning. Greater reliance on intuition increases risk.
The mix of these three sources varies with the architect's experience and the novelty of the system.

What is a tactic? A tactic is a design decision that influences the control of a quality attribute response.
A collection of tactics is an architectural strategy.
Each tactic is a design option for the architect.

Availability Tactics

All approaches to maintaining availability involve some type of redundancy, some type of health monitoring and some type of recovery when a failure is detected.
Availability tactics fall into three groups: fault detection, fault recovery and fault prevention.

Fault Detection

Ping/echo and heartbeat generally operate among distinct processes, while the exception tactic operates within a single process.

Ping/Echo

One component issues a ping to a component to be checked and expects to receive back an echo within a predefined time.
Response time allows performance to be assessed.
If bandwidth consumption of pings is an issue, then the ping/echo detectors can be organized in a hierarchy.
- Low-level detectors ping low-level processes, and higher-level fault detectors ping the lower-level detectors.
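
A minimal sketch of a single ping/echo check, assuming the monitored component is reachable through a callable. The names `ping`, `echo` and `clock` are hypothetical and not part of the original notes. The sketch measures the response time after the fact; it does not abort a call that hangs.

```python
import time
from typing import Callable, Tuple


def ping(echo: Callable[[], str],
         timeout_s: float,
         clock: Callable[[], float] = time.monotonic) -> Tuple[bool, float]:
    """Issue one ping and return (healthy, response_time_s).

    `echo` is a hypothetical stand-in for the call to the monitored component.
    """
    start = clock()
    try:
        reply = echo()
    except Exception:
        # Any error while contacting the component counts as a failed ping.
        return False, clock() - start
    elapsed = clock() - start
    # Healthy only if the expected echo came back within the deadline.
    return reply == "echo" and elapsed <= timeout_s, elapsed
```

The returned response time is what lets the same check double as a rough performance probe.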

Heartbeat

One component emits a heartbeat message periodically and another component listens for it.
If the heartbeat is absent, the originating component is assumed to have failed.
Heartbeat messages can be combined with useful data.
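
The listening side of a heartbeat can be sketched as follows. This is an illustration only: `HeartbeatMonitor` and its methods are hypothetical names, and timestamps are passed in explicitly so the example stays deterministic.

```python
class HeartbeatMonitor:
    """Listens for heartbeats and reports senders that have gone silent."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # sender -> time of the most recent heartbeat
        self.payload = {}    # sender -> data carried by that heartbeat

    def beat(self, sender, now, data=None):
        # A heartbeat may carry useful data along with the liveness signal.
        self.last_seen[sender] = now
        self.payload[sender] = data

    def failed(self, now):
        # A sender is assumed failed once its heartbeat is overdue.
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.timeout_s)
```

For example, with a 3 second timeout, a sender last heard at time 0 is reported as failed when checked at time 5.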

Exceptions

A fault is recognized when an exception is raised during execution.
An exception handler is then invoked; it typically executes in the same process that raised the exception.
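
As a small illustration, the handler below runs in the same process as the code that raised the exception. `read_sensor` and `safe_read` are hypothetical names chosen for this sketch.

```python
def read_sensor(raw):
    """Hypothetical operation that raises when it recognizes a fault."""
    if raw is None:
        raise ValueError("sensor returned no data")
    return float(raw)


def safe_read(raw, fallback=0.0):
    """Return (value, fault); the handler executes in the raising process."""
    try:
        return read_sensor(raw), None
    except ValueError as fault:
        # Fault detected: report it and substitute a fallback value.
        return fallback, str(fault)
```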

Fault Recovery

Fault recovery consists of preparing for recovery, making the actual system repair, and reintroducing components after repair.

Preparation and Repair Tactics

Voting

Processes running on redundant processors each take equivalent input and compute a simple output value that is sent to a voter.
The voter detects deviant behaviour from a single processor and fails that processor.
The voting algorithm can vary, e.g. "majority wins" or "preferred component".
Often used in control systems to correct faulty algorithms or processors.
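
A minimal sketch of a "majority wins" voter, assuming each redundant processor reports one simple, hashable output value. The function name and return shape are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(outputs):
    """outputs maps processor name -> output value.

    Returns (agreed_value, deviant_processors); raises if no strict majority.
    """
    if not outputs:
        raise ValueError("no outputs to vote on")
    value, votes = Counter(outputs.values()).most_common(1)[0]
    if votes <= len(outputs) // 2:
        raise RuntimeError("no majority among processors")
    # Processors that disagree with the majority are the ones to fail.
    deviants = sorted(p for p, v in outputs.items() if v != value)
    return value, deviants
```

With three processors reporting 42, 42 and 41, the voter returns 42 and marks the third processor as deviant.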

Active Redundancy (Hot restart)

There are N redundant components - all of which respond to events in parallel.
The response from only one component is used; the rest are discarded.
Downtime is minimal, because backups are current and time to recover is only the switching time.
E.g. a LAN with a number of parallel paths, with each redundant component placed on a separate path.
Synchronization is done by ensuring that all messages to any component are sent to all redundant components; a reliable transmission protocol may therefore be required.
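
The idea can be sketched in a few lines, assuming in-process replicas that keep a running total. `Replica` and `ActiveGroup` are hypothetical names; a real system would deliver events over a reliable transport.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.total = 0
        self.alive = True

    def handle(self, event):
        self.total += event
        return self.total


class ActiveGroup:
    """Every live replica handles every event; one response is used."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def dispatch(self, event):
        results = [r.handle(event) for r in self.replicas if r.alive]
        if not results:
            raise RuntimeError("no live replica")
        # Use the first live replica's response and discard the rest.
        return results[0]
```

Because every replica has processed every event, failing over amounts to reading the next replica's response.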

Passive Redundancy (Warm restart)

One component (the primary) responds to events and informs the other components (the standbys) of status updates.
When a fault occurs, the system must ensure the backup state on the standby is fresh before resuming service.
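
A minimal sketch of primary/standby operation, assuming state is a simple key/value mapping and that status updates are pushed synchronously. The class and method names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.state = {}


class PassiveGroup:
    """Only the primary handles events; standbys receive status updates."""

    def __init__(self, primary, standbys):
        self.primary = primary
        self.standbys = list(standbys)

    def handle(self, key, value):
        self.primary.state[key] = value
        for standby in self.standbys:
            # Status update that keeps the standby's state fresh.
            standby.state[key] = value

    def fail_over(self):
        if not self.standbys:
            raise RuntimeError("no standby available")
        # Promote the first standby to primary.
        self.primary = self.standbys.pop(0)
        return self.primary
```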

Spare

A standby spare platform is kept available to replace a failed component.
It must be rebooted to the appropriate software configuration and its state initialized to the point where the failure occurred.
Therefore checkpoints of the system state must be made regularly.

Repair Tactics / Component Reintroduction

When a redundant component fails, it may be reintroduced after it has been repaired.

Shadow operation

The previously failed component may be run in shadow mode, mimicking the behaviour of the working components for a short time, before it is made operational again.
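
One way to picture shadow mode: the repaired component receives the same events as a working one, and it is promoted only after its outputs have matched for a run of consecutive events. The function below is a hypothetical sketch; the promotion rule is an assumption, not something the notes specify.

```python
def shadow_check(working, candidate, events, required_matches):
    """Return True once `candidate` has matched `working` on
    `required_matches` consecutive events, else False."""
    streak = 0
    for event in events:
        if candidate(event) == working(event):
            streak += 1
        else:
            streak = 0  # any mismatch restarts the trial period
        if streak >= required_matches:
            return True
    return False
```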

State resynchronization

A restored component must have its state updated before it returns to service.
The ideal approach is to update the state in a single atomic message; incremental state updates lead to complicated software.

Checkpoint/Rollback

A checkpoint is a recording of a consistent state, made either periodically or in response to specific events.
The system can be restored from a previous consistent checkpoint plus a log of the transactions that occurred since that checkpoint was taken.
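
A minimal sketch of checkpoint plus log replay, assuming the state is a key/value mapping held in memory. `CheckpointedStore` is a hypothetical name; a real system would persist both the checkpoint and the log.

```python
import copy


class CheckpointedStore:
    def __init__(self):
        self.state = {}
        self._checkpoint = {}  # last consistent snapshot
        self._log = []         # transactions applied since that snapshot

    def apply(self, key, value):
        self._log.append((key, value))
        self.state[key] = value

    def checkpoint(self):
        self._checkpoint = copy.deepcopy(self.state)
        self._log.clear()

    def recover(self):
        # Restore the snapshot, then replay the logged transactions in order.
        self.state = copy.deepcopy(self._checkpoint)
        for key, value in self._log:
            self.state[key] = value
```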

Fault Prevention

Removal from Service

Removes a component from operation so it can undergo activities that prevent anticipated failures.
E.g. rebooting a component regularly to prevent memory leaks from causing a failure.
The architectural strategy must be designed to support it.

Transactions

Bundles several actions together so that the entire bundle can be undone at once.
If one action fails, the entire transaction fails.
Intermediate data therefore does not corrupt output or affect the rest of the system.
Shared data is locked to prevent collisions among concurrent threads.
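
The all-or-nothing behaviour can be sketched by applying the actions to a working copy and committing only if every action succeeds. This is one simple way to get the effect, chosen for illustration; the names are hypothetical and the lock stands in for whatever mechanism protects the shared data.

```python
import copy
import threading

_lock = threading.Lock()  # guards the shared state against concurrent threads


def run_transaction(state, actions):
    """Apply all actions or none. Returns True on commit, False on abort."""
    with _lock:
        working = copy.deepcopy(state)
        try:
            for action in actions:
                action(working)
        except Exception:
            # One action failed: discard the working copy, state is untouched.
            return False
        state.clear()
        state.update(working)
        return True


def transfer(amount):
    """Build the two actions of a hypothetical account transfer."""
    def credit(s):
        s["b"] += amount

    def debit(s):
        if s["a"] < amount:
            raise ValueError("insufficient funds")
        s["a"] -= amount

    return [credit, debit]
```

Here the credit runs first on purpose: when the debit then fails, the intermediate credit never reaches the shared state.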

Process Monitor

Detects and shuts down failed processes.
A new process instance is then created and its state recovered.
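
A minimal sketch of the monitor's restart logic, using plain objects in place of operating system processes. `Worker` and `ProcessMonitor` are hypothetical names, and the saved state is assumed to be a small dictionary.

```python
class Worker:
    """Stand-in for a monitored process."""

    def __init__(self, state=None):
        self.state = dict(state or {})
        self.alive = True


class ProcessMonitor:
    def __init__(self):
        self.worker = Worker()
        self.saved = {}
        self.restarts = 0

    def save(self):
        # Record state that a replacement instance can be initialized from.
        self.saved = dict(self.worker.state)

    def check(self):
        if not self.worker.alive:
            # Drop the failed instance and create a new one from saved state.
            self.worker = Worker(self.saved)
            self.restarts += 1
        return self.worker
```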

Difference between revisions of "Architecture Tactics"
