Intro

How can we design an architecture that will achieve the desired quality attributes?
Sources of architecture
- Theft: Reuse of designs from previous systems or from the literature.
- Method: Systematic and conscious, derived from requirements via transformations and heuristics.
- Intuition: The ability to conceive a design without conscious reasoning. Greater reliance on intuition increases risk.
The mix of these three sources varies with the architect's experience and the novelty of the system.

What is a tactic? A tactic is a design decision that influences the control of a quality attribute response.
A collection of tactics is an architectural strategy.
Each tactic is a design option for the architect.

Availability Tactics

All approaches to maintaining availability involve some type of redundancy, some type of health monitoring and some type of recovery when a failure is detected.
Availability tactics fall into three groups: fault detection, fault recovery and fault prevention.

Fault Detection

Ping/echo and heartbeat generally operate among distinct processes, whereas the exception tactic operates within a single process.

Ping/Echo

One component issues a ping to the component being checked and expects to receive an echo back within a predefined time.
The measured round-trip time also allows performance to be assessed.
If the bandwidth consumed by pings is an issue, the ping/echo detectors can be organized in a hierarchy.
- The lowest-level detectors ping the processes they share a processor with, and higher-level fault detectors ping the lower-level detectors.
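A minimal sketch of a ping/echo detector, assuming the monitored component exposes a hypothetical zero-argument `echo` callable (in a real system this would be a network round trip). `ping` returns the round-trip time, which doubles as a performance measurement, or `None` when no echo arrives in time; `sweep` is one detector pinging all of its targets, and its targets could themselves be lower-level detectors.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as EchoTimeout
from typing import Callable, Dict, Optional


def ping(echo: Callable[[], object], timeout_s: float) -> Optional[float]:
    """Ping a component through its (hypothetical) echo callable.

    Returns the round-trip time in seconds, or None if no echo arrived
    within timeout_s or the component answered with an error.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(echo)
    try:
        future.result(timeout=timeout_s)
        return time.monotonic() - start
    except EchoTimeout:
        return None  # no echo within the predefined time: suspected fault
    except Exception:
        return None  # an error reply is treated like a missing echo
    finally:
        pool.shutdown(wait=False)  # do not block on a hung component


def sweep(targets: Dict[str, Callable[[], object]], timeout_s: float) -> Dict[str, Optional[float]]:
    """One detector pinging every target it is responsible for."""
    return {name: ping(echo, timeout_s) for name, echo in targets.items()}
```

A thread that is stuck in a hung `echo` is abandoned rather than killed, which is acceptable for a sketch but not for a long-running detector.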

Heartbeat

One component emits a heartbeat message periodically and another component listens for it.
If the heartbeat fails to arrive, the originating component is assumed to have failed.
Heartbeat messages can also carry other useful data.
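The listening side can be sketched as below, assuming a hypothetical `on_heartbeat` callback invoked for each message and a timeout somewhat longer than the sender's period. Only senders that have been heard from at least once are tracked, and the optional `payload` stands for data carried along with the heartbeat (e.g. the last transaction processed).

```python
import time
from typing import Any, Callable, Dict, List


class HeartbeatMonitor:
    """Listener side of the heartbeat tactic (names are hypothetical)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s  # how long silence is tolerated
        self.clock = clock          # injectable so the sketch is testable
        self.last_seen: Dict[str, float] = {}
        self.last_payload: Dict[str, Any] = {}

    def on_heartbeat(self, sender: str, payload: Any = None) -> None:
        """Called whenever a heartbeat message arrives from sender."""
        self.last_seen[sender] = self.clock()
        self.last_payload[sender] = payload  # data piggybacked on the heartbeat

    def failed(self) -> List[str]:
        """Senders whose heartbeat has been absent for longer than timeout_s."""
        now = self.clock()
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout_s)
```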

Exceptions

A fault is recognized when an exception is raised.
An exception handler is then invoked; it typically executes in the same process that raised the exception.
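A minimal sketch of in-process fault detection through exceptions; `guarded`, `handle_fault` and `detected_faults` are hypothetical names. The handler runs in the same process (and thread) that raised the exception, records the fault and substitutes a default value.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
detected_faults: List[str] = []  # hypothetical in-process fault record


def handle_fault(exc: Exception) -> None:
    """Exception handler: executes in the process that raised the exception."""
    detected_faults.append(f"{type(exc).__name__}: {exc}")


def guarded(operation: Callable[[], T], default: Optional[T] = None) -> Optional[T]:
    try:
        return operation()
    except (ValueError, ZeroDivisionError) as exc:  # fault classes this sketch recognizes
        handle_fault(exc)
        return default
```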

Fault Recovery

Fault recovery consists of preparing for recovery and repairing the system, as well as reintroducing components after they have been repaired.

Preparation and Repair Tactics

Voting

Processes running on redundant processors each take equivalent input and compute a simple output value that is sent to a voter.
If the voter detects deviant behaviour from a single processor, it fails that processor.
The voting algorithm can be "majority wins", "preferred component" or some other algorithm.
Voting is often used in control systems to correct faulty algorithms or failed processors.
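A majority voter can be sketched as follows, assuming each replica's output is a hashable value keyed by a hypothetical replica name. Replicas that disagree with the agreed value are returned so the caller can fail them; when there is no strict majority, the sketch falls back to a preferred component if one is named.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Set, Tuple


def vote(outputs: Dict[str, Hashable], preferred: Optional[str] = None) -> Tuple[Hashable, Set[str]]:
    """Return (agreed value, replicas that deviated from it)."""
    value, count = Counter(outputs.values()).most_common(1)[0]
    if count * 2 <= len(outputs):  # no strict majority
        if preferred is None:
            raise RuntimeError("no majority and no preferred component")
        value = outputs[preferred]  # "preferred component" rule
    deviants = {name for name, out in outputs.items() if out != value}
    return value, deviants
```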

Active Redundancy (Hot restart)

N redundant components all respond to events in parallel.
Only the response from one component (usually the first to respond) is used; the rest are discarded.
Downtime is minimal because the backups are current; the only recovery time is the switching time.
E.g. in a distributed system, a LAN with a number of parallel paths, with each redundant component placed on a separate path.
Synchronization is achieved by ensuring that all messages to any redundant component are sent to all of them, so a reliable transmission protocol may be required.
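A single-process sketch of active redundancy, with a hypothetical stateful `Replica` that keeps a running total. Every event is sent to every live replica so all stay current; the first successful response in list order stands in for "first to respond", and the others are discarded. When the leading replica fails, the next one is already up to date, so switching costs nothing.

```python
from typing import List, Optional


class Replica:
    """Hypothetical stateful component: keeps a running total of events."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.total = 0
        self.crashed = False

    def apply(self, event: int) -> int:
        if self.crashed:
            raise ConnectionError(f"{self.name} is down")
        self.total += event
        return self.total


class ActiveRedundancyGroup:
    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas

    def handle(self, event: int) -> int:
        used: Optional[int] = None
        for replica in self.replicas:  # every event goes to every replica
            try:
                out = replica.apply(event)
            except ConnectionError:
                continue  # failed replica: skip it
            if used is None:
                used = out  # use the first response; later ones are discarded
        if used is None:
            raise RuntimeError("all replicas failed")
        return used
```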

Passive Redundancy (Warm restart)

One component (the primary) responds to events and informs the other components (the standbys) of status updates.
When a fault occurs, the system must first ensure that the standby's backup state is sufficiently fresh before the standby resumes services.
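A sketch of passive redundancy, assuming a hypothetical sequence number on each update and an `acked_seq` (the last update confirmed to clients) that is still known after the primary fails. The primary pushes its state to the standbys every `sync_every` events; at failover the freshest standby is promoted only if it lags `acked_seq` by at most `max_lag` updates.

```python
from typing import Dict, List


class Node:
    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.seq = 0  # sequence number of the last update applied


class PassiveRedundancy:
    def __init__(self, n_standbys: int, sync_every: int = 1) -> None:
        self.primary = Node()
        self.standbys: List[Node] = [Node() for _ in range(n_standbys)]
        self.sync_every = sync_every

    def handle(self, key: str, value: int) -> None:
        """Only the primary responds to events."""
        self.primary.state[key] = value
        self.primary.seq += 1
        if self.primary.seq % self.sync_every == 0:
            for standby in self.standbys:  # primary informs standbys of the update
                standby.state = dict(self.primary.state)
                standby.seq = self.primary.seq

    def fail_over(self, acked_seq: int, max_lag: int = 0) -> Node:
        best = max(self.standbys, key=lambda n: n.seq)
        if acked_seq - best.seq > max_lag:
            raise RuntimeError("standby state too stale to resume service")
        self.standbys.remove(best)
        self.primary = best  # promoted standby resumes service
        return best
```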

Spare

A standby spare computing platform can replace many different failed components.
It must be rebooted with the appropriate software configuration, and its state must be initialized to the point where the failure occurred.
Therefore checkpoints of the system state must be taken regularly, and state changes logged.

Repair Tactics / Component Reintroduction

When a redundant component fails, it may be reintroduced after it has been repaired.

Shadow operation

A previously failed component may be run in shadow mode for a short time, mimicking the behaviour of the working components, before it is returned to service.
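A sketch of shadow operation, assuming components can be modelled as functions from event to output (`shadow_trial` and `required_matches` are hypothetical). The recovered component sees the same events as the active one, but only the active output is used; it is promoted only after matching the active output on `required_matches` events in a row.

```python
from typing import Callable, Iterable


def shadow_trial(active: Callable[[int], int], recovered: Callable[[int], int],
                 events: Iterable[int], required_matches: int) -> bool:
    """Return True if the recovered component may return to service."""
    matches = 0
    for event in events:
        expected = active(event)  # this output is the one actually used
        try:
            shadow_out = recovered(event)  # shadow output: compared, never used
        except Exception:
            return False
        if shadow_out != expected:
            return False  # any mismatch keeps the component out of service
        matches += 1
        if matches >= required_matches:
            return True
    return False  # not enough evidence yet
```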

State resynchronization

A restored component must have its state updated before it returns to service.
Ideally the state is updated with a single atomic message; incremental state updates lead to complicated software.
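A sketch of resynchronization with a single message carrying the whole state; `StateSnapshot` and `RestoredComponent` are hypothetical. The new state is built completely and then installed in one assignment, instead of being patched field by field as partial updates arrive, and the component goes back into service only afterwards.

```python
import copy
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class StateSnapshot:
    """One message carrying the complete state (hypothetical format)."""
    version: int
    state: Dict[str, Any]


class RestoredComponent:
    def __init__(self) -> None:
        self.version = -1
        self.state: Dict[str, Any] = {}
        self.in_service = False

    def resynchronize(self, snapshot: StateSnapshot) -> None:
        new_state = copy.deepcopy(snapshot.state)  # private copy of the full state
        self.state = new_state                     # install it in one step
        self.version = snapshot.version
        self.in_service = True                     # only once fully updated
```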

Checkpoint/Rollback

A checkpoint is a record of a consistent state, created either periodically or in response to specific events.
The system can be restored using a previous consistent checkpoint and a log of the transactions that occurred since that checkpoint was taken.
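A sketch of checkpoint/rollback on a key-value state; the names are hypothetical and, unlike a real system, the checkpoint and log live in memory rather than on persistent storage. Each change is logged before it is applied; `restore()` reloads the checkpoint and replays the log, while `restore(replay=False)` rolls back to the checkpoint alone.

```python
import copy
from typing import Any, Dict, List, Tuple


class CheckpointedStore:
    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        self._checkpoint: Dict[str, Any] = {}
        self._log: List[Tuple[str, Any]] = []  # transactions since the last checkpoint

    def apply(self, key: str, value: Any) -> None:
        self._log.append((key, value))  # log first, then apply
        self.state[key] = value

    def checkpoint(self) -> None:
        self._checkpoint = copy.deepcopy(self.state)  # record a consistent state
        self._log.clear()

    def restore(self, replay: bool = True) -> None:
        self.state = copy.deepcopy(self._checkpoint)
        if replay:
            for key, value in self._log:  # re-apply logged transactions
                self.state[key] = value
```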

Fault Prevention

Removal from Service

Remove a component from operation so that it can undergo activities that prevent anticipated failures.
E.g. rebooting a component regularly to prevent memory leaks from causing a failure.
The architecture must be designed to support this.

Transactions

Several actions are bundled together so that the entire bundle can be undone at once.
If one action fails, the entire transaction fails.
Intermediate data therefore does not corrupt the output or affect the rest of the system.
Transactions also prevent collisions among several threads accessing the same shared data (e.g. by locking it).
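A deliberately simple sketch of the transaction tactic, assuming a hypothetical `TransactionalStore` guarded by one lock: each transaction works on a private copy and commits only if every step succeeds. Real transaction managers allow far more concurrency; this shows only the all-or-nothing and no-collision properties.

```python
import copy
import threading
from typing import Callable, Dict, Iterable

Step = Callable[[Dict[str, int]], None]


class TransactionalStore:
    def __init__(self, data: Dict[str, int]) -> None:
        self._data = dict(data)
        self._lock = threading.Lock()

    def transaction(self, steps: Iterable[Step]) -> None:
        with self._lock:  # one transaction at a time on the shared data
            working = copy.deepcopy(self._data)  # steps touch a private copy only
            for step in steps:
                step(working)  # an exception here abandons the whole bundle
            self._data = working  # commit: every step succeeded

    def snapshot(self) -> Dict[str, int]:
        with self._lock:
            return dict(self._data)


def withdraw(account: str, amount: int) -> Step:
    def step(d: Dict[str, int]) -> None:
        if d[account] < amount:
            raise ValueError("insufficient funds")
        d[account] -= amount
    return step


def deposit(account: str, amount: int) -> Step:
    def step(d: Dict[str, int]) -> None:
        d[account] += amount
    return step
```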

Process Monitor

Once a fault in a process is detected, a monitoring process deletes the failed process.
A new instance of the process is created and its state is recovered.
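A sketch using real OS processes through `subprocess`; the worker command and the way state is handed over (a command-line argument holding a hypothetical `saved_state`) are assumptions for illustration. `check()` only notices workers that have exited; a hung worker would have to be detected by ping/echo or heartbeat, after which `replace()` deletes it and starts a new instance.

```python
import subprocess
import sys


class ProcessMonitor:
    """Watches one worker process and replaces it when it fails (sketch)."""

    def __init__(self, worker_code: str, saved_state: str = "0") -> None:
        self.worker_code = worker_code  # Python source run by each worker instance
        self.saved_state = saved_state  # hypothetical: last state the worker reported
        self.restarts = 0
        self.proc = self._start()

    def _start(self) -> subprocess.Popen:
        # New instance, initialised from the saved state (passed as argv[1])
        return subprocess.Popen([sys.executable, "-c", self.worker_code, self.saved_state])

    def replace(self) -> None:
        """Delete the nonperforming process and create a new instance."""
        if self.proc.poll() is None:
            self.proc.kill()
        self.proc.wait()
        self.proc = self._start()
        self.restarts += 1

    def check(self) -> bool:
        """Return True if the worker had exited and was replaced."""
        if self.proc.poll() is None:
            return False  # still running
        self.replace()
        return True

    def stop(self) -> None:
        if self.proc.poll() is None:
            self.proc.kill()
        self.proc.wait()
```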

Modifiability Tactics

The goal is to control the time and cost to implement, test and deploy changes.

Localize Modifications

The goal of these tactics is to assign responsibilities to modules during design so that anticipated changes are limited in scope.

Maintain semantic coherence

The responsibilities in a module should work together without excessive reliance on other modules.
Abstracting common services into their own modules also makes modification easier.

Anticipate expected changes

Considering the set of expected future changes helps to evaluate a particular assignment of responsibilities.

Generalize the module

Make a module compute a broader range of functions based on its input. E.g. constants can be passed in as input parameters instead of being hard-coded.
The more general a module is, the more likely it is that requested changes can be made by adjusting the input rather than by modifying the module.
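A small illustration with a hypothetical pricing function: in the first version the tax rate is a constant baked into the code, in the second it is an input parameter, so a rate change becomes a change of input rather than a change to the module.

```python
def price_with_tax_fixed(net: float) -> float:
    # Rate hard-coded: changing it means editing and redeploying this module
    return round(net * 1.20, 2)


def price_with_tax(net: float, rate: float = 0.20, decimals: int = 2) -> float:
    # Generalized: rate and precision are inputs with the old values as defaults
    return round(net * (1 + rate), decimals)
```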

Limit possible options

Restricting the possible options for change can reduce the effect of modifications.
E.g. restricting processors to members of a particular family limits the options and reduces the effect of modifications.

Prevent ripple effects

A ripple effect from a modification is the necessity of making changes to modules not directly affected by it.
Types of dependency that a module B can have on a module A:
- Syntax of data and service provided by A.
- Semantics of data and service provided by A.
- Sequence of data: e.g. data must arrive in a fixed protocol sequence.
- Sequence of control: e.g. A must have executed no longer than 5ms before B executes.
- Identity of an interface of A: the identity (name or handle) of an interface of A must be consistent with the assumptions of B.
- Runtime location of A: B must know where A is at runtime in order to execute correctly.
- Quality of service/data provided by A: e.g. accuracy must be within a certain range.
- Existence of A: A must exist for B to execute.
- Resource behaviour of A: e.g. its use of memory or resource ownership.

Hide Information

One of the oldest techniques: decide which information a module makes public and keep the rest, such as its private data, hidden.

Maintain existing interfaces

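If B depends on an interface of A, keeping that interface stable keeps B unchanged. Create abstract interfaces to mask variations.
Techniques include adding interfaces, adding adapters and providing a stub (proxy pattern).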
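A sketch of maintaining an existing interface with an adapter; all class names are hypothetical. Module B (`check_patient`) was written against `TemperatureSource`; when A is replaced by a probe with a different interface, the adapter translates calls so B needs no change.

```python
from abc import ABC, abstractmethod


class TemperatureSource(ABC):
    """The existing interface that module B was written against."""

    @abstractmethod
    def read_celsius(self) -> float: ...


class FahrenheitProbe:
    """A replacement for A whose own interface differs (hypothetical)."""

    def read_f(self) -> float:
        return 98.6


class FahrenheitAdapter(TemperatureSource):
    """Keeps the old interface intact by translating calls to the new A."""

    def __init__(self, probe: FahrenheitProbe) -> None:
        self._probe = probe

    def read_celsius(self) -> float:
        return (self._probe.read_f() - 32) * 5 / 9


def check_patient(source: TemperatureSource) -> str:
    """Module B: unchanged when A is swapped out."""
    return "fever" if source.read_celsius() >= 38.0 else "normal"
```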

Restrict communication paths

Reduce the number of modules that provide data to, and consume data from, the given module.

Use an intermediary

For dependencies other than semantic ones, insert an intermediary between B and A that manages the activities associated with the dependency.
- Data (syntax): the intermediary converts data from A's syntax to B's.
- Service (syntax): Facade, Proxy and Factory patterns provide intermediaries that convert the syntax of a service from A's to B's.
- Identity of an interface of A: Broker pattern.
- Location of A (runtime): a name server, e.g. LDAP.
- Resource behaviour of A: introduce a resource manager.
- Existence of A: Factory pattern.
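For the runtime-location dependency, a name server lets B look up A's current address instead of hard-coding it. A minimal in-memory sketch, standing in for a real directory such as LDAP or DNS; the service name and addresses are made up.

```python
from typing import Dict


class NameServer:
    """Intermediary that owns the 'where is A?' dependency."""

    def __init__(self) -> None:
        self._registry: Dict[str, str] = {}

    def register(self, service: str, address: str) -> None:
        self._registry[service] = address  # A (re)registers whenever it moves

    def resolve(self, service: str) -> str:
        try:
            return self._registry[service]
        except KeyError:
            raise LookupError(f"no provider registered for {service!r}") from None


def billing_client(ns: NameServer) -> str:
    """Module B depends on the name 'billing', not on where billing runs."""
    return f"connecting to {ns.resolve('billing')}"
```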

Architecture Tactics
