Intro

How can we design an architecture that will achieve the desired quality attributes?
Sources of architecture
- Theft: From previous systems, literature
- Method: Systematic and conscious, derived from requirements via transformations and heuristics.
- Intuition: The ability to conceive a design without conscious reasoning. Greater reliance on intuition increases risk.
The mix of these three sources varies with the architect's experience and the novelty of the system.

What is a tactic? A tactic is a design decision that influences the control of a quality attribute response.
A collection of tactics is an architectural strategy.
Each tactic is a design option for the architect.

Availability Tactics

All approaches to maintaining availability involve some type of redundancy, some type of health monitoring and some type of recovery when a failure is detected.
Faults cause failures. Availability tactics focus on dealing with faults.
Availability tactics involve fault detection, fault recovery and fault prevention.

Fault Detection

Ping/echo and heartbeat generally operate among distinct processes, whereas the exception tactic operates within a single process.

Ping/Echo

One component issues a ping to a component to be checked and expects to receive back an echo within a predefined time.
Response time allows performance to be assessed.
If bandwidth consumption of pings is an issue, then the ping/echo detectors can be organized in a hierarchy.
- Low-level detectors ping low-level processes, and higher-level fault detectors ping the lower-level detectors.
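
As an illustration of the ping/echo check, here is a minimal sketch. It assumes a hypothetical component model in which calling a component returns its echo latency in seconds, or None if it never answers; the component names and the 0.5 s timeout are made up for the example.

```python
from typing import Callable, Dict, Optional

# Hypothetical component model: calling a component returns the latency of
# its echo in seconds, or None if no echo ever arrives.
Component = Callable[[], Optional[float]]


def ping(component: Component, timeout: float) -> bool:
    """Return True if the component echoed within the timeout."""
    latency = component()
    return latency is not None and latency <= timeout


def ping_all(components: Dict[str, Component], timeout: float) -> Dict[str, bool]:
    """Ping every component and report which ones are considered healthy."""
    return {name: ping(c, timeout) for name, c in components.items()}


components = {
    "db": lambda: 0.02,      # answers quickly
    "cache": lambda: 0.75,   # answers, but slower than the timeout
    "mailer": lambda: None,  # never answers
}
status = ping_all(components, timeout=0.5)
```

A real detector would send the ping over the network and measure elapsed time itself; the measured latency is also what lets performance be assessed.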

Heartbeat

One component emits a heartbeat message periodically and another component listens for it.
If the heartbeat is absent, the originating component is assumed to have failed.
Heartbeat messages can be combined with useful data.
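
A minimal sketch of the listening side of a heartbeat, assuming timestamps are passed in explicitly so the example stays deterministic. The class name, the source names and the 3-second timeout are illustrative assumptions.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each source."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def beat(self, source: str, now: float) -> None:
        """Record a heartbeat received from `source` at time `now`."""
        self.last_seen[source] = now

    def suspected_failures(self, now: float) -> List[str]:
        """Sources whose last heartbeat is older than the timeout."""
        return sorted(
            source
            for source, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3.0)
monitor.beat("worker-a", now=0.0)
monitor.beat("worker-b", now=0.0)
monitor.beat("worker-a", now=2.0)  # worker-b has gone silent
late = monitor.suspected_failures(now=4.0)
```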

Exceptions

A fault can be recognized when an exception is raised during operation.
An exception handler is then invoked; it typically executes in the same process that raised the exception.

Fault Recovery

Fault recovery consists of preparing for recovery and making the actual system repair, as well as reintroducing components after repair.

Preparation and Repair Tactics

Voting

Processes running on redundant processors each take equivalent input and compute a simple output value that is sent to a voter.
If the voter detects deviant behaviour from a single processor, it fails that processor.
The voting algorithm can be "majority wins", "preferred component" or some other algorithm.
Often used in control systems to correct faulty operation of algorithms or processors.
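
A minimal sketch of a "majority wins" voter. The function name and the processor names are assumptions for illustration; the voter returns the agreed value together with the processors that deviated from it.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def majority_vote(outputs: Dict[str, int]) -> Tuple[Optional[int], List[str]]:
    """Return (agreed value, deviant processors).

    The value is None when no output is shared by a strict majority.
    """
    if not outputs:
        return None, []
    value, votes = Counter(outputs.values()).most_common(1)[0]
    if votes <= len(outputs) / 2:
        return None, []
    deviants = sorted(name for name, out in outputs.items() if out != value)
    return value, deviants


result, deviants = majority_vote({"cpu-1": 42, "cpu-2": 42, "cpu-3": 41})
```

The caller would then take the deviant processors out of service.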

Active Redundancy (Hot restart)

There are N redundant components - all of which respond to events in parallel.
The response from only one component is used (usually the first to respond); the rest are discarded.
Downtime is minimal because the backups are current, so the time to recover is only the switching time.
E.g. a LAN with a number of parallel paths, with each redundant component placed on a separate path.
Synchronization is done by ensuring that all messages to any redundant component are sent to all redundant components; a reliable transmission protocol may therefore be required.

Passive Redundancy (Warm restart)

One component (the primary) responds to events and informs the other components (the standbys) of status updates.
When a fault occurs, the backup state on the standby must be sufficiently fresh before services are resumed.
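
The sketch below illustrates why the freshness of the standby matters. It assumes a hypothetical design in which the primary batches its status updates and ships them on `sync()`; anything handled after the last sync is lost on failover.

```python
from typing import Dict, List, Tuple


class WarmStandby:
    """A primary that ships status updates to a single standby."""

    def __init__(self) -> None:
        self.primary_state: Dict[str, int] = {}
        self.standby_state: Dict[str, int] = {}
        self.pending: List[Tuple[str, int]] = []

    def handle(self, key: str, value: int) -> None:
        """The primary responds to an event and queues a status update."""
        self.primary_state[key] = value
        self.pending.append((key, value))

    def sync(self) -> None:
        """Ship all queued status updates to the standby."""
        for key, value in self.pending:
            self.standby_state[key] = value
        self.pending.clear()

    def fail_over(self) -> Dict[str, int]:
        """The primary is lost; the standby's state becomes the service state."""
        self.primary_state = dict(self.standby_state)
        self.pending.clear()
        return self.primary_state


pair = WarmStandby()
pair.handle("x", 1)
pair.sync()
pair.handle("y", 2)  # handled by the primary but not yet shipped
recovered = pair.fail_over()
```

Here the update to "y" is missing after failover, which is the cost of syncing less often.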

Spare

A standby spare computing platform is configured to replace failed components.
It must be rebooted to the appropriate software configuration, and its state must be initialized to the point at which the failure occurred.
Therefore checkpoints of the system state must be made regularly.

Repair Tactics / Component Reintroduction

When a redundant component fails, it may be reintroduced after it has been repaired.

Shadow operation

A previously failed component may be run in shadow mode for a short time, so that its behaviour can be checked against that of the working components before it is made operational again.

State resynchronization

A restored component must have its state updated before it returns to service.
The ideal approach is to update the state with a single atomic message. Incremental state updates lead to complicated software.

Checkpoint/Rollback

A checkpoint is a recording of a consistent state, made either periodically or in response to specific events.
The system can be restored from a previous consistent checkpoint plus a log of the transactions that have occurred since that checkpoint was taken.
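
A minimal sketch of restoring from a checkpoint plus a transaction log. The store, its `apply` operation (adding a delta to a key) and the account name are assumptions made for the example.

```python
import copy
from typing import Dict, List, Tuple


class CheckpointedStore:
    """Keeps a checkpoint of the state and a log of later transactions."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.checkpoint: Dict[str, int] = {}
        self.log: List[Tuple[str, int]] = []

    def apply(self, key: str, delta: int) -> None:
        """Log the transaction, then apply it to the live state."""
        self.log.append((key, delta))
        self.state[key] = self.state.get(key, 0) + delta

    def take_checkpoint(self) -> None:
        """Record the current consistent state and start a fresh log."""
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()

    def recover(self) -> Dict[str, int]:
        """Rebuild the state from the checkpoint and the logged transactions."""
        state = copy.deepcopy(self.checkpoint)
        for key, delta in self.log:
            state[key] = state.get(key, 0) + delta
        self.state = state
        return state


store = CheckpointedStore()
store.apply("balance", 100)
store.take_checkpoint()
store.apply("balance", -30)
store.state = {}  # simulate losing the in-memory state
recovered_state = store.recover()
```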

Fault Prevention

Removal from Service

Removes a component from operation so that it can undergo activities that prevent anticipated failures.
E.g. rebooting a component regularly to prevent memory leaks from causing a failure.
The architectural strategy must be designed to support it.

Transactions

A transaction bundles several actions together so that the entire bundle can be undone at once.
If one action fails, the entire transaction fails.
Intermediate data therefore cannot corrupt the output or affect the rest of the system.
Transactions also prevent collisions among threads that access the same shared data, e.g. by locking it.
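
A minimal single-threaded sketch of the all-or-nothing behaviour. It assumes actions are plain functions that mutate a dictionary; the transaction runs them on a private copy and commits only if every action succeeds. The helper names and account data are illustrative.

```python
from typing import Callable, Dict, List

Action = Callable[[Dict[str, int]], None]


def run_transaction(data: Dict[str, int], actions: List[Action]) -> bool:
    """Apply all actions or none of them; return True on commit."""
    working = dict(data)  # intermediate results stay on a private copy
    try:
        for action in actions:
            action(working)
    except Exception:
        return False  # nothing was written back, so nothing to undo
    data.clear()
    data.update(working)
    return True


def debit(account: str, amount: int) -> Action:
    def action(d: Dict[str, int]) -> None:
        if d[account] < amount:
            raise ValueError("insufficient funds")
        d[account] -= amount
    return action


def credit(account: str, amount: int) -> Action:
    def action(d: Dict[str, int]) -> None:
        d[account] += amount
    return action


accounts = {"a": 50, "b": 0}
first_ok = run_transaction(accounts, [debit("a", 30), credit("b", 30)])
second_ok = run_transaction(accounts, [credit("b", 100), debit("a", 100)])
```

The second transaction fails on its debit, so its earlier credit never becomes visible.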

Process Monitor

Detects and shuts down failed processes.
A new process instance is then created and its state recovered.

Modifiability Tactics

The goal is to control the time and cost to implement, test and deploy changes.

Localize Modifications

The goal of these tactics is to assign responsibilities to modules during design such that anticipated changes will be limited in scope.

Maintain semantic coherence

The responsibilities within a module should work together without excessive reliance on other modules.

Abstract common services

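Providing common services through specialized modules means a change to such a service is made once, rather than in every module that uses it.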

Anticipate expected changes

Considering the set of envisioned future changes helps to evaluate a particular assignment of responsibilities.

Generalize the module

Make a module compute a broader range of functions based on its input. E.g. constants can be passed in as input parameters.
The more general a module is, the more likely it is that a requested change can be made by adjusting the input rather than by modifying the module.
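
A small sketch of the idea, using a hypothetical pagination helper: the first version hard-codes its page size, the generalized version takes it as an input parameter, so a request for a different page size no longer requires changing the module.

```python
from typing import List


def paginate_fixed(items: List[int]) -> List[List[int]]:
    """Page size is a constant buried in the module."""
    return [items[i:i + 10] for i in range(0, len(items), 10)]


def paginate(items: List[int], page_size: int = 10) -> List[List[int]]:
    """Generalized: the former constant is now an input parameter."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


pages = paginate(list(range(5)), page_size=2)
```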

Limit possible options

Restricting the possible options for change can reduce the effect of modifications.
E.g. restricting processors to members of a certain family limits the options and so reduces the effect of modifications.

Prevent ripple effects

A ripple effect from a modification is the need to change modules that the modification does not directly affect.
A module B can depend on a module A in several ways:
- Syntax of data and service.
- Semantics of data and service.
- Sequence of data: e.g. a protocol sequence.
- Sequence of control: e.g. A must have executed no longer than 5 ms before B executes.
- Identity of an interface of A: The identity (name or handle) of an interface of A must be consistent with the assumptions of B.
- Runtime location of A: The location of A must be consistent with the assumptions of B for B to execute correctly.
- Quality of service/data provided by A: e.g. accuracy must be within a certain range.
- Existence of A: A must exist for B to execute correctly.
- Resource behaviour of A: e.g. use of memory or resource ownership.

Hide Information

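The oldest technique for preventing the propagation of changes: keep a module's private data and decisions hidden, and make only selected responsibilities public.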

Maintain existing interfaces

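Create abstract interfaces that mask variations, so that modules depending on the interface are unaffected by changes behind it.
Ways to do this: add interfaces, add adapters, or provide a stub (proxy pattern).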

Restrict communication paths

Reduce the number of modules that consume data produced by the module and the number of modules that produce data it consumes.

Use an intermediary

For non-semantic dependencies, add an intermediary between B and A that manages the activities associated with the dependency.
- Data (syntax): Convert the syntax produced by A into the syntax B expects.
- Service (syntax): Facade, Proxy and Factory patterns provide intermediaries that convert the syntax of a service from one form to another.
- Identity of an interface: Broker pattern.
- Location of A (runtime): A name server, e.g. LDAP.
- Resource behaviour: Introduce a resource manager.
- Existence of A: Factory pattern.
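
As one example from the list above, the runtime-location dependency can be sketched with a tiny in-memory name server. The class, the service name and the addresses are hypothetical; a real deployment would use a directory service instead.

```python
from typing import Dict


class NameServer:
    """Intermediary that maps a logical name to a runtime location."""

    def __init__(self) -> None:
        self._addresses: Dict[str, str] = {}

    def register(self, name: str, address: str) -> None:
        self._addresses[name] = address

    def lookup(self, name: str) -> str:
        return self._addresses[name]  # raises KeyError for unknown names


def client_b_target(names: NameServer) -> str:
    """B knows only A's logical name, never its address."""
    return names.lookup("service-a")


names = NameServer()
names.register("service-a", "10.0.0.5:8080")
before = client_b_target(names)
names.register("service-a", "10.0.0.9:8080")  # A is moved to another host
after = client_b_target(names)
```

B's code is identical before and after the move; only the registration changed.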

Defer Binding Time

Deferring binding time reduces the time to deploy and allows non-developers (system administrators and end users) to make changes.
Tactics:

Runtime registration: Supports plug-and-play operation, e.g. publish/subscribe registration.
Configuration files: Set parameters at startup.
Polymorphism: Allows late binding of method calls.
Component replacement: Allows load-time binding.
Adherence to defined protocols: Allows runtime binding of independent processes.
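
A minimal sketch of runtime registration in the publish/subscribe style. The `EventBus` class and the topic name are assumptions; the point is that a handler is bound to a topic while the system is running, without changing the publisher.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    """Subscribers register at runtime; publishers never name them."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = (
            defaultdict(list)
        )

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        """Deliver the payload and return how many handlers received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


received: List[str] = []
bus = EventBus()
undelivered = bus.publish("orders", "order-1")  # nobody is registered yet
bus.subscribe("orders", received.append)        # binding happens at runtime
delivered = bus.publish("orders", "order-2")
```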

Performance Tactics

The goal of performance tactics is to generate a response to an event arriving at the system within some time constraint.
The main concern is to control the time within which a response is generated, i.e. the latency.
Two basic contributors to response time:
- Resource consumption: CPU, database, network, memory, and internal entities such as buffers. All of these contribute to latency.
- Blocked time: A computation can be blocked for various reasons:
  - Contention: Multiple events compete for the resource.
  - Availability: The resource may be unavailable for some reason (e.g. a failure such as the network being down).
  - Dependency on other computation: The computation must wait for the result of another one, e.g. data must be loaded from the DB into a cache before it can be read.

Resource Demand

One tactic is to reduce the resources required:

Increase Computational Efficiency

Use efficient algorithms.

Reduce computational overhead

Eliminate intermediaries where possible (e.g. RMI adds a lot of overhead).
This is a trade-off between modifiability and performance, since intermediaries are themselves a modifiability tactic.

Another tactic is to reduce the number of events processed:

Manage Event Rate

Reduce the sampling rate where possible; there can be unnecessary oversampling.

Control Frequency of Sampling

If there is no control over the arrival of external events, queued requests can be sampled at a lower frequency, possibly at the cost of losing requests.

Control the use of resources

Bound execution times

Place a limit on how much execution time is used to respond to an event, e.g. limit the time or the number of iterations given to an algorithm.

Bound queue sizes

Control the maximum number of queued arrivals.
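
A minimal sketch of a bounded arrival queue using the standard library's `queue.Queue`. The `offer` helper and the limit of two entries are assumptions; arrivals beyond the bound are rejected instead of growing the queue.

```python
import queue


def offer(q: "queue.Queue[int]", item: int) -> bool:
    """Try to enqueue without blocking; return False if the queue is full."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False


arrivals: "queue.Queue[int]" = queue.Queue(maxsize=2)
accepted = [offer(arrivals, n) for n in range(4)]
```

What to do with a rejected arrival (drop it, report it, or ask the sender to retry) is a separate policy decision.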

Resource Management

Even when resource demand is not controllable, the management of resources affects response times.

Introduce Concurrency

Processing requests in parallel can reduce blocked time.

Maintain multiple copies of either data or computations

In a client-server architecture, clients are replicas of the computation, and caching replicates data; both reduce contention on a central resource.
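
A minimal caching sketch using `functools.lru_cache`. The `load_profile` function stands in for an expensive server or database call and is purely hypothetical; the counter shows that the second request is served from the local copy.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the "server" is actually hit


@lru_cache(maxsize=128)
def load_profile(user_id: int) -> str:
    """Stand-in for an expensive remote lookup."""
    calls["count"] += 1
    return f"profile-{user_id}"


first = load_profile(7)   # goes to the "server"
second = load_profile(7)  # served from the cache
```

The usual caveat applies: cached copies must be kept consistent with the original data.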

Increase available resources

Faster processors, additional processors
Add more memory, network bandwidth.
Trade-off between cost and performance.

Resource Arbitration

Whenever there is contention for a resource, the resource must be scheduled.
The architect's task is to choose a scheduling policy for each such resource.
Scheduling policies can be:

FIFO Scheduling

All requests for resources are treated equally and are satisfied in turn.

Fixed Priority Scheduling

Assigns each source of requests a particular priority and assigns resources in priority order.
Priority can be assigned according to
- Semantic importance: According to domain characteristics.
- Deadline monotonic: Higher priority to shorter deadlines.
- Rate monotonic: Higher priority to streams with shorter periods.
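
A minimal sketch of rate monotonic priority assignment. The stream names and periods are made up, and the convention that 1 is the highest priority is an assumption of this example.

```python
from typing import Dict


def rate_monotonic_priorities(periods_ms: Dict[str, int]) -> Dict[str, int]:
    """Assign priority 1 (highest) to the stream with the shortest period."""
    ordered = sorted(periods_ms, key=lambda name: (periods_ms[name], name))
    return {name: rank for rank, name in enumerate(ordered, start=1)}


priorities = rate_monotonic_priorities({"audio": 10, "video": 40, "log": 1000})
```

Because the priorities depend only on the periods, they are fixed before the system runs.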

Dynamic Priority Scheduling

Round robin: Orders the requests and, at every assignment opportunity, assigns the resource to the next request in that order.
Earliest deadline first (EDF): Assigns priorities at runtime so that the pending request with the earliest deadline is served first.
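
A minimal sketch of earliest-deadline-first selection built on `heapq`. The scheduler class, request names and deadlines are illustrative assumptions; ties are broken by submission order.

```python
import heapq
from typing import List, Optional, Tuple


class EdfScheduler:
    """Always hands out the pending request with the earliest deadline."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._sequence = 0  # tie-breaker: submission order

    def submit(self, name: str, deadline: int) -> None:
        heapq.heappush(self._heap, (deadline, self._sequence, name))
        self._sequence += 1

    def next_request(self) -> Optional[str]:
        """Pop the most urgent pending request, or None if there is none."""
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name


scheduler = EdfScheduler()
scheduler.submit("report", deadline=50)
scheduler.submit("backup", deadline=500)
first_served = scheduler.next_request()
scheduler.submit("alarm", deadline=5)  # arrives later but is more urgent
second_served = scheduler.next_request()
third_served = scheduler.next_request()
```

The priority order is recomputed as requests arrive, which is what makes the policy dynamic.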

Static Scheduling

Sequence of assignment of resources is determined offline.

Security Tactics

Can be divided into three types of tactics: resisting attacks (e.g. a lock), detecting attacks (e.g. a sensor) and recovering from attacks (e.g. insurance).

Resisting attacks

Address the requirements of security of a system.

Authenticate users: Ensure that users are who they claim to be, e.g. with passwords.
Authorize users: Ensure that an authenticated user has the rights to access and modify the data or services in question, e.g. with access control systems.
Maintain data confidentiality: Encryption, public key authentication.
Maintain integrity: Data can carry redundant information encoded with it, e.g. checksums or hash results.
Limit exposure: Attacks typically depend on exploiting a single weakness to attack all data and services on a host. The architect can allocate services to hosts so that only limited services are available on each host.
Limit access: Limit access from outside sources using firewalls. Establish a DMZ, a subnet that sits between the internet and the firewall protecting the internal LAN. Hosts within the DMZ communicate with external hosts but have only limited connectivity to internal hosts.
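
For the integrity tactic in the list above, a minimal sketch using a SHA-256 digest from the standard library. The function names and sample data are assumptions. Note that a plain hash only detects changes if the stored digest itself is protected from the attacker.

```python
import hashlib
import hmac


def checksum(data: bytes) -> str:
    """Hex digest that travels or is stored alongside the data."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected: str) -> bool:
    """Recompute the digest and compare it in constant time."""
    return hmac.compare_digest(checksum(data), expected)


original = b"transfer 100 to account 42"
digest = checksum(original)
unchanged = is_intact(original, digest)
tampered = is_intact(b"transfer 900 to account 42", digest)
```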

Detecting attacks

Intrusion detection systems. e.g. compare network traffic patterns against known ones.
Artificial immune systems
Set traps

Recovering from attacks

Can be divided into restoring state and identifying the attacker.

Restoring state: Overlaps with the availability tactics.
- In particular, maintain redundant copies of system administration data such as passwords, ACLs and user profile data.

Identifying attackers: Maintain an audit trail. An audit trail can be used to trace the actions of an attacker, support non-repudiation and support system recovery.

Usability Tactics

Runtime Tactics

Support user initiative:
- Undo, redo, aggregate: All require architectural consideration.
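
A minimal sketch of why undo and redo need architectural support: the system has to keep enough prior state to restore. This hypothetical editor stores whole snapshots on two stacks, which is the simplest possible approach.

```python
from typing import List


class Editor:
    """Text buffer with snapshot-based undo and redo."""

    def __init__(self) -> None:
        self.text = ""
        self._undo: List[str] = []
        self._redo: List[str] = []

    def type(self, chars: str) -> None:
        self._undo.append(self.text)
        self._redo.clear()  # a new edit invalidates the redo history
        self.text += chars

    def undo(self) -> None:
        if self._undo:
            self._redo.append(self.text)
            self.text = self._undo.pop()

    def redo(self) -> None:
        if self._redo:
            self._undo.append(self.text)
            self.text = self._redo.pop()


editor = Editor()
editor.type("Hello")
editor.type(" world")
editor.undo()
after_undo = editor.text
editor.redo()
after_redo = editor.text
```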

Support system initiative:

Maintain a model of the task: Based on task analysis, the system can infer what the user is attempting. E.g. auto-capitalizing the first word of a sentence.
Maintain a model of the user: E.g. personas. Captures how much the user knows about the system and the user's behaviour in terms of expected response times, e.g. page scrolling rate.
Maintain a model of the system: Determines the expected system behaviour so that appropriate feedback can be given to the user, e.g. the time needed to complete a certain activity.

Design time Tactics

Separate the UI from the rest of the application: Allows UI developers to change the UI frequently and maintain its code separately.
The classic design pattern for implementing this tactic is Model-View-Controller (MVC).
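
A minimal MVC sketch, with hypothetical class names. The model holds state and notifies listeners, the view only renders, and the controller turns user input into model operations, so the view can be replaced without touching the model.

```python
from typing import Callable, List


class CounterModel:
    """Application state; knows nothing about how it is displayed."""

    def __init__(self) -> None:
        self.value = 0
        self._listeners: List[Callable[[int], None]] = []

    def subscribe(self, listener: Callable[[int], None]) -> None:
        self._listeners.append(listener)

    def increment(self) -> None:
        self.value += 1
        for listener in self._listeners:
            listener(self.value)


class TextView:
    """Renders the model as text; collects lines instead of printing."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def render(self, value: int) -> None:
        self.lines.append(f"count = {value}")


class CounterController:
    """Translates user input into operations on the model."""

    def __init__(self, model: CounterModel) -> None:
        self.model = model

    def on_click(self) -> None:
        self.model.increment()


model = CounterModel()
view = TextView()
model.subscribe(view.render)
controller = CounterController(model)
controller.on_click()
controller.on_click()
```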

Testability Tactics

Manage input/output

Record/playback: Capture the information crossing an interface and use it as input to the test harness.
Separate interface from implementation: Substituting a stub for an implementation allows the rest of the system to be tested in the absence of the stubbed component.
Specialize access routes/interfaces: Provide specialized testing interfaces.
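
A minimal sketch of record/playback from the list above. The `Recorder` wrapper and the `add` functions are hypothetical: calls crossing the interface are captured once, then replayed against another implementation to find mismatches.

```python
from typing import Any, Callable, List, Tuple

Call = Tuple[Tuple[Any, ...], Any]  # (arguments, recorded result)


class Recorder:
    """Wraps a function and records every call crossing the interface."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func
        self.calls: List[Call] = []

    def __call__(self, *args: Any) -> Any:
        result = self.func(*args)
        self.calls.append((args, result))
        return result


def replay(func: Callable[..., Any], calls: List[Call]) -> List[Call]:
    """Feed recorded inputs to `func`; return the calls whose result differs."""
    return [(args, expected) for args, expected in calls if func(*args) != expected]


def add(a: int, b: int) -> int:
    return a + b


def buggy_add(a: int, b: int) -> int:
    return a - b  # deliberate defect for the demonstration


recorded_add = Recorder(add)
recorded_add(2, 3)
recorded_add(10, 0)
mismatches_ok = replay(add, recorded_add.calls)
mismatches_bug = replay(buggy_add, recorded_add.calls)
```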

Internal Monitoring

Built-in monitors: The component records events when its monitoring state has been activated, which increases visibility into its activities.

Architecture Tactics
