Revision as of 00:33, 18 January 2014

Relational Databases

Set of named relations (tables)
Each relation has a set of named attributes (columns)
Each tuple (row) has a value for each attribute
Each attribute has a type (or domain)
All queries return relations (tables), so it can said to be compositional or closed.

Alternative to relational model to store data
Standard for data representation and exchange
Document format similar to HTML, however tags represent content instead of formatting.
XML consists of tagged elements which can be nested, attributes of elements and the text (values/data)
XML document can be thought of as a tree with values/text forming the leaves.

	Relational	XML
Structure	Tables	Hierarchical/Tree
Schema	Fixed in advance	Flexible, self-describing. e.g. all elements need not have same attributes. (In relational, we would need to have NULL values). Inconsistencies in the structure is NOT a problem.
Queries	Relatively simple	Complicated compared to Relational / SQL.
Ordering	Fundamentally UNORDERED. e.g. Order by clause is required in SQL in case order is desired.	Ordering is implied - e.g. the way the text is laid out in the XML document or in the Stream.
Implementation	Mature - native implementation	Add - on. (Underlying implementation is relational)

Since XML occurs a lot on the internet, a convenient way is required to display XML.
Use rule based languages such as CSS, XSL to translate XML to HTML.
XML document is fed to a CSS/XSL interpreter alongwith rules and a HTML document is generated.

Valid XML should adhere to requirements for well formed XML
Also should adhere to content-specific specifications : DTD/XSD
DTD is a grammar-like language for specifying elements, attributes, nesting, ordering, #occurrences
DTD/XSD offers the benefits of "typing" - can serve as a specification. Useful for data exchange.
Having no DTD offers flexibility of data.
DTD also can specify pointer types - an element can have an ID and another element can refer to it using an IDREF.

Like DTD, but more comprehensive.
XSD is written in XML itself.
Everything in DTD is string, in XSD e.g., we can specify that an attribute should be of a specific type.
Pointers can be typed, so we can enforce that a reference is made to a particular type and not just global like in DTD's
Also specify min/max constraints on elements

Like XML, JSON can be thought of as a data model - as an alternative to the relational data model.
Useful for semi-structured data
Designed originally for serializing data objects
JSON is human readable - often used for data interchange
JSON consists of base values, objects (label-value pairs), and arrays to hold either. Structure is recursive.
JSON vs Relational - JSON differs in how data is stored, schema is self-describing, ordering is present, query support is minimal unlike XQuery, XPath, XSLT, and is implemented typically in prog languages
JSON vs XML - XML more verbose/complex than JSON, like DTD, XSD, JSON has JSON schema.
XML model dont match with programming languages. Interface is clunky. This is called impedence mismatch - similar to Relational and prog languages.
JSON has a more direct mapping with prog languages.

Adheres to basic structural requirements of JSON. (label-value pairs, arrays of values)
Can use JSON schema for semantic validation.

Relational algebra is the formal language that is the underpinning for implemented languages like SQL
Query (expression) on a set of results produces relation as a result.
Simplest query in RA is the name of the relation e.g. Student which produce the contents of the Student relation
Operators to filter, slice and combine relations.
Select operator : Sigma (cond) Relation name. Condition is a filter and the result will be a subset of the relation
Project operator : Select picks rows, project picks columns. Pi (col name) Relation Name.
To pick both rows and cols, combine the select and project operator. e.g. Project(sid,name) (Select(gpa > 3.7) student)
Cross Product : By default RA is based on sets and doesnt consider duplicates. But SQL uses multisets/bags and includes duplicates - Cross product between two relations combines all attributes (columns) and every combination of tuples between the relation. m X n attrs and a X B tuples.
Natural Join : Is like cross product but enforces equality on all attributes with the same name. Eliminate one copy of duplicate attributes (same attribute from diff table). So if we use the nat. join operator we dont need to specify the key attributes need to be equal condition. Doesnt provide any additional expressive power, but is convenient notationally.

@@ Line 101: / Line 101: @@
 * Project operator : Select picks rows, project picks columns. Pi (col name) Relation Name.
 * To pick both rows and cols, combine the select and project operator. e.g. Project(sid,name) (Select(gpa > 3.7) student)
-* Cross Product : By default RA is based on sets and doesnt consider duplicates. But SQL uses multisets/bags and includes duplicates.
+* Cross Product : By default RA is based on sets and doesnt consider duplicates. But SQL uses multisets/bags and includes duplicates - Cross product between two relations combines all attributes (columns) and every combination of tuples between the relation. m X n attrs and a X B tuples.
-Cross product between two relations combines all attributes (columns) and every combination of tuples between the relation. m X n attrs and a X B tuples.
+* Natural Join : Is like cross product but enforces equality on all attributes with the same name. Eliminate one copy of duplicate attributes (same attribute from diff table). So if we use the nat. join operator we dont need to specify the key attributes need to be equal condition. Doesnt provide any additional expressive power, but is convenient notationally.