Difference between revisions of "Databases"

From Suhrid.net Wiki
Jump to navigationJump to search
Line 97: Line 97:
 
* Query (expression) on a set of results produces relation as a result.
 
* Query (expression) on a set of results produces relation as a result.
 
* Simplest query in RA is the name of the relation e.g. Student which produce the contents of the Student relation
 
* Simplest query in RA is the name of the relation e.g. Student which produce the contents of the Student relation
 +
* Operators to filter, slice and combine relations.
 +
* Select operator : Sigma (cond) Relation name. Condition is a filter and the result will be a subset of the relation
 +
* Project operator : Select picks rows, project picks columns. Pi (col name) Relation Name.
 +
* To pick both rows and cols, combine the select and project operator. e.g. Project(sid,name) (Select(gpa > 3.7) student)
 +
* Cross Product : By default RA is based on sets and doesnt consider duplicates. But SQL uses multisets/bags and includes duplicates.
 +
Cross product between two relations combines all attributes (columns) and every combination of tuples between the relation. m X n attrs and a X B tuples.

Revision as of 06:08, 16 January 2014

Relational Databases

  • Set of named relations (tables)
  • Each relation has a set of named attributes (columns)
  • Each tuple (row) has a value for each attribute
  • Each attribute has a type (or domain)
  • All queries return relations (tables), so it can said to be compositional or closed.

XML Data

  • Alternative to relational model to store data
  • Standard for data representation and exchange
  • Document format similar to HTML, however tags represent content instead of formatting.
  • XML consists of tagged elements which can be nested, attributes of elements and the text (values/data)
  • XML document can be thought of as a tree with values/text forming the leaves.

Relational vs XML

Relational XML
Structure Tables Hierarchical/Tree
Schema Fixed in advance Flexible, self-describing. e.g. all elements need not have same attributes. (In relational, we would need to have NULL values). Inconsistencies in the structure is NOT a problem.
Queries Relatively simple Complicated compared to Relational / SQL.
Ordering Fundamentally UNORDERED. e.g. Order by clause is required in SQL in case order is desired. Ordering is implied - e.g. the way the text is laid out in the XML document or in the Stream.
Implementation Mature - native implementation Add - on. (Underlying implementation is relational)

Well formed XML

  • Single root element
  • XML parser is used to validate XML.
  • DOM parser - treats XML as a document, SAX parser - treats XML as a stream.
  • Matched tags, proper nesting
  • Unique attributes within each element

Displaying XML

  • Since XML occurs a lot on the internet, a convenient way is required to display XML.
  • Use rule based languages such as CSS, XSL to translate XML to HTML.
  • XML document is fed to a CSS/XSL interpreter alongwith rules and a HTML document is generated.

Valid XML

  • Valid XML should adhere to requirements for well formed XML
  • Also should adhere to content-specific specifications : DTD/XSD
  • DTD is a grammar-like language for specifying elements, attributes, nesting, ordering, #occurrences
  • DTD/XSD offers the benefits of "typing" - can serve as a specification. Useful for data exchange.
  • Having no DTD offers flexibility of data.
  • DTD also can specify pointer types - an element can have an ID and another element can refer to it using an IDREF.

XSD

  • Like DTD, but more comprehensive.
  • XSD is written in XML itself.
  • Everything in DTD is string, in XSD e.g., we can specify that an attribute should be of a specific type.
  • Pointers can be typed, so we can enforce that a reference is made to a particular type and not just global like in DTD's
  • Also specify min/max constraints on elements

JSON

  • Like XML, JSON can be thought of as a data model - as an alternative to the relational data model.
  • Useful for semi-structured data
  • Designed originally for serializing data objects
  • JSON is human readable - often used for data interchange
  • JSON consists of base values, objects (label-value pairs), and arrays to hold either. Structure is recursive.
  • JSON vs Relational - JSON differs in how data is stored, schema is self-describing, ordering is present, query support is minimal unlike XQuery, XPath, XSLT, and is implemented typically in prog languages
  • JSON vs XML - XML more verbose/complex than JSON, like DTD, XSD, JSON has JSON schema.
  • XML model dont match with programming languages. Interface is clunky. This is called impedence mismatch - similar to Relational and prog languages.
  • JSON has a more direct mapping with prog languages.

Valid JSON

  • Adheres to basic structural requirements of JSON. (label-value pairs, arrays of values)
  • Can use JSON schema for semantic validation.


Relational Algebra

  • Relational algebra is the formal language that is the underpinning for implemented languages like SQL
  • Query (expression) on a set of results produces relation as a result.
  • Simplest query in RA is the name of the relation e.g. Student which produce the contents of the Student relation
  • Operators to filter, slice and combine relations.
  • Select operator : Sigma (cond) Relation name. Condition is a filter and the result will be a subset of the relation
  • Project operator : Select picks rows, project picks columns. Pi (col name) Relation Name.
  • To pick both rows and cols, combine the select and project operator. e.g. Project(sid,name) (Select(gpa > 3.7) student)
  • Cross Product : By default RA is based on sets and doesnt consider duplicates. But SQL uses multisets/bags and includes duplicates.

Cross product between two relations combines all attributes (columns) and every combination of tuples between the relation. m X n attrs and a X B tuples.