Difference between revisions of "Databases"

From Suhrid.net Wiki
Jump to navigationJump to search
Line 101: Line 101:
 
* Project operator : Select picks rows, project picks columns. Pi (col name) Relation Name.
 
* Project operator : Select picks rows, project picks columns. Pi (col name) Relation Name.
 
* To pick both rows and cols, combine the select and project operator. e.g. Project(sid,name) (Select(gpa > 3.7) student)  
 
* To pick both rows and cols, combine the select and project operator. e.g. Project(sid,name) (Select(gpa > 3.7) student)  
* Cross Product : By default RA is based on sets and doesnt consider duplicates. But SQL uses multisets/bags and includes duplicates.
+
* Cross Product : By default RA is based on sets and doesnt consider duplicates. But SQL uses multisets/bags and includes duplicates - Cross product between two relations combines all attributes (columns) and every combination of tuples between the relation. m X n attrs and a X B tuples.
Cross product between two relations combines all attributes (columns) and every combination of tuples between the relation. m X n attrs and a X B tuples.
+
* Natural Join : Is like cross product but enforces equality on all attributes with the same name. Eliminate one copy of duplicate attributes (same attribute from diff table). So if we use the nat. join operator we dont need to specify the key attributes need to be equal condition. Doesnt provide any additional expressive power, but is convenient notationally.

Revision as of 23:33, 17 January 2014

Relational Databases

  • Set of named relations (tables)
  • Each relation has a set of named attributes (columns)
  • Each tuple (row) has a value for each attribute
  • Each attribute has a type (or domain)
  • All queries return relations (tables), so it can said to be compositional or closed.

XML Data

  • Alternative to relational model to store data
  • Standard for data representation and exchange
  • Document format similar to HTML, however tags represent content instead of formatting.
  • XML consists of tagged elements which can be nested, attributes of elements and the text (values/data)
  • XML document can be thought of as a tree with values/text forming the leaves.

Relational vs XML

Relational XML
Structure Tables Hierarchical/Tree
Schema Fixed in advance Flexible, self-describing. e.g. all elements need not have same attributes. (In relational, we would need to have NULL values). Inconsistencies in the structure is NOT a problem.
Queries Relatively simple Complicated compared to Relational / SQL.
Ordering Fundamentally UNORDERED. e.g. Order by clause is required in SQL in case order is desired. Ordering is implied - e.g. the way the text is laid out in the XML document or in the Stream.
Implementation Mature - native implementation Add - on. (Underlying implementation is relational)

Well formed XML

  • Single root element
  • XML parser is used to validate XML.
  • DOM parser - treats XML as a document, SAX parser - treats XML as a stream.
  • Matched tags, proper nesting
  • Unique attributes within each element

Displaying XML

  • Since XML occurs a lot on the internet, a convenient way is required to display XML.
  • Use rule based languages such as CSS, XSL to translate XML to HTML.
  • XML document is fed to a CSS/XSL interpreter alongwith rules and a HTML document is generated.

Valid XML

  • Valid XML should adhere to requirements for well formed XML
  • Also should adhere to content-specific specifications : DTD/XSD
  • DTD is a grammar-like language for specifying elements, attributes, nesting, ordering, #occurrences
  • DTD/XSD offers the benefits of "typing" - can serve as a specification. Useful for data exchange.
  • Having no DTD offers flexibility of data.
  • DTD also can specify pointer types - an element can have an ID and another element can refer to it using an IDREF.

XSD

  • Like DTD, but more comprehensive.
  • XSD is written in XML itself.
  • Everything in DTD is string, in XSD e.g., we can specify that an attribute should be of a specific type.
  • Pointers can be typed, so we can enforce that a reference is made to a particular type and not just global like in DTD's
  • Also specify min/max constraints on elements

JSON

  • Like XML, JSON can be thought of as a data model - as an alternative to the relational data model.
  • Useful for semi-structured data
  • Designed originally for serializing data objects
  • JSON is human readable - often used for data interchange
  • JSON consists of base values, objects (label-value pairs), and arrays to hold either. Structure is recursive.
  • JSON vs Relational - JSON differs in how data is stored, schema is self-describing, ordering is present, query support is minimal unlike XQuery, XPath, XSLT, and is implemented typically in prog languages
  • JSON vs XML - XML more verbose/complex than JSON, like DTD, XSD, JSON has JSON schema.
  • XML model dont match with programming languages. Interface is clunky. This is called impedence mismatch - similar to Relational and prog languages.
  • JSON has a more direct mapping with prog languages.

Valid JSON

  • Adheres to basic structural requirements of JSON. (label-value pairs, arrays of values)
  • Can use JSON schema for semantic validation.


Relational Algebra

  • Relational algebra is the formal language that is the underpinning for implemented languages like SQL
  • Query (expression) on a set of results produces relation as a result.
  • Simplest query in RA is the name of the relation e.g. Student which produce the contents of the Student relation
  • Operators to filter, slice and combine relations.
  • Select operator : Sigma (cond) Relation name. Condition is a filter and the result will be a subset of the relation
  • Project operator : Select picks rows, project picks columns. Pi (col name) Relation Name.
  • To pick both rows and cols, combine the select and project operator. e.g. Project(sid,name) (Select(gpa > 3.7) student)
  • Cross Product : By default RA is based on sets and doesnt consider duplicates. But SQL uses multisets/bags and includes duplicates - Cross product between two relations combines all attributes (columns) and every combination of tuples between the relation. m X n attrs and a X B tuples.
  • Natural Join : Is like cross product but enforces equality on all attributes with the same name. Eliminate one copy of duplicate attributes (same attribute from diff table). So if we use the nat. join operator we dont need to specify the key attributes need to be equal condition. Doesnt provide any additional expressive power, but is convenient notationally.