Databases
From Suhrid.net Wiki
Jump to navigationJump to searchContents
Relational Databases
- Set of named relations (tables)
- Each relation has a set of named attributes (columns)
- Each tuple (row) has a value for each attribute
- Each attribute has a type (or domain)
- All queries return relations (tables), so it can said to be compositional or closed.
XML Data
- Alternative to relational model to store data
- Standard for data representation and exchange
- Document format similar to HTML, however tags represent content instead of formatting.
- XML consists of tagged elements which can be nested, attributes of elements and the text (values/data)
- XML document can be thought of as a tree with values/text forming the leaves.
Relational vs XML
| Relational | XML | |
|---|---|---|
| Structure | Tables | Hierarchical/Tree | 
| Schema | Fixed in advance | Flexible, self-describing. e.g. all elements need not have same attributes. (In relational, we would need to have NULL values). Inconsistencies in the structure is NOT a problem. | 
| Queries | Relatively simple | Complicated compared to Relational / SQL. | 
| Ordering | Fundamentally UNORDERED. e.g. Order by clause is required in SQL in case order is desired. | Ordering is implied - e.g. the way the text is laid out in the XML document or in the Stream. | 
| Implementation | Mature - native implementation | Add - on. (Underlying implementation is relational) | 
Well formed XML
- Single root element
- XML parser is used to validate XML.
- DOM parser - treats XML as a document, SAX parser - treats XML as a stream.
- Matched tags, proper nesting
- Unique attributes within each element
Displaying XML
- Since XML occurs a lot on the internet, a convenient way is required to display XML.
- Use rule based languages such as CSS, XSL to translate XML to HTML.
- XML document is fed to a CSS/XSL interpreter alongwith rules and a HTML document is generated.
Valid XML
- Valid XML should adhere to requirements for well formed XML
- Also should adhere to content-specific specifications : DTD/XSD
- DTD is a grammar-like language for specifying elements, attributes, nesting, ordering, #occurrences
- DTD/XSD offers the benefits of "typing" - can serve as a specification. Useful for data exchange.
- Having no DTD offers flexibility of data.
- DTD also can specify pointer types - an element can have an ID and another element can refer to it using an IDREF.
XSD
- Like DTD, but more comprehensive.
- XSD is written in XML itself.
- Everything in DTD is string, in XSD e.g., we can specify that an attribute should be of a specific type.
- Pointers can be typed, so we can enforce that a reference is made to a particular type and not just global like in DTD's
- Also specify min/max constraints on elements
JSON
- Like XML, JSON can be thought of as a data model - as an alternative to the relational data model.
- Useful for semi-structured data
- Designed originally for serializing data objects
- JSON is human readable - often used for data interchange
- JSON consists of base values, objects (label-value pairs), and arrays to hold either. Structure is recursive.
- JSON vs Relational - JSON differs in how data is stored, schema is self-describing, ordering is present, query support is minimal unlike XQuery, XPath, XSLT, and is implemented typically in prog languages
- JSON vs XML - XML more verbose/complex than JSON, like DTD, XSD, JSON has JSON schema.
- XML model dont match with programming languages. Interface is clunky. This is called impedence mismatch - similar to Relational and prog languages.
- JSON has a more direct mapping with prog languages.
Valid JSON
- Adheres to basic structural requirements of JSON. (label-value pairs, arrays of values)
- Can use JSON schema for semantic validation.
Relational Algebra
- Relational algebra is the formal language that is the underpinning for implemented languages like SQL
- Query (expression) on a set of results produces relation as a result.
- Simplest query in RA is the name of the relation e.g. Student which produce the contents of the Student relation
- Operators to filter, slice and combine relations.
- Select operator : Sigma (cond) Relation name. Condition is a filter and the result will be a subset of the relation
- Project operator : Select picks rows, project picks columns. Pi (col name) Relation Name.
- To pick both rows and cols, combine the select and project operator. e.g. Project(sid,name) (Select(gpa > 3.7) student)
- Cross Product : By default RA is based on sets and doesnt consider duplicates. But SQL uses multisets/bags and includes duplicates - Cross product between two relations combines all attributes (columns) and every combination of tuples between the relation. m X n attrs and a X B tuples.
- Natural Join : Is like cross product but enforces equality on all attributes with the same name. Eliminate one copy of duplicate attributes (same attribute from diff table). So if we use the nat. join operator we dont need to specify the key attributes need to be equal condition. Doesnt provide any additional expressive power, but is convenient notationally.
- Theta Join : A convenience like natural join. Basically apply the theta condition to the cross product of two relations. This is the "join" that is implemented in most DBMS.
- Union operator : Used to produce a list.
- Difference operator : Here but not there.
- Intersection : Both here and there.
- Rename operator : Necessary to express certain queries. Takes a result of a schema and calls it something. This name can be used later on. Useful for unifying schemas for set operators (Relational Algebra) rule and for disambiguations in self-joins (joins on the same relation).
SQL
- Two parts - DDL (create,drop) DML (query and modify)
- SQL is declarative - what, not how. Therefore, query optimizer is very important.
- Selecting from multiple tables is a join, but unlike relational algebra, we need to specify the join condition manually.
- e.g. select a from person, city where person.lives = city.name
