Wednesday, November 19, 2008

Thinking Cap questions: The Lame-Duck edition

First, a link:

Here is a link to the "Chinese room" argument: http://en.wikipedia.org/wiki/Chinese_Room

-------------------

Here are some points to ponder on the recent topics:

0 (don't need to answer on blog): We talked a lot about syntax and semantics. Using English as the example, think about (a) whether
an ungrammatical sentence can have semantics, and (b) whether a sentence with "no meaning" can be grammatically correct.
Consider the famous example: "Colorless green ideas sleep furiously."  (Check out
http://rakaposhi.eas.asu.edu/f06-cse471-mailarchive/msg00090.html for the history of that sentence...)


1. We talked about the fact that XML is a syntactic standard and doesn't have semantics. Do relational databases have
semantics? (And if so, won't a conversion of a relational database to XML form preserve those semantics?)

Consider the database being used both by someone who knows and understands its schema and by a lay user who
doesn't.

[In thinking about semantics, it is useful to think in terms of the "worlds" that are consistent with a data/knowledge base.
You would say that a formal sentence has semantics if you can enumerate the worlds in which it is true (or, alternately, if
given a completely specified world, you can tell whether or not that sentence is "true" in that world). As you add more and more
sentences to a knowledge base, you constrain the number of worlds that are consistent with it.]
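The "worlds" view above can be made concrete for propositional logic. The sketch below is only an illustration: the symbols P, Q, R and the two sentences are made up, and each sentence is modeled as a Python predicate over a complete truth assignment (a world). It enumerates all worlds and keeps those in which every sentence of the knowledge base is true, so you can watch the count shrink as sentences are added.

```python
from itertools import product

# Hypothetical propositional symbols; a "world" assigns True/False to each one.
SYMBOLS = ["P", "Q", "R"]

def all_worlds(symbols):
    """Enumerate every complete truth assignment over the symbols."""
    for values in product([True, False], repeat=len(symbols)):
        yield dict(zip(symbols, values))

def consistent_worlds(kb, symbols=SYMBOLS):
    """Return the worlds in which every sentence of the knowledge base is true.
    Each sentence is modeled as a predicate that takes a world and returns a bool."""
    return [w for w in all_worlds(symbols) if all(s(w) for s in kb)]

kb = []
print(len(consistent_worlds(kb)))            # 8: no constraints yet
kb.append(lambda w: w["P"] or w["Q"])        # P or Q
print(len(consistent_worlds(kb)))            # 6
kb.append(lambda w: (not w["P"]) or w["R"])  # P implies R
print(len(consistent_worlds(kb)))            # 4
```

Each added sentence can only remove worlds, never add them, which is exactly the "constraining" effect described in the note.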


2. Here is a question that one of the students asked after the class: We said XML can be viewed as either ordered or unordered.
From a DB perspective, we would like to see it as unordered, and from an IR perspective we would like to see it as ordered.
The question is: what, if any, is the disadvantage of assuming that XML is always ordered? More specifically, what is the loss on the
database side if we unilaterally decide that XML is ordered, whether or not it was intended that way?



14 comments:

Ravi Gummadi said...

1. Yes, RDBMSs do have semantics. The schema for each table forms the grammar used to check the validity of that table. So when we convert a table to XML, if we throw away the schema and just convert the raw data to XML, we lose the semantics. But if we convert the schema to a corresponding XML DTD and use it to verify correctness, I think we can preserve the semantics.
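One way to read "convert the schema to a DTD" is sketched below. The table name, columns, and rows are hypothetical, and the element-centric mapping (one element per row, one #PCDATA child per column) is just one common choice, not the only one. It uses Python's standard xml.etree.ElementTree for the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: the name, columns, and rows are made up for illustration.
TABLE = "employee"
COLUMNS = ["name", "age"]
ROWS = [("Alice", 34), ("Bob", 29)]

def schema_to_dtd(table, columns):
    """Element-centric DTD: a root holding rows, each row holding one #PCDATA child per column."""
    lines = [f"<!ELEMENT {table}s ({table}*)>",
             f"<!ELEMENT {table} ({', '.join(columns)})>"]
    lines += [f"<!ELEMENT {col} (#PCDATA)>" for col in columns]
    return "\n".join(lines)

def table_to_xml(table, columns, rows):
    """Serialize the rows following the same element-centric layout as the DTD."""
    root = ET.Element(table + "s")
    for row in rows:
        record = ET.SubElement(root, table)
        for col, val in zip(columns, row):
            ET.SubElement(record, col).text = str(val)  # element text must be a string
    return ET.tostring(root, encoding="unicode")

print(schema_to_dtd(TABLE, COLUMNS))
print(table_to_xml(TABLE, COLUMNS, ROWS))
```

Note that a DTD can only declare element content as #PCDATA, so it preserves the structure of the table but not the fact that age is an integer; XML Schema would be needed to express that.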

2. Every ordering of the tuples in a table corresponds to the same table instance. But if we strictly treat XML as an ordered file, a given XML document matches only one ordering of the tuples, which does not reflect the table. So treating XML as unordered is really useful when we are trying to validate a given XML document against a database table.
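The validation point above can be illustrated with a small sketch. It assumes the XML has already been parsed into a list of tuples (the tuples here are made up) and compares them with the table either as an ordered list or as a multiset, using collections.Counter so that duplicate rows (SQL tables are bags) are still counted correctly.

```python
from collections import Counter
from itertools import permutations

# Hypothetical table instance; relationally it is a (multi)set of tuples.
table = [("a", 1), ("b", 2), ("c", 3)]

def matches_ordered(xml_rows, table_rows):
    """Strict view: the XML rows must appear in exactly the table's listing order."""
    return list(xml_rows) == list(table_rows)

def matches_unordered(xml_rows, table_rows):
    """Relational view: compare as multisets, ignoring order."""
    return Counter(xml_rows) == Counter(table_rows)

orderings = list(permutations(table))
print(len(orderings))                                       # 6 = 3! possible serializations
print(sum(matches_ordered(o, table) for o in orderings))    # 1
print(sum(matches_unordered(o, table) for o in orderings))  # 6
```

Under the ordered view only one of the 3! serializations validates, even though all of them describe the same table.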

Anonymous said...

1. I think the semantics of a relational database is in the eyes of the user. In other words, the understanding of the meaning of a given attribute, tuple, or table depends on the knowledge the user has about the underlying structure.
Additionally, converting a relational database to XML will preserve the semantics if there is an adequate procedure to translate the meaning of each element from one schema into the other.

2. If we decide that XML is ordered, the big loss on the database side is the DB's full potential to present the stored (raw) data in multiple ways. Plus, assuming that XML is always ordered could create conflicts if the data is incomplete or completely absent.

Girish said...

2. When we convert an XML document to a database model, we generally store the data extracted from it in tables that follow a predefined schema. However, this structured representation doesn't preserve the XML document's original structure, i.e., the tag ordering in the XML may be lost. Therefore, considering XML as ordered may not help in preserving the structure of a database model. In other words, it is better to consider XML as unordered.

1. In order to preserve the semantics while converting an RDBMS to XML, we can convert the schema to an XML DTD in which the columns of the RDB are represented as PCDATA or attributes. However, the CDATA sections in the DTD might not be preserved.
I stand to be corrected, though.
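The "columns as attributes" option mentioned above can be sketched with xml.etree.ElementTree. The row is hypothetical. In a DTD such attributes would typically be declared CDATA, i.e. plain character data, and the round trip below shows the consequence: every value comes back as a string, so the integer type of a column is lost unless the schema is carried along separately.

```python
import xml.etree.ElementTree as ET

# Hypothetical row from an "employee" table with an integer column.
row = {"name": "Alice", "age": 34}

# Attribute-centric mapping: each column becomes an attribute of the row element.
# ElementTree requires attribute values to be strings, so we convert explicitly.
elem = ET.Element("employee", {k: str(v) for k, v in row.items()})
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)

# Parsing it back: every attribute value is a string, so "age" is no longer an int.
parsed = ET.fromstring(xml_text)
print(type(parsed.get("age")).__name__)  # str
```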

Shruti Gaur said...

A relation in an RDBMS is essentially a mapping between fields, much like the mapping between variables in a function, or like propositions in logic. Each tuple of a relation represents one such propositional statement. E.g., if we have a relation with the attributes [father, son] and the tuples [a,b], [b,c], [p,q], then given the query "find x such that x is father(father(c))", the database can find a as the correct answer. So we do know which worlds are possible and can tell whether a given statement is true or false.
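The father(father(c)) query above is a self-join. Here is a minimal sketch using Python's built-in sqlite3 module; the table name parent is an assumption (the comment doesn't name the relation), while the tuples are the ones from the comment.

```python
import sqlite3

# In-memory table holding the example tuples [father, son] from the comment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (father TEXT, son TEXT)")
conn.executemany("INSERT INTO parent VALUES (?, ?)",
                 [("a", "b"), ("b", "c"), ("p", "q")])

# father(father(c)): join the relation with itself, matching f1.son to f2.father.
query = """
    SELECT f1.father
    FROM parent AS f1
    JOIN parent AS f2 ON f1.son = f2.father
    WHERE f2.son = ?
"""
grandfathers = [r[0] for r in conn.execute(query, ("c",))]
print(grandfathers)  # ['a']
```

The engine finds a without knowing anything about what "father" means in the real world, which is the commenter's point.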

As another illustration, say the attributes are [a1,a2,a3,a4]. Then if a tuple [x1,x2,x3,x4] is stored in the table, we know that if a1=x1, a2=x2 and a3=x3, then a4=x4 (assuming a1, a2 and a3 together determine a4, e.g. because they form a key).
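The "if a1, a2, a3 match then a4 matches" inference is a functional dependency. The helper below is a hypothetical illustration, not a DBMS feature: it checks whether such a dependency holds in a given table instance, with attributes referred to by column index and made-up tuple values.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the columns at indices `lhs` functionally determine the columns
    at indices `rhs` in this table instance: equal lhs values must imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical instances over attributes [a1, a2, a3, a4] (indices 0..3).
ok = [("x1", "x2", "x3", "x4"), ("y1", "y2", "y3", "y4")]
bad = ok + [("x1", "x2", "x3", "z4")]
print(fd_holds(ok, [0, 1, 2], [3]))   # True
print(fd_holds(bad, [0, 1, 2], [3]))  # False
```

Checking one instance can only refute a dependency; that it holds for all future data has to come from the schema (e.g. a key constraint).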

We can do this even without knowing what the values of the attributes are and what they mean in the real world.

We can retain the data types and (to some extent) the relationships among the data values in XML, but there is no mechanism to do joins and make logical inferences.

One might argue that the semantics are in the eye of the beholder even in relational DBs, because the E-R diagram is in our head. So we might not know whether relationships are 1:1, m:1, 1:m, etc., but we can still do such basic inferences. Absurd inferences are also possible: if someone decides to join the age (int) column of table1 with the serial no (int) column of table2, there is no way of stopping that without any knowledge of the schema. If we have the schema, the set of possible worlds is even more constrained and the inferences are more accurate.
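The "absurd join" above is easy to reproduce. In this sketch (Python's sqlite3; the table and column names follow the comment loosely and the data is made up) the engine happily joins age with serial_no because both are integers; nothing in the stored data says the result is meaningless.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; both join columns are plain integers.
conn.execute("CREATE TABLE table1 (name TEXT, age INTEGER)")
conn.execute("CREATE TABLE table2 (serial_no INTEGER, item TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [("Alice", 3), ("Bob", 41)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(3, "stapler"), (7, "lamp")])

# Type-compatible columns, so the query is accepted even though it makes no sense.
rows = conn.execute(
    "SELECT t1.name, t2.item FROM table1 t1 JOIN table2 t2 ON t1.age = t2.serial_no"
).fetchall()
print(rows)  # [('Alice', 'stapler')]
```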

Shruti Gaur said...

The above was a response to Q1.

No blog name said...

2. We considered "unordered" to mean not ordered by a specific value (like an object ID), but the data in an XML file is still ordered (for instance, in the order in which we entered it). The same is true for a database. Furthermore, data in a DB is often stored in a tree (a B+ tree). If a DB stores data in a tree and an XML document is nothing but a tree, what is the difference? In my opinion, "unordered" doesn't exist. It is more like "ordered, but we don't know how" or "ordered in a way that doesn't make any sense" (but still ordered).

Far more important is how the DBMS retrieves data from the underlying DB. If a relational database uses trees to store data, we could use such a DBMS on an XML file to retrieve the data just as efficiently, especially if we use techniques like indexing.

With respect to the question, the IR perspective deals with how data is returned and ordered by the DBMS, while the DB perspective deals with how data is stored (on the hard drive).

So, what is the disadvantage of assuming that XML is always ordered? Given my assumption that there is no real "unordered", and that a DB doesn't necessarily want "unordered" data (why should it?): no disadvantage, since how the data is retrieved (for IR) depends on the DBMS.
And what is the loss on the DB side? No loss, since all data always has to be ordered in some specific way to be retrievable. The real question is how efficient the ordering of the data is, and how efficient the retrieval is.

Anupam said...

RDBMSs have no semantics whatsoever. A database schema (like XML DTD) just allows for syntax parsing, semantic interpretation of the meaning (which involves model checking against the background knowledge). For example, a tuple (make = Honda, model = Civic, price = $20000) is supposed to mean "there exists a car manufactured by Honda, of model Civic, priced at $20000". The tuple and the schema do not imply this meaning anywhere. We can make sense of it only because we know that a Honda Civic is a car, that $ denotes a currency, etc.

The closest databases come to semantics is deductive databases, which are equipped with an inference mechanism.
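To give a feel for the "inference mechanism" of a deductive database, here is a minimal sketch of naive bottom-up evaluation of two Datalog-style rules defining ancestor from the father facts used earlier in the thread. It is an illustration of the idea, not the evaluation strategy of any particular deductive database system.

```python
# Base facts: (x, y) means x is the father of y (tuples from the earlier example).
father = {("a", "b"), ("b", "c"), ("p", "q")}

def ancestors(base):
    """Naive bottom-up evaluation of the rules
         ancestor(X, Y) :- father(X, Y).
         ancestor(X, Z) :- father(X, Y), ancestor(Y, Z).
    Apply the rules repeatedly until no new facts appear (a fixpoint)."""
    derived = set(base)
    while True:
        new = {(x, z) for (x, y) in base for (y2, z) in derived if y == y2}
        if new <= derived:
            return derived
        derived |= new

print(sorted(ancestors(father)))
# [('a', 'b'), ('a', 'c'), ('b', 'c'), ('p', 'q')]
```

The derived fact ('a', 'c') is stored nowhere; it follows from the rules, which is what distinguishes a deductive database from plain stored tuples.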

Anupam said...

I made a typing error; I meant to say:

A database schema (like XML DTD) just allows for syntax parsing, "BUT NO" semantic interpretation of the meaning (which involves model checking against the background knowledge)...

Radhika Nair said...

Relational databases can be thought of as having semantics, but again it depends on how the user perceives them (as many have pointed out in the blog). If we accept the idea that relational databases have semantics, then the semantics can be retained while converting to XML if we can establish a mapping between the XML schema and the database schema, in terms of the "worlds" we have been talking about.

Dejun Yang said...

2. One problem with assuming XML is ordered is that, since a DB table is unordered, there can be multiple ordered XML representations of the same table. This makes comparing two DBs via their XML more complicated.
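One way around the comparison problem above is to compare a canonical form in which child order is ignored. The sketch below (xml.etree.ElementTree; the documents are hypothetical) builds a nested tuple for each element with its children sorted. It deliberately treats order as irrelevant at every level, including columns within a row, and it ignores tail text, so it only suits data-style XML, not mixed-content documents.

```python
import xml.etree.ElementTree as ET

def canonical(elem):
    """Order-insensitive form of an element: tag, sorted attributes, stripped text,
    and the recursively canonicalized children, sorted."""
    return (elem.tag,
            tuple(sorted(elem.attrib.items())),
            (elem.text or "").strip(),
            tuple(sorted(canonical(child) for child in elem)))

# Two hypothetical XML exports of the same table, with rows in different orders.
doc1 = "<t><row><id>1</id></row><row><id>2</id></row></t>"
doc2 = "<t><row><id>2</id></row><row><id>1</id></row></t>"

a, b = ET.fromstring(doc1), ET.fromstring(doc2)
print(ET.tostring(a) == ET.tostring(b))  # False: ordered comparison
print(canonical(a) == canonical(b))      # True: order-insensitive comparison
```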

Mithila said...

1. RDF is the framework being used to create the Semantic Web. If we look at a database, we see that each relation in the DB can be reduced to RDF triples as follows:

- each record in the DB is an RDF node
- the field (column) name is the RDF property Type
- the record field (table cell) is a value

Now, RDF's serialization format is nothing but its syntax in XML, and hence XML seems to be a good way to represent relational DBs.
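The record/column/cell mapping listed above can be sketched as follows. The base URI (example.org is reserved for examples), the table, and the choice of an "id" column as the record identifier are all assumptions; the output is plain (subject, property, value) tuples, and serializing them as RDF/XML would be a separate step done with an RDF library.

```python
# Hypothetical base URI; record -> subject node, column name -> property, cell -> value.
BASE = "http://example.org/db/"

def table_to_triples(table, key, rows):
    """Turn each row (a dict) into (subject, property, value) triples,
    using the `key` column to name the record's node."""
    triples = []
    for row in rows:
        subject = f"{BASE}{table}/{row[key]}"
        for column, value in row.items():
            if column != key:
                triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples

rows = [{"id": 1, "make": "Honda", "model": "Civic"}]
for t in table_to_triples("car", "id", rows):
    print(t)
```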

Check out the link: http://www.w3.org/DesignIssues/RDB-RDF.html