Is XML a database? The answer is no, it is not. XML is a hardware-independent markup language that can be used to send and store data across a distributed computing environment. XML documents are self-descriptive and use user-defined tags. Technically, XML defines a set of rules for encoding documents in a way that is both human- and machine-readable. XML is defined mainly by the W3C’s XML 1.0 specification, along with other free, open standards.
XML alone won’t comply with the ACID properties of a database. First, is it possible to store data in an XML file? Yes, but storage alone doesn’t serve the purpose of a database. We need an efficient way to model the real world and to capture, analyze, retrieve and update data.
Challenges in performing various operations
Consider a small amount of data, such as the configuration files of an application (XML is widely used there): XML will do the job. But when records are large, say more than 1 MB each, and fixed in length, an XML file adds a huge amount of overhead. In a database-backed application the data can run to gigabytes. Processing, say, a 100 GB XML file naively means loading it into memory, which requires a lot of extra techniques and technology to deal with.
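One such extra technique is streaming the file instead of loading it whole. The snippet below is only a minimal sketch using iterparse from Python's standard xml.etree.ElementTree module. The students/student layout, the sample names and the function name count_students are assumptions made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

def count_students(source):
    """Count <student> elements while reading the document incrementally."""
    count = 0
    # iterparse yields each element when its closing tag has been read
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "student":
            count += 1
            elem.clear()  # drop the element's children and text once processed
    return count

# Hypothetical sample data standing in for a much larger file
sample = io.BytesIO(
    b"<students>"
    b"<student><name>Asha</name></student>"
    b"<student><name>Ravi</name></student>"
    b"</students>"
)
total = count_students(sample)
```

Even with streaming, every byte of the file still has to be read and parsed, so this eases the memory problem but not the processing cost.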
XML, however, doesn’t really fit these requirements. Due to its nested tag structure, it is difficult to determine where in the file a certain value is stored (in terms of a byte offset into the file) without walking the entire document tree, at least up to the match. Consider 100 student records stored as one XML file each: to fetch the names of all hundred students, we would have to parse 100 XML files and then find the first name in each.
XML follows a tree structure with nested tags. Without parsing the tree it is difficult to retrieve the needed information from the file, so a lookup averages out to O(n) complexity. In a database the same lookup would be O(log n), because an index is used to calculate the offset, and reading the specific value at that offset is then O(1).
Now let us add a bit more complexity by adding an index to the XML. The index calculation for a select operation can then be reduced to O(log n), but retrieving the specific value from the file is still O(n). Insertion becomes more complex as well, because every new insertion into the XML would require rebuilding the entire index for the XML files.
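To make the difference concrete, here is a rough sketch that looks up one student both ways. It assumes a made-up students document and table, and uses SQLite (Python's built-in sqlite3 module) merely as a stand-in for "a database"; the function names find_name_xml and find_name_db are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data: 100 student records
records = [(i, f"student{i}") for i in range(1, 101)]

# XML side: build a document, then scan it to find one student
root = ET.Element("students")
for sid, name in records:
    student = ET.SubElement(root, "student", id=str(sid))
    ET.SubElement(student, "name").text = name

def find_name_xml(root, sid):
    """Walk the <student> elements one by one until the id matches: O(n)."""
    for student in root.iter("student"):
        if student.get("id") == str(sid):
            return student.findtext("name")
    return None

# Database side: the primary key gives SQLite a B-tree to search
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", records)

def find_name_db(conn, sid):
    """Look the row up through the primary key instead of scanning rows."""
    row = conn.execute("SELECT name FROM student WHERE id = ?", (sid,)).fetchone()
    return row[0] if row else None

xml_name = find_name_xml(root, 42)
db_name = find_name_db(conn, 42)
```

Both calls return the same name. The difference is in how much data each one has to touch to get there, which only becomes visible once the record count grows well beyond 100.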
Inserting is even worse. Relational databases do not guarantee row order, which means they can just append new rows, or overwrite any rows marked as ‘deleted’. This is extremely fast: the DB can just keep a pool of writable locations around; getting an entry from the pool is O(1) unless the pool is empty; worst case, the pool is empty and a new page has to be created, but this too is O(1).
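The pool of writable locations can be pictured with a toy model like the one below. This is a simplified sketch, not how any particular database engine is implemented; the class name SlotTable and its fields are invented for the example.

```python
class SlotTable:
    """Toy model of unordered row storage with a pool of free slots."""

    def __init__(self):
        self.slots = []  # row storage; None marks a deleted row
        self.free = []   # pool of reusable slot numbers

    def insert(self, row):
        if self.free:
            slot = self.free.pop()  # O(1): reuse a slot marked as deleted
            self.slots[slot] = row
        else:
            slot = len(self.slots)  # pool is empty: add a new slot at the end
            self.slots.append(row)  # amortized O(1)
        return slot

    def delete(self, slot):
        self.slots[slot] = None     # mark the row as deleted
        self.free.append(slot)      # the slot becomes writable again

table = SlotTable()
a = table.insert(("Asha",))
b = table.insert(("Ravi",))
table.delete(a)
c = table.insert(("Meena",))  # lands in the slot freed by the delete
```

Because row order does not matter, no existing row ever has to move to make room for a new one.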
When indexes come into play, things become even more interesting: typical relational-database indexes can be updated with relatively low complexity, say O(log n), but if you want to update an index over XML, you might have to rebuild the entire index. This also goes for updates, because updating, say, an element’s text content can change its size, which means all the XML that follows it has to shift. A relational database doesn’t have to touch the index at all if you update a non-indexed column; an XML database would have to rebuild the entire index for each update that changes the size of the updated XML node.
Complexity goes to the next level when it comes to maintaining the XML, for example when you want to delete or update schema-level information of a table: delete a field, add a field, or change a constraint on the values. Every XML file storing data might need to be updated to reflect the change, which would incur more cost and effort.
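The shifting problem is easy to see on raw bytes. In the sketch below, the two-record document and the idea of using a plain byte offset as a naive "index" are assumptions for illustration only.

```python
# Hypothetical two-record document held as raw bytes, as it would sit on disk
doc = (
    b"<students>"
    b"<student><name>Asha</name></student>"
    b"<student><name>Ravi</name></student>"
    b"</students>"
)

# A naive "index" entry: byte offset of the second student's name text
target = b"Ravi"
offset_before = doc.index(target)

# Update the first student's name to a longer value
updated = doc.replace(b"<name>Asha</name>", b"<name>Ashalata</name>", 1)
offset_after = updated.index(target)

# Every byte after the updated node has moved by this many positions
shift = offset_after - offset_before
```

The old offset now points at different bytes, so any index entry that referred to a position after the updated node is stale and has to be recomputed.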
Processing and storage overhead
When we store a value in an XML file, it is stored along with its tag, which means it requires (length of the tag × 2) bytes + 5 bytes for the angle brackets and slash of the opening and closing tags + n bytes to store the data. While processing the data, I have to parse the line, find my value, and then convert it to a binary value before it is loaded into memory for processing. With a database I need only n bytes to store my data, and as the data is stored in binary format, it just needs to be loaded and processed.
When we store data in an XML file, it requires more space than in a database and takes more time to process.
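As a quick worked example of that arithmetic, the sketch below compares the XML size of one integer field with a fixed-width binary encoding. It assumes ASCII tag names (one byte per character) and a 32-bit integer column; the tag name age, the sample value and the helper xml_field_size are made up for the example.

```python
import struct

def xml_field_size(tag, value):
    """Bytes needed for <tag>value</tag>: 2 * len(tag) + 5 + len(value)."""
    return 2 * len(tag) + 5 + len(str(value))

tag, value = "age", 1234567
xml_bytes = xml_field_size(tag, value)            # counted from the formula
actual = len(f"<{tag}>{value}</{tag}>".encode())  # measured on the real text
binary_bytes = len(struct.pack("<i", value))      # 32-bit little-endian integer
```

Here the XML form takes 18 bytes against 4 bytes for the binary integer, and the text still has to be converted with something like int() before it can be used in a computation.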
There is no doubt that you can query and retrieve information from an XML file; it can simply be done using XPath. But XML does not qualify as a database, for the simple reason that it doesn’t meet the requirements of the ACID properties, and it would take more cost and effort to handle and maintain a huge volume of data. XML stores data in a human-readable format, but we need data stored in a machine-readable format for faster computation, so the conversion would be an overhead.
XML is a markup language that can be used to interchange data among distributed computing systems, but if we start using it to store huge amounts of data, it will mean more CPU utilization due to the processing overhead, and higher physical storage and memory requirements.
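For instance, a simple XPath-style query might look like the sketch below. It uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath; the sample document and its element names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

xml_text = """
<students>
  <student id="1"><name>Asha</name><grade>A</grade></student>
  <student id="2"><name>Ravi</name><grade>B</grade></student>
  <student id="3"><name>Meena</name><grade>A</grade></student>
</students>
""".strip()

root = ET.fromstring(xml_text)

# Select every <student> that has a <grade> child whose text is "A",
# then read the <name> of each match
names = [s.findtext("name") for s in root.findall("./student[grade='A']")]
```

The query is convenient to write, but nothing here gives us transactions, isolation or durability, which is exactly what the ACID requirement is about.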
Is there something called an XML database system?
Interestingly, yes, XML databases do exist. What are they? An XML database is a data persistence software system that allows data to be specified, and sometimes stored, in XML format. This data can then be queried, transformed, exported and returned to a calling system. XML databases are a flavor of document-oriented databases, which are in turn a category of NoSQL database (meaning Not (only) SQL).
Let’s discuss the new trends and achievements in this area. Recently, companies like Oracle and IBM have been using XML and SQL to create hybrid database systems, and these systems are reported to yield 50% higher throughput compared to relational database systems.
To be continued...