Tuesday, December 14, 2010

Something about converting XML to JSON

For the past few weeks, I have worked on several tasks, and over the next few blog posts I would like to talk about what I have learned. This time, I am going to talk about something tricky I ran into while converting XML to JSON.

There is no universal standard for converting XML to JSON, so everyone may follow their own conventions. For that reason, I will first explain the approach I used in my conversion function.
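To illustrate how much room there is, the same small element can be mapped to JSON in more than one reasonable way. Both shapes below are hypothetical conventions picked only for illustration (the "@" prefix and the "#text" key are assumptions, not a standard):

```javascript
// XML being represented: <book state="new">Lord of the Ring</book>

// Convention A (assumed): keep attributes under an "@" prefix and
// store the text content under a "#text" key.
var conventionA = { book: { "@state": "new", "#text": "Lord of the Ring" } };

// Convention B (assumed): drop attributes and map a text-only
// element straight to a string.
var conventionB = { book: "Lord of the Ring" };

console.log(JSON.stringify(conventionA));
console.log(JSON.stringify(conventionB));
```

Neither shape is more correct than the other; which one fits depends on what the code consuming the JSON expects.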

Basic Concept:
Basically, converting XML to JSON means parsing the XML, translating each node into a JSON object, and then nesting those objects according to the XML structure. The most common way to walk the parsed XML is a recursive function, which visits each level of nodes in turn. On each recursive call, the parser normally needs to handle three situations:

Take the following simple XML as an example:
  <book state="new">Lord of the Ring</book>
1. Attributes of an element. In this case, "state".
Only elements can have attributes, so the function needs to loop through and save all the attributes before it goes any further.

2. Elements with child nodes (nodeType = 1). In this case, "book" (it has a child node that contains the text "Lord of the Ring"). So create an object with the node name and start parsing all its child nodes.

3. Text of an element (nodeType = 3).
In this case, "Lord of the Ring". So simply return the node value.
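
The three situations above can be sketched as a small dispatch on nodeType. This is only an illustration under some assumptions: the plain objects are hypothetical stand-ins for DOM nodes (so the snippet runs without a browser), and describeNode is a made-up helper name, not part of my conversion function.

```javascript
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Hypothetical stand-ins for the DOM tree of:
//   <book state="new">Lord of the Ring</book>
var textNode = {
  nodeType: TEXT_NODE,
  nodeName: "#text",
  nodeValue: "Lord of the Ring",
  childNodes: []
};
var bookNode = {
  nodeType: ELEMENT_NODE,
  nodeName: "book",
  nodeValue: null,
  attributes: [{ nodeName: "state", nodeValue: "new" }],
  childNodes: [textNode]
};

// Walk a node recursively and record which of the three cases was hit.
function describeNode(node) {
  var notes = [];
  if (node.nodeType === ELEMENT_NODE) {
    // Case 1: save every attribute before going further.
    for (var j = 0; j < node.attributes.length; j++) {
      var attr = node.attributes[j];
      notes.push("attribute " + attr.nodeName + "=" + attr.nodeValue);
    }
    // Case 2: an element, so recurse into its child nodes.
    notes.push("element " + node.nodeName);
    for (var i = 0; i < node.childNodes.length; i++) {
      notes = notes.concat(describeNode(node.childNodes[i]));
    }
  } else if (node.nodeType === TEXT_NODE) {
    // Case 3: a text node, so just report its value.
    notes.push("text " + node.nodeValue);
  }
  return notes;
}

console.log(describeNode(bookNode));
```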

A little problem came up:
I followed these rules and my function appeared to work well. However, when I used it to parse a self-closed XML element, it failed. For example, when parsing the following XML:
  <book state="new" title="Lord of the Ring" />
Interesting discovery:
After some investigation, I found something very interesting. It looks like a self-closed XML element comes with a leading blank Text node before the element node, which means the parser gets a Text node with a value of " " before the Element node named "book". Since the parser does not continue parsing when the input is a Text node, it never reaches the element node. Therefore, I needed to add a conditional statement to take care of this situation.
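
To make the situation concrete, here is a rough sketch of the node sequence described above, together with one possible guard. The plain objects are hypothetical stand-ins for DOM nodes, and isBlankText is a helper name made up for this illustration. It shows a different way of handling the blank node than the condition used in my function.

```javascript
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Hypothetical stand-ins for what the parser hands over:
// a blank Text node first, then the <book ... /> element.
var nodes = [
  { nodeType: TEXT_NODE, nodeName: "#text", nodeValue: " " },
  { nodeType: ELEMENT_NODE, nodeName: "book", nodeValue: null }
];

// True for Text nodes that contain nothing but whitespace.
function isBlankText(node) {
  return node.nodeType === TEXT_NODE && /^\s*$/.test(node.nodeValue);
}

// Skip blank Text nodes so the element behind them is still reached.
var meaningful = nodes.filter(function (node) {
  return !isBlankText(node);
});

console.log(meaningful.map(function (node) {
  return node.nodeName;
}));
```

Filtering like this is only safe when whitespace-only text carries no meaning in the XML being converted.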

Finally, here is the xmlToJson conversion function I built. Hopefully, it will help other developers who are working on this subject.
function xmlToJson(xml) {
  var obj = {};

  // Element node: copy its attributes onto the result object.
  if (xml.nodeType === 1) {
    for (var j = 0; j < xml.attributes.length; j++) {
      var attribute = xml.attributes[j];
      obj[attribute.nodeName] = attribute.nodeValue;
    }
  }

  if (xml.hasChildNodes()) {
    for (var i = 0; i < xml.childNodes.length; i++) {
      var child = xml.childNodes[i];
      var name = child.nodeName;
      var isLastChild = (i === xml.childNodes.length - 1);

      // A Text node in the last position is treated as the text value of
      // this node. Every other child (including a blank Text node that
      // comes before an element) is converted recursively.
      if (child.nodeType !== 3 || !isLastChild) {
        if (typeof obj[name] === 'undefined') {
          // First child with this name: add it as a property of obj.
          obj[name] = xmlToJson(child);
        } else {
          // Repeated name: turn the property into an array (keeping the
          // existing value as its first item), then append the new value.
          if (!(obj[name] instanceof Array)) {
            obj[name] = [obj[name]];
          }
          obj[name].push(xmlToJson(child));
        }
      } else {
        // Trailing Text node: return its value directly. No object is
        // created for it, and whatever was collected in obj is dropped.
        return child.nodeValue;
      }
    }
  } else if (xml.nodeType === 3) {
    // A Text node reached through recursion, for example the blank one in
    // front of a self-closed element: return its value.
    return xml.nodeValue;
  }

  // Element without a trailing Text node: return the collected object.
  return obj;
}
