We tried the above with MSXML 4.0 SP1 and it worked.
Here is the drill (try it out yourself):
1.) Start Notepad and type the following text:
<Item>
<Name>ßetaTrack</Name>
<Price CountryCode="1">$120.00</Price>
<Price CountryCode="2">€122.372</Price>
<Price CountryCode="3">£77.0752</Price>
<Price CountryCode="4">¥14,224.74</Price>
</Item>
Save the document as c:\item.xml, then open the XML file in Internet Explorer. You should see the following error:
An invalid character was found in text content. Error processing resource 'file:///C:/item.xml'. Line 2, Position 8
2.) Add the following XML declaration as the very first line of the document:
<?xml version="1.0" encoding="UTF-16"?>
Save the document and hit refresh in the browser to reload c:\item.xml. You should now get the following error message:
Switch from current encoding to specified encoding not supported. Error processing resource 'file:///C:/item.xml'. Line 1, Position 40
Here is what happened:
In step 1 we created an XML document that contained special characters (€, £, ¥ and ß). By default, Notepad saves documents with ANSI encoding (the system code page, typically Windows-1252), which stores each of these characters as a single byte above 127. The file has no byte-order mark and no encoding declaration, so when MSXML loaded it and looked at the first bytes, it assumed UTF-8, the default the XML specification prescribes in that case. The ANSI byte for ß is not a valid UTF-8 sequence, so as soon as MSXML reached the ß character it raised an error and the document load failed.
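The byte-level reason for the step 1 failure can be checked outside MSXML. The following is a minimal Python sketch, not MSXML's own logic; it assumes the ANSI code page is Windows-1252, and the helper name decodes_as_utf8 is made up for this illustration.

```python
# The line from item.xml that triggers the error.
line = "<Name>ßetaTrack</Name>"

# What Notepad writes when saving as ANSI (Windows-1252 is an assumption here).
ansi_bytes = line.encode("cp1252")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "<Name>" is six bytes long, so index 6 holds the single byte used for ß.
print(ansi_bytes[6:7].hex())
print(decodes_as_utf8(ansi_bytes))            # ANSI bytes read as UTF-8
print(decodes_as_utf8(line.encode("utf-8")))  # the same text saved as UTF-8
```

The ANSI byte for ß looks like the start of a two-byte UTF-8 sequence, but the next byte (the letter e) is not a valid continuation byte, which is why a parser that assumes UTF-8 rejects the file.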
In step 2 we added an XML declaration specifying the encoding as UTF-16, but still saved the file as ANSI (the default encoding Notepad uses when saving). When MSXML loaded this file, the first bytes once again told it that the document used a single-byte, ASCII-compatible encoding. It then read the encoding="UTF-16" attribute, which contradicts what it had already detected, so it raised the "Switch from..." error and the document load failed.
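The mismatch in step 2 is between what the first bytes of the file say and what the declaration claims. Here is a rough Python sketch of that check. The functions sniff_family and conflicts are hypothetical names for this illustration, the detection rules are simplified (UTF-32 is ignored), and this is not a description of MSXML's actual implementation.

```python
import codecs


def sniff_family(data: bytes) -> str:
    """Guess the encoding family from the first bytes (simplified, hypothetical)."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # UTF-16 without a BOM: "<?" shows up with zero bytes in between.
    if data.startswith((b"<\x00?\x00", b"\x00<\x00?")):
        return "utf-16"
    # Anything else is treated as a single-byte, ASCII-compatible encoding.
    return "single-byte"


def conflicts(data: bytes, declared: str) -> bool:
    """True if the declared encoding cannot match the detected family."""
    declared_family = "utf-16" if declared.lower().startswith("utf-16") else "single-byte"
    return declared_family != sniff_family(data)


declaration = '<?xml version="1.0" encoding="UTF-16"?>'

# Step 2: the declaration says UTF-16, but Notepad saved the file as ANSI.
saved_as_ansi = declaration.encode("cp1252")
# The fix: Notepad's "Unicode" option, i.e. UTF-16 little-endian with a BOM.
saved_as_unicode = codecs.BOM_UTF16_LE + declaration.encode("utf-16-le")

print(conflicts(saved_as_ansi, "UTF-16"))
print(conflicts(saved_as_unicode, "UTF-16"))
```

A parser can read the declaration itself either way because it contains only ASCII characters. It cannot switch from a one-byte-per-character reading to a two-byte one partway through the file.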
To fix this problem, open item.xml in Notepad, make sure the XML declaration line with encoding="UTF-16" is still present, select File | Save As, and select Unicode from the Encoding list in the Save As dialog. Overwrite the file and hit refresh in the browser; this time the browser should load the file correctly. Notepad's Unicode option writes UTF-16 (little-endian) with a byte-order mark (the bytes FF FE) at the start of the file. MSXML detects UTF-16 from those first bytes, the declaration agrees with what it detected, and everything works fine after that.
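To see what the Unicode save produces, here is a small Python sketch that writes the same kind of file and parses it back. It uses Python's standard xml.etree.ElementTree parser rather than MSXML, writes to a temporary directory instead of c:\, and uses a shortened copy of item.xml; save_as_notepad_unicode is a hypothetical helper name.

```python
import codecs
import os
import tempfile
import xml.etree.ElementTree as ET

# Shortened copy of item.xml, with the UTF-16 declaration in place.
ITEM_XML = (
    '<?xml version="1.0" encoding="UTF-16"?>\n'
    "<Item>\n"
    "<Name>ßetaTrack</Name>\n"
    '<Price CountryCode="2">€122.372</Price>\n'
    "</Item>\n"
)


def save_as_notepad_unicode(path: str, text: str) -> None:
    """Write UTF-16 little-endian with a byte-order mark (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))


path = os.path.join(tempfile.mkdtemp(), "item.xml")
save_as_notepad_unicode(path, ITEM_XML)

# The first two bytes are the byte-order mark the parser uses for detection.
with open(path, "rb") as f:
    first_bytes = f.read(2)

root = ET.parse(path).getroot()
print(first_bytes.hex(), root.findtext("Name"), root.findtext("Price"))
```

Here the byte-order mark and the declaration agree, so the parser has no reason to switch encodings and the special characters survive the round trip.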
So, we have c:\item.xml saved with Unicode encoding (in Notepad) as below:
<?xml version="1.0" encoding="UTF-16"?>
<Item>
<Name>ßetaTrack</Name>
<Price CountryCode="1">$120.00</Price>
<Price CountryCode="2">€122.372</Price>
<Price CountryCode="3">£77.0752</Price>
<Price CountryCode="4">¥14,224.74</Price>
</Item>
Let's say we have one more file called main.xml as below:
<?xml version="1.0" encoding="UTF-8"?>
<Items>
<FromUK/>
</Items>
Our job is to merge item.xml (the XML document that contains special characters) into main.xml under the FromUK element.
Start Visual Basic 6.0, create a Standard EXE project, add a reference to MSXML 4.0, and write the following code:
Option Explicit

Private Sub Form_Load()
    Dim objXMLDocMain As New MSXML2.DOMDocument40
    Dim objXMLDocItem As New MSXML2.DOMDocument40
    Dim objFromUKNode As MSXML2.IXMLDOMNode

    'Load both documents synchronously, without validation
    objXMLDocMain.async = False
    objXMLDocMain.validateOnParse = False
    objXMLDocMain.setProperty "ServerHTTPRequest", False

    objXMLDocItem.async = False
    objXMLDocItem.validateOnParse = False
    objXMLDocItem.setProperty "ServerHTTPRequest", False

    'No error handling done
    objXMLDocMain.Load "c:\main.xml"
    objXMLDocItem.Load "c:\item.xml"

    'Debug.Print objXMLDocMain.xml
    'Debug.Print objXMLDocItem.xml

    'Move the Item element from item.xml under FromUK in main.xml
    Set objFromUKNode = objXMLDocMain.selectSingleNode("//FromUK")
    objFromUKNode.appendChild objXMLDocItem.documentElement

    'Saved in the encoding declared in main.xml (UTF-8)
    objXMLDocMain.save "c:\Merged.xml"
    Unload Me
End Sub
Open c:\Merged.xml and you'll see that the data is merged properly:
<?xml version="1.0" encoding="UTF-8"?>
<Items>
<FromUK><Item>
<Name>ßetaTrack</Name>
<Price CountryCode="1">$120.00</Price>
<Price CountryCode="2">€122.372</Price>
<Price CountryCode="3">£77.0752</Price>
<Price CountryCode="4">¥14,224.74</Price>
</Item></FromUK>
</Items>
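For readers without VB6 at hand, the same merge can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration of the idea, not the MSXML code above. The documents are held in memory instead of being read from c:\, item.xml is shortened, and the variable names are made up for this example.

```python
import codecs
import xml.etree.ElementTree as ET

# main.xml as UTF-8 bytes.
main_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Items>\n<FromUK/>\n</Items>"
).encode("utf-8")

# item.xml as Notepad's "Unicode" save would store it: UTF-16 LE with a BOM.
item_text = (
    '<?xml version="1.0" encoding="UTF-16"?>\n'
    "<Item>\n"
    "<Name>ßetaTrack</Name>\n"
    '<Price CountryCode="2">€122.372</Price>\n'
    "</Item>"
)
item_bytes = codecs.BOM_UTF16_LE + item_text.encode("utf-16-le")

# Each document is decoded according to its own encoding when parsed.
main_root = ET.fromstring(main_bytes)
item_root = ET.fromstring(item_bytes)

# Attach the Item element under FromUK, then serialize the result as UTF-8.
main_root.find(".//FromUK").append(item_root)
merged = ET.tostring(main_root, encoding="utf-8", xml_declaration=True)

print(merged.decode("utf-8"))
```

Once both files are parsed, the tree holds characters rather than bytes, so the source encodings no longer matter. Only the encoding chosen when saving decides how ß and € are written out.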
The other way to include special characters in your XML document is to use character and entity references. This way you can still save the document as an ANSI file, and there is no need to specify the encoding attribute in the XML declaration.
Start Notepad, type the following text, save it as an ANSI file, and open the file in Internet Explorer. You'll still see the special characters.
<Item>
<Name>&#223;etaTrack</Name>
<Price CountryCode="1">$120.00</Price>
<Price CountryCode="2">&#8364;122.372</Price>
<Price CountryCode="3">&#163;77.0752</Price>
<Price CountryCode="4">&#165;14,224.74</Price>
</Item>
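Numeric character references can be generated rather than looked up by hand. The sketch below uses Python's built-in xmlcharrefreplace error handler to turn every non-ASCII character into a reference and then parses the result back. The sample text is a shortened version of the item document, and the variable names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

text = (
    "<Item>"
    "<Name>ßetaTrack</Name>"
    '<Price CountryCode="2">€122.372</Price>'
    '<Price CountryCode="3">£77.0752</Price>'
    '<Price CountryCode="4">¥14,224.74</Price>'
    "</Item>"
)

# Encode to plain ASCII; characters outside ASCII become &#NNN; references.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes.decode("ascii"))

# The parser resolves the references back to the original characters.
root = ET.fromstring(ascii_bytes)
print(root.findtext("Name"))
```

Because the resulting file contains only bytes in the 0-127 range, it reads the same whether it is treated as ANSI or as UTF-8, which is why no encoding declaration is needed.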
Useful Link: HTML Special Characters and Browser Compatibility