Friday, 12 October 2007
Microsoft XML Core Services

Question: I am merging two dynamically created XML documents (sub.xml into main.xml) using the appendChild method, but the method fails if sub.xml (the file being appended) contains special European characters.
Asked By: Suresh
Viewed: 10734
Answer: We tried the above with MSXML 4.0 SP1 and it worked.

Here is the drill (try it out yourself):
1.) Start Notepad and enter the following text:
        <Price CountryCode="1">$120.00</Price>
        <Price CountryCode="2">€122.372</Price>
        <Price CountryCode="3">£77.0752</Price>
        <Price CountryCode="4">¥14,224.74</Price>
Save the document as c:\item.xml, open it in Internet Explorer, and you should see the following error:
An invalid character was found in text content. Error processing resource 'file:///C:/item.xml'. Line 2, Position 8

2.) Add the following XML declaration as the very first line of the document:
<?xml version="1.0" encoding="UTF-16"?>

Save the document and hit refresh in the browser to reload c:\item.xml, and you should get the following error message:
Switch from current encoding to specified encoding not supported. Error processing resource 'file:///C:/item.xml'. Line 1, Position 40

Here is what happened:
In step 1 we created an XML document containing special characters (€, £, ¥ and ß). Notepad saves with ANSI encoding by default, so each of those characters was written as a single byte above 127. When MSXML loaded the file it looked at the first bytes, found no byte order mark and no encoding declaration, and therefore read the document as UTF-8, the XML default. In UTF-8 only bytes 0-127 stand for a character on their own, so as soon as the parser reached the byte for ß it raised an error and the load failed.
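
The step 1 failure can be reproduced at the byte level without MSXML. The sketch below uses Python rather than MSXML and assumes Notepad's ANSI code page is Windows-1252 (cp1252), which is typical for Western European Windows installations. The helper name first_invalid_utf8_offset is made up for this illustration.

```python
# Sketch: why text saved as ANSI breaks a parser that reads it as UTF-8.
# Assumption: Notepad's "ANSI" code page is Windows-1252 (cp1252).

def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset where UTF-8 decoding fails, or None if valid."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err.start

ascii_bytes = '<Price CountryCode="1">$120.00</Price>'.encode("cp1252")
ansi_bytes = '<Price CountryCode="3">£77.0752</Price>'.encode("cp1252")

print(first_invalid_utf8_offset(ascii_bytes))  # ASCII-only line: no failure
print(first_invalid_utf8_offset(ansi_bytes))   # fails at the pound sign byte
```

The dollar line is pure ASCII and decodes cleanly, while the pound sign becomes the single byte 0xA3, which cannot start a UTF-8 sequence.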

In step 2 we added an XML declaration specifying encoding="UTF-16" but still saved the file as ANSI (Notepad's default). MSXML again began reading the file as a single-byte stream, and when it reached the encoding="UTF-16" attribute it could not switch to a two-byte encoding partway through the document, so it raised the "Switch from..." error and the load failed.
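
The mismatch in step 2 can be checked before a file ever reaches the parser. This is a minimal Python sketch, not MSXML's actual detection logic: it compares the encoding named in the declaration with what the first bytes suggest. The function names are hypothetical, and the byte order mark table is deliberately short.

```python
import re

# Byte order marks that identify an encoding from the first bytes of a file.
BOMS = [
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe", "UTF-16"),  # little-endian
    (b"\xfe\xff", "UTF-16"),  # big-endian
]

def sniff_encoding(data: bytes) -> str:
    """Guess the encoding family from the leading bytes; no BOM means UTF-8."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return "UTF-8"

def declared_encoding(data: bytes):
    """Read the encoding name from an XML declaration stored as single bytes."""
    match = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    return match.group(1).decode("ascii").upper() if match else None

def encoding_conflict(data: bytes) -> bool:
    """True when the declaration says UTF-16 but the bytes are not UTF-16."""
    return declared_encoding(data) == "UTF-16" and sniff_encoding(data) != "UTF-16"

declaration = '<?xml version="1.0" encoding="UTF-16"?>'
step2_text = declaration + '\n<Price CountryCode="3">£77.0752</Price>'

print(encoding_conflict(step2_text.encode("cp1252")))  # saved as ANSI
print(encoding_conflict(step2_text.encode("utf-16")))  # saved as Unicode
```

The declaration is 39 characters long, which is consistent with the error being reported at line 1, position 40, immediately after the declaration has been read.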

To fix the problem, open item.xml in Notepad, make sure the XML declaration with encoding="UTF-16" is still present, choose File | Save As, and select Unicode in the Encoding box of the Save As dialog. Overwrite the file and refresh the browser; this time the file should load correctly. Because the file is now saved as Unicode, MSXML detects that from the first bytes (the byte order mark), which agrees with the declared encoding, and everything works from there.
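
The same fix can be sketched in Python, using the standard library's ElementTree parser in place of MSXML. Python's "utf-16" codec writes a byte order mark, much as Notepad's Unicode option does. The root element name Items is an assumption made only so the sample is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical root element <Items>, used only to make the sample well-formed.
text = (
    '<?xml version="1.0" encoding="UTF-16"?>\n'
    '<Items><Price CountryCode="2">€122.372</Price></Items>'
)

# The "utf-16" codec prepends a byte order mark in the platform's byte order.
data = text.encode("utf-16")
has_bom = data[:2] in (b"\xff\xfe", b"\xfe\xff")

# The parser sees the byte order mark, reads the stream as UTF-16, and the
# declared encoding now matches the actual bytes.
root = ET.fromstring(data)
price_text = root.find("Price").text

print(has_bom, price_text == "€122.372")
```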

So, we have c:\item.xml saved with Unicode encoding (in Notepad) as below:
<?xml version="1.0" encoding="UTF-16"?>
        <Price CountryCode="1">$120.00</Price>
        <Price CountryCode="2">€122.372</Price>
        <Price CountryCode="3">£77.0752</Price>
        <Price CountryCode="4">¥14,224.74</Price>

Let's say we have one more file, main.xml, as below:
<?xml version="1.0" encoding="UTF-8"?>
Our job is to merge item.xml (the XML document that contains special characters) into main.xml under the FromUK tag.

Start Visual Basic 6.0, create a Standard EXE project, add a reference to MSXML 4.0, and write the following code:
Option Explicit
Private Sub Form_Load()
    Dim objXMLDocMain As New MSXML2.DOMDocument40
    Dim objXMLDocItem As New MSXML2.DOMDocument40
    Dim objFromUKNode As MSXML2.IXMLDOMNode
    objXMLDocMain.async = False
    objXMLDocMain.validateOnParse = False
    objXMLDocMain.setProperty "ServerHTTPRequest", False
    objXMLDocItem.async = False
    objXMLDocItem.validateOnParse = False
    objXMLDocItem.setProperty "ServerHTTPRequest", False
    'No error handling done
    objXMLDocMain.Load "c:\main.xml"
    objXMLDocItem.Load "c:\item.xml"
    'Debug.Print objXMLDocMain.xml
    'Debug.Print objXMLDocItem.xml
    Set objFromUKNode = objXMLDocMain.selectSingleNode("//FromUK")
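    'Assumption: the lost continuation line appended a deep copy of item.xml's root element
    objFromUKNode.appendChild _
        objXMLDocItem.documentElement.cloneNode(True)
    'Save the merged result (file name taken from the text that follows)
    objXMLDocMain.save "c:\Merged.xml"
    Unload Me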
End Sub

Open c:\Merged.xml and you'll see that the data is merged properly:

<?xml version="1.0" encoding="UTF-8"?>
                        <Price CountryCode="1">$120.00</Price>
                        <Price CountryCode="2">€122.372</Price>
                        <Price CountryCode="3">£77.0752</Price>
                        <Price CountryCode="4">¥14,224.74</Price>
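
For readers without VB6 at hand, the merge can be sketched in Python with ElementTree standing in for MSXML. The root element names Main and Items are assumptions, since only FromUK and Price appear in the text above. The sketch works on in-memory bytes instead of files on c:\.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: <Main> and <Items> are assumed root names;
# only <FromUK> and <Price> come from the text.
main_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<Main><FromUK/></Main>'
).encode("utf-8")
item_bytes = (
    '<?xml version="1.0" encoding="UTF-16"?>'
    '<Items><Price CountryCode="3">£77.0752</Price></Items>'
).encode("utf-16")

main_root = ET.fromstring(main_bytes)
item_root = ET.fromstring(item_bytes)

# Attach the whole item tree under <FromUK>, like appendChild in the VB code.
main_root.find(".//FromUK").append(item_root)

# Serialize as UTF-8; once parsed, the source encodings no longer matter.
merged = ET.tostring(main_root, encoding="utf-8", xml_declaration=True)
print(merged)
```

Once both documents are parsed, their text is held as Unicode in memory, so a UTF-16 source and a UTF-8 target can be combined without any conversion step.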

An alternative way to include special characters in your XML document is to use Character and Entity References. This way you can still save the document as an ANSI file, and there is no need to specify the encoding attribute in the XML declaration.

Start Notepad, enter the following text, and save the file as an ANSI file. Open the file in Internet Explorer and you'll still see the special characters.

        <Price CountryCode="1">$120.00</Price>
        <Price CountryCode="2">&#8364;122.372</Price>
        <Price CountryCode="3">&#163;77.0752</Price>
        <Price CountryCode="4">&#165;14,224.74</Price>
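
The numbers in these references are simply the decimal Unicode code points of the characters. A short Python sketch, using ElementTree as a stand-in for MSXML, shows a reference being resolved on parse and regenerated when the output is restricted to ASCII:

```python
import xml.etree.ElementTree as ET

# ASCII-only source text: the euro sign is written as a character reference.
ansi_safe = '<Price CountryCode="2">&#8364;122.372</Price>'

elem = ET.fromstring(ansi_safe)
resolved = elem.text  # the parser turns &#8364; back into the euro sign

# Going the other way: ASCII output forces every non-ASCII character
# to be written as a numeric character reference.
round_trip = ET.tostring(elem, encoding="us-ascii")
print(round_trip)

# The reference numbers are the code points of the characters.
code_points = {ch: ord(ch) for ch in "€£¥"}
print(sorted(code_points.values()))
```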

Useful Link: HTML Special Characters and Browser Compatibility

PerfectXML ©2004 All rights reserved.