perfectxml.com
Monday, 13 August 2007
 
Microsoft XML Core Services



Question: I am creating XML documents from the database. Sometimes the MSXML parser fails to create/load the XML document with the error "Invalid Unicode Character". I am pretty sure it is not an encoding problem, as I am using UTF-8 encoding and have tried all the special characters, including characters above ASCII 127 (0x7F). Please help.
Asked By: Guest
Viewed: 5505
Answer: We think the most probable cause of this problem is that the data contains control characters below ASCII 32 (0x20). An XML 1.0 document cannot contain characters below 32, except tab (0x9), line feed (0xA) and carriage return (0xD). See http://www.w3.org/TR/REC-xml.html#charsets for details. Here is what we tried:

1.) Created the following document as c:\1.xml:
<?xml version="1.0" encoding="UTF-8"?>
<Test>&#25;&#22;&#x1F;</Test>
2.) The sample document above contains character references to characters below 32 (decimal 25, decimal 22 and hexadecimal 1F). Internet Explorer opens this document without any problems, but loading it with the MSXML DOM fails with "-1072896737 (0xC00CE51F) Invalid unicode character.".

In this example, MSXML is doing the right thing: a character reference must also refer to a legal XML character, so the document is not well-formed. The XSL/MSXML code inside Internet Explorer does not really check this well-formedness constraint.
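
The same well-formedness check can be observed outside MSXML. The following is a minimal sketch that uses Python's standard-library parser as a stand-in for the MSXML DOM; the function name try_parse is ours, and the exact error text differs from the MSXML message quoted above.

```python
import xml.etree.ElementTree as ET

# The sample document from step 1, with references to characters below 32.
SAMPLE = b'<?xml version="1.0" encoding="UTF-8"?>\n<Test>&#25;&#22;&#x1F;</Test>'

def try_parse(xml_bytes):
    """Return None if the document parses, otherwise the parser's error text."""
    try:
        ET.fromstring(xml_bytes)
        return None
    except ET.ParseError as exc:
        return str(exc)

if __name__ == "__main__":
    # A conforming parser rejects the sample document...
    print("sample:", try_parse(SAMPLE))
    # ...but accepts references to tab, line feed and carriage return.
    print("allowed:", try_parse(b"<Test>&#9;&#10;&#13;</Test>"))
```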

Now, back to your problem: if you find that the data contains characters below 32 and you need to preserve them, the solution is to convert the data using Base64 encoding. MSXML can do the Base64 encoding/decoding for you automatically. See the article "XML and Binary Data" and/or search for Base64 on perfectxml.com for more information.
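
To illustrate the idea independently of MSXML, here is a rough sketch in Python that scans a value for the forbidden characters and Base64-encodes it only when needed. The helper names and the "encoding" marker attribute are assumptions made for this example and are not part of MSXML or of any standard.

```python
import base64
import re
import xml.etree.ElementTree as ET

# Characters below 0x20 that XML 1.0 forbids (all except tab, LF and CR).
_FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def has_forbidden_chars(text):
    """True if text contains a control character XML 1.0 does not allow."""
    return _FORBIDDEN.search(text) is not None

def to_element(tag, text):
    """Build an element, Base64-encoding the value when it is not XML-safe."""
    elem = ET.Element(tag)
    if has_forbidden_chars(text):
        # Hypothetical marker attribute so the reader knows to decode.
        elem.set("encoding", "base64")
        elem.text = base64.b64encode(text.encode("utf-8")).decode("ascii")
    else:
        elem.text = text
    return elem

def from_element(elem):
    """Recover the original value written by to_element."""
    text = elem.text or ""
    if elem.get("encoding") == "base64":
        return base64.b64decode(text).decode("utf-8")
    return text

if __name__ == "__main__":
    raw = "a\x19b\x16c\x1f"  # the three characters from the sample document
    xml_text = ET.tostring(to_element("Test", raw), encoding="unicode")
    print(xml_text)
    print(from_element(ET.fromstring(xml_text)) == raw)
```

Encoding only the affected values keeps ordinary text readable in the output; if the control characters carry no meaning, removing them before writing the document is a simpler alternative.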

While looking at this problem, we also found that there is really no way to write character references while generating an XML document using the MSXML DOM. That is, if you set the element value (either by using nodeTypedValue or by using createTextNode) to something like &#65;, MSXML will escape the ampersand (&) character and write &amp;#65;. We suspect that the disableOutputEscaping property of MXXMLWriter40 could be used in such cases, but we have not verified this.
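
This escaping behavior is common to DOM-style APIs, not specific to MSXML. As a small sketch, again using Python's standard library rather than MSXML (the function name is ours), text that looks like a character reference is written with its ampersand escaped and comes back as literal text:

```python
import xml.etree.ElementTree as ET

def serialize_with_text(tag, text):
    """Create an element with the given text and return its serialized form."""
    elem = ET.Element(tag)
    elem.text = text
    return ET.tostring(elem, encoding="unicode")

if __name__ == "__main__":
    xml_text = serialize_with_text("Test", "&#65;")
    # The ampersand is escaped, so no character reference is written.
    print(xml_text)
    # Parsing the result yields the literal five characters, not "A".
    print(ET.fromstring(xml_text).text)
```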


©2004 perfectxml.com. All rights reserved.