XML Encoding Problems - Hexadecimal Value 0x1A, is an Invalid Character
Tags : xml, encoding
Published On : 03-03-2008
Author : Steve Donegan
Recently I started noticing exceptions in my log files stating "hexadecimal value 0x1A, is an invalid character". After some searching I found that a lot of other people have had the same problem. It occurs when someone pastes in text containing characters, such as the control character 0x1A, that are not allowed in XML. From what I gather, this frequently happens when text is copied from Microsoft Word.
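To make the failure concrete, here is a minimal sketch of one way to trigger this kind of error. It assumes the text ends up inside an XML document that is parsed with XmlDocument; the class and method names (InvalidCharRepro, TryParse) are made up for illustration, and the code that actually produced my log entries may have taken a different path.

```csharp
using System;
using System.Xml;

public static class InvalidCharRepro
{
    // Hypothetical helper: returns null if the XML parses,
    // otherwise the message of the XmlException that was thrown.
    public static string TryParse(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }

    public static void Main()
    {
        // U+001A (SUB) is a control character outside the XML 1.0 Char range.
        string bad = "<note>pasted text\u001A</note>";
        string good = "<note>pasted text</note>";

        Console.WriteLine(TryParse(good) == null); // the clean document parses
        Console.WriteLine(TryParse(bad));          // the parser's complaint about 0x1A
    }
}
```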
The trick was to find out which characters are invalid and then use a regular expression to remove them.
The XML specification has all the details, but it is probably easier to just get the code:
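using System.Text.RegularExpressions;

/// <summary>
/// Removes characters that are invalid in XML 1.0 text.
/// </summary>
/// <param name="text">Text to be cleaned.</param>
/// <returns>Text with invalid XML characters removed.</returns>
public static string CleanInvalidXmlChars(string text)
{
    // From the XML spec, valid chars:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // i.e. any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.
    // In .NET regexes \x takes exactly two hex digits, so 4-digit values need \u.
    // .NET strings are UTF-16, so #x10000-#x10FFFF shows up as surrogate pairs:
    // keep well-formed pairs, strip lone surrogates and the other invalid chars.
    string re = @"[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]|[^\x09\x0A\x0D\x20-\uFFFD]";
    return Regex.Replace(text, re, "");
}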
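If you would rather avoid a regular expression, the same rule from the spec can be applied character by character. The following is only a sketch of that alternative: the class and method names (XmlTextSketch, StripInvalidXmlChars) are hypothetical and not part of the code above, and it assumes the input is an ordinary UTF-16 .NET string.

```csharp
using System;
using System.Text;

public static class XmlTextSketch
{
    // Hypothetical helper: keeps only characters allowed by the XML 1.0 Char rule.
    public static string StripInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            // A well-formed surrogate pair encodes a code point in
            // #x10000-#x10FFFF, which is valid: keep both halves.
            if (i + 1 < text.Length && char.IsSurrogatePair(c, text[i + 1]))
            {
                sb.Append(c).Append(text[i + 1]);
                i++;
                continue;
            }

            // Everything else must fall in the single-unit valid ranges;
            // lone surrogates (#xD800-#xDFFF) fail this test and are dropped.
            bool valid = c == '\t' || c == '\n' || c == '\r'
                || (c >= '\u0020' && c <= '\uD7FF')
                || (c >= '\uE000' && c <= '\uFFFD');
            if (valid) sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // 0x1A is removed; the emoji (a surrogate pair) is kept.
        string pasted = "Price list\u001A for \uD83D\uDE00 2008";
        Console.WriteLine(StripInvalidXmlChars(pasted) == "Price list for \uD83D\uDE00 2008"); // prints True
    }
}
```

The loop is longer than the regex version but easier to step through in a debugger, and it makes the surrogate pair handling explicit.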