XML Encoding Problems - Hexadecimal Value 0x1A, is an Invalid Character

Recently I started noticing exceptions in my log files that stated "hexadecimal value 0x1A, is an invalid character". I did some searching, and a lot of other people have had the same problem. It occurs when someone pastes in text containing characters that are not allowed in XML, such as the control character 0x1A. From what I gather, this frequently happens when someone copies text from Microsoft Word.
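To make the failure concrete, here is a minimal sketch that provokes the same kind of error by parsing a document containing the 0x1A character. It assumes the error surfaces while reading XML with `XDocument.Parse`; the original exceptions may have come from a different XML API. The names `InvalidCharDemo` and `TryParse` are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class InvalidCharDemo
{
    // Hypothetical helper: returns true if the text parses as XML,
    // otherwise false with the parser's error message in 'error'.
    public static bool TryParse(string xml, out string error)
    {
        try
        {
            XDocument.Parse(xml);
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        // \u001A is the control character from the log message.
        string xml = "<note>pasted\u001Atext</note>";

        string error;
        if (!TryParse(xml, out error))
        {
            Console.WriteLine("Parse failed: " + error);
        }
    }
}
```

The same input with the 0x1A character removed parses without complaint, which is why stripping the offending characters before building the XML is enough to avoid the exception.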

The trick was to find out which characters are invalid and then use a regular expression to remove them.

The XML specification has all the details, but it is probably easier to just get the code:

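        // Requires: using System.Text.RegularExpressions;

        /// <summary>
        /// Removes characters that are invalid in XML 1.0.
        /// </summary>
        /// <param name="text">Text to be encoded.</param>
        /// <returns>Text with invalid XML characters removed.</returns>
        public static string CleanInvalidXmlChars(string text)
        {
            // Valid characters according to the XML 1.0 spec:
            // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
            // i.e. any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.
            //
            // In a .NET regex, \x takes exactly two hex digits, so four-digit values
            // need \uXXXX. Code points above #xFFFF are stored as surrogate pairs:
            // keep well-formed pairs and remove only unpaired surrogates.
            string re = @"[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]"
                      + @"|[\uD800-\uDBFF](?![\uDC00-\uDFFF])"
                      + @"|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]";
            return Regex.Replace(text, re, "");
        }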
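For comparison, the same filtering can be sketched without a regular expression by walking the string once and copying only the characters the XML 1.0 spec allows. This is an illustrative alternative, not the code from the article: `XmlCharFilter` and `StripInvalidXmlChars` are hypothetical names, and the sample string is made up.

```csharp
using System;
using System.Text;

public static class XmlCharFilter
{
    // Hypothetical helper (not from the original article): copies only the
    // characters that XML 1.0 allows into a new string.
    public static string StripInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                // A well-formed surrogate pair encodes a code point in
                // #x10000-#x10FFFF, which XML allows: keep both halves.
                sb.Append(c).Append(text[++i]);
            }
            else if (c == '\t' || c == '\n' || c == '\r'
                     || (c >= '\u0020' && c <= '\uD7FF')
                     || (c >= '\uE000' && c <= '\uFFFD'))
            {
                sb.Append(c);
            }
            // Anything else (other control characters, unpaired surrogates,
            // U+FFFE, U+FFFF) is dropped.
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // 0x1A is removed; the euro sign and the emoji (a surrogate pair) stay.
        string dirty = "Price:\u001A 10\u20AC \uD83D\uDE00";
        string clean = StripInvalidXmlChars(dirty);
        Console.WriteLine(clean == "Price: 10\u20AC \uD83D\uDE00"); // prints True
    }
}
```

A loop like this is longer than the regex, but each rule from the spec maps to one visible condition, which can make it easier to check that supplementary characters survive while stray surrogates do not.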
