February 21, 2004
Right on the Mark
Mark Pilgrim has an excellent article about determining the character set used within a feed. Gush mostly follows the steps he outlines. We also have a last-resort step that uses Mozilla's Universal Charset Detector to recognize feeds in encodings such as Shift_JIS, EUC-KR, etc. The results from UCD are mixed, but it's better than just giving up when the encoding isn't us-ascii or UTF-8 and the information isn't present in the XML declaration or the HTTP header.
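For the curious, here is a rough sketch of that kind of fallback chain. It is not Gush's actual code, and it skips the media-type rules that Mark's article goes into. The function name `guess_feed_encoding`, the `last_resort` callback, and the exact order of the checks are all assumptions made for illustration.

```python
import codecs
import re

# Byte-order marks, longest first so UTF-32 is not mistaken for UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Only works when the declaration itself is in an ASCII-compatible encoding.
_XML_DECL = re.compile(rb"""^<\?xml[^>]*?encoding=["']([A-Za-z0-9._-]+)["']""")
_CHARSET = re.compile(r"""charset=["']?([A-Za-z0-9._-]+)""", re.I)


def guess_feed_encoding(body, content_type=None, last_resort=None):
    """Return a lowercase encoding name for the raw feed bytes, or None.

    `last_resort` is a hypothetical hook: any callable that takes bytes and
    returns an encoding name (for example a statistical charset detector).
    """
    # 1. charset parameter of the HTTP Content-Type header, if any
    if content_type:
        m = _CHARSET.search(content_type)
        if m:
            return m.group(1).lower()

    # 2. byte-order mark at the start of the body
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name

    # 3. encoding attribute of the XML declaration
    m = _XML_DECL.match(body)
    if m:
        return m.group(1).decode("ascii").lower()

    # 4. us-ascii and UTF-8 both decode cleanly as UTF-8
    try:
        body.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 5. nothing declared and not UTF-8: hand off to the detector
    if last_resort is not None:
        return last_resort(body)
    return None
```

The detector is passed in as a plain callable so the sketch stays dependency-free. If you have the `chardet` package installed (a Python port of Mozilla's detector), something like `last_resort=lambda b: chardet.detect(b)["encoding"]` should slot in, with the same caveat about mixed results.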