feedparser.text content type

I need to change this line from
true_encoding = http_encoding or 'us-ascii'
to
true_encoding = http_encoding or xml_encoding or 'us-ascii'
to cope with buggy sites that don’t obey the standard: they set the content type to text/* without offering a charset, and declare their encoding only in the XML file.
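As a minimal sketch of the fallback order this change produces (the function name choose_encoding is hypothetical and not part of feedparser; only the `or` chain comes from the line above):

```python
def choose_encoding(http_encoding, xml_encoding):
    """Pick the encoding to trust, in the order of the proposed change.

    Hypothetical helper for illustration only. None and '' both count
    as "not given", because `or` skips any falsy value.
    """
    return http_encoding or xml_encoding or 'us-ascii'


# A charset in the HTTP header still wins.
print(choose_encoding('utf-8', 'gb2312'))
# No HTTP charset: fall back to the XML declaration instead of us-ascii.
print(choose_encoding(None, 'gb2312'))
# Nothing declared anywhere: us-ascii, as before the change.
print(choose_encoding('', ''))
```

The behaviour only differs from the original line when the HTTP header carries no charset but the XML declaration names one.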


feedparser.whitespace

According to the XML spec (http://www.w3.org/TR/REC-xml/#NT-EncodingDecl), whitespace is allowed around the equals sign of the encoding declaration. Here is a simple patch:
--- /usr/ports/textproc/py-feedparser/work/feedparser/feedparser.py.old Sat Jul  2 16:17:11 2005
+++ /usr/ports/textproc/py-feedparser/work/feedparser/feedparser.py     Sat Jul  2 16:18:25 2005
@@ -2101,7 +2101,7 @@
         else:
             # ASCII-compatible
             pass
-        xml_encoding_match = re.compile('^<\?.*encoding=[\'"](.*?)[\'"].*\?>').match(xml_data)
+        xml_encoding_match = re.compile('^<\?.*encoding\s*=\s*[\'"](.*?)[\'"].*\?>').match(xml_data)
     except:
         xml_encoding_match = None
     if xml_encoding_match:
I’ve sent this patch to the […]
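To illustrate what the patched pattern accepts, here is a small standalone sketch. It assumes the pattern allows optional whitespace (\s*) on both sides of the equals sign; the helper name sniff_xml_encoding is hypothetical, and it works on str rather than the raw data feedparser handles.

```python
import re

# Pattern from the patch: optional whitespace on either side of '='.
XML_ENCODING_RE = re.compile(r'''^<\?.*encoding\s*=\s*['"](.*?)['"].*\?>''')


def sniff_xml_encoding(xml_data):
    """Return the encoding named in the XML declaration, or None.

    Hypothetical helper for illustration; not feedparser's own function.
    """
    match = XML_ENCODING_RE.match(xml_data)
    return match.group(1) if match else None


print(sniff_xml_encoding('<?xml version="1.0" encoding="utf-8"?><rss/>'))
print(sniff_xml_encoding('<?xml version="1.0" encoding = "gb2312"?><rss/>'))
print(sniff_xml_encoding('<rss/>'))
```

The second call is the case the unpatched pattern misses, because it requires the quote to follow `encoding=` directly.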

feedparser.encoding

It looks like feedparser was written for Python 2.3. As of Python 2.4, CJKCodecs is included in the official release, so the line
import cjkcodecs.aliases
should be changed to
import encodings.aliases
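A quick way to check that the standard library alone provides the CJK codecs is sketched below. The helper has_codec is hypothetical; codecs.lookup and the encodings.aliases.aliases dictionary are standard-library names.

```python
import codecs
import encodings.aliases


def has_codec(name):
    """Return True if the running Python can resolve this codec name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# CJK codecs that needed the separate cjkcodecs package before Python 2.4.
for name in ('gb2312', 'big5', 'shift_jis', 'euc_kr'):
    print(name, has_codec(name))

# The alias table maps alternative spellings to canonical codec names.
print(encodings.aliases.aliases.get('latin1'))
```

If every lookup succeeds, no third-party codec package is needed on that interpreter.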
