Smart Quotes and RSS October 17, 2006 10:22 AM

Smart Quotes Ahoy! Love 'em or hate 'em, they turn up awful weird in my RSS reader.

Not a major crisis, I was just surprised to see Word-style formatting passing through after the very recent thread on blocking entities.
posted by gimonca to Bugs at 10:22 AM (31 comments total)

Actually, those are summary quotes.
posted by cribcage at 11:26 AM on October 17, 2006 [1 favorite]


I blame your RSS reader.
posted by smackfu at 11:42 AM on October 17, 2006


I can't tell you if the use of smart quotes in MetaFilter today would last five days, or five weeks, or five months, but it certainly isn't going to last any longer than that.
posted by ijoshua at 11:56 AM on October 17, 2006


Obviously gimonca has never been to livejournalism school.
posted by dios at 12:03 PM on October 17, 2006


Smart quotes are actually quite stupid. In the context of text feeds, they will always be a wrench in the works.
posted by prostyle at 12:53 PM on October 17, 2006


Yeah, all the gazillions of WordPress feeds out there are wrenched-up, right?
posted by Firas at 1:55 PM on October 17, 2006


They ought to become " in the XML, shouldn't they?
posted by gimonca at 2:33 PM on October 17, 2006


“They ought to become " in the XML, shouldn't they?”

No, because " => "
i.e. a straight quote, a.k.a. inch mark or double prime

These are proper typographical quotation marks, which should appear curly: “”

The feeds in which I’ve seen them work use numerical entity encoding for those characters, e.g. 8220; => “ and 8221; => ”
cf: http://daringfireball.net/index.xml
posted by ijoshua at 3:07 PM on October 17, 2006
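
The numeric encoding described in this comment can be sketched in a few lines. This is only an illustration in Python (the thread names no language for it), and `to_numeric_refs` is a hypothetical helper, not something any of the feeds mentioned is known to use.

```python
import html


def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    # The 'xmlcharrefreplace' error handler emits &#NNNN; for each
    # character that cannot be represented in ASCII.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


curly = "\u201cquoted\u201d"          # left and right double quotation marks
encoded = to_numeric_refs(curly)
print(encoded)                        # &#8220;quoted&#8221;

# A parser that resolves character references recovers the original text.
assert html.unescape(encoded) == curly
```

Because the output is pure ASCII, it survives any later step that mishandles multi-byte characters.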


That should be 8220; and 8221;
posted by ijoshua at 3:08 PM on October 17, 2006


dammit!
posted by ijoshua at 3:09 PM on October 17, 2006


Ignore me. I was trying to be all smart, but preview foiled me.
posted by ijoshua at 3:10 PM on October 17, 2006


One time when I went on vacation for a week, my coworkers foiled my whole cubicle.
posted by cortex at 3:12 PM on October 17, 2006


You were trying for 8220;
posted by Rhomboid at 4:05 PM on October 17, 2006


And so was I apparently.
posted by Rhomboid at 4:06 PM on October 17, 2006


Okay it's aaaaaawn, comment-filter.


&​#8220;
posted by Rhomboid at 4:07 PM on October 17, 2006


Phew, there it went. All it took was a little U+200B. Fools filters all the time.
posted by Rhomboid at 4:08 PM on October 17, 2006
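
A rough sketch of the U+200B trick, assuming a filter that resolves anything shaped like a character reference. Python's `html.unescape` stands in for that filter here; the site's actual comment filter is not shown in the thread, and `show_entity_literally` is a hypothetical name.

```python
import html

ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but breaks up the entity syntax


def show_entity_literally(entity: str) -> str:
    """Insert U+200B after the leading '&' so the text no longer parses as a reference."""
    if entity.startswith("&"):
        return "&" + ZWSP + entity[1:]
    return entity


plain = "&#8220;"
tricked = show_entity_literally(plain)

# The untouched reference is resolved to a curly quote...
assert html.unescape(plain) == "\u201c"
# ...while the version containing the zero-width space passes through unchanged.
assert html.unescape(tricked) == tricked
```

The conventional alternative is to escape the ampersand itself as `&amp;`, which avoids leaving an invisible character in the text.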


See what I mean? All that heartache, and some goomba who copies-and-pastes right out of MS Word gets a free ride through. Weird, ain't it?
posted by gimonca at 5:14 PM on October 17, 2006


Or someone who uses the ‘keyboard shortcuts’ of his “operating system,” such as the vaunted ™, to type perfectly normal unicode into a standard text view. Without retarded entities. Windows can do this too, seriously. It has keyboard shortcuts too.
posted by blasdelf at 5:26 PM on October 17, 2006


Um, you misunderstand. I was not trying to get a fancy quote in the comment, I was trying to get the representation of the numerical entity of a fancy quote into the comment. Those are two completely different things; if I wanted fancy quotes I'd just enable the greasemonkey script that automatically smartquote-ifies everything and be done with it. Or just copy and paste it. That is trivial and not something I care about.
posted by Rhomboid at 5:38 PM on October 17, 2006


My feed reader shows those characters as a question mark, which makes reading difficult because I can never tell if it's actually a question mark or just a curly quote. But I don't understand why it does that and my browser doesn't, because both are reading the same characters with UTF-8 encoding.
posted by scottreynen at 5:45 PM on October 17, 2006


scottreynen, I get question marks instead of curly quotes when I load the mefi rss.xml file in firefox. Weird. Something about XML parsing rules?

Anyway, were mathowie inclined to fix this, he could do a 'string replace' or whatever that's called in coldfusion to grab the windows smart quotes--and normal ones for that matter--and escape them to html entities.
posted by Firas at 5:56 PM on October 17, 2006


Firas, I've tried that, and smart quotes are stored in the db like ten different ways depending on browser and how they were input into the form. I've never been able to just search and replace those alone (and when I have tried, I've ended up having to write dozens of rules for every degree sign, emdash, endash, etc).

I think they work better in Ask MeFi because I XML escape every single field in that feed. In the MeFi feed I think I only escape the title because it would kick an error to most parsers.
posted by mathowie (staff) at 6:47 PM on October 17, 2006
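
For illustration, escaping every field rather than only the title might look like the sketch below. It is written in Python, not the ColdFusion the site reportedly runs, and it is not mathowie's code; `escape_field` and the sample item are made up for the example.

```python
from xml.sax.saxutils import escape


def escape_field(value: str) -> str:
    """Make one feed field safe to embed in XML as ASCII text."""
    # Step 1: escape the markup characters &, < and >.
    escaped = escape(value)
    # Step 2: turn any remaining non-ASCII character into &#NNNN;.
    # This must come second, or step 1 would re-escape the '&' of each reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Hypothetical item with a title and a description field.
item = {
    "title": "Smart Quotes & RSS",
    "description": "Love \u2018em or hate \u2018em <b>",
}
safe_item = {name: escape_field(value) for name, value in item.items()}
for name, value in safe_item.items():
    print(f"<{name}>{value}</{name}>")
```

Applying one generic pass like this sidesteps writing a separate rule for every degree sign and dash, provided the text reaches it as correctly decoded characters in the first place.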


I get question marks instead of curly quotes when I load the mefi rss.xml file in firefox.

Oh me too, but not when I load the HTML in Firefox, though both are being loaded with UTF-8 encoding. It appears only the HTML actually includes UTF-8 encoded text.

Anyway, were mathowie inclined to fix this, he could do a 'string replace' or whatever that's called in coldfusion to grab the windows smart quotes--and normal ones for that matter--and escape them to html entities.

That will work fine for smart quotes, but it won't fix the underlying problem for every other multi-byte character, e.g. the entire title of this AskMe, which is just a bunch of question marks in the RSS.

There seems to be an encoding problem caused by putting multi-byte characters through some function that is assuming single-byte input.
posted by scottreynen at 6:50 PM on October 17, 2006


I think they work better in Ask MeFi

Ask MeFi doesn't work any better for me.
posted by scottreynen at 6:51 PM on October 17, 2006


To confirm my suspicion, I looked at the actual byte values for the curly apostrophe in this thread. In the HTML, it's a multi-byte UTF-8 character with ordinal values: 226,128,153. In the RSS, it's a single-byte character with ordinal value: 63. Whatever is converting those three bytes into that one byte is the problem. The original multi-byte curly quotes should work fine in the RSS, but they're never making it into the RSS for some reason.
posted by scottreynen at 7:12 PM on October 17, 2006 [1 favorite]
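
The byte values reported above are easy to reproduce. The sketch below is in Python and assumes, purely for illustration, that the lossy step behaves like an ASCII encode with replacement; the thread does not establish which function in the real feed pipeline is responsible.

```python
apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the curly apostrophe

# Correct UTF-8 encoding: three bytes, matching what the HTML contains.
utf8_bytes = apostrophe.encode("utf-8")
print(list(utf8_bytes))        # [226, 128, 153]

# Lossy conversion (assumed): a single-byte encoder that cannot represent
# the character substitutes '?', which is byte 63.
lossy = apostrophe.encode("ascii", errors="replace")
print(list(lossy))             # [63]
print(lossy.decode("ascii"))   # ?
```

Once the character has been replaced by byte 63, no setting in the feed reader can recover it, which is consistent with the question marks appearing in every client.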


ain't matts fault your RSS reader sucks.
posted by delmoi at 10:47 PM on October 17, 2006


Don't forget your &s people
posted by delmoi at 10:48 PM on October 17, 2006


In any case....what prostyle said.

Try Feed Validator and scroll down to the post in question, if it hasn't aged off yet. (Note: it does validate, as of this moment.)

([ahem] Easiest fix is to regex-n-replace 'em all with " before generating the XML, of course...)
posted by gimonca at 5:34 AM on October 18, 2006


9731;?
posted by armoured-ant at 5:28 PM on October 18, 2006


Dammit!
☃?
posted by armoured-ant at 5:28 PM on October 18, 2006



posted by Termite at 12:17 PM on October 19, 2006

