Pi fine on preview bad on post November 25, 2003 3:32 PM

The other day I posted a sentence including the Greek letter pi. This appeared as the Greek letter on preview but came out as a couple of question marks in the actual post. Today a post by languagehat included some Kanji characters and a bit later in the thread he says, darn it, ...those characters looked fine on preview. Can someone explain the difference between preview and live posting when it comes to special characters? Thanks!
posted by jfuller to Bugs at 3:32 PM (25 comments total)

Someone will come along and explain this better, but here I go anyway. Evidently there is something in the coding of MeFi's comment box that kicks out certain HTML for certain symbols. If you code HTML into a comment, say for example:

I ♥ pancakes!!

It will show the heart on preview. If you look in the comment box while previewing you'll even see the characters in the preview box. But the preview box is full of lies. The solution is to write the HTML initially, check it on preview, delete the HTML character and then write the HTML again on preview before you post.
posted by elwoodwiles at 3:52 PM on November 25, 2003


I ? pancakes
posted by elwoodwiles at 3:57 PM on November 25, 2003


MetaFilter: the preview box is full of lies.
posted by gleuschk at 3:57 PM on November 25, 2003


Special characters are Unicode characters, and there's a difference between UTF-8 encoded text (the bytes) and character entities (ASCII strings like &amp;).

If it was inserted by copy/pasting the character in then it's probably UTF-8. If the person typed the character entity then it's (obviously) a character entity.
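To make the distinction concrete, here is a minimal Python sketch (not MetaFilter's code, which was ColdFusion). It uses the pi from the original question; the variable names are made up for illustration.

```python
import html

pi = "\u03c0"  # GREEK SMALL LETTER PI

# Pasted character: it travels in the form submission as UTF-8 bytes.
utf8_bytes = pi.encode("utf-8")
print(utf8_bytes)  # b'\xcf\x80'

# Typed entity: plain ASCII text that the browser resolves when rendering.
entity = "&pi;"
print(entity.encode("ascii"))  # b'&pi;'

# Both end up as the same character once the entity is resolved.
print(html.unescape(entity) == pi)  # True
```

The entity survives any ASCII-only step untouched, while the pasted character needs every step to handle UTF-8.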

UTF-8 could be broken if there was something in the code that wants ASCII or 8859 or something, and breaks on UTF-8 (a while ago Matt mentioned something about a UTF-8 problem with JDBC/ColdFusion MX). There is different code involved in a post preview and a post submission. Preview probably just feeds back your form submission into the webpage, and doesn't touch the database. If there was a problem with UTF-8 encoded text and the database you wouldn't see the break until it went through the database (obviously) and it was live.
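A rough Python sketch of how that could produce question marks. This is only a guess at the mechanism: it assumes the storage path squeezes text through a single-byte code page (cp1252 here, chosen arbitrarily), and both function names are hypothetical.

```python
def preview_echo(text: str) -> str:
    """Hypothetical preview path: the submission is echoed back untouched."""
    return text


def simulate_lossy_store(text: str, codepage: str = "cp1252") -> str:
    """Hypothetical storage path that only understands one code page."""
    # Characters the code page cannot represent are replaced with '?'.
    stored = text.encode(codepage, errors="replace")
    return stored.decode(codepage)


comment = "I \u2665 pancakes"
print(preview_echo(comment))          # I ♥ pancakes
print(simulate_lossy_store(comment))  # I ? pancakes
```

Plain ASCII text passes through both paths unchanged, which would explain why only special characters are affected.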

Character entities are broken in MetaFilter because it doesn't re-encode ampersands into &amp; for the form box, which is a simple thing to fix.
posted by holloway at 4:04 PM on November 25, 2003


Also discussed here, here, here, here, here, here, here, here, here and here.
posted by dg at 4:29 PM on November 25, 2003


Crap, there were a few of those I didn't post in.
posted by holloway at 4:33 PM on November 25, 2003


I think it's mostly an issue with the JDBC/CFMX connection, since I'm pretty sure I'm storing stuff in MSSQL as utf-8 and I specify utf-8 for all form submissions (which is why it works in preview). It's when the posts head to the database that there is trouble.
posted by mathowie (staff) at 4:40 PM on November 25, 2003


To fix the entity bug, the bit of code that writes between the <textarea> tags needs to replace & with &amp; to keep the entities rather than resolving them; that way previews won't resolve entities into utf-8 data.

Won't fix the utf-8 db problem, but then entities will be consistent.
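A sketch of that textarea fix in Python rather than ColdFusion. `render_textarea` and the `comment` field name are invented for illustration, and `&hearts;` is just an example entity.

```python
import html


def render_textarea(comment_source: str) -> str:
    """Hypothetical helper: write the user's source back into the edit box."""
    # quote=False escapes &, < and > only, which is all a textarea body needs.
    escaped = html.escape(comment_source, quote=False)
    return '<textarea name="comment">' + escaped + "</textarea>"


source = "I &hearts; pancakes"
print(render_textarea(source))
# <textarea name="comment">I &amp;hearts; pancakes</textarea>

# The browser resolves one level of entities when it fills the box.
# With the escaping, the box holds the original source again:
print(html.unescape(html.escape(source, quote=False)))  # I &hearts; pancakes
# Without it, the entity is resolved and the typed code is gone:
print(html.unescape(source))  # I ♥ pancakes
```

Escaping the ampersand is what lets the typed entity survive a trip through preview.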

(heh, you just changed it?)
posted by holloway at 4:55 PM on November 25, 2003


I want to ? you in the ?
posted by scarabic at 4:56 PM on November 25, 2003


damn! get right on this!
posted by scarabic at 4:56 PM on November 25, 2003


Oh, so there's a replacement for < to make it &lt; and > to make it &gt;, but not & to &amp;?
posted by holloway at 4:57 PM on November 25, 2003


*cries*
posted by holloway at 4:58 PM on November 25, 2003


The obvious solution is to use English exclusively and quit trying to be cute/pretentious/know-it-all. ;-P
posted by mischief at 5:06 PM on November 25, 2003


This classy, classic tutorial from much-missed Evanizer is very useful.
posted by MiguelCardoso at 6:04 PM on November 25, 2003


I *heart* evanizer.

Fuck that up mr. metafilter textbox man!
posted by zpousman at 8:14 PM on November 25, 2003



posted by gen at 11:08 PM on November 25, 2003


well, lookit...it works.
posted by gen at 11:08 PM on November 25, 2003


'I think it's mostly an issue with the JDBC/CFMX connection, since I'm pretty sure I'm storing stuff in MSSQL as utf-8 and I specify utf-8 for all form submissions'

WHAT.
THE.
FUCK.
MATT?

;-)
posted by i_cola at 2:05 AM on November 26, 2003


I'm pretty sure I'm storing stuff in MSSQL as utf-8 and I specify utf-8 for all form submissions
The character set utf-8 isn't available in MSSQL; if you want to work in UTF-8 you have to store the data in fields with a Unicode data type (nvarchar etc.) and convert to and from UTF-8 on the way in and out. The way I work in an ASP/SQL Server environment is pretty simple...
Form pages come in as UTF-8 (assuming someone hasn't changed the character set for some reason), and the ASP locale is set to UTF-8. The text fields in the database are nvarchar. Because the server knows that the string is UTF-8 and the field it is going to put it in is Unicode, the conversion is carried out automatically. Conversely, when you retrieve data from the database, it knows that the data is Unicode and that it has to convert it to UTF-8 when you do a response.write.
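The same round trip spelled out by hand in Python, as an illustration only. ASP does this conversion for you; here UTF-16 little-endian stands in for the Unicode column's storage format (an assumption for the sketch), and `store` and `fetch` are hypothetical names.

```python
def store(form_bytes: bytes) -> bytes:
    """Hypothetical inbound step: UTF-8 form data to Unicode column bytes."""
    text = form_bytes.decode("utf-8")  # decode once, at the boundary
    return text.encode("utf-16-le")    # assumed column storage format


def fetch(column_bytes: bytes) -> bytes:
    """Hypothetical outbound step: Unicode column bytes back to UTF-8."""
    return column_bytes.decode("utf-16-le").encode("utf-8")


submitted = "\u65e5\u672c".encode("utf-8")  # the two kanji 日本 as UTF-8
column = store(submitted)
print(len(submitted), len(column))   # 6 4
print(fetch(column) == submitted)    # True
```

Nothing is lost as long as each side knows which encoding the other is using.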
posted by chill at 3:04 AM on November 26, 2003


No, i_cola, that's WTF-M@ encoding.

HTH.
posted by arto at 4:34 AM on November 26, 2003


?????!
posted by SweetJesus at 10:36 AM on November 26, 2003


> I'm pretty sure I'm storing stuff in MSSQL as utf-8 and I specify utf-8
> for all form submissions. The character set utf-8 isn't available in MSSQL,
> if you want to work in UTF-8 you have to store the data in fields with a
> Unicode data type (nvarchar etc) and convert to and from UTF-8 on the
> way in and out. The way I work in an ASP/SQL Server environment is
> pretty simple...

Understood, you bet. But it's really hard to write it all down without those Martian characters.
posted by jfuller at 11:03 AM on November 26, 2003


私は漢字を好む。

Character entities work fine (for example: &#x65e5;&#x672c; = 日本) but it's sort of a pain to figure out what code goes with what character. And when you hit preview it wipes all your codes from the entry window so you have to type them in again.
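For looking up the codes, a short Python sketch can generate them. `to_hex_entities` is a made-up helper name; it leaves ASCII alone and turns everything else into a hexadecimal character reference.

```python
import html


def to_hex_entities(text: str) -> str:
    """Hypothetical helper: replace non-ASCII characters with &#x...; codes."""
    return "".join(
        f"&#x{ord(ch):x};" if ord(ch) > 127 else ch
        for ch in text
    )


encoded = to_hex_entities("\u65e5\u672c")  # the two kanji 日本
print(encoded)                 # &#x65e5;&#x672c;
print(html.unescape(encoded))  # 日本
```

The output is pure ASCII, so it can be pasted into the comment box as-is.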
posted by bobo123 at 10:46 PM on November 26, 2003


????????.
posted by punishinglemur at 3:11 PM on November 30, 2003


??
日本
posted by punishinglemur at 3:11 PM on November 30, 2003

