Live preview fixes some things but they still want more.... July 21, 2005 12:34 PM

The live preview window seems to have fixed the old character entity problem... kinda. As long as you use hexadecimal character codes, things turn out fine. The annoying days of "????" "Oops, I meant ????" "Shit, make that ????" are almost over.

Since we're already so close, could we get one of the following?
1) Put a note under the text box reminding new users to use hex codes only.
2) Fix it so character names and decimal character codes work too.
posted by nebulawindphone to Feature Requests at 12:34 PM (37 comments total)

(FWIW, Matt, please take this as a suggestion and not a complaint. The live preview window rocks, and I'm thrilled that character entities work better than they used to.)
posted by nebulawindphone at 12:35 PM on July 21, 2005


So when you say "hex codes work" do you mean you can get stuff to show up if you write ampersand-pound sign-number code-semicolon?

What is a decimal character code? character name?

Show me one example of each, with the one that worked and the two that didn't, along with the exact text you entered. I need to see a pattern of what works and doesn't.

Feel free to post the text somewhere else and link it here, if the preview is just going to munge it up anyway.
posted by mathowie (staff) at 12:52 PM on July 21, 2005


mathowie : "when you say 'hex codes work' do you mean you can get stuff to show up if you write ampersand-pound sign-number code-semicolon?"

Yes, exactly, as long as they're in hex notation, not decimal notation.

Here's an example of the sentence "I am bugbread" in Japanese.

First, hex:
私は虫パン

Next, decimal:
私は虫パ

Unfortunately, I don't know what "character code" or "character name" mean either.

Note that in preview, both look fine, but when posted, only the hex one shows up fine.

If you want to test it yourself, matt, but don't have an IME for entering foreign text, try this:

Go to Yahoo Korea or Japan or whathaveyou, and copy the text so that you have something to fiddle with.

Then go to this site, and paste the text into the top right box. Then click anywhere outside of that box, and you will have that text converted into character numbers (character code?), UTF-8, UTF-16, hexadecimal NCRs, and decimal NCRs. You can then preview or post them, and compare them to the original pasted string, to see if it works. No foreign language skills necessary!
posted by Bugbread at 12:58 PM on July 21, 2005
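
The conversion Bugbread describes (text in, hexadecimal and decimal NCRs out) can be sketched in a few lines of Python. This is only an illustration, not how the linked converter works; the function names `to_hex_ncr` and `to_decimal_ncr` are made up for this example.

```python
import html


def to_hex_ncr(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal reference (&#x...;)."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


def to_decimal_ncr(text: str) -> str:
    """Replace each non-ASCII character with a decimal reference (&#...;)."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


sample = "私は虫"
hex_form = to_hex_ncr(sample)
decimal_form = to_decimal_ncr(sample)
print(hex_form)      # &#x79C1;&#x306F;&#x866B;
print(decimal_form)  # &#31169;&#12399;&#34411;

# Both forms decode back to the same text in a standards-following HTML parser.
assert html.unescape(hex_form) == sample
assert html.unescape(decimal_form) == sample
```

Both outputs are plain ASCII, so either one can be pasted into the comment box; per this thread, only the hex form survives posting.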


Unicode characters
If you type: 김치
Live preview shows: 김치
Post shows: ??

Hexadecimal codes
If you type: 김치
Live preview shows: 김치
Post shows: 김치

Decimal codes
If you type: 김치
Live preview shows: 김치
Post shows: 김치
posted by 김치 at 1:00 PM on July 21, 2005


Ah, common sense intervenes. Since hex is converted, posting it as-is doesn't show you much. I should have said:

This hex:
& #x79C1;& #x306F;& #x866B;
with the spaces after the & symbols removed, shows up as:
私は虫
posted by Bugbread at 1:01 PM on July 21, 2005


Sorry, for that last one (Decimal codes) it displays as 김치 in preview, 김치 in post. D'oh!
posted by 김치 at 1:01 PM on July 21, 2005


(Ah, I see I made a mistake up above. I thought that &lt; and &eacute; and their ilk were broken too, but they appear to work. So it's just the decimal character codes, the ones bugbread describes, that are broken.)
posted by nebulawindphone at 1:03 PM on July 21, 2005


Someone is going to need to do a plain text file on another server or take a screenshot or something. I still don't get what the problem is, apart from ampersands get parsed and utf-16 or utf-8 stuff probably can't be fetched or stored in the database.
posted by mathowie (staff) at 1:03 PM on July 21, 2005
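
One plausible explanation for the "??" results, offered purely as a guess: if the posted text passes through a single-byte character set somewhere on the way to the database, every character that set cannot represent gets replaced with a question mark. The thread does not say what MetaFilter's backend actually does, so the charset below (`latin-1`) and the helper name `simulate_store` are assumptions for illustration only.

```python
def simulate_store(text: str, charset: str = "latin-1") -> str:
    """Round-trip text through a single-byte charset, replacing anything
    the charset cannot represent with '?' (hypothetical stand-in for a
    non-Unicode storage layer)."""
    return text.encode(charset, errors="replace").decode(charset)


# Raw Unicode characters are lost: one '?' per unrepresentable character.
print(simulate_store("김치"))               # ??

# A hex reference is plain ASCII, so it passes through untouched.
print(simulate_store("&#xAE40;&#xCE58;"))  # &#xAE40;&#xCE58;
```

This would explain why raw Unicode fails while ASCII-only references can survive, but it does not explain why decimal references fail; that part presumably comes from the server-side filtering.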


김치 : "Decimal codes
"If you type: 김치
"Live preview shows: 김치
"Post shows: 김치"


That's not what I get. With those same symbols, I get:

If you type: 김치
Live preview shows: 김치
Post shows: 김치
posted by Bugbread at 1:03 PM on July 21, 2005


D'oh! Already foiled by the preview! My correction of 김치's preview-post display was started before he posted his mea-culpa, and posted afterwards. I guess I need to go back to non-live-preview.
posted by Bugbread at 1:05 PM on July 21, 2005


I guess it's also worth mentioning whether or not these special characters add anything. I mean, I know it's kind of a fun programmer problem to solve to get them to work, but are there really that many comments on MetaFilter that need to be in special character sets?

Aside from "I [heart symbol] foo" comments?
posted by mathowie (staff) at 1:07 PM on July 21, 2005


mathowie : "Someone is going to need to do a plain text file on another server or take a screenshot or something. I still don't get what the problem is, apart from ampersands get parsed and utf-16 or utf-8 stuff probably can't be fetched or stored in the database."

Ok, here ya go:



The problem isn't so much that certain codes can or can't be used, as that the preview window makes you think certain codes can be used, but when you push "Post", they convert to ?? or &#blahblahblah. If they previewed as broken, there would be less, "???" "I meant, ???" "Dammit! Lemme try again. ????" runs. And if there was a note pointing out that hex codes should be used, people wouldn't have to trial-and-error it until they figured it out.
posted by Bugbread at 1:19 PM on July 21, 2005


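Here's a summary of the four ways to enter a special character:

1. Type the Unicode character directly.
2. Type an ampersand, then an abbreviated name for the character, then a semicolon (e.g. &eacute;).
3. Type an ampersand and a pound sign, then a decimal number that corresponds to the character, then a semicolon (e.g. &#233;).
4. Type an ampersand, a pound sign, and an x, then a hexadecimal number that corresponds to the character, then a semicolon (e.g. &#xE9;).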

2 and 4 work reliably. 1 and 3 work in the live preview window, but not in the final post. Fixing 1 might be hard, but fixing 3 seems like it should be easy enough.

From my point of view, the real reason to fix this would be to make foreign-language text easier to post. I can see that being especially useful in AskMe, where people might want to put up translation questions and the like. But it's up to you whether that counts as a good reason.
posted by nebulawindphone at 1:23 PM on July 21, 2005
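
Since methods 1 and 3 fail only at posting time, one workaround would be to rewrite them into method 4 before the text is submitted. A minimal sketch, assuming Python and a hypothetical helper called `normalize_to_hex`; nothing here reflects MetaFilter's actual code.

```python
import re

# Matches decimal references such as &#233; (method 3).
_DECIMAL_REF = re.compile(r"&#([0-9]+);")


def normalize_to_hex(text: str) -> str:
    """Rewrite decimal references and raw non-ASCII characters as hex references.

    Named references (&eacute;) and existing hex references are left alone,
    since the thread reports that both already work when posted.
    """
    # Method 3 -> method 4: &#233; becomes &#xE9;
    text = _DECIMAL_REF.sub(lambda m: f"&#x{int(m.group(1)):X};", text)
    # Method 1 -> method 4: a raw character becomes its hex reference
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


print(normalize_to_hex("caf&#233;"))   # caf&#xE9;
print(normalize_to_hex("café"))        # caf&#xE9;
print(normalize_to_hex("caf&eacute;")) # caf&eacute;
```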


mathowie : "I guess it's also worth mentioning whether or not these special characters add anything...are there really that many comments on MetaFilter that need to be in special character sets?"

Mebbe, mebbe not. But I think nebulawindphone (and myself) are just looking at it from a "we're so close!" standpoint. If it's a pain-in-the-butt, it's not worth it, but while I think
2) Fix it so character names and decimal character codes work too.
may be a tough order,
1) Put a note under the text box reminding new users to use hex codes only.
would be painless enough that it would justify the (er...your) effort.
posted by Bugbread at 1:24 PM on July 21, 2005


but are there really that many comments on MetaFilter that need to be in special character sets?

Certainly, there are many current multilingual Metafilter users who are interested in this possibility (Mr. Hat perhaps being the most visible). I think if the capability were added (and not just using the hex workaround, but really added, so that it was possible to type and manipulate Unicode characters and not have them mangled in the output), then you would see a flourishing of communication that is not currently possible on Metafilter. Of course there's no way to measure this with any certainty, because the conversations you're asking about are currently technologically impossible here. But I'd wager that there is any number of users waiting in the wings who would be willing to add their own insights. Unfortunately, if your name is 史慧, you can't even currently register that as your Metafilter username.
posted by 김치 at 1:31 PM on July 21, 2005


I find the ???? just as meaningful as the actual characters would be.
posted by smackfu at 1:44 PM on July 21, 2005


김치 : "you would see a flourishing of communication that is not currently possible on Metafilter"

I understand what you're saying, but from what I understand, Mefi is meant to be, primarily, an English language board. Whether that's for better or for worse is a different issue, but I can't really see a practical way for Matt to do his administrative duties (pulling self-linking posts, pulling bad posts, blocking/deleting derails, noise, pepsi blue, etc.) on languages he can neither read nor understand. So, in that context, it wouldn't add all that much functionality, in that if it did open up a flourishing of communication, it would be a flourishing of communication out of the moderator's grasp, and therefore a flourishing that would have to be stopped, or all administration of posts dropped (which would suck).

There are some threads that it would be helpful to, though, in an English language context (i.e. threads in English about other languages). One I can think of off the top of my head is the post about folks getting tattoos in Asian languages, but not knowing what they mean. AskMe could also occasionally benefit (for folks who want to get a tattoo in an Asian language they can't read).

So, it has its potential applications, but opening up giant realms of discourse isn't really one of them.
posted by Bugbread at 1:49 PM on July 21, 2005


I can't really see a practical way for Matt to do his administrative duties (pulling self-linking posts, pulling bad posts, blocking/deleting derails, noise, pepsi blue, etc.) on languages he can neither read nor understand.

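Jag håller med insektsbrödet. Ska vi börja skriva på vilka språk som helst här blir det jävligt svårt att kommunicera med varandra. Tror inte att Matt tycker det är en så särskilt bra idé. [Translation: I agree with the insect bread. If we start writing in any language whatsoever here, it will get damn hard to communicate with each other. I don't think Matt thinks that's a particularly good idea.]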
posted by mr.marx at 2:23 PM on July 21, 2005


Hey! I'll have you know my mother is a fine upstanding woman, mr.marx. She would never do that.
posted by mathowie (staff) at 2:46 PM on July 21, 2005


=)
posted by mr.marx at 2:57 PM on July 21, 2005


So I knew that the live preview would differ slightly from the actual posting. I do filter out some HTML, and the live preview is essentially unfiltered.

But I don't see much of a benefit in making characters easier to post, for the reasons stated above: this site has to stay English-only as much as possible so that everyone can read it (and so that I can moderate it). So, I won't be working hard on fixing this.

I will eventually link to a note explaining the slight differences between the live preview and what actually gets posted: which HTML gets filtered out on the server, and which character encodings will work in a post and which won't.
posted by mathowie (staff) at 3:03 PM on July 21, 2005


You know, some people's names can't be spelled properly without special characters. If people want to quote the titles of books in Swedish or Turkish, they can't do so correctly without special characters. I seriously doubt making characters easier to post will lead to a flood of conversations in obscure languages; it's simply a courtesy to people who have need of such characters from time to time. I don't see why you're treating it as some kind of danger.

Damn, this live preview is the cat's pajamas!
posted by languagehat at 5:34 PM on July 21, 2005


I don't see why you're treating it as some kind of danger.

I'm not treating it as a danger -- it's a royal pain in the ass to solve the problem, and I'm merely asking for reasoning behind the hours that would go into testing and tweaking to make it work.
posted by mathowie (staff) at 6:01 PM on July 21, 2005


As someone who's worked for hours with someone who doesn't use foreign characters trying to get them to get their configuration files correct, I can vouch for the fact that it's a royal pain in the ass to solve the problem, and I totally understand matt's not desiring to fix it all up.

It should be easy to get things displayed properly: Mefi is in fact the only forum blog thingie I've ever seen that didn't handle non-ASCII normally...but that's crying over spilled milk. Mefi has been made, and fixing something that's already been made to allow it to work right with non-ASCII is far, far from easy, and requires a pretty compelling reason to justify the effort.
posted by Bugbread at 6:07 PM on July 21, 2005


I really like being able to use unicode characters. And there was once a comment, way back, where I was trying to explain something about the origin of kanji or something and I got totally shut down by the no-unicode thing.
On the other hand, I do also understand that adding non-ascii to an existing site is a pain. So. I dunno. Unicode is the way of the future! But the road to the future is paved with tired programmers.

I still like the idea of completely reimplementing MeFi from the ground up, but that would have to be a big huge open project if anything, and ... well, you know... Effort.

And by the way.. I think this live preview thing is dangerous. It's like seeing yourself on the big screen. It encourages me to type and type... which is weird, because it's not like I can't already see what I'm typing in the little editbox.
posted by blacklite at 9:48 PM on July 21, 2005


on non-preview, maybe this is a bad idea after all.
posted by blacklite at 10:01 PM on July 21, 2005


If people want to quote the titles of books in Swedish or Turkish, they can't do so correctly without special characters.

Actually, all the times I've seen ??? appear is when people were trying to do things like "Metafilter™" or "I ♥ Huckabees" or ∫(⅜dx)/(√dy).
posted by rafter at 10:04 PM on July 21, 2005


it's a royal pain in the ass to solve the problem, and I'm merely asking for reasoning behind the hours that would go into testing and tweaking to make it work.

Fair enough. Since I'm a complete ignoramus about coding, I did not know that. Thanks for explaining.
posted by languagehat at 9:04 AM on July 22, 2005


Here's a good example of a thread that might benefit.

Matt: I'm not trying to be a pain in the ass about this (which is why this will be my last comment on the subject). I've done some programming on this type of stuff and I realize that it's not trivial; but at the same time, I did that work because linguistic issues are important to me. Only you can decide whether they're important enough to you to put in that effort for Metafilter. I'm just trying to let you know what I think some of the reasons are.
posted by 김치 at 3:44 PM on July 22, 2005


A few points:
It certainly appears to me that it's not broken so much as not completely implemented. Kimchi and languagehat have been able to post what they want, as long as they use hex code.
bugbread's suggestion above to remind users to restrict their non-English to hex coding should be sufficient.
As a Japanophile, I like the fact that if necessary, I can post
?????
That it might be a slight pain in the butt to do so is acceptable, given the 'dangers' of a surfeit of essentially unmoderatorable (word?) text.
What would be great (and this is not for Matt to code) is some way to know what language a particular set of symbols is representing. I recognize most Asian languages, but the "Jag håller..." above - is that Finnish, Swedish, Norwegian? I don't know - and trying to translate an unknown language is mucho difficult on the web. A tag or parenthetic (Jpn) or (Fin) or (Norw), or whatever, might be a good idea.
posted by birdsquared at 4:39 PM on July 23, 2005


crap. I thought that was going to work - I used hex codes...&#x65E5&#x672C&#x304C&#x597D&#x304D
posted by birdsquared at 4:41 PM on July 23, 2005


日本が好き
I see I missed the semi-colons - like I said, a slight pain in the butt :-)
posted by birdsquared at 4:43 PM on July 23, 2005
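
The missing-semicolon slip is easy to catch mechanically. Here is a rough Python sketch of a checker that appends the semicolon to any hex reference lacking one; the name `add_missing_semicolons` is invented for this example, and the approach is only a heuristic.

```python
import html
import re

# A hex reference (&#x followed by hex digits) that is NOT followed by a
# semicolon. The lookahead also refuses to stop in the middle of a run of
# hex digits, so already-terminated references are never split.
_UNTERMINATED_HEX = re.compile(r"(&#x[0-9A-Fa-f]+)(?![0-9A-Fa-f;])")


def add_missing_semicolons(text: str) -> str:
    """Append ';' to every hex reference that lacks one."""
    return _UNTERMINATED_HEX.sub(r"\1;", text)


broken = "&#x65E5&#x672C&#x304C&#x597D&#x304D"
fixed = add_missing_semicolons(broken)
print(fixed)                 # &#x65E5;&#x672C;&#x304C;&#x597D;&#x304D;
print(html.unescape(fixed))  # 日本が好き
```

One caveat: if an unterminated reference is immediately followed by ordinary text that starts with a letter from a to f or a digit, the checker cannot tell where the number ends and will treat those characters as part of it.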


While I'm correcting myself time and again, I should note that the idea to put the hex reminder is not initially bugbread's, rather nebulawindphone suggests it in the original post. Mea culpa.
posted by birdsquared at 7:20 PM on July 23, 2005


birdsquared : "A tag or parenthetic (Jpn) or (Fin) or (Norw), or whatever, might be a good idea."

Good idea. I'll try to do that from now on. (Though I think I'll skip it for characters such as ℃, which, though I may get them from the Japanese character set, are not actually Japanese)
posted by Bugbread at 7:39 PM on July 23, 2005


Trying out another method of entering non-English text here. Pardon me if it becomes question marks.
どうでしょうか?ちゃんと映るのかな?
posted by Bugbread at 2:27 AM on July 25, 2005


And one more fiddle.
今回はいけるのかな?
posted by Bugbread at 2:27 AM on July 25, 2005


bugbread - tsk,tsk. Right after writing that you'll "try to do that from now on" - you post "entering non-English text" as opposed to "entering Japanese text" (which by definition is non-English).
/mini snark
posted by birdsquared at 10:11 AM on July 25, 2005

