Accents and posting January 28, 2004 7:39 PM

A boring, very minor quibble with a view to absolute perfection: accent xenophobia continues to run rampant in MetaFilter's and MetaTalk's posting boxes. [More inside.]
posted by MiguelCardoso to Bugs at 7:39 PM (43 comments total)

This isn't really a problem with Safari (as one can just go back to the original text, take out the accents, tildes, cedillas and repost) but it is with other browsers, especially on MeTa. In those cases, the part of the text after the offending accent or character is lost forever.

Still, given MeFi's increasingly international status, it sure would be nice for us foreigners and ortographic sticklers to be able to spell foreign words and proper names correctly in our posts. There's no such problem with the comment boxes, so I'm hoping it's a minor fix. If not (or if it's a Safari thing), please disregard.

I should add that I'm not talking about importing and pasting illegal characters, but typing them out properly.
posted by MiguelCardoso at 7:39 PM on January 28, 2004


solution: write in the lord's chosen language, english.
posted by mathowie (staff) at 8:24 PM on January 28, 2004


Metafilter: A boring, very minor quibble with a view to absolute perfection.

write in the lord's chosen language, [E]nglish.

While they may be here because of borrowing, doesn't English actually have some words with accents?

Now if you'll excuse me, I have to go work on my resume...
posted by namespan at 8:33 PM on January 28, 2004


Well!

So this is how you reward the Portuguese for discovering bloody California?

That does it. I'm wiring Iceland so that nobody speaks a word of English to you when you arrive.

*hauls mathowie's ass before the United Nations' Commission For Multicultural Insensitivity, fully expecting he should receive a stiff letter, on tough recycled cardboard, from Kofi Annan*
posted by MiguelCardoso at 8:38 PM on January 28, 2004


Don't you mean resume working on your resume.
posted by anathema at 8:38 PM on January 28, 2004


?
posted by anathema at 8:39 PM on January 28, 2004


Well, I'd agree with you anathema, but there is that name of yours...
posted by namespan at 8:48 PM on January 28, 2004


Is ortography a means of representing orts? If so, (*looks around*) I don't think we require its assistance.
posted by Opus Dark at 8:55 PM on January 28, 2004


Ok, so seriously, to all the programmers out there, this wheel must have been invented countless numbers of times but I never tried it myself. Is this what would solve all the problems:

when submitting a comment, replace all instances of & with &amp; unless there is text immediately to the right of & (like &nbsp;, or &lt;)

Would that work in all browsers? I seem to remember Safari escapes things in textareas on its own, and everyone in safari would be double-escaping.
posted by mathowie (staff) at 9:02 PM on January 28, 2004
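
A minimal sketch of the rule described in the comment above, in Python. It is only an illustration: the function name and the definition of "an entity" (a name or a #number followed by a semicolon) are assumptions, and nothing here is MetaFilter's actual code.

```python
import re

# Assumption: an entity is "&", then a name or a "#" number, then ";".
ENTITY_TAIL = r"(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);"

# Matches an ampersand that is NOT the start of such an entity.
BARE_AMPERSAND = re.compile(r"&(?!" + ENTITY_TAIL + ")")


def escape_bare_ampersands(text: str) -> str:
    """Replace & with &amp; unless it already begins an entity."""
    return BARE_AMPERSAND.sub("&amp;", text)


print(escape_bare_ampersands("fish & chips, cora&ccedil;&otilde;es"))
```

Text that already contains &amp; is left alone, so running the function twice gives the same result as running it once.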


To be slightly more helpful, there's always things like the ISO Latin Character Entities, for those whose corações long for orthographic perfection.
posted by namespan at 9:03 PM on January 28, 2004
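
For anyone who would rather not look entities up by hand, here is a small Python sketch of the conversion in both directions. The function names are made up for illustration; the standard-library calls (`str.encode` with the `xmlcharrefreplace` error handler, and `html.unescape`) are real.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Turn every non-ASCII character into a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_entities(text: str) -> str:
    """Turn named or numeric entities back into the characters they stand for."""
    return html.unescape(text)


print(to_numeric_entities("corações"))
print(from_entities("cora&ccedil;&otilde;es"))
```

The output of the first function is plain ASCII, so it should survive a form that mangles accented characters, provided the form does not also escape the ampersands.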


Hmmm. I guess I didn't preview closely... mathowie, your logic looks right but... apparently has already been implemented or isn't needed, because the entities seemed to have worked fine for me. Maybe you're not grabbing &#'s?
posted by namespan at 9:08 PM on January 28, 2004


Ah, namespan: corações - that's probably the single most important word in the Portuguese language and culture. Get that and you get everything. Thanks!
posted by MiguelCardoso at 9:19 PM on January 28, 2004


We've already had several useful threads about how to render special characters and accents. My (I repeat) minor and minority pony is that posters could just type those characters in when posting and see them rendered as they were typed, same as what happens with comments.

About 10 years ago I was deeply involved in a (doomed) movement to keep separate Portuguese and Brazilian spellings, like what happens with English and American English. Brazilians are keener than we are on doing away with accents.

With great effort, I came up with this example:

Tens cágado no jardim?

(Do you have a tortoise in your garden?) ("Cágado" is "tortoise")

would become, according to the new convention:

Tens cagado no jardim?

(Have you been shitting in your garden?) ("Cagado" is the past participle of "Cagar", "To shit").

I'm sorry to say that, despite all this, our movement lost, so now the two sentences are the same. :(

Let not MetaFilter go down the same slippery slope! ;)
posted by MiguelCardoso at 9:39 PM on January 28, 2004


Dumb question: can't you just URLEncode everything?
posted by timeistight at 10:23 PM on January 28, 2004
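
To show what URL-encoding an accented word looks like, here is a short Python sketch using the standard `urllib.parse` module. It assumes nothing about how the site itself handles form data; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

word = "corações"

# quote() percent-encodes the UTF-8 bytes of the text by default.
utf8_encoded = quote(word)

# The same word encoded as Latin-1 bytes gives a different result.
latin1_encoded = quote(word, encoding="latin-1")

print(utf8_encoded)
print(latin1_encoded)
print(unquote(utf8_encoded))
```

The catch is visible in the two encoded forms: whoever decodes the text has to assume the same byte encoding as whoever encoded it, otherwise the accents come back wrong.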


Miguel, I have several friends whom you remind me of: irrepressibly and gloriously bi/trilingual. I love 'em, and, by extension, you. I hate to see your wings clipped over something so petty as character sets. I will say it's fairly one bitch of a hassle though, having managed my way through similar problems.

That said: find a site that lets you post in whatever character set you want all you want in whatever browser you want. Something server-side that's hiccup-free, platform agnostic and won't turn a Portuguese "fig" into a Brazilian "fart." We'll start a conversation with those engineers and figure out the smartest way for Matt to implement it (at his convenience, of course). But let's figure out something smart first. No need for the Haughey to reinvent a wheel everyone and their grandmother is probably using already. So just find one that works the way you want. Pick one, any one.

You know, though, while you're at it, I think it might take more than accent marks to banish multi-lingual ambiguity from, and bring universal idiomatic flavor to MetaFilter... I'm thinking haiku in original Japanese, perhaps even some calligraphic capability for posting in Arabic that would allow my ancestors' tendency to inspire each word with pictographic emphasis to take root in my MetaFilter comments and render correctly.

Actually... considering the weight distribution on this, I think perhaps timeistight's suggestion is a good one. Zero implementation time, browser-side extensibility! Well, that is, at least as an interim solution. It could be a lot worse than URL-encoding, though, Miguel. My best guess is I'll be posting in my calligraphic GIFs for some time to come.
posted by scarabic at 11:00 PM on January 28, 2004


The algorithm I used on my site to escape special characters but permit character entities was to replace all &s with &amp;, and then replace &amp;([[:alnum:]]); with &$1; (unescape any escaped entities).

But as for what goes into the <textarea/>, just escape all the ampersands and leave it at that.
posted by Khalad at 11:17 PM on January 28, 2004
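
A rough Python rendering of the two-step approach described in the comment above. This is a sketch, not the code from Khalad's site: the function names are invented, and the pattern is an assumption that accepts one or more alphanumeric characters with an optional leading # so that numeric entities also pass through.

```python
import re

# Assumption: an escaped entity looks like "&amp;" + optional "#" + alphanumerics + ";".
ESCAPED_ENTITY = re.compile(r"&amp;(#?[A-Za-z0-9]+);")


def escape_keeping_entities(text: str) -> str:
    """Escape every ampersand, then unescape the ones that began an entity."""
    escaped = text.replace("&", "&amp;")
    return ESCAPED_ENTITY.sub(r"&\1;", escaped)


def for_textarea(text: str) -> str:
    """For redisplay inside a textarea: escape every ampersand and stop there."""
    return text.replace("&", "&amp;")


print(escape_keeping_entities("fish & chips, cora&ccedil;&otilde;es"))
print(for_textarea("cora&ccedil;&otilde;es"))
```

Escaping everything first and then undoing the entity cases keeps the pattern simple, since the second step only has to recognise one fixed prefix.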


Touché, Miguel--you rapscallion, you. I, too, would like to see this implemented, if at all possible, if only because the thought of an unhappy Miguel is too much for this world.

Also, this way I can display my broad range of improperly used foreign words.
posted by The God Complex at 11:44 PM on January 28, 2004


That said: find a site that lets you post in whatever character set you want all you want in whatever browser you want

Hey, thanks, scarabic and other fellow members!

eGullet* is the community weblog I know and belong to which is closest to absolute technical perfection. They've sorted out the accents and characters long ago - well, they would have, wouldn't they? We're talking (and heretofore I have to omit all accents) puree [should be an accent on the first "E"], balchao [should be a tilde on the second "A"] and cacao [Should be a cedilla on the "C" and a tilde on the "A", to mean "shark"], rather than cacao, which means cocoa.

All I'm asking for here is a means to distinguish fish from chocolate!

*Jason Perlow is a philanthropic genius, foodwise, webwise and codewise. He's the Matt That Roared, so to speak! ;)
posted by MiguelCardoso at 12:03 AM on January 29, 2004


corações - that's probably the single most important word in the Portuguese language and culture.

Ta fazendo ano e meio amor
Que o nossa lar desmoronou
Meu sabiá meu violão
E uma cruel desilusão
Foi tudo que ficou
Ficou para machucar meu coração

Ta fazendo ano e meio amor
Que o nossa lar desmoronou
Meu sabiá meu violão
E uma cruel desilusão
Foi tudo que ficou
Ficou para machucar meu coração

Quem sabe não foi bem melhor assim
Melhor pra você e melhor pra mim
A vida é uma escola
que a gente precisa aprender

A ciência de viver pra não sofrer


(yeah, i know, brazilian, but you said language as well as culture - and joao gilberto is certainly anything if not universal)
posted by PrinceValium at 12:13 AM on January 29, 2004


Actually, I now have to admit that I have almost no idea what this thread is about or is ideally attempting to accomplish, but I like the word rapscallion so much that I decided to post. I'm a reprobative villain, I admit--or perhaps I'm a villainous reprobate, depending on the day of the week and the turn of the wheel.

My eloquent and important point stands, though, which is what the hell is going on and if you can't post accents in posts, then why does this thread exist? Or is it only certain accents?
posted by The God Complex at 12:18 AM on January 29, 2004


" I have almost no idea what this thread is about or is ideally attempting to accomplish"

mission accomplished, Miguel
;)


by the way, I've always loved how the meaning of the word "polish" totally changes if you write it with a capital P

posted by matteo at 4:37 AM on January 29, 2004


Miguel are you being paid by Safari for product placement? The public has a right to know.
posted by johnny novak at 4:48 AM on January 29, 2004


sorry, Apple.
posted by johnny novak at 4:50 AM on January 29, 2004


and heretofore I have to omit all accents

For heretofore read henceforth, I believe. That aside, I am in full support of the Movement to Import Graves and Umlauts Everywhere Liberally. And while we're at it, Ênglish could üse a few møre âccents.
posted by languagehat at 9:28 AM on January 29, 2004


Wi nøt trei a høliday in Sweden this yër? See the løveli lakes, the wøndërful telephøne system...
posted by DrJohnEvans at 10:11 AM on January 29, 2004


by the way, I've always loved how the meaning of the word "polish" totally changes if you write it with a capital P

There's a great old joke about that but if I repeated it here, I'd probably be hung.
posted by jonmc at 10:34 AM on January 29, 2004


Well hung, obviously...

/acknowledges fellow innuendo lover
posted by dash_slot- at 11:08 AM on January 29, 2004


Miguel is so cliché.
But thanks for the new example of foreign language profanity (cagar) I can use on obnoxious monolingual Americans, you tortoisehead.

Thank you too, DrJøhn.

And jonmc, the only polish joke I know was when Martin Mull came up with the perfect name for a white bluesman: Blind Lemon Pledge. (and dash beat me to the obvious hung joke)
posted by wendell at 11:11 AM on January 29, 2004


And there is _NO_ missing comma in that sentence, miguel, amberglow, languagehat and all other logophiles!
posted by dash_slot- at 11:13 AM on January 29, 2004


The punch line of mine has to do with shit and shinola. You figure it out...
posted by jonmc at 11:30 AM on January 29, 2004


I think you need latin 1 encoding, as UTF8 does not support a lot of the extra characters.

Whether this can be changed as a shell environment or server locale, I don't know.
posted by the fire you left me at 12:43 PM on January 29, 2004


And there is _NO_ missing comma in that sentence, miguel, amberglow, languagehat and all other logophiles!


I could have sworn there was, and even another place I would have put one; the latter would only be a suggestion, of course ;)
posted by The God Complex at 1:04 PM on January 29, 2004


My (I repeat) minor and minority pony is that posters could just type those characters in when posting and see them rendered as they were typed, same as what happens with comments.

See, I agree with Miguel. It would be really nice to be able to just write out the special characters without knowing the terms UTF8, ISO, URL encoding etc etc.

Like, a set of clickable shortcuts just like the ones we have now for bold, italics and link tags. Why, even a mefi user could write this as an offsite tool, no? A mefi comment-maker with support for foreign characters.
posted by vacapinta at 1:11 PM on January 29, 2004


Or you could just do what I did and spend a few years, at fourteen, playing online games and learning all the numeric keypad codes for them.

Ÿöû® ßá§ê ã®ë ßé£õñg †ö ù§, ñêWßìéž
+----+----+----+---- +----

I think I was robbed of my youth.
posted by The God Complex at 1:34 PM on January 29, 2004


It would definitely be nice to have support for other character sets... I've gotten "?"s from trying to post in Japanese, even though everything looked fine on preview. It would be really great to be able to type in Japanese and have MeFi display it correctly. This said, I think there is some sort of workaround for this, but I've never been able to get it to work. Anybody got a concise explanation of how to correctly post Japanese to MeFi?
posted by vorfeed at 2:57 PM on January 29, 2004


TGC, you were fourteen for a few years? You can bottle that stuff and sell it, you know.
posted by PrinceValium at 2:59 PM on January 29, 2004


Why oh why do I get the impression that my little pony looks like it's going to stay single and lonely? ;)
posted by MiguelCardoso at 3:04 PM on January 29, 2004


dash : >

ánd î'm wíth Mígüél on this
posted by amberglow at 3:58 PM on January 29, 2004


I think you need latin 1 encoding, as UTF8 does not support a lot of the extra characters.

Are you positive? I'm pretty sure that UTF-8 supports many more characters than Latin-1, since UTF-8 is a Unicode encoding. Latin-1 can represent about 191 characters, while UTF-8 represents, I don't know, a bazillion, so in my thinking, at least, UTF-8 is the more versatile and thus preferable character encoding.
posted by cobra libre at 8:39 PM on January 29, 2004
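
The difference is easy to check. Here is a short Python sketch, with the sample words chosen purely for illustration, that encodes a Portuguese and a Japanese word both ways.

```python
portuguese = "corações"
japanese = "俳句"  # "haiku"

# Latin-1 covers the Portuguese accents, one byte per character.
latin1_bytes = portuguese.encode("latin-1")

# UTF-8 covers them too; the two accented letters take two bytes each.
utf8_bytes = portuguese.encode("utf-8")

# UTF-8 also handles Japanese.
japanese_utf8 = japanese.encode("utf-8")

# Latin-1 cannot; with errors="replace" each character becomes "?".
japanese_latin1 = japanese.encode("latin-1", errors="replace")

print(len(latin1_bytes), len(utf8_bytes))
print(japanese_utf8.decode("utf-8"))
print(japanese_latin1)
```

The last line prints question marks, which is at least consistent with the "?"s reported earlier in the thread when posting Japanese, though the sketch says nothing about where in the site's pipeline that conversion happens.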


Thanks, free snake!
posted by MiguelCardoso at 9:25 PM on January 29, 2004


TGC, you were fourteen for a few years? You can bottle that stuff and sell it, you know.

I spent five years at fourteen. At ten, they said I was four years ahead of the pace, and for the next four I was static, mindlessly clicking away at Diablo and memorizing numeric keypad codes in order to righteously vanquish my online foes in a battle of accents, umlauts, and other assorted character keys.

I'm pretty sure it destroyed any chance I had to make something of myself. You can catch me on an upcoming episode of Dr. Phil with Joe Lieberman.
posted by The God Complex at 2:40 PM on January 30, 2004


I blame dueling Persians
posted by clavdivs at 3:00 PM on January 30, 2004


??????????: ???????? ?? ????????? ????? ????, ?????????? ?????
posted by TimeFactor at 5:50 PM on January 30, 2004

