Special characters October 8, 2002 1:18 PM
I “think” spéciàl chàràctérs shöuld wörk nöw, thöugh it's pröbably töö éàrly tö sày the böxés and quéstiön marks will be göné. That'll be £5 or 1000¥ or €3.5. posted by mathowie to Bugs at 1:18 PM (55 comments total)
Uh....fill in the blanks posted by thewittyname at 1:22 PM on October 8, 2002
Let's see if my favorite — the emdash (the — version) — works. How about the correct version (the — one) — does it work? posted by dayvin at 1:27 PM on October 8, 2002
Do you prefer ¢ or £?
[I prefer £, I have enough sense already!] posted by dash_slot- at 1:29 PM on October 8, 2002
old metatalk threads seem to work now, as do metafilter threads that had boxes in them. posted by mathowie(staff) at 1:33 PM on October 8, 2002
Me – I like the en-dash – either version.
Looks good. I hope the fix isn't what's slowing down the site. posted by timeistight at 1:34 PM on October 8, 2002
thanks, matt. posted by moz at 1:38 PM on October 8, 2002
Sniff. Well, very good work anyway, Matt! posted by yhbc at 1:39 PM on October 8, 2002
Excellent. Thanks for the improvement, Matt. posted by rocketman at 1:45 PM on October 8, 2002
Well, I'm glad the boxes are gone anyway, was it something you did or a patch to the server software? Any geeky details you'd care to share? posted by joemaller at 2:09 PM on October 8, 2002
¡Excelente, puedo ahora utilizar la primera marca del exclamation! posted by MrBaliHai at 2:25 PM on October 8, 2002
I had to change the database field types to "n" types (nvarchar instead of varchar, ntext instead of text, etc), as well as force utf-8 encoding, and I also had to install the latest service packs on the application server. It seems to be mostly working; the only problem I've found in testing is Hebrew. posted by mathowie(staff) at 2:45 PM on October 8, 2002
Let's try Korean! :-)
? ?? ?? ???.
Works in the preview, now the moment of truth... posted by Plunge at 2:58 PM on October 8, 2002
grrrrr... posted by Plunge at 2:59 PM on October 8, 2002
Nevermind, then..heh :) posted by puffin at 3:14 PM on October 8, 2002
ok, so language issues still persist. Most problems are solved though. If you're seeing the characters in preview, the problem lies with the database drivers still. posted by mathowie(staff) at 3:15 PM on October 8, 2002
The difference in the character entity versions and the unicode is that your browser interprets the entities, SQL server and Cold Fusion have to interpret the unicode versions. Hence, the ???? when you try to do it without the entities. posted by eyeballkid at 3:44 PM on October 8, 2002
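A rough sketch of that point, in Python rather than the site's own SQL Server and Cold Fusion stack. The function names are made up for illustration, and cp1252 is only an assumed code page, not a confirmed detail of the site. It shows how a layer that cannot represent a character swaps in "?", while the entity form is plain ASCII and passes through untouched:

```python
# Illustration only: Python stands in for the Cold Fusion / SQL Server path,
# and cp1252 is an assumed code page, not a confirmed detail of the site.

def store_via_code_page(text: str, code_page: str = "cp1252") -> str:
    """Simulate a layer that forces text through a single-byte code page."""
    # Anything the code page cannot represent is replaced with "?".
    raw = text.encode(code_page, errors="replace")
    return raw.decode(code_page)


def as_entities(text: str) -> str:
    """Spell out non-ASCII characters as numeric character references."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


korean = "한국어"  # three Hangul syllables
print(store_via_code_page(korean))               # ???
print(store_via_code_page(as_entities(korean)))  # &#54620;&#44397;&#50612;
print(store_via_code_page("café"))               # café (é exists in cp1252)
```

The browser turns the entities back into characters on display, which is why that version looks right even though the raw characters never survive the trip.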
I could make some snarky comment on how this must be a plot by mathowie against the Hebrew speaking Jews and the Asians (with Korean & Japanese not posting, Chinese most likely doesn't either) to keep us down, but instead I'll say good job for even getting it this far. I'm sure you'll get it working with all the different languages and then boy, are we in trouble.
UNICODE data types in MS SQL Server only support characters from the code page assigned at the SQL Server install (though 2000 supports the use of separate collations for each database and separate collations for queries). If I'm right, the default install code page is 1252. That page only supports, I believe, the English alphabet and variations thereof (Italian, Spanish, etc.). I believe that the UNICODE set required for multiple languages is 850.
Still, I can't find a listing for the 850 code page anywhere in the MSDN or in SQL's BOL, so I'm not positive that it's the right page. I do know that 1252 won't handle anything except the variants I mentioned above (and, I believe, the Greek alphabet).
Fantastic - but now all the apologies for weird characters appearing in posts are going to look silly :-) posted by dg at 4:48 PM on October 8, 2002
Hey, it has the invisible characters working too now. posted by XQUZYPHYR at 6:43 PM on October 8, 2002
mathowie.. I wouldn't worry too much about getting the language characters to show up.
As nifty as it would be to post Japanese in hiragana, most users probably wouldn't be able to see the characters anyway because their computers don't have the language support installed. And hey -- this is an English-language web site, after all.
The first person that tries to post in Japanese is going to get a severe wood-shed-ass-hauled-into-metatalking-to. posted by insomnyuk at 7:28 PM on October 8, 2002
conichi-wa, insomnyuk (couldn't resist!) posted by amberglow at 7:30 PM on October 8, 2002
bobo: that second character ain't quite right posted by hama7 at 8:58 PM on October 8, 2002
????????? ???? posted by dydecker at 11:20 PM on October 8, 2002
Regular Japanese input doesn't work. posted by dydecker at 11:23 PM on October 8, 2002
No conspiracy, Plunge. bobo123 is using character entities, which means that instead of typing a character straight into the comment box, you type in the code for the character, in this format:
&#26412;
which would give you 本.
The database has no problem storing the character entities, because they're just numbers and punctuation, but the current software seems to choke if you enter the actual symbols, and produces all those ?s.
So if you're really desperate to put the odd foreign character in, that's how to do it (although remember it won't display in everyone's browser). Also watch out that when you type in a character entity, and then hit preview, it converts the code into a regular, typed-in character. You'll have to turn it back into a code before you post. posted by chrismear at 11:41 PM on October 8, 2002
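For anyone who would rather not look the codes up by hand, here is a small Python sketch of the round trip described above. It is only an illustration: `to_entities` is a made-up helper name, and nothing here is part of the site's software. It also mimics the preview gotcha, where the code comes back as a typed-in character and has to be converted again before posting:

```python
import html


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with its numeric character entity."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


typed = "&#26412;"                 # what you type into the comment box
previewed = html.unescape(typed)   # what preview hands back: the real character
print(previewed)                   # 本

# Convert back to the code before posting, otherwise the "?" problem returns.
ready_to_post = to_entities(previewed)
print(ready_to_post)               # &#26412;
```

Plain ASCII text passes through `to_entities` unchanged, so it should be safe to run over a whole comment.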
Åwøsæme! posted by dagny at 1:16 AM on October 9, 2002
UNICODE data types in MS SQL Server only support characters from the code page.
That's not true actually. In SQL Server, Unicode fields are Unicode, pure and simple. The collation refers to the character set/sort order of non-unicode fields (e.g. varchar instead of nvarchar).
Matt, question marks (as opposed to blocks), and the fact that the characters work in preview, indicate to me that the text has not made it into the database correctly. Can you see Hebrew characters in there?
I know nothing of Cold Fusion, but in ASP, the first thing to check would be that the server codepage is set to UTF8, this basically tells ASP that when it is response.writing unicode data from the database, it should convert it to UTF8, and that any input it receives from form data should be considered UTF8 and converted to Unicode accordingly.
I know nothing of your internationalization skills, so I apologize if a certain phrase about sucking eggs comes to mind. posted by chill at 2:09 AM on October 9, 2002
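A minimal Python sketch of why the codepage setting matters, assuming the browser submits the form as UTF-8. `read_form_field` and its `codepage` parameter are hypothetical stand-ins for whatever the application server does internally, not ASP or Cold Fusion APIs:

```python
def read_form_field(raw: bytes, codepage: str) -> str:
    """Decode submitted form bytes using the server's configured codepage."""
    return raw.decode(codepage, errors="replace")


# Bytes as a browser would send them for a UTF-8 page.
submitted = "é".encode("utf-8")

print(read_form_field(submitted, "utf-8"))   # é
print(read_form_field(submitted, "cp1252"))  # Ã© (each byte read as its own character)
```

If the server guesses the wrong codepage on input, the text is already damaged before it reaches the database, no matter how the fields are typed.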
this is all just a cunning plan to reduce the maximum comment size from 8000 to 4000 characters... ;o) posted by andrew cooke at 6:01 AM on October 9, 2002
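The arithmetic behind the joke, as a hedged Python sketch: it assumes an 8000-byte budget per field, one byte per character in the old varchar fields, and two bytes per character in the "n" types. `FIELD_BYTES` and `max_chars` are names invented for this example:

```python
FIELD_BYTES = 8000  # assumed per-field byte budget


def max_chars(bytes_per_char: int) -> int:
    """How many characters fit in the field at a fixed width per character."""
    return FIELD_BYTES // bytes_per_char


print(max_chars(1))  # 8000, single-byte storage
print(max_chars(2))  # 4000, two-byte storage

# The same 4000-character comment in both encodings:
comment = "a" * 4000
print(len(comment.encode("cp1252")))     # 4000 bytes
print(len(comment.encode("utf-16-le")))  # 8000 bytes
```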
No conspiracy, Plunge. bobo123 is using character entities, which means that instead of typing a character straight into the comment box, you type in the code for the character...
Oh sure, way to go and ruin a good conspiracy for me. posted by Plunge at 9:31 AM on October 9, 2002
Wondering if this addresses the problem I posted last month.
&
That should display as ampersand-a-m-p-semicolon posted by chipr at 10:35 AM on October 9, 2002
Dang. That looked fine on preview (and I even typed in the character-entry thingie, not copy-and-pasted from the character map). posted by wanderingmind at 12:14 PM on October 9, 2002
An oldie but a goodie - Metafilter : That looked fine on preview! posted by stavrosthewonderchicken at 3:24 PM on October 9, 2002
Let's see! Here's a Portuguese sentence that uses all our accents (and which miserably failed the last test):
Os cães só têm maçãs à terça-feira.
[Dogs only have apples on Tuesday] posted by MiguelCardoso at 4:39 PM on October 9, 2002
Here's some Hebrew:
?? ???? ?? ??????? ??
Here's a trademark symbol: ™
posted by mathowie (staff) at 1:22 PM on October 8, 2002