Special characters October 8, 2002 1:18 PM

I “think” spéciàl chàràctérs shöuld wörk nöw, thöugh it's pröbably töö éàrly tö sày the böxés and quéstiön marks will be göné. That'll be £5 or 1000¥ or €3.5.
posted by mathowie (staff) to Bugs at 1:18 PM (54 comments total)

hmm. everything seems to be working.

Here's some hebrew:
?? ???? ?? ??????? ??

Here's a trademark symbol: ™
posted by mathowie (staff) at 1:22 PM on October 8, 2002

Ì ßènt ?? Woo??e.

Excellent.
posted by thewittyname at 1:22 PM on October 8, 2002

dang, is hebrew beyond utf-8?
posted by mathowie (staff) at 1:22 PM on October 8, 2002

Uh....fill in the blanks
posted by thewittyname at 1:22 PM on October 8, 2002

Let's see if my favorite — the emdash (the  version) — works. How about the correct version (the — one) — does it work?
posted by dayvin at 1:27 PM on October 8, 2002

Do you prefer ¢ or £?

[I prefer £, I have enough sense already!]
posted by dash_slot- at 1:29 PM on October 8, 2002

old metatalk threads seem to work now, as do metafilter threads that had boxes in them.
posted by mathowie (staff) at 1:33 PM on October 8, 2002

Me – I like the en-dash – either version.

Looks good. I hope the fix isn't what's slowing down the site.
posted by timeistight at 1:34 PM on October 8, 2002

thanks, matt.
posted by moz at 1:38 PM on October 8, 2002

Sniff. Well, very good work anyway, Matt!
posted by yhbc at 1:39 PM on October 8, 2002

Excellent. Thanks for the improvement, Matt.
posted by rocketman at 1:45 PM on October 8, 2002

é è ç ì ù

posted by matteo at 1:47 PM on October 8, 2002

[I prefer £, I have enough sense already!]

/me £'s dash_slot
posted by bradlands at 1:49 PM on October 8, 2002

Let’s see if it’s fixed what happened last time I tried to use typographer’s punctuation.
posted by Firefly at 1:50 PM on October 8, 2002

Another attempt at hebrew: (just letters, works in preview)
??????

? Œ‰†‡æ¡™£¢8§¶£¢8°·?‡‹Ô?¯¿

Preview is working, at least in Mozilla/Chimera... here goes...
posted by joemaller at 2:07 PM on October 8, 2002

and failure.
posted by joemaller at 2:08 PM on October 8, 2002

Well, I'm glad the boxes are gone anyway, was it something you did or a patch to the server software? Any geeky details you'd care to share?
posted by joemaller at 2:09 PM on October 8, 2002

¡Excelente, puedo ahora utilizar la primera marca del exclamation!
posted by MrBaliHai at 2:25 PM on October 8, 2002

I had to change the database field types to "n" types (nvarchar instead of varchar, ntext instead of text, etc), as well as force utf-8 encoding, and I also had to install the latest service packs on the application server. It seems to be mostly working, the only problem I've found in testing is hebrew.
posted by mathowie (staff) at 2:45 PM on October 8, 2002
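
A minimal sketch of the varchar versus nvarchar difference described in the comment above. It is an illustration, not the site's actual code: `cp1252` stands in for the server's code page, UTF-16 stands in for the Unicode column storage, and both function names are made up for the example.

```python
# Sketch only: simulate a code-page (varchar-style) column and a
# Unicode (nvarchar-style) column. cp1252 and UTF-16 are assumptions.

def store_in_codepage_column(text: str, codepage: str = "cp1252") -> str:
    """Round-trip text through a single-byte code page.

    Characters the code page cannot represent come back as '?'.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


def store_in_unicode_column(text: str) -> str:
    """Round-trip text through UTF-16, which can hold any of these characters."""
    return text.encode("utf-16-le").decode("utf-16-le")


samples = [
    "\u00a35 or \u20ac3.5",      # pound and euro signs, both present in cp1252
    "\u00e9 \u00e8 \u00e7",      # accented Latin letters, present in cp1252
    "\u05d0 \u05d1 \u05d2",      # Hebrew alef, bet, gimel: not in cp1252
    "\u65e5\u672c\u8a9e",        # Japanese: not in cp1252
]

for s in samples:
    print(repr(store_in_codepage_column(s)), repr(store_in_unicode_column(s)))
```

Under these assumptions the accented Latin text survives both paths, while the Hebrew and Japanese samples only survive the Unicode path.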

Let's try Korean! :-)

? ?? ?? ???.

Works in the preview, now the moment of truth...
posted by Plunge at 2:58 PM on October 8, 2002

grrrrr...
posted by Plunge at 2:59 PM on October 8, 2002

¿Que?
posted by mr_crash_davis at 3:07 PM on October 8, 2002

Here is some Hebrew as character entities:
א ב ג
And just as characters:
? ? ?
posted by chrismear at 3:10 PM on October 8, 2002

Hmm..what about japanese?

?????????

looks ok in preview...

posted by puffin at 3:13 PM on October 8, 2002

Nevermind, then..heh :)
posted by puffin at 3:14 PM on October 8, 2002

ok, so language issues still persist. Most problems are solved though. If you're seeing the characters in preview, the problem lies with the database drivers still.
posted by mathowie (staff) at 3:15 PM on October 8, 2002

The difference between the character entity versions and the unicode versions is that your browser interprets the entities, whereas SQL server and Cold Fusion have to interpret the unicode versions. Hence, the ???? when you try to do it without the entities.
posted by eyeballkid at 3:44 PM on October 8, 2002

I could make some snarky comment on how this must be a plot by mathowie against the Hebrew speaking Jews and the Asians (with Korean & Japanese not posting, Chinese most likely doesn't either) to keep us down, but instead I'll say good job for even getting it this far. I'm sure you'll get it working with all the different languages and then boy, are we in trouble.

:-)
posted by Plunge at 3:46 PM on October 8, 2002

UNICODE data types in MS SQL Server only support characters from the code page assigned at the SQL Server install (though 2000 supports the use of separate collations for each database and separate collations for queries). If I'm right, the default install code page is 1252. That page only supports, I believe, the English alphabet and variations thereof (Italian, Spanish, etc.). I believe that the UNICODE set required for multiple languages is 850.

Still, I can't find a listing for the 850 code page anywhere in the MSDN or in SQL's BOL, so I'm not positive that it's the right page. I do know that 1252 won't handle anything except the variants I mentioned above (and, I believe, the Greek alphabet).

posted by eyeballkid at 4:04 PM on October 8, 2002

Fantastic - but now all the apologies for weird characters appearing in posts are going to look silly :-)
posted by dg at 4:48 PM on October 8, 2002

mathowie.. I wouldn't worry too much about getting the language characters to show up.

As nifty as it would be to post Japanese in hiragana, most users probably wouldn't be able to see the characters anyway because their computers don't have the language support installed. And hey -- this is an english-language web site, after all.

posted by puffin at 7:04 PM on October 8, 2002

I'm guessing this might not work: 日本語? א? ⇔?
posted by bobo123 at 7:07 PM on October 8, 2002

日朮語?
posted by bobo123 at 7:16 PM on October 8, 2002

The first person that tries to post in Japanese is going to get a severe wood-shed-ass-hauled-into-metatalking-to.
posted by insomnyuk at 7:28 PM on October 8, 2002

conichi-wa, insomnyuk (couldn't resist!)
posted by amberglow at 7:30 PM on October 8, 2002

bobo: that second character ain't quite right
posted by hama7 at 8:58 PM on October 8, 2002

???
posted by reverendX at 9:21 PM on October 8, 2002

Korean test :

?????!
?? stavrosthewonderchicken ???.
posted by stavrosthewonderchicken at 9:21 PM on October 8, 2002

I know. I meant 日本語 but I typed 672E instead of 672C.

私はガラスを食べられます。それは私を傷つけません。
posted by bobo123 at 9:22 PM on October 8, 2002
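
A small sketch of the two code points mentioned in the comment above, to show how close the typo was. The `describe` helper is a hypothetical name introduced for the example.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return the U+XXXX label, the character, and its Unicode name."""
    ch = chr(code_point)
    return f"U+{code_point:04X} {ch} {unicodedata.name(ch, 'UNKNOWN')}"


# 0x672C is the intended character; 0x672E is the one typed by mistake.
for cp in (0x672C, 0x672E):
    print(describe(cp))
```

The two values differ only in the last hexadecimal digit, which is why the wrong character looks so similar to the intended one.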

Hmmm... so Japanese does work? But Korean is still a no show. *sigh*

/me ponders a Japanese conspiracy...
posted by Plunge at 9:25 PM on October 8, 2002

This is so great—dashes work now!
posted by Yelling At Nothing at 11:19 PM on October 8, 2002

????????? ????
posted by dydecker at 11:20 PM on October 8, 2002

Regular Japanese input doesn't work.
posted by dydecker at 11:23 PM on October 8, 2002

No conspiracy, Plunge. bobo123 is using character entities, which means that instead of typing a character straight into the comment box, you type in the code for the character, in this format:

&#x672C;

which would give you 本.

The database has no problem storing the character entities, because they're just numbers and punctuation, but the current software seems to choke if you enter the actual symbols, and produces all those ?s.

So if you're really desperate to put the odd foreign character in, that's how to do it (although remember it won't display in everyone's browser). Also watch out that when you type in a character entity, and then hit preview, it converts the code into a regular, typed-in character. You'll have to turn it back into a code before you post.
posted by chrismear at 11:41 PM on October 8, 2002
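
A rough sketch of why the entity workaround in the comment above holds up: the entity text is plain ASCII, so it passes through storage that only understands one code page. The `cp1252` code page and both helper names are assumptions made for the illustration.

```python
import html


def to_entities(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def through_codepage(text: str, codepage: str = "cp1252") -> str:
    """Simulate storage limited to one code page (an assumption, not the real stack)."""
    return text.encode(codepage, errors="replace").decode(codepage)


raw = "\u65e5\u672c\u8a9e"   # the three Japanese characters from the thread
safe = to_entities(raw)      # ASCII-only text made of '&', '#', digits and ';'

print(safe)
print(through_codepage(raw))                  # raw characters are lost
print(html.unescape(through_codepage(safe)))  # entities survive and decode again
print(html.unescape("&#x672C;"))              # the hexadecimal form decodes the same way
```

`html.unescape` plays the role of the browser here: it turns the stored references back into characters at display time.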

Åwøsæme!
posted by dagny at 1:16 AM on October 9, 2002

UNICODE data types in MS SQL Server only support characters from the code page.
That's not true actually. In SQL Server, Unicode fields are Unicode, pure and simple. The collation refers to the character set/sort order of non-unicode fields (e.g. varchar instead of nvarchar).
Matt, question marks (as opposed to blocks), and the fact that the characters work in preview, indicate to me that the text has not made it into the database correctly. Can you see Hebrew characters in there?
I know nothing of Cold Fusion, but in ASP, the first thing to check would be that the server codepage is set to UTF8. This basically tells ASP that when it is response.writing Unicode data from the database, it should convert it to UTF8, and that any input it receives from form data should be considered UTF8 and converted to Unicode accordingly.
I know nothing of your internationalization skills, so I apologize if a certain phrase about sucking eggs comes to mind.
posted by chill at 2:09 AM on October 9, 2002
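
A minimal sketch of the input side of the advice above: the same submitted bytes decoded with the right and the wrong assumption about their encoding. This is not ASP or ColdFusion code; `receive_form_field` is a hypothetical stand-in for whatever the application server does with form data.

```python
def receive_form_field(raw: bytes, assumed_encoding: str) -> str:
    """Decode submitted form bytes using the encoding the server assumes."""
    return raw.decode(assumed_encoding, errors="replace")


# The browser sends UTF-8 bytes for the Portuguese test sentence.
submitted = "Os c\u00e3es s\u00f3 t\u00eam ma\u00e7\u00e3s".encode("utf-8")

print(receive_form_field(submitted, "utf-8"))   # decoded as intended
print(receive_form_field(submitted, "cp1252"))  # each accented letter becomes two wrong characters
```

If the server assumes the wrong encoding at this step, the text is already damaged before it reaches the database, whatever the column types are.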

this is all just a cunning plan to reduce the maximum comment size from 8000 to 4000 characters... ;o)
posted by andrew cooke at 6:01 AM on October 9, 2002
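
The arithmetic behind the joke above, as a sketch. It assumes the 8000 figure is a per-value byte budget and that the Unicode column types spend two bytes per character; the thread does not spell out either point, and `max_chars` is a made-up helper.

```python
BYTE_LIMIT = 8000  # assumed byte budget for one stored value


def max_chars(bytes_per_char: int, byte_limit: int = BYTE_LIMIT) -> int:
    """How many characters fit in the budget at a fixed width per character."""
    return byte_limit // bytes_per_char


comment = "\u00c5w\u00f8s\u00e6me!"  # an eight-character comment from the thread

single_byte_size = len(comment.encode("cp1252"))    # one byte per character
double_byte_size = len(comment.encode("utf-16-le")) # two bytes per character

print(single_byte_size, double_byte_size)
print(max_chars(1), max_chars(2))
```

Under those assumptions, doubling the bytes per character halves the number of characters that fit.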

No conspiracy, Plunge. bobo123 is using character entities, which means that instead of typing a character straight into the comment box, you type in the code for the character...

Oh sure, way to go and ruin a good conspiracy for me.
posted by Plunge at 9:31 AM on October 9, 2002

Wondering if this addresses the problem I posted last month.

&

That should display as ampersand-a-m-p-semicolon
posted by chipr at 10:35 AM on October 9, 2002
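
A sketch of the escaping round trip behind the report above, assuming the displayed text is decoded exactly once and that resubmitted text needs escaping again. The function names are hypothetical; only `html.escape` and `html.unescape` are real library calls.

```python
import html


def render(comment_source: str) -> str:
    """What the reader sees: entities in the source are decoded once."""
    return html.unescape(comment_source)


def back_to_source(displayed: str) -> str:
    """Re-escape displayed text so that submitting it again shows the same thing."""
    return html.escape(displayed, quote=False)


source = "&amp;amp;"          # typed in order to display the five characters &amp;
shown = render(source)        # the literal text &amp;

# Resubmitting the displayed text without re-escaping loses one level.
lost_a_level = render(shown)

# Re-escaping first keeps the display stable.
stable = render(back_to_source(shown))

print(shown, lost_a_level, stable)
```

The same one-level loss would explain a preview that looks right but posts differently, though the thread does not confirm that this is what the site does.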

I ? Unicode.
posted by wanderingmind at 12:11 PM on October 9, 2002

Dang. That looked fine on preview (and I even typed in the character-entry thingie, not copy-and-pasted from the character map).
posted by wanderingmind at 12:14 PM on October 9, 2002

An oldie but a goodie - Metafilter : That looked fine on preview!
posted by stavrosthewonderchicken at 3:24 PM on October 9, 2002

Let's see! Here's a Portuguese sentence that uses all our accents (and which miserably failed the last test):

Os cães só têm maçãs à terça-feira.

[Dogs only have apples on Tuesday]
posted by MiguelCardoso at 4:39 PM on October 9, 2002

Whee!
posted by MiguelCardoso at 4:39 PM on October 9, 2002
