Change double hyphen to dash. April 13, 2007 6:58 PM

Feel free to put me in my place if this is way too trivial an issue, but I've been posting more recently and can't help but notice the little tip below the posting box, and—I know this is trivial, sorry—can we replace the two hyphens with a dash? kthx.
posted by Firas to Bugs at 6:58 PM (66 comments total) 1 user marked this as a favorite

An m-dash, to be precise. With no adjacent spaces.
posted by monju_bosatsu at 7:01 PM on April 13, 2007


I will perform any single sexual favour of one of the admin's choice with any moving thing if we can implement something like WP's texturize function on all the text on mefi (er, let's say, if it's implemented within a couple months.)
posted by Firas at 7:04 PM on April 13, 2007


Well now you have my attention, as the only admin in your timezone, but I'm not even sure what you're talking about w/r/t WordPress. I fixed the other thing.
posted by jessamyn (staff) at 7:10 PM on April 13, 2007


if we can implement something like WP's texturize function on all the text on mefi

I hate "smart quotes". Sure they're proper and curly or whatever, but they turn to shit whenever you copy and paste into anything else and they're hell to clean out of xml files.
posted by mathowie (staff) at 7:13 PM on April 13, 2007


Is there a javascript hook for 'oncopy'? Yeah, I guess the copy/paste thing is an issue.

You can just escape them to entities in XML, so I don't see the issue there.
posted by Firas at 7:18 PM on April 13, 2007


Let's do that right after we get rid of all the comma splices floating around the place.
posted by Wolfdog at 7:19 PM on April 13, 2007


Nice snark, champ. I'm asking for regular expression-based text replacement, not natural language processing.
posted by Firas at 7:21 PM on April 13, 2007


(re: wolfdog)
posted by Firas at 7:21 PM on April 13, 2007


i hate designer text. designers are ruining the intertubes. the web is not ms word. 255 ascii characters ought to be enough for anybody.
posted by quonsar at 7:48 PM on April 13, 2007


If you want this, use Smart Firefox. But don't force the horror of Unicode quotes down everybody's throats.
posted by Rhomboid at 8:41 PM on April 13, 2007


On a mac all of these are accessible via the option key. I feel naked in X windows right now without my Option-underscore to make an em dash. Are there unfucked entities for em-dashes on Mefi? "8212;"

But please for the love of all that is holy, don't fuck us in the ass with Texturize.
posted by blasdelf at 9:21 PM on April 13, 2007


Apparently the entity 8212; that A List Apart recommends is hosed. Oh well.
posted by blasdelf at 9:23 PM on April 13, 2007


Goddamnit, I said & and I mean &!
posted by blasdelf at 9:24 PM on April 13, 2007


I fail to see how fucking the intertube's throat in the ass has anything to do with m-dashes, ellipses and smart quotes. Like, I don't see the downside at all. Enlighten me?
posted by Firas at 9:28 PM on April 13, 2007


All the pushback I see is coming from technical types who've perhaps experienced interop horrors. The particular implementation I'm referring to would be a 'text filter' on output (or a table in the db that cached generated text) rather than something that screwed with the original text. Using character entities near-guarantees that there will be no interop problems with rss readers, etc.
posted by Firas at 9:38 PM on April 13, 2007
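
The "text filter on output" idea above can be sketched in a few lines of Python. This is a minimal illustration only: the `texturize` name and the `REPLACEMENTS` table are hypothetical, not MetaFilter's or WordPress's actual code. The stored text is left alone, and the display copy uses numeric character references so it stays pure ASCII.

```python
import re

# Hypothetical replacement table: plain-ASCII input -> numeric character references.
REPLACEMENTS = [
    (re.compile(r"\.\.\."), "&#8230;"),    # three periods -> ellipsis
    (re.compile(r"\s?--\s?"), "&#8212;"),  # double hyphen -> em dash, no adjacent spaces
]

def texturize(stored_text: str) -> str:
    """Return a display copy of the text; the stored original is never modified."""
    out = stored_text
    for pattern, entity in REPLACEMENTS:
        out = pattern.sub(entity, out)
    return out

stored = "I know this is trivial -- sorry..."
rendered = texturize(stored)
print(rendered)
```

Because the filter runs only at render time, switching it off (or making it opt-in) would leave the database untouched.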


"But don't force the horror of Unicode quotes down everybody's throats."

There's nothing wrong with Unicode. There's no good reason in this day and age to be limited to ASCII. It's just the laziness of developers.
posted by Ethereal Bligh at 9:55 PM on April 13, 2007


"But don't force the horror of Unicode quotes down everybody's throats."

OMG! Unclench!
posted by amyms at 11:18 PM on April 13, 2007


OMG! Unclench!

But if we unclench, it will only make it easier to force Unicode characters through one of our orifices!

For the record, I'm for more people learning to use —. People might want to use -- without it being equivalent to a dash, such as when pasting C code.
posted by grouse at 12:42 AM on April 14, 2007


Every time somebody makes a post that uses those stupid smart quotes or em-dashes, they display as '?' in my RSS reader. It makes reading what they're trying to say a royal pain in the ass. I can't count a day that goes by that I don't visit some blog or website where the quotes look like the line-noise fairy took a shit on the screen because somebody put the wrong Charset setting in the headers and so UTF-8 fancyquotes come out as 2 or 3 random noise characters. I have no problem with fancyquotes and emdashes when they actually work, but my experience has been that they fail quite often.
posted by Rhomboid at 2:10 AM on April 14, 2007 [1 favorite]
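
For anyone wondering where the "2 or 3 random noise characters" come from, here is a minimal Python sketch of one common mechanism, assuming the bytes are written as UTF-8 and read back as Windows-1252. It is only an illustration; nothing in the thread establishes that this is what the RSS reader in question is doing.

```python
# Sketch only: one plausible way an em dash turns into noise or "?".
text = "wait\u2014what"            # \u2014 is the em dash

utf8_bytes = text.encode("utf-8")  # the em dash becomes three bytes: E2 80 94

# A reader that assumes Windows-1252 shows one character per byte.
garbled = utf8_bytes.decode("cp1252")

# A pipeline that forces plain ASCII swaps the character for "?".
flattened = text.encode("ascii", errors="replace").decode("ascii")

print(garbled)    # "wait", three junk characters, "what"
print(flattened)  # wait?what
```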


That's your wary reptile mind FUD'ing you mate. They'll work.
posted by Firas at 2:25 AM on April 14, 2007


such as when pasting C code.

Incredibly edge case, solved by skipping anything inside <code>

somebody put the wrong Charset setting in the headers

Not only is mefi served as UTF-8 in the first place, I'm talking about character entities (or even better, numeric ones.) Check out my magic super-duper numeric reference action:
“Oh!” said she, “I heard you before, but I could not immediately determine what to say in reply. You wanted me, I know, to say ‘Yes,’ that you might have the pleasure of despising my taste; but I always delight in overthrowing those kind of schemes, and cheating a person of their premeditated contempt. I have, therefore, made up my mind to tell you, that I do not want to dance a reel at all—and now despise me if you dare.”
Did that just break your brain or what?
posted by Firas at 2:46 AM on April 14, 2007
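
The "skip anything inside code tags" fix is easy to sketch with a regular expression. The function name below is made up for illustration, and the sketch assumes code blocks are not nested.

```python
import re

# Split the text into alternating pieces: outside code, inside code, outside code, ...
CODE_SPLIT = re.compile(r"(<code>.*?</code>)", re.DOTALL | re.IGNORECASE)

def dash_outside_code(html_text: str) -> str:
    """Replace double hyphens with an em dash reference, except inside <code>."""
    parts = CODE_SPLIT.split(html_text)
    for i, part in enumerate(parts):
        # With one capturing group, odd indexes hold the <code>...</code> matches.
        if i % 2 == 0:
            parts[i] = part.replace("--", "&#8212;")
    return "".join(parts)

sample = "Decrement -- like this: <code>i--;</code>"
print(dash_outside_code(sample))
```

The same split-then-filter approach would extend to `<pre>` or to URLs, at the cost of a longer pattern.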


Well, it looks like something in mefi's code pipeline turned the character references into actual characters in my Pride & Prejudice quote (and if it displays correctly for you, even better!), but to smash a dead horse in the teeth with a metal baseball bat, references like 8217; are written in ASCII numerals.
posted by Firas at 3:08 AM on April 14, 2007
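
The point that numeric references are themselves written in plain ASCII can be checked with the Python standard library. A small sketch, using a made-up sample sentence:

```python
import html

quote = "\u2018Yes,\u2019 she said\u2014and meant it."

# Every non-ASCII character becomes a decimal reference such as &#8217;
ascii_bytes = quote.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)

# A browser (or html.unescape) turns the references back into the characters.
round_trip = html.unescape(ascii_bytes.decode("ascii"))
print(round_trip == quote)
```

Since only ASCII bytes travel over the wire, a wrong charset header cannot garble them, which is the argument being made above.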


You can't use numeric entities on Metafilter, the comment parser strips them. This was because there was some XSS vuln that Matt discovered a couple of years ago, that used this as a means to embed javascript. It might have worked in conjunction with the user-page custom CSS which was also removed for the same reason.

I am well aware that MeFi is UTF-8 everywhere, and there is hardly a problem with broken smartquotes in comments. Feel free to go crazy in comments, whatever. But if you start automatically fucking with every post/comment to add these things, then it would essentially render Metafilter unreadable to me, as like I said, every time there's one of those little bastards in a post it just shows as ? in RSS and RSS is how I read the entire site. I know it's not just me as there was a MeTa thread about it in the past, although nobody could figure out why as if you view the source of the feed it looks correct. But my RSS reader works fine with other UTF-8 feeds, so it's only on Metafilter that they're replaced by ?. I've given up trying to solve it, and I just grumble and live with the small percentage of posts where people smartquotify on their own.
posted by Rhomboid at 4:09 AM on April 14, 2007
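
The thread does not describe the actual vulnerability, so the following is a hypothetical Python sketch of the general class of problem: a filter that inspects raw text can be bypassed when a numeric reference spells out part of a dangerous string. Both function names are invented for the example.

```python
import html

def naive_is_safe(href: str) -> bool:
    """Hypothetical filter that inspects the raw, still-encoded text."""
    return "javascript:" not in href.lower()

def decoded_is_safe(href: str) -> bool:
    """Same check, applied after character references are resolved."""
    return "javascript:" not in html.unescape(href).lower()

# &#106; is the letter "j", so a browser would see "javascript:alert(1)".
payload = "&#106;avascript:alert(1)"

print(naive_is_safe(payload))    # the naive check is fooled
print(decoded_is_safe(payload))  # decoding first catches it
```

Stripping numeric references from user input outright, as described above, is the blunt way of closing that hole.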


Yeah, but mon ami, were they implemented by Matt, he could use numeric references instead of unicode characters, cleverly sidestepping the gnomes.
posted by Firas at 4:20 AM on April 14, 2007


I know this is all a bit presumptuous considering the site isn't mine. Just a suggestion, y'all. I'm just sayin' the technical arguments against it are spurious.
posted by Firas at 4:22 AM on April 14, 2007


"I'm just sayin' the technical arguments against it are spurious."

Not exactly true because legitimate non-lazycoder interop problems have been raised, but let's move past that rather than play the hand-waving game. What about the aesthetic argument? Your typographer's quotes are ugly.
posted by majick at 7:53 AM on April 14, 2007


"Your typographer's quotes are ugly."

Says you. The 99% of the print world and its readers say otherwise. Maybe you'd like to try to abolish them from print?
posted by Ethereal Bligh at 10:36 AM on April 14, 2007


This is not something that needs to be fixed. The fact that I can't post a comment to this thread, on the other hand—I don't even see a comment box at the bottom—that's a problem. And don't tell me to upgrade my browser or some shit like that. I like my browser. If you're going to add frills, make 'em work for everybody. (Or, of course, make 'em opt-in, which I think, if I understand that thread aright, is what's going to happen. But I want to register my gripe just in case. Um, this is the Gripe Registry, is it not? They sent me here at the front desk...)
posted by languagehat at 10:38 AM on April 14, 2007


Yes. While we're at it, I would also like for Matt to stop storing our passwords in his db in cleartext.
posted by Firas at 11:23 AM on April 14, 2007


EB, you've been around more than long enough to know the refrain: "The web is not print." The use cases are different, the constraints are different, and the intended result is different. It's disingenuous and unbecoming to pretend otherwise.
posted by majick at 1:36 PM on April 14, 2007


Ok, that refrain refers mostly to layout issues, not typographic ones. Talk about disingenuousness.
posted by Firas at 1:38 PM on April 14, 2007


Anyway, I don't line up with EB's quantitative "well that's what it's like in print" argument in favour of pretty text. Instead I just remain in befuddlement that anybody would oppose them on aesthetic grounds. It's like being against capitalization, or punctuation.
posted by Firas at 1:42 PM on April 14, 2007


My argument wasn't in favor of pretty text but that those characters are the standard and these double ticks are non-standard. But there is something to be said for "pretty text" or else we'd still be reading monospaced characters online. There's reasons that typography does many of the things it does. Some of those reasons are technology-centric. Others are reader-centric. Double-quotes are probably not at all important on reader-centric grounds but at this point there's no reason not to use the conventional typographic characters. The only reason we use the small subset in ASCII is because they were once little pieces of metal that had to fit into typewriters and teletypes.
posted by Ethereal Bligh at 2:19 PM on April 14, 2007


"[That] refrain refers mostly to layout issues, not typographic ones."

Um, no. It refers to the general category of "this works in print, so let's do it on the web without considering the consequences" thinking. While that frequently has something to do with layout it's by no means ever been constrained to that meaning. The aesthetics of the display screen are not the aesthetics of print.

"I just remain in befuddlement that anybody would oppose them on aesthetic grounds."

Some otherwise well-rendered screen fonts use downright hideous glyphs for these largely untypeable, rarely-used characters. Verdana is one extremely common such font. The characters in question render unattractively like a pair of backticks. What's the point of knowingly munging an acceptably well-rendered character into an (to put it charitably) ugly one?

"Maybe you'd like to try to abolish them from print?"

I see no need. In print they are (generally speaking) used effectively and rendered lovingly with well-chosen glyphs. Likewise I'd prefer most of my printed reading continue to use pleasantly serifed characters from a thoughtfully selected typeface. However, the state of type on the web and the resolution of common displays doesn't really make either expectation reasonable when it comes to browser text.

"...at this point there's no reason not to use the conventional typographic characters."

This is ignoring the not insubstantial technical issues with munged text. Some have already been enumerated and largely dismissed with handwaving.

Listen, people, I'd love to live in a world where Unicode actually worked properly, screen text were often as beautiful as printed, and keyboards supported writing with typographical conventions so that dubious munging schemes or fumbling around with numerical character entities didn't have to be proposed. It's a wonderful fantasy, but not yet the world we live in.
posted by majick at 3:46 PM on April 14, 2007


I'm reading this in a nicely rendered Georgia. And the double-quotes look fine.
posted by Ethereal Bligh at 4:47 PM on April 14, 2007


in response to Firas: no.

the double-dash "--" is a perfectly acceptable substitute (as a matter of formal style) for the em-dash (not m-dash, i believe), and it has two further advantages:

1. as mentioned above, em-dashes often render wrong (i.e. as gibberish) in RSS feeds, etc -- the double dash almost never does

2. even more prevalent is the phenomenon where you copy some text with an em-dash and it gets truncated to a single dash, or worse, the dreaded " -" or "- " -- this glitch annoys me to no end.

please, please, please, keep the "--"
posted by spiderwire at 5:07 PM on April 14, 2007


oh, and adjacent spaces around the "--" is also a stylistic preference, not a rule.
posted by spiderwire at 5:08 PM on April 14, 2007


I'm reading this in a nicely rendered Georgia. And the double-quotes look fine.

EB, you're showing an unwonted refusal to address the point. I'm glad it looks nice for you, but you must be aware that for many people, fancy quotes (and similar typographical frills) look like crap online. The fact that they look good in printed books is entirely irrelevant. I know you know this. Why do you insist on sticking to such rusty guns?
posted by languagehat at 5:11 PM on April 14, 2007


John Trimble, Writing With Style, 2nd Ed.:
...note that the dash is printed in books and magazines as one long line. And perhaps 95% of the time now it's also printed "attached," to use a copy editor's term. That means the mark is jammed right up against the surrounding text; there's no space before or after it. Some writers, though, still prefer an "open" look... It doesn't matter which style you use so long as you're consistent. I myself prefer the attached style because it's the house style for most publishers and it saves space, but I concede that the unattached style looks more expansive. (125-26)
Also, it's "em dash," not hyphenated, so that was my bad.

It was Trimble who told me that the "--" was an acceptable substitute for an em dash when the latter wasn't available, in the context of a word processor that couldn't produce em dashes. But I'm admittedly unsure if that was just editorial dicta.
posted by spiderwire at 5:19 PM on April 14, 2007


Agh! It's changed already? What the hell.
posted by spiderwire at 5:22 PM on April 14, 2007


Ah ha! I was right! Given the prevalence of typographical systems that don't allow for em dashes (typewriters, etc.), the double-hyphen is an acceptable stylistic substitute for the em dash! Id. at 126.
posted by spiderwire at 5:27 PM on April 14, 2007


"...but you must be aware that for many people, fancy quotes (and similar typographical frills) look like crap online."

They don't have to. Every OS supports Unicode now and comes with good typefaces. Every browser supports it and all the major server software does, too. The only reason things break is because someone, somewhere, has taken the easy way out.

This is like the argument against marked-up text in email and insisting on plain text. It's rigid, unnecessary conservatism that prefers a very artificially limiting environment.

Furthermore, what's broken with double-quotes is what's broken with internationalization. The easy way out is anglocentric and why you, as a linguist, have undoubtedly had problems displaying various languages on your computers. Those problems no longer need exist.
posted by Ethereal Bligh at 5:48 PM on April 14, 2007


"Those problems no longer need exist."

I adamantly maintain you are living in fantasy. Every major OS supports IPv6. Every major networking product vendor supports IPv6. Modern server software and clients, including popular browsers, support IPv6. Yet it's deployed almost nowhere and remains largely unusable in the real world.

Sure, a certain part of that is the noxious design-by-committee nature of the standard, but the largest part of it is that nobody else uses IPv6. Following a similar if not identical line of logic to your own, we should all be sporting portable, routable, 128 bit addresses on the public network. Note, however, that instead some massive fraction (75% is the number I've heard, but can't reliably cite) of the Internet is hidden behind NAT.

True, it's because someone, somewhere -- or rather, everyone everywhere -- has taken the easy way out, but that's the actual reality of the situation and all practical actions should take that into account at implementation time. If you don't code pessimistically, you're shooting yourself in the foot.

If you were to say "[t]hose problems ought no longer exist," you would have my full and complete support. However, they do exist, and pretending they don't doesn't fix them.
posted by majick at 7:03 PM on April 14, 2007


Before I write my comment, I want to encourage everyone participating in this thread to read this article on character encoding, because it's fundamental to even being able to discuss these issues. Really. Seriously. Go read it. Now.
posted by spiderwire at 7:06 PM on April 14, 2007


If you were to say "[t]hose problems ought no longer exist," you would have my full and complete support. However, they do exist, and pretending they don't doesn't fix them.

I think that this is accurate, but EB, I respectfully submit that your description of the problem is inaccurate for a different reason than majick does.

First, read the article I just linked. The author points out that this is stuff that every programmer needs to know, but the other takeaway is that as a practical matter, for a coder, it's really, really hard to do.

The result of that problem is that in order for your ideal world to exist, all the intervening programs have to do their encoding properly. Even one break in the chain screws everything up. This is not a problem with standard plaintext for the most part. That's why you use regular quotes instead of smart quotes, and double-dashes instead of em dashes. No matter how many times the text is retransmitted, those characters will stay intact on the other end.

If you need to use really special characters, so be it, but choosing the smart quotes over the standard quotes sacrifices functionality and readability on the altar of aesthetics. Period.

(Furthermore, if this bugs you to no end, you can fix it on the client side. No one is preventing you from writing a FireFox plugin that converts everything into pretty smart quotes. Who's imposing their preferences now?)

Two additional problems which are worth pointing out:

1. You assume that the whole chain uses Unicode, and furthermore that the entire chain is digital. I work in a profession that still uses a lot of old-school mono-spaced characters. If there's an analog-to-digital conversion along the line, things get messed up. Intervening programs (AutoCorrect, I'm looking at you) screw things up. I have (no joke) gotten in trouble with superiors because of typographical errors introduced by standardization methods. Those problems wouldn't have occurred if, say, the original writers had been considerate enough to use double-dashes rather than getting fancy with their em dashes.

2. Even more problematic is the cut-n-paste issue. Even if the Unicode works fine all the way through, clipboard protocols are still disastrous. File formats are a nightmare. I can't even count the number of times I've seen someone cut-n-paste legit text from a webpage onto MeFi and seen it come out in a hodgepodge of little diamond-questionmarks that looked fine in the text box. ("Oops, sorry, it looked fine in Preview!") And that's even within the same browser. What's the standard solution to this problem? Copying, pasting to a plaintext editor first, then copying and posting to the MeFi box and fixing errors. Hmmm. What would have solved this problem? Writing the damn thing in plaintext in the first place, with no smart quotes or any other fancy-schmancy crap.
posted by spiderwire at 7:19 PM on April 14, 2007


spiderwire, I worked for a major web vendor. I don't need to read the article.

Not doing things right is hindering internationalization. Double-quotes is a trivial example of a much bigger problem that no one at this point has any excuse to ignore.

The comparison to IPv6 is spurious. The net as it exists right now isn't really broken without it. It is broken without correct multi-language support.
posted by Ethereal Bligh at 7:31 PM on April 14, 2007


Oh, and from the article, yet another example of how MicroSoft contributes to this problem by thinking it's smarter than its users:
Internet Explorer actually does something quite interesting: it tries to guess, based on the frequency in which various bytes appear in typical text in typical encodings of various languages, what language and encoding was used.
Joel doesn't mind it when MicroSoft tries to outsmart its users because, well, he used to work for MicroSoft, but I think it corroborates my previous point, which is that the entire system only works if:

(a) Every coder along the line passes input and output with the correct encodings (I submit that relying on no lazy coders or bugs = FUBAR)

(b) Every process is actually capable of passing the right encodings (the cut-n-paste problem)

(c) No one along the way decides to get fancy and fix things or do shortcuts (the AutoCorrect problem)

It's not a soluble issue. But -- the more standard plaintext you use (i.e., the 32-127 part of the ASCII set) the fewer cumulative errors you'll encounter and the happier everyone will be.

Code for the lowest common denominator wherever possible.
posted by spiderwire at 7:31 PM on April 14, 2007
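
The "lowest common denominator" claim can be tested directly. Below is a minimal Python sketch, assuming a chain in which every hop uses one of a handful of ASCII-compatible encodings (the list is an arbitrary choice, and an encoding like UTF-16 would break the assumption). The helper name is hypothetical.

```python
# Arbitrary sample of ASCII-compatible encodings a hop in the chain might assume.
ENCODINGS = ["utf-8", "cp1252", "latin-1", "mac_roman"]

def survives_any_mismatch(text: str) -> bool:
    """True if the text is unchanged for every writer/reader encoding pair."""
    for written_as in ENCODINGS:
        for read_as in ENCODINGS:
            try:
                if text.encode(written_as).decode(read_as) != text:
                    return False
            except UnicodeError:
                # The character cannot be written or read in that encoding at all.
                return False
    return True

print(survives_any_mismatch("plain -- text"))     # printable ASCII only
print(survives_any_mismatch("fancy\u2014text"))   # contains an em dash
```

Text that has already been converted to numeric character references is also plain ASCII, so it passes the same check.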


Not doing things right is hindering internationalization.

I'm not arguing majick's point -- The IPv6 point is, I think, a little off-base -- I'm saying that your position requires that every entity down the chain "do things right," and that's a pipe dream.

Maybe we should clarify what problem we're talking about here. Users employing smart quotes? MeFi using smart quotes? Automatically reducing smart quotes to standard double quotes?
posted by spiderwire at 7:37 PM on April 14, 2007


Don't call them “smart quotes”. That makes it sound like Microsoft invented left and right double-quotes.

By the way—as you see—MetaFilter handles this stuff fine.

Firefox also can attempt to auto-detect encoding. Unlike IE, it's off by default.

You're just making excuses for an individual developer or admin not doing the right thing. Their job individually is to do the right thing. If someone else screws up, that's their bug.
posted by Ethereal Bligh at 7:57 PM on April 14, 2007


Also, I'm trying to understand how it is that you think the article you linked to supports your case. Here's a quote:
When I looked into another commercial library, it, too, had a completely broken character code implementation. I corresponded with the developer of that package and he sort of thought they "couldn't do anything about it." Like many programmers, he just wished it would all blow over somehow.

But it won't. When I discovered that the popular web development tool PHP has almost complete ignorance of character encoding issues, blithely using 8 bits for characters, making it darn near impossible to develop good international web applications, I thought, enough is enough.

So I have an announcement to make: if you are a programmer working in 2003 and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. I swear I will.

And one more thing:

IT'S NOT THAT HARD.
It's 2007.
posted by Ethereal Bligh at 8:02 PM on April 14, 2007


Spiderwire, smart quotes require context; you can't fix them on the client side. If someone uses one to represent inches, or embeds a quote within another quote you are screwed.

Ethereal Bligh, the Internet might not be broken but IPv4 is. It has created an artificial scarcity, otherwise people wouldn't have to pay $24-$60 per year for a single static IP address.

Also, I use plain text in all my emails because I value accessibility over aesthetics. It is artificially limiting, but I prefer not to have to educate myself about whether the makers of my software are going to pull an embrace and extend on the protocol, and I've seen problems caused by RTF or HTML for clients in the past working in IT (long ago). Also, the more complicated you make the interface between programs, the more ambiguity and security issues creep in.
posted by BrotherCaine at 8:02 PM on April 14, 2007


EB, I just thought that it was an important article so that people could understand the technical issues at work here. You do, so it's not an issue.

By the way—as you see—MetaFilter handles this stuff fine.

I know, that's why I asked what the issue was. All I'm saying is that people should use " quotes and -- in lieu of em dashes.

You're just making excuses for an individual developer or admin not doing the right thing. Their job individually is to do the right thing. If someone else screws up, that's their bug.

Speaking to the broader issue, yeah, I think that's the point. As a goal, you're right. The problem is that you can't rely on all the coders and programs in the chain to do their encoding or their interoperability right. Of course they should. The question is whether they do and how we deal with it.

IT'S NOT THAT HARD.

That's not the issue. First, as Joel points out, PHP doesn't even do this right. Lots of languages don't do it right or force the coder to do it for themselves. Even if it's not 'hard,' I can say from experience that it's a royal pain in the ass. In fact, the fact that it's not hard makes it all the more frustrating.

Second, even if you do it right, interoperability remains a problem. This is the killer.

Example: Even if you get the encoding right in your browser, how do you preserve the encoding when you copy it into the clipboard and mess with it in a different program? How does that program know where to look for the encoding metadata, if it even exists? How does it push that data back? Will your browser recognize it when it comes back, and if so, will there be a way to communicate it properly through POST or GET if something's changed along the way? (Say a different international character set at any point in the chain.) Those are not easy problems. And that's all just on the same computer.

Spiderwire, smart quotes require context; you can't fix them on the client side. If someone uses one to represent inches, or embeds a quote within another quote you are screwed.

I'm aware of that. But that would be preferable, I think, because at least you're not corrupting the data in transit, it just looks funky on your end.
posted by spiderwire at 8:30 PM on April 14, 2007


You damn ninnies. I just woke up and haven't yet read all the thread after my last reply but, um, it seems to be a retread of the 'oooh ze high character bytes, ooh la la!' Come the fuck on. I just said way above that they can be written as numeric or named entities. Don't bring this crappy FUD up and pretend it's an argument.
posted by Firas at 8:40 PM on April 14, 2007


Firas, glad you're back. You're wrong about the em dash stylistic stuff, as you'll see above. :)
posted by spiderwire at 8:48 PM on April 14, 2007


Some otherwise well-rendered screen fonts use downright hideous glyphs for these largely untypeable, rarely-used characters.

Well, I'll concede this issue. I'm not qualified to argue about the actual fonts.

keyboards supported writing with typographical conventions so that dubious munging schemes or fumbling around with numerical character entities didn't have to be proposed

Well, I'd rather my software take care of the issue for me than I have to deal with more characters on a keyboard! And given that Mefi is—you know—a web site, calling the usage of numerical character entities 'fumbling' is unfounded at best. (I mean, we also use a href and other web technology convention here!)

If someone uses one to represent inches, or embeds a quote within another quote you are screwed

No, you're not screwed if someone embeds a quote inside another quote. Regular expressions rock that way.

Inches are an issue, but damn—I'd rather get the inch marker wrong the 0.0xx% of the time it's deployed than get the apostrophe wrong near-universally.

In terms of technical arguments, I see two main types of issues:

Copy/Paste

Look, I understand this is an issue. But I'd also question how many people are pasting from Mefi into things that can't handle appropriate typography. Blogging engines can. Word Processors can. Notepad can.

RSS etc.

I would submit that the people saying things like "well it often fails" are the ones doing the handwaving. Which freakin' RSS reader doesn't render a wordpress rss feed properly? Using numeric entities is like using a condom and spermicide—and then abstaining. It's difficult to go wrong. The reason I say this with confidence is that we aren't talking about a gazillion different deployments here. We're talking about one site. Under the control of one admin. Which generates its own RSS feed. The environment for the html and xml pages therefore is one of 100% control.

The issue about php-digital-chain, GET/POST etc. is even more exasperating. I said a filter-type thing on certain output didn't I? In the db and internal processing the text would remain the same as it would have been without the typography filter.

In sum:
the double-dash "--" is a perfectly acceptable substitute (as a matter of formal style) for the em-dash (not m-dash, i believe)
Well, sure it's 'perfectly acceptable'. It just looks really dingy. Maybe spoilt by using technology that handles this properly, but I see no reason Mefi can't.

I understand that this isn't a pressing concern. If I thought this had a likelihood of being addressed I'd have a different tone for effective advocacy. It just sort of occurred to me and then the random "well it can't work!" notions just sucked me into defending it. It's a fairly avant-garde thing to do, I'll grant—I like things I'm involved with to be well-crafted. Pardon me.
posted by Firas at 9:50 PM on April 14, 2007


Verdana’s quotation marks are a perfectly acceptable form. If you want a comparison to a “print font,” an increasingly nebulous distinction in the first place, check Mrs Eaves (and the old Emigre articles about how flipped-99/99 double quotes are preferred by that house).

I flatly deny that any font in common or even uncommon use has ill-designed quotation marks or dashes. Those are only krazy, weird, exceptional, edge-case characters to oldskool programmers who haven’t caught up to 1984 yet (when real typography became possible on retail computers). You aren't using MS-DOS anymore and the claim is false.

A credible reason I can see to avoid implementing a smart-quotes and -dashes algorithm is its propensity to get things wrong, particularly with opening apostrophes and when writing in U.K. or Irish English.
posted by joeclark at 10:42 PM on April 14, 2007
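
The claims in the last few comments about nested quotes, inch marks and opening apostrophes can be tried out with a deliberately simple regex-based curler. This is a rough Python sketch with a made-up `curl` function, not any real library's algorithm; it handles nesting but gets the other two cases wrong, which is the trade-off being argued about.

```python
import re

def curl(text: str) -> str:
    """Naive quote curler: a quote is 'opening' if it starts the text or
    follows whitespace or an opening bracket; every other quote is 'closing'."""
    text = re.sub(r'(^|[\s(\[])"', r'\1&#8220;', text)   # opening double
    text = text.replace('"', '&#8221;')                  # remaining doubles close
    text = re.sub(r"(^|[\s(\[]|&#8220;)'", r"\1&#8216;", text)  # opening single
    text = text.replace("'", "&#8217;")                  # closing single / apostrophe
    return text

nested = curl("\"'Yes,' she said\"")   # nesting works
inches = curl('a 6" ruler')            # inch mark wrongly becomes a closing quote
elided = curl("'twas")                 # apostrophe wrongly becomes an opening quote

print(nested)
print(inches)
print(elided)
```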


At least this thread spurred me to get the new version of Microsoft's Keyboard Layout Creator and add some keypress combinations to give me some of these glyphs. Then I programmed them into my Logitech G15 keyboard with its extra bank of programmable keys. I put the Unicode code points in the first bank and the corresponding named entities as macros in the second bank.

Incidentally, in the course of doing this I came across something I had forgotten that I should never have forgotten. It's the real problem with Microsoft's “smart quotes”. Windows is stupidly using the Unicode code point range #129–#159, which are undefined. They're using it because that range is the equivalent of the Windows codepage. Windows's “smart quotes” are in that range.

And I've been stupidly using those. (Because I know the numpad keystrokes well, especially for the em-dash.) My earlier example, which I claimed works on MeFi, should only actually mostly be working on Windows machines. The ones I'm using here, which are Unicode and should be encoded as UTF-8 since that's how Matt has it set up, should display fine for everyone. So, mea culpa, especially since I've been admonishing others for getting things wrong.
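
A short Python sketch of the mismatch described above, assuming the Windows codepage in question is Windows-1252 (`cp1252` in Python). The helper name `cp1252_byte_to_char` is hypothetical. The numbers 147, 148 and 151 are the usual numpad codes for the curly double quotes and the em dash.

```python
import unicodedata

def cp1252_byte_to_char(value: int) -> str:
    """Interpret a single byte value as a Windows-1252 character."""
    return bytes([value]).decode("cp1252")

for value in (147, 148, 151):
    char = cp1252_byte_to_char(value)
    # Read as a Windows-1252 byte, the value is a printable glyph whose
    # real Unicode code point is far outside the 128-159 range.
    print(f"byte {value} -> U+{ord(char):04X}")
    # Read directly as a Unicode code point, the same number is a control
    # character (general category "Cc"), not a quote or dash.
    print(f"code point {value} -> category {unicodedata.category(chr(value))}")
```

So a numeric reference built from the codepage number only looks right where the software quietly falls back to the Windows interpretation.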
posted by Ethereal Bligh at 10:48 PM on April 14, 2007


Look, I understand this is an issue. But I'd also question how many people are pasting from Mefi into things that can't handle appropriate typography. Blogging engines can. Word Processors can. Notepad can.

Actually, that's entirely untrue. Try copying a chunk of MeFi text with an em dash into NotePad and then back and tell me what happens. (They may have fixed this since I stopped using Windows, but I doubt it.)

At any rate, I have literally lost count of how many times I've heard "Errr, that looked fine when I copied it from the webpage into the MeFi box," so even if it shouldn't be a problem, it is. This gets on my nerves more than anything else.

Well, sure it's 'perfectly acceptable'. It just looks really dingy. Maybe spoilt by using technology that handles this properly, but I see no reason Mefi can't.

Damnit, it's people like you that got me marked down on my last otherwise perfectly good legal memo because of your oh-so-pretty em dashes when there are perfectly good double-hyphen substitutes available! :)

Seriously, though (although that did happen, and recently) -- where I draw the line is digital and analog. If you don't control the end display, I say (as a personal matter), use the most universal form. Straight quotes, double-hyphens, etc. Once you get around to printing something out to show to someone and you know it's in its final graphic form, that is the time to pretty it up. Just my opinion, though.

I understand that this isn't a pressing concern. If I thought this had a likelihood of being addressed I'd have a different tone for effective advocacy.

Wait, what are we talking about? Last I checked it had been changed. Maybe I was hallucinating.

Incidentally, in the course of doing this I came across something I had forgotten that I should never have forgotten. It's the real problem with Microsoft's “smart quotes”. Windows is stupidly using the Unicode code point range #129–#159, which are undefined. They're using it because that range is the equivalent of the Windows codepage. Windows's “smart quotes” are in that range.

This may sound like techie groupthink, but it's really astonishing to me how much of this comes down to what I said earlier about Microsoft just thinking that it's smarter than its users. That thing about IE using statistical analysis to infer the appropriate Unicode set is still just fucking mind-boggling to me. At least the stupid AutoCorrect stuff (ordinals, smart quotes, em dashes, the list is infinite) is somewhat transparent.
posted by spiderwire at 12:42 AM on April 15, 2007


“That thing about IE using statistical analysis to infer the appropriate Unicode set is still just fucking mind-boggling to me.”—spiderwire

You realize that it doesn't do that all the time, right? That when the doctype is set properly it doesn't do that?
posted by Ethereal Bligh at 12:48 AM on April 15, 2007


“Try copying a chunk of MeFi text with an em dash into NotePad and then back and tell me what happens.”—spiderwire

Okay:

“That thing about IE using statistical analysis to infer the appropriate Unicode set is still just fucking mind-boggling to me.”—spiderwire

How's that look to you?
posted by Ethereal Bligh at 12:50 AM on April 15, 2007


Incidentally, that was copied from Firefox to Vista's clipboard to Notepad to the clipboard to Firefox's textbox.

The following is the same, except now I'm using IE7:

“That thing about IE using statistical analysis to infer the appropriate Unicode set is still just fucking mind-boggling to me.”—spiderwire

When saving with “Save As”, Notepad asks me if I want to save the file in ANSI, Unicode, Unicode Big-Endian, or UTF-8. FYI. I just pasted and then copied in the previous comment. In this one I pasted, saved as Unicode, re-opened, copied, and pasted to IE7.
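
For reference, a Python sketch of what the em dash looks like on disk under each of those four choices. The mapping from Notepad's labels to Python codec names is an assumption: "ANSI" is taken to mean Windows-1252 (true on Western-locale systems), and "Unicode" is taken to mean little-endian UTF-16. Any byte-order mark Notepad writes at the start of the file is left out here.

```python
EM_DASH = "\u2014"

# Assumed mapping from Notepad's Save As labels to Python codec names.
NOTEPAD_ENCODINGS = {
    "ANSI": "cp1252",
    "Unicode": "utf-16-le",
    "Unicode Big-Endian": "utf-16-be",
    "UTF-8": "utf-8",
}

encoded = {label: EM_DASH.encode(codec) for label, codec in NOTEPAD_ENCODINGS.items()}

for label, raw in encoded.items():
    # Every encoding round-trips the em dash; only the bytes differ.
    assert raw.decode(NOTEPAD_ENCODINGS[label]) == EM_DASH
    print(f"{label}: {raw.hex()}")
```

All four can hold the character, so the copy and paste only goes wrong when one program writes one of these and another reads it as a different one.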
posted by Ethereal Bligh at 12:59 AM on April 15, 2007


Hey, they fixed it! I haven't used Vista, but that's good news. Having disproved what I admitted at the time was probably a strawman, what about the substantive objection?

You realize that it doesn't do that all the time, right? That when the doctype is set properly it doesn't do that?

Yes. I do realize that.

I also realize that 99% of the time, AutoCorrect makes relatively acceptable changes. That doesn't make me any less hesitant to turn it off any time I'm forced to use Word.
posted by spiderwire at 1:26 AM on April 15, 2007


Also, I'm not particularly impressed by Microsoft's ability to make their own programs interoperable.

FrontPage stuff often looks relatively pretty in IE.
posted by spiderwire at 1:28 AM on April 15, 2007


In case you didn't get my position from my previous comment, I support the fuck out of using sweet Unicode characters all over the place. My comment would be peppered with them if I were using one of my Macs or if I had a Compose key on my Model M. I use emdashes constantly in my writing, and die a little inside when I have to use -- instead.

But don't do autoconversion of quotes or double-dashes by default!

Making Textile/Markdown/BBCode an option for text input would be nice (and implementable as a Firefox extension, you insensitive clods!). The main problem is Live Preview -- you'd have to do all the parsing in javascript live in the browser.
posted by blasdelf at 6:55 AM on April 15, 2007


You don't have to use Unicode. A simple &mdash; would do the job. I've used it for years.
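
A quick check in Python, using the standard library's `html.unescape`, that the named entity and the numeric reference for the real code point both come out as the same em dash. The variable names are just for illustration.

```python
import html

named = html.unescape("&mdash;")     # named entity
decimal = html.unescape("&#8212;")   # decimal reference to U+2014
hexadecimal = html.unescape("&#x2014;")  # hexadecimal reference to U+2014

print(named == decimal == hexadecimal == "\u2014")
```

Either form keeps the page source plain ASCII while still displaying the dash.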
posted by adipocere at 9:51 AM on April 15, 2007


“You don't have to use Unicode. A simple &mdash; would do the job. I've used it for years.”

Really? I had no idea. That's amazing! I'm going to have to look into this whole “html” thing myself.
posted by Ethereal Bligh at 10:04 AM on April 15, 2007

