URI filtering? November 14, 2008 5:03 PM
Whenever I preview a post, at least on the front page and MeTa, any &lang in my links becomes 〈.
And this is why you encode your ampersands in links, lest the following lang attribute be interpreted as, say, a left angle bracket.
In your face Artw!
posted by cillit bang at 5:45 PM on November 14, 2008
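A minimal sketch of the advice above, in Python: escape the bare ampersand before the URL goes into an href, so that "&lang" can no longer be read as the start of an entity reference. The URL and link text are made up for illustration; `html.escape` is the standard-library call.

```python
import html

# Hypothetical URL, for illustration only.
url = "http://example.com/page?id=1&lang=en"

# Escape the bare ampersand so "&lang" cannot be taken for an entity reference.
href = html.escape(url)

link = '<a href="{}">example</a>'.format(href)
print(link)
```

A browser decodes the escaped ampersand back to a plain one when it follows the link, so the target site still receives lang=en.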
Well...that makes sense. Thanks. I probably could find this in the archive, didn't think to search...sigh.
posted by Korou at 5:56 PM on November 14, 2008
Would you rather they be transformed into ⟨?
If you're a lazy fucker, and you're writing an input sanitizer to deal with ampersands, de-referencing all valid entities is a pretty decent approach. It's only problematic on crap-ass PHP/MySQL apps that love to murder the fuck out of unicode.
Is the end result not exactly what you wanted to happen? You put in &lang;, and got a 〈 in your post. Why would you particularly want the entity reference to be preserved through the preview process?
posted by blasdelf at 6:12 PM on November 14, 2008
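The "de-reference all valid entities" approach described above could look roughly like this in Python. This is a sketch under assumptions, not the site's actual sanitizer: the function name `normalize` is hypothetical, and it leans on the standard library's `html.unescape` and `html.escape`.

```python
import html

def normalize(text):
    """Decode every entity reference, then escape the result exactly once."""
    # "&amp;" becomes "&", "&#112;" becomes "p", and so on.
    decoded = html.unescape(text)
    # Re-escape &, < and > so the output is safe to embed in markup again.
    return html.escape(decoded, quote=False)

print(normalize("AT&T and AT&amp;T"))
```

Escaped and unescaped ampersands come out the same way, which is the appeal of the approach. The cost is that entity references do not survive: they are replaced by the characters they name.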
Oh fuck, I misread you.
You were linking to sites retarded enough to use "lang" as a CGI parameter, and the de-referencing happened even though there wasn't a terminating semi-colon.
posted by blasdelf at 6:15 PM on November 14, 2008
Here's an ampersand-quoting function in Python I wrote a while back that is my preferred way of solving the problem:
def ampersands(string): """Allow terminated entities but escape wild ampersands.""" splits = string.split('&') if len(splits) == 1: return string result = splits[0] for split in splits[1:]: if split: for char in split: if char.isspace(): result += "&amp;" + split; break elif char == ';': result += "&" + split; break else: # end of split result += "&amp;" + split else: # empty split result += "&amp;" return result
There's a way to rewrite it as a regex using non-capturing expressions, but I've never bothered.
You could also dereference numeric character entities. Matt would want to do that to keep people from evading some of his other checks, like the one that keeps you from using the string "posted by" inside <small>.
posted by blasdelf at 6:35 PM on November 14, 2008
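For the regex rewrite mentioned in that comment, one possible version is sketched below. It uses a negative lookahead rather than a non-capturing group, and the name `ampersands_re` is made up here; treat it as an illustration of the idea, not blasdelf's code.

```python
import re

# Match an ampersand that is NOT followed by a run of characters
# (no whitespace, no ";", no "&") ending in ";".
# Such an ampersand does not start a terminated entity.
WILD_AMPERSAND = re.compile(r"&(?![^\s;&]*;)")

def ampersands_re(string):
    """Allow terminated entities but escape wild ampersands."""
    return WILD_AMPERSAND.sub("&amp;", string)

print(ampersands_re("page?id=1&lang=en"))
print(ampersands_re("fish &amp; chips & peas"))
```

Like the loop version, this only checks that something after the ampersand is terminated by a semicolon; it does not check that the name is a real entity.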
Is blasdelf's code showing up on only one line to anyone else?
posted by Pronoiac at 6:45 PM on November 14, 2008
Yes, to blasdelf!
The contents of <pre> elements normally end up getting double-spaced, as all comments are passed through a 's/\n/<br>\n/g' regex. It used to be you could get around that by preemptively passing your comment through 's/\n/<br>/g', but it looks like the <br>s get stripped out again inside <pre> now.
I pastied it to avoid the shenanigans
posted by blasdelf at 6:54 PM on November 14, 2008
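The double-spacing described above is easy to reproduce. The sketch below shows the naive newline substitution next to a variant that leaves <pre> contents alone. Both function names are hypothetical, and the <pre> handling assumes lowercase, attribute-free, properly closed tags; this is not the site's actual code.

```python
import re

def naive_breaks(comment):
    """Equivalent of s/\\n/<br>\\n/g applied to the whole comment."""
    return comment.replace("\n", "<br>\n")

def pre_aware_breaks(comment):
    """Add <br> at newlines, except inside <pre>...</pre> blocks."""
    # The capturing group makes re.split keep the <pre> blocks,
    # which land at the odd indices of the resulting list.
    parts = re.split(r"(<pre>.*?</pre>)", comment, flags=re.DOTALL)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(part)  # preformatted block: leave untouched
        else:
            out.append(part.replace("\n", "<br>\n"))
    return "".join(out)

sample = "intro\n<pre>line 1\nline 2</pre>\noutro"
print(naive_breaks(sample))
print(pre_aware_breaks(sample))
```

With the naive version, each newline inside <pre> gains a <br> on top of the line break the browser already renders, which is where the double spacing comes from.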
Let's see what happens if you just use <pre> naively, maybe the thing I was trying to work-around has been fixed, rendering the workaround harmful?
def ampersands(string):
    """Allow terminated entities but escape wild ampersands."""
    splits = string.split('&')
    if len(splits) == 1:
        return string  # no ampersands at all
    result = splits[0]
    for split in splits[1:]:
        if split:
            for char in split:
                if char.isspace():
                    # whitespace before any ';': a wild ampersand
                    result += "&amp;" + split
                    break
                elif char == ';':
                    # terminated entity: leave it alone
                    result += "&" + split
                    break
            else:  # end of split
                result += "&amp;" + split
        else:  # empty split
            result += "&amp;"
    return result
posted by blasdelf at 6:57 PM on November 14, 2008
Yay! Would it be possible to add this fix to the manual preview? It currently returns the double-spaced <pre> and lures the unsuspecting into using the old (now disastrous) workaround.
Maybe in 15 years white-space:pre-line from CSS2.1 will actually be supported in browsers! (who am I kidding)
posted by blasdelf at 7:09 PM on November 14, 2008
blasdelf: "Let's see what happens if you just use <pre> naively, maybe the thing I was trying to work-around has been fixed, rendering the workaround harmful?"
Holy shit. Fuck encoding ampersands, this is the newsflash of the goddamn century.
posted by Plutor at 4:35 AM on November 15, 2008 [2 favorites]
Thread where "newsflash of the goddamn century" was asked for and delivered.
posted by philomathoholic at 10:46 AM on November 15, 2008
_________________
| oooooooooooooooo |
| oooooooooooooooo |
| oooooooooooooooo |
| oooooooooooooooo |
| oooooooooooooooo |
| oooooooooooooooo |
| oooooooooooooooo |
------TENORI ON-----
posted by roofus at 2:12 PM on December 10, 2008
It's only for the opening post and it doesn't seem restricted to links. I'm on Firefox 3.0.4 and Safari 3.2 on OS X 10.5.5.
posted by Korou at 5:07 PM on November 14, 2008