URI filtering? November 14, 2008 5:03 PM

Whenever I preview a post (at least on the front page and MeTa), any &langs in my links become ⟨.
posted by Korou to Bugs at 5:03 PM (15 comments total)

Whoops, didn't finish...

It's only for the opening post and it doesn't seem restricted to links. I'm on Firefox 3.0.4 and Safari 3.2 on OS X 10.5.5.
posted by Korou at 5:07 PM on November 14, 2008


And this is why you encode your ampersands in links lest the following lang attribute be interpreted as, say, a left angle bracket.

In your face Artw!
posted by cillit bang at 5:45 PM on November 14, 2008
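
For anyone following along, here is a minimal Python sketch of the encoding cillit bang recommends. The URL and the helper name `make_link` are made up for illustration; `html.escape` and `html.unescape` are from the standard library. It only shows the idea, not how any particular site builds its links.

```python
import html

def make_link(url, text):
    """Build an anchor tag, escaping "&" (and quotes) in the URL and the text."""
    return '<a href="{}">{}</a>'.format(html.escape(url, quote=True),
                                        html.escape(text))

# Hypothetical URL with a "lang" query parameter, as in the report above.
url = "http://example.com/page?id=3&lang=en"
link = make_link(url, "example")
print(link)
```

Written this way, the ampersand reaches the page as `&amp;`, so there is no bare `&lang` left for a decoder to mistake for an entity.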


Well... that makes sense. Thanks. I probably could have found this in the archive, but I didn't think to search... sigh.
posted by Korou at 5:56 PM on November 14, 2008


Would you rather they be transformed into ⟨?

If you're a lazy fucker, and you're writing an input sanitizer to deal with ampersands, de-referencing all valid entities is a pretty decent approach. It's only problematic on crap-ass PHP/MySQL apps that love to murder the fuck out of unicode.

Is the end result not exactly what you wanted to happen? You put in &lang;, and got a ⟨ in your post. Why would you particularly want the entity reference to be preserved through the preview process?
posted by blasdelf at 6:12 PM on November 14, 2008


Oh fuck, I misread you.

You were linking to sites retarded enough to use "lang" as a CGI parameter, and the de-referencing happened even though there wasn't a terminating semi-colon.
posted by blasdelf at 6:15 PM on November 14, 2008
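
A rough sketch of the behaviour blasdelf describes, where a decoder treats the terminating semicolon as optional. The function name `lenient_unescape` and the sample query string are hypothetical, and this is only a guess at the kind of logic involved, not the site's actual code. It uses the `html5` table from Python's `html.entities`.

```python
import re
from html.entities import html5  # maps names such as "lang;" to characters

# The trailing ";" is optional here, which is the lenient part.
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);?")

def lenient_unescape(text):
    """Decode named entities even when the closing ";" is missing."""
    def replace(match):
        char = html5.get(match.group(1) + ";")
        # Names that are not known entities are left exactly as written.
        return char if char is not None else match.group(0)
    return _ENTITY.sub(replace, text)

print(lenient_unescape("page?id=3&lang=en"))
```

With a decoder like this, a query parameter that happens to share a name with an entity gets swallowed, while parameters with other names pass through untouched.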


Here's an ampersand-quoting function in Python I wrote a while back that is my preferred way of solving the problem:
def ampersands(string):  """Allow terminated entities but escape wild ampersands."""  splits = string.split('&')  if len(splits) == 1:    return string  result = splits[0]  for split in splits[1:]:    if split:      for char in split:        if char.isspace():          result += "&amp;" + split; break        elif char == ';':          result += "&" + split; break      else: # end of split        result += "&amp;" + split    else: # empty split      result += "&amp;"  return result
There's a way to rewrite it as a regex using non-capturing expressions, but I've never bothered.

You could also dereference numeric character entities — Matt would want to do that to keep people from evading some of his other checks, like the one that keeps you from using the string "posted by" inside <small>.
posted by blasdelf at 6:35 PM on November 14, 2008
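
One possible shape for the regex rewrite and the numeric dereferencing mentioned above, as a sketch only. The names `escape_wild_ampersands` and `deref_numeric` are invented here, and the lookahead pattern is one reading of the loop's rule: an ampersand is left alone only if a ";" turns up before any whitespace or another "&".

```python
import re

# "&" NOT followed by a run of non-space, non-"&" characters ending in ";".
_WILD_AMP = re.compile(r"&(?![^\s;&]*;)")

def escape_wild_ampersands(text):
    """Escape bare ampersands but leave terminated entities alone."""
    return _WILD_AMP.sub("&amp;", text)

# Decimal (&#112;) and hexadecimal (&#x70;) character references.
_NUMERIC = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def deref_numeric(text):
    """Replace numeric character references with the characters they name."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave NUL, surrogates and out-of-range values untouched.
        if 0 < code <= 0x10FFFF and not 0xD800 <= code <= 0xDFFF:
            return chr(code)
        return match.group(0)
    return _NUMERIC.sub(replace, text)

print(escape_wild_ampersands("AT&amp;T &copy; R&D"))
print(deref_numeric("&#112;osted by"))
```

Running the numeric pass before any string checks is what would stop someone from spelling "posted by" with a character reference to slip past a filter.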


Is blasdelf's code showing up on only one line to anyone else?
posted by Pronoiac at 6:45 PM on November 14, 2008


Yes, to blasdelf!

The contents of <pre> elements normally end up getting double-spaced, as all comments are passed through a 's/\n/<br>\n/g' regex. It used to be you could get around that by preemptively passing your comment through 's/\n/<br>/g', but it looks like the <br>s get stripped out again inside <pre> now.

I pastied it to avoid the shenanigans
posted by blasdelf at 6:54 PM on November 14, 2008
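
To make the double-spacing problem concrete, here is a small Python sketch of the newline substitution described above, next to a variant that skips `<pre>` blocks. Both function names are hypothetical, the pattern only recognises bare `<pre>` tags without attributes, and none of this is claimed to be what the site actually runs.

```python
import re

# The capturing group makes re.split keep the <pre> blocks in its result.
_PRE_BLOCK = re.compile(r"(<pre>.*?</pre>)", re.DOTALL | re.IGNORECASE)

def nl2br_naive(comment):
    """Equivalent of s/\\n/<br>\\n/g applied to the whole comment."""
    return comment.replace("\n", "<br>\n")

def nl2br_skip_pre(comment):
    """Add <br> at line ends, except inside <pre>...</pre> blocks."""
    parts = _PRE_BLOCK.split(comment)
    # Odd indexes hold the captured <pre> blocks; even ones the text between.
    return "".join(
        part if index % 2 else part.replace("\n", "<br>\n")
        for index, part in enumerate(parts)
    )

comment = "intro\n<pre>a\nb</pre>\nend"
print(nl2br_naive(comment))
print(nl2br_skip_pre(comment))
```

The naive version puts a `<br>` next to every newline inside `<pre>`, which a browser renders as a blank line between code lines; the second leaves the preformatted text alone.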


Let's see what happens if you just use <pre> naively, maybe the thing I was trying to work-around has been fixed, rendering the workaround harmful?
def ampersands(string):
  """Allow terminated entities but escape wild ampersands."""
  splits = string.split('&')
  if len(splits) == 1:
    return string
  result = splits[0]
  for split in splits[1:]:
    if split:
      for char in split:
        if char.isspace():
          result += "&amp;" + split; break
        elif char == ';':
          result += "&" + split; break
      else: # end of split
        result += "&amp;" + split
    else: # empty split
      result += "&amp;"
  return result
posted by blasdelf at 6:57 PM on November 14, 2008


Yes, that looks much better.
posted by Pronoiac at 7:01 PM on November 14, 2008


Yay! Would it be possible to add this fix to the manual preview? It currently returns the double-spaced <pre> and lures the unsuspecting into using the old (now disastrous) workaround.

Maybe in 15 years white-space:pre-line from CSS2.1 will actually be supported in browsers! (who am I kidding)
posted by blasdelf at 7:09 PM on November 14, 2008


blasdelf: "Let's see what happens if you just use <pre> naively, maybe the thing I was trying to work-around has been fixed, rendering the workaround harmful?"

Holy shit. Fuck encoding ampersands, this is the newsflash of the goddamn century.
posted by Plutor at 4:35 AM on November 15, 2008 [2 favorites]




_________________
| oooooooooooooooo |
| oooooooooooooooo |
| oooooooooooooooo |
| oooooooooooooooo |
| oooooooooooooooo |
| oooooooooooooooo |
| oooooooooooooooo |
------TENORI ON-----
posted by roofus at 2:12 PM on December 10, 2008


WTF? Did you preview, roofus?
posted by Pronoiac at 3:00 PM on December 10, 2008

