Quotes in links considered harmful July 14, 2009 5:10 PM

The AskMe link entry control generates malformed HTML when fed a URL containing double quote characters.

I copied the following from the browser's address bar

http://www.google.com/search?q="yield+factor"

then selected a phrase, clicked the "link" control, and pasted it into the URL popup box. I didn't notice that the tag actually inserted was

<a href="http://www.google.com/search?q="yield+factor"">

or that this had then got "cleaned up" as

<a href="http://www.google.com/search?q=">

until after hitting Post.

It seems to me that the box should have generated

<a href="http://www.google.com/search?q=%22yield+factor%22">

Broken comment here, corrected one here.
posted by flabdablet to Bugs at 5:10 PM (32 comments total)

Thanks for the detailed report. Looks like something we need to fix up.
posted by pb (staff) at 5:15 PM on July 14, 2009


Stop cluttering up MetaBirthdayCelebration with your sniveling little totally accurate detailed and useful bug reports.
posted by davejay at 5:19 PM on July 14, 2009 [4 favorites]


Now that pb's seen it, I fixed your link.
posted by jessamyn (staff) at 5:21 PM on July 14, 2009


Then let the revelry resume! And summon the longboats.
posted by flabdablet at 5:24 PM on July 14, 2009


Don't worry davejay, we'll remember this bug report FOREVER.
posted by Pants! at 5:34 PM on July 14, 2009 [1 favorite]


flabdablet is the best dablet.
posted by cowbellemoo at 6:17 PM on July 14, 2009


If I may piggyback a bug. If you are not logged in when you search you get the error:
"Error: File not found
Looks like you've asked for a file that doesn't exist, try out the search below to find what you are looking for, which searches across all the MetaFilter sites."
posted by tellurian at 8:29 PM on July 14, 2009


tellurian, where did you search from? If you're logged out, the search form at the top and bottom of the page should take you to google. The internal site search is only for members.
posted by pb (staff) at 8:38 PM on July 14, 2009


Interestingly, if I search for a quoted string in Google, the URL of the results page displays with quotes. However, if I copy it, what actually gets inserted into the paste buffer is URL-encoded, i.e. the quotes are replaced with %22. This is in Firefox 3.0 Linux.

What appears to be happening is that Firefox is doing a conversion of the displayed URL for readability in the address bar, but overriding the standard platform copy operation to make sure that if you copy it, what goes into your buffer is actually standards-compliant, i.e. appropriately URL-encoded. (So use Firefox.)
posted by George_Spiggott at 8:39 PM on July 14, 2009


Slight correction: Firefox will not automatically encode the URL when you copy it if what you're copying is something you typed. What seems to happen is that Google sends you to a correctly encoded URL. When Firefox displays the resulting page it puts the decoded human-friendly version of the URL (%22 converted to quote) in the address bar but keeps the original URL-encoded one, and inserts that into the paste buffer if you copy it.
posted by George_Spiggott at 8:49 PM on July 14, 2009


And if we're in a piggybackin' mood, would it be an easy task to write a filter (heh) that changes malformed links that look like

  http://http://www.somesiteorother.com/

to the desired

  http://www.somesiteorother.com/ ?

Seems to come up from time to time when people use the link function in the comment box, and paste their full-address URL after the presupplied "http://" rather than in place of it.
posted by hangashore at 8:50 PM on July 14, 2009


If you do that, be sure that what you keep is the second scheme prefix and not the first one. The user may have pasted an https:// scheme and you'll want that rather than the presupplied one, unless the page happens to be served on port 80 as well as 443, which isn't always the case.
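
A rough sketch of that rule in Python, for illustration only. The helper name and the regex are made up here and are not the site's actual code; it assumes only http and https schemes matter.

```python
import re

# Hypothetical pattern: two scheme prefixes back to back at the start of the URL.
# Group 1 captures the second one, which is the one the user actually pasted.
_DOUBLED_SCHEME = re.compile(r"^https?://(https?://)", re.IGNORECASE)

def collapse_doubled_scheme(url: str) -> str:
    """Drop the presupplied scheme when a full URL was pasted after it."""
    return _DOUBLED_SCHEME.sub(r"\1", url.strip())

print(collapse_doubled_scheme("http://http://www.somesiteorother.com/"))
print(collapse_doubled_scheme("http://https://www.somesiteorother.com/"))
```

The second call keeps https:// because the replacement is the captured second prefix, not the first.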

(and that's probably enough buttinsky from me)
posted by George_Spiggott at 8:56 PM on July 14, 2009


tellurian, where did you search from?
The link in the top menu. The one at the bottom does indeed take me to Google.
posted by tellurian at 9:07 PM on July 14, 2009


The link in the top menu.

ahh, I think you have an unusual situation. You have the plain theme cookie, but you're logged out. Most non-members driving by the site won't have the plain text theme specified. The default theme has a search form rather than a link.
posted by pb (staff) at 9:18 PM on July 14, 2009


But just to cover the bases I just removed the Search link from the plain theme if you're logged out. You can still use the form at the bottom of the page to do a Google search of metafilter.
posted by pb (staff) at 9:26 PM on July 14, 2009


Ah! Cheers.
posted by tellurian at 9:40 PM on July 14, 2009


DISCRIMINATION AGAINST THE PLAIN THEME SHALL NOT STAND
posted by BitterOldPunk at 12:02 AM on July 15, 2009


If I can piggyback: if I accidentally type:

http://boingboing.net/

Can you make it redirect to:

http://fark.com/

?
posted by davejay at 12:06 AM on July 15, 2009 [1 favorite]


Well, if we're filling this thread with bugs, there's a couple I've found related to HTML closing tags:
  1. In general, it lets any hanging closing tags stay hanging if all the opening tags are matched up. (Try for example <em>foo</em></script></body></html>, and compare behavior to <u><em>foo</em></script></body></html>.) The only tag it seems to catch consistently is </div>, which seems to eliminate most of the problems I can see, but it's still inconsistent.
  2. On the other hand, if no closing tags are added, it replaces the tags in the order they appear, rather than in reverse order outward. (Try for example <s><noframes>: the close tag for the strikethrough ends up placed in the alternate no-frames page. Since most browsers now will display frames, the rest of the page will be struck through for those people.)
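
For what it's worth, both behaviours fall out of tracking open tags on a stack. The sketch below is only an illustration in Python: the function name and the regex are hypothetical, it knows nothing about void elements like <br>, and it is not how the site's HTML-fixer works (I can't see that).

```python
import re

# Crude tag matcher, good enough for a sketch; not a real HTML parser.
_TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def balance_tags(fragment: str) -> str:
    """Drop stray closing tags and close leftover tags innermost first."""
    out = []
    open_tags = []  # stack of currently open tag names
    pos = 0
    for m in _TAG.finditer(fragment):
        out.append(fragment[pos:m.start()])
        pos = m.end()
        is_closing = m.group(1) == "/"
        name = m.group(2).lower()
        if not is_closing:
            open_tags.append(name)
            out.append(m.group(0))
        elif name in open_tags:
            # Close any inner tags first, then the one that was asked for.
            while open_tags:
                top = open_tags.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
        # A closing tag with no matching opener is simply dropped.
    out.append(fragment[pos:])
    # Whatever is still open gets closed in reverse order, innermost first.
    out.extend(f"</{name}>" for name in reversed(open_tags))
    return "".join(out)

print(balance_tags("<em>foo</em></script></body></html>"))
print(balance_tags("<s><noframes>"))
```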

posted by Upton O'Good at 12:20 AM on July 15, 2009


What appears to be happening is that Firefox is doing a conversion of the displayed URL for readability in the address bar, but overriding the standard platform copy operation to make sure that if you copy it, what goes into your buffer is actually standards-compliant, i.e. appropriately URL-encoded. (So use Firefox.)

In fact I do use Firefox, but I had apparently defeated that rather stinky Firefox hack by selecting only part of the text in the address bar (everything from ?q="stuff" leftward) before doing Copy. This is my usual practice for Google search links, because I don't like including all the browser-identifying cruft that Google sticks on the end.

I still think it would be nice if MeFi did a simple replace of %22 for " in anything pasted into the Link box. Doing a full URLencode is probably wrong, because it will break any URL containing ampersands which is probably most of them.
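
To show the difference, here is a quick Python comparison. The escape_quotes helper is made up for this example, and the &hl=en on the end of the URL is a hypothetical extra parameter added so there's an ampersand to break.

```python
from urllib.parse import quote

def escape_quotes(url: str) -> str:
    """Replace only double quotes, leaving every other character alone."""
    return url.replace('"', "%22")

# Hypothetical pasted URL; the &hl=en parameter is invented for the example.
pasted = 'http://www.google.com/search?q="yield+factor"&hl=en'

# Targeted replace: the URL still works as a URL.
print(escape_quotes(pasted))

# Encoding the whole string also encodes ':', '/', '?', '=' and '&',
# so the result is no longer usable as a link target.
print(quote(pasted, safe=""))
```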
posted by flabdablet at 12:35 AM on July 15, 2009


I've just noticed that Firefox prettifies %20 in the address bar as well, rendering it as a space. So you might want to unscrew that, too.
posted by flabdablet at 5:24 AM on July 15, 2009


I've not seen this much piggy-backing since school sports day. Tenner on tellurian to show or place in the egg-and-spoon.
posted by Abiezer at 5:33 AM on July 15, 2009


Whoah! pb looked at my cookie and fixed the plain theme but now I see this. Freakout!
posted by tellurian at 6:55 AM on July 15, 2009


Piggybacking bugs.

I like to imagine that the little one thinks it's in a rodeo.
posted by quin at 7:35 AM on July 15, 2009


Freakout!

You're just missing the stylesheet for some reason. A shift+reload should fix that up for you. And I didn't look at your cookie, just deduced the problem from your description. I don't want folks to think I'm peering into their cookie jar.

So you might want to unscrew that, too.

Good point, thanks. I'm going to try to push out a new version of that function today when I'm not bagging MeFi shirts.

...there's a couple I've found related to HTML closing tags...

Thanks for letting us know. The HTML-fixer is a bit of a black box to me, and we can usually spot-fix any of these rare parsing problems. But it's good to keep in mind for the next time I'm in there monkeying around.
posted by pb (staff) at 9:40 AM on July 15, 2009


...when I'm not bagging MeFi shirts.

Is that in your job description? I hope you told that mathowie guy he really owes you big time.
posted by deborah at 11:03 AM on July 15, 2009


I still think it would be nice if MeFi did a simple replace of %22 for " in anything pasted into the Link box.

There are standard urlencode and urldecode functions in every web-specific language and in libraries for all the modern general-purpose scripting languages as well. The knee-jerk thing to do with any user input that's meant to be treated as a URL is simply:

url = urlencode(urldecode(what-the-user-supplied))

That's it: no condition testing needed. At worst harmless, otherwise does the right thing no matter what. (Checking for injection exploits is left as an exercise for the implementer.)
posted by George_Spiggott at 5:15 PM on July 15, 2009


Okay, come to think of it, you only do that for the value side of query string name-value pairs, so you need to split that out and do it in a loop. Otherwise you'll encode delimiters and things that you don't really want to. I'll clam up really now.
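
Something like this, sketched in Python with the standard urllib.parse functions. The function name is made up, and it assumes the query string is form-style, where + means a space; names are left untouched and only values are round-tripped.

```python
from urllib.parse import quote_plus, unquote_plus, urlsplit, urlunsplit

def normalize_query_values(url: str) -> str:
    """Decode then re-encode only the value side of each name=value pair."""
    parts = urlsplit(url)
    pairs = []
    for pair in parts.query.split("&"):
        name, sep, value = pair.partition("=")
        # Decoding first means already-encoded input is not double-encoded.
        pairs.append(name + sep + quote_plus(unquote_plus(value)))
    return urlunsplit(parts._replace(query="&".join(pairs)))

print(normalize_query_values('http://www.google.com/search?q="yield+factor"'))
print(normalize_query_values("http://www.google.com/search?q=%22yield+factor%22"))
```

Both calls should print the same %22-encoded URL, and the & and = delimiters are never touched because they are split out before anything gets encoded.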
posted by George_Spiggott at 6:36 PM on July 15, 2009


ok, just sent out a new version of the function that escapes quotes and spaces, handles doubled http://, and trims up the entered URL. Tested in the usual browser suspects, even IE6, and all seems well. Let me know if you run into problems with it.
posted by pb (staff) at 2:42 PM on July 16, 2009


Works for me. Thanks, pb.
posted by flabdablet at 6:41 PM on July 16, 2009


HTML encoding of URLs is something that so many sites and apps still do incorrectly and it tickles a nerdrage receptor in my brain, so I'm here 4 days later to say that it's not URL encoding you want to do, it's HTML encoding. And that does include ampersands (i.e., <a href="http://foo.example/x?a=b&z=y"> is not correct HTML, but <a href="http://foo.example/x?a=b&amp;z=y"> is).
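
A minimal Python illustration, using the standard library's html.escape. The link_tag helper is hypothetical and only builds the opening tag.

```python
from html import escape

def link_tag(url: str) -> str:
    """Build an opening <a> tag with the URL HTML-escaped for the attribute."""
    # quote=True also escapes double and single quotes, not just &, < and >.
    return '<a href="' + escape(url, quote=True) + '">'

print(link_tag("http://foo.example/x?a=b&z=y"))
print(link_tag('http://www.google.com/search?q="yield+factor"'))
```

The first call yields the &amp; form above; the second turns the quotes into &quot;, so the attribute can't be cut short at q=.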
posted by jjwiseman at 4:43 PM on July 20, 2009


Yeah, I hear you jjwiseman. We have to walk a line on this feature between absolutely valid HTML and people going WTF? when they double-check their comment. An ampersand works 99.9% of the time in 99.9% of browsers. But yeah, when someone uses lang as the 2nd or 3rd querystring variable in their URL, the browser can choke. For the most part though they just work, and they're what people are expecting to see as they type.
posted by pb (staff) at 7:47 AM on July 21, 2009

