What the heck is securewebhosting.org? July 20, 2015 8:39 AM
securewebhosting.org
In addition to that weirdness, googling securewebhosting & metafilter gives me pages like http://faq.metafilter.com/237/SecureWebhosting-and-copyright, which is even weirder. Is there some reasonable explanation to all of this that I'm not yet caffeinated enough to figure out myself?
(Also I delinked the URL in the post because I don't want to give 'em even that amount of ancillary link juice.)
posted by cortex (staff) at 8:57 AM on July 20, 2015 [1 favorite]
oh wow. it's some kind of automated copy of mefi.
note that the link you gave doesn't mean anything here "knows" about them. for example - http://faq.metafilter.com/237/poop-and-smelly-girls also "exists".
posted by andrewcooke at 8:58 AM on July 20, 2015 [2 favorites]
Essentially, someone cloned the entire www.metafilter.com domain on July 19th and search-and-replaced every instance of "Metafilter" with "SecureWebhosting". As a result, they also search-and-replaced the titles on the FAQ page, and also the URLs. But since the titles aren't actually used in the URLs, the links still work.
posted by smackfu at 8:58 AM on July 20, 2015
Yep, we don't enforce accurate link stubs on FAQ entries like we do with posts. We'll get that updated soon.
posted by pb (staff) at 8:59 AM on July 20, 2015
And you moved the FAQ (or maybe just the link) to its own domain since July 17, so if they rescrape it, the FAQ page will no longer be included in the copy.
posted by smackfu at 9:02 AM on July 20, 2015
Not sure I follow, smackfu. We haven't made any changes to the FAQ recently.
posted by pb (staff) at 9:03 AM on July 20, 2015
Yeah, I think they went as far as actively choosing to scrape the FAQ; I don't mean literally "no other subsites", just that they didn't do a remotely thorough job.
posted by cortex (staff) at 9:06 AM on July 20, 2015
Yeah, I guess it's not actually a change. When you are logged out, the FAQ page links to http://www.metafilter.com/faq.mefi. When you are logged in, it links to http://faq.metafilter.com/
posted by smackfu at 9:09 AM on July 20, 2015
Yeah, smackfu, that's a difference between themes. If you're using Classic or Plain you'll get links to the whole FAQ. The Modern theme (default for logged-out users) has subsite-specific FAQ pages.
posted by pb (staff) at 9:15 AM on July 20, 2015
So what is google crawling/indexing/whatever they do to wind up with search results like http://faq.metafilter.com/130/How-does-advertising-on-S*****W**h******-work? Is that us that google encountered, or SWH? [apologies if there's an obvious answer to this, I'm not really well versed on how URL's relate to real live servers or what google actually does behind the scenes with a query].
posted by a box and a stick and a string and a bear at 9:44 AM on July 20, 2015
I notice that this is running a javascript tracking script that is associated with malware (tongji.js, google link) so beyond being pretty shitty it may not be safe to load. I'm not up on malware scanners but it did trip one random one that I tried, and it appears that the site hosting that js is blacklisted by various parties; at the least these probably aren't people you want tracking you.
(Maybe this is an angle that would help get rid of it?)
posted by advil at 9:47 AM on July 20, 2015
@a box and blah blah - they just dumbly replaced "metafilter" with "securewebhosting". because of the way that metafilter works, that part is ignored. see my comment above and pb's reply.
so http://faq.metafilter.com/237/MetaFilter-and-copyright became http://faq.metafilter.com/237/SecureWebhosting-and-copyright
that link then exists in their weird pages, and google indexed it.
posted by andrewcooke at 9:50 AM on July 20, 2015
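[Editor's sketch] The mechanism described above can be illustrated in a few lines of Python. This is only a guess at the shape of the logic: `FAQ_TITLES`, `lookup_faq` and `scraper_rewrite` are hypothetical names, not MetaFilter's or the scraper's actual code. The only things taken from the thread are that the numeric ID selects the FAQ entry, that the title part of the URL was ignored, and that the copy replaced the site name wholesale.

```python
import re

# Hypothetical stand-in for the FAQ table: numeric ID -> entry title.
FAQ_TITLES = {237: "MetaFilter and copyright"}

# Path shape assumed from the URLs quoted in the thread: /<id>/<title-stub>
FAQ_PATH = re.compile(r"^/(\d+)/([^/]*)$")


def lookup_faq(path):
    """Return the FAQ title for a path, looking only at the numeric ID."""
    match = FAQ_PATH.match(path)
    if match is None:
        return None
    return FAQ_TITLES.get(int(match.group(1)))


def scraper_rewrite(text):
    """Mimic a blanket, case-insensitive search-and-replace of the site name."""
    return re.sub("metafilter", "SecureWebhosting", text, flags=re.IGNORECASE)


original = "/237/MetaFilter-and-copyright"
rewritten = scraper_rewrite(original)

# Both paths resolve to the same entry because the stub is never checked.
print(rewritten, "->", lookup_faq(rewritten))
```

Because only the ID is consulted, any stub at all (including the joke one earlier in the thread) resolves to the same page, which is why the rewritten link still worked and could be indexed.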
Ahh, that makes sense, thanks. It didn't cross my mind that google would treat those wonky links as the real deal.
posted by a box and a stick and a string and a bear at 9:54 AM on July 20, 2015
But also as pb mentioned, the rest of the site does not allow that tomfoolery any more. If you change the title in the URL of a MetaTalk page, for instance, it sends a 301 (Moved Permanently) redirect response to move you to the correct URL. Google respects the 301 and wouldn't index the "wrong" URL then.
posted by smackfu at 10:06 AM on July 20, 2015
We're now enforcing our link stub for FAQ URLs. So the odd "SecureWebhosting-and-copyright" essentially doesn't exist on our server anymore. So the next time Google (or anyone) brings up the bogus URL they'll be redirected to the correct URL.
posted by pb (staff) at 10:06 AM on July 20, 2015 [1 favorite]
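[Editor's sketch] Enforcing the link stub, as described in the last two comments, might look roughly like the Python below. Everything here is an assumption made for illustration: `FAQ_TITLES`, `make_stub` and `respond` are hypothetical names, and the stub format is inferred from the URLs quoted in the thread rather than from the site's real code.

```python
import re

# Hypothetical stand-in for the FAQ table: numeric ID -> entry title.
FAQ_TITLES = {237: "MetaFilter and copyright"}

# Path shape assumed from the URLs quoted in the thread: /<id>/<title-stub>
FAQ_PATH = re.compile(r"^/(\d+)/([^/]*)$")


def make_stub(title):
    """Turn a title into a URL stub: runs of non-alphanumerics become hyphens."""
    return re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-")


def respond(path):
    """Return (status, location) for a FAQ path.

    200 when the stub matches the canonical one, 301 plus the canonical
    path when the ID is valid but the stub is wrong, 404 otherwise.
    """
    match = FAQ_PATH.match(path)
    if match is None:
        return 404, None
    faq_id = int(match.group(1))
    title = FAQ_TITLES.get(faq_id)
    if title is None:
        return 404, None
    canonical = f"/{faq_id}/{make_stub(title)}"
    if path == canonical:
        return 200, None
    return 301, canonical


print(respond("/237/SecureWebhosting-and-copyright"))
```

With a check like this, a crawler that requests the bogus URL is sent to the canonical one, so only the canonical form should remain worth indexing.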
MetaTalk: Ancillary link juice
(I haven't been around much lately, do we still do this?)
posted by Plutor at 10:14 AM on July 20, 2015 [8 favorites]
It is an eldritch tradition, impervious to the ravages of time.
posted by cortex (staff) at 10:15 AM on July 20, 2015 [21 favorites]
So, it's weird and shitty. I checked in with Matt but he hadn't had much else in the way of process for dealing with this stuff in the past either. If folks have suggestions based on specific experience with situations like this, I'm happy to learn what else is available.
I do have some experience dealing with this kind of thing, to the extent that we repeatedly had people put up blatant copies in other jurisdictions of the relatively high-traffic site I worked on. Slightly different vibe, as there was a big e-commerce component and the content that got copied was technically CC-licensed, but basically the same kind of "let's pretend to be this site!" behavior only with the effort of actually wiring it up to a product catalog and using it to sell physical goods. I don't have a lot of great advice beyond measures you've already taken, other than that we usually just told them it wasn't cool and waited for time and Google search algorithm tuning to solve the problem. (I recognize this latter may not be a particularly appealing thing to say in reference to MeFi specifically, but.) In one case we politely pointed out some glaring bugs on their site and replaced the hotlinked graphics/js with kittens and unicorns and such, and eventually they got the hint and changed up their branding.
Weird spammy bullshit like this usually has a pretty short effective half-life, is what I'm getting at.
It is an eldritch tradition, impervious to the ravages of time.
I keep hoping that if I ignore it...
posted by brennen at 10:33 AM on July 20, 2015
Ancillary link juice
I can't wait for that to come out - I loved the first two!
posted by moonmilk at 10:39 AM on July 20, 2015 [21 favorites]
and waited for time and Google search algorithm tuning to solve the problem. (I recognize this latter may not be a particularly appealing thing to say in reference to MeFi specifically, but.)
Are Google up to this? They're advertising for an SEO Manager to try and get Google things higher on Google search results. Did Google lose their manual or forget how their algorithm works?
posted by Wordshore at 10:41 AM on July 20, 2015
Out of curiosity, did they also scrape profile pages? Is http://www.SecureWebHosting.org/user/7418 out there somewhere?
posted by maryr at 11:18 AM on July 20, 2015
MetaTalk: Ancillary link juice
(I haven't been around much lately, do we still do this?)
It is now SecureWebHostingTalk: Ancillary link juice.
posted by maryr at 11:19 AM on July 20, 2015 [1 favorite]
Out of curiosity, did they also scrape profile pages?
They did, which is one of the more assholish knock-on effects of the whole thing, because one thing they didn't scrape is our robots.txt file, which is what tells search bots and such not to index various things, including user pages.
posted by cortex (staff) at 11:22 AM on July 20, 2015 [7 favorites]
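[Editor's sketch] For anyone unfamiliar with robots.txt, here is roughly how a well-behaved crawler consults it, using Python's standard `urllib.robotparser`. The rules in `ROBOTS_TXT` are assumed for illustration (the real file is not quoted in this thread), the `/user/<id>` path shape is inferred from the profile URL mentioned above, and `may_crawl` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules, for illustration only: keep every bot out of profile pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /user/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url):
    """Return True if a bot that honors the rules above may fetch this URL."""
    return parser.can_fetch("*", url)


print(may_crawl("http://www.metafilter.com/user/7418"))
print(may_crawl("http://www.metafilter.com/12345/some-thread"))
```

The file is purely advisory and applies per host, so a copy of the profile pages served from another domain without that file carries no such request to search engines.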
It feels like a spooky parallel universe.
(Also I delinked the URL in the post because I don't want to give 'em even that amount of ancillary link juice.)
You might say that's some Ancillary Justice right there.
posted by SpacemanStix at 11:40 AM on July 20, 2015 [2 favorites]
*Ancillary link juice
I can't wait for that to come out - I loved the first two!*
MetaFilter, but with only female gender pronouns unless you explicitly specify a different one.
posted by Going To Maine at 11:40 AM on July 20, 2015 [3 favorites]
SecureWebHosting.org is one of the strangest sites I've seen in some time. I have no idea how these people got MetaFilter wedged into their HTML, or why.
posted by Going To Maine at 11:48 AM on July 20, 2015 [9 favorites]
just that they didn't do a remotely thorough job.
I mean, if you're going to scrape us, at least do it right and do it thorough! We have standards.
posted by nubs at 11:52 AM on July 20, 2015
I am thoroughly amused by the idea of pb with a very large mallet whacking a bunch of molebots.
posted by bedhead at 11:58 AM on July 20, 2015
I once went on an ancillary link juice diet, but it made me really bloated and gassy.
posted by exogenous at 1:18 PM on July 20, 2015
Metafilter: (I haven't been around much lately, do we still do this?)
posted by Pink Frost at 1:21 PM on July 20, 2015 [4 favorites]
They did, which is one of the more assholish knock-on effects of the whole thing, because one thing they didn't scrape is our robots.txt file, which is what tells search bots and such not to index various things, including user pages.
I do not in any way mean this as a strong negative comment, but this is probably a good reminder that robots.txt is a useful but very limited sort of abstraction. If it can be indexed, it probably will be, for better or worse.
posted by brennen at 1:21 PM on July 20, 2015
No, absolutely. We block indexing on profile pages because there's some general utility there, but we've also been super clear over the years in discussions of site privacy and anonymity that that is a permeable barrier and that securing publicly-displayed info isn't fundamentally doable. Fortunately most of the contexts where it would practically arise are human-mediated ones where people being decent makes it a moot point. Unfortunately, spammers and robotic jerks also exist.
posted by cortex (staff) at 1:23 PM on July 20, 2015 [2 favorites]
Get bended.
posted by maryr at 3:18 PM on July 20, 2015 [1 favorite]
I'm a skrull!
posted by vrakatar at 4:33 PM on July 20, 2015 [1 favorite]
MetaTalk: an eldritch tradition, impervious to the ravages of time.
posted by double block and bleed at 7:08 PM on July 20, 2015 [1 favorite]
Greg Nog: "I tried to log into securewebhosting, but it just brought me to the mefi login page. Which is kind of disappointing; I was kind of hoping I'd be able to comment there as some kind of Bizarro Nightmare Mefi, where all is permitted and chaos reigns."
It looks like they have moderators:
The moderators are: hypothalumus, contented_settler, zat, CrawfishSock, badnewsforthesane, coder bp, and very occasionally vacaimpecable.
posted by double block and bleed at 7:27 PM on July 20, 2015 [11 favorites]
Can we do anything to really screw with them, at least?
All new content is in Wingdings? Images turned back on, auto playing videos, etc? Every comment is the Treaty of Westphalia?
Maybe some terrible, terrible hacks that we just didn't catch get scraped and it knocks their whole site down.... Oops!
Maybe we could all just be really quiet for a few days and they'll forget about us?
posted by anotherpanacea at 3:13 AM on July 21, 2015
Can we do anything to really screw with them, at least?
A good FPP on the blue about SecureWebHosting.org, with many circuitous links, ought to do it.
posted by Going To Maine at 8:24 AM on July 21, 2015
I had someone threaten to report me to the Internet Police. We could try that.
posted by double block and bleed at 12:49 PM on July 21, 2015
Put out an APB for internet fraud detective squad, station number 9.
posted by Going To Maine at 2:14 PM on July 21, 2015 [2 favorites]
It could be fun to get their CloudFlare account cancelled:
(1) Host a copy of cloudflare.com at metafilter.com/12345/cloudflare. Maybe throw in a fake credit card form. It should be possible to feed URLs to their bot without anyone else seeing them.
(2) Wait for ScumCo to copy your CloudFlare copy.
(3) Report securewebhosting.com/12345/cloudflare to CloudFlare for abuse, i.e. running a fake version of CloudFlare.
If that doesn't work, well, there's all sorts of other things you could get them to host that might cause trouble with this or that patron ...
posted by jhc at 6:53 PM on July 22, 2015
"Basically it's some Russian site..."
If that doesn't work, well, there's all sorts of other things you could get them to host that might cause trouble with this or that patron ...
Mocked-up "Putin Porn", followed by a tip-off to the Kremlin, should do it.
posted by Wordshore at 7:16 PM on July 22, 2015
Yeah, I was bored in a line 45 minutes ago and searched "jade helm metafilter" to see if there were any threads around this particular right-wing trainwreck that might help me understand it, and got these results:
http://imgur.com/Mjelt4R
basically one legit result from MeFi and then pages and pages and pages of this spam, with "jade helm" and "nick cave" (I guess he's a popular search term since his son passed away?) actually added as keywords somehow.
I don't know how the Internet wizardry works, but it sure is annoying.
posted by Shepherd at 10:00 AM on July 24, 2015
Basically it's some Russian site that's just really thoroughly scraped mefi (specifically the blue, it seems like, no other subsites) to grab all the threads since day one and some of the ancillary/nav pages like the Archives, and then clumsily replaced "Metafilter" with their own site name. It's a weird mix of aggressive and half-assed, given that a bunch of the nav links and other stuff just end up pointing back to us anyway, and it's not totally clear to me what their goal is other than just scamming for search traffic to then do...something with, eventually.
I've been trying to dig into it since someone gave us a heads up about it last week, with not a whole lot to show for it unfortunately. The details there:
- I've got a copyright removal request in with Google, which is pending and if that is successful would delist it, which would be great but we'll see what happens.
- Their nameservice is with CloudFlare, so I put in an abuse report there as well, but got a quick automated reply that since they're not literally hosting they won't take action.
- The site's registration info is masked by privacyprotect.org, with whom I filed an abuse report to request actual contact info for the site, but I've gotten no response so far there. I'm doubtful that an email to the folks running the site would have much effect, but it'd be nice to at least mark that off the list.
- The apparent host is a Russian service provider with an incredibly doubtful abuse report process that involves writing a notarized letter including legal documentation of copyright and yadda yadda and then waiting 60 days.
So, it's weird and shitty. I checked in with Matt but he hadn't had much else in the way of process for dealing with this stuff in the past either. If folks have suggestions based on specific experience with situations like this, I'm happy to learn what else is available.
In the meantime, most of the scraping damage is done: we saw a conspicuous number of bot visits a while back that pb had to play pretty active whackamole with to keep from causing server performance problems (they kept hopping around), and that seems almost certainly to have been related to this since the domain in question was only re-registered a month ago. They do also seem to be doing ongoing scraping, which we're trying to investigate to see if we can identify more about where they're coming from and potentially disrupt that, but they're unlikely to rescrape most of the 150K+ threads that are older.
Based on what we've found in logs and such, it doesn't look like the scraping came from the same servers as the actual site the content's being hosted on; possibly/probably they've been doing that all via AWS instances or similar and then just moving stuff around after.
posted by cortex (staff) at 8:57 AM on July 20, 2015 [6 favorites]