filtering rss by tag September 2, 2010 4:17 PM
Restricting user RSS feeds to specific tags?
I'd like to pull my askmefi rss feed into my personal site, but I'm looking for a way to filter the questions for quasi-privacy. With LiveJournal I can restrict to specific tags like http://jldugger.livejournal.com/data/rss?tag=ubuntu.
What I was thinking is, I'd tag things I want on my front page 'sync' (or mark things I don't want 'nosync') and filter based on that. I can't get the XPath to work correctly, so I'm wondering if I can get a similar feature from Mefi instead and skip the local processing.
Yeah, we do block Yahoo Pipes for tag pages because they were requesting them hundreds at a time with no way to rate-limit. But you could probably filter an RSS feed by tag with whichever development environment you're using.
posted by pb (staff) at 4:22 PM on September 2, 2010
Yeah, I tried Pipes earlier, but the specific feed I want is blocked and Pipes dutifully complies:
User-agent: *
Disallow: /user.mefi/
Disallow: /username.mefi/
Disallow: /user/
So I guess it's time I start rehosting RSS. I really wish Pipes supported some kind of export format, because it's a damn handy design system.
posted by pwnguin at 4:26 PM on September 2, 2010
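For anyone wondering why Pipes refuses the user feed: the rules quoted above are robots.txt syntax, and a well-behaved fetcher checks them before requesting a URL. Here is a minimal Python sketch of that check using the standard library's urllib.robotparser. The agent name and the two example paths are made up for illustration and are not taken from the site.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the comment above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /user.mefi/
Disallow: /username.mefi/
Disallow: /user/
"""


def is_allowed(path: str, agent: str = "example-fetcher") -> bool:
    """Return True if the rules above let `agent` fetch `path`.

    `agent` is a hypothetical user-agent string; the rules apply to every
    agent because they are declared under `User-agent: *`.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths: one under a disallowed prefix, one not.
    for path in ("/user/12345/rss", "/tags/ubuntu"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

A self-hosted script is not forced to run this check the way Pipes is, which is presumably why rehosting works, but it is still polite to fetch infrequently and cache the result.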
Alright, well I think I've figured it out on my end. Easy enough, but caching systems make this stuff a pleasure to debug. For any curious coders, this is the crux of the filter:
xmlstarlet ed -d "//*/item/category[contains(text(),'nosync')]/.."
I love that xmlstarlet tool, it's like sed for XML.
posted by pwnguin at 5:12 PM on September 2, 2010
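A rough Python equivalent of the xmlstarlet one-liner above, using only the standard library. It assumes an RSS 2.0 layout where each <item> sits directly under <channel> and carries its tags as <category> elements, which is what the XPath above implies. The function name and the sample feed are invented for illustration.

```python
import xml.etree.ElementTree as ET


def drop_tagged_items(rss_xml: str, tag: str = "nosync") -> str:
    """Remove every <item> that has a <category> whose text contains `tag`.

    Like the XPath contains() test, this is a substring match, so
    'nosync' also matches a category such as 'nosync_blog'.
    """
    root = ET.fromstring(rss_xml)
    for channel in root.iter("channel"):
        # Copy the list first so removing items does not disturb iteration.
        for item in list(channel.findall("item")):
            categories = [(c.text or "") for c in item.findall("category")]
            if any(tag in text for text in categories):
                channel.remove(item)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up two-item feed: the first item should be dropped.
    sample = (
        "<rss version='2.0'><channel><title>demo</title>"
        "<item><title>private question</title>"
        "<category>nosync</category></item>"
        "<item><title>public question</title>"
        "<category>ubuntu</category></item>"
        "</channel></rss>"
    )
    print(drop_tagged_items(sample))
```

Passing the tag as a parameter makes it easy to switch to a more specific marker later without touching the filter itself.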
Here's an equivalent solution I was dicking around with in perl, in case anyone finds it useful:
perl -MXML::RSS -0777 -e 'my $r = XML::RSS->new(version => 2)->parse(<>); $r->{items} = [ grep { !grep(/nosync/i, @{$_->{category}}) } @{$r->{items}} ]; print $r->as_string();' <input >output
posted by Rhomboid at 5:21 PM on September 2, 2010
While I'm always one for repurposing technology, doesn't your sync/nosync tagging strategy risk diluting those tags?
"nosync" probably won't show up legitimately, but I can think of a couple contexts where it might. "sync" on the other hand most certainly will be used as a legitimate tag.
posted by m@f at 7:34 PM on September 2, 2010
Tags are emergent behavior, so I don't see the problem. We mark things resolved even though it could be about the rug cleaner. FWIW, I'm preferring the nosync tag as it's more rare. It's trivial to change though if there's a legitimate concern for this.
posted by pwnguin at 10:09 PM on September 2, 2010
Good point on resolved, but at least it is meaningful to others - and relates to the post - via either of its definitions. Your tag is only meaningful to you.
If it is kosher to use tags in this fashion, why not go more specific and make a tag that has no possibility for overlap?
pwnguin_nosync_blog :)
posted by m@f at 10:32 PM on September 2, 2010
I really think you're overthinking this plate of beans. People don't really do a good job of tagging things in ways that might be helpful. Case in point: city, whose top related tag is New, followed by York.
I figure it's like the FCC interference rules. Anyone using tags has to accept whatever noise might interfere with normal operations.
posted by pwnguin at 12:32 AM on September 3, 2010
Why not set up a whitelist or blacklist of tags on your end? So if you know you don't want your posts about cities to appear on your blog, just filter by that tag.
We have removed completely unrelated tags from posts before, and I think we tend to use tags more than some other sites and view them as a community resource rather than an individual's resource. I'm not saying we would remove a nosync_blog tag, but I think it's a possibility, especially if it became a focus of complaints from the community. So just something to keep in mind as you're coding.
posted by pb (staff) at 8:58 AM on September 3, 2010
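The whitelist/blacklist idea above could be sketched like this in Python. This is only an illustration: keep_item and its parameters are hypothetical names, and it assumes the tags for each item have already been pulled out of the feed as plain strings. It matches whole tags rather than substrings, so a blacklist entry of 'sync' would not catch 'async'.

```python
from typing import Iterable, Optional, Set


def keep_item(
    tags: Iterable[str],
    whitelist: Optional[Set[str]] = None,
    blacklist: Optional[Set[str]] = None,
) -> bool:
    """Decide whether an item with these tags should be republished.

    Both lists are expected to hold lowercase tag names. A blacklist hit
    always drops the item; if a whitelist is given, at least one of the
    item's tags must appear in it.
    """
    normalized = {t.strip().lower() for t in tags}
    if blacklist and normalized & blacklist:
        return False
    if whitelist is not None:
        return bool(normalized & whitelist)
    return True


if __name__ == "__main__":
    # Made-up tag sets for illustration.
    print(keep_item(["city", "NewYork"], blacklist={"city"}))
    print(keep_item(["sync", "rss"], whitelist={"sync"}))
    print(keep_item(["async"], blacklist={"sync"}))
```

Keeping the lists on your own side means nothing has to be added to the site's tags at all, at the cost of maintaining the lists by hand.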
pb: "Why not set up a whitelist or blacklist of tags on your end?"
The blacklist approach is the one I'm using, and I'm not sure there's a reliable set of tags I can put in a blacklist and be confident I'll never touch again. At which point I may as well blacklist specific URLs.
As far as tags being modded out, that's just a risk I'll have to take. If the community complains for some reason, I'll listen, but I'm having trouble imagining problems. Obviously this is a fine-grained privacy approach geared more towards reducing feedspam than protecting myself, and not one I imagine many people desire. But right now it's one post, and a tag with a post-set size of one.
posted by pwnguin at 11:26 AM on September 3, 2010
You could set up an RSS feed for each tag you want using feed43.com. I used it in the past and have no complaints. It can be a bit tricky to set up, but the interface is well thought out.
posted by 3mendo at 10:56 AM on September 4, 2010
posted by mathowie (staff) at 4:19 PM on September 2, 2010