filtering rss by tag September 2, 2010 4:17 PM

Restricting user RSS feeds to specific tags?

I'd like to pull my askmefi rss feed into my personal site, but I'm looking for a way to filter the questions for quasi-privacy. With LiveJournal I can restrict to specific tags like http://jldugger.livejournal.com/data/rss?tag=ubuntu.

What I was thinking is, I'd tag things I want on my front page 'sync' (or mark things I don't want 'nosync') and filter based on that. I can't get the XPath to work correctly, so I'm wondering if I can get a similar feature from Mefi instead and skip the local processing.
posted by pwnguin to Feature Requests at 4:17 PM (13 comments total)

Try Yahoo Pipes, it probably makes this possible, but otherwise, naw, this is way too niche of a feature.
posted by mathowie (staff) at 4:19 PM on September 2, 2010


Pipes appears to be blocked?
posted by pwnguin at 4:20 PM on September 2, 2010


Yeah, we do block Yahoo Pipes for tag pages because they were requesting them hundreds at a time with no way to rate-limit. But you could probably filter an RSS feed by tag with whichever development environment you're using.
posted by pb (staff) at 4:22 PM on September 2, 2010


Yea, I tried Pipes earlier; but the specific feed I want is blocked and Pipes dutifully complies.

User-agent: *
Disallow: /user.mefi/
Disallow: /username.mefi/
Disallow: /user/

So I guess it's time I start rehosting RSS. I really wish Pipes supported some kind of export format, because it's a damn handy design system.
posted by pwnguin at 4:26 PM on September 2, 2010
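
For anyone wondering why a well-behaved fetcher like Pipes gives up on those rules, here is a rough Python sketch using the standard library's robots.txt parser. The rules are copied from the comment above; the host name, the agent name, and the two feed paths are made up for illustration and are not confirmed URLs.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted in the comment above.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /user.mefi/",
    "Disallow: /username.mefi/",
    "Disallow: /user/",
]


def allowed(path, agent="ExampleFetcher"):
    """Return True if a crawler honoring these rules may fetch `path`.

    The host and the agent name are placeholders (assumptions).
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    # Both paths are hypothetical examples.
    for path in ("/tags/ubuntu/rss", "/user/12345/rss"):
        print(path, "allowed" if allowed(path) else "blocked")
```

Anything under a disallowed prefix is refused, so a per-user feed has to be fetched and rehosted by your own script rather than by a third-party service that honors robots.txt.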


Alright, well I think I've figured it out on my end. Easy enough, but caching systems make this stuff a pleasure to debug. For any curious coders, this is the crux of the filter:
xmlstarlet ed -d "//*/item/category[contains(text(),'nosync')]/.."
I love that xmlstarlet tool, it's like sed for XML.
posted by pwnguin at 5:12 PM on September 2, 2010


Here's an equivalent solution I was dicking around with in perl, in case anyone finds it useful:

perl -MXML::RSS -0777 -e 'my $r = XML::RSS->new(version => 2)->parse(<>); $r->{items} = [ grep { !grep(/nosync/i, @{$_->{category}}) } @{$r->{items}} ]; print $r->as_string();' <input >output
posted by Rhomboid at 5:21 PM on September 2, 2010
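
The same idea in Python, for anyone without xmlstarlet or XML::RSS handy. This is only a sketch: it assumes an RSS 2.0 feed where each item carries its tags in plain category elements, and the sample feed below is invented for illustration. Matching is a case-insensitive substring test, like the Perl version.

```python
import xml.etree.ElementTree as ET

# Invented sample feed (assumption: tags appear as <category> elements).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>example feed</title>
    <item><title>Keep me</title><category>ubuntu</category></item>
    <item><title>Hide me</title><category>rss</category><category>nosync</category></item>
  </channel>
</rss>"""


def drop_tagged_items(rss_text, tag="nosync"):
    """Return the feed with every item whose categories contain `tag` removed."""
    root = ET.fromstring(rss_text)
    needle = tag.lower()
    for channel in root.iter("channel"):
        # Copy the list first, since items are removed while looping.
        for item in list(channel.findall("item")):
            categories = [(c.text or "").lower() for c in item.findall("category")]
            if any(needle in c for c in categories):
                channel.remove(item)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(drop_tagged_items(SAMPLE_FEED))
```

If the real feed uses a namespaced element for tags instead of category, the findall call would need the namespaced name.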


While I'm always one for repurposing technology, doesn't your sync/nosync tagging strategy risk diluting those tags?

"nosync" probably won't show up legitimately, but I can think of a couple contexts where it might. "sync" on the other hand most certainly will be used as a legitimate tag.
posted by m@f at 7:34 PM on September 2, 2010


Tags are emergent behavior, so I don't see the problem. We mark things resolved even though it could be about the rug cleaner. FWIW, I'm preferring the nosync tag as it's more rare. It's trivial to change though if there's a legitimate concern for this.
posted by pwnguin at 10:09 PM on September 2, 2010


Good point on resolved, but at least it is meaningful to others - and relates to the post - via either of its definitions. Your tag is only meaningful to you.

If it is kosher to use tags in this fashion, why not go more specific and make a tag that has no possibility for overlap?

pwnguin_nosync_blog :)
posted by m@f at 10:32 PM on September 2, 2010


I really think you're overthinking this plate of beans. People don't really do a good job of tagging things in ways that might be helpful. Case in point: city, whose top related tag is New, followed by York.

I figure it's like the FCC interference rules. Anyone using tags has to accept whatever noise might interfere with normal operations.
posted by pwnguin at 12:32 AM on September 3, 2010


Why not set up a whitelist or blacklist of tags on your end? So if you know you don't want your posts about cities to appear on your blog, just filter by that tag.

We have removed completely unrelated tags from posts before, and I think we tend to use tags more than some other sites and view them as a community resource rather than an individual's resource. I'm not saying we would remove a nosync_blog tag, but I think it's a possibility, especially if it became a focus of complaints from the community. So just something to keep in mind as you're coding.
posted by pb (staff) at 8:58 AM on September 3, 2010
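
A whitelist or blacklist kept on your own end could look roughly like the Python sketch below. It assumes each post's tags have already been pulled out of the feed into a list; the function name and the rule that the blacklist wins over the whitelist are choices made here for illustration, not anything the site provides. Tags are compared whole and case-insensitively, so a blacklisted "nosync" does not catch an unrelated tag that merely contains it.

```python
def is_publishable(tags, whitelist=None, blacklist=frozenset()):
    """Decide whether a post with these tags should appear on the blog.

    - Any blacklisted tag hides the post.
    - If a whitelist is given, at least one whitelisted tag is required.
    """
    normalized = {t.strip().lower() for t in tags}
    if normalized & {b.lower() for b in blacklist}:
        return False
    if whitelist is None:
        return True
    return bool(normalized & {w.lower() for w in whitelist})


if __name__ == "__main__":
    # Example tag lists are invented for illustration.
    print(is_publishable(["ubuntu", "rss"], blacklist={"nosync"}))
    print(is_publishable(["rss", "NoSync"], blacklist={"nosync"}))
    print(is_publishable(["rss"], whitelist={"sync"}))
```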


pb: "Why not set up a whitelist or blacklist of tags on your end?"

The blacklist approach is the one I'm using, and I'm not sure there's a reliable set of tags I can put in a blacklist and be confident I'll never touch again. At which point I may as well blacklist specific URLs.

As far as tags being modded out, that's just a risk I'll have to take. If the community complains for some reason, I'll listen, but I'm having trouble imagining problems. Obviously this is a fine-grained privacy approach geared more towards reducing feedspam than protecting myself, and not one I imagine many people desire. But right now it's one post, and a tag with a post-set size of one.
posted by pwnguin at 11:26 AM on September 3, 2010


You could set up an RSS feed for each tag you want using feed43.com. I've used it in the past and have no complaints. It can be a bit tricky to set up, but the interface is well thought out.
posted by 3mendo at 10:56 AM on September 4, 2010

