I Want My MeFiMu! September 8, 2010 7:37 PM   Subscribe

A while back, I tried to set my FlashGot Firefox plug-in to download all of the files with the address music.metafilter.com/music/*.mp3, only to find that FlashGot doesn't like wildcards, only ranges (e.g. [a-z], [1...1000], etc.). But I really would like a quick, easy way to suck all of this music onto my iPod. Dear AskMeTa,

...can someone out there point me in the right direction: a different swarm-downloading program, perhaps? Or maybe I am doing it entirely wrong!

Back in February, when I first wanted to try this, there were 4200-4300 songs, estimated at 20GB. That number is now 4952 as of this posting. If it works, ultimately I might assemble a torrent or ten out of the results.

Alternatively, this is where I ask MeFi to do all the haaaard work, since I'm lazier than even programmers, so I challenge the programmers out there: if it's easier and quicker for you to scour the music folder yourself and whip up a torrent (or ten) than it is to explain to me how to do so, please announce here that you've done it. (I'd like to know how you did it anyway, too!)

I understand that this may concern some of the musicians past and present represented in the 4952 songs for various reasons. My take, obviously, is: the musicians posted their work on MeMu willingly, knowing the file would be available free and in .mp3 format via the posted webpage; this would just make it easier for me and the small sampling of other MeFites I've talked to about this idea, who would absolutely love to swarm-download or get a torrent of MeFiMu songs, and listen to all this friggin' talent!
posted by not_on_display to MetaFilter-Related at 7:37 PM (34 comments total) 2 users marked this as a favorite

I coulda sworn iTunes was sneakily downloading every MeFiMu song, since I had subscribed to the RSS feed as a 'podcast'. Setting it up to grab new ones won't solve the problem of downloading everything already posted, but it should set you up for the future.

(As to the issue at hand, I've heard tell that one particular moderator has downloaded every song posted to Music...)
posted by carsonb at 7:57 PM on September 8, 2010

I keep telling y'all, I'm not a mod. And I don't have every song.

Shit, you're not talking about me are you? I'm going to find a hole and take a nap now.
posted by theichibun at 8:08 PM on September 8, 2010

If Virtual Directory Browsing was enabled on just that one folder, you could use something like the Firefox Extension DownThemAll. But alas.
posted by deezil at 8:14 PM on September 8, 2010 [1 favorite]

This will download all of them:

for U in $(seq 8 4952); do curl -s http://music.metafilter.com/$U | perl -lne 'print $1 if m,"file"\,"(http://music\.[^"]+)",i' | wget -i -; done

Note that I don't necessarily endorse slamming the MeFi server like that... maybe add some 'sleep's in there.
posted by Rhomboid at 8:48 PM on September 8, 2010

Yeah, something nicer to the server would be better. Thanks though!
posted by not_on_display at 9:26 PM on September 8, 2010

`wget -r -l 2 -A '*.mp3' --no-parent -w 1 --random-wait` should work.
posted by mkb at 5:15 AM on September 9, 2010

Sorry, I hadn't tested that since I was on a train. Try wget -r -l 2 --no-parent -w 1 --random-wait http://music.metafilter.com -R '*.js' -X random,home
posted by mkb at 6:12 AM on September 9, 2010

OK, so here's where I reveal my ignorance: where do I type this command into?
posted by not_on_display at 6:20 AM on September 9, 2010

Plain wget won't work because there is no download link for non-logged-in users; that's why my version extracts the URL of the mp3 from the flash variables. I suppose you could supply the necessary cookies to wget to simulate being logged in but you'd have to extract them from your browser and that's kind of a pain.
posted by Rhomboid at 6:27 AM on September 9, 2010

BTW to slow mine down just change the end to "; sleep 30; done" or however many seconds you want to pause between tracks.

As to where to type them in, a command prompt. If you are running Windows you'll need to install the necessary tools from Cygwin (bash, perl, wget, curl.) If you're on a Mac or *nix you probably already have them installed.
posted by Rhomboid at 6:31 AM on September 9, 2010

Oh, I didn't even think of that. I think not_on_display is a Mac user, but OS X does not include wget anymore. You can do this with only curl (with the seq removed for bash on OS X 10.5):

for U in {8..4952}; do curl -s http://music.metafilter.com/$U | perl -lne 'print $1 if m,"file"\,"(http://music\.[^"]+)",i' | xargs curl -O ; done

You paste this into a Terminal window. Terminal lives in /Applications/Utilities. (Or you can paste it into a text file, give it an extension of .command, and double-click on it)
posted by mkb at 7:02 AM on September 9, 2010 [1 favorite]

Thanks. I pasted into terminal, now it seems to be downloading stuff. Here's what it says/does, from the first entry:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3611k  100 3611k    0     0   341k      0  0:00:10  0:00:10 --:--:--  345k

...So, it's counting up from 0 to 100% in the first column, and then it starts the next one, new row. It's doing something!

My next question: I can't tell where on my computer (yes, a Mac) these files are landing. What are likely places to check?
posted by not_on_display at 9:51 AM on September 9, 2010

...cancel that... it's working; I've found the folder.


OMG, thanks mkb.
where's the best answer button?

I'll post here if I decide to create a torrent. Holy schlamoley this is great.
posted by not_on_display at 9:55 AM on September 9, 2010

Don't thank me! Thank Rhomboid!
posted by mkb at 10:33 AM on September 9, 2010

Thanks, Rhomboid, too! (d'oh!)
posted by not_on_display at 10:46 AM on September 9, 2010

I like this idea, but what I'd really like to see is a monthly torrent, collecting only the songs shared during a single month. I know creating a .torrent file isn't a huge deal (there are libraries), so it shouldn't be difficult to automate. The only concern is having a full-time seed to make it come together.
posted by seanmpuckett at 10:54 AM on September 9, 2010

To answer your original question:

download all of the files with the address music.metafilter.com/music/*.mp3

In general, there is no way to do this. Web servers simply don't provide a standard way to say "give me all the URLs that match this pattern". If the resources are being dynamically generated, that list might not even be finite.

As deezil alluded to, if the server is specifically configured to allow it, going to http://music.metafilter.com/music/ might give you a formatted HTML page with links to all the files in that directory. (If, that is, /music/ is actually backed by a single directory on the filesystem.) But the vast majority of servers aren't set up that way. Hence, the need to crawl the individual song pages and extract the download URLs.
posted by teraflop at 12:13 PM on September 9, 2010
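For what it's worth, if such an index page did exist, harvesting it would be trivial, since it's just HTML full of href links. A sketch against a fabricated Apache-style listing (the file and link names are invented):

```shell
# a fabricated fragment of what a directory index page looks like
cat > index.html <<'EOF'
<a href="song-one.mp3">song-one.mp3</a>
<a href="song-two.mp3">song-two.mp3</a>
<a href="README.txt">README.txt</a>
EOF

# keep only the hrefs ending in .mp3
grep -oE 'href="[^"]+\.mp3"' index.html | sed 's/^href="//; s/"$//' > mp3s.txt
cat mp3s.txt
```

This is essentially all that tools like DownThemAll or `wget -r -A '*.mp3'` do under the hood, which is why they need the listing (or in-page links) to exist in the first place.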

For the sake of the archives, the 8 and 4952 are the start and end post ID numbers, so if you just wanted to grab say the last 100 tracks you could change those accordingly. And if anyone in the future wants to give it a try, the 4952 will need incrementing -- that was just what happened to be the highest track ID at the time of writing.
posted by Rhomboid at 12:24 PM on September 9, 2010
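So, for instance, grabbing just the last 100 tracks at the time of writing would mean changing the range to 4853 through 4952. A quick sanity check on the range arithmetic (seq is inclusive on both ends, so both IDs are fetched):

```shell
# seq is inclusive, so 4853..4952 covers exactly 100 page IDs
seq 4853 4952 > ids.txt
head -n 1 ids.txt
wc -l < ids.txt
```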

(oh, heh, it's up to 4954 by now anyway.)
posted by Rhomboid at 12:28 PM on September 9, 2010

For the sake of the archives, I'd use HTTrack for this task. It handles all sorts of dynamic generation, has configurable download limits (time-, rate-, and GB-based), can be set to grab only specific files, and handles sites requiring login. There are installers for most popular operating systems.
posted by Mitheral at 12:59 PM on September 9, 2010

That won't work for the same reason that wget won't -- the mp3 URLs are not links anywhere on the page.
posted by Rhomboid at 1:37 PM on September 9, 2010

Links to the mp3 file (assuming I don't have an extension creating them automagically) appear next to the flash player on the song's comment page. HTTrack can be set just to get those files (and to look no further than one link down there by not grabbing the entirety of Metafilter). I imagine wget could too but I'm much better at clicking and typing things into dialogue boxes than I am at writing regular expressions.
posted by Mitheral at 2:12 PM on September 9, 2010

Again, those links are only there if you're logged in. The page that wget or HTTrack will retrieve will not have them, which is why it's necessary to extract them from the flash variables.
posted by Rhomboid at 11:33 PM on September 9, 2010

HTTrack allows you to capture sites that require login. I use it for that purpose all the time. In the case of metafilter which is using cookie based authentication you can just copy your browser cookie from your browser over to the project folder. But it'll also handle form and HTTP style authentication.
posted by Mitheral at 2:42 AM on September 10, 2010

I have 'em all now, approx 4958 songs, 20.2 GB. I'll announce a torrent when I have time to make the torrent.

Thanks again to youse!
posted by not_on_display at 10:30 PM on September 10, 2010

seanmpuckett: I like this idea, but what I'd really like to see is a monthly torrent, collecting only the songs shared during a single month.

Well, this won't help not_on_display, but I made this bash script, saved as grab-month.sh, to fetch the files & put them in folders for years & months, on Ubuntu:

# downloads tracks from music.metafilter.com for a specified month.
# example: ./grab-month.sh 2006 6

export year=$1
export month=$2

if ! [ "$2" ]; then
  echo "Please specify year and month of music to grab."
  exit 1
fi

echo "Fetching month and year = $month / $year"
mkdir -p "$year/$month"
cd "$year/$month" || \
  { echo "Can't make or enter directory $year/$month"; exit 1; }

curl -s http://music.metafilter.com/archived.mefi/$month/01/$year/ | \
  perl -lne 'print $1 if m,"file"\,"(http://music\.[^"]+)",i' | \
  wget --limit-rate=100k -c -i -
# to use curl instead, uncomment below & put it in the above line's place
#  xargs -L 1 -t curl -C - --limit-rate 100k -O
As an example, use "./grab-month.sh 2006 06" to grab June 2006 songs. I prefer wget because it preserves datestamps on files. It might be worth noting that this loads a few dozen monthly pages instead of every single song page. This will skip deleted songs, if they exist. I've chosen parameters for wget & curl & xargs to be a bit more verbose, to limit bandwidth to 100k/sec, & to resume downloads.

To get everything so far:
for year in {2006..2010}; do for month in {1..12}; do echo Attempting year $year, month $month.; ./grab-month.sh $year $month; done; done
This makes 2006/6 instead of 2006/06, but that's simple to move into place later.
posted by Pronoiac at 1:40 PM on September 18, 2010

Things I've noticed:
* curl likes to encode spaces as %20, so it saves "The%20Allusions%20%2D%20Color%20of%20Love.mp3" instead of "The Allusions - Color of Love.mp3" - this is another reason I prefer wget.
* Almost 5k files in one directory might be slow to access on some filesystems, so splitting up (by year & month) is likely a really good idea.
* The filenames are often cryptic. Automatically renaming the files to match the id3 tags, like "artist - title.mp3", should help with that. On the other hand, favoriting & adding tracks to playlists might be simpler if they're left as is. Maybe a Google Doc spreadsheet would help with this.
* This is big enough that it might take a couple of weeks for me to upload by myself. I think I see a way to use par2 files to sync up differently organized & named attempts. Or rsync & tar files. Hm.
* I'm bored & off to get a snack.
posted by Pronoiac at 3:00 PM on September 18, 2010

Hey, are people still interested in a torrent?
posted by Pronoiac at 12:10 PM on October 1, 2010

posted by seanmpuckett at 3:05 PM on October 1, 2010

I've pinged not_on_display, & I'll work on it this weekend.
posted by Pronoiac at 4:42 PM on October 1, 2010

I'd be interested in a torrent, but 20 GB is a lot for me to suck down at once (and would be hard to update going forward). How about splitting it up into multiple torrents, either by year or by song number? By song number, 500 or 1,000 at a time would be reasonable and would mean updates could be pushed semi-regularly.
posted by Mitheral at 6:22 PM on October 1, 2010

Mitheral, your BitTorrent client should let you selectively download files or folders from a torrent.

About frequency, I was thinking, make one big torrent, & then newer, smaller torrents that cover, for example, October, or the remainder of 2010. Rolling monthly, quarterly, then yearly collections are simpler to keep track of than going by song count.

Of course, this might be thinking too far ahead - this might only ever be used by a handful of people, who decide that mailing DVD-Rs around is faster & simpler. Who knows?
posted by Pronoiac at 10:33 AM on October 2, 2010

I put together a spreadsheet of the filenames & the id3 tags for artist & track titles, using a script or two. This could be helpful to figure out whose song you like on your iPod.

Surely someone's done this before: given an md5sum checksum file & a bunch of files with alternate names & directories, rename & move the files to match the checksum file. What's the elegant solution for this? Given a checksum file for the alternate structure, I could generate a script file to move everything around, but there should be a simpler, more general method - already available software that does this.

Also, I want to bounce my post off a couple of people.
posted by Pronoiac at 1:20 PM on October 4, 2010
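I don't know of a ready-made tool for the rename-to-match-a-checksum-file problem either, but the logic is small enough to sketch in shell. This assumes GNU md5sum and a reference file in standard md5sum format (hash, two spaces, desired path); all filenames here are hypothetical, and the first few lines just set up a demo file to operate on:

```shell
#!/bin/sh
# Demo setup: one misnamed file, plus a reference file listing its desired path.
mkdir -p demo
printf 'pretend mp3 data\n' > demo/track4952.mp3
md5sum demo/track4952.mp3 | sed 's/  .*/  demo\/2010\/09\/artist - title.mp3/' > reference.md5

# checksum everything under demo/ (reference.md5 and local.md5 live outside it)
find demo -type f -exec md5sum {} + > local.md5

# for each local file, look up where a file with that hash is supposed to live,
# then create the target directory and move it there
while read -r hash path; do
    want=$(grep "^$hash " reference.md5 | head -n 1 | sed 's/^[0-9a-f]*  *//')
    if [ -n "$want" ] && [ "$path" != "$want" ]; then
        mkdir -p "$(dirname "$want")"
        mv "$path" "$want"
    fi
done < local.md5
```

Files whose hash isn't in the reference are left alone, so it's safe to run over a directory that's a superset of the reference listing.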

This thread is closing tonight, & I'm dropping a note here to mention that we haven't forgotten the torrent, & we're working on it right now. We're syncing up as much as we can so we can have two people uploading as we begin. Keep an eye on the torrent tag in the next few days.
posted by Pronoiac at 5:40 PM on October 8, 2010
