Does someone have a script or something? February 28, 2012 8:06 AM

I would like to take the links from the text file of my exported comments (found in my edit profile page) and turn them into a bookmark file that I can import. I would like to avoid doing that manually. Does someone know of a relatively easy way to accomplish this?

This is basically what the information in the file looks like:

2012-02-27 09:44:28.203
http://ask.metafilter.com/209185/How-to-get-business-search-results-to-change-contact-info-in-one-or-two-fell-swoops#3017384
[comment text]
-----
2012-02-27 09:23:42.123
http://ask.metafilter.com/209200/Cats-and-Carpet#3017347
[comment text]

posted by Kimberly to MetaFilter-Related at 8:06 AM (33 comments total) 3 users marked this as a favorite

We don't offer a way to do this, but it should be possible to script something up. Just to add a bit more info: Kimberly would like to take the proprietary MetaFilter comments export file and translate it into the Netscape Bookmark File Format. Services like Pinboard and Delicious use the bookmarks file format for importing links.
posted by pb (staff) at 8:18 AM on February 28, 2012 [2 favorites]

Man, I would love to be able to do that too.
posted by zarq at 9:08 AM on February 28, 2012

What a weirdly difficult thing this seems to be to do. I can think of hacky ways to do it with Notepad++, though.
posted by koeselitz at 9:24 AM on February 28, 2012

Right? If all I can get out of this is an automated way to pull the links out of the rest of the content or have everything defined as a separate field, that would be better than nothing.
posted by Kimberly at 9:29 AM on February 28, 2012

I think I'm missing something, but do you want a link to the thread or comment, and if the comment: all at the same level, or nested in sub-folders by thread?
posted by BrotherCaine at 9:34 AM on February 28, 2012

Or are you looking for links embedded inside the comments?
posted by BrotherCaine at 9:35 AM on February 28, 2012

The links to the comments are in there (and that's what I want). Currently they are all at the same level which is fine, but sub-folders by thread would be awesome.
posted by Kimberly at 9:36 AM on February 28, 2012

So in other words, there's this text file that has links to all the comments I've made as well as the comments themselves. That's nifty, but I want to be able to export the links to the comments I've made to places (delicious is a good example) and in order to do that I need a bookmark file. I don't really need the text of the comments per se.
posted by Kimberly at 9:38 AM on February 28, 2012

If all I can get out of this is an automated way to pull the links out of the rest of the content or have everything defined as a separate field, that would be better than nothing.

That's fairly easy. If you're using a Mac you can do it from the Terminal like this:

grep -e '^http://.*' my-mefi-comments.txt

That will give you a list of all your comment URLs. But to create your bookmark file I think you'll also want the date/time of the comment and a brief excerpt of the comment with the HTML stripped. That's all possible, but it'll take someone some time to put together.
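
For anyone without grep handy, here is a rough Python sketch of the same idea. It is not a finished tool: the function names and the SAMPLE text are made up for illustration, and it assumes the export layout shown in the question (timestamp line, then URL line). The second function also pairs each URL with the timestamp on the line before it, which skips any comment text that merely happens to start with http://.

```python
import re

# Sample text in the layout from the question: timestamp, URL, comment, "-----".
SAMPLE = (
    "2012-02-27 09:44:28.203\n"
    "http://ask.metafilter.com/209185/How-to-get-business-search-results-to-change-contact-info-in-one-or-two-fell-swoops#3017384\n"
    "[comment text]\n"
    "-----\n"
    "2012-02-27 09:23:42.123\n"
    "http://ask.metafilter.com/209200/Cats-and-Carpet#3017347\n"
    "[comment text]\n"
)

# Matches a timestamp line such as "2012-02-27 09:44:28.203".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+$")


def comment_urls(text):
    """Return every line that starts with http://, like grep '^http://'."""
    return [line for line in text.splitlines() if line.startswith("http://")]


def dated_urls(text):
    """Return (timestamp, url) pairs where a URL line follows a timestamp line."""
    lines = text.splitlines()  # splitlines() copes with both LF and CRLF
    pairs = []
    for previous, current in zip(lines, lines[1:]):
        if TIMESTAMP.match(previous) and current.startswith("http://"):
            pairs.append((previous, current))
    return pairs


for stamp, url in dated_urls(SAMPLE):
    print(stamp, url)
```

The excerpt-with-HTML-stripped part is not covered here.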
posted by pb (staff) at 10:12 AM on February 28, 2012

I just threw together a python script that is close to what pb's grep line does, except I had to add an extra filter since I had a comment a while back that includes a list of links. The relevant portion is:



with open(filename, 'r') as f:
    links = [line for line in f if line.startswith('http://') and 'metafilter.com' in line]

posted by mysterpigg at 10:22 AM on February 28, 2012

Out of curiosity, is there also a command we can add to grep that would allow us to filter out links which have metafilter.com in them? It just occurred to me that thanks to the quote script, most of my comments will begin like this:

pb: "That's fairly easy. If you're using a Mac you can do it from the Terminal like this:"
posted by zarq at 10:23 AM on February 28, 2012

The grep command I posted will grab any line that starts with http:// in the file. And with the way the comments export file is formatted, it's going to be a metafilter.com link every time. This is for extracting URLs of comments. Your quoting style there doesn't start with http://, it starts with <a href so it wouldn't be included in the list of links.

Trying to find any links contained within your comments is a separate task.
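
As a sketch of what that separate task might look like: assuming the comment text in the export is HTML with ordinary <a href="..."> tags, Python's standard html.parser can collect the link targets. The class and function names, and the example.com URLs, are invented for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_in_comment(comment_html):
    """Return the URLs linked from one comment's HTML, in document order."""
    collector = HrefCollector()
    collector.feed(comment_html)
    collector.close()
    return collector.hrefs


# Hypothetical comment text, not taken from a real export.
example = 'See <a href="http://example.com/one">this</a> and <a href="http://example.com/two">that</a>.'
print(links_in_comment(example))
```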
posted by pb (staff) at 10:35 AM on February 28, 2012

Kimberly: “If all I can get out of this is an automated way to pull the links out of the rest of the content or have everything defined as a separate field, that would be better than nothing.”

pb: “That's fairly easy. If you're using a Mac you can do it from the Terminal like this...”

Or, if you don't want to do anything on a command-line, most good text editors (like the aforementioned Notepad++) have a built-in "Sort" function. So you could just sort all lines, and the ones that start with http:// will all be in one chunk. Delete the rest, and there's your list.
posted by koeselitz at 10:41 AM on February 28, 2012

I just wrote a quick python script that should drop everything into a file with the (I think) correct Netscape bookmark file formatting based on pb's link. Using RegEx to find the URLs is probably overkill, but it should work and I wanted an excuse to use them.

Run it like this:

python [script filename].py [your comments file].txt [your bookmarks file].txt



import re, time, datetime
from sys import argv

script, readfile, writefile = argv

input_file = open(readfile)
target = open(writefile, 'w')

# comment URLs sit on their own line; the dots are escaped so they only match literal dots
links = re.findall(r'\nhttp://[a-zA-Z]+\.metafilter\.com/[0-9]+/[a-zA-Z0-9-]+#[0-9]{6,7}', input_file.read())

target.write("""<!DOCTYPE NETSCAPE-Bookmark-file-1>
	<!--This is an automatically generated file.
	It will be read and overwritten.
	Do Not Edit! -->
	<Title>Bookmarks</Title>
	<H1>Bookmarks</H1>
	<DL>
""")

for item in links:
	trimmed_item = item.lstrip('\n')
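	# the date attributes are Unix timestamps in whole seconds, and attributes are separated by spaces, not commas
	date = str(int(time.time()))
	line = '\t<DT><A HREF="'+trimmed_item+'" ADD_DATE="'+date+'" LAST_VISIT="'+date+'" LAST_MODIFIED="'+date+'">'+trimmed_item+'</A></DT>\n'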
	target.write(line)

posted by The Michael The at 10:49 AM on February 28, 2012

Ah, thanks, pb.
posted by zarq at 10:49 AM on February 28, 2012

Actually, add this to the end, indented: </DL>
posted by The Michael The at 10:49 AM on February 28, 2012

Okay, one more, here's the final script; it worked for importing into Firefox 10:

import re, time, datetime
from sys import argv

script, readfile, writefile = argv

input_file = open(readfile)
target = open(writefile, 'w')

# comment URLs sit on their own line; the dots are escaped so they only match literal dots
links = re.findall(r'\nhttp://[a-zA-Z]+\.metafilter\.com/[0-9]+/[a-zA-Z0-9-]+#[0-9]{6,7}', input_file.read())

target.write("""<!DOCTYPE NETSCAPE-Bookmark-file-1>
	<!--This is an automatically generated file.
	It will be read and overwritten.
	Do Not Edit! -->
	<Title>Bookmarks</Title>
	<H1>Bookmarks</H1>
	<DL>
""")

for item in links:
	trimmed_item = item.lstrip('\n')
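	# the date attributes are Unix timestamps in whole seconds, and attributes are separated by spaces, not commas
	date = str(int(time.time()))
	line = '\t<DT><A HREF="'+trimmed_item+'" ADD_DATE="'+date+'" LAST_VISIT="'+date+'" LAST_MODIFIED="'+date+'">'+trimmed_item+'</A></DT>\n'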
	target.write(line)

target.write("\t</DL>")

input_file.close()
target.close()

Run it like above, just make sure to save it into a .html file or change the extension before importing.
posted by The Michael The at 11:02 AM on February 28, 2012 [1 favorite]

Awesome! Thank you so much The Michael The.

So let's pretend I've never run a python script in my life and need some direction on how to make that go. What would be a good resource so I can educate myself?

(I have some experience with programming for the web including ColdFusion so I'm not a complete novice and can follow directions if that matters.)
posted by Kimberly at 11:02 AM on February 28, 2012

If you're on a Mac, you already have python installed.

Now, open a text editor, paste the code in, and save it with the extension .py. Let's say "comment_script.py".

Use the text editor to make a blank file called "mefi_bookmark_file.html"

Make sure that file, comment_script.py, and my-mefi-comments.txt are all in the same directory. Open Terminal and navigate to that directory. Run the script like this:

python comment_script.py my-mefi-comments.txt mefi_bookmark_file.html

If you're on Windows, you'll have to install Python (I think?). Instructions. Beyond that, I don't have a machine in front of me to write out a step-by-step, so hopefully someone else can step in and help if necessary.
posted by The Michael The at 11:11 AM on February 28, 2012

Works for me, The Michael The, nice work. Here's how it works:

1.) Copy the code.
2.) Paste the code into a new text file, name it comments-bookmarks.py
3.) Move the file to the same directory as my-mefi-comments.txt
4.) Open Terminal if you're on a Mac.
5.) Go to your working directory.
6.) Type: python [script] [input file] [output file]
7.) So: python comments-bookmarks.py my-mefi-comments.txt my-mefi-comments.html

Now you can use my-mefi-comments.html at Delicious or Pinboard. If you're on Windows you might need to install Python.
posted by pb (staff) at 11:13 AM on February 28, 2012

On non-preview, what The Michael The said.
posted by pb (staff) at 11:14 AM on February 28, 2012

I'm on a pc.
posted by Kimberly at 11:15 AM on February 28, 2012

I will follow those instructions! Thanks guys!
posted by Kimberly at 11:16 AM on February 28, 2012

Looks like The Michael The answered the question in regards to getting the bookmark file. Since I was worried about links in comments, I redid mine in a more "state-machine" format, such that it tracks what line (datetime/url/comment text) you are on. Someone could theoretically combine the two if they felt that it was necessary:



def parse_mefi_comments(filename):

    from time import strptime

    f = open(filename, 'r')



    ST_DATE = 0

    ST_LINK = 1

    ST_TEXT = 2



    comments = []

    dtformat = '%Y-%m-%d %H:%M:%S.%f\n'

    comment_text = ''

    comment_time = comment_link = None

    state = ST_DATE



    for line in f.readlines():

        if state == ST_DATE:

            comment_time = strptime(line, dtformat)

            state = ST_LINK

        elif state == ST_LINK:

            comment_link = line.rstrip('\n')

            state = ST_TEXT

        elif state == ST_TEXT:

            if line == '-----\n':

                comments.append( (comment_time, comment_link, comment_text) )

                # reset

                comment_text = ''

                comment_time = comment_link = None

                state = ST_DATE

            else:

                comment_text += line

    f.close()

    return comments



if __name__=='__main__':

    comments = parse_mefi_comments('my-mefi-comments.txt')

    for comment in comments:

        print 'DATE:',comment[0]

        print 'LINK:',comment[1]

        print '----------------'

        print '%s' % comment[2] # format carriage returns

        print '----------------'

posted by mysterpigg at 11:17 AM on February 28, 2012

doh, forgot pre tag:

def parse_mefi_comments(filename):
    from time import strptime

    ST_DATE = 0
    ST_LINK = 1
    ST_TEXT = 2

    comments = []
    dtformat = '%Y-%m-%d %H:%M:%S.%f'
    comment_text = ''
    comment_time = comment_link = None
    state = ST_DATE

    with open(filename, 'r') as f:
        for line in f:
            # the export may use CRLF line endings, so strip both characters
            line = line.rstrip('\r\n')
            if state == ST_DATE:
                if not line:
                    continue  # ignore stray blank lines between records
                comment_time = strptime(line, dtformat)
                state = ST_LINK
            elif state == ST_LINK:
                comment_link = line
                state = ST_TEXT
            elif state == ST_TEXT:
                if line == '-----':
                    comments.append( (comment_time, comment_link, comment_text) )
                    # reset
                    comment_text = ''
                    comment_time = comment_link = None
                    state = ST_DATE
                else:
                    comment_text += line + '\n'

    # the last comment may not be followed by a '-----' separator
    if state == ST_TEXT:
        comments.append( (comment_time, comment_link, comment_text) )
    return comments

if __name__=='__main__':
    comments = parse_mefi_comments('my-mefi-comments.txt')
    for comment in comments:
        # comment[0] is a struct_time (a tuple), so wrap it before formatting
        print('DATE: %s' % (comment[0],))
        print('LINK: %s' % comment[1])
        print('----------------')
        print('%s' % comment[2]) # format carriage returns
        print('----------------')
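
If someone did want to combine the two, a rough sketch might look like the following. It takes (time, link, text) tuples shaped like the ones the parser returns and writes bookmark lines that use each comment's own timestamp rather than the time the script ran. The function name is made up, the timestamps are treated as UTC purely as an assumption (the thread doesn't say what timezone the export uses), and the exact header an importer expects may differ.

```python
import calendar
import html
import time


def bookmark_file(comments):
    """Build Netscape-style bookmark HTML from (struct_time, url, text) tuples."""
    parts = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>\n",
        "<TITLE>Bookmarks</TITLE>\n",
        "<H1>Bookmarks</H1>\n",
        "<DL><p>\n",
    ]
    for comment_time, link, _text in comments:
        # Assumption: the export timestamps are interpreted as UTC here.
        stamp = calendar.timegm(comment_time)
        url = html.escape(link, quote=True)
        parts.append('    <DT><A HREF="%s" ADD_DATE="%d">%s</A>\n' % (url, stamp, url))
    parts.append("</DL><p>\n")
    return "".join(parts)


# One tuple in the same shape as the parser's output, built by hand.
sample = [(
    time.strptime("2012-02-27 09:23:42.123", "%Y-%m-%d %H:%M:%S.%f"),
    "http://ask.metafilter.com/209200/Cats-and-Carpet#3017347",
    "[comment text]",
)]
print(bookmark_file(sample))
```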

posted by mysterpigg at 11:18 AM on February 28, 2012

Nice, mysterpigg! I thought about rewriting mine later to create a script that created tuples from the timestamps and URLs; I think I like your approach better.

Also, I got kudos from pb today. Best. MeFi day. Ever.
posted by The Michael The at 11:21 AM on February 28, 2012 [1 favorite]

Nice, mysterpigg! I thought about rewriting mine later to create a script that created tuples from the timestamps and URLs; I think I like your approach better.

Yeah, like I said, looks like you answered the question as is, I didn't quite get that far and figured I'd put up what I had since it was a slightly different approach.

Also, I got kudos from pb today. Best. MeFi day. Ever.

So close... :)
posted by mysterpigg at 1:47 PM on February 28, 2012 [1 favorite]

perl -e '$/="\r\n-----\r\n";while($r=<>){($d,$u)=split"\r\n",$r,3;$d=~s{\.\d+$}{};($l=$u)=~s{.*/}{};$l=~s{#\d+$}{};$l=~s{-}{ }g;push@r,[$d,$u,$l];}BEGIN{print"<!DOCTYPE NETSCAPE-Bookmark-file-1><!--This is an automatically generated file. It will be read and overwritten. Do Not Edit! --><title>Bookmarks</title><h1>Bookmarks</h1><dl>";}END{printf qq[<dt>%s: <a href="%s" add_date="%s" last_visit="%s" last_modified="%s">%s</a></dt>],@$_[0,1],(time)x3,$_->[2]for@r}' < my-mefi-comments.txt > bookmarks.html

Setting the INPUT_RECORD_SEPARATOR to "\r\n-----\r\n" makes it easy to read one 'chunk' at a time; each chunk can then be split on "\r\n" to get the date and URL from the first two lines. The date gets some cleanup (the fractional seconds are dropped), and the link text is the URL stripped of its path and anchor, with the '-'s converted back to spaces (not necessarily correct). Each link's info is pushed onto a list; the BEGIN block dumps a header and the END block generates the '<dt>' anchors for the links.
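
For the Python readers in the thread, roughly the same chunking can be sketched like this. It assumes CRLF line endings and the "-----" separator, just as the one-liner does; the function name is made up, and it only returns the parsed pieces rather than printing a bookmark file.

```python
import re


def parse_records(text):
    """Split an export into (date, url, label) tuples, one per comment."""
    records = []
    for chunk in text.split("\r\n-----\r\n"):
        if not chunk.strip():
            continue  # nothing after a trailing separator
        fields = chunk.split("\r\n", 2)  # date, url, and the rest (comment text)
        if len(fields) < 2:
            continue
        date, url = fields[0], fields[1]
        date = re.sub(r"\.\d+$", "", date)    # drop the fractional seconds
        label = url.rsplit("/", 1)[-1]        # keep what follows the last slash
        label = re.sub(r"#\d+$", "", label)   # drop the comment anchor
        label = label.replace("-", " ")       # dashes back to spaces
        records.append((date, url, label))
    return records


sample = (
    "2012-02-27 09:23:42.123\r\n"
    "http://ask.metafilter.com/209200/Cats-and-Carpet#3017347\r\n"
    "[comment text]\r\n"
)
for record in parse_records(sample):
    print(record)
```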

Maybe Metafilter can provide JSON formatted dumps in the future. :P
posted by zengargoyle at 2:06 PM on February 28, 2012

Somewhat related, can we assume any of:

data is UTF-8
is CRLF
IS NOT NULL (there will be non-empty date, url, comment)
ordered most recent to least recent

posted by zengargoyle at 2:30 PM on February 28, 2012

Yep, I can verify that those are all good assumptions.
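
Since scripts in this thread lean on those assumptions, a small sanity check could be sketched as below. This is only an illustration: the function name and the problem messages are made up, and it expects the raw bytes of the export file.

```python
from datetime import datetime


def check_export(raw):
    """Return a list of problems found in the raw bytes of an export file."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return ["not valid UTF-8: %s" % err]

    problems = []
    # Any LF left after removing CRLF pairs is a bare LF.
    if "\n" in text.replace("\r\n", ""):
        problems.append("found a bare LF; expected CRLF line endings")

    stamps = []
    for number, chunk in enumerate(text.split("\r\n-----\r\n"), 1):
        if not chunk.strip():
            continue
        fields = chunk.split("\r\n", 2)
        if len(fields) < 3 or not all(field.strip() for field in fields):
            problems.append("record %d is missing a date, URL, or comment" % number)
            continue
        try:
            stamps.append(datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S.%f"))
        except ValueError:
            problems.append("record %d has an unparseable date: %r" % (number, fields[0]))

    if stamps != sorted(stamps, reverse=True):
        problems.append("records are not ordered most recent first")
    return problems


good = (
    b"2012-02-27 09:44:28.203\r\nhttp://example.com/a#1\r\nfirst\r\n-----\r\n"
    b"2012-02-27 09:23:42.123\r\nhttp://example.com/b#2\r\nsecond\r\n"
)
print(check_export(good))
```

An empty list means nothing looked off. To check a real file, read it in binary mode, for example open(path, "rb").read(), so the line endings arrive untouched.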
posted by pb (staff) at 3:13 PM on February 28, 2012

I've niced it up a little, but wonder how strict the Netscape format has to be... ATM it's strictly following the Netscape Bookmark File Format pb posted earlier. With the sub-site matched for a nice label, and the comment text HTML stripped and nicely chopped to 70 chars or so to use with the time of the post as the link text.

(MetaFilter) Dont take it personally        # thread = container = h3
(2012-02-27 21:25:58) Robot Roomba pickers , a TED talk.   # link via comment
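
A Python sketch of that label-building step, for comparison: strip the HTML from the comment, chop it to about 70 characters, and prefix the time of the post. The names are made up and the sample comment is hypothetical; textwrap.shorten cuts at a word boundary, which may not match what the Perl version does.

```python
import textwrap
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keeps the text between tags and throws the tags away."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        self.pieces.append(data)


def excerpt(comment_html, width=70):
    """Return the comment as plain text, shortened to at most `width` characters."""
    parser = TextOnly()
    parser.feed(comment_html)
    parser.close()
    plain = "".join(parser.pieces)
    # shorten() collapses whitespace and truncates on a word boundary.
    return textwrap.shorten(plain, width=width, placeholder="...")


def link_text(timestamp, comment_html):
    """Build '(YYYY-MM-DD HH:MM:SS) excerpt' from an export timestamp and comment."""
    return "(%s) %s" % (timestamp.split(".")[0], excerpt(comment_html))


# Hypothetical comment HTML, not taken from a real export.
print(link_text("2012-02-27 21:25:58.000",
                'Robot <a href="http://example.com/">Roomba pickers</a>, a TED talk.'))
```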

I did a version with an <a> link in the container that pointed to the thread itself, and left the comment text outside of the <a> link in the list of shortcuts. But I fear that anything using this format for import might not like having links in headers and text outside of links or even in <dd> elements.

Has anybody tried feeding Delicious, et al. data that isn't strictly compliant?
posted by zengargoyle at 7:05 PM on February 28, 2012

A while back I wrote a Python script that converts the Metafilter comment export file into XML.

It is fairly rough but it does work, IIRC. You can choose at runtime whether to munge the HTML inside comments or wrap them in CDATA to preserve them as-is.

From XML, you can do whatever you want with them ... you could create a bookmarks file with some XSLT, or load them all in a database, etc. (The latter was my goal at one point but I'm not really sure why. Seemed like a good idea one evening, I guess.)
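
To give a flavor of the approach (this is not the script mentioned above, just a minimal sketch): Python's xml.etree.ElementTree can build one element per comment. The element and attribute names here are invented, and comment HTML is escaped as text rather than wrapped in CDATA, since ElementTree has no CDATA option.

```python
import xml.etree.ElementTree as ET


def export_to_xml(text):
    """Convert export text into an XML string with one <comment> per record."""
    root = ET.Element("comments")
    # Normalize CRLF to LF so either line-ending style splits the same way.
    for chunk in text.replace("\r\n", "\n").split("\n-----\n"):
        if not chunk.strip():
            continue
        fields = chunk.strip("\n").split("\n", 2)
        if len(fields) < 3:
            continue  # skip records without date, URL, and comment text
        date, url, body = fields
        comment = ET.SubElement(root, "comment", {"date": date, "url": url})
        comment.text = body  # ElementTree escapes <, >, and & on output
    return ET.tostring(root, encoding="unicode")


sample = (
    "2012-02-27 09:44:28.203\n"
    "http://ask.metafilter.com/209200/Cats-and-Carpet#3017347\n"
    "Some <b>bold</b> comment text\n"
    "-----\n"
    "2012-02-27 09:23:42.123\n"
    "http://ask.metafilter.com/209200/Cats-and-Carpet#3017300\n"
    "[comment text]\n"
)
print(export_to_xml(sample))
```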
posted by Kadin2048 at 7:10 PM on February 28, 2012

A slightly heavier Perl script with a few dependencies that can generate a slightly fancier version in addition to the bare Netscape format. Has per-thread folders, titles with spaces (not dashes), and the first 72 or so HTML stripped characters of the comment as the link text. The fancy version adds a 'AskMeFi:' like sub-site id to the thread title and a '(YYYY-MM-DD HH:MM:SS)' to the link text (configurable time format if you grok strftime formats). Depends on a few modules that may or may not need installing.

my-mefi-bookmarks (Gist).
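
For anyone who would rather stay in Python, the per-thread folder idea can be sketched roughly as follows. This is not a port of the Perl script: the function names are made up, the folder title is just the URL slug with dashes turned into spaces, and the nested <DL><p> layout is the bookmark-file convention as I understand it, so importers may be pickier.

```python
import html


def thread_title(thread_url):
    """Turn the last path segment of a thread URL into a readable title."""
    return thread_url.rsplit("/", 1)[-1].replace("-", " ")


def bookmarks_by_thread(entries):
    """Build bookmark HTML with one folder per thread from (url, text) pairs."""
    folders = {}  # thread URL -> list of (comment URL, link text), in first-seen order
    for url, text in entries:
        thread_url = url.split("#", 1)[0]
        folders.setdefault(thread_url, []).append((url, text))

    out = ["<!DOCTYPE NETSCAPE-Bookmark-file-1>", "<TITLE>Bookmarks</TITLE>",
           "<H1>Bookmarks</H1>", "<DL><p>"]
    for thread_url, links in folders.items():
        out.append("    <DT><H3>%s</H3>" % html.escape(thread_title(thread_url)))
        out.append("    <DL><p>")
        for url, text in links:
            out.append('        <DT><A HREF="%s">%s</A>'
                       % (html.escape(url, quote=True), html.escape(text)))
        out.append("    </DL><p>")
    out.append("</DL><p>")
    return "\n".join(out) + "\n"


# The link texts here are placeholders; the URLs come from the question's sample.
entries = [
    ("http://ask.metafilter.com/209200/Cats-and-Carpet#3017347", "first comment"),
    ("http://ask.metafilter.com/209200/Cats-and-Carpet#3017360", "second comment"),
    ("http://ask.metafilter.com/209185/How-to-get-business-search-results-to-change-contact-info-in-one-or-two-fell-swoops#3017384", "another thread"),
]
print(bookmarks_by_thread(entries))
```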
posted by zengargoyle at 5:02 AM on March 1, 2012 [1 favorite]
