Does someone have a script or something? February 28, 2012 8:06 AM
I would like to take the links from the text file of my exported comments (found in my edit profile page) and turn them into a bookmark file that I can import. I would like to avoid doing that manually. Does someone know of a relatively easy way to accomplish this?
This is basically what the information in the file looks like:
2012-02-27 09:44:28.203
http://ask.metafilter.com/209185/How-to-get-business-search-results-to-change-contact-info-in-one-or-two-fell-swoops#3017384
[comment text]
-----
2012-02-27 09:23:42.123
http://ask.metafilter.com/209200/Cats-and-Carpet#3017347
[comment text]
What a weirdly difficult thing this seems to be to do. I can think of hacky ways to do it with Notepad++, though.
posted by koeselitz at 9:24 AM on February 28, 2012
Right? If all I can get out of this is an automated way to pull the links out of the rest of the content or have everything defined as a separate field, that would be better than nothing.
posted by Kimberly at 9:29 AM on February 28, 2012
I think I'm missing something, but do you want a link to the thread or comment, and if the comment: all at the same level, or nested in sub-folders by thread?
posted by BrotherCaine at 9:34 AM on February 28, 2012
Or are you looking for links embedded inside the comments?
posted by BrotherCaine at 9:35 AM on February 28, 2012
The links to the comments are in there (and that's what I want). Currently they are all at the same level which is fine, but sub-folders by thread would be awesome.
posted by Kimberly at 9:36 AM on February 28, 2012
So in other words, there's this text file that has links to all the comments I've made as well as the comments themselves. That's nifty, but I want to be able to export the links to the comments I've made to places (delicious is a good example) and in order to do that I need a bookmark file. I don't really need the text of the comments per se.
posted by Kimberly at 9:38 AM on February 28, 2012
If all I can get out of this is an automated way to pull the links out of the rest of the content or have everything defined as a separate field, that would be better than nothing.
That's fairly easy. If you're using a Mac you can do it from the Terminal like this:
grep -e '^http://.*' my-mefi-comments.txt
That will give you a list of all your comment URLs. But to create your bookmark file I think you'll also want the date/time of the comment and a brief excerpt of the comment with the HTML stripped. That's all possible, but it'll take someone some time to put together.
posted by pb (staff) at 10:12 AM on February 28, 2012
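The "brief excerpt of the comment with the HTML stripped" that pb mentions could be produced along these lines. This is only a sketch using Python's standard html.parser; the `excerpt` function name and the 70-character default are assumptions made for illustration, not something taken from the export or from anyone's script in this thread.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def excerpt(comment_html, limit=70):
    """Strip tags, collapse whitespace, and cut to roughly `limit` characters."""
    parser = _TextOnly()
    parser.feed(comment_html)
    parser.close()  # flush any text still buffered by the parser
    text = " ".join("".join(parser.parts).split())
    if len(text) <= limit:
        return text
    # Cut at the last space before the limit so words are not split.
    cut = text.rfind(" ", 0, limit)
    return text[: cut if cut > 0 else limit] + "..."
```

Calling `excerpt(comment_text)` on the text of one comment would give a short plain-text label that could be used as the bookmark title.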
I just threw together a python script that is close to what pb's grep line does, except I had to add an extra filter since I had a comment a while back that includes a list of links. The relevant portion is:
with open(filename, 'r') as f:
    links = [line for line in f if line.startswith('http://') and 'metafilter.com' in line]
posted by mysterpigg at 10:22 AM on February 28, 2012
Out of curiosity, is there also a command we can add to grep that would allow us to filter out links which have metafilter.com in them? It just occurred to me that thanks to the quote script, most of my comments will begin like this:
pb: "hat's fairly easy. If you're using a Mac you can do it from the Terminal like this:"
posted by zarq at 10:23 AM on February 28, 2012
The grep command I posted will grab any line that starts with http:// in the file. And with the way the comments export file is formatted, it's going to be a metafilter.com link every time. This is for extracting URLs of comments. Your quoting style there doesn't start with http://, it starts with <a href so it wouldn't be included in the list of links.
Trying to find any links contained within your comments is a separate task.
posted by pb (staff) at 10:35 AM on February 28, 2012
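The distinction pb describes, between a URL that starts a line and a URL buried inside comment HTML, can be seen in a small Python sketch. The sample text is made up for illustration (the example.com link in particular is an assumption, not real export data).

```python
import re

# Made-up sample in the shape of the export: date line, comment URL line, comment HTML.
sample = (
    "2012-02-27 09:23:42.123\r\n"
    "http://ask.metafilter.com/209200/Cats-and-Carpet#3017347\r\n"
    'pb: "<a href="http://example.com/">quoted text</a>"\r\n'
)

# ^ with re.MULTILINE anchors at the start of every line, like grep's ^.
comment_urls = re.findall(r"^http://\S+", sample, flags=re.MULTILINE)

# Without the anchor, URLs inside the comment HTML are picked up as well.
all_urls = re.findall(r'http://[^\s"<>]+', sample)

print(comment_urls)
print(all_urls)
```

The anchored pattern returns only the comment URL, while the unanchored one also returns the link inside the quoted HTML.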
Kimberly: “If all I can get out of this is an automated way to pull the links out of the rest of the content or have everything defined as a separate field, that would be better than nothing.”
pb: “That's fairly easy. If you're using a Mac you can do it from the Terminal like this...”
Or, if you don't want to do anything on a command-line, most good text editors (like the aforementioned Notepad++) have a built-in "Sort" function. So you could just sort all lines, and the ones that start with http:// will all be in one chunk. Delete the rest, and there's your list.
posted by koeselitz at 10:41 AM on February 28, 2012
I just wrote a quick python script that should drop everything into a file with the (I think) correct Netscape bookmark file formatting based on pb's link. Using RegEx to find the URLs is probably overkill, but it should work and I wanted an excuse to use them.
Run it like this:
python [script filename].py [your comments file].txt [your bookmarks file].txt
import re, time
from sys import argv

script, readfile, writefile = argv

input_file = open(readfile)
target = open(writefile, 'w')

# Comment URLs sit on their own line in the export, so match from the preceding newline.
links = re.findall(r'\nhttp://[a-zA-Z]+\.metafilter\.com/[0-9]+/[a-zA-Z0-9-]+#[0-9]{6,7}', input_file.read())

target.write("""<!DOCTYPE NETSCAPE-Bookmark-file-1>
<!--This is an automatically generated file.
It will be read and overwritten.
Do Not Edit! -->
<Title>Bookmarks</Title>
<H1>Bookmarks</H1>
<DL>
""")

for item in links:
    trimmed_item = item.lstrip('\n')
    # Whole seconds since the epoch, used for all three date attributes.
    date = str(int(time.time()))
    line = '\t<DT><A HREF="'+trimmed_item+'" ADD_DATE="'+date+'" LAST_VISIT="'+date+'" LAST_MODIFIED="'+date+'">'+trimmed_item+'</A></DT>\n'
    target.write(line)
posted by The Michael The at 10:49 AM on February 28, 2012
Actually, add this to the end, indented:
</DL>
posted by The Michael The at 10:49 AM on February 28, 2012
Okay, one more, here's the final script; it worked for importing into Firefox 10:
import re, time
from sys import argv

script, readfile, writefile = argv

input_file = open(readfile)
target = open(writefile, 'w')

# Comment URLs sit on their own line in the export, so match from the preceding newline.
links = re.findall(r'\nhttp://[a-zA-Z]+\.metafilter\.com/[0-9]+/[a-zA-Z0-9-]+#[0-9]{6,7}', input_file.read())

target.write("""<!DOCTYPE NETSCAPE-Bookmark-file-1>
<!--This is an automatically generated file.
It will be read and overwritten.
Do Not Edit! -->
<Title>Bookmarks</Title>
<H1>Bookmarks</H1>
<DL>
""")

for item in links:
    trimmed_item = item.lstrip('\n')
    # Whole seconds since the epoch, used for all three date attributes.
    date = str(int(time.time()))
    line = '\t<DT><A HREF="'+trimmed_item+'" ADD_DATE="'+date+'" LAST_VISIT="'+date+'" LAST_MODIFIED="'+date+'">'+trimmed_item+'</A></DT>\n'
    target.write(line)

target.write("\t</DL>")
input_file.close()
target.close()
Run it like above, just make sure to save it into a .html file or change the extension before importing.
posted by The Michael The at 11:02 AM on February 28, 2012 [1 favorite]
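The script above stamps every bookmark with the time the script was run. If the comment's own timestamp were wanted in ADD_DATE instead, the date line from the export could be converted roughly like this. It is a sketch only: `to_add_date` is a hypothetical helper name, and treating the export times as UTC is an assumption, since the thread does not say which timezone the export uses.

```python
from datetime import datetime, timezone


def to_add_date(stamp):
    """Convert an export date line such as '2012-02-27 09:44:28.203' to whole Unix seconds."""
    parsed = datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S.%f")
    # Assumption: the export times are treated as UTC here.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


print(to_add_date("2012-02-27 09:44:28.203"))
```

The returned integer could then be passed through `str()` and used in place of `date` when building each bookmark line.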
Awesome! Thank you so much The Michael The.
So let's pretend I've never run a python script in my life and need some direction on how to make that go. What would be a good resource so I can educate myself?
(I have some experience with programming for the web including ColdFusion so I'm not a complete novice and can follow directions if that matters.)
posted by Kimberly at 11:02 AM on February 28, 2012
If you're on a Mac, you already have python installed.
Now, open a text editor, paste the code in, and save it with the extension .py. Let's say "comment_script.py".
Use the text editor to make a blank file called "mefi_bookmark_file.html"
Make sure that file, comment_script.py, and my-mefi-comments.txt are all in the same directory. Open Terminal and navigate to that directory. Run the script like this:
python comment_script.py my-mefi-comments.txt mefi_bookmark_file.html
If you're on Windows, you'll have to install Python (I think?). Instructions. Beyond that, I don't have a machine in front of me to write out a step-by-step, so hopefully someone else can step in and help if necessary.
posted by The Michael The at 11:11 AM on February 28, 2012
Works for me, The Michael The, nice work. Here's how it works:
1.) Copy the code.
2.) Paste the code into a new text file, name it comments-bookmarks.py
3.) Move the file to the same directory as my-mefi-comments.txt
4.) Open Terminal if you're on a Mac.
5.) Go to your working directory.
6.) Type: python [script] [input file] [output file]
7.) So: python comments-bookmarks.py my-mefi-comments.txt my-mefi-comments.html
Now you can use my-mefi-comments.html at Delicious or Pinboard. If you're on Windows you might need to install Python.
posted by pb (staff) at 11:13 AM on February 28, 2012
Looks like The Michael The answered the question in regards to getting the bookmark file. Since I was worried about links in comments, I redid mine in a more "state-machine" format, such that it tracks what line (datetime/url/comment text) you are on. Someone could theoretically combine the two if they felt that it was necessary:
def parse_mefi_comments(filename):
from time import strptime
f = open(filename, 'r')
ST_DATE = 0
ST_LINK = 1
ST_TEXT = 2
comments = []
dtformat = '%Y-%m-%d %H:%M:%S.%f\n'
comment_text = ''
comment_time = comment_link = None
state = ST_DATE
for line in f.readlines():
if state == ST_DATE:
comment_time = strptime(line, dtformat)
state = ST_LINK
elif state == ST_LINK:
comment_link = line.rstrip('\n')
state = ST_TEXT
elif state == ST_TEXT:
if line == '-----\n':
comments.append( (comment_time, comment_link, comment_text) )
# reset
comment_text = ''
comment_time = comment_link = None
state = ST_DATE
else:
comment_text += line
f.close()
return comments
if __name__=='__main__':
comments = parse_mefi_comments('my-mefi-comments.txt')
for comment in comments:
print 'DATE:',comment[0]
print 'LINK:',comment[1]
print '----------------'
print '%s' % comment[2] # format carriage returns
print '----------------'
posted by mysterpigg at 11:17 AM on February 28, 2012
doh, forgot pre tag:
def parse_mefi_comments(filename):
    from time import strptime

    # Parser states: which line of a comment record comes next.
    ST_DATE = 0
    ST_LINK = 1
    ST_TEXT = 2

    comments = []
    dtformat = '%Y-%m-%d %H:%M:%S.%f'
    comment_text = ''
    comment_time = comment_link = None
    state = ST_DATE

    with open(filename, 'r') as f:
        for line in f:
            if state == ST_DATE:
                comment_time = strptime(line.strip(), dtformat)
                state = ST_LINK
            elif state == ST_LINK:
                comment_link = line.strip()
                state = ST_TEXT
            elif state == ST_TEXT:
                # strip() so the separator matches with either LF or CRLF endings
                if line.strip() == '-----':
                    comments.append( (comment_time, comment_link, comment_text) )
                    # reset
                    comment_text = ''
                    comment_time = comment_link = None
                    state = ST_DATE
                else:
                    comment_text += line

    # keep the last comment if the file does not end with a separator
    if state == ST_TEXT:
        comments.append( (comment_time, comment_link, comment_text) )
    return comments

if __name__=='__main__':
    comments = parse_mefi_comments('my-mefi-comments.txt')
    for comment in comments:
        print('DATE:', comment[0])
        print('LINK:', comment[1])
        print('----------------')
        print('%s' % comment[2]) # format carriage returns
        print('----------------')
posted by mysterpigg at 11:18 AM on February 28, 2012
Nice, mysterpigg! I thought about rewriting mine later to create a script that created tuples from the timestamps and URLs; I think I like your approach better.
Also, I got kudos from pb today. Best. MeFi day. Ever.
posted by The Michael The at 11:21 AM on February 28, 2012 [1 favorite]
Nice, mysterpigg! I thought about rewriting mine later to create a script that created tuples from the timestamps and URLs; I think I like your approach better.
Yeah, like I said, looks like you answered the question as is, I didn't quite get that far and figured I'd put up what I had since it was a slightly different approach.
Also, I got kudos from pb today. Best. MeFi day. Ever.
So close... :)
posted by mysterpigg at 1:47 PM on February 28, 2012 [1 favorite]
perl -e '$/="\r\n-----\r\n";while($r=<>){($d,$u)=split"\r\n",$r,3;$d=~s{\.\d+$}{};($l=$u)=~s{.*/}{};$l=~s{#\d+$}{};$l=~s{-}{ }g;push@r,[$d,$u,$l];}BEGIN{print"<!DOCTYPE NETSCAPE-Bookmark-file-1><!--This is an automatically generated file. It will be read and overwritten. Do Not Edit! --><title>Bookmarks</title><h1>Bookmarks</h1><dl>";}END{printf qq[<dt>%s: <a href="%s" add_date="%s" last_visit="%s" last_modified="%s">%s</a></dt>],@$_[0,1],(time)x3,$_->[2]for@r}' < my-mefi-comments.txt > bookmarks.html
Being able to set the INPUT_RECORD_SEPARATOR ($/) to "\r\n-----\r\n" makes it easy to read one 'chunk' at a time; each chunk can then be split on "\r\n" to get the date and url from the first two lines. The date gets some cleanup (the fractional seconds are dropped), and the link text is the url stripped of the path and the anchor, with '-'s converted back to spaces (not necessarily correct). Then just push the link info onto a list, dump a header in the BEGIN block, and generate '<dt>' anchors for the links in the END block.
Maybe Metafilter can provide JSON formatted dumps in the future. :P
posted by zengargoyle at 2:06 PM on February 28, 2012
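For anyone who would rather not read the Perl one-liner, the same record-separator idea could be sketched in Python roughly as follows. It is a loose translation and not zengargoyle's code: `read_records` is a made-up name, and it relies on the CRLF line endings and "-----" separators described in the thread.

```python
def read_records(raw):
    """Split export text into (date, url, label) tuples, one per comment."""
    records = []
    for chunk in raw.split("\r\n-----\r\n"):
        if not chunk.strip():
            continue
        # maxsplit=2 leaves the comment body (third field) in one piece.
        date, url = chunk.split("\r\n", 2)[:2]
        date = date.split(".")[0]  # drop the fractional seconds
        # Link text: last path segment, minus the #anchor, dashes back to spaces.
        label = url.rsplit("/", 1)[-1].split("#")[0].replace("-", " ")
        records.append((date, url, label))
    return records
```

To feed it a real file, the text would need to be read with the CRLF endings preserved, for example `open('my-mefi-comments.txt', encoding='utf-8', newline='').read()`.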
Somewhat related, can we assume any of:
- data is UTF-8
- is CRLF
- IS NOT NULL (there will be non-empty date, url, comment)
- ordered most recent to least recent
posted by zengargoyle at 2:30 PM on February 28, 2012
Yep, I can verify that those are all good assumptions.
posted by pb (staff) at 3:13 PM on February 28, 2012
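A script that leans on those assumptions could check them before doing anything else. The following is a rough sketch; `check_export` is a hypothetical name and the problem messages are made up. It covers the CRLF, non-empty and ordering assumptions, while the UTF-8 assumption would show up earlier as a decoding error when the file is opened with encoding='utf-8'.

```python
from datetime import datetime


def check_export(raw):
    """Return a list of problems found; an empty list means the assumptions held."""
    problems = []
    previous = None
    chunks = [c for c in raw.split("\r\n-----\r\n") if c.strip()]
    for number, chunk in enumerate(chunks, start=1):
        fields = chunk.split("\r\n", 2)
        # Every record should have a non-empty date, url, and comment.
        if len(fields) < 3 or not all(f.strip() for f in fields):
            problems.append("record %d: missing date, url, or comment" % number)
            continue
        try:
            stamp = datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            problems.append("record %d: unparseable date" % number)
            continue
        # Records are expected to run from most recent to least recent.
        if previous is not None and stamp > previous:
            problems.append("record %d: newer than the record before it" % number)
        previous = stamp
    return problems
```

A file with bare LF line endings would come back as a single record with missing fields, which is a crude but serviceable hint that the CRLF assumption did not hold.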
I've niced it up a little, but wonder how strict the Netscape format has to be... ATM it's strictly following the Netscape Bookmark File Format pb posted earlier. With the sub-site matched for a nice label, and the comment text HTML stripped and nicely chopped to 70 chars or so to use with the time of the post as the link text.
(MetaFilter) Dont take it personally    # thread = container = h3
(2012-02-27 21:25:58) Robot Roomba pickers , a TED talk.    # link via comment
I did a version with an <a> link in the container that pointed to the thread itself, and left the comment text outside of the <a> link in the list of shortcuts. But I fear that anything using this format for import might not like having links in headers and text outside of links or even in <dd> elements.
Has anybody tried feeding Delicious, et al. non-strictly-compliant data?
posted by zengargoyle at 7:05 PM on February 28, 2012
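The strict layout described above, with one H3 container per thread and the comment links nested inside it, might be generated in Python along these lines. This is a sketch and not zengargoyle's script: `render_folders` is a hypothetical name, the folder title is simply taken from the thread URL, and whether a given importer (Delicious, Pinboard, a browser) accepts the result is untested.

```python
from html import escape


def render_folders(records):
    """records: iterable of (comment_url, link_text); returns bookmark HTML, one folder per thread."""
    threads = {}
    for url, text in records:
        # The thread URL is the comment URL without its #anchor.
        threads.setdefault(url.split("#")[0], []).append((url, text))

    out = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
    ]
    for thread, links in threads.items():
        title = thread.rsplit("/", 1)[-1].replace("-", " ")
        out.append("    <DT><H3>%s</H3>" % escape(title))  # folder header for the thread
        out.append("    <DL><p>")
        for url, text in links:
            out.append('        <DT><A HREF="%s">%s</A>' % (escape(url), escape(text)))
        out.append("    </DL><p>")
    out.append("</DL><p>")
    return "\n".join(out) + "\n"
```

Because the grouping is keyed on the URL with its anchor removed, all comments made in the same thread land in the same folder, in the order they appear in the export.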
A while back I wrote a Python script that converts the Metafilter comment export file into XML.
It is fairly rough but it does work, IIRC. You can choose at runtime whether to munge the HTML inside comments or wrap them in CDATA to preserve them as-is.
From XML, you can do whatever you want with them ... you could create a bookmarks file with some XSLT, or load them all in a database, etc. (The latter was my goal at one point but I'm not really sure why. Seemed like a good idea one evening, I guess.)
posted by Kadin2048 at 7:10 PM on February 28, 2012
A slightly heavier Perl script with a few dependencies that can generate a slightly fancier version in addition to the bare Netscape format. Has per-thread folders, titles with spaces (not dashes), and the first 72 or so HTML stripped characters of the comment as the link text. The fancy version adds a 'AskMeFi:' like sub-site id to the thread title and a '(YYYY-MM-DD HH:MM:SS)' to the link text (configurable time format if you grok strftime formats). Depends on a few modules that may or may not need installing.
my-mefi-bookmarks (Gist).
posted by zengargoyle at 5:02 AM on March 1, 2012 [1 favorite]
posted by pb (staff) at 8:18 AM on February 28, 2012 [2 favorites]