{"id":1443,"date":"2008-11-29T16:03:00","date_gmt":"2008-11-29T20:03:00","guid":{"rendered":"http:\/\/johncohn.org\/base\/2008\/11\/29\/friday-night-way-too-late\/"},"modified":"2008-11-29T16:03:00","modified_gmt":"2008-11-29T20:03:00","slug":"friday-night-way-too-late","status":"publish","type":"post","link":"http:\/\/johncohn.org\/base\/2008\/11\/29\/friday-night-way-too-late\/","title":{"rendered":"Friday night &#8211; way too late"},"content":{"rendered":"<div id=\"pBlogBody_452643862\" class=\"blogContent\">\n<p>\t\t\t\t\t\t    Yikes. I&#8217;ve been working in my lab all night and completely lost track of the time. It&#8217;s now about 2AM.. and I realized that I hadn&#8217;t blogged.&nbsp; I&#8217;ve spent many hours in my lab during this break.. I realize just how much I love just tinkering in there.. If only I could make a living doing that. <br \/>&nbsp;I had&nbsp; a good quiet day.. not too much to report.. The four of us had a nice sushi lunch together midday.. and that was about it. <\/p>\n<p>It&#8217;s too late to write much more.. but I did want to report on another hacking project I took on over this break.. I wrote a simple web page extractor in Python and managed to capture all of the text of my first two years of blogging. 
(Here&#8217;s the code if anyone is interested.)<\/p>\n<blockquote><p><small><\/small><small><i>from BeautifulSoup import BeautifulSoup<br \/>import re, urllib2<br \/>output_file = file('outputdata.txt','w')<\/p>\n<p>def remove_html_tags(data):<br \/>&nbsp;&nbsp;&nbsp; p = re.compile(r'&lt;.*?&gt;')<br \/>&nbsp;&nbsp;&nbsp; return p.sub('', data)<\/p>\n<p>def remove_extra_spaces(data):<br \/>&nbsp;&nbsp;&nbsp; p = re.compile(r'\\s+')<br \/>&nbsp;&nbsp;&nbsp; return p.sub(' ', data)<\/p>\n<p>def read_entry(year,month,day):<br \/>&nbsp;&nbsp;&nbsp; url = &quot;http:\/\/blog.myspace.com\/index.cfm?fuseaction=blog.view&amp;FriendID=105120181&amp;blogMonth=&quot; + str(month) + &quot;&amp;blogDay=&quot; + str(day) + &quot;&amp;blogYear=&quot; + str(year)<br \/>&nbsp;&nbsp;&nbsp; print url<br \/>&nbsp;&nbsp;&nbsp; date = str(month) + &quot;\/&quot; + str(day) + &quot;\/&quot; + str(year)<br \/>&nbsp;&nbsp;&nbsp; print date + '\\n'<br \/>&nbsp;&nbsp;&nbsp; output_file.write(date + '\\n')<br \/>&nbsp;&nbsp;&nbsp; <br \/>&nbsp;&nbsp;&nbsp; try:<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; resp = urllib2.urlopen(url,None,100)<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; output_file.write(url + '\\n')<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; html_code = resp.read()<\/p>\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; soup = BeautifulSoup(''.join(html_code))<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; subject = soup.find(&quot;p&quot;, { &quot;class&quot; : &quot;blogSubject&quot; })<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if subject is not None:<br \/>&nbsp; <br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; subj = remove_extra_spaces(remove_html_tags(str(subject.contents[1])))<br 
\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; print subj + '\\n'<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; output_file.write(subj + '\\n')<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; anyP = 0<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; table = soup.findAll(&quot;table&quot;, { &quot;class&quot; : &quot;blog&quot; })<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; allP = table[0].findAll(&quot;p&quot;)<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for p in allP:<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; anyP = 1<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cont = remove_extra_spaces(remove_html_tags(str(p)))<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; print cont + '\\n'<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; output_file.write(cont + '\\n')<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if anyP == 0:<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; print &quot;... Empty content\\n&quot;<\/p>\n<p>&nbsp;&nbsp;&nbsp; except IOError, e:<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; print e<\/p>\n<p>for day in range(25, 31):<br \/>&nbsp;&nbsp;&nbsp; read_entry(2006,11,day)<br \/>for day in range(1, 32):<br \/>&nbsp;&nbsp;&nbsp; read_entry(2006,12,day)<br \/>for year in range(2007, 2009):<br \/>&nbsp;&nbsp;&nbsp; for month in range(1, 13):<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for day in range(1, 32):<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; read_entry(year,month,day)<\/i><br 
\/><\/small><\/p><\/blockquote>\n<p>It was fun figuring out how to do that because @$%ing MySpace doesn&#8217;t have an export feature.. that means you can&#8217;t back up your blog.. Now I have a backup of all the text..&nbsp;&nbsp; Just like last year, I found a program to analyze the text just for fun.. Of the half a million words.. here&#8217;s what Wordle.com made of my blog. The bigger the word, the more frequently it was mentioned&#8230; Take a look&#8230; <\/p>\n<div align=\"center\"><img decoding=\"async\" src=\"http:\/\/i147.photobucket.com\/albums\/r319\/johncohn\/nov2808\/samwordle.jpg\" \/><\/p>\n<div align=\"left\">The more I stare at it.. the more I think it really does sum up the second year &#8230;.<\/p>\n<p>OK.. I gotta sleep now.. More tomorrow.. G&#8217;nite all. G&#8217;nite Sam !<br \/>-me<\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Yikes. I&#8217;ve been working in my lab all night and completely lost track of the time. It&#8217;s now about 2AM.. and I realized that I hadn&#8217;t blogged.&nbsp; I&#8217;ve spent many hours in my lab during this break.. I realize just how much I love just tinkering in there.. 
If only I could make a living &hellip; <a href=\"http:\/\/johncohn.org\/base\/2008\/11\/29\/friday-night-way-too-late\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Friday night &#8211; way too late<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[1],"tags":[],"class_list":["post-1443","post","type-post","status-publish","format-standard","hentry"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"http:\/\/johncohn.org\/base\/wp-json\/wp\/v2\/posts\/1443","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/johncohn.org\/base\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/johncohn.org\/base\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/johncohn.org\/base\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/johncohn.org\/base\/wp-json\/wp\/v2\/comments?post=1443"}],"version-history":[{"count":0,"href":"http:\/\/johncohn.org\/base\/wp-json\/wp\/v2\/posts\/1443\/revisions"}],"wp:attachment":[{"href":"http:\/\/johncohn.org\/base\/wp-json\/wp\/v2\/media?parent=1443"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/johncohn.org\/base\/wp-json\/wp\/v2\/categories?post=1443"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/johncohn.org\/base\/wp-json\/wp\/v2\/tags?post=1443"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}