<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Shane's Blog &#187; Geek</title>
	<atom:link href="http://sbutler.com/blog/category/geek/feed/" rel="self" type="application/rss+xml" />
	<link>http://sbutler.com/blog</link>
	<description>data mining and things i find interesting</description>
	<lastBuildDate>Sat, 10 May 2008 02:14:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Winning the DARPA Grand Challenge</title>
		<link>http://sbutler.com/blog/2006/09/grand-challenge-video/</link>
		<comments>http://sbutler.com/blog/2006/09/grand-challenge-video/#comments</comments>
		<pubDate>Sun, 17 Sep 2006 04:21:05 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/09/grand-challenge-video/</guid>
		<description><![CDATA[Sebastian Thrun of Stanford Racing gives a great talk on what it took to build an autonomous vehicle to win the DARPA Grand Challenge. There are lots of cool technical details on the use of machine learning to achieve this. You can watch it on Google Video here.]]></description>
			<content:encoded><![CDATA[<p>Sebastian Thrun of <a href="http://www.stanfordracing.org/">Stanford Racing</a> gives a great talk on <a href="http://video.google.com/videoplay?docid=8594517128412883394">what it took to build an autonomous vehicle</a> to win the <a href="http://www.darpa.mil/grandchallenge/index.asp">DARPA Grand Challenge</a>. There are lots of cool technical details on the use of machine learning to achieve this. You can watch it on Google Video <a href="http://video.google.com/videoplay?docid=8594517128412883394">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/09/grand-challenge-video/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>In-cell Graphing</title>
		<link>http://sbutler.com/blog/2006/08/in-cell-graphing/</link>
		<comments>http://sbutler.com/blog/2006/08/in-cell-graphing/#comments</comments>
		<pubDate>Fri, 11 Aug 2006 09:30:11 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Geek]]></category>
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/08/in-cell-graphing/</guid>
		<description><![CDATA[The guys from Juice Analytics have put together an interesting series on in-cell graphing (parts 1, 2, &#038; 3). This feature is due in the upcoming Excel 2007; however, the technique the Juice guys use works across all versions of Excel and is quite visually appealing too. Added bonus, [...]]]></description>
			<content:encoded><![CDATA[<p>The guys from <a target="_blank" href="http://juiceanalytics.com/weblog/">Juice Analytics</a> have put together an interesting series on in-cell graphing (parts <a target="_blank" href="http://www.juiceanalytics.com/weblog/?p=236">1</a>, <a href="http://www.juiceanalytics.com/weblog/?p=239">2</a>, &#038; <a target="_blank" href="http://www.juiceanalytics.com/weblog/?p=240">3</a>). This feature is due in the upcoming Excel 2007; however, the technique the Juice guys use works across all versions of Excel and is quite visually appealing too. Added bonus, I can confirm it works in <a href="http://openoffice.org">OpenOffice.org</a>, <a href="http://www.gnome.org/projects/gnumeric/">Gnumeric</a> and even <a target="_blank" href="http://spreadsheets.google.com">Google Spreadsheets</a> (all to varying degrees).</p>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/08/in-cell-graphing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Smart SPAM &amp; Fighting it</title>
		<link>http://sbutler.com/blog/2006/05/smart-spam/</link>
		<comments>http://sbutler.com/blog/2006/05/smart-spam/#comments</comments>
		<pubDate>Sat, 13 May 2006 02:26:28 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/05/smart-spam/</guid>
		<description><![CDATA[For any machine-learning-based SPAM filter, such as the popular Bayesian methods, the key to success is the training data: the body of previously identified SPAM and HAM (valid emails). In order for the spammer to trick the filter, they must try to be more HAM-like. The way to beat this is by giving [...]]]></description>
			<content:encoded><![CDATA[<p>For any machine-learning-based SPAM filter, such as the popular Bayesian methods, the key to success is the training data: the body of previously identified SPAM and HAM (valid emails). In order for the spammer to trick the filter, they must try to be more HAM-like. The way to beat this is by giving your email classifier as much training data as possible, and continually updating it. Just learning from your company&#8217;s emails is probably not fool-proof when you consider the volume and variety of SPAM on the net. Web-based email, on the other hand, like <a href="http://mail.google.com">Gmail</a> and <a href="https://www.google.com/hosted">the hosted version</a>, should never have this problem because the filter learns from thousands of users&#8217; SPAM folders.</p>
<p>Researchers from the University of Calgary <a href="http://pharos.cpsc.ucalgary.ca/Dienst/UI/2.0/Describe/ncstrl.ucalgary_cs/2006-808-01">claim</a> that the next evolution will be smart SPAM, which will infiltrate your computer via spyware/viruses and <a href="http://arstechnica.com/news.ars/post/20060502-6726.html">&#8216;mine&#8217; your emails</a>. By creating emails based on the actual messages you&#8217;ve previously sent, the spammers hope they will be more believable to readers.</p>
<p>I would argue, however, that such a situation would merely make services like Gmail more attractive. Firstly because they have a truly massive body of knowledge to use to fine tune their spam filters, and secondly because it is unlikely such spyware could infiltrate a web-based system. Even if a program was distributed that waited for someone to log on and then took over, Google could have it effectively neutralised in a matter of hours.</p>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/05/smart-spam/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>DARPA Grand Challenge</title>
		<link>http://sbutler.com/blog/2006/05/darpa-urban-challenge/</link>
		<comments>http://sbutler.com/blog/2006/05/darpa-urban-challenge/#comments</comments>
		<pubDate>Thu, 04 May 2006 02:18:23 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Fun]]></category>
		<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/05/darpa-urban-challenge/</guid>
		<description><![CDATA[Start your engines, the DARPA Grand Challenge is on again, only this time it&#8217;s an urban challenge! The last two competitions were to race an autonomous vehicle through a desert, with the 2005 winner, Stanford, taking home a US$2 million prize. Stanford&#8217;s software in action: Input from GPS and many sensors feeds the algorithms to [...]]]></description>
			<content:encoded><![CDATA[<p><em>Start your engines</em>, the <a href="http://www.darpa.mil/grandchallenge">DARPA Grand Challenge</a> is on again, only this time it&#8217;s an urban challenge! The last two competitions were to race an autonomous vehicle through a desert, with the 2005 winner, <a href="http://www-cs.stanford.edu/group/roadrunner/">Stanford</a>, taking home a US$2 million prize.</p>
<p><a title="stanford1.png" class="imagelink" href="http://sbutler.com/blog/wp-content/uploads/stanford1.png"><img alt="stanford1.png" id="image132" src="http://sbutler.com/blog/wp-content/uploads/stanford1.thumbnail.png" /></a>   <a title="stanford2.png" class="imagelink" href="http://sbutler.com/blog/wp-content/uploads/stanford2.png"><img alt="stanford2.png" id="image133" src="http://sbutler.com/blog/wp-content/uploads/stanford2.thumbnail.png" /></a><br clear="all" /><strong> Stanford&#8217;s software in action:</strong> Input from GPS and many sensors feeds the algorithms to determine the safe path (see <a href="http://www.darpa.mil/grandchallenge05/TechPapers/Stanford.pdf">tech report</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/05/darpa-urban-challenge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Gmail for Backups</title>
		<link>http://sbutler.com/blog/2006/05/gmail-for-backups/</link>
		<comments>http://sbutler.com/blog/2006/05/gmail-for-backups/#comments</comments>
		<pubDate>Tue, 02 May 2006 23:55:56 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/05/gmail-for-backups/</guid>
		<description><![CDATA[While writing a thesis it is obviously imperative to have foolproof backups in place. So why not back up to that free 2.7GB Gmail account? Here&#8217;s what you have to do: Install &#8220;email&#8221; (Gentoo users: emerge net-mail/email) Edit /etc/email/email.conf (Gentoo users: as a minimum you must set REPLY_TO) Test the commands. They are: cd /path/to/your/thesis/ tar [...]]]></description>
			<content:encoded><![CDATA[<p>While writing a thesis it is obviously imperative to have foolproof backups in place. So why not back up to that free 2.7GB <a href="http://gmail.com">Gmail</a> account? Here&#8217;s what you have to do:</p>
<ol>
<li>Install &#8220;<em><a title="Command line email client called " href="http://email.cleancode.org/">email</a></em>&#8221; (Gentoo users: <code>emerge net-mail/email</code>)</li>
<li>Edit <code>/etc/email/email.conf</code> (Gentoo users: as a minimum you must set <code>REPLY_TO</code>)</li>
<li>Test the commands. They are:<br />
<code>cd /path/to/your/thesis/<br />
tar -czf /tmp/thesis.tar.gz *.*<br />
email --blank-mail --smtp-server <strong>mail.yourserver.com</strong> --from-name <strong>"your name"</strong> --from-addr  <strong>you@youremail.com</strong> --subject "Cron: Thesis Backup (`date`)" <strong>you@gmail.com</strong> --attach /tmp/thesis.tar.gz > /dev/null 2>&#038;1<br />
rm -f /tmp/thesis.tar.gz<br />
</code></li>
<li>Now add this as a <a href="http://www.adminschoice.com/docs/crontab.htm#Crontab%20file"><code>/etc/crontab</code> entry</a>. This example sends the backup at 7am each day.<br />
<code>0 7 * * *     <strong>unixusername</strong>     cd /path/to/your/thesis/; tar -czf /tmp/thesis.tar.gz *.*; email --blank-mail --smtp-server <strong>mail.yourserver.com</strong> --from-name <strong>"your name"</strong> --from-addr  <strong>you@youremail.com</strong> --subject "Cron: Thesis Backup (`date`)" <strong>you@gmail.com</strong> --attach /tmp/thesis.tar.gz > /dev/null 2>&#038;1; rm -f /tmp/thesis.tar.gz<br />
</code></li>
<li>Final step is to <a href="https://mail.google.com/support/bin/answer.py?answer=6579&#038;topic=1539">create a Gmail filter</a>! It would be nice if it was possible to stop the emails being downloaded via POP but I think this <a href="https://mail.google.com/support/bin/answer.py?answer=13291&#038;topic=1555">may require a filter that moves the incoming backup emails to Trash</a>.</li>
</ol>
<p>Obviously you don&#8217;t have to use this for backing up a thesis; it could easily be modified to back up whatever you want.<br />
Note: I can&#8217;t see mention of TLS support in the client <em>email</em>, so that&#8217;s why I&#8217;ve suggested you use your own SMTP server rather than Google&#8217;s.</p>
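<p>The steps above can also be wrapped in a small script, so that cron only needs to call one command. This is a rough sketch under stated assumptions rather than a tested recipe: the function names are made up for illustration, archiving the whole directory is a deliberate change from the <code>*.*</code> pattern used above, and the <em>email</em> flags are simply the ones already shown in step 3.</p>

```shell
#!/bin/sh
# Sketch of steps 3 and 4 as reusable functions.
# make_archive, send_archive and backup_thesis are hypothetical names.

# Create a gzipped tarball of everything inside the given directory.
# Archiving "." (instead of "*.*") also picks up files without a dot
# in their name, hidden files and subdirectories.
make_archive() {
    src_dir=$1
    archive=$2
    tar -czf "$archive" -C "$src_dir" .
}

# Mail the archive using the same "email" client flags shown in step 3.
# The server, name and addresses are the placeholders from the post.
send_archive() {
    archive=$1
    email --blank-mail --smtp-server mail.yourserver.com \
        --from-name "your name" --from-addr you@youremail.com \
        --subject "Cron: Thesis Backup ($(date))" \
        you@gmail.com --attach "$archive"
}

# Build the archive, send it, and delete it only if sending succeeded.
backup_thesis() {
    archive=/tmp/thesis.tar.gz
    if make_archive "$1" "$archive"; then
        if send_archive "$archive"; then
            rm -f "$archive"
        fi
    fi
}
```

<p>Saved as, say, <code>/usr/local/bin/thesis-backup.sh</code> (a hypothetical path) with a final line calling <code>backup_thesis /path/to/your/thesis</code>, the crontab entry shrinks to the schedule, the username and that one path. Keeping the archive when sending fails means a failed run leaves something behind to inspect.</p>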
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/05/gmail-for-backups/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Visualising Digg</title>
		<link>http://sbutler.com/blog/2006/05/digg-graph/</link>
		<comments>http://sbutler.com/blog/2006/05/digg-graph/#comments</comments>
		<pubDate>Tue, 02 May 2006 15:46:09 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/05/visualising-digg/</guid>
		<description><![CDATA[Digg, The Blog has info on a nice visualisation of activity on digg.com. Kevin mentions that the zip-line effect in the videos is probably caused by bots. Pretty cool!]]></description>
			<content:encoded><![CDATA[<p><em>Digg, The Blog</em> has info on a nice <a href="http://diggtheblog.blogspot.com/2006/05/visualizing-digg-data.html">visualisation of activity on digg.com</a>. Kevin mentions that the zip-line effect in the videos is probably caused by bots. Pretty cool!</p>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/05/digg-graph/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting to know R Graphs</title>
		<link>http://sbutler.com/blog/2006/04/getting-to-know-r-graphs/</link>
		<comments>http://sbutler.com/blog/2006/04/getting-to-know-r-graphs/#comments</comments>
		<pubDate>Fri, 07 Apr 2006 01:09:05 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/04/getting-to-know-r-graphs/</guid>
		<description><![CDATA[Check out the R Graph Gallery which includes not only detailed descriptions of graphs you can produce in R, but also R source! Props to Martin for the link.]]></description>
			<content:encoded><![CDATA[<p>Check out the <a href="http://addictedtor.free.fr/graphiques/">R Graph Gallery</a> which includes not only detailed descriptions of graphs you can produce in <a href="http://www.r-project.org/">R</a>, but also R source! Props to <a href="http://statgraphics.blog.com/644674/">Martin</a> for the link.</p>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/04/getting-to-know-r-graphs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>What&#8217;s in a name?</title>
		<link>http://sbutler.com/blog/2006/04/whats-in-a-name/</link>
		<comments>http://sbutler.com/blog/2006/04/whats-in-a-name/#comments</comments>
		<pubDate>Wed, 05 Apr 2006 09:19:27 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/04/whats-in-a-name/</guid>
		<description><![CDATA[Dennis Forbes gives a fantastic analysis of one of the biggest databases on the Internet &#8211; the DNS records. His analysis includes insights into domain name length, personal and family name usage and other characteristics. For example, did you know that all 2- and 3-letter domains are taken? Dennis is planning a second part so [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.yafla.com/dforbes/">Dennis Forbes</a> gives a fantastic analysis of one of the biggest databases on the Internet &#8211; the <a href="http://www.yafla.com/dforbes/2006/03/29.html">DNS records</a>. His analysis includes insights into domain name length, personal and family name usage and other characteristics.  For example, did you know that all 2- and 3-letter domains are taken? Dennis is planning a second part so keep a look out for that too.</p>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/04/whats-in-a-name/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Future of Radio</title>
		<link>http://sbutler.com/blog/2006/03/pandora-musicminer/</link>
		<comments>http://sbutler.com/blog/2006/03/pandora-musicminer/#comments</comments>
		<pubDate>Tue, 28 Mar 2006 23:37:50 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Geek]]></category>
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/03/pandora-musicminer/</guid>
		<description><![CDATA[You may have listened to Internet radio before, but Pandora is a station of a different kind &#8211; totally personalised. It&#8217;s a Flash-based player (sorry Andy!) that sits inside your browser, so no problems with firewalls. But the real innovation is that when you start it up, you tell it the artists you like, [...]]]></description>
			<content:encoded><![CDATA[<p>You may have listened to <a href="http://www.shoutcast.com/">Internet radio</a> before, but <a href="http://www.pandora.com">Pandora</a> is a station of a different kind &#8211; totally personalised. It&#8217;s a Flash-based player (<a href="http://www.andybotting.com/mediawiki/index.php/Linux_on_a_15%22_Powerbook_1.67Ghz">sorry Andy!</a>) that sits inside your browser, so no problems with firewalls. But the real innovation is that when you start it up, you tell it the artists you like, and it will attempt to determine what other songs you will like too, and play those to your personal audio stream. As time progresses you can give each song played the thumbs-up or thumbs-down, which will further refine what music is played to you! It&#8217;s not bad, but they should use some more advanced techniques like <a href="http://musicminer.sourceforge.net/">MusicMiner</a> to better adapt to user tastes.</p>
<p>MusicMiner uses a <a href="http://en.wikipedia.org/wiki/Self-organising_map">Self-Organising Map</a> based technique (&#8220;<a href="http://www.mathematik.uni-marburg.de/%7Edatabionics/en//?q=esom">Emergent SOM</a>&#8221;) to determine and visualise music similarity:</p>
<div style="text-align: center"><a href="http://musicminer.sourceforge.net/"><img alt="MusicMiner preview" id="image110" src="http://sbutler.com/blog/wp-content/uploads/musicminer.png" /><br />
</a></p>
<p align="left">The major advantage of MusicMiner is obviously that you can use it on your own music collection and choose to play a particular song, whereas with Pandora you can only define your interests and listen to see what is played. There is no guarantee Pandora will actually play a given artist, although usually it will eventually.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/03/pandora-musicminer/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Photoshop Plugins and Gtalk for Linux</title>
		<link>http://sbutler.com/blog/2006/03/photoshop-gtalk-linux/</link>
		<comments>http://sbutler.com/blog/2006/03/photoshop-gtalk-linux/#comments</comments>
		<pubDate>Fri, 24 Mar 2006 04:07:30 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/03/photoshop-gtalk-linux/</guid>
		<description><![CDATA[One common complaint from Windows enthusiasts is that the image editor on Linux, The GIMP, is somehow lacking compared with Photoshop. Those people will be happy to know you can now use Photoshop plugins in The GIMP on Linux. The GIMP really is a fantastic tool for both Windows and Linux, so give it a go [...]]]></description>
			<content:encoded><![CDATA[<p>One common complaint from Windows enthusiasts is that the image editor on Linux, <a href="http://www.gimp.org">The GIMP</a>, is somehow lacking compared with Photoshop. Those people will be happy to know you can now <a href="http://tml-blog.blogspot.com/2006/02/photoshop-filters-in-gimp-on-linux.html">use Photoshop plugins in The GIMP</a> on Linux. The GIMP really is a fantastic tool for both Windows and Linux, so <a href="http://www.gimp.org">give it a go now</a>!</p>
<p>Another piece of good news for Linux users is the announcement of the <a href="http://tapioca-voip.sourceforge.net/wiki/index.php/Tapioca">Tapioca Google Talk-compatible client</a>. While Google has made their IM network available to people of all OSes by <a href="http://www.google.com/talk/about.html#open">using the Jabber IM protocol</a> and <a href="http://code.google.com/apis/talk">open sourcing their VoIP backend</a>, Tapioca is the first application to provide both for Linux. Awesome!</p>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/03/photoshop-gtalk-linux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Snapshot of Web Development</title>
		<link>http://sbutler.com/blog/2006/03/web-devel/</link>
		<comments>http://sbutler.com/blog/2006/03/web-devel/#comments</comments>
		<pubDate>Mon, 20 Mar 2006 07:55:25 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/03/web-devel/</guid>
		<description><![CDATA[The Google Web Authoring Statistics site provides real insight into the way web developers are using the web. Of particular interest to me was the most commonly used elements page. The graphs on this site use SVG, although interestingly there is no comparison of image types people are using in their webpages. Anyway you&#8217;ll need [...]]]></description>
			<content:encoded><![CDATA[<p>The Google <a href="http://code.google.com/webstats/index.html">Web Authoring Statistics</a> site provides real insight into the way web developers are using the web. Of particular interest to me was the <a href="http://code.google.com/webstats/2005-12/pages.html">most commonly used elements</a> page. The graphs on this site use SVG, although interestingly there is no comparison of image types people are using in their webpages. Anyway you&#8217;ll need something like Firefox 1.5 to look at it.</p>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/03/web-devel/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WEKA in Jython or even C#</title>
		<link>http://sbutler.com/blog/2006/03/weka-jython-csharp/</link>
		<comments>http://sbutler.com/blog/2006/03/weka-jython-csharp/#comments</comments>
		<pubDate>Wed, 08 Mar 2006 04:43:31 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Geek]]></category>
		<category><![CDATA[Uni]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/03/weka-jython-csharp/</guid>
		<description><![CDATA[I was very excited to find out that Python scripts can access Java APIs if you run them on the Jython interpreter. Jython is a Python interpreter written in Java which some people have put to good use for fast prototyping of WEKA applications. I built a simple classifier using Jython and weka classes and everything [...]]]></description>
			<content:encoded><![CDATA[<p>I was very excited to find out that <a href="https://list.scms.waikato.ac.nz/pipermail/wekalist/2004-January/002109.html">Python scripts can access Java APIs if you run them on the Jython interpreter</a>. Jython is a Python interpreter written in Java which some people have put to good use for <a href="http://www.btbytes.com/blog/programming-the-weka-datamining-toolkit-with-jython">fast prototyping of WEKA applications</a>. I built a simple classifier using Jython and weka classes and everything seemed to be going fine. However, complications arose when trying to use databases in Jython. My intention was to use a <a href="http://www.sqlite.org">sqlite</a> database, but while databases work really well in standard python (or &#8216;CPython&#8217;) via the DB-API2, this is lost in the move to Jython. The web points to zxJDBC which has now been integrated into Jython, but it suffers from being a wrapper for JDBC that feels like a DB-API2 object&#8230; meaning you have to install JDBC drivers and all sorts.</p>
<p>On the .NET side of things, apparently <a href="http://www.ikvm.net/">IKVM</a> (part of the <a href="http://www.mono-project.com/">Mono</a> project) allows programmers to <a href="http://www.ikvm.net/devguide/net2java.html">use Java APIs in their .NET applications</a>, so maybe there is some hope for using Weka and .NET. BTW don&#8217;t forget there are heaps of free <a href="/blog/2005/11/c-sharp-programming/">great development tools</a> around if you are taking the .NET path.</p>
<p><strong>Update:</strong> Ok it turns out that although zxJDBC is a part of Jython now, it is still not included in the Gentoo package <img src='http://sbutler.com/blog/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/03/weka-jython-csharp/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Got Zeitgeist? Mining Online Trends</title>
		<link>http://sbutler.com/blog/2006/03/mining-online-trends/</link>
		<comments>http://sbutler.com/blog/2006/03/mining-online-trends/#comments</comments>
		<pubDate>Mon, 06 Mar 2006 04:12:31 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/03/mining-online-trends/</guid>
		<description><![CDATA[Each week, Google provides a taste of the top search queries on a site called Google Zeitgeist. At the end of each year, they compile a more comprehensive report of what people have been searching for. The 2005 Zeitgeist has been out since December and provides some interesting insights into online trends over the past year. My [...]]]></description>
			<content:encoded><![CDATA[<p>Each week, Google provides a taste of the top search queries on a site called <a href="http://www.google.com/intl/en/press/zeitgeist.html">Google Zeitgeist</a>. At the end of each year, they compile a more comprehensive report of what people have been searching for. The <a href="http://www.google.com/press/zeitgeist2005.html">2005 Zeitgeist</a> has been out since December and provides some interesting insights into online trends over the past year. My favourites were <a href="http://www.google.com/press/zeitgeist2005/worldaffairs.html">world affairs</a> and <a href="http://www.google.com/press/zeitgeist2005/nature.html">nature</a>.</p>
<p>Beyond just being interesting, companies such as <a href="http://www.buzzmetrics.com/">BuzzMetrics</a> and <a href="http://www.blogpulse.com/">BlogPulse</a>  have realised that analysis of Internet activity will be a useful tool for many companies. They produce tools that mine blogs in a bid to capture consumer sentiment on particular product(s), for example to improve product marketing.  <a href="http://datamining.typepad.com/">Matthew Hurst of BlogPulse has an interesting blog</a> with the odd post covering Internet blogging activity, such as <a href="http://datamining.typepad.com/data_mining/oscars/index.html">this pre-Oscars analysis</a>.</p>
<p>Another interesting data mining application is the one pioneered by <a href="http://www.majesticresearch.com/">Majestic Research</a>. They provide stock research and earnings forecasts to analysts before actual company information is released. Using web-based data mining, they <a href="http://today.reuters.com/news/articleinvesting.aspx?type=fundsFundsNews&#038;storyid=2006-02-14T191858Z_01_N14387237_RTRIDST_0_FINANCIAL-MAJESTIC-HEDGE.XML">track the sales of the top consumer-sensitive web companies, and then use this information to infer the company&#8217;s performance</a>. Nice!</p>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/03/mining-online-trends/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Python Programming</title>
		<link>http://sbutler.com/blog/2006/02/python/</link>
		<comments>http://sbutler.com/blog/2006/02/python/#comments</comments>
		<pubDate>Mon, 27 Feb 2006 11:36:12 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Geek]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/02/python/</guid>
		<description><![CDATA[I&#8217;ve known PHP and Perl for some time now, so I decided it was time to learn the other &#8216;P&#8217; in scripting: Python! There is a lot of documentation on the Python website, but I would recommend two books available freely online to get started, Dive into Python and A Byte of Python.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve known <a href="http://www.php.net">PHP</a> and <a href="http://www.perl.org">Perl</a> for some time now, so I decided it was time to learn the other &#8216;P&#8217; in scripting: <a href="http://www.python.org">Python</a>! There is a lot of documentation on the Python website, but I would recommend two books available freely online to get started, <a href="http://diveintopython.org/">Dive into Python</a> and <a href="http://www.byteofpython.info/">A Byte of Python</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/02/python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More on Online Advertising</title>
		<link>http://sbutler.com/blog/2006/02/online-advertising-2/</link>
		<comments>http://sbutler.com/blog/2006/02/online-advertising-2/#comments</comments>
		<pubDate>Wed, 22 Feb 2006 00:58:29 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Geek]]></category>
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/02/online-advertising-2/</guid>
		<description><![CDATA[I previously blogged about advertisers going troppo for The Million Dollar Homepage. Well, online advertising is hot all over the net and is gaining on all traditional mediums fast. Australian Internet advert spending was AUD$620 million last year, and will be AUD$1 billion by the end of this year. Interestingly, Internet advertising only accounts for [...]]]></description>
			<content:encoded><![CDATA[<p>I previously blogged about <a href="http://sbutler.com/blog/2006/01/online-advertising/">advertisers going troppo for The Million Dollar Homepage</a>. Well, online advertising is hot all over the net and is <a href="http://www.smh.com.au/news/business/online-ads-head-toward-1b-mark/2006/02/21/1140284067429.html">gaining on all traditional mediums fast</a>. Australian Internet advert spending was AUD$620 million last year, and will be AUD$1 billion by the end of this year. Interestingly, Internet advertising only accounts for around 6% of Australian advertising budgets even though Australians are spending 15% of their media contact time online.</p>
<p>Everyone can get in on the action, too. Even for small bloggers, it is quite easy to display context-sensitive <a href="http://www.google.com/adsense/">Google AdSense</a> adverts and make the odd buck from the online advertising revolution. AdSense is based on an auction system, <a href="http://adwords.google.com/">AdWords</a>, where advertisers have to bid for which relevant keywords will trigger their ads to be displayed. As such <a href="http://www.cyberwyre.com/highest-paying-search-terms/">some keywords are worth significantly more than others</a>. Some people say they are making a living from ad clicks. Take <a href="http://www.problogger.net/">Darren Rowse</a>, for example, a professional blogger who claims an income of $400,000 &#8212; just from revenue generated from ad clicks! This may sound too good to be true, and maybe it is &#8211; author and Google expert Harold Davis says it is reasonable to expect a profit of <a href="http://www.wired.com/news/technology/0,70161-0.html">USD$10 per page per year</a> from AdSense.</p>
]]></content:encoded>
			<wfw:commentRss>http://sbutler.com/blog/2006/02/online-advertising-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

