Alliterations aside, here is a preview of something I’ve been tinkering with. My goal is to run R code as a phase within a Riak map/reduce job. In a multi-cultural world filled with distinct languages, it should be obvious that one size does not fit all. Statistics is not Erlang’s strong suit. Writing a sparse matrix class is bad enough, but imagine implementing regression or random matrix theory. For its part, and despite many honorable attempts, R isn’t great at distributed processing. So, waving the banner of bringing the processing to the data, why not use R to process portions of a map/reduce job?

This actually isn’t as hard as it sounds. Below are a few snippets of running R code via an Erlang RPC. This means that R is available and running as an Erlang node!

First, we call the R function ‘mean’ to calculate the arithmetic mean of a list of numbers:

(test@localhost)57> rpc:call('rchimedes@localhost', rchimedes, eval, {mean, [[10,12,13,25,20]]}).

Next we’ll draw 10 samples from a standard normal distribution. To me, calling rnorm is analogous to Hello, World for R.

(test@localhost)58> rpc:call('rchimedes@localhost', rchimedes, eval, {rnorm, [10]}).

Currently the syntax uses atoms as function references (i.e. the function must already exist in R space) and binary strings as function definitions. Notice that the arguments passed to the function are sent in a list. This is the standard Erlang convention, and it lets the remote function call take additional arguments. For example, let’s say we want to draw from a normal distribution with mean 5:

(test@localhost)60> rpc:call('rchimedes@localhost', rchimedes, eval, {rnorm, [10,5]}).
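The argument-list convention used in these calls is the same one plain Erlang uses for apply/3: the last argument is a list with one element per argument of the target function. Here is a minimal sketch of that convention using only the Erlang standard library. It does not touch rchimedes at all; the module name arglist_demo is hypothetical, and lists:sum/1 and lists:nth/2 merely stand in for R functions such as mean and rnorm.

```erlang
-module(arglist_demo).
-export([run/0]).

%% Illustration only: shows how an argument list maps onto a function's
%% parameters. lists:sum/1 and lists:nth/2 are stand-ins for R functions.
run() ->
    Numbers = [10, 12, 13, 25, 20],
    %% lists:sum/1 takes ONE argument (itself a list), so the argument
    %% list has a single element: [Numbers] =:= [[10,12,13,25,20]].
    Sum = apply(lists, sum, [Numbers]),
    %% lists:nth/2 takes TWO arguments, so the argument list has two
    %% elements, just as [10,5] supplies two arguments to rnorm.
    Second = apply(lists, nth, [2, Numbers]),
    {Sum, Second}.
```

After compiling the module, arglist_demo:run() returns {80, 12}. This is why the list of numbers handed to mean is wrapped in an extra pair of brackets, while [10,5] passes two separate arguments to rnorm.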

The above examples will hopefully whet your appetite for what is possible here. The next step in the exercise is to execute R from a Riak job and pull it all together into a complete map/reduce job. Any ideas on case studies are welcome. Otherwise, brace yourself for something finance related.