This is the third part of a three-part series on teaching R to MFE students at CUNY Baruch. The focus of this lesson is on programming methods and application development in R.

Contents

PART I: PRELIMINARIES

PART II: STATISTICS

PART III: STRUCTURING CODE

Object-Oriented Programming

Depending on with whom you speak, you may hear that R is object-oriented. Others will say it’s functional. In fact it’s both and neither simultaneously. In R, object-oriented programming centers more on how functions are dispatched than on how code is structured. S3 introduces a class attribute and a polymorphic dispatching system, which resembles functional programming. In S4 certain embellishments give the illusion of a class-based programming model. The two systems are mostly compatible, but there are instances where they conflict. Large projects like Rmetrics and Bioconductor use the S4 style heavily, but many smaller projects do not really benefit from the added complexity of S4.

S3 Classes and Dispatching

The simplest dispatching system is object-oriented in the sense that a function is called based on the ‘class’ of the first argument. A variable’s class is simply an attribute attached to the variable.

> class(h)
 [1] "xts" "zoo" "returns"
> attr(h, 'class')
 [1] "xts" "zoo" "returns"
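
To make the point concrete, here is a minimal sketch in base R showing that the class really is just an attribute that can be read and assigned. The vector r and the class name 'returns' are my own illustrative choices and are not tied to the portfolio object h above.

```r
# A plain numeric vector carries no class attribute; class() reports its implicit class.
r <- c(0.01, -0.02, 0.015)
implicit <- class(r)
no_attr <- is.null(attr(r, 'class'))

# Tagging the vector only sets an attribute; the underlying data is unchanged.
class(r) <- c('returns', 'numeric')
tagged <- attr(r, 'class')
is_returns <- inherits(r, 'returns')
```

Because nothing stops you from assigning any string here, the convention relies entirely on programmer discipline.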

When calling a function the actual implementation depends on whether the generic function is S3 or not. If it is, the definition will typically defer to a separate function called UseMethod. This function will dispatch to a concrete implementation based on the class of the first argument. The matching function will be named

dispatched function := base function "." class

If no such function is found for any of the classes in the class attribute, which are tried in order, then the default function is called. As an example, let’s look at the function mean:

> mean
function (x, ...)
UseMethod("mean")
<bytecode: 0x1051616a8>
<environment: namespace:base>

This function has a number of implementations, including a default function mean.default (try methods(mean) to see what’s available). Hence, to get the mean of the returns of our portfolio, mean(h) will dispatch to mean.default since there are no declared functions for any of the classes associated with h.

> mean(h)
[1] 0.001991222

Unfortunately, this isn’t the behavior we want. Rather, we want to see the mean for each asset. We can accomplish this by implementing a new function mean.zoo (which would then apply to any zoo objects).

> mean.zoo <- function(x, ...) apply(x, 2, mean, ...)
> mean(h)
        AAPL          XOM           KO            F           GS
0.0023180089 0.0021953922 0.0002628634 0.0028525868 0.0023272563

This technique can be used to create new generic functions as well as to add implementations to existing S3 generics.
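
As a rough sketch of creating a new generic from scratch, the following defines a hypothetical function describe with one class-specific method and a default. The object below only borrows the class names from h; it is a stand-in built with structure() and does not need the xts or zoo packages.

```r
# Hypothetical generic: dispatches on the class of its first argument.
describe <- function(x, ...) UseMethod('describe')

# Fallback used when no class-specific method is found.
describe.default <- function(x, ...) 'generic object'

# Method for objects whose class vector contains 'zoo'.
describe.zoo <- function(x, ...) 'ordered observations'

# Classes are tried left to right: there is no describe.xts, so describe.zoo is used.
h_like <- structure(1:3, class = c('xts', 'zoo', 'returns'))
result_h <- describe(h_like)

# A plain number has no matching method and falls through to describe.default.
result_num <- describe(42)
```

Here result_h comes from describe.zoo and result_num from describe.default, which mirrors the naming rule given earlier.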

S4 Classes and Dispatching

While S3 is simple yet powerful, it doesn’t offer much in the way of programmer safety. Since the class attribute can be changed at will, it’s easy to break the convention and consequently other people’s code. The S4 system attempts to formalize object-oriented programming. It introduces constructors, type safety, inheritance and other features typically associated with object-oriented programming languages.

Classes are defined using the setClass and setClassUnion functions.

setClassUnion('XtsNull', c('xts','NULL'))
setClass('Equity',
  representation(ticker='character', returns='XtsNull'),
  prototype=list(ticker='', returns=NULL))
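
To illustrate what type safety means in practice, here is a small self-contained sketch. The class Asset and its slots are hypothetical and exist only for this example; the point is that S4 checks slot types both at construction and on assignment.

```r
library(methods)

# Hypothetical class with typed slots (illustrative only).
setClass('Asset', representation(ticker = 'character', weight = 'numeric'))

# A well-typed instance is constructed normally.
a <- new('Asset', ticker = 'XOM', weight = 0.25)

# Supplying the wrong type for a slot is rejected when the object is created.
bad_new <- tryCatch(
  new('Asset', ticker = 'XOM', weight = 'heavy'),
  error = function(e) 'rejected'
)

# Assigning the wrong type to a slot of an existing object is rejected as well.
bad_assign <- tryCatch({
  a@weight <- 'heavy'
  'accepted'
}, error = function(e) 'rejected')
```

Contrast this with S3, where the class attribute can be overwritten with any string and nothing complains.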

Methods are then attached to the class using the setGeneric and setMethod functions.

setGeneric('beta', function(equity, market, ...) standardGeneric('beta'))
setMethod('beta', c('Equity','Equity'),
  function(equity, market) {
    cov(equity@returns, market@returns) / var(market@returns)
  })

Instances are created with the new function.

> xom <- new('Equity', ticker='XOM', returns=h$XOM)
> mkt <- new('Equity', ticker='^GSPC', returns=h[,'^GSPC'])
> beta(xom,mkt)
        ^GSPC
XOM 0.8693016
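
The session above depends on the xts object h. For readers who want something they can paste and run, here is a self-contained variant under a few assumptions of my own: returns are stored as plain numeric vectors in a hypothetical class SimpleEquity, the data is simulated, and the generic is named assetBeta so that base::beta is not masked.

```r
library(methods)

# Hypothetical stand-in for Equity that stores returns as a plain numeric vector.
setClass('SimpleEquity',
  representation(ticker = 'character', returns = 'numeric'))

# Named assetBeta (not beta) so that base::beta is left alone.
setGeneric('assetBeta',
  function(equity, market, ...) standardGeneric('assetBeta'))
setMethod('assetBeta', c('SimpleEquity', 'SimpleEquity'),
  function(equity, market, ...) {
    cov(equity@returns, market@returns) / var(market@returns)
  })

# Simulated data: the asset moves roughly 0.8 for 1 with the market, plus noise.
set.seed(1)
mkt_r <- rnorm(250, mean = 0, sd = 0.01)
xom_r <- 0.8 * mkt_r + rnorm(250, mean = 0, sd = 0.005)

mkt <- new('SimpleEquity', ticker = 'MKT', returns = mkt_r)
xom <- new('SimpleEquity', ticker = 'XOM', returns = xom_r)
b <- assetBeta(xom, mkt)

# The same number is the slope of a regression of asset returns on market returns.
slope <- unname(coef(lm(xom_r ~ mkt_r))[2])
```

The estimate b should land near the 0.8 used to generate the data, and it agrees with the regression slope because both are the covariance divided by the market variance.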

Clearly one cost of the S4 system is the added overhead in programming. It is not so easy to transition from exploratory programming to formal applications because S4 demands a lot of structure from the beginning.

There are also now Reference Classes, which are like S4 classes except that objects are mutable, creating an even stronger object-oriented paradigm within R.
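
A minimal sketch of that mutability, using setRefClass from the methods package. The class Position, its fields and the trade method are hypothetical names chosen for illustration.

```r
library(methods)

# Hypothetical reference class: a position whose quantity is updated in place.
Position <- setRefClass('Position',
  fields = list(ticker = 'character', quantity = 'numeric'),
  methods = list(
    trade = function(n) {
      # <<- modifies the field of this object rather than a local copy.
      quantity <<- quantity + n
      invisible(.self)
    }
  ))

p <- Position$new(ticker = 'XOM', quantity = 100)
q <- p          # q refers to the same object, not a copy
p$trade(50)     # the change is visible through both p and q
```

With ordinary R values, q would have been an independent copy and would still hold 100 after the trade.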

Functional Dispatching

While much emphasis has been on object-oriented programming in R, other programming paradigms are equally valid. Functional programming has become popular once again, and R is particularly suited for this programming style.

Lambda.r

(Note this section is updated to reflect the latest generation of my functional programming package. Lambda.r replaces the older futile.paradigm.)

R has its roots in both S and Scheme. Many of the improvements to S (e.g. lexical scoping) are directly attributed to Scheme, which is a functional language derived from LISP. The lambda.r approach borrows additional concepts from the functional world so programs can be structured functionally*. This package attempts to return R to an environment that is conducive to iterative development that leads to structured programs. In fact, this is one of John Chambers’ original goals for the S language [1]. Lambda.r introduces syntax to write multipart functions reminiscent of Erlang or Haskell.

Functions in lambda.r are defined as multipart definitions. The advantage of this approach is that data manipulation is kept separate from application logic. The drawback is that it can be more verbose. Multipart functions are defined as separate clauses, each with corresponding guard statements or type constraints. Guards define the conditions for executing a particular implementation, while type constraints restrict the function to specific input types. Here is the beta implementation again, using a type constraint:

beta(equity, market) %::% Equity : Equity : numeric
beta(equity, market) %as%
{
  cov(equity$returns, market$returns) / var(market$returns)
}

A guarded function clause is introduced with the %when% operator, which supports multiple conditional expressions. Each expression must evaluate to TRUE for the function to execute. The actual function definition is then specified by the %as% operator. Supporting additional signatures is as easy as adding another function clause. In this example we use only a guard, to show a different way of defining the function.

beta(portfolio, market) %when% {
  portfolio %hasa% returns
  market %isa% Equity
} %as% {
  cov(portfolio$returns, market) / var(market)
}
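
For readers without the package installed, the idea behind guarded clauses can be imitated in plain base R. The sketch below is only an analogy of my own: it is not how lambda.r is implemented, and the names clauses and sign_label are made up for the example.

```r
# Each clause pairs a guard (a predicate on the argument) with a body.
# This mimics the idea of guarded clauses; it is not lambda.r's implementation.
clauses <- list(
  list(guard = function(x) is.numeric(x) && x < 0,
       body  = function(x) 'negative'),
  list(guard = function(x) is.numeric(x) && x == 0,
       body  = function(x) 'zero'),
  list(guard = function(x) is.numeric(x),
       body  = function(x) 'positive')
)

# Hypothetical dispatcher: run the first clause whose guard evaluates to TRUE.
sign_label <- function(x) {
  for (clause in clauses) {
    if (isTRUE(clause$guard(x))) return(clause$body(x))
  }
  stop('No matching clause for the supplied argument')
}
```

Clauses are tried in the order they were defined, so the most specific guards go first and the catch-all goes last. An argument that satisfies no guard, such as a string, raises an error instead of silently running the wrong code.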

* Note that I’m the author of lambda.r and futile.paradigm, so the discussion is slightly biased.

A Type System

Lambda.r offers its own type system. Types are simply data structures tagged as a particular data type. We avoid the word ‘class’ to prevent confusion with the legacy OOP models. A type is instantiated by calling its type constructor, which is defined like any other function in lambda.r:

Bond(coupon=0.02, tenor=10) %as% {
  list(coupon=coupon, tenor=tenor)
}

Creating an instance of the type simply means calling the constructor:

> bond <- Bond()
> bond$coupon
[1] 0.02

In the above examples, the astute reader will wonder how types such as Bond and Equity can be represented as raw types, that is, as unquoted symbols. S3 and S4 class names are not native to the language, so they must be wrapped in quotes and represented as strings. Lambda.r provides syntactic sugar to allow the use of raw types. To use this feature, the types must be written in PascalCase; otherwise they, too, must be enclosed in quotes.

Attributes

When working with data it is convenient to attach meta-data to an object. For example with a matrix that represents equity return correlations, it is useful to add attributes that indicate the number of observations or periodicity of the input set. This can be done easily with attributes in lambda.r. In the beta example above we might want to retain the date of the calculation. The ‘@’ symbol is used to access attributes and can be used in any lambda.r definition (both type constructors and functions).

beta(portfolio, market) %when% {
  portfolio %hasa% returns
  market %isa% Equity
} %as% {
  b <- cov(portfolio$returns, market) / var(market)
  b@date <- portfolio$date
  b
}
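
The same idea can be sketched in base R with attr(). I am assuming here that the ‘@’ syntax corresponds to ordinary R attributes; the simulated returns and the date are illustrative values only.

```r
# Simulated returns (illustrative values only).
set.seed(42)
market    <- rnorm(100, sd = 0.01)
portfolio <- 0.5 * market + rnorm(100, sd = 0.005)

# Compute beta, then attach the calculation date as metadata.
b <- cov(portfolio, market) / var(market)
attr(b, 'date') <- as.Date('2012-01-31')   # hypothetical date

# The attribute travels with the value and can be read back later.
calc_date <- attr(b, 'date')

# as.numeric() drops the attributes and leaves the bare number.
value_only <- as.numeric(b)
```

The value still behaves like a number in arithmetic, so downstream code that ignores the metadata is unaffected.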

Ultimately the choice of programming style depends on the author of the software and the domain in use. In finance many concepts are directly related to mathematics, which itself is functional, so translating these ideas to code is much simpler than in object-oriented contexts [2].

References

[1] J. Chambers. Evolution of the S Language. In Proceedings of the 20th Symposium on the Interface. The Interface Foundation of North America, 1996.
[2] B. Rowe. A Beautiful Paradigm: Functional Programming in Finance. R/Finance 2011, 2011.
