This is a continuation of the R workshop I’m teaching at the Baruch MFE program. This section discusses the programming model of R in a slightly biased way. The full contents are below.

Contents

PART I: PRELIMINARIES

PART II: STATISTICS

PART III: STRUCTURING CODE

Function Definition and Evaluation

Defining a function is done via assignment, similar to any other variable.

f <- function(x) 3 * x + 2

This function can be executed as f(5). Since R is vectorized, a vector can be passed to the function as is, and the result contains one value for each element of the vector.

> f(-5:5)
 [1] -13 -10  -7  -4  -1   2   5   8  11  14  17

R is a dynamically typed language, so a function accepts any argument for which the operations in its body are defined. We'll see in the third part how this is combined with the dispatching systems to implement polymorphism (as well as class systems).

Note that in lambda.r, it is possible to define multipart functions that eschew direct assignment in favor of a declarative notation. This is covered in my introduction to lambda.r.

Named Arguments

In the example above, the argument was passed into the function as a positional argument. Named arguments are also supported, which allows you to specify arguments in any order you choose, provided the names match the arguments in the function definition.

f <- function(x,y) (x-5)^2 + (y+2)^2
f(y=3,x=4)
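As a minimal sketch of how the two calling styles relate, the calls below all bind x = 4 and y = 3 and so return the same value. The variable names r1 through r4 are only for illustration.

```r
f <- function(x, y) (x - 5)^2 + (y + 2)^2

r1 <- f(4, 3)          # purely positional
r2 <- f(y = 3, x = 4)  # purely named, order does not matter
r3 <- f(y = 3, 4)      # named arguments are matched first, then 4 fills x

# Swapping the positional arguments binds x = 3 and y = 4 instead.
r4 <- f(3, 4)

print(c(r1, r2, r3, r4))
```

The first three results are 26, while the last is 40 because the arguments are bound to different names.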

Optional Arguments

Not all arguments need to be specified in a function call. When defining a function, any argument can provide a default value. When calling a function, any argument not explicitly passed into the function will then be populated by the default value.

f <- function(x,y=3) (x-5)^2 + (y+2)^2
f(4)
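A short sketch of how defaults behave in practice. The first two calls use the function above; window_mean is a hypothetical helper, included only to show that a default may be written in terms of another argument.

```r
f <- function(x, y = 3) (x - 5)^2 + (y + 2)^2

d1 <- f(4)         # y falls back to its default of 3
d2 <- f(4, y = 0)  # the default is overridden

# Hypothetical helper: by default, average over the whole vector;
# otherwise average over the last n elements only.
window_mean <- function(x, n = length(x)) mean(tail(x, n))

w1 <- window_mean(1:10)     # n defaults to 10
w2 <- window_mean(1:10, 3)  # mean of 8, 9, 10

print(c(d1, d2, w1, w2))
```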

The Ellipsis Argument

Sometimes a collection of arguments needs to be passed on to another function. This happens frequently with functions that call plot functions, though it's useful in numerous situations. Essentially, any unmatched arguments populate the ellipsis argument, which can be passed along to another function as is.


f <- function(x, ...) plot(x, ...)

f(rnorm(100), main="100 Random Values")
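The same forwarding works with any function, not just plot. Here is a minimal non-graphical sketch, where center is a hypothetical wrapper around mean. Neither na.rm nor trim is an argument of center itself, so both travel through the ellipsis to mean.

```r
# Hypothetical wrapper: forwards any extra arguments to mean().
center <- function(x, ...) mean(x, ...)

v <- c(1, 2, 3, NA, 100)

r_default <- center(v)                              # mean() with its defaults
r_clean   <- center(v, na.rm = TRUE)                # drops the NA first
r_trim    <- center(v, na.rm = TRUE, trim = 0.25)   # also trims the extremes

print(c(r_default, r_clean, r_trim))
```

The first call returns NA because mean does not remove missing values by default; the other two return 26.5 and 2.5.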

The ellipsis argument can be manipulated in other ways, but that is out of scope for this discussion.

First Class Functions

All functions are first class in R, which means you can pass them around like any other variable. This property is used throughout R, where numerous higher order functions are used to operate on data.

As in many languages, a function body consisting of a single statement does not require braces, while a multi-statement body must be enclosed in a block. Unlike many C variants, R does not require an explicit return statement at the end of the function definition. The value of the last evaluated expression is returned by the function.
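A small sketch of these points. The names twice, describe, and ops are made up for illustration.

```r
# A function that takes another function as an argument.
twice <- function(g, x) g(g(x))
t1 <- twice(sqrt, 16)   # sqrt(sqrt(16))

# A multi-statement body needs braces; the last expression is the return value.
describe <- function(x) {
  m <- mean(x)
  s <- sd(x)
  c(mean = m, sd = s)
}
d <- describe(1:5)

# Functions can be stored in data structures like any other value.
ops <- list(double = function(x) 2 * x, square = function(x) x^2)
s1 <- ops$square(3)

print(t1)
print(d)
print(s1)
```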

Higher Order Functions

When working with data structures it is useful to perform actions against each column (or row) of data. In other languages this would be accomplished using a loop, while in R a higher order function is employed. The most basic of these is apply. This function is similar to the common higher order function map but operates on an array or matrix. Its second argument selects the dimension to iterate over: 1 for rows and 2 for columns.

> h <- getPortfolioReturns(c('AAPL','XOM','KO','F','GS'), 100)
> apply(h, 2, sd)
      AAPL        XOM         KO          F         GS
0.01719355 0.01528400 0.01050431 0.02480413 0.03114184
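Since getPortfolioReturns is not defined in this section, here is a self-contained sketch using a toy matrix as a stand-in for h. The values are made up; the point is only how the second argument picks columns or rows.

```r
# Toy stand-in for h: 3 observations (rows) of 2 assets (columns).
# matrix() fills column by column, so A = 1, 2, 3 and B = 10, 20, 60.
m <- matrix(c(1, 2, 3,
              10, 20, 60),
            ncol = 2,
            dimnames = list(NULL, c("A", "B")))

col_means <- apply(m, 2, mean)  # one result per column
row_sums  <- apply(m, 1, sum)   # one result per row

print(col_means)
print(row_sums)
```

The column means are 2 and 30, and the row sums are 11, 22, and 63.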

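When working with lists, lapply is typically the variant to use. Other variants include sapply (which simplifies the result to a vector or matrix where possible), mapply (a multivariate sapply that iterates over several arguments in parallel), and tapply (which applies a function to groups of values defined by a factor). In certain cases, do.call can be used to execute a function, passing its arguments as a list.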
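A compact sketch of each variant on toy data. All input values here are invented for illustration.

```r
xs <- list(a = 1:3, b = 4:6)

as_list  <- lapply(xs, sum)   # returns a list with elements a and b
as_vec   <- sapply(xs, sum)   # same values, simplified to a named vector

# mapply walks over several vectors in parallel.
pairwise <- mapply(function(x, y) x + y, 1:3, 4:6)

# tapply groups the values by the labels, then applies the function per group.
by_group <- tapply(c(1, 2, 3, 4), c("u", "d", "u", "d"), sum)

# do.call supplies the arguments, including named ones, as a list.
joined <- do.call(paste, list("a", "b", sep = "-"))

print(as_vec)
print(pairwise)
print(by_group)
print(joined)
```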

Common higher order functions like fold (reduce) and filter are built in as Reduce and Filter, alongside Map, and there are packages that provide further variants.

Exercise: Use a higher order function to separate up days from down days for each asset in h above.

Lambda Expressions

In the example above, sd was used as a function reference. If the default behavior of sd is not desired, we can construct an anonymous function that defines custom behavior.

> apply(h, 2, function(x) sd(x, na.rm=TRUE))

Anonymous functions like these are used throughout R. Note that lambda expressions work best when they are short. Longer definitions are easier to read as named functions.
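To see the difference the anonymous function makes, here is a sketch on a toy matrix with one missing value (the numbers are made up). It also shows that apply forwards extra arguments to the function, which is an alternative to the lambda in simple cases like this one.

```r
m <- cbind(A = c(0.01, NA, -0.02),
           B = c(0.03, 0.00, -0.01))

plain       <- apply(m, 2, sd)                                # column A yields NA
with_lambda <- apply(m, 2, function(x) sd(x, na.rm = TRUE))   # NA is dropped

# Extra arguments to apply are passed on to sd, giving the same result.
passthrough <- apply(m, 2, sd, na.rm = TRUE)

print(plain)
print(with_lambda)
```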

Closures

A closure is a function with an associated environment. Closures can be constructed easily in R. An important consideration is that variables referenced from the enclosing environment are effectively read-only: an ordinary assignment with <- creates a new local variable instead of changing the enclosing one. To change their value, the special assignment operator <<- must be used, which searches through parent environments until a matching variable is found.

counter <- function(start=0)
{
  x <- start
  function() { x <<- x + 1; x }
}

The above function can be used as follows:

> f <- counter(5)
> f()
[1] 6
> f()
[1] 7
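The following sketch contrasts the special assignment operator with ordinary assignment, and shows that each call to counter gets its own environment. stuck_counter is a hypothetical variant written only to show what happens without <<-.

```r
counter <- function(start = 0) {
  x <- start
  function() { x <<- x + 1; x }
}

# Variant using <- : each call creates a fresh local x,
# so the x captured from stuck_counter never changes.
stuck_counter <- function(start = 0) {
  x <- start
  function() { x <- x + 1; x }
}

a <- counter(5)
b <- counter(100)
a1 <- a()   # 6
a2 <- a()   # 7
b1 <- b()   # 101, b has its own x

s <- stuck_counter(5)
s1 <- s()   # 6
s2 <- s()   # still 6

print(c(a1, a2, b1, s1, s2))
```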

Under what situations are closures useful? Any time a function reference is passed as an argument to a function, a function signature is implicitly defined. If your function does not have the same signature, it is necessary to wrap your function in another function that matches said signature. While a direct call can be made, sometimes it's better to delay evaluation, in which case returning a function reference is the best choice.
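Both uses can be sketched as follows. The names scale_by, make_scaler, and deferred_mean are hypothetical. In the first case a two-argument function is adapted to the one-argument signature that sapply expects; in the second a computation is packaged up and run later.

```r
# A two-argument function that does not fit a one-argument signature.
scale_by <- function(x, k) x * k

# The closure fixes k and exposes a function of x alone.
make_scaler <- function(k) function(x) scale_by(x, k)

to_percent <- make_scaler(100)
pct <- sapply(c(0.01, 0.025), to_percent)

# Delayed evaluation: nothing is computed until job() is called.
deferred_mean <- function(x) function() mean(x)
job <- deferred_mean(c(1, 2, 3))
result <- job()

print(pct)
print(result)
```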

Exercise: Use a closure to generate a parameterless function that caps returns to some threshold x.
