R for Quants, Part II (A)

This is the second part in a three part series on teaching R to MFE students at CUNY Baruch. The focus of this lesson is on basic statistical methods in R.

PART I: PRELIMINARIES

PART II: STATISTICS

A. Distributions, Sampling, and Regression
B. Optimization and Linear Programming

PART III: STRUCTURING CODE

A. Dispatching Systems
B. Real World Development

Distributions

As a statistical programming language, R provides many functions for describing distributions. Standard distributions include normal, binomial, uniform, chi-squared, etc. Others are available from add-on packages.

Distribution functions follow a standard pattern in R, which makes it easy to use any of the available variants. The stats package, which is automatically loaded, provides functions for many common distributions. To see what is available, try ?Distributions.

Prefix	Description	Example
d*	density function (PDF)	`dnorm`
p*	distribution function (CDF)	`pnorm`
q*	quantile function (inverse CDF)	`qnorm`
r*	random generation	`rnorm`

Let’s look at a plot for each of these functions. Notice the usage of par and plot to generate each chart. When adding multiple charts to a plot, special parameters must be set to configure the layout. When you are done, it’s good practice to reset the configuration by calling par again with the old par object (as returned by the function previously).

opar <- par(mfrow=c(2,2), mar=c(3,5,1,1)+0.1)
x <- seq(-5,5, by=0.1)
y <- seq(0,1, by=0.02)

plot(x, dnorm(x), t='l')
plot(x, pnorm(x), t='l')
plot(y, qnorm(y), t='l')
plot(x, rnorm(x))
par(opar)

The graphics capabilities in R are quite extensive [1] but can be quite cumbersome to work with. There are numerous packages that attempt to improve the way graphics are handled in R including ggplot, lattice [3].

Exercise: Write a function plot_dists(name) that generates the above plot for the given distribution.

Analyzing Distributions

It is easy to spot check whether a given set of data fits a distribution by using a QQ plot. The qqnorm and qqline functions use a normal distribution as a default. For other distributions, use qqplot to explicitly specify the x data series.

h <- getPortfolioReturns(c('AAPL','XOM','KO','F','GS'), 100)
qqplot(h$XOM)
qqline(h$XOM)

As expected, the tails do not fit a normal distribution. To align the theoretical quantiles with the sample quantiles, you have two approaches: normalize the sample data or use a different set of theoretical quantiles. [2]

z <- (h$XOM - mean(h$XOM)) / sd(as.vector(h$XOM))
qqnorm(z)
qqline(z)

Exercise: Generate a QQ plot using a better fit for the theoretical distribution. Hint: Try using rnorm.

Parameter Estimation

To fit data to a particular distribution, you can use any of the numerous optimization functions (optim, nlm, optimize, etc) to minimize your log likelihood function. Here is a contrived example of estimating the mean and variance of one of our stocks.

get_estimator <- function(x)
{
  n <- length(x)
  function(theta)
  {
    n/2*log(2*pi) + n/2*log(theta[2]) + 1/(2*theta[2]) * sum((x-theta[1])^2)
  }
}

Let’s execute the function and compare with the built-in functions.

> f <- get_estimator(h$XOM)
> optim(c(0,1), f)$par
[1] 0.0020516656 0.0002079123
> mean(h$XOM)
[1] 0.002052150
> sd(h$XOM)^2
         XOM
0.0002100090

Note the use of a closure in this example. Once we have the closure, it is easy to pass the closure to other optimization functions for experimentation. We will discuss optimization in more detail in the next section.

Sampling Data

R provides many methods for sampling data from an existing data set, depending on what you want to accomplish. The simplest is sample, which selects entries from the specified vector. By default, the probability of selecting a specific element is equally-weighted over the elements. For example, this is appropriate when drawing cards from a deck.

> cards <- sapply(c('H','S','C','D'), function(x) paste(x, 1:13, sep=''))
> sample(cards, 5)
[1] "C12" "S6" "H10" "S12" "D7"

To bootstrap a data set, set replace=TRUE, which will reuse all elements for each sample selection. This option is appropriate when flipping a coin multiple times.

> sample(c('H','T'), 10, replace=TRUE)
 [1] "T" "H" "H" "H" "H" "T" "T" "T" "H" "H"

For more sophisticated bootstrapping, see the boot package [4].

Covariance and Correlation

Not surprisingly, there are built-in functions for calculating the covariance (cov) and correlation (cor) between data. You can even convert between them with cov2cor.

> cov(h$XOM, h$F)

Exercise: For portfolio h, write a function to calculate the covariance matrix without using cov.

Regression

Linear regression is performed by using the lm function, which fits data to linear models. We can use it to estimate the parameters of the CAPM for our toy portfolio. Note we assume the risk free rate is 0 to simplify the example.

> ws <- t(sapply(1:100, function(x) rdirichlet(rep(1,5)) ))
> p <- apply(h * ws, 1, sum)
> m <- getPortfolioReturns('^GSPC',100)
> m$portfolio <- p
> colnames(m) <- c('market','portfolio')
> out <- lm(portfolio ~ market, m)
> summary(out)

In this example we are generating random weights for each day to calculate our portfolio returns. Afterward, we download our returns for our market and run the regression.

> plot(as.vector(m$market), as.vector(m$portfolio))
> abline(out)

A good discussion of doing regression analysis in R is in [5]. Other forms of regression analysis are supported as well but is out of scope for this document.

References

[1] An Introduction to R Graphics
[2] Fitting Distributions With R
[3] Lattice – Multivariate Data Visualization with R – Figures and Code
[4] Bootstrapping Regression Models
[5] Practical Regression and Anova using R

4 thoughts on “R for Quants, Part II (A)”

swtzang said:

February 17, 2012 at 7:27 pm

Regarding your code:

ws <- t(sapply(1:100, function(x) rdirichlet(rep(1,5)) ))

The returned message is

Error in length(alpha) : 'alpha' is missing

I think there may be some missing value to generate random draws from this distribution. Any suggestion on this? Thanks for your help.

LikeLike

- Brian Lee Yung Rowe said:
  
  February 18, 2012 at 10:58 am
  
  Hello, I did forget to mention that you need to have the bayesm package installed and loaded to run that:
  install.packages(‘bayesm’)
  library(bayesm)
  
  Curiously, even without the loading the library, I wasn’t able to get the error message you’re seeing. What version of R are you running? I’m running 2.14.0.
  
  LikeLike
  
  - swtzang said:
    
    February 20, 2012 at 9:35 am
    
    I am still have this problem after installing the package you mentioned. I am running 2.14.1.
    
    LikeLike
  - Brian Lee Yung Rowe said:
    
    February 20, 2012 at 11:39 am
    
    Can you provide the console output so I can see exactly the commands you are using? Seems like there’s a disconnect somewhere.
    
    LikeLike

Cartesian Faith

~ Insights of a modern alchemist

R for Quants, Part II (A)

Contents

PART I: PRELIMINARIES

PART II: STATISTICS

PART III: STRUCTURING CODE

Distributions

Analyzing Distributions

Parameter Estimation

Sampling Data

Covariance and Correlation

Regression

References

4 thoughts on “R for Quants, Part II (A)”

Leave a comment Cancel reply

Contents

PART I: PRELIMINARIES

PART II: STATISTICS

PART III: STRUCTURING CODE

Distributions

Analyzing Distributions

Parameter Estimation

Sampling Data

Covariance and Correlation

Regression

References

Share this:

Related

4 thoughts on “R for Quants, Part II (A)”

Leave a comment Cancel reply