# Visualizing systems of linear equations and linear transformations

This is a lecture post for my students in the CUNY MS Data Analytics program. In this series of lectures I discuss mathematical concepts from different perspectives. The goal is to ask questions and challenge standard ways of thinking about what are generally considered basic concepts. Consequently these lectures will not always be as rigorous as they could be.

### Solution sets for systems of linear equations

Let’s look first at a simple system of linear equations with a single solution. This system is from Example 1.10.1 of Kuttler.

$\begin{array}{rrrr} x ~+ & 3 y ~+ & 6 z = & 25 \\ 2x ~+ & 7 y ~+ & 14 z = & 58 \\ & 2 y ~+ & 5 z = & 19 \end{array}$

What is this system telling us? If there is a unique solution, then we know that there is exactly one set of variables that satisfies all the equations. In addition, we know that all of the variables are independent. For our system above the solution is $\left\{ 1, 2, 3\right\}$, which can be graphed using the R code below. Note that each equation is rewritten in terms of z.

library(scatterplot3d)

z1 <- function(x,y) 25/6 - 1/6 * x - 1/2 * y
z2 <- function(x,y) 58/14 - 1/7 * x - 1/2 * y
z3 <- function(x,y) 19/5 - 2/5 * y

domain <- seq(-4,4, .5)
p <- expand.grid(x=domain, y=domain, KEEP.OUT.ATTRS = FALSE)
sp <- scatterplot3d(p$y, p$x, z1(p$x,p$y), pch=20, cex.symbols=.2, color='brown')
sp$points3d(p$y, p$x, z2(p$x,p$y), pch=20, cex=.2, col='red')
sp$points3d(p$y, p$x, z3(p$x,p$y), pch=20, cex=.2, col='orange')
sp$points3d(2,1,3, pch=20)


What happens if we use only two of the three equations? This turns out to be quite interesting, as the solution set becomes a line. The fact that the solution set is a line also tells us that the variables are no longer independent.

$\begin{array}{rrrr} x ~+ & 3 y ~+ & 6 z = & 25 \\ & 2 y ~+ & 5 z = & 19 \end{array}$

Let’s redraw our graph with only these two functions, along with the new solution set.

sp <- scatterplot3d(p$y, p$x, z1(p$x,p$y), pch=20, cex.symbols=.2, color='brown')
sp$points3d(p$y, p$x, z3(p$x,p$y), pch=20, cex=.2, col='orange')

sol <- function(z) data.frame(y=19/2 - 5/2 * z, x=-7/2 + 3/2 * z, z=z)
sp$points3d(sol(seq(-4,7,0.5)), type='l')
sp$points3d(2,1,3, pch=20)


As a bonus, I drew the original solution when we had three functions. Thankfully this point is contained within the larger solution set. Going in the opposite direction, if we start with a single equation, our solution set is a plane. Adding a second equation to the system yields a line, and a third equation yields a point. If we consider these equations as constraints in an optimization problem, it is easy to see how additional constraints can reduce the solution set.

Notice that we arrived at this solution set by using only two of the three equations. Sometimes a system with three equations can result in a non-unique solution because the equations are not independent. In Example 1.10.4 of Kuttler, the following system of linear equations is listed.
$\begin{array}{rrrr} 3x ~- & y ~- & 5z = & 9 \\ & y ~- & 10z = & 0\\ -2x ~+ & y ~ & = & -6 \end{array}$

Solving this system leads to the same situation, where the variables are dependent.
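
One quick way to see that dependence numerically is to check the rank of the coefficient matrix. The snippet below is not from Kuttler, just a sanity check with base R; it shows that only two of the three rows are independent.

coefs <- matrix(c(3,-1,-5, 0,1,-10, -2,1,0), nrow=3, byrow=TRUE)
qr(coefs)$rank   # 2, so only two of the three equations are independent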

#### Exercises

• Use the sol function to verify that our unique solution {1,2,3} is a member of the line
• Graph the solution set for equations 1 and 2
• Write the function for the solution set of 1.10.4 and graph its result

### Linear transformations

In the previous section we looked at systems of linear equations. Each equation can be considered a function in two variables. Since these are linear equations, our solutions are (hyper)planes. What if, instead of solving for a specific solution that satisfies each function, we use a matrix to represent a function that changes a set of values? These are linear transformations, and they are quite common. Anyone who has created a score or measure from a weighted set of variables has applied a linear transformation. The definition of a linear transformation (2.19 of Kuttler) is simply a restatement of the properties of linearity for vectors instead of scalars. This is easier to understand by rewriting $A$ as a function $f$. Then we get $f(a\vec{u} + b\vec{v}) = a f(\vec{u}) + b f(\vec{v})$.
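
To make the property concrete, here is a small numerical check. The particular matrix and vectors below are arbitrary choices; any would do.

M <- matrix(1:6, nrow=2)
x1 <- c(1, 2, 3); x2 <- c(-1, 0, 4)
a <- 2; b <- -3
all.equal(as.vector(M %*% (a*x1 + b*x2)),
          as.vector(a * (M %*% x1) + b * (M %*% x2)))   # TRUE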

The simplest linear transformations come from matrices consisting of a single row vector. As an example, suppose we want to model which Twitter users are the most internet savvy based on their usage stats. We define this as $savvy = a_1 \cdot followers + a_2 \cdot following + a_3 \cdot favorites$. Our transformation matrix is then $A = \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix}$. Given a user $u$, represented as a column vector of these three counts, their internet savviness is just $A u$. Knowing what we know about matrices, we can construct a matrix with all users and calculate their savviness measure in a single operation. We’ll use a data file of tweet information that looks like

> head(z)
created_at
1 Mon Dec 02 23:35:49 +0000 2013
2 Mon Dec 02 23:35:48 +0000 2013
3 Mon Dec 02 23:35:48 +0000 2013
4 Mon Dec 02 23:35:47 +0000 2013
5 Mon Dec 02 23:35:46 +0000 2013
6 Mon Dec 02 23:35:45 +0000 2013
text
1 RT @Citizenship4All: Fasters, families pray outside of @SpeakerBoehner's office: May Congress realize "they are servants of the people." ht&
2 RT @msnbc: What is the most important issue Congress should act on before they head home on December 13? Take our poll: http://t.co/VIHjzbc&
3 RT @EWTN: Argentine congress recommends Pope Francis for Nobel Peace Prize: Buenos Aires, Argentina, Dec 2, 2013 / 05:55... http://t.co/2Ts&
4 RT @Scalplock: ACTION: Congress Likely to Vote on Gun Control Today. Dec 2.  http://t.co/78JLTdcPMb Schumer worked overtime during Thanksgi&
5                                          Americans Want Congress Members To Pee In Cups To Prove They're Not On Drugs http://t.co/LVSHu2b0iI
6     Vote for @politifact's lie of the year! I voted for @SenTedCruz's "Congress is exempt from #ObamaCare" lie here - http://t.co/r4rPv8Ns6W
retweet_count favorite_count user.screen_name user.followers_count
1            16              0   HutchissonMike                  617
2             1              0       srwrdm1221                   68
3             1              0     sistervpaul_                  616
4             1              0     anthonydiana                 1567
5             0              0          linmom1                    1
6             0              0  DigitalMaxToday                  972
user.friends_count                user.created_at user.favourites_count
1               1345 Tue Apr 23 14:29:36 +0000 2013                 11179
2                553 Sun Jul 21 11:58:49 +0000 2013                    14
3                643 Fri Sep 06 18:42:40 +0000 2013                  3495
4                968 Fri Jul 24 22:42:20 +0000 2009                  4136
5                  1 Tue Oct 13 02:36:08 +0000 2009                     0
6               1362 Wed Feb 22 00:32:24 +0000 2012                   605


(When I get a chance I will post the file so people can try out their own measures.)

z <- read.csv('~/congress.csv', as.is=TRUE)
u <- as.matrix(z[,c('user.followers_count','user.friends_count',
'user.favourites_count')])
A <- matrix(c(3,1,2), nrow=1)  # savviness weights as a 1 x 3 row vector
A %*% t(u)


Using this example as a starting point, what does it mean for $A$ to have multiple rows? Additional rows are merely additional aggregate measures. Consider a measure for receptiveness, which we define as the amount someone is willing to retweet someone else’s post. What variables might go into a measure like this? As a naive first try how about favorite/day, and friends/followers. To calculate averages on a daily basis we want to determine the age of an account and bind this to our matrix.

age <- Sys.Date() - as.Date(z$user.created_at, format='%a %b %d %H:%M:%S %z %Y')
u <- cbind(u, favorites=u[,'user.favourites_count'] / as.numeric(age))


Note that this gives us a different set of variables from the first measure. At first glance this may seem problematic, but since this is a simple linear combination all that needs to be done is to union the variables and set the irrelevant ones to 0. Hence we can update A to be A <- cbind(A, 0).
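
As a quick check that this bookkeeping behaves as expected, padding A with a zero column (continuing from the A and u defined above) leaves the savviness scores unchanged, since the new column of u is multiplied by 0.

A2 <- cbind(A, 0)
all.equal(A2 %*% t(u), A %*% t(u[, 1:3]))   # TRUE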

This lesson provides some perspectives on matrices and how to apply them to the analysis process. Obviously matrices and linear algebra have a wealth of applications, so these are not the only interpretations. The goal here is to provide you with tools for how to conceptualize their usage. As we progress through the course, we will explore other interpretations of matrices along with their application. One point to consider is how here we are explicitly building a function to transform data, whereas in the previous section we were trying to find a solution to a given set of functions. In a later lecture, we will compare and contrast this approach to a linear regression.

#### Exercises

• Update the savviness measure to normalize friends and followers based on age of account
• Create the receptiveness measure and update the u and A matrices

# From area under the curve to the fundamental theorem of calculus


This is a lecture post for my students in the CUNY MS Data Analytics program. In this series of lectures I discuss mathematical concepts from different perspectives. The goal is to ask questions and challenge standard ways of thinking about what are generally considered basic concepts. Consequently these lectures will not always be as rigorous as they could be.

This week let’s take a closer look at integration. People often describe integration as area under the curve. This is indeed true, yet I always found it a bit difficult to understand how you get from area under the curve to the Fundamental Theorem of Calculus. This theorem can be cast two different ways, and I’m referring to it as $\int_a^b f(x) dx = F(b) - F(a)$, where $F$ is the antiderivative of $f$.

I like starting with simple examples since it’s a lot easier to understand the behavior of something when you minimize the variables introduced. Hence, let’s start by looking at a line.

xs <- seq(-1,5, by=0.02)
f <- function(x) x - 1
plot(xs, f(xs), type='l', col='green')
abline(h=0, v=0, col='grey')


There’s nothing particularly remarkable here, so let’s change that. What happens if we add to this graph the cumulative Riemann sum of $f$ for the interval [-1,5]? In other words let’s graph $F(x) = \sum_{k=-1}^x f(k) \Delta x$.

lines(xs, cumsum(f(xs)*.02), col='blue')


Well, this looks kind of like a parabola, and in the limit it is one, but what’s the intuition behind it? The simplest thing to do is to see what the cumulative sum of $f(x) \Delta x$ actually is.

head(cumsum(f(xs)*.02), 20)
[1] -0.0400 -0.0796 -0.1188 -0.1576 -0.1960 -0.2340 -0.2716 -0.3088 -0.3456
[10] -0.3820 -0.4180 -0.4536 -0.4888 -0.5236 -0.5580 -0.5920 -0.6256 -0.6588
[19] -0.6916 -0.7240


This is telling us that the area of a thin strip is rather small. It’s also telling us that, since the slope of $f$ is positive, a little bit less negative area is being added each time. Eventually something interesting happens as $x \rightarrow 1$: the value of the original function $f$ gets very small, reaching 0 at $x = 1$. Consequently, there isn’t much contribution to the area from these parts of the line. At $x = 1$ the slope of the cumulative sum is exactly 0, which is where $f(x) = 0$. Once $x > 1$ the area contribution becomes positive, and consequently the value of the Riemann sum begins to increase.

The second form of the Fundamental Theorem of Calculus is similar to our construction of the Riemann sum. It states that $\int_a^x f(t) dt = F(x) - F(a)$. This alternate construction gives the integral as a function of $x$ such that the derivative yields $\frac{d}{dx}(F(x) - F(a)) = F'(x) = f(x)$. The graph above confirms this since the slope $F'(1) = 0$ and this is exactly the value of $f(1)$.
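
We can see this directly in the discrete approximation. Reusing f, xs, and the 0.02 spacing from above, differencing the cumulative sum and dividing by $\Delta x$ simply recovers $f$ at the grid points.

Fx <- cumsum(f(xs) * .02)
head(diff(Fx) / .02)   # approximately -1.98 -1.96 -1.94 ...
head(f(xs[-1]))        # the same values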

Let’s explore the relationship between this version of the Fundamental Theorem of Calculus and the Riemann sum further. Both formulations describe a function in terms of a starting point up to some value $x$. Consider the interval [-1,1], where $a=-1$ and $x=1$. At $x = a$ the accumulated value $F(a)$ is -0.04. This initial value is always going to be close to 0, since we take the limit of $\Delta x$ to 0. Getting back to our function $F$, at $x=1$ the total area is -2. We can verify this with some geometry, since this is a triangle with area $A = \frac{(-2 - 0)(1 - (-1))}{2} = -2$. In R this looks like

F <- function(x) sum(f(x) * .02)
> F(xs[xs <= 1])
[1] -2.02


Hence it seems reasonable that the integral for this special case is $F(1)$, or $\sum_{-1}^1 f(x) \Delta x = F(1) = -2$. As shown above, the value computed in R is -2.02. I’ll leave it as an exercise to explain why this is so. Another useful point to look at is $x = 3$. Visually we can see that the area from -1 to 3 is represented by two congruent triangles with opposite sign, so the value must surely be 0.

> F(xs[xs <= 3])
[1] 1.665335e-16


Indeed, this value is close. We’ve successfully illustrated the relationship between area under the curve and the Fundamental Theorem of Calculus. However, this is the second version of the theorem and we started with the first. This second version relies on some fixed point $a$ where the accumulated area starts at 0, whereas the first version uses two arbitrary points. Remember that with the Riemann sum we need a starting point $a$, and the first strip of area will be close to 0 for small $\Delta x$. Suppose we want the value of $\int_1^3 f(x) dx$. By shifting the starting point to $a=1$, we can use the same technique so that F(3) gives us the right value.

 lines(xs[xs >= 1], cumsum(f(xs[xs >= 1])*.02), col='brown')


This has the effect of shifting the parabola up by 2, which is essentially $-F(1)$. Of course we don’t need to shift the starting point at all. Instead we can simply compute the difference of the two Riemann sums. This has the effect of cancelling out any fixed starting point and gives us the two arbitrary end points of the interval.

> F(xs[xs <= 3]) - F(xs[xs <= 1])
[1] 2.02


This gives us that $\sum_{x=a}^b f(x) \Delta x = F(b) - F(a)$. Taking the limit then gets us to the familiar $\int_a^b f(x) dx = F(b) - F(a)$.
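
As a final sanity check, we can compare the discrete result to an exact antiderivative, say $F(x) = \frac{x^2}{2} - x$ (any antiderivative works, since the constant cancels in the difference).

Fexact <- function(x) x^2/2 - x
Fexact(3) - Fexact(1)   # 2, compared with 2.02 from the sums above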

#### Exercises

1. Why is F(xs[xs <= 1]) = -2.02 and not -2?
2. What happens when you use an interval of 0.5 instead of 0.02?
3. Are there situations where the area of the first x value of the Riemann sum won’t approach 0?

# Curve sketching for calculus


This is a short post for my students in the CUNY MS Data Analytics program on sketching curves in R.

### Graphing functions

Suppose we want to find the derivative of $\frac{-4x^5 + 3x^2 - 5}{x^2}$. In addition to computing the derivative analytically, it might be interesting to graph this function to see what it looks like. When graphing a function, I like to generate a sequence of x values and then pass it to the function. Since R is vectorized, there is no need to write a loop. This is because for vectors (aka sequences) arithmetic operators work on an element-wise basis. In other words, they are equivalent to the higher order function map.

f <- function(x) (-4*x^5 + 3*x^2 - 5) / x^2
xs <- -10:10
plot(xs, f(xs), type='l')


This gives us the graph below. You may notice that there is a big gap between -1 and 1. Why is this? The short answer is that $f$ is undefined at 0. While this is correct, the slightly longer answer is that the spacing of the $x$ values is too big at integer intervals. Hence the gap is 2 units wide and is not representative of the actual function. This is important to remember, as incorrect scales can often lead to misleading results.

Let’s try again with a spacing of 0.1. What’s the best way to do this? If we want to use the syntactic sugar, then we need to scale the interval ourselves. For our example the scaling is easy. For the more general case, what is the best way to model the scaling? Getting back to the original discussion, here are two equivalent alternatives.

xs <- (-100:100) / 10
xs <- seq(-10, 10, by=0.1)


Note that in the first form the parentheses are not strictly required, since the colon operator binds more tightly than division in R, but they make the intended grouping explicit.

Working with this interval, we get a more precise representation of the function. However, I still have this uneasy feeling that I don’t really know what this function looks like near 0. Let’s “zoom” into 0 by increasing the resolution by another order of magnitude. At a spacing of 0.01, this function looks very different from what we started with.
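
Here is one way to produce that zoomed-in view. The window around 0 and the ylim below are my own choices, made so the spike near 0 doesn’t flatten the rest of the curve.

xs <- seq(-1, 1, by=0.01)
xs <- xs[xs != 0]   # f is undefined at 0, so drop that point
plot(xs, f(xs), type='l', ylim=c(-100, 10))
abline(h=0, v=0, col='grey')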

#### Exercises

• Write a function that takes an integer sequence and scales it to a given precision. For example, given the sequence -5:5, write a function s such that s(-5:5, 0.1) returns the sequence c(-5.0, -4.9, -4.8, …, 4.9, 5.0). Do not use the seq function in your answer.
• Reproduce the graph of f within the domain [-4, 4] and precision 0.2 using the function above to generate the x values.

### Composing the derivative

What do these graphs tell us about the derivative? It appears mostly well-behaved except when $x=0$. It’s straightforward to use the product rule to compute the derivative. Here we let $g(x) = -4x^5 + 3x^2 - 5$ and $h(x) = x^{-2}$. Why use the product rule instead of the quotient rule? It’s really a matter of style. Personally, it’s easier for me to remember fewer rules.

$\begin{array}{lcl} (g(x) h(x))' & = & g'(x) h(x) + h'(x) g(x) \\ & = & (-20x^4 + 6x) x^{-2} + (-2x^{-3}) (-4x^5 + 3x^2 - 5) \\ & = & -20x^2 + 6x^{-1} + 8x^2 - 6x^{-1} + 10x^{-3} \\ & = & -12x^2 + 10x^{-3} \end{array}$

Returning to the original motivation for this discussion, the question is whether these curves can shed any light on the behavior of the derivative for this function. Now that we’ve deconstructed $f = gh$, what do these two functions look like?

g <- function(x) -4*x^5 + 3*x^2 - 5
h <- function(x) x^-2

xs <- seq(-2, 2, by=0.02)
plot(xs, g(xs), type='l')
lines(xs, h(xs), col='blue')


As expected, $g$ looks like a classic odd-order polynomial while $h$ behaves according to a negative exponent. What might be surprising is that $h$ is positive $\forall x \in \Re$, except at $x = 0$ where it is undefined. The function $f$ goes to negative infinity at 0 because $g(0) = -5$ is negative, which is not obvious when first glancing at this graph.

What else does this graph tell us? It is useful to remember that the original function $f$ is the product of these two functions. A first observation is that around $|x| = 2$, $g$ begins to grow at a rate much faster than $h$ shrinks. Similarly, when $|x| < \frac{1}{2}$ it appears that $h$ grows very fast. Graphically then, it seems that the derivative of $g$ is going to be dominant when $|x| > 2$ and vice versa when $|x| < \frac{1}{2}$. We can’t really say much between $\frac{1}{2} \leq |x| \leq 2$.

Let’s write functions to represent the first derivative and overlay them as dotted lines onto the graph [1].

g1 <- function(x) -20*x^4 + 6*x
h1 <- function(x) -2 * x^-3

lines(xs, g1(xs), lty=3)
lines(xs, h1(xs), col='blue', lty=3)


Now things are getting interesting. It takes a bit more effort to picture what the derivative of f looks like given these four curves. From a graphical perspective the product rule tells us to sum the product of the dotted black line and the solid blue line with the product of the dotted blue line and the solid black line.

To make it easier, here are the two products (g’h in orange and h’g in brown) along with the sum, which of course is f’ (in black).

Decomposing a function into smaller functions can be a useful exercise when looking to assess the relative impact of the constituent functions. Working from the opposite direction, it can also help in function approximation. Usually it is easier to build up a complex function from smaller functions rather than starting with a complicated function. I will explore this idea in a future post.

#### Exercises

• Reproduce the last graph

### The derivative and constants

Here is another example of using graphs to help illuminate the behavior of functions. Let’s look at why a constant has a derivative of 0. Consider the function $f(x) = 2x^2 - 4x + 5$. The derivative of this function is $4x - 4$. What about the function $f(x) + 10$? Intuitively we wouldn’t expect the derivative to be any different. In fact, since the derivative is linear, the derivative is simply $f'(x) + 0$. We can apply this same logic to $f$ itself and deconstruct this function into $g(x) = 2x^2 - 4x$ and $h(x) = 5$. Hence, any constant added to a function has no effect on the derivative.
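
A quick numerical sketch makes this concrete. The finite-difference helper df below is just for illustration and approximates the derivative at a few points.

f  <- function(x) 2*x^2 - 4*x + 5
df <- function(fn, x, h=1e-6) (fn(x + h) - fn(x)) / h
x <- c(-2, 0, 3)
df(f, x)                       # approximately 4x - 4: -12 -4 8
df(function(x) f(x) + 10, x)   # the same values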

Graphically, it is easy to see that the derivative of $f$ is the same as that of $g$, since $f$ is simply $g$ plus the constant $h$. Why is the derivative of $h$ (a constant function in $x$) 0? A constant value is telling us that the function is independent of $x$. Consequently, any change in $x$ has no effect on the constant function. Therefore the derivative of the function with respect to $x$ should be 0 everywhere. This observation also helps illustrate partial derivatives in multivariate calculus and why certain terms become 0.

While vertical shifts have no effect on the derivative, horizontal shifts do. Why is this? Simply put, it is because a horizontal shift modifies each $x$ value, so this change is dependent on $x$. A student asked whether the derivative is constant if you account for the horizontal shift. How do we answer this? Let’s define a modified function $g(x,n) = 2(x-n)^2 - 4(x-n) + 5$. If we take the partial derivative of $g$ with respect to $x$ we get $g'(x,n) = 4(x-n) - 4$. Indeed, if we think about how the chain rule works, each instance of (x-n) will be preserved such that $g'(x, n) = f'(x-n)$.
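
Using the same finite-difference helper df and points x from the previous snippet, we can check this claim numerically.

g  <- function(x, n) 2*(x - n)^2 - 4*(x - n) + 5
fp <- function(x) 4*x - 4              # f'(x)
n <- 3
df(function(x) g(x, n), x)             # approximately f'(x - n)
fp(x - n)                              # -24 -16 -4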

### Notes

[1] See this handy reference for plot styles

# Slides from FP Days

These are the slides from my talk at FP Days 2013. The first part goes into more detail about why functional programming is a good match for data analysis. It brings together a number of concepts and ideas I’ve written about here, while also elaborating a bit on the functional nature of mathematics. The second part talks a bit about how my technique for detecting regime change in irregular time series works.

On a separate but related note, I have a backlog of libraries to push to CRAN. Among them is a new package, called kingsmen, that provides the functionality for the regime change detection. I hope to publish that in the near future, so watch this space.

FP Days 2013 – Brian Rowe

# Quick tip on controlling log output in tests


When running tests for a package, it’s important that the console output is unadulterated since test results are printed with formatting. My code is littered with log output using my futile.logger package [1]. Since arbitrary appenders are supported in futile.logger, a neat trick is to redirect output to a file when running the tests. This is as simple as adding the following lines to the top of a testthat or RUnit script:

library(futile.logger)
flog.appender(appender.file("unit_tests.log"))


Now when R CMD check runs, the console output will be clean and all the log output from your package will be directed to <package>.Rcheck/<package>/tests/unit_tests.log.

Similarly, if you want to modify the log threshold when running tests, throw in a change of threshold.

flog.threshold(DEBUG)


So you get all of this without needing to change any of the code in the package proper.

### Notes

[1] The futile.logger package is available on CRAN, while the latest version is on github. To install from github, run the following from your R shell.

library(devtools)
install.packages('futile.options')
install_github('lambda.r', 'zatonovo')
install_github('futile.logger', 'zatonovo')


# The Prospector business model and the #slowbrood


This is my response and follow-up to Ysette Guevara’s nice introductory piece on the idea of the “slow brood”. If you like these ideas, consider posting comments and your ideas using the Twitter hash tag #slowbrood.

Our world has become increasingly frenetic where attention span is measured in seconds. Business models have followed suit. Rather than fighting the trend, business leaders and investors have discovered that the short attention span culture is positive for business since it is consistent with maximizing profits. In this model ideas play second fiddle to market and team because great teams in large markets can continue to pivot until they find a profitable idea. This can happen in ever-shortening cycles as the size of pivots converges with attention span.

If all you care about is money then this is a great approach. In this market you have a cadre of bright-eyed and bushy-tailed labor with short attention spans that expect to hit gold immediately. If you aren’t a billionaire by thirty then you must not be on the fast track. Clearly it’s best to get started as quickly as possible. Witness the people leaving college for startups, not to mention the people encouraging students to leave. In an attention-deficit world, who has the time to toil away at impractical study for four years? Besides, all that matters anymore is getting accepted into an elite institution, not the experience at the institution. The premise appears to be that going to college is now an opportunity cost instead of an opportunity!

Providing an environment to incubate these sorts of ideas is critical. If the universities no longer provide this environment for developing these ideas, then we are left with government and private industry as the only alternatives. Even worse is that the talent pool capable of thinking big will have shrunk to a barely discernible size. We are creating a society of “skilled labor” that only knows how to make incremental improvements. The irony is that in the 80s Japanese business was criticized for only being capable of incremental improvement, or technology renovation, whereas the United States was about true innovation driven by creativity. In the Prospector model, we have tragically donned the suit of renovation.

It’s time to embrace the slow brood and have the courage to focus on big ideas. Only then can we return to our roots as innovative and creative thinkers.

# Lambda.r at FP Days 2013 in Cambridge, UK

After a bit of a hiatus in public speaking, I will be presenting Functional Programming In The Computational Sciences With Lambda.r at this year’s FP Days conference in Cambridge. This will be my first talk on ideas from my book, and I think the slides are shaping up nicely. The core message is that embracing the duality between mathematics and functional programming yields insights in both fields. The talk is not super technical and is more about sharing some insights and lessons learned, including a surprising connection I found between Church Numerals and fractals. After establishing the symmetry between mathematical concepts and functional programming concepts, I discuss an application of these principles for detecting regime change in irregular time series. This is based on my work for forecasting the next 30 days of an individual’s spending to create a budget for them.

This will also be the first time I will be promoting this idea to the functional programming community instead of the quant community. Not surprisingly, R is not even cited as a functional language in the conference programme. As more fields embrace data and numerical methods as a core aspect of the field, I am confident that this idea will become more broadly accepted. The talk will be recorded and broadcast online at InfoQ. For those of you looking for more motivation regarding my book, hopefully this presentation will help. I’ll send details and post slides here when available.

# Book example – iterative function systems for generating fractals

A number of people suggested that my book, Modeling Data With Functional Programming In R, be more upfront with examples since the benefits of functional programming may not be immediately obvious to readers. As a first example, I have implemented some iterative function systems for a few common fractals. Iterative function systems are an elegant approach to generating fractals as they are based on repeated application of a collection of linear transformations. Hence each application of an IFS produces more granularity for an existing set of points.

The larger motivation for using IFS is that I’m updating my aging fractalrock package to provide a generic framework for generating data via processes. The idea is to be able to plug in any process to generate a time series (or other entity) for simulation purposes. In the current version, I use a pattern-based approach for generating fractal-based time series. This is a fun and easy way to explore fractal processes, but it’s a bit clunky and doesn’t offer much in terms of mathematical reasoning. On the other hand, IFS and fractional Brownian motion are more rigorous techniques that achieve the same goal.

### Heighway Dragon

The first IFS we’ll explore is the Heighway Dragon, which is initialized with a line segment. This system applies two transformations to each collection of points: scale by $r$ and rotate by 45 degrees counterclockwise; and scale by $r$, rotate 135 degrees, and translate by $(1, 0)$, where $r=\frac{1}{\sqrt{2}}$. This can be accomplished by taking the union of two function applications: $H(x) = \displaystyle\bigcup\limits_{i=1}^{2} f_i(x)$, where $f_1(x) = \begin{bmatrix}1/2 & -1/2 \\1/2 & 1/2\end{bmatrix} \vec{x}$ and $f_2(x) = \begin{bmatrix}-1/2 & -1/2 \\1/2 & -1/2\end{bmatrix} \vec{x} + \begin{bmatrix}1 \\ 0 \end{bmatrix}$. Note that the $r=\frac{1}{\sqrt{2}}$ requirement is there simply to join the two segments at their end point, and that $f_1$ is equivalent to $f_1(x) = \frac{1}{\sqrt{2}} \begin{bmatrix}\cos(45^{\circ}) & -\sin(45^{\circ}) \\ \sin(45^{\circ}) & \cos(45^{\circ})\end{bmatrix} \vec{x}$, which uses the more familiar rotation matrix.

Here is what the Heighway Dragon looks like for different numbers of iterations.

I wanted to do a code comparison to show the difference in using a functional programming approach to this versus a more procedural approach. Funnily enough, most implementations I found described the process in terms of providing a sequence of commands to a line-drawing function (like Logo when I was little or Turtle these days), which is the ultimate imperative approach. Anyway here is my version for a single iteration.

heighway <- function(x) {
  f1 <- function(x) matrix(c(.5,.5,-.5,.5), nrow=2) %*% x
  f2 <- function(x) matrix(c(-.5,.5,-.5,-.5), nrow=2) %*% x + c(1,0)
  cbind(f1(x), f2(x))
}


If we look at this function it does exactly what we described mathematically. In fact we can even transform the function back into symbolic notation since the code has no side effects and satisfies referential transparency.

Let f1 <- function(x) matrix(c(.5,.5,-.5,.5), nrow=2) %*% x, which is equivalent to
\begin{aligned} f1 = f_1 &= \lambda x. \begin{bmatrix} .5 & -.5 \\ .5 & .5 \end{bmatrix} x \\ f_1(x) &= \begin{bmatrix} .5 & -.5 \\ .5 & .5 \end{bmatrix} x \\ \end{aligned}.

Similarly, f2 <- function(x) matrix(c(-.5,.5,-.5,-.5), nrow=2) %*% x + c(1,0) is equivalent to
\begin{aligned} f2 = f_2 &= \lambda x. \begin{bmatrix} -.5 & -.5 \\ .5 & -.5 \end{bmatrix} x + \begin{bmatrix}1 \\ 0 \end{bmatrix}\\ f_2(x) &= \begin{bmatrix} -.5 & -.5 \\ .5 & -.5 \end{bmatrix} x + \begin{bmatrix}1 \\ 0 \end{bmatrix}\\ \end{aligned}.

The function overall is heighway(x) = cbind(f1(x), f2(x)) $= \begin{bmatrix}f_1(x) & f_2(x)\end{bmatrix}$. Generating an explicit version of the dragon is a matter of applying an n-fold composition on the function heighway. Note that the terminology points the way to the implementation:

x <- rbind(seq(0,1, by=0.05),0)
xn <- fold(1:12, function(a,b) heighway(b), x)
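
To actually see the result, note that xn is a 2 × N matrix of points, so one way to view it is to transpose and plot it (the plotting parameters are just cosmetic choices).

plot(t(xn), pch='.', asp=1, xlab='', ylab='')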


The fold function is a staple of functional programming. It applies a binary operator over a single vector input, accumulating the results as it goes. As shown above it can also be used for straight function composition by throwing away the first argument and keeping only the accumulated result. I include the function in my upcoming lambda.tools package, which is a collection of functions that supplement the book. In the meantime, the definition is below. You will need to install and load lambda.r for this to work.

fold(EMPTY, fn, acc) %as% acc

fold(x, fn, acc=0) %when% { is.null(dim(x)) } %as%
  fold(x[-1], fn, fn(x[[1]], acc))

fold(x, fn, acc=0) %as%
  fold(x[,-1,drop=FALSE], fn, fn(x[,1], acc))



In this rather trivial example it is clear that the code implementation is equivalent to the mathematical formulas. When such an equivalence exists, this is a great boon as we can be confident that the software model has the same certainty as the mathematical model. The goal is to extend this certainty to as much of the code as is possible. That is the point of my book.

### Sierpinski Pentagon

The rationale for the implementation of the Sierpinski Pentagon is similar, albeit a little more involved. The function is three lines of code, where the first two lines create matrices and the last line is the function application. In this function fold makes an appearance within the body itself. Why is this? The reason is that the class of Sierpinski polygons is basically $n$ replicas that are scaled and shifted to the appropriate locations. Hence it is easy to simply create a matrix that represents the scaling and shifting for each replica and again perform a union of the results. In this implementation, b represents the accumulated set of points.

sierpinski5 <- function(x) {
  m <- matrix(c(0.382,0, 0,0.382), nrow=2)
  o <- matrix(c(0,0, 0.618,0, 0.809,0.588, 0.309,0.951, -0.191,0.588), nrow=2)
  fold(o, function(a,b) cbind(m %*% x + a, b))
}


This function can be called in the same manner as the Heighway Dragon, with the difference being that the starting set of points is a polygon and not a line segment.

x <- polygon(5)
xn <- fold(1:5, function(a,b) sierpinski5(b), x)


If you try this yourself, be aware that the set of points grows quite quickly, so don't use too high an $n$ value (5 in this example), or you may run out of memory.

It would be remiss of me to not show the polygon function. Not to worry as the same philosophy underpins its implementation. Notice that most of this implementation is also primarily building data structures. The actual creation of the polygon happens in the last line with fold. This works by recognizing that just like the iterative function systems above, it is possible to generate any collection of line segments by taking the union of a collection of functions. So we fold by the number of sides, creating a rotation matrix as a multiple of the degree of each angle. The offset is simply a column vector in the offset matrix.

Not surprisingly, the offset matrix, o, is also constructed using functional programming principles. The function cumsum is naturally functional (trademark anyone?) and creates a vector of the cumulative sum of the input vector. The insight here is that each vertex of the polygon can be described as an offset from the previous vertex (moving in order counter clockwise). The offset is simply the cosine and sine of the angle of the line segment.

rotation <- function(degrees) {
  theta <- radians(degrees)   # convert to radians before building the rotation matrix
  cbind(c(cos(theta),sin(theta)), c(-sin(theta), cos(theta)))
}

radians <- function(degrees) degrees / 360 * 2 * pi

polygon <- function(sides=5, by=0.1) {
  theta <- 360/sides
  # One way to build the vertex offsets described above: each vertex is the
  # cumulative sum of the (cos, sin) of the preceding segment angles.
  xo <- c(0, cumsum(cos(radians((0:(sides-2)) * theta))))
  yo <- c(0, cumsum(sin(radians((0:(sides-2)) * theta))))
  o <- rbind(xo,yo)

  x <- rbind(seq(0,1,by=by),0)
  fold(1:sides, function(a,b) cbind(rotation((a-1)*theta) %*% x + o[,a], b))
}


### Conclusion

As a first example of the benefits of functional programming for model development, I hope this gives readers a taste of what is to come. One of the remarkable things about this approach is that thinking about the program functionally gives insights into the mathematical nature of the problem. This elegant interplay between code and math is only possible because functional programming is so close to traditional mathematics.

For those wondering where the magic of lambda.r comes into play, I intentionally chose not to use any syntactic constructs from lambda.r (except for the definition of fold), as I wanted to illustrate what is possible with standard R.

# Kilruddery House, County Wicklow

Kilruddery House was our favorite of the estates we saw. Its formal gardens were immense and still nicely maintained. Part of the majesty might be tied to the juxtaposition of its location near Bray, a seaside resort that had the feel of Coney Island. I doubt that was what they were going for, but it certainly didn’t have the same charm that we had experienced throughout the rest of County Wicklow.

The house in its current incarnation is about a third smaller than at its largest, due to the massive amount of wood rot that was found. Like an amputation, the only way to save the host was to cut off the limb. This is the back of the house, which is the part that visitors enter from. The real entrance is on the other side.

As you may have guessed, the house is still lived in. Evidence was in the form of a child’s bicycle.

This is a pool within the circular hedge, some 20 feet tall. There are only 3 entrances in, and looking through the passage you can see rows of hedges behind that hint at the regularity of the formal gardens. Apparently in the hey-day of the estate, there were over 50 full-time gardeners on staff.

The gardeners weren’t the only ones hard at work.

Adjacent to the formal gardens was a rock garden that apparently had 5 gardeners to tend to it. This is a view from the side looking at what I think is the Little Sugar Loaf in the distance.

From the top of the rock garden you get a nice view of the rest of the house along with the rolling hills beyond.

There were also wooded lanes that swept around the perimeter with more hidden treasures.

Pretty much everywhere you went, the grounds were nicely maintained with something else to admire.

Inside the house were other nice artifacts. Apparently one of the previous Earls enjoyed making clocks, and an interesting one made with a bicycle chain hangs inside (and still works). Unfortunately pictures aren’t allowed inside.

# Preview of my book – Modeling data with functional programming in R

As some of you know, I’ve been writing a book (to be published by CRC Press/Chapman & Hall and released in late 2014) for the past year and a half. It’s one of those books that spans multiple disciplines, so it is both unique and niche. In essence it’s a manifesto of sorts on using functional programming for mathematical modeling and analysis, which is based on my R package lambda.r. It spans the lambda calculus, traditional mathematical analysis, and set theory to 1) develop a mathematical model for the R language, and 2) show how to use this formalism to prove the equivalence of programs to their underlying model. I try to keep the book focused on applications, so I discuss financial trading systems, some NLP/document classification, and also web analytics.

The book started off as a more practical affair, but one thing that I’ve learned through this process is how to push ideas to the limit. So now it delves into quite a bit of theory, which makes it a more compelling read. In some ways it reminds me of ice climbing, where you’re scaling a waterfall and really pushing yourself in ways you didn’t think possible. Three chapters into the process, and it’s been that same combination of exhilarating challenge that results in conflicting thoughts racing through your head: “Holy crap — what am I doing?!” versus “This is so fun — wheeeee!” versus “I can’t believe I did it!”

That said, this book pushes some new territory. Many of the ideas are experimental and provisional. The proofs probably could use some work. So in the spirit of openness I’m looking to share my thoughts and get feedback from the general community on the ideas within the pages. Issues I already know about include:

• Images need to be cleaned up. Some are in color, others need to be scaled
• Some proofs are incomplete. Their absence shouldn’t impact the content
• Some examples need to be completed. Again, it shouldn’t impact the overall reading of the material.

Read the draft: Rowe – Modeling data with functional programming. Any comments are greatly appreciated.