A question came up regarding Q1.4.9 in Cherney, Denton, and Waldron. Here is a snippet of the original question:

Consider the set $S = \{ *, \star, \# \}$. It contains just 3 elements, and has no ordering; $\{ *, \star, \# \} = \{ \#, \star, * \}$ etc. Invent a function with domain $\{ *, \star, \# \}$ and codomain $\mathbb{R}$. (Remember that the domain of a function is the set of all its allowed inputs and the codomain (or target space) is the set where the outputs can live. A function is specified by assigning exactly one codomain element to each element of the domain.)

Now, what is the point of this question? In data science, many of the problems we solve start in the real world, far from the ideal world of mathematics. The real world is a messy place, full of symbols and objects that aren’t easily operated on by mathematical functions. Hence, the first step in an analysis is quantifying things in the real world. Quantification takes an object from the real world (an element of the domain) and maps it to a number (an element of the codomain). Here are some examples of conceptually similar transformations:

  • In gymnastics, a routine is transformed into a score out of 10 (historically).
  • The art market transforms a painting into a dollar value.
  • Facebook transforms a user’s emotional state into an engagement number.

So how does one go about creating functions that transform non-mathematical things into numbers? It depends on the goal you are trying to accomplish, but there are two general approaches: measurement and encoding. The examples above are based on some sort of measurement. The measurement is not always objective, which may or may not be a problem depending on the analysis. Other times the values are encoded, that is, assigned by convention. Consider:

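  • Mapping the Boolean values true and false to 1 and 0
  • In UNIX, a process exits with status 0 to indicate success
  • RGB color space, which encodes a color as red, green, and blue intensities
  • Unicode, which assigns each character a numeric code point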
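The encodings in this list can be made concrete with a short sketch. It is written in Python purely for illustration (the exercise itself asks for R); the helper `hex_to_rgb` and the color `#FF8000` are hypothetical examples, not part of the original question.

```python
# Illustrations of the encodings listed above, using Python built-ins.
import subprocess
import sys

# Booleans: bool is a subclass of int, so True and False convert to 1 and 0.
print(int(True), int(False))  # 1 0

# Exit status: a child process that finishes normally reports status 0.
child = subprocess.run([sys.executable, "-c", "pass"])
print(child.returncode)  # 0


def hex_to_rgb(hex_color: str) -> tuple:
    """Hypothetical helper: decode '#RRGGBB' into three 0-255 intensities."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#FF8000"))  # (255, 128, 0)

# Unicode: every character has an integer code point.
print(ord("#"), ord("*"))  # 35 42
```

None of these numbers is "measured"; each is a convention that both the producer and the consumer of the data have agreed on.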

How you choose to measure a real-world phenomenon or encode a concept can have a profound impact on your analysis. It’s up to you to decide which approach is better and to justify why.

The second half of the question is implementing your function in R. The point here is to get you to think about the representation of data within a computer. LaTeX is used to render the symbols for the blog (and possibly the book). Can this representation be used directly in R? If not, how do you represent these symbols in a format that is compatible with R? Does this transformation impact your function?
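One possible answer to these questions, sketched in Python rather than R: store each symbol as a plain string, translating the LaTeX source tokens into Unicode characters. The table name `LATEX_TO_UNICODE` is hypothetical, and we assume U+22C6 (STAR OPERATOR) is the Unicode character closest to what `\star` renders. The backslashes in `\star` and `\#` are LaTeX markup rather than part of the symbols, which is one way the representation can leak into the function if it is used directly.

```python
# Hypothetical translation table from the LaTeX source tokens used to
# render S on the blog to Unicode characters a program can store as strings.
LATEX_TO_UNICODE = {
    "*": "*",            # the asterisk needs no escaping
    r"\star": "\u22c6",  # assumption: U+22C6 STAR OPERATOR matches \star
    r"\#": "#",          # '#' is a special character in LaTeX, hence \#
}

latex_S = {"*", r"\star", r"\#"}
unicode_S = {LATEX_TO_UNICODE[token] for token in latex_S}

# Like S, a Python set has no ordering, so these two literals are equal.
print(latex_S == {r"\#", r"\star", "*"})  # True

# The representation changes the data: five characters vs. one symbol.
print(len(r"\star"), len("\u22c6"))  # 5 1

# Code points of the Unicode version, printed in hex to stay ASCII-safe.
print([hex(cp) for cp in sorted(ord(c) for c in unicode_S)])  # ['0x23', '0x2a', '0x22c6']
```

Either representation can serve as the domain, but a function written against one will not accept inputs written in the other without a translation step like this one.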

From an implementation perspective, don’t fall into the trap of thinking that mathematical functions can only process numbers. String functions, for example, take characters as input and are still valid functions. The requirement is to implement a mapping from $S$ to $\mathbb{R}$ that is deterministic: the same input must always produce the same output. You have to decide the best way to do that.
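A minimal sketch of one such mapping, again in Python for illustration rather than the R the exercise asks for. It assumes the star is stored as the Unicode character U+22C6 (STAR OPERATOR); the names `F_TABLE`, `DOMAIN`, `f`, and `g` are hypothetical, and the output values in `F_TABLE` are arbitrary choices.

```python
# One possible function f: S -> R, implemented as a lookup table.
# The output values are arbitrary; any assignment of exactly one real
# number to each symbol would be equally valid.
STAR = "\u22c6"  # assumption: \star is stored as U+22C6 STAR OPERATOR

F_TABLE = {
    "*": 1.0,
    STAR: -2.5,
    "#": 3.14,
}
DOMAIN = frozenset(F_TABLE)  # the set S, as Python strings


def f(symbol: str) -> float:
    """Deterministic lookup: the same symbol always yields the same float."""
    if symbol not in DOMAIN:
        # Inputs outside S are rejected instead of guessed.
        raise ValueError(f"{symbol!r} is not in the domain S")
    return F_TABLE[symbol]


def g(symbol: str) -> float:
    """Alternative string-based function: use the symbol's Unicode code point."""
    if symbol not in DOMAIN:
        raise ValueError(f"{symbol!r} is not in the domain S")
    return float(ord(symbol))


print(f("*"), f(STAR), f("#"))  # 1.0 -2.5 3.14
print(g("*"), g(STAR), g("#"))  # 42.0 8902.0 35.0
```

The table-based `f` works with any representation of the symbols. The string-based `g` depends on each symbol being a single character: calling `ord` on the five-character LaTeX token `\star` would raise a `TypeError`, so the choice of representation really does shape the function.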

These questions come up all the time when conducting an analysis. Indeed, at the other end of the process, numbers are oftentimes transformed into something else, like a visualization. Hence it is important to think about the complete analytical process and not just the mathematical theories.