### The Problem

In the real world, problems are often not well defined, and it is up to the practitioner to define them. Compare this to many classroom settings and entry-level positions that detail every last minutia of the work. These are like color-by-number coloring books: you are given both the problem and the method, and your job is strictly execution. This can be an effective way to learn a subject, but it is less effective for solving actual problems, which tend to be more open-ended.

At some point in your drawing career, you graduate from this detailed instruction to coloring books without numbers. The problem is still given to you, but now you choose the method: you decide which colors to use. More importantly, a “successful” drawing now depends on whether you choose good color combinations.

Finally, you outgrow coloring books altogether. What happens now? Instead of a line drawing, you are given a blank sheet of paper. It is up to you to define the problem. Here you have the greatest freedom but also the highest risk of failure.

This progression from color-by-number to an empty piece of paper isn’t so different from the maturation of a (data) scientist. First you learn the techniques. Then you learn how to apply the techniques to problems given to you. Finally, you define the problems yourself. As your career advances, your success will depend on transforming blank sheets of paper into something valuable, i.e., identifying opportunities in data. Hence, your first challenge is defining the problem.

There are a number of ways to ask the question “What is the problem?” Equally valid phrasings are:

• What problem are you solving?
• What is the purpose of this project?

The answers need to be specific. Often they will sound like a use case or user story, which takes the form “I want to do X because Y”. This also helps you identify who benefits from the project. If you know neither what you are solving nor who benefits, you will almost certainly fail, because your project becomes indistinguishable from entertainment.

As you work through the problem definition, you may find that numerous people benefit, each with a slightly different problem that the same model or analysis could solve. If so, prioritize the problems and focus on the most important one. This is particularly important when developing models: conflating problems makes a solution much harder to find, so prioritizing is also a form of simplification.

Some of you might protest that prioritization is not part of your responsibilities. That may be true, but as you advance in your career, you will be expected not only to lead initiatives but also to drive new ones. That means knowing how to prioritize.

### Process

Once you know what you’re drawing and who it’s for, how do you go about creating the drawing? This is your process, or method. In data science, it usually follows the scientific method. That means you need to have a hypothesis and a way to test that hypothesis. Your model will likely be built on a number of such hypotheses.
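To make “a hypothesis and a way to test that hypothesis” concrete, here is a minimal sketch of one common test: a two-sided permutation test for a difference in means. It assumes you have two groups of numeric measurements (for example, a metric for users who did and did not see a feature). The function name, its parameters, and the data in the usage lines are hypothetical, and a permutation test is only one of many ways to test a hypothesis.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch).

    Null hypothesis: the group labels are exchangeable, i.e. the observed
    difference in means could have arisen from random assignment alone.
    Returns the observed difference and an estimated p-value.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical weekly session counts; not real data.
    treated = [12, 15, 14, 16, 13, 17, 15, 14]
    control = [11, 12, 10, 13, 12, 11, 12, 10]
    diff, p = permutation_test(treated, control)
    print(f"observed difference = {diff:.3f}, p = {p:.4f}")
```

A small p-value says the observed difference would be unlikely if the labels did not matter; it does not by itself say the difference is large enough to be useful, which is a question for your problem definition.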

As in acting, there are numerous methods, and no single approach trumps the others. That said, your process needs to contain at least the following elements:

• Data – where are you getting it, how complete is it, what biases exist?
• Theory – what is your high-level thesis for your model, i.e., what relationships is the model exploiting to make an inference?
• Evaluation – how do you know if your model is (in)effective?
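One simple answer to the evaluation question is to compare your model against a naive baseline. The sketch below assumes a classification problem with a held-out test set and uses accuracy against a “always predict the most common training label” baseline. The function names and the churn-style labels are hypothetical, and accuracy is only one of many possible metrics.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most common training label for each of n cases."""
    most_common_label, _ = Counter(y_train).most_common(1)[0]
    return [most_common_label] * n


def beats_baseline(y_train, y_test, model_preds):
    """Return (model accuracy, baseline accuracy, whether the model is better)."""
    model_acc = accuracy(y_test, model_preds)
    base_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))
    return model_acc, base_acc, model_acc > base_acc


if __name__ == "__main__":
    # Hypothetical labels: 1 = customer churned, 0 = stayed. Not real data.
    y_train = [0] * 90 + [1] * 10
    y_test = [0] * 18 + [1] * 2
    model_preds = [0] * 16 + [1, 1] + [1, 0]  # hypothetical model output
    m, b, ok = beats_baseline(y_train, y_test, model_preds)
    print(f"model {m:.2f} vs baseline {b:.2f} -> beats baseline: {ok}")
```

Here the hypothetical model gets 85% of test cases right, which sounds good until you notice that always predicting “stayed” gets 90%. Without the baseline, you would not know the model is ineffective.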