The dangers of data storytelling

In analyzing data for business, they say you’re best off telling a story. You see, when you review data, you determine what happened, and then use that information to synthesize a narrative. This helps the consumers of your analysis understand the results, and it promotes executive buy-in for the job you’re doing as an analyst.

This approach isn’t wrong. In fact, it’s very important to effective communication. Yet data storytelling does have a vulnerability that I want to address, one that shows up especially when analysts lose sight of what they’re doing while crafting the narrative they’re going to share within their organization.

Data isn’t telling you a story. Data is the footprints of what happened; it’s the measured impact of change. Now, you can tell some useful, informed stories about what that change might have looked like: people visited a website, customers bought some things, fingers pressed some icons in an app. But each one of these stories needs an element of doubt, especially when the narrative is being told by a single analyst, or a single organization.

Different things can leave similar footprints. My toddler likes to play with my phone, and my wife’s phone, and throws off some interesting data when he does. He likes to use any app he can get into, he makes calls, and he’s found his way to random YouTube videos more than once. Sometimes he knows what he’s doing, but more often than not he’s just pressing icons until something interesting happens. If you were to examine user behaviour on my phone, you might not be able to tell what’s going on, and not everyone who looked at such data would come to the same conclusions about the phone’s owner. Maybe I’m a parent whose phone occasionally finds its way into a child’s hands. Maybe I occasionally erupt into uncontrollable spastic fits. Maybe my phone fell into the tentacles of an alien who is trying to figure out our technology and is systematically exploring the device.

This example is getting away from me, but the point is this: There might be many explanations for why the data you get is the data you get, and not everyone is going to think about that data the same way.

A recent editorial comment in the journal Nature made an argument for crowdsourcing academic research. Its authors put together a test, wherein 29 teams of researchers were given identical data and asked the same question. The data included details about soccer players, referees and their interactions, and the teams were each asked to determine whether or not referees were more likely to give red cards to players with darker skin.

“We found that the overall group consensus was much more tentative than would be expected from a single-team analysis”, the authors write. Since teams applied different analytical methods in their research, they arrived at different conclusions. “Had any one of these 29 analyses come out as a single peer-reviewed publication, the conclusion could have ranged from no race bias in referee decisions to a huge bias.”

The fact that groups of researchers come to different conclusions shouldn’t be startling, but its impact is. In academic journals, this leads to conclusions being established by individual studies, even after peer review, and then informing other research. Meanwhile, media reports take the results of academic studies and sensationalize the news in an effort to attract readership. This compounded bias takes results that are complicated and nuanced, and turns them into a simplified, definitive statement that may influence people’s behaviour.

“Under the current system, strong storylines win out over messy results.”

If this is true in academic research, which is meant to be quite thorough, careful and stringent, consider how such variances play out in business analysis, where the timelines are compressed and methodology isn’t carefully checked over. How do you regain confidence in results?

The lesson here is to be sceptical of any individual analyst’s conclusions, especially if their methodology is unclear and their results are not framed with details about how they were obtained.

To protect against this, an organization should work with multiple analysts, working both in and outside of the organization. There are advantages and biases that come with either position; analysts inside an organization are more likely to be influenced by internal politics and the culture of the group, but on the other hand, they may have more contextual information that sheds light on their results. External analysts are less likely to be swayed by the workings of the organization, and can offer a fresh perspective from their differing vantage point. By employing analysts in both situations and occasionally having them approach the same sets of data, you can mitigate the biases of their backgrounds and approaches. When they tell you the same story, then you can have greater confidence in it. When their stories differ, you gain a more nuanced perspective and protect yourself against making great leaps in the wrong direction when you act on the results.

This is part of why I always thought Napkyn’s Analyst Program was a great idea. I used to be a Senior Analyst at Napkyn, where the company offers dedicated analysts to large enterprises as a managed service. Throughout the development of that program, we were often faced with two challenges: organizations that wanted to have their analysis done in-house, and in-house analysts who felt threatened by the presence of external analytics professionals working with the company. Both of these problems go away when everybody sees the advantages of having multiple teams with different perspectives working with the same data. Outsourcing an analyst is no substitute for having internal teams looking at, consuming and sharing data; nor is it enough to leave every aspect of your business analysis to internal teams.

Finally, back to analysts: don’t always expect the data to tell you a clear story. Data is messy; sometimes that’s because it’s not recorded or organized well, but sometimes it’s because that data represents the footprints of messy actions. As an analyst, it’s not your job to write compelling stories; it’s your job to deliver the truth. If the truth can be told as a true story, that’s fantastic. You will certainly discover facts that can be presented in a compelling narrative, and those insights will be the easiest to act on. But when the stars don’t align, you’re going to have to communicate that results are inconclusive or unclear, because that’s the truth. You should never be delivering a story that’s merely inspired by true events.


Two ugly attitudes towards mental health

Diagnoses of mental disorders are on the way up. The DSM-IV task force, led by Allen Frances, sought to limit this inflation. The DSM-5 is expected by many (including Frances) to make it worse. If it does come to pass that nearly everyone can be diagnosed with a mental disorder of some sort, how should we think about what it means to have such a disorder?

Well, there are two attitudes that we shouldn’t have: we shouldn’t be afraid of apparent epidemics in mental health, nor should we take the other extreme and shrug off mental disorder entirely.

  1. For one, we can see the rise in diagnoses of certain disorders as a sign that all of the chemicals/“toxins”/technologies/radiation/whatever in our food/air/society, or the stresses and immoralities of our lives in the modern era, are making us, well, crazy.
  2. We can, alternatively, reduce our thinking about mental health to “everybody’s got something”, meaning that we all fit on a spectrum of mental health somewhere and let’s not make a big deal about it.

The first position is a little bit ignorant. I’m not saying that the dangers of our world aren’t impacting our physical and mental health. They certainly are. But there are other social and psychological issues that are inflating our perception that things are going badly. One of them is the inflation caused by awareness of disorders and the diagnostic criteria of the DSM, in its various incarnations. If you become aware of these facts, you start to see that so-called epidemics are a product of our evolving classifications and not a change in the actual prevalence of certain conditions.

The second position is equally ignorant. Occasionally, people with severe mental health conditions are shrugged off by an individual because that individual knows someone else with a milder form of the same diagnosis. Mental disorders tend to exist on a variety of spectra, to be certain. But the fact that so many people can be, or are being, diagnosed with mental disorders should not trivialize the experience of those who struggle through daily life because of them. To approach these people with a “so what?” attitude isn’t helpful.

In general, I find our understanding of mental health lacking. We’re fearful that our children will have certain disorders, and in some cases we over-medicate them at the first sign of what might be perfectly normal distress. Yet, at the same time, we trivialize the plight of those who struggle with severe obstacles to mental health and have different mental abilities. There’s a middle ground between panic and apathy here.

‘Just in case’: if and only if?

The prevailing view in North American philosophical writing seems to be that the phrase ‘just in case’ can be translated into the phrase ‘if and only if’. Consequently, this view holds that the phrase ‘just in case’ is best symbolized by the logical connective known as the biconditional (↔).

Now, this seems wrong to me for two reasons. One is the difference between ‘just in case’ in this sense and the sense it has in British English, as noted by Geoffrey K. Pullum:

  • British English: “We’ll bring an umbrella just in case it rains.”
  • American philosophers: “A formula is a tautology just in case it is true on all valuations.”

That’s a fine difference to note, but I also have a hard time grasping why ‘just in case’ should count as ‘if’ and ‘only if’ at all. That is, to me, ‘just in case’ sounds more like ‘only if’. It seems that it spells out a necessary condition but not necessarily a sufficient one. Consider:

  • Something is a tree just in case it is a plant.

Now, according to what seems to be the standard view, this is a false statement because something can be a plant and not a tree. That is, ‘if something is a plant then it is a tree’ is false, so this sentence, just like ‘something is a tree if and only if it is a plant’, is false.

But it seems to me that this sentence actually means ‘something is a tree only in the case that it is a plant’. That is, I’m more inclined to translate ‘just in case’ as ‘only if’. Under such a translation, the above sentence is true, because being a plant is a necessary condition for treehood.
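To make the contrast concrete, the two candidate readings of the tree sentence can be set side by side. This is only a sketch: the predicate names Tree and Plant are my own shorthand, the fern is just one convenient counterexample, and the snippet assumes the amsmath package is loaded.

```latex
% Two readings of `Something is a tree just in case it is a plant'.
% Assumes \usepackage{amsmath}; Tree and Plant are illustrative predicate names.
\begin{align*}
  \text{`if and only if' reading:} \quad
    & \forall x\,\bigl(\mathrm{Tree}(x) \leftrightarrow \mathrm{Plant}(x)\bigr)
    && \text{(false: a fern is a plant but not a tree)} \\
  \text{`only if' reading:} \quad
    & \forall x\,\bigl(\mathrm{Tree}(x) \to \mathrm{Plant}(x)\bigr)
    && \text{(true: every tree is a plant)}
\end{align*}
```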

The problem is that the lexical definition of ‘just’, as an adverb, spells out multiple meanings. One is ‘exactly’ or ‘precisely’, which supports the prevailing intuition that ‘if and only if’ best captures the meaning of ‘just in case’–that it means ‘exactly in the cases that’. But there is also the meaning ‘only’ or ‘simply’. This is the source of my intuition.

Meanwhile, it seems that a number of students in elementary logic classes agree with me, since I often see them translating ‘P just in case Q’ into something like ‘P → Q’. Officially their textbook and notes equate ‘just in case’ with ‘if and only if’, so I’m not meant to give them the marks for this, but I do empathize.
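For what it’s worth, the conditional and the biconditional disagree on only one row of the truth table, the one where P is false and Q is true. Here is the standard table as a LaTeX sketch (T and F stand for true and false):

```latex
% Truth table comparing the conditional and the biconditional.
% The two columns differ only in the third row (P false, Q true).
\[
\begin{array}{cc|c|c}
  P & Q & P \to Q & P \leftrightarrow Q \\ \hline
  T & T & T & T \\
  T & F & F & F \\
  F & T & T & F \\
  F & F & T & T
\end{array}
\]
```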

Some Notes: Scott Soames, ‘Linguistics and Psychology’

Scott Soames argues that linguistics and psychology are separate enterprises, since they differ in their domain of study and empirical discoveries in one are unlikely to be realized in the other. He does this primarily by identifying what sorts of things linguists are up to, and comparing that with what is properly psychological.

Conceptually distinct

Soames says that linguistics and psychology are “conceptually distinct” (155) in the sense that they differ in their domain of study. In order to do this, he identifies what he calls the three “Leading Questions” (158) of the linguistic enterprise. They are questions concerned with the differences and similarities between actual natural languages, between natural languages and artificial or animal ones, and between languages and their historical variations. These are, according to Soames, the basic questions that define the domain of linguistics because they are the questions that initiate the actual practice of linguistics. It is these sorts of questions that linguists are out to answer.

Soames also highlights facts about linguistics which are clearly not psychological. For instance, semantics in linguistics requires a non-psychological component in the form of truth conditions. Truth conditions are essentially relations between sentences in a language, which may be thought of as abstract or mentalistic, and the real world. To use the famous example, ‘snow is white’ is true if and only if snow is white. While one can argue that ‘snow is white’ is a mentalistic object, it would be much more difficult to make the case that the fact that snow is white is psychological in nature. Hence, the case of truth conditions in semantics provides a counterexample for the claim that linguistics is entirely about the minds of language users, and hence the claim that linguistics is psychology falls apart. They must differ, at least somewhat, in their domains; some facts about language are linguistic and not psychological.

Soames also comes at the problem from the opposite side, noting that psychologists are concerned with things like the processing times and error rates between individuals speaking certain languages. These, while interesting facts for Soames, are not a part of linguistics proper. That is, theoretical linguistics is not concerned with mental aspects of human speech, but rather the output of the speakers, the language itself. Because there are things that psychologists are concerned with that linguists need not be, the domain again seems to be different. Some facts about language users are psychological and not linguistic.

In short, for Soames, linguistics is about languages as abstract objects, while psychology is about language users.

Empirically divergent

Soames’ second major claim is that linguistics and psychology are “empirically divergent” (155), that is, empirical investigation of language speakers is unlikely to discover that the grammars posited by linguists “correspond exactly” (168) with the mental structures of competent speakers.

To make this case, Soames notes that while some linguistic facts can correspond to psycholinguistic ones (such as the case of grammatical sentences and competent speakers judging sentences to be grammatical), others will not correspond. Instead, there are facts that only one discipline (between linguistics and psychology) will be interested in. Psycholinguistic data will be of interest to psychologists, but not to theoretical linguists. Meanwhile, semantic facts of truth conditions, logical properties and relations (169) will be of interest only to linguists. In formulating their theories, each discipline has its own epistemological domain as well, the domain of empirical facts that are to be admitted into the theory-forming process.

Despite this diversity, there is a logical possibility that the linguistic theory of grammar and the psychological theory of competence will turn out to be isomorphic after all; the theory of grammar may indeed correspond 1:1 to a psychologically real structure, however unlikely this is to Soames. But to say that linguistic theories are psychological in nature is to assume in advance that such theories do correspond. It would be an empirical discovery that an isomorphism exists between a grammar and a competence model.

Soames again appeals to the actual practices of linguists, noting that linguists aim to produce a theory of grammar that is as simple and general as possible. Again, it may be that the psychologically real model of competence is optimally simple and general, but this cannot be assumed. There is no reason to suppose that things will turn out this way. Hence, to suppose that building a minimalistic and general theory of grammar is the proper means of building a theory of competence is ill-conceived.

Because counting linguistic models as psychological ones rests on epistemologically dubious assumptions, which he thinks are unlikely to be the case, Soames argues that linguistics cannot properly be thought of as a psychological enterprise.

Does Soames beg?

Soames’ criterion of demarcation between the linguistic and the non-linguistic rests on the Leading Questions of linguistics. Soames takes these definitionally as what linguistics is about, which seems to beg the question.

Further, Soames also says that “nothing [linguistic] logically follows” (159) from certain facts about processing times and grammatical mistakes between speakers of different expression types. This is based on the assumption that languages are abstract entities and that linguistics is about them. If, on the other hand, one takes the position that languages are mentalistic in nature, Soames’ reasoning doesn’t seem to work.

A bit of ontology

Soames frequently refers to the facts that linguists and psychologists rely on. One might wonder whether these facts are all mentalistic in nature if they are meant to be separate from states of affairs. He also says that truth conditions are at least partly about non-psychological facts. How a theory of truth is to work, however, is no simple matter. A coherence theory of truth, however implausible, would not rely on a sentence’s correspondence with an external reality but rather on how it logically coheres with other beliefs the speaker has, so that both elements are mental states. Pragmatic theories of truth might suffer from similar struggles.

Soames, Scott. ‘Linguistics and Psychology’. Linguistics and Philosophy 7 (1984). 155–179.

LaTeX: Numberless lines in fitch.sty

To write natural deduction proofs in LaTeX, I use a package called fitch.sty. The package was written by Johan W. Klüwer and offers a nice clean way to typeset Fitch-style proofs. He provides a nice example:

Lovely. However, in some of my proofs, I wanted to have lines without numbers because they featured information that was not strictly part of the proof. For instance, like others, I commonly add a line that indicates the formula we’re out to prove after the list of premises. This is especially useful in teaching proofs. That line, I don’t want numbered — instead I want the counter to skip that line and continue after it, like so:

I had to dig around in the fitch.sty file itself to figure out how to do this, since there’s not really any documentation outside of it. I figured I’d share what I did for anyone facing the same issue.

Here’s what you do. Instead of beginning a line with “\fa” or something like that, add a line like this:

\ftag{~}{\vline\hspace{\fitchindent} CONTENT } \\

Where CONTENT is replaced by whatever you want to have on that line. The exact code for my ‘∴ B’ line, for example, is:

\ftag{~}{\vline\hspace{\fitchindent} \fbox{$\therefore~ B$}} \\
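If you need unnumbered lines often, the snippet above could be wrapped in a small macro in your preamble. This is only a sketch: the name \fnonumber is my own invention and not part of fitch.sty, and it simply reuses the \ftag line shown above, so treat it as untested against any particular version of the package.

```latex
% Hypothetical convenience macro; \fnonumber is NOT defined by fitch.sty.
% It wraps the \ftag line from above so that its argument is typeset
% on a line with a blank tag instead of a line number.
\newcommand{\fnonumber}[1]{\ftag{~}{\vline\hspace{\fitchindent} #1}}

% Usage inside a proof, in place of a line beginning with \fa:
% \fnonumber{\fbox{$\therefore~ B$}} \\
```

The macro adds nothing new; it just saves retyping the \vline and \hspace boilerplate each time.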

And that’s all there is to it. I hope this helps someone looking to do the same thing as I was.

Happy typesetting!

Marcus Aurelius: The Character of Antoninus

In Book 6 of his Meditations, Marcus Aurelius makes some remarks about the character of Antoninus Pius, his predecessor as Emperor of Rome, and his adoptive father. Marcus warns himself against being seen as like Julius Caesar, and says to avoid this he must live a good and humble life and conduct his duties in such a manner, as a follower of philosophy. Such a proper lifestyle can be seen in the life of Antoninus.

Marcus describes Antoninus as having a “keenness for logical action” and an “equable temper” (54). This is to say that his adoptive father maintained a calm disposition in his duties and private life. He was not rash in his behaviour, but rather thought his actions through with great care. Here the contrast with Caesar can be seen. Whereas Caesar was power-hungry and had a great ego, Antoninus was much more reasoned in his actions and had a “lack of vainglory” (54), a formidable and unusual trait for an emperor to have.

That “keenness” for a reasoned approach is again evident as Marcus describes Antoninus’ “ambition to understand affairs” (54). Again, Antoninus is seen not to act on intuition but to take the time to understand what is going on around him, and to apply careful thought prior to action. Rather than merely rely on others for information, Antoninus wanted not only to know the truth of the matter, but to “understand”, as Marcus says. All possible choices before Antoninus were carefully and completely examined before one was chosen. “He never rushed things” (54).

Marcus also praises Antoninus’ disposition towards other people, especially in his public life. When he was criticized publicly, he not only “tolerated” challenges to his views, but he “was glad” when presented with a more favourable view to his own position. To those who were wrong, or offered only negative commentary without foundation or suggestion for improvement, Antoninus “endured” their scorn. Neither slander nor rumour had any effect on him. He also maintained strong, “unchanging” friendships (54). That is, he did not abandon his friends, nor was he quick to end a friendship over petty matters.

In his duties of public office, Antoninus is said to have enjoyed his work to the point of being completely focused on it for long periods of time. He was “energetic” when working and continued to work until late, taking no breaks at all. His duties as a statesman were guided by justice and a goal of protecting the people, as Marcus lists among the virtues one must seek in his position.

Marcus also comments on Antoninus’ spiritual lifestyle. Antoninus is described as pious, having respect for the gods and acting in accordance with their laws. Marcus also makes a curious remark, that Antoninus “was religious but free from superstition” (55). In this sense, Antoninus would have exemplified the humbleness and reverence that come with piety and the proper following of a religion, along with its moral code. However, more petty or sensational aspects that often accompany religious belief, such as certain types of ritual and accompanying fears of retribution, would not have formed a part of Antoninus’ thought. Thus, he was able to obtain virtue from his beliefs without hindrance.

In all other areas of life, Antoninus demonstrated this same humility. Happy to live simply, he did not need elaborate housing or clothing. His diet was “scanty” (54), which both showed this humility and allowed him to focus on work for longer periods of time. He also required “little in the way of . . . servants” (54). These aspects of his life again contrast him with Caesar, who was known to enjoy a lavish lifestyle and demand much of those around him.

Marcus’ description here follows one earlier in Book 1 of his text, where he tells of his adoptive father’s advice to follow such a lifestyle, and to “honor genuine philosophers” (7). Both there and in this section, what is good is to put proper philosophical and moral thought and action above one’s selfish, personal inclinations. Marcus tells himself that being Antoninus’ “disciple” (55) in all of these traits will leave him with a clear conscience at death, knowing that he had lived a good life.

Marcus Aurelius. The Meditations. Trans. G. M. A. Grube. Indianapolis: Hackett, 1983.