Storing a String larger than 1500 bytes in Google Cloud Datastore with Java

Just a quick one in case this saves someone some time. If you’re used to using Google Cloud Datastore with Google App Engine (in Java), you know that large string values are passed into the App Engine Datastore API using the Text type. But if you’re using the gcloud libraries instead (com.google.cloud.datastore), you don’t have this option. Strings are passed into the entity builder as String objects or as StringValue objects. E.g.:
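(A minimal sketch of my own; the Article kind and the property names here are placeholders.)

import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.StringValue;

Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
Key key = datastore.newKeyFactory().setKind("Article").newKey("my-article");

Entity entity = Entity.newBuilder(key)
    // A property passed as a plain String:
    .set("title", "A short title")
    // The same idea, wrapped explicitly in a StringValue:
    .set("subtitle", StringValue.of("A short subtitle"))
    .build();

datastore.put(entity);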

So, what do you do when you need to store a String larger than 1500 bytes? It turns out that the Datastore itself sets the 1500-byte limit for indexed properties. You can add a String up to 1MB in size if you explicitly set the property to be unindexed. The Text type in the App Engine libraries is essentially a facade for an unindexed String value.

In order to set an unindexed property in your Java code, you need to define your String as a com.google.cloud.datastore.StringValue object, and set it to be excluded from indexes. Then, pass the StringValue into your entity builder, and you’re good to go:
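Something like this (again a sketch, reusing the datastore and key from above; “body” and longText are placeholders of mine):

// longText is any String that may exceed the 1500-byte limit on indexed properties.
StringValue body = StringValue.newBuilder(longText)
    .setExcludeFromIndexes(true)  // opt out of indexing to allow larger values
    .build();

Entity entity = Entity.newBuilder(key)
    .set("body", body)
    .build();

datastore.put(entity);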

I saw some answers on how to do this with Node.js, but I had to look up the javadoc for StringValue.Builder to sort this out in the Java libraries.

It’s worth noting that it’s a good practice to set properties as unindexed if you’re not actually going to use them in any Datastore queries, to improve performance and reduce Datastore-related costs.

The dangers of data storytelling

In analyzing data for business, they say you’re best off telling a story. You see, when you review data, you determine what happened, and then use that information to synthesize a narrative. This helps the consumers of your analysis understand the results, and it promotes executive buy-in for the job you’re doing as an analyst.

This approach isn’t wrong. In fact, it’s very important to effective communication. Yet data storytelling has a vulnerability that I want to address, one that emerges when analysts lose sight of what they’re doing as they craft the narrative they’re going to share within their organization.

Data isn’t telling you a story. Data is the footprints of what happened; it’s the measured impact of change. Now, you can tell some useful, informed stories about what that change might have looked like: people visited a website, customers bought some things, fingers pressed some icons in an app. But each one of these stories needs an element of doubt, especially when the narrative is being told by a single analyst, or a single organization.

Different things can leave similar footprints. My toddler likes to play with my phone, and my wife’s phone, and throws off some interesting data when he does. He likes to use any app he can get into, he makes calls, and he’s found his way to random YouTube videos more than once. Sometimes he knows what he’s doing, but more often than not he’s just pressing icons until something interesting happens. If you were to examine user behaviour on my phone, you might not be able to tell what’s going on, and not everyone who looked at such data would come to the same conclusions about the phone’s owner. Maybe I’m a parent whose phone occasionally finds its way into a child’s hands. Maybe I occasionally erupt into uncontrollable spastic fits. Maybe my phone fell into the tentacles of an alien who is trying to figure out our technology and is systematically exploring the device.

This example is getting away from me, but the point is this: There might be many explanations for why the data you get is the data you get, and not everyone is going to think about that data the same way.

A recent editorial comment in the journal Nature made an argument for crowdsourcing academic research. Its authors put together a test, wherein 29 teams of researchers were given identical data and asked the same question. The data included details about soccer players, referees and their interactions, and the teams were each asked to determine whether or not referees were more likely to give red cards to players with darker skin.

“We found that the overall group consensus was much more tentative than would be expected from a single-team analysis”, the authors write. Because the teams applied different analytical methods, they arrived at different conclusions. “Had any one of these 29 analyses come out as a single peer-reviewed publication, the conclusion could have ranged from no race bias in referee decisions to a huge bias.”

The fact that groups of researchers come to different conclusions shouldn’t be startling, but its impact is. In academic journals, it means conclusions are established by individual studies, even after peer review, and then go on to inform other research. Meanwhile, media reports take the results of academic studies and sensationalize them in an effort to attract readership. This compounded bias takes results that are complicated and nuanced, and turns them into simplified, definitive statements that may influence people’s behaviour.

“Under the current system, strong storylines win out over messy results.”

If this is true in academic research, which is meant to be quite thorough, careful and stringent, consider how such variances play out in business analysis, where the timelines are compressed and methodology isn’t carefully checked over. How do you regain confidence in results?

The lesson here is to be sceptical of any individual analyst’s conclusions, especially if their methodology is unclear and their results are not framed with details about how they were obtained.

To protect against this, an organization should work with multiple analysts, both inside and outside the organization. There are advantages and biases that come with either position; analysts inside an organization are more likely to be influenced by internal politics and the culture of the group, but on the other hand, they may have more contextual information that sheds light on their results. External analysts are less likely to be swayed by the workings of the organization, and can offer a fresh perspective from their differing vantage point. By employing analysts in both situations and occasionally having them approach the same sets of data, you can mitigate the biases of their backgrounds and approaches. When they tell you the same story, you can have greater confidence in it. When their stories differ, you gain a more nuanced perspective and protect yourself against making great leaps in the wrong direction when you act on the results.

This is part of why I always thought Napkyn’s Analyst Program was a great idea. I used to be a Senior Analyst at Napkyn, where the company offers dedicated analysts to large enterprises as a managed service. Throughout the development of that program, we were often faced with two challenges: organizations that wanted to have their analysis done in-house, and in-house analysts who felt threatened by the presence of external analytics professionals working with the company. Both of these problems go away when everybody sees the advantages of having multiple teams with different perspectives working with the same data. Outsourcing an analyst is no substitute for having internal teams looking at, consuming and sharing data; nor is it enough to leave every aspect of your business analysis to internal teams.

Finally, back to analysts: don’t always expect the data to tell you a clear story. Data is messy; sometimes that’s because it’s not recorded or organized well, but sometimes it’s because that data represents the footprints of messy actions. As an analyst, it’s not your job to write compelling stories; it’s your job to deliver the truth. If the truth can be told as a true story, that’s fantastic. You will certainly discover facts that can be presented in a compelling narrative, and those insights will be the easiest to act on. But when the stars don’t align, you’re going to have to communicate that results are inconclusive or unclear, because that’s the truth. You should never be delivering a story that’s merely inspired by true events.

Two ugly attitudes towards mental health

Diagnoses of mental disorders are on the way up. The DSM-IV task force, led by Allen Frances, sought to limit this inflation. The DSM-5 is expected by many (including Frances) to make it worse. If it does come to pass that nearly everyone can be diagnosed with a mental disorder of some sort, how should we think about what it means to have such a disorder?

Well, there are two attitudes that we shouldn’t have: we shouldn’t be afraid of apparent epidemics in mental health, nor should we take the other extreme and shrug off mental disorder entirely.

  1. For one, we can see the rise in diagnoses of certain disorders as a sign that all of the chemicals/”toxins”/technologies/radiation/whatever in our food/air/society, or the stresses and immoralities of modern life, are making us, well, crazy.
  2. We can, alternatively, reduce our thinking about mental health to “everybody’s got something”, meaning that we all fit somewhere on a spectrum of mental health, so let’s not make a big deal about it.

The first position is a little bit ignorant. I’m not saying that the dangers of our world aren’t impacting our physical and mental health. They certainly are. But there are other social and psychological factors inflating our perception that things are going badly. One of them is the inflation caused by awareness of disorders and of the diagnostic criteria of the DSM, in its various incarnations. Once you become aware of these facts, you start to see that so-called epidemics are a product of our evolving classifications, not a change in the actual prevalence of certain conditions.

The second position is equally ignorant. Occasionally, people with severe mental health conditions are shrugged off by an individual because that individual knows someone else with a milder form of the same diagnosis. Mental disorders do tend to exist on a variety of spectra, to be sure. But the fact that so many people can be or are being diagnosed with mental disorders should not trivialize the experience of those who struggle through daily life because of them. Approaching these people with a “so, what?” attitude isn’t helpful.

In general, I find our understanding of mental health lacking. We’re fearful that our children will have certain disorders, and in some cases we over-medicate them at the first sign of what might be perfectly normal distress. Yet, at the same time, we trivialize the plight of those who struggle with severe obstacles to mental health and have different mental abilities. There’s a middle ground between panic and apathy here.

‘Just in case’: if and only if?

The prevailing view in North American philosophical writing seems to be that the phrase ‘just in case’ can be translated into the phrase ‘if and only if’. Consequently, this view holds that the phrase ‘just in case’ is best symbolized by the logical connective known as the biconditional (↔).

Now, this seems wrong to me for two reasons. One is the difference between ‘just in case’ in this sense and the sense it has in British English, as noted by Geoffrey K. Pullum:

  • British English: “We’ll bring an umbrella just in case it rains.”
  • American philosophers: “A formula is a tautology just in case it is true on all valuations.”

That’s a fine difference to note, but I also have a hard time grasping why ‘just in case’ should count as ‘if’ and ‘only if’ at all. That is, to me, ‘just in case’ sounds more like ‘only if’. It seems that it spells out a necessary condition but not necessarily a sufficient one. Consider:

  • Something is a tree just in case it is a plant.

Now, according to what seems to be the standard view, this is a false statement because something can be a plant and not a tree. That is, ‘if something is a plant then it is a tree’ is false, so this sentence, just like ‘something is a tree if and only if it is a plant’, is false.

But it seems to me that this sentence actually means ‘something is a tree only in the case that it is a plant’. That is, I’m more inclined to translate ‘just in case’ as ‘only if’. Under such a translation, the above sentence is true, because being a plant is a necessary condition for treehood.
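To put the two candidate translations side by side (Tree and Plant are predicates I’m introducing just for illustration):

% The prevailing reading: ‘just in case’ as the biconditional
\forall x\,(\mathit{Tree}(x) \leftrightarrow \mathit{Plant}(x))  % false: some plants are not trees

% My reading: ‘just in case’ as ‘only if’
\forall x\,(\mathit{Tree}(x) \rightarrow \mathit{Plant}(x))      % true: being a plant is necessary for treehood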

The problem is that the lexical definition of ‘just’, as an adverb, spells out multiple meanings. One is ‘exactly’ or ‘precisely’, which supports the prevailing intuition that ‘if and only if’ best captures the meaning of ‘just in case’–that it means ‘exactly in the cases that’. But there is also the meaning ‘only’ or ‘simply’. This is the source of my intuition.

Meanwhile, it seems that a number of students in elementary logic classes agree with me, since I often see them translating ‘P just in case Q’ into something like ‘P → Q’. Officially their textbook and notes equate ‘just in case’ with ‘if and only if’, so I’m not meant to give them the marks for this, but I do empathize.

Some Notes: Scott Soames, ‘Linguistics and Psychology’

Scott Soames argues that linguistics and psychology are separate enterprises, since they differ in their domain of study and empirical discoveries in one are unlikely to be realized in the other. He does this primarily by identifying what sorts of things linguists are up to, and comparing that with what is properly psychological.

Conceptually distinct

Soames says that linguistics and psychology are “conceptually distinct” (155) in the sense that they differ in their domain of study. To mark this distinction, he identifies what he calls the three “Leading Questions” (158) of the linguistic enterprise. They are questions concerned with the differences and similarities between actual natural languages, between natural languages and artificial or animal ones, and between languages and their historical variations. These are, according to Soames, the basic questions that define the domain of linguistics, because they are the questions that initiate the actual practice of linguistics. It is these sorts of questions that linguists are out to answer.

Soames also highlights facts about linguistics which are clearly not psychological. For instance, semantics in linguistics requires a non-psychological component in the form of truth conditions. Truth conditions are essentially relations between sentences in a language, which may be thought of as abstract or mentalistic, and the real world. To use the famous example, ‘snow is white’ is true if and only if snow is white. While one can argue that ‘snow is white’ is a mentalistic object, it would be much more difficult to make the case that the fact that snow is white is psychological in nature. Hence, the case of truth conditions in semantics provides a counterexample to the claim that linguistics is entirely about the minds of language users, and so the claim that linguistics is psychology falls apart. The two must differ, at least somewhat, in their domains; some facts about language are linguistic and not psychological.

Soames also comes at the problem from the opposite side, noting that psychologists are concerned with things like the processing times and error rates between individuals speaking certain languages. These, while interesting facts for Soames, are not a part of linguistics proper. That is, theoretical linguistics is not concerned with mental aspects of human speech, but rather the output of the speakers, the language itself. Because there are things that psychologists are concerned with that linguists need not be, the domain again seems to be different. Some facts about language users are psychological and not linguistic.

In short, for Soames, linguistics is about languages as abstract objects, while psychology is about language users.

Empirically divergent

Soames’ second major claim is that linguistics and psychology are “empirically divergent” (155), that is, empirical investigation of language speakers is unlikely to discover that the grammars posited by linguists “correspond exactly” (168) with the mental structures of competent speakers.

To make this case, Soames notes that while some linguistic facts can correspond to psycholinguistic ones (such as the case of grammatical sentences and competent speakers judging sentences to be grammatical), others will not correspond. Instead, some facts will be of interest to only one of the two disciplines. Psycholinguistic data will be of interest to psychologists, but not to theoretical linguists. Meanwhile, semantic facts of truth conditions, logical properties and relations (169) will be of interest only to linguists. Each discipline also has its own epistemological domain in formulating its theories: the domain of empirical facts that are to be admitted into the theory-forming process.

Despite this divergence, there is a logical possibility that the linguistic theory of grammar and the psychological theory of competence will turn out to be isomorphic after all; the theory of grammar may indeed correspond one-to-one to a psychologically real structure, however unlikely Soames takes this to be. But to say that linguistic theories are psychological in nature is to assume in advance that such theories do correspond. It would be an empirical discovery that an isomorphism exists between a grammar and a competence model.

Soames again appeals to the actual practices of linguists, noting that they aim to produce a theory of grammar that is as simple and general as possible. Again, it may be that the psychologically real model of competence is optimally simple and general, but this cannot be assumed; there is no reason to suppose that things will turn out this way. Hence, to suppose that building a minimal and general theory of grammar is the proper means of building a theory of competence is ill-conceived.

Because counting linguistic models as psychological ones rests on epistemologically dubious assumptions that he thinks are unlikely to hold, Soames argues that linguistics cannot properly be thought of as a psychological enterprise.

Does Soames beg the question?

Soames’ criterion of demarcation between the linguistic and the non-linguistic rests on the Leading Questions of linguistics. Soames takes these definitionally as what linguistics is about, which seems to beg the question.

Further, Soames also says that “nothing [linguistic] logically follows” (159) from certain facts about processing times and grammatical mistakes between speakers of different expression types. This is based on the assumption that languages are abstract entities and that linguistics is about them. If, on the other hand, one takes the position that languages are mentalistic in nature, Soames’ reasoning doesn’t seem to work.

A bit of ontology

Soames frequently refers to the facts that linguists and psychologists rely on. One might wonder whether these facts are all mentalistic in nature, if they are meant to be separate from states of affairs. He also says that truth conditions are at least partly about non-psychological facts. How a theory of truth is to work, however, is no simple matter. A coherence theory of truth, however implausible, would not rely on correspondence with an external reality, but rather on how a sentence logically coheres with the speaker’s other beliefs—both elements thus being mental states. Pragmatic theories of truth might suffer from similar struggles.


Soames, Scott. ‘Linguistics and Psychology’. Linguistics and Philosophy 7 (1984): 155–179.

LaTeX: Numberless lines in fitch.sty

To write natural deduction proofs in LaTeX, I use a package called fitch.sty. The package was written by Johan W. Klüwer and offers a nice clean way to typeset Fitch-style proofs. He provides a nice example:
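In the same style, a minimal proof looks something like this (a sketch of my own rather than his exact code, assuming the package’s \fh hypothesis and \fa assertion line commands):

\[
\begin{fitch}
\fh A \to B \\  % 1. premise
\fh A \\        % 2. premise
\fa B           % 3. from 1 and 2 by modus ponens
\end{fitch}
\]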

Lovely. However, in some of my proofs, I wanted to have lines without numbers because they featured information that was not strictly part of the proof. For instance, like others, I commonly add a line that indicates the formula we’re out to prove after the list of premises. This is especially useful in teaching proofs. That line, I don’t want numbered — instead I want the counter to skip that line and continue after it, like so:

I had to dig around in the fitch.sty file itself to figure out how to do this, since there’s not really any documentation outside of it. I figured I’d share what I did for anyone facing the same issue.

Here’s what you do. Instead of beginning a line with “\fa” or something like that, add a line like this:

\ftag{~}{\vline\hspace{\fitchindent} CONTENT } \\

Where CONTENT is replaced by whatever you want to have on that line. The exact code for my ‘∴ B’ line, for example, is:

\ftag{~}{\vline\hspace{\fitchindent} \fbox{$\therefore~ B$}} \\
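Putting it all together, a short proof with the unnumbered ‘∴ B’ goal line might look like this (again a sketch; the \fh premise lines are my assumption):

\[
\begin{fitch}
\fh A \to B \\                                                   % 1. premise
\fh A \\                                                         % 2. premise
\ftag{~}{\vline\hspace{\fitchindent} \fbox{$\therefore~ B$}} \\  % unnumbered goal line
\fa B                                                            % 3. from 1 and 2
\end{fitch}
\]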

And that’s all there is to it. I hope this helps someone looking to do the same thing as I was.

Happy typesetting!