June 2017

Let's talk about stats, baby


What kind of data visualization you use ultimately depends on what kind of data you have, and what you want to achieve. Let’s break down some different types of data and how they typically get put to work within a data visualization context.  

Factlets vs factoids

Factlet or factoid? According to CNN, a factoid is “a little-known bit of information; trivial but interesting data.” However, there seems to be some semantic debate on the matter, with the New York Times weighing in with this pronouncement: 

“...factoid, which seems with us to stay, has three senses. The first is accusatory: ‘misinformation purporting to be factual; or, a phony statistic.’ The second is neutral: ‘seemingly though not necessarily factual’; the third is the CNN version: ‘a little-known bit of information; trivial but interesting data.’” 

Let’s assume we’re dealing with factlets, then: brief facts. Here’s an example: 

“In the United States, income inequality, or the gap between the rich and everyone else, has been growing markedly… In 2009, CEOs of major corporations averaged 263 times the average compensation of American workers.”

Factlets are eminently shareable because they’re short and interesting. They are often counter-intuitive or surprising, something you might mention in conversation. Usually you’ll find factlets in press releases, tweets and articles, and in a dataviz context you would most commonly put them to work in infographics. 


Statistics

Statistics often take the form of percentages, like 62% (sixty-two percent), but the same concept can be expressed as 62/100, or 0.62, depending on which is the most compelling within the context of the piece.  
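As a small illustration, the same share can be rendered all three ways; the value here is arbitrary, chosen only to mirror the example above:

```python
# One underlying number, three presentations of it.
value = 0.62

as_percent = f"{value:.0%}"              # percentage: "62%"
as_fraction = f"{int(value * 100)}/100"  # fraction out of 100: "62/100"
as_decimal = f"{value}"                  # plain decimal: "0.62"

print(as_percent, as_fraction, as_decimal)
```

Which form you pick is an editorial choice: the number doesn’t change, only how compelling it reads in context.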

Aggregate data

Aggregate data give high-level numbers that cover a selection of elements (years, countries, groups of people, etc.). Because they are often represented visually, aggregates give the user the ability to quickly spot trends and patterns. Providing the actual numbers allows for detailed comparison as well, particularly if you provide interactive filtering features that give the user options to explore and drill down into the data. 
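The roll-up and drill-down pattern described above can be sketched in a few lines of Python. The records, field names, and numbers here are invented purely for illustration:

```python
from collections import defaultdict

# Hypothetical per-country, per-year records (illustrative numbers only).
records = [
    {"country": "Canada", "year": 2015, "value": 120},
    {"country": "Canada", "year": 2016, "value": 135},
    {"country": "France", "year": 2015, "value": 98},
    {"country": "France", "year": 2016, "value": 110},
]

def aggregate(records, key):
    """Roll individual records up into high-level totals by `key`."""
    totals = defaultdict(int)
    for r in records:
        totals[r[key]] += r["value"]
    return dict(totals)

def drill_down(records, **filters):
    """Interactive-style filtering: keep only records matching every filter."""
    return [r for r in records if all(r[k] == v for k, v in filters.items())]

print(aggregate(records, "year"))            # {2015: 218, 2016: 245}
print(drill_down(records, country="Canada"))  # the two Canada rows
```

The aggregate view is what surfaces the trend at a glance; the filter is what lets a reader drill back down to the underlying numbers.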

Aggregate data are most often used for articles, reports, charts and more sophisticated dataviz. 

Raw data

Data in their undoctored form are not too pretty. They look something like this: 

Using the right knowledge and tools, you can turn the numbers into something like this:

If you want a peek behind the curtain into how datasets are prepped and normalized, FFunction's CEO Sebastien Pierre wrote a very good piece for O’Reilly Media about APIs that’s worth reading. Here’s an excerpt: 

“Data very rarely comes in an ideal format. Very often, you’ll find your data contains a multitude of formats (dates are a common offender there, where you’ll find YYYY-MM-DD along with DD/MM/YYYY and, for good measure, MM/DD/YYYY). As a result, you almost always need to normalize your data to ensure that all the fields are in the same format. Normalization is not the only chore, though. If you want to keep your data up to date, you’ll also need to make sure that your new data format hasn’t changed since the last time (fields shuffled, added, removed, etc.), which means you’ll need to create a program to validate your new data. And this necessitates having an automated way to retrieve the data. Once you finally have valid, normalized data, you can eventually process the data into a format that is suitable for a visualization.” 
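The two chores named in the excerpt, normalizing mixed date formats and validating that the schema hasn’t shifted, can be sketched roughly like this. The format list and sample values are assumptions for illustration, not taken from any real dataset:

```python
from datetime import datetime

# Known input formats, tried in order. Note that DD/MM/YYYY and MM/DD/YYYY
# are ambiguous for days <= 12; resolving that needs context about the source.
FORMATS = ["%Y-%m-%d", "%d/%m/%Y"]

def normalize_date(raw):
    """Try each known format; return a canonical YYYY-MM-DD string."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

def validate_row(row, expected_fields):
    """Guard against upstream changes: fields shuffled, added, or removed."""
    return set(row) == set(expected_fields)

print(normalize_date("2017-06-01"))  # 2017-06-01 (already canonical)
print(normalize_date("01/06/2017"))  # 2017-06-01 (converted)
```

A validation step like `validate_row` is what makes automated re-fetching safe: when the upstream format changes, the pipeline fails loudly instead of silently producing a broken visualization.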

Big data

I remember someone at a business conference likening big data to teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it. 

There is an awful lot of hype about big data right now, and many people are wondering how it’s going to affect their organizations. Obviously data volumes will continue to grow, but the question hanging in the air is: what can we do with it?  

But bigger isn’t always better when it comes to data; you can create equally compelling narratives with small data. The most important element is the quality of the dataset: how “clean” is it? Are there elements missing? Are the data correct in the first place? What story does it tell?

The bottom line is fairly simple: If the dataset isn’t solid, the dataviz won’t be either.   


Rebecca Galloway