I am curious as to what defines “Big Data”. Is it considered a population or a large sample? Are we talking parameters or statistics?

Anyone interested in the term “Big Data” and its implications would do well to read Gil Press’ article, A Short History of Big Data. http://onforb.es/16bw9Kt

When you generalize from the data that you have to any other case, the data is a sample. So, the results of Big Data analysis are statistics. However, Big Data sources are usually convenience samples, not random samples. So, the assumptions for classical statistical analyses are not met. That’s one reason why we should be cautious in interpreting Big Data, and not get too uppity about our fancy analyses.
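To make the sampling point concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the population of ages, the `simulate` function, and the assumption that younger people are more likely to end up in the convenience data (as they might with an app's user logs). It compares the population parameter (the true mean age) with the statistic you would get from a random sample and from a convenience sample of the same size.

```python
import random
import statistics


def simulate(seed=0, population_size=100_000, sample_size=1_000):
    """Compare a random sample and a convenience sample of a made-up population.

    Returns (population_mean, random_sample_mean, convenience_sample_mean).
    All numbers here are hypothetical; nothing comes from real data.
    """
    rng = random.Random(seed)

    # Hypothetical population: ages spread evenly between 18 and 90.
    population = [rng.uniform(18, 90) for _ in range(population_size)]
    population_mean = statistics.mean(population)  # the parameter

    # Random sample: every member of the population is equally likely to be picked.
    random_sample = rng.sample(population, sample_size)

    # Convenience sample: we only see people who happen to show up in our data.
    # Assumption for illustration: the chance of showing up falls linearly with age.
    showed_up = [age for age in population if rng.random() < (90 - age) / 72]
    convenience_sample = rng.sample(showed_up, sample_size)

    return (
        population_mean,
        statistics.mean(random_sample),       # a statistic
        statistics.mean(convenience_sample),  # also a statistic, but a biased one
    )


if __name__ == "__main__":
    parameter, random_mean, convenience_mean = simulate()
    print(f"Population mean age (parameter): {parameter:.1f}")
    print(f"Random sample mean (statistic):  {random_mean:.1f}")
    print(f"Convenience sample mean:         {convenience_mean:.1f}")
```

Under these assumptions, the random sample lands close to the true mean, while the convenience sample runs noticeably young, and collecting more of the same convenience data would not fix that. A bigger pile of data shrinks random error, not selection bias.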

Heard some terrific talks.

One of the things I particularly liked was Mark Eduljee’s concise set of seven principles for useful analysis. I’ll be writing about the details soon!

Mingzhu Lu mentioned embarrassingly parallel computing, another topic begging for more explanation – maybe I should do a distributed computing piece similar to my Bluffer’s Guide to NoSQL Databases.

Janine Johnson led a GATE workshop. This gave participants the opportunity to see GATE (a text analytics tool for developers) in action, and get a good sense of how developers can work with it. Some of the crowd installed GATE and tried it hands-on. The rest of us watched as Janine demonstrated – and I, for one, saw more of what the tool could do in the two-hour workshop than I probably could have worked out on my own in two days. It probably would have taken me more than two hours just to install it and get it running!

“Storytelling with data is critical. But the emphasis is on data, not story.”

— Richard Hren, marketing strategist

Often, what you need is not raw data. You might need to know the typical income of a plumber, the number of fatalities associated with various forms of transportation, or the proportion of high-school students who graduated last year in your state. These are statistics.

Statistical information is available through many sources. Federal, state and local governments provide statistics. So do nonprofit organizations. Commercial entities develop statistics, and often make them available to the public, sometimes for a fee, yet often at no charge. In most cases, these statistics are prepared by well-qualified data analysts, who may provide significant information on background, methods and interpretation of results.

I’ve just written an article on great sources for statistics, to be published later this year. I’ll post an update with a link for you when it becomes available.

