I am curious as to what defines “Big Data”. Is it considered a population, a large sample? Are we talking parameters or statistics?

My reply:

Anyone interested in the term “Big Data” and its implications would do well to read Gil Press’ article, A Short History of Big Data. http://onforb.es/16bw9Kt

When you generalize from the data that you have to any other case, the data is a sample. So, the results of Big Data analysis are statistics. However, Big Data sources are usually convenience samples, not random samples. So, the assumptions for classical statistical analyses are not met. That’s one reason why we should be cautious in interpreting Big Data, and not get too uppity about our fancy analyses.
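To see why a convenience sample can mislead, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the population is invented, the "hours online" variable is hypothetical, and the rule that heavier users are more likely to be captured is just one simple way to mimic a convenience sample.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 people, each with a daily "hours online" value.
population = [random.gauss(3.0, 1.0) for _ in range(100_000)]

# Random sample: every member has the same chance of being selected.
random_sample = random.sample(population, 1_000)

# Convenience sample (assumed mechanism): the chance of showing up in the data
# grows with usage, as it might if the data came from a web log.
weights = [max(x, 0.0) for x in population]
convenience_sample = random.choices(population, weights=weights, k=1_000)

pop_mean = statistics.fmean(population)
random_mean = statistics.fmean(random_sample)
convenience_mean = statistics.fmean(convenience_sample)

print(f"population mean:         {pop_mean:.2f}")
print(f"random sample mean:      {random_mean:.2f}")
print(f"convenience sample mean: {convenience_mean:.2f}")
```

In this setup the random sample lands close to the population mean, while the convenience sample overstates it, and collecting more rows the same way would not fix that.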

Here’s the scoop:

*Data Mining for Dummies* is an easy-to-read new book for beginners in data mining, published by John Wiley and Sons and available through your favorite bookseller.

Data Mining for Dummies is for business people, information technology professionals and students who want to…

• Know what data mining is all about

• See what’s really involved in data mining, icky parts and all

• Find friendly expert guidance for getting started as a hands-on data miner

*Data Mining for Dummies* is written in a light, yet no-nonsense, style for readers who are new to data mining. You won’t need any special expertise to read and understand this book.

Beginners can learn the basics of data mining, including

• Understanding data mining concepts

• Embracing a comprehensive data mining process

• Planning for data mining

• Gathering data from internal, public and commercial sources

• Preparing data for exploration and predictive modeling

• Building predictive models

• Selecting software and dealing with vendors

Author Meta S. Brown is a hands-on data miner who has educated thousands of beginners from industry, government and academia in the fundamentals of data mining. She’s known in the analytics community for her articles, books and talks on data mining, text mining and classical statistics, reaching out to audiences from novices to working professionals.

Here’s what Tom Khabaza, pioneering data miner and Founding Chairman of the Society of Data Miners, has to say about *Data Mining for Dummies*:

Meta S. Brown tells it like it is, more than anyone else in the field.

Data Mining for Dummies is the first data mining book for beginners which gives an accurate picture of what we data miners do. This is a landmark for the profession, and an essential tool for anyone learning or teaching practical data mining. I will be recommending it to everyone I meet: business people, students and teachers alike.

Where to find Data Mining for Dummies:

• Your favorite independent bookseller (find one on Indiebound http://bit.ly/1ruU9n0)

• Powell’s City of Books http://bit.ly/1qFLkQG

• Amazon http://amzn.to/1eFD3WI

• Barnes and Noble http://bit.ly/1qFLAz8

• Ask your local library to get it. ISBN: 978-1-118-89317-3

Find out why it’s good to have business analysts on your team. Article on All Analytics.

The next generation of data analysts is breeding in the high schools. Article on All Analytics.

Heard some terrific talks.

One of the things I particularly liked was Mark Eduljee’s concise set of seven principles for useful analysis. I’ll be writing about the details soon!

Mingzhu Lu mentioned embarrassingly parallel computing, another topic begging for more explanation – maybe I should do a distributed computing piece similar to my Bluffer’s Guide to NoSQL Databases.

Janine Johnson led a GATE workshop. This gave participants the opportunity to see GATE (a text analytics tool for developers) in action, and get a good sense of how developers can work with it. Some of the crowd installed GATE and tried it hands-on. The rest of us watched as Janine demonstrated – and I, for one, saw more of what the tool could do in the two-hour workshop than I probably could have worked out on my own in two days. It probably would have taken me more than two hours just to install it and get it running!

“Storytelling with data is critical. But the emphasis is on data, not story.”

— Richard Hren, marketing strategist

Storytelling for Data Analysts

http://bit.ly/alla021

Often, what you need is not raw data. You might need to know the typical income of a plumber, the number of fatalities associated with various forms of transportation, or the proportion of high-school students who graduated last year in your state. These are statistics.

Statistical information is available through many sources. Federal, state and local governments provide statistics. So do nonprofit organizations. Commercial entities develop statistics, and often make them available to the public, sometimes for a fee, yet often at no charge. In most cases, these statistics are prepared by well-qualified data analysts, who may provide significant information on background, methods and interpretation of results.

I’ve just written an article on great sources for statistics, to be published later this year. I’ll post an update with a link for you when it becomes available.

Don’t Go Steady (With 1 Data Analysis Method)

http://bit.ly/alla020

When Less (Data) Is More (Information)

http://bit.ly/alla016

Mo’ Data Blues

http://bit.ly/alla018