Friday, 22 February 2019


Big Data: Parameters or Statistics?

Here’s a good question that came up in the LinkedIn Statistics & Analytics Consultants discussion group (from John Rogers, slightly edited here):

I am curious about what defines “Big Data”. Is it considered a population or a large sample? Are we talking parameters or statistics?

My reply:

Anyone interested in the term “Big Data” and its implications would do well to read Gil Press’ article “A Short History of Big Data.”
When you generalize from the data you have to any other case, that data is a sample. So, the results of Big Data analysis are statistics, not parameters. However, Big Data sources are usually convenience samples, not random samples, so the random-sampling assumptions behind classical statistical inference (confidence intervals, significance tests) are not met. That’s one reason why we should be cautious in interpreting Big Data, and not get too uppity about our fancy analyses.
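To make the convenience-sample point concrete, here is a minimal Python simulation built entirely on assumptions. It uses a hypothetical population of 100,000 values drawn from a normal distribution with mean 100 and standard deviation 20. It also uses a hypothetical selection rule, `selection_probability`, under which larger values are more likely to end up in the data. The names and numbers are illustrative only and do not come from any real data source. The simulation compares a small simple random sample with a much larger convenience sample.

```python
import random
import statistics


def selection_probability(x: float) -> float:
    # Hypothetical rule: cases with larger values are more likely to be
    # captured in the data (e.g. heavier users generate more records).
    return min(1.0, max(0.0, (x - 60.0) / 80.0))


def simulate(seed: int = 0):
    rng = random.Random(seed)

    # Hypothetical population; its mean is the parameter we want to know.
    population = [rng.gauss(100.0, 20.0) for _ in range(100_000)]
    true_mean = statistics.fmean(population)

    # Small simple random sample: every member is equally likely to be chosen.
    random_sample = rng.sample(population, 500)

    # Large convenience sample: inclusion depends on the value itself.
    convenience = [x for x in population if rng.random() < selection_probability(x)]

    return (
        true_mean,
        statistics.fmean(random_sample),
        statistics.fmean(convenience),
        len(convenience),
    )


if __name__ == "__main__":
    true_mean, random_mean, convenience_mean, n_convenience = simulate()
    print(f"population mean:        {true_mean:.1f}")
    print(f"random sample (n=500):  {random_mean:.1f}")
    print(f"convenience (n={n_convenience}): {convenience_mean:.1f}")
```

Under these assumptions, the 500-case random sample usually lands within about two units of the population mean. The convenience sample holds tens of thousands of cases, yet it sits several units above the population mean. Adding more data of the same kind does not remove that bias.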


News galore

Data Mining for Dummies, my epic tome for beginning data miners, is available now.

Here’s the scoop:

Data Mining for Dummies is an easy-to-read new book for beginners in data mining, published by John Wiley and Sons and available through your favorite bookseller.
Data Mining for Dummies is for business people, information technology professionals and students who want to…
• Know what data mining is all about
• See what’s really involved in data mining, icky parts and all
• Find friendly expert guidance for getting started as a hands-on data miner
Data Mining for Dummies is written in a light, yet no-nonsense, style for readers who are new to data mining. You won’t need any special expertise to read and understand this book.
Beginners can learn the basics of data mining, including:
• Understanding data mining concepts
• Embracing a comprehensive data mining process
• Planning for data mining
• Gathering data from internal, public and commercial sources
• Preparing data for exploration and predictive modeling
• Building predictive models
• Selecting software and dealing with vendors
Author Meta S. Brown is a hands-on data miner who has educated thousands of beginners from industry, government and academia in the fundamentals of data mining. She’s known in the analytics community for her articles, books and talks on data mining, text mining and classical statistics, reaching out to audiences from novices to working professionals.
Here’s what Tom Khabaza, pioneering data miner and Founding Chairman of the Society of Data Miners, has to say about Data Mining for Dummies:
Meta S. Brown tells it like it is, more than anyone else in the field.
Data Mining for Dummies is the first data mining book for beginners which gives an accurate picture of what we data miners do. This is a landmark for the profession, and an essential tool for anyone learning or teaching practical data mining. I will be recommending it to everyone I meet: business people, students and teachers alike.
Where to find Data Mining for Dummies:
• Your favorite independent bookseller (find one on Indiebound)
• Powell’s City of Books
• Barnes and Noble
• Ask your local library to get it. ISBN: 978-1-118-89317-3


When Less (Data) Is More (Information)

Have Big Data? Here’s why you still need to understand sampling.



Mo’ Data Blues

There’s a lot that Big Data can’t do.

