Friday, 22 February 2019


News galore

Data Mining for Dummies, my epic tome for beginning data miners, is available now.

Here’s the scoop:

Data Mining for Dummies is an easy-to-read new book for beginners in data mining, published by John Wiley and Sons and available through your favorite bookseller.
Data Mining for Dummies is for business people, information technology professionals and students who want to…
• Know what data mining is all about
• See what’s really involved in data mining, icky parts and all
• Find friendly expert guidance for getting started as a hands-on data miner
Data Mining for Dummies is written in a light, yet no-nonsense, style for readers who are new to data mining. You won’t need any special expertise to read and understand this book.
Beginners can learn the basics of data mining, including
• Understanding data mining concepts
• Embracing a comprehensive data mining process
• Planning for data mining
• Gathering data from internal, public and commercial sources
• Preparing data for exploration and predictive modeling
• Building predictive models
• Selecting software and dealing with vendors
Author Meta S. Brown is a hands-on data miner who has educated thousands of beginners from industry, government and academia in the fundamentals of data mining. She’s known in the analytics community for her articles, books and talks on data mining, text mining and classical statistics, reaching out to audiences from novices to working professionals.
Here’s what Tom Khabaza, pioneering data miner and Founding Chairman of the Society of Data Miners, has to say about Data Mining for Dummies:
Meta S. Brown tells it like it is, more than anyone else in the field.
Data Mining for Dummies is the first data mining book for beginners which gives an accurate picture of what we data miners do. This is a landmark for the profession, and an essential tool for anyone learning or teaching practical data mining. I will be recommending it to everyone I meet: business people, students and teachers alike.
Where to find Data Mining for Dummies:
• Your favorite independent bookseller (find one on Indiebound)
• Powell’s City of Books
• Barnes and Noble
• Ask your local library to get it. ISBN: 978-1-118-89317-3


How I Met Your Model

Mathematical models: You hear about them in the business media all the time, but what are they? What do they do, how do they work, and how are they created?

How I Met Your Model


Data mining book on the horizon…

Watch for a Spring 2014 release for this new data mining book. I wrote a section devoted to text analytics.

Right now, the title on the description, the title on the image, and the title I have from the editor are all different. And you won’t see me or the other coauthors in the description. Nor are the details of the contents here yet. Hope they’ll get that all sorted out by Spring!


New book: IBM SPSS Modeler Cookbook

I took the summer off from updating this blog, but not from writing! In the days to come, I’ll post about two upcoming books, articles you may have missed, and more.

IBM SPSS Modeler Cookbook, a how-to book for data miners who have some experience with the product, will be on the street this month. Get step-by-step examples and tips from a team of product insiders and crack data miners! We went through the pain of figuring this stuff out, so you don’t have to!

IBM SPSS Modeler Cookbook will be available in paperback and several e-book formats. You can order directly from the publisher or from online booksellers, including Amazon UK, Barnes and Noble, Safari Books Online and O’Reilly.

IBM SPSS Modeler Cookbook on Amazon:

IBM SPSS Modeler Cookbook on the Packt Publishing website:

Book Details
Language : English
Paperback : 382 pages [ 235mm x 191mm ]
Release Date : October 2013
ISBN : 1849685460
ISBN 13 : 9781849685467
Author(s) : Keith McCormick, Dean Abbott, Meta S. Brown, Tom Khabaza, Scott R. Mutchler


New book coming out

Returning from my summer away from blogging, with news.

IBM SPSS Modeler Cookbook, my new book with coauthors Keith McCormick, Dean Abbott, Tom Khabaza and Scott Mutchler, is expected out in November from Packt Publishing.

This is a how-to book for those who have a little experience and want to sharpen their skills.


Selecting Big Data sources for predictive analytics

New article on Smart Data Collective:
Selecting Big Data Sources for Predictive Analytics


Spare Me Tales of Your Massive Data Cluster

If I only had a dollar for every man who has bragged to me about the size of his Hadoop cluster! It’s not the size of your data that matters, but the size of the problem you can solve with it. More on this in my article, “Spare Me Tales of Your Massive Data Cluster”.


Big Data and the insurance industry

Lately I have seen an upswing in press releases and media mentions of predictive analytics in the insurance industry. This industry has been making use of predictive analytics since long before the term was coined, so it’s intriguing to see that there are still many opportunities for improvement.

Working with insurance companies over the years, I’ve seen that rate-setting is generally supported by expert analysis and excellent analytic methods and tools. Yet other applications, such as marketing and process improvement, are often treated rather casually by comparison.

Over the past decade or so, much analytic power has been directed toward fraud analysis in insurance. This is a good application – there are success stories going around with very impressive ROI. The ROI is important, too, because these applications often demand substantial resources, including considerable costs for software and IT support. Many people point to insurance fraud as a model Big Data application. Just one thing to know about fraud detection: while the value seems obvious, it isn’t necessarily the most costly problem facing the industry, nor the analytics opportunity with the best payback.

An insurance company once presented me with a pile of data and asked what I could do with it. First issue: they just sent data – no metadata, not even the names of the fields. Seems they had been led to believe that data mining was so magical that I wouldn’t need to know what those numbers represented. But I did need to know, and after a while they coughed up that information.

Perusing the data, I saw the potential to use it for process improvement. But everyone else was focused on fraud. Would the insurance company care about process improvement? If they could process claims at lower cost, would the savings be attractive?

I was fortunate to have access to an insurance industry insider with a lot of experience, so I gave him a call. He told me, in the bluntest of language, that the cost of fraud was a flea on the posterior by comparison with the routine costs of processing claims. Cha-ching! This might not have been true for all insurers but was clearly a big issue in the industry, and one that none of my competitors were addressing.

There was one more beautiful thing about it – it’s not a Big Data problem. Fraud detection, sooner or later, forces you to touch every individual row of data that comes in the door. Process improvement in general, and this example in particular, can be addressed with relatively small samples at the modeling stage, and there is no need to score every case. You sample, you study, you put what you learn into action (repeat as needed). So the potential returns of process improvement were greater, and the costs lower, than those of fraud applications.
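To make the "sample, study, act" loop concrete, here is a minimal Python sketch. It is an illustration, not the method used on the project described above. The claim IDs, the `fetch_claim` lookup, and the `days_to_process` field are all hypothetical stand-ins. The sketch assumes claims can be looked up by ID, so only the sampled rows are ever read.

```python
import random
import statistics


def draw_claim_sample(n_claims: int, k: int, seed: int = 0) -> list[int]:
    """Pick k distinct claim IDs uniformly at random from 0..n_claims-1."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(range(n_claims), k)


def fetch_claim(claim_id: int) -> dict:
    """Hypothetical lookup; a real one would query the claims system."""
    return {"claim_id": claim_id, "days_to_process": 1 + claim_id % 30}


# Sample: choose 500 claims out of a (pretend) one million.
sample_ids = draw_claim_sample(n_claims=1_000_000, k=500)

# Study: only the sampled rows are fetched and examined.
sample = [fetch_claim(i) for i in sample_ids]
avg_days = statistics.mean(c["days_to_process"] for c in sample)

print(f"Studied {len(sample)} of 1,000,000 claims; "
      f"mean days to process: {avg_days:.1f}")
```

Nothing here scores every case. The study step sees 500 rows, and the sample can be redrawn with a new seed whenever the process changes.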

Ladies and gentlemen, please remember that what counts in analytics is not the size of the data, but the size of the problem you can address.

Big Data Analytics: Reframing Political Campaigns

Planned Parenthood is using Big Data analytics to find its supporters throughout the country – and we’re seeing the results play out in current events. My new post on Smart Data Collective:

Big Data Analytics: Reframing Political Campaigns


Big Data Blasphemy: Why Sample?

New post on Smart Data Collective today, “Big Data Blasphemy: Why Sample?”
