Tuesday, 21 of November of 2017


Big Data: A Big Trap for Product Development?

Kathleen Morrissey, a Partner at Strategy 2 Market (s2m), will present “Big Data: A Big Trap for Product Development?” at Chicago Product Management Association on Thursday, August 9, 2012.

What an important topic to tackle! How easy it can be to invest a fortune in a solution in search of a problem. I’ll be in the audience, and I hope some of my data-focused colleagues will also attend and add some life to the discussion.


Chicago Web Analytics Meetup has a new home

The Chicago Web, Game and Social Media Analytics Meetup has been around several years and has developed a substantial membership. Now, the group has a new home. Thoughtworks, a global IT consultancy based in Chicago, will host meetings at their headquarters at 200 East Randolph. Last week, I presented “Crossing the Language Chasm: Extracting Information from Foreign-Language Text” for the group at the new location, and it was a pleasure. The space is roomy, comfortable and a great match for this use. The meeting was well attended, and I expect that the new space will help to build attendance.

If you didn’t get to attend the presentation, you can read the original article on Smart Data Collective:

Crossing the Language Chasm: Extracting Information from Foreign-Language Text


Upcoming presentations

Social Media Analytics Summit, April 17-18, San Francisco
Capitalize on Multi-lingual Social Media Analytics

European Text Analytics Summit, April 23-24, London
Cross-lingual Text Analytics: A New Frontier in Linguistic Technology

Chicago Web, Game and Social Media Analytics Group, May 2, Chicago (Free!)
Crossing the Language Chasm: Extracting Information from Foreign-Language Text

Predictive Analytics World, June 25-26, Chicago
Cross-Language Text Analytics: Overcoming Language Barriers


Stuff I learned from my web stats

Visits have increased by nearly a factor of ten since I added a blog to the site.

Pre-blog, activity varied little by time of day. Post-blog, there’s a clear spike in activity when posts appear in the morning.

A lot of people visit my site using the links left with comments on other sites.

Google’s spiders crawl the site with amazing frequency! (You’d think I was CNN or something.)

There was a dramatic spike in visits on February 28. (I have no idea why.)

Someone found me by searching for the term “douchegrammer.” (Wonder if he or she was pleased or disappointed?)



Men and women want mostly the same stuff

This is hilarious – a summary of research on the wants of men and women, as expressed in social media, from Netbase. We have a lot more similarities than differences! You must read this for yourself, but here’s a hint – we all want food!

BTW, I tweeted about this a few days ago, just once, and was stunned when I noticed that my tracking link was drawing hundreds of clicks. Turns out that the tweet got picked up by msn now. I would never have known if I hadn’t tracked. The moral: use tracking links and you may learn something!



Responses to Big Data Blasphemy

Here are a couple of interesting comments on my recent article on Smart Data Collective, “Big Data Blasphemy: Why Sample?”

From Simon Geletta, Associate professor at Des Moines University, commenting in the SAS Analytics & BI group on LinkedIn:

“That (everything being equal) bigger samples result in narrower confidence intervals is not a matter of opinion… The argument for sampling as presented in this blog, would benefit (become worth following) if the blogger can demonstrate that estimations that are based on sample (from a bad sampling frame) yield better results as compared to estimations that are based on the bad sampling frame itself.”

My thoughts:

Yes, it is a matter of fact that bigger samples result in narrower confidence intervals.

The case for sampling does not depend on samples from a data resource producing better estimates than using all of the data available. There are cases in which the sample data can be more carefully inspected, corrected and otherwise cleaned, and in those cases estimates may, indeed, be better than those which would be made using a larger, dirtier, data resource. However, even when there is no improvement in data quality for a sample, there is always the issue of balancing the resources consumed with the value of the information obtained. The question is not whether the estimate obtained using a sample is better than the estimate that would be obtained by using all available data. Rather, it is a matter of obtaining “good-enough” information to address the business problem at hand, and of doing so in a manner that does not waste resources which could yield greater returns if used in some other way.
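
To put rough numbers on that trade-off, here is a minimal sketch. It assumes simple random sampling, a normal approximation, and a made-up standard deviation of 10; the function name ci_half_width is mine, purely for illustration. Under those assumptions the half-width of a 95% confidence interval for a mean is about 1.96 times the standard deviation divided by the square root of the sample size, so each tenfold increase in data narrows the interval by only a factor of about 3.2.

```python
import math


def ci_half_width(std_dev: float, n: int, z: float = 1.96) -> float:
    """Approximate half-width of a 95% confidence interval for a mean.

    Assumes simple random sampling and a normal approximation;
    z = 1.96 is the usual two-sided 95% critical value.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return z * std_dev / math.sqrt(n)


if __name__ == "__main__":
    assumed_std_dev = 10.0  # hypothetical value, for illustration only
    for n in (1_000, 10_000, 100_000, 1_000_000):
        half_width = ci_half_width(assumed_std_dev, n)
        print(f"n = {n:>9,}   half-width = {half_width:.4f}")
```

The interval does keep narrowing as the sample grows, which is the commenter's point. It narrows more and more slowly, though, while the cost of handling the data keeps rising, and that is why "good enough" can be the sensible target.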

From Blaise Egan, Lead Network Infrastructure Analyst at British Telecommunications PLC, commenting in the Predictive Analytics group on LinkedIn:

“This reflects my own experience encountering data miners with a background in computer science rather than statistical science. To some extent they have been hoodwinked by misleading sales material from vendors of large-scale computing systems, both hardware and software.

It’s an important message that you’re putting out.”


Big Data Blasphemy: Why Sample?

New post on Smart Data Collective today, “Big Data Blasphemy: Why Sample?”


From the horse’s mouth

The New York Times report about Target’s pregnancy prediction model made a splash in 2012, but Predictive Analytics World had it in 2010, presented by Target’s own Andrew Pole. His talk is much clearer on just how hard and imperfect this process really is. See the video.


On translation and text analytics

Lately I have been speaking out against translating text with the aim of feeding it into text analytics tools that were not designed for use with the original language of the text. On several occasions, some conscientious person has stepped up and announced that, on the contrary, this practice is producing good results. All of these have been respected colleagues, and I don’t care to pick on them individually.

But, speaking to all of you as a group, here’s something you need to know: your customers are coming to me behind your back and telling me your results are lousy.


The Asian Banker – Rising to the challenge of cross-lingual sentiment analysis

The Asian Banker has established a working group focused on data and analytics. They offer specialized content and events for this audience, in a members-only environment. Today they are featuring my piece, “Rising to the challenge of cross-lingual sentiment analysis.”