The bottom line first: if you are interested in a broad, non-technical introduction to the subject of “Big Data” then you should read this book. It is short and highlights a number of points, some of which aren’t necessarily clear from reading elsewhere.
Importantly, the first chapter says that to be practising “big data” you do not have to be dealing with millions of data points. There may be far fewer; the point is that you should be working with all the data available to you rather than just a sample. With all the data, it is possible to analyze it in different ways. With just a sample, you will likely be limited in what you can discover once the sample has been taken. The authors discuss the very first article I read about this subject, Wired’s The End of Theory. It’s very interesting to read how the article is now regarded.
People may have to get used to the data revealing what is happening without actually revealing why it is happening. In some areas we will have to let go somewhat of the (natural) desire to understand the reasons behind the results.
The authors deal with the subject of data getting “messier” (becoming more imprecise) as you increase the amount you are collecting:
However in many new situations that are cropping up today allowing for imprecision – for messiness – may be a positive feature not a shortcoming. It is a tradeoff. In return for relaxing the standards of allowable errors, one can get a hold of much more data. It isn’t just that “more trumps some” but that, in fact, sometimes “more trumps better”.
Because this data set consists of more data points, it offers far greater value that likely offsets its messiness.
Big Data transforms figures into something more probabilistic than precise.
So more trumps less. And sometimes more trumps smarter.
“Simple models and a lot of data trump more elaborate models based on less data.” (quote from Peter Norvig, Google)
… treating data as something imperfect and imprecise lets us make superior forecasts and thus understand our world better
The chapter on “Datafication” of just about everything is a good balance of history and the insights that can be gleaned from today’s social media giants. Location is particularly important:
The point is that these indirect uses of location data have nothing to do with the routine of mobile communications, the purpose for which the information was initially generated. Rather, once location is datafied new uses crop up and new value can be created.
Datafication is only just starting, but now that it is under way it will continue, with many benefits:
Once the world has been datafied, the potential uses of the information are basically limited only by one’s ingenuity.
Seeing the world as information, as oceans of data that can be explored at ever greater breadth and depth offers us a perspective on reality that we did not have before.
Another important point is that humans will have to get used to the fact that their opinion is not always the best:
… the biggest impact of big data will be that data-driven decisions are poised to augment or overrule human judgement.
This is likely to mean a change in the requirements needed to do a specific job. The importance of experience will diminish as insight from data can dwarf the experience of one person.
Mathematics and statistics, perhaps with a sprinkle of programming and network science, will be as foundational to the modern workplace as numeracy was a century ago and literacy before that.
… the winners will be found among large and small firms, squeezing out the mass in the middle.
Big data squeezes the middle of an industry, pushing firms to be very large, or small and quick, or dead.
Re-use of data is also examined: old data can be combined with new in different ways to discover or exploit new opportunities. So what is the value of data? A company may have relatively few assets but a massive valuation. Is the difference between the two the value of the data the company controls? That could mean billions of pounds, dollars or any other currency.
A number of times the names of sites or companies led me to put the book down and check out a website or install an app. The chapter called “Implications” is particularly good for that, but it does slow down the reading somewhat. Even though the book is this recent, some of the examples are already out of date (for example, Decide.com has shut its doors as its staff joined eBay). This is a fast-moving field.
There is a lot more to this book, which is impressive given that it is only 200 pages long. I’m glad I read it – it puts so much into focus.