Using “Big Data” to discover the unexpected in science
• O’Reilly Radar: Automated science, deep data and the paradox of information
This is the best post I’ve read recently; it is about discovering unexpected stories in large amounts of data. It takes me back full circle to the article that started me off in this Big Data / Data Science direction:
Read both – it will be time well spent. They show how far I’ve still got to go, but it is writing like this that keeps me pushing in that direction.
Some extracts from the O’Reilly article:
…we’re excited because we can begin to distill patterns that were previously invisible to us due to a lack of information…
That’s big data.
Of course, data are just a collection of facts; bits of information that are only given context — assigned meaning and importance — by human minds. It’s not until we do something with the data that any of it matters.
…none of that means anything until someone makes a story out of the results.
Big data, data mining, and machine learning are becoming critical tools in the modern scientific arsenal.
…it’s possible to automate, or at least semi-automate, critical aspects of the scientific method itself
Sure, any data scientist worth their salt can take a mountain of data and reduce it down to a few simple plots. And such plots are important because they tell a story. But those aren’t the only stories that our data can tell us.
While it’s good to have a model that fits your data, knowing where the model breaks down is not only important for internal metrics, but it also makes for a more interesting story…
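To make that point about model breakdown concrete for myself, here is a toy sketch. It is not from the O’Reilly article: the data is synthetic, and the names `fit_line` and `breakdown_points`, the tolerance, and the choice of a straight-line model are all my own assumptions for illustration. The idea is to fit a model on the range where you trust it, then list the points where it stops describing the data.

```python
# Illustrative sketch only: synthetic data and hypothetical helper names.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def breakdown_points(xs, ys, slope, intercept, tolerance):
    """Return the x values where the fitted line misses y by more than tolerance."""
    return [
        x for x, y in zip(xs, ys)
        if abs(y - (slope * x + intercept)) > tolerance
    ]


# Synthetic data: perfectly linear up to x = 7, then the trend flattens out.
xs = list(range(10))
ys = [2.0 * x for x in range(8)] + [14.0, 14.0]

# Fit only on the first six points (the range assumed to be well behaved).
slope, intercept = fit_line(xs[:6], ys[:6])

# Check the whole data set against that fit.
breakdown = breakdown_points(xs, ys, slope, intercept, tolerance=1.0)
print(breakdown)
```

The plot of the fitted line is one story; the list of points where it fails is the other, and often the more interesting one.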
The interpretation of big data analytics can be a messy game.
I’ve formulated three laws of statistical analyses:
• The more advanced the statistical methods used, the fewer critics are available to be properly skeptical.
• The more advanced the statistical methods used, the more likely the data analyst will be to use math as a shield.
• Any sufficiently advanced statistics can trick people into believing the results reflect truth.
…“the more we turn to computers with these big questions, the more they’ll give us answers that we just don’t understand.”
Our goal as (data) scientists should be to distill the essence of the data into something that tells as true a story as possible while being as simple as possible to understand.