Using “Big Data” to discover the unexpected in science

• O’Reilly Radar: Automated science, deep data and the paradox of information

This is the best post I’ve read recently. It deals with discovering unexpected stories in large amounts of data, and it takes me back full circle to the article that started me off in this Big Data / Data Science direction:

• Wired: The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

Read both – it will be time well spent. They show how far I’ve still got to go, but it is writing like this that keeps me pushing in that direction.

Some extracts from the O’Reilly article:

…we’re excited because we can begin to distill patterns that were previously invisible to us due to a lack of information…

That’s big data.

Of course, data are just a collection of facts; bits of information that are only given context — assigned meaning and importance — by human minds. It’s not until we do something with the data that any of it matters.

…none of that means anything until someone makes a story out of the results.

Big data, data mining, and machine learning are becoming critical tools in the modern scientific arsenal.

…it’s possible to automate, or at least semi-automate, critical aspects of the scientific method itself

Sure, any data scientist worth their salt can take a mountain of data and reduce it down to a few simple plots. And such plots are important because they tell a story. But those aren’t the only stories that our data can tell us.

While it’s good to have a model that fits your data, knowing where the model breaks down is not only important for internal metrics, but it also makes for a more interesting story…

The interpretation of big data analytics can be a messy game.

I’ve formulated three laws of statistical analyses:

• The more advanced the statistical methods used, the fewer critics are available to be properly skeptical.

• The more advanced the statistical methods used, the more likely the data analyst will be to use math as a shield.

• Any sufficiently advanced statistics can trick people into believing the results reflect truth.

…“the more we turn to computers with these big questions, the more they’ll give us answers that we just don’t understand.”

Our goal as (data) scientists should be to distill the essence of the data into something that tells as true a story as possible while being as simple as possible to understand.
