It’s almost become a cliché to say that we have so much information now that our biggest challenge is finding the relevant pieces, not making sense of them. And this is a fairly representative article cheerleading for the new technologies that will help with data search, focusing on data mining and on building a federated solution that can normalize across very different data sources with different base formats. But it is also representative in a misleading way: it blithely says what these technologies will allow us to do, with no discussion of where these projects stand now, what hard problems remain, or how far away these solutions realistically are. Just a sampling of its claims:
“By automatically classifying, summarizing, and discovering the ‘who,’ ‘what,’ ‘where’ and ‘when’ of each document, publishers, government organizations, and enterprises can do more than ever before — on a massive scale.”
“Visualizations allow users to quickly sift through and locate information and patterns in hierarchical, relational, tabular, or time-based data sets;”
“This new generation of solutions will need to go deeper than keyword search — it will require a deep understanding of language, to lend structure to unstructured data for use in downstream analysis and assessment.”
and the closing statement: “The world’s data is at your fingertips – where it should be.”
There is a single acknowledgment of fallibility, where the article mentions that people can review the output of an information extraction system and give feedback to improve its accuracy. But overall, the article reads as if all of these promises are a year or two away. That is just flat wrong, particularly for the claims that systems can handle widely diverse data sources in very different structural formats, including free-form text. On top of that, the ethical barriers to developing and deploying such solutions aren’t mentioned at all.
All of the technologies mentioned do grow out of real problems that researchers want to solve, but the business community is not going to get the universal tools described here any time soon.