UPDATE: I missed SW’s blog post. Brilliant!
Early versions of this approach go back nearly 50 years, to the first phase of artificial intelligence research. And incremental progress has been made—notably as tracked for the past 20 years in the annual TREC (Text Retrieval Conference) question answering competition. IBM’s Jeopardy system is very much in this tradition—though with more sophisticated systems engineering, and with special features aimed at the particular (complex) task of competing on Jeopardy.
Wolfram|Alpha is a completely different kind of thing—something much more radical, based on a quite different paradigm. The key point is that Wolfram|Alpha is not dealing with documents, or anything derived from them. Instead, it is dealing directly with raw, precise, computable knowledge. And what’s inside it is not statistical representations of text, but actual representations of knowledge.
At first blush, the answer would be: NO.
The linguistics are simply not there yet.
However, if Jeopardy questions were more “computational” and less about linguistics and fact retrieval, the answer might be: YES.
Wolfram|Alpha has the raw power to do it, but it lacks the data and linguistic system to do it.
IBM was clever to combine the history of Jeopardy questions with tons of documents. It’s similar to, but not the same as, the common-sense engine from Cyc. It’s not fully computational knowledge. It’s semantic. Its cleverness comes from the depth of the question training set and the document training set.
It would break down quickly if it saw questions about facts that had never been printed in a document before. An example would be “How far away will the moon be tomorrow?”
Wolfram|Alpha can answer that! Now, what’s challenging is that the universe of questions that have never been asked is much bigger than the universe of those that have! So Wolfram|Alpha already has far more knowledge. However, its linguistics are not strong enough to clearly demonstrate that AND they will probably never catch up! Because Wolfram|Alpha can answer questions that have never been asked, people will always ask it questions that will trip it up… they will always push the linguistics.
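To make the moon example concrete, here is a rough sketch of what “computing” the answer, rather than looking it up in a document, can look like. This is not how Wolfram|Alpha does it; the function name `moon_distance_km` is mine, and the constants are only the largest few terms of a standard truncated lunar-distance series, so treat the result as a ballpark figure, not an ephemeris.

```python
import math
from datetime import datetime, timedelta, timezone

# J2000 epoch, treated here as UTC (an approximation; the epoch is defined in TT).
J2000 = datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc)

def moon_distance_km(when):
    """Approximate Earth-Moon distance (centre to centre) in km.

    `when` must be a timezone-aware datetime. Only the four largest
    periodic terms are kept, so this is a rough estimate.
    """
    # Julian centuries elapsed since J2000.
    t = (when - J2000).total_seconds() / (36525.0 * 86400.0)
    # Mean elongation of the Moon from the Sun, and the Moon's mean anomaly.
    d = math.radians(297.8501921 + 445267.1114034 * t)
    m = math.radians(134.9633964 + 477198.8675055 * t)
    return (385000.56
            - 20905.355 * math.cos(m)
            - 3699.111 * math.cos(2 * d - m)
            - 2955.968 * math.cos(2 * d)
            - 569.925 * math.cos(2 * m))

# "How far away will the moon be tomorrow?"
tomorrow = datetime.now(timezone.utc) + timedelta(days=1)
print(f"{moon_distance_km(tomorrow):,.0f} km")
```

The point is that no document anywhere needs to contain tomorrow’s number: it falls out of a model plus a date, which is exactly the kind of question a pure retrieval system has nothing to match against.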
In the end, a combination of Watson, Wolfram|Alpha and Cyc could be very fun indeed!
Perhaps we should hack that up?