Have Google and Bing Gotten Better At Answering Questions?
A Survey of Test Queries from Lotfi Zadeh’s 2006 Work
The father of Fuzzy Logic, Lotfi Zadeh, set out a series of test queries in a 2006 presentation to demonstrate the difficulty of parsing natural language questions and answering them with useful information. He ran them against Google, the most advanced search engine at the time; this article replays his test queries in 2020 to see what progress both Google and Bing have made.
Elementary, My Dear Watson
Lotfi Zadeh, the father of Fuzzy Logic and a renowned AI researcher, begins his 2006 presentation “From Search Engines to Question-Answering Systems — The Problems of World Knowledge, Relevance, Deduction and Precisiation” by praising the many advances and features of the search engines of the day, singling out Google as being at the top of its game and regularly making improvements.
Yet within this compliment sandwich lies the unsavory meat of a proposition: that search engines like Google lack deductive capability.
That is to say: they cannot “synthesize an answer to a query by drawing on bodies of information that reside in various parts of a knowledge base.”
They are not true “Question/Answer” systems.
Zadeh lists the problems confronting the development of such Q/A systems:
- World Knowledge: the sort of knowledge and information humans gain by experience, which he points out is heavily influenced by perception
- Relevance: every search engine deals with this in its own way; choosing which sources of information to synthesize an answer from, or even displaying the most relevant answer every time, remains a difficult, unsolved problem
- Deduction: drawing conclusions from imprecise information, as in “Usually Robert returns from work at 6pm. What is the probability he arrives at 6:15?”
All three are subject to the same fundamental problem: natural language processing. To handle NLP, he argues, we must also solve the problem of:
- Precisiation of meaning: construction of a computational model of an information-carrying proposition (words, sentences, questions, etc.)
This last problem, including the very definition of “precisiation” as Zadeh uses it, is explored in more depth in his presentation “Precisiation of Meaning—Toward Computation with Natural Language.”
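To make precisiation a little more concrete, here is a minimal Python sketch that turns the two imprecise parts of the Robert example into membership functions: “usually,” and “at 6pm” read as the fuzzy “about 6pm.” The shapes and numbers (a triangular fuzzy number with a 30-minute half-width, and a linear ramp for “usually” between proportions of 0.5 and 0.8) are assumptions chosen for illustration, not values from Zadeh’s slides, and the sketch stops short of computing the probability the question actually asks for.

```python
# Minimal sketch: precisiating two fuzzy terms from the "Robert" example.
# ASSUMPTIONS (not taken from Zadeh's slides): "about 6pm" is a symmetric
# triangular fuzzy number with a 30-minute half-width, and "usually" rises
# linearly from 0 at a proportion of 0.5 to 1 at a proportion of 0.8.


def triangular(x: float, center: float, half_width: float) -> float:
    """Membership grade of x in a symmetric triangular fuzzy number."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def about_6pm(minutes_after_midnight: float) -> float:
    """Degree to which a time of day counts as 'about 6pm' (6pm = 1080 min)."""
    return triangular(minutes_after_midnight, center=18 * 60, half_width=30)


def usually(proportion: float, low: float = 0.5, high: float = 0.8) -> float:
    """Degree to which a proportion of occasions counts as 'usually'."""
    if proportion <= low:
        return 0.0
    if proportion >= high:
        return 1.0
    return (proportion - low) / (high - low)


if __name__ == "__main__":
    print(about_6pm(18 * 60 + 15))  # 6:15pm -> 0.5
    print(usually(0.9))             # -> 1.0
```

Under these assumptions 6:15 belongs to “about 6pm” only to degree 0.5; widening the half-width to 60 minutes would raise that grade to 0.75, which shows how much the answer depends on how the words are precisiated.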
We could easily continue down the rabbit hole of all the problems facing NLP and the precisiation of information, but this article’s goal is far simpler: to see whether Google’s search engine (and its competitors) can now pass the Q/A-style test queries that Lotfi Zadeh presented more than a decade ago.
Test Queries
The following queries are taken verbatim from his 2006 presentation, and I’ve interpreted the results to make it clear whether each search engine “passes” or not.
All queries were submitted to Google and Bing.
Query #1: What is precisiation?
When Zadeh originally submitted this query to Google in 2006, it simply returned articles containing the word “precisiation.”
His point was that Google in 2006 made no attempt to treat the question as a question that needed answering; it simply fell back to matching keywords. Had Google possessed deductive capabilities, it could have learned the specialized meaning of “precisiation” and presented a synthesized definition (since it couldn’t simply look the word up in Merriam-Webster).
In 2020, the search engines treat the query thus:
Google: No Improvement
Much like in 2006, we get keyword matches first, along with the Italian Wiktionary entry for derivatives of “precisare,” which has a root meaning similar to “precisiation.”
Unlike Bing, it recognizes that the word is valid in scholarly articles and doesn’t default to correcting it to “precipitation.”
Bing: Precipitation Auto-correct, but Good Definition Available
Bing immediately assumes I’ve misspelled “precipitation” but, since there are results for “precisiation” as well, offers me the option to override its decision. Google normally does this too when there aren’t enough matches, but in this case Bing was the more aggressive auto-corrector.
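Neither engine publishes its correction rules, but a common textbook approach compares candidate spellings by edit distance and by how often each term appears. The Python sketch below is that generic technique, not Google’s or Bing’s method; the term counts in TERM_COUNTS and the ratio threshold are invented placeholders. It shows why a rare word only two edits away from a very common one is an easy target for auto-correction.

```python
from typing import Optional

# Minimal sketch of frequency-based spelling correction.
# ASSUMPTIONS: TERM_COUNTS holds invented placeholder counts, and the
# thresholds are arbitrary; this is NOT how Google or Bing actually decide.
TERM_COUNTS = {"precipitation": 1_000_000, "precisiation": 500}


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def suggest(term: str, max_distance: int = 2, ratio: float = 100.0) -> Optional[str]:
    """Return a close, much more frequent term to correct to, or None."""
    own = max(TERM_COUNTS.get(term, 0), 1)
    best: Optional[str] = None
    for candidate, count in TERM_COUNTS.items():
        if candidate == term or levenshtein(term, candidate) > max_distance:
            continue
        if count >= ratio * own and (best is None or count > TERM_COUNTS[best]):
            best = candidate
    return best


print(levenshtein("precisiation", "precipitation"))  # 2
print(suggest("precisiation"))                       # precipitation
print(suggest("precisiation", ratio=10_000))         # None
```

With these made-up counts the sketch corrects “precisiation” to “precipitation,” while demanding a 10,000-fold frequency gap instead of a 100-fold one leaves the query alone; a stricter threshold like that is one way an engine could end up less aggressive.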
After I selected the “Do you want results only for ‘What is precisiation?’?” link, Bing showed a quote box containing an actual definition from Lotfi Zadeh himself, taken from his work “Truth and Meaning.”
What’s interesting is that Bing, at least, recognized that a definition existed in the slide presentation it links to, and managed to select the most appropriate definition from a series of bullet points that mention precisiation.
Was this human-driven or automatic? If the latter, it’s impressive that Bing was able to discern that “Precisiation serves as a bridge between natural languages and mathematics” and “Precisiation of meaning is not a traditional issue. Precisiation of meaning goes beyond representation of meaning” were less definitive than the quote it promoted to the top result.
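If the selection was automatic, even simple surface rules can separate definitional sentences from descriptive ones. The following Python sketch scores candidate sentences with three hand-written heuristics: the term opens the sentence, a copula plus article such as “is the” appears, and “is not” is penalized. These rules are illustrative assumptions, not Bing’s technique, and the third candidate is a paraphrase of this article’s own bullet-point definition of precisiation rather than the quote Bing actually displayed.

```python
import re
from typing import List

# Minimal sketch of a rule-based "which sentence is the definition?" scorer.
# ASSUMPTION: these three hand-written rules are illustrative only; nothing
# here reflects how Bing actually chose its quote.


def definition_score(sentence: str, term: str) -> int:
    """Higher scores mean the sentence reads more like a definition of term."""
    score = 0
    if re.match(rf"{re.escape(term)}\b", sentence.strip(), re.IGNORECASE):
        score += 1  # the term opens the sentence, i.e. it is the subject
    if re.search(r"\bis (a|an|the)\b", sentence, re.IGNORECASE):
        score += 2  # copula plus article: "X is a/an/the ..."
    if re.search(r"\bis not\b", sentence, re.IGNORECASE):
        score -= 2  # negative statements say what X is not
    return score


def best_definition(sentences: List[str], term: str) -> str:
    """Pick the highest-scoring candidate (the first one wins ties)."""
    return max(sentences, key=lambda s: definition_score(s, term))


candidates = [
    "Precisiation serves as a bridge between natural languages and mathematics.",
    "Precisiation of meaning is not a traditional issue. "
    "Precisiation of meaning goes beyond representation of meaning.",
    # HYPOTHETICAL stand-in: a paraphrase of this article's bullet-point
    # definition, not the quote Bing actually displayed.
    "Precisiation of meaning is the construction of a computational model "
    "of an information-carrying proposition.",
]
print(best_definition(candidates, "precisiation"))
```

Rules like these are brittle (a sentence such as “Precisiation is a hard problem” would also score well), which is one reason the question of whether Bing’s pick was automatic is worth asking.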
I clicked the little thumbs-up button to let them know they were on the right track. I also set myself a reminder to ask the researchers at Bing how that response was generated.
Note, however, that the rest of the results are still fixated on “precipitation!”
Query #2: What is the population of the capital of New York?
The second query is technically “q₂” from Zadeh’s slides, where “q₁” was the simpler “What is the capital of New York?” and served as a baseline for explaining the more complex “q₂.”
Zadeh wanted to test whether Google could connect two different parts of its knowledge base through a relationship between them. In other words, Google would need not only the population data for Albany, but also the ability to deduce that it must first solve the sub-question “What is the capital of New York?” and then use that answer as the key for looking up the answer to the full question.
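The decomposition Zadeh was probing can be sketched as a two-hop lookup over a toy knowledge base, where the answer to the sub-question q₁ becomes the key for q₂. This is a minimal Python illustration under stated assumptions: the KB table and the resolve helper are hypothetical, and the population value is a placeholder rather than real census data or any search engine’s internals.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

# Toy knowledge base mapping (subject, relation) -> object.
# ASSUMPTION: the population value is a HYPOTHETICAL placeholder, not census data.
KB: Dict[Tuple[Any, str], Any] = {
    ("New York", "capital"): "Albany",
    ("Albany", "population"): 100_000,  # placeholder
}


def resolve(entity: str, path: Sequence[str]) -> Optional[Any]:
    """Follow a chain of relations; each hop's answer keys the next lookup."""
    current: Any = entity
    for relation in path:
        current = KB.get((current, relation))
        if current is None:
            return None  # a sub-question had no answer, so the chain stops
    return current


print(resolve("New York", ["capital"]))                # q1: Albany
print(resolve("New York", ["capital", "population"]))  # q2: 100000 (placeholder)
```

The key design point is that the query is expressed as a path of relations rather than a bag of keywords, so “capital” and “population” are applied in order instead of being matched independently.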
As you can see from the second slide, Google could not do this in 2006.
In 2020?
Google: Perfect Answer
Google now passes this test with flying colors, instantly naming the capital of New York along with the requested population data and a helpful graph of how the population has changed over time.
Bing: Perfect Answer
Bing also passes the test, clearly stating the population as requested along with the city name. Note that the top info box containing the population does change if you simplify the question to “What is the capital of New York?”, so it reacts appropriately to the sub-question as well.
Query #3: What is the distance between the largest city in Spain and the largest city in Portugal?
While similar to Query #2, this one is far more difficult: the relationship is no longer a single chain of key lookups (New York capital -> Albany -> Albany.population) but two independent fact discoveries followed by a calculation based on the cities’ relative positions.
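Here is a minimal Python sketch of that harder pipeline: two independent lookups (the largest city of each country) followed by a great-circle distance calculation using the haversine formula. The LARGEST_CITY and COORDINATES tables are hand-entered assumptions with approximate city-centre coordinates, not the output of any search engine.

```python
import math
from typing import Dict, Tuple

# ASSUMPTION: hand-entered toy facts. Coordinates are approximate city-centre
# values in decimal degrees (latitude, longitude).
LARGEST_CITY: Dict[str, str] = {"Spain": "Madrid", "Portugal": "Lisbon"}
COORDINATES: Dict[str, Tuple[float, float]] = {
    "Madrid": (40.4168, -3.7038),
    "Lisbon": (38.7223, -9.1393),
}
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_between_largest_cities(country_a: str, country_b: str) -> Tuple[str, str, float]:
    """Resolve two independent facts, then compute from their positions."""
    city_a = LARGEST_CITY[country_a]  # independent fact lookup 1
    city_b = LARGEST_CITY[country_b]  # independent fact lookup 2
    km = haversine_km(COORDINATES[city_a], COORDINATES[city_b])  # calculation
    return city_a, city_b, km


city_a, city_b, km = distance_between_largest_cities("Spain", "Portugal")
print(f"{city_a} to {city_b}: about {km:.0f} km as the crow flies")
```

This yields a straight-line distance of roughly 500 km, which is a different quantity from the driving distance a route calculator returns.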
In 2006, Google was only able to make travel suggestions, as though you were planning a trip instead of asking about a specific fact.
Now?
Google: Simple Keyword Matching
Google is unable to parse the question and simply attempts to match portions of the keywords.
Bing: Simple Keyword Matching and Tool Suggestion
Bing is much the same, suggesting a driving-distance calculator first and keyword-matched results second.
Query #4: Age of son of Chirac
The fourth test query, like #2, requires solving a sub-question before the primary question can be answered: “Who is the son of Chirac?”
Note that an earlier query, “Age of Chirac,” makes it clear that the Chirac in question is Jacques René Chirac, the former Prime Minister and President of France.
Based on Zadeh’s definition of precisiation, one would expect the question to be broken down into a model involving Chirac.children.son and that son’s birthdate, followed by a calculation of his current age from today’s date.
Yet the tricky thing about this question is that Chirac had no sons, just daughters. A system with deductive capability would tell the user this fact in place of the requested answer.
Or perhaps make a suggestion about his son-in-law(s).
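The model described above, with a deductive fallback, might look like the following minimal Python sketch. The CHILDREN table is hand-entered: Chirac’s two daughters are listed without birthdates, while “Hypothetical Parent,” “Hypothetical Son,” and his birthdate are invented solely to exercise the age calculation. When no child matches the requested relation, age_of_child (a hypothetical helper) reports that fact instead of guessing.

```python
from datetime import date
from typing import Dict, List

# ASSUMPTION: a hand-entered toy family graph. Jacques Chirac's two daughters
# are listed without birthdates; "Hypothetical Parent", "Hypothetical Son", and
# that birthdate are invented purely to exercise the age calculation.
CHILDREN: Dict[str, List[dict]] = {
    "Jacques Chirac": [
        {"name": "Laurence Chirac", "relation": "daughter"},
        {"name": "Claude Chirac", "relation": "daughter"},
    ],
    "Hypothetical Parent": [
        {"name": "Hypothetical Son", "relation": "son", "born": date(1990, 6, 15)},
    ],
}


def age_on(born: date, today: date) -> int:
    """Whole years elapsed, minus one if this year's birthday hasn't come yet."""
    before_birthday = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - int(before_birthday)


def age_of_child(parent: str, relation: str, today: date) -> str:
    """Answer 'age of <relation> of <parent>', or explain why it can't."""
    children = CHILDREN.get(parent, [])
    matches = [c for c in children if c["relation"] == relation]
    if not matches:
        # Deductive fallback: report the missing premise instead of guessing.
        known = ", ".join(f'{c["name"]} ({c["relation"]})' for c in children)
        return f"{parent} has no {relation} on record; known children: {known or 'none'}"
    return "; ".join(f'{c["name"]}: {age_on(c["born"], today)}' for c in matches)


print(age_of_child("Jacques Chirac", "son", date(2020, 1, 1)))
print(age_of_child("Hypothetical Parent", "son", date(2020, 1, 1)))
```

Listing the known children in the fallback message is a small step toward the kind of substitute answer described above, since it hands the user the closest facts the system actually has.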
Still, what do we get from the current search engines?
Google: Keyword Matching with Notoriety
Google appropriately identifies the father (Chirac), but fails to comprehend the reference to a son, or the lack of one. Further, the returned results are clearly keyword matches.
Bing: Keyword + Some Conceptual Matching (Age as Dates)
Bing returns results similar to Google’s for this query and is likewise unable to parse the meaning of the question. Unlike Google, however, it highlights references to dates, death, and age in the numeric sense, having correlated the query keyword “age” with that concept.
Query #5: How many Ph.D. degrees in mathematics were granted by European Universities in 1986?
This final question requires not only a conceptual understanding of different components in a knowledge base, but also the aggregation of data, under time constraints, across a whole class of different data sources.
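The shape of that computation (filter records on field, region, and year, reconcile duplicates reported by several sources, then sum) can be sketched in Python as below. Every record is a hypothetical placeholder, so no real degree statistics are used, and keeping the first reported figure for a duplicated institution is an arbitrary simplification.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class DegreeRecord:
    source: str       # which data source reported the figure
    institution: str
    region: str
    field: str
    year: int
    phd_count: int


def count_phds(records: Iterable[DegreeRecord], field: str, region: str, year: int) -> int:
    """Sum PhD counts matching all constraints, counting each institution once."""
    per_institution: Dict[Tuple[str, str, int], int] = {}
    for r in records:
        if r.field == field and r.region == region and r.year == year:
            # Several sources may report the same institution; keep the first.
            per_institution.setdefault((r.institution, r.field, r.year), r.phd_count)
    return sum(per_institution.values())


# HYPOTHETICAL records: sources, institutions, and counts are all invented.
records = [
    DegreeRecord("source_a", "University A", "Europe", "mathematics", 1986, 12),
    DegreeRecord("source_b", "University A", "Europe", "mathematics", 1986, 12),  # duplicate
    DegreeRecord("source_b", "University B", "Europe", "mathematics", 1986, 7),
    DegreeRecord("source_a", "University C", "Europe", "physics", 1986, 20),
    DegreeRecord("source_c", "University D", "North America", "mathematics", 1986, 30),
    DegreeRecord("source_c", "University B", "Europe", "mathematics", 1987, 9),
]
print(count_phds(records, "mathematics", "Europe", 1986))  # 19
```

A real system would also have to decide what counts as a “European university” and reconcile sources that disagree, which is much of the difficulty the query exposes.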
It’s no surprise that Google in 2006 could do no more than suggest articles tangentially related to the keywords in the question.
And, sadly, no significant progress has been made on this front by either search engine. The best either can do is keyword matching.
So Have They Gotten Better?
In short: More than ten years later these questions still stump search engines.
Are the search engines able to answer questions requiring any sort of deductive reasoning or parsing of inferred sub-questions? No.
There is some improvement in handling simple relationships (see Query #2), but anything requiring more “human” experience to understand falls flat on its face.
Having posed these piercing questions as a quick way to gauge how well a system could act as a Question/Answer engine, Lotfi Zadeh went on to write numerous works on solving general deduction problems and NLP.
Of course, it can always be argued that Google and Bing are not, and are not intended to be, Q/A systems.
On the other hand, there is great market value in being able to answer your customers’ questions before someone else can, which explains why both companies are so strongly motivated to incorporate more Artificial Intelligence into their products quickly.
Think of these test queries as highlighting capabilities that future search engines will have as they evolve closer to the Q/A model, rather than as a condemnation of the keyword-matching systems that current-generation search engines employ.