Thursday, March 15, 2012

Google's Assault on Semantics

Apparently Google has decided to rise to the challenge of Microsoft’s Bing, which grew out of technology introduced by Powerset, regardless of whether that challenge carries any weight.  According to a report by Amir Efrati, which appeared yesterday on the Web site for The Wall Street Journal, Google is planning a phased transition from its keyword-based search system to a technology that is more “semantic.”  (The quotes indicate that this is Google’s word choice;  but they are also scare quotes to bring attention to the complex nature of semantics in linguistic studies, along with related concepts such as “knowledge” and “understanding.”)

I see from my archives that, back in November of 2009, I accused Google of being a “small boy with a hammer who sees everything as a nail.”  Comparing what I said then with what I read today, I realize that, over the years, Google’s primary objective seems to have been to add more and more (and presumably larger and larger) hammers to its toolbox, possibly at a time when fewer and fewer nails are being used in the world.  Here are three critical paragraphs from Efrati’s article:

Amit Singhal, a top Google search executive, said in a recent interview that the search engine will better match search queries with a database containing hundreds of millions of "entities"—people, places and things—which the company has quietly amassed in the past two years. Semantic search can help associate different words with one another, such as a company (Google) with its founders (Larry Page and Sergey Brin).

Google search will look more like "how humans understand the world," Mr. Singhal said, noting that for many searches today, "we cross our fingers and hope there's a Web page out there with the answer." Some major changes will show up in the coming months, people familiar with the initiative said, but Mr. Singhal said Google is undergoing a years-long process to enter the "next generation of search."

Under the shift, people who search for "Lake Tahoe" will see key "attributes" that the search engine knows about the lake, such as its location, altitude, average temperature or salt content. In contrast, those who search for "Lake Tahoe" today would get only links to the lake's visitor bureau website, its dedicated page on Wikipedia.com, and a link to a relevant map.

Consider, now, the poor soul who wants to plan a ski trip.  By watching local weather reports on San Francisco television, he quickly finds out, particularly at this time of year, that there is good skiing in the Tahoe area.  In contrast to Singhal’s “in contrast” world, a visitor’s bureau page is probably exactly what this guy wants;  he could not care less about the salt content of the lake.

To be fair, however, this vacation-planning guy probably wants something like a visitor’s bureau page because that is what he expects to find when doing a Google search.  In other words, to draw upon some of the pioneering research in semantics by Roger Schank, he thinks about planning a vacation in terms of a “script.”  He formed that script on the basis of past experience that informed him about what he could do with the variety of tools (not just metaphorical hammers) at his disposal.  More importantly, however, his behavior reflects a tight coupling between knowledge and action.  We need knowledge in order to act;  but we also understand that knowledge on the basis of how we act, rather than as some “entity” in a vast network of other entities.

The last time I tried to take on Google about such matters, my post was entitled “Google Gets it Wrong About Service.”  This was an easy topic to pursue, since service is all about action, basically outsourcing to some third party an action that we either cannot or do not want to perform ourselves, on the grounds that the third party can, in some way or another, be counted on to do it more effectively.  If we wish to think of a search engine as providing a service, rather than simply retrieving a bunch of pointers by analyzing some words according to the PageRank algorithm, then we need to account for what the guy who wants that service is doing or wants to do.  Very knowledgeable logicians have cracked their heads over whether concepts involving actions and motives can be expressed through the symbolic primitives that constitute their stock-in-trade;  and they have yet to come up with any particularly viable analytic solutions.

However, the failure of logic may lie in the fact that it is, by design, objective.  Matters of motive, on the other hand, are subjective, usually with considerable social influence.  Thus, a truly “semantic” system will have to have a fair amount of knowledge about your personal psychology and probably the sociology of that corner of the world you inhabit.  Can this be done?  It certainly is a challenging research question, and Google probably has the resources to support an appropriate research program.  However, the more important question is:  Do you really want Google to have a model of your psychology and the sociology of your world?  If you think that advertising is already invasive, imagine what it would be like if Google could tap into your personal psychology and sociology!

1 comment:

jones said...

Back when there was a lot of hype about Wolfram Alpha, and when people were trying to figure out what it was, I performed a little experiment. I typed into Wolfram Alpha the query "1 1 2 3 5 8 13" which it correctly identified as the Fibonacci series. I then typed the same query into Google, which correctly produced the Wikipedia entry for "Fibonacci Number" as the first result. For all the hype about Alpha's "computational knowledge engine," Google demonstrated a remarkable ability to yield similar results as a sort of emergent epiphenomenon of the behavior of many individual netizens. My concern about Google changing their search results so regularly is that, with every change, the bias of Google's engineers will cause certain results to be given priority and others to be suppressed. Users who knew how to find a certain piece of information using Google as a tool may suddenly encounter difficulty relocating that same information. Most of what passes for "knowledge" today is really just a certain skill at manipulating search engines. Now even that debased form of "knowledge" is subject to the same pressures as the 24-hour news cycle -- "knowledge" is transposed into a competitive environment (rather than serving as a repository of accepted wisdom) and made to be a product "of the moment." One simple solution would be for Google to offer users a choice of what algorithm the search engine employs: to allow users to winnow and sift as *they* see fit. This is, of course, unlikely, if for no other reason than the simple fact that most people (including Google engineers) are acculturated to think of the new as both better and as a replacement for whatever came before: to be authentic, progress must amputate the past. For Google to make past algorithms available therefore becomes nearly an incoherent proposition.