Searching the Semantic Web: Ethical Issues in Semantic Web Searching

AUTHOR
Lawrence M. Hinman

ABSTRACT

We are entering the third generation of web searching, and—like all new generations—it raises new ethical issues that the older generation must confront. This paper offers a preliminary consideration of some of the ethical issues raised by the development of the semantic web and the search techniques that will allow us to access it. Among those issues is a profound reframing of the balance between global and local frames of reference, especially as this is reflected in the development of W3C’s Web Ontology Language (OWL).

Localization dominated the first generation of web searching, which principally took the form of informal guides and content directories such as those found in the early days of Yahoo. Typically, such guides were specific to a certain locale or country, or at least to a particular set of issues. Often they were limited by the linguistic competence of their creators, and they appeared at a time when machine translation was in its infancy and more prone to produce baby talk than adult prose.

The second generation of web searching began when search engine technology emerged in its current form: massively arrayed computing power, storage capacity beyond the wildest dreams of the web's early days, and sophisticated web crawlers, spiders, and bots of all species that could actively, and with increasing accuracy, retrieve, store, and index the material on the web. With the addition of sophisticated algorithms to establish the relative prominence of particular pages and sites, this technology was brought to near perfection by Google and its competitors. This was, moreover, an intrinsically global technology, particularly with the advent of increasingly sophisticated translation technology. (Interestingly, Google has emerged as one of the international leaders in the development of new translation software; News@Nature.com, 7 November 2006.) Once the translation barrier was crossed, there was no reason in principle that we could not have truly universal search engines, accessing all information on the web regardless of language.
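The paper leaves these "sophisticated algorithms" unnamed. As one illustration only, the following minimal Python sketch assumes a PageRank-style link analysis, the idea behind Google's original ranking method: a page is prominent if prominent pages link to it. The function name, the damping factor of 0.85, the iteration count, and the toy four-page graph are all assumptions chosen for illustration; production search engines combine many more signals.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a small link graph (illustrative sketch).

    links maps each page to the pages it links to. Pages with no
    outgoing links ("dangling" pages) spread their rank evenly.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass this page's rank, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its rank across all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: "blog" links out but nothing links to it.
web = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "about"],
    "blog": ["home"],
}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['home', 'about', 'news', 'blog']
```

The ranks always sum to 1, so they can be read as the relative prominence of each page within the graph; the page with the most inbound links from well-linked pages comes out on top.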

To be sure, an important local function remained even in this picture. The process of importing already-existing information into the web for the first time continued to be a primarily local enterprise, the undertaking of nations, ethnic and tribal groups, scholarly societies, and the like. This will continue to be the case for the foreseeable future. Indeed, as cell-phone-based searching becomes increasingly common, there will be even greater incentives to make local information available online. Even such a massive and ambitious project as Google Scholar seems to be focusing primarily (at least initially) on English-language works. Interestingly, in its German version (http://scholar.google.de), one finds a greater sensitivity to the protection of information, a sensitivity that reflects both German law and German sensibilities in this area.

Certain political factors have also contributed to keeping searches localized. China has been the most widely publicized case. Not only have Chinese officials successfully sought to limit the results available to local Chinese residents, but in several high-profile cases they have sought to use search histories to track down dissidents. Nor is China the only country where this is the case. A number of Middle Eastern countries seek to limit the search results available to their citizens on both political and moral grounds (pornography, gambling, etc.).

In 2001, Tim Berners-Lee, together with James Hendler and Ora Lassila, published his seminal article on the semantic web in Scientific American. Although a semantic web had been part of Berners-Lee's vision of the web since 1994, it is only recently that such a web has seemed within our technological reach. The traditional web, the one with which we are so familiar, was fundamentally organized as a series of documents, or clusters of documents, that human beings could access individually. First and foremost, the current web is a massive collection of pages that individuals can look at, read, study, and listen to. What has been missing, according to Berners-Lee and others, is a structure of meaning within which the individual bits of information contained in documents could be embedded in a more meaningful and useful way. This is the underlying vision of a semantic web.

The movement toward a semantic web is, in many ways, a movement toward globalization, at least in format and ontology. The pressure for a semantic web comes from many forces, at least some of which are associated with globalization in other areas. Business, commerce, trade, and travel all become easier on a global level if a common semantics governs the exchange of information across the web. Such exchange becomes far easier if there is a framework for describing resources that is shared by all the participants in the exchange. This is the role of the Resource Description Framework (RDF), which uses Uniform Resource Identifiers (URIs) to name resources and the properties that describe them. Vocabularies built on RDF can then specify the possible characteristics of particular kinds of objects, allowing a computer to range easily across data that it would otherwise not recognize as being of the same type. This provides the beginnings of a universal framework within which all the bits of information on the web can be situated.
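A minimal sketch of this idea, written in plain Python rather than with an RDF library: each statement is a subject-predicate-object triple whose parts are URIs. The rdf:type and FOAF URIs are real published vocabulary terms; the example.org resource URIs, the sample names, and the `names_of_type` helper are hypothetical, and literal values are simplified to plain strings (real RDF distinguishes literals from URIs).

```python
from typing import NamedTuple

# Real vocabulary URIs: RDF's type property and FOAF's Person class and name property.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: str        # URI of another resource, or (simplified) a literal string


# Two independent, hypothetical data sources with their own URI schemes.
source_a = [
    Triple("http://example.org/a/staff/17", RDF_TYPE, FOAF_PERSON),
    Triple("http://example.org/a/staff/17", FOAF_NAME, "Ada Lovelace"),
]
source_b = [
    Triple("http://example.org/b/people/alan", RDF_TYPE, FOAF_PERSON),
    Triple("http://example.org/b/people/alan", FOAF_NAME, "Alan Turing"),
]


def names_of_type(triples, class_uri):
    """Return the FOAF names of every resource declared to be of class_uri."""
    members = {t.subject for t in triples
               if t.predicate == RDF_TYPE and t.obj == class_uri}
    return sorted(t.obj for t in triples
                  if t.predicate == FOAF_NAME and t.subject in members)


merged = source_a + source_b
print(names_of_type(merged, FOAF_PERSON))  # ['Ada Lovelace', 'Alan Turing']
```

Because both sources use the same URIs for the class and the property, simply merging their triples lets one query find people in either source, even though each source names its own records differently. That shared identification is what the common framework buys.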

One of the most interesting conflicts internal to this developing set of standards concerns ontologies, and this controversy bears directly on the issue of “Glocalisation.” To see this more clearly, we can divide ontologies into two categories: top-down and bottom-up. Top-down ontologies are currently being established in many areas of the natural sciences, such as genomics and epidemiology, where there is a great need for the rapid, accurate, and continual exchange of information across traditional geographical and political boundaries. Such ontologies typically resonate with, and reinforce, movements toward globalization. The contrasting, bottom-up approach to data structuring, known as folksonomies, places much more emphasis on what could be called indigenous classificatory systems. These are rooted in the ways in which particular communities structure or tag their data, and as such are much friendlier to the “local” aspect of “Glocalisation.”
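To make the contrast concrete, here is a minimal Python sketch under stated assumptions: a small fixed term set stands in for a top-down ontology (real scientific ontologies are far richer and are often expressed in languages such as W3C's OWL, not a Python set), and a simple tag tally stands in for a folksonomy. All terms, community names, document identifiers, and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Top-down: a fixed vocabulary agreed on centrally (hypothetical terms).
SHARED_ONTOLOGY = {"Influenza", "Measles", "Tuberculosis"}


def classify_top_down(label: str) -> str:
    """Accept a label only if it is already a term in the shared ontology."""
    if label not in SHARED_ONTOLOGY:
        raise ValueError(f"{label!r} is not a term in the shared ontology")
    return label


# Bottom-up: communities tag freely; nothing is rejected.
def build_folksonomy(taggings):
    """Tally free-form tags per item from (community, item, tag) tuples."""
    tally = defaultdict(Counter)
    for _community, item, tag in taggings:
        tally[item][tag.strip().lower()] += 1
    return tally


def community_vocabulary(taggings, community):
    """Return the sorted set of tags one community actually uses."""
    return sorted({tag.strip().lower() for c, _item, tag in taggings if c == community})


# Hypothetical taggings from two communities describing the same documents.
taggings = [
    ("clinic_a", "doc1", "Flu"),
    ("clinic_a", "doc1", "winter illness"),
    ("village_b", "doc1", "flu"),
    ("village_b", "doc2", "fever season"),
]

print(classify_top_down("Influenza"))                     # Influenza
print(build_folksonomy(taggings)["doc1"].most_common(1))  # [('flu', 2)]
print(community_vocabulary(taggings, "village_b"))        # ['fever season', 'flu']
```

The design difference is where the vocabulary lives: the top-down function rejects any label outside the agreed term set (here, "flu" would be refused in favor of "Influenza"), while the folksonomy accepts every tag and keeps each community's own terms recoverable.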

REFERENCES

Berners-Lee, T., J. Hendler, and O. Lassila, “The Semantic Web,” Scientific American, May 2001, pp. 34–43.

Berners-Lee, T., R.T. Fielding, and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” IETF RFC 3986 (standards track), Internet Eng. Task Force, Jan. 2005; http://www.ietf.org/rfc/rfc3986.txt.

Introna, L.D., and H. Nissenbaum, “Shaping the Web: Why the Politics of Search Engines Matters,” The Information Society, vol. 16, no. 3, 2000, pp. 1–17.

Machill, M., and C. Welp, eds., Wegweiser im Netz: Qualität und Nutzung von Suchmaschinen, Bielefeld: Verlag Bertelsmann Stiftung, 2003.

Shadbolt, N., W. Hall, and T. Berners-Lee, “The Semantic Web Revisited,” IEEE Intelligent Systems, May/June 2006, pp. 96–101.

Spärck Jones, K., “What’s New about the Semantic Web? Some Questions,” SIGIR Forum, vol. 38, no. 2, 2004; http://www.acm.org/sigir/forum/2004D/sparck_jones_sigirforum_2004d.pdf.

Watts, D.J., P.S. Dodds, and M.E.J. Newman, “Identity and Search in Social Networks,” Science, vol. 296, 2002, pp. 1302–1305.