Designing and Filtering On-line Information Quality: New Perspectives for Information Service Providers

Laure Berti and David Graveleau


Overwhelmed by the ever-widening explosion of networked documents, casual users and business executives alike are inevitably confronted with the difficulty of determining the value of on-line information. Clearly, despite its flexible accessibility and the ease with which it can be manipulated on the desktop, not all of this mass of material is worth accessing and reading. The reasons for this disappointing state of affairs are manifold:

  • retrieval takes place in an extreme yet omnipresent context of overinformation and, eventually, disinformation,
  • many on-line databases are plagued with erroneous data,
  • data usually do not meet users’ needs (context mismatch),
  • collected data are multi-source, but their future use does not necessarily correspond to the one originally prescribed,
  • contextual meta-information is lost in most cases.

In the database and information systems context, a body of research has laid much emphasis on system quality engineering and software quality metrics, while a more introspective trend of research focuses on the quality of data as an information product [1]. Still in its formative stage, Data Quality research is producing methodologies, frameworks for conceptual specification, and techniques and tools to combat the costly problem of data non-quality in information systems [2] [3] [4].

In this paper, we propose, as a useful starting point, to put research in Information Technologies back into context, paying special attention to data quality and the value of on-line information. We will define basic concepts such as the value of information (related to both its generation context and its final use context), and the consistency, accuracy, reliability and relevance of information, which we consider intrinsic dimensions of data quality. Meta-information about the contexts of information extraction and consumption is crucial and should be captured along with the data. The major problem is to explicitly represent and store in databases or information systems the informal aspect of information (which guides its interpretation) and, more precisely, the notion of “information performance”, that is, its capability to cause effects and reactions in the information receiver (e.g., making him or her sceptical or convinced)*. The concepts of Overinformation and Disinformation will then be discussed. These aspects of on-line information quality (or non-quality) will be presented as they are used for filtering, or adding value to, electronically available information in Competitive Intelligence applications.
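As an illustration only, the idea of capturing quality dimensions and contextual meta-information together with the data might be sketched as follows. The class and field names (`QualityTags`, `TaggedItem`, `generation_context`, `use_context`), the 0-to-1 scoring scale and the 0.5 threshold in `passes_filter` are hypothetical choices made for this sketch, not a representation proposed by the authors.

```python
from dataclasses import dataclass, field


@dataclass
class QualityTags:
    """Hypothetical scores in [0, 1] for the intrinsic dimensions named in the text."""
    consistency: float
    accuracy: float
    reliability: float
    relevance: float

    def __post_init__(self) -> None:
        # Reject scores outside the assumed [0, 1] scale.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")


@dataclass
class TaggedItem:
    """An on-line document stored together with its quality tags and context meta-information."""
    url: str
    content: str
    quality: QualityTags
    # Hypothetical context meta-information: how the item was produced
    # and the use it is intended for.
    generation_context: dict = field(default_factory=dict)
    use_context: dict = field(default_factory=dict)


def passes_filter(item: TaggedItem, minimum: float = 0.5) -> bool:
    """Keep an item only if every quality dimension reaches the (assumed) minimum."""
    return all(score >= minimum for score in vars(item.quality).values())


# Usage: the second item is excluded because its accuracy score is too low.
items = [
    TaggedItem("http://example.org/a", "Press release text",
               QualityTags(0.9, 0.8, 0.7, 0.9),
               generation_context={"source": "press release"}),
    TaggedItem("http://example.org/b", "Forum post text",
               QualityTags(0.9, 0.2, 0.6, 0.8)),
]
kept = [item.url for item in items if passes_filter(item)]
```

A per-dimension threshold is the simplest conceivable exclusion rule; it keeps the dimensions separate rather than collapsing them into a single number, so the reason an item was rejected remains visible.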

The sheer scale of Web activity and the volume of web-based information systems confront the Internet community with several urgent and nagging questions about the value of data and the performance of information, which we will attempt to answer from the perspective of content selection and certification [5] [6] [7]:

  • what is “good quality” on-line information?
  • since some of the WWW documents published on-line are entirely irrelevant, can a standard for excluding non-relevant information and information sources be proposed?
  • how can the notion of data quality be introduced, formalized, and estimated within a commonly accepted standard, in order to filter out non-relevant information and single out value-added information?
  • how can data consumers capture the value of information and measure it as an asset? Without such measures, it is difficult to create meaningful data quality performance metrics relative to users’ requirements, or to identify ways of exploiting and tracking that value. The ability to quantify and measure an asset is an important component of making that asset truly “strategic” in the context of Technological Watch.
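One simple way to make the measurement question concrete is a weighted score in which each data consumer states how much each quality dimension matters for their use. The sketch below assumes a linear weighted average with non-negative weights; the function names (`quality_score`, `rank_sources`), the source names and all numbers are hypothetical, and any real metric would have to be validated against actual users’ requirements.

```python
from typing import Mapping

# Intrinsic dimensions named in the text; per-dimension scores are assumed to lie in [0, 1].
DIMENSIONS = ("consistency", "accuracy", "reliability", "relevance")


def quality_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of per-dimension scores for one consumer's requirements.

    Missing weights count as 0; the weights are normalised by their sum,
    so the result also lies in [0, 1].
    """
    if any(weights.get(d, 0.0) < 0 for d in DIMENSIONS):
        raise ValueError("weights must be non-negative")
    total = sum(weights.get(d, 0.0) for d in DIMENSIONS)
    if total <= 0:
        raise ValueError("at least one dimension needs a positive weight")
    return sum(weights.get(d, 0.0) * scores[d] for d in DIMENSIONS) / total


def rank_sources(sources: Mapping[str, Mapping[str, float]],
                 weights: Mapping[str, float]) -> list[tuple[str, float]]:
    """Rank information sources from most to least valuable for this consumer."""
    ranked = [(name, quality_score(s, weights)) for name, s in sources.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Usage: a hypothetical Technological Watch profile that ignores consistency
# and stresses accuracy and relevance.
sources = {
    "site_a": {"consistency": 0.9, "accuracy": 0.6, "reliability": 0.8, "relevance": 0.4},
    "site_b": {"consistency": 0.7, "accuracy": 0.9, "reliability": 0.6, "relevance": 0.9},
}
watch_weights = {"accuracy": 2.0, "relevance": 2.0, "reliability": 1.0}
ranking = rank_sources(sources, watch_weights)  # site_b ranks first (about 0.84 vs 0.56)
```

Because the weights belong to the consumer rather than to the data, the same tagged sources can be ranked differently for different uses, which mirrors the distinction between generation context and use context.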

By integrating more contextual meta-information and tagging information with its quality, we attempt to create value-added data from the on-line information we collect on the Web. The notion of data manufacturing has encouraged us to seek out cross-disciplinary analogies, which we will present.