Subjective and Objective Knowledge in a small and complex Relational Data Base

Philip Hughes Farro


The present paper combines work being carried out by the author in two fields of study: in the Professional Ethics task force of the Costa Rican Computer Professional College and in the postgraduate programme in Cognitive Science at the University of Costa Rica. The interrelationships between computer professionals, users, administrators and computer equipment are seen here as a scenario in which ever-increasing complexity and technological dependence urgently require computer-based explanatory tools. This requirement is justified on grounds of moral authority [see Hoven 94], and a specific computer model is proposed for small and complex data bases.

Society, in its search to understand reality, has devised forms and representations for the data that most concern its wellbeing. Comparing Incan quipus with modern data bases, we can find at least one common factor: vectorial data representation.

For the first time in history, information systems can produce answers in natural language to formulated questions. Previously, answers needed to be compiled and interpreted by a technician or expert who could understand not only the data context (the application) but also the data representations and mechanisms used by the information system. Naturally, only the expert could give the reasoning behind the answers so produced.

Establishing a truth value over a set of propositions is a necessary condition for exercising any profession. For example, lawyers follow strict judicial protocol, designed to determine and document true events and maintained and developed for over 2000 years, and in medicine clinical procedures are used in order to diagnose illness correctly. In informatics there is no established practice for determining the truth value of computer-produced data, yet the term “computer profession” is in general use. The social and ethical consequences of this can be serious and far-reaching.

As with the scientific method, the truth of a computer-produced statement depends on five things: the perceived world data, the effective measurement of those data, the model used to interpret the facts, the processes used in the model and the method used to display the results. An error in any one of these five processing areas can cause false results.

Increased complexity in information systems has tended to widen the gap between systems designers and future users, who receive partial views of a data base with little possibility of inquiring into the original system design. That is, a user is epistemically dependent on the system creator. This situation has probably arisen because, with the exponential increase in computing capacity, importance has been given to a quantitative increase in users and data, whereas the qualitative aspect of data manipulation, for example reasoning and verification, has remained at a minimal level.

System and user documentation are traditionally offered as an “explanation” of system operation, system design and user profiles. A user may receive answers from a system if he or she follows a required formalism in the submission of questions. However, systems may, and must, change rapidly to reflect needed corrections or new requirements, and documentation cannot keep pace with such change. How, therefore, can truth values be assured across frequent model and data changes?

In answer to the problems stated above, we propose that the introduction of the question “why?” at all levels of data base usage (design staff, administrators and users) is critical. Given a computer-produced proposition X, the question “why X?” may be interpreted in two possible ways: as the chain of partial operational results leading from the input data to X, or as that part of the model (system) that has been used to produce X. The scientific method requires both to be available, and also that the chain and the result be repeatable. In a very volatile system it may not be possible, with present-day technology, to duplicate particular results.
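The two readings of “why X?” can be illustrated with a minimal sketch in C. It is not taken from the implementation described in this paper: the types `Step` and `Answer`, the functions `compute_total`, `why_chain` and `why_model`, and the rule "total = price * quantity" are all hypothetical names chosen for illustration. The sketch assumes that every computed result carries both the chain of partial results and a description of the model rule that produced it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STEPS 8

/* One partial operational result on the way from input data to X. */
typedef struct {
    const char *operation;  /* hypothetical step label, e.g. "multiply" */
    double      result;     /* partial result after this step */
} Step;

/* A computed proposition X together with both kinds of "why". */
typedef struct {
    double      value;             /* the proposition X */
    const char *rule;              /* part of the model that produced X */
    Step        chain[MAX_STEPS];  /* operational chain from input to X */
    int         n_steps;
} Answer;

/* Append one step to the chain; the last step recorded becomes X. */
static void record(Answer *a, const char *operation, double result) {
    assert(a->n_steps < MAX_STEPS);
    a->chain[a->n_steps].operation = operation;
    a->chain[a->n_steps].result = result;
    a->n_steps++;
    a->value = result;
}

/* Hypothetical model rule, used only as an example. */
Answer compute_total(double price, double quantity) {
    Answer a = {0};
    a.rule = "total = price * quantity";
    record(&a, "read price", price);
    record(&a, "read quantity", quantity);
    record(&a, "multiply", price * quantity);
    return a;
}

/* "Why X?" read as the chain of partial operational results. */
void why_chain(const Answer *a, FILE *out) {
    for (int i = 0; i < a->n_steps; i++)
        fprintf(out, "%d. %s -> %g\n", i + 1,
                a->chain[i].operation, a->chain[i].result);
}

/* "Why X?" read as the part of the model used to produce X. */
const char *why_model(const Answer *a) {
    return a->rule;
}
```

Calling `compute_total(2.5, 4.0)` and then `why_chain` on the result lists the three recorded steps, while `why_model` returns the rule text. Repeatability, in this sketch, amounts to rerunning `compute_total` on the same inputs and comparing the chains.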

Let us now propose a computer model that will begin to overcome the stated problems. In a compiled program or data base the original data structures, field names and operations are virtually lost in the resultant machine code. We therefore propose an interpreted data base as a means of ensuring permanent access to design and implementation criteria. To allow access to all raw and design data without establishing any user view bias beforehand, we further propose that all data be stored in RAM; in this way all possible search mechanisms can be optimized. We note that, as a consequence, several typical data base operations, such as indexing and joining, are unnecessary under this data modelling method.

Under this scheme, raw data (or flat files in the data base context) and design criteria, organized into some arbitrary model, may be examined immediately by professionals who have had no previous contact with that particular application.
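As a rough illustration of this scheme, the following C sketch keeps a relation's name and field names available at run time and answers a lookup by scanning the rows held in memory, without any index. It is a sketch under stated assumptions, not the author's implementation: the `Table` type, the functions `field_index`, `cell` and `find_row`, and the row-major layout of string cells are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A relation held entirely in memory. Field names are ordinary data,
   so they remain visible at run time instead of being compiled away. */
typedef struct {
    const char  *name;      /* relation name */
    const char **fields;    /* field names, n_fields entries */
    size_t       n_fields;
    const char **cells;     /* row-major values, n_rows * n_fields entries */
    size_t       n_rows;
} Table;

/* Resolve a field name to its column, or -1 if the name is unknown. */
int field_index(const Table *t, const char *field) {
    for (size_t i = 0; i < t->n_fields; i++)
        if (strcmp(t->fields[i], field) == 0)
            return (int)i;
    return -1;
}

/* Value stored at (row, col). */
const char *cell(const Table *t, size_t row, int col) {
    assert(row < t->n_rows && col >= 0 && (size_t)col < t->n_fields);
    return t->cells[row * t->n_fields + (size_t)col];
}

/* Plain scan over the in-memory rows: index of the first row whose
   `field` equals `value`, or -1 if there is none. No index is built. */
long find_row(const Table *t, const char *field, const char *value) {
    int col = field_index(t, field);
    if (col < 0)
        return -1;
    for (size_t r = 0; r < t->n_rows; r++)
        if (strcmp(cell(t, r, col), value) == 0)
            return (long)r;
    return -1;
}
```

With a hypothetical relation `person` whose fields are `id`, `name` and `city`, a call such as `find_row(&t, "name", "Luis")` resolves the field name at run time, so someone who has never seen the application can query the raw data using only the names the designer chose.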

We note further that traditional data base design criteria [see Codd 90] presuppose disk storage as the primary medium for data storage and access, with consequently slow access times when compared with RAM storage. This presupposition may gradually be dropped as modern technology provides cheaper RAM with each successive upgrade in hardware.

An interpreted data base definition will naturally require new types of variables in order to replace the recursive and cyclic search processes typical of data base user routines. In this area we propose three new types of variables, called structure, pointer and domain. These are all recursive vector variables; that is, an element of the variable can itself be a variable of the same type.
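One possible shape for such a recursive vector variable is sketched below in C. The text above does not specify how structure, pointer and domain differ, so the sketch treats them only as a tag (`Kind`) on a common representation; the names `Var`, `Element`, `var_new`, `var_add_atom`, `var_add_var`, `var_count_atoms`, `var_contains` and `var_free` are assumptions made for illustration. The sketch further assumes that nesting is acyclic and that each nested variable has a single owner.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The three proposed types, used here only as labels. */
typedef enum { KIND_STRUCTURE, KIND_POINTER, KIND_DOMAIN } Kind;

typedef struct Var Var;

/* An element is either a plain value or a variable of the same type. */
typedef struct {
    const char *atom;    /* non-NULL for a plain value */
    Var        *nested;  /* non-NULL when the element is itself a variable */
} Element;

struct Var {
    Kind        kind;
    const char *name;
    Element    *items;   /* the vector of elements */
    size_t      count;
};

Var *var_new(Kind kind, const char *name) {
    Var *v = calloc(1, sizeof *v);
    assert(v != NULL);
    v->kind = kind;
    v->name = name;
    return v;
}

static void var_push(Var *v, Element e) {
    Element *grown = realloc(v->items, (v->count + 1) * sizeof *grown);
    assert(grown != NULL);
    v->items = grown;
    v->items[v->count++] = e;
}

void var_add_atom(Var *v, const char *atom) {
    Element e = { atom, NULL };
    var_push(v, e);
}

void var_add_var(Var *v, Var *child) {
    assert(child->kind == v->kind);  /* element of the same type */
    Element e = { NULL, child };
    var_push(v, e);
}

/* Count plain values at any depth. */
size_t var_count_atoms(const Var *v) {
    size_t n = 0;
    for (size_t i = 0; i < v->count; i++)
        n += v->items[i].nested ? var_count_atoms(v->items[i].nested) : 1;
    return n;
}

/* Return 1 if `atom` occurs at any depth, 0 otherwise. */
int var_contains(const Var *v, const char *atom) {
    for (size_t i = 0; i < v->count; i++) {
        const Element *e = &v->items[i];
        if (e->nested) {
            if (var_contains(e->nested, atom))
                return 1;
        } else if (strcmp(e->atom, atom) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Free a variable and every variable nested inside it. */
void var_free(Var *v) {
    for (size_t i = 0; i < v->count; i++)
        if (v->items[i].nested)
            var_free(v->items[i].nested);
    free(v->items);
    free(v);
}
```

In this sketch the recursion lives in the variable itself: a user routine calls `var_contains` once instead of writing its own nested search loops, which is one way to read the replacement of recursive and cyclic search processes described above.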

A query language over the previously defined data, design variables and data relations can now be formalized. Answers to standard data base queries may then include design or model criteria as well as data-dependent chains. The vocabulary of this language will automatically include the terms defined in specific models, such as user-defined names of fields, files and relations.

An implementation of the above proposed modelling scheme has been developed in standard C for a Unix-TCP/IP client-server platform.