Implementing a Fuzzy Relational Database Using Community Defined Membership Values
Karen L. Joy, Smita Dattatri
Source: Proceedings of the 43rd Annual ACM Southeast Regional Conference, Volume 1.
1. PROBLEM AND MOTIVATION
Most conventional databases in use today are based on the relational model. Values in a relation are taken from a finite set of strictly typed domain values. Each relation in the database represents a proposition, and each record in a relation is a statement that evaluates to ‘true’ for that proposition (e.g., [3], [5]). It could be argued, however, that this required precision gives an insufficient representation of the world: the model is grounded in binary black and white, but much of reality exists in shades of gray. As such, the conventional relational model is of limited use for representing imprecise or subjective information.
One area that illustrates this limitation is the everyday, subjective language generally used to describe people. For instance, a person might be described as “tall, with a wide face and very dark brown eyes”. This description would be difficult to represent under the conventional relational model, both because it uses inherently imprecise descriptive words and because different communities (i.e., groups with internal agreement on the subjective meanings of these terms) may describe the same person differently. The purpose of this research project was to implement a relational database that represented images with imprecise attributes and that, by incorporating user feedback, would adapt the descriptions of those attributes so that the database reflected the consensus of a particular user community.
2. BACKGROUND AND RELATED WORK
The traditional relational database model is based on precise values. The concept of fuzziness, as seminally described by Zadeh [7], however, encompasses imprecision, uncertainty, and degrees of truthfulness of values. Fuzzy relational database theory extends the relational model to allow the representation of imprecise data and can thus provide a more accurate representation of the world it models (e.g., [6], [2]).
Natural language processing allows a database to be queried in everyday language. A natural language interface usually maintains its own dictionary of terms associated with the relations and their relationships, as well as a standard language dictionary. A standard approach in natural language systems is to generate a semantic grammar for the database and use it to parse the query; synonyms must be mapped as well. The drawback of this approach is that the grammar must be tailor-made for each specific database, and there may be insufficient information in the database to create a reliable system [1]. This approach would therefore be of limited use for a database and querying system intended to continually modify its descriptors based on user feedback.
3. APPROACH AND UNIQUENESS
In this project, a database system was designed to let users query a database using subjective, everyday language and to return images that met these subjective descriptions. Further, each user community, not the database designers, would define the image descriptions, so each community could describe the same images differently. The fuzzy relational model is well suited to representing these subjective image descriptors. There are various methods of incorporating fuzziness into a database system. The method used here was to enhance a conventional relational database by adding a membership value in [0, 1] that represents the truthfulness of each proposition in a relation, that is, the degree to which an image possesses each attribute. This strategy allows the database to represent a range of values without compromising the data integrity constraints of the relational model [4].
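The idea of enhancing an ordinary relation with a membership column can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the authors' SQL Server 2000 schema: the table name image_attribute, its columns, and the helper functions make_db and insert_degree are hypothetical. It shows that the fuzzy degree is just another column, so the usual key constraint still applies, and a CHECK constraint keeps every membership value inside [0, 1].

```python
import sqlite3

# Hypothetical schema: a conventional relation (image_id, attribute, descriptor)
# extended with a membership column whose CHECK constraint keeps it in [0, 1].
SCHEMA = """
CREATE TABLE image_attribute (
    image_id   INTEGER NOT NULL,
    attribute  TEXT    NOT NULL,  -- e.g. 'eye_color'
    descriptor TEXT    NOT NULL,  -- e.g. 'green'
    membership REAL    NOT NULL CHECK (membership BETWEEN 0.0 AND 1.0),
    PRIMARY KEY (image_id, attribute, descriptor)
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database containing the fuzzy relation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def insert_degree(conn: sqlite3.Connection, image_id: int, attribute: str,
                  descriptor: str, membership: float) -> bool:
    """Insert one fuzzy tuple; return False if a constraint (range or key) rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO image_attribute VALUES (?, ?, ?, ?)",
                (image_id, attribute, descriptor, membership),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(insert_degree(conn, 1, "eye_color", "green", 0.62))  # True
print(insert_degree(conn, 2, "eye_color", "green", 0.91))  # True
print(insert_degree(conn, 3, "eye_color", "green", 1.40))  # False: outside [0, 1]
```

A crisp relation is the special case in which every stored membership is 1.0, which is one way to see why this enhancement leaves the relational integrity machinery intact.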
The application was implemented in VB.NET and SQL Server 2000. A separate component parsed each user-entered natural language query into the core query, its attributes, and its modifiers, and stored these in data files. Stored procedures then processed these data files: they determined the value ranges for the modifiers, ran the modified query against the database, and returned the matching images to the user.
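The query pipeline (parse a phrase into modifier, descriptor, and attribute; map the modifier to its membership range; run a range query) can be sketched in Python. This is only an illustration under stated assumptions, not the authors' VB.NET parser or stored procedures: the phrase shape "<modifier> <descriptor...> <noun>", the NOUN_TO_ATTRIBUTE table, the functions parse_phrase and to_sql, the image_attribute table, and its sample rows are all hypothetical. Only the ranges for ‘medium’ and ‘very’ come from the text.

```python
import sqlite3
from dataclasses import dataclass

# Modifier ranges quoted in the text; other modifiers would need their own ranges.
MODIFIER_RANGES = {"medium": (0.30, 0.74), "very": (0.75, 1.0)}

# Hypothetical mapping from query nouns to attribute names in the table.
NOUN_TO_ATTRIBUTE = {"eyes": "eye_color", "face": "face_width"}


@dataclass(frozen=True)
class Criterion:
    modifier: str    # e.g. 'very'
    attribute: str   # e.g. 'eye_color'
    descriptor: str  # e.g. 'dark brown'


def parse_phrase(phrase: str) -> Criterion:
    """Split a phrase shaped '<modifier> <descriptor...> <noun>', e.g. 'very dark brown eyes'."""
    words = phrase.lower().split()
    if (len(words) < 3 or words[0] not in MODIFIER_RANGES
            or words[-1] not in NOUN_TO_ATTRIBUTE):
        raise ValueError(f"unsupported phrase: {phrase!r}")
    return Criterion(words[0], NOUN_TO_ATTRIBUTE[words[-1]], " ".join(words[1:-1]))


def to_sql(criteria: list[Criterion]) -> tuple[str, list]:
    """Build one parameterized query for images that satisfy every criterion."""
    selects, params = [], []
    for c in criteria:
        lo, hi = MODIFIER_RANGES[c.modifier]  # the modifier becomes a membership range
        selects.append(
            "SELECT image_id FROM image_attribute "
            "WHERE attribute = ? AND descriptor = ? AND membership BETWEEN ? AND ?"
        )
        params += [c.attribute, c.descriptor, lo, hi]
    return " INTERSECT ".join(selects) + " ORDER BY image_id", params


# Tiny in-memory table with made-up membership values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image_attribute "
             "(image_id INTEGER, attribute TEXT, descriptor TEXT, membership REAL)")
conn.executemany("INSERT INTO image_attribute VALUES (?, ?, ?, ?)", [
    (1, "eye_color", "dark brown", 0.82),
    (2, "eye_color", "dark brown", 0.41),
    (1, "face_width", "wide", 0.55),
    (2, "face_width", "wide", 0.60),
])

sql, params = to_sql([parse_phrase("very dark brown eyes"),
                      parse_phrase("medium wide face")])
print([row[0] for row in conn.execute(sql, params)])  # [1]
```

Image 2 has a wide face in the ‘medium’ range but its eye color degree (0.41) falls outside the ‘very’ range, so only image 1 satisfies both criteria. Using bound parameters rather than string concatenation keeps user-supplied words out of the SQL text itself.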
Every attribute of the images in the database was initially assigned a random membership value. The attribute modifiers were given numerical ranges. For example, the modifier ‘medium’ (e.g., “medium green eyes”) ranged from 0.30 to 0.74, and ‘very’ ranged from 0.75 to 1.0. Each modifier also had a threshold value designed to ‘stabilize’ each attribute’s membership value within the modifier’s range so that it represented the community’s consensus as accurately as possible. This stabilizing property was implemented as follows. When users viewed the images that currently matched their criteria, they provided feedback indicating either that an image met their criteria or that it would be better described by a stronger or a weaker modifier. If the user chose the former, the attribute’s membership value was moved toward the threshold value of that modifier’s range; concurring feedback thus pushed the value more deeply within the range and strengthened the community’s opinion. If the user chose the latter, the attribute’s membership value was moved slightly away from the threshold value in the direction the user suggested. In this way, each community established its own values through use. This strategy can potentially yield a more robust database with improved overall usefulness.
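The feedback rule can be sketched as a small update function. This is a minimal sketch under loudly stated assumptions: the text gives the modifier ranges but not the threshold values or step sizes, so THRESHOLDS (taken here as range midpoints), the parameters rate and nudge, and the names Feedback and update_membership are all hypothetical. We also read “in the direction the user suggested” as a small upward step for ‘stronger’ and a downward step for ‘weaker’.

```python
from enum import Enum

# Ranges quoted in the text.
MODIFIER_RANGES = {"medium": (0.30, 0.74), "very": (0.75, 1.0)}

# Assumption: the text gives no threshold values, so use each range's midpoint.
THRESHOLDS = {m: (lo + hi) / 2 for m, (lo, hi) in MODIFIER_RANGES.items()}


class Feedback(Enum):
    AGREE = "agree"        # the image fits the modifier the user asked for
    STRONGER = "stronger"  # a stronger modifier would describe it better
    WEAKER = "weaker"      # a weaker modifier would describe it better


def update_membership(membership: float, modifier: str, feedback: Feedback,
                      rate: float = 0.2, nudge: float = 0.02) -> float:
    """Return the new membership value after one piece of user feedback.

    rate and nudge are hypothetical tuning parameters, not values from the text.
    """
    threshold = THRESHOLDS[modifier]
    if feedback is Feedback.AGREE:
        # Concurring feedback: move a fraction of the way toward the threshold.
        new = membership + rate * (threshold - membership)
    elif feedback is Feedback.STRONGER:
        new = membership + nudge  # small step toward a stronger description
    else:
        new = membership - nudge  # small step toward a weaker description
    return min(1.0, max(0.0, new))  # keep the result a valid membership degree


m = 0.70  # an initial (e.g. random) membership for some 'medium' attribute
for _ in range(20):  # twenty users agree that 'medium' fits
    m = update_membership(m, "medium", Feedback.AGREE)
print(round(m, 3))  # 0.522, close to the assumed threshold 0.52
```

Under this reading, repeated agreement makes the value converge geometrically on the threshold, while repeated ‘stronger’ or ‘weaker’ feedback can eventually carry it across a range boundary, for example from ‘medium’ into ‘very’.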
4. RESULTS AND CONTRIBUTIONS
Initial results showed the prototype implementation to be successful. Users were able to retrieve images using subjective descriptors, and different user groups were able to modify the database image attributes so that they accurately represented each group’s meanings.
Future work will include developing an improved natural language interface that translates everyday-language descriptor queries into SQL more effectively. Additional work will be directed toward improved methods for generating synonyms of the attribute descriptors. Currently, synonyms are mapped by the natural language interface; however, as discussed above, this method has limited usefulness for a database system that is designed to be malleable. Work is underway on a system in which the database itself continually learns the synonyms for each user community.
5. REFERENCES
[1] Bhootra, R. A. and Mehrotra, A. Overview of natural language interfaces. Directed Research Project, Virginia Commonwealth University, Richmond, VA, 2004.
[2] Chen, G. Fuzzy functional dependencies as integrity constraints. In Fuzzy Logic in Data Modeling: Semantics, Constraints and Database Design. Kluwer Academic, Boston, MA, 1998.
[3] Codd, E. F. A relational model of data for large shared data banks. Communications of the ACM, 13, 6 (1970), 377-387.
[4] Date, C. J. On fuzzy databases. Database Debunkings (3/12/04); www.dbdebunk.com/content2004.html.
[5] Date, C. J. A note on relation-valued attributes. In An Introduction to Database Systems (8th ed.). Pearson/Addison Wesley, Boston, MA, 2004.
[6] Petry, F. E. Fuzzy Databases: Principles and Applications. Kluwer Academic, Boston, MA, 1996.
[7] Zadeh, L. A. Fuzzy sets. Information and Control, 8, 3 (1965), 338-353.