Consider a large university library. Tens of thousands of books, periodicals, and other information resources are available for use. But to access these resources, a categorization scheme must be developed. To navigate this large volume of information, librarians have defined a classification scheme that includes a Library of Congress classification code, keywords, author names, and other index entries. All enable the user to find the needed resource quickly and easily.
Now, consider a large component repository. Tens of thousands of reusable software components reside in it. But how does a software engineer find the one she needs? To answer this question, another question arises: How do we describe software components in unambiguous, classifiable terms? These are difficult questions, and no definitive answer has yet been developed. In this section we explore current directions that will enable future software engineers to navigate reuse libraries.
Describing Reusable Components
A reusable software component can be described in many ways, but an ideal description encompasses what Tracz has called the 3C model—concept, content, and context.
The concept of a software component is “a description of what the component does.” The interface to the component is fully described and the semantics, represented within the context of pre- and postconditions, are identified. The concept should communicate the intent of the component.
The content of a component describes how the concept is realized. In essence, the content is information that is hidden from casual users and need be known only to those who intend to modify or test the component.
The context places a reusable software component within its domain of applicability. That is, by specifying conceptual, operational, and implementation features, the context enables a software engineer to find the appropriate component to meet application requirements.
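One way to picture the 3C model is as a record with three parts. The sketch below is only an illustration under stated assumptions: the class names, field names, and the BoundedStack example are hypothetical and do not come from Tracz's description.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """What the component does: intent, interface, pre- and postconditions."""
    intent: str
    interface: str
    preconditions: list[str] = field(default_factory=list)
    postconditions: list[str] = field(default_factory=list)


@dataclass
class Content:
    """How the concept is realized; needed only to modify or test the component."""
    source_path: str  # hypothetical location of the implementation
    test_paths: list[str] = field(default_factory=list)


@dataclass
class Context:
    """Domain of applicability, split into the three kinds of features."""
    conceptual: set[str] = field(default_factory=set)
    operational: set[str] = field(default_factory=set)
    implementation: set[str] = field(default_factory=set)


@dataclass
class ComponentDescription:
    name: str
    concept: Concept
    content: Content
    context: Context


# Hypothetical example entry.
stack = ComponentDescription(
    name="BoundedStack",
    concept=Concept(
        intent="LIFO container with a fixed capacity",
        interface="push(item), pop() -> item",
        preconditions=["pop: stack is not empty"],
        postconditions=["push: size increases by one"],
    ),
    content=Content(source_path="bounded_stack.py"),
    context=Context(
        conceptual={"container"},
        operational={"single-threaded"},
        implementation={"python"},
    ),
)
```

Keeping content in its own record mirrors the idea that casual users read only the concept and context, while maintainers and testers also open the content.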
To be of use in a pragmatic setting, concept, content, and context must be translated into a concrete specification scheme. Dozens of papers and articles have been written about classification schemes for reusable software components. The methods proposed can be categorized into three major areas: library and information science methods, artificial intelligence methods, and hypertext systems. The vast majority of work done to date suggests the use of library science methods for component classification.
Figure presents a taxonomy of library science indexing methods. Controlled indexing vocabularies limit the terms or syntax that can be used to classify an object (component). Uncontrolled indexing vocabularies place no restrictions on the nature of the description. The majority of classification schemes for software components fall into three categories:
Enumerated classification. Components are described by a hierarchical structure in which classes and varying levels of subclasses of software components are defined. Actual components are listed at the lowest level of any path in the enumerated hierarchy. For example, an enumerated hierarchy for window operations might be
window operations
    display
        open
            menu-based
                openWindow
            system-based
                sysWindow
        close
            via pointer
            ...
    resize
        via command
            setWindowSize, stdResize, shrinkWindow
        via drag
            pullWindow, stretchWindow
    up/down shuffle
        ...
    move
        ...
    close
        ...
The hierarchical structure of an enumerated classification scheme makes it easy to understand and to use. However, before a hierarchy can be built, domain engineering must be conducted so that sufficient knowledge of the proper entries in the hierarchy is available.
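As a rough sketch, an enumerated hierarchy can be held as a nested mapping in which inner dictionaries are classes and subclasses, and lists at the lowest level name the actual components. The nesting below is an assumption based on the window operations example, the elided branches are left out, and the function names are hypothetical.

```python
# Inner dicts are classes/subclasses; lists at the leaves hold component names.
HIERARCHY = {
    "window operations": {
        "display": {
            "open": {
                "menu-based": ["openWindow"],
                "system-based": ["sysWindow"],
            },
        },
        "resize": {
            "via command": ["setWindowSize", "stdResize", "shrinkWindow"],
            "via drag": ["pullWindow", "stretchWindow"],
        },
    }
}


def find_path(node, component, path=()):
    """Return the tuple of class names leading to `component`, or None."""
    if isinstance(node, list):
        return path if component in node else None
    for name, child in node.items():
        found = find_path(child, component, path + (name,))
        if found is not None:
            return found
    return None


def components_under(node):
    """Collect every component listed anywhere below a class node."""
    if isinstance(node, list):
        return list(node)
    result = []
    for child in node.values():
        result.extend(components_under(child))
    return result


path = find_path(HIERARCHY, "stdResize")
resize_components = components_under(HIERARCHY["window operations"]["resize"])
```

Browsing is a simple walk down the tree, which is why the scheme is easy to use. The cost is that the whole tree has to be designed before any component can be filed in it.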
Faceted classification. A domain area is analyzed and a set of basic descriptive features is identified. These features, called facets, are then ranked by importance and connected to a component. A facet can describe the function that the component performs, the data that are manipulated, the context in which they are applied, or any other feature. The set of facets that describe a component is called the facet descriptor. Generally, the facet description is limited to no more than seven or eight facets.
As a simple illustration of the use of facets in component classification, consider a scheme that makes use of the following facet descriptor:
{function, object type, system type}
Each facet in the facet descriptor takes on one or more values that are generally descriptive keywords. For example, if function is a facet of a component, typical values assigned to this facet might be
function = (copy, from) or (copy, replace, all)
The use of multiple facet values enables the primitive function copy to be refined more fully. Keywords (values) are assigned to the set of facets for each component in a reuse library. When a software engineer wants to query the library for possible components for a design, a list of values is specified and the library is searched for matches. Automated tools can be used to incorporate a thesaurus function. This enables the search to encompass not only the keyword specified by the software engineer but also technical synonyms for those keywords. A faceted classification scheme gives the domain engineer greater flexibility in specifying complex descriptors for components. Because new facet values can be added easily, the faceted classification scheme is easier to extend and adapt than the enumeration approach.
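The query-with-thesaurus idea can be sketched in a few lines. Everything named here is hypothetical: the two library entries, the thesaurus contents, and the functions `expand` and `search` are illustrative assumptions, and the sketch ignores the ranking of facets by importance.

```python
# Hypothetical thesaurus: a keyword and its technical synonyms.
THESAURUS = {
    "copy": {"duplicate", "clone"},
    "file": {"document"},
}

# Hypothetical library: component name -> values for each facet
# of the descriptor {function, object type, system type}.
LIBRARY = {
    "copyFile": {
        "function": {"copy", "from"},
        "object type": {"file"},
        "system type": {"desktop"},
    },
    "cloneBuffer": {
        "function": {"clone", "replace", "all"},
        "object type": {"buffer"},
        "system type": {"embedded"},
    },
}


def expand(keyword):
    """Return the keyword plus every term sharing a thesaurus group with it."""
    terms = {keyword}
    for head, synonyms in THESAURUS.items():
        group = {head} | synonyms
        if keyword in group:
            terms |= group
    return terms


def search(query):
    """Return components that match every queried keyword in every facet.

    `query` maps a facet name to the set of keywords requested for it.
    A keyword matches if it, or one of its synonyms, is a facet value.
    """
    matches = []
    for name, descriptor in LIBRARY.items():
        if all(
            expand(keyword) & descriptor.get(facet, set())
            for facet, keywords in query.items()
            for keyword in keywords
        ):
            matches.append(name)
    return matches


by_function = search({"function": {"duplicate"}})
by_two_facets = search({"function": {"copy"}, "object type": {"document"}})
```

Adding a new facet value only means adding a keyword to a component's descriptor, with no restructuring, which is the extensibility advantage over an enumerated hierarchy.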
Attribute-value classification. A set of attributes is defined for all components in a domain area. Values are then assigned to these attributes in much the same way as faceted classification. In fact, attribute-value classification is similar to faceted classification with the following exceptions: (1) no limit is placed on the number of attributes that can be used; (2) attributes are not assigned priorities; and (3) the thesaurus function is not used.
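For contrast, a minimal sketch of attribute-value lookup follows. The attribute names, values, and the `select` function are hypothetical. Matching is by exact equality, with no synonym expansion and no ordering among attributes.

```python
# Hypothetical library: any number of attributes, none ranked above another.
COMPONENTS = {
    "setWindowSize": {"operation": "resize", "trigger": "command", "language": "C"},
    "pullWindow": {"operation": "resize", "trigger": "drag", "language": "C"},
    "openWindow": {"operation": "open", "trigger": "menu"},
}


def select(**wanted):
    """Return names whose attributes equal every requested value exactly."""
    return [
        name
        for name, attrs in COMPONENTS.items()
        if all(attrs.get(attr) == value for attr, value in wanted.items())
    ]


resizers = select(operation="resize")
draggers = select(operation="resize", trigger="drag")
no_synonyms = select(operation="scale")  # a near-synonym of "resize" finds nothing
```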
Based on an empirical study of each of these classification techniques, Frakes and Pole indicate that there is no clear “best” technique and that “no method did more than moderately well in search effectiveness . . .” It would appear that further work remains to be done in the development of effective classification schemes for reuse libraries.
The Reuse Environment
Software component reuse must be supported by an environment that encompasses the following elements:
• A component database capable of storing software components and the classification information necessary to retrieve them.
• A library management system that provides access to the database.
• A software component retrieval system (e.g., an object request broker) that enables a client application to retrieve components and services from the library server.
• CBSE tools that support the integration of reused components into a new design or implementation.
Each of these functions interacts with or is embodied within the confines of a reuse library.
The reuse library is one element of a larger CASE repository and provides facilities for the storage of software components and a wide variety of reusable artifacts (e.g., specifications, designs, code fragments, test cases, user guides). The library encompasses a database and the tools that are necessary to query the database and retrieve components from it. A component classification scheme serves as the basis for library queries.
Queries are often characterized using the context element of the 3C model. If an initial query results in a voluminous list of candidate components, the query is refined to narrow the list. Concept and content information are then extracted (after candidate components are found) to assist the developer in selecting the proper component.
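The query-then-refine cycle might look like the sketch below. It is an illustration only: the `Entry` record, the library contents, the file paths, and the `query` and `refine` functions are all hypothetical, and a real library would query a database instead of a list held in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    context: frozenset  # keywords describing the domain of applicability
    concept: str        # what the component does
    content: str        # hypothetical path to the implementation


LIBRARY = [
    Entry("stdResize", frozenset({"window", "resize", "command"}),
          "resize to a standard size", "ui/std_resize.c"),
    Entry("pullWindow", frozenset({"window", "resize", "drag"}),
          "resize by pulling an edge", "ui/pull_window.c"),
    Entry("stretchWindow", frozenset({"window", "resize", "drag"}),
          "resize by stretching a corner", "ui/stretch_window.c"),
    Entry("openWindow", frozenset({"window", "open", "menu"}),
          "open a window from a menu", "ui/open_window.c"),
]


def query(entries, required):
    """Keep entries whose context contains every required keyword."""
    required = frozenset(required)
    return [e for e in entries if required <= e.context]


def refine(entries, required, max_hits, extra_terms):
    """Add context terms one at a time until the candidate list is short enough."""
    required = set(required)
    hits = query(entries, required)
    for term in extra_terms:
        if len(hits) <= max_hits:
            break
        required.add(term)
        hits = query(hits, required)
    return hits


initial = query(LIBRARY, {"window"})
candidates = refine(LIBRARY, {"window"}, max_hits=2, extra_terms=["resize", "drag"])

# Only after candidates are found are concept and content pulled out for review.
summaries = [(e.name, e.concept, e.content) for e in candidates]
```

Searching on context first keeps the query cheap. The more detailed concept and content information is read only for the few candidates that survive refinement.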