Harnad, S. (1990) The Symbol Grounding Problem. Physica D 42: 335-346.
---------------------------------------------------------------
THE SYMBOL GROUNDING PROBLEM
Stevan Harnad
Department of Psychology
Princeton University
Princeton NJ 08544

ABSTRACT: There has been much discussion recently about the scope and limits of purely symbolic models of the mind and about the proper role of connectionism in cognitive modeling. This paper describes the "symbol grounding problem": How can the semantic interpretation of a formal symbol system be made intrinsic to the system, rather than just parasitic on the meanings in our heads? How can the meanings of the meaningless symbol tokens, manipulated solely on the basis of their (arbitrary) shapes, be grounded in anything but other meaningless symbols? The problem is analogous to trying to learn Chinese from a Chinese/Chinese dictionary alone. A candidate solution is sketched: Symbolic representations must be grounded bottom-up in nonsymbolic representations of two kinds: (1) "iconic representations," which are analogs of the proximal sensory projections of distal objects and events, and (2) "categorical representations," which are learned and innate feature-detectors that pick out the invariant features of object and event categories from their sensory projections. Elementary symbols are the names of these object and event categories, assigned on the basis of their (nonsymbolic) categorical representations. Higher-order (3) "symbolic representations," grounded in these elementary symbols, consist of symbol strings describing category membership relations (e.g., "An X is a Y that is Z"). Connectionism is one natural candidate for the mechanism that learns the invariant features underlying categorical representations, thereby connecting names to the proximal projections of the distal objects they stand for. 
In this way connectionism can be seen as a complementary component in a hybrid nonsymbolic/symbolic model of the mind, rather than a rival to purely symbolic modeling. Such a hybrid model would not have an autonomous symbolic "module," however; the symbolic functions would emerge as an intrinsically "dedicated" symbol system as a consequence of the bottom-up grounding of categories' names in their sensory representations. Symbol manipulation would be governed not just by the arbitrary shapes of the symbol tokens, but by the nonarbitrary shapes of the icons and category invariants in which they are grounded. KEYWORDS: symbol systems, connectionism, category learning, cognitive models, neural models 1. Modeling the Mind "1.1 From Behaviorism to Cognitivism." For many years the only empirical approach in psychology was behaviorism, its only explanatory tools input/input and input/output associations (in the case of classical conditioning; Turkkan 1989) and the reward/punishment history that "shaped" behavior (in the case of operant conditioning; Catania & Harnad 1988). In a reaction against the subjectivity of armchair introspectionism, behaviorism had declared that it was just as illicit to theorize about what went on in the head of the organism to generate its behavior as to theorize about what went on in its mind. Only observables were to be the subject matter of psychology; and, apparently, these were expected to explain themselves. Psychology became more like an empirical science when, with the gradual advent of cognitivism (Miller 1956, Neisser 1967, Haugeland 1978), it became acceptable to make inferences about the unobservable processes underlying behavior. Unfortunately, cognitivism let mentalism in again by the back door too, for the hypothetical internal processes came embellished with subjective interpretations. 
In fact, semantic interpretability (meaningfulness), as we shall see, was one of the defining features of the most prominent contender vying to become the theoretical vocabulary of cognitivism, the "language of thought" (Fodor 1975), which became the prevailing view in cognitive theory for several decades in the form of the "symbolic" model of the mind: The mind is a symbol system and cognition is symbol manipulation. The possibility of generating complex behavior through symbol manipulation was empirically demonstrated by successes in the field of artificial intelligence (AI). "1.2 Symbol Systems." What is a symbol system? From Newell (1980), Pylyshyn (1984), Fodor (1987) and the classical work of Von Neumann, Turing, Goedel, Church, etc. (see Kleene 1969) on the foundations of computation, we can reconstruct the following definition: A symbol system is: (1) a set of arbitrary "physical tokens" (scratches on paper, holes on a tape, events in a digital computer, etc.) that are (2) manipulated on the basis of "explicit rules" that are (3) likewise physical tokens and strings of tokens. The rule-governed symbol-token manipulation is based (4) purely on the shape of the symbol tokens (not their "meaning"), i.e., it is purely syntactic, and consists of (5) "rulefully combining" and recombining symbol tokens. There are (6) primitive atomic symbol tokens and (7) composite symbol-token strings. The entire system and all its parts -- the atomic tokens, the composite tokens, the syntactic manipulations (both actual and possible) and the rules -- are all (8) "semantically interpretable:" The syntax can be systematically assigned a meaning (e.g., as standing for objects, as describing states of affairs). According to proponents of the symbolic model of mind such as Fodor (1980) and Pylyshyn (1980, 1984), symbol-strings of this sort capture what mental phenomena such as thoughts and beliefs are. 
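The shape-based character of criteria (1) to (7) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the token shapes ("squiggle", "squoggle", "blip"), the ARROW marker, and the function names are hypothetical, and the single detachment schema is hard-wired into the interpreter rather than stored as a token string, which is a simplification of criterion (3). The point it shows is only that derivation depends on token shape, so a consistent renaming of the tokens changes nothing about what is derived; any interpretation (criterion 8) is supplied from outside.

```python
from typing import Dict, Set, Tuple

Token = str
TokenString = Tuple[Token, ...]

ARROW: Token = "=>"  # an arbitrary shape; any otherwise unused token would do


def close_under_detachment(strings: Set[TokenString]) -> Set[TokenString]:
    """Apply one shape-based rule until nothing new appears:
    from (x, ARROW, y) and (x,) add (y,)."""
    known = set(strings)
    changed = True
    while changed:
        changed = False
        for s in list(known):
            if (len(s) == 3 and s[1] == ARROW
                    and (s[0],) in known and (s[2],) not in known):
                known.add((s[2],))
                changed = True
    return known


def rename(strings: Set[TokenString], table: Dict[Token, Token]) -> Set[TokenString]:
    """Swap token shapes consistently; tokens missing from the table are kept."""
    return {tuple(table.get(t, t) for t in s) for s in strings}


# Atomic tokens and composite strings live in the same store.
store = {
    ("squiggle", ARROW, "squoggle"),
    ("squoggle", ARROW, "blip"),
    ("squiggle",),
}
derived = close_under_detachment(store)

# Renaming the shapes before or after derivation gives the same result:
# the manipulation never consulted what the tokens "mean".
table = {"squiggle": "rain", "squoggle": "wet", "blip": "slippery"}
assert close_under_detachment(rename(store, table)) == rename(derived, table)
```

The renamed store happens to be interpretable as a small piece of reasoning about rain, but that reading is one we project onto it; the program treats both stores identically.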
Symbolists emphasize that the symbolic level (for them, the mental level) is a natural functional level of its own, with ruleful regularities that are independent of their specific physical realizations. For symbolists, this implementation-independence is the critical difference between cognitive phenomena and ordinary physical phenomena and their respective explanations. This concept of an autonomous symbolic level also conforms to general foundational principles in the theory of computation and applies to all the work being done in symbolic AI, the branch of science that has so far been the most successful in generating (hence explaining) intelligent behavior. All eight of the properties listed above seem to be critical to this definition of symbolic.\** [footnote start] Paul Kube (personal communication) has suggested that (2) and (3) may be too strong, excluding some kinds of Turing Machine and perhaps even leading to an infinite regress on levels of explicitness and systematicity. [footnote end] Many phenomena have some of the properties, but that does not entail that they are symbolic in this explicit, technical sense. It is not enough, for example, for a phenomenon to be interpretable as rule-governed, for just about anything can be interpreted as rule-governed. A thermostat may be interpreted as following the rule: Turn on the furnace if the temperature goes below 70 degrees and turn it off if it goes above 70 degrees, yet nowhere in the thermostat is that rule explicitly represented. Wittgenstein (1953) emphasized the difference between explicit and implicit rules: It is not the same thing to "follow" a rule (explicitly) and merely to behave "in accordance with" a rule (implicitly).\** [footnote start] Similar considerations apply to Chomsky's (1980) concept of "psychological reality" 
(i.e., whether Chomskian rules are really physically represented in the brain or whether they merely "fit" our performance regularities, without being what actually governs them). Another version of the distinction concerns explicitly represented rules versus hard-wired physical constraints (Stabler 1985). In each case, an explicit representation consisting of elements that can be recombined in systematic ways would be symbolic whereas an implicit physical constraint would not, although both would be semantically "interpretable" as a "rule" if construed in isolation rather than as part of a system. [footnote end] The critical difference is in the compositeness (7) and systematicity (8) criteria. The explicitly represented symbolic rule is part of a formal system, it is decomposable (unless primitive), its application and manipulation is purely formal (syntactic, shape-dependent), and the entire system must be semantically interpretable, not just the chunk in question. An isolated ("modular") chunk cannot be symbolic; being symbolic is a systematic property. So the mere fact that a behavior is "interpretable" as ruleful does not mean that it is really governed by a symbolic rule.\** [footnote start] Analogously, the mere fact that a behavior is interpretable as purposeful or conscious or meaningful does not mean that it really is purposeful or conscious. (For arguments to the contrary, see Dennett 1983). [footnote end] Semantic interpretability must be coupled with explicit representation (2), syntactic manipulability (4), and systematicity (8) in order to be symbolic. None of these criteria is arbitrary, and, as far as I can tell, if you weaken them, you lose the grip on what looks like a natural category and you sever the links with the formal theory of computation, leaving a sense of "symbolic" that is merely unexplicated metaphor (and probably differs from speaker to speaker). 
Hence it is only this formal sense of "symbolic" and "symbol system" that will be considered in this discussion of the grounding of symbol systems. "1.3 Connectionist systems." An early rival to the symbolic model of mind appeared (Rosenblatt 1962), was overcome by symbolic AI (Minsky & Papert 1969) and has recently re-appeared in a stronger form that is currently vying with AI to be the general theory of cognition and behavior (McClelland, Rumelhart et al. 1986, Smolensky 1988). Variously described as "neural networks," "parallel distributed processing" and "connectionism," this approach has a multiple agenda, which includes providing a theory of brain function. Now, much can be said for and against studying behavioral and brain function independently, but in this paper it will be assumed that, first and foremost, a cognitive theory must stand on its own merits, which depend on how well it explains our observable behavioral capacity. Whether or not it does so in a sufficiently brainlike way is another matter, and a downstream one, in the course of theory development. Very little is known of the brain's structure and its "lower" (vegetative) functions so far; and the nature of "higher" brain function is itself a theoretical matter. To "constrain" a cognitive theory to account for behavior in a brainlike way is hence premature in two respects: (1) It is far from clear yet what "brainlike" means, and (2) we are far from having accounted for a lifesize chunk of behavior yet, even without added constraints. 
Moreover, the formal principles underlying connectionism seem to be based on the associative and statistical structure of the causal interactions in certain dynamical systems; a neural network is merely one possible implementation of such a dynamical system.\** [footnote start] It is not even clear yet that a "neural network" needs to be implemented as a net (i.e., a parallel system of interconnected units) in order to do what it can do; if symbolic simulations of nets have the same functional capacity as real nets, then a connectionist model is just a special kind of symbolic model, and connectionism is just a special family of symbolic algorithms. [footnote end] Connectionism will accordingly only be considered here as a cognitive theory. As such, it has lately challenged the symbolic approach to modeling the mind. According to connectionism, cognition is not symbol manipulation but dynamic patterns of activity in a multilayered network of nodes or units with weighted positive and negative interconnections. The patterns change according to internal network constraints governing how the activations and connection strengths are adjusted on the basis of new inputs (e.g., the generalized "delta rule," or "backpropagation," McClelland, Rumelhart et al. 1986). The result is a system that learns, recognizes patterns, solves problems, and can even exhibit motor skills. "1.4 Scope and Limits of Symbols and Nets." It is far from clear what the actual capabilities and limitations of either symbolic AI or connectionism are. The former seems better at formal and language-like tasks, the latter at sensory, motor and learning tasks, but there is considerable overlap and neither has gone much beyond the stage of "toy" tasks toward lifesize behavioral capacity. Moreover, there has been some disagreement as to whether or not connectionism itself is symbolic. 
We will adopt the position here that it is not, because connectionist networks fail to meet several of the criteria for being symbol systems, as Fodor & Pylyshyn (1988) have argued recently. In particular, although, like everything else, their behavior and internal states can be given isolated semantic interpretations, nets fail to meet the compositeness (7) and systematicity (8) criteria listed earlier: The patterns of interconnections do not decompose, combine and recombine according to a formal syntax that can be given a systematic semantic interpretation.\** [footnote start] There is some misunderstanding of this point because it is often conflated with a mere implementational issue: Connectionist networks can be simulated using symbol systems, and symbol systems can be implemented using a connectionist architecture, but that is independent of the question of what each can do qua symbol system or connectionist network, respectively. By way of analogy, silicon can be used to build a computer, and a computer can simulate the properties of silicon, but the functional properties of silicon are not those of computation, and the functional properties of computation are not those of silicon. [footnote end] Instead, nets seem to do what they do nonsymbolically. According to Fodor & Pylyshyn, this is a severe limitation, because many of our behavioral capacities appear to be symbolic, and hence the most natural hypothesis about the underlying cognitive processes that generate them would be that they too must be symbolic. Our linguistic capacities are the primary examples here, but many of the other skills we have -- logical reasoning, mathematics, chess-playing, perhaps even our higher-level perceptual and motor skills -- also seem to be symbolic. 
In any case, when we interpret our sentences, mathematical formulas, and chess moves (and perhaps some of our perceptual judgments and motor strategies) as having a systematic meaning or content, we know at first hand that that's literally true, and not just a figure of speech. Connectionism hence seems to be at a disadvantage in attempting to model these cognitive capacities. Yet it is not clear whether connectionism should for this reason aspire to be symbolic, for the symbolic approach turns out to suffer from a severe handicap, one that may be responsible for the limited extent of its success to date (especially in modeling human-scale capacities) as well as the uninteresting and ad hoc nature of the symbolic "knowledge" it attributes to the "mind" of the symbol system. The handicap has been noticed in various forms since the advent of computing; I have dubbed a recent manifestation of it the "symbol grounding problem" (Harnad 1987b). "2. The Symbol Grounding Problem" "2.1 The Chinese Room." Before defining the symbol grounding problem I will give two examples of it. The first comes from Searle's (1980) celebrated "Chinese Room Argument," in which the symbol grounding problem is referred to as the problem of intrinsic meaning (or "intentionality"): Searle challenges the core assumption of symbolic AI that a symbol system able to generate behavior indistinguishable from that of a person must have a mind. More specifically, according to the symbolic theory of mind, if a computer could pass the Turing Test (Turing 1964) in Chinese -- i.e., if it could respond to all Chinese symbol strings it receives as input with Chinese symbol strings that are indistinguishable from the replies a real Chinese speaker would make (even if we keep testing for a lifetime) -- then the computer would understand the meaning of Chinese symbols in the same sense that I understand the meaning of English symbols. 
Searle's simple demonstration that this cannot be so consists of imagining himself doing everything the computer does -- receiving the Chinese input symbols, manipulating them purely on the basis of their shape (in accordance with (1) to (8) above), and finally returning the Chinese output symbols. It is evident that Searle (who knows no Chinese) would not be understanding Chinese under those conditions -- hence neither could the computer. The symbols and the symbol manipulation, being all based on shape rather than meaning, are systematically interpretable as having meaning -- that, after all, is what it is to be a symbol system, according to our definition. But the interpretation will not be intrinsic to the symbol system itself: It will be parasitic on the fact that the symbols have meaning for us, in exactly the same way that the meanings of the symbols in a book are not intrinsic, but derive from the meanings in our heads. Hence, if the meanings of symbols in a symbol system are extrinsic, rather than intrinsic like the meanings in our heads, then they are not a viable model for the meanings in our heads: Cognition cannot be just symbol manipulation. "2.2 The Chinese/Chinese Dictionary-Go-Round." My own example of the symbol grounding problem has two versions, one difficult, and one, I think, impossible. The difficult version is: Suppose you had to learn Chinese as a second language and the only source of information you had was a Chinese/Chinese dictionary. The trip through the dictionary would amount to a merry-go-round, passing endlessly from one meaningless symbol or symbol-string (the definientes) to another (the definienda), never coming to a halt on what anything meant.\** [footnote start] Symbolic AI abounds with symptoms of the symbol grounding problem. 
One well-known (though misdiagnosed) manifestation of it is the so-called "frame" problem (McCarthy & Hayes 1969; Minsky 1974; McDermott 1976; Pylyshyn 1987): It is a frustrating but familiar experience in writing "knowledge-based" programs that a system apparently behaving perfectly intelligently for a while can be foiled by an unexpected case that demonstrates its utter stupidity: A "scene-understanding" program will blithely describe the goings-on in a visual scene and answer questions demonstrating its comprehension (who did what, where, why?) and then suddenly reveal that it doesn't "know" that hanging up the phone and leaving the room does not make the phone disappear, or something like that. (It is important to note that these are not the kinds of lapses and gaps in knowledge that people are prone to; rather, they are such howlers as to cast serious doubt on whether the system has anything like "knowledge" at all.) The "frame" problem has been optimistically defined as the problem of formally specifying ("framing") what varies and what stays constant in a particular "knowledge domain," but in reality it's the problem of second-guessing all the contingencies the programmer has not anticipated in symbolizing the knowledge he is attempting to symbolize. These contingencies are probably unbounded, for practical purposes, because purely symbolic "knowledge" is ungrounded. Merely adding on more symbolic contingencies is like taking a few more turns in the Chinese/Chinese Dictionary-Go-Round. There is in reality no ground in sight: merely enough "intelligent" symbol-manipulation to lull the programmer into losing sight of the fact that its meaningfulness is just parasitic on the meanings he is projecting onto it from the grounded meanings in his own head. (I've called this effect the "hermeneutic hall of mirrors" [Harnad 1990]; it's the reverse side of the symbol grounding problem). 
Yet parasitism it is, as the next "frame problem" lurking around the corner is ready to confirm. (A similar form of over-interpretation has occurred in the ape "language" experiments [Terrace 1979]. Perhaps both apes and computers should be trained using Chinese code, to immunize their experimenters and programmers against spurious over-interpretations. But since the actual behavioral tasks in both domains are still so trivial, there's probably no way to prevent their being decrypted. In fact, there seems to be an irresistible tendency to overinterpret toy task performance itself, preemptively extrapolating and "scaling it up" conceptually to lifesize without any justification in practice.) [footnote end] -- Figure 1 (Chinese Dictionary Entry) about here. -- The only reason cryptologists of ancient languages and secret codes seem to be able to successfully accomplish something very like this is that their efforts are grounded in a first language and in real world experience and knowledge.\** [footnote start] Cryptologists also use statistical information about word frequencies, inferences about what an ancient culture or an enemy government are likely to be writing about, decryption algorithms, etc. [footnote end] The second variant of the Dictionary-Go-Round, however, goes far beyond the conceivable resources of cryptology: Suppose you had to learn Chinese as a first language and the only source of information you had was a Chinese/Chinese dictionary!\** [footnote start] There is of course no need to restrict the symbolic resources to a dictionary; the task would be just as impossible if one had access to the entire body of Chinese-language literature, including all of its computer programs and anything else that can be codified in symbols. [footnote end] This is more like the actual task faced by a purely symbolic model of the mind: How can you ever get off the symbol/symbol merry-go-round? 
How is symbol meaning to be grounded in something other than just more meaningless symbols?\** [footnote start] Even mathematicians, whether Platonist or formalist, point out that symbol manipulation (computation) itself cannot capture the notion of the intended interpretation of the symbols (Penrose 1989). The fact that formal symbol systems and their interpretations are not the same thing is hence evident independently of the Church-Turing thesis (Kleene 1969) or the Goedel results (Davis 1958, 1965), which have been zealously misapplied to the problem of mind-modeling (e.g., by Lucas 1964) -- to which they are largely irrelevant, in my view. [footnote end] This is the symbol grounding problem.\** [footnote start] Note that, strictly speaking, symbol grounding is a problem only for cognitive modeling, not for AI in general. If symbol systems alone succeed in generating all the intelligent machine performance pure AI is interested in -- e.g., an automated dictionary -- then there is no reason whatsoever to demand that their symbols have intrinsic meaning. On the other hand, the fact that our own symbols do have intrinsic meaning whereas the computer's do not, and the fact that we can do things that the computer so far cannot, may be indications that even in AI there are performance gains to be made (especially in robotics and machine vision) from endeavouring to ground symbol systems. [footnote end] "2.3 Connecting to the World." The standard reply of the symbolist (e.g., Fodor 1980, 1985) is that the meaning of the symbols comes from connecting the symbol system to the world "in the right way." But it seems apparent that the problem of connecting up with the world in the right way is virtually coextensive with the problem of cognition itself. If each definiens in a Chinese/Chinese dictionary were somehow connected to the world in the right way, we'd hardly need the definienda! 
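The Dictionary-Go-Round of section 2.2 can be made concrete with a toy sketch. The assumptions are loud ones: English placeholder words stand in for Chinese entries, the six-entry dictionary is invented, and the function name `groundable` and its criterion (a word counts as reachable once every word in its definition is) are hypothetical simplifications. In particular, the `grounded` argument simply stipulates which words are connected to the world, which is exactly the hard part that the symbolist reply leaves unexplained.

```python
from typing import Dict, List, Set


def groundable(dictionary: Dict[str, List[str]], grounded: Set[str]) -> Set[str]:
    """Return the words whose meaning can be reached from the grounded set.

    A word is added once every word in its definition has been reached.
    With an empty grounded set and a closed dictionary nothing is ever added.
    """
    reached = set(grounded)
    changed = True
    while changed:
        changed = False
        for word, definition in dictionary.items():
            if word not in reached and all(w in reached for w in definition):
                reached.add(word)
                changed = True
    return reached


# A closed toy dictionary: every defining word is itself an entry.
toy = {
    "zebra": ["horse", "stripes"],
    "horse": ["animal", "mane"],
    "stripes": ["lines"],
    "lines": ["stripes"],
    "animal": ["horse"],
    "mane": ["horse"],
}

# Symbols defined only by other symbols: the merry-go-round never halts on a meaning.
nothing = groundable(toy, set())

# Stipulate that two names are grounded nonsymbolically; the rest follow.
everything = groundable(toy, {"horse", "stripes"})
```

With no grounded entries the result is the empty set; once "horse" and "stripes" are stipulated as grounded, every other entry becomes reachable through definitions alone.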
Many symbolists believe that cognition, being symbol-manipulation, is an autonomous functional module that need only be hooked up to peripheral devices in order to "see" the world of objects to which its symbols refer (or, rather, to which they can be systematically interpreted as referring).\** [footnote start] The homuncular viewpoint inherent in this belief is quite apparent, as is the effect of the "hermeneutic hall of mirrors" (Harnad 1990). [footnote end] Unfortunately, this radically underestimates the difficulty of picking out the objects, events and states of affairs in the world that symbols refer to, i.e., it trivializes the symbol grounding problem. It is one possible candidate for a solution to this problem, confronted directly, that will now be sketched: What will be proposed is a hybrid nonsymbolic/symbolic system, a "dedicated" one, in which the elementary symbols are grounded in two kinds of nonsymbolic representations that pick out, from their proximal sensory projections, the distal object categories to which the elementary symbols refer. Most of the components of which the model is made up (analog projections and transformations, discretization, invariance detection, connectionism, symbol manipulation) have also been proposed in various configurations by others, but they will be put together in a specific bottom-up way here that has not, to my knowledge, been previously suggested, and it is on this specific configuration that the potential success of the grounding scheme critically depends. Table 1 summarizes the relative strengths and weaknesses of connectionism and symbolism, the two current rival candidates for explaining all of cognition single-handedly. Their respective strengths will be put to cooperative rather than competing use in our hybrid model, thereby also remedying some of their respective weaknesses. Let us now look more closely at the behavioral capacities such a cognitive model must generate. -- Table 1 about here -- 
3. Human Behavioral Capacity Since the advent of cognitivism, psychologists have continued to gather behavioral data, although to a large extent the relevant evidence is already in: We already know what human beings are able to do. They can (1) discriminate, (2) manipulate,\** [footnote start] Although they are no doubt as important as perceptual skills, motor skills will not be explicitly considered here. It is assumed that the relevant features of the sensory story (e.g., iconicity) will generalize to the motor story (e.g., in motor analogs; Liberman 1982). In addition, large parts of the motor story may not be cognitive, drawing instead upon innate motor patterns and sensorimotor feedback. Gibson's (1979) concept of "affordances" -- the invariant stimulus features that are detected by the motor possibilities they "afford" -- is relevant here too, though Gibson underestimates the processing problems involved in finding such invariants (Ullman 1980). In any case, motor and sensory-motor grounding will no doubt be as important as the sensory grounding that is being focused on here. [footnote end] (3) identify and (4) describe the objects, events and states of affairs in the world they live in, and they can also (5) "produce descriptions" and (6) "respond to descriptions" of those objects, events and states of affairs. Cognitive theory's burden is now to explain how human beings (or any other devices) do all this.\** [footnote start] If a candidate model were to exhibit all these behavioral capacities, both linguistic (5-6) and robotic (i.e., sensorimotor) (1-3), it would pass the "Total Turing Test" (Harnad 1989). The standard Turing Test (Turing 1964) calls for linguistic performance capacity only: symbols in and symbols out. This makes it equivocal about the status, scope and limits of pure symbol manipulation, and hence subject to the symbol grounding problem. A model that could pass the Total Turing Test, however, would be grounded in the world. 
[footnote end] "3.1 Discrimination and Identification." Let us first look more closely at discrimination and identification. To be able to discriminate is to be able to judge whether two inputs are the same or different, and, if different, how different they are. Discrimination is a relative judgment, based on our capacity to tell things apart and discern their degree of similarity. To be able to identify is to be able to assign a unique (usually arbitrary) response -- a "name" -- to a class of inputs, treating them all as equivalent or invariant in some respect. Identification is an absolute judgment, based on our capacity to tell whether or not a given input is a member of a particular category. Consider the symbol "horse." We are able, in viewing different horses (or the same horse in different positions, or at different times) to tell them apart and to judge which of them are more alike, and even how alike they are. This is discrimination. In addition, in viewing a horse, we can reliably call it a horse, rather than, say, a mule or a donkey (or a giraffe, or a stone). This is identification. What sort of internal representation would be needed in order to generate these two kinds of performance? "3.2 Iconic and categorical representations." According to the model being proposed here, our ability to discriminate inputs depends on our forming "iconic representations" of them (Harnad 1987b). These are internal analog transforms of the projections of distal objects on our sensory surfaces (Shepard & Cooper 1982). 
In the case of horses (and vision), they would be analogs of the many shapes that horses cast on our retinas.\** [footnote start] There are many problems having to do with figure/ground discrimination, smoothing, size constancy, shape constancy, stereopsis, etc., that make the problem of discrimination much more complicated than what is described here, but these do not change the basic fact that iconic representations are a natural candidate substrate for our capacity to discriminate. [footnote end] Same/different judgments would be based on the sameness or difference of these iconic representations, and similarity judgments would be based on their degree of congruity. No homunculus is involved here; simply a process of superimposing icons and registering their degree of disparity. Nor are there memory problems, since the inputs are either simultaneously present or available in rapid enough succession to draw upon their persisting sensory icons. So we need horse icons to discriminate horses. But what about identifying them? Discrimination is independent of identification. I could be discriminating things without knowing what they were. Will the icon allow me to identify horses? Although there are theorists who believe it would (Paivio 1986), I have tried to show why it could not (Harnad 1982, 1987b). In a world where there were bold, easily detected natural discontinuities between all the categories we would ever have to (or choose to) sort and identify -- a world in which the members of one category couldn't be confused with the members of any other category -- icons might be sufficient for identification. 
But in our underdetermined world, with its infinity of confusable potential categories, icons are useless for identification because there are too many of them and because they blend continuously\** [footnote start] Elsewhere (Harnad 1987a,b) I have tried to show how the phenomenon of "categorical perception" could generate internal discontinuities where there is external continuity. There is evidence that our perceptual system is able to segment a continuum, such as the color spectrum, into relatively discrete, bounded regions or categories. Physical differences of equal magnitude are more discriminable across the boundaries between these categories than within them. This boundary effect, both innate and learned, may play an important role in the representation of the elementary perceptual categories out of which the higher-order ones are built. [footnote end] into one another, making it an independent problem to identify which of them are icons of members of the category and which are not! Icons of sensory projections are too unselective. For identification, icons must be selectively reduced to those "invariant features" of the sensory projection that will reliably distinguish a member of a category from any nonmembers with which it could be confused. Let us call the output of this category-specific feature detector the "categorical representation." In some cases these representations may be innate, but since evolution could hardly anticipate all of the categories we may ever need or choose to identify, most of these features must be learned from experience. In particular, our categorical representation of a horse is probably a learned one. (I will defer till section 4 the problem of how the invariant features underlying identification might be learned.) Note that both iconic and categorical representations are nonsymbolic. 
The former are analog copies of the sensory projection, preserving its "shape" faithfully; the latter are icons that have been selectively filtered to preserve only some of the features of the shape of the sensory projection: those that reliably distinguish members from nonmembers of a category. But both representations are still sensory and nonsymbolic. There is no problem about their connection to the objects they pick out: It is a purely causal connection, based on the relation between distal objects, proximal sensory projections and the acquired internal changes that result from a history of behavioral interactions with them. Nor is there any problem of semantic interpretation, or whether the semantic interpretation is justified. Iconic representations no more "mean" the objects of which they are the projections than the image in a camera does. Both icons and camera-images can of course be interpreted as meaning or standing for something, but the interpretation would clearly be derivative rather than intrinsic.\** [footnote start] On the other hand, the resemblance on which discrimination performance is based -- the degree of isomorphism between the icon and the sensory projection, and between the sensory projection and the distal object -- seems to be intrinsic, rather than just a matter of interpretation. The resemblance can be objectively characterized as the degree of invertibility of the physical transformation from object to icon (Harnad 1987b). [footnote end] "3.3 Symbolic Representations." Nor can categorical representations yet be interpreted as "meaning" anything. It is true that they pick out the class of objects they "name," but the names do not have all the systematic properties of symbols and symbol systems described earlier. They are just an inert taxonomy. For systematicity it must be possible to combine and recombine them rulefully into propositions that can be semantically interpreted. 
"Horse" is so far just an arbitrary response that is reliably made in the presence of a certain category of objects. There is no justification for interpreting it holophrastically as meaning "This is a [member of the category] horse" when produced in the presence of a horse, because the other expected systematic properties of "this" and "a" and the all-important "is" of predication are not exhibited by mere passive taxonomizing. What would be required to generate these other systematic properties? Merely that the grounded names in the category taxonomy be strung together into propositions about further category membership relations. For example: (1) Suppose the name "horse" is grounded by iconic and categorical representations, learned from experience, that reliably discriminate and identify horses on the basis of their sensory projections. (2) Suppose "stripes" is similarly grounded. Now consider that the following category can be constituted out of these elementary categories by a symbolic description of category membership alone: (3) "Zebra" = "horse" & "stripes"\** [footnote start] Figure 1 is actually the Chinese dictionary entry for "zebra," which is "striped horse." Note that the character for "zebra" actually happens to be the character for "horse" plus the character for "striped." Although Chinese characters are iconic in structure, they function just like arbitrary alphabetic lexigrams at the level of syntax and semantics. [footnote end] What is the representation of a zebra? It is just the symbol string "horse & stripes." But because "horse" and "stripes" are grounded in their respective iconic and categorical representations, "zebra" inherits the grounding, through its grounded symbolic representation. 
In principle, someone who had never seen a zebra (but had seen and learned to identify horses and stripes) could identify a zebra on first acquaintance armed with this symbolic representation alone (plus the nonsymbolic -- iconic and categorical -- representations of horses and stripes that ground it). Once one has the grounded set of elementary symbols provided by a taxonomy of names (and the iconic and categorical representations that give content to the names and allow them to pick out the objects they identify), the rest of the symbol strings of a natural language can be generated by symbol composition alone,\** [footnote start] Some standard logical connectives and quantifiers are needed too, such as not, and, all, etc. [footnote end] and they will all inherit the intrinsic grounding of the elementary set.\** [footnote start] Note that it is not being claimed that "horse," "stripes," etc. are actually elementary symbols, with direct sensory grounding; the claim is only that some set of symbols must be directly grounded. Most sensory category representations are no doubt hybrid sensory/symbolic; and their features can change by bootstrapping: "Horse" can always be revised, both sensorily and symbolically, even if it was previously elementary. Kripke (1980) gives a good example of how "gold" might be baptized on the shiny yellow metal in question, used for trade, decoration and discourse, and then we might discover "fool's gold," which would make all the sensory features we had used until then inadequate, forcing us to find new ones. He points out that it is even possible in principle for "gold" to have been inadvertently baptized on "fool's gold"! Of interest here are not the ontological aspects of this possibility, but the epistemic ones: We could bootstrap successfully to real gold even if every prior case had been fool's gold. 
"Gold" would still be the right word for what we had been trying to pick out all along, and its original provisional features would still have provided a close enough approximation to ground it, even if later information were to pull the ground out from under it, so to speak. [footnote end] Hence, the ability to discriminate and categorize (and its underlying nonsymbolic representations) has led naturally to the ability to describe and to produce and respond to descriptions through symbolic representations. 4. A Complementary Role for Connectionism The symbol grounding scheme just described has one prominent gap: No mechanism has been suggested to explain how the all-important categorical representations could be formed: How does the hybrid system find the invariant features of the sensory projection that make it possible to categorize and identify objects correctly?\** [footnote start] Although it is beyond the scope of this paper to discuss it at length, it must be mentioned that this question has often been begged in the past, mainly on the grounds of "vanishing intersections." It has been claimed that one cannot find invariant features in the sensory projection because they simply do not exist: The intersection of all the projections of the members of a category such as "horse" is empty. The British empiricists have been criticized for thinking otherwise; for example, Wittgenstein's (1953) discussion of "games" and "family resemblances" has been taken to have discredited their view. And current research on human categorization (Rosch & Lloyd 1978) has been interpreted as confirming that intersections vanish and that hence categories are not represented in terms of invariant features. The problem of vanishing intersections (together with Chomsky's  "poverty of the stimulus argument") has even been cited by thinkers such as Fodor (1985, 1987) as a justification for extreme nativism. The present paper is frankly empiricist. 
In my view, the reason intersections have not been found is that no one has yet looked for them properly. Introspection certainly isn't the way to look. And general pattern learning algorithms such as connectionism are relatively new; their inductive power remains to be tested. In addition, a careful distinction has not been made between pure sensory categories (which, I claim, must have invariants, otherwise we could not successfully identify them as we do) and higher-order categories that are grounded in sensory categories; these abstract representations may be symbolic rather than sensory, and hence not based directly on sensory invariants. For further discussion of this problem, see Harnad (1987b). [footnote end] Connectionism, with its general pattern learning capability, seems to be one natural candidate (though there may well be others): Icons, paired with feedback indicating their names, could be processed by a connectionist network that learns to identify icons correctly from the sample of confusable alternatives it has encountered. It would do so by dynamically adjusting the weights of the features and feature combinations that are reliably associated with the names in a way that (provisionally) resolves the confusion, thereby reducing the icons to the invariant (confusion-resolving) features of the category to which they are assigned. In effect, the "connection" between the names and the objects that give rise to their sensory projections and their icons would be provided by connectionist networks. This circumscribed complementary role for connectionism in a hybrid system seems to remedy the weaknesses of the two current competitors in their attempts to model the mind independently. In a pure symbolic model the crucial connection between the symbols and their referents is missing; an autonomous symbol system, though amenable to a systematic semantic interpretation, is ungrounded. 
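The weight-adjustment idea described above can be made concrete with a deliberately minimal sketch. It is an illustration under stated assumptions, not the network architecture the paper has in mind: a single-layer perceptron stands in for "a connectionist network," an icon is reduced to a hypothetical three-element feature vector, and the names `train`, `identify` and `SAMPLES` are invented for this example. Feature 0 plays the role of the invariant; features 1 and 2 vary freely across members and nonmembers.

```python
from typing import List, Tuple

# Assumption: an icon is modeled as a short vector of feature values.
Icon = List[float]


def identify(weights: List[float], bias: float, icon: Icon) -> int:
    """Return 1 if the icon is assigned the category name, else 0."""
    activation = sum(w * x for w, x in zip(weights, icon)) + bias
    return 1 if activation > 0 else 0


def train(samples: List[Tuple[Icon, int]], epochs: int = 10,
          lr: float = 1.0) -> Tuple[List[float], float]:
    """Perceptron rule: adjust weights only when naming feedback signals an error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for icon, name in samples:
            error = name - identify(weights, bias, icon)  # feedback: -1, 0 or +1
            if error != 0:
                weights = [w + lr * error * x for w, x in zip(weights, icon)]
                bias += lr * error
    return weights, bias


# Hypothetical toy data: label 1 means "horse", 0 means a confusable nonmember.
# Only feature 0 is shared by all members and absent from all nonmembers.
SAMPLES: List[Tuple[Icon, int]] = [
    ([1.0, 0.0, 1.0], 1),
    ([1.0, 1.0, 0.0], 1),
    ([0.0, 0.0, 1.0], 0),
    ([0.0, 1.0, 0.0], 0),
    ([1.0, 1.0, 1.0], 1),
    ([0.0, 1.0, 1.0], 0),
]

weights, bias = train(SAMPLES)
print(weights, bias)
```

On this toy sample the learned weights come out as `[1.0, 0.0, 0.0]` with a bias of `0.0`: the two varying features end up with zero weight, which is one very small instance of "reducing the icons to the invariant (confusion-resolving) features." A realistic sensory category would of course need richer features and a more powerful network.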
In a pure connectionist model, names are connected to objects through invariant patterns in their sensory projections, learned through exposure and feedback, but the crucial compositional property is missing; a network of names, though grounded, is not yet amenable to a full systematic semantic interpretation. In the hybrid system proposed here, there is no longer any autonomous symbolic level at all; instead, there is an intrinsically dedicated symbol system, its elementary symbols (names) connected to nonsymbolic representations that can pick out the objects to which they refer, via connectionist networks that extract the invariant features of their analog sensory projections. 5. Conclusions The expectation has often been voiced that "top-down" (symbolic) approaches to modeling cognition will somehow meet "bottom-up" (sensory) approaches somewhere in between. If the grounding considerations in this paper are valid, then this expectation is hopelessly modular and there is really only one viable route from sense to symbols: from the ground up. A free-floating symbolic level like the software level of a computer will never be reached by this route (or vice versa) -- nor is it clear why we should even try to reach such a level, since it looks as if getting there would just amount to uprooting our symbols from their intrinsic meanings (thereby merely reducing ourselves to the functional equivalent of a programmable computer). In an intrinsically dedicated symbol system there are more constraints on the symbol tokens than merely syntactic ones. Symbols are manipulated not only on the basis of the arbitrary shape of their tokens, but also on the basis of the decidedly nonarbitrary "shape" of the iconic and categorical representations connected to the grounded elementary symbols out of which the higher-order symbols are composed. Of these two kinds of constraints, the iconic/categorical ones are primary. 
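The dependence of higher-order symbols on grounded elementary ones can be illustrated with the "zebra" example of section 3.3. The sketch below is only a toy under stated assumptions: a sensory projection is modeled as a dictionary of feature values, the two detector functions are hypothetical stand-ins for learned categorical representations, and the names `grounded`, `define` and `identify` are invented here.

```python
from typing import Callable, Dict, List

# Assumption: a sensory projection is a plain dict of feature values.
Projection = Dict[str, float]
Detector = Callable[[Projection], bool]


def is_horse(p: Projection) -> bool:
    """Hypothetical stand-in for a learned categorical representation of horses."""
    return p.get("horse_shape", 0.0) > 0.5


def has_stripes(p: Projection) -> bool:
    """Hypothetical stand-in for a learned categorical representation of stripes."""
    return p.get("stripe_contrast", 0.0) > 0.5


# Elementary symbols: names connected directly to nonsymbolic detectors.
grounded: Dict[str, Detector] = {"horse": is_horse, "stripes": has_stripes}


def define(name: str, *parts: str) -> None:
    """Ground a new name purely symbolically, as a conjunction of grounded names."""
    detectors = [grounded[part] for part in parts]  # KeyError if a part is ungrounded

    def composite(p: Projection) -> bool:
        return all(detector(p) for detector in detectors)

    grounded[name] = composite


def identify(p: Projection) -> List[str]:
    """Return every grounded name whose detector accepts the projection."""
    return sorted(name for name, detector in grounded.items() if detector(p))


# "Zebra" = "horse" & "stripes": no zebra has ever been presented to the system.
define("zebra", "horse", "stripes")

print(identify({"horse_shape": 0.9, "stripe_contrast": 0.8}))
print(identify({"horse_shape": 0.9}))
```

The composite name can be applied to a first-encountered zebra only because its parts already have detectors; an attempt such as `define("unicorn", "horse", "horn")` fails with a `KeyError`, since "horn" has no grounding in this toy taxonomy. That failure is the sketch's small-scale analog of a symbol string whose constituents are not themselves grounded.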
I am not aware of any formal analysis of such dedicated symbol systems,\** [footnote start] Although mathematicians investigate the formal properties of uninterpreted symbol systems, all of their motivations and intuitions clearly come from the intended interpretations of those systems (see Penrose 1989). Perhaps these too are grounded in the iconic and categorical representations in their heads. [footnote end] but this may be because they are unique to cognitive and robotic modeling and their properties will depend on the specific kinds of robotic (i.e., behavioral) capacities they are designed to exhibit. It is appropriate that the properties of dedicated symbol systems should turn out to depend on behavioral considerations. The present grounding scheme is still in the spirit of behaviorism in that the only tests proposed for whether a semantic interpretation will bear the semantic weight placed on it consist of one formal test (does it meet the eight criteria for being a symbol system?) and one behavioral test (can it discriminate, identify and describe all the objects and states of affairs to which its symbols refer?). If both tests are passed, then the semantic interpretation of its symbols is "fixed" by the behavioral capacity of the dedicated symbol system, as exercised on the objects and states of affairs in the world to which its symbols refer; the symbol meanings are accordingly not just parasitic on the meanings in the head of the interpreter, but intrinsic to the dedicated symbol system itself. This is still no guarantee that our model has captured subjective meaning, of course. But if the system's behavioral capacities are lifesize, it's as close as we can ever hope to get. References Catania, A. C. & Harnad, S. (eds.) (1988) The Selection of Behavior. The Operant Behaviorism of B. F. Skinner: Comments and Consequences. New York: Cambridge University Press. Chomsky, N. (1980) Rules and representations. "Behavioral and Brain Sciences" 3: 1-61. Davis, M. 
(1958) "Computability and unsolvability." Manchester: McGraw-Hill. Davis, M. (1965) "The undecidable." New York: Raven. Dennett, D. C. (1983) Intentional systems in cognitive ethology. "Behavioral and Brain Sciences" 6: 343-90. Fodor, J. A. (1975) "The language of thought." New York: Thomas Y. Crowell. Fodor, J. A. (1980) Methodological solipsism considered as a research strategy in cognitive psychology. "Behavioral and Brain Sciences" 3: 63-109. Fodor, J. A. (1985) Précis of "The Modularity of Mind." "Behavioral and Brain Sciences" 8: 1-42. Fodor, J. A. (1987) "Psychosemantics." Cambridge MA: MIT/Bradford. Fodor, J. A. & Pylyshyn, Z. W. (1988) Connectionism and cognitive architecture: A critical appraisal. "Cognition" 28: 3-71. Gibson, J. J. (1979) "An ecological approach to visual perception." Boston: Houghton Mifflin. Harnad, S. (1982) Metaphor and mental duality. In T. Simon & R. Scholes (Eds.) "Language, mind and brain." Hillsdale, N. J.: Lawrence Erlbaum Associates. Harnad, S. (1987a) Categorical perception: A critical overview. In S. Harnad (Ed.) "Categorical perception: The groundwork of Cognition." New York: Cambridge University Press. Harnad, S. (1987b) Category induction and representation. In S. Harnad (Ed.) "Categorical perception: The groundwork of Cognition." New York: Cambridge University Press. Harnad, S. (1989) Minds, Machines and Searle. "Journal of Theoretical and Experimental Artificial Intelligence" 1: 5-25. Harnad, S. (1990) Computational Hermeneutics. "Social Epistemology" in press. Haugeland, J. (1978) The nature and plausibility of cognitivism. "Behavioral and Brain Sciences" 1: 215-260. Kleene, S. C. (1969) "Formalized recursive functionals and formalized realizability." Providence, R.I.: American Mathematical Society. Kripke, S.A. (1980) "Naming and Necessity." Cambridge MA: Harvard University Press. Liberman, A. M. (1982) On the finding that speech is special. "American Psychologist" 37: 148-167. Lucas, J. R. 
(1961) Minds, machines and Gödel. "Philosophy" 36: 112-117. McCarthy, J. & Hayes, P. (1969) Some philosophical problems from the standpoint of artificial intelligence. In: Meltzer, B. & Michie, D. (Eds.) "Machine Intelligence" Volume 4. Edinburgh: Edinburgh University Press. McDermott, D. (1976) Artificial intelligence meets natural stupidity. "SIGART Newsletter" 57: 4-9. McClelland, J. L., Rumelhart, D. E., and the PDP Research Group (1986) "Parallel distributed processing: Explorations in the microstructure of cognition," Volume 1. Cambridge MA: MIT/Bradford. Miller, G. A. (1956) The magical number seven, plus or minus two: Some limits on our capacity for processing information. "Psychological Review" 63: 81-97. Minsky, M. (1974) A framework for representing knowledge. "MIT Lab Memo" # 306. Minsky, M. & Papert, S. (1969) "Perceptrons: An introduction to computational geometry." Cambridge MA: MIT Press (Reissued in an Expanded Edition, 1988). Newell, A. (1980) Physical Symbol Systems. "Cognitive Science" 4: 135-83. Neisser, U. (1967) "Cognitive Psychology." NY: Appleton-Century-Crofts. Paivio, A. (1986) "Mental representation: A dual coding approach." New York: Oxford. Penrose, R. (1989) "The emperor's new mind." Oxford: Oxford University Press. Pylyshyn, Z. W. (1980) Computation and cognition: Issues in the foundations of cognitive science. "Behavioral and Brain Sciences" 3: 111-169. Pylyshyn, Z. W. (1984) "Computation and cognition." Cambridge MA: MIT/Bradford. Pylyshyn, Z. W. (Ed.) (1987) "The robot's dilemma: The frame problem in artificial intelligence." Norwood NJ: Ablex. Rosch, E. & Lloyd, B. B. (1978) "Cognition and categorization." Hillsdale NJ: Erlbaum Associates. Rosenblatt, F. (1962) "Principles of neurodynamics." NY: Spartan. Searle, J. R. (1980) Minds, brains and programs. "Behavioral and Brain Sciences" 3: 417-457. Shepard, R. N. & Cooper, L. A. (1982) "Mental images and their transformations." Cambridge: MIT Press/Bradford. Smolensky, P. 
(1988) On the proper treatment of connectionism. "Behavioral and Brain Sciences" 11: 1-74. Stabler, E. P. (1985) How are grammars represented? "Behavioral and Brain Sciences" 6: 391-421. Terrace, H. (1979) "Nim." NY: Random House. Turkkan, J. (1989) Classical conditioning: The new hegemony. "Behavioral and Brain Sciences" 12: 121-79. Turing, A. M. (1964) Computing machinery and intelligence. In: A. R. Anderson (Ed.) "Minds and machines." Englewood Cliffs NJ: Prentice Hall. Ullman, S. (1980) Against direct perception. "Behavioral and Brain Sciences" 3: 373-415. Wittgenstein, L. (1953) "Philosophical investigations." New York: Macmillan.

This figure should consist of the Chinese characters for "zebra," "horse" and "stripes," formatted as a dictionary entry, thus: "ZEBRA": "HORSE" with "STRIPES"

Table 1. Connectionism Vs. Symbol Systems

Strengths of Connectionism:
(1) Nonsymbolic Function: As long as it does not aspire to be a symbol system, a connectionist network has the advantage of not being subject to the symbol grounding problem.
(2) Generality: Connectionism applies the same small family of algorithms to many problems, whereas symbolism, being a methodology rather than an algorithm, relies on endless problem-specific symbolic rules.
(3) "Neurosimilitude": Connectionist architecture seems more brain-like than a Turing machine or a digital computer.
(4) Pattern Learning: Connectionist networks are especially suited to the learning of patterns from data.

Weaknesses of Connectionism:
(1) Nonsymbolic Function: Connectionist networks, because they are not symbol systems, do not have the systematic semantic properties that many cognitive phenomena appear to have.
(2) Generality: Not every problem amounts to pattern learning. Some cognitive tasks may call for problem-specific rules, symbol manipulation, and standard computation.
(3) "Neurosimilitude": Connectionism's brain-likeness may be superficial and may (like toy models) camouflage deeper performance limitations.

Strengths of Symbol Systems:
(1) Symbolic Function: Symbols have the computing power of Turing Machines and the systematic properties of a formal syntax that is semantically interpretable.
(2) Generality: All computable functions (including all cognitive functions) are equivalent to a computational state in a Turing Machine.
(3) Practical Successes: Symbol systems' ability to generate intelligent behavior is demonstrated by the successes of Artificial Intelligence.

Weaknesses of Symbol Systems:
(1) Symbolic Function: Symbol systems are subject to the symbol grounding problem.
(2) Generality: Turing power is too general. The solutions to AI's many toy problems do not give rise to common principles of cognition but to a vast variety of ad hoc symbolic strategies.