Perceptrons (book)
Perceptrons: An Introduction to Computational Geometry is a book written by Marvin Minsky and Seymour Papert and published in 1969. An edition with handwritten corrections and additions was released in the early 1970s. An expanded edition followed in 1987, containing a chapter devoted to countering the criticisms made of the book in the 1980s.
Author | Marvin Minsky, Seymour Papert |
---|---|
Publication date | 1969 |
ISBN | 0 262 13043 2 |
The main subject of the book is the perceptron, a type of artificial neural network developed in the late 1950s and early 1960s. The book was dedicated to psychologist Frank Rosenblatt, who in 1957 had published the first model of a "Perceptron".[1] Rosenblatt and Minsky had known each other since adolescence, having studied one year apart at the Bronx High School of Science.[2] They became at one point central figures of a debate inside the AI research community, and are known to have engaged in loud discussions at conferences, yet remained friendly.[3]
This book is the center of a long-standing controversy in the study of artificial intelligence. It is claimed that pessimistic predictions made by the authors were responsible for a change in the direction of research in AI, concentrating efforts on so-called "symbolic" systems, a line of research that petered out and contributed to the so-called AI winter of the 1980s, when AI's promise was not realized.
The core of Perceptrons is a series of mathematical proofs that acknowledge some of the perceptrons' strengths while also showing major limitations.[3] The most important limitation concerns the computation of certain predicates, such as the XOR function and the connectedness predicate. The problem of connectedness is illustrated on the awkwardly colored cover of the book, which is intended to show how humans themselves have difficulty computing this predicate.[4]
Background
The perceptron is a neural net developed by psychologist Frank Rosenblatt in 1958 and is one of the most famous machines of its period.[5][6] In 1960, Rosenblatt and colleagues were able to show that the perceptron could in finitely many training cycles learn any task that its parameters could embody. The perceptron convergence theorem was proved for single-layer neural nets.[6]
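The convergence property described above can be illustrated with a minimal sketch of an error-correction learning rule for a single threshold unit. This is not Rosenblatt's original formulation: the function names, the unit learning rate, the zero initial weights, and the `max_epochs` cutoff are all assumptions made for the example. On a linearly separable task such as AND the loop stops after finitely many passes, while on XOR, which the unit's parameters cannot embody, it never reaches a mistake-free pass.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[int], int]  # (inputs, target in {0, 1})


def predict(weights: List[float], bias: float, x: Sequence[int]) -> int:
    """Threshold unit: output 1 when the weighted sum exceeds zero."""
    total = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0


def train_perceptron(
    samples: List[Sample], max_epochs: int = 100
) -> Tuple[List[float], float, Optional[int]]:
    """Error-correction rule (illustrative); returns (weights, bias, epochs_used).

    epochs_used counts full passes up to and including the first
    mistake-free pass; it is None if max_epochs ran out first.
    """
    n = len(samples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                # Move the weights toward (or away from) the misclassified input.
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
        if mistakes == 0:
            return weights, bias, epoch
    return weights, bias, None


AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b, epochs = train_perceptron(AND_SAMPLES)
print("AND learned:", all(predict(w, b, x) == t for x, t in AND_SAMPLES))

_, _, xor_epochs = train_perceptron(XOR_SAMPLES)
print("XOR converged:", xor_epochs is not None)
```

The cutoff only makes the sketch terminate on tasks the unit cannot represent; the theorem itself concerns tasks for which a suitable set of weights exists.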
During this period, neural net research was a major approach to the brain-machine issue that had been taken by a significant number of individuals.[6] Reports by the New York Times and statements by Rosenblatt claimed that neural nets would soon be able to see images, beat humans at chess, and reproduce.[3] At the same time, new approaches including symbolic AI emerged.[7] Different groups found themselves competing for funding and people, and their demand for computing power far outpaced available supply.[8]
Contents
Perceptrons: An Introduction to Computational Geometry is a book of thirteen chapters grouped into three sections. Chapters 1–10 present the authors' perceptron theory through proofs, Chapter 11 involves learning, Chapter 12 treats linear separation problems, and Chapter 13 discusses some of the authors' thoughts on simple and multilayer perceptrons and pattern recognition.[9][10]
Definition of perceptron
Minsky and Papert took as their subject the abstract versions of a class of learning devices which they called perceptrons, "in recognition of the pioneer work of Frank Rosenblatt".[10] These perceptrons were modified forms of the perceptrons introduced by Rosenblatt in 1958. They consisted of a retina, a single layer of input functions and a single output.[9][6]
Besides this, the authors restricted the "order", or maximum number of incoming connections, of their perceptrons. Sociologist Mikel Olazaran explains that Minsky and Papert "maintained that the interest of neural computing came from the fact that it was a parallel combination of local information", which, in order to be effective, had to be a simple computation. To the authors, this implied that "each association unit could receive connections only from a small part of the input area".[6] Minsky and Papert called this concept "conjunctive localness".[10]
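The notion of order can be made concrete with a small sketch. It assumes, for simplicity, that every association unit is a "mask", that is, a conjunction that fires only when all the retina points it is connected to are active; the book allows more general partial predicates. The names `mask_fires`, `perceptron_output` and `order` are hypothetical, chosen for this illustration only.

```python
from typing import Dict, FrozenSet, Set

Mask = FrozenSet[int]  # indices of the retina points one association unit sees


def mask_fires(mask: Mask, active: Set[int]) -> int:
    """A mask outputs 1 only when every point in its support is active."""
    return 1 if mask <= active else 0


def perceptron_output(weights: Dict[Mask, float], theta: float, active: Set[int]) -> bool:
    """Output unit: true when the weighted sum of mask outputs exceeds theta."""
    return sum(a * mask_fires(m, active) for m, a in weights.items()) > theta


def order(weights: Dict[Mask, float]) -> int:
    """Largest number of incoming connections among units with non-zero weight."""
    return max((len(m) for m, a in weights.items() if a != 0), default=0)


# "At least two of four points are active": each unit looks at one point.
at_least_two = {frozenset({i}): 1.0 for i in range(4)}

# "Points 0 and 1 are both active": one unit looks at two points.
both_on = {frozenset({0, 1}): 1.0}

print(order(at_least_two), perceptron_output(at_least_two, 1.0, {0, 2}))
print(order(both_on), perceptron_output(both_on, 0.0, {0, 1}))
```

Here `order` measures one particular representation. The order of a predicate in the book's sense is the smallest value achievable over all representations, so the function gives only an upper bound.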
Parity and connectedness
Two main examples analyzed by the authors were parity and connectedness. Parity involves determining whether the number of activated inputs in the input retina is odd or even, and connectedness refers to the figure-ground problem. Minsky and Papert proved that the single-layer perceptron could not compute parity under the condition of conjunctive localness, and showed that the order required for a perceptron to compute connectedness grew impractically large.[11][10]
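The parity result can be illustrated, though not proved, with a short sketch. It relies on the identity that the parity of n binary inputs equals the sum, over every non-empty subset S of the inputs, of (-2)^(|S|-1) times the product of the inputs in S. This yields a perceptron built from masks (conjunctions of inputs), one of which is connected to all n inputs. The function names are hypothetical, and the truncation experiment shows only that this particular construction fails once the largest masks are removed; the book's theorem is the stronger statement that no choice of weights can succeed.

```python
from itertools import combinations, product


def parity_mask_weights(n: int) -> dict:
    """Weight (-2)**(|S| - 1) for every non-empty subset S of the n inputs."""
    weights = {}
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            weights[subset] = (-2) ** (k - 1)
    return weights


def mask_sum(weights: dict, x: tuple, max_order: int) -> int:
    """Weighted sum of the masks that fire on x, ignoring supports above max_order."""
    return sum(a for s, a in weights.items()
               if len(s) <= max_order and all(x[i] for i in s))


def computes_parity(n: int, max_order: int) -> bool:
    """True if thresholding the truncated sum at 0 gives parity on all 2**n inputs."""
    weights = parity_mask_weights(n)
    return all((mask_sum(weights, x, max_order) > 0) == (sum(x) % 2 == 1)
               for x in product((0, 1), repeat=n))


for n in range(2, 6):
    print(n, computes_parity(n, n), computes_parity(n, n - 1))
```

For two inputs the identity reduces to x1 + x2 - 2·x1·x2, the familiar arithmetic form of XOR.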
The XOR affair
Some critics of the book state that the authors imply that, since a single artificial neuron is incapable of implementing some functions such as the XOR logical function, larger networks also have similar limitations and therefore should be dropped. Research on three-layered perceptrons showed how to implement such functions. Rosenblatt proved in his book that the elementary perceptron with an a priori unlimited number of hidden-layer A-elements (neurons) and one output neuron can solve any classification problem (the existence theorem).[12] Minsky and Papert used perceptrons with a restricted number of inputs to the hidden-layer A-elements and a locality condition: each element of the hidden layer receives its input signals from a small circle. These restricted perceptrons cannot determine whether the image is a connected figure or whether the number of pixels in the image is even (the parity predicate).
There are many mistakes in this story. Although a single neuron can in fact compute only a small number of logical predicates, it was widely known that networks of such elements can compute any possible Boolean function. This was known to Warren McCulloch and Walter Pitts, who even proposed how to create a Turing machine with their formal neurons; it is mentioned in Rosenblatt's book, and even in Perceptrons itself.[13] Minsky also makes extensive use of formal neurons to create simple theoretical computers in his book Computation: Finite and Infinite Machines.
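The contrast between a single unit and a network of units can be sketched in a few lines. The example below searches a small integer grid of weights and thresholds for a single linear threshold unit that reproduces a given two-input function, then wires three such units into a network that computes XOR. The grid bounds, function names and particular weights are assumptions made for illustration; the empty search result for XOR is consistent with the known impossibility but is not itself a proof, since the grid is finite.

```python
from itertools import product


def unit(weights, theta, x):
    """Linear threshold unit: 1 when the weighted sum exceeds theta."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > theta else 0


INPUTS = list(product((0, 1), repeat=2))
XOR = {x: x[0] ^ x[1] for x in INPUTS}
AND = {x: x[0] & x[1] for x in INPUTS}


def single_unit_solutions(target, grid=range(-3, 4)):
    """All (w1, w2, theta) on the grid for which one unit reproduces target."""
    return [(w1, w2, t) for w1, w2, t in product(grid, repeat=3)
            if all(unit((w1, w2), t, x) == target[x] for x in INPUTS)]


def xor_network(x):
    """Two hidden threshold units (OR and NAND) feeding an AND unit."""
    h_or = unit((1, 1), 0, x)        # fires if at least one input is on
    h_nand = unit((-1, -1), -2, x)   # fires unless both inputs are on
    return unit((1, 1), 1, (h_or, h_nand))


print("single units computing AND:", len(single_unit_solutions(AND)) > 0)
print("single units computing XOR:", len(single_unit_solutions(XOR)) > 0)
print("network computes XOR:", all(xor_network(x) == XOR[x] for x in INPUTS))
```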
What the book does prove is that in three-layered feed-forward perceptrons (with a so-called "hidden" or "intermediary" layer), it is not possible to compute some predicates unless at least one of the neurons in the first layer of neurons (the "intermediary" layer) is connected with a non-null weight to each and every input. This was contrary to a hope held by some researchers in relying mostly on networks with a few layers of "local" neurons, each one connected only to a small number of inputs. A feed-forward machine with "local" neurons is much easier to build and use than a larger, fully connected neural network, so researchers at the time concentrated on these instead of on more complicated models.
Some other critics, most notably Jordan Pollack, note that what was a small proof concerning a global issue (parity) not being detectable by local detectors was interpreted by the community as a rather successful attempt to bury the whole idea.[14]
Perceptrons and pattern recognition
In the final chapter, the authors put forth thoughts on multilayer machines and Gamba perceptrons. They conjecture that Gamba machines would require "an enormous number" of Gamba-masks and that multilayer neural nets are a "sterile" extension. Additionally, they note that many of the "impossible" problems for perceptrons had already been solved using other methods.[10]
Reception and legacy
Perceptrons received a number of positive reviews in the years after publication. In 1969, Stanford professor Michael A. Arbib stated, "[t]his book has been widely hailed as an exciting new chapter in the theory of pattern recognition."[15] Earlier that year, CMU professor Allen Newell composed a review of the book for Science, opening the piece by declaring "[t]his is a great book."[16]
On the other hand, H.D. Block expressed concern at the authors' narrow definition of perceptrons. He argued that they "study a severely limited class of machines from a viewpoint quite alien to Rosenblatt's", and thus the title of the book was "seriously misleading".[9] Contemporary neural net researchers shared some of these objections: Bernard Widrow complained that the authors had defined perceptrons too narrowly, but also said that Minsky and Papert's proofs were "pretty much irrelevant", coming a full decade after Rosenblatt's perceptron.[11]
Perceptrons is often thought to have caused a decline in neural net research in the 1970s and early 1980s.[3][17] During this period, neural net researchers continued smaller projects outside the mainstream, while symbolic AI research saw explosive growth.[18][3]
With the revival of connectionism in the late 1980s, PDP researcher David Rumelhart and his colleagues returned to Perceptrons. In a 1986 report, they claimed to have overcome the problems presented by Minsky and Papert, and that "their pessimism about learning in multilayer machines was misplaced".[3]
Analysis of the controversy
It is instructive to see what Minsky and Papert themselves said in the 1970s about the broader implications of their book. On his website, Harvey Cohen,[19] a researcher at the MIT AI Labs from 1974 onward,[20] quotes Minsky and Papert in the 1971 Report of Project MAC, directed at funding agencies, on "Gamba networks":[21] "Virtually nothing is known about the computational capabilities of this latter kind of machine. We believe that it can do little more than can a low order perceptron." On the preceding page, Minsky and Papert make clear that "Gamba networks" are networks with hidden layers.
Minsky has compared the book to the fictional book Necronomicon in H. P. Lovecraft's tales, a book known to many, but read only by a few.[22] The authors talk in the expanded edition about the criticism of the book that started in the 1980s, with a new wave of research symbolized by the PDP book.
How Perceptrons was explored first by one group of scientists to drive research in AI in one direction, and then later by a new group in another direction, has been the subject of a sociological study of scientific development.[3]
Notes
- Rosenblatt, Frank (January 1957). "The Perceptron: A Perceiving and Recognizing Automaton (Project PARA)" (PDF). Report (85-460-1). Cornell Aeronautical Laboratory, Inc., memorialized at Joe Pater, Brain Wars: How does the mind work? And why is that so important?, UmassAmherst. Retrieved 29 December 2019.
- Crevier 1993
- Olazaran, Mikel (1996). "A Sociological Study of the Official History of the Perceptrons Controversy". Social Studies of Science. 26 (3): 611–659. doi:10.1177/030631296026003005. JSTOR 285702.
- Minsky-Papert 1972:74 shows the figures in black and white. The cover of the 1972 paperback edition has them printed purple on a red background, and this makes the connectivity even more difficult to discern without the use of a finger or other means to follow the patterns mechanically. This problem is discussed in detail on pp.136ff and indeed involves tracing the boundary.
- Rosenblatt, Frank (1958). "The perceptron: A probabilistic model for information storage and organization in the brain". Psychological Review. 65 (6): 386–408. CiteSeerX 10.1.1.588.3775. doi:10.1037/h0042519. PMID 13602029.
- Olazaran 1996, p. 618
- Haugeland, John (1985). Artificial Intelligence: The Very Idea. Cambridge, Mass: MIT Press. ISBN 978-0-262-08153-5.
- Hwang, Tim (2018). "Computational Power and the Social Impact of Artificial Intelligence". arXiv:1803.08971v1 [cs.AI].
- Block, H. D. (1970). "A Review of 'Perceptrons: An Introduction to Computational Geometry'". Information and Control. 17 (1): 501–522. doi:10.1016/S0019-9958(70)90409-2.
- Minsky, Marvin; Papert, Seymour (1988). Perceptrons: An Introduction to Computational Geometry. MIT Press.
- Olazaran 1996, p. 630
- Theorem 1 in Rosenblatt, F. (1961) Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan. Washington DC.
- Cf. Minsky-Papert (1972:232): "... a universal computer could be built entirely out of linear threshold modules. This does not in any sense reduce the theory of computation and programming to the theory of perceptrons."
- Pollack, J. B. (1989). "No Harm Intended: A Review of the Perceptrons expanded edition". Journal of Mathematical Psychology. 33 (3): 358–365. doi:10.1016/0022-2496(89)90015-1.
- Arbib, Michael (November 1969). "Review of 'Perceptrons: An Introduction to Computational Geometry'". IEEE Transactions on Information Theory. 15 (6): 738–739. doi:10.1109/TIT.1969.1054388.
- Newell, Allen (1969). "A Step toward the Understanding of Information Processes". Science. 165 (3895): 780–782. doi:10.1126/science.165.3895.780. JSTOR 1727364.
- Alom, Md Zahangir; et al. (2018). "The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches". arXiv:1803.01164v1 [cs.CV]. "1969: Minsky & Papert show the limitations of perceptron's, killing research in neural networks for a decade"
- Bechtel, William (1993). "The Case for Connectionism". Philosophical Studies. 71 (2): 119–154. doi:10.1007/BF00989853. JSTOR 4320426.
- "The Perceptron Controversy".
- "Author of MIT AI Memo 338" (PDF).
- From the name of the Italian neural network researcher Augusto Gamba (1923–1996), designer of the PAPA perceptron.
- "History: The Past". Ucs.louisiana.edu. Retrieved 2013-07-10.
References
- McCorduck, Pamela (2004), Machines Who Think (2nd ed.), Natick, MA: A. K. Peters, Ltd., ISBN 1-56881-205-1, pp. 104–107
- Crevier, Daniel (1993), AI: The Tumultuous Search for Artificial Intelligence, New York, NY: BasicBooks, ISBN 0-465-02997-3, pp. 102–105
- Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, ISBN 0-13-790395-2, p. 22
- Marvin Minsky and Seymour Papert, 1972 (2nd edition with corrections, first edition 1969) Perceptrons: An Introduction to Computational Geometry, The MIT Press, Cambridge MA, ISBN 0-262-63022-2.