Efficient coding hypothesis
The efficient coding hypothesis was proposed by Horace Barlow in 1961 as a theoretical model of sensory coding in the brain.[1] Within the brain, neurons communicate with one another by sending electrical impulses referred to as action potentials or spikes. One goal of sensory neuroscience is to decipher the meaning of these spikes in order to understand how the brain represents and processes information about the outside world. Barlow hypothesized that the spikes in the sensory system form a neural code for efficiently representing sensory information. By "efficient", Barlow meant that the code minimizes the number of spikes needed to transmit a given signal. This is somewhat analogous to transmitting information across the internet, where different file formats can be used to transmit a given image. Different file formats require different numbers of bits to represent the same image at a given distortion level, and some are better suited to certain classes of images than others. According to this model, the brain is thought to use a code suited to representing the visual and auditory information typical of an organism's natural environment.
Efficient coding and information theory
The development of Barlow's hypothesis was influenced by information theory, which Claude Shannon had introduced only a decade before. Information theory provides the mathematical framework for analyzing communication systems. It formally defines concepts such as information, channel capacity, and redundancy. Barlow's model treats the sensory pathway as a communication channel in which neuronal spiking is an efficient code for representing sensory signals. The spiking code aims to make maximal use of the available channel capacity by minimizing the redundancy between representational units. Barlow was not the first to propose the idea: it already appears in a 1954 article by Fred Attneave.[2]
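The notions of information and redundancy mentioned above can be made concrete with a small numerical sketch. The Python example below uses one standard textbook definition of redundancy for a discrete source, 1 - H/H_max, where H is the Shannon entropy and H_max is the entropy of a uniform distribution over the same symbols. The function names and the two example distributions are illustrative assumptions, and the sketch measures the redundancy of a single source rather than the redundancy between neurons that Barlow's model is concerned with.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)) in bits; zero-probability symbols are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def redundancy(probs):
    """Fractional redundancy 1 - H / H_max, with H_max = log2(number of symbols)."""
    h_max = math.log2(len(probs))
    return 1.0 - entropy_bits(probs) / h_max

# Hypothetical four-symbol sources (assumed values, for illustration only).
uniform_source = [0.25, 0.25, 0.25, 0.25]  # every symbol equally likely
skewed_source = [0.7, 0.1, 0.1, 0.1]       # one symbol dominates

print("uniform:", entropy_bits(uniform_source), "bits, redundancy", redundancy(uniform_source))
print("skewed: ", entropy_bits(skewed_source), "bits, redundancy", redundancy(skewed_source))
```

Under this definition the uniform source has no redundancy, while the skewed source uses only part of the capacity that its four symbols could carry.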
A key prediction of the efficient coding hypothesis is that sensory processing in the brain should be adapted to natural stimuli. Neurons in the visual (or auditory) system should be optimized for coding images (or sounds) representative of those found in nature. Researchers have shown that filters optimized for coding natural images resemble the receptive fields of simple cells in V1.[3] In the auditory domain, optimizing a network for coding natural sounds leads to filters which resemble the impulse response of cochlear filters found in the inner ear.[4]
Constraints on the visual system
Due to constraints on the visual system such as the number of neurons and the metabolic energy required for "neural activities", the visual processing system must have an efficient strategy for transmitting as much information as possible.[5] Information must be compressed as it travels from the retina to the visual cortex. While the retinal receptors can receive information at 10^9 bit/s, the optic nerve, which is composed of 1 million ganglion cells each transmitting at 1 bit/s, has a transmission capacity of only 10^6 bit/s.[5] Further reduction limits the overall transmission to 40 bit/s, which results in inattentional blindness.[5] Thus, the hypothesis states that neurons should encode information as efficiently as possible in order to make the most of limited neural resources.[6] For example, it has been shown that visual data can be compressed up to 20-fold without noticeable information loss.[5]
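The size of this bottleneck can be worked out directly from the figures quoted above. The following Python sketch only restates that arithmetic; the variable names are illustrative, and the numbers are those given in the text rather than independent measurements.

```python
# Figures quoted in the text (bit/s); variable names are illustrative.
retina_rate = 10**9            # information received by the retinal receptors
n_ganglion_cells = 10**6       # ganglion cells making up the optic nerve
rate_per_cell = 1              # bit/s carried by each ganglion cell
attended_rate = 40             # bit/s remaining after further reduction

optic_nerve_rate = n_ganglion_cells * rate_per_cell

# Reduction factor at each stage of the pathway.
retina_to_nerve = retina_rate / optic_nerve_rate
nerve_to_attended = optic_nerve_rate / attended_rate
overall = retina_rate / attended_rate

print(f"retina -> optic nerve: {retina_to_nerve:,.0f}-fold reduction")
print(f"optic nerve -> 40 bit/s: {nerve_to_attended:,.0f}-fold reduction")
print(f"overall: {overall:,.0f}-fold reduction")
```

On these figures the reduction is far larger than the 20-fold compression that is described as possible without noticeable loss, which is consistent with the claim that information must be discarded as well as compressed.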
Evidence suggests that our visual processing system engages in bottom-up selection. For example, inattentional blindness suggests that there must be data deletion early on in the visual pathway.[5] This bottom-up approach allows us to respond to unexpected and salient events more quickly and is often directed by attentional selection. This also gives our visual system the property of being goal-directed.[5] Many have suggested that the visual system is able to work efficiently by breaking images down into distinct components.[6] Additionally, it has been argued that the visual system takes advantage of redundancies in inputs in order to transmit as much information as possible while using the fewest resources.[5]
Evolution-based neural system
Simoncelli and Olshausen outline the three major concepts that are assumed to be involved in the development of systems neuroscience:
- an organism has specific tasks to perform
- neurons have capabilities and limitations
- an organism is in a particular environment.[7]
One assumption used in testing the efficient coding hypothesis is that neurons must be evolutionarily and developmentally adapted to the natural signals in their environment.[7] The idea is that perceptual systems will be quickest when responding to "environmental stimuli". The visual system should cut out any redundancies in the sensory input.[8]
Natural images and statistics
Central to Barlow's hypothesis is information theory, which, when applied to neuroscience, argues that an efficiently coding neural system "should match the statistics of the signals they represent".[9] Therefore, it is important to be able to determine the statistics of the natural images that are producing these signals. Researchers have looked at various components of natural images including luminance contrast, color, and how images are registered over time.[8] They can analyze the properties of natural scenes via digital cameras, spectrophotometers, and range finders.[10]
Researchers look at how luminance contrasts are spatially distributed in an image: luminance contrasts are highly correlated between pixels that are close together and less correlated between pixels that are farther apart.[8] Independent component analysis (ICA) is an algorithm that attempts to "linearly transform given (sensory) inputs into independent outputs (synaptic currents)".[11] ICA eliminates the redundancy by decorrelating the pixels in a natural image.[8] Thus the individual components that make up the natural image are rendered statistically independent.[8] However, researchers have argued that ICA is limited because it assumes that the neural response is linear, and therefore insufficiently describes the complexity of natural images. They argue that, despite what is assumed under ICA, the components of the natural image have a "higher-order structure" that involves correlations among components.[8] In response, researchers have developed temporal independent component analysis (TICA), which better represents the complex correlations that occur between components in a natural image.[8] Additionally, a "hierarchical covariance model" developed by Karklin and Lewicki expands on sparse coding methods and can represent additional components of natural images such as "object location, scale, and texture".[8]
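The idea of removing redundancy by decorrelating nearby pixels can be illustrated with a minimal Python sketch. It is not an implementation of ICA, which also targets higher-order dependencies; it shows only second-order decorrelation of two inputs. The signal model (two neighboring pixels that share a common luminance signal and differ by small independent noise) and all names and parameter values are assumptions made for illustration.

```python
import math
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples = 20000

# Hypothetical model: neighboring pixels share a common luminance signal
# and differ only by small independent noise.
shared = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
pixel_a = [s + rng.gauss(0.0, 0.3) for s in shared]
pixel_b = [s + rng.gauss(0.0, 0.3) for s in shared]

# Linear transform (a 45-degree rotation): one output carries the sum of the
# two pixels, the other carries their difference.
sum_channel = [(a + b) / math.sqrt(2) for a, b in zip(pixel_a, pixel_b)]
diff_channel = [(a - b) / math.sqrt(2) for a, b in zip(pixel_a, pixel_b)]

corr_before = correlation(pixel_a, pixel_b)
corr_after = correlation(sum_channel, diff_channel)
print(f"correlation between pixels:   {corr_before:.3f}")
print(f"correlation between channels: {corr_after:.3f}")
```

Because the two pixels have equal variance in this model, the sum and difference channels are uncorrelated up to sampling error, so the same information is carried by outputs that no longer repeat each other.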
The chromatic spectra of natural light, and of light reflected off "natural materials", can be easily characterized with principal components analysis (PCA).[10] Because the cones absorb a specific number of photons from the natural image, researchers can use cone responses as a way of describing the natural image. Researchers have found that the three classes of cone receptors in the retina can accurately code natural images and that color is already decorrelated in the LGN.[8][10] Time has also been modeled: natural images transform over time, and these transformations can be used to see how the visual input changes over time.[8]
A pedagogical review of efficient coding in visual processing (efficient spatial coding, color coding, temporal/motion coding, stereo coding, and their combination) is in chapter 3 of the book "Understanding vision: theory, models, and data".[12] It explains how efficient coding is realized when input noise makes redundancy reduction no longer adequate, and how efficient coding schemes in different situations are related to or different from each other.
Hypotheses for testing the efficient coding hypothesis
If neurons are encoding according to the efficient coding hypothesis, then individual neurons must be expressing their full output capacity.[6] Before testing this hypothesis it is necessary to define what is considered to be a neural response.[6] Simoncelli and Olshausen suggest that an efficient neuron needs to be given a maximal response value so that we can measure whether the neuron is efficiently reaching that maximum.[7] Secondly, a population of neurons must not be redundant in transmitting signals and must be statistically independent.[6] If the efficient coding hypothesis is accurate, researchers should observe sparsity in the neuron responses: that is, only a few neurons at a time should fire for an input.[8]
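Sparsity of this kind can be quantified with a single number. The Python sketch below uses one index found in the sparse-coding literature, S = (1 - A) / (1 - 1/n) with A = (mean response)^2 / (mean squared response); the exact form should be treated as an assumption here, since the text above does not specify which measure is meant. The function name and the example firing rates are hypothetical.

```python
def sparseness(responses):
    """Sparseness index in [0, 1] for non-negative responses.

    0 means every unit responds equally (dense code);
    1 means a single unit carries the whole response (maximally sparse).
    """
    n = len(responses)
    if n < 2:
        raise ValueError("need at least two responses")
    mean_square = sum(r * r for r in responses) / n
    if mean_square == 0:
        raise ValueError("all responses are zero; sparseness is undefined")
    activity = (sum(responses) / n) ** 2 / mean_square
    return (1.0 - activity) / (1.0 - 1.0 / n)

# Hypothetical firing rates (spikes/s) of eight neurons for one input.
dense_population = [5, 5, 5, 5, 5, 5, 5, 5]     # every neuron fires equally
sparse_population = [0, 0, 0, 0, 0, 0, 0, 40]   # only one neuron fires

print("dense: ", sparseness(dense_population))
print("sparse:", sparseness(sparse_population))
```

Both example populations have the same total firing rate, so the index separates how the activity is distributed from how much activity there is.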
Methodological approaches for testing the hypotheses
One approach is to design a model for early sensory processing based on the statistics of a natural image and then compare this predicted model to how real neurons actually respond to the natural image.[6] The second approach is to measure a neural system responding to a natural environment, and analyze the results to see if there are any statistical properties to this response.[6]
Examples of these approaches
1. Predicted model approach
In one study by Doi et al. in 2012, the researchers created a predicted response model of the retinal ganglion cells based on the statistics of the natural images used, while considering noise and biological constraints.[13] They then compared the actual information transmission observed in real retinal ganglion cells to this optimal model to determine the efficiency. They found that the information transmission in the retinal ganglion cells had an overall efficiency of about 80% and concluded that "the functional connectivity between cones and retinal ganglion cells exhibits unique spatial structure...consistent with coding efficiency".[13]
A study by van Hateren and Ruderman in 1998 used ICA to analyze video-sequences and compared how a computer analyzed the independent components of the image to data for visual processing obtained from a cat in DeAngelis et al. 1993. The researchers described the independent components obtained from a video sequence as the "basic building blocks of a signal", with the independent component filter (ICF) measuring "how strongly each building block is present".[14] They hypothesized that if simple cells are organized to pick out the "underlying structure" of images over time then cells should act like the independent component filters.[14] They found that the ICFs determined by the computer were similar to the "receptive fields" that were observed in actual neurons.[14]
2. Analyzing actual neural system in response to natural images
In a report in Science from 2000, William E. Vinje and Jack Gallant outlined a series of experiments used to test elements of the efficient coding hypothesis, including a theory that the non-classical receptive field (nCRF) decorrelates projections from the primary visual cortex. To test this, they took recordings from V1 neurons in awake macaques during "free viewing of natural images and conditions" that simulated natural vision conditions.[15] The researchers hypothesized that V1 uses a sparse code, which is minimally redundant and "metabolically more efficient".[15] They also hypothesized that interactions between the classical receptive field (CRF) and the nCRF produced this pattern of sparse coding during the viewing of these natural scenes. The CRF was defined as the circular area surrounding the locations where stimuli evoked action potentials. In order to test this, they created eye-scan paths and extracted patches that ranged in size from 1 to 4 times the diameter of the CRF. They found that the sparseness of the coding increased with the size of the patch. Larger patches encompassed more of the nCRF, indicating that the interactions between these two regions created the sparse code. This suggests that V1 uses a sparse code when natural images span the entire visual field. They also tested whether stimulation of the nCRF increased the independence of the responses of V1 neurons by randomly selecting pairs of neurons. They found that the neurons were indeed more strongly decoupled upon stimulation of the nCRF. In conclusion, the experiments of Vinje and Gallant showed that V1 uses a sparse code by employing both the CRF and nCRF when viewing natural images, with the nCRF showing a definitive decorrelating effect on neurons, which may increase their efficiency by increasing the amount of independent information they carry.
They propose that the cells may represent the individual components of a given natural scene, which may contribute to pattern recognition.[15]
Another study done by Baddeley et al. had shown that firing-rate distributions of cat visual area V1 neurons and monkey inferotemporal (IT) neurons were exponential under naturalistic conditions, which implies optimal information transmission for a fixed average rate of firing. A subsequent study of monkey IT neurons found that only a minority were well described by an exponential firing distribution. De Polavieja later argued that this discrepancy was due to the fact that the exponential solution is correct only for the noise-free case, and showed that by taking noise into consideration, one could account for the observed results.[6]
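The link between an exponential firing-rate distribution and optimal transmission rests on a standard result: among continuous distributions on non-negative values with a fixed mean, the exponential distribution has the largest differential entropy. The Python sketch below checks this against two alternatives using closed-form entropies. The choice of comparison distributions, the function names, and the mean rate of 10 are illustrative assumptions, and the comparison ignores noise, which the text notes is needed to account for the later findings.

```python
import math

def entropy_exponential(mean_rate):
    """Differential entropy (nats) of an exponential distribution with this mean."""
    return 1.0 + math.log(mean_rate)

def entropy_uniform(mean_rate):
    """Differential entropy (nats) of a uniform distribution on [0, 2 * mean]."""
    return math.log(2.0 * mean_rate)

def entropy_half_normal(mean_rate):
    """Differential entropy (nats) of a half-normal distribution with this mean."""
    # A half-normal with scale sigma has mean sigma * sqrt(2 / pi).
    sigma = mean_rate * math.sqrt(math.pi / 2.0)
    return 0.5 * math.log(math.pi * math.e * sigma ** 2 / 2.0)

mean_rate = 10.0  # assumed mean firing rate, in spikes/s

h_exponential = entropy_exponential(mean_rate)
h_uniform = entropy_uniform(mean_rate)
h_half_normal = entropy_half_normal(mean_rate)

for name, h in [("exponential", h_exponential),
                ("half-normal", h_half_normal),
                ("uniform", h_uniform)]:
    print(f"{name:12s} {h:.4f} nats")
```

All three distributions are constrained to the same mean rate, so the difference in entropy reflects only the shape of the distribution.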
A study by Dan, Atick, and Reid in 1996 used natural images to test the hypothesis that early in the visual pathway, incoming visual signals are decorrelated to optimize efficiency. This decorrelation can be observed as the "'whitening' of the temporal and spatial power spectra of the neuronal signals".[16] The researchers played natural image movies in front of cats and used a multielectrode array to record neural signals. This was achieved by refracting the eyes of the cats and then fitting them with contact lenses. They found that in the LGN, the natural images were decorrelated and concluded, "the early visual pathway has specifically adapted for efficient coding of natural visual information during evolution and/or development".[16]
Extensions
One of the implications of the efficient coding hypothesis is that the neural coding depends upon the statistics of the sensory signals. These statistics are a function of not only the environment (e.g., the statistics of the natural environment), but also the organism's behavior (e.g., how it moves within that environment). Perception and behavior are closely intertwined in the perception-action cycle. For example, the process of vision involves various kinds of eye movements. An extension to the efficient coding hypothesis called active efficient coding (AEC) extends efficient coding to active perception. It hypothesizes that biological agents optimize not only their neural coding, but also their behavior to contribute to an efficient sensory representation of the environment. Along these lines, models for the development of active binocular vision and active visual tracking have been proposed.[17][18][19][20]
The brain has limited resources to process information; in vision this is manifested as the visual attentional bottleneck.[21] The bottleneck forces the brain to select only a small fraction of visual input information for further processing, as merely coding information efficiently is no longer sufficient. A subsequent theory has been developed on exogenous attentional selection of visual input information for further processing, guided by a bottom-up saliency map in the primary visual cortex.[22]
Criticisms
Researchers should consider how the visual information is used: The hypothesis does not explain how the information from a visual scene is used—which is the main purpose of the visual system. It seems necessary to understand why we are processing image statistics from the environment because this may be relevant to how this information is ultimately processed. However, some researchers may see the irrelevance of the purpose of vision in Barlow's theory as an advantage for designing experiments.[6]
Some experiments show correlations between neurons: When considering multiple neurons at a time, recordings "show correlation, synchronization, or other forms of statistical dependency between neurons".[6] However, it is relevant to note that most of these experiments did not use natural stimuli to provoke these responses, so they may not bear directly on the efficient coding hypothesis, which is concerned with natural image statistics.[6] In his review article, Simoncelli notes that redundancy in the efficient coding hypothesis can perhaps be interpreted a bit differently: he argues that statistical dependency could be reduced over "successive stages of processing", and not just in one area of the sensory pathway.[6]
Observed redundancy: A comparison of the number of retinal ganglion cells to the number of neurons in the primary visual cortex shows an increase in the number of sensory neurons in the cortex as compared to the retina. Simoncelli notes that one major argument of critics is that higher up in the sensory pathway there are greater numbers of neurons handling the processing of sensory information, which would seem to produce redundancy.[6] However, this observation may not be fully relevant because neurons at different stages may use different neural codes. In his review, Simoncelli notes "cortical neurons tend to have lower firing rates and may use a different form of code as compared to retinal neurons".[6] Cortical neurons may also have the ability to encode information over longer periods of time than their retinal counterparts. Experiments done in the auditory system have confirmed that redundancy is decreased.[6]
Difficult to test: Estimation of information-theoretic quantities requires enormous amounts of data, and is thus impractical for experimental verification. Additionally, informational estimators are known to be biased. However, some experimental success has occurred.[6]
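The bias of information estimators can be seen in a small simulation. The Python sketch below applies the simplest "plug-in" entropy estimator, which inserts observed symbol frequencies into the entropy formula, to samples from a source whose true entropy is known. The uniform eight-symbol source, the sample sizes, and the function names are assumptions chosen for illustration and are not taken from any particular experiment.

```python
import math
import random
from collections import Counter

def plugin_entropy_bits(samples):
    """Plug-in entropy estimate (bits): the entropy of the observed frequencies."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mean_plugin_estimate(n_symbols, n_samples, n_trials, seed=0):
    """Average plug-in estimate over repeated draws from a uniform source."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(n_trials):
        samples = [rng.randrange(n_symbols) for _ in range(n_samples)]
        total += plugin_entropy_bits(samples)
    return total / n_trials

n_symbols = 8
true_entropy = math.log2(n_symbols)  # entropy of the uniform source, in bits

estimate_small = mean_plugin_estimate(n_symbols, n_samples=10, n_trials=2000)
estimate_large = mean_plugin_estimate(n_symbols, n_samples=1000, n_trials=200)

print(f"true entropy:               {true_entropy:.3f} bits")
print(f"mean estimate, 10 samples:   {estimate_small:.3f} bits")
print(f"mean estimate, 1000 samples: {estimate_large:.3f} bits")
```

With few samples the estimate falls systematically below the true entropy, and the gap shrinks only as the amount of data grows, which is the practical difficulty described above.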
Need well-defined criteria for what to measure: This criticism illustrates one of the most fundamental issues of the hypothesis. Here, assumptions are made about the definitions of both the inputs and the outputs of the system.[6] The inputs into the visual system are not completely defined, but they are assumed to be encompassed in a collection of natural images. The output must be defined to test the hypothesis, but variability can occur here too based on the choice of which type of neurons to measure, where they are located and what type of responses, such as firing rate or spike times are chosen to be measured.[6]
How to take noise into account: Some argue that experiments that ignore noise or other physical constraints on the system are too simplistic.[6] However, some researchers have been able to incorporate these elements into their analyses, thus creating more sophisticated systems.[6]
Biomedical applications
Possible applications of the efficient coding hypothesis include cochlear implant design. These neuroprosthetic devices stimulate the auditory nerve with electrical impulses, which restores some hearing to people who have hearing impairments or are deaf. The implants are considered to be successful and efficient, and are the only ones currently in use. Using frequency-place mappings in the efficient coding algorithm may benefit the use of cochlear implants in the future.[9] Changes in design based on this hypothesis could increase speech intelligibility in hearing-impaired patients. Research using vocoded speech processed by different filters showed that humans had greater accuracy in deciphering the speech when it was processed using an efficient-code filter as opposed to a cochleotopic filter or a linear filter.[9] This shows that efficient coding of noise data offered perceptual benefits and provided the listeners with more information.[9] More research is needed to translate current findings into medically relevant changes to cochlear implant design.[9]
References
- Barlow, H. (1961). "Possible principles underlying the transformation of sensory messages". In Sensory Communication. MIT Press.
- Attneave, Fred (1954). "Some informational aspects of visual perception". Psychological Review. 61 (3): 183–93. doi:10.1037/h0054663. PMID 13167245.
- Olshausen, B. A.; Field, D.J. (1997). "Sparse coding with an overcomplete basis set: A strategy employed by V1?". Vision Research. 37 (23): 3311–3325. doi:10.1016/s0042-6989(97)00169-7. PMID 9425546.
- Lewicki, M.S. (2002). "Efficient coding of natural sounds". Nature Neuroscience. 5 (4): 356–363. CiteSeerX 10.1.1.386.3036. doi:10.1038/nn831. PMID 11896400.
- Zhaoping, L (Dec 2006). "Theoretical understanding of the early visual processes by data compression and data selection" (PDF). Network. 17 (4): 301–334. doi:10.1080/09548980600931995. PMID 17283516.
- Simoncelli, Eero P. (2003). "Vision and the statistics of the visual environment". Current Opinion in Neurobiology. 13 (2): 144–149. CiteSeerX 10.1.1.8.2800. doi:10.1016/S0959-4388(03)00047-3. PMID 12744966.
- Simoncelli, E.P.; Olshausen, B.A. (2001). "Natural image statistics and neural representation". Annual Review of Neuroscience. 24: 1193–1216. doi:10.1146/annurev.neuro.24.1.1193. PMID 11520932. S2CID 147618.
- Ma, L.B.; Wu, S. (Oct 25, 2011). "Efficient coding of natural images". Sheng li xue bao : [Acta Physiologica Sinica]. 63 (5): 463–471. PMID 22002237.
- Ming, Vivienne L; Holt, Lori L. (2009). "Efficient Coding in Human Auditory Perception". Journal of the Acoustical Society of America. 126 (3): 1312–1320. Bibcode:2009ASAJ..126.1312M. doi:10.1121/1.3158939. PMC 2809690. PMID 19739745.
- Geisler, W.S. (2008). "Visual perception and the statistical properties of natural scenes". Annual Review of Psychology. 59: 167–192. doi:10.1146/annurev.psych.58.110405.085632. PMID 17705683.
- Blättler, F; Hahnloser, R.H. (2011). "An efficient coding hypothesis links sparsity and selectivity of neural responses". PLOS ONE. 6 (10): e25506. Bibcode:2011PLoSO...625506B. doi:10.1371/journal.pone.0025506. PMC 3192758. PMID 22022405.
- Zhaoping, L. (2014). "The efficient coding principle". Chapter 3 of Understanding Vision: Theory, Models, and Data.
- Doi, E.; Gauthier, J. L.; Field, G. D.; Shlens, J.; Sher, A.; Greschner, M.; Machado, T. A.; Jepson, L. H.; Mathieson, K.; Gunning, D. E.; Litke, A. M.; Paninski, L.; Chichilnisky, E. J.; Simoncelli, E. P. (2012). "Efficient Coding of Spatial Information in the Primate Retina". Journal of Neuroscience. 32 (46): 16256–16264. doi:10.1523/JNEUROSCI.4036-12.2012. ISSN 0270-6474. PMC 3537829. PMID 23152609.
- van Hateren, J.H.; Ruderman, D.L. (Dec 7, 1998). "Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex". Proceedings: Biological Sciences. 265 (1412): 2315–2320. doi:10.1098/rspb.1998.0577. PMC 1689525. PMID 9881476.
- Vinje, W.E.; Gallant, J.L. (Feb 18, 2000). "Sparse coding and decorrelation in primary visual cortex during natural vision". Science. 287 (5456): 1273–1276. Bibcode:2000Sci...287.1273V. CiteSeerX 10.1.1.456.2467. doi:10.1126/science.287.5456.1273. PMID 10678835.
- Dan, Y; Atick, J.J.; Reid, R.C. (May 15, 1996). "Efficient coding of natural scenes in the lateral geniculate nucleus: experimental test of a computational theory". The Journal of Neuroscience. 16 (10): 3351–3362. doi:10.1523/JNEUROSCI.16-10-03351.1996. PMC 6579125. PMID 8627371.
- Zhao, Y.; Rothkopf, C. A.; Triesch, J.; Shi, B. E. (2012). "A unified model of the joint development of disparity selectivity and vergence control". IEEE International Conference on Development and Learning and Epigenetic Robotics. pp. 1–6. doi:10.1109/DevLrn.2012.6400876.
- Lonini, L.; Forestier, S.; Teulière, C.; Zhao, Y.; Shi, B. E.; Triesch, J. (2013). "Robust active binocular vision through intrinsically motivated learning". Frontiers in Neurorobotics. 7: 20. doi:10.3389/fnbot.2013.00020. PMC 3819528. PMID 24223552.
- Zhang, C.; Zhao, Y.; Triesch, J.; Shi, B. E. (2014). "Intrinsically motivated learning of visual motion perception and smooth pursuit". IEEE International Conference on Robotics and Automation. pp. 1902–1908. arXiv:1402.3344. doi:10.1109/ICRA.2014.6907110.
- Teulière, C.; Forestier, S.; Lonini, L.; Zhang, C.; Zhao, Y.; Shi, B.; Triesch, J. (2015). "Self-calibrating smooth pursuit through active efficient coding" (PDF). Robotics and Autonomous Systems. 71: 3–12. doi:10.1016/j.robot.2014.11.006.
- Visual spatial attention. https://en.wikipedia.org/wiki/Visual_spatial_attention
- Li, Z. (2002). "A saliency map in primary visual cortex". Trends in Cognitive Sciences. 6: 9–16; and Zhaoping, L. (2014). "The V1 hypothesis—creating a bottom-up saliency map for preattentive selection and segmentation". In Understanding Vision: Theory, Models, and Data.