What are some of the first steps in connecting vision with the world

• The central operation is that of “picking out” or selecting, and the usual mechanism appealed to in explaining this selection is attention (sometimes called focal attention or selective attention). • Why do we need to select? This is a nontrivial question, and we will consider several different answers: • We need to select because we can’t process all the information available. This is the resource-limitation reason (channel capacity). • We need to select because of the way relevant information in the world is packaged. This gives rise to the Binding Problem. • We need to select because certain patterns cannot be computed without first selecting (“marking”) certain elements of a scene. • We need to select because selection is the first line of contact between the mind and the world, and it precedes all conceptualizing and encoding (but I will not talk about that in this class).

Attention as Selection: Focus on the selection or filtering aspects. Ask yourself: • Why do we need to select anyway? • Because our processing capacity is limited? • The Big Question: In what way is it limited? (Miller, 1956) • We will return to this core question after some preliminaries on the early study of attention as selection and on filter theory. • On what basis do we select? Some alternatives: • We select according to what is important to us (e.g., affordances) • We select what can be described physically (i.e., “channels”) • We select based on what can be encoded without accessing LTM • We “pick out” things to which we subsequently attach concepts: i.e., we pick out objects (but what do we do if they move?) • What happens to what we have not selected?

 Big Question #1: Why do we need to select anyway? Human information processing is limited. But along what dimensions (in what respect) is it limited? • Channel capacity: Shannon-Hartley Theorem • Capacity measured in some sort of “chunks” (Miller) • Capacity measured in terms of the number of arguments that can be simultaneously bound to cognitive routines (Newell) • To what things in the world can the arguments of visual predicates be bound?

Amount of information in terms of the information-theoretic measure (entropy) • The amount of information in a signal depends on how much the signal changes one’s estimate of the probability of events: $H = -\sum_i p_i \log_2 p_i$ (information in bits) • “One if by land, two if by sea” contains one bit of information if the two possibilities were equally likely, and less if they were not (e.g., if one was twice as likely as the other, the information in the message would be $-\left(\tfrac{1}{3}\log_2\tfrac{1}{3} + \tfrac{2}{3}\log_2\tfrac{2}{3}\right) \approx 0.92$ bits) • The amount of information transmitted depends on the potential amount of information in the message and the amount of correlation between the message sent and the message received. So information transmitted is a type of correlation measure (without regard to any ordinal properties of messages).
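The entropy formula can be checked numerically. The sketch below is a minimal illustration; the function name `entropy_bits` is hypothetical, and the only convention assumed is the standard one that zero-probability events contribute nothing.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p_i * log2(p_i)), in bits.

    Zero-probability events contribute nothing (0 * log 0 is taken as 0).
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# "One if by land, two if by sea" with equally likely alternatives
print(entropy_bits([0.5, 0.5]))               # 1.0
# One alternative twice as likely as the other
print(round(entropy_bits([1/3, 2/3]), 2))     # 0.92
```

The unequal case carries less than one bit because the receiver could already partly predict the message.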

Information transmitted in a typical absolute judgment experiment • In this experiment, subjects were presented with tones drawn from a known, practiced set (whose size determines the amount of input information) and had to name each tone using a learned set of names. • However large the input set, the information transmitted never rose much above about 2.5 bits, the equivalent of roughly 6 equiprobable alternatives (2^2.5 ≈ 5.7)!
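“Information transmitted” in an absolute judgment experiment is the mutual information between stimuli and responses, computed from a stimulus-by-response table of counts. The sketch below assumes that table layout (rows are stimuli, columns are responses); `transmitted_bits` and both example tables are hypothetical, not data from the experiments.

```python
import math

def transmitted_bits(counts):
    """Information transmitted (mutual information, in bits) from a
    stimulus-by-response count matrix: rows = stimuli, columns = responses."""
    total = sum(sum(row) for row in counts)
    p_stim = [sum(row) / total for row in counts]
    p_resp = [sum(col) / total for col in zip(*counts)]
    bits = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n:  # empty cells contribute nothing
                p_joint = n / total
                bits += p_joint * math.log2(p_joint / (p_stim[i] * p_resp[j]))
    return bits

# Four equiprobable tones, always named correctly: all 2 bits get through
perfect = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
# Two tones, responses unrelated to stimuli: nothing gets through
guessing = [[5, 5], [5, 5]]
print(transmitted_bits(perfect), transmitted_bits(guessing))  # 2.0 0.0
```

A listener whose transmitted information levels off at 2.5 bits performs as if perfectly distinguishing about 2**2.5 ≈ 5.7 equiprobable tones, however many tones are actually presented.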

Example of the use of chunking: Binary, Octal, Hex …? • To recall a string of binary digits, e.g., 00101110101110110101001 • People can recall a string of about 8 binary digits. If they learn a 2:1 encoding rule (00 = 0, 01 = 1, 10 = 2, 11 = 3), they can recall about 8 such chunks, or 16 binary digits. If they learn a 3:1 chunking rule (the octal number system), they can recall a 24-bit string, and so on.
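The recoding rules above can be written as one function that groups k binary digits into a single base-2^k digit. This is a minimal sketch; `recode` and `DIGITS` are hypothetical names, and left-padding with zeros is an assumption made so that the 23-digit string splits evenly (padding does not change its value).

```python
DIGITS = "0123456789ABCDEF"

def recode(bits, k):
    """Recode a binary string into base-2**k digits, k bits per chunk.

    The string is left-padded with zeros so its length is a multiple of k,
    which leaves its numeric value unchanged.
    """
    padded = bits.zfill(-(-len(bits) // k) * k)
    return "".join(DIGITS[int(padded[i:i + k], 2)] for i in range(0, len(padded), k))

bits = "00101110101110110101001"   # the 23-digit string from the slide
for k, name in [(2, "base 4"), (3, "octal"), (4, "hex")]:
    chunks = recode(bits, k)
    print(name, chunks, len(chunks), "chunks")  # 12, 8 and 6 chunks respectively
```

With a span of about 8 items, larger k packs more binary digits into the same number of chunks, at the cost of the time needed to recode and decode them.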

Why can we retain vastly different amounts of “information” just by using a different encoding vocabulary size? • Answer: The architecture of the cognitive system can deal with a fixed maximum number of items, regardless of what the items are. • This property can be exploited to get around the short-term memory bottleneck by recoding the input into a smaller number of discrete units, called chunks. • There is also evidence that it takes additional time to encode and decode chunks, so recoding is a time-capacity tradeoff, known in computer science as a compute-vs-store tradeoff. • Allen Newell’s novel model of the time taken in the Sternberg memory-scan experiment attributes the observed RT to encoding or chunking.

Early studies of attention: The “Cocktail Party Problem” • What determines how well you can select one conversation among several? Why are we so good at it? • The more controlled version of this study used dichotic presentation: one “channel” per ear. • It was found that when attention is fully occupied in selecting information from one ear (through use of the “shadowing” task), almost nothing is noticed in the “rejected” ear. • More careful observations show that this was not quite true: • A change in spectral properties (pitch) is noticed • You are likely to notice your name being spoken • Even meaning is extracted, as shown by involuntary ear switching and by the disambiguating effect of the rejected channel’s content

Broadbent’s Filter Theory [Figure: diagram with components labeled Senses, Very Short Term Store, Filter, Limited Capacity Channel, Rehearsal loop, Store of conditional probabilities of past events (in LTM), Motor planner, Effectors] Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press.
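The core claim of the filter theory, that selection happens early and on a physical basis, can be caricatured in a few lines. This is a toy sketch, not Broadbent’s own formalism: the message format (dicts with hypothetical `ear` and `word` keys) and the function name `broadbent_filter` are assumptions for illustration only.

```python
def broadbent_filter(messages, attended_ear):
    """Toy early-selection filter in the spirit of Broadbent (1958).

    Every message first enters a buffer (the Very Short Term Store); the
    filter then passes only messages whose *physical* channel matches the
    attended one. Rejected messages are discarded without any analysis of
    their meaning, so they can have no later effect.
    """
    very_short_term_store = list(messages)
    return [m["word"] for m in very_short_term_store if m["ear"] == attended_ear]

dichotic = [
    {"ear": "left", "word": "bank"},
    {"ear": "right", "word": "river"},
    {"ear": "left", "word": "dinner"},
]
print(broadbent_filter(dichotic, "left"))  # ['bank', 'dinner']
```

In this sketch nothing from the rejected ear can reach the limited-capacity channel, which is exactly the prediction that the evidence against the Filter Theory (noticing one’s name, meaning extracted from the rejected ear) contradicts.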

Problems with the Filter Theory • The filter “leaks.” Work by Treisman, Lackner, and many others shows that the filter could not be eliminating parts of the input on the basis of a physically defined channel, because the properties on the basis of which the input is filtered require a high level of processing (e.g., determination of meaning). Consequently, such information must have gotten through the filter! • Many solutions to this conundrum have been proposed; none is entirely satisfactory, but each embodies some ideas that may be part of the story. • What all these alternatives do is assume that the filter is responsive to top-down expectancy and prediction effects. But the evidence is against this sort of knowledge-based selection as a general property of perception (Pylyshyn, 1999), since perception is a modular function (i.e., early stages of vision are insensitive to cognitive factors: they are cognitively impenetrable).

Visual analogues illustrating the two-channel selection problem In these examples you are to read only the text in shadows and ignore the rest. Read as quickly as you can and when you are finished, close your eyes or look away from the text.

Visual analogue #1 illustrating the two-channel selection problem In performing an experiment like this one on man attention car it house is boy critically hat important she that candy the old material horse that tree is pen being phone read cow by book the hot subject tape for pin the stand relevant view task sky be read cohesive man and car grammatically house complete boy but hat without shoe either candy being horse so tree easy pen that phone full cow attention book is hot not tape required pin in stand order view to sky read red it nor too difficult.

Visual analogue #2 illustrating the two-channel selection problem It is important that the subject man be car pushed slightly boy beyond that his normal limits horse of tree competence open for be only in phone this cow way book can hot one tape be pin certain stand that snaps he with is his paying teeth attention in to the the empty relevant air task and rather minimal than to the attention candy to horse the tree second or peripheral task.

Stroop Effect Baseline: Name the colors of the ink

Stroop Effect in English: Name the colors of the ink RED GREEN BLUE PINK BROWN ORANGE GREEN PINK RED YELLOW GREEN YELLOW RED BROWN RED BLUE BROWN GREEN RED ORANGE RED BLUE YELLOW PINK ORANGE GREEN BLUE BROWN PINK RED YELLOW GREEN YELLOW RED BROWN PINK RED YELLOW GREEN YELLOW RED PINK ORANGE GREEN BLUE BROWN PINK RED YELLOW GREEN YELLOW RED BROWN RED BLUE GREEN BROWN YELLOW GREEN YELLOW RED PINK ORANGE GREEN RED BLUE BROWN GREEN RED ORANGE RED BLUE YELLOW YELLOW GREEN YELLOW RED BROWN PINK RED YELLOW GREEN PINK RED YELLOW

Stroop Effect in Portuguese: Name the colors of the ink (VERMELHO = red, VERDE = green, AZUL = blue, AMARELO = yellow, MARROM = brown, ROSA = pink, ALARANJADO = orange) VERMELHO VERDE AZUL MARROM ROSA ALARANJADO VERDE ROSA VERMELHO AMARELO VERDE AMARELO VERMELHO MARROM VERMELHO AZUL MARROM VERDE VERMELHO ALARANJADO VERMELHO AZUL AMARELO ROSA ALARANJADO VERDE AZUL MARROM ROSA VERMELHO AMARELO VERDE AMARELO VERMELHO MARROM ROSA VERMELHO AMARELO VERDE AMARELO VERMELHO ROSA ALARANJADO VERDE AZUL MARROM ROSA VERMELHO AMARELO VERDE AMARELO VERMELHO MARROM VERMELHO AZUL MARROM VERDE AMARELO VERDE AMARELO VERMELHO ROSA ALARANJADO VERDE VERMELHO AZUL MARROM VERDE VERMELHO ALARANJADO VERMELHO AZUL

The degree of interference with the attended message, as well as its interpretation, shows that the rejected message was understood • Moral: Although the rejected channel appears to be rejected, it is being processed enough to understand the words! • The semantic interpretation of the attended message depends on the meaning content of the rejected message. Subjects were asked to paraphrase the attended message in the following conditions: • Channel 1 (attended): “I think I will go down to the bank but I will be back for dinner” • Channel 2 (rejected): “The election results will depend on the value of the dollar against the Euro and on the state of the domestic economy” • OR Channel 2 (rejected): “The rain has resulted in erosion by the overflowing river” (Lackner, J. R., & Garrett, M. F. (1972). Resolving ambiguity: Effects of biasing context in the unattended ear. Cognition, 1, 359-372.)

The special case of visual attention • Visual working memory and visual selection • What is the nature of the input, storage and information processing limitations in vision?

Studies of the capacity of Visual Working Memory (Luck & Vogel, 1997) • People appear to be able to retain about 4 properties of an object (4 colors, 4 shapes, 4 orientations, etc.) over a short time • People can also retain the identity of 4 objects for a short time. • Luck and Vogel found that as long as there are not more than 4 properties per object, people can retain large numbers of properties (a phenomenon reminiscent of Miller’s “chunking hypothesis,” except that the chunks are objects).

Luck & Vogel finding • People can retain about 4 properties of a visual display in their visual short-term memory (VSTM) • People can retain the identity of about 4 objects in their VSTM • If the properties belong to different objects (up to about 4 of them), people can retain about 4 properties per object: a much higher total number of properties
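The contrast between counting properties and counting objects can be made concrete. This toy sketch takes the limits of about 4 objects and about 4 properties per object from the slide; the product form and the name `properties_retained` are assumptions for illustration, not Luck and Vogel’s model.

```python
def properties_retained(n_objects, props_per_object, object_limit=4, property_limit=4):
    """Toy object-based VSTM capacity: keep up to `object_limit` objects,
    and up to `property_limit` properties of each stored object."""
    return min(n_objects, object_limit) * min(props_per_object, property_limit)

print(properties_retained(8, 1))  # 4: eight single-feature items, only 4 kept
print(properties_retained(4, 4))  # 16: four objects with four features each
```

A purely property-based limit of about 4 would predict 4 in both cases; the object-based version, like Miller’s chunks, counts the containers rather than their contents.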

What does visual attention select? (What are the bases for selection?) • If attention is selection, what does visual attention select? • An obvious answer is places. We can select places by moving our eyes so that our gaze lands on different places. • When places are selected, are they selected automatically? • Must we always move our eyes to change what we attend to? • Studies of Covert Attention-Movement: Posner (1980). • How does attention switch from one place to another? • Is it always the case that we attend to places? Can we attend to any other property? Can we select on the basis of color, depth, spatial frequency, affordances (cf. Gibson), or the property a painting has of having been painted by Da Vinci (a property to which Bernard Berenson was able to attend extremely well)?

 How else can visual attention select? • Can we control the size and shape of the region that is selected, or is selection always punctate and data-driven? • Zoom Lens model of spatial attention (Eriksen & St James, 1986). • We control where attention moves: • Is this automatic or voluntary? • How do we know where to direct our attention? How do we specify a location prior to attending to it? • We need a way to specify where or what prior to attending to it! • Keep this conundrum in mind – we will return to it later! • How narrowly can we focus our attention? Can we make it pick out one out of several objects? • Are there special conditions under which we are able to pick out individual things? We will return to “attentional resolution” or the minimum spacing for selecting individual things.

Covert movements of attention Example of an experiment using a cue-validity paradigm for showing that the locus of attention moves without eye movements and for estimating its speed. Posner, M. I. (1980). Orienting of Attention. Quarterly Journal of Experimental Psychology, 32, 3-25.

Recall Posner’s demonstration of an exogenous attention switch. Does the improved detection at intermediate locations entail that the “spotlight of attention” moves continuously through empty space?

Sperling & Weichselgartner (1995) “Episodic” or Quantal Theory of Attention switching The theory assumes a quantal “shift” in attention in which the spotlight pointed at location -2 is extinguished and, simultaneously, the spotlight at location +2 is turned on. Because extinction and onset take a measurable amount of time, there is a brief period when the spotlights partially illuminate both locations simultaneously.
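Under the quantal account, attention weight passes from one location to another without ever visiting the locations in between. The sketch below illustrates this under stated assumptions: the linear ramp, the 100 ms switch time and the 50 ms transition are arbitrary values chosen for the example, and `spotlight_weight` is a hypothetical helper, not Sperling and Weichselgartner’s formulation.

```python
def spotlight_weight(location, t, t_switch, transition_ms, old=-2, new=2):
    """Attention weight at `location` at time t (ms) under a quantal switch.

    The spotlight at `old` is extinguished while the one at `new` is turned on,
    both over the same transition interval (a linear ramp is assumed here).
    Locations in between never receive any weight.
    """
    # Fraction of the transition completed, clamped to [0, 1]
    f = min(max((t - t_switch) / transition_ms, 0.0), 1.0)
    if location == old:
        return 1.0 - f
    if location == new:
        return f
    return 0.0

for t in (0, 100, 125, 150):
    print(t, [spotlight_weight(loc, t, t_switch=100, transition_ms=50) for loc in (-2, 0, 2)])
```

Halfway through the transition both spotlights are at half strength, the “partial illumination of both locations” described above, while the intermediate location 0 stays at zero throughout.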

But there are empirical reasons why objects are a better target for attentional selection than location • There is experimental evidence that attention attaches to things rather than places • The Posner evidence of analog movement of focal attention, when attention is exogenously summoned, can be explained by a punctate object-based theory of attention-allocation – Sperling & Weichselgartner (1995)

This object-based view of attentional selection is an important recent discovery • There are good reasons on both empirical and theoretical grounds for supposing that attention attaches itself to objects rather than locations • But first let’s look at some other ways that attention can be allocated in vision

We can select a shape even when it is intertwined among other similar shapes. Task: Are the green items the same? On a surprise test at the end, subjects were not able to recall shapes that had been present but had not been attended in the task. Yet selecting one of several overlapping shapes should not be possible if we allocate attention to locations.

The time-course of attention: Inhibition of return • If we vary the time between the cue and the target in a modified Posner paradigm, we find that when the cue-target onset asynchrony (CTOA) reaches around 300-900 ms, reaction time to the target begins to increase. This is called inhibition of return (IOR) (Klein, 2000). • To get this effect we actually have to attract attention to the target location and then attract it back to the origin. IOR is one of many examples of an inhibition effect produced by attention.

Other examples of attentionally induced inhibition • Negative priming (DeSchepper & Treisman, 1996). • Task: Is there a figure on the right that is the same shape as the figure on the left? • When the figure on the left is one that had appeared as an ignored figure on the right in a previous trial, reaction time is longer and accuracy poorer. • This “negative priming” effect persisted over 200 intervening trials and was reported to last for a month!

Another negative attention effect: Inattentional Blindness

Inattentional Blindness • The background task is to report which of the two arms of the + is longer. There is one critical trial per subject, after about 3 or 4 background trials, plus another “critical” trial presented as a divided-attention control. • 25% of subjects failed to see the square when it was presented in the parafovea (2° from fixation). • But 65% failed to see it when it was at fixation! • When the background-task cross was made 10% as large, inattentional blindness increased from 25% to 66%. • It is not known whether this IB is due to the concentration of attention on the primary task or to inhibition of the surrounding regions. Mack & Rock (1998)

In what other ways might our visual information capacity be limited? • There are obviously limitations on the input side of vision that depend on the acuity of the sensors and the range of physical properties to which they respond. • But there is a limitation beyond that of acuity: The perceptual system is limited in what it can individuate and in how many of these individuals it can deal with at one time. The capacity to individuate is different from the capacity to discriminate. • Some reasons for thinking that individuating is a distinct process:

Exploring the limits of attention and the units over which selection operates • It appears that the human information-processing bottleneck cannot be expressed perspicuously in terms of information-theoretic measures, nor can it be specified in physical parameters (e.g., in terms of locations or spatio-temporal regions), although such measures often do capture important aspects of attention (e.g., visual attention often moves continuously through space). • But there are other possible ways one might consider expressing the limits of attention. • Over the past 25 years evidence has been accumulating that the human attention system is, at least in part, tuned to individual objects in the world. This would certainly make sense from an evolutionary perspective. But what does this mean?

The increasingly important role played by ‘Objects’ in studies of visual attention • Miller’s ‘Magic Number 7’ has continued to haunt us even beyond studies of short-term memory (STM). • There is a limitation in visual information processing that is beyond the limitation of acuity and of STM capacity: The perceptual system is limited in what it can individuate and how many of these individuals it can deal with at one time. • The capacity to individuate is different from memory capacity and discrimination capacity. • This notion of individuating and of individuals may be related to Miller’s “chunks”, but it has a special role in vision which I can only sketch very briefly at this time • First some reasons why individuating is a distinct process

Individuating is different from discriminating

Individuals and patterns • Vision does not recognize patterns by applying a template but by parsing them into parts (recognition-by-parts) • A pattern is encoded over time (and often over saccades), so the visual system must keep track of the individual parts and merge descriptions of the same part at different times and stages of encoding • Thus, in order to recognize a pattern, the visual system must pick out individual parts, bind them to the representation being constructed, and keep track of them • It must do so before it has recognized any properties of the parts: it must individuate prior to recognizing • Examples include what are called “visual routines”

Are there collinear items (n>3)?

Several objects must be picked out at once in making relational judgments • The same is true for other relational judgments, such as inside or on-the-same-contour: we must pick out the relevant individual objects first.
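A relational predicate such as “are there collinear items?” cannot be evaluated until specific items have been bound to its arguments. The sketch below shows this with a brute-force check: two individual items fix a candidate line, and every other item is tested against it. It assumes items are distinct integer (x, y) positions; `collinear` and `find_collinear` are hypothetical names, and the minimum count k is a parameter rather than a claim about the slide’s display.

```python
from itertools import combinations

def collinear(p, q, r):
    """True if point r lies on the line through points p and q (integer coordinates)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0

def find_collinear(items, k=3):
    """Return the items on the first line found that holds at least k items, else None.

    Each candidate line is fixed by binding two individual items (p, q) as
    arguments; every other item is then tested against that line.
    Assumes all item positions are distinct.
    """
    for p, q in combinations(items, 2):
        on_line = [r for r in items if collinear(p, q, r)]
        if len(on_line) >= k:
            return on_line
    return None

display = [(0, 0), (1, 1), (2, 2), (5, 0)]
print(find_collinear(display, k=3))  # [(0, 0), (1, 1), (2, 2)]
print(find_collinear(display, k=4))  # None
```

Nothing in the predicate refers to a location in advance; it only works once particular items have been picked out and handed to it as arguments.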

When items cannot be individuated, patterns over them cannot be recognized Do these figures contain one or two distinct curves? Individuating these curves requires a “curve tracing” operation, so Number_of_curves (C1, C2, …) takes time proportional to the length of the shortest curve.
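Deciding “one curve or two?” requires tracing, so the effort depends on how many points must be followed. The sketch below uses a serial flood fill over a small binary image as a computational stand-in for curve tracing; it is an analogy under stated assumptions (a grid of '#' pixels, 8-connectivity), and `trace_curves` and the two example grids are hypothetical, not a model of the visual mechanism.

```python
from collections import deque

def trace_curves(grid):
    """Count distinct curves in a binary image by tracing each one pixel by pixel.

    grid is a list of equal-length strings; '#' marks a curve pixel.
    Returns the length (pixel count) of each traced curve. The work done
    grows with the number of pixels traced, not with the number of curves.
    """
    rows, cols = len(grid), len(grid[0])
    seen, lengths = set(), []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "#" or (r, c) in seen:
                continue
            # Trace one curve outward from this seed (8-connected neighbours)
            seen.add((r, c))
            frontier, length = deque([(r, c)]), 0
            while frontier:
                y, x = frontier.popleft()
                length += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == "#" and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            frontier.append((ny, nx))
            lengths.append(length)
    return lengths

one_curve = ["#####",
             "....#",
             "#####"]
two_curves = ["#####",
              ".....",
              "#####"]
print(len(trace_curves(one_curve)), len(trace_curves(two_curves)))  # 1 2
```

The two grids differ in a single pixel, yet the answer can only be found by following the curves; no local property of either image reveals it.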

The figure on the left is one continuous curve, the one on the right is two distinct curves – as shown in color.

Another example: Subitizing vs Counting. How many squares are there? Subitizing is fast, accurate and only slightly dependent on how many items there are. Only the squares on the right can be subitized. Concentric squares cannot be subitized because individuating them requires curve tracing, just as it did in the previous example.

Signature subitizing phenomena only appear when objects are automatically individuated and indexed

Example of subitizing popout and non-popout features (Count Pink vs. Count Online)

Encoding conjunctions of properties • Experiments showing the special difficulty that vision has in detecting conjunctions of several properties have provided a basis for understanding an important problem in visual analysis

How are conjunctions of features detected? Read the vertical line of digits in the following display. Under these conditions, conjunction errors are very frequent.

Rapid visual search (Treisman) Find the following simple figure in the next slide:

This case is easy – and the time is independent of how many nontargets there are – because there is only one red item. This is called a ‘popout’ search

This case is also easy – and the time is independent of how many nontargets there are – because there is only one right-leaning item. This is also a ‘popout’ search.
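What makes a search a popout search is that the target is the only item with its value on a single feature dimension. The rough sketch below assumes items are dicts with hypothetical `color` and `tilt` keys and uses a hypothetical `singleton` helper. It checks items one by one, so it says nothing about search time; it only shows that one feature map suffices to isolate a feature target, whereas a conjunction target (red right-leaning among red left-leaning and green right-leaning distractors) is unique on neither map.

```python
from collections import Counter

def singleton(items, dimension):
    """Return the one item whose value on `dimension` no other item shares, else None."""
    counts = Counter(item[dimension] for item in items)
    unique = [item for item in items if counts[item[dimension]] == 1]
    return unique[0] if len(unique) == 1 else None

# Feature (popout) display: one red item among green items
feature_display = [{"color": "green", "tilt": "left"}] * 6 + [{"color": "red", "tilt": "left"}]
# Conjunction display: red right-leaning target among red left-leaning
# and green right-leaning distractors
conjunction_display = ([{"color": "red", "tilt": "left"}] * 3
                       + [{"color": "green", "tilt": "right"}] * 3
                       + [{"color": "red", "tilt": "right"}])

print(singleton(feature_display, "color"))      # {'color': 'red', 'tilt': 'left'}
print(singleton(conjunction_display, "color"))  # None
print(singleton(conjunction_display, "tilt"))   # None
```

Finding the conjunction target requires combining color and tilt for each individual item, which is the binding step that the conjunction-search slides go on to examine.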

Rapid visual search (conjunction) Find the following simple figure in the next slide:
