
Computer Vision

COMP3411/9814: Artificial Intelligence


Lecture Overview

• Introduction

• Image processing

• Scene analysis

• Cognitive vision
Introduction

• Vision is one of several sensory modalities an agent uses to interact with the world (others include acoustic, temperature, pressure, etc.).
• Computer vision enables machines to "see" the world.
• Applications include character recognition, image interpretation, face recognition, fingerprint identification, and robot control.
Introduction
• Effortless for humans, but a difficult problem for machines:
• Variable and uncontrolled illumination
• Shadows
• Complex and hard-to-describe objects
• Objects from outdoor scenes
• Non-rigid objects
• Objects occluding other objects
Introduction

• State of computer vision → the general computer vision problem is unsolved.
• Developing a visual system as good as a human's: little progress in 40 years.
• A lot of progress on specific computer vision problems:
• e.g., face recognition used in digital cameras, surveillance, security.
• e.g., pick and place.
Introduction
Doggie cam
Introduction
Object recognition
Introduction
Computer vision in action
Introduction
Doggie cam in action
Introduction
What the robot sees
Introduction

• Computer vision starts from an image of the scene formed on an array.
• A lens camera produces a perspective projection of the scene within the camera's field of view.
• Perspective projection is a many-to-one transformation.
• The image can be noisy due to low ambient light levels.
Introduction

• Perspective projection → many-to-one transformation.
• Several different scenes can produce identical images.
• The image cannot be directly "inverted" to reconstruct the scene.
Introduction
• The image is represented as a two-dimensional, time-varying matrix of intensity values I(x, y, t).
• Colour vision uses three matrices (R, G, B); monochromatic vision uses only one.
• In static scenes, the time variable is dropped.
• An iconic model or a set of features can be obtained from this matrix.
• The information to be extracted depends on the task, e.g.:
• For safe navigation: object locations, boundaries, surface properties.
• For object manipulation: locations, sizes, shapes, compositions and textures.
• Others might include colour or membership of certain classes.
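These representations map directly onto array types. As a minimal sketch in Python (not from the lecture; the resolution and frame count below are assumptions):

    import numpy as np

    # Monochromatic image: a single intensity matrix I(x, y).
    height, width = 480, 640                  # assumed camera resolution
    grey = np.zeros((height, width), dtype=np.uint8)

    # Colour image: three matrices, one each for R, G and B.
    rgb = np.zeros((height, width, 3), dtype=np.uint8)

    # Time-varying image I(x, y, t): a sequence of frames.
    num_frames = 30                           # assumed, e.g. 1 s at 30 fps
    video = np.zeros((num_frames, height, width, 3), dtype=np.uint8)

    # Intensity at pixel (x, y) in frame t:
    t, y, x = 0, 100, 200
    print(video[t, y, x])                     # an (R, G, B) triple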
Introduction

• Object features
• Illumination (incident light)

• Reflectance (reflected light)

• Depth (distance from camera)

• Orientation (angle of normal to surface)

• Other features: shading, colour, texture


Introduction
Image formation

• A light image is created by a camera.
• Each pixel is a number that is an index into a palette of colours or grey scales.
Introduction
• Binary vision

• The original image is "thresholded", i.e.

new[x, y] = (old[x, y] > threshold)

• Every pixel brighter than a certain threshold is given a value of 1; otherwise it is zero.
• Easy to process and powerful enough to use in some industrial applications, e.g., picking parts from an assembly line.
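A minimal runnable sketch of this thresholding step in NumPy (the image values and the threshold of 128 are assumed for illustration):

    import numpy as np

    # A small 8-bit grey-scale image (values are arbitrary examples).
    old = np.array([[ 10, 200, 130],
                    [ 90, 255,  40],
                    [140,  20, 180]], dtype=np.uint8)

    threshold = 128                              # assumed threshold
    new = (old > threshold).astype(np.uint8)     # 1 if brighter, else 0
    print(new)
    # [[0 1 1]
    #  [0 1 0]
    #  [1 0 1]]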
Example: Steering an Automobile
• Neural networks can be used to convert
the image intensity matrix directly into
actions.
• ALVINN steers an automobile:
• Input → a low-resolution 30x32 image from a mounted camera looking straight ahead.
• Hidden layer → 5 sigmoid units.
• Output → 30 units to control the steering angle. Winner-take-all.
Example: Steering an Automobile

• Training: 5 minutes of human driving, using the actual steering angles as labels.
• Trained incrementally with backpropagation.
• Problems:
• The driver usually drives well.
• After long, straight stretches, the network produces only straight-ahead angles.
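As a toy illustration of the network shape described above (a sketch only, not ALVINN's actual code; the random weights stand in for weights ALVINN learned by backpropagation):

    import numpy as np

    rng = np.random.default_rng(0)

    # Dimensions from the slides: 30x32 input, 5 sigmoid hidden units,
    # 30 output units read out winner-take-all.
    n_in, n_hidden, n_out = 30 * 32, 5, 30
    W1 = rng.normal(0, 0.1, (n_hidden, n_in))    # placeholder weights
    W2 = rng.normal(0, 0.1, (n_out, n_hidden))

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def steer(image):
        """Map a 30x32 intensity image to one of 30 steering units."""
        h = sigmoid(W1 @ image.reshape(-1))      # hidden layer
        out = W2 @ h                             # output activations
        return int(np.argmax(out))               # winner-take-all

    frame = rng.random((30, 32))                 # stand-in camera image
    print(steer(frame))                          # index of winning unit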
Example: Steering an Automobile
• Non-official video:
• https://youtu.be/oHEH2VDDGss
Robot vision: Two stages
• Image processing involves filtering operations to reduce noise,
accentuate edges, and find regions.
• Scene analysis creates an iconic model or a feature-based
description including only relevant details.
Robot vision: Two stages
• Man-made environments: doorways,
furniture, other agents, humans, walls,
floors, etc.
• In exterior environments: animals,
plants, man-made structures,
automobiles, roads, etc.
• Two techniques: look for edges, where a property such as intensity changes abruptly (discontinuities), or regions, where it changes only gradually.
Robot vision: Two stages
• From robot view:
• Three toy blocks (A, B, C).
• A doorway.
• A corner of the room.

• Dealing only with the disposition of the blocks. Iconic model:
• ((C B A FLOOR))
• If C is moved → ((C FLOOR) (B A FLOOR)) or ((B A FLOOR) (C FLOOR))
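A small Python sketch of this iconic model (the list-of-stacks encoding and the move_to_floor helper are hypothetical, just one way to realise the slide's notation):

    # Each inner list is a stack, read top-to-bottom, ending at FLOOR.
    scene = [["C", "B", "A", "FLOOR"]]           # ((C B A FLOOR))

    def move_to_floor(scene, block):
        """Move the top block of its stack onto the floor."""
        for stack in scene:
            if stack[0] == block:                # block must be on top
                stack.pop(0)
                scene.append([block, "FLOOR"])
                return scene
        raise ValueError(block + " is not on top of any stack")

    print(move_to_floor(scene, "C"))
    # [['B', 'A', 'FLOOR'], ['C', 'FLOOR']]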
Lecture Overview

• Introduction

• Image processing

• Scene analysis

• Cognitive vision
Image processing: Averaging

• The image is represented as an n x m array I(x, y) → the image intensity array.
• Cells are called pixels. Each number represents a light intensity.
• Real images always contain noise.
• Smoothing tries to remove isolated bright and dark regions.
• Averaging + sliding window → convolution.
• Smoothing has the side-effect of blurring the image.
Image processing: Averaging

• Averaging can be combined with a threshold.
• Larger window rectangles achieve more smoothing.
• Broad lines are thickened and thin lines are eliminated.
• In the example, ε = 3, i.e., the output is 0 if the sum ≤ 3, and 1 otherwise.
Image processing: Averaging

• Image smoothing with a Gaussian filter.
• Images become increasingly blurred as the filter width grows.
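A minimal OpenCV sketch of Gaussian smoothing (the file names are placeholders; larger kernels and sigmas give more blur):

    import cv2

    img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # assumed input

    # Increasing the kernel size (and sigma) blurs the image more.
    blur_small = cv2.GaussianBlur(img, (5, 5), 1)
    blur_large = cv2.GaussianBlur(img, (21, 21), 5)

    cv2.imwrite("blur_small.png", blur_small)
    cv2.imwrite("blur_large.png", blur_large)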
Image processing: Averaging

• Given a simple 4 x 4 picture matrix:

9 9 9 3
9 9 3 3
9 3 3 3
3 3 3 3

• Smooth this matrix using an averaging technique and a 3 x 3 pixel window.
Image processing: Averaging
• There are four 3 x 3 pixel windows in the matrix.
• Replace the middle value in each window by the average of all the values in the window, rounding to the nearest integer.

Original:     Smoothed:
9 9 9 3       9 9 9 3
9 9 3 3       9 7 5 3
9 3 3 3       9 5 4 3
3 3 3 3       3 3 3 3
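A short sketch that reproduces this result with a sliding 3 x 3 window (border pixels are left unchanged, matching the worked example; rounding to the nearest integer is assumed):

    import numpy as np

    img = np.array([[9, 9, 9, 3],
                    [9, 9, 3, 3],
                    [9, 3, 3, 3],
                    [3, 3, 3, 3]])

    smoothed = img.copy()
    # Slide a 3x3 window over each interior pixel and average.
    for y in range(1, img.shape[0] - 1):
        for x in range(1, img.shape[1] - 1):
            window = img[y - 1:y + 2, x - 1:x + 2]
            smoothed[y, x] = round(window.mean())

    print(smoothed)
    # [[9 9 9 3]
    #  [9 7 5 3]
    #  [9 5 4 3]
    #  [3 3 3 3]]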
Image processing: Edge enhancement
• Edges are used to build a line drawing.
• Outlines can be compared with object models.
• Edges are parts of the image with markedly
different property values (e.g., intensity)
Image processing: Edge enhancement

• Averaging and edge enhancement can be combined.
• For instance, using a Laplacian filter (see the sketch below).
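A minimal OpenCV sketch of this combination (the file name and parameters are assumptions; smoothing first suppresses the noise that the Laplacian would otherwise amplify):

    import cv2

    img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # assumed input

    # Smooth first: the Laplacian is very sensitive to noise.
    smoothed = cv2.GaussianBlur(img, (5, 5), 1)

    # Second-derivative filter: strong responses at edges.
    edges = cv2.Laplacian(smoothed, cv2.CV_16S, ksize=3)
    edges = cv2.convertScaleAbs(edges)                    # back to 8-bit

    cv2.imwrite("edges.png", edges)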
Image processing: Region finding
• Find regions in which a property does not change abruptly.
• A region is homogeneous: intensity differs by no more than some threshold ε.
• Split-and-merge method, on a 2^n x 2^n array of pixels:
• Each non-homogeneous region is split into four.
• Splitting continues until no more splits need to be made.
• Adjacent regions are merged if the result is homogeneous.
Image processing: Region finding
• Splitting and merging candidate regions.
• In this example, intensities may not vary by more than 1 unit, i.e., ε ≤ 1.
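A compact sketch of the split step on a 2^n x 2^n NumPy array (the homogeneity test and recursion follow the slides; the merge step, which re-joins adjacent homogeneous regions, is omitted for brevity):

    import numpy as np

    def split(img, x0, y0, size, eps, regions):
        """Recursively split a square until every piece is homogeneous."""
        block = img[y0:y0 + size, x0:x0 + size]
        if block.max() - block.min() <= eps or size == 1:
            regions.append((x0, y0, size))    # homogeneous: keep it
            return
        half = size // 2                      # otherwise split into four
        for dy in (0, half):
            for dx in (0, half):
                split(img, x0 + dx, y0 + dy, half, eps, regions)

    img = np.array([[9, 9, 9, 3],
                    [9, 9, 3, 3],
                    [9, 3, 3, 3],
                    [3, 3, 3, 3]])
    regions = []
    split(img, 0, 0, img.shape[0], 1, regions)
    print(regions)                            # (x, y, size) squares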
Lecture Overview

• Introduction

• Image processing

• Scene analysis

• Cognitive vision
Scene Analysis
• Extract information about the scene.
• As the scene-to-image mapping is many-to-one, additional images or information are needed.
• The information can be very general or specific, e.g., camera location, illumination sources, indoors/outdoors, particular objects.
• Iconic model or features:
• An iconic model builds a model of the scene or part of it.
• Feature-based analysis is task-oriented.
Interpreting lines and curves in the image

• For scenes with rectilinear objects, lines should be postulated. Methods fit segments of straight lines to edges.
• For scenes with curved objects, conic sections (ellipses, parabolas, hyperbolas) are fitted.
• Interpreting the line drawing associates properties with the components of the drawing.
• For instance, in scenes with only planar surfaces, no more than three surfaces intersect at any point.
Interpreting lines and curves in the image
• Scene with bounding walls, floor,
ceiling, a cube on the floor.
• Only three possible intersections:
• Occlude: two planes, one occluding the other (→).
• Blade: both surfaces visible, forming a convex edge (+).
• Fold: both surfaces visible, forming a concave edge (−).
Interpreting lines and curves in the image
• Labelling types of junctions: V, W, Y, T, assigning the labels +, −, →.
Model-based vision

• Use increasing amounts of knowledge about the scene.
• For instance, an industrial scene could use geometric models of components to interpret images – still no semantics, though.
• Or, if we know a cube is in the scene, a projection can be fitted, specifying size, position, and orientation (using Euler angles).
Model-based vision

• Generalized cylinders for model construction.
• Each cylinder uses 9 parameters: a, b, c, plus 6 location parameters.
• Hierarchical representation.
Stereo vision
• Under perspective projection large,
distant objects might produce the
same image as similar but smaller,
closer ones.
• Distance estimation from single
images is problematic, but sometimes
possible.
• e.g., if we know an object is on the floor and we know the camera's height.
Stereo vision

• Depth information from stereo vision.
• Two-dimensional setup.
• Two lenses separated by a baseline distance b.
• Correspondence problem: matching pairs of points between the two images.
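Once corresponding points are matched, depth follows by triangulation. A sketch using the standard rectified-stereo relation Z = f·b / d, with focal length f, baseline b, and disparity d (the numbers below are assumptions):

    # Standard pinhole-stereo triangulation: depth = f * b / disparity.
    focal_length_px = 700.0      # assumed focal length, in pixels
    baseline_m = 0.12            # assumed lens separation b, in metres

    def depth_from_disparity(disparity_px):
        """Depth (m) of a point imaged disparity_px pixels apart."""
        if disparity_px <= 0:
            raise ValueError("disparity must be positive")
        return focal_length_px * baseline_m / disparity_px

    print(depth_from_disparity(20.0))   # 4.2 m: small disparity, far
    print(depth_from_disparity(80.0))   # 1.05 m: large disparity, near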
Lecture Overview

• Introduction

• Image processing

• Scene analysis

• Cognitive vision
Principles of cognitive vision

• Is perception only a recovery process?
• Computer vision → 3D descriptions of the scene, assigning labels to objects and/or actions.
• Labels → provided to symbolic reasoning systems.
• Visual perception is seen as a black box delivering labels through recognition, using (mostly static) data.
• Going from pixels to symbols is difficult, with no causal link between present and past; it is therefore not well suited to anticipating the future.
Principles of cognitive vision

• Human behaviour is active!
• Humans (and animals) continuously shift their gaze.
• Humans have intentions and goals linking the past with the present, with the aim of anticipating the future.
• Human actions are goal driven, guided by motor and perceptual expectations.
Principles of cognitive vision
• Is perception only an inference process?
• Signal analysis is not enough to understand a scene.
• Additional knowledge comes through inference – as we look at the world, we think about it.
• Cognitive vision continuously exchanges information between perception and reasoning. A form of predictive vision.
• Actions are driven by perceptual expectations – how should I act to see my hand close to the object vs. how should I act to reach the object?
Vision and reasoning interaction

• Cognitive vision extends the processing of visual data beyond extracting visual features for real-time control.
• Reasoning and perception talk about objects, actions, events, and alternative possibilities.
• There is a loop between prediction (what the system expects perceptually) and exploration (how the system acts to verify whether predictions are met).
Vision and reasoning interaction

• Five interaction paths for Vision (V) and Reasoning (R):
• V → R. The traditional perspective for computer vision.
• R → V. For example, "search for the scissors" invokes a visual search.
• V → R → V. For example, "someone is cutting the tomato with the spoon" is implausible for R, so it asks V again.
• R → V → R. For example, R needs to know the number of cars.
• R → V → V → … → V. Imagining and envisioning a situation, action, or event.
Vision and reasoning interaction

• Interactions between V and R can happen at early, middle, or late stages.
V + R interaction examples

• Cognitive vision to support human-robot interaction.
• iCub's behaviour is driven only by the direction of the subject's gaze, making explicit the intention to reach for the left or right hand.
V + R interaction examples

• Cognitive vision for the signature of biological motion.
• → Angular velocity and curvature of the trajectory:
• the hand during drawing or writing;
• the knee or ankle during walking.
• Visually measured independently of shape and colour.
V + R interaction examples

• Cognitive vision involves language as an attention mechanism.
• Synonymy (same meaning) and hypernymy ("is a" relation).
• Objects are likely to co-occur, e.g., table, cups, spoons.
• A knife put in a drawer is not gone, only hidden from sight. In this case, language acts as part of the reasoning process.
V + R interaction examples

• Cognitive vision for an object's affordances.
• Segmentation to infer adjectives.
• Pixels are coloured according to their associated affordances.
V + R interaction

• Cognitive vision does not exist in isolation to detect what is where.
• This is in direct contrast to how vision is predominantly studied today.
• Unified representation within vision and other sensory modalities through action: V + R → Action.
• Questions beyond what and where → why, how, who.
• Also, how to synthesize visual information to anticipate the effects of actions.
Resources
• OpenCV: real-time optimized
Computer Vision library.
• https://opencv.org/

• YOLO: state-of-the-art, real-time


object detection.
• https://pjreddie.com/darknet/yolo/
References
• Nilsson, N. J. (1998). Artificial Intelligence: A New Synthesis. Morgan Kaufmann. Chapter 6.
• Aloimonos, Y., & Sandini, G. (2022). Principles of Cognitive Vision. In Cangelosi, A., & Asada, M. (Eds.), Cognitive Robotics. MIT Press. Chapter 14.
Feedback
• In case you want to provide anonymous
feedback on these lectures, please visit:

• https://forms.gle/KBkN744QuffuAZLF8

Thank you very much!
