The document discusses augmented reality (AR) technology, emphasizing its definition, characteristics, and various tracking methods essential for integrating virtual content with real-world environments. It covers different types of tracking technologies including mechanical, magnetic, inertial, optical, and hybrid systems, detailing how they function and their respective advantages and disadvantages. Additionally, the document highlights challenges in tracking, such as ensuring accurate registration and handling dynamic errors during AR experiences.
LECTURE 10: AR TECHNOLOGY: TRACKING
COMP 4010 – Virtual Reality
Semester 5 – 2016
Bruce Thomas, Mark Billinghurst
University of South Australia
October 18th 2016
Augmented Reality Definition
• Defining Characteristics [Azuma 97]
• Combines Real and Virtual Images
• Both can be seen at the same time
• Interactive in real-time
• The virtual content can be interacted with
• Registered in 3D
• Virtual objects appear fixed in space
Azuma, R. T. (1997). A survey of augmented reality. Presence, 6(4), 355-385.
Augmented Reality Technology
• Combining Real and Virtual Images
• Display technologies
• Interactive in Real-Time
• Input and interaction technologies
• Registered in 3D
• Viewpoint tracking technologies
[Block diagram: Display, Processing, Input, Tracking]
AR Requires Tracking and Registration
• Registration
• Positioning virtual objects with respect to the real world
• Fixing a virtual object onto a real object when the view is fixed
• Tracking
• Continually locating the user's viewpoint when the view is moving
• Position (x,y,z), Orientation (r,p,y)
Tracking Requirements
• Augmented Reality Information Display
• World Stabilized
• Body Stabilized
• Head Stabilized
Increasing tracking requirements: Head Stabilized → Body Stabilized → World Stabilized
Magnetic Tracker
• Idea: a coil generates current when moved in a magnetic field. Measuring the current gives position and orientation relative to the magnetic source.
• ++: 6DOF, robust
• --: wired, sensitive to metal, noisy, expensive
Flock of Birds (Ascension)
Inertial Tracker
• Idea: measure linear acceleration and angular rate (accelerometer/gyroscope)
• ++: no transmitter, cheap, small, high frequency, wireless
• --: drifts over time, hysteresis effect, only 3DOF
IS300 (Intersense)
Wii Remote
Ultrasonic Tracker
• Idea: time-of-flight or phase-coherence of sound waves
• ++: small, cheap
• --: 3DOF, line of sight, low resolution, affected by environmental conditions (pressure, temperature)
Ultrasonic
Logitech IS600
Global Positioning System (GPS)
• Created by the US in 1978
• Currently 29 satellites
• Satellites send position + time
• GPS Receiver positioning
• 4 satellites need to be visible
• Differential time of arrival
• Triangulation
• Accuracy
• 5-30m+, blocked by weather, buildings etc.
Mobile Sensors
• Inertial compass
• Earth's magnetic field
• Measures absolute orientation
• Accelerometers
• Measure acceleration along each axis
• Used for tilt, relative rotation (see the sketch below)
• Can drift over time
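As a concrete illustration of the tilt use above (not from the lecture; a minimal sketch assuming a 3-axis accelerometer reporting gravity in m/s² and a phone-style axis convention):

```python
import math

def tilt_from_accelerometer(ax, ay, az):
    """Estimate pitch and roll (degrees) from a roughly static accelerometer
    reading, where gravity dominates. The axis convention is an assumption."""
    pitch = math.degrees(math.atan2(-ax, math.sqrt(ay * ay + az * az)))
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll

# Device lying flat, face up: gravity along +z, so both angles are ~0
print(tilt_from_accelerometer(0.0, 0.0, 9.81))
```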
Why Optical Tracking for AR?
• Many AR devices have cameras
• Mobile phone/tablet, video see-through display
• Provides precise alignment between video and AR overlay
• Using features in the video to generate pixel-perfect alignment
• The real world has many visual features that can be tracked
• Computer vision is a well-established discipline
• Over 40 years of research to draw on
• Older non-real-time algorithms can be run in real time on today's devices
Common AR Optical Tracking Types
• Marker Tracking
• Tracking known artificial markers/images
• e.g. ARToolKit square markers
• Markerless Tracking
• Tracking from known features in real world
• e.g. Vuforia image tracking
• Unprepared Tracking
• Tracking in unknown environment
• e.g. SLAM tracking
Marker tracking
• Available for more than 10 years
• Several open source solutions exist
• ARToolKit, ARTag, ATK+, etc.
• Fairly simple to implement
• Standard computer vision methods
• A rectangle provides 4 corner points
• Enough for pose estimation!
Goal: Find Camera Pose
• Goal is to find the camera pose in the marker coordinate frame
• Knowing:
• Position of key points in the on-screen video image
• Camera properties (focal length, image distortion)
Coordinates for Marker Tracking
Marker → Camera
• Final Goal
• Rotation & Translation
1: Camera → Ideal Screen
• Perspective model
• Obtained from camera calibration
2: Ideal Screen → Observed Screen
• Nonlinear function (barrel-shape distortion)
• Obtained from camera calibration
3: Marker → Observed Screen
• Correspondence of 4 vertices
• Real-time image processing
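In standard pinhole-camera notation (my notation, consistent with the three steps above), a marker point X_m reaches the observed screen via

$$ \tilde{x}_{\mathrm{ideal}} \sim K\,[R \mid t]\,X_m, \qquad x_{\mathrm{observed}} = \mathrm{distort}(\tilde{x}_{\mathrm{ideal}}) $$

where K and the distortion function come from camera calibration (steps 1 and 2), and the rotation R and translation t are what tracking recovers from the 4 vertex correspondences (step 3).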
Marker Tracking – Fiducial Detection
• Threshold the whole image to black and white
• Search scanline by scanline for edges (white to black)
• Follow the edge until either
• Back at the starting pixel
• Image border
• Check for size
• Reject fiducials early that are too small (or too large)
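A minimal OpenCV sketch of this detection stage (not ARToolKit's actual implementation; the OpenCV 4 API is assumed, a contour pass stands in for the scanline edge following, and the threshold and area limits are illustrative):

```python
import cv2

def detect_fiducial_candidates(gray, min_area=400, max_area=100_000):
    """Threshold the image and return closed contours of plausible marker size."""
    _, binary = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    candidates = []
    for contour in contours:
        area = cv2.contourArea(contour)
        if min_area < area < max_area:   # reject too-small / too-large blobs early
            candidates.append(contour.reshape(-1, 2))
    return candidates

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input frame
candidates = detect_fiducial_candidates(gray)
```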
Marker Tracking – Rectangle Fitting
• Start with an arbitrary point "x" on the contour
• The point with maximum distance from it must be a corner c0
• Create a diagonal through the center
• Find points c1 & c2 with maximum distance left and right of the diagonal
• New diagonal from c1 to c2
• Find point c3 right of the diagonal with maximum distance
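A NumPy sketch of this corner-finding scheme (my reading of the steps above; signed 2-D cross products stand in for "left/right of the diagonal", and the contour is assumed to really be a quadrilateral):

```python
import numpy as np

def fit_rectangle(contour):
    """Return 4 corner candidates (c0, c1, c3, c2 in cyclic order) of a
    quadrilateral contour given as an (N, 2) array of points."""
    pts = np.asarray(contour, dtype=np.float64)
    # Arbitrary start point "x": the farthest contour point from it is corner c0
    c0 = pts[np.argmax(np.linalg.norm(pts - pts[0], axis=1))]
    # Diagonal from c0 through the contour centre
    diag = pts.mean(axis=0) - c0
    signed = np.cross(diag, pts - c0)      # signed distance to that diagonal
    c1 = pts[np.argmax(signed)]            # farthest point on one side
    c2 = pts[np.argmin(signed)]            # farthest point on the other side
    # New diagonal c1 -> c2; c3 is the farthest point on the side opposite c0
    signed2 = np.cross(c2 - c1, pts - c1)
    opposite = np.sign(signed2) == -np.sign(np.cross(c2 - c1, c0 - c1))
    c3 = pts[opposite][np.argmax(np.abs(signed2[opposite]))]
    return np.array([c0, c1, c3, c2])
```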
Marker Tracking – Pattern checking
• Calculate homography using the 4 corner points
• "Direct Linear Transform" algorithm
• Maps normalized coordinates to marker coordinates (simple perspective projection, no camera model)
• Extract the pattern by sampling and check it
• ID (implicit encoding)
• Template (normalized cross correlation)
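A hedged OpenCV sketch of the homography-and-template check (the 4-point perspective transform is the DLT special case used here; the corner coordinates, pattern size, and 0.7 threshold are illustrative):

```python
import cv2
import numpy as np

def check_marker_pattern(gray, corners, template, size=64, min_score=0.7):
    """Rectify the marker interior with a homography from its 4 corners and
    compare it against a stored template via normalized cross correlation."""
    dst = np.float32([[0, 0], [size - 1, 0], [size - 1, size - 1], [0, size - 1]])
    H = cv2.getPerspectiveTransform(np.float32(corners), dst)
    sample = cv2.warpPerspective(gray, H, (size, size))
    score = cv2.matchTemplate(sample, template, cv2.TM_CCORR_NORMED)[0, 0]
    return score > min_score, score

# Hypothetical usage: 'gray' is the camera frame, 'template' a 64x64 stored pattern
# matched, score = check_marker_pattern(gray, detected_corners, template)
```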
Marker Tracking – Corner refinement
• Refine corner coordinates
• Critical for high quality tracking
• Remember: 4 points is the bare minimum!
• So these 4 points had better be accurate…
• Detect sub-pixel coordinates
• E.g. Harris corner detector
• Specialized methods can be faster and more accurate
• Strongly reduces jitter!
• Undistort corner coordinates
• Remove radial distortion from the lens
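One way to do both steps with OpenCV (a sketch, not ARToolKit's own refinement; the window size, termination criteria, and calibration values are placeholders):

```python
import cv2
import numpy as np

def refine_corners(gray, corners, K, dist_coeffs):
    """Sub-pixel refinement of the 4 marker corners followed by removal of
    radial lens distortion. K and dist_coeffs come from camera calibration."""
    pts = np.float32(corners).reshape(-1, 1, 2)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01)
    refined = cv2.cornerSubPix(gray, pts, (5, 5), (-1, -1), criteria)
    # Undistort, but keep the result in pixel coordinates by re-applying K
    undistorted = cv2.undistortPoints(refined, K, dist_coeffs, P=K)
    return undistorted.reshape(-1, 2)
```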
Marker tracking – Pose estimation
• Calculates the marker pose relative to the camera
• Initial estimate comes directly from the homography
• Very fast, but coarse and error-prone
• Jitters a lot…
• Iterative refinement using the Gauss-Newton method
• 6 parameters (3 for position, 3 for rotation) to refine
• At each iteration we optimize on the reprojection error
• Iterate until the error converges
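A sketch of this stage using OpenCV's PnP solver (a stand-in: OpenCV's iterative flag refines with a Levenberg-Marquardt scheme rather than plain Gauss-Newton; the marker size and calibration inputs are placeholders):

```python
import cv2
import numpy as np

def estimate_marker_pose(image_corners, K, dist_coeffs, marker_size=0.08):
    """Estimate rotation (rvec) and translation (tvec) of a square marker of
    side `marker_size` metres from its 4 refined image corners."""
    s = marker_size / 2.0
    object_corners = np.float32([[-s,  s, 0], [ s,  s, 0],
                                 [ s, -s, 0], [-s, -s, 0]])   # marker plane z = 0
    ok, rvec, tvec = cv2.solvePnP(object_corners, np.float32(image_corners),
                                  K, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    return ok, rvec, tvec
```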
From Marker To Camera
• Rotation & Translation
• T_CM: 4×4 transformation matrix from marker coordinates to camera coordinates
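Assembling that matrix from the solvePnP output above (a sketch; cv2.Rodrigues converts the rotation vector into the 3×3 rotation matrix):

```python
import cv2
import numpy as np

def pose_to_matrix(rvec, tvec):
    """Build the 4x4 marker-to-camera transform T_CM from rvec/tvec,
    so that x_camera = T_CM @ x_marker (homogeneous coordinates)."""
    R, _ = cv2.Rodrigues(rvec)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = tvec.ravel()
    return T
```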
Tracking challenges in ARToolKit
• False positives and inter-marker confusion (image by M. Fiala)
• Image noise (e.g. poor lens, block coding / compression, neon tube)
• Unfocused camera, motion blur
• Dark/unevenly lit scene, vignetting
• Jittering (Photoshop illustration)
• Occlusion (image by M. Fiala)
But: you can't cover the world with ARToolKit markers!
Markerless Tracking
• No more markers! → markerless tracking
[Taxonomy diagram: Mechanical, Magnetic, Inertial, Ultrasonic, and Optical trackers; optical tracking divides into Marker-Based, Markerless, and Specialized tracking; markerless tracking divides into Edge-Based, Template-Based, and Interest Point tracking]
Natural Feature Tracking
• Use natural cues of real elements
• Edges
• Surface texture
• Interest points
• Model-based or model-free
• No visual pollution
[Examples: contours, feature points, surfaces]
Tracking by Keypoint Detection
• This is what most trackers do…
• Targets are detected every frame
• Popular because tracking and detection are solved simultaneously
[Pipeline: Camera Image → Keypoint detection → Descriptor creation and matching → Outlier removal → Pose estimation and refinement → Pose, plus a Recognition stage]
What is a Keypoint?
• It depends on the detector you use!
• For high performance use the FAST corner detector
• Apply FAST to all pixels of your image
• Obtain a set of keypoints for your image
• Describe the keypoints (a detection sketch follows the reference below)
Rosten, E., & Drummond, T. (2006, May). Machine learning for high-speed corner detection.
In European conference on computer vision (pp. 430-443). Springer Berlin Heidelberg.
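A minimal OpenCV sketch of FAST detection as described above (the threshold value is illustrative):

```python
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input frame
fast = cv2.FastFeatureDetector_create(threshold=25)    # FAST corner detector
keypoints = fast.detect(gray, None)                    # keypoints over the whole image
print(len(keypoints), "FAST keypoints detected")
```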
Descriptors
• Describe the keypoint features
• Can use SIFT
• Estimate the dominant keypoint orientation using gradients
• Compensate for the detected orientation
• Describe the keypoints in terms of the gradients surrounding them
Wagner D., Reitmayr G., Mulloni A., Drummond T., Schmalstieg D., Real-Time Detection and Tracking for Augmented Reality on Mobile Phones. IEEE Transactions on Visualization and Computer Graphics, May/June 2010.
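A sketch of computing descriptors for the FAST keypoints from the previous sketch (OpenCV's bundled SIFT is used as a stand-in for the reduced descriptor of the paper above; it needs an OpenCV build that includes SIFT, e.g. 4.4+):

```python
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input frame
keypoints = cv2.FastFeatureDetector_create(threshold=25).detect(gray, None)
sift = cv2.SIFT_create()
# compute() estimates a dominant orientation per keypoint and builds a
# gradient-histogram descriptor around it
keypoints, descriptors = sift.compute(gray, keypoints)
print(descriptors.shape)   # (number of keypoints, 128)
```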
Database Creation
• Offline step – create a database of known features
• Search for corners in a static image
• For robustness look at corners on multiple scales
• Some corners are more descriptive at larger or smaller scales
• We don't know how far users will be from our image
• Build a database file with all descriptors and their positions on the original image (a sketch follows below)
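A sketch of such an offline step, continuing the FAST + SIFT stand-ins from the earlier sketches (the target filename and number of pyramid levels are placeholders):

```python
import cv2

def build_database(target_path="target.png", levels=4):
    """Detect FAST corners at several scales of the target image and store each
    descriptor together with its position in original-image coordinates."""
    image = cv2.imread(target_path, cv2.IMREAD_GRAYSCALE)
    fast = cv2.FastFeatureDetector_create(threshold=25)
    sift = cv2.SIFT_create()
    entries = []
    level_img, scale = image, 1.0
    for _ in range(levels):
        kps = fast.detect(level_img, None)
        kps, descs = sift.compute(level_img, kps)
        if descs is not None:
            for kp, desc in zip(kps, descs):
                # store the keypoint position mapped back to original resolution
                entries.append((kp.pt[0] / scale, kp.pt[1] / scale, desc))
        level_img = cv2.pyrDown(level_img)   # next, coarser pyramid level
        scale *= 0.5
    return entries
```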
Real-time tracking
• Search for known keypoints in the video image
• Create the descriptors
• Match the descriptors from the live video against those in the database
• Brute force is not an option
• Need the speed-up of special data structures
• E.g., we use multiple spill trees (a matching sketch follows below)
[Pipeline: Camera Image → Keypoint detection → Descriptor creation and matching → Outlier removal → Pose estimation and refinement → Pose, plus a Recognition stage]
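Spill trees are not bundled with OpenCV, so this sketch substitutes FLANN's randomized kd-trees as the fast approximate nearest-neighbour structure; the ratio-test threshold is illustrative:

```python
import cv2
import numpy as np

def match_against_database(frame_descriptors, db_descriptors, ratio=0.8):
    """Approximate nearest-neighbour matching of live-frame descriptors against
    the offline database, with a ratio test to discard ambiguous matches."""
    matcher = cv2.FlannBasedMatcher(dict(algorithm=1, trees=4),   # FLANN kd-trees
                                    dict(checks=32))
    pairs = matcher.knnMatch(np.float32(frame_descriptors),
                             np.float32(db_descriptors), k=2)
    return [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
```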
NFT – Outlier removal
• Remove outlying features
• Cascade of removal techniques
• Start with the cheapest, finish with the most expensive…
• First, simple geometric tests
• E.g., line tests: select 2 points to form a line, then check that all other points are on the correct side of the line (see the sketch below)
• Then, homography-based tests
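A sketch of such a line test using signed cross products (my formulation of the check described above):

```python
import numpy as np

def passes_line_test(frame_pts, db_pts, i, j):
    """Pick the line through matches i and j. If the remaining matched points do
    not fall on the same side of that line in both images, some match is an outlier."""
    def sides(pts):
        a, b = pts[i], pts[j]
        return np.sign(np.cross(b - a, pts - a))   # +1 / -1 per point
    return np.all(sides(np.asarray(frame_pts, float)) ==
                  sides(np.asarray(db_pts, float)))
```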
NFT – Pose refinement
• Pose from homography makes a good starting point
• Based on Gauss-Newton iteration
• Try to minimize the re-projection error of the keypoints
• The part of the tracking pipeline that benefits most from floating point usage
• Can still be implemented effectively in fixed point
• Typically 2-4 iterations are enough…
NFT – Real-time tracking
• Search for keypoints in the video image
• Create the descriptors
• Match the descriptors from the live video against those in the database
• Remove the keypoints that are outliers
• Use the remaining keypoints to calculate the pose of the camera
[Pipeline: Camera Image → Keypoint detection → Descriptor creation and matching → Outlier removal → Pose estimation and refinement → Pose, plus a Recognition stage]
Marker vs. Natural Feature Tracking
• Marker tracking
• Usually requires no database to be stored
• Markers can be an eye-catcher
• Tracking is less demanding
• The environment must be instrumented
• Markers usually work only when fully in view
• Natural feature tracking
• A database of keypoints must be stored/downloaded
• Natural feature targets might catch the attention less
• Natural feature targets are potentially everywhere
• Natural feature targets work also if partially in view
Tracking from an Unknown Environment
• What to do when you don't know any features?
• Very important problem in mobile robotics - Where am I?
• SLAM
• Simultaneously Localise And Map the environment
• Goal: to recover both camera pose and map structure while initially knowing neither
• Mapping:
• Building a map of the environment which the robot is in
• Localisation:
• Navigating this environment using the map while keeping track of the robot's relative position and orientation
Visual SLAM
• Early SLAM systems (1986 - )
• Computer vision and sensors (e.g. IMU, laser, etc.)
• One of the most important algorithms in robotics
• Visual SLAM
• Using cameras only, such as stereo views
• MonoSLAM (single camera) developed in 2007 (Davison)
How SLAM Works
• Three main steps
1. Tracking a set of points through successive camera frames
2. Using these tracks to triangulate their 3D position
3. Simultaneously using the estimated point locations to calculate the camera pose which could have observed them
• By observing a sufficient number of points we can solve for both structure and motion (camera path and scene structure); a triangulation sketch follows below
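A sketch of step 2 with OpenCV, assuming two known camera poses expressed as 3×4 projection matrices (all inputs here are placeholders):

```python
import cv2
import numpy as np

def triangulate(P1, P2, pts1, pts2):
    """Triangulate matched image points (Nx2 arrays) seen from two cameras with
    projection matrices P1, P2 (3x4, i.e. K @ [R | t]); returns Nx3 3D points."""
    homog = cv2.triangulatePoints(P1, P2, np.float32(pts1).T, np.float32(pts2).T)
    return (homog[:3] / homog[3]).T
```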
SLAM Optimization
• SLAM is an optimisation problem
• Compute the best configuration of camera poses and point positions in order to minimise reprojection error
• The difference between a point's tracked location and where it is expected to be
• Can be solved using bundle adjustment (see the sketch after this list)
• A nonlinear least squares algorithm that finds the minimum error
• But – time taken grows as the size of the map increases
• Multi-core machines can do localization and mapping on different threads
• Relocalisation
• Allows tracking to be restarted when it fails
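A toy bundle-adjustment sketch (not from the lecture): it packs one (rvec, tvec) per camera plus the 3D points into a parameter vector and lets SciPy minimise the reprojection residuals; real systems use sparse solvers, and every name below is illustrative:

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, K, n_cams, n_pts, cam_idx, pt_idx, observed):
    """Residuals for bundle adjustment: difference between where each tracked
    point was observed and where the current poses/points would project it."""
    poses = params[:n_cams * 6].reshape(n_cams, 6)    # rvec (3) + tvec (3) per camera
    points = params[n_cams * 6:].reshape(n_pts, 3)    # 3D point positions
    residuals = []
    for c, p, uv in zip(cam_idx, pt_idx, observed):
        proj, _ = cv2.projectPoints(points[p].reshape(1, 1, 3),
                                    poses[c, :3], poses[c, 3:], K, None)
        residuals.append(proj.ravel() - uv)
    return np.concatenate(residuals)

# Hypothetical usage, with x0 packing initial pose and point estimates:
# result = least_squares(reprojection_residuals, x0,
#                        args=(K, n_cams, n_pts, cam_idx, pt_idx, observed))
```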
Evolution of SLAM Systems
• MonoSLAM (Davison, 2007)
• Real-time SLAM from a single camera
• PTAM (Klein, 2009)
• First SLAM implementation on a mobile phone
• FAB-MAP (Cummins, 2008)
• Probabilistic localization and mapping
• DTAM (Newcombe, 2011)
• 3D surface reconstruction from every pixel in the image
• KinectFusion (Izadi, 2011)
• Real-time dense surface mapping and tracking using RGB-D
• LSD-SLAM (Engel, 2014)
• A novel, direct monocular SLAM technique
• Uses image intensities both for tracking and mapping
• The camera is tracked using direct image alignment, while geometry is estimated as semi-dense depth maps
• Supports very large scale tracking
• Runs in real time on CPU and smartphone
Direct Method vs. Feature-Based
• Direct methods use all the information in the image, unlike feature-based approaches that only use small patches around corners and edges
Applications of SLAM Systems
• Many possible applications
• Augmented Reality camera tracking
• Mobile robot localisation
• Real world navigation aid
• 3D scene reconstruction
• 3D Object reconstruction
• Etc..
• Assumptions
• Camera moves through an unchanging scene
• So not suitable for person tracking, gesture recognition
• Both involve non-rigidly deforming objects and a non-static map
Sensor Tracking
• Used by many "AR browsers"
• GPS, compass, accelerometer, gyroscope
• Not sufficient alone (drift, interference)
Inertial Compass Drifting Over Time
Combining Sensors and Vision
• Sensors
• Produce noisy output (= jittering augmentations)
• Are not sufficiently accurate (= wrongly placed augmentations)
• Give us initial information on where we are in the world, and what we are looking at
• Vision
• Is more accurate (= stable and correct augmentations)
• Requires choosing the correct keypoint database to track from
• Requires registering our local coordinate frame (online-generated model) to the global one (world)
Example: Outdoor Hybrid Tracking
• Combines
• Computer vision
• Inertial gyroscope sensors
• Both correct for each other (a fusion sketch follows below)
• Inertial gyro
• Provides frame-to-frame prediction of camera orientation, fast sensing
• Drifts over time
• Computer vision
• Natural feature tracking, corrects for gyro drift
• Slower, less accurate
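A minimal complementary-filter sketch of this kind of fusion (my illustration, not the lecture's system): the gyro integrates orientation every frame, and whenever a vision estimate is available the state is nudged toward it to cancel accumulated drift; the blending gain is a placeholder:

```python
class GyroVisionFuser:
    """Single-axis illustration: fast gyro integration, slow vision correction."""
    def __init__(self, gain=0.05):
        self.yaw = 0.0      # degrees
        self.gain = gain    # how strongly a vision fix pulls the estimate

    def update(self, gyro_rate, dt, vision_yaw=None):
        self.yaw += gyro_rate * dt              # frame-to-frame prediction (drifts)
        if vision_yaw is not None:              # occasional vision fix
            self.yaw += self.gain * (vision_yaw - self.yaw)
        return self.yaw

# Example: 60 Hz gyro updates, a vision correction every 30 frames
fuser = GyroVisionFuser()
for frame in range(120):
    yaw = fuser.update(gyro_rate=1.5, dt=1 / 60,
                       vision_yaw=0.0 if frame % 30 == 0 else None)
```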
The Registration Problem
• Virtual and real content must stay properly aligned
• If not:
• It breaks the illusion that the two coexist
• It prevents acceptance of many serious applications
t = 0 seconds t = 1 second
Sources of Registration Errors
• Static errors
• Optical distortions (in HMD)
• Mechanical misalignments
• Tracker errors
• Incorrect viewing parameters
• Dynamic errors
• System delays (largest source of error)
• 1 ms delay = 1/3 mm registration error
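For a rough sense of where a figure of that order comes from (illustrative numbers of my own, not the slide's): if content moves at about 1/3 m/s relative to the viewpoint, then over a 1 ms delay it shifts by

$$ \Delta x = v\,\Delta t \approx 0.33\ \mathrm{m/s} \times 0.001\ \mathrm{s} \approx 0.3\ \mathrm{mm} $$

so each millisecond of system delay contributes roughly a third of a millimetre of misalignment.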
Reducing Static Errors
• Distortion compensation
• For lens or display distortions
• Manual adjustments
• Have the user manually align AR and VR content
• View-based or direct measurements
• Have the user measure eye position
• Camera calibration (video AR)
• Measuring camera properties
Reducing dynamic errors (1)
• Reduce system lag
• Faster components/system modules
• Reduce apparent lag
• Image deflection
• Image warping
Reducing System Lag
[Application loop diagram: Tracking (x,y,z, r,p,y) → Calculate Viewpoint → Simulation → Render Scene → Draw to Display; each stage can be sped up with a faster tracker, faster CPU, faster GPU, faster display]
Reducing dynamic errors (2)
• Match video + graphics input streams (video AR)
• Delay video of real world to match system lag
• User doesn’t notice
• Predictive Tracking
• Inertial sensors helpful
Azuma / Bishop 1994
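A tiny sketch of predictive tracking in that spirit (my illustration): use the gyro's angular rate to extrapolate head orientation forward by the expected system delay before rendering:

```python
def predict_orientation(current_yaw, gyro_rate, system_delay):
    """Extrapolate yaw (degrees) forward by the expected end-to-end delay,
    assuming roughly constant angular velocity over that interval."""
    return current_yaw + gyro_rate * system_delay

# Example: 50 deg/s head rotation and 40 ms of total system lag
render_yaw = predict_orientation(current_yaw=10.0, gyro_rate=50.0,
                                 system_delay=0.040)   # -> 12.0 degrees
```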
Wrap-up
• Tracking and registration are key problems
• Registration error
• Measures against static errors
• Measures against dynamic errors
• AR typically requires multiple tracking technologies
• Computer vision is most popular
• Research areas:
• SLAM systems, deformable models, mobile outdoor tracking