Image Processing Patent
Image Processing Patent
Fig. 1
U.S. Patent Feb. 23, 2016 Sheet 2 of 13 US 9.269,105 B2
U.S. Patent Feb. 23, 2016 Sheet 3 of 13 US 9.269,105 B2
BOrder
detected ? Apply majority 220
filtering to remove
thin/small regions
Extract
features
Add black border
back in as
Normalise separate region
features
Total
WCC changed?
Terminate
clustering
Fig. 3
U.S. Patent Feb. 23, 2016 Sheet 4 of 13 US 9.269,105 B2
28O
27O
Fig. 4
27O
U.S. Patent Feb. 23, 2016 Sheet 5 of 13 US 9.269,105 B2
O O O O
Fig. 7
U.S. Patent Feb. 23, 2016 Sheet 6 of 13 US 9.269,105 B2
Original image
its
air.
Quantise
Colours
Generate
MarkOV 415
Model
Markov Model data
Generate
Feature
Vector 425
RaW Feature Vector
(0.90.10.00.00.00,1070.10.2000,10,10,10,1060.10.00.00.00.90.6020.00200)- 430
Random
Vector
Map 435
255 1 .
O
- - - 25 G O
H255
O
Fig. 9
/YAYYAY
Ol 32 64 255
Fig. 10
U.S. Patent Feb. 23, 2016 Sheet 8 of 13 US 9.269,105 B2
Possible quantised
RGB values
Fig. 12
U.S. Patent Feb. 23, 2016 Sheet 9 of 13 US 9.269,105 B2
L8=Distance between
real Colour and
C4 C5 pallet colour C8
U.S. Patent Feb. 23, 2016 Sheet 10 of 13 US 9.269,105 B2
WO Quantised image
W1
W2
WO
W N.
N r
4
455
W2
460' PS as in SU N
In N
D2 N w; WO
W2
WO WO WO
W1 W1 W1
W2 W2 W2
Fig. 14
U.S. Patent Feb. 23, 2016 Sheet 11 of 13 US 9.269,105 B2
52O
600
61 O 62O
630
Fig. 16
U.S. Patent Feb. 23, 2016 Sheet 12 of 13 US 9.269,105 B2
Video
Feature 750
Data
Detector
Reference 76O
Feature
Data Store
U.S. Patent Feb. 23, 2016 Sheet 13 of 13 US 9.269,105 B2
98O 900
Analyse ingest video
at Server
990 910
Generate Reference Partition into
Feature Data segments
92O
ldentify Feature
data
93O
Display at
client
940
User selects
partition
Fig. 19
US 9,269,105 B2
1. 2
IMAGE PROCESSING (i) deriving image property data from pixels of the image
under test;
CROSS-REFERENCE TO RELATED (ii) grouping pixels into image segments in accordance
APPLICATIONS with their respective image property data.
Sub-features of other aspects of the invention are appli
This application is a continuation application of and claims cable to this aspect.
the benefit of priority under 35 U.S.C. S 120 from U.S. appli Further respective aspects and features of the invention are
cation Ser. No. 12/092,090, filed Oct. 7, 2008, the entire defined in the appended claims.
contents of which is incorporated herein by reference. U.S. Embodiments of the invention will now be described, by
application Ser. No. 12/092,090 is a national stage of inter- 10 way of example only, with reference to the accompanying
national application PCT/GB06/03625, filed Sep. 29, 2006, drawings in which:
which is based upon and claims the benefit of priority under FIG. 1 schematically illustrates an image processing sys
35 U.S.C. S 119 from British Patent Application No. tem;
0.522157.7, filed Oct. 31, 2005. FIGS. 2a and 2b schematically illustrate the partitioning of
This invention relates to image processing. 15 an image into a plurality of image segments;
Techniques have been derived for indexing and searching FIG. 3 is a schematic flowchart illustrating a process for
textual information items, or at least items having some tex partitioning an image:
tual content. An example of Such a technique is to generate FIG. 4 schematically illustrates an image with a dark bor
feature data from the textual item (e.g. word distribution) and der;
to allow comparisons between items to be made on the basis 20 FIG. 5 schematically illustrates a search area within an
of a comparison of the feature data. image;
With image items, however, few useful techniques have FIG. 6 schematically illustrates an early stage of generating
been proposed. clusters within an image:
One simple technique is to associate some text with an FIG. 7 schematically illustrates a majority filtering pro
image. This could be as simple as a title, or could involve 25 CeSS;
more detailed “metadata'. Such as a paragraph of description, FIG. 8 is a schematic flowchart illustrating the generation
a schedule of items or people in the image, a time of capture of a feature vector;
of the image, a schedule of those involved in its capture, and FIG.9 schematically illustrates a quantised RGB space:
so on. Text-based searching techniques can then be used to FIG. 10 schematically illustrates quantisation boundaries:
identify similarimages. But of course, providing accurate and 30 FIG.11 schematically illustrates the generation of a feature
useful metadata is time-consuming and expensive. histogram;
Other techniques establish feature databased on properties FIG. 12 schematically illustrates a random mapping pro
of the images themselves. These might include colour prop CeSS;
erties, texture properties and the like. But this is also limited FIG. 13 schematically illustrates a weighted colour quan
because two images, which to a human observer represent the 35 tisation process;
same thing, may have very different image properties. For FIG. 14 schematically illustrates the generation of a
example, a pair of images of a particular person might have weighted Markov model;
very different image properties because the image back FIG. 15 schematically illustrates a camcorder as an
grounds are different. example of a video acquisition and/or processing apparatus;
This invention provides an image processing method com- 40 FIG. 16 schematically illustrates a personal digital assis
prising the steps of: tant as an example of portable data processing apparatus;
partitioning an image under test to form a plurality of FIG. 17 schematically illustrates a network-based shop
image segments, each segment representing a set of pixels ping arrangement;
having similar image properties, at least Some segments being FIG. 18 schematically illustrates a user selection display;
contiguous; 45 and
deriving feature data from a Subset comprising one or more FIG. 19 is a schematic flowchart illustrating the operation
of the image segments; and of the apparatus of FIG. 17.
comparing the feature data from the Subset of image seg FIG. 1 is a schematic diagram of an image processing
ments with feature data indicative of respective reference system based around a general-purpose computer 10 having a
image segments so as to detect a degree of similarity between 50 processor unit 20 including disk storage 30 for programs and
the image segments under test and the reference image seg data, a network interface card 40 connected to a network 50
mentS. such as an Ethernet network or the Internet, a display device
The invention addresses the above problems by first parti Such as a cathode ray tube or liquid crystal display device 60,
tioning an image into segments (regions, areas etc) having a keyboard 70 and a user input device such as a mouse 80. The
similar image properties, and then establishing feature data 55 system operates under program control, the programs being
from the individual segments for comparison with other ref stored on the disk storage 30 and provided, for example, by
erence segments, for example segments in that or another the network 50, a removable disk (not shown) or a pre-instal
image or even artificially created segments which are generi lation on the disk storage 30.
cally representative of image portions. This can act to reduce In general terms, the image processing apparatus is
the variability introduced by parts of the image which are not 60 arranged to partition an image into image segments. So-called
of interest to the present comparison. feature data is then derived from the segments. This allows
In another aspect the invention also provides an image images to be compared at a segment level, that is, the prop
processing method comprising the steps of: erties (as represented by the feature data) of one or more
partitioning an image under test to form a plurality of segments of a test image can be compared with properties of
image segments, each segment representing a set of pixels 65 other segments in that image or, more usually, in other
having similar image properties, at least some being contigu images, to detect reference images deemed to be "similar to
ous, the partitioning step comprising: the image under test or the selected segment(s) of the image
US 9,269,105 B2
3 4
under test. Alternatively, the feature data can be compared It will also be appreciated that the system 10 of FIG. 1 is but
with artificially generated (or perhaps averaged over multiple one example of possible systems which could use the feature
instances) reference feature data which could be generically data derived from partitioned images. Although it is envis
indicative of (for example) a blue automobile, rather than aged that the initial (partitioning) phase could be carried out
necessarily representing exactly aparticular blue automobile. 5 by a reasonably powerful computer, most likely by a non
FIG. 2a Schematically illustrates an example image, and portable computer (though possibly by a portable computer
01 FIG. 2b Schematically illustrates a sample set of image seg with a signal processing function), the later phase of access
ments (e.g. a segment 75) derived from the image of FIG.2a. ing the information could be carried out at a portable machine
Tanmay Sethi In general, the segments are illustrated in FIG. 2b as being 10
Such as a “personal digital assistant” (a term for a data pro
surrounded by a dark border, but this is just so that the seg cessing device with display and user input devices, which
ments can be conveniently represented on paper. The dark generally fits in one hand), a portable computer Such as a
border need not be (and probably would not be) present in the laptop computer, or even devices such as a mobile telephone,
actual partitioned image. a video editing apparatus or a video camera. In general, prac
The system can associate feature data with each of the 15 tically any device having a display could be used for the
segments—for example, a single value representing image information-accessing phase of operation. Examples of other
(e.g. colour) properties of that segment, or multi-valued fea suitable devices are described below with reference to FIGS.
ture data referred to generically as a “feature vector repre 10 and 11.
senting various different image properties of the segment. The processes are not limited to particular numbers of
The image processing system can operate in various modes images or segments.
of operation. In a first mode, a set of images is assembled on FIG. 3 is a schematic flow chart illustrating a process for
2-3
the disk storage 30 or on a network disk drive connected via partitioning an image. The process steps of FIG. 3 will be
2 notes:
the network 50 and is partitioned, sorted and indexed ready described with reference to FIGS. 4 to 7.
for a searching operation. A second mode of operation is the In FIG. 3, at a step 100, the presence of a dark border
actual searching involving a comparison of a current image 25 around an image to be partitioned is detected. Border of this
and the indexed and sorted data. A further mode of operation type often occur because of mis-matches between an image
is a quasi-real-time search or comparison operation. For this, capture format and an image storage/transmission format. For
the image data need not have been pre-partitioned, indexed example, when an image is captured in a wide screen mode
and sorted; instead, feature data could be derived from the but stored in a non-wide screen mode, dark borders can be
images to be compared in response to a need for Such infor 30
inserted into the image as shown in FIG. 4. Here, upper and
mation.
It will therefore be appreciated that in the embodiments to lower dark borders 280 have been applied at some previous
be described below, operations such as partitioning an image processing stage to an image 270. If Such dark borders are
and deriving feature data could be done “inadvance', allow detected at the step 100, they are removed at a step 110. This
ing a later comparison of the feature data between images or 35 involves preventing the relevant pixels from taking part in the
image segments. Alternatively, they could be carried as processing which follows, i.e. cropping the image, so that the
required. It will also be appreciated that the feature data could border regions are not detected as distinct image segments in
be generated (in part or in entirety) by one system, whereas the processing which follows. A flag is set to indicate (a) that
the comparison takes place on another system using that the cropping has been carried out, and (b) the size of the
feature data. 40 regions which have been cropped. This allows the borders to
The images are loaded onto the disk storage 30 in a con be reinstated at a step 230 to be described below.
ventional manner. Preferably, they are stored as part of a A maximum border width of (for example) 50 pixels can be
database structure which allows for easier retrieval and index predetermined, to avoid cropping the entire image if the scene
04 ing of the items, but this is not essential. is generally very dark.
It will be also be appreciated that the feature data and/or the 45 After removal of the borders at the step 110, or after a
Tanmay Sethi images need not be stored on the local disk drive 30. The data negative detection of the presence of borders at the step 100,
could be stored on a remote drive connected to the system 10 control passes to a step 120 at which so-called “features” are
via the network 50. Alternatively, the information may be extracted from the image under test. This is carried out as
stored in a distributed manner, for example at various sites follows. Referring to FIG. 5, at each pixel position 290 within
across the internet. If the information is stored at different 50 an image under test 270', a block of pixels around that pixel
internet or network sites, a second level of information stor position is defined. An example block is shown schematically
age could be used to store locally a “link' (e.g. a URL) to the as a block 285 in FIG. 5. Typically, the block might be 9x9
remote information, optionally with an associated Summary, pixels. For each such block, the median value of the colour
05 abstract or metadata associated with that link. So, the properties R(red), G (green), B(blue), Cb and Cr (colour dif
remotely held information would not be accessed unless the 55 ference values) are established. In this way, each pixel posi
Tanmay Sethi
user selected the relevant link. tion has an associated set of five values (R, G, B, Cb, Cr),
In a further example, the images and/or feature data could though these do not represent the actual colour properties of
be stored across a networked work group, such as a research the pixel but rather the median value of the block surrounding
team, a newspaper publisher or a medical practice. A hybrid that pixel position. These sets of five colour property values
approach might involve some items stored locally and/or 60 for each pixel position represent the features extracted at the
Some items stored across a local area network and/or some step 120.
items stored across a wide area network. In this case, the At a step 130, the features are normalised. The way in
system could be useful in locating similar images captured or which this is carried out in the present embodiment is that the
prepared by others. Or, if a new television programme is R values across the entire set of pixels relating to a single
being planned, the present technique could be used to check 65 image are normalised to have a mean of Zero and a standard
for its originality by detecting previous programmes having deviation of one. The same condition is applied to all of the G
similar content. values across the image and so on.
US 9,269,105 B2
6-7
5 6
2 notes: At a step 140, the process of clustering pixels together is So far, the pixels have been considered as being clustered
started. In particular, the step 140 involves an initialisation of together in the feature space (colour space) represented by
the centres of a set of clusters. five variables (R, G, B, Cb, Cr). Consideration now passes to
The clusters are expressed in a multi-dimensional (R,G,B, grouping the pixels in the image spatial domain, with the aim
Cb, Cr) colour (or feature) space rather than—at this stage— of generating a small number of image segments which are
relating to adjacent regions in the image space. So, the aim is individually contiguous and which represent similar parts of
to group together those pixels which have similar colour the image, at least in So far as their colour properties are
properties rather than (necessarily) those which are close similar. Here, there is no precise definition of the desired
together in the image spatial domain. “small number, as this will depend entirely on the image
The cluster centres are set up as follows. 10 COntent.
An initial set of 2"'''''' clusters, i.e. 32 clusters, If the clustering which has been carried out as far as the step
is set up. In the (R, G, B, Cb, Cr) space, the centres of these 210 is represented in the image domain, so that pixels in the
clusters are setup so as to correspond to the set of positions for same cluster are grouped together in a displayed version of
which the individual variables R, G, B, Cb, Cr have either a the image, an example of the result might be that shown in
minimum or maximum value. Examples of the co-ordinates 15 FIG. 6, where contiguous groups of pixels 300 in the image
in the colour space of the initial 32 cluster centres are as 270" arise from the same cluster. Note that one cluster (in the
follows: colour space) might be represented by several distinct image
(Rmin Gmin Bnin Crni, Cbain) areas in FIG. 6.
(Rma- Gmin Bnin: Crnin: Cb,.) Considering each of the bordered areas 300 in FIG. 6 as an
image region, at a step 220, so-called majority filtering is
(Rma-Gnae Bina Crna Cbma) applied to remove thin or Small regions. Majority filtering is
This completes the step 140, the initialisation of the cluster schematically illustrated in FIG. 7, which shows an array of
centres. Control passes to a step 150, where each pixel is pixels around a centre pixel 310. The pixel 310 actually falls
assigned to the cluster centre which is nearest to that pixel in within a small region 320 which is surrounded by a larger
the colour space. The distance between a pixel’s position in 25 region 330. The effect of the majority filtering is to examine
the colour space and the cluster centres is calculated using the array of pixels Surrounding the pixel 310 and assign the
conventional mathematical techniques including a detection pixel 310 to the region having the greatest representation
of the Euclidean distance between the two positions in colour amongst the array of pixels. In the example of FIG. 7, it can be
space. At the end of the step 150, all of the pixels in the image seen that this would involve reassigning the pixel 310 to the
under test have been assigned to a cluster centre. 30 region 330. The same would apply to the other pixels within
At a step 160, any empty clusters are removed. So, the the small region320, so that the region320 would effectively
number of clusters will tend to decrease each time the itera disappear. In practice, a 9x9 array of pixels may well be used
for the majority filtering step.
tion of steps 150 to 200 is carried out. At a step 230, if necessary, the dark border removed at the
At a step 170, any clusters which are closer together (in the step 110 is reapplied.
five-dimensional colour space) than a cluster merge threshold 35 At a step 240, connected component analysis is performed
are merged together. to determine which pixels in each cluster are contiguous.
At a step 180, the cluster centres are recalculated. As Connected component analysis involves Scanning the pixels
described above, the cluster centres were initialised to horizontally and vertically to detect whether or not neigh
extremes of the five variable values in the colour space. At the bouring pixels (in the image domain) belong to the same
step 180, the cluster centres are recalculated to be the mean 40 cluster. Contiguous pixels belonging to the same cluster are
positions (in the colour space) of all of the pixels in that given the same region number. Non-contiguous pixels
cluster. So, for example, the R values for all of the pixels in a belonging to the same cluster are given separate region num
cluster are combined to form a mean R value which forms the bers. After this process, there will normally be at least as
new R-co-ordinate of the cluster centre for that cluster. many regions as before the process, and often several more.
At a step 190, a variable referred to as “within-cluster 45 Note that this stage could be omitted if it is considered accept
distance' (wcd) is calculated for each cluster. The formula for able to have some regions which are non-contiguous.
deriving wed is as follows: At a step 250, the number of clusters is reset to equal the
current number of image regions, with a one-to-one corre
wccl=Xdistance(pixel,cluster centre) spondence between clusters and regions. A cluster centre for
Accordingly, wcd represents the total of displacements of 50 each newly established cluster is calculated as described
above.
the pixels (in the colour space) from their respective cluster Finally, at a step 260, any remaining Small regions (fewer
Centres.
than 500 pixels) are merged with their closest neighbour
At a step 200, a test is carried out to see whether the sum of region. This is carried out as follows.
all wed values (total WCd) has changed since it was last For regions offewer than 100 pixels, merge with the neigh
calculated. Of course, the first pass through the loop of steps 55 bouring region that corresponds to a cluster centre closest to
150 to 200 will generate a first value of wed, so the test at the that of the region to be merged.
step 200 will be positive and control will return to the step For regions between 100 and 500 pixels, calculate a “merge
150. Thereafter, the comparison is made between a newly cost as follows:
calculated value of total wed and the corresponding value
calculated during the previous iteration. 60 merge cost=(number of pixels)'+smallest inter-clus
The test at the step 200 could be absolute, i.e. “has total wed ter distance with any spatially neighbouring
region
changed at all?', or a threshold test could be applied, i.e. “has
total wed changed by less than a threshold amount?”. If the merge cost is less than a predetermined threshold, the
After an appropriate number of iterations, the step 200 will regions are merged.
detect that total wed has not changed since the last iteration 65 Otherwise they are not merged.
and control passes to a step 210 where the clustering opera A system will be described below whereby a segmented
tion is terminated. image is used as the basis for comparing image segments with
US 9,269,105 B2
7 8
those in other images through the use of a feature vector. clarity of the diagram) illustrate possible quantised points
Other applications of image segmentation include: within the RGB space. R, G and B values are each quantised
1. Region-based video coding (e.g. at a low bit rate). Regions to the nearest Such point.
(segments) could encoded by a description of the area each The raw RGB data, in this embodiment, is represented by
region covers and a description of its colour and/or texture. 5 three 8-bit values and so each of R, G and B can have a value
This is especially useful for very low bit rate video coding between 0 and 255. The quantisation boundaries are set to
for use in mobile phones, handheld devices, video over IP give the centres of the quantisation sub-ranges at 32, 96, 160
and the like, where the screen resolution is likely to below and 224. This means that the overall range of 0 to 255 is
and the representation of image regions as a single colour/ 10
divided into four Substantially equal Sub-ranges.
texture is unlikely to have a large impact on Subjective The quantisation process of the step 405 gives rise to quan
quality. tised image data 410.
2. Region activity based video coding. Images are divided The quantisation of the colour space is an important part of
into regions and coded using an object-based coding the system, as the size of the raw feature vector (see below) is
Scheme. Smooth (low activity) regions are treated less 15 the square of colour palette size. For example, if the colour
harshly during quantisation than textured (higher activity) palette consisted of all the discrete points in the 24 bit RGB
regions, as texture is generally better at hiding quantisation space the palette size would be 256 and the raw feature
noise. vector size would be 256° which would be impractical in
3. Image/video compositing. The image/video is segmented many situations. Experiments have been done with non-linear
into object. This allows the selection of objects for extrac quantisation of hue-saturation-value (HSV) space and linear
tion and insertion into other video/images, i.e. without the quantisation of the 24bit RGB space. Linear quantisation of
need for conventional “blue screen chroma keying. the 24bit RGB space was found to cause fewer problems with
4. CCTV (closed circuit TV) analysis. CCTV images are quantisation errors, but other quantisation schemes could of
divided into objects such that the user can select objects or course be used.
areas of the image either to ignore or to pay special atten 25 A Markov model is generated at the step 415.
tion to during automatic monitoring (e.g. during computer For each pixel, the process identifies its eight neighbours in
vision operations such as crowd counting, analysis of Sus a 3x3 square array around that pixel. Here, the neighbours are
picious behaviour, vehicle tracking, traffic analysis, restricted to those within the current region, so if a pixel lies
motion detection etc). at an edge of the current region, it will be considered to have
5. Machine vision applications, e.g. counting the number of 30 fewer than eight neighbours.
(possibly unknown) objects on a conveyor belt etc. A two dimensional 64x64 bin (i.e. 4096 bin) histogram is
6. Medical image segmentation and diagnosis e.g. cell detec built up a so-called Markov model of the region.
tion. The histogram is built up as follows.
7. Analysis of aerial photographs, for example segmentation For each pixel, its own quantised pixel value (in the
into different homogeneous regions classification of the 35 64-value range, numbered according to a predetermined
regions into different land uses. order as values 1 to 64) forms a position along one axis of the
The processing described so far has provided the partition histogram. Each neighbour-pixel value, again expressed in
ing of an image into respective image segments. Now, in order the range 1 to 64, forms a position along the other axis of the
to be able to compare the segments with one another or with histogram. Accordingly, for a particular centre pixel, there
other reference segments (or data generated in respect of 40 could be up to eight different bins identified by the corre
generic reference segments), it is appropriate to derive feature sponding neighbour pixel values. Each of these bins, repre
data (e.g. a so-called “feature vector”) from each segment. A senting a respective permutation of pixel and neighbour pixel
technique for deriving a feature vector from each image seg properties, is incremented. More generally, each bin repre
ment will now be described. sents a permutation of properties within a contiguous pixel
Accordingly, the following description can relate to pixels 45 group, which in the present embodiment comprises two pix
within a segment as identified by the process above. Alterna els but could have more. In one view, the Markov model could
tively, the following process can be applied to an entire image. be said to represent the texture of the image segment.
That is to say, although it is particularly useful in the context The process then repeats for the centre pixel value of a next
of a segmented image, it is separately applicable without pixel and its eight neighbour pixel values. Over the whole
necessarily requiring an image to be segmented. 50 region under consideration, this will populate a 64x64 bin
FIG. 8 is a schematic flowchart illustrating the generation two dimensional histogram.
of a feature vector. To avoid any confusion over the nature of FIG.11 schematically illustrates the generation of a feature
FIG. 8, the flowchart includes four steps (steps 405, 415,425 histogram, but with reference to the heavily simplified situa
and 435). In between those steps, the respective inputs/out tion of only 3 (instead of 64) quantised RGB values, 0, 1 or 2.
puts are schematically illustrated (as data 400, 410, 420, 430 55 Within a local image area 450, a 3x3 pixel scan window 455
and 440). is arranged around a centre pixel 460.
The process starts with a region 402 (identified as The colour of the centre pixel (within the 3-value colour
described above) in an input image 400. As mentioned above, space in this simplified example) defines a row in the histo
the process which will be described is applicable to an entire gram at the bottom of FIG. 11. The colour of each neighbour
image or to a region within an image. 60 ing pixel then defines a column. Where the row and the
First, at the step 405, the colour properties of the pixels columns intersect, the bin is incremented. In the example
within a segment are quantised to four difference values for shown, the scan windows centre pixel has colour index 1. It
each of the R,G,B colour properties (Crand Cb are not used has a total of 8 neighbouring pixels, from which 5 are colour
in this process). index 0, 2 are colour index 1 and 1 is colour index 2. This
Four values of three variables give 64 possible levels. A 65 results in that the Markov model is increased by 5 in the binat
schematic representation of a 64-level colour cube is shown in row 1 and column 0, 2 in the binat row 1 and column 1 and 1
FIG.9. Here, the black dots (many of which are not shown, for in the bin at row 1 and column 2.
US 9,269,105 B2
10
The histogram is then normalised. This could be carried out distance between their feature vectors is established. A lower
on a region-by-region basis and/or across the group of histo distance implies a greater similarity.
grams relating to an entire image. The normalisation process An example of the use of this technique is for a user to
is such that the Sum of all the values in one row is equal to 1. select one or more segments from an image such as the
08 segmented image of FIG. 2B. For example, the user might
Reference is made to the following normalisation equations
Tanmay Sethi
which refer to simplified 3x3 example of the drawings: select the segment labelled as 75. A feature vector is derived
from that segment and is compared with feature vectors from
other segments within the same image and feature vectors
2 from segments in other images (i.e. in a database to detect
Xxo, 10 similar image segments). Note that the normalisation process
i=0 means that image segments of different sizes can still be
X0.0 x0, x0.2 1 2 X0 detected to be similar if their colour properties are similar.
X1.0 v1.1 x 1.2 - 1 = X v1. X The feature vectors for all of the segments could be gener
=0
X2.0 v2.1 v2.2 || 1 f
2
X2 ated in advance, or could be generated as needed. In a hybrid
09 15 approach, where a database of images is held, feature vectors
Xi=0 v2. could be generated in advance for the stored images. When a
Tanmay Sethi new image is to be compared with the database, a feature
vector is generated from that image alone (or from a segment
Woo Wo. WO2 of that image).
10 W0 W0 W0
V1.0 V1.1 V12
If the user selects more than one segment, there are various
Tanmay Sethi Markoy Model - - - - - - -
W W W
different ways of dealing with this. The segments could be
W2.0 W.2. W.2.2
treated individually and a corresponding set of results (similar
segments) could be derived for each of the selected segments.
Alternatively, the user-selected segments could be treated in
25 combination, so that the distances between the user-selected
A schematic representation of example normalised segments and a segment under test are combined, and a set of
Markov model data is provided as the data 420 in FIG. 8. results is derived for which the combined distance is the
A feature vector is then generated at the step 425. lowest. The combination of distances from two (or more)
The feature vector is generated by concatenating all 64 user-selected segments to a segment under test is usually done
values in the 2 dimensional normalised Markov histogram 30 by simply multiplying the two or more distances. It is also
(corresponding to the image or to a particular image region) to possible to allow the distances to relate to more than one
form a 4096 value vector. The concatenation takes place segment under test, as long as all the segments under test
according to a predetermined, though arbitrary, order. A sche belong to the same image. In this case, the Smallest distance of
matic example of such a feature vector, referred to as a “raw any segment in the test image to each user-selected segment is
feature vector, is provided as the data 430 in FIG.8. Note that
35 used in the multiplication. The system then returns the test
not all 4096 values have been shown, for clarity of the dia image that has the Smallest overall distance.
gram.
A modification providing an improvement to the above
technique will now be described. In some circumstances, the
Then, the 4096 value vector is reduced to a 200-value use of a small number of quantisation levels (64 quantisation
vector at the step 435 by either principal components analysis 40 levels) means that the boundaries between the quantisation
or random mapping. The random mapping technique is well levels are too sharp. A Small change in colour properties can
described elsewhere, and involves multiplying the 4096 value lead to a dramatic change in the quantisation level assigned to
vector by a 200x4096 random matrix, which may be stored in a pixel. So, for example, a slightly brighter sky region could
a pre-set form for use in this technique. FIG. 12 provides a lead to a very poor correlation with other sky regions because
schematic representation of this random mapping process, in 45 of the change in quantised level for that region.
which a 1xN vector is reduced to a 1xM vector by matrix A solution to this feature is to assign contributions from
multiplying the 1xN vector by an NxM matrix of random each neighbour pixel in FIG. 11 to the various bins. So, each
numbers. Previous published work has shown that although centre pixel value (there will be several contributions to the
the resulting vectors may be much shorter, their dot product centre pixel value) is set on the vertical axis as before, but for
remains substantially the same after the random mapping 50 each neighbour pixel, non-integer contributions are added to
process. multiple bins in dependence upon the distance between that
The output of the random mapping process is a 200-value neighbour-pixel (in colour space) and the nearest quantisa
“reduced' feature vector indicative of the colour properties of tion levels. An example of Such an arrangement will now be
the image or each image segment. A schematic example is described with reference to FIGS. 13 and 14.
shown as the data 440 in FIG. 8, but once again not all 200 55 The main difference provided by the arrangement of FIGS.
values have been shown, for clarity of the diagram. The main 13 and 14 is a reduction of the effect of quantisation errors.
point of this schematic representation is to recognise that the The changes affect the colour quantisation step 405 and the
reduced feature vector is shorter than the raw feature vector. Markov model generation step 415. The following describes
It will of course be appreciated that the reduced feature how the weighting affects the previously described algo
vector need not be 200 values in length. This number was 60 rithms.
selected as a reasonable compromise between the require The 24 bit RGB space is divided in the same way as
ment to be short (for ease of storage and other processing)and described above. But instead of quantising a particular pixel
the requirement to have a sufficient length to be accurately colour to one palette colour, it is quantised to several palette
representative of the colour properties. But other lengths colours and the contribution to each palette colour is
could be used. 65 recorded.
In order to compare two image segments (or at least to FIG. 13 schematically illustrates a sub-group of the quan
compare derived feature data with reference feature data), the tised points of FIG. 9. Each point C1 . . . C12 represents a
US 9,269,105 B2
11 12
quantised RGB colour in the 3-dimensional RGB space. A area 620 and a touch sensitive area 630 providing user con
real (unduantised) pixel would be unlikely to fall directly on trols; along with data processing and storage (not shown).
one of the quantised points, but would generally fall in Again, the skilled man will be aware of alternatives in this
between quantised points. In FIG. 13, a pixel under consid field. The PDA may be used as described above in connection
eration falls at a position 480 in the RGB space. The distance, with the system of FIG. 1.
in RGB space, between the pixel position 480 and a quantised The feature vectors derived by the above process could be
point Cn is shown as Ln. used to train and/or populate a self-organising map for dis
11 It could be said that the real colour consists of contributions play, such as a map disclosed in GB-A-2393 275.
from the nearest palette colours. To calculate the contribu Images or material could be classified by grouping together
Tanmay Sethi tions from each palette colour, take the maximum distance 10 into a classification all images or segments having a feature
Dmax (illustrated as a line between the quantisation point C4 vector within a threshold distance of a classification feature
and the quantisation point C10 in FIG. 13, representing the vector. The classification feature vector can be set analytically
maximum distance within the quantisation grid) and Subtract or could be derived as the average feature vector of a set of
the distance between the palette colour and the real colour. images or segments which the user has decided are linked by
This generates a respective weight value win. 15 a common concept (e.g. beach views).
If the weight wine-0 the palette colour is used in the follow In Summary of the feature vector generation technique, a
12 ing process; otherwise it is discarded or set to Zero. colour based feature vector is provided that is rotation, trans
For each real colour there is therefore generated a collec lation and scale invariant. The feature vector can be used to
Tanmay Sethi tion of contributing palette colours and their weights. Each search image databases using all or part of an image. The
Such collection is normalised so that the Sum of each collec feature vector is based on a Markov model that describes the
tions weights is one. colour transitions within the image. The feature vector is
The Markov model is in principle generated the same way based on a Markov model that describes the colour transitions
as described above, but instead of having one palette colour within the image. To enable the use of a Markov model on the
representing each pixel, there is now a collection of palette colour property a technique has been developed to quantise
colours with weights representing each pixel. 25 the RGB colour space to a palette colour space, which repre
This situation is represented schematically in FIG. 14 sents the states in the Markov model.
which illustrates a set of weights w0, w1 and w2 for a centre A further application of the above techniques will now be
pixel 460' and the surrounding 8 pixels in a scan window 455". described with reference to FIGS. 17 to 19.
AS before, a palette (quantised colour space) of just three FIG. 17 schematically illustrates a network-based shop
colours is used. 30 ping arrangement in which a user's data processing apparatus
When determining the contribution of neighbouring pixels or "client 700 is connected, via a network 710 (such as a
to the model, a number of cells in the two dimensional histo local area network or more usually the internet, to a video
gram are affected instead of just one. The appropriate contri server 720 and an online shopping server 740.
bution to the histogram is found by multiplying the column The apparatus also comprises: a feature data detector 730,
vector of weights for the centre pixel by a row vector of 35 comparison logic 750 and a reference feature data store 760.
weights for the neighbouring pixels to give a matrix with the Note that the division of functionality into the three areas of
same dimension as the Markov model matrix. Each of those the client, the video server and the online shop is generally
matrices are added to the model matrix. arbitrary and in most instances technically unimportant. In
For example, consider the transition between the centre particular, the units 730, 750 and 760 may each be imple
13 pixel 460' and one of the neighbouring pixels n' in FIG. 15, 40 mented as a part of the client, the video server or the online
08/06/19 it can be seen that for each such relation there are 9 contribu shop. Furthermore, the video server and the client could be
Tanmay Sethi tions to the Markov model. Note that the schematic example implemented as a single client processor. The particular
here has a colour palette with 3 colours and a Markov model arrangement shown in FIG. 17 is provided simply for ease of
of size 3x3, therefore 9 contributions. The real application has explanation.
a colour palette with 64 colours and a Markov matrix of size 45 Reference will also be made in the following description to
64x64, and the contribution for each neighbour is a matrix of FIG. 18, which shows a schematic user selection display at
size 64x64 (if colours with weight win-O are set to zero–see the client 700, and to a schematic flowchart shown in FIG. 19.
above) or up to 64x64 (if such colours are discarded) Once again the flowchart relates to the particular example
As previously, when the whole region has been scanned, arrangement of FIG. 17.
the Markov model matrix is normalised so the sum of each 50 The video server ingests video data (a step 900), which
row is equal to one. involves uploading video data onto a video data store at (or
FIG. 15 schematically illustrates a camcorder 500 as an associated with) the video data server. At a step 910 the video
example of a video acquisition and/or processing apparatus, server partitions each (or a Subset) of the video images into
the camcorder including an image capture device 510 with an segments using techniques described above. At a step 920,
associated lens 520; a data/signal processor 530; tape storage 55 feature data is derived by the feature data detector 730 and is
540; disk or other random access storage 550; user controls associated with data defining which segment (for display) has
560; and a display device 570 with eyepiece 580. Other fea that feature data—a process referred to in the flowchart as
tures of conventional camcorders or other alternatives (such “identifying the feature data. In other words, each segment
as different storage media or different display Screen arrange (or some of the segments) within an image has associated
ments) will be apparent to the skilled man. In use, metadata 60 feature data, and the segment and the feature data are associ
relating to captured video material may be stored on the ated together by identifying data.
storage 550, and an SOM relating to the stored data viewed on Clearly, if the video data has been generated by capture
the display device 570 and controlled as described above apparatus such as that shown in FIG. 15, or if it has been
using the user controls 560. pre-processed to derive segments and/or feature data, then not
FIG. 16 schematically illustrates a personal digital assis 65 all of the steps 910 and 920 may be needed. Similarly, there is
tant (PDA) 600, as an example of portable data processing no technical necessity to store the video data at the video
apparatus, having a display Screen 610 including a display server; the video server could operate substantially in real
US 9,269,105 B2
13 14
time to forward video data onto the client. Further, the feature by the client (e.g. in a sub-window of a display screen) and—
data (and even the segmentation) need not be pre-derived; it if desired the user can execute a purchase (at a step 970) in
or they could be derived as required when the user selects a the usual way—e.g. by operating (possibly with the pointing
segment at the client (see below). device) a purchase control Such as a purchase menu item or
14-15
At a step 930, the video images are displayed at the client. 5 control button.
2 notes:
An example of a displayed image is shown in FIG. 18. Here The system could be administered by the online shop or by
two image segments are indicated by reference numerals: a a third party. In either instance, the (or another) online shop
segment 800 relating to the rear of a car, and a segment 810 provider can be invited to Supply data in the form of images
relating to a person’s trousers. The segments identified by the (or video) and associated hyperlinks or shopping informa
segmentation process (the step 910) could be identified on the 10 tion, optionally with associated reference feature data. This
display (e.g. by an outline, or by a cursor changing in nature could be done in return for payment to the system adminis
in a similar manner to a cursor passing over a hyperlink in trator—either in advance, or in response to the user selection
Microsoft(R) Internet ExplorerTM) or alternatively identifica of an item relating to that shop (so-called "click through”
tion need not be provided. payment), or when the user makes a purchase, or any combi
The user has a pointing device Such as a mouse, thumb- 15 nation of these.
wheel, touch sensitive screen or the like. The user is able to It will be appreciated that these techniques need not be
point to an area on the displayed image, in order to select one applied only to video images—a series of one or more still
of the segments or partitions of the image (a step 940). A images could be used instead.
selection control Such as a mouse button may be used to The invention claimed is:
indicate an area to be selected. If the user's point (and/or 20 1. An image processing method by a client device, com
click) falls within one of the partitioned segments (whether or prising the steps of:
not the partitioning process has already taken place) the fea partitioning an image under test to form a plurality of
ture data associated with that segment is passed to the com image segments, each segment representing a set of
parison logic 750. Here, at a step 950, that feature data is pixels having similar image properties;
compared with reference feature data stored in the reference 25 deriving feature data from a Subset comprising one or more
feature data Store 760. of the image segments;
The derivation of the reference feature data will now be transmitting at least the feature data to a server device;
described. receiving, from the server device, data indicative of a pur
One example of this process is shown Schematically in chasable article or service associated with the reference
steps 980 and 990 of FIG. 19, in that images are analysed and 30 feature data detected to be most similar to the feature
reference feature data is derived. As regards which images data of one or more of the image segments; and
should be analysed, first it is appropriate to consider the displaying purchase details of the article or service.
purpose of the reference feature data. The intention is that the 2. The method according to claim 1, in which:
reference feature data is generically indicative of image seg the reference feature data is indicative of features of
ments representing a certain type of product. An example is 35 respective reference image segments;
that reference feature data which should (ideally) compare the reference image segments are respective images seg
well with feature data for the segment 800 would be indicative ments of one or more other images; and
of automobiles in general, or saloon automobiles in particu the comparing step is operable to detect a degree of simi
lar, or even a certain colour or style of saloon automobiles. larity between the image under test and the one or more
Accordingly, one way to generate the reference feature data 40 other images.
for Such an instance is to analyse (segment and derive feature 3. The method according to claim 1, in which at least some
data from) one or more images representing Such a class of of the segments are contiguous.
products. It may be that more generic results can be obtained 4. The method according to claim 1, further comprising the
by averaging the results from an analysis of plural Such steps of:
images. An alternative way is to generate the reference feature 45 accessing reference feature data indicative of a plurality of
data analytically, i.e. by deciding what are the likely image reference image segments, each reference image seg
properties of a segment relating to a part of an automobile, ment having an associated data item;
and then establishing what feature data Such properties would detecting user selection of an image segment derived from
represent. In general, multiple examples of reference feature the image under test;
data, each associated with a respective product or service 50 accessing the data itemassociated with the reference image
type, are held by the reference feature data store 760. data detected to be most similar to the image segment
In either case, a correspondence between the detected fea selected.
ture data of the image segment selected by the user and an 5. The method according to claim 4, in which:
instance of the reference feature data is detected by the com the associated data items are hyperlinks; and
parison logic 750 at the step 950. Here, the best match could 55 the step of accessing the associated data item comprises
be selected, or a threshold “goodness of match” could be accessing the hyperlink target associated with that
applied so that only matches of at least a certain statistical hyperlink.
significance would be allowed, all using known matching 6. The method according to claim 4, in which the step of
techniques. accessing the associated data item comprises providing to the
Where a match is found, a type of product or service (and 60 user data indicative of a purchase price of an article or service
possibly an internet address or hyperlink corresponding to associated with that reference image segment.
that product) associated with the matching reference feature 7. The method according to claim 6, further comprising the
data is passed to the online shop. Either a shop portal, or a step of:
particular item or items for sale, or both, is passed to the client detecting user operation of a purchase control indicating
700 at a step 960 in accordance with the targets of the 65 that the user wishes to purchase the article or service.
hyperlink(s) mentioned above if they are used. Details of the 8. The method according to claim 1, in which the feature
shop and/or items (e.g. their identity and price) are displayed data is indicative of at least colour properties.
US 9,269,105 B2
15 16
9. The method according to claim 1, in which the feature derive feature data from a Subset comprising one or more of
data is indicative of at least image texture properties. the image segments;
10. An image processing method by a server device, com transmit at least the feature data to a server device;
prising the steps of: receive, from the server device, data indicative of a pur
receiving data from a client device indicative of feature 5 chasable article or service associated with the reference
data derived from a Subset comprising one or more feature data detected to be most similar to the feature
image segments of an image, each segment representing data of one or more of the image segments; and
a set of pixels having similar image properties; display purchase details of the article or service.
comparing the feature data from the Subset of image seg 14. The apparatus according to claim 13, further compris
ments with reference feature data so as to detect a degree
10 ing a camera for capturing an image under test.
of similarity between the feature data for the image 15. An image processing apparatus comprising:
a processor configured to:
segments under test and the reference feature data, in receive data from a client device indicative of feature data
which the reference feature data or corresponding refer derived from a Subset comprising one or more image
ence images have associated data items; and 15 segments of an image, each segment representing a set
transmitting, to the client device, data indicative a purchas of pixels having similar image properties;
able article or service associated with that reference compare the feature data from the Subset of image seg
feature data detected to be most similar to the feature ments with reference feature data so as to detect a degree
data of one or more of the image segments. of similarity between the feature data for the image
11. An image processing method comprising the steps of segments under test and the reference feature data, in
by a client device: which the reference feature data or corresponding refer
partitioning an image under test to form a plurality of ence images have associated data items; and
image segments, each segment representing a set of transmit, to the client device, data indicative a purchasable
pixels having similar image properties; article or service associated with that reference feature
deriving feature data from a Subset comprising one or more 25 data detected to be most similar to the feature data of one
of the image segments; and or more of the image segments.
transmitting at least the feature data to a server device, and 16. An image processing system comprising a client device
by the server device: and a server device,
comparing the feature data from the Subset of image seg the client device having a processor configured to:
ments with reference feature data so as to detect a degree 30 partition an image under test to form a plurality of image
of similarity between the feature data for the image segments, each segment representing a set of pixels hav
segments under test and the reference feature data, in ing similar image properties:
which the reference feature data or corresponding refer derive feature data from a Subset comprising one or more of
ence images have associated data items; the image segments; and
accessing the associated data item associated with the ref 35 transmit at least the feature data to the server device, and
erence feature data detected to be most similar to the the server device having a processor configured to:
feature data of an image segment; and compare the feature data from the Subset of image seg
providing to the client device data indicative of a purchase ments with reference feature data so as to detect a degree
price of an article or service associated with that refer of similarity between the feature data for the image
ence feature data detected to be most similar to the 40 segments under test and the reference feature data, in
feature data of one or more of the image segments. which the reference feature data or corresponding refer
12. An image processing method comprising the steps of ence images have associated data items;
by a client device and a server device connected via an access the associated data item associated with the refer
interface cooperating to: ence feature data detected to be most similar to the
partition an image under test to form a plurality of image 45 feature data of an image segment; and
segments, each segment representing a set of pixels hav provide to the client device data indicative of a purchase
ing similar image properties; price of an article or service associated with that refer
derive feature data from a Subset comprising one or more of ence feature data detected to be most similar to the
the image segments; feature data of one or more of the image segments.
compare the feature data from the Subset of image seg 50 17. An image processing system comprising a client device
ments with reference feature data so as to detect a degree and a server device, each having a respective processor, the
of similarity between the feature data for the image processors of the client device and the server device being
segments under test and the reference feature data, in configured to cooperate to:
which the reference feature data or corresponding refer partition an image under test to form a plurality of image
ence images have associated data items; 55 segments, each segment representing a set of pixels hav
access the associated data item associated with the refer ing similar image properties;
ence feature data detected to be most similar to the derive feature data from a Subset comprising one or more of
feature data of an image segment; and the image segments;
display purchase details of an article or service associated compare the feature data from the Subset of image seg
with that reference feature data detected to be most 60 ments with reference feature data so as to detect a degree
similar to the feature data of one or more of the image of similarity between the feature data for the image
Segments. segments under test and the reference feature data, in
13. An image processing apparatus comprising a processor which the reference feature data or corresponding refer
configured to: ence images have associated data items;
partition an image under test to form a plurality of image 65 access the associated data item associated with the refer
segments, each segment representing a set of pixels hav ence feature data detected to be most similar to the
ing similar image properties; feature data of an image segment; and
US 9,269,105 B2
17 18
display purchase details of an article or service associated server computer connected by an interface, causes the client
with that reference feature data detected to be most computer and the server computer to cooperate to:
similar to the feature data of one or more of the image partition an image under test to form a plurality of image
Segments. segments, each segment representing a set of pixels hav
18. A non-transitory machine readable storage medium ing similar image properties;
storing program code which, when executed by a client com derive feature data from a Subset comprising one or more of
the image segments;
puter, causes the computer to: compare the feature data from the Subset of image seg
partition an image under test to form a plurality of image ments with reference feature data so as to detect a degree
segments, each segment representing a set of pixels hav 10 of similarity between the feature data for the image
ing similar image properties; segments under test and the reference feature data, in
derive feature data from a Subset comprising one or more of which the reference feature data or corresponding refer
the image segments; ence images have associated data items;
transmit at least the feature data to a server device; access the associated data item associated with the refer
receive, from the server device, data indicative of a pur ence feature data detected to be most similar to the
15
chasable article or service associated with the reference feature data of an image segment; and
feature data detected to be most similar to the feature display purchase details of an article or service associated
data of one or more of the image segments; and with that reference feature data detected to be most
display purchase details of the article or service. similar to the feature data of one or more of the image
19. A non-transitory machine readable storage medium Segments.
storing program code which, when executed by a client and a
Annotations