
Emotion Triggered H.264 Compression: This Chapter Describes

This chapter describes H.264 video compression and the impact of quantization step size on output video quality. It discusses how compression removes redundancy between frames to reduce file size. Quantization step size is an important coding parameter - larger sizes lead to higher compression but lower quality, while smaller sizes improve quality at the cost of less compression. The chapter shows output frames from sample videos compressed with different quantization step sizes and their corresponding PSNR and bitrate values, demonstrating the quality-compression tradeoff.

Uploaded by

Rajshree Mandal
Copyright
© Attribution Non-Commercial (BY-NC)

CHAPTER 2

Emotion Triggered H.264 Compression

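This chapter describes H.264 video compression, the subjective and objective measurement of compressed video quality, the effect of the quantization step size on that quality, and an emotion-based adaptive QP setting that preserves facial-expression information during encoding.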


1.1 Introduction
A video is essentially a large number of still images (frames) combined together. When these frames are played back sequentially, we perceive them as a video. Transferring or storing a raw video file may be the best solution in terms of quality, but as transmission and storage costs grow, it may not be the best solution in terms of cost. It therefore becomes necessary to develop a cost-effective compression technique for this purpose.

To understand the importance of compression, consider the videos shown in the figure below. There is a large amount of redundancy between consecutive frames of these raw video files. The main aim of any compression technique is to remove the redundancy present in a raw video signal, thereby reducing the number of bits required to represent the raw video.

Video 1: suzie_qcif.yuv (add 2 more videos)

Many standards are available today for compressing raw visual data, such as MPEG-1 and MPEG-2 for video and JPEG and JPEG2000 for still images. A standard does not define the encoding process itself. Rather, it defines the syntax in which the video is represented in its compressed form, and the method for decoding the compressed data to obtain the output. It also ensures that encoders and decoders are interoperable: a raw video encoded by a compliant encoder can be successfully decoded by any compliant decoder.

Video compression standards such as MPEG-1 and MPEG-2 were developed by the Moving Picture Experts Group (MPEG). They are now widely used for the communication and storage of digital video. The widely used JPEG and JPEG2000 standards for coding still images were developed by the Joint Photographic Experts Group (JPEG). The ITU-T Video Coding Experts Group (VCEG) developed the H.263+ standard. The latest development, by the Joint Video Team (JVT) of ITU-T VCEG and ISO/IEC MPEG, is the H.264 standard (also published as MPEG-4 Part 10, Advanced Video Coding), which is widely used today. Its main advantage over the previous standards is improved compression efficiency, including for low bit-rate encoding of video sequences.

1.2 Quality Measure of an H.264 Compressed Video


From the previous discussion, it is clear that compression is essential for transferring or storing a raw digital video. On one hand, it offers a large advantage in terms of reduced storage space, or fewer bits during transmission. On the other hand, some of the original quality of the video signal is lost. In any communication system comprising an encoder and a decoder, it is therefore equally essential to evaluate the quality of the video at the output of the decoder against the video at the input of the encoder.

Measuring the visual quality of the video at the output of the decoder is not an easy task, as many factors are involved. One important factor is the viewer's state of mind, or his/her personal opinion about the quality. Another is the kind of video being encoded. For example, a person watching a football match may not pay attention to the details of the spectators, but the same person may look closely at the facial expressions of a newsreader on TV. The purpose of the encoding also affects the quality measure: for example, viewers may expect a high-quality output from a videoconferencing or a surveillance video scene.

1.2.1 Subjective Quality Measurement


Subjective quality measurement also depends on many factors. How a viewer judges a video scene depends on his/her perception, which is mainly governed by the Human Visual System (HVS). Other factors include how much the viewer interacts with the output video and the environment in which he/she watches it. Different viewers may also hold different opinions about the quality of the same video: a video of 'average' quality to one viewer may seem 'good' to someone else.

Keeping all these factors in mind, a commonly used procedure for the subjective quality assessment of video is the Double Stimulus Continuous Quality Scale (DSCQS) method. It is defined in ITU-R Recommendation BT.500. The experimental setup for the procedure is shown in Figure 2.

[Figure 2 block diagram: original video sequence, encoder, decoder, display]
Figure 2. Double Stimulus Continuous Quality Scale (DSCQS) method

In this procedure, the viewer is shown two versions of the same video sequence. One version (version A) is the original, or reference, video. The other (version B) is the encoded and decoded video. The two versions are presented in random order, and the viewer rates the quality of each on a continuous scale ranging from 'Excellent' to 'Bad'. Many such pairs of videos are shown to the viewer to arrive at a final assessment of the quality achieved by the encoder and decoder.

1.2.2 Objective Quality Measurement


Although subjective quality measurement reflects the real experience of viewers, it is both time-consuming and expensive. Objective quality measurement is therefore adopted more often. The most widely used objective quality measure is the Peak Signal to Noise Ratio (PSNR).

PSNR is measured on a logarithmic scale and is given by

$$\mathrm{PSNR} = 10\log_{10}\frac{(2^n-1)^2}{\mathrm{MSE}}\ \text{dB}$$

Here, MSE (Mean Square Error) is calculated between the original video and the reconstructed video at the output of the decoder. n is the number of bits per image sample, so for 8-bit samples (2^n − 1)^2 = 255^2. PSNR is a convenient measure of the quality of a reconstructed video sequence. However, its calculation requires the original video sequence, which may not always be available.
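The PSNR formula can be checked with a short sketch. It assumes both inputs are equally sized sequences of integer samples, for example one flattened luma frame each. It returns infinity for identical inputs, where the MSE is zero. The function name `psnr` and its parameters are our own, not part of any codec API.

```python
import math
from typing import Sequence

def psnr(original: Sequence[int], decoded: Sequence[int], bits_per_sample: int = 8) -> float:
    """PSNR in dB between two equally sized sample sequences, e.g. one flattened frame each."""
    if len(original) != len(decoded) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean square error between original and reconstructed samples
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf                      # identical inputs: PSNR is unbounded
    peak = (2 ** bits_per_sample - 1) ** 2   # (2^n - 1)^2
    return 10 * math.log10(peak / mse)
```

For a video sequence, the per-frame values are usually averaged, as in the average PSNR plots for "akiyo_orig.yuv" in this chapter. This sketch computes the value for a single frame.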

1.3 Effect of Quantization Step Size on the Output Video Quality


The encoding and decoding procedure of the H.264 standard was introduced in the first chapter. Some coding parameters affect the quality of the encoded video sequence. The quantization step size is one such parameter, and it plays an important role in the quality of the reconstructed video. A large step size gives high compression at the cost of poor video quality. A small step size gives good video quality, but the price is a less compressed output. In H.264 the step size is not specified directly. It is selected through the quantization parameter (QP), an integer from 0 to 51, and the step size doubles for every increase of 6 in QP. The step sizes quoted in the examples below are these QP values. Some video sequences are shown below with varying quantization step size, together with its effect on the visual quality.
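The relation between QP and the quantization step size can be sketched as follows. The six base values are the nominal step sizes commonly quoted for QP 0 to 5 in descriptions of H.264. Real encoders implement quantization with integer multipliers and shifts rather than with this floating-point value, so the sketch only shows how the step size scales with QP. The names `BASE_QSTEP` and `qstep` are ours.

```python
# Nominal step sizes for QP 0..5; each further increase of 6 in QP doubles the step.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)

def qstep(qp: int) -> float:
    """Nominal H.264 quantization step size for a luma QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("H.264 QP must lie in 0..51")
    return BASE_QSTEP[qp % 6] * 2 ** (qp // 6)
```

For instance, the QP values 1 and 25 used in this chapter correspond to nominal step sizes of 0.6875 and 11, so the step size grows sixteen-fold between them.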
Video Sequence 1: akiyo_orig.yuv

Original:
[Frames 1 to 18]

Encoded with Quantization Step Size: 1
[Frames 1 to 18]
PSNR: 61 dB, Bit rate: 3008 kbps

Encoded with Quantization Step Size: 5
[Frames 1 to 18]
PSNR: 56 dB, Bit rate: 2320 kbps

Encoded with Quantization Step Size: 15
[Frames 1 to 18]
PSNR: 49 dB, Bit rate: 779 kbps

Encoded with Quantization Step Size: 25
[Frames 1 to 18]
PSNR: 42 dB, Bit rate: 195 kbps

Encoded with Quantization Step Size: 35
[Frames 1 to 18]
PSNR: 36 dB, Bit rate: 54 kbps

Encoded with Quantization Step Size: 45
[Frames 1 to 18]
PSNR: 30 dB, Bit rate: 22 kbps

Encoded with Quantization Step Size: 50
[Frames 1 to 18]
PSNR: 26 dB, Bit rate: 14 kbps


[Plot: average PSNR in dB (25 to 65) versus quantization step size (0 to 50)]

Figure 4: Plot of PSNR versus Quantization Step Size for “akiyo_orig.yuv” sequence

[Plot: average PSNR in dB (25 to 65) versus bit rate in kbps (0 to 3500)]

Figure 5: Plot of PSNR versus BitRate for “akiyo_orig.yuv” sequence
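The trade-off shown in the two plots can be quantified directly from the PSNR and bit-rate values reported above for "akiyo_orig.yuv". The short sketch below expresses each setting relative to the step size 1 (QP 1) encoding, as a bit-rate reduction factor and a PSNR loss. The helper name `tradeoff` is ours.

```python
from typing import List, Tuple

# (step size / QP, average PSNR in dB, bit rate in kbps) as reported above for akiyo_orig.yuv
RESULTS: List[Tuple[int, int, int]] = [
    (1, 61, 3008), (5, 56, 2320), (15, 49, 779), (25, 42, 195),
    (35, 36, 54), (45, 30, 22), (50, 26, 14),
]

def tradeoff(results: List[Tuple[int, int, int]]) -> List[Tuple[int, float, int]]:
    """Per setting: (QP, bit-rate reduction factor, PSNR loss in dB) relative to the first entry."""
    _, psnr0, rate0 = results[0]
    return [(qp, rate0 / rate, psnr0 - psnr) for qp, psnr, rate in results]

for qp, factor, loss in tradeoff(RESULTS):
    print(f"QP {qp:2d}: bit rate / {factor:6.1f}, PSNR -{loss} dB")
```

For example, going from step size 1 to step size 25 divides the bit rate by about 15 and costs 19 dB of PSNR.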


1.3.1 Concept of Fixed QP
In the H.264 standard, each frame is coded in units of macroblocks. Macroblocks are formed by dividing a video frame into non-overlapping regions of 16×16 pixels. The previous examples of the “akiyo_orig.yuv” sequence show that the output video may be of good or poor quality, depending on the QP fixed while encoding the sequence.
Let us consider the video shown in Figure 7(a). This video sequence contains a lot of emotional information.

[Frames 1 to 18]

We encode the video with QP = 1 and QP = 25 (Figures 7(b) and 7(c)).

[Frames 1 to 18]

Figure 7: a. Original video sequence b. Encoded with QP = 1 c. Encoded with QP = 25


Clearly, the smaller the QP, the better the output video quality, and vice versa. However, this video sequence carries a lot of emotion. We cannot afford to lose the emotional information in any frame of the video while saving on the number of bits. The main disadvantage of a fixed QP setting is that the entire frame is either of good quality or of poor quality. Preserving the emotional information in every frame is therefore difficult when all the macroblocks are encoded with the same fixed QP. We therefore turn to an adaptive QP setting of the macroblocks.

1.3.2 Adaptive QP setting


In the fixed QP setting, all the macroblocks are encoded with the same QP value. In the adaptive QP setting, different macroblocks are encoded with different QP values. As our goal is to preserve the emotional information present in the video sequence, we use an emotion-based adaptive QP setting, as explained in the next section.

1.4 Emotion Based QP setting for H.264 Encoding


When encoding a video sequence that contains a lot of emotional information (for example, Figure 7(a)), we use an adaptive QP setting for the macroblocks in every frame of the sequence. The main reason is that we want to give more importance to some parts of the frame than to the rest. We have already seen that the smaller the QP value, the better the quality of the output video. So we encode the important regions of the frame with a smaller QP value and the unimportant regions with a comparatively larger QP value. The important regions of a frame are termed the Region of Interest (ROI). The rest of the frame is considered the non-Region of Interest (non-ROI).

Earlier research indicates that the most important regions for recognizing the emotion of a person are the eyes and the lips. So, for the emotion-based QP setting of the macroblocks, the ROI is taken to be the eye and lip regions of the person. The macroblocks belonging to these regions are encoded with a smaller QP value than the rest of the frame. This gives better quality in the ROI, so the emotion of the person is not lost due to compression.
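A minimal sketch of how eye and lip bounding boxes could be turned into a per-macroblock QP map is shown below. It makes three assumptions: ROIs are given as inclusive pixel bounding boxes, macroblocks are numbered in raster-scan order, and any macroblock that touches an ROI gets the smaller QP. The box coordinates and the QP values 20 and 35 in the example are hypothetical, not values from this work. How such a map is passed to a particular encoder is not shown.

```python
from typing import Iterable, List, Tuple

MB = 16  # H.264 macroblock size in luma pixels
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive pixel coordinates

def roi_macroblocks(box: Box, width: int) -> List[int]:
    """Raster-scan indices of every macroblock that overlaps the bounding box."""
    top, left, bottom, right = box
    mbs_per_row = (width + MB - 1) // MB
    return [r * mbs_per_row + c
            for r in range(top // MB, bottom // MB + 1)
            for c in range(left // MB, right // MB + 1)]

def qp_map(width: int, height: int, rois: Iterable[Box], roi_qp: int, other_qp: int) -> List[int]:
    """One QP per macroblock: roi_qp for macroblocks touching any ROI, other_qp elsewhere."""
    mbs_per_row = (width + MB - 1) // MB
    mb_rows = (height + MB - 1) // MB
    qps = [other_qp] * (mbs_per_row * mb_rows)
    for box in rois:
        for mb in roi_macroblocks(box, width):
            qps[mb] = roi_qp
    return qps

# Illustrative QCIF frame (176x144 = 11x9 macroblocks) with hypothetical eye and lip boxes.
eyes, lips = (40, 50, 55, 120), (100, 70, 115, 105)
qps = qp_map(176, 144, [eyes, lips], roi_qp=20, other_qp=35)
```

Snapping each ROI outward to whole macroblocks is a deliberate choice: a partially covered macroblock still contains part of the eye or lip, so it is given the smaller QP.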

1.5 Methodology
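The two main steps involved in emotion-based H.264 encoding are generating a look-up table for the video sequence, described below, and encoding the sequence with the emotion-based QP setting of Section 1.4. The encoding step uses the ROI macroblock numbers stored in that table.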

1.5.1 Look-up Table Generation

A look-up table needs to be generated for all the frames in the video sequence. This table contains information such as the frame number, the emotion expressed in that frame and the numbers of the macroblocks that belong to the different ROI portions of the frame. Figure 8 shows the schematic diagram of look-up table generation for all the frames in a video sequence.

[Figure 8 block diagram: video (frame 1, frame 2, frame 3, ..., frame n), feature extraction, emotion recognition, look-up table]
Figure 8: Schematic diagram for Look-up Table generation for a video sequence
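The look-up table can be represented as one record per frame. The sketch below is an assumed structure, not the author's implementation. `extract_features`, `recognize_emotion` and `locate_rois` are hypothetical callables that stand in for the feature extraction, emotion recognition and ROI localization steps described in this section.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence

@dataclass
class LookupEntry:
    frame: int                                   # 1-based frame number
    emotion: str                                 # emotion recognized in this frame
    roi_macroblocks: Dict[str, List[int]] = field(default_factory=dict)  # e.g. {"eyes": [...], "lips": [...]}

def build_lookup_table(frames: Sequence[Any],
                       extract_features: Callable[[Any], Dict[str, float]],
                       recognize_emotion: Callable[[Dict[str, float]], str],
                       locate_rois: Callable[[Any], Dict[str, List[int]]]) -> List[LookupEntry]:
    """One entry per frame: feature extraction -> emotion recognition -> ROI macroblocks."""
    table = []
    for number, frame in enumerate(frames, start=1):
        features = extract_features(frame)
        table.append(LookupEntry(number, recognize_emotion(features), locate_rois(frame)))
    return table
```

Passing the three steps in as functions keeps the table builder independent of how features are extracted or emotions are classified.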

To recognize the emotion of a person in a frame, a fuzzy face-space is first constructed. It comprises the primary and secondary membership curves of 10 known subjects. Considering 5 emotions (anger, disgust, happiness, relax and fear) and 5 facial features, we obtain 5×5×10 = 250 primary curves and 250 corresponding secondary curves in the fuzzy face-space.

1. Skin region detection

This step separates the skin and non-skin regions of the image. Two parameters, x and y, are computed for each pixel from its red, green and blue components (R, G, B) using the formulae:

$$x = 0.148R - 0.291G + 0.439B + 128$$

$$y = 0.439R - 0.368G - 0.071B + 128$$

The hue H of the pixel is taken from the HSV (Hue-Saturation-Value) color model. A pixel is classified as a skin pixel if its values of x, y and H satisfy the following inequalities:

$$140 \le x \le 195, \qquad 140 \le y \le 165, \qquad 0.01 \le H \le 0.1$$
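A per-pixel sketch of the skin test follows. It assumes x and y are computed from 8-bit R, G, B values, as the 128 offsets and the 140 to 195 thresholds suggest. If H, S and V were normalized to [0, 1], as the hue range 0.01 to 0.1 implies, x and y would stay close to 128 and could never reach the lower bound of 140. It also assumes H is the HSV hue in [0, 1), which is what Python's `colorsys.rgb_to_hsv` returns. Coefficients and signs are copied from the equations above. `is_skin` and `skin_mask` are our own names.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)

def is_skin(r: int, g: int, b: int) -> bool:
    """Apply the x, y and hue thresholds of the skin test to one pixel."""
    x = 0.148 * r - 0.291 * g + 0.439 * b + 128
    y = 0.439 * r - 0.368 * g - 0.071 * b + 128
    hue, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # hue in [0, 1)
    return 140 <= x <= 195 and 140 <= y <= 165 and 0.01 <= hue <= 0.1

def skin_mask(image: List[List[Pixel]]) -> List[List[int]]:
    """Binary mask (1 = skin) for an image stored as rows of RGB tuples."""
    return [[1 if is_skin(*px) else 0 for px in row] for row in image]
```

The resulting binary mask is the kind of skin-segmented image that the face and neck extraction step below takes as input.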

Skin region detection is based purely on color matching. As a result, other parts of the body present in the image are included in the skin regions along with the face and neck. For example, Fig. 2(b), obtained by applying skin separation to Fig. 2(a), contains portions of both hands in addition to the face and neck region. The skin separation procedure may also detect skin-colored regions in the image background whenever the background color values fall within the skin range. These regions, along with the non-facial skin portions, do not contribute to the features required for emotion detection. On the contrary, they may lead to an incorrect inference of emotion. To achieve high accuracy in emotion detection, the unwanted skin regions and glitches need to be removed. This is the purpose of the next step, face and neck region extraction.

2. Face and Neck Region Detection

To filter out the unwanted skin regions, the sum of each column is computed. Windows of non-zero column sums are then marked by grouping adjacent columns that have non-zero column sums. Among these windows, the widest one corresponds to the face and neck portion. The remaining windows are discarded by setting their pixel values to zero. For example, in Fig. 2(b) we obtain three windows. From left to right they are: i) the left hand region window, ii) the face and neck region window and iii) the right hand region window. We select the face and neck region window because it is the widest of the three, so the left and right hand region windows are eliminated. The same operation is then carried out row-wise on the new image to find the largest row window (based on non-zero row sums). In this way, glitches and unnecessary skin patches are eliminated. Applying face and neck region extraction to Fig. 2(b) gives the image shown in Fig. 2(c).

Add steps and make the algo concise

Function: Face_and_neck_region_extraction
Input: Skin segmented image P of m × n pixels.
Output: Face and neck region extracted image P_face of h × w pixels

Begin
  S := ϕ                        // column sums of the skin-segmented image
  For i := 1 to n do
    S(i) := Σ_{k=1..m} P(k, i)
  End For
  count := 0                    // number of non-zero-column-sum windows found
  window_size := 0
  min_col := 1
  For i := 1 to n do Begin
    If S(i) ≠ 0 then
      If window_size = 0 then
        min_col := i            // a new window starts at column i
      End If
      window_size := window_size + 1
    Else If window_size ≠ 0 then
      count := count + 1        // the current window ended at column i - 1
      window(count) := window_size
      min_col_list(count) := min_col
      max_col_list(count) := i - 1
      window_size := 0
    End If
  End For
  If window_size ≠ 0 then       // close a window that reaches column n
    count := count + 1
    window(count) := window_size
    min_col_list(count) := min_col
    max_col_list(count) := n
  End If
  j := index of the largest value in window(1..count)
  minCol := min_col_list(j)
  maxCol := max_col_list(j)
  Set the pixels of all other windows to 0
  Repeat the above steps on the row sums to find the row boundaries minRow and maxRow
  return P_face := P(minRow..maxRow, minCol..maxCol)
End
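The same procedure can be written as a runnable pure-Python sketch on a binary skin mask, given as a list of rows with 1 = skin. Cropping to the selected columns before taking row sums is equivalent to first zeroing the other windows. Indices are 0-based, unlike the pseudocode. When two windows are equally wide, the leftmost (or topmost) one is kept; that tie rule is our choice. Both function names are ours.

```python
from typing import List, Sequence, Tuple

def largest_nonzero_run(sums: Sequence[float]) -> Tuple[int, int]:
    """(first, last) index of the longest run of non-zero values; (0, -1) if none."""
    best = (0, -1)
    start = None
    for i, s in enumerate(list(sums) + [0]):       # trailing 0 closes a run at the end
        if s != 0 and start is None:
            start = i                              # a new window begins
        elif s == 0 and start is not None:
            if i - start > best[1] - best[0] + 1:  # strictly longer: earlier window wins ties
                best = (start, i - 1)
            start = None
    return best

def face_and_neck_region(mask: List[List[int]]) -> List[List[int]]:
    """Keep the widest column window, then the tallest row window inside it."""
    col_sums = [sum(col) for col in zip(*mask)]
    c0, c1 = largest_nonzero_run(col_sums)
    cropped = [row[c0:c1 + 1] for row in mask]     # other column windows are dropped
    row_sums = [sum(row) for row in cropped]
    r0, r1 = largest_nonzero_run(row_sums)
    return cropped[r0:r1 + 1]
```

The sentinel zero appended to the sums plays the role of the "close a window that reaches column n" step in the pseudocode.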

3. Localization of Eye Region Search Area for Left and Right Eye

A sharp change in color is considered as a key to localize the eye region. For example, while
marking the pixel values from the forehead region, the eyebrow region has a distinct color than the
forehead region. In this approach, first the column wise sum value is computed for each of the
columns. The columns which have a sum less than 50% of the maximum column sum are
eliminated out. This narrows down the search region by redefining the minCol and maxCol values.
Next, in order to determine the sharp color changes, we take the row sums enclosed by the column
boundaries (say sum1 to sumh). Then we calculate the gradient between two consecutive row sums.
i.e difference between sums of ith and (i+1)th row. Taking the ten maximum gradients and their
corresponding row values, we notice that the least row value among these gives the row where the
eyebrows are located. In case, the left and right eyebrows are not aligned in the same row, we get
the row value of the upper eyebrow. Once the eyebrows are located, the upper limit of the search
area is defined. The lower boundary of the search area will be limited by upper limit plus half of
the column window width (maxCol – minCol). The location of the left and right boundaries for
both the eyes are as described in the following algorithm:
Add steps and make the algo concise

Function: Search_area_for_eyes()
Input: P_face of h × w pixels
Output: Left_eye_area of a × b pixels; Right_eye_area of c × d pixels
Begin
  For i := 1 to w do
    S(i) := Σ_{k=1..h} P_face(k, i)        // column sums
  End For
  max_col_sum := max(S)
  For i := 1 to w do
    If S(i) ≤ 0.5 × max_col_sum then S(i) := 0 End If
  End For
  minCol := first i with S(i) ≠ 0; maxCol := last i with S(i) ≠ 0
  For j := 1 to h - 1 do                   // row-sum gradient inside the column window
    Z(j) := Σ_{k=minCol..maxCol} P_face(j, k) - Σ_{k=minCol..maxCol} P_face(j + 1, k)
  End For
  A := row indices of the 10 largest values of Z
  minRow := min(A)                         // row location of the (higher) eyebrow
  half := (maxCol - minCol) div 2
  mid := (minCol + maxCol) div 2

  // Search area for the left eye
  minColLeft := mid
  maxColLeft := min(w, mid + round(1.2 × half))
  minRowLeft := max(1, minRow - 10)
  maxRowLeft := min(h, minRow + half)

  // Search area for the right eye
  maxColRight := mid
  minColRight := max(1, mid - round(1.2 × half))
  minRowRight := max(1, minRow - 10)
  maxRowRight := min(h, minRow + half)

  Left_eye_area := P_face(minRowLeft..maxRowLeft, minColLeft..maxColLeft)
  Right_eye_area := P_face(minRowRight..maxRowRight, minColRight..maxColRight)
  return Left_eye_area, Right_eye_area
End
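A sketch of the eye search areas follows. It assumes a single-channel (grey-level) face image stored as a list of rows, with 0-based indices. The 10-row margin above the eyebrow, the 50% column threshold, the ten candidate rows and the 1.2 width factor are taken from the pseudocode. Clipping the areas to the image borders is our addition. The function names are ours.

```python
from typing import List, Tuple

Image = List[List[float]]  # grey-level image stored as a list of rows

def top_k_rows(values: List[float], k: int = 10) -> List[int]:
    """Indices of the k largest values; ties go to the lower index."""
    return sorted(range(len(values)), key=lambda j: (-values[j], j))[:k]

def eye_search_areas(face: Image, k: int = 10) -> Tuple[Image, Image]:
    """Return (left_eye_area, right_eye_area) cropped from a face image."""
    h, w = len(face), len(face[0])
    col_sums = [sum(face[r][c] for r in range(h)) for c in range(w)]
    threshold = 0.5 * max(col_sums)
    kept = [c for c in range(w) if col_sums[c] > threshold]
    if not kept:
        raise ValueError("no column exceeds half of the maximum column sum")
    min_col, max_col = kept[0], kept[-1]
    # Row-sum gradient inside the column window; a large positive value marks a
    # brighter row followed by a darker one (forehead -> eyebrow).
    row_sums = [sum(face[r][min_col:max_col + 1]) for r in range(h)]
    grad = [row_sums[j] - row_sums[j + 1] for j in range(h - 1)]
    eyebrow_row = min(top_k_rows(grad, k))
    half = (max_col - min_col) // 2
    mid = (min_col + max_col) // 2
    span = round(1.2 * half)
    top = max(0, eyebrow_row - 10)
    bottom = min(h - 1, eyebrow_row + half)
    rows = face[top:bottom + 1]
    left = [row[mid:min(w, mid + span + 1)] for row in rows]    # image columns right of centre
    right = [row[max(0, mid - span):mid + 1] for row in rows]   # image columns left of centre
    return left, right
```

"Left" here is the area to the right of the face centre in the image, matching the column ranges in the pseudocode. In a non-mirrored image this is the subject's left eye.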

4. Estimation of EOL, EOR, LEEL, LEER

The features estimated in this step are:
- the left and right eye openings (EOL, EOR);
- the distances from the lower eyelid to the eyebrow for the left and right eyes (LEEL, LEER).

A look at Fig. 2 reveals a sharp change in color when moving from the forehead region to the eyebrow region. To detect the location of the eyebrow, we therefore take the average intensity (in the three primary color planes) over each row of the image from the top. We then identify the row with the largest dip in all three planes. This row indicates the top of the eyebrow region (Fig. 3b). Similarly, we detect the lower eyelid by identifying the row with a sharp dip in intensity in all three planes while scanning the face upward from the bottommost row. The top eyelid is located by scanning upward from the marked lower eyelid until a dip is noted in all three color planes together.
Add steps and make the algo concise

Function: Estimation_of_eye_features()
Input: Left_eye_area (L_image) of a × b pixels
Output: Estimation of EOL and LEEL

Begin
  R(j) := Σ_{k=1..b} L_image(j, k) for j := 1 to a      // row sums of the eye search area
  // Top to bottom: a brighter row followed by a darker row marks the eyebrow
  For j := 1 to a - 1 do
    S(j) := R(j) - R(j + 1)
  End For
  A := row indices of the 10 largest values of S
  Eyebrow_row := min(A)
  // Bottom to top: a darker row above a brighter row marks the lower eyelid
  For j := a - 1 downto 1 do
    S(j) := R(j + 1) - R(j)
  End For
  A := row indices of the 10 largest values of S
  Lower_Eyelid_row := max(A)
  // Top to bottom again, strictly between the eyebrow and the lower eyelid
  For j := Eyebrow_row + 1 to Lower_Eyelid_row - 1 do
    S(j) := R(j) - R(j + 1)
  End For
  A := row indices of the (up to) 10 largest values of S over this range
  Upper_Eyelid_row := min(A)

  // Feature list
  EOL := Lower_Eyelid_row - Upper_Eyelid_row
  LEEL := Lower_Eyelid_row - Eyebrow_row
  return EOL, LEEL
  // EOR and LEER are estimated in the same way, using the right eye search area as input
End
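The three row scans can be sketched as follows for a grey-level eye search area, given as a list of rows with 0-based indices. The number of candidate rows `k` defaults to the 10 used above. It is a parameter here because, on small or synthetic images with flat regions, ties at zero gradient can enter the top-10 set and pull the chosen minimum or maximum row away from the true edge. Raising an error when no row lies between the eyebrow and the lower eyelid is our addition.

```python
from typing import List, Tuple

def _top_k_rows(scored: List[Tuple[int, float]], k: int) -> List[int]:
    """Rows of the k highest scores; ties go to the lower row index."""
    return [row for row, _ in sorted(scored, key=lambda rs: (-rs[1], rs[0]))[:k]]

def eye_features(eye: List[List[float]], k: int = 10) -> Tuple[int, int]:
    """Return (EOL, LEEL) for a left eye search area (use the right area for EOR, LEER)."""
    r = [sum(row) for row in eye]            # row sums
    a = len(r)
    # Top to bottom: a brighter row followed by a darker one marks the eyebrow.
    down = [(j, r[j] - r[j + 1]) for j in range(a - 1)]
    eyebrow = min(_top_k_rows(down, k))
    # Bottom to top: a darker row above a brighter one marks the lower eyelid.
    up = [(j, r[j + 1] - r[j]) for j in range(a - 2, -1, -1)]
    lower_eyelid = max(_top_k_rows(up, k))
    # Top to bottom again, strictly between eyebrow and lower eyelid: upper eyelid.
    inner = [(j, r[j] - r[j + 1]) for j in range(eyebrow + 1, lower_eyelid)]
    if not inner:
        raise ValueError("no rows between the eyebrow and the lower eyelid")
    upper_eyelid = min(_top_k_rows(inner, k))
    return lower_eyelid - upper_eyelid, lower_eyelid - eyebrow
```

Each scan only needs the row sums, so the sketch computes them once and reuses them for all three passes.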

5. Localization of Mouth Opening Search Region


6. Estimation of MO
The results of the above algorithms are shown below for some subjects with different emotions.

[Image table. Columns: Original image | Skin region | Face-neck region | Right eye search area | Left eye search area | Mouth search area | Lip cluster | Facial features extracted]

HAPPY

DISGUST

FEAR

ANGER

RELAX

Add some more subjects (at least 3 more – anisha, basabdatta, annesha)
1.6 Experimental Details and Results
The experiment is conducted with two sets of subjects: a) a first set of 10 subjects (n = 10) used to design the fuzzy face-space, and b) a second set of 30 facial expressions taken from 6 unknown subjects, used to validate the proposed emotion classification scheme. The experiment thus consists of two distinct phases, described in the next two subsections.

a. Creating the Type-2 Fuzzy Face-Space

The type-2 fuzzy face-space contains both primary and secondary membership distributions for each facial feature. To create the primary curves, we consider 10 known subjects. We take ten instances of one subject expressing a given emotion, say anger. The ten values of a given facial feature, say EOL, are measured from these ten snapshots. The mode of these values is found and the second moment about the mode is calculated. A bell-shaped curve is then drawn with its peak at the mode and its standard deviation derived from this second moment. We have 5 facial features, and the experiment includes 5 distinct emotions of 10 subjects, so we obtain 10×5×5 = 250 primary membership curves. These 250 curves are grouped into 25 heads. Each head contains the 10 membership curves of the ten subjects for a specific feature representing a given emotion. The primary membership curves for different features and different emotions are shown in Figure 5.
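A primary membership curve for one subject, emotion and feature can be sketched as follows. The text fixes neither the exact bell shape nor how the spread is derived, so two assumptions are made here. The curve is taken to be Gaussian, and its standard deviation is the square root of the second moment about the mode. In Python 3.8 and later, `statistics.mode` returns the first mode encountered if several values tie. `primary_membership` is our own name.

```python
import math
import statistics
from typing import Callable, Sequence

def primary_membership(samples: Sequence[float]) -> Callable[[float], float]:
    """Bell-shaped membership curve peaking at the mode of the samples."""
    peak = statistics.mode(samples)
    # Second moment about the mode, used here as the variance of the curve.
    second_moment = sum((x - peak) ** 2 for x in samples) / len(samples)
    sigma = math.sqrt(second_moment)
    if sigma == 0:                     # all samples equal: crisp membership
        return lambda x: 1.0 if x == peak else 0.0
    return lambda x: math.exp(-((x - peak) ** 2) / (2 * sigma ** 2))
```

Calling the function once per subject, emotion and feature would give the 250 primary curves described above.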

[Plots: primary membership (0 to 1) versus feature value (about 0.5 to 1.8), one panel per emotion (ANGER, DISGUST, HAPPY, FEAR, RELAX) for each of the following features]
Feature: Left Eye Opening (EOL)
Feature: Right Eye Opening (EOR)
Feature: Lower Eyelid to Eyebrow for Left Eye (LEEL)
Feature: Lower Eyelid to Eyebrow for Right Eye (LEER)
Feature: Mouth Opening (MO)

Each primary membership distribution has a corresponding secondary membership distribution, which gives 250 secondary membership distributions. Fig. 6 shows ten illustrative type-2 secondary distributions, one for each of the 10 subjects, for the feature EOL and the emotion disgust. The axes in the figure represent the feature (EOL) and the primary and secondary membership values, as indicated.
[Fig. 6: ten 3-D plots, one per subject; axes: feature (EOL), primary membership (0 to 1), secondary membership (0 to 1)]
b. Emotion recognition of an unknown person
The process of emotion recognition for an unknown person is divided into two steps, as outlined below.

1. Feature Extraction
The facial features are extracted as described in Section 1.5.1. The extracted features are self-normalized by dividing each feature obtained in a given emotional state by its value in the relaxed state. This cancels the effect of variations in the camera-to-subject distance between image captures. The step-by-step feature extraction for the unknown facial image, and for the same person in the relaxed state, is shown in tabular form. The extracted features for Fig. 7 are listed in Table II.

2. Consulting the fuzzy face-space


For each facial feature, we now consult 5 sets of 10 primary membership distributions, as in Fig. 5. Each set corresponds to one distinct emotion of the 10 subjects. This gives 10 primary membership values for the feature to fall in a given emotion class. The secondary membership corresponding to a given feature and obtained primary membership is determined using curves like those in Fig. 6. The primary and secondary memberships obtained for the five features and five emotion classes are summarized in tabular form.
In the last row of the table, we obtain a final range for each emotion. Its lower (upper) boundary is the intersection of the lower (upper) range boundaries of the different features under that emotion. This range is an interval of fuzzy certainty about the joint occurrence of the features in a facial expression representing a specific emotion class. The centre of the interval has minimum uncertainty, so we compare the centres. The winning emotion is the one with the largest centre. The following experimental results on unknown facial images illustrate this procedure.
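The decision rule can be sketched as below. Two points are our assumptions rather than details confirmed by the text. First, the range for one feature and emotion is taken as the minimum and maximum of µpri × µsec over the subjects' distributions. Second, the intersection of ranges uses the minimum t-norm, applied separately to the lower and to the upper boundaries. The membership pairs in the example are illustrative inputs, not results from this work.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
Pairs = List[Tuple[float, float]]  # (mu_pri, mu_sec), one pair per known subject

def feature_range(pairs: Pairs) -> Interval:
    """Range of mu_pri * mu_sec over the subjects for one feature and emotion."""
    products = [p * s for p, s in pairs]
    return min(products), max(products)

def combine(ranges: List[Interval]) -> Interval:
    """Intersection of feature ranges, using the min t-norm on each boundary."""
    return min(lo for lo, _ in ranges), min(hi for _, hi in ranges)

def winning_emotion(memberships: Dict[str, Dict[str, Pairs]]) -> Tuple[str, Dict[str, Interval]]:
    """memberships[emotion][feature] -> pairs; the emotion with the largest centre wins."""
    final = {emotion: combine([feature_range(p) for p in features.values()])
             for emotion, features in memberships.items()}
    best = max(final, key=lambda e: (final[e][0] + final[e][1]) / 2)
    return best, final

# Illustrative inputs only (two subjects, two features, two emotions).
example = {
    "disgust": {"EOL": [(0.96, 0.91), (0.81, 0.91)], "MO": [(0.9, 0.8), (0.7, 0.9)]},
    "happy": {"EOL": [(0.36, 0.45), (0.08, 0.45)], "MO": [(0.9, 0.9), (0.8, 0.8)]},
}
print(winning_emotion(example)[0])  # prints: disgust
```

Using the minimum for the intersection means a single weak feature caps the final range of an emotion, which matches the idea of requiring the joint occurrence of all features.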

UNKNOWN FACIAL IMAGE 1:


Let us consider the unknown facial image as shown in Figure 7.

Fig. 4. Image of an unknown person

As mentioned earlier, we follow two main steps to recognize the emotion of the person.

1. Facial Feature Extraction


The step-by-step facial feature extraction for the unknown facial image is shown in Table IA, and
the extracted features are tabulated in Table IB.
TABLE IA
Step-by-Step Feature Extraction Method for the Unknown Facial Image

Original image | Skin region | Face-neck region | Right-eye search area | Left-eye search area | Mouth search area | Lip cluster | Facial features extracted

TABLE IB
Calculated Feature Value
EOL | EOR | MO | LEEL | LEER
7 | 10 | 19 | 25 | 27

The step-by-step facial feature extraction for the same person in the relaxed state is shown in
Table IIA, and the extracted features are tabulated in Table IIB.

TABLE IIA
Step-by-Step Feature Extraction Method for the Same Person in the Relaxed State

Original image | Skin region | Face-neck region | Right-eye search area | Left-eye search area | Mouth search area | Lip cluster | Facial features extracted

TABLE IIB
Calculated Feature Value
EOL | EOR | MO | LEEL | LEER
11 | 13 | 20 | 35 | 36

The final feature list for the unknown facial image is obtained by dividing each feature of the
unknown image by the corresponding feature of the same person in the relaxed state. Dividing the
values in Table IB element-wise by those in Table IIB gives Table III.

TABLE III
Normalized Feature Value
EOL | EOR | MO | LEEL | LEER
0.64 | 0.77 | 0.95 | 0.71 | 0.75

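The division can be checked with a few lines of Python. This is a minimal sketch. The names FEATURES and self_normalize are hypothetical, and rounding to two decimals is an assumption chosen to match the precision of Table III.

```python
FEATURES = ("EOL", "EOR", "MO", "LEEL", "LEER")


def self_normalize(emotional, relaxed):
    """Divide each feature in the emotional state by its relaxed-state value.

    Rounding to 2 dp is assumed, to match the precision of Table III.
    """
    return {name: round(emotional[name] / relaxed[name], 2) for name in FEATURES}


unknown = dict(zip(FEATURES, (7, 10, 19, 25, 27)))   # Table IB: unknown image
relaxed = dict(zip(FEATURES, (11, 13, 20, 35, 36)))  # Table IIB: same person, relaxed
print(self_normalize(unknown, relaxed))
# {'EOL': 0.64, 'EOR': 0.77, 'MO': 0.95, 'LEEL': 0.71, 'LEER': 0.75}
```

Because each feature is divided by the same person's relaxed-state value, any uniform scale factor caused by the camera-to-subject distance cancels out.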
2. Consulting the Fuzzy face-space for emotion recognition

TABLE IVA
Consulting the Fuzzy Face-Space (Dipti_disgust)

Columns: Anger | Disgust | Happy | Fear | Relax
Each emotion entry lists µpri, µsec and µpri × µsec; each feature block has one row per subject.

EOL (0.64):
0.04 0.61 0.02 | 0.50 0.45 0.23 | 0.36 0.45 0.16 | 0.02 0.61 0.01 | 0 0.64 0
0 0.59 0 | 0.96 0.91 0.87 | 0 0.61 0 | 0 0.67 0 | 0 0.65 0
0 0.61 0 | 0.02 0.53 0.01 | 0.02 0.53 0.01 | 0.5 0.6 0.3 | 0 0.67 0
0.16 0.61 0.09 | 0.01 0.53 0.01 | 0 0.65 0 | 0.01 0.6 0.01 | 0 0.64 0
0 0.62 0 | 0.02 0.45 0.01 | 0.01 0.53 0.01 | 0 0.68 0 | 0 0.64 0
0.07 0.61 0.04 | 0.81 0.91 0.74 | 0.08 0.45 0.04 | 0 0.68 0 | 0 0.61 0
0.29 0.58 0.17 | 0.04 0.53 0.02 | 0.8 0.91 0.73 | 0 0.67 0 | 0 0.66 0
0.01 0.61 0.01 | 0.26 0.45 0.12 | 0 0.63 0 | 0 0.68 0 | 0 0.66 0
0 0.63 0 | 0 0.53 0 | 0 0.66 0 | 0.01 0.62 0.01 | 0 0.65 0
0.3 0.6 0.18 | 0.96 0.9 0.86 | 0.11 0.64 0.07 | 0 0.60 0 | 0 0.6 0
Range: 0 – 0.18 | 0 – 0.87 | 0 – 0.73 | 0 – 0.3 | 0 – 0

EOR (0.77):
0 0.67 0 | 0.52 0.44 0.23 | 0.38 0.47 0.18 | 0.03 0.58 0.02 | 0 0.58 0
0 0.69 0 | 0.88 0.89 0.78 | 0.01 0.66 0.01 | 0.01 0.65 0.01 | 0 0.59 0
0.19 0.63 0.12 | 0.02 0.56 0.02 | 0 0.49 0 | 0.44 0.62 0.27 | 0 0.65 0
0 0.58 0 | 0.01 0.45 0 | 0.85 0.91 0.77 | 0.02 0.69 0.01 | 0 0.63 0
0.03 0.60 0.02 | 0.83 0.89 0.74 | 0 0.61 0 | 0 0.62 0 | 0 0.67 0
0.25 0.59 0.15 | 0.27 0.46 0.12 | 0 0.69 0 | 0 0.64 0 | 0 0.69 0
0 0.62 0 | 0 0.52 0 | 0.15 0.62 0.09 | 0.01 0.69 0.01 | 0 0.61 0
0.03 0.62 0.02 | 0.98 0.91 0.89 | 0.01 0.45 0 | 0 0.62 0 | 0 0.60 0
0 0.69 0 | 0.01 0.56 0.02 | 0.09 0.53 0.05 | 0 0.60 0 | 0 0.65 0
0.34 0.61 0.21 | 0.05 0.56 0.03 | 0 0.69 0 | 0 0.68 0 | 0 0.69 0
Range: 0 – 0.21 | 0 – 0.89 | 0 – 0.77 | 0 – 0.27 | 0 – 0

MO (0.95):
0.51 0.72 0.36 | 1 0.9 0.9 | 0.94 0.72 0.67 | 0.77 0.67 0.51 | 0.39 0.72 0.28
0.85 0.77 0.65 | 0.49 0.68 0.33 | 0.94 0.77 0.72 | 0.93 0.77 0.71 | 0.32 0.72 0.23
0.9 0.72 0.65 | 0 0.6 0 | 1 0.9 0.9 | 1 0.9 0.9 | 0.38 0.72 0.27
0.99 0.9 0.89 | 0.13 0.6 0.08 | 0.86 0.6 0.51 | 0.75 0.67 0.51 | 0.03 0.72 0.02
0.46 0.45 0.21 | 0.44 0.68 0.3 | 0.99 0.9 0.89 | 0.88 0.77 0.67 | 0.45 0.72 0.3
0.62 0.68 0.42 | 0.72 0.72 0.51 | 0.95 0.77 0.73 | 0.89 0.67 0.59 | 0 0.72 0
0.84 0.77 0.64 | 0.36 0.68 0.24 | 0.92 0.77 0.71 | 0.92 0.77 0.71 | 0.18 0.72 0.12
0 0.6 0 | 0.9 0.9 0.81 | 0 0.51 0 | 0 0.6 0 | 0.01 0.72 0.01
0.02 0.51 0.01 | 0.9 0.9 0.81 | 0.96 0.6 0.57 | 0.16 0.51 0.08 | 0.05 0.72 0.03
0.81 0.60 0.49 | 0 0.6 0 | 0.91 0.77 0.71 | 1 0.9 0.9 | 0.21 0.72 0.15
Range: 0 – 0.89 | 0 – 0.9 | 0 – 0.9 | 0 – 0.9 | 0 – 0.3

LEEL (0.71):
0.14 0.48 0.07 | 0.9 0.9 0.81 | 0 0.45 0 | 0.5 0.48 0.24 | 0 0.5 0
0 0.58 0 | 0.01 0.6 0.01 | 0 0.48 0 | 0.01 0.5 0.01 | 0 0.5 0
0.06 0.54 0.03 | 0 0.64 0 | 0.57 0.48 0.27 | 0.09 0.5 0.05 | 0 0.48 0
0.35 0.48 0.2 | 0.56 0.8 0.45 | 0 0.45 0 | 0.05 0.48 0.02 | 0 0.49 0
0.55 0.45 0.25 | 0.84 0.9 0.76 | 0 0.48 0 | 0 0.49 0 | 0 0.48 0
0.45 0.53 0.24 | 0 0.45 0 | 0.86 0.6 0.52 | 0.01 0.45 0 | 0 0.5 0
0.03 0.53 0.02 | 0.97 0.9 0.87 | 0 0.48 0 | 0.03 0.49 0.01 | 0 0.48 0
0 0.48 0 | 0 0.6 0 | 0 0.48 0 | 0 0.49 0 | 0 0.48 0
0 0.58 0 | 0.99 0.9 0.89 | 0 0.53 0 | 0 0.5 0 | 0 0.49 0
0.12 0.48 0.06 | 0.99 0.9 0.89 | 0 0.48 0 | 0 0.5 0 | 0 0.5 0
Range: 0 – 0.25 | 0 – 0.89 | 0 – 0.52 | 0 – 0.24 | 0 – 0

LEER (0.75):
0.2 0.45 0.09 | 0.91 0.92 0.83 | 0.01 0.44 0.01 | 0.45 0.6 0.27 | 0 0.59 0
0.02 0.59 0.01 | 0 0.69 0 | 0.01 0.53 0.01 | 0 0.60 0 | 0 0.46 0
0.36 0.47 0.17 | 0.02 0.69 0.01 | 0.59 0.54 0.31 | 0.07 0.64 0.05 | 0 0.48 0
0.45 0.54 0.24 | 0.59 0.78 0.46 | 0 0.68 0 | 0 0.50 0 | 0 0.44 0
0.55 0.55 0.30 | 0.88 0.91 0.8 | 0.03 0.43 0.01 | 0.04 0.47 0.02 | 0 0.54 0
0 0.60 0 | 0.02 0.44 0.01 | 0.89 0.65 0.57 | 0 0.48 0 | 0 0.52 0
0.01 0.58 0.01 | 0.95 0.91 0.86 | 0.03 0.53 0.02 | 0.02 0.54 0.02 | 0 0.49 0
0.02 0.45 0.01 | 0.02 0.60 0.01 | 0 0.58 0 | 0.01 0.56 0.01 | 0 0.49 0
0.14 0.46 0.06 | 0.91 0.91 0.82 | 0.01 0.52 0.01 | 0.02 0.59 0.01 | 0 0.52 0
0 0.62 0 | 0.92 0.95 0.87 | 0 0.67 0 | 0 0.47 0 | 0 0.53 0
Range: 0 – 0.30 | 0 – 0.87 | 0 – 0.57 | 0 – 0.27 | 0 – 0


From the ranges in Table IVA for the five facial features and five emotion classes, we construct
the final table used for emotion recognition.

TABLE IVB
Final Range for Facial Features
Emotion | EOL | EOR | MO | LEEL | LEER | Range after intersection | Centre value
Anger | 0 – 0.18 | 0 – 0.21 | 0 – 0.89 | 0 – 0.25 | 0 – 0.30 | 0 – 0.18 | 0.09
Disgust | 0 – 0.87 | 0 – 0.89 | 0 – 0.9 | 0 – 0.89 | 0 – 0.87 | 0 – 0.87 | 0.435
Happy | 0 – 0.73 | 0 – 0.77 | 0 – 0.9 | 0 – 0.52 | 0 – 0.57 | 0 – 0.52 | 0.26
Fear | 0 – 0.3 | 0 – 0.27 | 0 – 0.9 | 0 – 0.24 | 0 – 0.27 | 0 – 0.24 | 0.12
Relax | 0 – 0 | 0 – 0 | 0 – 0.3 | 0 – 0 | 0 – 0 | 0 – 0 | 0
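The Python sketch below reproduces the last two columns of Table IVB and the winning emotion from the per-feature range upper bounds in Table IVA. Every lower bound in Table IVA is 0. The names UPPER, intersect and classify are hypothetical, and rounding the centres to three decimals is an assumption chosen to match the table.

```python
# Upper bounds of the per-feature ranges for each emotion, read from the
# range rows of Table IVA. Every lower bound in that table is 0.
UPPER = {
    "anger":   {"EOL": 0.18, "EOR": 0.21, "MO": 0.89, "LEEL": 0.25, "LEER": 0.30},
    "disgust": {"EOL": 0.87, "EOR": 0.89, "MO": 0.90, "LEEL": 0.89, "LEER": 0.87},
    "happy":   {"EOL": 0.73, "EOR": 0.77, "MO": 0.90, "LEEL": 0.52, "LEER": 0.57},
    "fear":    {"EOL": 0.30, "EOR": 0.27, "MO": 0.90, "LEEL": 0.24, "LEER": 0.27},
    "relax":   {"EOL": 0.0, "EOR": 0.0, "MO": 0.30, "LEEL": 0.0, "LEER": 0.0},
}


def intersect(intervals):
    """Intersect closed intervals: largest lower bound, smallest upper bound.

    Returns None when the intervals do not overlap.
    """
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return (lo, hi) if lo <= hi else None


def classify(upper_bounds):
    """Return the winning emotion and the centre of each emotion's final range."""
    centres = {}
    for emotion, bounds in upper_bounds.items():
        final = intersect([(0.0, u) for u in bounds.values()])
        if final is None:  # no common range, so this emotion cannot win
            continue
        centres[emotion] = round((final[0] + final[1]) / 2, 3)
    winner = max(centres, key=centres.get)
    return winner, centres


winner, centres = classify(UPPER)
print(winner)   # disgust
print(centres)  # {'anger': 0.09, 'disgust': 0.435, 'happy': 0.26, 'fear': 0.12, 'relax': 0.0}
```

The empty-intersection check never triggers for this image, because all lower bounds are 0. It is included only so that the function stays correct for general interval inputs.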

The last column of Table IVB shows that 'disgust' has the highest centre value (0.435) of all the
emotion classes. The emotion expressed by unknown facial image 1 is therefore classified as
disgust.

UNKNOWN FACIAL IMAGE 2:
Consider the unknown facial image shown in Figure 7.

Fig. 4. Image of an unknown person

The process of emotion recognition for the unknown person is divided into two steps, as outlined
below.

1. Facial Feature Extraction
TABLE VI
Step-by-Step Feature Extraction Method

Original image | Skin region | Face-neck region | Right-eye search area | Left-eye search area | Mouth search area | Lip cluster | Facial features extracted

TABLE VIII
Calculated Feature Value
EOL | EOR | MO | LEEL | LEER
0.636 | 0.6371 | 1.77 | 0.969 | 0.968

TABLE III
CALCULATED FEATURE RANGES AND CENTRE VALUE FOR EACH EMOTION
1.7 Conclusion
The paper proposed a simple and time-efficient scheme for emotion recognition from a
pre-constructed type-2 fuzzy face space. Experiments show that the classification accuracy
obtained by considering both the type-2 primary and secondary memberships is as high as 96.67%.
The accuracy falls by more than 8% when only the type-2 primary memberships are considered.
Classical rule-based methods for emotion classification depend largely on the relational matrix
used to represent implication relations. In the present context, the emotion analysis is deliberately
performed on the fuzzy-encoded measurement space to make the system robust.

1.8 Summary
