International Journal of Social Robotics (2022) 14:1583–1604
https://doi.org/10.1007/s12369-022-00867-0

Facial Emotion Expressions in Human–Robot Interaction: A Survey

Niyati Rawal1 · Ruth Maria Stock-Homburg1

Accepted: 4 January 2022 / Published online: 24 June 2022
© The Author(s) 2022

Abstract
Facial expressions are an ideal means of communicating one’s emotions or intentions to others. This overview focuses on
human facial expression recognition as well as robotic facial expression generation. For human facial expression
recognition, both recognition on predefined datasets and recognition in real-time are covered. For robotic facial
expression generation, both hand-coded and automated methods are covered, i.e., the facial expressions of a robot are
generated by moving its features (eyes, mouth) either by hand-coding or automatically using machine learning techniques.
Many studies already achieve high accuracy for facial expression recognition on predefined datasets,
but the accuracy of facial expression recognition in real-time is comparatively lower. For expression generation
in robots, most robots are capable of making basic facial expressions, but few studies enable
robots to do so automatically. This overview discusses state-of-the-art research on facial emotion expressions during
human–robot interaction and derives several possible directions for future research.

Keywords Facial emotion recognition · Facial emotion expressions · Human–robot interaction · Survey · Overview

1 Introduction

Robots are no longer just machines used in factories
and industries. There is a growing demand for robots that
share space with humans as collaborative or assistive
robots [35,63]. Robots are now increasingly
being deployed in a variety of domains as receptionists [120],
educational tutors [49,59], household supporters [111] and
caretakers [25,49,67,125]. Thus, these social robots need to
interact effectively with humans, both verbally and
non-verbally. Facial expressions are non-verbal
signals that can be used to indicate one’s current status in
a conversation, e.g., via backchanneling or rapport [3,31].

Perceived sociability is an important aspect of human–robot
interaction (HRI), and users want robots to behave in a
friendly and emotionally intelligent manner [28,48,99,105].
For social robots to be more anthropomorphic and for
human–robot interaction to be more like human–human
interaction (HHI), robots need to be able to understand human
✉ Ruth Maria Stock-Homburg
RSH@tu-darmstadt.de

Niyati Rawal
niyati.rawal@tu-darmstadt.de

1 Technical University Darmstadt, Hochschulstr. 1, 64295
Darmstadt, Germany

emotions and respond appropriately to them.
Stock and Merkle show that emotional expressions
of anthropomorphic robots are becoming increasingly important
in business settings as well [118,121]. The authors of [119]
emphasize that robotic emotions are particularly important
for the acceptance of a robot by the user. Thus, emotions are
pivotal for HRI [122]. According to [92], 7% of the affective
information in an interaction is conveyed through words, 38%
through tone, and 55% through facial expressions.
This makes facial expressions an indispensable
mode of affective communication. Accordingly, numerous
studies have examined facial expressions of emotions during
HRI [e.g., 2,8,15,17–19,33,38,50,81,91,110,116].

In any HHI, human beings first infer the emotional state of
the other person and then generate facial expressions
in response to their peer. The generated emotion could
be a result of parallel empathy (generating the same emotion
as the peer) or reactive empathy (generating an emotion in
response to the peer’s emotion) [26]. Similarly, in the case of
HRI, we would like to study robots recognizing human emotion
as well as robots generating their own emotion as a response
to human emotion.

The number of papers on facial expressions in HRI has
grown over the last decade; between 2000 and 2020
(see Fig. 1), the number of publications increased
gradually. Thus, the overarching research
question is: What has been done so far on facial emotion
expressions in human–robot interaction, and what still needs
to be done?

Fig. 1 Publications on emotion recognition of human faces during HRI and generation of facial expressions of robots

In Sect. 2 the framework of the overview is outlined,
followed by the method of selection of studies in Sect. 3.
Recognition of human facial expressions and generation of
facial expressions by robots are covered in Sects. 4 and 5.
The current state of the art and future research are discussed
in Sect. 6 with the conclusion in Sect. 7.

2 Framework of the Overview

This overview focuses on two aspects: (1) recognition of
human facial expressions and (2) generation of facial expressions
by robots. The review framework (Fig. 3) is based
on these two streams. (1) Recognition of human facial
expressions is further subdivided depending on whether the
recognition takes place (a) on a predefined dataset or (b) in
real-time. (2) Generation of facial expressions by robots is
also subdivided depending on whether the generation
is (a) hand-coded or (b) automated, i.e., whether the facial
expressions of a robot are generated by moving the features
(eyes, mouth) of the robot by hand-coding or automatically
using machine learning techniques.

3 Method

Studies matching the keywords “facial expression recognition
AND human–robot interaction / HRI”, “facial expression
recognition” and “facial expression generation AND human–robot
interaction / HRI” and published between 2000 and 2020 were
reviewed on Google Scholar.

In this overview, studies that use voice or body gestures
as a modality for emotional expression but do not involve
facial expressions are not included. Studies that involve HRI
with humans having mental disorders like autism are also not
included. Furthermore, studies that address only a single emotion,
such as recognition of a smile or generation of an angry facial
expression, are not included. In total, 175 of 276 studies were
rejected (Fig. 2).

In Table 3, various studies on facial expression recognition
are listed. For facial expression recognition on predefined
datasets, studies with an accuracy greater
than 90% are selected. For real-time facial expression recognition,
all studies that perform facial expression recognition
in a human–robot interaction scenario are listed.

4 Recognition of Human Facial Expressions

Traditionally, facial expression recognition (FER) consisted of the
following steps: face detection, image pre-processing,
extraction of important features and classification of the
expression (Fig. 4). As deep learning algorithms have become
popular, the pre-processed image is now fed directly into deep
networks (such as CNNs or RNNs) to predict an output [71]
(Fig. 5).

Fig. 2 Flowchart of the literature screening process

Fig. 3 Framework of the overview

Fig. 4 Process of facial expression recognition in machine learning (adapted from Canedo and Neves [13])

Fig. 5 Process of facial expression recognition in deep learning (adapted from Li and Deng [71])

In the machine learning algorithms, the Viola–Jones algorithm
and OpenCV were popular choices for face detection,
although the dlib face detector and the AdaBoost algorithm
were also used. To pre-process the images, greyscale conversion,
image normalization and image augmentation (such as
flip, zoom, rotate etc.) were usually applied. Further, some
studies extract the important regions in faces like eyebrows,
eyes, nose and mouth (also known as the action units or AUs)
that play an important role in FER. Others use local binary
patterns (LBP) or histograms of oriented gradients (HOG) to
extract the featural information. Finally, the classification is
performed. Most of the studies perform classification for the
six universally recognized emotions (happy, sad, disgust, anger,
fear and surprise) and sometimes include a neutral expression.
For the final classification, k-Nearest Neighbor (KNN),
Hidden Markov Model (HMM), Recurrent Neural Network
(RNN), Convolutional Neural Network (CNN), Support Vector
Machine (SVM) and Long Short-Term Memory (LSTM)
are used.
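
To illustrate the hand-crafted feature step, the following minimal sketch computes the basic 3×3 LBP code of every interior pixel of a greyscale image and collects the codes in a histogram. The function names (lbp_codes, lbp_histogram), the clockwise bit order and the >= threshold are assumptions made for this illustration; the surveyed studies may use other LBP variants.

```python
from collections import Counter

# Offsets of the 8 neighbours, clockwise from the top-left pixel (assumed bit order).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image):
    """Return the 8-bit LBP code of every interior pixel of a 2-D greyscale image."""
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # A neighbour at least as bright as the centre sets its bit.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(image):
    """Count how often each of the 256 codes occurs; the counts form the feature vector."""
    hist = Counter(code for row in lbp_codes(image) for code in row)
    return [hist.get(code, 0) for code in range(256)]
```

The resulting 256-bin histogram could then be passed to any of the classifiers listed above; in practice the face is often divided into cells and one histogram is computed per cell.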

In the deep learning algorithms, the input images are first
pre-processed by performing face alignment, data augmentation
and normalization. The images are then fed directly
into deep networks such as CNNs or RNNs, which predict the
emotion shown in the images. The most commonly used classification
methods are explained in more detail below. They are
arranged in the order in which they were invented.

KNN: The nearest-neighbor classifier was first introduced
in the 1950s [37]. In KNN [57], given the training instances
and their class labels, the class label of an unknown instance is
predicted from the labels of its k nearest training instances.
KNN is based on a distance function that measures
the difference between two instances. While the Euclidean
distance is mostly used, other distance
functions such as the Hamming distance can also be used.
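
A minimal sketch of this procedure is given below, assuming a majority vote among the k nearest training instances. The function names and the two-dimensional toy feature vectors in the usage note are hypothetical and only serve to show how the distance function plugs into the classifier.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Hamming distance: number of positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k=3, distance=euclidean):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Sort the training instances by distance to the query and keep the k closest.
    nearest = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, with made-up feature vectors train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (0.2, 0.1)] and labels ["sad", "sad", "happy", "happy", "sad"], the call knn_predict(train_x, train_y, (0.95, 1.0)) votes among the three closest instances. Note that no training step is needed, but every prediction scans the whole training set.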

HMM: The HMM [104] was introduced in the late 1960s.
It is a doubly embedded stochastic process: a hidden
stochastic process (a Markov chain) is only visible
through another stochastic process that produces a sequence of
observations. The most likely state sequence can be decoded using
the Viterbi algorithm, and the model parameters can be estimated
using the Expectation-Maximization (EM) algorithm.
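
The decoding step can be sketched as follows. This is a minimal Viterbi implementation for a discrete HMM; the state names, observation symbols and probabilities in the usage note are invented for illustration and are not taken from any of the surveyed studies.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for a list of observations."""
    # best[t][s] = (probability of the best path ending in state s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Choose the predecessor that maximises the path probability into s.
            prob, prev = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            column[s] = (prob, prev)
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))
```

As a toy usage, take the hidden states "neutral" and "happy" and the observed mouth shapes "flat", "slight" and "smile". With start probabilities 0.6/0.4, transition probabilities neutral→(0.7, 0.3) and happy→(0.4, 0.6), and emission probabilities neutral→(0.5, 0.4, 0.1) and happy→(0.1, 0.3, 0.6), the observation sequence flat, slight, smile is decoded as neutral, neutral, happy. For long sequences, log-probabilities would be preferable to avoid numerical underflow.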

RNN: The RNN [78] was introduced in the 1980s. An RNN is
a feed-forward neural network augmented with edges that span
adjacent time steps, introducing a notion of time. Hence, RNNs
are mainly used for dynamic input data that has a temporal
sequence. In an RNN, a state depends upon the current input
as well as the state of the network at the previous time step,
making it possible to retain information from a long time
window.
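
The state update can be written as h_t = tanh(W_xh x_t + W_hh h_(t-1) + b). The following sketch implements this recurrence in plain Python; the tanh non-linearity, the zero initial state and the parameter names are assumptions chosen for illustration, since the text does not fix a particular RNN variant.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b):
    """One time step: h_t = tanh(W_xh x_t + W_hh h_(t-1) + b)."""
    h = []
    for i in range(len(b)):
        total = b[i]
        total += sum(w_xh[i][j] * x[j] for j in range(len(x)))
        # The recurrent edge: the previous state feeds into the current one.
        total += sum(w_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h.append(math.tanh(total))
    return h


def rnn_forward(sequence, w_xh, w_hh, b):
    """Run the recurrence over a sequence of input vectors, starting from a zero state."""
    h = [0.0] * len(b)
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b)
        states.append(h)
    return states
```

With a non-zero recurrent weight, the state at the second time step stays non-zero even if the second input is zero, which is the sense in which the network carries information across time steps.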

CNN: Convolutional networks [70] were introduced in 1989
[69]. CNNs are trainable architectures composed
of multiple stages. The input and output of each stage are
sets of arrays called feature maps. Each stage of a CNN is
composed of three layers: a filter bank layer, a non-linearity
layer and a feature pooling layer. The network is trained using
the backpropagation method. CNNs are used for end-to-end
recognition, wherein the output is predicted directly from the
input image. They are also used as feature extractors
that are connected to further neural network layers such as
LSTM or RNN for the prediction.
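
A single stage of this kind can be sketched as below, assuming a ReLU non-linearity and non-overlapping 2×2 max pooling. Both choices, as well as the function names and the fixed (untrained) kernels in the usage note, are assumptions for illustration; in a real CNN the filter weights are learned by backpropagation.

```python
def convolve2d(image, kernel):
    """Valid cross-correlation of a 2-D image with one 2-D filter of the bank."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Non-linearity layer: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j] for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def cnn_stage(image, filter_bank):
    """One stage: filter bank -> non-linearity -> pooling, one feature map per filter."""
    return [max_pool(relu(convolve2d(image, kernel))) for kernel in filter_bank]
```

For instance, applying the kernel [[-1, 1], [-1, 1]] to an image with a dark-to-bright vertical edge yields a feature map that responds only at the edge, whereas the mirrored kernel [[1, -1], [1, -1]] gives a negative response there that is zeroed by the non-linearity.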

SVM: The SVM was invented by Vapnik [128]. An SVM [129]
separates the training data of two classes with a maximum-margin hyperplane.
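
As a rough illustration of a separating hyperplane w·x + b = 0, the sketch below fits a linear two-class SVM by sub-gradient descent on the regularised hinge loss. This training procedure, the hyperparameter values and the function names are assumptions chosen for brevity; the surveyed studies typically rely on library implementations, often with kernels and multi-class extensions.

```python
def train_linear_svm(xs, ys, epochs=200, lr=0.1, reg=0.01):
    """Fit (w, b) by sub-gradient descent on the hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # The sample lies inside the margin: move the hyperplane away from it.
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regulariser acts, shrinking w (which widens the margin).
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def svm_predict(w, b, x):
    """Classify x by the side of the hyperplane w.x + b = 0 on which it falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

For FER with more than two expression classes, several such binary classifiers would have to be combined, e.g. in a one-vs-rest scheme.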

LSTM: LSTM [39] was invented by Hochreiter and
Schmidhuber [47]. It also has recurrent connections but,
unlike a standard RNN, it is capable of learning long-term
dependencies.

Table 1 summarizes the major purpose, application areas,
advantages, disadvantages and frequency of use of the
commonly used algorithms. For the frequency of use, only
papers that implement facial expression recognition
during HRI or in real-time scenarios were counted.
Although RNN has not been used for facial expression
recognition during HRI or in real-time, some studies perform facial
expression recognition on predefined datasets using RNN.

27 studies on facial expression recognition during HRI
were reviewed. Some of these studies were not performed
on a robot platform; they perform emotion
recognition in real-time and mention HRI as their intended
application. The studies are summarized in Table 2.
Studies that perform facial expression recognition only on
predefined datasets, or not in real-time, were not included.

4.1 FER on Predefined Dataset

Although the focus of this overview is FER in
real-time and during HRI, the studies on real-time FER are
compared with FER on predefined datasets. FER has been
carried out on static human images as well as on dynamic
human video clips. Datcu and Rothkrantz [24] show that there
is an advantage in using data from video frames over still
images, because videos contain temporal information
that is absent in still images.

Results of studies with above 90% accuracy in FER on
still images are summarized in Table 3a and those on videos
in Table 3b. Table 3a and b serve as a comparison with
Table 3c. Studies are arranged according to their accuracy
level. It should be noted that these studies are carried out on
predefined datasets consisting of human images and videos
and do not involve robots. A considerable number
of studies achieve accuracy greater than 90% on the CK+,
Jaffe and Oulu-Casia datasets on both still images and videos.

4.2 FER in Real-Time

It is easier to achieve high accuracy when performing emotion
recognition on predefined datasets, as they are recorded
under controlled environmental conditions. It is more
difficult to achieve the same level of accuracy
when performing emotion recognition in real-time, when the
movements are spontaneous. It should be noted that studies
that perform facial expression recognition in real-time were
carried out under controlled laboratory conditions with little
variation in lighting conditions and head poses.

As this overview is about facial expressions in HRI, for a robot
to be able to recognize emotion, emotion recognition has to
be performed in real-time. Table 3c lists studies with
facial expression recognition in real-time for HRI. Here, the
accuracies are comparatively lower than the accuracies for
predefined datasets. As can be seen in Table 3c, only two
studies have an accuracy greater than 90%. The robots
used in the studies are either robotic heads or humanoid
robots such as Pepper, Nao and iCub. Many studies that perform
facial expression recognition in real-time use CNNs,
making it a popular choice for facial expression recognition
[2,8,15,133]. However, the highest accuracies in Table 3c are
achieved by Bayesian and Artificial Neural Network (ANN) methods.

5 Facial Emotion Expression by Robots

For robots to be empathic, it is necessary that the robots not
only be able to recognize human emotions but also be able to
generate emotions using facial expressions. Several studies
enable robots to generate facial expressions either in a hand-coded
or an automated manner (Fig. 6). By hand-coded, we
mean that the facial expressions are coded by moving the eyes
and mouth of the robot in a desired manner; automated means
that the expressions are learned automatically using machine
learning techniques.

16 studies on facial emotion expression in robots were
reviewed. These studies are summarized in Table 4.

5.1 Facial Expression Generation is Hand-Coded

Earlier studies started by hand-coding the facial expressions
in robots. There is a static as well as dynamic generation of
facial expressions on robots.

Among the static methods, the humanoid social
robot “Alice” imitates human facial expressions in real-time [91].
Kim et al. [61] introduced an artificial facial
expression imitation system using a robot head, Ulkni.
Ulkni is composed of 12 RC servos, with four degrees of
freedom (DoFs) to control its gaze direction, two DoFs for
its neck, and six DoFs for its eyelids and lips, so it is capable
of making the basic facial expressions after the position
commands for the actuators are sent from the PC. Bennett and
Sabanovic [7] identified minimal features, i.e. movement of
eyes, eyebrows, mouth and neck, which are sufficient to identify
the facial expression.

In their study, the main program called functions that specified
facial expressions according to the direction (used to
make or undo an expression) and degree (strength of the
expression, i.e. smaller vs. larger). The facial expression
functions would in turn call lower functions that moved
specific facial components given a direction and degree, fol-


Table 1 Details about the commonly used algorithms

Algorithm (year) | Major Purpose | Application areas | Advantages | Disadvantages | Frequency of use

KNN (1950s) | Storing of all available cases and classifies new instances by measuring the similarity or difference between two instances using a distance function [57] | Classification of facial expressions | No training required before making predictions | Great computational complexity, because the distance between every sample should be calculated in order to classify [132] | 1

HMM (1960s) | Given a sequence of observations, decode the hidden states [104] | Perform FER based on dynamic data input i.e. videos | Capture the dependencies between consecutive measurements | Information from states in the preceding time steps (not the previous time step) cannot be captured | 1

RNN (1980s) | Learn temporal dependencies [78] | Perform FER based on dynamic data input i.e. videos | Useful in modelling sequential data | Difficult to learn long-term temporal dependencies as gradients explode or vanish over many time steps [6] | –

CNN (1989) | Given a set of images, extracts featural information such as edges and performs the classification task [70] | Learn the spatial features in an image i.e. perform FER based on static data input | Easier to train and generalizes much better than networks with full connectivity between adjacent layers [68] | Requires a lot of data to prevent over-fitting [138] | 13

SVM (1995) | A supervised learning algorithm that can be used for classification or regression on small datasets [129] | Classification of facial expressions | Robust to over-fitting [124] | Does not perform well with large training samples [80] | 3

LSTM (1997) | Learn long-term temporal dependencies [39] | Perform FER based on dynamic data input i.e. videos | Ability to learn long-term temporal dependencies [47] | Computationally expensive to train | 1

KNN k-nearest neighbor, HMM hidden Markov model, RNN recurrent neural network, CNN convolutional neural network, SVM support vector machine, LSTM long short-term memory


Table 2 Detailed information about studies on emotion recognition and Human Robot Interaction (HRI)

References | Recognition mode | Recognized emotions | Robot | Algorithm type | Major findings

Ahmed et al. [1] | Face | Angry, disgust, fear, happy, neutral, sad, surprise | – | CNN with data augmentation | The model achieved an accuracy of more than 90% for each emotion as it could classify geometrically displaced facial images

Barros et al. [2] | Face | Positive, negative, neutral | iCub | CNN | The network is able to recognize emotions from different environments, different subjects performing spontaneous expressions, and in real-time

Bera et al. [8] | Face | Happy, sad, angry, neutral | Pepper | Bayesian inference and CNN | A multi-channel model to classify pedestrian features into four categories of emotions. Emotional detection accuracy of 85.33% was observed in the validation results

Byeon and Kwak [12] | Face | Happiness, sadness, anger, surprise, disgust, fear | – | 3D-CNN | The experimental results on video-based facial expression database revealed that the method showed a good performance in comparison to the conventional methods such as PCA and TMPCA

Chen et al. [15] | Face | Angry, disgust, fear, happy, sad, surprise, neutral | XiaoBao | CNN (VGG-16) | The facial expression recognition in the wild (FERW) model can recognize facial expressions in the real-world with an accuracy of 79% and a real-time positive emotion incentive system (PEIS) was able to enhance user experience

Cid et al. [18] | Face, voice | Sad, happy, fear, anger, neutral | Muecas | Dynamic Bayesian Network | The Bayesian approach to the emotion recognition problem presents good results for real-time applications with untrained users in an uncontrolled environment

Dandıl and Özdemir [23] | Face | Anger, fear, happy, surprise, sad, neutral | – | CNN | Successful results obtained on real-time video, in changing light and environment conditions with 80% accuracy

Deng et al. [27] | Face | Surprise, sad, neutral, happy, fear, disgust, anger | – | cGAN | Can learn not only the regions related to expression but also maximally capturing nuanced characteristics relevant to expression and then transforming the original expression to another expression with identity and other factors preserved

Faria et al. [34] | Face | Afraid, angry, happy, neutral, disgusting, sad, surprised | Nao | Dynamic Bayesian Mixture Model (DBMM) | An overall accuracy around 85% on KDEF dataset and 80% on tests on-the-fly during human–robot interaction

Ge et al. [38] | Face | Happy, sad, fear, disgust, anger, surprise | Robotic head | SVM | The experimental results showed that the proposed nonlinear facial mass-spring model coupled with the SVM classifier is effective to recognize the facial expressions compared with the linear mass-spring model

Inthiam et al. [56] | Face and bodily movement | Positive mood and negative mood | – | HMM | Facial expression alone may be misleading, since it may not be a true expression of inner feeling. By including bodily expression in the analysis, the estimation model gave a better result with estimation accuracy over 70%

Li et al. [72] | Face | Happiness, anger, disgust, fear, sadness, surprise | Harley | CNN-LSTM | CNN and LSTM are combined to exploit their advantages in the proposed model and the system is applied to a humanoid robot to demonstrate its practicability for improving the HRI

Liu et al. [81] | Face | Happy, angry, surprise, fear, disgust, sad, and neutral | Mobile robot | ELM classifier | Recognized human emotions with 80% accuracy

Liu et al. [79] | Face | Anger, scared, sad, surprise, neutral, disgust | – | CNN | An average weighting method was proposed to avoid potential errors in real-time facial expression recognition based on the traditional convolutional neural network

Lopez-Rincon [82] | Face | Sadness, happiness, surprise, anger, disgust, fear | Nao | AFFDEX SDK and CNN | The global emotion classification accuracy of the CNN is better than the AFFDEX system, but the face detection in AFFDEX is slightly better

Martin et al. [87] | Face | Anger, disgust, fear, surprise, neutral, happy, sad | – | Active Appearance Model (AAM), Multi Layer Perceptron (MLP) and SVM | Compared three different facial expression classifiers (AAM classifier set, MLP and SVM)

Meghdari et al. [91] | Face | Happiness, sadness, fear, anger, surprise | Alice | ANN | Can recognize emotions with 92.52% accuracy in real-time

Nunes [100] | Face and upper body | Anger, disgust, fear, happiness, sadness, surprise and neutral | – | CNN | Bimodal approach (86.6% accuracy) based on emotion recognition from both face and upper body produced better results than monomodal approach.

Romero et al. [108] | Face, voice, body gestures | Neutral, fear, angry, sad, happy | – | Dynamic Bayesian Network | The standardized variables associated to the AUs of the user were obtained, allowing real-time recognition of each facial expression with different factors, such as lighting conditions, gender, unusual facial features (like injuries or scars), among others

Ruiz-Garcia et al. [110] | Face | Surprise, happy, disgust, angry, fear, sad | – | CNN | Images from 28 participants were collected in an uncontrolled environment to test our CNN emotion recognition model, resulting in a classification rate of 73.55%

Shi et al. [114] | Face and body | Interested, distracted, confused | Pepper | k-Nearest Neighbour (KNN) | A multi-student affect recognition system which, starting from eight basic emotions detected from facial expressions, can infer higher emotional states relevant to a learning context, such as “interested”, “distracted” and “confused”

Simul et al. [116] | Face | Neutral, surprise, sad, smile, angry | Ribo | SVM | Robot Ribo recognizes human facial expression, facial gesture movement and detects human gender in real-time

Vithanawasam and Madhusanka [130] | Face and upper body | Anger, fear, bored | – | Fisherface algorithm / position of arms | Anger was correctly predicted 81.54% of the times, followed by bored (72.20%) and fear (68.37%). The results were sensitive to the lighting conditions

Webb et al. [133] | Face | Angry, disgust, fear, happy, neutral, sad, surprise, neutral | Nao | CNN | When evaluated on novel data with nonuniform conditions taken by a Nao robot an accuracy of 79.75% was achieved

Wimmer et al. [134] | Face | Anger, disgust, fear, happy, sad, surprise | B21 robot | Binary Decision Tree | Experimental evaluation reports a recognition rate of 70% on the Cohn–Kanade facial expression database, and 67% in a robot scenario

Wu et al. [136] | Face | Angry, sad, disgust, fear, happy, neutral, surprise | – | Weight-Adapted Convolution Neural Network (WACNN) | The recognition accuracies of the proposed algorithm were higher than the deep CNN without HGA, indicating a better global optimization ability

Yu and Tapus [142] | Face, body gesture | Neutral, happy, angry, sad | Pepper | Random Forests (RF) | Developed a multimodal emotion recognition model with gait and thermal facial data, which is based on RF model and the modified confusion matrices of two individual models


Table 3 Studies on FER; Note: Studies listed according to accuracy level

Study Dataset Algorithm Classes Accuracy

(a) With accuracy greater than 90% on static input i.e., human images

Mistry et al. [95] CK+/MMI SVM 7 100%/94.66%

Kotsia and Pitas [66] CK SVM 6 99.7%

Hossain et al. [51] Jaffe/CK GMM 7 99.8%/99.7%

Kar et al. [60] CK+ BPNN 6 99.51%

Mliki et al. [45] CK/Jaffe SVM 7 99.24%/96.50%

Chen et al. [16] CK+/Jaffe CNN 7 99.1597%/87.7350%

Zhang et al. [146] CK+ CNN 6 98.9%

Mayya et al. [89] Jaffe/CK+ CNN 7 98.12%/96.02%

Minaee and Abdolrashidi [94] CK+/Jaffe CNN 7 98.0%/92.8%

Nwosu et al. [101] Jaffe/CK+ CNN 7 97.71%/95.72%

Yang et al. [140] CK+/Oulu-Casia/Jaffe CNN 6 97.02%/92.89%/92.21%

Yang et al. [139] CK+/Oulu-Casia/Jaffe WMDNN 6 97.0%/92.3%/92.2%

Ding et al. [30] CK+/Oulu-Casia CNN 8/6 96.8%/87.71%

Gogić et al. [40] CK+/Jaffe/ MMI NN 7 96.48%/85.88%/73.73%

Kim et al. [62] CK+/Jaffe CNN 6 96.46%/91.27%

Hua et al. [52] Jaffe CNN 7 96.44%

Mannan et al. [85] CK+ SVM 7 96.36%

Ruiz-Garcia et al. [110] KDEF/CK+ CNN-SVM 7 96.26%/95.87%

Hamester et al. [44] Jaffe CNN 7 95.8%

Meng et al. [93] CK+/MMI CNN 6 95.27%/71.55%

Liliana et al. [77] CK+ SVM 7 93.93%

Ferreira et al. [36] CK+/Jaffe CNN 8/6 93.64%/89.01%

Mollahosseini et al. [97] CK+ DNN 7 93.2%

Yaddadenet al. [137] Jaffe/KDEF KNN 7 92.29%/79.69%

(b) With accuracy greater than 90% on dynamic input i.e., human videos

Liang et al. [76] CK+/Oulu-Casia/MMI CNN-BiLSTM 6 99.6%/91.07%/80.71%

Carcagnì et al. [14] CK+ SVM 7 98.5%

Wu et al. [135] CK+ HMM 7 98.54%

Zhang et al. [145] CK+/Oulu-Casia/MMI CNN-RNN 6 98.5%/86.25%/81.18%

Kotsia et al. [65] CK SVM 6 98.2%

Uddin et al. [126] Depth DBN 6 96.67%

Zhao et al. [147] CK+/Oulu-Casia/MMI SVM 7 95.8%/74.37%/71.92%

Elaiwat et al. [32] CK+/MMI RBM 7 95.66%/81.63%

Uddin et al. [127] CK CNN 6 95.42%

Sikka et al. [115] CK+/Oulu-Casia HMM 7 94.60%/75.62%

Kabir et al. [58] Depth HMM 6 94.17%

Study Robot Sensor Algorithm Classes Accuracy

(c) FER in real-time i.e., on dynamic input during HRI

Cid et al. [18] Muecas camera Bayesian 5 94%

Meghdari et al. [91] Alice Kinect ANN 6 92.52%

Simul et al. [116] Ribo Webcam SVM 5 86%

Bera et al. [8] Pepper Camera CNN 4 85.33%

Liu et al. [81] Mobile robot Kinect ELM 7 Above 80%

Yu and Tapus [142] Pepper camera RF 4 78.125%

Webb et al. [133] Nao Camera CNN 8 79.75%

Chen et al. [15] XiaoBao Camera CNN 7 79%

Barros et al. [2] iCub RGB camera CNN 3 74.2%

Ruiz-Garcia et al. [110] Nao Built-in camera CNN-SVM 7 68.75%

Wimmer et al. [134] B21 robot Camera Binary decision tree 6 67%

CK Cohn–Kanade, Jaffe Japanese Female Facial Expression, GAN Generative Adversarial Network, KNN k-Nearest Neighbor, HMM Hidden
Markov Model, RNN Recurrent Neural Network, CNN Convolutional Neural Network, SVM Support Vector Machine, LSTM Long Short-Term
Memory, WMDNN Weighted Mixture Deep Neural Network, NN Neural Network, ANN Artificial Neural Network, ELM Extreme Learning
Machine, BPNN Back Propagation Neural Network, DBN Deep Belief Network, RF Random Forests

Fig. 6 Facial expression generation techniques

lowing the movement related to specific AUs in the Facial
Action Coding System (FACS).

Breazeal’s [9] robot Kismet generated emotions using an
interpolation-based technique over a 3-D space, where the
three dimensions correspond to valence, arousal and stance.
The expressions become more intense as the affect state moves
toward extreme values in the affect space. Park et al. [102] made
diverse facial expressions by changing their dynamics and
increased the lifelikeness of a robot by adding secondary
actions such as physiological movements (eye blinking and
sinusoidal motions concerning respiration). A second-order
differential equation based on the linear affect-expressions
space model is used to achieve the dynamic motion for
expressions. Prajapati et al. [103] used a dynamic emotion
generation model to convert the facial expressions derived
from the human face into a more natural form before render-
ing them on the robotic face. The model is provided with the
facial expression of the person interacting with the system
and corresponding synthetic emotions generated are fed to
the robotic face.
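
The interpolation idea described for Kismet can be illustrated with a small sketch. It is not Breazeal's implementation: the basis postures, the three actuator names and the inverse-distance weighting are assumptions chosen for illustration, and only the three affect dimensions (valence, arousal, stance) come from the description above.

```python
import numpy as np

# Hypothetical basis postures. Each entry pairs a point in
# (valence, arousal, stance) space with actuator targets
# [brow_raise, lip_corner, eye_open], all in [-1, 1].
BASIS = {
    "neutral": (np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.0])),
    "joy": (np.array([1.0, 0.5, 0.5]), np.array([0.3, 1.0, 0.4])),
    "anger": (np.array([-1.0, 1.0, 1.0]), np.array([-1.0, -0.6, 0.6])),
    "sorrow": (np.array([-1.0, -1.0, -0.5]), np.array([0.5, -1.0, -0.5])),
}


def blend_expression(affect, eps=1e-9):
    """Blend basis postures with inverse-distance weights at an affect point."""
    affect = np.asarray(affect, dtype=float)
    weights, poses = [], []
    for anchor, pose in BASIS.values():
        distance = np.linalg.norm(affect - anchor)
        if distance < eps:
            # Exactly on a basis point: return that posture unchanged.
            return pose.copy()
        weights.append(1.0 / distance)
        poses.append(pose)
    w = np.array(weights) / np.sum(weights)
    return w @ np.array(poses)


# Moving from neutral toward the "joy" anchor raises the lip corners.
mild = blend_expression([0.5, 0.25, 0.25])
strong = blend_expression([0.9, 0.45, 0.45])
```

In this sketch the expression intensifies as the affect point approaches a basis posture, which mirrors the behaviour described above; a real robot would need postures tuned to its own actuators.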

Summary of findings The robot faces are capable of making
basic facial expressions as they contain enough DoFs in the
eyes and mouth. They are able to generate static emotions
[7,61,91]. Additionally, the robot faces are able to generate
dynamic emotions [9,102,103].
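
The second-order dynamics mentioned for Park et al. [102] can be pictured with a generic damped system that moves one facial actuator toward a target position. The exact equation and parameters of that study are not given here, so the natural frequency `omega`, damping ratio `zeta` and step size `dt` below are assumed values for illustration only.

```python
def simulate_second_order(target, steps=300, dt=0.01, omega=10.0, zeta=1.0):
    """Integrate x'' + 2*zeta*omega*x' + omega**2 * (x - target) = 0.

    Starts at rest from x = 0 and returns the position after each step.
    """
    x, v = 0.0, 0.0
    trajectory = []
    for _ in range(steps):
        acceleration = omega**2 * (target - x) - 2.0 * zeta * omega * v
        v += acceleration * dt  # semi-implicit Euler: velocity first
        x += v * dt             # then position, using the new velocity
        trajectory.append(x)
    return trajectory


# With zeta = 1.0 (critical damping) the actuator approaches the target
# smoothly; a smaller zeta would give a springier, overshooting motion.
positions = simulate_second_order(target=1.0)
```

Changing `omega` and `zeta` changes how quickly and how smoothly the same target expression is reached, which is one simple way to vary the dynamics of an otherwise fixed expression.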

5.2 Facial Expression Generation is Automated

Some of the studies automatically generate facial expres-
sions on robots. Unlike hand-coded techniques where the
commands for the position of features like eyes and mouth
are sent from the computer, here, the facial expressions are
generated using machine learning techniques such as neural
networks and RL.

Breazeal et al. [10] presented a robot Leonardo that can
imitate human facial expressions. They use neural networks
to learn the direct mapping of a human’s facial expressions
onto Leonardo’s own joint space. In Horii et al. [50], the robot
does not directly imitate the human but estimates the correct
emotion and generates the estimated emotion using RBM.
RBM [46] is a generative model that represents the generative
process of data distribution and latent representation, and can
generate data from latent signals [98,117,123].
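
How an RBM generates data from latent signals can be sketched with a very small Bernoulli RBM trained by one-step contrastive divergence. This is a generic textbook-style sketch, not the model of Horii et al. [50]; the class name `TinyRBM`, the layer sizes and the learning rate are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyRBM:
    """Minimal Bernoulli RBM with binary visible and hidden units."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_update(self, v0, lr=0.1):
        """One contrastive-divergence (CD-1) step on a batch of binary rows."""
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        pv1 = self.visible_probs(h0)   # reconstruction of the visible layer
        ph1 = self.hidden_probs(pv1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += lr * (v0 - pv1).mean(axis=0)
        self.b_h += lr * (ph0 - ph1).mean(axis=0)

    def generate(self, h):
        """Map a latent (hidden) vector to visible-unit probabilities."""
        return self.visible_probs(np.asarray(h, dtype=float))


# Toy usage: four binary "expression features", two latent units.
rbm = TinyRBM(n_visible=4, n_hidden=2)
batch = np.array([[1.0, 1.0, 0.0, 0.0]] * 8)
for _ in range(200):
    rbm.cd1_update(batch)
generated = rbm.generate([1.0, 0.0])
```

The `generate` call is the step that corresponds to producing an expression from a latent signal; in a real system the visible units would be multimodal features rather than four toy bits.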


Table 4 Detailed information about studies on emotion expression and Human Robot Interaction (HRI)

References | Expression mode | Expressed emotions | Robot | Coding | Major findings
Bennett and Sabanovic [7] | Face | Neutral, sadness, happiness, anger, fear, surprise | MiRAE | Hand-coded | Identified minimal features, i.e. movement of eyes, eyebrows, mouth and neck are sufficient to identify the facial expression
Breazeal [9] | Face | Anger, calm, disgust, fear, content, interest, sorrow, surprise, tired | Kismet | Hand-coded | Kismet can generate emotions using an interpolation-based technique over a 3-D space where the three dimensions correspond to valence, arousal and stance
Breazeal et al. [10] | Face | - | Leonardo | Automated (Neural Network) | Built a robot capable of learning how to imitate facial expressions from simple imitative games played with a human, using biologically inspired mechanisms
Churamani et al. [17] | Face | Anger, happiness, neutral, sadness, surprise | Nico | Automated (RL) | Explored a continuous representation of expression on the NICO robot using the complete face LED matrix to generate expressions
Cid et al. [19] | Face | Sad, happy, fear, anger, neutral | Muecas | Hand-coded | The output of the Bayesian classifier is imitated by the robotic head Muecas
Esfandbod et al. [33] | Face | Neutral, happy, sad, surprised, angry | RASA | Hand-coded | From the subjects’ viewpoint, the system’s performance was fairly promising with a score of 4.1 out of 5
Ge et al. [38] | Face | Happy, fear, surprise, disgust, sadness, anger | Robotic head | Hand-coded | The robot head are triggered to imitate human facial expressions by the emotion generator engine and can generate a vivid imitation according to the tester’s facial expression
Horii et al. [50] | Face, hands, voice | Happiness, neutral, anger, sadness | iCub | Automated (RBM) | The robot does not copy the human’s expressions directly but generates expressions on its own after estimating the human’s emotion
Ilic et al. [54] | Face | Anger, disgust, fear, happiness, sadness, surprise, indifference | Aisoy | Hand-coded | Learned a model of the emotional value of the robot’s facial expressions without humans’ explicit feedback
Kim et al. [61] | Face | Neutral, anger, happiness, fear, sadness, surprise | Ulkni | Hand-coded | Introduced an artificial facial expression imitation system using a robot head
Kishi et al. [64] | Face | Anger, sadness, fear, disgust, surprise, happy | KOBIAN | Hand-coded | Developed a new head for biped walking robot KOBIAN that could express the 6 basic emotions
Liu et al. [81] | Face | Happy, angry, surprise, fear, disgust, sad, neutral | Mobile robot | Hand-coded | Generated facial expressions adapting to human emotions using a four-layer framework designed for the system to recognize human emotion based on HRI
Maeda and Geshi [84] | Face | Anger, contempt, disgust, fear, happiness, neutral, sadness, surprise | TAPIA | Hand-coded | An interactive communication method of a human and a robot based on the Markovian emotional model (MEM) by using the facial expression, was significantly better than using an identical, symmetric or random emotion interaction methods
Meghdari et al. [91] | Face | Happy, sad, angry, surprised, disgusted, neutral | Alice | Hand-coded | A humanoid social robot “Alice” imitates human facial expressions in real-time
Park et al. [102] | Face | Anger, disgust, fear, happiness, sadness, surprise, dislike | Robotic head | Hand-coded | Made diverse facial expressions by changing their dynamics and increased the lifelikeness of a robot by adding secondary actions such as physiological movements (eye blinking and sinusoidal motions concerning respiration)
Yoo et al. [141] | Face | Angry, disgust, fear, happiness, sadness, surprise | Robotic head | Hand-coded | A fuzzy integral-based generation method of composite facial expressions was proposed and demonstrated its effectiveness through the experiment with the developed robotic head


Li and Hashimoto [73] developed a KANSEI communication
system based on emotional synchronization. KANSEI is
a Japanese term that means emotions, feeling, sensitivity etc.
The KANSEI communication system first recognizes human
emotion and maps the recognized emotion to the emotion
generation space. Finally, the robot expresses its emotion
synchronized with the human’s emotion in the emotion gen-
eration space. When the human changes his/her emotion, the
robot also synchronizes its emotion with the human’s emo-
tion, establishing a continuous communication between the
human and the robot. It was found that the subjects became
more comfortable with the robot and communicated more
with the robot when there was emotional synchronization.

In Churamani et al. [17], the robot Nico learned the correct
combination of eyebrow and mouth wavelet parameters
to express its mood using RL. The learned expressions looked
slightly distorted but were sufficient to distinguish between
various expressions. The robot could also generate expressions
that were not limited to the basic five expressions that
were learned. For a mixed emotional state (for example, anger
mixed with sadness), the model was able to generate novel
expression representations representing the mixed state of
the mood.

Summary of findings In all of the above studies, the
robots learn to generate facial expressions automatically
using machine learning techniques. While Breazeal et al. [10]
and Li and Hashimoto [73] used direct mapping of human facial
expressions, Horii et al. [50] generated the estimated human’s
emotion on the robot. In Churamani et al. [17], the robot was
able to associate the learned expressions with the context of
the conversation.

6 Discussion

6.1 Summary of the State of the Art

There are already studies with high accuracy (greater than
90%) in facial expression recognition on the CK+, Jaffe and
Oulu-Casia datasets (see Table 3a, b). The accuracies on the
CK+, Jaffe and Oulu-Casia datasets have been as high as
100%, 99.8% and 92.89%, respectively. In comparison, the
accuracy for facial expression recognition in real-time is
not as high.

Zhang et al. [146] used a deep convolutional network
(DCN) that had an accuracy of 98.9% on the CK+ dataset and
55.27% on the Static Facial Expressions in the Wild (SFEW)
dataset. Here, the same network produced very different
results for two different datasets. SFEW [29] consists of
images close to real-world conditions, extracted from movies.
The database covers unconstrained facial expressions, varied
head poses, large age range, occlusions, varied focus, different
resolution of faces, and close to real-world illumination.

In Zhang et al. [146] the accuracy for “in the wild” settings
was considerably lower than on the CK+ dataset, implying that
expression recognition algorithms still cannot handle
the variations in environment, head poses etc. in real-world
settings.

Table 5 provides possible categories for facial recognition
in the wild. It contains the basic emotional facial
expressions, situation-specific face occlusions, permanent
face features, face movements, situation-specific expressions
and side activities during facial expressions.

Most of the current research in facial expression recognition
relates to the first category of basic emotional facial
expressions. Survey articles on facial expression recognition
are cited in Table 5 [11,13,21,42,43,71,88,109,112].
For more details on individual studies, refer to Table 3.
Facial expression recognition in the presence of
situation-specific face occlusions like a mouth–nose mask, glasses,
hand in front of face etc. has also been studied [74,75,131].
Pose invariant facial expression recognition when the face is
moving or turned sideways has also been partially studied
[96,113,143,144].

For facial expression generation, robots can make certain
basic facial expressions by moving their eyes, mouth
and neck. However, they cannot make as many expressions as
human beings due to the limited number of DoFs present in a
robot’s face. There are relatively few studies on automated
facial expression generation in robots [10,17,50,73]. While
robots are capable of displaying facial expressions
when the movement of the eyes and mouth is coded manually,
fewer studies make a robot learn to display
its facial expressions automatically.

Most of the studies on facial expression generation have
been carried out on robotic heads or humanoid robots like
iCub and Nico [e.g., 9,10,17,50]. In Becker-Asano and Ishiguro
[5], Geminoid F’s facial actuators are tuned such that the
readability of its facial expressions is comparable to a real
person’s static display of emotional expression. It was found
that the android’s emotional expressions were more ambiguous
than those of a real person and ‘fear’ was often confused
with ‘surprise’.

An advantage of automated facial expression generation
over hand-coded facial expression generation is that in
automated facial expression generation, a robot could learn mixed
expressions rather than simply the learned expressions. Unlike in
hand-coded facial expression generation, where a robot can
only express the emotions that it has learned, in Churamani
et al. [17], the robot could express complex emotions that
were made up of a combination of emotions.

6.2 Future Research

Although facial expression recognition under specific settings
has high accuracy and robots can express basic emotions
through facial expressions, there are several possible
directions for future research in this area.

Table 5 Possible categories for facial recognition in the wild

Category | Description | Examples | Related studies | Applied algorithm
Basic emotional facial expressions | Expression of basic emotions, such as FACS by Ekman (2001) | Happiness, sadness, anger, disgust, fear, surprise, neutral | [11,13,21,42,43,71,88,109,112] | KNN, HMM, RNN, CNN, SVM and LSTM
Face movements | Movements of the face itself (as a whole) | Moving forward, backward, turning around, turning sideways, jerking the head forward, spinning | [96,113,143,144] | GAN, CNN
Situation-specific face occlusions | Occlusions of the face due to situational requirements | Mouth–nose mask, glasses, hand in front of face, resting hand on mouth, headset | [74,75,131] | CNN
Permanent face features | Permanently installed features of the face | Artificial eye, beard | [74,75,131] | CNN
Situation-specific expressions | Facial expressions of the face due to situational requirements | Nodding, blinking, looking down, yawning, shake head in agreement, eye roll, closing eyes, lip biting, pursed lips, sticking tongue out, winking | - | -
Side activities during facial expression | Additional activities during the expressions of emotions | Talking, eating, drinking, brushing teeth, fixing hair, combing hair, biting nails, cleaning eyes with a hand, coughing, supporting the face with a hand, itching on face, blowing nose, sneezing, rubbing eyes, sipping through a straw, applying cream | - | -

(1) examples were generated based on 50 life observations and 50 video-based observations by the authors; (2) bold = very well understood in research; italic = partly understood in research; bold italic = hardly understood in research

Suggestion 1: Facial expression recognition in the wild
needs greater emphasis.
To efficiently recognize facial expressions in real-time and in
a real-world environment, the robot should be able to perform
facial expression recognition with varied head poses, varied
focus, presence of occlusions, different resolutions of the face
and varied illumination conditions. The studies that perform
facial expression recognition in real-time are limited to a
laboratory environment, which is very different from a
real-world scenario. A valuable study would be one in which facial
expression recognition in the wild is performed.

Some studies perform facial expression recognition in the
wild, but their accuracy is much lower than the accuracy on
predefined datasets like CK+, Jaffe, MMI etc. To increase
the efficiency of facial expression recognition in real-world
scenarios, the performance of facial expression recognition
in the wild needs to be improved. This can also be used to
recognize facial expressions in real-time. Based on this, a
direct adaptation of emotions would make HRI smoother.

Suggestion 2: Facial expressions during activities like
talking, nodding etc. need to be studied.
Situation-specific expressions (nodding, yawning, blinking,
looking down) and side activities during facial expressions
(talking, eating, drinking, sneezing) in Table 5 have not been
studied. To understand vivid expressions, it is required to
be able to recognize facial expressions for all categories.
Humans also express emotions while interacting with some-
one verbally, such as smiling while speaking when they are
happy. In this case, it should be possible to recognize a smile
during speech.

Suggestion 3: Combine facial expression recognition with
the data from other modalities such as voice, text, body
gestures and physiological data to improve the emotion
recognition rate.
Although this overview focuses on facial expression recogni-
tion, it may be possible to control one’s face and not express
the emotion one is truly experiencing. Some studies combine
facial expression recognition with audio data, body gestures
or physiological data for an improved emotion recognition
[41,53,83]. There are very few studies that combine facial
data with both audio and physiological data [106,107] and
studies that analyze all modalities (face, voice, text, body
gestures and physiological signals) have not been found.
Humans can recognize the emotion of a person quickly and
effectively by taking into account their facial expression,
body gestures, voice and words. Combining facial, audio,
text and body gestures with physiological data could lead to
a higher emotion recognition rate by machine learning algo-
rithms than by humans.
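
One simple way to combine modalities is late fusion, where each modality produces its own class probabilities and these are averaged. The sketch below assumes such per-modality probabilities already exist; the function name `late_fusion`, the modality names and the example numbers are hypothetical and do not come from any of the cited studies.

```python
import numpy as np

EMOTIONS = ["anger", "fear", "happiness", "neutral", "sadness", "surprise"]


def late_fusion(modality_probs, weights=None):
    """Fuse per-modality class probabilities by a weighted average.

    modality_probs: dict mapping modality name -> probabilities over EMOTIONS.
    weights: optional dict mapping modality name -> non-negative weight.
    Returns the winning emotion label and the fused probability vector.
    """
    names = sorted(modality_probs)
    probs = np.array([modality_probs[name] for name in names], dtype=float)
    if weights is None:
        w = np.ones(len(names))
    else:
        w = np.array([weights[name] for name in names], dtype=float)
    w = w / w.sum()
    fused = w @ probs
    fused = fused / fused.sum()
    return EMOTIONS[int(np.argmax(fused))], fused


# A controlled, smiling face combined with voice and physiological
# signals that point toward sadness (made-up numbers).
observations = {
    "face": [0.05, 0.05, 0.50, 0.30, 0.05, 0.05],
    "voice": [0.05, 0.05, 0.20, 0.10, 0.55, 0.05],
    "physio": [0.05, 0.05, 0.10, 0.10, 0.65, 0.05],
}
label, fused = late_fusion(observations)
```

With equal weights the fused decision follows the voice and physiological signals rather than the face alone, which is the situation described above where a person controls their facial expression. Whether late fusion or a joint model works better is an empirical question that this sketch does not answer.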

Suggestion 4: How should a robot react towards a given
human emotion?

In HHI, a human’s reaction to a given emotion is either a
result of parallel empathy or reactive empathy [26]. It should
be studied with which emotion a robot should appropriately
react to a given human emotion. Moreover, it needs to be
studied whether a robot should be able to express negative
emotions. Most of the existing studies allow a robot to
express basic emotions (anger, fear, happiness, neutral,
sadness, surprise). It may be reasonable for a robot to react with
a sad expression when a human being expresses anger. But
should a robot be able to express extreme emotions such as
anger?

For facial expression generation, while robots are capa-
ble of displaying facial expressions both static and dynamic,
they are unable to generate facial expressions when they
are speaking. For example, robots could smile while talk-
ing to express their happiness or they could speak with a
frown when angry. Robots could also express their emotions
through partial facial or bodily gestures instead of showing
a full face expression. For example, tilting head down to
express sadness, frowning to express anger, eyes wide open
to express surprise and raising eyebrows.

Suggestion 5: Robots should be able to recognize and gen-
erate facial expressions with various intensities.
Emotions form a continuous range and can have various
intensities. If one is less happy, one would smile less. Simi-
larly, if someone is very happy, the smile would also be big.
It should be possible to recognize not just the emotion but
the intensity of emotion. Moreover, in most of the existing
studies, robots express their emotions with only one config-
uration per emotion. Robots should also be able to express
their emotions with different intensities. Finally, it needs to
be studied whether the intensity of emotion with which a
robot reacts to a given human emotion has any effects on the
human and whether these effects are positive or negative.
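
Expressing an emotion at different intensities can be sketched as a linear interpolation between a neutral pose and a full-intensity pose. The actuator names and pose values below are hypothetical, and linear scaling is only one possible choice; a real robot face may need non-linear or per-actuator mappings.

```python
def scale_expression(neutral_pose, full_pose, intensity):
    """Interpolate every actuator between the neutral and the full pose.

    intensity = 0.0 gives the neutral face, 1.0 the full expression.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    return {
        joint: neutral_pose[joint]
        + intensity * (full_pose[joint] - neutral_pose[joint])
        for joint in neutral_pose
    }


# Hypothetical actuator targets for a neutral face and a full smile.
NEUTRAL = {"lip_corner": 0.0, "cheek_raise": 0.0, "eye_open": 0.5}
FULL_SMILE = {"lip_corner": 0.8, "cheek_raise": 0.6, "eye_open": 0.4}

slight_smile = scale_expression(NEUTRAL, FULL_SMILE, 0.25)
broad_smile = scale_expression(NEUTRAL, FULL_SMILE, 1.0)
```

A recognition system could report the same scalar intensity alongside the emotion label, so that the robot's response intensity can be matched to, or deliberately differ from, the intensity observed in the human.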

Suggestion 6: Robots should be able to express their emo-
tions through a combination of body gestures and facial
expressions.
While in this overview, we focus on robotic facial expres-
sions, there are other articles where emotional expression is
performed through the robot’s body postures [4,20,22,55,86,
90]. A potential future study could be to compare the robot’s
facial expressions with robot’s bodily expressions and also
with the combination of facial and bodily expressions to see
if there is any difference in the recognition of these.

Suggestion 7: Robots should be able to both recognize and
generate complex emotions such as that of thinking, calm and
bored states.
For both facial expression recognition and generation, there
is a need to go beyond the basic seven emotions to recognizing
and generating more complex emotions such as calm,
fatigued, bored etc. It might be difficult to generate complex
emotions given the hardware limitations of the robot, but if
this is made possible, robots could express a wider range of
emotions similar to human beings.

7 Conclusion

This overview emphasizes the recognition of human facial
expressions and the generation of robotic facial expres-
sions. There are already plenty of studies having high
accuracy for facial expression recognition on pre-existing
datasets. Accuracy of facial expression recognition in the
wild is considerably lower than in experiments
conducted under controlled laboratory conditions. For
human facial emotion recognition, future work would be to
improve emotion recognition for non-frontal head poses in
presence of occlusions (i.e. emotion recognition in the wild).
It should be made possible to recognize emotions during
speech as well as emotions with varying intensities. In the case
of facial expression generation in robots, robots are capable
of making the basic facial expressions. Few studies perform
autonomous facial expression generation in robots. In the future,
there could be studies comparing robotic facial expressions with
the robot’s bodily expressions and also with a combination
of facial and bodily expressions to see if there is any difference
in recognizing these. Robots should be able to express
their emotion with partial bodily or facial gestures while
speaking. They should also be able to express their emotions with
various intensities instead of a single configuration per emotion.
Lastly, there is a need to go beyond the basic seven
expressions for both facial expression recognition and generation.

Acknowledgements The authors thank Vignesh Prasad for his insight-
ful comments.

Funding Open Access funding enabled and organized by Projekt
DEAL. This research was funded by the German Research Founda-
tion (DFG, Deutsche Forschungsgemeinschaft). The authors also thank
the ZEVEDI Hessen and the leap in time foundation for the grateful
funding of the project.

Declarations

Conflict of interest The authors declare that they have no conflict of
interest.

Ethical approval The authors declare that there are no compliance
issues with this research.

Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing, adap-
tation, distribution and reproduction in any medium or format, as
long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons licence, and indi-
cate if changes were made. The images or other third party material
in this article are included in the article’s Creative Commons licence,

unless indicated otherwise in a credit line to the material. If material
is not included in the article’s Creative Commons licence and your
intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit
http://creativecommons.org/licenses/by/4.0/.

References

1. Ahmed TU, Hossain S, Hossain MS, ul Islam R, Andersson K
(2019) Facial expression recognition using convolutional neural
network with data augmentation. In: 2019 joint 8th international
conference on informatics, electronics vision (ICIEV) and 2019
3rd international conference on imaging, vision pattern recogni-
tion (icIVPR), pp 336–341

2. Barros P, Weber C, Wermter S (2015) Emotional expression
recognition with a cross-channel convolutional neural network for
human–robot interaction. In: 2015 IEEE-RAS 15th international
conference on humanoid robots (humanoids), pp 582–587

3. Bavelas J, Gerwing J (2011) The listener as addressee in face-to-
face dialogue. Int J Listen 25:178–198

4. Beck A, Cañamero L, Hiolle A, Damiano L, Cosi P, Tesser F,
Sommavilla G (2013) Interpretation of emotional body language
displayed by a humanoid robot: a case study with children. Int J
Soc Robot 5(3):325–334

5. Becker-Asano C, Ishiguro H (2011) Evaluating facial displays of
emotion for the android robot geminoid f, pp 1–8

6. Bengio Y, Simard P, Frasconi P (1994) Learning long-term
dependencies with gradient descent is difficult. IEEE Trans Neural Netw
5(2):157–166

7. Bennett CC, Sabanovic S (2014) Deriving minimal features for
human-like facial expressions in robotic faces. Int J Soc Robot
6:367–381

8. Bera A, Randhavane T, Prinja R, Kapsaskis K, Wang A,
Gray K, Manocha D (2019) The emotionally intelligent robot:
improving social navigation in crowded environments. ArXiv
arXiv:1903.03217

9. Breazeal C (2003) Emotion and sociable humanoid robots. Int J
Hum Comput Stud 59(1–2):119–155

10. Breazeal C, Buchsbaum D, Gray J, Gatenby D, Blumberg B
(2005) Learning from and about others: towards using imitation
to bootstrap the social understanding of others by robots. Artif
Life 11:31–62. https://doi.org/10.1162/1064546053278955

11. Buciu I, Kotsia I, Pitas I (2005) Facial expression analysis under
partial occlusion, pp v/453 –v/456, vol 5

12. Byeon YH, Kwak KC (2014) Facial expression recognition using
3d convolutional neural network. Int J Adv Comput Sci Appl
5(12)

13. Canedo D (2019) Facial expression recognition using computer
vision: a systematic review. Appl Sci.
https://doi.org/10.3390/app9214678

14. Carcagnì P, Del Coco M, Leo M, Distante C (2015) Facial
expression recognition and histograms of oriented gradients: a
comprehensive study. Springerplus 4(1):645

15. Chen H, Gu Y, Wang F, Sheng W (2018) Facial expression
recognition and positive emotion incentive system for human–robot
interaction. In: 2018 13th world congress on intelligent control
and automation (WCICA), pp 407–412

16. Chen X, Yang X, Wang M, Zou J (2017) Convolution neural
network for automatic facial expression recognition. In: 2017
international conference on applied system innovation (ICASI),
pp 814–817

17. Churamani N, Barros P, Strahl E, Wermter S (2018) Learning
empathy-driven emotion expressions using affective modulations


18. Cid F, Moreno J, Bustos P, Núñez P (2014) Muecas: a multi-sensor
robotic head for affective human robot interaction and imitation.
Sensors (Basel, Switzerland) 14:7711–7737

19. Cid F, Prado JA, Bustos P, Núñez P (2013) A real time and robust
facial expression recognition and imitation approach for affective
human–robot interaction using gabor filtering. In: 2013 IEEE/RSJ
international conference on intelligent robots and systems, pp
2188–2193. https://doi.org/10.1109/IROS.2013.6696662

20. Cohen I (2010) Recognizing robotic emotions: facial versus body
posture expression and the effects of context and learning. Mas-
ter’s thesis

21. Corneanu CA, Simón MO, Cohn JF, Guerrero SE (2016) Survey
on rgb, 3d, thermal, and multimodal approaches for facial expres-
sion recognition: history, trends, and affect-related applications.
IEEE Trans Pattern Anal Mach Intell 38(8):1548–1568

22. Costa S, Soares F, Santos C (2013) Facial expressions and ges-
tures to convey emotions with a humanoid robot. In: International
conference on social robotics. Springer, pp 542–551

23. Dandıl E, Özdemir R (2019) Real-time facial emotion classifica-
tion using deep learning. Data Sci Appl 2(1):13–17

24. Datcu D, Rothkrantz L (2007) Facial expression recognition
in still pictures and videos using active appearance models. A
comparison approach, p 112.
https://doi.org/10.1145/1330598.1330717

25. Dautenhahn K (2007) Methodology & themes of human–robot
interaction: a growing research field. Int J Adv Robot Syst 4:15

26. Davis M (2018) Empathy: a social psychological approach

27. Deng J, Pang G, Zhang Z, Pang Z, Yang H, Yang G (2019) cgan
based facial expression recognition for human–robot interaction.
IEEE Access 7:9848–9859

28. de Graaf M, Allouch S, Van Dijk JA (2016) Long-term acceptance
of social robots in domestic environments: insights from a user’s
perspective

29. Dhall A, Goecke R, Lucey S, Gedeon T (2011) Static facial
expression analysis in tough conditions: data, evaluation protocol and
benchmark. In: 2011 IEEE international conference on computer
vision workshops (ICCV Workshops), pp 2106–2112

30. Ding H, Zhou SK, Chellappa R (2017) Facenet2expnet: regular-
izing a deep face recognition net for expression recognition. In:
2017 12th IEEE international conference on automatic face ges-
ture recognition (FG 2017), pp 118–126

31. Drolet A, Morris MW (2000) Rapport in conflict resolution:
accounting for how face-to-face contact fosters mutual cooper-
ation in mixed-motive conflicts. J Exp Soc Psychol 36:26–50

32. Elaiwat S, Bennamoun M, Boussaïd F (2016) A spatio-temporal
rbm-based model for facial expression recognition. Pattern
Recogn 49:152–161

33. Esfandbod A, Rokhi Z, Taheri A, Alemi M, Meghdari A (2019)
Human–robot interaction based on facial expression imitation. In:
2019 7th international conference on robotics and Mechatronics
(ICRoM), pp 69–73

34. Faria DR, Vieira M, Faria FCC, Premebida C (2017) Affective
facial expressions recognition for human–robot interaction. In:
2017 26th IEEE international symposium on robot and human
interactive communication (RO-MAN), pp 805–810

35. Feil-Seifer D, Matarić MJ (2011) Socially assistive robotics. IEEE
Robot Autom Mag 18(1):24–31

36. Ferreira PM, Marques F, Cardoso JS, Rebelo A (2018) Physiological
inspired deep neural networks for emotion recognition. IEEE
Access 6:53930–53943

37. Fix E (1951) Discriminatory analysis: nonparametric discrimina-
tion, consistency properties. USAF School of Aviation Medicine

38. Ge S, Wang C, Hang C (2008) A facial expression imitation system
in human robot interaction

39. Gers FA, Schmidhuber J (2000) Recurrent nets that time and
count. In: Proceedings of the IEEE-INNS-ENNS international
joint conference on neural networks. IJCNN 2000. Neural computing:
new challenges and perspectives for the new millennium.
IEEE, vol 3, pp 189–194

40. Gogić I, Manhart M, Pandžić I, Ahlberg J (2018) Fast facial
expression recognition using local binary features and shallow
neural networks. Vis Comput. https://doi.org/10.1007/s00371-018-1585-8

41. Gunes H, Piccardi M (2007) Bi-modal emotion recognition
from expressive face and body gestures. J Netw Comput Appl
30(4):1334–1345

42. Gunes H, Schuller B (2013) Categorical and dimensional affect
analysis in continuous input: current trends and future directions.
Image Vis Comput 31:120–136

43. Gunes H, Schuller B, Pantic M, Cowie R (2011) Emotion repre-
sentation, analysis and synthesis in continuous space: a survey.
Face Gesture 2011:827–834

44. Hamester D, Barros P, Wermter S (2015) Face expression recognition
with a 2-channel convolutional neural network, pp 1–8.
https://doi.org/10.1109/IJCNN.2015.7280539

45. Hazar M, Fendri E, Hammami M (2015) Face recognition through
different facial expressions. J Signal Process Syst.
https://doi.org/10.1007/s11265-014-0967-z

46. Hinton G, Salakhutdinov R (2006) Reducing the dimensionality
of data with neural networks. Science (New York, NY) 313:504–
7. https://doi.org/10.1126/science.1127647

47. Hochreiter S, Schmidhuber J (1997) Long short-term memory.
Neural Comput 9(8):1735–1780

48. Hoffman G, Breazeal C (2006) Robotic partners’ bodies and
minds: an embodied approach to fluid human–robot collabora-
tion. In: AAAI workshop—technical report

49. Hoffman G, Zuckerman O, Hirschberger G, Luria M, Shani Sher-
man T (2015) Design and evaluation of a peripheral robotic
conversation companion. In: Proceedings of the Tenth Annual
ACM/IEEE international conference on human–robot interaction,
HRI ’15, pp 3–10. Association for Computing Machinery, New
York, NY, USA. https://doi.org/10.1145/2696454.2696495

50. Horii T, Nagai Y, Asada M (2016) Imitation of human expres-
sions based on emotion estimation by mental simulation. Paladyn
J Behav Robot. https://doi.org/10.1515/pjbr-2016-0004

51. Hossain MS, Muhammad G (2017) An emotion recognition sys-
tem for mobile applications. IEEE Access 5:2281–2287

52. Hua W, Dai F, Huang L, Xiong J, Gui G (2019) Hero: human
emotions recognition for realizing intelligent internet of things.
IEEE Access 7:24321–24332

53. Huang Y, Yang J, Liao P, Pan J (2017) Fusion of facial expres-
sions and eeg for multimodal emotion recognition. Comput Intell
Neurosci 2017:1–8. https://doi.org/10.1155/2017/2107451

54. Ilic D, Žužić I, Brscic D (2019) Calibrate my smile: robot learning
its facial expressions through interactive play with humans, pp
68–75

55. Inthiam J, Hayashi E, Jitviriya W, Mowshowitz A (2019) Mood
estimation for human–robot interaction based on facial and bodily
expression using a hidden Markov model. In: 2019 IEEE/SICE
international symposium on system integration (SII). IEEE, pp
352–356

56. Inthiam J, Mowshowitz A, Hayashi E (2019) Mood perception
model for social robot based on facial and bodily expression using
a hidden Markov model. J Robot Mechatron 31:629–638

57. Jiang L, Cai Z, Wang D, Jiang S (2007) Survey of improving
k-nearest-neighbor for classification. In: Fourth international conference
on fuzzy systems and knowledge discovery (FSKD 2007),
vol 1, pp 679–683

58. Kabir MH, Salekin MS, Uddin MZ, Abdullah-Al-Wadud M
(2017) Facial expression recognition from depth video with pat-
terns of oriented motion flow. IEEE Access 5:8880–8889

59. Kanda T, Hirano T, Eaton D, Ishiguro H (2004) Interactive robots
as social partners and peer tutors for children: a field trial. Hum
Comput Interact (Special issues on human–robot interaction)
19:61–84

60. Kar NB, Babu KS, Jena SK (2017) Face expression recognition
using histograms of oriented gradients with reduced features. In:
Raman B, Kumar S, Roy PP, Sen D (eds) Proceedings of inter-
national conference on computer vision and image processing.
Springer, Singapore, pp 209–219

61. Kim DH, Jung S, An K, Lee H, Chung M (2006) Development of
a facial expression imitation system, pp 3107–3112

62. Kim J, Kim B, Roy PP, Jeong D (2019) Efficient facial expression
recognition algorithm based on hierarchical deep neural network
structure. IEEE Access 7:41273–41285

63. Kirgis FP, Katsos P, Kohlmaier M (2016) Collaborative robotics.
Springer, Cham, pp 448–453

64. Kishi T, Otani T, Endo N, Kryczka P, Hashimoto K, Nakata K,
Takanishi A (2012) Development of expressive robotic head for
bipedal humanoid robot. In: 2012 IEEE/RSJ international confer-
ence on intelligent robots and systems, pp 4584–4589

65. Kotsia I, Nikolaidis N, Pitas I (2007) Facial expression recognition
in videos using a novel multi-class support vector machines vari-
ant. In: 2007 IEEE international conference on acoustics, speech
and signal processing–ICASSP ’07, vol 2, pp II-585–II-588

66. Kotsia I, Pitas I (2007) Facial expression recognition in image
sequences using geometric deformation features and support vec-
tor machines. IEEE Trans Image Process 16(1):172–187

67. Kozima H, Nakagawa C, Yasuda Y (2005) Interactive robots for
communication-care: a case-study in autism therapy. In: ROMAN
2005. IEEE international workshop on robot and human interactive
communication, pp 341–346

68. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature
521(7553):436–444

69. LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W,
Jackel LD (1989) Backpropagation applied to handwritten
zip code recognition. Neural Comput 1(4):541–551

70. LeCun Y, Kavukcuoglu K, Farabet C (2010) Convolutional net-
works and applications in vision. In: Proceedings of 2010 IEEE
international symposium on circuits and systems, pp 253–256

71. Li S, Deng W (2020) Deep facial expression recognition: a survey.
IEEE Trans Affect Comput, pp 1–1

72. Li TS, Kuo P, Tsai T, Luan P (2019) Cnn and lstm based facial
expression analysis model for a humanoid robot. IEEE Access
7:93998–94011

73. Li Y, Hashimoto M (2011) Effect of emotional synchronization
using facial expression recognition in human–robot communica-
tion

74. Li Y, Zeng J, Shan S, Chen X (2018) Occlusion aware facial
expression recognition using cnn with attention mechanism. IEEE
Trans Image Process, pp 1–1

75. Li Y, Zeng J, Shan S, Chen X (2018) Patch-gated cnn for
occlusion-aware facial expression recognition. In: 2018 24th international
conference on pattern recognition (ICPR), pp 2209–2214

76. Liang D, Liang H, Yu Z, Zhang Y (2019) Deep convolutional bil-
stm fusion network for facial expression recognition. Vis Comput
36:499–508

77. Liliana DY, Basaruddin C, Widyanto MR (2017) Mix emotion
recognition from facial expression using svm-crf sequence classi-
fier. In: Proceedings of the international conference on algorithms,
computing and systems, ICACS ’17. Association for Computing
Machinery, New York, NY, USA, pp 27–31

78. Lipton ZC, Berkowitz J, Elkan C (2015) A critical review of
recurrent neural networks for sequence learning. arXiv preprint
arXiv:1506.00019

79. Liu K, Hsu C, Wang W, Chiang H (2019) Real-time facial expression
recognition based on cnn. In: 2019 international conference
on system science and engineering (ICSSE), pp 120–123.
https://doi.org/10.1109/ICSSE.2019.8823409

80. Liu P, Choo KKR, Wang L, Huang F (2017) Svm or deep learning?
A comparative study on remote sensing image classification. Soft
Comput 21(23):7053–7065

81. Liu ZT, Wu M, Cao W, Chen LF, Xu J, Zhang R, Zhou M, Mao
J (2017) A facial expression emotion recognition based human–
robot interaction system. IEEE/CAA J Autom Sin 4:668–676.
https://doi.org/10.1109/JAS.2017.7510622

82. Lopez-Rincon A (2019) Emotion recognition using facial expres-
sions in children using the nao robot. In: 2019 international con-
ference on electronics, communications and computers (CONI-
ELECOMP), pp 146–153

83. Ma F, Zhang W, Li Y, Huang SL, Zhang L (2020) Learning
better representations for audio-visual emotion recognition with
common information. Appl Sci 10:7239. https://doi.org/10.3390/app10207239

84. Maeda Y, Geshi S (2018) Human–robot interaction using Marko-
vian emotional model based on facial recognition. In: 2018 Joint
10th international conference on soft computing and intelligent
systems (SCIS) and 19th international symposium on advanced
intelligent systems (ISIS). IEEE, pp 209–214

85. Mannan MA, Lam A, Kobayashi Y, Kuno Y (2015) Facial expression
recognition based on hybrid approach. In: Huang DS, Han K
(eds) Adv Intell Comput Theor Appl. Springer, Cham, pp 304–310

86. Marmpena M, Lim A, Dahl TS, Hemion N (2019) Generating
robotic emotional body language with variational autoencoders.
In: 2019 8th international conference on affective computing and
intelligent interaction (ACII). IEEE, pp 545–551

87. Martin C, Werner U, Gross H (2008) A real-time facial expression
recognition system based on active appearance models using gray
images and edge images. In: 2008 8th IEEE international conference
on automatic face gesture recognition, pp 1–6.
https://doi.org/10.1109/AFGR.2008.4813412

88. Martinez B, Valstar MF, Jiang B, Pantic M (2019) Automatic
analysis of facial actions: a survey. IEEE Trans Affect Comput
10(3):325–347

89. Mayya V, Pai RM, Pai MMM (2016) Automatic facial expression
recognition using DCNN. Proc Comput Sci 93:453–461.
https://doi.org/10.1016/j.procs.2016.07.233

90. McColl D, Nejat G (2014) Recognizing emotional body language
displayed by a human-like social robot. Int J Soc Robot 6(2):261–
280

91. Meghdari A, Shouraki S, Siamy A, Shariati A (2016) The real-
time facial imitation by a social humanoid robot

92. Mehrabian A (1968) Communication without words. Psychol
Today 2:53–56

93. Meng Z, Liu P, Cai J, Han S, Tong Y (2017) Identity-aware con-
volutional neural network for facial expression recognition. In:
2017 12th IEEE international conference on automatic face gesture
recognition (FG 2017), pp 558–565

94. Minaee S, Abdolrashidi A (2019) Deep-emotion: facial expres-
sion recognition using attentional convolutional network. arXiv
preprint arXiv:1902.01019

95. Mistry K, Zhang L, Neoh SC, Lim CP, Fielding B (2017) A micro-GA
embedded PSO feature selection approach to intelligent facial
emotion recognition. IEEE Trans Cybern 47(6):1496–1509

96. Moeini A, Moeini H, Faez K (2014) Pose-invariant facial expres-
sion recognition based on 3d face reconstruction and synthesis
from a single 2d image. In: 2014 22nd international conference
on pattern recognition, pp 1746–1751.
https://doi.org/10.1109/ICPR.2014.307

97. Mollahosseini A, Chan D, Mahoor MH (2016) Going deeper
in facial expression recognition using deep neural networks. In:
2016 IEEE winter conference on applications of computer vision
(WACV), pp 1–10

98. Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng AY (2011) Mul-
timodal deep learning. In: ICML

99. Nicolescu MN, Mataric MJ (2001) Learning and interacting in
human–robot domains. IEEE Trans Syst Man Cybern Part A Syst
Hum 31(5):419–430

100. Nunes ARV (2019) Deep emotion recognition through upper body
movements and facial expression. Master’s thesis, Aalborg Uni-
versity

101. Nwosu L, Wang H, Lu J, Unwala I, Yang X, Zhang T (2017) Deep
convolutional neural network for facial expression recognition
using facial parts. In: 2017 IEEE 15th international conference
on dependable, autonomic and secure computing, 15th interna-
tional conference on pervasive intelligence and computing, 3rd
international conference on big data intelligence and computing
and cyber science and technology congress, pp 1318–1321

102. Park JW, Lee H, Chung M (2014) Generation of realistic robot
facial expressions for human robot interaction. J Intell Robot Syst
78:443–462

103. Prajapati S, Shrinivasa Naika CL, Jha S, Nair S (2013) On ren-
dering emotions on a robotic face, pp 1–7

104. Rabiner LR (1989) A tutorial on hidden Markov models and
selected applications in speech recognition. Proc IEEE 77(2):257–286

105. Ray C, Mondada F, Siegwart R (2008) What do people expect
from robots? pp 3816–3821

106. Ringeval F, Eyben F, Kroupi E, Yuce A, Thiran JP, Ebrahimi T,
Lalanne D, Schuller B (2015) Prediction of asynchronous dimen-
sional emotion ratings from audiovisual and physiological data.
Pattern Recogn Lett 66:22–30 (Pattern Recognition in Human Computer
Interaction). https://doi.org/10.1016/j.patrec.2014.11.007

107. Ringeval F, Schuller B, Valstar M, Jaiswal S, Marchi E, Lalanne D,
Cowie R, Pantic M (2015) AV+EC 2015–the first affect recognition
challenge bridging across audio, video, and physiological data

108. Romero P, Cid F, Núnez P (2013) A novel real time facial expression
recognition system based on candide-3 reconstruction model.
In: Proceedings of the XIV workshop on physical agents (WAF
2013), Madrid, Spain, pp 18–19

109. Rouast PV, Adam MTP, Chiong R (2019) Deep learning for
human affect recognition: insights and new developments. arXiv
preprint arXiv:1901.02884

110. Ruiz-Garcia A, Elshaw M, Altahhan A, Palade V (2018) A hybrid
deep learning neural approach for emotion recognition from facial
expressions for socially assistive robots. Neural Comput Appl.
https://doi.org/10.1007/s00521-018-3358-8

111. Saerbeck M, Bartneck C (2010) Perception of affect elicited by
robot motion, pp 53–60

112. Sariyanidi E, Gunes H, Cavallaro A (2015) Automatic analysis of
facial affect: a survey of registration, representation, and recogni-
tion. IEEE Trans Pattern Anal Mach Intell 37(6):1113–1133

113. Saxena S, Tripathi S, Sudarshan TSB (2019) Deep dive into faces:
Pose illumination invariant multi-face emotion recognition sys-
tem. In: 2019 IEEE/RSJ international conference on intelligent
robots and systems (IROS), pp 1088–1093.
https://doi.org/10.1109/IROS40897.2019.8967874

114. Shi Y, Chen Y, Ardila LR, Venture G, Bourguet ML (2019) A
visual sensing platform for robot teachers. In: Proceedings of the
7th international conference on human–agent interaction, pp 200–
201

115. Sikka K, Dhall A, Bartlett M (2015) Exemplar hidden Markov
models for classification of facial expressions in videos. In: Pro-
ceedings of the IEEE conference on computer vision and pattern
recognition workshops, pp 18–25

116. Simul NS, Ara NM, Islam MS (2016) A support vector machine
approach for real time vision based human robot interaction. In:
2016 19th international conference on computer and information
technology (ICCIT), pp 496–500

117. Srivastava N, Salakhutdinov RR (2012) Multimodal learning with
deep Boltzmann machines. In: Pereira F, Burges CJC, Bottou L,
Weinberger KQ (eds) Advances in neural information processing
systems, vol 25. Curran Associates Inc, Red Hook, pp 2222–2230

118. Stock R, Merkle M (2018) Can humanoid service robots perform
better than service employees? A comparison of innovative behavior
cues. https://doi.org/10.24251/HICSS.2018.133

119. Stock R, Nguyen MA (2019) Robotic psychology. What do we
know about human–robot interaction and what do we still need to
learn?

120. Stock RM (2016) Emotion transfer from frontline social robots to
human customers during service encounters: testing an artificial
emotional contagion modell. In: ICIS

121. Stock RM, Merkle M (2017) A service robot acceptance model:
user acceptance of humanoid robots during service encounters.
In: IEEE international conference on pervasive computing and
communications workshops (PerCom Workshops), pp 339–344.
https://doi.org/10.1109/PERCOMW.2017.7917585

122. Stock-Homburg R (2021) Survey of emotions in human–robot
interaction—after 20 years of research: What do we know and
what have we still to learn? Int J Soc Robot

123. Sukhbaatar S, Makino T, Aihara K, Chikayama T (2011) Robust
generation of dynamical patterns in human motion by a deep belief
nets. In: Asian conference on machine learning, pp 231–246

124. Taira H, Haruno M (1999) Feature selection in SVM text catego-
rization. In: AAAI/IAAI, pp 480–486

125. Tanaka F, Cicourel A, Movellan J (2007) Socialization between
toddlers and robots at an early childhood education center. Proc
Natl Acad Sci USA 104:17954–8

126. Uddin MZ, Hassan MM, Almogren A, Alamri A, Alrubaian M,
Fortino G (2017) Facial expression recognition utilizing local
direction-based robust features and deep belief network. IEEE
Access 5:4525–4536

127. Uddin MZ, Khaksar W, Torresen J (2017) Facial expression recognition
using salient features and convolutional neural network.
IEEE Access 5:26146–26161

128. Vapnik VN (1995) The nature of statistical learning theory.
Springer, New York

129. Vapnik VN (1999) An overview of statistical learning theory.
IEEE Trans Neural Netw 10(5):988–999

130. Vithanawasam T, Madhusanka A (2019) Face and upper-body
emotion recognition using service robot’s eyes in a domestic envi-
ronment, pp 44–50

131. Wang K, Peng X, Yang J, Meng D, Qiao Y (2020) Region attention
networks for pose and occlusion robust facial expression recog-
nition. IEEE Trans Image Process 29:4057–4069

132. Wang Q, Ju S (2008) A mixed classifier based on combination
of HMM and KNN. In: 2008 fourth international conference on
natural computation, vol 4, pp 38–42

133. Webb N, Ruiz-Garcia A, Elshaw M, Palade V (2020) Emotion
recognition from face images in an unconstrained environment
for usage on social robots. In: 2020 international joint conference
on neural networks (IJCNN), pp 1–8

134. Wimmer M, MacDonald BA, Jayamuni D, Yadav A (2008) Facial
expression recognition for human–robot interaction: a prototype.
In: Sommer G, Klette R (eds) Robot Vis. Springer, Berlin, pp 139–152

135. Wu C, Wang S, Ji Q (2015) Multi-instance hidden Markov model
for facial expression recognition. In: 2015 11th IEEE interna-
tional conference and workshops on automatic face and gesture
recognition (FG), vol 1, pp 1–6

136. Wu M, Su W, Chen L, Liu Z, Cao W, Hirota K (2019) Weight-
adapted convolution neural network for facial expression recognition
in human–robot interaction. IEEE Trans Syst Man Cybern
Syst

137. Yaddaden Y, Bouzouane A, Adda M, Bouchard B (2016) A new
approach of facial expression recognition for ambient assisted liv-
ing. In: Proceedings of the 9th ACM international conference on
PErvasive technologies related to assistive environments, PETRA
’16. Association for Computing Machinery, New York, NY, USA

138. Yamashita R, Nishio M, Do RKG, Togashi K (2018) Convolu-
tional neural networks: an overview and application in radiology.
Insights Imaging 9(4):611–629

139. Yang B, Cao J, Ni R, Zhang Y (2018) Facial expression recog-
nition using weighted mixture deep neural network based on
double-channel facial images. IEEE Access 6:4630–4640

140. Yang H, Yin L (2017) CNN based 3d facial expression recog-
nition using masking and landmark features. In: 2017 seventh
international conference on affective computing and intelligent
interaction (ACII), pp 556–560

141. Yoo B, Cho S, Kim J (2011) Fuzzy integral-based composite
facial expression generation for a robotic head. In: 2011 IEEE
international conference on fuzzy systems (FUZZ-IEEE 2011),
pp 917–923

142. Yu C, Tapus A (2019) Interactive robot learning for multimodal
emotion recognition. In: Salichs MA, Ge SS, Barakova EI, Cabibihan JJ,
Wagner AR, Castro-González Á, He H (eds) Social
robotics. Springer, Cham, pp 633–642

143. Zhang F, Zhang T, Mao Q, Xu C (2018) Joint pose and expression
modeling for facial expression recognition. In: 2018 IEEE/CVF
conference on computer vision and pattern recognition, pp 3359–3368.
https://doi.org/10.1109/CVPR.2018.00354

144. Zhang F, Zhang T, Mao Q, Xu C (2020) Geometry guided pose-invariant
facial expression recognition. IEEE Trans Image Process
29:4445–4460. https://doi.org/10.1109/TIP.2020.2972114

145. Zhang K, Huang Y, Du Y, Wang L (2017) Facial expression recognition
based on deep evolutional spatial-temporal networks. IEEE
Trans Image Process 26(9):4193–4203

146. Zhang Z, Luo P, Loy CC, Tang X (2016) From facial expression
recognition to interpersonal relation prediction. Int J Comput Vis

147. Zhao L, Wang Z, Zhang G (2017) Facial expression recogni-
tion from video sequences based on spatial-temporal motion local
binary pattern and Gabor multiorientation fusion histogram. Math
Probl Eng

Publisher’s Note Springer Nature remains neutral with regard to juris-
dictional claims in published maps and institutional affiliations.
