International Journal of Social Robotics (2022) 14:1583–1604
https://doi.org/10.1007/s12369-022-00867-0

Facial Emotion Expressions in Human–Robot Interaction: A Survey

Niyati Rawal1 · Ruth Maria Stock-Homburg1

Accepted: 4 January 2022 / Published online: 24 June 2022
© The Author(s) 2022

Ruth Maria Stock-Homburg: RSH@tu-darmstadt.de
Niyati Rawal: niyati.rawal@tu-darmstadt.de
1 Technical University Darmstadt, Hochschulstr. 1, 64295 Darmstadt, Germany

Abstract
Facial expressions are an ideal means of communicating one’s emotions or intentions to others. This overview focuses on human facial expression recognition as well as robotic facial expression generation. For human facial expression recognition, both recognition on predefined datasets and recognition in real-time are covered. For robotic facial expression generation, both hand-coded and automated methods are covered, i.e., the facial expressions of a robot are generated by moving its features (eyes, mouth) either by hand-coding or automatically using machine learning techniques. Plenty of studies already achieve high accuracy for emotion expression recognition on predefined datasets, but the accuracy of facial expression recognition in real-time is comparatively lower. As for expression generation, while most robots are capable of making basic facial expressions, few studies enable robots to do so automatically. This overview discusses state-of-the-art research on facial emotion expressions during human–robot interaction and derives several possible directions for future research.

Keywords Facial emotion recognition · Facial emotion expressions · Human–robot interaction · Survey · Overview

1 Introduction

Robots are no longer just machines used in factories and industries. There is a growing demand for robots that share space with humans, as in collaborative or assistive robotics [35,63]. Robots are now increasingly being deployed in a variety of domains as receptionists [120], educational tutors [49,59], household supporters [111] and caretakers [25,49,67,125]. Thus, these social robots need to interact effectively with humans, both verbally and non-verbally. Facial expressions are non-verbal signals that can be used to indicate one’s current status in a conversation, e.g., via backchanneling or rapport [3,31].

Perceived sociability is an important aspect of human–robot interaction (HRI), and users want robots to behave in a friendly and emotionally intelligent manner [28,48,99,105]. For social robots to be more anthropomorphic and for human–robot interaction to be more like human–human interaction (HHI), robots need to be able to understand human emotions and respond to them appropriately. Stock and Merkle show that emotional expressions of anthropomorphic robots become increasingly important in business settings as well [118,121]. The authors of [119] emphasize that robotic emotions are particularly important for the acceptance of a robot by the user. Thus, emotions are pivotal for HRI [122].

In any interaction, 7% of the affective information is conveyed through words, 38% through tone, and 55% through facial expressions [92]. This makes facial expressions an indispensable mode of affective communication. Accordingly, numerous studies have examined facial expressions of emotions during HRI [e.g., 2,8,15,17–19,33,38,50,81,91,110,116].

In any HHI, human beings first infer the emotional state of the other person and then generate facial expressions in response to their peer.
The generated emotion could be a result of parallel empathy (generating the same emotion as the peer) or reactive empathy (generating emotion in response to the peer’s emotion) [26]. Similarly, in the case of HRI, we would like to study robots recognizing human emotion as well as robots generating their emotion as a response to human emotion.

The number of papers on facial expressions in HRI has grown in the last decade. Between 2000 and 2020 (see Fig. 1), the number of publications increased gradually. Thus, the overarching research question is: What has been done so far on facial emotion expressions in human–robot interaction, and what still needs to be done?

Fig. 1 Publications on emotion recognition of human faces during HRI and generation of facial expressions of robots

In Sect. 2 the framework of the overview is outlined, followed by the method of selection of studies in Sect. 3. Recognition of human facial expressions and generation of facial expressions by robots are covered in Sects. 4 and 5. The current state of the art and future research are discussed in Sect. 6, with the conclusion in Sect. 7.

2 Framework of the Overview

This overview focuses on two aspects: (1) recognition of human facial expressions and (2) generation of facial expressions by robots. The review framework (Fig. 3) is based on these two streams. (1) Recognition of human facial expressions is further subdivided depending on whether the recognition takes place (a) on a predefined dataset or (b) in real-time.
(2) Generation of facial expressions by robots is likewise subdivided depending on whether the generation is (a) hand-coded or (b) automated, i.e., whether the facial expressions of a robot are generated by moving its features (eyes, mouth) by hand-coding or automatically using machine learning techniques.

Fig. 3 Framework of the overview

3 Method

Studies with the keywords “facial expression recognition AND human–robot interaction / HRI”, “facial expression recognition” and “facial expression generation AND human–robot interaction / HRI” published between 2000 and 2020 were reviewed on Google Scholar. Studies that use voice or body gestures as a modality for emotional expression but do not involve facial expressions are not included in this overview. Studies that involve HRI with humans having mental disorders like autism are also not included. Furthermore, studies that work on a single emotion, such as recognition of a smile or facial expression generation of anger, are not included. In total, 175 of 276 studies were rejected (Fig. 2).

Fig. 2 Flowchart of the literature screening process

In Table 3, various studies on facial expression recognition are listed. For facial expression recognition on predefined datasets, studies with an accuracy greater than 90% are selected. For real-time facial expression recognition, all studies that perform facial expression recognition in a human–robot interaction scenario are listed.

4 Recognition of Human Facial Expressions

Earlier, facial expression recognition (FER) consisted of the following steps: face detection, image pre-processing, extraction of important features and classification of the expression (Fig. 4). As deep learning algorithms have become popular, the pre-processed image is now fed directly into deep networks (such as CNN, RNN etc.) to predict an output [71] (Fig. 5).

Fig. 4 Process of facial expression recognition in machine learning (adapted from Canedo and Neves [13])

Fig. 5 Process of facial expression recognition in deep learning (adapted from Li and Deng [71])

In the classical machine learning approaches, the Viola–Jones algorithm and OpenCV were popular choices for face detection; the dlib face detector and the AdaBoost algorithm were also used. To pre-process the images, greyscale conversion, image normalization and image augmentation (such as flip, zoom, rotate etc.) were usually applied. Further, some studies extract the important regions of the face, like eyebrows, eyes, nose and mouth, whose movements (also known as action units or AUs) play an important role in FER. Others use local binary patterns (LBP) or histograms of oriented gradients (HOG) to extract the featural information. Finally, the classification is performed. Most of the studies perform classification for the six universally known emotions (happy, sad, disgust, anger, fear and surprise) and sometimes include a neutral expression. For the final classification, k-Nearest Neighbor (KNN), Hidden Markov Model (HMM), Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), Support Vector Machine (SVM) and Long Short-Term Memory (LSTM) are used.

In the deep learning approaches, the input images are first pre-processed by performing face alignment, data augmentation and normalization. The images are then fed directly into deep networks like CNN, RNN etc., which predict the emotion shown in the images.

The most commonly used classification methods are explained in more detail below, in the order in which they were invented.

KNN: The nearest-neighbor classifier was first introduced in the 1950s [37]. In KNN [57], given the training instances and their class labels, the class label of an unknown instance is predicted from the labels of its k nearest training instances. KNN is based on a distance function that measures the difference between two instances.
While the Euclidean distance is mostly used, other distance functions such as the Hamming distance can also be used.

HMM: The HMM [104] was introduced in the late 1960s. It is a doubly embedded stochastic process: a hidden stochastic process (a Markov chain) that is only visible through another stochastic process, which produces a sequence of observations. The most likely state sequence can be decoded using the Viterbi algorithm, and the model parameters can be learned using the Expectation-Maximization (EM) algorithm.

RNN: The RNN [78] was introduced in the 1980s. An RNN is a feed-forward neural network augmented with edges that span adjacent time steps, introducing a notion of time. Hence, RNNs are mainly used for dynamic data input that has a temporal sequence. In an RNN, a state depends upon the current input as well as the state of the network at the previous time step, making it possible to retain information from a long time window.

CNN: Convolutional networks [70] were invented in 1989 [69]. CNNs are trainable architectures composed of multiple stages. The input and output of each stage are sets of arrays called feature maps. Each stage of a CNN is composed of three layers: a filter bank layer, a non-linearity layer and a feature pooling layer. The network is trained using the backpropagation method. CNNs are used for end-to-end recognition, wherein the output is predicted directly from the input image. They are also used as feature extractors whose outputs are passed to further neural network layers like LSTM or RNN for the prediction.

SVM: The SVM was invented by Vapnik [128]. An SVM [129] separates the training data of different classes by a hyperplane.

LSTM: The LSTM [39] was invented by Hochreiter and Schmidhuber [47]. It also has recurrent connections but, unlike the RNN, it is capable of learning long-term dependencies.

Table 1 summarizes the major purpose, application areas, advantages, disadvantages and frequency of use for commonly used algorithms.
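To make the KNN description above concrete, the following is a minimal sketch of distance-based classification with both the Euclidean and the Hamming distance. It is not taken from any of the reviewed studies: the function names (`euclidean`, `hamming`, `knn_predict`) and the two-dimensional toy feature vectors are assumptions made purely for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equally long numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Hamming distance: number of positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k=3, distance=euclidean):
    """Predict the label of `query` by majority vote among its k nearest
    training instances under the given distance function."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D features (not from the paper), e.g. two normalized
# facial measurements per image.
train_x = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
train_y = ["happy", "happy", "surprise", "surprise"]

print(knn_predict(train_x, train_y, (0.85, 0.15), k=3))
```

With k = 3, two of the three training instances closest to the query carry the label "happy", so the majority vote yields "happy". Note that no training step is needed, but every prediction computes the distance to all training instances, which matches the advantage and disadvantage listed for KNN in Table 1.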
For the frequency of use reported in Table 1, only papers that implement facial expression recognition during HRI or in real-time scenarios were counted. Although RNN has not been used for facial expression recognition during HRI or in real-time, some studies perform facial expression recognition on predefined datasets using RNN.

In total, 27 studies on facial expression recognition during HRI were reviewed. Some of these studies were not performed on a robot platform; they perform emotion recognition in real-time and mention HRI as their intended application. The studies are summarized in Table 2. Studies that perform facial expression recognition on predefined datasets, or that do not perform it in real-time, are not included there.

4.1 FER on Predefined Dataset

Although the focus of this overview is FER in real-time and during HRI, the studies on real-time FER are compared with studies on FER on predefined datasets. FER has been carried out on static human images as well as on dynamic human video clips. Datcu and Rothkrantz [24] show that there is an advantage in using data from video frames over still images, because videos contain temporal information that is absent in still images.

Results of studies with above 90% accuracy in FER on still images are summarized in Table 3a, and those on videos in Table 3b. Tables 3a and 3b serve as a comparison with Table 3c. Studies are arranged according to their accuracy level. It should be noted that these studies are carried out on predefined datasets consisting of human images and videos and do not involve robots. A considerable number of studies achieve an accuracy greater than 90% on the CK+, Jaffe and Oulu-Casia datasets, on both still images and videos.
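The pre-processing and feature-extraction steps of the classical pipeline described in Sect. 4 (greyscale conversion, normalization, flip augmentation and LBP features) can be sketched as follows. This is an illustrative sketch using NumPy, not the implementation of any study listed in Table 3; the function names, the min-max normalization and the basic 3×3 LBP variant are assumptions chosen for brevity.

```python
import numpy as np


def to_grey(rgb):
    """Convert an H x W x 3 RGB image to greyscale (ITU-R BT.601 weights)."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])


def normalize(img):
    """Min-max normalize pixel intensities to the range [0, 1]."""
    lo, hi = img.min(), img.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros_like(img, dtype=float)
    return (img - lo) / (hi - lo)


def flip_augment(img):
    """Return the image together with its horizontally flipped copy."""
    return [img, img[:, ::-1]]


def lbp_histogram(grey):
    """Basic 3x3 local binary pattern: each interior pixel gets an 8-bit
    code with one bit per neighbour that is >= the centre pixel. Returns
    the normalized 256-bin histogram of these codes as a feature vector."""
    h, w = grey.shape
    centre = grey[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = grey[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= centre).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


# Random pixels stand in for a detected and cropped face region.
rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(8, 8, 3)).astype(np.uint8)
grey = normalize(to_grey(face))
features = [lbp_histogram(view) for view in flip_augment(grey)]
```

Each resulting 256-dimensional histogram could then be passed to a classifier such as KNN or SVM. Face detection is deliberately left out of the sketch; in the studies discussed above it is typically handled by a dedicated detector such as Viola–Jones.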
4.2 FER in Real-Time

It is easier to achieve high accuracy when performing emotion recognition on predefined datasets, as they are recorded under controlled environmental conditions. It is more difficult to achieve the same level of accuracy when performing emotion recognition in real-time, when the movements are spontaneous. It should be noted that the studies that perform facial expression recognition in real-time were carried out under controlled laboratory conditions with little variation in lighting conditions and head poses.

As this overview is about facial expressions in HRI, emotion recognition has to be performed in real-time for a robot to be able to recognize emotion. Table 3c lists studies with facial expression recognition in real-time for HRI. Here, the accuracies are comparatively lower than the accuracies for predefined datasets. As can be seen in Table 3c, only two studies have an accuracy greater than 90%. The robots used in the studies are either robotic heads or humanoid robots such as Pepper, Nao, iCub etc. Many studies that perform facial expression recognition in real-time use CNNs, making them a popular choice for facial expression recognition [2,8,15,133]. However, the highest accuracy for facial expression recognition in real-time is achieved by Bayesian and Artificial Neural Network (ANN) methods.

5 Facial Emotion Expression by Robots

For robots to be empathic, they must not only be able to recognize human emotions but also be able to generate emotions using facial expressions. Several studies enable robots to generate facial expressions either in a hand-coded or an automated manner (Fig. 6). By hand-coded, we mean that the facial expressions are coded by moving the eyes and mouth of the robot in a desired manner; automated means that the emotions are learned automatically using machine learning techniques. 16 studies on facial emotion expression in robots were reviewed.
These studies are summarized in Table 4.

5.1 Facial Expression Generation is Hand-Coded

Earlier studies started by hand-coding the facial expressions in robots. Facial expressions on robots are generated both statically and dynamically.

Among the static methods, the humanoid social robot “Alice” imitates human facial expressions in real-time [91]. Kim et al. [61] introduced an artificial facial expression imitation system using a robot head, Ulkni. Ulkni is composed of 12 RC servos, with four Degrees of Freedom (DoFs) to control its gaze direction, two DoFs for its neck, and six DoFs for its eyelids and lips; it is thus capable of making the basic facial expressions once the position commands for the actuators are sent from the PC. Bennett and Sabanovic [7] identified minimal features, i.e. movement of eyes, eyebrows, mouth and neck, that are sufficient to identify the facial expression. In their study, the main program called functions that specified facial expressions according to the direction (used to make or undo an expression) and degree (strength of the expression, i.e. smaller vs. larger).
The facial expression functions would in turn call lower functions that moved specific facial components given a direction and degree, following the movement related to specific AUs in the Facial Action Coding System (FACS).

Table 1 Details about the commonly used algorithms

| Algorithm (year) | Major purpose | Application areas | Advantages | Disadvantages | Frequency of use |
|---|---|---|---|---|---|
| KNN (1950s) | Storing of all available cases and classifies new instances by measuring the similarity or difference between two instances using a distance function [57] | Classification of facial expressions | No training required before making predictions | Great computational complexity, because the distance between every sample should be calculated in order to classify [132] | 1 |
| HMM (1960s) | Given a sequence of observations, decode the hidden states [104] | Perform FER based on dynamic data input, i.e. videos | Capture the dependencies between consecutive measurements | Information from states in the preceding time steps (not the previous time step) cannot be captured | 1 |
| RNN (1980s) | Learn temporal dependencies [78] | Perform FER based on dynamic data input, i.e. videos | Useful in modelling sequential data | Difficult to learn long-term temporal dependencies as gradients explode or vanish over many time steps [6] | – |
| CNN (1989) | Given a set of images, extracts featural information such as edges and performs the classification task [70] | Learn the spatial features in an image, i.e. perform FER based on static data input | Easier to train and generalizes much better than networks with full connectivity between adjacent layers [68] | Requires a lot of data to prevent over-fitting [138] | 13 |
| SVM (1995) | A supervised learning algorithm that can be used for classification or regression on small datasets [129] | Classification of facial expressions | Robust to over-fitting [124] | Does not perform well with large training samples [80] | 3 |
| LSTM (1997) | Learn long-term temporal dependencies [39] | Perform FER based on dynamic data input, i.e. videos | Ability to learn long-term temporal dependencies [47] | Computationally expensive to train | 1 |

KNN k-nearest neighbor, HMM hidden Markov model, RNN recurrent neural network, CNN convolutional neural network, SVM support vector machine, LSTM long short-term memory

Table 2 Detailed information about studies on emotion recognition and Human Robot Interaction (HRI)

| References | Recognition mode | Recognized emotions | Robot | Algorithm type | Major findings |
|---|---|---|---|---|---|
| Ahmed et al. [1] | Face | Angry, disgust, fear, happy, neutral, sad, surprise | – | CNN with data augmentation | The model achieved an accuracy of more than 90% for each emotion as it could classify geometrically displaced facial images |
| Barros et al. [2] | Face | Positive, negative, neutral | iCub | CNN | The network is able to recognize emotions from different environments, different subjects performing spontaneous expressions, and in real-time |
| Bera et al. [8] | Face | Happy, sad, angry, neutral | Pepper | Bayesian inference and CNN | A multi-channel model to classify pedestrian features into four categories of emotions. Emotional detection accuracy of 85.33% was observed in the validation results |
| Byeon and Kwak [12] | Face | Happiness, sadness, anger, surprise, disgust, fear | – | 3D-CNN | The experimental results on video-based facial expression database revealed that the method showed a good performance in comparison to the conventional methods such as PCA and TMPCA |
| Chen et al. [15] | Face | Angry, disgust, fear, happy, sad, surprise, neutral | XiaoBao | CNN (VGG-16) | The facial expression recognition in the wild (FERW) model can recognize facial expressions in the real-world with an accuracy of 79% and a real-time positive emotion incentive system (PEIS) was able to enhance user experience |
| Cid et al. [18] | Face, voice | Sad, happy, fear, anger, neutral | Muecas | Dynamic Bayesian Network | The Bayesian approach to the emotion recognition problem presents good results for real-time applications with untrained users in an uncontrolled environment |
| Dandıl and Özdemir [23] | Face | Anger, fear, happy, surprise, sad, neutral | – | CNN | Successful results obtained on real-time video, in changing light and environment conditions with 80% accuracy |
| Deng et al. [27] | Face | Surprise, sad, neutral, happy, fear, disgust, anger | – | cGAN | Can learn not only the regions related to expression but also maximally capturing nuanced characteristics relevant to expression and then transforming the original expression to another expression with identity and other factors preserved |
| Faria et al. [34] | Face | Afraid, angry, happy, neutral, disgusting, sad, surprised | Nao | Dynamic Bayesian Mixture Model (DBMM) | An overall accuracy around 85% on KDEF dataset and 80% on tests on-the-fly during human–robot interaction |
| Ge et al. [38] | Face | Happy, sad, fear, disgust, anger, surprise | Robotic head | SVM | The experimental results showed that the proposed nonlinear facial mass-spring model coupled with the SVM classifier is effective to recognize the facial expressions compared with the linear mass-spring model |
| Inthiam et al. [56] | Face and bodily movement | Positive mood and negative mood | – | HMM | Facial expression alone may be misleading, since it may not be a true expression of inner feeling. By including bodily expression in the analysis, the estimation model gave a better result with estimation accuracy over 70% |
| Li et al. [72] | Face | Happiness, anger, disgust, fear, sadness, surprise | Harley | CNN-LSTM | CNN and LSTM are combined to exploit their advantages in the proposed model and the system is applied to a humanoid robot to demonstrate its practicability for improving the HRI |
| Liu et al. [81] | Face | Happy, angry, surprise, fear, disgust, sad, and neutral | Mobile robot | ELM classifier | Recognized human emotions with 80% accuracy |
| Liu et al. [79] | Face | Anger, scared, sad, surprise, neutral, disgust | – | CNN | An average weighting method was proposed to avoid potential errors in real-time facial expression recognition based on the traditional convolutional neural network |
| Lopez-Rincon [82] | Face | Sadness, happiness, surprise, anger, disgust, fear | Nao | AFFDEX SDK and CNN | The global emotion classification accuracy of the CNN is better than the AFFDEX system, but the face detection in AFFDEX is slightly better |
| Martin et al. [87] | Face | Anger, disgust, fear, surprise, neutral, happy, sad | – | Active Appearance Model (AAM), Multi Layer Perceptron (MLP) and SVM | Compared three different facial expression classifiers (AAM classifier set, MLP and SVM) |
| Meghdari et al. [91] | Face | Happiness, sadness, fear, anger, surprise | Alice | ANN | Can recognize emotions with 92.52% accuracy in real-time |
| Nunes [100] | Face and upper body | Anger, disgust, fear, happiness, sadness, surprise and neutral | – | CNN | Bimodal approach (86.6% accuracy) based on emotion recognition from both face and upper body produced better results than monomodal approach |
| Romero et al. [108] | Face, voice, body gestures | Neutral, fear, angry, sad, happy | – | Dynamic Bayesian Network | The standardized variables associated to the AUs of the user were obtained, allowing real-time recognition of each facial expression with different factors, such as lighting conditions, gender, unusual facial features (like injuries or scars), among others |
| Ruiz-Garcia et al. [110] | Face | Surprise, happy, disgust, angry, fear, sad | – | CNN | Images from 28 participants were collected in an uncontrolled environment to test the CNN emotion recognition model, resulting in a classification rate of 73.55% |
| Shi et al. [114] | Face and body | Interested, distracted, confused | Pepper | k-Nearest Neighbour (KNN) | A multi-student affect recognition system which, starting from eight basic emotions detected from facial expressions, can infer higher emotional states relevant to a learning context, such as “interested”, “distracted” and “confused” |
| Simul et al. [116] | Face | Neutral, surprise, sad, smile, angry | Ribo | SVM | Robot Ribo recognizes human facial expression, facial gesture movement and detects human gender in real-time |
| Vithanawasam and Madhusanka [130] | Face and upper body | Anger, fear, bored | – | Fisherface algorithm / position of arms | Anger was correctly predicted 81.54% of the times, followed by bored (72.20%) and fear (68.37%). The results were sensitive to the lighting conditions |
| Webb et al. [133] | Face | Angry, disgust, fear, happy, neutral, sad, surprise, neutral | Nao | CNN | When evaluated on novel data with nonuniform conditions taken by a Nao robot an accuracy of 79.75% was achieved |
| Wimmer et al. [134] | Face | Anger, disgust, fear, happy, sad, surprise | B21 robot | Binary Decision Tree | Experimental evaluation reports a recognition rate of 70% on the Cohn–Kanade facial expression database, and 67% in a robot scenario |
| Wu et al. [136] | Face | Angry, sad, disgust, fear, happy, neutral, surprise | – | Weight-Adapted Convolution Neural Network (WACNN) | The recognition accuracies of the proposed algorithm were higher than the deep CNN without HGA, indicating a better global optimization ability |
| Yu and Tapus [142] | Face, body gesture | Neutral, happy, angry, sad | Pepper | Random Forests (RF) | Developed a multimodal emotion recognition model with gait and thermal facial data, which is based on RF model and the modified confusion matrices of two individual models |

Table 3 Studies on FER; Note: Studies listed according to accuracy level

(a) With accuracy greater than 90% on static input, i.e., human images

| Study | Dataset | Algorithm | Classes | Accuracy |
|---|---|---|---|---|
| Mistry et al. [95] | CK+/MMI | SVM | 7 | 100%/94.66% |
| Kotsia and Pitas [66] | CK | SVM | 6 | 99.7% |
| Hossain et al. [51] | Jaffe/CK | GMM | 7 | 99.8%/99.7% |
| Kar et al. [60] | CK+ | BPNN | 6 | 99.51% |
| Mliki et al. [45] | CK/Jaffe | SVM | 7 | 99.24%/96.50% |
| Chen et al. [16] | CK+/Jaffe | CNN | 7 | 99.1597%/87.7350% |
| Zhang et al. [146] | CK+ | CNN | 6 | 98.9% |
| Mayya et al. [89] | Jaffe/CK+ | CNN | 7 | 98.12%/96.02% |
| Minaee and Abdolrashidi [94] | CK+/Jaffe | CNN | 7 | 98.0%/92.8% |
| Nwosu et al. [101] | Jaffe/CK+ | CNN | 7 | 97.71%/95.72% |
| Yang et al. [140] | CK+/Oulu-Casia/Jaffe | CNN | 6 | 97.02%/92.89%/92.21% |
| Yang et al. [139] | CK+/Oulu-Casia/Jaffe | WMDNN | 6 | 97.0%/92.3%/92.2% |
| Ding et al. [30] | CK+/Oulu-Casia | CNN | 8/6 | 96.8%/87.71% |
| Gogić et al. [40] | CK+/Jaffe/MMI | NN | 7 | 96.48%/85.88%/73.73% |
| Kim et al. [62] | CK+/Jaffe | CNN | 6 | 96.46%/91.27% |
| Hua et al. [52] | Jaffe | CNN | 7 | 96.44% |
| Mannan et al. [85] | CK+ | SVM | 7 | 96.36% |
| Ruiz-Garcia et al. [110] | KDEF/CK+ | CNN-SVM | 7 | 96.26%/95.87% |
| Hamester et al. [44] | Jaffe | CNN | 7 | 95.8% |
| Meng et al. [93] | CK+/MMI | CNN | 6 | 95.27%/71.55% |
| Liliana et al. [77] | CK+ | SVM | 7 | 93.93% |
| Ferreira et al. [36] | CK+/Jaffe | CNN | 8/6 | 93.64%/89.01% |
| Mollahosseini et al. [97] | CK+ | DNN | 7 | 93.2% |
| Yaddaden et al. [137] | Jaffe/KDEF | KNN | 7 | 92.29%/79.69% |

(b) With accuracy greater than 90% on dynamic input, i.e., human videos

| Study | Dataset | Algorithm | Classes | Accuracy |
|---|---|---|---|---|
| Liang et al. [76] | CK+/Oulu-Casia/MMI | CNN-BiLSTM | 6 | 99.6%/91.07%/80.71% |
| Carcagnì et al. [14] | CK+ | SVM | 7 | 98.5% |
| Wu et al. [135] | CK+ | HMM | 7 | 98.54% |
| Zhang et al. [145] | CK+/Oulu-Casia/MMI | CNN-RNN | 6 | 98.5%/86.25%/81.18% |
| Kotsia et al. [65] | CK | SVM | 6 | 98.2% |
| Uddin et al. [126] | Depth | DBN | 6 | 96.67% |
| Zhao et al. [147] | CK+/Oulu-Casia/MMI | SVM | 7 | 95.8%/74.37%/71.92% |
| Elaiwat et al. [32] | CK+/MMI | RBM | 7 | 95.66%/81.63% |
| Uddin et al. [127] | CK | CNN | 6 | 95.42% |
| Sikka et al. [115] | CK+/Oulu-Casia | HMM | 7 | 94.60%/75.62% |
| Kabir et al. [58] | Depth | HMM | 6 | 94.17% |

(c) FER in real-time, i.e., on dynamic input during HRI

| Study | Robot | Sensor | Algorithm | Classes | Accuracy |
|---|---|---|---|---|---|
| Cid et al. [18] | Muecas | Camera | Bayesian | 5 | 94% |
| Meghdari et al. [91] | Alice | Kinect | ANN | 6 | 92.52% |
| Simul et al. [116] | Ribo | Webcam | SVM | 5 | 86% |
| Bera et al. [8] | Pepper | Camera | CNN | 4 | 85.33% |
| Liu et al. [81] | Mobile robot | Kinect | ELM | 7 | Above 80% |
| Yu and Tapus [142] | Pepper | Camera | RF | 4 | 78.125% |
| Webb et al. [133] | Nao | Camera | CNN | 8 | 79.75% |
| Chen et al. [15] | XiaoBao | Camera | CNN | 7 | 79% |
| Barros et al. [2] | iCub | RGB camera | CNN | 3 | 74.2% |
| Ruiz-Garcia et al. [110] | Nao | Built-in camera | CNN-SVM | 7 | 68.75% |
| Wimmer et al. [134] | B21 robot | Camera | Binary decision tree | 6 | 67% |

CK Cohn–Kanade, Jaffe Japanese Female Facial Expression, GAN Generative Adversarial Network, KNN k-Nearest Neighbor, HMM Hidden Markov Model, RNN Recurrent Neural Network, CNN Convolutional Neural Network, SVM Support Vector Machine, LSTM Long Short-Term Memory, WMDNN Weighted Mixture Deep Neural Network, NN Neural Network, ANN Artificial Neural Network, ELM Extreme Learning Machine, BPNN Back Propagation Neural Network, DBN Deep Belief Network, RF Random Forests

Fig. 6 Facial expression generation techniques

Among the dynamic methods, Breazeal’s [9] robot Kismet generated emotions using an interpolation-based technique over a 3-D space, where the three dimensions correspond to valence, arousal and stance. The expressions become more intense as the affect state moves to extreme values in the affect space. Park et al. [102] made diverse facial expressions by changing their dynamics and increased the lifelikeness of a robot by adding secondary actions such as physiological movements (eye blinking and sinusoidal motions concerning respiration). A second-order differential equation based on the linear affect-expression space model is used to achieve the dynamic motion for expressions. Prajapati et al. [103] used a dynamic emotion generation model to convert the facial expressions derived from the human face into a more natural form before rendering them on the robotic face. The model is provided with the facial expression of the person interacting with the system, and the corresponding synthetic emotions generated are fed to the robotic face.

Summary of findings The robot faces are capable of making basic facial expressions as they contain enough DoFs in the eyes and mouth.
These robot faces can generate static emotions [7,61,91] as well as dynamic emotions [9,102,103].

5.2 Facial Expression Generation is Automated

Some of the studies automatically generate facial expressions on robots. Unlike hand-coded techniques, where the commands for the position of features like eyes and mouth are sent from the computer, here the facial expressions are generated using machine learning techniques such as neural networks and RL.

Breazeal et al. [10] presented a robot, Leonardo, that can imitate human facial expressions. They use neural networks to learn the direct mapping of a human's facial expressions onto Leonardo's own joint space. In Horii et al. [50], the robot does not directly imitate the human but estimates the correct emotion and generates the estimated emotion using a restricted Boltzmann machine (RBM). An RBM [46] is a generative model that represents the generative process of data distribution and latent representation, and can generate data from latent signals [98,117,123].

Table 4 Detailed information about studies on emotion expression and Human Robot Interaction (HRI)
References | Expression mode | Expressed emotions | Robot | Coding | Major findings
Bennett and Sabanovic [7] | Face | Neutral, sadness, happiness, anger, fear, surprise | MiRAE | Hand-coded | Identified minimal features, i.e. movement of eyes, eyebrows, mouth and neck, that are sufficient to identify the facial expression
Breazeal [9] | Face | Anger, calm, disgust, fear, content, interest, sorrow, surprise, tired | Kismet | Hand-coded | Kismet can generate emotions using an interpolation-based technique over a 3-D space where the three dimensions correspond to valence, arousal and stance
Breazeal et al. [10] | Face | | Leonardo | Automated (Neural Network) | Built a robot capable of learning how to imitate facial expressions from simple imitative games played with a human, using biologically inspired mechanisms
Churamani et al. [17] | Face | Anger, happiness, neutral, sadness, surprise | Nico | Automated (RL) | Explored a continuous representation of expression on the NICO robot using the complete face LED matrix to generate expressions
Cid et al. [19] | Face | Sad, happy, fear, anger, neutral | Muecas | Hand-coded | The output of the Bayesian classifier is imitated by the robotic head Muecas
Esfandbod et al. [33] | Face | Neutral, happy, sad, surprised, angry | RASA | Hand-coded | From the subjects' viewpoint, the system's performance was fairly promising with a score of 4.1 out of 5
Ge et al. [38] | Face | Happy, fear, surprise, disgust, sadness, anger | Robotic head | Hand-coded | The robot head is triggered to imitate human facial expressions by the emotion generator engine and can generate a vivid imitation according to the tester's facial expression
Horii et al. [50] | Face, hands, voice | Happiness, neutral, anger, sadness | iCub | Automated (RBM) | The robot does not copy the human's expressions directly but generates expressions on its own after estimating the human's emotion
Ilic et al. [54] | Face | Anger, disgust, fear, happiness, sadness, surprise, indifference | Aisoy | Hand-coded | Learned a model of the emotional value of the robot's facial expressions without humans' explicit feedback
Kim et al. [61] | Face | Neutral, anger, happiness, fear, sadness, surprise | Ulkni | Hand-coded | Introduced an artificial facial expression imitation system using a robot head
Kishi et al. [64] | Face | Anger, sadness, fear, disgust, surprise, happy | KOBIAN | Hand-coded | Developed a new head for the biped walking robot KOBIAN that could express the 6 basic emotions
Liu et al. [81] | Face | Happy, angry, surprise, fear, disgust, sad, neutral | Mobile robot | Hand-coded | Generated facial expressions adapting to human emotions using a four-layer framework designed for the system to recognize human emotion based on HRI
Maeda and Geshi [84] | Face | Anger, contempt, disgust, fear, happiness, neutral, sadness, surprise | TAPIA | Hand-coded | An interactive communication method between a human and a robot based on the Markovian emotional model (MEM), using the facial expression, was significantly better than identical, symmetric or random emotion interaction methods
Meghdari et al. [91] | Face | Happy, sad, angry, surprised, disgusted, neutral | Alice | Hand-coded | A humanoid social robot "Alice" imitates human facial expressions in real-time
Park et al. [102] | Face | Anger, disgust, fear, happiness, sadness, surprise, dislike | Robotic head | Hand-coded | Made diverse facial expressions by changing their dynamics and increased the lifelikeness of a robot by adding secondary actions such as physiological movements (eye blinking and sinusoidal motions concerning respiration)
Yoo et al. [141] | Face | Angry, disgust, fear, happiness, sadness, surprise | Robotic head | Hand-coded | A fuzzy integral-based generation method of composite facial expressions was proposed and its effectiveness was demonstrated through an experiment with the developed robotic head

Li and Hashimoto [73] developed a KANSEI communication system based on emotional synchronization. KANSEI is a Japanese term that means emotions, feeling, sensitivity etc. The KANSEI communication system first recognizes human emotion and maps the recognized emotion to the emotion generation space. Finally, the robot expresses its emotion synchronized with the human's emotion in the emotion generation space. When the human changes his/her emotion, the robot also synchronizes its emotion with the human's emotion, establishing continuous communication between the human and the robot. It was found that the subjects became more comfortable with the robot and communicated more with the robot when there was emotional synchronization.

In Churamani et al.
[17], the robot Nico learned the correct combination of eyebrow and mouth wavelet parameters to express its mood using RL. The learned expressions looked slightly distorted but were sufficient to distinguish between various expressions. The robot could also generate expressions beyond the five basic expressions that it had learned. For a mixed emotional state (for example, anger mixed with sadness), the model was able to generate novel expressions representing the mixed state of the mood.

Summary of findings In all of the above studies, the robots learn to generate facial expressions automatically using machine learning techniques. While Breazeal et al. [10] and Li and Hashimoto [73] used direct mapping of human facial expressions, Horii et al. [50] generated the estimated human emotion on the robot. In Churamani et al. [17], the robot was able to associate the learned expressions with the context of the conversation.

6 Discussion

6.1 Summary of the State of the Art

Several studies already report high accuracy (greater than 90%) for facial expression recognition on the CK+, Jaffe and Oulu-Casia datasets (see Table 3a, b). The accuracies on the CK+, Jaffe and Oulu-Casia datasets have been as high as 100%, 99.8% and 92.89% respectively. In comparison, the accuracy for facial expression recognition in real-time is not as high. Zhang et al. [146] used a deep convolutional network (DCN) that had an accuracy of 98.9% on the CK+ dataset and 55.27% on the Static Facial Expressions in the Wild (SFEW) dataset. Here, the same network produced very different results for two different datasets. SFEW [29] consists of facial expression data extracted from movies, which is close to a real-world environment. The database covers unconstrained facial expressions, varied head poses, a large age range, occlusions, varied focus, different resolutions of faces, and close to real-world illumination. In Zhang et al. [146], the accuracy for "in the wild" settings was considerably lower than on the CK+ dataset, implying that expression recognition algorithms still cannot handle the variations in environment, head pose etc. found in real-world settings.

Table 5 provides possible categories for facial expression recognition in the wild. It contains the basic emotional facial expressions, situation-specific face occlusions, permanent face features, face movements, situation-specific expressions and side activities during facial expressions.

Most of the current research in facial expression recognition relates to the first category, basic emotional facial expressions. Survey articles on facial expression recognition are cited in Table 5 [11,13,21,42,43,71,88,109,112]. For more details on individual studies, refer to Table 3. Facial expression recognition in the presence of situation-specific face occlusions like a mouth–nose mask, glasses, hand in front of face etc. has also been studied [74,75,131]. Pose-invariant facial expression recognition, when the face is moving or turned sideways, has also been partially studied [96,113,143,144].

For facial expression generation, robots can make certain basic facial expressions by moving their eyes, mouth and neck. However, they cannot make as many expressions as human beings due to the limited number of DoFs present in a robot's face. There are relatively few studies on automated facial expression generation in robots [10,17,50,73]. While robots are capable of displaying facial expressions when the movement of the eyes and mouth is manually coded, fewer studies make a robot learn to display its facial expressions automatically.

Most of the studies on facial expression generation have been carried out on robotic heads or humanoid robots like iCub and Nico [e.g. 9,10,17,50].
In Becker-Asano and Ishiguro [5], Geminoid F's facial actuators are tuned such that the readability of its facial expressions is comparable to a real person's static display of emotional expression. It was found that the android's emotional expressions were more ambiguous than those of a real person and 'fear' was often confused with 'surprise'.

An advantage of automated facial expression generation over hand-coded facial expression generation is that a robot can learn mixed expressions rather than only the expressions it was trained on. Unlike in hand-coded facial expression generation, where a robot can only express the emotions that were explicitly coded, in Churamani et al. [17] the robot could express complex emotions that were made up of a combination of emotions.

Table 5 Possible categories for facial recognition in the wild
Category | Description | Examples | Related studies | Applied algorithm
Basic emotional facial expressions | Expression of basic emotions, such as FACS by Ekman (2001) | Happiness, sadness, anger, disgust, fear, surprise, neutral | [11,13,21,42,43,71,88,109,112] | KNN, HMM, RNN, CNN, SVM and LSTM
Face movements | Movements of the face itself (as a whole) | Moving forward, backward, turning around, turning sideways, jerking the head forward, spinning | [96,113,143,144] | GAN, CNN
Situation-specific face occlusions | Occlusions of the face due to situational requirements | Mouth–nose mask, glasses, hand in front of face, resting hand on mouth, headset | [74,75,131] | CNN
Permanent face features | Permanently installed features of the face | Artificial eye, beard | [74,75,131] | CNN
Situation-specific expressions | Facial expressions of the face due to situational requirements | Nodding, blinking, looking down, yawning, shaking head in agreement, eye roll, closing eyes, lip biting, pursed lips, sticking tongue out, winking | |
Side activities during facial expression | Additional activities during the expressions of emotions | Talking, eating, drinking, brushing teeth, fixing hair, combing hair, biting nails, cleaning eyes with a hand, coughing, supporting the face with a hand, itching on face, blowing nose, sneezing, rubbing eyes, sipping through a straw, applying cream | |
(1) examples were generated based on 50 life observations and 50 video-based observations by the authors; (2) bold = very well understood in research; italic = partly understood in research; bold italic = hardly understood

6.2 Future Research

Although facial expression recognition under specific settings has high accuracy and robots can express basic emotions through facial expressions, there are several possible directions for future research in this area.

Suggestion 1: Performing facial expression recognition in the wild needs greater emphasis.

To efficiently recognize facial expressions in real-time and in a real-world environment, the robot should be able to perform facial expression recognition with varied head poses, varied focus, presence of occlusions, different resolutions of the face and varied illumination conditions. The studies that perform facial expression recognition in real-time are limited to a laboratory environment, which is far different from a real-world scenario. A good study would be one in which facial expression recognition is performed in the wild.
Some studies perform facial expression recognition in the wild, but their accuracy is much lower than the accuracy on predefined datasets like CK+, Jaffe, MMI etc. To increase the efficiency of facial expression recognition in real-world scenarios, the performance of facial expression recognition in the wild needs to be improved. This can also be used to recognize facial expressions in real-time. Based on this, a direct adaptation of emotions would make HRI smoother.

Suggestion 2: Facial expressions during activities like talking, nodding etc. need to be studied.

Situation-specific expressions (nodding, yawning, blinking, looking down) and side activities during facial expressions (talking, eating, drinking, sneezing) in Table 5 have not been studied. To understand vivid expressions, it is necessary to recognize facial expressions for all categories. Humans also express emotions while interacting with someone verbally, such as smiling while speaking when they are happy. In this case, it should be possible to recognize a smile during speech.

Suggestion 3: Combine facial expression recognition with the data from other modalities such as voice, text, body gestures and physiological data to improve the emotion recognition rate.

This overview focuses on facial expression recognition; however, a person may control their face and not express the emotion they are truly experiencing. Some studies combine facial expression recognition with audio data, body gestures or physiological data for improved emotion recognition [41,53,83]. There are very few studies that combine facial data with both audio and physiological data [106,107], and studies that analyze all modalities (face, voice, text, body gestures and physiological signals) have not been found. Humans can recognize the emotion of a person quickly and effectively by taking into account their facial expression, body gestures, voice and words.
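One simple way to combine modalities, as proposed in Suggestion 3, is late fusion: each modality produces its own probability distribution over emotions, and the distributions are averaged with per-modality weights. The Python sketch below illustrates this idea only. The function names, the weights and the modality names are hypothetical and do not come from any of the cited studies; the emotion list is an assumed set of basic emotions.

```python
# Assumed label set for illustration; any fixed list of emotions would work.
EMOTIONS = ["anger", "fear", "happiness", "neutral", "sadness", "surprise"]


def fuse(modality_probs, weights):
    """Late fusion: weighted average of per-modality probability vectors.

    modality_probs maps a modality name (e.g. "face") to a list of
    probabilities aligned with EMOTIONS. weights maps modality names to
    non-negative weights; only modalities present in modality_probs are used.
    """
    total = sum(weights[name] for name in modality_probs)
    fused = [0.0] * len(EMOTIONS)
    for name, probs in modality_probs.items():
        share = weights[name] / total  # normalize so the shares sum to 1
        for i, p in enumerate(probs):
            fused[i] += share * p
    return fused


def predict(modality_probs, weights):
    """Return the emotion label with the highest fused probability."""
    fused = fuse(modality_probs, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]
```

In this sketch, a controlled smile that makes the face classifier favor happiness can be outvoted when the voice and physiological classifiers both favor sadness, which is the situation that motivates combining modalities.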
Combining facial, audio, text and body gesture data with physiological data could lead to a higher emotion recognition rate by machine learning algorithms than by humans.

Suggestion 4: How should a robot react towards a given human emotion?

In HHI, a human's reaction to a given emotion is either a result of parallel empathy or reactive empathy [26]. It should be studied which emotion a robot should express in response to a given human emotion. Moreover, it needs to be studied whether a robot should be able to express negative emotions. Most of the existing studies allow a robot to express basic emotions (anger, fear, happiness, neutral, sadness, surprise). It may be reasonable for a robot to react with a sad expression when a human being expresses anger. But should a robot be able to express extreme emotions such as anger?

For facial expression generation, while robots are capable of displaying both static and dynamic facial expressions, they are unable to generate facial expressions when they are speaking. For example, robots could smile while talking to express their happiness or they could speak with a frown when angry. Robots could also express their emotions through partial facial or bodily gestures instead of showing a full face expression, for example, tilting the head down to express sadness, frowning to express anger, and opening the eyes wide and raising the eyebrows to express surprise.

Suggestion 5: Robots should be able to recognize and generate facial expressions with various intensities.

Emotions form a continuous range and can have various intensities. If one is less happy, one would smile less. Similarly, if someone is very happy, the smile would also be big. It should be possible to recognize not just the emotion but also the intensity of the emotion. Moreover, in most of the existing studies, robots express their emotions with only one configuration per emotion. Robots should also be able to express their emotions with different intensities.
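Expressing an emotion at different intensities, as called for in Suggestion 5, can be sketched as a linear interpolation between a neutral face and a full-intensity face. The Python example below is a minimal illustration under assumptions: the actuator names, the pose values and the function pose_for are hypothetical and are not taken from any robot discussed in this overview.

```python
# Hypothetical actuator positions for a neutral face (normalized units).
NEUTRAL = {"brow_raise": 0.0, "mouth_corner": 0.0, "jaw_open": 0.0}

# Hypothetical full-intensity configurations, one per emotion.
FULL_POSES = {
    "happiness": {"brow_raise": 0.2, "mouth_corner": 1.0, "jaw_open": 0.3},
    "sadness": {"brow_raise": -0.5, "mouth_corner": -0.8, "jaw_open": 0.0},
}


def pose_for(emotion, intensity):
    """Interpolate between the neutral pose and the full pose of an emotion.

    intensity is clamped to [0, 1]: 0 gives the neutral face, 1 gives the
    single full-intensity configuration, and values in between scale every
    actuator displacement proportionally.
    """
    intensity = min(1.0, max(0.0, intensity))
    full = FULL_POSES[emotion]
    return {
        name: NEUTRAL[name] + intensity * (full[name] - NEUTRAL[name])
        for name in NEUTRAL
    }
```

For instance, pose_for("happiness", 0.5) yields a smile with the mouth corners raised half as far as in the full-intensity pose, instead of the one fixed configuration per emotion used in most existing studies.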
Finally, it needs to be studied whether the intensity of emotion with which a robot reacts to a given human emotion has any effects on the human and whether these effects are positive or negative.

Suggestion 6: Robots should be able to express their emotions through a combination of body gestures and facial expressions.

While this overview focuses on robotic facial expressions, there are other articles where emotional expression is performed through the robot's body postures [4,20,22,55,86,90]. A potential future study could compare the robot's facial expressions with the robot's bodily expressions and also with the combination of facial and bodily expressions to see if there is any difference in the recognition of these.

Suggestion 7: Robots should be able to both recognize and generate complex emotions such as thinking, calm and bored states.

For both facial expression recognition and generation, there is a need to go beyond the basic seven emotions to recognizing and generating more complex emotions such as calm, fatigued, bored etc. It might be difficult to generate complex emotions given the hardware limitations of the robot, but if this is made possible, robots could express a wider range of emotions similar to human beings.

7 Conclusion

This overview emphasizes the recognition of human facial expressions and the generation of robotic facial expressions. There are already plenty of studies with high accuracy for facial expression recognition on pre-existing datasets. Accuracy for facial expression recognition in the wild is considerably lower than in experiments conducted under controlled laboratory conditions. For human facial emotion recognition, future work should improve emotion recognition for non-frontal head poses in the presence of occlusions (i.e. emotion recognition in the wild).
It should be made possible to recognize emotions during speech as well as emotions with varying intensities. In the case of facial expression generation in robots, robots are capable of making the basic facial expressions. Few studies perform autonomous facial expression generation in robots. In the future, there could be studies comparing robotic facial expressions with the robot's bodily expressions and also with a combination of facial and bodily expressions to see if there is any difference in recognizing these. Robots should be able to express their emotion with partial bodily or facial gestures while speaking. They should also be able to express their emotions with various intensities instead of a single configuration per emotion. Lastly, there is a need to go beyond the basic seven expressions for both facial expression recognition and generation.

Acknowledgements The authors thank Vignesh Prasad for his insightful comments.

Funding Open Access funding enabled and organized by Projekt DEAL. This research was funded by the German Research Foundation (DFG, Deutsche Forschungsgemeinschaft). The authors also gratefully acknowledge funding of the project by the ZEVEDI Hessen and the leap in time foundation.

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

Ethical approval The authors declare that there are no compliance issues with this research.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Ahmed TU, Hossain S, Hossain MS, ul Islam R, Andersson K (2019) Facial expression recognition using convolutional neural network with data augmentation. In: 2019 joint 8th international conference on informatics, electronics vision (ICIEV) and 2019 3rd international conference on imaging, vision pattern recognition (icIVPR), pp 336–341
2. Barros P, Weber C, Wermter S (2015) Emotional expression recognition with a cross-channel convolutional neural network for human–robot interaction. In: 2015 IEEE-RAS 15th international conference on humanoid robots (humanoids), pp 582–587
3. Bavelas J, Gerwing J (2011) The listener as addressee in face-to-face dialogue. Int J Listen 25:178–198
4. Beck A, Cañamero L, Hiolle A, Damiano L, Cosi P, Tesser F, Sommavilla G (2013) Interpretation of emotional body language displayed by a humanoid robot: a case study with children. Int J Soc Robot 5(3):325–334
5. Becker-Asano C, Ishiguro H (2011) Evaluating facial displays of emotion for the android robot geminoid f, pp 1–8
6. Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166
7. Bennett CC, Sabanovic S (2014) Deriving minimal features for human-like facial expressions in robotic faces. Int J Soc Robot 6:367–381
8. Bera A, Randhavane T, Prinja R, Kapsaskis K, Wang A, Gray K, Manocha D (2019) The emotionally intelligent robot: improving social navigation in crowded environments. ArXiv arXiv:1903.03217
9. Breazeal C (2003) Emotion and sociable humanoid robots. Int J Hum Comput Stud 59(1–2):119–155
10. Breazeal C, Buchsbaum D, Gray J, Gatenby D, Blumberg B (2005) Learning from and about others: towards using imitation to bootstrap the social understanding of others by robots. Artif Life 11:31–62. https://doi.org/10.1162/1064546053278955
11. Buciu I, Kotsia I, Pitas I (2005) Facial expression analysis under partial occlusion, pp v/453–v/456, vol 5
12. Byeon YH, Kwak KC (2014) Facial expression recognition using 3d convolutional neural network. Int J Adv Comput Sci Appl 5(12)
13. Canedo D (2019) Facial expression recognition using computer vision: a systematic review. Appl Sci. https://doi.org/10.3390/app9214678
14. Carcagnì P, Del Coco M, Leo M, Distante C (2015) Facial expression recognition and histograms of oriented gradients: a comprehensive study. Springerplus 4(1):645
15. Chen H, Gu Y, Wang F, Sheng W (2018) Facial expression recognition and positive emotion incentive system for human–robot interaction. In: 2018 13th world congress on intelligent control and automation (WCICA), pp 407–412
16. Chen X, Yang X, Wang M, Zou J (2017) Convolution neural network for automatic facial expression recognition. In: 2017 international conference on applied system innovation (ICASI), pp 814–817
17. Churamani N, Barros P, Strahl E, Wermter S (2018) Learning empathy-driven emotion expressions using affective modulations
18. Cid F, Moreno J, Bustos P, Núñez P (2014) Muecas: a multi-sensor robotic head for affective human robot interaction and imitation. Sensors (Basel, Switzerland) 14:7711–7737
19. Cid F, Prado JA, Bustos P, Núñez P (2013) A real time and robust facial expression recognition and imitation approach for affective human–robot interaction using gabor filtering. In: 2013 IEEE/RSJ international conference on intelligent robots and systems, pp 2188–2193. https://doi.org/10.1109/IROS.2013.6696662
20. Cohen I (2010) Recognizing robotic emotions: facial versus body posture expression and the effects of context and learning. Master's thesis
21. Corneanu CA, Simón MO, Cohn JF, Guerrero SE (2016) Survey on rgb, 3d, thermal, and multimodal approaches for facial expression recognition: history, trends, and affect-related applications. IEEE Trans Pattern Anal Mach Intell 38(8):1548–1568
22. Costa S, Soares F, Santos C (2013) Facial expressions and gestures to convey emotions with a humanoid robot. In: International conference on social robotics. Springer, pp 542–551
23. Dandıl E, Özdemir R (2019) Real-time facial emotion classification using deep learning. Data Sci Appl 2(1):13–17
24. Datcu D, Rothkrantz L (2007) Facial expression recognition in still pictures and videos using active appearance models. A comparison approach, p 112. https://doi.org/10.1145/1330598.1330717
25. Dautenhahn K (2007) Methodology & themes of human–robot interaction: a growing research field. Int J Adv Robot Syst 4:15
26. Davis M (2018) Empathy: a social psychological approach
27. Deng J, Pang G, Zhang Z, Pang Z, Yang H, Yang G (2019) cgan based facial expression recognition for human–robot interaction. IEEE Access 7:9848–9859
28. de Graaf M, Allouch S, Van Dijk JA (2016) Long-term acceptance of social robots in domestic environments: insights from a user's perspective
29. Dhall A, Goecke R, Lucey S, Gedeon T (2011) Static facial expression analysis in tough conditions: data, evaluation protocol and benchmark. In: 2011 IEEE international conference on computer vision workshops (ICCV Workshops), pp 2106–2112
30. Ding H, Zhou SK, Chellappa R (2017) Facenet2expnet: regularizing a deep face recognition net for expression recognition. In: 2017 12th IEEE international conference on automatic face gesture recognition (FG 2017), pp 118–126
31. Drolet A, Morris MW (2000) Rapport in conflict resolution: accounting for how face-to-face contact fosters mutual cooperation in mixed-motive conflicts. J Exp Soc Psychol 36:26–50
32. Elaiwat S, Bennamoun M, Boussaïd F (2016) A spatio-temporal rbm-based model for facial expression recognition. Pattern Recogn 49:152–161
33. Esfandbod A, Rokhi Z, Taheri A, Alemi M, Meghdari A (2019) Human–robot interaction based on facial expression imitation. In: 2019 7th international conference on robotics and Mechatronics (ICRoM), pp 69–73
34. Faria DR, Vieira M, Faria FCC, Premebida C (2017) Affective facial expressions recognition for human–robot interaction. In: 2017 26th IEEE international symposium on robot and human interactive communication (RO-MAN), pp 805–810
35. Feil-Seifer D, Matarić MJ (2011) Socially assistive robotics. IEEE Robot Autom Mag 18(1):24–31
36. Ferreira PM, Marques F, Cardoso JS, Rebelo A (2018) Physiological inspired deep neural networks for emotion recognition. IEEE Access 6:53930–53943
37. Fix E (1951) Discriminatory analysis: nonparametric discrimination, consistency properties. USAF School of Aviation Medicine
38. Ge S, Wang C, Hang C (2008) A facial expression imitation system in human robot interaction
39. Gers FA, Schmidhuber J (2000) Recurrent nets that time and count. In: Proceedings of the IEEE-INNS-ENNS international joint conference on neural networks. IJCNN 2000. Neural computing: new challenges and perspectives for the new millennium. IEEE, vol 3, pp 189–194
40. Gogić I, Manhart M, Pandžić I, Ahlberg J (2018) Fast facial expression recognition using local binary features and shallow neural networks. Vis Comput. https://doi.org/10.1007/s00371-018-1585-8
41. Gunes H, Piccardi M (2007) Bi-modal emotion recognition from expressive face and body gestures. J Netw Comput Appl 30(4):1334–1345
42. Gunes H, Schuller B (2013) Categorical and dimensional affect analysis in continuous input: current trends and future directions.
Image Vis Comput 31:120–136 43. Gunes H, Schuller B, Pantic M, Cowie R (2011) Emotion repre- sentation, analysis and synthesis in continuous space: a survey. Face Gesture 2011:827–834 44. Hamester D, Barros P, Wermter S (2015) Face expression recog- nition with a 2-channel convolutional neural network, pp 1–8 . https://doi.org/10.1109/IJCNN.2015.7280539 45. HazarM, Fendri E, HammamiM (2015) Face recognition through different facial expressions. J Signal Process Syst. https://doi.org/ 10.1007/s11265-014-0967-z 46. Hinton G, Salakhutdinov R (2006) Reducing the dimensionality of data with neural networks. Science (New York, NY) 313:504– 7. https://doi.org/10.1126/science.1127647 47. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780 48. Hoffman G, Breazeal C (2006) Robotic partners’ bodies and minds: an embodied approach to fluid human–robot collabora- tion. In: AAAI workshop—technical report 49. Hoffman G, Zuckerman O, Hirschberger G, Luria M, Shani Sher- man T (2015) Design and evaluation of a peripheral robotic conversation companion. In: Proceedings of the Tenth Annual ACM/IEEE international conference on human–robot interaction, HRI ’15, pp 3–10. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/2696454.2696495 50. Horii T, Nagai Y, Asada M (2016) Imitation of human expres- sions based on emotion estimation by mental simulation. Paladyn J Behav Robot. https://doi.org/10.1515/pjbr-2016-0004 51. Hossain MS, Muhammad G (2017) An emotion recognition sys- tem for mobile applications. IEEE Access 5:2281–2287 52. Hua W, Dai F, Huang L, Xiong J, Gui G (2019) Hero: human emotions recognition for realizing intelligent internet of things. IEEE Access 7:24321–24332 53. Huang Y, Yang J, Liao P, Pan J (2017) Fusion of facial expres- sions and eeg for multimodal emotion recognition. Comput Intell Neurosci 2017:1–8. https://doi.org/10.1155/2017/2107451 54. 
Ilic D, Žužić I, Brscic D (2019) Calibrate my smile: robot learning its facial expressions through interactive play with humans, pp 68–75
55. Inthiam J, Hayashi E, Jitviriya W, Mowshowitz A (2019) Mood estimation for human–robot interaction based on facial and bodily expression using a hidden Markov model. In: 2019 IEEE/SICE international symposium on system integration (SII). IEEE, pp 352–356
56. Inthiam J, Mowshowitz A, Hayashi E (2019) Mood perception model for social robot based on facial and bodily expression using a hidden Markov model. J Robot Mechatron 31:629–638
57. Jiang L, Cai Z, Wang D, Jiang S (2007) Survey of improving k-nearest-neighbor for classification. In: Fourth international conference on fuzzy systems and knowledge discovery (FSKD 2007), vol 1, pp 679–683
58. Kabir MH, Salekin MS, Uddin MZ, Abdullah-Al-Wadud M (2017) Facial expression recognition from depth video with patterns of oriented motion flow. IEEE Access 5:8880–8889
59. Kanda T, Hirano T, Eaton D, Ishiguro H (2004) Interactive robots as social partners and peer tutors for children: a field trial. Hum Comput Interact (Special issues on human–robot interaction) 19:61–84
60. Kar NB, Babu KS, Jena SK (2017) Face expression recognition using histograms of oriented gradients with reduced features. In: Raman B, Kumar S, Roy PP, Sen D (eds) Proceedings of international conference on computer vision and image processing.
Springer, Singapore, pp 209–219
61. Kim DH, Jung S, An K, Lee H, Chung M (2006) Development of a facial expression imitation system, pp 3107–3112
62. Kim J, Kim B, Roy PP, Jeong D (2019) Efficient facial expression recognition algorithm based on hierarchical deep neural network structure. IEEE Access 7:41273–41285
63. Kirgis FP, Katsos P, Kohlmaier M (2016) Collaborative robotics. Springer, Cham, pp 448–453
64. Kishi T, Otani T, Endo N, Kryczka P, Hashimoto K, Nakata K, Takanishi A (2012) Development of expressive robotic head for bipedal humanoid robot. In: 2012 IEEE/RSJ international conference on intelligent robots and systems, pp 4584–4589
65. Kotsia I, Nikolaidis N, Pitas I (2007) Facial expression recognition in videos using a novel multi-class support vector machines variant. In: 2007 IEEE international conference on acoustics, speech and signal processing–ICASSP ’07, vol 2, pp II-585–II-588
66. Kotsia I, Pitas I (2007) Facial expression recognition in image sequences using geometric deformation features and support vector machines. IEEE Trans Image Process 16(1):172–187
67. Kozima H, Nakagawa C, Yasuda Y (2005) Interactive robots for communication-care: a case-study in autism therapy. In: ROMAN 2005. IEEE international workshop on robot and human interactive communication, pp 341–346
68. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
69. LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551
70. LeCun Y, Kavukcuoglu K, Farabet C (2010) Convolutional networks and applications in vision. In: Proceedings of 2010 IEEE international symposium on circuits and systems, pp 253–256
71. Li S, Deng W (2020) Deep facial expression recognition: a survey. IEEE Trans Affect Comput, pp 1–1
72. Li TS, Kuo P, Tsai T, Luan P (2019) Cnn and lstm based facial expression analysis model for a humanoid robot.
IEEE Access 7:93998–94011
73. Li Y, Hashimoto M (2011) Effect of emotional synchronization using facial expression recognition in human–robot communication
74. Li Y, Zeng J, Shan S, Chen X (2018) Occlusion aware facial expression recognition using cnn with attention mechanism. IEEE Trans Image Process, pp 1–1
75. Li Y, Zeng J, Shan S, Chen X (2018) Patch-gated cnn for occlusion-aware facial expression recognition. In: 2018 24th international conference on pattern recognition (ICPR), pp 2209–2214
76. Liang D, Liang H, Yu Z, Zhang Y (2019) Deep convolutional bilstm fusion network for facial expression recognition. Vis Comput 36:499–508
77. Liliana DY, Basaruddin C, Widyanto MR (2017) Mix emotion recognition from facial expression using svm-crf sequence classifier. In: Proceedings of the international conference on algorithms, computing and systems, ICACS ’17. Association for Computing Machinery, New York, NY, USA, pp 27–31
78. Lipton ZC, Berkowitz J, Elkan C (2015) A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019
79. Liu K, Hsu C, Wang W, Chiang H (2019) Real-time facial expression recognition based on cnn. In: 2019 international conference on system science and engineering (ICSSE), pp 120–123. https://doi.org/10.1109/ICSSE.2019.8823409
80. Liu P, Choo KKR, Wang L, Huang F (2017) Svm or deep learning? A comparative study on remote sensing image classification. Soft Comput 21(23):7053–7065
81. Liu ZT, Wu M, Cao W, Chen LF, Xu J, Zhang R, Zhou M, Mao J (2017) A facial expression emotion recognition based human–robot interaction system. IEEE/CAA J Autom Sin 4:668–676. https://doi.org/10.1109/JAS.2017.7510622
82. Lopez-Rincon A (2019) Emotion recognition using facial expressions in children using the nao robot. In: 2019 international conference on electronics, communications and computers (CONIELECOMP), pp 146–153
83.
Ma F, Zhang W, Li Y, Huang SL, Zhang L (2020) Learning better representations for audio-visual emotion recognition with common information. Appl Sci 10:7239. https://doi.org/10.3390/app10207239
84. Maeda Y, Geshi S (2018) Human–robot interaction using Markovian emotional model based on facial recognition. In: 2018 Joint 10th international conference on soft computing and intelligent systems (SCIS) and 19th international symposium on advanced intelligent systems (ISIS). IEEE, pp 209–214
85. Mannan MA, Lam A, Kobayashi Y, Kuno Y (2015) Facial expression recognition based on hybrid approach. In: Huang DS, Han K (eds) Adv Intell Comput Theor Appl. Springer, Cham, pp 304–310
86. Marmpena M, Lim A, Dahl TS, Hemion N (2019) Generating robotic emotional body language with variational autoencoders. In: 2019 8th international conference on affective computing and intelligent interaction (ACII). IEEE, pp 545–551
87. Martin C, Werner U, Gross H (2008) A real-time facial expression recognition system based on active appearance models using gray images and edge images. In: 2008 8th IEEE international conference on automatic face gesture recognition, pp 1–6. https://doi.org/10.1109/AFGR.2008.4813412
88. Martinez B, Valstar MF, Jiang B, Pantic M (2019) Automatic analysis of facial actions: a survey. IEEE Trans Affect Comput 10(3):325–347
89. Mayya V, Pai RM, Pai MMM (2016) Automatic facial expression recognition using DCNN. Proc Comput Sci 93:453–461. https://doi.org/10.1016/j.procs.2016.07.233
90. McColl D, Nejat G (2014) Recognizing emotional body language displayed by a human-like social robot. Int J Soc Robot 6(2):261–280
91. Meghdari A, Shouraki S, Siamy A, Shariati A (2016) The real-time facial imitation by a social humanoid robot
92. Mehrabian A (1968) Communication without words. Psychol Today 2:53–56
93. Meng Z, Liu P, Cai J, Han S, Tong Y (2017) Identity-aware convolutional neural network for facial expression recognition.
In: 2017 12th IEEE international conference on automatic face gesture recognition (FG 2017), pp 558–565
94. Minaee S, Abdolrashidi A (2019) Deep-emotion: facial expression recognition using attentional convolutional network. arXiv preprint arXiv:1902.01019
95. Mistry K, Zhang L, Neoh SC, Lim CP, Fielding B (2017) A micro-GA embedded PSO feature selection approach to intelligent facial emotion recognition. IEEE Trans Cybern 47(6):1496–1509
96. Moeini A, Moeini H, Faez K (2014) Pose-invariant facial expression recognition based on 3d face reconstruction and synthesis from a single 2d image. In: 2014 22nd international conference on pattern recognition, pp 1746–1751. https://doi.org/10.1109/ICPR.2014.307
97. Mollahosseini A, Chan D, Mahoor MH (2016) Going deeper in facial expression recognition using deep neural networks. In: 2016 IEEE winter conference on applications of computer vision (WACV), pp 1–10
98. Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng AY (2011) Multimodal deep learning. In: ICML
99. Nicolescu MN, Mataric MJ (2001) Learning and interacting in human–robot domains. IEEE Trans Syst Man Cybern Part A Syst Hum 31(5):419–430
100. Nunes ARV (2019) Deep emotion recognition through upper body movements and facial expression. Master’s thesis, Aalborg University
101.
Nwosu L, Wang H, Lu J, Unwala I, Yang X, Zhang T (2017) Deep convolutional neural network for facial expression recognition using facial parts. In: 2017 IEEE 15th international conference on dependable, autonomic and secure computing, 15th international conference on pervasive intelligence and computing, 3rd international conference on big data intelligence and computing and cyber science and technology congress, pp 1318–1321
102. Park JW, Lee H, Chung M (2014) Generation of realistic robot facial expressions for human robot interaction. J Intell Robot Syst 78:443–462
103. Prajapati S, Shrinivasa Naika CL, Jha S, Nair S (2013) On rendering emotions on a robotic face, pp 1–7
104. Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
105. Ray C, Mondada F, Siegwart R (2008) What do people expect from robots? pp 3816–3821
106. Ringeval F, Eyben F, Kroupi E, Yuce A, Thiran JP, Ebrahimi T, Lalanne D, Schuller B (2015) Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data. Pattern Recogn Lett 66:22–30. https://doi.org/10.1016/j.patrec.2014.11.007 (Pattern Recognition in Human Computer Interaction)
107. Ringeval F, Schuller B, Valstar M, Jaiswal S, Marchi E, Lalanne D, Cowie R, Pantic M (2015) Av+ec 2015–the first affect recognition challenge bridging across audio, video, and physiological data
108. Romero P, Cid F, Núnez P (2013) A novel real time facial expression recognition system based on candide-3 reconstruction model. In: Proceedings of the XIV workshop on physical agents (WAF 2013), Madrid, Spain, pp 18–19
109. Rouast PV, Adam MTP, Chiong R (2019) Deep learning for human affect recognition: insights and new developments. arXiv preprint arXiv:1901.02884
110. Ruiz-Garcia A, Elshaw M, Altahhan A, Palade V (2018) A hybrid deep learning neural approach for emotion recognition from facial expressions for socially assistive robots. Neural Comput Appl.
https://doi.org/10.1007/s00521-018-3358-8
111. Saerbeck M, Bartneck C (2010) Perception of affect elicited by robot motion, pp 53–60
112. Sariyanidi E, Gunes H, Cavallaro A (2015) Automatic analysis of facial affect: a survey of registration, representation, and recognition. IEEE Trans Pattern Anal Mach Intell 37(6):1113–1133
113. Saxena S, Tripathi S, Sudarshan TSB (2019) Deep dive into faces: Pose illumination invariant multi-face emotion recognition system. In: 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1088–1093. https://doi.org/10.1109/IROS40897.2019.8967874
114. Shi Y, Chen Y, Ardila LR, Venture G, Bourguet ML (2019) A visual sensing platform for robot teachers. In: Proceedings of the 7th international conference on human–agent interaction, pp 200–201
115. Sikka K, Dhall A, Bartlett M (2015) Exemplar hidden Markov models for classification of facial expressions in videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 18–25
116. Simul NS, Ara NM, Islam MS (2016) A support vector machine approach for real time vision based human robot interaction. In: 2016 19th international conference on computer and information technology (ICCIT), pp 496–500
117. Srivastava N, Salakhutdinov RR (2012) Multimodal learning with deep Boltzmann machines. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems, vol 25. Curran Associates Inc, Red Hook, pp 2222–2230
118. Stock R, Merkle M (2018) Can humanoid service robots perform better than service employees? A comparison of innovative behavior cues. https://doi.org/10.24251/HICSS.2018.133
119. Stock R, Nguyen MA (2019) Robotic psychology what do we know about human–robot interaction and what do we still need to learn?
120. Stock RM (2016) Emotion transfer from frontline social robots to human customers during service encounters: testing an artificial emotional contagion modell.
In: ICIS
121. Stock RM, Merkle M (2017) A service robot acceptance model: user acceptance of humanoid robots during service encounters. In: IEEE international conference on pervasive computing and communications workshops (PerCom Workshops), pp 339–344. https://doi.org/10.1109/PERCOMW.2017.7917585
122. Stock-Homburg R (2021) Survey of emotions in human–robot interaction—after 20 years of research: What do we know and what have we still to learn? Int J Soc Robot
123. Sukhbaatar S, Makino T, Aihara K, Chikayama T (2011) Robust generation of dynamical patterns in human motion by a deep belief nets. In: Asian conference on machine learning, pp 231–246
124. Taira H, Haruno M (1999) Feature selection in SVM text categorization. In: AAAI/IAAI, pp 480–486
125. Tanaka F, Cicourel A, Movellan J (2007) Socialization between toddlers and robots at an early childhood education center. Proc Natl Acad Sci USA 104:17954–8
126. Uddin MZ, Hassan MM, Almogren A, Alamri A, Alrubaian M, Fortino G (2017) Facial expression recognition utilizing local direction-based robust features and deep belief network. IEEE Access 5:4525–4536
127. Uddin MZ, Khaksar W, Torresen J (2017) Facial expression recognition using salient features and convolutional neural network. IEEE Access 5:26146–26161
128. Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
129. Vapnik VN (1999) An overview of statistical learning theory. IEEE Trans Neural Netw 10(5):988–999
130. Vithanawasam T, Madhusanka A (2019) Face and upper-body emotion recognition using service robot’s eyes in a domestic environment, pp 44–50
131. Wang K, Peng X, Yang J, Meng D, Qiao Y (2020) Region attention networks for pose and occlusion robust facial expression recognition. IEEE Trans Image Process 29:4057–4069
132. Wang Q, Ju S (2008) A mixed classifier based on combination of HMM and KNN. In: 2008 fourth international conference on natural computation, vol 4, pp 38–42
133.
Webb N, Ruiz-Garcia A, Elshaw M, Palade V (2020) Emotion recognition from face images in an unconstrained environment for usage on social robots. In: 2020 international joint conference on neural networks (IJCNN), pp 1–8
134. Wimmer M, MacDonald BA, Jayamuni D, Yadav A (2008) Facial expression recognition for human–robot interaction—a prototype. In: Sommer G, Klette R (eds) Robot Vis. Springer, Berlin, pp 139–152
135. Wu C, Wang S, Ji Q (2015) Multi-instance hidden Markov model for facial expression recognition. In: 2015 11th IEEE international conference and workshops on automatic face and gesture recognition (FG), vol 1, pp 1–6
136. Wu M, Su W, Chen L, Liu Z, Cao W, Hirota K (2019) Weight-adapted convolution neural network for facial expression recognition in human–robot interaction. IEEE Trans Syst Man Cybern Syst
137. Yaddaden Y, Bouzouane A, Adda M, Bouchard B (2016) A new approach of facial expression recognition for ambient assisted living. In: Proceedings of the 9th ACM international conference on PErvasive technologies related to assistive environments, PETRA ’16. Association for Computing Machinery, New York, NY, USA
138. Yamashita R, Nishio M, Do RKG, Togashi K (2018) Convolutional neural networks: an overview and application in radiology. Insights Imaging 9(4):611–629
139. Yang B, Cao J, Ni R, Zhang Y (2018) Facial expression recognition using weighted mixture deep neural network based on double-channel facial images. IEEE Access 6:4630–4640
140.
Yang H, Yin L (2017) CNN based 3d facial expression recognition using masking and landmark features. In: 2017 seventh international conference on affective computing and intelligent interaction (ACII), pp 556–560
141. Yoo B, Cho S, Kim J (2011) Fuzzy integral-based composite facial expression generation for a robotic head. In: 2011 IEEE international conference on fuzzy systems (FUZZ-IEEE 2011), pp 917–923
142. Yu C, Tapus A (2019) Interactive robot learning for multimodal emotion recognition. In: Salichs MA, Ge SS, Barakova EI, Cabibihan JJ, Wagner AR, Castro-González Á, He H (eds) Social robotics. Springer, Cham, pp 633–642
143. Zhang F, Zhang T, Mao Q, Xu C (2018) Joint pose and expression modeling for facial expression recognition. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 3359–3368. https://doi.org/10.1109/CVPR.2018.00354
144. Zhang F, Zhang T, Mao Q, Xu C (2020) Geometry guided pose-invariant facial expression recognition. IEEE Trans Image Process 29:4445–4460. https://doi.org/10.1109/TIP.2020.2972114
145. Zhang K, Huang Y, Du Y, Wang L (2017) Facial expression recognition based on deep evolutional spatial-temporal networks. IEEE Trans Image Process 26(9):4193–4203
146. Zhang Z, Luo P, Loy CC, Tang X (2016) From facial expression recognition to interpersonal relation prediction. Int J Comput Vis
147. Zhao L, Wang Z, Zhang G (2017) Facial expression recognition from video sequences based on spatial-temporal motion local binary pattern and Gabor multiorientation fusion histogram. Math Probl Eng

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.