Introducing CARESSER: a Framework for in Situ Learning Robot Social Assistance from Expert Knowledge and Demonstrations
Antonio Andriella a , Carme Torras a , Carla Abdelnour b and Guillem Alenyà a
a Institut de Robòtica i Informàtica Industrial, CSIC-UPC, C/ Llorens i Artigas 4-6, 08028 Barcelona, Spain.
b Research Center and Memory Clinic, Fundació ACE, Institut Català de Neurociències Aplicades, Universitat Internacional de Catalunya, Barcelona, Spain.
Abstract: Socially Assistive Robots have the potential to augment and enhance caregivers' effectiveness in repetitive tasks such as cognitive therapies. However, their contribution has generally been limited because domain experts have not been fully involved in the whole design pipeline or in the automatisation of the robots' behaviour.
In this article, we present aCtive leARning agEnt aSsiStive bEhaviouR (CARESSER), a novel framework that actively learns robotic assistive behaviour by leveraging the therapist's expertise (knowledge-driven approach) and their demonstrations (data-driven approach). By exploiting this hybrid approach, the presented method enables fast in situ learning, in a fully autonomous fashion, of personalised patient-specific policies.
To evaluate our framework, we conducted two user studies in a daily care centre in which older adults affected by mild dementia and mild cognitive impairment (N=22) were asked to solve cognitive exercises with the support of a therapist and, later on, of a robot endowed with CARESSER. Results showed that: i) the robot was more competent than the therapist in keeping patients' performance constant during the sessions; ii) the assistance offered by the robot during the sessions eventually matched the therapist's preferences. We conclude that CARESSER, with its stakeholder-centric design, can pave the way to new AI approaches that learn by leveraging human-human interactions along with human expertise, which has the benefits of speeding up the learning process, eliminating the need to design complex reward functions and, finally, avoiding undesired states.
The Figure shows the main stages of the CARESSER framework. In the offline learning phase, we first gather the therapist's expertise and demonstrations over several sessions (1-2). Secondly, we build the generative Bayesian models of the patient and the therapist (human or robotic) (3) and run a simulation using the GOAL simulator (4). Thirdly, from the episodes generated by GOAL, we compute the reward function by means of Maximum Causal Entropy Inverse Reinforcement Learning and then obtain the policy by value iteration (5). Fourthly, we embed the policy on the robot (6).
Then, in the online learning phase, the robot with the initial learnt policy starts administering the exercise to the patient (7). After each session, the generative models are updated (3) with the new data, new episodes are generated (4) and a new reward function and policy are learnt (5-6) and employed in the next session (7).
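The policy-extraction step in stage (5) can be illustrated with a minimal sketch of tabular value iteration. It assumes the reward has already been recovered (the inverse reinforcement learning step is not shown), and the three-state MDP, the names TRANSITIONS and REWARD, and the discount factor are hypothetical choices for illustration, not the state space or parameters used by CARESSER.

```python
from typing import Dict, List, Tuple

State = int
Action = int
# transitions[s][a] is a list of (probability, next_state) pairs.
Transitions = Dict[State, Dict[Action, List[Tuple[float, State]]]]

# Hypothetical toy MDP (not the CARESSER state space):
# action 0 stays in place, action 1 advances; state 2 is absorbing.
TRANSITIONS: Transitions = {
    0: {0: [(1.0, 0)], 1: [(1.0, 1)]},
    1: {0: [(1.0, 1)], 1: [(1.0, 2)]},
    2: {0: [(1.0, 2)], 1: [(1.0, 2)]},
}
# Hypothetical state reward, standing in for the reward learnt by IRL.
REWARD: Dict[State, float] = {0: 0.0, 1: 0.0, 2: 1.0}


def q_value(s, outcomes, reward, values, gamma):
    """One-step lookahead: reward in s plus discounted expected next value."""
    return reward[s] + gamma * sum(p * values[s2] for p, s2 in outcomes)


def value_iteration(transitions, reward, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Return (state values, greedy policy) for a tabular MDP."""
    values = {s: 0.0 for s in transitions}
    for _ in range(max_iters):
        delta = 0.0
        for s, actions in transitions.items():
            best = max(
                q_value(s, outcomes, reward, values, gamma)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:  # stop once the largest update is negligible
            break
    policy = {
        s: max(actions, key=lambda a: q_value(s, actions[a], reward, values, gamma))
        for s, actions in transitions.items()
    }
    return values, policy


values, policy = value_iteration(TRANSITIONS, REWARD)
```

In this toy example the greedy policy chooses the advancing action in states 0 and 1, since only state 2 carries reward. In the online phase described above, the same routine would simply be rerun whenever a new reward function is learnt.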
RQ1a: Would the social assistance offered by the robot match the therapist's preferences?
RQ1b: To what extent, if any, are the patients' performances different when assisted by the robot therapist from when they are assisted by the human therapist or estimated by the persona-specific simulator?
RQ1c: Would the robot be able to keep the patient engaged to avoid both boredom and anxiety?
1. developing CARESSER, a framework that actively learns the robot’s socially assistive behaviours by leveraging therapist’s demonstrations and expertise,
2. developing Generative mOdel Agent simuLation (GOAL), a patient-specific simulator which by means of generative Bayesian models of the patient and the robot keeps track of the patient’s cognitive abilities during the task and the robot’s assistive behaviour and generates interactions accordingly,
3. designing effective socially assistive robot behaviours, which combine voice, gestures, and facial expressions, by involving stakeholders in the design process,
4. validating CARESSER in a fully autonomous robot with patients affected by mild cognitive impairment and mild dementia in a short-term in-situ cognitive training scenario.
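The idea behind a simulator such as GOAL (contribution 2) can be pictured with a rough sketch: one count-based conditional distribution for the therapist and one for the patient, each updated from observed sessions and then sampled to generate synthetic episodes. This is a simplification under stated assumptions. The class CountingModel, the function simulate_episode, the assistance levels, and the outcome labels are all hypothetical; the actual generative Bayesian models of GOAL are not reproduced here.

```python
import random
from collections import defaultdict


class CountingModel:
    """Categorical P(outcome | context) estimated from counts with a uniform prior."""

    def __init__(self, outcomes, prior=1.0):
        self.outcomes = list(outcomes)
        self.prior = prior  # pseudo-count added to every outcome
        self.counts = defaultdict(lambda: {o: 0 for o in self.outcomes})

    def update(self, context, outcome):
        """Record one observed (context, outcome) pair from a real session."""
        self.counts[context][outcome] += 1

    def probabilities(self, context):
        c = self.counts[context]
        total = sum(c.values()) + self.prior * len(self.outcomes)
        return {o: (c[o] + self.prior) / total for o in self.outcomes}

    def sample(self, context, rng):
        probs = self.probabilities(context)
        weights = [probs[o] for o in self.outcomes]
        return rng.choices(self.outcomes, weights=weights)[0]


def simulate_episode(therapist, patient, steps, rng):
    """Alternate therapist assistance and patient outcome for a fixed number of steps."""
    episode = []
    last_outcome = "start"
    for _ in range(steps):
        assistance = therapist.sample(last_outcome, rng)
        last_outcome = patient.sample(assistance, rng)
        episode.append((assistance, last_outcome))
    return episode


# Hypothetical labels, for illustration only.
LEVELS = ["none", "hint", "solution"]
OUTCOMES = ["correct", "wrong"]

therapist = CountingModel(LEVELS)   # P(assistance | previous outcome)
patient = CountingModel(OUTCOMES)   # P(outcome | assistance)
patient.update("hint", "correct")   # data from a (made-up) observed move
patient.update("hint", "correct")

episode = simulate_episode(therapist, patient, steps=5, rng=random.Random(0))
```

Calling update after each real session and regenerating episodes mirrors, in a very reduced form, the loop in which the models are refreshed with new data before a new policy is learnt.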
Study with the Human Therapist to Model Interactions
The following video shows an example of a cognitive therapy session. A human therapist, Joan, provides a patient, Mary, with different levels of assistance while she is playing a cognitive exercise.
Study with the Robot Therapist to Evaluate CARESSER
The following video shows an example of a cognitive therapy session (S=1). A robot therapist, TIAGo, provides a patient, Mary, with different levels of assistance learnt by means of CARESSER.
The following video shows an example of a cognitive therapy session (S=3). A robot therapist, TIAGo, provides a patient, Mary, with different levels of assistance learnt by means of CARESSER.
1. The socially assistive behaviour offered by the robot eventually matched the therapist's preferences after 6 sessions.
The Figure shows the average score assigned by the therapist to each session (n.s. denotes p > .05, * denotes .01 < p < .05, ** denotes .001 < p < .01, *** denotes .0001 < p < .001).
2. Patients' performances were significantly different when they were assisted by the robot therapist from when they were assisted by the human therapist. In contrast, patients' performances estimated by the GOAL simulator were not significantly different from those observed when patients were assisted by the robot therapist.
Figure (a) shows how the performance of the patients (y-axis) changed according to who provided them with assistance (x-axis). Note that, in simulation, both the therapist and the patients were simulated according to their respective generative models (n.s. denotes p > .05, * denotes .01 < p < .05). Figure (b) shows the average assistance for each level offered by the human therapist (blue bar) and the robot therapist (red bar), respectively.
3. The robot therapist, endowed with CARESSER, was better able than the human therapist to keep the patients' performance constant during the 6 sessions.
The Figure compares how well the human therapist and the robot therapist kept the patients' performance constant over the six sessions. The standard deviation across the six sessions is used as the evaluation metric (*** denotes .0001 < p < .001).
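As a small illustration of this metric, the sketch below computes the standard deviation of one patient's per-session performance, where a lower value means steadier performance. The function name performance_variability and the example scores are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, since the text does not say which was used.

```python
from statistics import stdev


def performance_variability(session_scores):
    """Sample standard deviation of one patient's scores across sessions.

    Lower values mean the performance was kept more constant.
    """
    if len(session_scores) < 2:
        raise ValueError("at least two sessions are needed")
    return stdev(session_scores)


# Made-up scores over six sessions, for illustration only.
steady_patient = [10, 11, 10, 9, 10, 10]
variable_patient = [4, 12, 7, 15, 5, 11]

steady = performance_variability(steady_patient)
variable = performance_variability(variable_patient)
```

Comparing such per-patient values between the two conditions would then be a matter for a statistical test, which is outside the scope of this sketch.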
Human therapist policies correlation
Robot therapist policies correlation
In the Figures, each correlation matrix shows the rate of matching actions between two policies, summed after each of the six sessions for each patient. Low correlation indicates more personalisation.
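One way to read this matching rate is sketched below: for each pair of patients, count the states in which their two policies pick the same action, sum those counts over the sessions, and divide by the number of states compared. This is an interpretation of the description above, not the authors' exact computation, and the names matching_rate and matching_matrix, as well as the toy policies, are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence

# A policy maps each state to the action chosen in that state.
Policy = Dict[Hashable, Hashable]


def matching_rate(sessions_a: Sequence[Policy], sessions_b: Sequence[Policy]) -> float:
    """Fraction of shared states with identical actions, pooled over sessions."""
    matches = 0
    total = 0
    for pol_a, pol_b in zip(sessions_a, sessions_b):
        shared = pol_a.keys() & pol_b.keys()
        matches += sum(pol_a[s] == pol_b[s] for s in shared)
        total += len(shared)
    return matches / total if total else 0.0


def matching_matrix(per_patient: Sequence[Sequence[Policy]]) -> List[List[float]]:
    """Pairwise matching rates between the per-session policies of all patients."""
    n = len(per_patient)
    return [
        [matching_rate(per_patient[i], per_patient[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy policies for two patients over two sessions (illustrative only).
patient_1 = [{0: "a", 1: "b"}, {0: "a", 1: "a"}]
patient_2 = [{0: "a", 1: "a"}, {0: "b", 1: "a"}]

matrix = matching_matrix([patient_1, patient_2])
```

Here the two toy patients agree in one of two states in each session, so their off-diagonal rate is 0.5, while each patient matches itself perfectly on the diagonal.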