In the supervised learning stage, the trainers played both sides of the conversation: the user and the AI assistant. In the reinforcement learning stage, human trainers first ranked responses that the model had generated in a previous conversation.[14] These rankings were used to create "reward models" that were then used to fine-tune the model.
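The following minimal sketch shows how a ranking of responses can be turned into a training signal for a reward model. It assumes a pairwise, Bradley-Terry style loss, which is common for reward models trained from human rankings. The source does not say which loss was used. The function names, the response labels, and the reward scores are hypothetical and only illustrate the idea.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked):
    """Split a best-to-worst ranking into (preferred, rejected) pairs.

    A ranking of k responses yields k * (k - 1) / 2 comparisons.
    """
    return list(combinations(ranked, 2))


def pairwise_loss(score_preferred, score_rejected):
    """Assumed Bradley-Terry loss: -log(sigmoid(r_preferred - r_rejected)).

    Computed as softplus(-x), with a branch to avoid overflow in exp().
    """
    x = score_preferred - score_rejected
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def reward_model_loss(ranked, reward):
    """Average pairwise loss over all comparisons implied by one ranking.

    `reward` maps each response label to the (hypothetical) scalar score
    that a reward model assigned to it.
    """
    pairs = ranking_to_pairs(ranked)
    return sum(pairwise_loss(reward[w], reward[l]) for w, l in pairs) / len(pairs)


# Hypothetical example: a trainer ranked three responses A > B > C.
ranking = ["A", "B", "C"]
agrees = {"A": 2.0, "B": 1.0, "C": 0.0}    # scores that match the ranking
disagrees = {"A": 0.0, "B": 1.0, "C": 2.0}  # scores that contradict it
print(reward_model_loss(ranking, agrees) < reward_model_loss(ranking, disagrees))
```

Training lowers this loss by raising the score of each preferred response relative to the rejected one. The resulting scores can then serve as the reward during reinforcement learning.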