This is my dissertation for my bachelor’s degree in Cognitive Science.
Head motion plays an important nonverbal role in face-to-face communication. The literature on animated talking agents includes some work on speech-driven head motion synthesis, but little on text-driven synthesis, even though the semantic content of an utterance is an important contributor to head motion. We present an evaluation of different neural network architectures, an analysis of hyperparameters, and finally a text-driven, deep-neural-network-based system for head motion synthesis. The proposed model performs well in subjective evaluations, and we show that this is due in part to its access to the meaning of the words in input sentences.
You can read the dissertation here (PDF link).