TOURSG: A RESEARCH DIALOGUE CORPUS IN THE TOURISTIC DOMAIN
TourSG consists of dialogue sessions on touristic information for Singapore, collected from Skype calls between actual tour guides and tourists. Two collections are available: one in English (EN-TourSG) and one in Chinese (ZH-TourSG). EN-TourSG comprises 35 dialogue sessions and ZH-TourSG comprises 36, with each collection totaling 21 hours of conversation.
All dialogue sessions have been manually transcribed and annotated with speech act and semantic labels at the turn level. Annotations at the sub-dialogue segment level are also available: each full dialogue session is divided into sub-dialogues according to topical coherence, and each sub-dialogue is assigned a major topic category and annotated with a frame structure of slot-value pairs representing the subject discussed within it.
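The annotation layers described above can be pictured as a small nested data structure. The following is a minimal sketch in Python, not the corpus's actual file format: every class name, field name, and label value here (`Session`, `SubDialogue`, `Utterance`, `"TOPIC_A"`, `"ACT_X"`, and so on) is a hypothetical placeholder chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Utterance:
    """One transcribed turn with its turn-level annotations (hypothetical fields)."""
    speaker: str                                             # e.g. "guide" or "tourist"
    transcript: str
    speech_acts: List[str] = field(default_factory=list)     # turn-level speech act labels
    semantic_tags: List[str] = field(default_factory=list)   # turn-level semantic labels


@dataclass
class SubDialogue:
    """A topically coherent segment of a session (hypothetical fields)."""
    topic: str                                                # major topic category
    frame: Dict[str, List[str]] = field(default_factory=dict)  # slot -> values
    utterances: List[Utterance] = field(default_factory=list)


@dataclass
class Session:
    """A full dialogue session, divided into sub-dialogues (hypothetical fields)."""
    session_id: str
    language: str                                             # "EN" or "ZH"
    sub_dialogues: List[SubDialogue] = field(default_factory=list)

    def utterance_count(self) -> int:
        # Total number of utterances across all sub-dialogues.
        return sum(len(sd.utterances) for sd in self.sub_dialogues)


# Placeholder content only; the labels below are invented, not taken from the corpus.
session = Session(
    session_id="example-001",
    language="EN",
    sub_dialogues=[
        SubDialogue(
            topic="TOPIC_A",
            frame={"SLOT_1": ["value one"]},
            utterances=[
                Utterance("tourist", "placeholder question", ["ACT_X"], ["TAG_1"]),
                Utterance("guide", "placeholder answer", ["ACT_Y"], ["TAG_1"]),
            ],
        )
    ],
)
print(session.utterance_count())
```

Keeping the frame on the sub-dialogue rather than on each utterance mirrors the description above, where slot-value pairs summarise the subject of a whole segment while speech act and semantic labels attach to individual turns.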
EN-TourSG and ZH-TourSG have been used as evaluation data for the Fourth and Fifth Dialogue State Tracking Challenges (DSTC4 [1] and DSTC5 [2]).
Basic statistics of the datasets:
Language | Dialogues | Utterances | Words / Characters | Total Duration |
English | 35 | 31,034 | 273,580 words | 21 hours |
Chinese | 36 | 54,464 | 492,711 characters | 21 hours |
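For a rough sense of scale, per-dialogue and per-utterance averages can be derived from the figures in the table. The short Python sketch below does only that arithmetic; the dictionary layout and the `averages` helper are illustrative assumptions and not part of any corpus tooling.

```python
# Figures copied from the statistics table; "units" means words for English
# and characters for Chinese, so the two rows are not directly comparable.
stats = {
    "English": {"dialogues": 35, "utterances": 31034, "units": 273580, "unit": "words"},
    "Chinese": {"dialogues": 36, "utterances": 54464, "units": 492711, "unit": "characters"},
}


def averages(row):
    """Return (utterances per dialogue, units per utterance) for one table row."""
    per_dialogue = row["utterances"] / row["dialogues"]
    per_utterance = row["units"] / row["utterances"]
    return per_dialogue, per_utterance


for language, row in stats.items():
    per_dialogue, per_utterance = averages(row)
    print(f"{language}: {per_dialogue:.1f} utterances per dialogue, "
          f"{per_utterance:.1f} {row['unit']} per utterance")
```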
[1] Seokhwan Kim, Luis Fernando D'Haro, Rafael E. Banchs, Jason D. Williams, Matthew Henderson. The Fourth Dialog State Tracking Challenge. Proceedings of the 7th International Workshop on Spoken Dialogue Systems (IWSDS 2016), Saariselkä, Jan 2016.
[2] Seokhwan Kim, Luis Fernando D'Haro, Rafael E. Banchs, Jason D. Williams, Matthew Henderson, Koichiro Yoshino. The Fifth Dialog State Tracking Challenge. Proceedings of the 2016 IEEE Spoken Language Technology Workshop (SLT 2016), San Diego, Dec 2016.