2024-03-28T19:12:24Z
https://www.tdx.cat/oai/request
oai:www.tdx.cat:10803/666222
2024-03-15T10:58:01Z
com_10803_236
col_10803_690279
nam a 5i 4500
Prosody
Automatic speech transcription
Punctuation restoration
Spoken language machine translation
Bilingual spoken corpus
Prosòdia
Transcripció automàtica de la parla
Restauració de la puntuació
Traducció automàtica de llenguatge oral
Corpus bilingües
Incorporating prosody into neural speech processing pipelines: applications on automatic speech transcription and spoken language machine translation
[Barcelona] :
Universitat Pompeu Fabra,
2019
Open access
http://hdl.handle.net/10803/666222
cr |||||||||||
AAMMDDs2019 sp ||||fsm||||0|| 0 eng|c
Öktem, Alp,
author
Programa de doctorat en Tecnologies de la Informació i les Comunicacions,
degree
1 online resource (139 pages)
Thesis
Doctorate
Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions
2019
Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions
Electronic theses and dissertations
Farrús, Mireia,
academic supervisor
Bonafonte Cávez, Antonio,
academic supervisor
TDX
In this dissertation, I study the inclusion of prosody in two applications that involve speech understanding: automatic speech transcription and spoken language translation. In the former, I propose a method that uses an attention mechanism over parallel sequences of prosodic and morphosyntactic features. Results indicate an F1 score of 70.3% in terms of overall punctuation generation accuracy. In the latter, I enhance spoken language translation with prosody. A neural machine translation system trained on movie-domain data is adapted with pause features using a prosodically annotated bilingual dataset. Results show that prosodic punctuation generation as a preliminary step to translation increases translation accuracy by 1% in terms of BLEU score. Encoding pauses as an extra feature gives a further 1% increase. The system is further extended to jointly predict pause features, which can then serve as input to a text-to-speech system.
f
ES-BaCBU
cat
rda
ES-BaCBU
text
txt
rdacontent
computer
c
rdamedia
online resource
cr
rdacarrier