OAI record oai:www.tdx.cat:10803/404681 (https://www.tdx.cat/oai/request; datestamp 2024-03-15T10:58:04Z, harvested 2024-03-28T18:01:11Z; sets com_10803_236, col_10803_690279)
Signal processing
Machine learning
Music information retrieval
Singing voice
Lyrics
Lyrics-to-audio alignment
Phonemes
Music scores
Turkish makam music
Beijing Opera
Hidden Markov models
Dynamic Bayesian Networks
Knowledge-based probabilistic modeling for tracking lyrics in music audio signals
[Barcelona]: Universitat Pompeu Fabra, 2017
Open access
http://hdl.handle.net/10803/404681
Dzhambazov, Georgi,
author
Programa de doctorat en Tecnologies de la Informació i les Comunicacions,
degree
1 online resource (113 pages)
Thesis
Doctorate
Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions
2017
Electronic theses and dissertations
Serra, Xavier,
academic supervisor
TDX
This thesis proposes specific signal processing and machine learning methodologies for
automatically aligning the lyrics of a song to its corresponding audio recording. The research
falls within the broader field of music information retrieval (MIR); in this respect, we aim to
improve existing state-of-the-art methodologies by introducing domain-specific knowledge.
The goal of this work is to devise models capable of tracking, in the music audio signal, the
sequential aspect of one particular element of lyrics: the phonemes. Music can be understood
as comprising different facets, one of which is lyrics. The models we build take into account
the complementary context that exists around lyrics, i.e. any musical facet complementary to
the lyrics. The facets used in this thesis include the structure of the music composition, the
structure of a melodic phrase, and the structure of a metrical cycle. From this perspective, we
analyse not only the low-level acoustic characteristics representing the timbre of the
phonemes, but also the higher-level characteristics in which the complementary context
manifests itself. We propose specific probabilistic models to represent how the transitions
between consecutive sung phonemes are conditioned by different facets of the
complementary context.
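As a minimal illustration of this idea (not the thesis's actual model; all names and numbers below are hypothetical), a first-order transition between sung phonemes can be made to depend on a binary context cue, such as proximity to a note onset:

```python
# Illustrative sketch only: the probability of staying in the current
# phoneme vs. advancing to the next one is modulated by a binary context
# cue, e.g. proximity to a vocal note onset. Names and values are made up.

def transition_prob(stay_base, onset_discount, near_onset):
    """Return (p_stay, p_advance) for the current phoneme state.

    Near a note onset a change of phoneme is more likely, so the
    self-loop probability is discounted.
    """
    p_stay = stay_base * (1.0 - onset_discount) if near_onset else stay_base
    return p_stay, 1.0 - p_stay

# Far from an onset the model tends to remain in the same phoneme;
# near an onset the advance probability dominates.
print(transition_prob(0.9, 0.5, near_onset=False))
print(transition_prob(0.9, 0.5, near_onset=True))
```

In the thesis's terms, the context cue would itself be a random variable in the model rather than an observed flag, but the direction of the effect is the same: the context reshapes the phoneme transition distribution.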
The complementary context we address unfolds in time according to principles that are
particular to a music tradition. To capture these, we created corpora and datasets for two
music traditions with a rich set of such principles: Ottoman Turkish makam and Beijing
opera. The datasets and corpora comprise different data types: audio recordings, music
scores, and metadata. From this perspective, the proposed models can take advantage both
of the data and of the music-domain knowledge of particular musical styles to improve
existing baseline approaches.
As a baseline, we choose a phonetic recognizer based on hidden Markov models (HMMs): a
widely used methodology for tracking phonemes in both singing and speech processing
problems. We present refinements of the typical steps of existing phonetic recognizer
approaches, tailored to the characteristics of the studied music traditions. On top of the
refined baseline, we devise probabilistic models, based on dynamic Bayesian networks
(DBNs), that represent the relation of phoneme transitions to their complementary context.
Two separate models are built for two granularities of complementary context: the structure
of a melodic phrase (higher level) and the structure of the metrical cycle (finer level). In one
model we exploit the fact that syllable durations depend on their position within a melodic
phrase. Information about the melodic phrases is obtained from the score, as well as from
music-specific knowledge. In another model, we analyse how vocal note onsets, estimated
from audio recordings, influence the transitions between consecutive vowels and
consonants. We also propose how to detect the time positions of vocal note onsets in
melodic phrases by simultaneously tracking the positions in a metrical cycle (i.e. metrical
accents).
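The baseline alignment step can be sketched, in highly simplified form, as Viterbi forced alignment of a fixed phoneme sequence over a left-to-right HMM. The emission likelihoods and self-loop probability below are invented for illustration and stand in for the acoustic models and tuned transition structure of a real recognizer:

```python
# A toy Viterbi forced alignment: phoneme states must be traversed left to
# right, and each frame is assigned to exactly one state. Emission scores
# and p_stay are hypothetical placeholders for a trained acoustic model.
import math

def forced_align(emissions, p_stay=0.7):
    """emissions[t][s]: likelihood of phoneme state s at frame t.
    Returns the maximum-likelihood state index for each frame."""
    T, S = len(emissions), len(emissions[0])
    NEG = float("-inf")
    delta = [[NEG] * S for _ in range(T)]   # best log-probability so far
    back = [[0] * S for _ in range(T)]      # backpointers
    delta[0][0] = math.log(emissions[0][0])  # must start in first phoneme
    for t in range(1, T):
        for s in range(S):
            stay = delta[t - 1][s] + math.log(p_stay)
            move = delta[t - 1][s - 1] + math.log(1 - p_stay) if s > 0 else NEG
            if stay >= move:
                delta[t][s], back[t][s] = stay, s
            else:
                delta[t][s], back[t][s] = move, s - 1
            delta[t][s] += math.log(emissions[t][s])
    # backtrace from the last phoneme (all phonemes must be consumed)
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Three frames, two phoneme states: likelihoods favour state 0, then 1.
frames = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
print(forced_align(frames))  # → [0, 0, 1]
```

The DBN models described above can be read as replacing the constant `p_stay` with a transition probability that varies with the position in a melodic phrase or metrical cycle.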
In order to evaluate the potential of the proposed models, we use lyrics-to-audio alignment
as a concrete task. Each model improves the alignment accuracy compared to the baseline,
which is based solely on the acoustics of the phonetic timbre. This validates our hypothesis
that knowledge of the complementary context is an important stepping stone for
computationally tracking lyrics, especially in the challenging case of singing with
instrumental accompaniment.
The outcomes of this study are not only theoretical methodologies and data, but also specific
software tools that have been integrated into Dunya: a suite of tools built in the context of
CompMusic, a project for advancing the computational analysis of the world's music. With
this application, we have also shown that the developed methodologies are useful not only
for tracking lyrics, but also for other use cases, such as enriched music listening and
appreciation, or educational purposes.
Cataloguing source: ES-BaCBU (cat, rda)
Content type: text (txt, rdacontent)
Media type: computer (c, rdamedia)
Carrier type: online resource (cr, rdacarrier)