2024-03-29T12:07:17Z
https://www.tdx.cat/oai/request
oai:www.tdx.cat:10803/117850
2017-08-31T21:06:07Z
com_10803_1
col_10803_82
nam a 5i 4500
Paràfrasi
Paráfrasis
Paraphrase
Corpus (Lingüística)
Corpora (Linguistics)
Tipologia (Lingüística)
Tipología (Lingüística)
Typology (Linguistics)
Anotació de corpus
Anotación de corpus
Corpus annotation
Detecció de plagis
Detección de plagios
Plagiarism detection
Paraphrase Scope and Typology. A Data-Driven Approach from Computational Linguistics / Abast i tipologia de la paràfrasi. Una aproximació empírica des de la lingüística computacional
[Barcelona] :
Universitat de Barcelona,
2013
Accés lliure
http://hdl.handle.net/10803/117850
cr |||||||||||
AAMMDDs2013 sp ||||fsm||||0|| 0 eng|c
Vila Rigat, Marta,
autor
1 recurs en línia (235 pàgines)
The contents of the first article in this PhD thesis (from page 30), namely "Is this a paraphrase? What kind? Paraphrase boundaries and typology" by Marta Vila, M. Antònia Martí, and Horacio Rodríguez, were subsequently published in the Open Journal of Modern Linguistics. Vila, Marta, M. Antònia Martí, and Horacio Rodríguez. 2014. Is this a paraphrase? What kind? Paraphrase boundaries and typology. Open Journal of Modern Linguistics, 4:205-218. http://dx.doi.org/10.4236/ojml.2014.41016
Els continguts del primer article d'aquesta tesi doctoral (des de la pàgina 30), titulat "Is this a paraphrase? What kind? Paraphrase boundaries and typology" de Marta Vila, M. Antònia Martí i Horacio Rodríguez, van ser finalment publicats a la revista Open Journal of Modern Linguistics. Vila, Marta, M. Antònia Martí i Horacio Rodríguez. 2014. Is this a paraphrase? What kind? Paraphrase boundaries and typology. Open Journal of Modern Linguistics, 4:205-218. http://dx.doi.org/10.4236/ojml.2014.41016
Los contenidos del primer artículo de esta tesis doctoral (desde la página 30), titulado "Is this a paraphrase? What kind? Paraphrase boundaries and typology" de Marta Vila, M. Antònia Martí y Horacio Rodríguez, fueron finalmente publicados en la revista Open Journal of Modern Linguistics. Vila, Marta, M. Antònia Martí y Horacio Rodríguez. 2014. Is this a paraphrase? What kind? Paraphrase boundaries and typology. Open Journal of Modern Linguistics, 4:205-218. http://dx.doi.org/10.4236/ojml.2014.41016
Tesi
Doctorat
Universitat de Barcelona. Departament de Lingüística General
2013
Universitat de Barcelona. Departament de Lingüística General
Tesis i dissertacions electròniques
Martí Antonin, M. Antònia,
supervisor acadèmic
Rodríguez Hontoria, Horacio,
supervisor acadèmic
TDX
Paraphrasing is generally understood as approximate sameness of meaning between snippets of text with different wording. Paraphrases are omnipresent in natural language and display every aspect of its multifaceted nature. The pervasiveness of paraphrasing has made it a focus of several tasks in computational linguistics; its complexity, in turn, has left paraphrasing an unresolved challenge.
Two basic issues, directly linked to the complex nature of paraphrasing, make its computational treatment particularly difficult: the absence of a precise and commonly accepted definition, and the lack of reference corpora for paraphrasing. Based on the assumption that linguistic knowledge should underlie computational-linguistics research, this thesis aims to make progress on these two issues: paraphrase characterization, and paraphrase-corpus building and annotation. The knowledge and resources created are then applied to natural language processing, specifically to automatic plagiarism detection, in order to analyse their potential empirically.
This thesis is structured as a compendium of six core articles divided into three blocks: (i) paraphrase scope and typology, (ii) paraphrase-corpus creation and annotation, and (iii) paraphrasing in automatic plagiarism detection.
In the first block, assuming that paraphrase boundaries are not fixed but depend on the field, task, and objectives, three borderline paraphrase cases are presented: paraphrases involving content loss, pragmatic knowledge, and certain grammatical features. The limits between paraphrasing and related phenomena such as coreference are also analysed. Paraphrase characterization takes on a new dimension if we look at it in extensional terms. We have built a general and linguistically grounded paraphrase typology in line with this approach. The third issue addressed in this block is paraphrase representation, which we consider essential for capturing paraphrasing formally.
In the second block, the Wikipedia-based Relational Paraphrase Acquisition method (WRPA) is presented. It allows for the automatic extraction from Wikipedia of paraphrases expressing a specific relation. Using this method, the WRPA corpus, covering different relations and two languages (English and Spanish), was built. A subset of the Spanish WRPA corpus and paraphrases from two English paraphrase corpora that differ in nature were annotated with a new annotation scheme derived from our paraphrase typology. These annotations were validated using the Inter-annotator Agreement for Paraphrase-Type Annotation measures (IAPTA), also developed in the framework of this thesis.
In the third and final block, our typology is applied to the field of automatic plagiarism detection, demonstrating that more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, and that lexical substitutions and text-snippet additions/deletions are the most widely used paraphrase mechanisms when plagiarizing. This provides insights for future research in automatic plagiarism detection and demonstrates, through a concrete example, the value of the knowledge and data provided in this thesis to computational-linguistics research.
b
ES-BaCBU
cat
rda
ES-BaCBU
text
txt
rdacontent
informàtic
c
rdamedia
recurs en línia
cr
rdacarrier