dc.contributor.author | Bensidiaissa, Walid |
dc.contributor.author | Bouchetara, Rym |
dc.date.accessioned | 2021-01-26T12:42:19Z |
dc.date.available | 2021-01-26T12:42:19Z |
dc.date.issued | 2020-10-26 |
dc.identifier.uri | http://di.univ-blida.dz:8080/jspui/handle/123456789/9420 |
dc.description | ill., Bibliogr. | fr_FR
dc.description.abstract |
In recent years, there has been an explosion in the amount of text data coming from a variety of sources. This data needs to be summarized effectively to be useful.
Text summarization in natural language processing has mostly been approached with extractive methods, which are limited to selecting parts of the original document that capture its main ideas. Abstractive summarization has been attempted far less often.
In our work, we focus on the latter type of automatic summarization. We ran a series of experiments to judge the effectiveness of abstractive summarization systems and whether they are applicable in a real-world context. We chose a machine learning approach built on models with the transformer architecture.
We first focused on extractive multi-document summarization; we then fine-tuned DistilBART, a recent model released by the Hugging Face team, for abstractive summarization on several datasets, and compared each resulting model against the base model and against one another (see the fine-tuning sketch after this record).
We also designed a preprocessing algorithm that replaces each cluster of similar sentences with a single representative sentence from that cluster; it, too, relies on a transformer-based model (see the clustering sketch below).
Evaluation is carried out automatically with ROUGE scores (see the scoring snippet below). Our method, simple as it is, shows promising results: the scores were higher when this preprocessing was applied.
Keywords: Automatic Summary, Abstract, Multi-Document, Deep Learning, Semantic similarity, Fine-tuning, Transformers, BERT, GPT-2, BART. | fr_FR
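The fine-tuning step described in the abstract can be illustrated with the Hugging Face transformers Trainer API. This is a minimal sketch, not the thesis's actual setup: the DistilBART checkpoint (sshleifer/distilbart-cnn-12-6), the multi_news dataset, and all hyperparameters are assumptions made for illustration.

    # Minimal fine-tuning sketch for DistilBART (assumed checkpoint and dataset;
    # hyperparameters are illustrative, not the values used in the thesis).
    from datasets import load_dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    checkpoint = "sshleifer/distilbart-cnn-12-6"  # assumed DistilBART checkpoint
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    dataset = load_dataset("multi_news")  # assumed multi-document dataset

    def preprocess(batch):
        # Tokenize source documents and target summaries for seq2seq training.
        inputs = tokenizer(batch["document"], max_length=1024, truncation=True)
        labels = tokenizer(text_target=batch["summary"], max_length=256, truncation=True)
        inputs["labels"] = labels["input_ids"]
        return inputs

    tokenized = dataset.map(
        preprocess, batched=True, remove_columns=dataset["train"].column_names
    )

    trainer = Seq2SeqTrainer(
        model=model,
        args=Seq2SeqTrainingArguments(
            output_dir="distilbart-finetuned",
            learning_rate=3e-5,
            per_device_train_batch_size=4,
            num_train_epochs=3,
            predict_with_generate=True,
        ),
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["validation"],
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()

After training, model.generate can be called on tokenized unseen documents to produce the abstractive summaries that are then scored.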
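The cluster-and-replace preprocessing is sketched below under stated assumptions: sentence embeddings come from sentence-transformers (the all-MiniLM-L6-v2 model and the 0.35 distance threshold are illustrative choices), and agglomerative clustering groups similar sentences; the abstract does not specify the exact clustering method or representative-selection rule.

    # Sketch of the cluster-and-replace preprocessing described in the abstract.
    # Embedding model, clustering method, and threshold are assumptions; the
    # `metric` parameter requires scikit-learn >= 1.2.
    import numpy as np
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import AgglomerativeClustering

    def deduplicate_sentences(sentences, distance_threshold=0.35):
        """Replace each cluster of similar sentences with one representative."""
        encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
        embeddings = encoder.encode(sentences, normalize_embeddings=True)
        clusters = AgglomerativeClustering(
            n_clusters=None,
            metric="cosine",
            linkage="average",
            distance_threshold=distance_threshold,
        ).fit(embeddings)
        kept = []
        for label in np.unique(clusters.labels_):
            members = np.where(clusters.labels_ == label)[0]
            centroid = embeddings[members].mean(axis=0)
            # Keep the member sentence closest to the cluster centroid.
            kept.append(members[np.argmax(embeddings[members] @ centroid)])
        return [sentences[i] for i in sorted(kept)]

Feeding this reduced sentence list to the summarizer is the preprocessing that the abstract credits for the higher ROUGE scores.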
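The abstract states only that evaluation uses ROUGE; a common way to compute these scores in Python is Google's rouge_score package, assumed here for illustration. The example strings are placeholders, not data from the thesis.

    # Assumed ROUGE evaluation snippet using the rouge_score package.
    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    reference = "The two models are compared on several summarization datasets."
    candidate = "Several summarization datasets are used to compare both models."
    for name, score in scorer.score(reference, candidate).items():
        # Each entry reports precision, recall, and F1 for one ROUGE variant.
        print(f"{name}: precision={score.precision:.3f} "
              f"recall={score.recall:.3f} f1={score.fmeasure:.3f}")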
dc.language.iso | en | fr_FR
dc.publisher | Université Blida 1 | fr_FR
dc.subject | Automatic Summary | fr_FR
dc.subject | Abstract | fr_FR
dc.subject | Multi-Document | fr_FR
dc.subject | Deep Learning | fr_FR
dc.subject | Semantic similarity | fr_FR
dc.subject | Fine-tuning | fr_FR
dc.subject | Transformers | fr_FR
dc.subject | BERT | fr_FR
dc.subject | GPT-2 | fr_FR
dc.subject | BART | fr_FR
dc.title | Generative models for automatic multi-document summarization | fr_FR
dc.type | Thesis | fr_FR