Abstract:
Living in the age of data has made the world eager to find the right information quickly and
efficiently, yet the task has grown harder over the years because of the huge amount of
existing data. This is why researchers turned to multi-document summarization, a technique
that helps people find the most relevant information in very little time, building on recent
advances such as machine learning and neural networks.
Automatic summarization is the process of shortening a text document with software to
create a summary containing the major points of the original document. Why is it important?
It can quickly extract accurate content and help readers understand large volumes of information.
Our goal in this thesis was to build and fine-tune a model based on the Transformer neural
network architecture, called Pegasus, and to compare our results with last year's work.
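As a rough illustration of what working with Pegasus looks like (not the exact code of this thesis; the checkpoint name google/pegasus-cnn_dailymail and the generation settings are assumptions), a pretrained model can be loaded and queried through the Hugging Face transformers library:

    # Sketch: loading a pretrained Pegasus checkpoint and generating a summary.
    # The checkpoint name and generation parameters are illustrative assumptions.
    from transformers import PegasusTokenizer, PegasusForConditionalGeneration

    model_name = "google/pegasus-cnn_dailymail"
    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name)

    document = "Text of the document (or concatenated documents) to summarize."
    inputs = tokenizer(document, truncation=True, padding="longest", return_tensors="pt")
    summary_ids = model.generate(**inputs, num_beams=4, max_length=128)
    summary = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]
    print(summary)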
We built and fine-tuned a Pegasus model, proposed by the Google AI team on 10 July 2020,
which suggested a new way of fine-tuning. Using clustering algorithms, we preprocessed our
datasets and obtained pertinent results. The algorithm consists of concatenating several
documents into a single document, splitting that document into sentences, comparing each
sentence with all the others, and selecting the most pertinent sentences.
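As a minimal sketch of this preprocessing step (assuming TF-IDF vectors, cosine similarity as the comparison measure, a naive sentence splitter, and a fixed top-k cutoff, none of which are specified above), the selection could be implemented as follows:

    # Sketch: score each sentence of the concatenated document by its average
    # cosine similarity to every other sentence, then keep the top-k sentences.
    # The splitter, TF-IDF + cosine similarity, and top_k=10 are illustrative
    # assumptions, not the exact choices of the thesis.
    import re
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def select_pertinent_sentences(documents, top_k=10):
        concatenated = " ".join(documents)            # several documents -> one document
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", concatenated) if s.strip()]
        vectors = TfidfVectorizer().fit_transform(sentences)
        similarity = cosine_similarity(vectors)       # compare each sentence with all others
        scores = similarity.mean(axis=1)              # average similarity as a pertinence proxy
        top = sorted(scores.argsort()[::-1][:top_k])  # keep the original sentence order
        return [sentences[i] for i in top]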
The evaluation was carried out automatically using ROUGE scores. Our method, simple as it
is, has shown promising results, since the scores were higher than those of previous works.
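For context, ROUGE scores can be computed automatically, for example with the rouge_score package (the specific ROUGE variants below are an assumption, since the abstract does not state which ones were reported):

    # Sketch: automatic ROUGE evaluation of a generated summary against a reference.
    from rouge_score import rouge_scorer

    reference_summary = "A human-written reference summary."
    generated_summary = "The summary produced by the fine-tuned Pegasus model."

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    scores = scorer.score(reference_summary, generated_summary)  # (target, prediction)
    for name, result in scores.items():
        print(name, round(result.fmeasure, 4))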
Keywords:
Text Summarization, Transformers, BERT, GPT-3, Pegasus, ROUGE.