Abstract:
Living in the age of data has made the world eager to find the right information quickly and
efficiently, yet the task has grown harder over the years because of the huge amount of
existing data. This is why researchers turned to multi-document summarization, a technique
that helps people find the most relevant information in very little time, building on recent
advances such as machine learning and neural networks.
Automatic summarization is the process of shortening a text document with software to
create a summary containing the major points of the original document. Why is it important?
It can quickly extract accurate content and help readers understand large volumes of information.
Our goal in this thesis was to build and fine-tune a model based on the Transformer neural
network architecture, called Pegasus, and to compare our results with last year's work.
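As a rough illustration of what working with Pegasus looks like (not the exact code of this thesis; the checkpoint name google/pegasus-cnn_dailymail and the generation settings are assumptions), a pretrained model can be loaded and queried through the Hugging Face transformers library:

    # Sketch: loading a pretrained Pegasus checkpoint and generating a summary.
    # The checkpoint name and generation parameters are illustrative assumptions.
    from transformers import PegasusTokenizer, PegasusForConditionalGeneration

    model_name = "google/pegasus-cnn_dailymail"
    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name)

    document = "Text of the document (or concatenated documents) to summarize."
    inputs = tokenizer(document, truncation=True, padding="longest", return_tensors="pt")
    summary_ids = model.generate(**inputs, num_beams=4, max_length=128)
    summary = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]
    print(summary)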
We built and fine-tuned a Pegasus model, proposed by the Google AI team on 10 July 2020,
which suggested a new way of fine-tuning. Using clustering algorithms, we preprocessed our
datasets and obtained pertinent results. The algorithm consists of concatenating several
documents into a single document, splitting that document into sentences, comparing each
sentence with all the others, and selecting the most pertinent sentences.
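As a minimal sketch of this preprocessing step (assuming TF-IDF vectors, cosine similarity as the comparison measure, a naive sentence splitter, and a fixed top-k cutoff, none of which are specified above), the selection could be implemented as follows:

    # Sketch: score each sentence of the concatenated document by its average
    # cosine similarity to every other sentence, then keep the top-k sentences.
    # The splitter, TF-IDF + cosine similarity, and top_k=10 are illustrative
    # assumptions, not the exact choices of the thesis.
    import re
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def select_pertinent_sentences(documents, top_k=10):
        concatenated = " ".join(documents)            # several documents -> one document
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", concatenated) if s.strip()]
        vectors = TfidfVectorizer().fit_transform(sentences)
        similarity = cosine_similarity(vectors)       # compare each sentence with all others
        scores = similarity.mean(axis=1)              # average similarity as a pertinence proxy
        top = sorted(scores.argsort()[::-1][:top_k])  # keep the original sentence order
        return [sentences[i] for i in top]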
The evaluation was carried out automatically using ROUGE scores. Our method, simple as it
is, has shown promising results, since the scores were higher than those of previous works.
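For context, ROUGE scores can be computed automatically, for example with the rouge_score package (the specific ROUGE variants below are an assumption, since the abstract does not state which ones were reported):

    # Sketch: automatic ROUGE evaluation of a generated summary against a reference.
    from rouge_score import rouge_scorer

    reference_summary = "A human-written reference summary."
    generated_summary = "The summary produced by the fine-tuned Pegasus model."

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    scores = scorer.score(reference_summary, generated_summary)  # (target, prediction)
    for name, result in scores.items():
        print(name, round(result.fmeasure, 4))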
Keywords:
Text Summarization, Transformers, BERT, GPT-3, Pegasus, ROUGE.