Abstract:
Social media analysis is an effective tool for tracking the general public's
demands and opinions on all sorts of subjects. However, when social media content
uses non-literal language, it can be misinterpreted as misleading or embarrassingly
wrong information, and the problem is even worse for ironic and sarcastic content.
In this thesis, we propose an approach for detecting sarcastic and ironic content on
Twitter, so that such content receives more attention from analysts than other content
and can therefore be interpreted more carefully, or manually when automatic
interpretation is not possible. The languages targeted by our detection are English
and Arabic.
Our approach relies on the linguistic features identified by previous researchers,
together with normalization of the language used in tweets in terms of spelling and
formality. After testing several machine learning models, we obtained encouraging
accuracy results. The best classifier for English sarcasm detection was a Support
Vector Machine (SVM), with an accuracy of 98.38%; for English irony detection, the
best classifier was logistic regression, with an accuracy of 89.42%; and for Arabic
sarcasm detection, the best classifier was a stochastic gradient descent (SGD)
classifier, with an accuracy of 86.16%. We attribute these encouraging results to
the application of language normalization.
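As an illustration of what spelling and formality normalization of tweets can involve, here is a minimal Python sketch. It is an assumption, not the pipeline used in this work: the slang table `SLANG`, the placeholder tokens `<url>` and `<user>`, and the specific rules (collapsing character elongation, unifying Arabic alef variants, removing diacritics and tatweel, mapping ta marbuta to ha) are common choices picked for illustration only.

```python
import re

# Hypothetical slang/abbreviation table; the thesis does not list its own.
SLANG = {"u": "you", "r": "are", "gr8": "great", "idk": "i do not know"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")

# English: a Latin letter repeated 3+ times ("soooo") is collapsed to two.
EN_ELONGATION_RE = re.compile(r"([a-z])\1{2,}")

# Arabic: an Arabic letter repeated 3+ times is collapsed to two.
AR_ELONGATION_RE = re.compile(r"([\u0621-\u064A])\1{2,}")
ALEF_VARIANTS_RE = re.compile(r"[\u0622\u0623\u0625]")  # آ أ إ
TASHKEEL_RE = re.compile(r"[\u064B-\u0652]")             # short-vowel diacritics
TATWEEL = "\u0640"                                       # decorative stretching ـ


def normalize_english(tweet: str) -> str:
    """Lowercase, mask URLs/mentions, collapse elongation, expand slang."""
    text = tweet.lower()
    text = URL_RE.sub("<url>", text)
    text = MENTION_RE.sub("<user>", text)
    text = EN_ELONGATION_RE.sub(r"\1\1", text)
    tokens = [SLANG.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


def normalize_arabic(tweet: str) -> str:
    """Mask URLs/mentions and apply common Arabic orthographic normalization."""
    text = URL_RE.sub("<url>", tweet)
    text = MENTION_RE.sub("<user>", text)
    text = TASHKEEL_RE.sub("", text)               # drop diacritics
    text = text.replace(TATWEEL, "")               # drop tatweel
    text = ALEF_VARIANTS_RE.sub("\u0627", text)    # unify alef forms to ا
    text = text.replace("\u0629", "\u0647")        # ta marbuta ة -> ha ه
    text = AR_ELONGATION_RE.sub(r"\1\1", text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_english("Sooooo happy to wait 2 hours @airline http://t.co/x u r gr8"))
```

Mapping ta marbuta to ha and collapsing elongation are lossy steps; since elongation such as "soooo" can itself be a sarcasm cue, a feature extractor might count it before the text is normalized.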
Finally, we integrated these models into a web application that analyzes a single
tweet or a group of tweets and reports whether each one is ironic/sarcastic or not.
Keywords: Social media analysis, Machine learning, Twitter, Sarcasm detection, Irony detection.