Please use this address to cite this document: https://di.univ-blida.dz/jspui/handle/123456789/12546
Full record
Dublin Core element | Value | Language
dc.contributor.author | Amouboudi, Dyhia | -
dc.contributor.author | Hadfi, Amel | -
dc.date.accessioned | 2021-10-28T09:39:45Z | -
dc.date.available | 2021-10-28T09:39:45Z | -
dc.date.issued | 2021-10-03 | -
dc.identifier.uri | http://di.univ-blida.dz:8080/jspui/handle/123456789/12546 | -
dc.description | ill., Bibliogr. | fr_FR
dc.description.abstract | This thesis deals with large, heterogeneous data formats. The scale of Big Data is usually attributed to the five V's: Variety, Veracity, Volume, Velocity and Value. Our research tackles the Variety and Value aspects of Big Data. We work in a Data Lake (DL) environment. A DL is made up of several components, such as Data Ingestion, Metadata, Data Governance, Data Security, etc. The module we chose to work on is Data Ingestion. The aim of our study is to ingest massive volumes of information from various sources into a Data Lake environment. To ingest the data, we implement the Extract, Load, Transform (ELT) process instead of Extract, Transform, Load (ETL). We made this choice because, in a Data Lake environment, data must be loaded as is, with light transformations only. After exploring various data ingestion frameworks, we found that Apache Spark stood out. A thorough analysis of the framework revealed a couple of missing elements, so after adopting Spark we extended it with two features of our own: a Data Classifier and a Data Visualizer. The new data ingestion platform was developed in the PyCharm IDE with Apache Spark 3.0.0 and Python 3.6, under Ubuntu 20, and the Data Lake we chose is Hadoop. Keywords: Data Lake, Data Ingestion, ELT, Data Classifier, Data Visualizer, Big Data. | fr_FR
dc.language.iso | en | fr_FR
dc.publisher | Université Blida 1 | fr_FR
dc.subject | Data Lake | fr_FR
dc.subject | Data Ingestion | fr_FR
dc.subject | ELT | fr_FR
dc.subject | Data Classifier | fr_FR
dc.subject | Data Visualizer and Big Data | fr_FR
dc.title | Implementation of a heterogeneous data ingestion framework in a DATA LAKE environment | fr_FR
dc.type | Thesis | fr_FR
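
The abstract's distinction between ELT and ETL can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library rather than Apache Spark or Hadoop, a local directory stands in for the lake's raw zone, and the names `extract_and_load`, `transform`, `demo` and all file names are hypothetical. The point it shows is the ordering: files are copied into the lake unchanged, and parsing happens only later, when the data is read.

```python
import csv
import json
import shutil
import tempfile
from pathlib import Path


def extract_and_load(source_files, raw_zone):
    """Extract + Load: copy each source file into the raw zone unchanged."""
    raw_zone.mkdir(parents=True, exist_ok=True)
    loaded = []
    for src in source_files:
        dest = raw_zone / src.name
        shutil.copy2(src, dest)  # plain file copy, no parsing or cleaning
        loaded.append(dest)
    return loaded


def transform(raw_file):
    """Transform: applied after loading, and only to formats we can parse."""
    if raw_file.suffix == ".json":
        return json.loads(raw_file.read_text(encoding="utf-8"))
    if raw_file.suffix == ".csv":
        with raw_file.open(newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))
    # Unknown formats stay as raw bytes; the lake still holds them as is.
    return raw_file.read_bytes()


def demo():
    """Ingest three heterogeneous (made-up) files, then transform on read."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sources = root / "sources"
        sources.mkdir()
        (sources / "users.csv").write_text("id,name\n1,Ada\n", encoding="utf-8")
        (sources / "event.json").write_text('{"type": "click"}', encoding="utf-8")
        (sources / "photo.bin").write_bytes(b"\x00\x01")

        loaded = extract_and_load(sorted(sources.iterdir()), root / "lake" / "raw")
        return {path.name: transform(path) for path in loaded}


if __name__ == "__main__":
    for name, content in demo().items():
        print(name, content)
```

In an ETL pipeline the `transform` step would run before the copy, so only cleaned data would reach storage. Keeping the raw copy first is what lets heterogeneous or not-yet-understood formats, such as the binary file here, enter the lake.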
Collection(s): Mémoires de Master (Master's theses)

File(s) in this document:
File | Description | Size | Format
Amouboudi Dyhia et Hadfi Amel.pdf | - | 3.7 MB | Adobe PDF


All documents in DSpace are protected by copyright, with all rights reserved.