Implementation of a heterogeneous data ingestion framework in a DATA LAKE environment

Amouboudi, Dyhia; Hadfi, Amel

Please use this identifier to cite or link to this item: http://localhost:8080/xmlui/handle/123456789/12546

Full metadata record

DC Field	Value	Language
dc.contributor.author	Amouboudi, Dyhia	-
dc.contributor.author	Hadfi, Amel	-
dc.date.accessioned	2021-10-28T09:39:45Z	-
dc.date.available	2021-10-28T09:39:45Z	-
dc.date.issued	2021-10-03	-
dc.identifier.uri	http://di.univ-blida.dz:8080/jspui/handle/123456789/12546	-
dc.description	ill., Bibliogr.	fr_FR
dc.description.abstract	This thesis deals with large volumes of data in heterogeneous formats. Big Data is commonly characterized by the five V's: Variety, Veracity, Volume, Velocity and Value. Our research addresses the Variety and Value aspects of Big Data. We work in a Data Lake (DL) environment. A DL is made up of several components, such as Data Ingestion, Metadata, Data Governance and Data Security, among others. The module we chose to work on is Data Ingestion: our aim is to ingest massive volumes of information from various sources into the lake. To do so, we implement the Extract, Load, Transform (ELT) process instead of Extract, Transform, Load (ETL), because in a Data Lake environment data must be loaded as is, with only light transformations. After exploring several data ingestion frameworks, we selected Apache Spark as the one that stood out. A thorough analysis of the framework revealed a couple of missing elements, so after adopting Spark we extended it with two features of our own: a Data Classifier and a Data Visualizer. The new data ingestion platform was developed in the PyCharm IDE with Apache Spark 3.0.0 and Python 3.6 under Ubuntu 20, and the Data Lake we chose is Hadoop. Keywords: Data Lake, Data Ingestion, ELT, Data Classifier, Data Visualizer and Big Data.	fr_FR
dc.language.iso	en	fr_FR
dc.publisher	Université Blida 1	fr_FR
dc.subject	Data Lake	fr_FR
dc.subject	Data Ingestion	fr_FR
dc.subject	ELT	fr_FR
dc.subject	Data Classifier	fr_FR
dc.subject	Data Visualizer and Big Data	fr_FR
dc.title	Implementation of a heterogeneous data ingestion framework in a DATA LAKE environment	fr_FR
dc.type	Thesis	fr_FR
Appears in Collections:	Mémoires de Master

Files in This Item:

File	Description	Size	Format
Amouboudi Dyhia et Hadfi Amel.pdf		3.7 MB	Adobe PDF
