ISIS-3510
DATA PIPELINES
SET OF TOOLS AND PROCESSES USED TO AUTOMATE THE MOVEMENT AND TRANSFORMATION OF DATA BETWEEN A SOURCE SYSTEM AND A TARGET
“QLINK”
Main Components
Source
Set of processing steps/transformations
Destination/Target
General Architecture
Source -> Processing steps -> Target
DATA -> Pr1 -> Pr2 -> Pr3 -> DATA + insights
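The flow from Source through processing steps Pr1, Pr2 and Pr3 to Target can be sketched as a chain of functions. This is only a minimal sketch: the step names (`drop_invalid`, `to_celsius`, `summarize`) and the temperature readings are hypothetical, chosen to show three steps that turn raw data into data plus insights.

```python
from functools import reduce
from typing import Any, Callable, Iterable

Step = Callable[[Any], Any]

def run_pipeline(source: Iterable[dict], steps: list[Step]) -> Any:
    """Feed the source data through each processing step, in order."""
    return reduce(lambda data, step: step(data), steps, list(source))

# Hypothetical processing steps standing in for Pr1, Pr2 and Pr3.
def drop_invalid(rows):
    """Pr1: discard readings that have no temperature value."""
    return [r for r in rows if r.get("temp_f") is not None]

def to_celsius(rows):
    """Pr2: add a Celsius value computed from the Fahrenheit one."""
    return [{**r, "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)} for r in rows]

def summarize(rows):
    """Pr3: keep the data and attach one insight (the average temperature)."""
    temps = [r["temp_c"] for r in rows]
    return {"data": rows, "insights": {"avg_temp_c": round(sum(temps) / len(temps), 1)}}

# Hypothetical source data.
source = [
    {"sensor": "a", "temp_f": 68.0},
    {"sensor": "b", "temp_f": None},
    {"sensor": "c", "temp_f": 86.0},
]

target = run_pipeline(source, [drop_invalid, to_celsius, summarize])
print(target["insights"])  # {'avg_temp_c': 25.0}
```

Keeping each step as a plain function makes it easy to add, remove or reorder steps without touching the others.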
Pipeline Layers
Data Source(s)
All your raw data live in this layer.
Your data can be structured or unstructured.
Pipeline Layers - APP
Data Source(s)
Pipeline Layers
Ingestion layer & Integration layer
Ingestion: reads the data from the data sources into data processing.
Integration: gives the data the format used by the pipeline.
Examples: message queues, SFTP, REST endpoints, Firebase REST API, Lambda functions.
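The split between ingestion (reading data in) and integration (giving it the pipeline's format) can be sketched as follows. This is an illustrative sketch only: the payloads, the field names (`userId`, `user_id`, `amount`) and the common record format are assumptions, and the REST and SFTP inputs are simulated with in-memory strings instead of real network calls.

```python
import csv
import io
import json

def ingest_rest(payload: str) -> list[dict]:
    """Ingestion: parse a JSON body as it might arrive from a REST endpoint."""
    return json.loads(payload)

def ingest_csv(text: str) -> list[dict]:
    """Ingestion: parse a CSV file as it might arrive over SFTP."""
    return list(csv.DictReader(io.StringIO(text)))

def integrate(record: dict, source: str) -> dict:
    """Integration: map a source-specific record to one common format."""
    raw_id = record["userId"] if "userId" in record else record["user_id"]
    return {
        "user_id": str(raw_id),
        "amount": float(record["amount"]),
        "source": source,
    }

# Hypothetical inputs from two different sources.
rest_body = '[{"userId": 1, "amount": "9.5"}]'
csv_file = "user_id,amount\n2,12.25\n"

records = [integrate(r, "rest") for r in ingest_rest(rest_body)]
records += [integrate(r, "sftp") for r in ingest_csv(csv_file)]
print(records)
```

After integration every record has the same keys and types, so the later layers do not need to know which source it came from.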
Pipeline Layers - APP
Data Source(s) -> Ingestion layer & Integration layer (REST endpoints, Firebase REST API)
Pipeline Layers
Storage layer
Saves data in a format that is understandable by the system.
NoSQL DBs
Relational DBs
Storage Layer
Relational DBs
NoSQL DBs: Key-Value, Document, Column, Graph
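To make the four NoSQL families more concrete, the sketch below models the same user in each style using plain Python structures. It only illustrates the shape of the data, not how any particular database stores it. The user, the keys and the `FOLLOWS` relationship are hypothetical.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value = {"user:42": '{"name": "Ana", "city": "Bogota"}'}

# Document: a nested, self-describing record that can be queried by field.
document = {"_id": 42, "name": "Ana", "address": {"city": "Bogota"}, "follows": [7]}

# Column (wide-column): a row key maps to column families with sparse columns.
column = {"42": {"profile": {"name": "Ana", "city": "Bogota"}, "social": {"follows": "7"}}}

# Graph: nodes plus explicit edges that describe relationships.
nodes = {42: {"name": "Ana"}, 7: {"name": "Luis"}}
edges = [(42, "FOLLOWS", 7)]

# The same question ("which city?") asked of the first three models.
city_kv = json.loads(key_value["user:42"])["city"]   # value must be decoded first
city_doc = document["address"]["city"]               # navigate the nested fields
city_col = column["42"]["profile"]["city"]           # row key, family, column

# A relationship question ("whom does user 42 follow?") suits the graph model.
followed = [nodes[dst]["name"] for src, rel, dst in edges if src == 42 and rel == "FOLLOWS"]

print(city_kv, city_doc, city_col, followed)
```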
Storage Layer
Comparative table of NoSQL DBs
Lourenço, J.R., Cabral, B., Carreiro, P. et al. Choosing the right NoSQL database for the job: a quality attribute evaluation. Journal of Big Data 2, 18 (2015).
Storage Layer
CAP Theorem
C - Consistency
A - Availability
P - Partition tolerance
Diagram elaborated by Vivek Kumar Singh
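The practical consequence of the CAP theorem is that, when a network partition occurs, a distributed store has to give up either consistency or availability. The toy simulation below illustrates that trade-off with two in-memory replicas. It is a teaching sketch, not a model of any real database: the class names and the `"CP"` / `"AP"` mode labels are assumptions made for the example.

```python
class Replica:
    """One node holding its own copy of the data."""
    def __init__(self):
        self.data = {}

class TwoNodeStore:
    """Toy two-replica store; mode is 'CP' or 'AP' (hypothetical labels)."""

    def __init__(self, mode: str):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when the replicas cannot reach each other

    def write(self, key, value) -> bool:
        if self.partitioned:
            if self.mode == "CP":
                return False          # refuse the write: consistent but unavailable
            self.a.data[key] = value  # accept on one side: available but divergent
            return True
        self.a.data[key] = value      # no partition: replicate to both nodes
        self.b.data[key] = value
        return True

    def consistent(self) -> bool:
        return self.a.data == self.b.data

results = {}
for mode in ("CP", "AP"):
    store = TwoNodeStore(mode)
    store.write("x", 1)           # normal operation
    store.partitioned = True      # the network splits
    accepted = store.write("x", 2)
    results[mode] = (accepted, store.consistent())
    print(mode, "accepted:", accepted, "consistent:", store.consistent())
```

In CP mode the write during the partition is rejected and the replicas stay identical; in AP mode the write is accepted and the replicas diverge until they can be reconciled.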
Pipeline Layers - APP
Data Source(s) -> Ingestion layer & Integration layer (REST endpoints, Firebase REST API) -> Storage layer
Pipeline Layers
Processing layer
Functionalities: aggregation, mixing data sources, pre-calculating data, and so on. Processing can run in real time or in batch.
Languages: Python, Java
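The difference between batch and real-time processing can be sketched with the same aggregation written both ways. The `orders` and `users` data sets are hypothetical, and the "real-time" version simply handles one event at a time in memory. A real system would typically receive these events from a queue or stream.

```python
from collections import defaultdict

# Hypothetical data coming from two different sources.
orders = [
    {"user_id": 1, "total": 20.0},
    {"user_id": 2, "total": 5.0},
    {"user_id": 1, "total": 10.0},
]
users = {1: "Ana", 2: "Luis"}

def batch_totals(orders, users):
    """Batch: process the whole data set at once, mixing both sources."""
    totals = defaultdict(float)
    for order in orders:
        totals[users[order["user_id"]]] += order["total"]
    return dict(totals)

class RunningTotals:
    """Real time: update the pre-calculated totals as each event arrives."""

    def __init__(self, users):
        self.users = users
        self.totals = defaultdict(float)

    def on_event(self, order) -> float:
        name = self.users[order["user_id"]]
        self.totals[name] += order["total"]
        return self.totals[name]

batch_result = batch_totals(orders, users)

stream = RunningTotals(users)
for order in orders:          # simulate events arriving one by one
    stream.on_event(order)
stream_result = dict(stream.totals)

print(batch_result)   # {'Ana': 30.0, 'Luis': 5.0}
print(stream_result)  # {'Ana': 30.0, 'Luis': 5.0}
```

Both approaches reach the same totals. The batch version recomputes from the full data set, while the real-time version keeps an up-to-date answer available after every event.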
Pipeline Layers - APP
Data Source(s) -> Ingestion layer & Integration layer (REST endpoints, Firebase REST API) -> Storage layer -> Computation layer
Pipeline Layers
Presentation layer
Presents insights through dashboards, emails, SMS, push notifications, and so on.
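One way to picture the presentation layer is as a set of renderers that turn the same insights into a message suited to each channel. The sketch below only builds the message text. Actually sending an email, SMS or push notification would require a delivery service that is out of scope here. The insight fields (`active_users`, `top_screen`) and the function names are hypothetical.

```python
def render_sms(insights: dict) -> str:
    """Short, single-line text suited to SMS or push notifications."""
    return f"{insights['active_users']} active users today; top screen: {insights['top_screen']}"

def render_email(insights: dict) -> str:
    """Longer, multi-line text suited to an email body."""
    lines = [f"- {key}: {value}" for key, value in insights.items()]
    return "Daily insights\n" + "\n".join(lines)

# Map each channel name to the function that formats insights for it.
RENDERERS = {"sms": render_sms, "email": render_email}

def present(insights: dict, channels: list[str]) -> dict[str, str]:
    """Build one message per requested channel."""
    return {channel: RENDERERS[channel](insights) for channel in channels}

# Hypothetical insights produced by the earlier layers.
insights = {"active_users": 128, "top_screen": "Home"}
messages = present(insights, ["sms", "email"])
print(messages["sms"])
print(messages["email"])
```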
Pipeline Layers - APP
Data Source(s) -> Ingestion layer & Integration layer (REST endpoints, Firebase REST API) -> Storage layer -> Computation layer -> Presentation layer