
Data Pipeline Tools & Architecture

This document describes the main components and general architecture of data pipelines. It discusses how data pipelines are used to automate the movement and transformation of data between a source system and a target. The key components include a source, processing steps/transformations, and a destination/target. The general architecture shows data flowing from the source through processing steps to the target. The document then describes the main layers of a pipeline including the source, ingestion and integration, storage, processing, and presentation layers. It provides examples of technologies that can be used in each layer.

Uploaded by astonnonmartin
Copyright © All Rights Reserved

ISIS-3510

Data Pipelines

Set of tools and processes used to automate the movement and transformation of data between a source system and a target.
“QLINK”
Main Components

Source
Set of processing steps/transformations
Destination/Target
General Architecture

Source → Processing steps (Pr1, Pr2, Pr3) → Target
DATA → DATA + insights
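The general architecture (source, a chain of processing steps Pr1, Pr2, Pr3, and a target) can be sketched in a few lines of Python. This is a minimal illustration, not a production framework: `run_pipeline` and the three step functions are hypothetical names, and the records are made-up sample data.

```python
from typing import Callable, Iterable

# A record is a plain dict; a processing step maps a list of records to a new list.
Record = dict
Step = Callable[[list[Record]], list[Record]]


def run_pipeline(source: Iterable[Record], steps: list[Step], target: list[Record]) -> None:
    """Move data from a source through processing steps into a target."""
    data = list(source)      # read the raw DATA from the source
    for step in steps:       # apply Pr1, Pr2, Pr3... in order
        data = step(data)
    target.extend(data)      # write DATA + insights into the target


# Hypothetical steps, for illustration only.
def drop_invalid(records: list[Record]) -> list[Record]:
    # Pr1: discard records without a numeric amount
    return [r for r in records if isinstance(r.get("amount"), (int, float))]


def normalize_user(records: list[Record]) -> list[Record]:
    # Pr2: clean up the user field
    return [{**r, "user": r["user"].strip().lower()} for r in records]


def add_insight(records: list[Record]) -> list[Record]:
    # Pr3: attach a derived insight (each record's share of the total amount)
    total = sum(r["amount"] for r in records) or 1
    return [{**r, "share": r["amount"] / total} for r in records]


if __name__ == "__main__":
    source = [
        {"user": " Ana ", "amount": 30},
        {"user": "Luis", "amount": None},
        {"user": "MARIA", "amount": 70},
    ]
    target: list[Record] = []
    run_pipeline(source, [drop_invalid, normalize_user, add_insight], target)
    print(target)
```

Keeping each step as an independent function means steps can be reordered, tested, or replaced without touching the source or the target.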


Pipeline Layers

Data Source(s)
All your raw data lives in this layer.
Your data can be structured or unstructured.
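To make the structured/unstructured distinction concrete, the sketch below reads the same kind of information from two hypothetical inputs: a CSV export with a fixed schema, and a free-text customer review. The sample strings and function names are assumptions for illustration; a regex is only one simple way to pull values out of unstructured text.

```python
import csv
import io
import re

# Structured data: rows that follow a fixed schema (hypothetical CSV export).
STRUCTURED = "order_id,amount\n1,25.0\n2,40.5\n"

# Unstructured data: free text with no schema (hypothetical customer review).
UNSTRUCTURED = "Ordered twice this week, paid 25.0 and then 40.5. Great service!"


def read_structured(text: str) -> list[dict]:
    # The header row defines the schema, so every field is addressable by name.
    return [
        {"order_id": int(row["order_id"]), "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def read_unstructured(text: str) -> list[float]:
    # No schema: the pipeline has to extract what it needs, here decimal numbers.
    return [float(match) for match in re.findall(r"\d+\.\d+", text)]


if __name__ == "__main__":
    print(read_structured(STRUCTURED))
    print(read_unstructured(UNSTRUCTURED))
```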
Pipeline Layers - APP

Data Source(s)
Pipeline Layers

Ingestion layer
Reads the data from the data sources into data processing.

Ingestion & Integration layer
Reads the data from the data sources into data processing, and gives the data the format used by the pipeline.
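A minimal sketch of ingestion plus integration, assuming two hypothetical sources that describe the same thing differently: one sends JSON with temperatures in Fahrenheit, the other sends semicolon-separated text in Celsius. The `Reading` dataclass stands in for "the format used by the pipeline"; all names and payloads here are illustrative.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical common format used by the rest of the pipeline."""
    device_id: str
    celsius: float


# Ingestion: raw payloads as they arrive from two hypothetical sources.
RAW_JSON = '{"device": "a-1", "temp_f": 212}'
RAW_CSV_LINE = "b-7;21.5"


def integrate_json(payload: str) -> Reading:
    # Source 1 sends JSON in Fahrenheit; convert it to the pipeline's format.
    data = json.loads(payload)
    return Reading(device_id=data["device"], celsius=(data["temp_f"] - 32) * 5 / 9)


def integrate_csv(line: str) -> Reading:
    # Source 2 sends semicolon-separated text that is already in Celsius.
    device_id, celsius = line.split(";")
    return Reading(device_id=device_id, celsius=float(celsius))


if __name__ == "__main__":
    for reading in (integrate_json(RAW_JSON), integrate_csv(RAW_CSV_LINE)):
        print(reading)
```

After this step, later layers only ever see `Reading` objects, regardless of which source a value came from.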
Pipeline Layers

Ingestion & Integration layer
Message queues, SFTP, REST endpoints, Firebase REST API, Lambda functions
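Of the technologies listed, message queues are the easiest to illustrate without external services. The sketch below uses Python's in-process `queue.Queue` and a worker thread as a stand-in for a real message broker: the source publishes events without waiting, and the ingestion layer consumes them at its own pace. `producer`, `consumer`, `ingest`, and the `STOP` sentinel are hypothetical names, not part of any broker's API.

```python
import queue
import threading

STOP = object()  # sentinel: tells the consumer that no more messages will arrive


def producer(q: queue.Queue, events: list[dict]) -> None:
    # The source publishes events without waiting for them to be processed.
    for event in events:
        q.put(event)
    q.put(STOP)


def consumer(q: queue.Queue, out: list[dict]) -> None:
    # The ingestion layer reads messages at its own pace until the sentinel arrives.
    while True:
        msg = q.get()
        if msg is STOP:
            break
        out.append(msg)


def ingest(events: list[dict]) -> list[dict]:
    q: queue.Queue = queue.Queue()  # in-process stand-in for a message broker
    out: list[dict] = []
    worker = threading.Thread(target=consumer, args=(q, out))
    worker.start()
    producer(q, events)
    worker.join()
    return out


if __name__ == "__main__":
    print(ingest([{"id": 1}, {"id": 2}, {"id": 3}]))
```

The queue decouples the two sides: if the consumer is slow, messages wait in the queue instead of being lost or blocking the source.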
Pipeline Layers - APP

Data Source(s) → Ingestion & Integration layer (REST endpoints, Firebase REST API)
Pipeline Layers

Storage layer
Saves data in a format that is understandable by the system.
NoSQL DBs, Relational DBs
Storage Layer: Relational DBs
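As a small example of a relational store, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns, and the sample rows are hypothetical; the point is that a relational DB enforces a fixed schema and lets later layers query the data with SQL.

```python
import sqlite3


def store_and_query(rows: list[tuple[str, float]]) -> list[tuple[str, float]]:
    # In-memory SQLite database standing in for the pipeline's relational store.
    conn = sqlite3.connect(":memory:")
    try:
        # Fixed schema: every row must fit these typed, non-null columns.
        conn.execute("CREATE TABLE orders (customer TEXT NOT NULL, amount REAL NOT NULL)")
        conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)", rows)
        conn.commit()
        # SQL lets the processing layer ask questions of the stored data.
        return conn.execute(
            "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(store_and_query([("ana", 30.0), ("luis", 12.5), ("ana", 20.0)]))
```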
Storage Layer: NoSQL DBs

Key-Value, Document, Column, Graph
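The four NoSQL families differ mainly in how they shape the data. The sketch below uses plain Python structures to show, for the same hypothetical customer, how each family would model it. These are illustrations of the data models only, not the APIs of any particular database; `neighbors` is a hypothetical helper.

```python
# Hypothetical data for one customer, shaped the way each NoSQL family models it.

# Key-Value: an opaque value looked up by a single key.
key_value = {"session:42": "user=ana;cart=3"}

# Document: a self-describing nested record, queried by its fields.
document = {
    "_id": "ana",
    "name": "Ana",
    "orders": [{"id": 1, "amount": 30.0}, {"id": 2, "amount": 20.0}],
}

# Column (wide-column): rows keyed by id, with columns grouped into families
# that may differ from row to row.
column_family = {
    "ana": {"profile": {"name": "Ana"}, "stats": {"orders": 2, "total": 50.0}},
    "luis": {"profile": {"name": "Luis"}},  # sparse row: no "stats" family
}

# Graph: nodes plus labeled edges, suited to traversing relationships.
graph = {
    "nodes": {"ana", "luis", "product:7"},
    "edges": [("ana", "FOLLOWS", "luis"), ("ana", "BOUGHT", "product:7")],
}


def neighbors(g: dict, node: str, label: str) -> set[str]:
    """Follow the edges with a given label out of a node."""
    return {dst for src, lbl, dst in g["edges"] if src == node and lbl == label}


if __name__ == "__main__":
    print(key_value["session:42"])
    print(sum(order["amount"] for order in document["orders"]))
    print(column_family["luis"].get("stats", {}))
    print(neighbors(graph, "ana", "BOUGHT"))
```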


Storage Layer: Comparative table of NoSQL DBs

Lourenço, J.R., Cabral, B., Carreiro, P. et al. Choosing the right NoSQL database for the job: a quality attribute evaluation. Journal of Big Data 2, 18 (2015).
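Storage Layer: CAP Theorem

C - Consistency
A - Availability
P - Partition tolerance

A distributed data store cannot guarantee all three at once: when a network partition occurs, it must choose between consistency and availability.

Diagram created by Vivek Kumar Singh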
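The consistency/availability trade-off during a partition can be shown with a deliberately simplified toy model: two replicas of one value, and a store that runs either in "CP" mode (refuse writes it cannot replicate) or "AP" mode (accept writes locally and let replicas diverge). `ToyStore` is a hypothetical class; it does not model how any real database detects partitions or reconciles replicas afterwards.

```python
from typing import Optional


class ToyStore:
    """Two replicas of one value; mode is "CP" or "AP" (a simplified model)."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.replicas: list[Optional[str]] = [None, None]
        self.partitioned = False  # True while the two replicas cannot reach each other

    def write(self, replica: int, value: str) -> bool:
        if not self.partitioned:
            # Healthy network: replicate to every node so all reads agree.
            self.replicas = [value] * len(self.replicas)
            return True
        if self.mode == "CP":
            # Keep consistency: refuse the write, sacrificing availability.
            return False
        # Keep availability: accept the write locally; the replicas now disagree.
        self.replicas[replica] = value
        return True

    def read(self, replica: int) -> Optional[str]:
        return self.replicas[replica]


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = ToyStore(mode)
        store.write(0, "v1")
        store.partitioned = True
        accepted = store.write(0, "v2")
        print(mode, accepted, store.read(0), store.read(1))
```

In CP mode both replicas keep returning the same value, but the write is rejected; in AP mode the write succeeds, but the two replicas return different values until they are reconciled.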


Pipeline Layers - APP

Data Source(s) → Ingestion & Integration layer (REST endpoints, Firebase REST API) → Storage layer
Pipeline Layers

Processing layer
Functionalities: aggregation, mixing data sources, pre-calculating data… This can be used for real-time or batch processing.
Python, Java
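The processing functionalities (mixing data sources, aggregation, pre-calculating results) and the batch vs. real-time distinction can be sketched as follows. `mix_sources`, `batch_totals`, and `StreamingTotals` are hypothetical names, and the orders and regions are sample data: the batch version processes a complete set of events at once, while the streaming version updates its pre-calculated totals as each event arrives.

```python
from collections import defaultdict
from typing import Iterable


def mix_sources(orders: Iterable[tuple[str, float]], regions: dict[str, str]) -> list[tuple[str, float]]:
    """Mix data sources: tag each order with the customer's region from a second source."""
    return [(regions.get(customer, "unknown"), amount) for customer, amount in orders]


def batch_totals(events: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Batch processing: aggregate a complete, bounded set of events at once."""
    totals: dict[str, float] = defaultdict(float)
    for key, amount in events:
        totals[key] += amount
    return dict(totals)


class StreamingTotals:
    """Real-time processing: keep pre-calculated totals updated per event."""

    def __init__(self) -> None:
        self.totals: dict[str, float] = defaultdict(float)

    def on_event(self, key: str, amount: float) -> float:
        self.totals[key] += amount
        return self.totals[key]  # the up-to-date total is available immediately


if __name__ == "__main__":
    orders = [("ana", 30.0), ("luis", 12.5), ("ana", 20.0)]
    regions = {"ana": "north", "luis": "south"}
    print(batch_totals(mix_sources(orders, regions)))

    stream = StreamingTotals()
    for region, amount in mix_sources(orders, regions):
        print(region, stream.on_event(region, amount))
```

Both versions end with the same totals; the difference is whether results are computed once over all the data or kept current as data arrives.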
Pipeline Layers - APP

Data Source(s) → Ingestion & Integration layer (REST endpoints, Firebase REST API) → Storage layer → Computation layer
Pipeline Layers

Presentation layer
Presents insights through dashboards, emails, SMS, push notifications…
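A minimal sketch of the presentation layer, assuming the processing layer has already produced per-region totals: the same insight is rendered once as a text table (a stand-in for a dashboard widget) and once as a short one-line message for a push notification or SMS. Both functions and the `max_len` default are hypothetical choices, not limits taken from any messaging service.

```python
def dashboard_table(totals: dict[str, float]) -> str:
    """Render totals as a fixed-width text table, largest first (dashboard stand-in)."""
    lines = [f"{'region':<10}{'total':>10}"]
    for region, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"{region:<10}{total:>10.2f}")
    return "\n".join(lines)


def push_message(totals: dict[str, float], max_len: int = 60) -> str:
    """Summarize the top insight in one short line (max_len is a hypothetical limit)."""
    region, total = max(totals.items(), key=lambda kv: kv[1])
    text = f"Top region today: {region} ({total:.2f} in sales)"
    return text if len(text) <= max_len else text[: max_len - 1] + "…"


if __name__ == "__main__":
    totals = {"north": 50.0, "south": 12.5}
    print(dashboard_table(totals))
    print(push_message(totals))
```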
Pipeline Layers - APP

Data Source(s) → Ingestion & Integration layer (REST endpoints, Firebase REST API) → Storage layer → Computation layer → Presentation layer
