Tools For Data Science
1. Languages used:
Top 3: Python, R, SQL
2. INTRODUCTION TO PYTHON
For data science – scientific computing libraries such as Pandas,
NumPy, SciPy, and Matplotlib
For AI – PyTorch, TensorFlow, Keras, and scikit-learn
For natural language processing (NLP) – the Natural Language
Toolkit (NLTK)
3. INTRODUCTION TO R LANGUAGE
R is free software, while Python is open source.
Similarities between open source and free software:
Both commonly refer to the same set of licenses
Both support collaboration
In many cases the two terms can be used
interchangeably (but not always)
Difference –
Open source (as with Python) is more business focused, while
free software (as with R) is more values focused.
R has become the world's largest repository of
statistical knowledge.
It has 15,000 publicly released packages.
R integrates with other languages such as C++,
Java, .NET, and Python.
Mathematical operations such as matrix
multiplication work straight out of the box.
R has stronger object-oriented programming facilities than
most other statistical languages.
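To make the matrix multiplication point concrete: in R the product of two matrices is available through the built-in %*% operator. For comparison, the sketch below spells out the same operation in plain Python, where nested lists have no built-in matrix product (in practice a library such as NumPy would normally be used). The function name matmul and the sample matrices are made up for this illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (hypothetical helper)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    rows, inner, cols = len(a), len(b), len(b[0])
    # Entry (i, j) is the dot product of row i of a and column j of b.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul(A, B))
```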
4. INTRODUCTION TO SQL (STRUCTURED QUERY LANGUAGE)
About 20 years older than Python and R
Used for handling structured data, i.e. data
incorporating relations among entities and variables
Used with relational databases
Available databases – MySQL, IBM Db2, Oracle
Database, SQLite, MariaDB, PostgreSQL
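As a small illustration of SQL working on structured, related data, the sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database. The table names, column names, and sample rows (customers, orders, amount) are invented for this example and are not part of these notes.

```python
import sqlite3


def total_spent_per_customer():
    """Build two related tables in memory and total the order amounts per customer."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Two entities linked by a relation: each order belongs to one customer.
    cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id), "
        "amount REAL)"
    )
    cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5)],
    )
    # The JOIN follows the relation; GROUP BY aggregates per customer.
    cur.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    )
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(total_spent_per_customer())
```

The same SELECT statement would run largely unchanged on the other databases listed, although data types and connection setup differ between products.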
5. OTHER LANGUAGES USED IN DATA SCIENCE
SCALA, JAVA, C++, JULIA, JS, PHP, VISUAL BASIC.
6. DATA SCIENCE TOOLS
a. Open Source Tools for Data Science
1. Widely used relational databases
– MySQL and PostgreSQL
2. Data integration and transformation tools –
Apache Airflow, Kubeflow, Apache Kafka,
Apache NiFi, Apache Spark SQL,
Node-RED.
3. DATA VISUALIZATION TOOLS – Hue,
Kibana, Superset
4. MODEL DEPLOYMENT – PredictionIO,
Seldon, MLeap, TensorFlow Serving
5. MODEL MONITORING AND ASSESSMENT –
ModelDB, Prometheus
6. DEVELOPMENT ENVIRONMENT – Jupyter,
JupyterLab, Apache Zeppelin, RStudio
7. EXECUTION ENVIRONMENT – Apache
Spark, Apache Flink, Ray
8. FULLY INTEGRATED VISUAL TOOLS –
KNIME
7. COMMERCIAL TOOLS FOR DATA SCIENCE
a. DATABASE MANAGEMENT – Oracle
Database, Microsoft SQL Server, IBM Db2
b. DATA INTEGRATION AND TRANSFORMATION
(ETL = EXTRACT, TRANSFORM, LOAD) –
IBM InfoSphere DataStage, Informatica,
Talend, Watson Studio Desktop
c. DATA VISUALIZATION (BUSINESS
INTELLIGENCE TOOLS) – Power BI, IBM
Cognos Analytics, Tableau.
d. MODEL BUILDING – IBM SPSS, SAS Enterprise Miner
e. MODEL DEPLOYMENT – IBM
SPSS Collaboration and Deployment
Services
f. MODEL MONITORING AND ASSESSMENT
g. DATA ASSET MANAGEMENT – Informatica,
IBM InfoSphere
h. DEVELOPMENT ENVIRONMENT – IBM Watson
Studio Desktop
i. FULLY INTEGRATED VISUAL TOOL – H2O.AI
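The ETL (extract, transform, load) pattern noted in this list can be sketched in a few lines of standard-library Python. This is only a toy illustration of the three stages, not how any of the commercial tools above work internally; the sample CSV text, the sales table, and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: inconsistent casing, stray spaces, one missing amount.
RAW_CSV = """name,amount
alice, 10.5
BOB,3
carol,
"""


def extract(text):
    """Extract: read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert amounts to float, drop incomplete rows."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip rows with no amount
        cleaned.append((row["name"].strip().title(), float(amount)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


def run_etl():
    """Run the three stages against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    rows = conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_etl())
```

Keeping the three stages as separate functions mirrors how integration tools let each step be configured and replaced independently.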
8. Cloud-based tools –
a. FULLY INTEGRATED VISUAL TOOL – IBM Watson
Studio, Azure Machine Learning,
H2O.ai.
b. DATABASE MANAGEMENT – Amazon
DynamoDB, Cloudant, CouchDB
c. DATA INTEGRATION AND TRANSFORMATION –
Informatica, IBM Data Refinery
d. DATA VISUALIZATION – Datameer, IBM
Cognos Analytics.
e. MODEL BUILDING – Google Cloud, IBM
SPSS, IBM Watson Machine Learning,
Amazon SageMaker Model Monitor.