Tools For Data Science

This document surveys tools used for data science. It names Python, R, and SQL as the top three languages. Python offers scientific computing libraries such as Pandas, along with AI frameworks. R is a free software environment for statistical analysis with over 15,000 publicly available packages. SQL is a language for handling structured data in databases. The document also lists open source, commercial, and cloud-based tools across categories such as data integration, visualization, and model deployment.

1. Languages used:
Top 3: Python, R, SQL
2. INTRODUCTION TO PYTHON
For data science – computing libraries such as Pandas, NumPy, SciPy, Matplotlib
For AI – PyTorch, TensorFlow, Keras, scikit-learn
For natural language processing (NLP) – Natural Language Toolkit (NLTK)
3. INTRODUCTION TO R LANGUAGE
R is free software, while Python is open source.
Similarities between open source and free software:
- Both commonly refer to the same set of licenses
- Both support collaboration
- In many cases (but not all) the two terms can be used interchangeably
Difference –
Open source (Python) is more business focused, while free software (R) is more focused on a set of values.
- R has become the world's largest repository of statistical knowledge
- It has about 15,000 publicly released packages
- R integrates with other languages such as C++, Java, .NET, and Python
- Mathematical operations such as matrix multiplication work straight out of the box
- It has stronger object-oriented programming facilities than most other statistical languages

4. INTRODUCTION TO SQL (STRUCTURED QUERY LANGUAGE)
About 20 years older than Python and R
Used for: handling structured data, i.e. data that incorporates relations among entities and variables
Used with relational databases
Available databases – MySQL, IBM Db2, Oracle Database, SQLite, MariaDB, PostgreSQL
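As a minimal sketch of what "data incorporating relations among entities" can look like, the example below uses Python's built-in sqlite3 module with SQLite, one of the databases listed above. The table names, column names, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders (customer_id, amount) VALUES (1, 20.0), (1, 5.5), (2, 12.0);
""")

# The join follows the relation between the two entities (customer -> orders).
query = """
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```

The same SQL should carry over to the other listed databases with little or no change, although connection handling and data types differ between products.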
5. OTHER LANGUAGES USED IN DATA SCIENCE
Scala, Java, C++, Julia, JavaScript, PHP, Visual Basic.
6. DATA SCIENCE TOOLS
a. Open Source Tools for Data Science
1. Widely used open source relational databases – MySQL and PostgreSQL
2. Data integration and transformation tools – Apache Airflow, Kubeflow, Apache Kafka, Apache NiFi, Apache Spark SQL, Node-RED
3. DATA VISUALIZATION TOOLS – Hue, Kibana, Apache Superset
4. MODEL DEPLOYMENT – Apache PredictionIO, Seldon, MLeap, TensorFlow Serving
5. MODEL MONITORING AND ASSESSMENT – ModelDB, Prometheus
6. DEVELOPMENT ENVIRONMENT – Jupyter, JupyterLab, Apache Zeppelin, RStudio
7. EXECUTION ENVIRONMENT – Apache Spark, Apache Flink, Ray
8. FULLY INTEGRATED VISUAL TOOLS – KNIME
7. COMMERCIAL TOOLS FOR DATA SCIENCE
a. DATABASE MANAGEMENT – Oracle Database, Microsoft SQL Server, IBM Db2
P.S. ETL = extract, transform, load
b. DATA INTEGRATION AND TRANSFORMATION – IBM InfoSphere DataStage, Informatica, Talend, Watson Studio Desktop
c. DATA VISUALIZATION (BUSINESS INTELLIGENCE TOOLS) – Power BI, IBM Cognos Analytics, Tableau
d. MODEL BUILDING – SPSS Modeler, SAS Enterprise Miner
e. MODEL DEPLOYMENT – IBM SPSS Collaboration and Deployment Services
f. MODEL MONITORING AND ASSESSMENT
g. DATA ASSET MANAGEMENT – Informatica, IBM InfoSphere
h. DEVELOPMENT ENVIRONMENT – IBM Watson Studio Desktop
i. FULLY INTEGRATED VISUAL TOOL – H2O.ai
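The ETL abbreviation noted above (extract, transform, load) can be sketched in a few lines. This is a toy illustration using only the Python standard library, not how any of the listed products works; the sample data, function names, and target table are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """city,temp_f
Oslo,41
Cairo,95
Lima,
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing reading, convert Fahrenheit to Celsius."""
    cleaned = []
    for row in rows:
        if not row["temp_f"]:
            continue  # skip incomplete records
        temp_c = round((float(row["temp_f"]) - 32) * 5 / 9, 1)
        cleaned.append((row["city"], temp_c))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT city, temp_c FROM readings ORDER BY city").fetchall()
print(loaded)
```

Keeping the three stages as separate functions mirrors the abbreviation and makes each stage easy to test or swap on its own.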
8. Cloud-based tools –
a. FULLY INTEGRATED VISUAL TOOL – IBM Watson Studio, Azure Machine Learning, H2O.ai
b. DATABASE MANAGEMENT – Amazon DynamoDB, Cloudant, CouchDB
c. DATA INTEGRATION AND TRANSFORMATION – Informatica, IBM Data Refinery
d. DATA VISUALIZATION – Datameer, IBM Cognos Analytics
e. MODEL BUILDING – Google Cloud, IBM SPSS, IBM Watson Machine Learning
f. MODEL MONITORING – Amazon SageMaker Model Monitor
