Using python and docker for data science
USING DOCKER FOR DATA SCIENCE
WHY DOCKER 
Portable environment 
Isolated between projects 
Stateless 
Fast local file access 
Heterogeneous
GET DOCKER 
https://docs.docker.com/installation/ 
boot2docker .dmg or .exe 
apt-get install docker.io ...
RUN SCIPYSERVER 
$ docker run -d -e "PASSWORD=YourPassword?" ipython/scipyserver 
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --name dev_notebook \
    -p 443:8888 \
    ipython/scipyserver
https://localhost:443 
https://{boot2docker ip}:443
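Which address to open depends on where the Docker daemon runs. A minimal sketch, assuming the container was started with `-p 443:8888` as above; `notebook_url` is a hypothetical helper name, not part of Docker or the image:

```shell
# Hypothetical helper: print the notebook URL for a given host and port.
# Defaults follow the slide: localhost and host port 443.
notebook_url() {
    host="${1:-localhost}"
    port="${2:-443}"
    printf 'https://%s:%s\n' "$host" "$port"
}

# Docker running natively on Linux: the published port is on localhost.
notebook_url

# boot2docker (OS X / Windows): use the VM's address instead, e.g.
#   notebook_url "$(boot2docker ip)"
```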
CREATE DATA-ONLY CONTAINERS 
$ docker run \
    -d \
    -v ~/notebooks:/notebooks \
    --name notebooks_container \
    ubuntu \
    echo notebooks
$ docker run -d -v ~/data:/data --name data_container ubuntu echo data
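Both containers exit as soon as `echo` finishes, but their volumes remain available to `--volumes-from`. A small preparatory sketch, assuming the host directories do not exist yet; creating them first is a suggestion here, since Docker may otherwise create a missing host path itself, typically owned by root:

```shell
# Create the host directories up front so they exist with your own
# user's ownership before Docker bind-mounts them.
mkdir -p "$HOME/notebooks" "$HOME/data"

# After creating the data-only containers, list all containers
# (including exited ones) to confirm both are present:
#   docker ps -a
```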
MOUNT DATA-ONLY CONTAINERS 
$ docker stop dev_notebook 
$ docker rm dev_notebook 
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --name dev_notebook \
    -p 443:8888 \
    --volumes-from data_container \
    --volumes-from notebooks_container \
    ipython/scipyserver
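The stop, remove, run sequence above is repeated whenever the notebook container needs replacing, so it can be wrapped in a function. A sketch only: `recreate_notebook` and the `NOTEBOOK_PASSWORD` variable are hypothetical names introduced for illustration; the docker flags are the ones from the slide.

```shell
# Hypothetical helper: replace the notebook container with a fresh one
# started from the given image (defaults to ipython/scipyserver).
recreate_notebook() {
    image="${1:-ipython/scipyserver}"
    # Ignore errors here: the container may not exist on the first run.
    docker stop dev_notebook >/dev/null 2>&1 || true
    docker rm dev_notebook >/dev/null 2>&1 || true
    # NOTEBOOK_PASSWORD is an assumed environment variable; the function
    # aborts with a message if it is unset or empty.
    docker run \
        -d \
        -e "PASSWORD=${NOTEBOOK_PASSWORD:?set NOTEBOOK_PASSWORD first}" \
        --name dev_notebook \
        -p 443:8888 \
        --volumes-from data_container \
        --volumes-from notebooks_container \
        "$image"
}
```

Usage: `NOTEBOOK_PASSWORD='YourPassword?' recreate_notebook`, optionally followed by an image name.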
CREATE A DOCKERFILE 
FROM ipython/scipyserver 
MAINTAINER Calvin Giles <calvin.giles@gmail.com> 
COPY requirements.txt /requirements.txt 
RUN pip2 install -r /requirements.txt 
RUN pip3 install -r /requirements.txt 
$ docker build \
    -t calvingiles/ds-notebook \
    .
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --name dev_notebook \
    -p 443:8888 \
    --volumes-from data_container \
    --volumes-from notebooks_container \
    calvingiles/ds-notebook
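The Dockerfile copies a `requirements.txt` from the build directory, so that file must exist before `docker build` runs. A minimal sketch of creating one; the two package names are placeholders chosen for illustration, not the author's actual list:

```shell
# Write an illustrative requirements.txt next to the Dockerfile.
# The package names below are placeholders; list your own project's
# dependencies, one per line.
cat > requirements.txt <<'EOF'
requests
seaborn
EOF

# Then rebuild so the pip2 and pip3 steps both install the listed packages:
#   docker build -t calvingiles/ds-notebook .
```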
NEXT TIME 
Connecting to local database containers 
Tweaking the boot2docker VM memory from 2GB to 8GB (or more...)
Linking to private git repositories 
Automated builds with GitHub linking
MORE? 
Docker User Guide: 
http://docs.docker.com/userguide/ 
Docker Reference: 
http://docs.docker.com/reference/commandline/cli/ 
ipython docker images: 
https://registry.hub.docker.com/repos/ipython/ 
my docker image: 
https://github.com/calvingiles/ds-notebook 
https://registry.hub.docker.com/u/calvingiles/ds-notebook/
ABOUT ME 
Calvin Giles 
Data Scientist at Adthena 
PyData Meetup Organiser 
untangleconsulting.io 
calvin.giles@gmail.com 
@calvingiles on twitter, github, docker hub (and many more)
