GitHub Setup for Data Science Docker Project
Quick Reference Guide for Version Control Setup & Daily Workflow
Prerequisites Checklist
Git installed locally
GitHub account created
SSH keys configured (recommended) or Personal Access Token
PyCharm Git integration enabled
Docker Data Science project created
Phase 1: Repository Setup
1
Create GitHub Repository
Set up a new repository on GitHub:
Go to GitHub.com → New Repository
Repository name: my-ds-project (match your local folder)
Description: Brief project description
Set to Public or Private as needed
Do NOT initialize with README, .gitignore, or license
Click "Create repository"
Keep the GitHub page open - you'll need the repository URL
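2
Initialize Local Git Repository
In your project directory, initialize Git:
cd /path/to/my-ds-project
git init
git branch -M main
GitHub uses 'main' as its default branch name, while Git itself may still create 'master' unless init.defaultBranch is set, so the rename keeps local and remote in sync. On older Git versions the rename can fail before the first commit; if it does, run it again after your first commit.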
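To confirm which branch the first commit will land on, the check below may help. It is a minimal sketch that runs in a scratch directory (created with mktemp, standing in for the project folder) and assumes that `git init -b` is available on Git 2.28 or newer, with a fallback for older versions.

```shell
# Scratch directory standing in for the project folder (hypothetical path).
repo="$(mktemp -d)"

# One-step form (Git 2.28+); fall back to init plus pointing HEAD at main.
git init -q -b main "$repo" 2>/dev/null || {
  git init -q "$repo"
  git -C "$repo" symbolic-ref HEAD refs/heads/main
}

# Print the branch that the first commit will be created on.
current_branch="$(git -C "$repo" symbolic-ref --short HEAD)"
echo "current branch: $current_branch"
```

In a real project the same check is just `git symbolic-ref --short HEAD` run from the project directory.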
3
Create Comprehensive .gitignore
Create .gitignore file for Data Science projects:
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
ENV/
env.bak/
venv.bak/

# Jupyter Notebook
.ipynb_checkpoints
*/.ipynb_checkpoints/*

# Data files (add specific paths as needed)
data/raw/*
data/processed/*
*.csv
*.xlsx
*.parquet
!data/.gitkeep

# Models and outputs
models/*.pkl
models/*.joblib
outputs/
results/

# Docker
.docker/

# Environment files
.env
.env.local

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Logs
logs/
*.log
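Before the first commit it can be worth checking that the ignore rules behave as intended. The sketch below uses `git check-ignore` in a scratch repository with a short excerpt of the patterns; the file names (sales.csv, train.py, model.pkl) are hypothetical and only serve the demo.

```shell
# Scratch repository with a small excerpt of the ignore patterns.
repo="$(mktemp -d)"
git init -q "$repo"
printf '%s\n' '__pycache__/' '*.csv' '.env' 'models/*.pkl' > "$repo/.gitignore"

# Hypothetical project files: three should be ignored, one should not.
mkdir -p "$repo/data/raw" "$repo/src" "$repo/models"
touch "$repo/data/raw/sales.csv" "$repo/src/train.py" \
      "$repo/models/model.pkl" "$repo/.env"

# check-ignore prints only the paths that match an ignore rule.
ignored="$(git -C "$repo" check-ignore \
  data/raw/sales.csv src/train.py models/model.pkl .env)"
printf '%s\n' "$ignored"
```

Adding `-v` to `git check-ignore` also shows which pattern matched each path, which helps when a rule is broader than expected.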
4
Create README.md
Document your project with a comprehensive README:
# My Data Science Project

## Overview
Brief description of what this project does

## Setup Instructions

### Prerequisites
- Docker
- Docker Compose
- Git

### Quick Start
```bash
git clone https://github.com/yourusername/my-ds-project.git
cd my-ds-project
docker-compose up -d
```
Access Jupyter Lab at http://localhost:8888

## Project Structure
```
├── data/
│   ├── raw/          # Original data
│   ├── processed/    # Cleaned data
│   └── external/     # External datasets
├── notebooks/        # Jupyter notebooks
├── src/              # Source code
├── tests/            # Unit tests
├── docker-compose.yml
├── Dockerfile
└── requirements.txt
```

## Usage
[Add specific instructions for your project]

## Contributing
[Add contribution guidelines]

## License
[Add license information]
5
Configure Git User Information
Set up your Git identity (if not already done):
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
⚠️ Use the same email as your GitHub account
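If one machine is used for several accounts, the identity can also be set for a single repository. The sketch below assumes a scratch repository and shows that a `--local` value takes precedence over the global one inside that repository; the name and email are the same placeholders as above.

```shell
# Scratch repository standing in for the project (hypothetical path).
repo="$(mktemp -d)"
git init -q "$repo"

# --local writes to this repository's .git/config only.
git -C "$repo" config --local user.name "Your Name"
git -C "$repo" config --local user.email "your.email@example.com"

# Show the email Git will actually use for commits in this repository.
effective_email="$(git -C "$repo" config --get user.email)"
echo "commits here will use: $effective_email"
```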
Phase 2: Initial Commit and Push
6
Stage and Commit Initial Files
Add and commit your initial project structure:
git add .
git status  # Review what will be committed
git commit -m "Initial commit: Docker DS project setup

- Add Dockerfile with Python 3.11 and DS libraries
- Add docker-compose.yml for easy container management
- Add requirements.txt with core DS packages
- Add comprehensive .gitignore for DS projects
- Add project structure and README"
Write descriptive commit messages that explain the 'what' and 'why'
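A multi-line message can also be built from several `-m` flags, which Git joins as separate paragraphs. This is a small sketch in a scratch repository; the identity values and file contents are placeholders, not part of the project.

```shell
# Scratch repository with a placeholder identity so the commit succeeds.
repo="$(mktemp -d)"
git init -q "$repo"
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"

echo "# My Data Science Project" > "$repo/README.md"
git -C "$repo" add README.md

# The first -m becomes the subject line, the second becomes the body.
git -C "$repo" commit -q \
  -m "Initial commit: Docker DS project setup" \
  -m "- Add project structure and README"

subject="$(git -C "$repo" log -1 --format=%s)"
body="$(git -C "$repo" log -1 --format=%b)"
printf 'subject: %s\nbody: %s\n' "$subject" "$body"
```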
7
Connect to GitHub Repository
Link your local repository to GitHub:
git remote add origin https://github.com/yourusername/my-ds-project.git
# OR if using SSH:
# git remote add origin git@github.com:yourusername/my-ds-project.git
git remote -v  # Verify the remote is set correctly
⚠️ Replace 'yourusername' with your actual GitHub username
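If the remote was added with HTTPS and SSH is preferred later, the URL can be changed in place rather than removing and re-adding the remote. A minimal sketch in a scratch repository, using the same placeholder URLs as above (no network access is involved):

```shell
# Scratch repository standing in for the project (hypothetical path).
repo="$(mktemp -d)"
git init -q "$repo"

# Start with the HTTPS remote, as in the step above.
git -C "$repo" remote add origin https://github.com/yourusername/my-ds-project.git

# Switch the existing remote to SSH without removing it.
git -C "$repo" remote set-url origin git@github.com:yourusername/my-ds-project.git

remote_url="$(git -C "$repo" remote get-url origin)"
echo "origin now points to: $remote_url"
```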
8
Push to GitHub
Upload your project to GitHub:
git push -u origin main
The -u flag sets origin/main as the upstream of your local main branch.
After this, a plain git push (or git pull) is enough for this branch.
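The effect of `-u` can be tried out without touching GitHub. In the sketch below a local bare repository stands in for the GitHub remote (an assumption made only so the example runs offline), and the upstream is read back after the push.

```shell
# A local bare repository stands in for GitHub so this runs offline.
work="$(mktemp -d)"
remote="$work/remote.git"
repo="$work/project"
git init -q --bare "$remote"

# Project repository with one commit on main (placeholder identity).
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"
echo "# demo" > "$repo/README.md"
git -C "$repo" add README.md
git -C "$repo" commit -q -m "Initial commit"

# Push and record origin/main as the upstream of local main.
git -C "$repo" remote add origin "$remote"
git -C "$repo" push -q -u origin main

upstream="$(git -C "$repo" rev-parse --abbrev-ref 'main@{upstream}')"
echo "main tracks: $upstream"
```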
Phase 3: PyCharm Git Integration
9
Enable VCS in PyCharm
Activate version control in PyCharm:
VCS → Enable Version Control Integration
Select "Git" from dropdown
Click OK
PyCharm should detect your existing Git repository
You'll see Git options appear in the VCS menu and toolbar
10
Configure GitHub Integration
Connect PyCharm to your GitHub account:
File → Settings → Version Control → GitHub
Click "+" to add account
Login via Token (recommended) or GitHub credentials
Test connection
Click OK
11
Test PyCharm Git Operations
Verify Git integration works:
Make a small change to README.md
Notice file appears in "Local Changes" (VCS tool window)
Right-click file → Git → Commit File
Write commit message and commit
VCS → Git → Push (or Ctrl+Shift+K)
Check GitHub to confirm the change appears online
Phase 4: Branch Strategy Setup
12
Set Up Development Branch
Create a development branch for ongoing work:
git checkout -b develop
git push -u origin develop
Or in PyCharm: VCS → Git → Branches → New Branch
Keep 'main' for stable releases, use 'develop' for active development
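On Git 2.23 or newer, `git switch -c` is an alternative spelling of `git checkout -b` for creating a branch. The sketch below assumes such a Git version and uses a scratch repository with a placeholder identity.

```shell
# Scratch repository with one commit on main.
repo="$(mktemp -d)"
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"
echo "# demo" > "$repo/README.md"
git -C "$repo" add README.md
git -C "$repo" commit -q -m "Initial commit"

# Create develop from main and switch to it (same as: git checkout -b develop).
git -C "$repo" switch -q -c develop

active="$(git -C "$repo" symbolic-ref --short HEAD)"
branches="$(git -C "$repo" branch --format='%(refname:short)')"
printf 'active: %s\nall branches:\n%s\n' "$active" "$branches"
```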
Setup Validation Checklist
GitHub repository created and accessible
Local Git repository initialized
Initial files committed and pushed
PyCharm VCS integration working
Main and develop branches created
📋 Daily/Weekly Workflow Reminders
Follow these patterns as your project evolves:
🔄 Daily Development Cycle
Start day → Pull latest changes → Work on features → Commit frequently → Push at end of day
Commands:
git checkout develop
git pull origin develop
# ... work on your code ...
git add .
git commit -m "Add: [description]"
git push origin develop
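Before the pull at the start of the cycle, it can help to confirm that nothing is left uncommitted from the previous session. This sketch uses `git status --porcelain`, which prints one line per changed or untracked file and nothing when the tree is clean; the file notes.txt is a hypothetical leftover.

```shell
# Scratch repository with one commit on develop.
repo="$(mktemp -d)"
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/develop
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"
echo "# demo" > "$repo/README.md"
git -C "$repo" add README.md
git -C "$repo" commit -q -m "Initial commit"

# Simulate leftover work from yesterday: an untracked file.
echo "draft" > "$repo/notes.txt"

# Empty porcelain output means the working tree is clean.
if [ -n "$(git -C "$repo" status --porcelain)" ]; then
  tree_state="dirty"
else
  tree_state="clean"
fi
echo "working tree is $tree_state"
```

When the tree is dirty, committing or stashing first keeps the pull from mixing with unfinished work.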
🔀 Feature Development
Create feature branch → Develop → Test → Merge back to develop
Commands:
git checkout develop
git checkout -b feature/data-preprocessing
# ... develop feature ...
git add .
git commit -m "Implement data preprocessing pipeline"
git checkout develop
git merge feature/data-preprocessing
git push origin develop
git branch -d feature/data-preprocessing
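By default the merge above fast-forwards when develop has not moved, which leaves no trace that a feature branch existed. One option, shown here as a sketch rather than a required part of the workflow, is `--no-ff`, which always records a merge commit. The repository, identity, and file names are placeholders.

```shell
# Scratch repository with one commit on develop.
repo="$(mktemp -d)"
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/develop
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"
echo "# demo" > "$repo/README.md"
git -C "$repo" add README.md
git -C "$repo" commit -q -m "Initial commit"

# Feature branch with one commit (hypothetical file).
git -C "$repo" checkout -q -b feature/data-preprocessing
echo "def clean(df): return df.dropna()" > "$repo/preprocess.py"
git -C "$repo" add preprocess.py
git -C "$repo" commit -q -m "Implement data preprocessing pipeline"

# Merge back with an explicit merge commit, then delete the branch.
git -C "$repo" checkout -q develop
git -C "$repo" merge -q --no-ff \
  -m "Merge feature/data-preprocessing into develop" feature/data-preprocessing
git -C "$repo" branch -q -d feature/data-preprocessing

# A merge commit has two parents.
parents="$(git -C "$repo" log -1 --format=%p | wc -w | tr -d ' ')"
echo "parents of HEAD: $parents"
```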
📊 Experiment Tracking
Create experiment branches for different approaches
Commands:
git checkout -b experiment/lstm-model
# ... run experiments ...
git add notebooks/lstm_experiment.ipynb
git commit -m "Experiment: LSTM model with attention

Results:
- Accuracy: 92.5%
- Loss: 0.234
- Training time: 45min

Next: Try with different hyperparameters"
Release Preparation
When ready for a release, merge develop to main
Commands:
git checkout main
git pull origin main
git merge develop
git tag -a v1.0.0 -m "Release version 1.0.0"
git push origin main --tags
🛠️ Handling Data Files
Remember: Never commit large data files! Use Git LFS or external storage
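For large files, consider Git LFS. Patterns already listed in .gitignore (such as *.csv) must be removed from it first, because git add skips ignored files:
# Install Git LFS (one-time setup)
git lfs install
# Track large files
git lfs track "*.csv"
git lfs track "*.parquet"
git lfs track "models/*.pkl"
# Add .gitattributes
git add .gitattributes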
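To spot candidates for Git LFS or external storage before they are committed, a size scan with `find` is one simple option. The sketch below builds a scratch project with stand-in files and uses a 1 KiB threshold purely for the demo; for a real check, a threshold such as `-size +50M` is closer to GitHub's limits (it warns above 50 MiB and rejects files above 100 MiB).

```shell
# Scratch project with one "large" and one small file (both hypothetical).
project="$(mktemp -d)"
mkdir -p "$project/data/raw" "$project/src"
head -c 4096 /dev/zero > "$project/data/raw/big.csv"
echo "print('hi')" > "$project/src/train.py"

# List regular files above the demo threshold, skipping Git's own directory.
large_files="$(find "$project" -type f -size +1k ! -path '*/.git/*')"
printf 'large files:\n%s\n' "$large_files"
```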
📝 Commit Message Guidelines
Use clear, descriptive commit messages
Format:
# Type: Brief description (50 chars or less)
#
# Longer explanation if needed (wrap at 72 chars)
#
# Types: Add, Update, Fix, Remove, Refactor, Experiment

Examples:
- "Add: Initial data preprocessing pipeline"
- "Fix: Handle missing values in feature engineering"
- "Update: Improve model accuracy from 85% to 92%"
- "Experiment: Test XGBoost vs Random Forest"
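The format above is simple enough to check automatically. The function below is a hypothetical helper (the name check_subject is not part of Git) that accepts a subject line only if it starts with one of the listed types and is at most 50 characters long.

```shell
# Hypothetical helper: succeeds when the subject follows "Type: description"
# with one of the listed types and is no longer than 50 characters.
check_subject() {
  subject="$1"
  [ "${#subject}" -le 50 ] || return 1
  printf '%s\n' "$subject" |
    grep -Eq '^(Add|Update|Fix|Remove|Refactor|Experiment): .+'
}

# Try it on a conforming and a non-conforming subject line.
for s in "Add: Initial data preprocessing pipeline" "fixed stuff"; do
  if check_subject "$s"; then
    echo "ok:       $s"
  else
    echo "rejected: $s"
  fi
done
```

One possible use is calling it from a commit-msg hook with the first line of the message file, so that non-conforming commits are rejected locally.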
🔍 Regular Maintenance
Keep your repository clean and organized
Weekly: Review and clean up old branches
Monthly: Update dependencies in requirements.txt
Quarterly: Update documentation and README
Before releases: Run full test suite
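For the weekly branch cleanup, `git branch --merged` lists branches whose work is already contained in a given branch, which makes them safe candidates for deletion. A sketch in a scratch repository, with hypothetical branch names feature/done and feature/wip:

```shell
# Scratch repository with one commit on develop.
repo="$(mktemp -d)"
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/develop
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"
echo "# demo" > "$repo/README.md"
git -C "$repo" add README.md
git -C "$repo" commit -q -m "Initial commit"

# feature/done adds nothing beyond develop; feature/wip has unmerged work.
git -C "$repo" branch feature/done
git -C "$repo" checkout -q -b feature/wip
echo "todo" > "$repo/wip.txt"
git -C "$repo" add wip.txt
git -C "$repo" commit -q -m "Add: work in progress"
git -C "$repo" checkout -q develop

# Branches fully contained in develop, excluding develop itself.
stale="$(git -C "$repo" branch --merged develop --format='%(refname:short)' |
  grep -vx 'develop')"
printf 'safe to delete:\n%s\n' "$stale"
```

Each listed branch can then be removed with `git branch -d`, which refuses to delete branches that still carry unmerged commits.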
Pro Tip: Always pull before you push, and commit often with meaningful messages!