Engineering Onboarding & Tech Stack Overview
Project/Org: AI Paralegal Systems
Audience: New Engineers, DevOps, ML Engineers, and QA
Prepared By: Engineering Leadership
Version: 1.0
Date: [Insert Date]
1. Welcome to the Team
Welcome to the Engineering Team at [Company Name]! We're building
cutting-edge Retrieval-Augmented Generation (RAG) products that
blend LLMs, vector databases, and intelligent pipelines to transform
enterprise document handling. This document will guide you through
your onboarding process, tools, workflows, and responsibilities.
2. First 7 Days Checklist
Day Task
Day 1 Company orientation, meet the team, Slack access
Day 2 GitHub access, repo walkthrough
Day 3 Local setup of RAG stack (API, FAISS, Frontend)
Day 4 Explore chunking, embeddings & vector search
Day 5 Run unit tests, review code review guidelines
Day 6 Small bug fix or ticket
Day 7 Onboarding feedback + mentor sync-up
3. Team Structure
Role Description
Backend Engineer API, FAISS wrapper, document ingestion
ML Engineer Embeddings, prompt engineering, LLM optimization
Frontend Developer Streamlit or React-based interface
DevOps CI/CD, monitoring, deployments
QA Testing flows, validating prompt outputs
4. Project Architecture Overview
Your work will involve components like:
• Chunking Engine: Splits documents into semantic chunks
• Embedding Generator: Google text-embedding-004 model for embedding documents & queries
• Vector Store: FAISS, or Pinecone for some client deployments
• Retriever: Semantic search on vectors
• LLM Interface: Mistral or client-preferred model
• Frontend: Streamlit for internal, React for client-facing apps
• Backend API: FastAPI (Python) exposed for end-to-end flow
• CI/CD: GitHub Actions → Docker → Kubernetes
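To make the chunk → embed → retrieve flow concrete, here is a minimal, self-contained sketch in plain Python. It is illustrative only: `toy_embed` is a stand-in bag-of-words vector (the real pipeline calls the Google embedding model), and the brute-force cosine ranking in `retrieve` stands in for what FAISS does efficiently at scale. All function names here are hypothetical, not from our repos.

```python
import math
from collections import Counter


def chunk_text(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Split a document into overlapping word-window chunks."""
    words = text.split()
    chunks: list[str] = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + max_words])
        if chunk:
            chunks.append(chunk)
        if start + max_words >= len(words):
            break
    return chunks


def toy_embed(text: str) -> Counter:
    """Stand-in for the real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (FAISS does this at scale)."""
    q = toy_embed(query)
    return sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)[:k]
```

The top-k chunks returned by `retrieve` are what gets stuffed into the LLM prompt; swapping `toy_embed` for a real embedding call and the sort for a FAISS index search gives the production shape.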
5. Tech Stack Summary
Layer Stack
Language Python (FastAPI, LangChain), JavaScript (React)
Frontend Streamlit / ReactJS
Backend FastAPI, Celery (task queues)
Embedding Google text-embedding-004, OpenAI (fallback)
Vector DB FAISS (local), Pinecone (prod optional)
LLM Mistral, Open-source models, or client-specific LLM
Database PostgreSQL, Redis (cache/queues)
Infra Docker, Kubernetes, GitHub Actions
Monitoring Prometheus, Grafana, Loki, Sentry
Collab Tools Slack, Notion, Linear, GitHub
6. GitHub Repositories
Repo Name Description
ai-paralegal-backend Core backend, document chunking, API
ai-paralegal-frontend Streamlit UI
rag-embeddings Embedding pipelines and testing scripts
devops-infra Docker, K8s manifests, monitoring setup
test-suite Unit + integration tests for end-to-end flow
7. Local Dev Setup
Minimum Requirements
• Python 3.10+
• Docker & Docker Compose
• Node.js (if working on React frontend)
• VSCode / PyCharm recommended
Steps
1. Clone the repos (repeat for each repo listed in Section 6):
git clone https://github.com/your-org/ai-paralegal-backend.git
2. Create .env files from templates
3. Install dependencies:
pip install -r requirements.txt
4. Start services:
docker-compose up
8. Coding Guidelines
• Follow PEP8 and use black for formatting
• All features must include:
o Unit tests (pytest)
o Type hints (mypy)
o Inline comments for complex logic
• Use feature branches and submit PRs for review before merging
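A small (hypothetical) helper and its pytest test illustrate the conventions above in one place: black-compatible formatting, full type hints for mypy, and a unit test pytest will discover via the `test_` prefix.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace to single spaces and trim the ends."""
    return " ".join(text.split())


# tests/test_normalize.py -- pytest collects functions named test_*
def test_normalize_whitespace() -> None:
    assert normalize_whitespace("  hello \n  world ") == "hello world"
    assert normalize_whitespace("") == ""
```

Run `pytest` from the repo root and `mypy .` before opening a PR; both run in CI, so catching failures locally keeps review cycles short.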
9. Slack Channels
Channel Purpose
#eng-general Announcements and team-wide updates
#rag-ml Embedding & LLM-related discussions
#frontend-dev UI queries and deployments
#infra-alerts Monitoring & CI/CD alerts
#support Internal testing and issues
10. Expectations & Values
• Own your features from design to deployment
• Be proactive in reviewing PRs
• Test rigorously before pushing to main
• Ask questions early—no one builds in isolation
• Follow security & compliance mandates for client data
11. Additional Resources
• 📘 RAG Workflow Diagram (see internal Notion)
• 🧪 Test Cases Reference – /test-suite/docs/
• 📦 Model Snapshots – /models/
• 🔐 Security Onboarding – Contact DevSecOps