HOW TO DOCUMENT YOUR
DATA SCIENCE PROJECT
Learn how to document your data
science/machine learning project
www.datalab.com.ng
Introduction
Good Data Scientists know the value of
documentation. Project documentation
helps in tracking every step in a project
while maintaining workflow. Good
documentation also allows stakeholders
to comprehend every aspect of a
project. We examine some steps on
how to effectively document your next
data science project.
COMMENTS
Comments are text added to your code to explain it. When
working on a data science project in a programming language
such as Python, R or SQL, you can use the comment marker (# in
Python and R, -- in SQL) to annotate the major steps in your
code, which helps you track what the code is doing. Good use of
comments also makes code easier to maintain and helps you find
bugs faster. Clean code is well commented and easy for others
to understand. As a Data Scientist, make it a practice to add a
few short statements alongside the major steps in your project.
Comment only the major steps so that you do not overdo it.
Below are some rules for commenting your code:
9 rules for writing excellent
comments in your code
Rule 1: Comments should not duplicate the code.
Rule 2: Good comments do not excuse unclear code.
Rule 3: If you can’t write a clear comment, there may
be a problem with the code.
Rule 4: Comments should dispel confusion, not cause
it.
Rule 5: Explain unidiomatic code in comments.
Rule 6: Provide links to the original source of copied
code.
Rule 7: Include links to external references where they
will be most helpful.
Rule 8: Add comments when fixing bugs.
Rule 9: Use comments to inform about incomplete
implementations or updates.
(List Source: Best practices for writing code comments, by Ellen Spertus,
https://stackoverflow.blog/2021/07/05/best-practices-for-writing-code-comments/ )
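A minimal sketch of how some of these rules might look in practice, assuming Python. The function clip_outliers, its z_max parameter and the bug mentioned in the comments are hypothetical, invented only to illustrate Rules 1, 5, 8 and 9.

```python
import math


def clip_outliers(values, z_max=2.0):
    """Return the values whose z-score is at most z_max."""
    n = len(values)
    if n < 2:
        # A single point has no spread to compare against, so keep it.
        return list(values)

    mean = sum(values) / n

    # Rule 5 (explain unidiomatic code): we divide by n, not n - 1,
    # because we treat the values as the whole population, not a sample.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)

    if std == 0:
        # Rule 8 (comment bug fixes): hypothetical fix for a
        # division-by-zero error when every value is identical.
        return list(values)

    # Rule 9 (flag incomplete work):
    # TODO: support a robust (median-based) score for skewed data.

    # Rule 1 (do not duplicate the code): no comment is needed here,
    # because the expression already says what it does.
    return [v for v in values if abs(v - mean) / std <= z_max]


readings = [10, 11, 9, 10, 12, 11, 10, 9, 11, 500]
print(clip_outliers(readings))  # the reading of 500 is dropped
```

Each comment explains why a step exists rather than restating what the line does, which is the distinction the rules above draw.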
TECHNICAL REPORT
Documenting your data science project as a
technical report can help in distinguishing your
work from others. Reporting makes your work
publishable or reproducible. It takes your data
science project from the coding environment to
a technical document that can be consumed
by others. Here we list the major areas you need
to cover when writing a professional report for
your data science project.
1 Background to the project (what problem
you are trying to solve and why).
2 The success metrics or key performance
indicators (KPIs) that show whether your
model works well (e.g. achieve prediction
accuracy of up to 90% in predicting fraud
cases, reduce customer waiting time down
to 10%).
3 How you collected your data (public data,
scraped data, augmented data, data that
requires permission to access, etc.).
4 Your tools (programming languages or
Business Intelligence dashboards).
5 How you cleaned your data (what are the
data cleaning techniques you adopted?).
6 How you selected features from your data
(How did you select features? Did you do
any feature transformation or engineering?
Did you find any interesting interactions
between features? What are the important
drivers/features relevant to your stated
KPIs that you used for model building?
Did you discard any features?).
7 The training method(s) you used (Linear
Regression, Logistic Regression, XGBoost,
etc. Did you ensemble models?).
8 How long it took your model to train and make
predictions (it is very important that you report
the model's computational time).
9 How you evaluated your model performance
(e.g. Did you explain your confusion matrix
and AUC-ROC curve? Did you explain your
regression coefficient estimates and p-values?
Did you explain your adjusted R-squared?
etc.).
10 How you presented your output or predictions
(e.g. accuracy, precision and recall scores,
probability or likelihood percentage, adjusted
R-squared, visualizations, etc.).
11 Which model achieved the best result
overall?
12 Is there a subset of features that would get
90-95% of your final model performance?
Which features?
13 Did you try any simpler model that can
achieve the same result as your best
model? (Aim for model simplicity.)
14 How you made your model explainable
(E.g. Did you use any model explainability
library?)
15 How you deployed your model (e.g. Did you
prototype it with a web or mobile
application? What technology stack did
you use for the web or mobile application?).
16 Are there suggestions on how the project
can be improved if other people wish to
work on it?
17 Are there technical challenges you faced
during the project that other people can
avoid?
18 Are there new ideas generated from the
project which could form a new research
area?
19 Did you mention the team members who
also worked on the project (if any)?
20 Did you reference the original source of
any copied code or external materials used?
(Some tips taken from: www.kaggle.com/WinningModelDocumentationGuidelines)
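Items 8 to 10 above ask for computational time and evaluation figures. The sketch below shows one way these numbers could be collected for a report, assuming Python and a binary classifier. The threshold rule standing in for a model, the sample data and all function names are hypothetical; a real project would time its actual training and prediction calls.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def predict(amounts, threshold):
    """Toy stand-in for a model: flag amounts above the threshold."""
    return [1 if a > threshold else 0 for a in amounts]


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def summarize(y_true, y_pred):
    """Compute accuracy, precision and recall from the confusion counts."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Hypothetical data: transaction amounts and fraud labels (1 = fraud).
amounts = [900, 120, 750, 300, 80, 640, 980, 50]
y_true = [1, 0, 1, 1, 0, 0, 1, 0]

y_pred, predict_seconds = timed(predict, amounts, 500)
scores = summarize(y_true, y_pred)

print(f"Prediction time: {predict_seconds:.6f} s")
print(f"Confusion counts: {confusion_counts(y_true, y_pred)}")
for name, value in scores.items():
    print(f"{name.capitalize()}: {value:.2f}")
```

Printing the confusion counts next to the scores lets a reader of the report see where each percentage comes from, which is the kind of explanation item 9 asks for.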
Conclusion
As a Data Scientist, keep in mind that your project may
be read by people with technical and non-technical
backgrounds, so it should be clear and well
documented. Documentation can be written in Word
or PDF format.
We are a team of Data Scientists
and AI specialists. We analyze
and generate actionable insights
from data to execute innovative
Artificial Intelligence solutions
that drive various businesses.
ADDRESS:
Suite 33 Mazfalah
Plaza, Karu Site,
Abuja, Nigeria
PHONE:
+2348038518576