DNA Sequencing and Applying a Classifier with ML
INTRODUCTION
In biomedical informatics research, genetic
sequences are widely used as the input for
classification tasks. One application area of ML is
bioinformatics, an interdisciplinary science that
uses computer and information science to
understand biological data. One of its most
difficult tasks is to distinguish between normal
genes and disease-causing genes.
The classification of gene sequences into
existing categories is used in genomic
research to discover the functions of novel
proteins. As a result, it is critical to identify
and categorize such genes. We employ ML
classification methods to distinguish
between disease-causing and normal genes.
I will apply a classification model that
can predict a gene's function from its
DNA coding sequence alone.
You will need a few libraries
such as numpy, pandas, and scikit-learn.
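A minimal import sketch for the steps that follow (matplotlib is assumed for the class-balance plot later):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt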
I will load the human data and read it,
so that we have human DNA coding-region
sequences together with a class label.
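For illustration, a sketch of reading the human data with pandas; the file name human_data.txt and the tab-separated layout with 'sequence' and 'class' columns are assumptions, not confirmed by the slides:

# Assumed file name; tab-separated file with 'sequence' and 'class' columns.
human_data = pd.read_table('human_data.txt')
print(human_data.head())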
I also load and read data
for chimpanzee and for a more
divergent species, the dog.
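A matching sketch for the other two species; the file names are again assumptions:

# Assumed file names; same layout as the human data.
chimp_data = pd.read_table('chimp_data.txt')
dog_data = pd.read_table('dog_data.txt')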
Here are the definitions for each of
the 7 classes and how many there are
in the human training data. They are
gene sequence function groups.
Since the sequences are not of equal length, we will
apply k-mer counting to the complete sequences
using a getKmers function, as sketched below.
Now our coding sequence data is
converted to lowercase and split into all
possible overlapping k-mer words of length 6.
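A sketch of what such a getKmers helper could look like and how it might be applied; the column names 'sequence' and 'words' are assumptions:

def getKmers(sequence, size=6):
    # Break a sequence into overlapping, lowercase k-mers of the given size.
    return [sequence[i:i + size].lower() for i in range(len(sequence) - size + 1)]

# Turn each coding sequence into a list of 6-mers and drop the raw sequence.
human_data['words'] = human_data['sequence'].apply(lambda seq: getKmers(seq))
human_data = human_data.drop('sequence', axis=1)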
Since we are going to use scikit-learn
natural language processing tools to
do the k-mer counting, we now need
to convert the lists of k-mers for
each gene into string sentences of
words that the count vectorizer can
use.
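A one-line sketch of that conversion, assuming the 'words' column from above:

# Join each gene's list of k-mers into one space-separated 'sentence'.
human_texts = [' '.join(words) for words in human_data['words']]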
We can also make a y variable
to hold the class labels.
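For example (the column name 'class' is an assumption):

# Class labels for the human data.
y_human = human_data['class'].values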
We will perform the same
steps for chimpanzee and dog
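A sketch of repeating the same preprocessing for the two other species, reusing the assumed names from above:

for df in (chimp_data, dog_data):
    # Same k-mer conversion as for the human data.
    df['words'] = df['sequence'].apply(lambda seq: getKmers(seq))
    df.drop('sequence', axis=1, inplace=True)

chimp_texts = [' '.join(words) for words in chimp_data['words']]
dog_texts = [' '.join(words) for words in dog_data['words']]
y_chimp = chimp_data['class'].values
y_dog = dog_data['class'].values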
We will apply the Bag of Words model
with CountVectorizer, an NLP tool.
This is equivalent to k-mer counting.
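A sketch of the Bag of Words step; fitting the vectorizer on the human sentences and only transforming the chimpanzee and dog sentences is an assumption about the workflow, and the 4-gram range reflects the tuning mentioned later:

from sklearn.feature_extraction.text import CountVectorizer

# Count 4-grams of k-mer 'words' (ngram size from the parameter tuning below).
cv = CountVectorizer(ngram_range=(4, 4))
X = cv.fit_transform(human_texts)
X_chimp = cv.transform(chimp_texts)
X_dog = cv.transform(dog_texts)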
If we have a look at the class balance, we can
see we have a relatively balanced dataset.
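One way to check this, assuming the 'class' column:

# How many samples fall into each of the 7 classes.
print(human_data['class'].value_counts().sort_index())
human_data['class'].value_counts().sort_index().plot.bar()
plt.show()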
Splitting the human dataset into the
training set and test set.
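A sketch of the split; the 80/20 ratio and the random seed are assumptions:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y_human, test_size=0.20, random_state=42)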
A multinomial naive Bayes classifier will be
created. I previously did some parameter
tuning and found that an n-gram size of 4
(reflected in the CountVectorizer() instance)
and a model alpha of 0.1 did the best.
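A sketch of fitting the classifier with the tuned alpha:

from sklearn.naive_bayes import MultinomialNB

# alpha=0.1 from the parameter tuning described above.
classifier = MultinomialNB(alpha=0.1)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)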
Let's look at some model
performance metrics such as the
confusion matrix, accuracy,
precision, recall, and F1 score.
We are getting really good
results on our unseen data.
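A sketch of computing those metrics on the held-out human test set; weighted averaging for the multiclass scores is an assumption:

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

print(confusion_matrix(y_test, y_pred))
print('accuracy  =', accuracy_score(y_test, y_pred))
print('precision =', precision_score(y_test, y_pred, average='weighted'))
print('recall    =', recall_score(y_test, y_pred, average='weighted'))
print('f1        =', f1_score(y_test, y_pred, average='weighted'))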
THANK YOU