A Synopsis on
CodeSense: AI Software Code Analyzer
Submitted in partial fulfillment of the
requirements for the award of the degree
of
Bachelor of Technology
in
Computer Science and Engineering
by
Yash Sharma (2100970100133)
Sarthak Agrawal (2200970100148)
Shivam Jaiswal (2200970100159)
Tanishq Kumar (2200970100175)
Semester – VII
Under the Supervision of
Ms. Ramandeep Kaur
Galgotias College of Engineering & Technology
Greater Noida 201306
Affiliated to
Dr. APJ Abdul Kalam Technical University, Lucknow
ABSTRACT
The rapid growth of software development has resulted in increasingly complex codebases,
making code review and quality assurance more challenging. Manual code reviews are
time-consuming, error-prone, and often fail to detect hidden issues such as subtle bugs,
code smells, and maintainability problems. To address this gap, we propose CodeSense, an
AI-powered software code analyser that automates code quality inspection and provides
intelligent feedback.
CodeSense integrates Natural Language Processing (NLP), Machine Learning (ML), and static
code analysis to detect errors, security vulnerabilities, and violations of coding standards.
Trained on large-scale open-source repositories, the system learns best practices and
incorporates semantic analysis for deeper insights into program logic. Unlike traditional
analysers, CodeSense not only identifies issues but also suggests corrective measures,
thereby improving code readability, maintainability, and security.
By automating routine checks, the system reduces developer workload, increases
productivity, and enhances software reliability. With applications in both academia and
industry, CodeSense promotes clean coding practices, accelerates development cycles, and
lowers the cost of debugging and maintenance.
INTRODUCTION
Software systems are growing rapidly in scale and complexity, making it increasingly difficult
for developers to maintain clean, secure, and efficient code. Large projects often involve
millions of lines of code, multiple teams, and diverse technologies, which makes manual
code review both time-consuming and error-prone. Even with experienced reviewers, subtle
bugs, performance issues, and security vulnerabilities often remain undetected.
Traditional static analysis tools such as SonarQube, PMD, and FindBugs provide rule-based
inspections that can detect common issues, but they struggle with deeper problems like
code smells, logical flaws, and cross-module dependencies. Dynamic analysers, though more
powerful, require significant computational resources and time, making them unsuitable for
frequent use in agile and fast-paced development environments.
At the same time, the rising number of security threats in modern software—ranging from
SQL injection to cross-site scripting—demands more intelligent and adaptive approaches.
With widespread reliance on third-party libraries and open-source frameworks,
vulnerabilities can spread quickly if not identified early.
Artificial Intelligence (AI) and Machine Learning (ML) present a strong opportunity to
overcome these limitations. Unlike rule-based systems, AI can learn from large repositories
of real-world code, recognize patterns of good and bad practices, and adapt to new
programming paradigms. When combined with Natural Language Processing (NLP), such
systems can even interpret code comments and developer intent, offering richer insights
into maintainability and design quality.
The proposed system, CodeSense, addresses these challenges by combining the
deterministic power of static analysis with the adaptive intelligence of AI. Instead of merely
reporting violations of fixed rules, CodeSense learns from real-world repositories, prioritizes
issues by severity, and suggests corrective actions. This makes it more developer-friendly,
context-aware, and scalable compared to existing tools.
In summary, CodeSense aims to bridge the gap between traditional code analysers and
modern AI-driven solutions. It represents a hybrid approach that not only improves software
reliability and maintainability but also supports developers in writing cleaner, more secure,
and future-ready code.
LITERATURE SURVEY
Ensuring the quality and reliability of software has always been a primary concern in
software engineering. As software projects grow in size and complexity, manual inspection
becomes impractical, leading to the development of automated approaches for code
analysis. The literature reveals extensive research across four main areas: (i) traditional
static analysis tools, (ii) code smell detection and maintainability, (iii) AI-based methods, and
(iv) hybrid systems. Each area has contributed valuable insights, but each also presents
limitations that motivate the need for a more advanced solution such as CodeSense.
1. Traditional Static Analysis Tools
Static analysis represents one of the earliest and most widely adopted methods for code
inspection. Early tools like Lint (Johnson, 1979) focused on identifying stylistic issues and
simple programming errors in C programs. Later, tools such as FindBugs (Hovemeyer &
Pugh, 2004), Checkstyle, and PMD extended this approach to Java and other languages.
Despite their popularity, static analysers have several limitations:
 Heavy reliance on fixed rule sets, which require constant updates for new frameworks
  or languages.
 Lack of semantic understanding, as tools typically check syntax rather than program
  intent.
Figure 1: Workflow of a typical static analyser (Source: Parasoft)
Static analysis is effective at catching simple defects early in the development cycle, but its
limited adaptability calls for more advanced approaches.
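To make the rule-based paradigm concrete, the following minimal sketch implements a single
fixed-rule check on top of Python's standard ast module. The rule choice (flagging bare
except clauses) and the identifier W001 are illustrative assumptions, not the behaviour of
any specific tool.

    import ast

    # Illustrative fixed rule (hypothetical ID W001): flag bare "except:"
    # clauses, a Lint-style check encoded directly as a hand-written rule.
    RULE_ID = "W001"

    def check_bare_except(source: str, filename: str = "<input>"):
        """Walk the syntax tree and report every bare except handler."""
        findings = []
        for node in ast.walk(ast.parse(source, filename=filename)):
            if isinstance(node, ast.ExceptHandler) and node.type is None:
                findings.append((filename, node.lineno, RULE_ID,
                                 "bare 'except:' hides unexpected errors"))
        return findings

    sample = "try:\n    risky()\nexcept:\n    pass\n"
    for fname, line, rule, msg in check_bare_except(sample):
        print(f"{fname}:{line}: {rule} {msg}")

Because every rule must be hand-written in this way, coverage grows only as fast as the
rule catalogue grows, which is precisely the maintenance burden noted above.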
2. Code Smell Detection and Maintainability
The concept of code smells, introduced by Fowler (1999), describes symptoms of poor
design that hinder long-term maintainability. Examples include long methods, duplicated
code, and large classes. Tools like SonarQube and PMD include modules to detect such
smells.
Researchers have contributed significantly in this area:
 Marinescu (2004) proposed metric-based detection strategies using coupling, cohesion,
  and complexity indicators.
 Olbrich et al. (2010) found that classes with smells tend to accumulate more defects
  during software evolution.
 Fontana et al. (2016) validated that eliminating smells early improves maintainability
  and reduces technical debt.
Figure 2: Common code smells that impede maintainability (Source: 8th Light)
While smell detection has improved awareness of design flaws, rule-based methods struggle
with context sensitivity. For example, a “large class” may be acceptable in certain
framework libraries but harmful in business logic. This reinforces the need for
context-aware analysis.
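As a concrete illustration of the metric-based strategies above, the sketch below counts
simple size metrics over a Python syntax tree. The thresholds are assumptions chosen for
demonstration; Marinescu's actual strategies combine several metrics with calibrated
cut-offs.

    import ast

    # Illustrative thresholds (assumed, not Marinescu's calibrated values).
    MAX_METHOD_STATEMENTS = 30   # beyond this, report a "long method"
    MAX_CLASS_METHODS = 15       # beyond this, report a "large class"

    def detect_smells(source: str):
        """Report long methods and large classes by simple size metrics."""
        findings = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                # Count the statements nested inside this function.
                n_stmts = sum(isinstance(n, ast.stmt) for n in ast.walk(node))
                if n_stmts > MAX_METHOD_STATEMENTS:
                    findings.append((node.lineno,
                        f"long method '{node.name}' ({n_stmts} statements)"))
            elif isinstance(node, ast.ClassDef):
                n_methods = sum(isinstance(n, ast.FunctionDef) for n in node.body)
                if n_methods > MAX_CLASS_METHODS:
                    findings.append((node.lineno,
                        f"large class '{node.name}' ({n_methods} methods)"))
        return findings

A context-aware analyser would additionally condition such thresholds on where the class
lives, for instance relaxing the large-class rule for generated framework code; that gap is
exactly what CodeSense targets.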
3. AI and Machine Learning Approaches
With advances in AI, researchers have applied machine learning to improve software
analysis. Unlike static rule-based methods, AI can learn patterns from large code
repositories, making predictions about bugs, smells, or vulnerabilities.
3.1 Bug Prediction Models
 Nagappan et al. (2006) used logistic regression on code churn metrics to predict
  defect-prone files (a simplified sketch of this idea follows this list).
 Kim et al. (2011) leveraged change history mining for more accurate bug prediction.
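The following sketch illustrates churn-based defect prediction with scikit-learn on
synthetic data. The feature set, weights, and labels are fabricated for demonstration and
are not Nagappan et al.'s actual dataset or model.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Synthetic churn-style features per file: lines added, lines deleted,
    # number of commits. Real studies mine these from version-control
    # history rather than generating them.
    rng = np.random.default_rng(42)
    X = rng.poisson(lam=(40.0, 25.0, 6.0), size=(500, 3)).astype(float)

    # Assumed ground truth: heavily churned files are more defect-prone.
    y = (X @ np.array([0.02, 0.03, 0.20]) + rng.normal(0, 1, 500) > 3.5).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression().fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))

The appeal of this family of models is that the learned coefficients remain inspectable,
unlike the deep representations discussed next.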
3.2 Deep Learning Representations
 Alon et al. (2019) introduced code2vec, a method that represents source code as
  vectors for tasks like bug detection and method-name prediction.
 Feng et al. (2020) developed CodeBERT, trained on massive GitHub repositories,
  enabling semantic understanding for vulnerability detection (see the embedding sketch
  below).
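As a hedged illustration, the sketch below embeds a code snippet with the public
microsoft/codebert-base checkpoint from the Hugging Face transformers library.
Mean-pooling the token embeddings into a single vector is our simplifying assumption for
demonstration, not the paper's prescribed downstream usage.

    import torch
    from transformers import AutoTokenizer, AutoModel

    # Load the public CodeBERT checkpoint (downloads on first use).
    tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    model = AutoModel.from_pretrained("microsoft/codebert-base")

    snippet = "def add(a, b):\n    return a + b"
    inputs = tokenizer(snippet, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # Mean-pool token embeddings into one vector per snippet; such
    # vectors can feed a downstream bug or vulnerability classifier.
    embedding = outputs.last_hidden_state.mean(dim=1)
    print(embedding.shape)  # torch.Size([1, 768])

The 768-dimensional output reflects the base model's hidden size; snippets that behave
similarly tend to land near each other in this vector space, which is what makes semantic
analysis possible.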
3.3 AI in Security
 Li et al. (2018) used recurrent neural networks (RNNs) to detect SQL injection
  vulnerabilities (a toy classifier in this spirit is sketched at the end of this
  subsection).
 Russell et al. (2019) demonstrated AI-driven detection of cross-site scripting attacks.
Figure 3: Evolution of AI from rule-based logic to deep learning models (Source:
GeeksforGeeks)
AI models achieve higher accuracy and adaptability, but their “black box” nature reduces
developer trust, as they often lack explainability.
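To make the security use case concrete, here is a toy, untrained RNN classifier over
character-encoded query strings, written in PyTorch. The architecture, layer sizes, and
encoding are illustrative assumptions and do not reproduce Li et al.'s model; it also shows
why such detectors are opaque, since the learned weights offer no human-readable rule.

    import torch
    import torch.nn as nn

    class InjectionDetector(nn.Module):
        """Toy LSTM mapping a character sequence to P(SQL injection)."""
        def __init__(self, vocab_size=128, embed_dim=32, hidden_dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, 1)

        def forward(self, x):                        # x: (batch, seq_len) ints
            _, (h, _) = self.rnn(self.embed(x))
            return torch.sigmoid(self.head(h[-1]))   # (batch, 1) probability

    def encode(query: str, max_len: int = 64) -> torch.Tensor:
        """ASCII-encode and zero-pad one query string."""
        ids = [min(ord(c), 127) for c in query[:max_len]]
        return torch.tensor([ids + [0] * (max_len - len(ids))])

    model = InjectionDetector()  # untrained, so the output is arbitrary
    print(model(encode("SELECT * FROM users WHERE id='1' OR '1'='1'")))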
4. Hybrid and Context-Aware Systems
Recognizing the weaknesses of both static and AI-only methods, researchers propose hybrid
analysers that combine rule-based analysis with machine learning.
 White et al. (2019) integrated static rules with ML classifiers, reducing false
  positives by learning from developer feedback (see the triage sketch below).
 Tufano et al. (2020) applied neural machine translation techniques to suggest bug
  fixes, showing promising results over traditional refactoring.
 Industrial systems like Google’s Tricorder and Facebook’s Sapienz already implement
  hybrid approaches in large-scale settings, though they remain proprietary.
Hybrid systems combine the explainability of static rules with the adaptability of ML models,
making them the most balanced approach.
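The following self-contained sketch shows one plausible shape for such a hybrid pipeline:
deterministic rules emit candidate findings, and a trained classifier scores each one so
that likely false positives are suppressed. The Finding structure, DummyClassifier, and
threshold are hypothetical stand-ins; a real system would plug in a trained model such as
the logistic regression sketched earlier.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Finding:
        rule_id: str
        line: int
        message: str
        features: List[float]   # numeric context features for the model

    class DummyClassifier:
        """Stand-in for a trained model exposing a scikit-learn-style
        predict_proba; here a single hand-set heuristic weight."""
        def predict_proba(self, rows):
            return [[1 - min(r[0] / 10, 1.0), min(r[0] / 10, 1.0)] for r in rows]

    def triage(findings, classifier, threshold=0.5):
        """Keep findings the model rates as probable true positives,
        ordered by confidence so pressing issues surface first."""
        scored = [(f, classifier.predict_proba([f.features])[0][1])
                  for f in findings]
        return sorted((fp for fp in scored if fp[1] >= threshold),
                      key=lambda fp: -fp[1])

    findings = [Finding("W001", 3, "bare except in request handler", [8.0]),
                Finding("W001", 42, "bare except in test helper", [1.0])]
    for f, p in triage(findings, DummyClassifier()):
        print(f"line {f.line}: {f.message} (p={p:.2f})")

The division of labour is the point: the rule stays auditable while the learned component
absorbs context, which is the balance CodeSense adopts.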
From the reviewed literature, the following insights emerge:
 Static analysers are efficient for surface-level errors but limited in adaptability
  and prone to false positives.
 Code smell research highlights maintainability issues but struggles with contextual
  accuracy.
 AI-based methods excel at learning semantic patterns but lack transparency and
  developer trust.
 Hybrid systems emerge as the most effective direction, blending clarity with
  intelligence.
These findings support the motivation for CodeSense, which aims to implement a hybrid
analyser enhanced with AI techniques to provide context-aware, adaptive, and
developer-friendly insights. By blending the strengths of traditional rule-based
approaches with modern AI capabilities, CodeSense is positioned to minimize false
positives, offer actionable recommendations, and continuously adapt to new programming
trends. Moreover, its ability to integrate seamlessly into existing development workflows
makes it practical for real-world adoption in both industry and academia.
PROBLEM FORMULATION
Despite advances in software engineering, code quality assurance still faces two key
limitations:
1. Static tools – Rigid, rule-based, and unable to adapt to evolving frameworks.
2. AI-only tools – Powerful but opaque, often acting as “black boxes” with limited
explainability.
Thus, the research problem is:
“How can we design an AI-driven code analyser that combines the precision of static
analysis with the adaptability of machine learning to ensure higher code quality, reduced
defects, and improved maintainability?”
Significance of the Research
 For Developers: Reduces debugging effort and accelerates development cycles.
 For Organizations: Lowers maintenance costs and improves software reliability.
 For Academia: Provides a framework for applying AI in software engineering education.
 For Security: Identifies vulnerabilities early, reducing risks of cyberattacks.
By addressing these needs, CodeSense bridges the gap between traditional analysers and
modern AI-driven systems, contributing to the advancement of intelligent software quality
assurance.