ACER An AST-based Call Graph Generator Framework

The document introduces ACER, an AST-based call graph generator framework designed to facilitate the development of call graph generators across various programming languages. It emphasizes the advantages of using abstract syntax trees (ASTs) over intermediate representations (IRs) for speed and simplicity, while also detailing the framework's architecture and components. The paper also discusses the taxonomy of call graph generation algorithms and the challenges associated with implementing them.

Uploaded by

joashjohnson2004

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

14 views6 pages

ACER An AST-based Call Graph Generator Framework

Uploaded by

joashjohnson2004

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 6

2023 IEEE 23rd International Working Conference on Source Code Analysis and Manipulation (SCAM)

ACER: An AST-based Call Graph Generator

Framework
Andrew Chen Yanfu Yan Denys Poshyvanyk
Computer Science Department Computer Science Department Computer Science Department
William & Mary William & Mary William & Mary
Williamsburg, USA Williamsburg, USA Williamsburg, USA
achen08@wm.edu yyan09@wm.edu dposhyvanyk@wm.edu

Abstract—We introduce ACER, an AST-based call graph enclosing classes, name, and argument types are all taken into
generator framework. ACER leverages tree-sitter to interface account.
with any language. We opted to focus on generators that operate
A call site’s left hand side (if it exists) is referred to as the
on abstract syntax trees (ASTs) due to their speed and simplicitly
in certain scenarios; however, a fully quantiﬁed intermediate receiver. In b.bar(), the receiver is b. Call containers are
representation usually provides far better information at the cost nodes that lexically contain call sites. There are two call con-
of requiring compilation. To evaluate our framework, we created tainers in the ﬁgure: class Foo and void method1(Bar
two context-insensitive Java generators and compared them to a).
existing open-source Java generators.
Code: https://github.com/WM-SEMERU/ACER Call graph generators vary mainly by the following three
Index Terms—Call Graph, Static Analysis, Software Frame- parameters: algorithm, source format, and scope of analysis.
works We list examples in Table I.

I. I NTRODUCTION Paramters Examples

Long-standing research endeavors in static call graph gen- Algorithm NR, CHA, SCHA, RTA, k-CFA
eration have made significant contributions to the field of Scope of Analysis Application-only, Library-only, In-
program analysis over the years. Call graphs can be used between, Full software
in a wide variety of areas: from critical tasks like compiler Source Format Raw source, AST, IR
optimization [1] and detecting security vulnerabilities [2], to TABLE I: Generator parameters and examples.
useful applications such as code profiling [3], refactoring [4],
and navigation [5]. Formally, a call graph is defined as a
Generators first differ drastically by their algorithm. Grover
directed graph G = (V, E) wherein each vertex v ∈ V
et al. [6] established a taxonomy for call graph algorithms over
represents a method, and an edge e = (v1 , v2 ) ∈ E signifies
twenty years ago, which Tip et al. [7] broadly categorized into
that the method v1 invokes the method v2 within its body.
three levels:
1) Simple and few-passed, such as Name-based Resolu-
class Bar { tion (NR) and Class Hierarchy Analysis (CHA). NR
void bar() {} disregards receivers and resolves call sites using solely
} the method name. CHA, on the other hand, considers
public class Foo { receivers and even their subtypes to support polymor-
Bar b = new Bar(); phism.
void method1(Bar b) { 2) Context-insensitive but incorporates some points-to anal-
b.bar(); ysis. Examples include Variable Type Analysis (VTA)
} and Andersen-style Pointer Analysis (APA) [8].
} 3) Context-sensitive, exemplified by k-CFA (k-Control
Flow Analysis), where k means to utilize the last k call
Fig. 1: For jargon demonstration purposes. container contexts.

In call graph terminology, an invocation is known as a Call graph generators also vary in their scope of analysis, the
call site. There exist two call sites in Figure 1: new Bar() part of the full software analyzed. By full software, we mean
and b.bar(). The main operation of call graph generators the application source plus its library dependencies. There are
lies in call site resolution — the process of identifying the four levels of scope of analysis (from narrow to broad):
fully quantiﬁed method(s) a call site corresponds to. In Java, 1) Application-only: Only includes edges between applica-
a method is considered fully quantiﬁed when its package, tion methods.

979-8-3503-0506-7/23/$31.00 ©2023 IEEE

254
DOI 10.1109/SCAM59687.2023.00035
Authorized licensed use limited to: Fr C Rodrigues Institute of Technology Vashi. Downloaded on August 25,2025 at 09:05:04 UTC from IEEE Xplore. Restrictions apply.
2) Library-only: While libraries by themselves could just erators do not impose. Further, IR compilation may com-
be regarded as applications, Reif et al. introduced al- pute information irrelevant to generators. Consequently,
gorithms dedicated to libraries [9]. The key distinction AST-based generators theoretically perform faster.
is that libraries could be analyzed from the user’s 2) Generalizable: Most languages can be parsed to an
perspective and assume certain private methods to be AST while only some statically-typed languages have
closed off. This constructs a smaller set of entry points compilers that output rich, fully-resolved IRs.
and thus a more precise graph. 3) Synergizes with the application-only scope: If the goal
3) In-between: This contains application-only edges and is to generate application-only edges, there’s no need
library methods invoked in application code [10]. to consider the dependencies. AST-based generators
4) Full-software: This contains in-between edges and in- complement this by operating directly on ASTs from ap-
cludes library methods invoked in library code. plication source. Conversely, IR-based generators must
Lastly, generators differ by their source format. Source- take into account the dependencies, as most IRs can only
based generators use the raw source files while AST-based be produced when the entire software is compilable.
generators have access to the abstract syntax trees. The choice We have outlined the key generator parameters and provided
of algorithm is tied with the source format — source-based reasons for using AST-based generators. We now turn to the
generators are likely to implement the simplest NR algorithm focus of the paper — how to construct AST-based generators.
because extracting information from unstructured raw code The first piece of our approach is the powerful tree-sitter
is hard and unscalable. Examples of source-based generators [22] library. Tree-sitter is both a parser generator tool and
include the internal utility within Doxygen [11] and this Perl- an incremental parsing library. It provides parsers for all
based multilingual generator [12]. Examples of AST-based popular languages and a common interface to interact with
generators are the popular Python generators [13] [14], which the parsers and their concrete syntax trees outputs. Though
operate on the outputs of the ast and symtable modules. the common interface is written in C, tree-sitter provides
Additionally, there exist IR-based generators, which operate additional language-bindings — we use the Python binding.
on some internal representation (IR) created during compi- The second piece of our approach consists of language-
lation. Typically, only statically-typed languages have IRs. agnostic components and utilities. These abstract the shared
IRs deliver information that neither raw sources nor ASTs logic common to generators across various algorithms, scopes,
directly provide — the most important data being fully- and languages. These components are detailed in section III.
qualified functions. Since a call graph’s vertices are just fully- Together, these two pieces form the backbone of our frame-
resolved functions, the construction of a sound call graph from work, ACER, designed to simplify the development of call
IR is simple (but not trivial due to polymorphism) because IR graph generators.
has already resolved the functions. AST-based generators, on Here are our main contributions:
the other hand, need to implement resolution logic. 1) Introduced ACER, an AST-based Callgraph GEnerator
Examples of IR include JVM bytecode [15] and the fully- FRamework.
resolved syntax tree outputted by Clang-based compilers dur- 2) Provided a taxonomy for generator design choices and
ing semantic analysis [16]. We consider Clang’s rich tree to document implementational challenges.
be IR and no longer just an AST. Tools that generate IR 3) Developed and evaluated a single NR and a single SCHA
usually require the source to be compilable — a program’s (Simple CHA) AST-based Java generator using ACER.
dependencies thus must be pulled in and built. Ideal and
error-resilient tools like Roslyn [17] make exceptions — II. BACKGROUND AND R ELATED WORK
they can generate the IR without building the dependencies In the last section, we introduced the main call graph
because they resolve all resolvable entities and simply flag parameters: algorithms, source formats, and scope of analysis.
the unresolvable ones. On the other hand, most tools like In addition to these main traits, there exist subtle details like
Clang AST and GCC [18] either throw or remove unresolvable reachability. In this section, we expand on all parameters and
entities. discuss the challenges in implementing them.
As an aside, languages that can compile to JVM bytecode
can get call graph generators “for free” through bytecode A. Algorithm
analysis frameworks WALA [19] and Soot [20]. However, An advanced call graph algorithm builds upon a simpler one
the implementation details matter greatly — JVM-hosted im- by either pre-caching supplementary structures or monitoring
plementations of Groovy, Clojure, Python, and Ruby produce additional runtime data. High recall and soundness is easier
very unsound call graphs due to dynamic translation schemes to achieve while high precision demands significantly more
[21]. computational power [7].
Although IRs contain more useful data than ASTs, we still As previously mentioned, the source format is tied inti-
build our call graph framework based on AST for the following mately with the algorithm. The Java bytecode IR, if interpreted
reasons: directly, encourages an algorithm that is more precise than NR
1) Compile-free: Most IRs can only be generated if the but less sound than CHA. Surprisingly, four out of the six most
source is compilable, a restriction that AST-based gen- popular Java generators use this direct interpretation and thus

255

Authorized licensed use limited to: Fr C Rodrigues Institute of Technology Vashi. Downloaded on August 25,2025 at 09:05:04 UTC from IEEE Xplore. Restrictions apply.
do not fully support CHA. We illustrate the difficulty of fully D. Ambiguity Problems
supporting CHA in Figure 2. Application-only, AST-based generators face ambiguity
problems. Because IRs come from compilation, they have
class A { void method() {}; } fully-resolved all entities. But, application-only AST-based
class B extends A {} generators can never fully resolve because they do not have
class C extends B {} access to the library sources. We illustrate this ambiguity in
Figure 3. For a reasonable implementation, methods in the
class Bar {
void foo(A a) { // Could be A, B, C import java.util.*;
a.method();
} class Bar {
} int add(int a, int b) {...}
int add(float a, float b) {...}
Fig. 2: Liskov’s substitution principle [23] in play. void foo(List<?> l1, List<?> l2) {
int sum = add(l1.size(), l2.size());
The call site a.method(), in JVM bytecode, is }
represented as 1: invokevirtual #2 // Method }
A.method:()V. A simple, direct generator deduces
the edge from this information alone, thereby including Fig. 3: Without library analysis, we do not know the type of
only the edge from Bar.foo → A.method. However, l1.size() and l2.size(). So, we can at best return all
polymorphism also allows objects of type B and C to be methods named add with two arguments.
passed into foo. Thus, sound analysis must include the
additional edges Bar.foo → B.method and Bar.foo → figure should only aim to be identified by their package, class,
C.method. method names, and the number of arguments, which creates
To support CHA, the generator must pre-cache a class unnecessary edges.
hierarchy. After the type of a receiver is calculated, the In-between scoped and AST-based generators like PyCG do
generator considers the receiver to inhabit any subtype of the not face this problem. PyCG uses importlib to attempt to
type calculated. To support VTA and k-CFA, the generator also resolve library definitions.
need to consider intraprocedural assignments and data flow.
III. T OOL D ESCRIPTION AND I NTERNALS
B. Handling Language Features
Practical call graph generators abiding by the same al-
gorithm and scope of analysis can still differ due to the
attention to details [24]. Some Java generators might not regard
static initializer blocks as call containers, or disregard default
interface methods. Conversely, some Python generators may
not handle higher-order functions and iterators. In the last
subsection, we also touched on the importance of handling Fig. 4: Call graph Generation Flow. Preprocessor and Gener-
polymorphism, which is addressed by building and using a ator are user-implemented.
class hierarchy.
In this section, we describe the two essential classes of
C. Scope of Analysis ACER: Preprocessor and Generator. We also show how
Earlier, we mentioned that AST-based generators synergize ACER can be used to implement a hypothetical, application-
with application-only scope. However, for broader scopes only, AST-based, and fully-typed CHA Java generator. Gen-
starting from in-between, IRs are preferred because most IRs erators built using ACER are by default confined to ASTs
compile the libraries by default. for their source format and initiate their scope of analysis at
As a side note, what Karim et al. [10] regards as the application-only level. The algorithm, reachability, and the
“application-only” is what we consider as “in-between”. Their treatment of language features can be freely explored.
framework, CGC, parses just the structural information of the Users create new generators by extending the Preprocessor
libraries to efficiently include edges from application to library and the Generator and implementing a handful of abstract
and vice versa. methods. Internally, we make use of strict type hints and
An additional parameter in the realm of scope is reachabil- generics to guide implementation.
ity. The most sound analysis considers all methods as potential
entry points to handle potential multi-threading [9]. Precise Preprocessor
generators may consider a smaller set of entry points (the The sole purpose of the Preprocessor is to cache lookup
singleton set of just the main method). structures that the Generator utilizes. In this section, we first

256

Authorized licensed use limited to: Fr C Rodrigues Institute of Technology Vashi. Downloaded on August 25,2025 at 09:05:04 UTC from IEEE Xplore. Restrictions apply.
describe the inputs and outputs of the Preprocessor. Then, can narrow down these entry points; for example, they can
we give a high level description of our CHA generator’s filter out methods whose names are not “main”.
Preprocessor implementation. We now turn to explaining the generation flow, summarized
Preprocessor first loads the tree-sitter libraries: The C run- by the pseudocode in algorithm 1:
time and the input, language-specific parser. Preprocessor then
parses the input source files into tree-sitter trees. These trees Algorithm 1: Generation Flow
are composed of tree-sitter Nodes, which contain information Input: Preprocess Results P , Entry Points EP
like node type, text, and pointers to parent and children. Data: AnalysisDeque
Provided with the roots of these trees, the Preprocessor then Output: Call graph G(V, E)
produces its outputs. These outputs are the desired lookup 1 initializeDeque(EP )
structures. The Preprocessor class has two abstract methods, 2 visited = empty set()
which mandates the constructions of the two following struc- 3 while AnalysisDeque.not empty() do
tures: 4 if call site.id ∈ visited then
1) method_dict: A mapping between unique 5 continue
method keys to their method bodies (tree- 6 end
sitter Node objects). Our Java CHA generator 7 visited.add(call site.id)
creates the method_dict by first locating all 8 next contexts = resolve(context, call site)
nodes of types method_declaration and 9 foreach context ∈ next contexts do
constructor_declaration. From these nodes, 10 call container = context.call container
the generator navigates upwards to locate their enclosing 11 call sites =
class and package for full resolution. seek call sites(call containers)
2) unique_dict: A cache between non-unique keys to 12 foreach call site ∈ call sites do
unique keys. The unique_dict exists due to the 13 AnalysisDeque.add((context, call site))
application-only ambiguity problem described in sub- 14 end
section II-D: Call sites can not always resolve their 15 end
argument types and should instead be identified by their 16 end
argument count, but this introduces plurality which we 17 return G = (V, E)
keep track of through unique_dict.
While every Preprocessor must cache method_dict and We clarify a few aspects. First, similar to Preproces-
unique_dict, advanced algorithms require more structures sor, Generator is also an abstract class. It requires two
to be cached. Our Java CHA Preprocessor builds two addi- abstract methods to be implemented: seek_call_sites
tional structures: and resolve. seek_call_sites finds new call sites
1) package_importables: Contains the mapping to be added to the analysis from call containers. resolve
from package to its exports (e.g., fully quantified classes, determines the fully quantified function(s) that a call site
interfaces, and enums). This aids full type resolution. corresponds to.
2) class_cache: Contains the mapping from each class The generation algorithm keeps the following two runtime
to its fields, subclasses, and method signatures. The structures:
cached fields and method signatures help resolve com- 1) AnalysisDeque: An explicit deque is used to keep
plex receivers. The cached subclasses mappings effec- track of the call sites to resolve. By default, the algo-
tively form a class hierarchy, which the generator uses rithm automatically adds call sites contained by meth-
to handle polymorphism. ods. This explicitness, however, allows users to add call
Both of these structures are created in a similar manner to sites contained by non-method call containers.
how method_dict was created — a parallelizable operation 2) context: Caches data-flow and points-to analysis re-
that runs on each file. All of these structures are then passed sults required by VTA and k-CFA.
to the Generator. Our CHA generator must implement seek_call_sites
and resolve. Our generator seeks call sites by
Generator returning all instances of method_invocation and
The Generator creates the call graph from the specified object_creation_expression nodes within call
source files. Similar to the previous section, we first describe containers.
the I/O behavior of the Generator. Then, we discuss the Implementing resolve requires more sophistication. We
internal flow of the Generator before concluding with how show just the logic to resolve a method_invocation
we can implement the hypothetical CHA Generator. node, which looks like receiver.method(...). The
The inputs to the Generator are the source files, entry receiver may be of any expression. In JDK8, we found that
points, and the Preprocessor results. By default, the entry implicit/explicit this and identifiers make up 88% of the re-
points are all methods (the keys of method_dict). Users ceivers while field_access and method_invocation

257

Authorized licensed use limited to: Fr C Rodrigues Institute of Technology Vashi. Downloaded on August 25,2025 at 09:05:04 UTC from IEEE Xplore. Restrictions apply.
Soot OSA SPOON JCG WALA JDT NR NR-ALL SCHA SCHA-ALL
Soot 34574 44.6% 41.3% 42.4% 16.4% 44.0% 59.8% 69.5% 1.7% 25.5%
OSA 45.8% 33685 45.5% 51.1% 12.8% 56.5% 56.4% 65.0% 1.4% 25.5%
SPOON 87.7% 94.3% 16263 82.7% 27.0% 92.2% 64.6% 76.6% 2.5% 40.1%
JCG 64.3% 75.4% 58.9% 22834 17.8% 74.4% 53.8% 62.1% 2.1% 35.7%
WALA 100.0% 75.8% 77.4% 71.5% 5668 71.3% 66.2% 73.3% 6.1% 41.1%
JDT 78.1% 97.8% 77.1% 87.3% 20.8% 19463 70.5% 81.9% 2.1% 33.6%
NR 5.4% 4.9% 2.7% 3.2% 1.0% 3.6% 384315 100.0% 0.2% 3.5%
NR-ALL 5.5% 5.0% 2.9% 3.2% 1.0% 3.7% 88.0% 436726 0.2% 3.4%
SCHA 18.0% 14.1% 12.5% 14.6% 10.7% 12.5% 23.6% 24.4% 3242 100.0%
SCHA-ALL 14.5% 14.2% 10.8% 13.4% 3.8% 10.8% 21.9% 24.7% 5.3% 60677

TABLE II: Common calls of the ArgoUML project between 6 existing generators and 4 of our generators.

nodes only made up 8%. A call site whose receiver is of type measure our speed performances against existing tools, we
field_access will look like a.b.c, where the receiver is cannot directly compare the call graph generation times. Speed
a.b.; a call site whose receiver is a method invocation node depends on multiple factors, the most important being the
looks like a.b().c, where the receiver is a.b(). preciseness of the algorithm. Given that the algorithms of
In all of these scenarios, we must first resolve the full type the evaluated generators are quite different, we first present
of the identifier a. We implement this by walking upwards edge comparisons across all tools. This establishes a baseline
from the identifier to find where it was introduced. We scan level of precision that serves to validate the more important
the declaration statements, method arguments, and class fields. measurements.
However, we can only expect to retrieve the shorthand type We borrow the edge results of six open-source Java gener-
from these declarations because Java supports type aliasing. ators from Jász et al. [24]. They evaluated existing tools on
To resolve the full type, we build a list of possible alias to ArgoUML, a sufficiently large, UML diagramming application
full type mappings by analyzing the import statements and written in Java. They thoughtfully performed full-software, in-
utilizing package_importables. Then, we determine the between, and application-only analysis. Because our genera-
full type by querying the aliased type against the built list. tors are application-only, we care only for the application-only
If the query yields no results, the aliased type is probably findings, which are shown in Table II.
defined by the libraries, which the application-only scope The cells along the diagonal represents the number of edges
ignores. Once the receiver’s full type is resolved, we address the corresponding generator created. The other cells represents
polymorphism by identifying the type’s subclasses through a the ratio of the edges discovered by the row generator that were
query to class_cache. also discovered by the column generator. For example, 44.6%
With the full type of the identifier a determined, call sites of Soot’s edges were also discovered by OSA. The last four
like a.b() can be easily resolved. To resolve scenarios like rows and columns show the edge performances of our tool.
a.b.c() and a.b().c, we just have to utilize the lookup The “ALL” suffix signifies that a generator considers the entry
structures generated by the Preprocessor a bit more. points to be the set of all methods.
We have covered the high level overview of building a As expected, the NR generators have high recall but low
Java CHA generator with ACER. Evidently, much of the user precision — they contain, on average, 52.1% of the edges
implemented logic deals with type resolution, which comes that other tools generated. But other tools, on average, only
with freely with IR. contain 3% of the edges NR generated. Conversely, the SCHA
IV. E XAMPLES AND E VALUATION generators have higher precision but lower recall.
Generating more edges seems to correlate with a faster
Under ACER, we implemented a NR generator and a SCHA
and sounder algorithm while generating less edges seem to
generator. We elaborate on our SCHA algorithm.
imply a slower and more precise algorithm. This observation
SCHA (Simple Class Hierarchy Analysis) is indeed true for our sound NR tool, which had the most
Our SCHA generator is simple in comparison to the hy- edges, and WALA, which ran an approximation of the APA
pothetical CHA generator of section III. First, our generator algorithm. In reality, this is not always the case. The difference
uses aliases, so collisions will occur if two methods belong in choosing which language features to handle drastically
to classes of the same alias and have the same name and change the edges. In the case the SCHA tool, it generated the
number of arguments. Second, only call sites with a simple least edges due to its simplicity (e.g., only resolving call sites
identifier receiver like a.b() and those with no callers are with identifier receivers) and not its algorithm rigor (which
considered. Call sites with complex receivers are ignored (e.g., falls behind WALA’s APA and Soot’s CHA).
(builder.make_class()).method(). We admit that further detailed analysis on why the edges
differ should be conducted. But, with this anchor set, we now
Evaluation turn to discuss speed performances.
Our evaluation goals revolve around measuring framework- We evaluated the time it took for JCG, WALA (in CHA
centric metrics, e.g., speed and lines-of-code. However, to mode), NR-all, and SCHA-all to operate on JDK8’s jar, on

258

Authorized licensed use limited to: Fr C Rodrigues Institute of Technology Vashi. Downloaded on August 25,2025 at 09:05:04 UTC from IEEE Xplore. Restrictions apply.
Generator Preprocessing Time Generation Time 2022 for contributing to the initial version of the project. This
NR 00:32 01:23 research has been supported in part by the NSF CCF-2311469,
SCHA 00:59 02:29 CNS-2132281, CCF-2007246, and CCF-1955853. We also
JCG 20:00 00:07 acknowledge support from Cisco Systems. Any opinions,
WALA 20:00 01:50 findings, and conclusions expressed herein are the authors’
and do not necessarily reflect those of the sponsors.
TABLE III: Generator speeds in mm:ss
R EFERENCES
[1] S. Muchnick, Advanced Compiler Design and Implementation. San
a machine with a Intel® Core™ i7-11800H processor. We Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1998.
[2] V. Prokhorenko, K.-K. R. Choo, and H. Ashman, “Web application
zoomed in on JCG because it most closely resembles NR-ALL protection techniques,” vol. 60, no. C, p. 95–112, jan 2016. [Online].
in terms of straightforwardness and WALA (in CHA mode Available: https://doi.org/10.1016/j.jnca.2015.11.017
instead of APA) because it most closely resembles SCHA. At [3] J. M. Spivey, “Fast, accurate call graph profiling,” Softw: Pract. Exper.,
vol. 34, pp. 249–264, 2004.
last, we consider all methods as entry points because JDK8 is [4] A. Feldthaus, T. Millstein, A. Møller, M. Schäfer, and F. Tip,
a library and does not have main methods. “Tool-supported refactoring for javascript,” SIGPLAN Not., vol. 46,
Per the table, other tools outperform our tools in terms of no. 10, p. 119–138, oct 2011. [Online]. Available: https://doi.org/10.
1145/2076021.2048078
generation speed. But, this can be reasonably attributed to the [5] A. Feldthaus, M. Schäfer, M. Sridharan, J. Dolby, and F. Tip, “Efficient
fact that IR has already fully quantified all entities. For a fair construction of approximate call graphs for javascript ide services,” 2013
comparison, we must consider the preprocessing time it took 35th International Conference on Software Engineering (ICSE), 2013.
[6] D. Grove and C. Chambers, “A framework for call graph construction
to build the heavy JDK8 jar, which takes 20 minutes. For our algorithms,” ACM Trans. Program. Lang. Syst., vol. 23, no. 6, p.
tools, we attribute preprocessing time to the Preprocessor and 685–746, nov 2001. [Online]. Available: https://doi.org/10.1145/506315.
generation time to the Generator. With the preprocessing time 506316
[7] F. Tip and J. Palsberg, “scalable propagation-based call graph construc-
considered, our tools perform much faster. tion algorithms,” SIGPLAN Not., vol. 35, pp. 281–293, 2000.
At last, we note that the framework itself, including utilities, [8] L. O. Andersen and P. Lee, “Program analysis and specialization
contains 800 lines of code in Python. The NR generator was for the c programming language,” 2005. [Online]. Available: https:
//api.semanticscholar.org/CorpusID:20876553
written in only 100 lines and the SCHA took 300 lines. [9] M. Reif, M. Eichberg, B. Hermann, J. Lerch, and M. Mezini, “Call
The additional lines in SCHA went towards generating class graph construction for java libraries,” Proceedings of the 2016 24th
hierarchy and resolving the full type of identifiers. We actually ACM SIGSOFT International Symposium on Foundations of Software
Engineering, 2016.
wrote two incomplete CHA generators for Java and Python [10] K. Ali and O. Lhoták, “application-only call graph construction,” pp.
that attempted to resolve all expression types. They came 688–712, 2012.
out to around 3K lines — most lines going towards type [11] “Doxygen,” accessed: 2023-07-05. [Online]. Available: http://www.
doxygen.nl/
resolution. Thus, we plan to research further in designing [12] A. F. N. Koknat, “callgraph,” 2023, gitHub repository. [Online].
language-agnostic abstractions for assisting entity resolution. Available: https://github.com/koknat/callGraph
As mentioned before, the ideal solution is a source format that [13] D. Fraser, “Pyan,” 2021, gitHub repository. [Online]. Available:
https://github.com/davidfraser/pyan
is fully quantified wherever possible, resilient to errors, and [14] V. Salis, T. Sotiropoulos, P. Louridas, D. Spinellis, and D. Mitropoulos,
does not require full compilation and dependency building. “Pycg: practical call graph generation in python,” 2021 IEEE/ACM 43rd
International Conference on Software Engineering (ICSE), 2021.
V. C ONCLUSION AND F UTURE W ORK [15] T. Lindholm, F. Yellin, G. Bracha, and A. Buckley, The Java
Virtual Machine Specification, Java SE 8 Edition. Pearson
In this paper, we presented ACER, an AST-based call graph Education, 2014. [Online]. Available: https://books.google.com/books?
generator framework. First, we introduced the parameters of id=0o-AAwAAQBAJ
call graph generators and highlighted implementation chal- [16] L. Project. (2023) Clang: a c language family frontend for llvm.
[Online]. Available: https://clang.llvm.org/
lenges. Then, we discussed the process in building a hypo- [17] M. Corporation. (2023) Roslyn (.net compiler platform). [Online].
thetical CHA generator under ACER. At last, we evaluated Available: https://github.com/dotnet/roslyn
our Java NR and SCHA generators against existing, IR-based [18] F. S. Foundation. (2023) Gnu compiler collection. [Online]. Available:
https://gcc.gnu.org/
Java generators. [19] IBM Corporation, “WALA - T.J. Watson libraries for analysis,” https:
Here is a list of items we plan to address in our future work: //github.com/wala/WALA, 2019.
1) Conduct an in-depth analysis on edge outputs of differ- [20] Sable Research Group, “Soot: A java bytecode optimization framework,”
https://soot-oss.github.io/soot/, 2023.
ent generators on JDK8. Build a table similar to Table II [21] K. Ali, X. Lai, Z. Luo, O. Lhotak, J. Dolby, and F. Tip, “A study of
and attribute edges variations to specific design choices. call graph construction for jvm-hosted languages,” Ieee Transactions on
2) Build abstractions to assist name resolution and full Software Engineering, vol. 47, pp. 2644–2666, 2021.
[22] T. sitter Contributors, “Tree-sitter: An incremental parsing system
quantification. for programming tools,” 2021. [Online]. Available: https://github.com/
3) Build generators for other languages to demonstrate the tree-sitter/tree-sitter
multilinguality of the framework. [23] B. H. Liskov and J. M. Wing, “A behavioral notion of subtyping,”
ACM Trans. Program. Lang. Syst., vol. 16, no. 6, p. 1811–1841, nov
1994. [Online]. Available: https://doi.org/10.1145/197320.197383
VI. ACKNOWLEDGEMENTS [24] J. Jász, I. Siket, E. Pengő, Z. Ságodi, and R. Ferenc, “systematic
We would like to thank the CSci 435 students Vinny Allegra, comparison of six open-source java call graph construction tools,” 2019.
Daniel Lee, Aamir Mohammed, and Chas Rinne from Fall

259

Authorized licensed use limited to: Fr C Rodrigues Institute of Technology Vashi. Downloaded on August 25,2025 at 09:05:04 UTC from IEEE Xplore. Restrictions apply.

Application-Only Call Graphs
No ratings yet
Application-Only Call Graphs
25 pages
Evaluation of Static Javascript Call Graph Algorithms
No ratings yet
Evaluation of Static Javascript Call Graph Algorithms
76 pages
Finite Models of Abstraction
No ratings yet
Finite Models of Abstraction
59 pages
Lecture 10 - January 25, 2010
No ratings yet
Lecture 10 - January 25, 2010
3 pages
WALA: Java Program Analysis Library
No ratings yet
WALA: Java Program Analysis Library
93 pages
Interprocedural Control-Flow Analysis: Nate Nystrom CS 711 6 Sep 05
No ratings yet
Interprocedural Control-Flow Analysis: Nate Nystrom CS 711 6 Sep 05
19 pages
A General Approach To Modeling Java Framework Behaviors
No ratings yet
A General Approach To Modeling Java Framework Behaviors
3 pages
Understanding JavaScript Language Insights and Applications
No ratings yet
Understanding JavaScript Language Insights and Applications
7 pages
Lecture 5
No ratings yet
Lecture 5
33 pages
Improving Code Quality Using The Roslyn Compiler API
No ratings yet
Improving Code Quality Using The Roslyn Compiler API
43 pages
Reflection Analysis For Java
No ratings yet
Reflection Analysis For Java
22 pages
Type-Based Analysis and Applications
No ratings yet
Type-Based Analysis and Applications
8 pages
C++ Network Programming: Systematic Reuse With ACE & Frameworks
No ratings yet
C++ Network Programming: Systematic Reuse With ACE & Frameworks
383 pages
Week-4 (Lecture-2)
No ratings yet
Week-4 (Lecture-2)
50 pages
Introduction to Software Architecture
100% (1)
Introduction to Software Architecture
15 pages
Chapter 2 (New)
No ratings yet
Chapter 2 (New)
45 pages
Review of GNU Cflow Functionality
No ratings yet
Review of GNU Cflow Functionality
4 pages
De-Virtualizing Virtual Function Calls Using Various Type Analysis Techniques in Object-Oriented Programming Languages
No ratings yet
De-Virtualizing Virtual Function Calls Using Various Type Analysis Techniques in Object-Oriented Programming Languages
9 pages
Unit 4 CD
No ratings yet
Unit 4 CD
21 pages
SOFT3406 Week11 2
No ratings yet
SOFT3406 Week11 2
30 pages
08 StaticAnalysis
No ratings yet
08 StaticAnalysis
39 pages
Admi Ftcorba
No ratings yet
Admi Ftcorba
41 pages
SAD Ch05 08 Software Design Patterns
No ratings yet
SAD Ch05 08 Software Design Patterns
52 pages
Agile Software Architecture Guide
No ratings yet
Agile Software Architecture Guide
36 pages
Assignment CS - 701
No ratings yet
Assignment CS - 701
9 pages
Mal & Rev
No ratings yet
Mal & Rev
193 pages
CS504 Short Notes: Want To Read All 40 Pages?
No ratings yet
CS504 Short Notes: Want To Read All 40 Pages?
1 page
Eclipse Control Flow Graph Plugin
No ratings yet
Eclipse Control Flow Graph Plugin
31 pages
Architectural Styles
No ratings yet
Architectural Styles
65 pages
SVF: LLVM Static Value-Flow Tool
No ratings yet
SVF: LLVM Static Value-Flow Tool
5 pages
Fault Taxonomy: Types and Examples
No ratings yet
Fault Taxonomy: Types and Examples
47 pages
Black-Box Testing Using Flowgraphs
No ratings yet
Black-Box Testing Using Flowgraphs
45 pages
UML Basics and System Architecture
100% (1)
UML Basics and System Architecture
101 pages
2-A Case of Dynamic Program Analysis - CS510 Software Engineering
No ratings yet
2-A Case of Dynamic Program Analysis - CS510 Software Engineering
42 pages
ROSE Tutorial
No ratings yet
ROSE Tutorial
430 pages
Module 3 Final 2
No ratings yet
Module 3 Final 2
21 pages
ROSE Tutorial
No ratings yet
ROSE Tutorial
432 pages
Software Component Reuse Guide
No ratings yet
Software Component Reuse Guide
11 pages
Software Evolution 2
No ratings yet
Software Evolution 2
29 pages
Component Technology Notes 2 PDF
No ratings yet
Component Technology Notes 2 PDF
26 pages
Assignment No 01:: Name: Fiza Saif Roll No: SE-080 Batch: 2019 Year: 2021 Second Year Department: Software Engineering
No ratings yet
Assignment No 01:: Name: Fiza Saif Roll No: SE-080 Batch: 2019 Year: 2021 Second Year Department: Software Engineering
9 pages
Software Component Design Guide
No ratings yet
Software Component Design Guide
25 pages
Rajib Mall Lecture Notes
No ratings yet
Rajib Mall Lecture Notes
73 pages
PYQs Unit 3 CD
No ratings yet
PYQs Unit 3 CD
34 pages
L8 Architecture
No ratings yet
L8 Architecture
44 pages
Com 310 Cat 1
No ratings yet
Com 310 Cat 1
6 pages
DevOps Static Code Analysis Guide
No ratings yet
DevOps Static Code Analysis Guide
11 pages
Stochastic Regular Expressions
No ratings yet
Stochastic Regular Expressions
16 pages
Service-Based CASE Model & Integration
No ratings yet
Service-Based CASE Model & Integration
22 pages
Case Studies
No ratings yet
Case Studies
41 pages
Software Tools and Environments
No ratings yet
Software Tools and Environments
5 pages
CodeSense 1
No ratings yet
CodeSense 1
8 pages
Connectors
No ratings yet
Connectors
40 pages
Hanyang - SE
No ratings yet
Hanyang - SE
72 pages
The InetAddress Class Has No Visible Constructors
0% (1)
The InetAddress Class Has No Visible Constructors
18 pages
Compiler Construction Past Paper 2022 Solution
No ratings yet
Compiler Construction Past Paper 2022 Solution
12 pages
Module-3 Handwritten Notes
No ratings yet
Module-3 Handwritten Notes
19 pages
Module-5 Handwritten Notes
No ratings yet
Module-5 Handwritten Notes
23 pages
Module-1 Handwritten Notes
No ratings yet
Module-1 Handwritten Notes
35 pages
Module-2 Handwritten Notes
No ratings yet
Module-2 Handwritten Notes
38 pages
Java Programming
No ratings yet
Java Programming
9 pages
Kotlin Docs
No ratings yet
Kotlin Docs
228 pages
IJACSA - Volume 3 No. 7, July 2012
No ratings yet
IJACSA - Volume 3 No. 7, July 2012
156 pages
WebAssembly vs. Linux Containers in Kubernetes
No ratings yet
WebAssembly vs. Linux Containers in Kubernetes
90 pages
125 Basic Core Java Iinterview Questions
No ratings yet
125 Basic Core Java Iinterview Questions
9 pages
Java Computer Programming Language
No ratings yet
Java Computer Programming Language
196 pages
Reference Manual Astah UML Professional
No ratings yet
Reference Manual Astah UML Professional
427 pages
Starting Out With Java: 7 Edition
No ratings yet
Starting Out With Java: 7 Edition
58 pages
Comptel Test Automation Framework: Installation Guide
No ratings yet
Comptel Test Automation Framework: Installation Guide
26 pages
Application Programming Interface (API)
100% (1)
Application Programming Interface (API)
8 pages
Object Oriented Programming CS F213: BITS Pilani
No ratings yet
Object Oriented Programming CS F213: BITS Pilani
26 pages
Camel Microservices for Developers
No ratings yet
Camel Microservices for Developers
67 pages
Getting Started HANAexpress VM
No ratings yet
Getting Started HANAexpress VM
20 pages
Maximo Finr Tune Guide
No ratings yet
Maximo Finr Tune Guide
72 pages
Infor ION Grid Administration Guide
No ratings yet
Infor ION Grid Administration Guide
89 pages
Photran 7.0 User's Guide PDF Version: Mariano M Endez September 28, 2011
No ratings yet
Photran 7.0 User's Guide PDF Version: Mariano M Endez September 28, 2011
98 pages
Advance OOP - 021814
No ratings yet
Advance OOP - 021814
64 pages
Ramchandra - Corejava (Update)
No ratings yet
Ramchandra - Corejava (Update)
603 pages
Advanced Programming Language 2 Lecture 0 - Course Introduction
No ratings yet
Advanced Programming Language 2 Lecture 0 - Course Introduction
563 pages
System Software Testing Manual
No ratings yet
System Software Testing Manual
86 pages
Building and Deploying Reliable High-Performance Java Applications
No ratings yet
Building and Deploying Reliable High-Performance Java Applications
4 pages
Analisis Dan Perancangan Aplikasi Sistem Informasi Simpan Pinjam Berbasis Android Versi 2.3 Pada Koperasi Credit Union Jakarta
No ratings yet
Analisis Dan Perancangan Aplikasi Sistem Informasi Simpan Pinjam Berbasis Android Versi 2.3 Pada Koperasi Credit Union Jakarta
10 pages
Java Virtual Machine Overview
No ratings yet
Java Virtual Machine Overview
34 pages
OOp With Java-Module1 Notes
No ratings yet
OOp With Java-Module1 Notes
52 pages
Job Portal Documentation
85% (26)
Job Portal Documentation
88 pages
Cheat Sheet Rebel JVM Options
No ratings yet
Cheat Sheet Rebel JVM Options
1 page
Introduction To Compiler Construction: Robert Van Engelen
No ratings yet
Introduction To Compiler Construction: Robert Van Engelen
13 pages
Java Programming Solved Papers
No ratings yet
Java Programming Solved Papers
98 pages
Java
100% (3)
Java
119 pages
(Ebook) Java 17 For Absolute Beginners by Iuliana Cosmina ISBN 9781484270790, 9781484270806, 1484270797, 1484270800
No ratings yet
(Ebook) Java 17 For Absolute Beginners by Iuliana Cosmina ISBN 9781484270790, 9781484270806, 1484270797, 1484270800
81 pages

ACER An AST-based Call Graph Generator Framework

Uploaded by

ACER An AST-based Call Graph Generator Framework

Uploaded by

2023 IEEE 23rd International Working Conference on Source Code Analysis and Manipulation (SCAM)

ACER: An AST-based Call Graph Generator

I. I NTRODUCTION Paramters Examples

979-8-3503-0506-7/23/$31.00 ©2023 IEEE

You might also like