Solving ANRs 101: Diving into the Android framework
embrace.io
Executive summary
      We recently released an eBook about why the complex mobile ecosystem – and
      the Android mobile ecosystem in particular – makes it so difficult to identify,
      prioritize, and solve Application Not Responding (ANR) errors. We explored the
      causes and impact on your app and business. In this eBook, we’ll dive deeper
      into the technical causes of ANRs so your team can rapidly identify and solve
      them.
      While Android’s documentation might read as all-encompassing, our engineers
      – driven by curiosity and innovation – have dug deep into the source code itself
      to unlock important insights about these critical errors. In fact, what they’ve
      found has thoroughly busted the most common myths about ANRs, and we’re
      excited to share those insights with you here.
      You likely know that an ANR is “officially” triggered when the main thread has
      been blocked for at least 5 seconds. But, do you know what the source code
      actually says about ANRs for specific Android components such as activities,
      services, and broadcast receivers? (Hint: It’s not always 5 seconds!) Do you
      know exactly how the Google Play Console collects and reports ANR data
      compared to Firebase Crashlytics, and why it matters during troubleshooting?
             In this eBook, we’ll help you better understand
             the relationship between ANRs and the Android
             framework, including:
             • How the Android framework truly measures
               and triggers ANRs for Android components,
               including activities, services, and broadcast
               receivers.
             • The limitations of Google Play Console and
               Crashlytics when monitoring, identifying, and
               solving ANR errors.
             • How Embrace can better help you identify and
               eliminate ANRs.
             Let’s start with a quick recap.
How the Android framework measures ANRs
Quick recap: Why ANRs matter
The Google Play Store is extremely particular about user experience. It sets a strict threshold requiring that a
maximum of only 0.47% of daily active users can experience an ANR. (They’ve also recently added a second bad
behavior threshold, where only 8% of daily active users on a single device model can experience an ANR.) If you
exceed either of these thresholds, you can expect:
• A lower Play Store ranking, meaning less visibility in search results.
• Negative reviews, which impact your business by making user acquisition increasingly difficult.
• User churn, resulting from frustration associated with a frozen app experience.
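To make the two thresholds concrete, here is a minimal sketch of how a team might check its own daily numbers against them. The class and method names are hypothetical, the user counts are made up, and the threshold values are simply the ones quoted above, so treat this as an illustration rather than how the Play Store itself computes the rate.

```java
/** Hypothetical helper that compares ANR rates with the thresholds quoted above. */
final class AnrThresholdCheck {
    // Values as quoted in this eBook; confirm the current values in the Play Console.
    static final double OVERALL_THRESHOLD = 0.0047;   // 0.47% of daily active users
    static final double PER_DEVICE_THRESHOLD = 0.08;  // 8% on a single device model

    /** Fraction of daily active users who experienced at least one ANR. */
    static double anrRate(long usersWithAnr, long dailyActiveUsers) {
        if (dailyActiveUsers <= 0 || usersWithAnr < 0) {
            throw new IllegalArgumentException("invalid user counts");
        }
        return (double) usersWithAnr / dailyActiveUsers;
    }

    /** True if the app-wide rate is above the overall threshold. */
    static boolean exceedsOverall(long usersWithAnr, long dailyActiveUsers) {
        return anrRate(usersWithAnr, dailyActiveUsers) > OVERALL_THRESHOLD;
    }

    /** True if the rate for one device model is above the per-device threshold. */
    static boolean exceedsPerDevice(long usersWithAnrOnModel, long dailyActiveUsersOnModel) {
        return anrRate(usersWithAnrOnModel, dailyActiveUsersOnModel) > PER_DEVICE_THRESHOLD;
    }

    public static void main(String[] args) {
        // Made-up numbers for illustration only.
        System.out.println("overall exceeded: " + exceedsOverall(50, 10_000));
        System.out.println("device model exceeded: " + exceedsPerDevice(70, 1_000));
    }
}
```

Because either threshold can be exceeded on its own, it is worth tracking the per-device-model rate separately from the app-wide rate.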
The consequences of exceeding Google’s ANR threshold can cascade throughout your whole organization.
When you have negative reviews, don’t show up in search results, or provide a poor user experience, your user
engagement and revenue are impacted. You’ll have fewer purchases (both in-app and at checkout, in the case of
e-commerce apps). Your users will engage less as they experience frustration. And you’ll have fewer installs, which
can taint the rest of your brand: a compounding issue that can slow momentum for other apps in your company’s
portfolio.
Here’s a quick snapshot of how high ANR rates can impact your business:
IMPACT OF ANR                                    IMPACT OF ANR ON REVENUE
Lower Google Play Store ranking                  Lower visibility and fewer organic installs
Poor user experience                             Less engagement and decrease in user retention
User flow interruptions                          Fewer in-app purchases
Negative app store reviews                       Poor brand perception and impact on your other apps
Reminder: ANRs are not crashes
It’s easy to confuse crashes with ANRs. This is especially true because Google provides data as if ANRs were
crashes. While we’ll get into that detail later in this eBook, for now, it’s critical to remember that ANRs and crashes
are not the same. Mistaking the two as one and the same can result in a ton of wasted time chasing down root
causes that are totally unrelated. As a quick refresher:
• A crash is a coding error that kills your app immediately.
• An ANR is a frozen experience for your user.
According to the Android documentation, Google will report an ANR with an associated stack trace if the main
thread is blocked for 5 seconds*. We know it can be helpful to see examples from the real world, so here’s an
example of what an ANR and crash look like in an e-commerce app.
      E-commerce app crash flow
      A user goes to the checkout screen and the app crashes.
      [Figure: Product Page → Checkout Screen → Crash]
      THE CAUSE: A bug in the checkout code causes the crash due to an uncaught exception or a C signal.
      THE RESULT: Google provides a stack trace, which makes it easy to identify the cause of the crash and
      address the issue.
      E-commerce app ANR flow
      A user goes to the checkout screen and the screen is unresponsive.
      [Figure: Product Page → Checkout Screen → ANR]
      THE CAUSE: The ANR could result from many possible root causes, including:
      • A heavy UI layout blocking the main thread for too long.
      • A misbehaving third-party SDK.
      • Background services blocking the main thread.
      THE RESULT: At the end of 5 seconds of non-responsiveness, Google will provide you with a stack trace
      to indicate an ANR. Unfortunately, that stack trace alone is rarely sufficient to get at the root cause of the
      ANR.
* While Android documentation states an ANR is triggered after 5 seconds, Embrace engineers have found exceptions to this “rule” within Google’s own
source code. We’ll explain those exceptions and bust the myth around the 5-second ANR trigger in the following section.
How Google reports ANRs
The Android operating system monitors your main thread from a background thread. Once it detects that the main
thread is blocked, it starts an ANR timer on the background thread. According to the Android documentation, if that
timer exceeds a particular time threshold – 5 seconds – the Android OS triggers an ANR.
Because the Android OS treats ANRs the same way it treats crashes, it generates a single stack trace at the time
the ANR is triggered. Unfortunately, this means you don’t have insight into what has been happening from the
moment the ANR begins. We’ll dive further into why this is an issue later in this eBook.
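The monitoring idea described above can be sketched in plain Java. This is not the Android framework's implementation: a single-threaded executor stands in for the main thread, and the hypothetical `isBlocked` method plays the role of the background monitor by posting a no-op "heartbeat" task and waiting a limited time for it to run. The timeouts are scaled down so the demo finishes quickly.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative watchdog; a single-threaded executor stands in for the main thread. */
final class MainThreadWatchdog {
    /**
     * Posts a no-op heartbeat to the monitored thread and waits up to timeoutMillis
     * for it to run. Returns true if the heartbeat did not run in time.
     */
    static boolean isBlocked(ExecutorService mainThread, long timeoutMillis) {
        Future<?> heartbeat = mainThread.submit(() -> { });
        try {
            heartbeat.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false; // heartbeat ran, so the thread is responsive
        } catch (TimeoutException e) {
            heartbeat.cancel(false);
            return true;  // threshold exceeded while the thread was busy
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("watchdog interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("heartbeat failed", e);
        }
    }

    /** Simulates long-running work on the monitored thread. */
    static void busyFor(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            System.out.println("idle, blocked=" + isBlocked(mainThread, 200));
            mainThread.submit(() -> busyFor(1_000));
            System.out.println("busy, blocked=" + isBlocked(mainThread, 200));
        } finally {
            mainThread.shutdownNow();
        }
    }
}
```

Note that a check like this only answers "is the thread blocked right now?". Like the single stack trace described above, it says nothing about what the thread was doing when the freeze started.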
Here are some more insights into the differences between the two main tools Google uses to provide ANR data:
Firebase Crashlytics and the Google Play Console.
                             FIREBASE CRASHLYTICS                     GOOGLE PLAY CONSOLE
Type of data collected       Records the “exit reason” for a          Utilizes Android OS’s built-in trace file
                             process on the device each time an       mechanism and records on the device’s file
                             ANR occurs                               system
When the data is collected   Collected from Firebase for delivery     Up to 24-hour delay between ANR
                             when the user relaunches the app         occurring and diagnostic data showing in
                             after an ANR                             Play Console
Android versions supported   Android 11+                              All Android versions
The truth about ANRs and the limitations of Google Play Console and Crashlytics
What really triggers ANRs in Android and why it matters for you
If we gave you a pop quiz and asked, “What triggers ANRs?”, you’d technically be correct if you answered, “when the
main thread has been blocked for 5 seconds.” After all, it’s what the Android documentation says.
But at Embrace, we believe in a healthy skepticism. We are driven by a deep curiosity and a desire to help our
customers optimize their user experience. So, our engineers went beyond the documentation and closely studied the
Android source code. They found that what triggers ANRs in the Android environment is far more nuanced than what
the documentation says.
In fact, there are specific components and situations that are quite different from the “universal” 5-second rule.
That’s why we created this eBook – so you’ll have an easy reference manual and can understand the important
nuances.
As a quick refresher, you know there are four main components in Android: activities, services, broadcast receivers,
and content providers. Every time one of these components performs work on the main thread, the Android
framework creates a timer on a monitor thread. Instead of a universal 5-second ANR trigger, there are separate
checks and separate timing thresholds for each component. Knowing the real timing will help you optimize how you
code your apps, and how best to understand what might really be causing an ANR.
To help, let’s dive into a few of these components to see how the Android source code helped us bust the myth of
the 5-second ANR trigger.
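As a quick reference, the separate checks can be written down as a small lookup table. This is only a sketch: the enum and its constant names are hypothetical (they are not framework constants), and the durations are the ones this eBook reports for each component in the sections that follow.

```java
import java.time.Duration;

/** Timing checks as reported in this eBook; names are illustrative, not framework constants. */
enum AnrCheck {
    ACTIVITY_INPUT_DISPATCH(Duration.ofSeconds(5)),
    FOREGROUND_SERVICE_START(Duration.ofSeconds(10)),
    BACKGROUND_SERVICE_START(Duration.ofSeconds(20)),
    BROADCAST_RECEIVER(Duration.ofSeconds(10));

    private final Duration threshold;

    AnrCheck(Duration threshold) {
        this.threshold = threshold;
    }

    Duration threshold() {
        return threshold;
    }

    /** True if work that has taken `elapsed` so far is past this check's threshold. */
    boolean wouldTrigger(Duration elapsed) {
        return elapsed.compareTo(threshold) > 0;
    }

    public static void main(String[] args) {
        Duration elapsed = Duration.ofSeconds(7);
        for (AnrCheck check : values()) {
            System.out.println(check + " after " + elapsed.getSeconds() + "s: "
                    + check.wouldTrigger(elapsed));
        }
    }
}
```

The same 7 seconds of blocked time is past the threshold for one check and well within it for the others, which is the nuance a single "5-second rule" hides.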
Busting the myths about activities & ANRs
Recap: An activity is a component that hosts some sort of user interface for user interaction. For example, an email
app might have one activity that shows a list of new emails, another activity to compose an email, and yet another
activity for reading emails.
When do activities trigger an ANR? Activities trigger an ANR if input dispatching takes more than 5 seconds. The
Android OS interprets actions like touches, entries, or taps on the screen or keyboard as an “input event.” The OS
places these input events onto an input dispatch queue and processes them on the main thread, where a watchdog
thread checks whether the processing takes more than 5 seconds. If the main thread is blocked or busy, the input
dispatch queue will not empty within 5 seconds. At this point, the watchdog thread will trigger an ANR.
                                        MYTH
                                        There is a 5-second threshold for an ANR to be
                                        triggered for activities.
                                        REALITY
                                        There is actually a 5-second threshold for ANRs to be
                                        triggered, but the timer only starts after an input event
                                        has been dispatched!
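The distinction matters in practice: a main thread can be busy for a long time without an ANR if no input arrives. The sketch below models that behavior with a hypothetical `InputDispatchMonitor` class. It is a simplified model of the idea described above, not the framework's input dispatcher, and timestamps are passed in explicitly so the logic is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified model: the ANR timer runs only while an input event is waiting to be handled. */
final class InputDispatchMonitor {
    static final long TIMEOUT_MILLIS = 5_000; // threshold described above

    // Dispatch timestamps (ms) of input events the app has not finished handling yet.
    private final Deque<Long> pending = new ArrayDeque<>();

    /** Called when an input event is dispatched to the app. */
    void onInputDispatched(long nowMillis) {
        pending.addLast(nowMillis);
    }

    /** Called when the app finishes handling the oldest pending input event. */
    void onInputHandled() {
        pending.pollFirst();
    }

    /** True only if some dispatched event has been waiting longer than the timeout. */
    boolean isAnr(long nowMillis) {
        Long oldest = pending.peekFirst();
        return oldest != null && nowMillis - oldest > TIMEOUT_MILLIS;
    }

    public static void main(String[] args) {
        InputDispatchMonitor monitor = new InputDispatchMonitor();
        System.out.println("no input for 60s: " + monitor.isAnr(60_000));
        monitor.onInputDispatched(60_000);
        System.out.println("4s after a tap: " + monitor.isAnr(64_000));
        System.out.println("6s after a tap: " + monitor.isAnr(66_000));
    }
}
```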
Busting the myths about services & ANRs
Recap: A service is a component on Android which does not provide a user interface, and typically performs
long-running operations in the background. This can include a service that plays music in the background while
the user is in a different app, tracks location, performs a consistency check on the database at regular
intervals, or fetches data over the network without blocking user interaction with an activity.
When do services trigger an ANR? There are two conditions when an ANR can be triggered in a service. If a
foreground service does not call startForeground in under 10 seconds, the Android OS will trigger an ANR. As a
reminder, a foreground service is something which is typically performing work in the foreground, like music
playback. If the service is in the background and doesn’t start or bind in under 20 seconds, it will trigger an
ANR. This longer threshold is because it doesn’t have as high a priority on the CPU.
We believe the thresholds are higher than 5 seconds for services because users aren’t interacting with services
the same way they are with activities. Unlike the activities component, there’s no user input event required for
the services component.
                        MYTH
                        There is a 5-second threshold for an ANR to be triggered for services.
                        REALITY
                        There is actually a 10-second threshold for foreground services and a 20-second
                        threshold for background services!
What happens when the Android framework detects an ANR in any component?
• The Android framework schedules work in the component, and then starts a timer to continually check whether
  the work is completed within the allotted time period.
• If the ANR threshold is exceeded, the timer finishes its countdown, and the Android framework calls into the
  ANR helper from the monitor thread.
• The Android OS records a trace file, including a stack trace, and shows an ANR dialog to your user.
• Your user can decide to kill your app (in which case your process is terminated) or they can allow your app
  to continue waiting and see if it catches up.
Busting the myths about broadcast receivers & ANRs
Recap: A broadcast receiver is a component that enables the
system to deliver events to the app outside of a regular user
flow, allowing the app to respond to system-wide broadcast
announcements. This can encompass a broad array of functionality
including state changes or events on the device. Examples
include messaging apps that hook into SMS events and perform
functionality in the application, changes in network connection, or
locale changes that prompt language changes.
When do broadcast receivers trigger an ANR? Broadcast receivers trigger an ANR if they take longer than 10 seconds
to process a message. However, there is an interesting exception to this: when the Android OS is booting, a lot of
CPU work is underway, so ANRs attributed to early broadcasts could be false positives.
                         MYTH
                         There is a 5-second threshold for an ANR to be triggered by a broadcast receiver.
                         REALITY
                         There is actually a 10-second threshold for a broadcast receiver!
Why Google ANR reporting is insufficient for each Android component
We’ve busted the myth that ANRs are universally triggered for each component when the main thread is blocked for
5 seconds. As you know, all Android components typically run on the main thread, and often simultaneously. This
can make it difficult to identify which component is actually responsible for an ANR.
For example, any Android component can starve the main thread so that an activity has no time to respond to input
events. This can trigger an ANR within an activity that actually indicates a problem in a different component.
When you receive a stack trace from Google, it can be impossible to determine which component is responsible: it
depends on which component is running at the time the stack trace is generated. For example, if a broadcast
receiver blocks for 4.8 seconds and an activity then blocks for 0.3 seconds, the activity will be the culprit in
the ANR stack trace.
This can make ANR error reports highly misleading when viewed in isolation. Typical Android apps can have dozens
of components running at the same time. This means debugging the true cause of ANRs in the real world is often
even more difficult than in our relatively simple example above. Stack traces provided by Google aren’t helpful
because they don’t let you see what is happening from the moment the ANR begins and throughout the duration of
the ANR.
Easy tips for minimizing ANRs
• Minimize network operations on the main thread. They can take a long time to complete!
• Minimize file operations on the main thread. Offload them wherever possible to the background thread, and
  asynchronously wait on the results.
• Minimize synchronization on the main thread. The goal is to minimize the amount of waiting on locks and mutexes
  by confining it to the smallest space possible. This can be challenging depending on how you use concurrency
  and synchronization.
• Don’t forget about your Native Development Kit (NDK) code! If the NDK layer blocks the main thread, you’ll
  trigger an ANR if input dispatching times out.
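As an illustration of the file operations tip, here is a minimal plain-Java sketch that reads a file on a background executor and waits for the result asynchronously. The class name `BackgroundFileReader` and the file contents are hypothetical. On Android you would also hand the result back to the main thread before touching the UI (for example through a `Handler` on the main `Looper`), which is left out here to keep the example runnable off-device.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative helper: keeps blocking file I/O off the calling thread. */
final class BackgroundFileReader {
    /** Reads the whole file on ioExecutor; the caller gets a future instead of blocking. */
    static CompletableFuture<String> readAsync(Path file, Executor ioExecutor) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, ioExecutor);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("anr-demo", ".txt");
        Files.write(file, "cached settings".getBytes(StandardCharsets.UTF_8));
        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<Void> done = readAsync(file, io)
                    .thenAccept(text -> System.out.println("loaded " + text.length() + " chars"));
            System.out.println("caller is free to keep handling input");
            done.join(); // only so the demo does not exit before the read finishes
        } finally {
            io.shutdown();
            Files.deleteIfExists(file);
        }
    }
}
```

The same shape applies to the network tip: start the slow operation on a background executor and react to the result in a callback, rather than waiting for it on the main thread.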
[Figure: timeline from 00:00 to 00:06 showing a user input event near the start and the ANR trace file generated
near the end.]
As a visual example, picture the components running on the main thread as a series of cars driving on a road.
Google’s ANR stack trace would highlight the last blue car as the cause of the ANR. In reality, the freeze started
during the first red car. Collecting stack traces as soon as the app freezes is crucial for getting to the actual root
cause.
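The broadcast receiver and activity example from this section can be worked through in a few lines of code. This is a sketch under simplifying assumptions: the `Work` type and method names are hypothetical, times are in milliseconds since the input event was dispatched, and the single trace is assumed to be captured exactly at the 5-second mark.

```java
import java.util.Arrays;
import java.util.List;

/** Compares what a single trace at the timeout shows with what occupied the thread longest. */
final class AnrAttribution {
    /** One stretch of main-thread work; times are ms since the input event was dispatched. */
    static final class Work {
        final String component;
        final long startMillis;
        final long durationMillis;

        Work(String component, long startMillis, long durationMillis) {
            this.component = component;
            this.startMillis = startMillis;
            this.durationMillis = durationMillis;
        }

        long endMillis() {
            return startMillis + durationMillis;
        }
    }

    /** Component running at the instant the single ANR trace is captured, or null if idle. */
    static String runningAt(List<Work> timeline, long atMillis) {
        for (Work w : timeline) {
            if (w.startMillis <= atMillis && atMillis < w.endMillis()) {
                return w.component;
            }
        }
        return null;
    }

    /** Component whose single stretch of work occupied the thread longest before the capture. */
    static String longestBefore(List<Work> timeline, long atMillis) {
        String best = null;
        long bestMillis = 0;
        for (Work w : timeline) {
            long occupied = Math.min(w.endMillis(), atMillis) - w.startMillis;
            if (occupied > bestMillis) {
                bestMillis = occupied;
                best = w.component;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Work> timeline = Arrays.asList(
                new Work("broadcast receiver", 0, 4_800),
                new Work("activity", 4_800, 300));
        System.out.println("trace at 5s blames: " + runningAt(timeline, 5_000));
        System.out.println("longest blocker:    " + longestBefore(timeline, 5_000));
    }
}
```

The two answers differ, which is why a single stack trace taken at the timeout can point at the wrong component.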
How Embrace makes solving ANRs faster and easier
An alternative solution for ANRs
Embrace is a data-driven toolset to help mobile engineers build better experiences. Because of this mobile-centric
approach, we have a very different method of data collection.
Unlike event-based monitoring solutions which limit data collection and can only help you solve known issues,
Embrace collects 100% of the data from every user session to provide capabilities that were previously impossible.
With full visibility across every user experience (including both foreground and background sessions), Embrace
enables you to see complete technical and behavioral data, giving you the context to solve both known and
unknown issues.
5 ways Embrace enables rapid prioritization and resolution of ANRs
01      Embrace accurately detects ANR stack traces
        with superior data capture
When it comes to capturing ANR sample stack traces, the Google Play Console and other point solutions only show
a portion of the picture by capturing a stack trace roughly 5 seconds after the app first stops responding. This
means you’re seeing a sample stack trace taken long after the freeze began, which can lead to incorrect diagnosis.
The sampling for the Google Play Console may even be delayed beyond 5 seconds due to the load on the device during
the ANR, which means you may be missing valuable ANR stack trace data. Lastly, this method only captures data for
fatal ANRs and neglects non-fatal ANRs, which are equally vital for enhancing your user’s experience.
To accurately triage and resolve ANRs for good, it’s important to understand what code was running from the
moment the ANR is triggered to the end of the ANR interval. With Embrace intelligent ANR Reporting, teams can
auto-capture and surface a stack trace as soon as the main thread is blocked for 1 second, followed by auto-
collecting main thread stack traces every 100ms until the app recovers, force quits, or the ANR dialog appears. By
capturing these additional stack traces engineers can gain deep insights into the code execution and its evolution
throughout the ANR interval to get to the true root cause, all without introducing any unnecessary overhead. This
level of detail empowers engineers to quickly spot and resolve both fatal and non-fatal ANRs and ultimately drive
better user experiences.
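To show the general idea of sampling a blocked thread at intervals, here is a minimal plain-Java sketch. It is not Embrace's implementation: the class, method names, and parameters are hypothetical, an ordinary thread stands in for the main thread, and the intervals are scaled down (first sample after 100ms, then every 20ms) so the demo finishes quickly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Illustrative sampler: captures a thread's stack repeatedly while it stays blocked. */
final class BlockedThreadSampler {
    /**
     * Waits firstSampleAfterMillis, then captures target's stack every intervalMillis
     * until stillBlocked reports false or maxSamples have been collected.
     */
    static List<StackTraceElement[]> sampleWhileBlocked(Thread target, BooleanSupplier stillBlocked,
            long firstSampleAfterMillis, long intervalMillis, int maxSamples) {
        List<StackTraceElement[]> samples = new ArrayList<>();
        if (!pause(firstSampleAfterMillis)) {
            return samples;
        }
        while (stillBlocked.getAsBoolean() && samples.size() < maxSamples) {
            samples.add(target.getStackTrace());
            if (!pause(intervalMillis)) {
                break;
            }
        }
        return samples;
    }

    /** Sleeps for the given time; returns false if interrupted. */
    private static boolean pause(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Stand-in for slow work on the monitored thread. */
    static void slowWork(long millis) {
        pause(millis);
    }

    /** Runs a fake "main" thread that blocks for 600 ms and samples it. */
    static List<StackTraceElement[]> demo() {
        AtomicBoolean busy = new AtomicBoolean(true);
        Thread fakeMain = new Thread(() -> {
            slowWork(600);
            busy.set(false);
        }, "fake-main");
        fakeMain.start();
        List<StackTraceElement[]> samples = sampleWhileBlocked(fakeMain, busy::get, 100, 20, 5);
        try {
            fakeMain.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return samples;
    }

    public static void main(String[] args) {
        List<StackTraceElement[]> samples = demo();
        System.out.println("captured " + samples.size() + " samples");
        if (!samples.isEmpty() && samples.get(0).length > 0) {
            System.out.println("top frame of first sample: " + samples.get(0)[0]);
        }
    }
}
```

Having several samples across the interval, rather than one trace at the end, is what makes it possible to see how the blocked code path changes over time.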
02      Embrace filters out noisy ANRs with intelligent grouping
Finding which stack traces to focus on can also be challenging when you’re basing your decision solely on the
final stack trace. Oftentimes, the stack traces differ enough from one another that the root cause is hard to
identify. Because Embrace auto-collects stack traces throughout the entire ANR interval, teams gain access
to a broader range of stack trace samples, providing deeper insights into the underlying cause. Teams can filter
sample stack traces by “Most Representative”, “First Sample” or “Ad SDK” to help visualize the data in different
ways and quickly surface the right ANRs. When you select “Most Representative”, Embrace will scan and analyze
sample stack traces for you and group them by the most relevant method to identify the code sections likely
contributing to the ANR. Embrace then ranks them by volume, sessions impacted, or users impacted and maps
issues to a category (Ads or Concurrency) to help you cut out the noise and focus on what matters most for you and
your team.
03      Embrace’s powerful flame graphs let you drill down
        to the line of code
Even with advanced grouping, it may still be overwhelming to sift through stack traces. Using flame graph
visualizations, we make it easy to surface critical stack traces that contain known problematic methods. In the
flame graph, each span represents a method, and the length of the span represents how many sample stack traces
the method appears in. The wider and deeper the span, the more likely the code path contributed to an ANR.
Simply select a method (shown in red) to pinpoint which first- or third-party code may have contributed to the
ANR.
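The counting behind span length can be sketched as a simple aggregation: for each method, count how many sample stack traces it appears in. This is an illustration of the general flame graph idea, not Embrace's implementation. The class name and the method names in the sample data are hypothetical, and each sample is represented as a plain list of method names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative aggregation: how many sample stack traces does each method appear in? */
final class SampleAggregator {
    /** Counts each method at most once per sample, even if it recurs within that sample. */
    static Map<String, Integer> samplesPerMethod(List<List<String>> samples) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> sample : samples) {
            for (String method : new HashSet<>(sample)) {
                counts.merge(method, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical method names, one list per sampled stack trace.
        List<List<String>> samples = Arrays.asList(
                Arrays.asList("CheckoutActivity.onClick", "CartDb.query", "Lock.await"),
                Arrays.asList("CheckoutActivity.onClick", "CartDb.query"),
                Arrays.asList("CheckoutActivity.onClick", "AdSdk.load"));
        samplesPerMethod(samples)
                .forEach((method, n) -> System.out.println(n + " samples: " + method));
    }
}
```

Methods that appear in most samples are the natural starting points when deciding which code path to investigate first.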
Selecting a method will take you to the ANR Method Troubleshooting graph. This allows you to see all the code
paths that lead to the selected problematic method (“callers”) and the code paths following the method (“callees”)
to help you nail down the line of code that may be contributing to the ANR.
04      Embrace highlights what contributed to the
        ANR with complete out-of-the-box user session context
Capturing multiple stack traces may not be enough to pinpoint every ANR root cause. ANRs can also stem
from unpredictable factors in the device and from the user that you may need to take into account when
troubleshooting. Factors like failed network calls, low connectivity, heavy view rendering and more can lead to
an ANR. Embrace gives you the ability to easily pivot from a high priority sample stack trace in the flame graph or
summary view directly into sample affected user sessions (‘Sample Sessions’) to help you understand the ANR in
more depth.
With User Session Insights, Embrace collects all behavioral and technical user activity leading up to the ANR,
completely out-of-the-box, so engineers can quickly and accurately reproduce the ANR and understand how other
factors may be contributing to it. All this without wasting time cobbling together log, ANR, and product analytics
data. If you find that low connectivity, heavy view rendering, and bad code all contributed to the ANR, you can
quickly share session details with the right teams using out-of-the-box Jira and Slack integrations.
05      Embrace identifies patterns and trends in real-time with
        high-level ANR overview and proactive alerting
Embrace continuously analyzes millions of data points across your applications to help you proactively spot critical
ANRs before your users do. Our real-time alerting can help you separate important ANRs from the noise with
context-rich alerts that identify spikes and drops for critical ANR indicators. You can set up alerts for important
user flows, like payment flows, add to cart, or during paid advertisements to optimize user experiences for
revenue-generating moments.
With ANR Summary, anyone can get an out-of-the-box overview of critical ANR metrics like ANR-Free Sessions and
ANR-Free Users, Total ANRs, Affected Users and more in a single view. Teams can surface patterns and anomalies
by version and deployment across your user base.
Closing thoughts
        ANRs aren’t just annoying for users to encounter and for developers to debug.
        They’re critical errors that have an outsized effect on user engagement,
        acquisition, and retention, and can ultimately drag down your bottom line.
        In this eBook, we’ve taken you beyond the Android documentation, busted the
        myth of the universal 5-second ANR trigger, and provided key insights to help
        you stay above Google’s stringent bad behavior thresholds.
        By underlining the relationship between ANRs and the Android framework, we
        further explained issues with event-based monitoring tools and the need for a
        complete data-driven approach to building the best mobile experiences.
        While we hope this eBook serves as a valuable reference guide for future ANR
        debugging, Embrace can provide even more support through the use of our
        platform.
        From providing flame graphs that help you quickly get at the root cause of an
        ANR, to the ability to stitch multiple sessions together for deeper insight and
        analysis, Embrace is the best option for Android developers who care about
        providing superior mobile experiences.
        Learn how Embrace can help you get a handle on ANRs and optimize your app
        for greater visibility today.
Get started today with 1 million free user sessions.
Try Embrace free
Embrace is a data-driven toolset to help engineers manage the complexity of mobile. Using automated data
collection and a unified digital platform, Embrace reduces the toil of mining for insight across disparate tools.
Engineers can identify, prioritize, and resolve problems in their apps, while also surfacing opportunities to
perfect app performance and delight their end users. Learn more at embrace.io or follow Embrace on LinkedIn,
Facebook, or Twitter.
Contact
8569 Higuera St, Culver City, CA 90232
(424)-326-9004
contact@embrace.io
embrace.io