Tools and Techniques for Understanding Threading Behavior in Android* 
Dr. Ramesh Peri 
Principal Engineer & Architect of Android tools 
Intel Corporation, 
Austin, TX 78746 
Email: ramesh.v.peri@intel.com
Agenda
• Goals
• Tools Roundup
  • Intel VTune, Linux Perf, Nvidia System Profiler, Google Systrace, ARM DS-5
• Threading Examples
  • Simple Example
  • Simple Threading
  • Communicating Threads
  • Simultaneously Executing Threads
  • Lazy Thread
  • False Sharing
Goals
• Learn about performance analysis tools in Android
• Develop simple micro-benchmarks and run them under the control of a performance analysis tool
• Interpret and validate the data
• Understand the threading models used in Android
• Understand core-to-thread mapping
• Impact on responsiveness
• Impact on power and performance
Intel® VTune
Nvidia® System Profiler
Google Systrace
ARM DS-5
Linux Perf
Tools
                 VTune     Linux Perf   Nvidia System Profiler   Google Systrace
OTB Experience   hard      hard         hard                     good
Timeline         Yes       No           Yes                      Yes
Java Profiling   Yes       No           No                       No
Hotspot views    Yes       Yes          Yes                      No
h/w events       Yes       Limited      Limited                  No
OS Events        Limited   Yes          Limited                  Yes
Filtering        Yes       No           No                       No
Call stack       Yes       Yes          No                       No
Devices
             Nexus 7 New      Dell Venue 8       Nvidia Shield
Processor    ARM Snapdragon   Intel Merrifield   Nvidia A15
Frequency
Memory
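Simple Example
A Simple Example
    // Button handler: runs a CPU-bound loop (10^8 iterations) on the UI thread.
    public void showValue(View v) {
        int i, j, sum = 0;
        for (i = 0; i < 10000; i++)
            for (j = 0; j < 10000; j++)
                sum += i;   // the int total wraps around; the loop only exists to burn CPU time
        TextView tv = (TextView) findViewById(R.id.textView2);
        tv.setText(String.valueOf(sum));
    }

    // Button handler: clears the result field.
    public void clearValue(View v) {
        TextView tv = (TextView) findViewById(R.id.textView2);
        tv.setText("");
    }
textView2 (the field that displays the result)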
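As a quick check on the number this handler displays, the following plain-Java sketch repeats the loop with both an `int` and a `long` accumulator. It uses no Android classes, and the names `BusyLoopSum`, `intSum`, and `longSum` are made up for illustration. The true total is 10000 × (0 + 1 + ... + 9999) = 499,950,000,000, which does not fit in a 32-bit `int`, so the displayed value is that total wrapped modulo 2^32.

```java
// Illustrative sketch only: mirrors the slide's loop outside of Android.
final class BusyLoopSum {
    // Same accumulation as showValue(): a 32-bit int that silently wraps.
    static int intSum() {
        int sum = 0;
        for (int i = 0; i < 10000; i++)
            for (int j = 0; j < 10000; j++)
                sum += i;
        return sum;
    }

    // Same loop with a 64-bit accumulator, which can hold the true total.
    static long longSum() {
        long sum = 0;
        for (int i = 0; i < 10000; i++)
            for (int j = 0; j < 10000; j++)
                sum += i;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("int accumulator:  " + intSum());
        System.out.println("long accumulator: " + longSum());
    }
}
```

For a micro-benchmark the wrapped value is harmless, since only the time spent in the loop matters, but it is worth knowing when validating what the app prints.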
Performance Profile
All in one thread
Which core is being used?
Mostly using core 1
Which key press used core 0?
The second press used core 0
Simple Threading
    // Button handler: moves the long loop off the UI thread onto the AsyncTask thread pool.
    public void showValue(View v) {
        new LongOperation().executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR);
    }

    public void clearValue(View v) {
        TextView tv = (TextView) findViewById(R.id.textView2);
        tv.setText("CLEARED");
    }
Thread Code
    private class LongOperation extends AsyncTask<String, Void, String> {
        // Runs on a pool worker thread.
        @Override
        protected String doInBackground(String... params) {
            int sum = 0;
            for (int i = 0; i < 10000; i++)
                for (int j = 0; j < 10000; j++)
                    sum += i;
            return String.valueOf(sum);
        }

        // Runs on the UI thread once doInBackground() has returned.
        @Override
        protected void onPostExecute(String result) {
            TextView txt = (TextView) findViewById(R.id.textView2);
            txt.setText(result);
        }
    }
The Threading Picture for 10 key presses
5 threads created, each one handling 2 key presses
The Threading Picture for 15 key presses
Observation
• There are 5 worker threads
• Each one gets a piece of work in a round-robin fashion
• Did the thread really terminate right after the last key press?
• No; there were simply no samples from the thread after that point
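The five-worker behaviour can be reproduced outside Android with a plain `java.util.concurrent` pool. This is a minimal sketch, assuming a fixed pool of 5 threads as a stand-in for `AsyncTask.THREAD_POOL_EXECUTOR` (the pool size is taken from the observation above, not from the Android source). `PoolFanOut` and `distinctWorkers` are hypothetical names. Each simulated key press submits one task, and the sketch records which worker thread ran it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: a 5-thread pool standing in for AsyncTask.THREAD_POOL_EXECUTOR.
final class PoolFanOut {
    // Submits one task per simulated key press and returns how many
    // distinct worker threads ended up running those tasks.
    static int distinctWorkers(int keyPresses) {
        ExecutorService pool = Executors.newFixedThreadPool(5); // assumed pool size
        Set<String> workers = ConcurrentHashMap.newKeySet();
        for (int press = 0; press < keyPresses; press++) {
            pool.execute(() -> {
                workers.add(Thread.currentThread().getName());
                int sum = 0; // same loop shape as the slides, scaled down
                for (int i = 0; i < 1000; i++)
                    for (int j = 0; j < 1000; j++)
                        sum += i;
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return workers.size();
    }

    public static void main(String[] args) {
        System.out.println("10 presses -> " + distinctWorkers(10) + " worker threads");
        System.out.println("15 presses -> " + distinctWorkers(15) + " worker threads");
    }
}
```

A fixed pool starts a new worker for each of the first five submissions and queues the rest, so ten or fifteen presses still use five threads. Which worker picks up each queued task depends on which one goes idle first, and an idle worker is parked rather than terminated, which is consistent with a profiler seeing no samples from it.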
Communicating Threads
Communication Pattern
[Diagram: timelines of Thread1 and Thread2 alternating between busy and sleep, handing off to each other through the queues bqt2 and bqt1]
Communicating Threads
    public void showValue(View v) {
        // Two bounded queues (capacity 2) used to pass the turn between the threads.
        bqt1 = new LinkedBlockingQueue<String>(2);
        bqt2 = new LinkedBlockingQueue<String>(2);
        new Thread1().executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR);
        new Thread2().executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR);
    }

    public void clearValue(View v) {
        TextView tv = (TextView) findViewById(R.id.textView2);
        tv.setText("");
    }
[Screenshot labels: Output from thread2, Output from thread1]
Thread Code
    // Thread1 works first, then signals Thread2 through bqt2 and blocks on bqt1.
    private class Thread1 extends AsyncTask<String, Void, String> {
        @Override
        protected String doInBackground(String... params) {
            int sum = 0;
            for (int times = 0; times < 5; times++) {
                for (int i = 0; i < 10000; i++)
                    for (int j = 0; j < 10000; j++)
                        sum += i;
                try {
                    bqt2.put("1");   // hand the turn to Thread2
                } catch (InterruptedException intEx) {
                    System.out.println("Interrupted!");
                }
                try {
                    bqt1.take();     // sleep until Thread2 hands the turn back
                } catch (InterruptedException intEx) {
                    System.out.println("Interrupted!");
                }
            }
            return String.valueOf(sum);
        }

        @Override
        protected void onPostExecute(String result) {
            TextView txt = (TextView) findViewById(R.id.textView2);
            txt.setText(txt.getText() + " t1:" + result);
        }
    }

    // Thread2 blocks on bqt2 first, then works, then signals Thread1 through bqt1.
    private class Thread2 extends AsyncTask<String, Void, String> {
        @Override
        protected String doInBackground(String... params) {
            int sum = 0;
            for (int times = 0; times < 5; times++) {
                try {
                    bqt2.take();     // sleep until Thread1 has finished its turn
                } catch (InterruptedException intEx) {
                    System.out.println("Interrupted!");
                }
                for (int i = 0; i < 10000; i++)
                    for (int j = 0; j < 10000; j++)
                        sum += i;
                try {
                    bqt1.put("1");   // hand the turn back to Thread1
                } catch (InterruptedException intEx) {
                    System.out.println("Interrupted!");
                }
            }
            return String.valueOf(sum);
        }

        @Override
        protected void onPostExecute(String result) {
            TextView txt = (TextView) findViewById(R.id.textView2);
            txt.setText(txt.getText() + " t2:" + result);
        }
    }
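The hand-off between the two tasks can be checked without a profiler. The sketch below is a plain-Java approximation that uses `java.lang.Thread` in place of `AsyncTask` and replaces each busy loop with a log entry. The class name `PingPong` and the `order` log are assumptions added for illustration; the queue names and capacities follow the slides.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch: the two-queue ping-pong from the slides, with logging instead of busy loops.
final class PingPong {
    // Runs the hand-off for the given number of rounds and returns the order of turns.
    static List<String> run(int rounds) {
        BlockingQueue<String> bqt1 = new LinkedBlockingQueue<>(2);
        BlockingQueue<String> bqt2 = new LinkedBlockingQueue<>(2);
        List<String> order = new CopyOnWriteArrayList<>();

        Thread t1 = new Thread(() -> {
            try {
                for (int times = 0; times < rounds; times++) {
                    order.add("t1");   // stands in for Thread1's busy loop
                    bqt2.put("1");     // wake Thread2
                    bqt1.take();       // sleep until Thread2 has finished
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread t2 = new Thread(() -> {
            try {
                for (int times = 0; times < rounds; times++) {
                    bqt2.take();       // sleep until Thread1 has finished
                    order.add("t2");   // stands in for Thread2's busy loop
                    bqt1.put("1");     // wake Thread1
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", run(5)));
    }
}
```

Because each thread only proceeds after taking a token the other one has put, the turns strictly alternate, which is the pattern a timeline view should show.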
The Threading Picture for 5 key presses
Alternating Thread1 & Thread2
Zoomed in view
Simultaneously Executing Threads
Communication Pattern
[Diagram: timelines of the main thread, Thread1, and Thread2, each marked busy or sleep over time]
Communicating Threads
    public void showValue(View v) {
        new MasterThread().execute();
    }

    public void clearValue(View v) {
        TextView tv = (TextView) findViewById(R.id.textView2);
        tv.setText("");
    }
Output: 5 tuples of <output from thread1, output from thread2>
Thread Code
    // Master task: starts two workers per round and blocks until both have finished.
    // SlaveThread is not shown in the slides.
    private class MasterThread extends AsyncTask<String, Void, String> {
        @Override
        protected String doInBackground(String... params) {
            String result = "";
            for (int i = 0; i < 5; i++) {
                AsyncTask<String, Void, String> t1 =
                        new SlaveThread().executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR);
                AsyncTask<String, Void, String> t2 =
                        new SlaveThread().executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR);
                String res1 = null;
                try { res1 = t1.get(); }   // blocks until the first worker is done
                catch (InterruptedException e) { e.printStackTrace(); }
                catch (ExecutionException e) { e.printStackTrace(); }
                String res2 = null;
                try { res2 = t2.get(); }   // blocks until the second worker is done
                catch (InterruptedException e) { e.printStackTrace(); }
                catch (ExecutionException e) { e.printStackTrace(); }
                result = result + ":" + res1 + "," + res2;
            }
            return result;
        }

        @Override
        protected void onPostExecute(String result) {
            TextView txt = (TextView) findViewById(R.id.textView2);
            txt.setText(result);
        }
    }
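The same fork-then-wait shape can be sketched in plain Java with an `ExecutorService` and `Future.get()`. Since the slides do not show `SlaveThread`, the worker here (`busySum`, a scaled-down version of the earlier busy loop) is an assumption, as are the class name `SimultaneousWorkers` and the two-thread pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: a master that launches two workers per round and waits for both.
final class SimultaneousWorkers {
    // Hypothetical worker body: a scaled-down busy loop.
    static int busySum() {
        int sum = 0;
        for (int i = 0; i < 1000; i++)
            for (int j = 0; j < 1000; j++)
                sum += i;
        return sum;
    }

    // Builds the same ":res1,res2" string per round as the master task in the slides.
    static String runRounds(int rounds) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<Integer> worker = SimultaneousWorkers::busySum;
        StringBuilder result = new StringBuilder();
        try {
            for (int r = 0; r < rounds; r++) {
                // Both workers are submitted before the master blocks,
                // so they are free to run at the same time on different cores.
                Future<Integer> f1 = pool.submit(worker);
                Future<Integer> f2 = pool.submit(worker);
                result.append(':').append(f1.get()).append(',').append(f2.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(runRounds(5));
    }
}
```

The important detail is the ordering: submit both, then call `get()` on each. Calling `get()` right after the first submit would serialize the workers and the timeline would show them alternating instead of overlapping.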
The Threading Picture for two key presses
Thread1 & Thread2 executing at the same time
Lazy Threads
Communication Pattern
[Diagram: timeline of Thread1 alternating between 3 s of sleep and a busy period]
Lazy Thread
    public void showValue(View v) {
        new Thread().execute("");
    }

    public void clearValue(View v) {
        TextView tv = (TextView) findViewById(R.id.textView2);
        tv.setText("");
    }
Thread Code
    // Note: this AsyncTask subclass is named Thread and shadows java.lang.Thread inside the activity.
    private class Thread extends AsyncTask<String, Void, String> {
        @Override
        protected String doInBackground(String... params) {
            int sum = 0;
            for (int i = 0; i < 5; i++) {
                try {
                    synchronized (this) {
                        wait(3000);   // wait for 3 sec
                    }
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                for (int j = 0; j < 10000; j++)
                    for (int k = 0; k < 10000; k++)
                        sum += i;
            }
            return String.valueOf(sum);
        }

        @Override
        protected void onPostExecute(String result) {
            TextView txt = (TextView) findViewById(R.id.textView2);
            txt.setText(result);
        }
    }
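A timed `wait()` is allowed to return before the timeout (the `Object.wait` documentation mentions spurious wakeups), so a sleep phase that must last a fixed time is usually written as a loop against a deadline. The following plain-Java sketch shows that pattern. `LazyWait` and `sleepOnMonitor` are hypothetical names, and the short timeout in `main` is chosen only to keep the example quick; the slides use 3000 ms.

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch: a timed monitor wait that is robust against early wakeups.
final class LazyWait {
    // Blocks on a private monitor for at least timeoutMs and
    // returns the elapsed wall-clock time in milliseconds.
    static long sleepOnMonitor(long timeoutMs) {
        final Object lock = new Object();
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            synchronized (lock) {
                long remaining;
                // Re-check the deadline after every wakeup, spurious or not.
                while ((remaining = deadline - System.nanoTime()) > 0) {
                    TimeUnit.NANOSECONDS.timedWait(lock, remaining);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("slept for " + sleepOnMonitor(50) + " ms");
    }
}
```

While the thread is blocked in the wait it consumes no CPU time, so a sampling profiler records nothing for it during that interval.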
Performance view
Thread slept for 3 sec
False Sharing
What is False Sharing?
    struct {
        int x;
        int y;
    } v;

    /* sum & inc run in parallel */
    int sum(void)
    {
        int i, s = 0;
        for (i = 0; i < 1000000; ++i)
            s += v.x;
        return s;
    }

    void inc(void)
    {
        int i;
        for (i = 0; i < 10000000; ++i)
            v.y++;
    }
[Diagram: v.x and v.y sit next to each other in memory, so the same cache line is held both in the cache of the core running sum and in the cache of the core running inc]
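For readers following the Android examples, here is a rough Java rendering of the same reader/writer pair, with one thread only reading `x` and another only incrementing `y`. It is a sketch of the structure, not a benchmark: the JVM does not promise that the two fields share a cache line, and the JIT compiler may simplify these loops. The names `AdjacentFields`, `V`, and `run` are made up for illustration, and `x` is set to 1 (an assumption) so that the result is easy to check.

```java
// Illustrative sketch: one thread reads x while another increments y.
final class AdjacentFields {
    // Two plain int fields declared side by side, like the struct in the slide.
    static final class V {
        int x = 1;
        int y = 0;
    }

    // Runs sum and inc in parallel; returns { result of sum, final value of y }.
    static int[] run() {
        final V v = new V();
        final int[] s = new int[1];
        Thread sum = new Thread(() -> {
            int acc = 0;
            for (int i = 0; i < 1_000_000; i++)
                acc += v.x;          // only reads x
            s[0] = acc;
        });
        Thread inc = new Thread(() -> {
            for (int i = 0; i < 10_000_000; i++)
                v.y++;               // only writes y
        });
        sum.start();
        inc.start();
        try {
            sum.join();
            inc.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { s[0], v.y };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("sum = " + r[0] + ", y = " + r[1]);
    }
}
```

The two threads never touch the same variable, so the results are always correct. The cost of false sharing is purely in time: each write to `y` can invalidate the cache line the reader of `x` is using.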
False Sharing App
    // False Sharing: threads start at adjacent indices 0, 1, 2, 3
    public void showValue1(View v) {
        for (int i = 0; i < 256; i++)
            a[i] = 0;
        for (int i = 0; i < 4; i++)
            new Thread().executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR, String.valueOf(i));
    }

    // No False Sharing: threads start 64 elements apart (0, 64, 128, 192)
    public void showValue2(View v) {
        for (int i = 0; i < 256; i++)
            a[i] = 0;
        for (int i = 0; i < 4; i++)
            new Thread().executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR,
                    String.valueOf(i * 64));
    }
Thread Body
    // Note: this AsyncTask subclass is named Thread and shadows java.lang.Thread inside the activity.
    private class Thread extends AsyncTask<String, Void, String> {
        @Override
        protected String doInBackground(String... params) {
            int tid = Integer.parseInt(params[0]);   // starting index into the shared array a
            int lim = tid + 32;
            for (int j = 0; j < 1000; j++)
                for (int k = 0; k < 10000; k++)
                    for (int i = tid; i < lim; i += 4)   // 8 elements, stride 4
                        a[i] = a[i] + 1;
            return params[0];
        }

        @Override
        protected void onPostExecute(String result) {
            TextView txt = (TextView) findViewById(R.id.textView2);
            txt.setText(txt.getText() + " " + result + ":" + a[Integer.parseInt(result)]);
        }
    }
Memory Access Pattern of Threads
[Diagram: with false sharing, threads 1 to 4 write interleaved elements of the same cache lines; with no false sharing, each thread writes to its own cache lines]
In both cases the same amount of work is done
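The access pattern can be worked out from the loop bounds in the thread body. The sketch below does that arithmetic, assuming 64-byte cache lines, 4-byte `int` elements, and an array whose first element happens to start on a cache-line boundary. None of these is stated in the slides, and a real JVM array has a header that shifts the alignment. `CacheLineMap` and its methods are hypothetical names.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch: which cache lines each thread's loop touches, under stated assumptions.
final class CacheLineMap {
    static final int LINE_BYTES = 64; // assumed cache-line size
    static final int INT_BYTES = 4;   // size of a Java int

    // Cache lines written by a thread whose start index is tid
    // (same bounds as the thread body: tid .. tid+31, stride 4).
    static Set<Integer> linesTouched(int tid) {
        Set<Integer> lines = new TreeSet<>();
        for (int i = tid; i < tid + 32; i += 4)
            lines.add(i * INT_BYTES / LINE_BYTES);
        return lines;
    }

    // True if any two of the given start indices lead to a common cache line.
    static boolean anyLineShared(int... starts) {
        Set<Integer> seen = new TreeSet<>();
        for (int tid : starts)
            for (int line : linesTouched(tid))
                if (!seen.add(line))
                    return true;
        return false;
    }

    public static void main(String[] args) {
        for (int tid : new int[] {0, 1, 2, 3})
            System.out.println("start " + tid + " -> lines " + linesTouched(tid));
        for (int tid : new int[] {0, 64, 128, 192})
            System.out.println("start " + tid + " -> lines " + linesTouched(tid));
    }
}
```

Under these assumptions, start indices 0 to 3 all land on lines 0 and 1, while start indices 0, 64, 128, and 192 land on disjoint pairs of lines. Each thread performs the same eight increments per pass in both layouts.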
Sample App
Click falseS / Click NfalseS
Profile of the run
False Sharing vs. No False Sharing
Detailed view
False Sharing vs. No False Sharing
Same number of instructions executed
Conclusion
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY 
INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS 
ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS 
FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY 
RIGHT. 
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as 
SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those 
factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated 
purchases, including the performance of that product when combined with other products. 
Copyright © 2013, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. 
Optimization Notice 
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel 
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the 
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent 
optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are 
reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific 
instruction sets covered by this notice. 
Notice revision #20110804 
Legal Disclaimer & Optimization Notice 
47 Intel Confidential 
11/26/2014 
Copyright© 2013, Intel Corporation. All rights reserved. 
*Other brands and names are the property of their respective owners.