Welcome to:
Performance and Workload Management
Copyright IBM Corporation 2004 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
3.0.2
Unit Objectives
After completing this unit, students should be able to:
  Provide basic performance concepts
  Provide basic performance analysis
  Manage the workload on a system
  Work with the Performance Diagnostic Tool (PDT)
Performance Problems
What a fast machine!
The system is so slow today!
Performance is very often not objective!
Understand the Workload
Analyze the hardware:
  Model
  Memory
  Disks
  Network
Identify all the work performed by the system
Identify critical applications and processes:
  What is the system doing?
  What happens under the covers (for example, NFS mounts)?
Characterize the workload:
  Workstation
  Multiuser system
  Server
  Mixture of all of the above?
Critical Resources: The Four Bottlenecks
CPU: Number of processes, process priorities
Memory: Real memory, paging, memory leaks
Disk I/O: Disk balancing, types of disks, LVM policies
Network: NFS used to load applications, network type, network traffic
Identify CPU-Intensive Programs: ps aux
# ps aux
USER    PID   %CPU  %MEM  ...  STIME     TIME     COMMAND
root    516   98.2  0.0   ...  13:00:00  1329:38  wait
johnp   7570  1.2   1.0   ...  17:48:32  0:01     -ksh
root    1032  0.8   0.0   ...  15:13:47  78:37    kproc
root    1     0.1   1.0   ...  15:13:50  13:59    /etc/init

%CPU: Percentage of time the process has used the CPU
%MEM: Percentage of real memory
TIME: Total execution time
Identify High-Priority Processes: ps -elf
# ps -elf
F       S  UID  PID   PPID  C   PRI  NI  ...  TIME   CMD
200003  A  0    1     0     0   60   20  ...  13:59  init
240001  A  0    3860  1     0   60   20  ...  6:06   syncd
200001  A  299  7852  7570  24  72   20  ...  0:00   ps

PRI: Priority of the process
NI: Nice value
The smaller the PRI value, the higher the priority of the process. The average process runs at a priority of around 60. The NI value is used to adjust the process priority: the higher the nice value, the lower the priority of the process.
Basic Performance Analysis
1. Check CPU with sar -u. High CPU %?
     yes: possible CPU constraint
     no:  go on to check memory
2. Check memory with vmstat. High paging?
     yes: possible memory constraint
     no:  go on to check disk
3. Check disk with iostat. Disk balanced?
     yes: possible disk/SCSI constraint
     no:  balance disk
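The decision flow on this slide can be written down as a small shell function. This is only a sketch: next_step is a hypothetical helper name, and the three yes/no answers are assumed to come from your own reading of sar -u, vmstat, and iostat output.

```shell
# next_step HIGH_CPU HIGH_PAGING DISK_BALANCED
# Each argument is "yes" or "no", answered in the order the slide checks them.
# Hypothetical helper for illustration; it is not part of AIX.
next_step() {
    if [ "$1" = yes ]; then
        echo "possible CPU constraint"
    elif [ "$2" = yes ]; then
        echo "possible memory constraint"
    elif [ "$3" = yes ]; then
        echo "possible disk/SCSI constraint"
    else
        echo "balance disk"
    fi
}

# CPU is fine, but vmstat shows high paging:
next_step no yes no
```

The checks are ordered, so a "yes" for CPU ends the analysis before memory or disk are considered, just as in the flow above.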
Monitoring CPU Usage: sar -u
# sar -u 60 30          (60 = interval in seconds, 30 = number of samples)

AIX www 1 5 000400B24C00 06/06/01

08:24:10  %usr  %sys  %wio  %idle
08:25:10    48     0     0     52
08:26:10    63     0     0     37
08:27:10    59     0     0     41
.
.
Average     57     0     0     43
A system is CPU bound, if: %usr + %sys > 80%
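As a rough illustration, the 80% rule can be applied to one line of sar -u output. The function name is_cpu_bound is hypothetical; the threshold is the one stated on this slide.

```shell
# is_cpu_bound USR SYS
# Prints "yes" when %usr + %sys exceeds 80, otherwise "no".
# Hypothetical helper; awk is used so that fractional values also work.
is_cpu_bound() {
    awk -v usr="$1" -v sys="$2" 'BEGIN {
        if (usr + sys > 80) print "yes"; else print "no"
    }'
}

# Values from the Average line of the sample report (%usr 57, %sys 0):
is_cpu_bound 57 0
```

For the sample report the answer is "no", since 57 + 0 is well below the threshold.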
Simultaneous Multi-Threading (SMT)
Each chip appears as a two-way SMP to software
Appears as 2 logical CPUs
Performance tools may show the number of logical CPUs
Processor resources are optimized for enhanced SMT performance
May result in a 25-40% boost, or even more
Benefits vary based on workload
To enable: smtctl [ -m off | on [ -w boot | now]]
Monitoring Memory Usage: vmstat
# vmstat 5              (summary report every 5 seconds)

kthr   memory        page                       ...  cpu
-----  ------------  -------------------------       -----------
r  b   avm    fre    re  pi  po  fr  sr   cy    ...  us sy id wa
0  0   8793   81     0   0   0   1   7    0     ...  1  2  95 2
0  0   9192   66     0   0   16  81  167  0     ...  1  6  77 16
0  0   9693   69     0   0   53  95  216  0     ...  1  4  63 33
0  0   10194  64     0   21  0   0   0    0     ...  20 5  42 33
0  0   4794   5821   0   24  0   0   0    0     ...  5  8  41 46

pi, po: Paging space page-ins and page-outs. If any paging-space I/O is taking place, the workload is approaching the system's memory limit.
wa: I/O wait percentage of CPU. If nonzero, a significant amount of time is being spent waiting on file I/O.
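The pi/po rule can be checked mechanically on a single vmstat data line. The sketch below assumes that pi and po are the sixth and seventh fields of the line, as in the column order shown on this slide (r b avm fre re pi po ...); paging_activity is a hypothetical helper name.

```shell
# paging_activity "VMSTAT_DATA_LINE"
# Reports whether the line shows any paging-space page-ins or page-outs.
# Assumption: pi is field 6 and po is field 7 of the data line.
paging_activity() {
    printf '%s\n' "$1" | awk '{
        if ($6 + $7 > 0)
            print "paging-space I/O: pi=" $6 " po=" $7
        else
            print "no paging-space I/O"
    }'
}

# Fourth sample line from the slide (the columns elided as "..." come
# after field 7, so leaving them out does not change the result):
paging_activity "0 0 10194 64 0 21 0 0 0 0 20 5 42 33"
```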
Monitoring Disk I/O: iostat
# iostat 10 2

tty:   tin    tout     avg-cpu:  %user  %sys  %idle  %iowait
       0.0    4.3                0.2    0.6   98.8   0.4

Disks:   %tm_act  Kbps   tps   Kb_read  Kb_wrtn
hdisk0   0.1      0.2    0.0   7993     4408
hdisk1   0.0      0.0    0.0   0        0
cd0      0.0      0.0    0.0   0        0

(The first report shows cumulative activity since last reboot.)

tty:   tin    tout     avg-cpu:  %user  %sys  %idle  %iowait
       0.0    110.7              7.0    59.4  0.0    33.7

Disks:   %tm_act  Kbps   tps   Kb_read  Kb_wrtn
hdisk0   77.9     115.7  28.7  456      8
hdisk1   0.0      0.0    0.0   0        0
cd0      0.0      0.0    0.0   0        0
A system is I/O bound, if: %iowait > 25%, %tm_act > 70%
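A minimal sketch of the two thresholds, applied to values read from an iostat report. The function name io_check is hypothetical. Because the slide lists the two conditions side by side, the sketch reports each one separately rather than assuming how they combine.

```shell
# io_check IOWAIT TM_ACT
# Prints whether %iowait exceeds 25 and whether %tm_act exceeds 70.
# Hypothetical helper; thresholds are the ones given on this slide.
io_check() {
    awk -v iowait="$1" -v tm_act="$2" 'BEGIN {
        printf "iowait_high=%s tm_act_high=%s\n",
            (iowait > 25 ? "yes" : "no"),
            (tm_act > 70 ? "yes" : "no")
    }'
}

# Second report of the sample: %iowait 33.7, hdisk0 %tm_act 77.9
io_check 33.7 77.9
```

For the second sample report both thresholds are exceeded, which matches the slide's point that hdisk0 is the busy disk.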
topas
[Screenshot: topas display, with areas labeled CPU info, iostat info, and vmstat info]
topas, vmstat, and iostat Enhancements for Micro-Partitioning (AIX 5.3)
topas: two new values added to the default screen, Physc and %Entc
vmstat: two new metrics, pc and ec
iostat: two new metrics, %physc and %entc
AIX Performance Tools
Identify causes of bottlenecks:
CPU bottlenecks: processes using CPU time (tprof)
Memory bottlenecks: processes using memory (svmon)
I/O bottlenecks: file systems, LVs, and files causing disk activity (filemon)
AIX Tools: tprof
# tprof -x sleep 60
# more _prof.all        (this file is created by tprof)

Process         PID    TID    Total  Kernel  User  Shared  Other
wait            516    517    6855   6855    0     0       0
netscape_aix4   23494  40015  201    27      29    145     0
lslpp           17566  43613  11     5       4     2       0

Process         FREQ   Total  Kernel  User  Shared  Other
wait            1      6855   6855    0     0       0
netscape_aix4   5      961    122     139   700     0
ksh             46     77     64      7     6       0
AIX Tools: svmon
Global report:

# svmon -G
            size    inuse   free    pin    virtual
memory      32744   20478   12266   2760   11841
pg space    65536   294

            work    pers    clnt
pin         2768    0       0
in use      13724   6754    0

Sizes are in # of 4K frames

Top 3 users of memory:

# svmon -Pt 3
Pid     Command  Inuse  Pin   Pgsp  Virtual  64-bit  Mthrd
14624   java     6739   1147  425   4288     N       Y
9292    httpd    6307   1154  205   3585     N       Y
3596    X        6035   1147  1069  4252     N       N

* output has been modified
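Because svmon reports sizes in 4K frames, a quick conversion makes the numbers easier to read. The helper below is a sketch; frames_to_mb is a hypothetical name, and it assumes 4 KB per frame as stated on the slide.

```shell
# frames_to_mb FRAMES
# Converts a count of 4 KB frames to megabytes (1 MB = 1024 KB),
# printed with one decimal place. Hypothetical helper.
frames_to_mb() {
    awk -v frames="$1" 'BEGIN { printf "%.1f\n", frames * 4 / 1024 }'
}

# Real memory size from the svmon -G sample (32744 frames):
frames_to_mb 32744
```

Applied to the sample, 32744 frames of real memory come to roughly 128 MB and the 65536 frames of paging space to 256 MB.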
AIX Tools: filemon
# filemon -o fmout      Starts monitoring disk activity
# trcstop               Stops monitoring and creates report
# more fmout

Most Active Logical Volumes
util   #rblk   #wblk   KB/s    volume        description
0.03   3368    888     26.5    /dev/hd2      /usr
0.02   0       1584    9.9     /dev/hd8      jfslog
0.02   56      928     6.1     /dev/hd4      /

Most Active Physical Volumes
util   #rblk   #wblk   KB/s    volume        description
0.10   24611   12506   231.4   /dev/hdisk0   N/A
0.02   56      8418    52.8    /dev/hdisk1   N/A
There Is Always a Next Bottleneck!
Our system is I/O bound. Let's buy faster disks!
# iostat 10 60
Our system is now memory bound! Let's buy more memory!
# vmstat 5
Oh no! The CPU is completely overloaded!
# sar -u 60 60
Workload Management Techniques (1 of 3)
Run programs at a specific time
# echo "/usr/local/bin/report" | at 0300
# echo "/usr/bin/cleanup" | at 1100 friday

# crontab -e
0 3 * * 1-5 /usr/local/bin/report

crontab fields, in order: minute, hour, day_of_month, month, weekday, command
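To make the field order concrete, the sketch below splits a crontab entry into the six fields named on this slide. describe_cron_entry is a hypothetical helper for illustration only; cron itself does the real parsing.

```shell
# describe_cron_entry "CRONTAB_LINE"
# Splits one crontab entry into its six fields and labels them.
# read assigns everything after the fifth field to "command", so
# commands with arguments stay intact. Hypothetical helper.
describe_cron_entry() {
    printf '%s\n' "$1" | {
        read -r minute hour day_of_month month weekday command
        printf 'minute=%s hour=%s day_of_month=%s month=%s weekday=%s command=%s\n' \
            "$minute" "$hour" "$day_of_month" "$month" "$weekday" "$command"
    }
}

# The entry from the slide: 03:00 on weekdays 1-5 (Monday to Friday)
describe_cron_entry '0 3 * * 1-5 /usr/local/bin/report'
```

The single quotes around the entry keep the shell from expanding the asterisks before the function sees them.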
Workload Management Techniques (2 of 3)
Sequential execution of programs
# vi /etc/qconfig
ksh:
        device = kshdev
        discipline = fcfs
kshdev:
        backend = /usr/bin/ksh

# qadm -D ksh               Queue is down
# qprt -P ksh report1       Jobs will be queued
# qprt -P ksh report2
# qprt -P ksh report3
# qadm -U ksh               Queue is up: jobs will be executed sequentially
Workload Management Techniques (3 of 3)
Run programs at a reduced priority
# nice -n 15 backup_all &
# ps -el
F       S  UID  PID   PPID  C   PRI  NI  ...  TIME  CMD
240001  A  0    3860  2820  30  90   35  ...  0:01  backup_all

PRI 90: Very low priority
NI 35: Nice value: 20+15

# renice -n -10 3860
# ps -el
F       S  UID  PID   PPID  C   PRI  NI  ...  TIME  CMD
240001  A  0    3860  2820  26  78   25  ...  0:02  backup_all
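The NI values in the two ps listings follow from simple addition, sketched below. The default nice value of 20 is taken from the slide ("20+15"); ni_after_adjust is a hypothetical helper, and the sketch does not model any limits the system places on the nice range.

```shell
# ni_after_adjust CURRENT_NI INCREMENT
# Returns the nice value after nice -n or renice -n applies INCREMENT.
# Hypothetical helper; range limits are deliberately not modeled.
ni_after_adjust() {
    echo $(( $1 + $2 ))
}

DEFAULT_NI=20
ni_started=$(ni_after_adjust "$DEFAULT_NI" 15)    # nice -n 15 backup_all
ni_reniced=$(ni_after_adjust "$ni_started" -10)   # renice -n -10 3860

echo "after nice: $ni_started, after renice: $ni_reniced"
```

This reproduces the NI column of the slide: 35 after the process is started with nice, 25 after renice.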
Next Step
Exercise 11: System Performance
Performance Diagnostic Tool (PDT)
PDT assesses the current state of a system and tracks changes in workload and performance.
Balanced use of resources
Operation within bounds
Identify workload trends
Error-free operation
Changes should be investigated
Appropriate setting of system parameters
Enabling PDT
# /usr/sbin/perf/diag_tool/pdt_config
-----------PDT customization menu-----------
1) show current   PDT report recipient and severity level
2) modify/enable  PDT reporting
3) disable        PDT reporting
4) modify/enable  PDT collection
5) disable        PDT collection
6) de-install     PDT
7) exit pdt_config
Please enter a number: 4
cron Control of PDT Components
# cat /var/spool/cron/crontabs/adm

0 9 * * 1-5 /usr/sbin/perf/diag_tool/Driver_daily
    Collect system data, each workday at 9:00

0 10 * * 1-5 /usr/sbin/perf/diag_tool/Driver_daily2
    Create a report, each workday at 10:00

0 21 * * 6 /usr/sbin/perf/diag_tool/Driver_offweekly
    Clean up old data, each Saturday evening
PDT Files
Collection
  Driver_ daily
  /var/perf/cfg/diag_tool/.collection.control
Clean Up
  Driver_ offweekly
  /var/perf/cfg/diag_tool/.retention.control
  .retention.list (35 days)
Reporting
  Driver_ daily2
  /var/perf/cfg/diag_tool/.reporting.control
Files in /var/perf/tmp
  /var/perf/tmp/.SM
  /var/perf/tmp/.SM.discards
  /var/perf/tmp/.SM.last
  /var/perf/tmp/PDT_REPORT (adm)
  /var/perf/tmp/PDT_REPORT.last (next day)
Customizing PDT: Changing Thresholds
# vi /var/perf/cfg/diag_tool/.thresholds

Type   Name                   Current Value  Valid Range
(int)  DISK_STORAGE_BALANCE   800            [0:10000 MB]
(int)  NUMBER_OF_BALANCE      1              [0:10000]
(int)  FS_UTIL_LIMIT          90             [0:100%]
...
Customizing PDT: Specific Monitors
Files and directories to monitor:

# vi /var/perf/cfg/diag_tool/.files
/var/adm/wtmp
/var/spool/qdaemon/
/var/adm/ras/
/tmp/

Machines to monitor:

# vi /var/perf/cfg/diag_tool/.nodes
pluto
neptun
mars
PDT Report Example (Part 1)
Performance Diagnostic Facility 1.0
Report printed: Wed Jun 6 14:37:07 2001
Host name: master
Range of analysis included measurements
  from: Hour 14 on Monday 4th June 2001
  to: Hour 9 on Wednesday 6th June

Alerts

I/O CONFIGURATION
- Note: volume hdisk2 has 480 MB available for allocation
  while volume hdisk1 has 0 MB available

PAGING CONFIGURATION
- Physical Volume hdisk1 (type:SCSI) has no paging space defined

I/O BALANCE
- Physical volume hdisk0 is significantly busier than others
  volume hdisk0, mean util. = 11.75
  volume hdisk1, mean util. = 0.00

NETWORK
- Host sys1 appears to be unreachable
PDT Report Example (Part 2)
Upward Trends

FILES
- File (or directory) /var/adm/ras/ SIZE is increasing
  now, 364 KB and increasing an avg. of 5282 bytes/day

FILE SYSTEMS
- File system lv01(/fs3) is growing
  now, 29.00% full, and growing an avg. of 0.30%/day
  At this rate lv01 will be full in about 45 days

ERRORS
- Hardware ERRORS; time to next error is 0.982 days

System Health

SYSTEM HEALTH
- Current process state breakdown:
  2.10 [0.5%]: waiting for the CPU
  89.30 [22.4%]: sleeping
  306.60 [77.0%]: zombie
  398.00 = TOTAL

Summary

This is a severity level 1 report
No further details available at severity level >1
Next Step
Exercise 12: PDT
Checkpoint
1. What command can be executed to identify CPU-intensive programs?
2. What command can be executed to start processes with a lower priority?
3. What command can you use to check paging I/O?
4. T/F: The higher the PRI value, the higher the priority of a process.
Unit Summary
The following commands can be used to identify potential bottlenecks in the system:
  ps
  sar
  vmstat
  iostat
If you cannot fix a performance problem, manage your workload through other means (at, crontab, nice, renice).
Use PDT to assess and control your system's performance.