Bug #2089
install sysstat on idmecluster head node
0%
Description
Install sysstat to find out why the I/O is so slow.
History
#1 Updated by Jonathan Barber over 12 years ago
Use "rug" to install packages:
idmecluster:~ # rug sl
# | Status | Type | Name                                                     | URI
--+--------+------+----------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------
1 | Active | ZYPP | SUSE-Linux-Enterprise-Server-x86_64-10-0-20080129-181407 | iso:///?iso=/cluster/BACKUPS_FSC/SLES10_64.iso&alias=SUSE-Linux-Enterprise-Server-x86_64-10-0-20080129-181407
2 | Active | ZYPP | SUSE-Linux-Enterprise-Server-x86_64-10-0-20110325-102358 | dir:///cluster/home2/setupcds/x86/SLES10_x64/?alias=SUSE-Linux-Enterprise-Server-x86_64-10-0-20110325-102358
3 | Active | ZYPP | SUSE-Linux-Enterprise-Server-x86_64-10-0-20110325-151617 | iso:///?iso=/cluster/home2/setupcds/x86/SLES-10-AMD64-EM64T-CD1.iso&alias=SUSE-Linux-Enterprise-Server-x86_64-10-0-20110325-151617
Check the package is found:
idmecluster:~ # rug if "*sysstat*"
Waking up ZMD...Done
Catalog: dir:///cluster/home2/setupcds/x86/SLES10_x64/?alias=SUSE-Linux-Enterprise-Server-x86_64-10-0-20110325-102358
Name: sysstat
Version: 6.0.2-16.4
Arch: x86_64
Installed: No
Status: up-to-date
Installed Size: 421807
Summary: Sar and Iostat Commands for Linux
Description: <!-- DT:Rich --> Sar and Iostat commands for Linux. The sar command collects and reports system activity information. The iostat command reports CPU statistics and I/O statistics for TTY devices and disks. The information collected by sar and iostat can be saved in a binary file for future inspection. Both commands now support SMP machines when displaying CPU utilization.
Installation fails because service 1 doesn't exist, so delete it:
idmecluster:~ # rug sd 1
Successfully removed service 'iso:///?iso=/cluster/BACKUPS_FSC/SLES10_64.iso&alias=SUSE-Linux-Enterprise-Server-x86_64-10-0-20080129-181407'
Now install it (slow...):
idmecluster:~ # rug in sysstat
#2 Updated by Jonathan Barber over 12 years ago
"iostat -x 10" shows very high wait times on sda, but why?:
avg-cpu:  %user  %nice  %system  %iowait  %steal   %idle
           0.40   0.00    14.90   232.50    0.00  152.20

Device:  rrqm/s   wrqm/s   r/s      w/s  rsec/s    wsec/s  rkB/s     wkB/s  avgrq-sz  avgqu-sz    await   svctm   %util
sda        0.00     5.10  0.00   120.50    0.00   1004.00   0.00    502.00      8.33    124.52  1154.12    8.30  100.00
sdb        0.00  3399.50  0.10   132.40    0.80  28281.60   0.40  14140.80    213.45     95.42   719.99    5.83   77.28
sdc        0.00     0.00  0.00     0.00    0.00      0.00   0.00      0.00      0.00      0.00     0.00    0.00    0.00
dm-0       0.00     0.00  0.10  3535.20    0.80  28281.60   0.40  14140.80      8.00   4109.82  1162.51    0.22   77.28
dm-1       0.00     0.00  0.10  3535.20    0.80  28281.60   0.40  14140.80      8.00   4109.83  1162.51    0.22   77.28
dm-2       0.00     0.00  0.10  3534.40    0.80  28275.20   0.40  14137.60      8.00   4109.33  1162.63    0.22   77.28
dm-3       0.00     0.00  0.00     0.80    0.00      6.40   0.00      3.20      8.00      0.51   640.50  311.50   24.92
dm-4       0.00     0.00  0.00     0.00    0.00      0.00   0.00      0.00      0.00      0.00     0.00    0.00    0.00
(sdb and sdc are the multipath'd iSCSI block devices from the Clariion).
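To pick the slow devices out of a long iostat report, something like the following could help. It is a rough sketch: flag_slow_devices is a made-up helper name, the default 100 ms threshold is arbitrary, and it assumes the extended report has a "Device:" header row with "await" and "%util" columns, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of sysstat): read "iostat -x" output on stdin
# and print every device whose await (ms) exceeds the limit given as $1.
# Column positions are taken from the "Device:" header row instead of being
# hard-coded, since the column layout differs between sysstat versions.
flag_slow_devices() {
    awk -v limit="${1:-100}" '
        $1 == "Device:" {
            # Remember which column each metric lives in.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        # Skip blank lines, the avg-cpu header, and anything before the
        # first Device: header has been seen.
        NF == 0 || $1 == "avg-cpu:" || !("await" in col) { next }
        # Short rows (e.g. the avg-cpu values) do not reach the %util column.
        NF >= col["%util"] && $(col["await"]) + 0 > limit {
            printf "%s await=%s util=%s\n", $1, $(col["await"]), $(col["%util"])
        }
    '
}
```

It would be used as "iostat -x 10 3 | flag_slow_devices 500", with the threshold chosen to suit the storage in question.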
No swapping:
idmecluster:~ # sar -r 1 10
Linux 2.6.16.21-0.8-smp (idmecluster) 05/30/2012

06:37:35 PM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
06:37:36 PM     56748   7610408    99.26     41536  6825504   8393736       216     0.00        0
06:37:37 PM     56748   7610408    99.26     41544  6825496   8393736       216     0.00        0
06:37:38 PM     56024   7611132    99.27     41564  6825476   8393736       216     0.00        0
06:37:39 PM     56040   7611116    99.27     41564  6825476   8393736       216     0.00        0
06:37:40 PM     56148   7611008    99.27     41564  6825476   8393736       216     0.00        0
06:37:41 PM     56148   7611008    99.27     41564  6825476   8393736       216     0.00        0
06:37:42 PM     56156   7611000    99.27     41564  6825476   8393736       216     0.00        0
06:37:43 PM     56300   7610856    99.27     41568  6825472   8393736       216     0.00        0
06:37:44 PM     56968   7610188    99.26     41568  6825472   8393736       216     0.00        0

06:37:44 PM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
06:37:45 PM     54716   7612440    99.29     41576  6825464   8393736       216     0.00        0
Average:        56200   7610956    99.27     41561  6825479   8393736       216     0.00        0
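The 99% in %memused looks alarming, but in this output kbmemused includes buffers and page cache (kbcached alone is about 6.8 GB). A small sketch to work out what is left once those are subtracted; app_mem_pct is a hypothetical helper name, and the field positions assume the column layout shown above:

```shell
#!/bin/sh
# Hypothetical helper (not part of sysstat): read "sar -r" output on stdin and,
# from the "Average:" row, print the percentage of RAM still in use after
# buffers and page cache are subtracted.
# Assumed field order (as in the output above):
#   Average: kbmemfree kbmemused %memused kbbuffers kbcached ...
app_mem_pct() {
    awk '$1 == "Average:" {
        free = $2; used = $3; buffers = $5; cached = $6
        # Total RAM seen by sar is free + used.
        printf "%.1f\n", 100 * (used - buffers - cached) / (free + used)
    }'
}
```

Fed the Average row above, this prints 9.7, so only about a tenth of RAM is in use outside buffers and cache, which fits with the lack of swapping.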
Hmm, rpc.mountd, kjournald and pdflush show up in D state (the latter two only occasionally).
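To keep an eye on which processes are stuck in uninterruptible sleep, a filter along these lines could be used. It is only a sketch: d_state is a made-up helper name, and it expects input in the form produced by "ps -eo state,pid,comm" (one-letter state, PID, command name).

```shell
#!/bin/sh
# Hypothetical helper: read "state pid command" lines on stdin and print the
# PID and command name of every process in state D (uninterruptible sleep).
# The ps header row is ignored because its first field is "S", not "D".
# Only the first word of the command name is printed.
d_state() {
    awk '$1 == "D" { print $2, $3 }'
}
```

Run as "ps -eo state,pid,comm | d_state", repeated every few seconds, this shows whether the same processes stay blocked or only pass through D state briefly.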
#3 Updated by Jonathan Barber over 12 years ago
- Assignee set to Jonathan Barber
#4 Updated by Jonathan Barber over 12 years ago
nfs was blocking on all of the clients. Solved by "service nfsserver restart" on idmecluster.
Related?
rpc.mountd no longer in D state. iostat still shows high utilisation.
#5 Updated by Jonathan Barber over 12 years ago
Could the high wait time on the local disk be due to the lack of a battery-backed RAID controller, and therefore no write-back cache?
#6 Updated by Jonathan Barber over 12 years ago
sar isn't run from cron by default on SuSE. Enable and start the sysstat service:
chkconfig sysstat on
service sysstat start
#7 Updated by Jonathan Barber over 12 years ago
- Status changed from New to Closed
Machine crashed at Thu May 31 03:36:00 WEST 2012 after running smartctl to query local SCSI interface. Load is now more normal.