Purpose: to tune and configure our data direct disk array to maximize sustained transfers.
Pre-testing:
An ftp transfer from the H70 to a Winterhawk I produces read rates of 15-46 meg/sec. The higher number is a memory-to-network copy. A disk (non-cached) copy to the network (gig ethernet) takes about 15 seconds for a 292 meg file (about 19 meg/sec).
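The 19 meg/sec figure is simply file size divided by elapsed time. A minimal Python sketch of that arithmetic, assuming the roughly 15-second timing quoted above (the function name is illustrative, not part of any tool used in these tests):

```python
def throughput_meg_per_sec(size_meg, seconds):
    """Average transfer rate in meg/sec for size_meg moved in the given seconds."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return size_meg / seconds

# 292 meg file, about 15 seconds (figures from the pre-test above)
rate = throughput_meg_per_sec(292, 15)
print(f"{rate:.1f} meg/sec")
```

Because the 15 seconds is only approximate, the result should be read as "about 19 meg/sec" rather than an exact rate.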
TEST CASE 1:
setup:
3 different machines concurrently ftp a 400 meg file to /dev/null of swift
result:
test1: 22.17meg/s , 18.52meg/s , 26.27meg/s = (agg) 66.96 meg/sec
test2: 21.78meg/s , 18.76meg/s , 25.70meg/s = (agg) 66.24 meg/sec
test3: 24.08meg/s , 18.75meg/s , 24.16meg/s = (agg) 66.99 meg/sec
conclusion: H70's gig ethernet can sustain 60+ meg/sec real bandwidth
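The aggregate figures are the sum of the per-stream rates in each run. The sketch below, in Python, recomputes them from the numbers listed for this test case and averages the three runs. The variable names are illustrative only.

```python
# Per-stream rates (meg/sec) from the three runs of TEST CASE 1
runs = [
    (22.17, 18.52, 26.27),
    (21.78, 18.76, 25.70),
    (24.08, 18.75, 24.16),
]

# Aggregate for a run = sum of its concurrent streams
aggregates = [round(sum(streams), 2) for streams in runs]
mean_aggregate = sum(aggregates) / len(aggregates)

print(aggregates)
print(f"mean aggregate: {mean_aggregate:.2f} meg/sec")
```

All three runs land within about 1 meg/sec of each other, which is what makes the 60+ meg/sec conclusion reasonably safe.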
TEST CASE 2:
setup:
3 different machines concurrently ftp a 400 meg file to /testfs2 (fc filesystem)
raid 3, same filesystem, no caching
result:
test1: 2.43 meg/s , 1.81 meg/s , 1.92 meg/s = (agg) 6.16 meg/sec
test2: 1.72 meg/s , 1.66 meg/s , 1.74 meg/s = (agg) 5.12 meg/sec
test3: 2.41 meg/s , 2.08 meg/s , 3.44 meg/s = (agg) 7.93 meg/sec
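To put these numbers in context, the aggregates here can be compared with the network-only aggregates from TEST CASE 1. The following Python sketch does that using only the figures already listed. Treating the /dev/null runs as the ceiling is an assumption made for illustration.

```python
# Aggregate rates (meg/sec) copied from the results above
network_aggregates = [66.96, 66.24, 66.99]  # TEST CASE 1, writes to /dev/null
disk_aggregates = [6.16, 5.12, 7.93]        # TEST CASE 2, writes to /testfs2

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Fraction of the network-only rate that the disk writes achieved
fraction = mean(disk_aggregates) / mean(network_aggregates)
print(f"disk writes reach {fraction:.1%} of the network-only rate")
```

By this measure the three-way write to a single raid 3 filesystem runs at roughly a tenth of what the network can deliver.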
TEST CASE 3:
setup: 2 different machines ftp a 400 meg file
again raid 3, same filesystem, no caching
test1: 4.16 meg/s , 3.93 meg/s = (agg) 8.09 meg/sec
test2: 4.02 meg/s , 4.15 meg/s = (agg) 8.17 meg/sec
test3: 24.59 meg/s , 17.53 meg/s = (agg) 42.12 meg/sec
#first removed all files
test4: 4.16 meg/s , 4.04 meg/s = (agg) 8.20 meg/sec
test5: 9.99 meg/s , 8.90 meg/s = (agg) 18.89 meg/sec
test6: 3.01 meg/s , 3.00 meg/s = (agg) 6.01 meg/sec
test7: 23.81 meg/s , 6.16 meg/s = (agg) 29.97 meg/sec
#again, removed all files
NOTE: The tests above all used raid 3 with no caching. Raid 5
was actually 1-2 meg/sec slower, and caching did not seem to matter
in most tests. From now on all tests use raid 3 with caching
enabled.
setup: 2 different machines ftp a 400 meg file to same filesystem on a
single LUN (hdisk)
test1: 22.68 meg/sec, 22.49 meg/sec = (agg) 45.17 meg/sec
# note: did a rm of all files in fs before#
test2: 5.05 meg/sec, 4.36 meg/sec = (agg) 9.41 meg/sec
test3: 3.32 meg/sec, 3.44 meg/sec = (agg) 6.76 meg/sec
test4: 21.75 meg/sec, 5.80 meg/sec = (agg) 27.55 meg/sec
The files appear to be held in the cache of the FC disks, because I still
notice writes even after the ftp transfer has finished.
setup: 2 different machines ftp a 400 meg file to different filesystems
on 2 different LUNs but within the same raid group
test1: 28.12 meg/sec, 6.12 meg/sec = (agg) 34.24 meg/sec
# note: did a rm of all files in fs before#
test2: 9.31 meg/sec, 6.15 meg/sec = (agg) 15.46 meg/sec
test3: 6.87 meg/sec, 6.41 meg/sec = (agg) 13.28 meg/sec
test4: 28.19 meg/sec, 27.56 meg/sec = (agg) 55.75 meg/sec
# note did a rm of all files in fs before #
setup: 2 different machines ftp a 400 meg file to different filesystems
on 2 different LUNs in 2 different raid groups
test1: 27.98 meg/sec, 17.88 meg/sec = (agg) 45.86 meg/sec
#removed all files first
test2: 16.52 meg/sec, 17.34 meg/sec = (agg) 33.86 meg/sec
test3: 17.48 meg/sec, 18.30 meg/sec = (agg) 35.78 meg/sec
test4: 17.08 meg/sec, 18.22 meg/sec = (agg) 35.30 meg/sec
test5: 27.42 meg/sec, 27.14 meg/sec = (agg) 54.56 meg/sec
#removed all files first, definitely flushing cache
# write continuing after ftp session
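One way to see the effect of emptying the filesystems is to split the runs of this last setup into those noted as following a removal of all files and those that were not, then average each group. The Python sketch below does this. The True/False flags are an assumed reading of the "#removed all files first" notes (each note taken to describe the test listed just above it), so the grouping is illustrative rather than definitive.

```python
from statistics import mean

# (per-stream rates in meg/sec, run followed an rm of all files?)
# The boolean flags are an assumption based on where the notes appear.
runs = [
    ((27.98, 17.88), True),
    ((16.52, 17.34), False),
    ((17.48, 18.30), False),
    ((17.08, 18.22), False),
    ((27.42, 27.14), True),
]

# Aggregate per run, grouped by whether the filesystems had been emptied
fresh = [sum(rates) for rates, emptied in runs if emptied]
filled = [sum(rates) for rates, emptied in runs if not emptied]

print(f"after rm:      {mean(fresh):.2f} meg/sec average aggregate")
print(f"without an rm: {mean(filled):.2f} meg/sec average aggregate")
```

With only two and three runs in each group the averages are rough, but the gap between them is large compared with the run-to-run variation inside each group.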
Summary:
Basically, caching does help performance but usually only in the cases
where the filesystems were empty, which seems to flush the cache. You
notice this behavior at every performance run after an "rm /
Also, it is interesting to note that as a filesystem fills up with more
files, its performance decreases. It is unclear whether this behavior also
appears when using raw logical volumes. In general, though, raw LVs had
faster average write speeds and did not seem to be using much cache, as
seen in the monitor tool. However, it is unclear how to "flush" a
raw LV as was done with the filesystems.
Most important, however, seems to be the fact that one can sustain almost
full transfer rates to the fibre channel disks by writing to two
separate LUNs in two separate raid groups. Additionally, raw LVs do not
seem to suffer from what I suspect is locking contention or filesystem
control block updates that filesystems experience when
performing multiple writes to the same filesystem (versus multiple
writes to the same raw LV).
Some initial tests (not included above) writing to 2 different
LUNs within the same raid group using raw LVs resulted in transfer
rates of only 12 meg/sec, whereas a single transfer resulted in
16+ meg/sec. So there seems to be some type of contention here
as well.