FAQ / zilstat – Will a separate ZIL log device help your ZFS performance?

Contribution from Richard Elling ( Richard.Elling@RichardElling.com ):
zilstat is useful for determining how much write activity is generated by your workload before you add a separate log to a pool (which can later not easily be removed).
The general recommendation for such cases is to use a separate ZIL log, preferably on a device which has a nonvolatile write cache or a write-optimized SSD, but use this script to check your wordload first.

The script can be downloaded here

 

Usage: zilstat.ksh [gMt][-l linecount] [-p poolname] [interval [count]]
-M # print numbers as megabytes (base 10)
-t # print timestamp
-p poolname # only look at poolname
-l linecount # print header every linecount lines (default=only once)
interval in seconds or “txg” for transaction group commit intervals
note: “txg” only appropriate when -p poolname is used
count will limit the number of intervals reported

examples:
zilstat.ksh # default output, 1 second samples
zilstat.ksh 10 # 10 second samples
zilstat.ksh 10 6 # print 6 x 10 second samples
zilstat.ksh -p rpool # show ZIL stats for rpool only

output:
[TIME]
N-Bytes – data bytes written to ZIL over the interval
N-Bytes/s – data bytes/s written to ZIL over ther interval
N-Max-Rate – maximum data rate during any 1-second sample
B-Bytes – buffer bytes written to ZIL over the interval
B-Bytes/s – buffer bytes/s written to ZIL over ther interval
B-Max-Rate – maximum buffer rate during any 1-second sample
ops – number of synchronous iops per interval
<=4kB – number of synchronous iops <= 4kBytes per interval
4-32kB – number of synchronous iops 4-32kBytes per interval
>=32kB – number of synchronous iops >= 32kBytes per interval
note: data bytes are actual data, total bytes counts buffer size

The main question to be answered by zilstat is, “how much write activity is going to the ZIL?” If you run your workload and the results remain all balls (zeros), then it is a good indication that your workload would not benefit from a separate log because the workload does not generate synchronous writes. However, if you are seeing a lot of activity, then perhaps your workload would benefit from a separate log. In general, NFS and database servers are the beneficiaries of a separate log.

Here is a sample output from my desktop. I apologize if this wraps on your browser.

# ./zilstat.ksh -t 60
TIME N-Bytes N-Bytes/s N-Max-Rate B-Bytes B-Bytes/s B-Max-Rate ops <=4kB 4-32kB >=32kB
2009 Feb 6 14:25:22 0 0 0 0 0 0 0 0 0 0
2009 Feb 6 14:26:22 287368 4789 283064 618496 10308 524288 7 0 2 5
2009 Feb 6 14:27:22 10304 171 10304 94208 1570 94208 2 0 0 2
2009 Feb 6 14:28:22 223120 3718 213152 618496 10308 544768 7 0 2 5
2009 Feb 6 14:29:22 0 0 0 0 0 0 0 0 0 0
2009 Feb 6 14:30:22 3336 55 1264 28672 477 12288 7 7 0 0
2009 Feb 6 14:31:22 0 0 0 0 0 0 0 0 0 0
2009 Feb 6 14:32:22 149768 2496 145480 294912 4915 282624 8 1 2 5
2009 Feb 6 14:33:22 248 4 248 4096 68 4096 1 1 0 0

From this sample you can see that the amount of data being synchronously written by the applications is rather modest. This data is fit into buffers which is what is actually sent to storage. For my activity this afternoon, the maximum rate of buffer bytes written peaked at 544,768 in a one-second sample. I really didn’t notice this as a performance impact, mostly because it is my desktop machine, and it already performs pretty well. Your workload might be very different and zilstat might be able to give you an idea of how much data would be written to a separate log. Clearly, if the numbers are mostly zero, most of the time, then a separate log will likely not help the performance of your workload.

In some cases, you may want to see the ZIL activity per transaction group (txg) commit. This will provide a better idea of the separate log device size. To see per-txg measurements, use the string “txg” instead of a number for the interval. You will also need to specify a pool with the txg option because there may be multiple active pools, and I’m not sure that there is a reasonable way to present that data concisely. If you want to look at multiple pools with the txg option concurrently, just run two instances of zilstat. Here is an example of txg output.

 

# ./zilstat.ksh -p rpool txg
waiting for txg commit…
txg N-Bytes N-Bytes/s N-Max-Rate B-Bytes B-Bytes/s B-Max-Rate ops <=4kB 4-32kB >=32kB
3873100 0 0 0 0 0 0 0 0 0 0
3873101 0 0 0 0 0 0 0 0 0 0
3873102 24160 805 24160 45056 1501 45056 6 1 5 0
3873103 0 0 0 0 0 0 0 0 0 0

I find that using the txg view with timestamps is very useful. However, the output is too long to fit nicely on a web page example.

Note that the time interval for txg commits is variable. By design, this should be no more than approximately 30 seconds. However, you may see them occur more frequently. For example, if you are using the Time Slider feature of OpenSolaris, then when ZFS takes snapshots it forces a txg commit. This is another reason why recording timestamps with the -t option is useful.

Note: a packaging bug in OpenSolaris may affect your ability to run DTrace scripts. If you run zilstat and DTrace complains about “Invalid library dependency in /usr/lib/dtrace/iscsit.d: /usr/lib/dtrace/iscsi.d” then you will need to install the SUNWiscsitgt package. See CR 6855460 for details.

Posted in: ZFS