Disk Performance: What The Numbers Mean

An enterprise grade SAS drive is spec’d at 3ms to 8ms response times.

Guys at Microsoft will tell you they want 10ms or less for everything. My goal is 10ms for the transaction log (LDF), and 20ms for the data files (MDF/NDF).

What happens when your are beyond 10ms on the t-log?

  • The log buffer fills up,
  • SQL Server sees the disk is being thrashed and takes it’s foot off the pedal because it can’t buffer anymore IO requests to the log,
  • At this point, locks start getting held much longer,
  • Severe blocking and deadlocks start to occur,
  • Applications start to time out and go into retries, etc…
  • The system stops being usable.

Errors like this start showing up in the ERRORLOG:

SQL Server has encountered x occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [Drive:\MSSQL\MSSQL.1\MSSQL\Data\xyz.mdf] in database [database].  The OS file handle is 0x00000000.  The offset of the latest long I/O is: 0x00000000000000

For more information on this:

http://blogs.msdn.com/b/sqlsakthi/archive/2011/02/09/troubleshooting-sql-server-i-o-requests-taking-longer-than-15-seconds-i-o-stalls-amp-disk-latency.aspx

The curious thing, CPU goes way down because no work is being done. This makes most sysadmins scratch their heads because they don’t realize where the bottleneck is.

Two ways of getting numbers:

PerfMon: Real Time

  • Avg. Disk Sec/Read
  • Avg. Disk Sec/Write
  • Avg. Disk Sec/Transfer

Reading the PerfMon Output: .010 = 10ms, .050 = 50ms, 1.000 = 1 second, etc…

Here’s and example of 6,132ms (6 seconds) response times on a write operation (really bad).

6second_disk_latency

DMV to see Virtual File Stats: Averages since system was started

SELECT
–virtual file latency
vReadLatency
= CASE WHEN num_of_reads = 0
THEN 0 ELSE (io_stall_read_ms/num_of_reads) END
, vWriteLatency
= CASE WHEN num_of_writes = 0
THEN 0 ELSE (io_stall_write_ms/num_of_writes) END
, vLatency
= CASE WHEN (num_of_reads = 0 AND num_of_writes = 0)
THEN 0 ELSE (io_stall/(num_of_reads + num_of_writes)) END
–avg bytes per IOP
, BytesperRead
= CASE WHEN num_of_reads = 0
THEN 0 ELSE (num_of_bytes_read/num_of_reads) END
, BytesperWrite
= CASE WHEN num_of_writes = 0
THEN 0 ELSE (num_of_bytes_written/num_of_writes) END
, BytesperTransfer
= CASE WHEN (num_of_reads = 0 AND num_of_writes = 0)
THEN 0 ELSE ((num_of_bytes_read+num_of_bytes_written)/(num_of_reads + num_of_writes)) END

, LEFT(mf.physical_name,2) as Drive
, DB_NAME(vfs.database_id) as DB
–, mf.name AS FileName
, vfs.*
, mf.physical_name
FROM sys.dm_io_virtual_file_stats(NULL,NULL) as vfs
JOIN sys.master_files as mf ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
–WHERE mf.type_desc = ‘LOG’ — log files
–WHERE DB_NAME(vfs.database_id) IN (‘tpcc’,’tpcc2′)
ORDER BY vLatency DESC
— ORDER BY vReadLatency DESC
— ORDER BY vWriteLatency DESC

The following is a sample output showing that the Read Latency is 6ms and the Write Latency is 255ms with the Average Latency (R+W) is 145ms.

vReadLatency         vWriteLatency        vLatency             Drive DB         physical_name
-------------------- -------------------- -------------------- ----- ---------- -----------------------------------------------------
6                    255                  145                  F:    tempdb     F:\MSSQL10_50.MSSQLSERVER\MSSQL\Data\tempdb12.ndf
6                    255                  145                  F:    tempdb     F:\MSSQL10_50.MSSQLSERVER\MSSQL\Data\tempdb11.ndf
6                    255                  145                  F:    tempdb     F:\MSSQL10_50.MSSQLSERVER\MSSQL\Data\tempdb10.ndf
6                    254                  145                  F:    tempdb     F:\MSSQL10_50.MSSQLSERVER\MSSQL\Data\tempdb5.ndf
6                    255                  145                  F:    tempdb     F:\MSSQL10_50.MSSQLSERVER\MSSQL\Data\tempdb4.ndf
6                    255                  145                  F:    tempdb     F:\MSSQL10_50.MSSQLSERVER\MSSQL\Data\tempdb3.ndf
6                    255                  145                  F:    tempdb     F:\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\tempdb2.ndf
6                    255                  145                  F:    tempdb     F:\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\tempdb.mdf
6                    255                  145                  F:    tempdb     F:\MSSQL10_50.MSSQLSERVER\MSSQL\Data\tempdb13.ndf
6                    255                  145                  F:    tempdb     F:\MSSQL10_50.MSSQLSERVER\MSSQL\Data\tempdb14.ndf

(10 row(s) affected)

The following chart from Microsoft (http://technet.microsoft.com/en-us/library/cc966412.aspx) shows what they expect:

msft_disk_numbers

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s