Archive for Benchmarking

Changing the Blocksize of NTFS Drives and Iometer Testing


All file systems that Windows uses to organize the hard disk are based on cluster (allocation unit) size, which represents the smallest amount of disk space that can be allocated to hold a file. The smaller the cluster size, the more efficiently your disk stores information.
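As a rough illustration of why smaller clusters store data more efficiently (the file size and cluster sizes below are hypothetical, not taken from any particular volume): a file always occupies a whole number of clusters, so any space in the last cluster beyond the file's actual size is wasted as "slack".

```python
import math

# Sketch: a file occupies a whole number of clusters, so the space in the
# last cluster beyond the file's actual size is wasted ("slack").
def on_disk_size(file_bytes: int, cluster_bytes: int) -> int:
    return math.ceil(file_bytes / cluster_bytes) * cluster_bytes

file_size = 5000  # a hypothetical 5,000-byte file
for cluster in (4096, 65536):  # 4K vs 64K clusters
    used = on_disk_size(file_size, cluster)
    print(f"{cluster // 1024}K cluster: {used} bytes on disk, {used - file_size} wasted")
```

With 4K clusters the file occupies 8192 bytes (3192 wasted); with 64K clusters it occupies 65536 bytes (60536 wasted).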

If you do not specify a cluster size for formatting, Windows XP Disk Management bases the cluster size on the size of the volume. Windows XP uses default values if you format a volume as NTFS by either of the following methods:

  • By using the format command from the command line without specifying a cluster size.
  • By formatting a volume in Disk Management without changing the Allocation Unit Size from Default in the Format dialog box.

The maximum default cluster size under Windows XP is 4 kilobytes (KB) because NTFS file compression is not possible on drives with a larger allocation size. The Format utility never uses clusters that are larger than 4 KB unless you specifically override that default either by using the /A: option for command-line formatting or by specifying a larger cluster size in the Format dialog box in Disk Management.

Blocksize

What’s the difference between doing a Quick Format and a Full Format?

http://support.microsoft.com/kb/302686

Procedure

  • To check what cluster size you are already using, type the following line into a command prompt:
  • fsutil fsinfo ntfsinfo <drive>:
  • You can see that the drive I am using has a cluster size of 32K. Windows drives normally default to 4K.

Blocksize

  • Remember that the following procedure will reformat your drive and wipe out any data on it.
  • Type format <drive>: /fs:ntfs /a:64k
  • In this command, <drive> is the drive letter of the volume you want to format, and /a:<clustersize> is the cluster size you want to assign to the volume: 2K, 4K, 8K, 16K, 32K, or 64K. However, before you override the default cluster size for a volume, be sure to test the proposed modification via a benchmarking utility on a nonproduction machine that closely simulates the intended target.

Other Information

  • As a general rule, there's no dependency between the I/O size and the NTFS cluster size in terms of performance. The NTFS cluster size affects the size of the file system structures which track where files are on the disk, and it also affects the size of the freespace bitmap. But files themselves are normally stored contiguously, so there's no more effort required to read a 1MB file from the disk whether the cluster size is 4K or 64K.
  • In one case the file header says "the file starts at sector X and takes 256 clusters" and in the other case the header says "the file starts at sector X and takes 16 clusters". The system will need to perform the same number of reads on the file in either case, no matter what the I/O size is. For example, if the I/O size is 16K then it will take 64 reads to get all the data regardless of the cluster size.
  • In a heavily fragmented file system the cluster size may start to affect performance, but in that case you should run a disk defragmenter such as the built-in Windows defragmenter or Diskeeper.
  • On a drive that performs a lot of file additions/deletions or file extensions, cluster size can have a performance impact because of the number of I/Os required to update the file system metadata (bigger clusters generally mean fewer I/Os). But that's independent of the I/O size used by the application; the I/Os to update the metadata are part of NTFS itself and aren't something that the application performs.
  • If your hard drive is formatted NTFS, you can't use NTFS compression if you raise the cluster size above 4,096 bytes (4KB).
  • Also keep in mind that increasing the cluster size can potentially waste more hard drive space.
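The cluster-count arithmetic in the bullets above can be sketched quickly (a minimal sketch using the same 1MB contiguous file and 16K I/O size from the example):

```python
FILE_SIZE = 1 * 1024 * 1024  # the 1MB contiguous file from the example above

# The cluster count recorded in the file's metadata depends on cluster size...
for cluster_kb in (4, 64):
    clusters = FILE_SIZE // (cluster_kb * 1024)
    print(f"{cluster_kb}K clusters needed: {clusters}")

# ...but the number of reads depends only on the application's I/O size:
io_size = 16 * 1024
print(f"Reads at a 16K I/O size: {FILE_SIZE // io_size}")
```

This prints 256 clusters at 4K, 16 clusters at 64K, and 64 reads either way.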

Iometer Testing on different Block Sizes

The following 9 tests were carried out on a Windows Server 2008 R2 server (4 vCPUs and 4GB RAM) which is used to page insurance-modelling data onto a D: drive located on the local disk of a VMware host server. The disk is an IBM 300GB 10K 6Gbps SAS 2.5” SFF Slim-HS HDD.

The Tests

iometertesting

The Testing Spec in Iometer

Only the disk block size, which is the Transfer Request Size in the spec below, was adjusted between tests.

spec

Testing and Results

  • 4K Block Size on Disk
  • 4K BLOCK SIZE 100% SEQUENTIAL 70% WRITE AND 30% READ

dev70-igloo-ea -4k

  • 4K Block Size on Disk
  • 32K BLOCK SIZE 100% SEQUENTIAL 70% WRITE AND 30% READ

dev70-igloo-ea-32k

  • 4K Block Size on Disk
  • 64K BLOCK SIZE 100% SEQUENTIAL 70% WRITE AND 30% READ

dev70-igloo-ea-64k

  • 32K Block Size on Disk
  • 4K BLOCK SIZE 100% SEQUENTIAL 70% WRITE AND 30% READ

dev70-igloo-ea -32k-4k

  • 32K Block Size on Disk
  • 32K BLOCK SIZE 100% SEQUENTIAL 70% WRITE AND 30% READ

dev70-igloo-ea -32k-32k

  • 32K Block Size on Disk
  • 64K BLOCK SIZE 100% SEQUENTIAL 70% WRITE AND 30% READ

dev70-igloo-ea -32k-64k

  • 64K Block Size on Disk
  • 4K BLOCK SIZE 100% SEQUENTIAL 70% WRITE AND 30% READ

dev70-igloo-ea 64k-4k

  • 64K Block Size on Disk
  • 32K BLOCK SIZE 100% SEQUENTIAL 70% WRITE AND 30% READ

dev70-igloo-ea 64k-32k

  • 64K Block Size on Disk
  • 64K BLOCK SIZE 100% SEQUENTIAL 70% WRITE AND 30% READ

dev70-igloo-ea 64k-64k

The Results

results

The best approach seems to be to match the expected data size with the disk block size in order to achieve the highest throughput, e.g. 32K workloads with a 32K block size and 64K workloads with a 64K block size.

Fujitsu Paper (Worth a read)

https://sp.ts.fujitsu.com/dmsp/Publications/public/wp-basics-of-disk-io-performance-ww-en.pdf

Iometer

What is Iometer?

Iometer is an I/O subsystem measurement and characterization tool for single and clustered systems. It is used as a benchmark and troubleshooting tool and is easily configured to replicate the behaviour of many popular applications. One commonly quoted measurement provided by the tool is IOPS.

Iometer can be used for measurement and characterization of:

  • Performance of disk and network controllers.
  • Bandwidth and latency capabilities of buses.
  • Network throughput to attached drives.
  • Shared bus performance.
  • System-level hard drive performance.
  • System-level network performance.

Documentation

http://iometer.cvs.sourceforge.net/*checkout*/iometer/iometer/Docs/Iometer.pdf

http://communities.vmware.com

Downloads

http://www.iometer.org/doc/downloads.html

YouTube

Iometer Tutorial Part 1

Iometer Tutorial Part 2

Iometer Tutorial Part 2b

What are IOPs?

IOPS (Input/Output Operations Per Second, pronounced eye-ops) is a common performance measurement used to benchmark computer storage devices like hard disk drives (HDD), solid state drives (SSD), and storage area networks (SAN). As with any benchmark, IOPS numbers published by storage device manufacturers do not guarantee real-world application performance.

IOPS can be measured with applications such as Iometer (originally developed by Intel), as well as IOzone and FIO, and is primarily used with servers to find the best storage configuration.

The specific number of IOPS possible in any system configuration will vary greatly, depending upon the variables the tester enters into the program, including the balance of read and write operations, the mix of sequential and random access patterns, the number of worker threads and queue depth, and the data block sizes. There are other factors which can also affect the IOPS results, including the system setup, storage drivers, OS background operations, etc. Also, when testing SSDs in particular, there are preconditioning considerations that must be taken into account.
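As a back-of-the-envelope sketch, the theoretical random IOPS of a single spinning disk is roughly the reciprocal of its average seek time plus rotational latency. The figures below are typical assumptions for a 10K RPM SAS drive like the one used in the tests above, not measurements from this post:

```python
# Theoretical random IOPS for a single spinning disk (illustrative figures).
avg_seek_ms = 4.0                            # assumed average seek time
rotational_latency_ms = 60_000 / 10_000 / 2  # half a revolution at 10K RPM = 3 ms

iops = 1000 / (avg_seek_ms + rotational_latency_ms)
print(f"~{iops:.0f} IOPS")  # ~143 IOPS
```

Real measured figures vary with queue depth, caching and access pattern, which is exactly what Iometer lets you explore.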

Performance Characteristics

The most common performance characteristics measured are sequential and random operations. Sequential operations access locations on the storage device in a contiguous manner and are generally associated with large data transfer sizes, e.g. 128 KB. Random operations access locations on the storage device in a non-contiguous manner and are generally associated with small data transfer sizes, e.g. 4 KB.


Installing and Configuring Iometer

  • Click on the .exe

  • Click Next

  • Click I agree

  • Click Next

  • Click Install

  • Click Finish
  • You should see everything installed as per below

  • Open Iometer as an administrator (if you don't run it as administrator, you won't see any drives).
  • Accept License
  • The Iometer GUI appears, and Iometer starts one copy of Dynamo on the same machine.

  • Click on the name of the local computer (Manager) in the Topology panel on the left side of the Iometer window. The manager's available disk drives appear in the Disk Targets tab. Blue icons represent physical drives; they are only shown if they have no partitions on them. Yellow icons represent logical (mounted) drives; they are only shown if they are writable. A yellow icon with a red slash through it means that the drive needs to be prepared before the test starts.
  • Disk workers access logical drives by reading and writing a file called iobw.tst in the root directory of the drive. If this file exists, the drive is shown with a plain yellow icon; if the file does not exist, the drive is shown with a red slash through the icon. (If this file exists but is not writable, the drive is considered read-only and is not shown at all.)
  • If you select a drive that does not have an iobw.tst file, Iometer will begin the test by creating this file and expanding it until the drive is full.

  • The Disk Targets tab lets you see and control the disks used by the disk worker(s) currently selected in the Topology panel. You can control which disks are used, how much of each disk is used, the maximum number of outstanding I/Os per disk for each worker, and how frequently the disks are opened and closed.
  • You can select any number of drives; by default, no drives are selected. Click on a single drive to select it; Shift-click to select a range of drives; Control-click to add a drive to or remove a drive from the current selection.

  • The worker underneath your machine name – this will default to one worker (thread) for each physical or virtual processor on the system. In the event that Iometer is being used to compare native to virtual performance, make sure that the worker numbers match!
  • The Maximum Disk Size control specifies how many disk sectors are used by the selected worker(s). The default is 0, meaning the entire disk, so it is important to fill in the Maximum Disk Size: if you don't, the first time you run a test the program will attempt to fill the entire drive with its test file!
  • You want to create a file which is much larger than the amount of RAM in your system, although sometimes this is not practical if you have servers with 24GB or 32GB of RAM etc.
  • Please use the following link www.unitconversion.org/data-storage/blocks-to-gigabytes-conversion.html to get a proper conversion of blocks to GBs for a correct figure to put in Maximum Disk Size.
  • E.g. 1GB = 2097152
  • E.g. 5GB = 10485760
  • E.g. 10GB = 20971520
  • The Starting Disk Sector control specifies the lowest-numbered disk sector used by the selected worker(s) during the test. The default is 0, meaning the first 512-byte sector on the disk.
  • The # of Outstanding I/Os control specifies the maximum number of outstanding asynchronous I/O operations per disk the selected worker(s) will attempt to have active at one time. (The actual queue depth seen by the disks may be less if the operations complete very quickly.) The default value is 1, but if you are using a VM, you can set this to the queue depth value, which could be 16 or 32.
    Note that the value of this control applies to each selected worker and each selected disk. For example, suppose you select a manager with 4 disk workers in the Topology panel, select 8 disks in the Disk Targets tab, and specify a # of Outstanding I/Os of 16. In this case, the disks will be distributed among the workers (2 disks per worker), and each worker will generate a maximum of 16 outstanding I/Os to each of its disks. The system as a whole will have a maximum of 128 outstanding I/Os at a time (4 workers * 2 disks/worker * 16 outstanding I/Os per disk) from this manager.
  • For all Iometer tests, under “Disk Targets” always increase the “# of Outstanding I/Os” per target. When left at the default value of 1, a relatively low load will be placed on the array. By increasing this number, the OS will queue up multiple requests and really saturate the storage. The ideal number of outstanding I/Os can be determined by running the test multiple times, increasing this number each time; at some point IOPS will stop increasing. Generally the return diminishes around 16 I/Os per target, and more than 32 I/Os per target will certainly have no value due to the default queue depth in ESX.
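The blocks-to-GB figures above can be reproduced without the online converter, assuming the Maximum Disk Size field counts 512-byte sectors (consistent with the Starting Disk Sector default described above):

```python
SECTOR_BYTES = 512  # the Starting Disk Sector default implies 512-byte sectors

def gb_to_sectors(gb: int) -> int:
    """Convert GB (1GB = 1024^3 bytes) to 512-byte sectors for Maximum Disk Size."""
    return gb * 1024**3 // SECTOR_BYTES

for gb in (1, 5, 10):
    print(f"{gb}GB = {gb_to_sectors(gb)}")  # 2097152, 10485760, 20971520
```

These match the example figures in the list above.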

iometer99

Note: If the total number of outstanding I/Os in the system is very large, Iometer or Windows may hang, thrash, or crash. The exact value of “very large” depends on the disk driver and the amount of physical memory available. This problem is due to limitations in Windows and some disk drivers, and is not a problem with the Iometer software. The problem is seen in Iometer and not in other applications because Iometer makes it easy to specify a number of outstanding I/Os that is much larger than typical applications produce.
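The outstanding-I/O arithmetic from the 4-worker example above can be sketched as:

```python
workers = 4                # disk workers selected on the manager
disks = 8                  # disks selected in the Disk Targets tab
outstanding_per_disk = 16  # the "# of Outstanding I/Os" setting

disks_per_worker = disks // workers  # Iometer distributes the disks among workers
total_outstanding = workers * disks_per_worker * outstanding_per_disk
print(f"Maximum outstanding I/Os from this manager: {total_outstanding}")  # 128
```

This total is the figure to keep an eye on when avoiding the hang/thrash condition described in the note above.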

  • The Test Connection Rate control specifies how often the worker(s) open and close their disk(s). The default is off, meaning that all the disks are opened at the beginning of the test and are not closed until the end of the test. If you turn this control on, you can specify a number of transactions to perform between opening and closing. (A transaction is an I/O request and the corresponding reply, if any.)

  • Click on Access Specifications
  • Check the table below for recommendations

iometer1

  • Click on Access Specifications.

  • There is an Access Specification called “All in One” that’s included with Iometer. This spec includes all block sizes at varying levels of randomness and can provide a good baseline for server comparison.

iometer2

  • You can assign a series of targeted tests that get executed in sequential order under the “Assigned Access Specifications” panel. You can use existing I/O scenarios or define your own custom access scenario. I am going to assign the “4K; 100% Read; 0% Random” specification by selecting it and clicking the “Add” button. This scenario is self-explanatory and is generally useful for generating a tremendous amount of I/O, since the read pattern is optimal and the blocks are small.
  • The default is 2-Kilobyte random I/Os with a mix of 67% reads and 33% writes,
    which represents a typical database workload
  • For maximum throughput (Megabytes per second), try changing the Transfer
    Request Size to 64K, the Percent Read/Write Distribution to 100% Read, and
    the Percent Random/Sequential Distribution to 100% Sequential.
  • For the maximum I/O rate (I/O operations per second), try changing the
    Transfer Request Size to 512 bytes, the Percent Read/Write Distribution to
    100% Read, and the Percent Random/Sequential Distribution to 100%
    Sequential.
  • If you want to check what block size your OS is using, try typing the command below into a command prompt and look at the value for bytes per cluster.

blocksize

  • Note the below relation between block size and bandwidth

Capture

  • Next Click on Results Display

  • This tab will display your test results in real time once the test has started. Leave the radio button for “Results Since” set to “Start of Test”, as it averages the results as they roll in.
  • Obtaining run-time statistics affects the performance of the system. When running a significant test series, the Update Frequency slider should be set to “∞” (infinity). Also, you should be careful not to move the mouse or to have any background processes (such as a screensaver or FindFast) running while testing, to avoid unnecessary CPU utilization and interrupts.
  • For interactive runs, set the “Update Frequency” to 2 or 3 seconds. Don’t set it too low, as it is possible to affect the test negatively if it is borrowing CPU cycles to keep Iometer updated. While running you will see activity in the “Display” panel at the frequency you set.
  • The three most important indicators are “Total I/Os per Second”, “Total MBs per Second”, and “Average I/O Response Time (ms)”.
  • Total I/Os per Second indicates the current number of operations occurring against your storage target.
  • MBs per Second is a function of I/Os per second × block size. This indicates the amount of data your storage target is transferring per second.
  • One thing is for certain: you don’t want to see any errors. If you are seeing errors, you have another, more serious issue.
  • Go to Test Setup
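The MBs-per-second relation described above is just the I/O rate multiplied by the block size; a quick sketch (the IOPS figures below are hypothetical, chosen only to illustrate the trade-off):

```python
def mb_per_second(iops: float, block_bytes: int) -> float:
    # Throughput = I/O rate x transfer size per I/O, expressed in MB/s
    return iops * block_bytes / (1024 * 1024)

# Hypothetical figures: many small I/Os can equal a few large ones in MB/s.
print(mb_per_second(4000, 4 * 1024))   # 4,000 IOPS at 4K  -> 15.625 MB/s
print(mb_per_second(250, 64 * 1024))   # 250 IOPS at 64K   -> 15.625 MB/s
```

This is why IOPS and MB/s must always be quoted together with the block size used.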

  • The “Test Description” is used as an identifier in the output report if you select that option.
  • “Run Time” is something you can adjust. There are no strict rules regulating this setting: the longer you run your test, the more accurate your results. Your system may have unexpected errors or influences, so extending your test a bit will flatten out any anomalies. If it is a production test, run it for 20-60 minutes. There is all sorts of RAM caching going on, so Iometer reports falsely high numbers for a while. If you watch it run, you’ll see it start off reporting very large numbers that slowly get smaller and smaller. Don’t pay any attention to the numbers until they stabilize, which might take 30+ minutes.
  • “Ramp Up Time” is a useful setting as it allows the disks to spin up and level out the internal cache for a more consistent test result. Set this between 10 seconds and 1 minute.
  • “Record Results” is used when you would like to produce a test report following the test. Set it to “None” if you only wish to view the real-time results. You can accept the defaults for “Number of Workers to Spawn Automatically”.
  • “Cycling Options” gives you the choice to increment Workers, Targets, and Outstanding I/Os while testing. This is useful in situations where you are uncertain how multiple CPU threads, multiple storage targets, and queue depth affect the outcome. Do experiment with these parameters, especially the Outstanding I/Os (queue depth); sometimes this is OS dependent and other times it is hardware related. Remember you can set the “Outstanding I/Os” under the “Disk Targets” tab. In this test we are going to take the default.
  • Next, now that everything is set, click the Green Flag button at the top to start the test.  Following the Ramp Up time (indicated in the status bar) you will begin to see disk activity

  • It will prompt you to select a location to save your .csv
  • While the tests are running, you will see the below

  • You can expand a particular result into its own screen by pressing the right-arrow at the right of each test, which results in a screen similar to the one shown below

To test network performance between two computers (A and B)

  • On computer A, double-click on Iometer.exe. The Iometer main window appears and a Dynamo workload generator is automatically launched on computer A.
  • On computer B, open an MS-DOS Command Prompt window and execute Dynamo, specifying computer A’s name as a command line argument.
  • For example: C:\> dynamo computer_a
  • On computer A again, note that computer B has appeared as a new manager in the Topology panel. Click on it and note that its disk drives appear in the Disk Targets tab.
  • With computer B selected in the Topology panel, press the Start Network Worker button (picture of network cables). This creates a network server on computer B.
  • With computer B still selected in the Topology panel, switch to the Network Targets tab, which shows the two computers and their network interfaces. Select one of computer A’s network interfaces from the list. This creates a network client on computer A and connects the client and server together.
  • Switch to the Access Specifications tab. Double-click on “Default” in the Global Access Specifications list. In the Edit Access Specification dialog, specify a Transfer Request Size of 512 bytes. Press OK to close the dialog.
  • Switch to the Results Display tab. Set the Update Frequency to 10 seconds.
  • Press the Start Tests button. Select a file to store the test results. If you specify an existing file, the new results will be appended to the existing ones.
  • Watch the results in the Results Display tab.
  • Press the Stop Test button to stop the test and save the results.

Useful Powerpoint Presentation

Texas Systems Storage Presentation

Brilliant Iometer Results Analysis

http://blog.open-e.com/random-vs-sequential-explained/

Benchmarking using Performance Tools

Depending on what application you’re trying to model in your VMware test lab, there are a variety of benchmarking tools you can use to stress-test your configuration. VMware provides an extensive benchmarking suite with its VMmark and View Planner offerings.

VMmark incorporates vMotion and Storage vMotion in addition to generating a simulated user workload. View Planner uses Microsoft Office, Adobe Reader and other applications to emulate a typical user workload in a virtual desktop infrastructure, allowing you to measure application delay and user experience on numerous VMs simultaneously.

There are several other load generators available, and with the exception of the SPEC and VMware View Planner benchmarks, you can download them all for free.

File Server Capacity Tool (FSCT): This Microsoft utility drives a load on a traditional CIFS/SMB/SMB2 file server and measures the highest throughput that a server (physical or virtual) can sustain.

Exchange Load Generator 2010 (LoadGen): This Microsoft utility simulates a variety of Exchange email clients at various load levels to help you size your servers before deployment.

Exchange Server Jetstress 2010: This Microsoft utility focuses on the back-end input/output subsystem of the Exchange environment.

Dell DVD Store Database Test Suite: Also part of VMmark, this test suite simulates typical ecommerce site transactions, with built-in load generation.