
SPECweb96 Release 1.0 Run and Reporting Rules

Version 2.7. Last modified: Fri May 30 10:52:05 PDT 1997

Last updates include additional reporting requirements added to sections 3.2.2.6, 3.3.1.4.1, and 3.3.1.4.2.

Table of Contents

  1. Introduction
  2. Running the SPECweb96 Release 1.0 Benchmark
  3. Reporting Results for the SPECweb96 Release 1.0 Benchmark
  4. Building the SPECweb96 Release 1.0 Benchmark

1.0 Introduction

This document specifies how the benchmarks in the SPECweb96 Release 1.0 suite are to be run for measuring and publicly reporting performance results. These rules follow the norms laid down by the SPEC Web Subcommittee and approved by the SPEC Open Systems Steering Committee. They ensure that results generated with this suite are meaningful, comparable to other generated results, and repeatable (with documentation covering the factors pertinent to duplicating the results).

Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.

1.1 Philosophy

The general philosophy behind the rules for running the SPECweb96 Release 1.0 benchmark is to ensure that an independent party can reproduce the reported results.

The following attributes are expected:

  • Proper use of the SPEC benchmark tools as provided.
  • Availability of an appropriate full disclosure report.
  • Support for all of the appropriate protocols.

Furthermore, SPEC expects that any public use of results from this benchmark suite shall be for servers and configurations that are appropriate for public consumption and comparison. Thus, it is also expected that:

  • Hardware and software used to run this benchmark must provide a suitable environment for serving WWW documents.
  • Optimizations utilized must improve performance for a larger class of workloads than just the ones defined by this benchmark suite.
  • The server and configuration are generally available, documented, supported, and encouraged by the providing vendor(s).

1.2 Caveat

SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPECweb96 Release 1.0 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees whenever it makes changes to the suite and will rename the metrics (e.g., from SPECweb96 to SPECweb97a). In the event that a workload is removed, SPEC reserves the right to republish in summary form "adapted" results for previously published systems, converted to the new metric. In the case of other changes, a republication may necessitate retesting and may require support from the original test sponsor.

Relevant standards are cited in these run rules as URL references, and are current as of the date of publication. Changes or updates to these referenced documents or URLs may necessitate repairs to the links and/or amendment of the run rules. The current run rules will be available at the SPEC web site at http://www.spec.org.


2.0 Running the SPECweb96 Release 1.0 Benchmark

2.1 Environment

2.1.1 Protocols

As the WWW is defined by its interoperating protocol definitions, SPECweb requires adherence to the related protocol standards. The benchmark environment shall be governed by the following standards (a minimal example of their use in combination appears after the list):

HTTP/1.0
Basic WWW protocol, as defined in http://www.w3.org/pub/WWW/Protocols/HTTP1.0/draft-ietf-http-spec.html.
RFC 761
DoD standard Transmission Control Protocol, as defined in http://info.internet.isi.edu/in-notes/rfc/files/rfc761.txt
RFC 791
Internet Protocol, as defined in http://info.internet.isi.edu/in-notes/rfc/files/rfc791.txt
RFC 792
Internet Control Message Protocol, as defined in http://info.internet.isi.edu/in-notes/rfc/files/rfc792.txt and updated by RFC 950.
RFC 793
Transmission Control Protocol, as defined in http://info.internet.isi.edu/in-notes/rfc/files/rfc793.txt
RFC 950
Internet Standard Subnetting Procedure, as defined in http://info.internet.isi.edu/in-notes/rfc/files/rfc950.txt
RFC 1122
Requirements for Internet hosts - communication layers, as defined in http://info.internet.isi.edu/in-notes/rfc/files/rfc1122.txt
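
For orientation, the following minimal C sketch issues a single HTTP/1.0 GET over a TCP connection, in the manner governed by the standards above. The server name and request path are hypothetical placeholders; the actual requests are generated by the SPEC driver.

    /* A minimal sketch of a single HTTP/1.0 GET over TCP (RFC 793),
     * for illustration only; it is not the SPEC driver. The server
     * name and request path below are hypothetical placeholders. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        const char *req = "GET /dir00000/file03 HTTP/1.0\r\n\r\n"; /* hypothetical path */
        struct hostent *hp;
        struct sockaddr_in sin;
        char buf[4096];
        int s, n;

        hp = gethostbyname("server.example.com");   /* hypothetical server */
        if (hp == NULL)
            return 1;
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(80);                   /* standard HTTP port */
        memcpy(&sin.sin_addr, hp->h_addr, hp->h_length);

        s = socket(AF_INET, SOCK_STREAM, 0);        /* TCP, per RFC 793 */
        if (s < 0 || connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
            return 1;

        /* HTTP/1.0: one request per connection; the server closes when done. */
        write(s, req, strlen(req));
        while ((n = read(s, buf, sizeof(buf))) > 0) /* status, headers, body */
            fwrite(buf, 1, n, stdout);
        close(s);
        return 0;
    }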

For further explanation of these protocols, the following might be helpful:

RFC 1180
TCP/IP tutorial [http://info.internet.isi.edu/in-notes/rfc/files/rfc1180.txt]
RFC 1739
A Primer On Internet and TCP/IP Tools [http://info.internet.isi.edu/in-notes/rfc/files/rfc1739.txt]

2.1.2 Server

For a run to be valid, the following attributes must hold true:

  • The server supports the required protocols, and is not utilizing variations of these protocols to satisfy requests made during the benchmark. To ensure comparability of results, this release of SPECweb does not support other versions of the HTTP protocol such as 0.9 or 1.1.
  • The value of TIME_WAIT must be at least 60 seconds.
    Rationale: SPEC intends to follow relevant standards wherever practical, but with respect to this performance sensitive parameter it is dificult due to ambiguity in the standards. RFC1122 requires that TIME_WAIT be 2 times the maximum segment life (MSL) and RFC793 suggests a value of 2 minutes for MSL. So TIME_WAIT itself is effectively not limited by the standards. However, current TCP/IP implementations define a de facto lower limit for TIME_WAIT of 60 seconds, the value used in most BSD derived UNIX implementations. SPEC expects that the protocol standards relating to TIME_WAIT will be clarified in time, and that future releases of SPECweb will require strict conformance with those standards.
  • The server returns the complete and appropriate byte streams for each request made.
  • The server logs the following information for each request made: address of the requestor, a date and time stamp accurate to at least 1 second, specification of the file requested, size of the file transferred, and the final status of the request.
  • The server utilizes stable storage for all data files and server logs. The log file records must be written to non-volatile storage at least once per 60 seconds (a sketch of such logging follows this list).
  • The server is composed of components that are generally available, or that shall be generally available within six months of the first publication of these results.
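
The sketch below illustrates, in C, the logging and stable-storage requirements from the list above. The common-log-style line format and the file name access_log are assumptions of this sketch, not requirements of the rules; the rules require only the listed fields and the 60-second flush interval.

    /* Illustrative sketch only, not a server implementation: one way
     * to record the required log fields (requestor address, timestamp
     * to one second, file requested, bytes transferred, final status)
     * and flush the log to non-volatile storage at least once per 60
     * seconds. The line format shown is an assumption. */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
    #include <fcntl.h>

    static int logfd = -1;
    static time_t last_flush;

    static void log_request(const char *client, const char *uri,
                            long bytes, int status)
    {
        char stamp[64], line[1024];
        time_t now = time(NULL);

        /* Date and time stamp accurate to at least one second. */
        strftime(stamp, sizeof(stamp), "%d/%b/%Y:%H:%M:%S", localtime(&now));
        snprintf(line, sizeof(line), "%s [%s] \"GET %s HTTP/1.0\" %d %ld\n",
                 client, stamp, uri, status, bytes);
        write(logfd, line, strlen(line));

        /* Push records to stable storage at least once per 60 seconds. */
        if (now - last_flush >= 60) {
            fsync(logfd);
            last_flush = now;
        }
    }

    int main(void)
    {
        logfd = open("access_log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (logfd < 0)
            return 1;
        last_flush = time(NULL);
        log_request("10.0.0.1", "/dir00000/file03", 4096, 200); /* example */
        fsync(logfd);   /* final flush before exit */
        close(logfd);
        return 0;
    }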

Any deviations from the standard, default configuration for the server must be documented so that an independent party would be able to reproduce the result without further assistance.

2.2 Measurement

2.2.1 File Set

The benchmark will make references to files located on the server. The range of files accessed will be determined by the particular level of requested load for each measurement. The particular files referenced shall be determined by the random workload generation in the benchmark itself.

The benchmark suite provides tools for the creation of the files to be used. It is the responsibility of the benchmarker to ensure that these files are placed on the server so that they can be accessed properly by the benchmark. These files, and only these files, shall be used as the target file set. The benchmark performs internal validations to verify the expected file(s); no modification or bypassing of this validation is allowed.
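
The toy generator below illustrates, in C, the idea (stated in section 2.2.3) that the file set scales with the requested load. Its scaling rule, directory counts, file counts, names, and contents are hypothetical placeholders, not the SPECweb96 definitions; the actual file set must be created with the SPEC-provided tools.

    /* Toy illustration only: the real file set must be built with the
     * SPEC-provided tools. The rules below (one directory per 10
     * ops/sec of requested load, 20 files per directory, placeholder
     * contents) are hypothetical; they only show that the file set
     * grows with the requested load. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        int target_ops = (argc > 1) ? atoi(argv[1]) : 100; /* requested load */
        int ndirs = target_ops / 10 + 1;  /* hypothetical scaling rule */
        char dir[64], fname[96];
        int d, f;
        FILE *fp;

        mkdir("file_set", 0755);
        for (d = 0; d < ndirs; d++) {
            snprintf(dir, sizeof(dir), "file_set/dir%05d", d);
            mkdir(dir, 0755);
            for (f = 0; f < 20; f++) {
                snprintf(fname, sizeof(fname), "%s/file%02d", dir, f);
                fp = fopen(fname, "w");
                if (fp != NULL) {
                    /* Real file sizes depend on the workload's file
                     * classes; a placeholder payload is written here. */
                    fprintf(fp, "placeholder\n");
                    fclose(fp);
                }
            }
        }
        printf("created %d directories under file_set/\n", ndirs);
        return 0;
    }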

2.2.2 Load Levels

Each benchmark run consists of a set of requested load levels for which an actual measurement is made. The benchmark measures the actual level achieved and the associated average response time for each of the requested levels.

The measurement of all data points defining a performance curve is made within a single benchmark run, starting with the lowest requested load level and proceeding to the highest requested load level. The requested load levels are specified in a list, from lowest to highest, from left to right, respectively, in the parameter file.

If any requested load level must be rerun for any reason, the entire benchmark run must be restarted and the series of requested load levels repeated. No server or testbed configuration changes, server reboots, or file system initializations (e.g., "newfs") are allowed between requested load levels.

The performance curve must consist of a minimum of 10 data points of requested load, uniformly distributed across the range from zero to the maximum requested load. Data points in addition to these 10 uniformly distributed points may also be reported.
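
As a minimal sketch of this spacing (assuming, purely for illustration, a maximum requested load of 1000 operations per second), the 10 uniformly distributed levels could be computed as follows:

    /* Sketch: ten requested-load levels uniformly distributed across
     * the range from zero to the maximum requested load, listed lowest
     * to highest as the parameter file expects. The maximum of 1000
     * ops/sec is an arbitrary example. */
    #include <stdio.h>

    int main(void)
    {
        int max_load = 1000;    /* example maximum requested load */
        int i;

        for (i = 1; i <= 10; i++)
            printf("%d%c", i * max_load / 10, i < 10 ? ' ' : '\n');
        /* prints: 100 200 300 400 500 600 700 800 900 1000 */
        return 0;
    }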

2.2.3 Benchmark Parameters

All benchmark parameter values must be left at their default values when generating reportable SPECweb96 results, except as noted in the following list:

Server
The means of accessing the desired server shall be defined. This includes the name or address(es) of the server, as well as the proper port number.
Load
A collection of clients called load generators is used to generate an aggregate load on the server being tested.

In particular, there are several settings that cannot be changed without invalidating the result.

Server Fileset
The size of the fileset generated on the server by the benchmark is established as a function of requested throughput. Thus, fileset size is dependent on throughput across the entire results curve. This provides a more realistic server load since more files are being manipulated on the server as the load is increased. This reflects typical server use in real-world environments. The default parameters of the benchmark allow the automatic creation of valid total and working filesets on the server being measured.
Time parameters
RUNTIME, the time of measurement for which results are reported, must be the default 600 seconds for reportable results. The WARMUP_TIME must be set to the default of 300 seconds for reportable results. (These constraints are sketched after this list.)
Workload parameters
The workload specifics are fixed by the benchmark specification. The given name of a workload file may specify any workload file properly built by the fileset generation step of the benchmark.
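
The sketch below expresses, in C, the reportable-run constraints on the time parameters named above. The structure layout and field names are illustrative assumptions, not the driver's actual representation; only the parameter names RUNTIME and WARMUP_TIME and their default values come from the rules.

    /* Sketch of the reportable-run constraints on the time parameters;
     * the struct is an illustrative stand-in for the parameter file. */
    #include <stdio.h>

    struct web_params {
        const char *server;   /* server name or address */
        int port;             /* server port number */
        int runtime;          /* RUNTIME: measurement interval, seconds */
        int warmup_time;      /* WARMUP_TIME: warmup interval, seconds */
    };

    int times_are_reportable(const struct web_params *p)
    {
        /* Both time parameters must be at their defaults. */
        return p->runtime == 600 && p->warmup_time == 300;
    }

    int main(void)
    {
        struct web_params p = { "server.example.com", 80, 600, 300 };
        printf("reportable time parameters: %s\n",
               times_are_reportable(&p) ? "yes" : "no");
        return 0;
    }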

3.0 Reporting Results for the SPECweb96 Release 1.0 Benchmark

3.1 Metrics And Reference Format

The report of results for the SPECweb96 benchmark is generated in ASCII and HTML format by the provided SPEC tools. These tools may not be changed, except for portability reasons with prior SPEC approval. This section describes the report generated by those tools. The tools perform error checking and will flag many error conditions as resulting in an "invalid run". However, these automatic checks are only there for your convenience, and do not relieve you of your responsibility to check your own results and follow the run and reporting rules.

While SPEC believes that a full performance curve best describes a server's performance, the need for a single figure of merit is recognized. The benchmark single figure of merit, SPECweb96, is the peak throughput measured during the run (reported in operations per second). For a result to be valid, the peak throughput must be within 5% of the corresponding requested load. The results of a benchmark run, comprised of several load levels, are plotted on a performance curve on the results reporting page. The data values for the points on the curve are also enumerated in a table.

No data point within 25% of the maximum reported throughput may be reported for which the number of failed requests for any file class is greater than 1% of the total requests for that file class, plus one. No data point within 25% of the maximum reported throughput may be reported whose "Actual Mix Pcnt" differs from the "Target Mix Pcnt" by more than 10% of the "Target Mix Pcnt" for any workload class. E.g., if the target mix percent is 0.35, then valid actual mix percents are 0.35 +/- 0.035. These checks are sketched below.
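
The following minimal C sketch restates the validity checks above as code. The function and variable names, and the example values in main(), are illustrative assumptions; the actual checks are performed by the SPEC tools.

    /* Sketch of the validity checks stated above, applied to one data
     * point within 25% of the maximum reported throughput. */
    #include <math.h>
    #include <stdio.h>

    int point_is_valid(long errors, long total_class_requests,
                       double actual_mix, double target_mix)
    {
        /* Failed requests may not exceed 1% of the class total, plus one. */
        if (errors > total_class_requests * 0.01 + 1)
            return 0;
        /* Actual mix must be within 10% of the target mix; e.g. a
         * target of 0.35 permits actual values in 0.35 +/- 0.035. */
        if (fabs(actual_mix - target_mix) > 0.10 * target_mix)
            return 0;
        return 1;
    }

    int peak_is_valid(double peak_throughput, double requested_load)
    {
        /* The peak must be within 5% of the corresponding requested load. */
        return fabs(peak_throughput - requested_load) <= 0.05 * requested_load;
    }

    int main(void)
    {
        /* Example values: 3 errors in 1000 requests, mix 0.36 vs 0.35,
         * and a measured peak of 970 against a requested 1000. */
        printf("point valid: %d, peak valid: %d\n",
               point_is_valid(3, 1000, 0.36, 0.35),
               peak_is_valid(970.0, 1000.0));
        return 0;
    }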

3.1.1 Table Format

The server performance graph is constructed from a table containing the data points from a single run of the benchmark. The table consists of two columns:

  • Throughput in terms of operations per second rounded to the nearest whole number
  • Average Server Response Time

3.1.2 Graphical Format

Server performance is depicted in a plot with the following format:

  • Average Server Response Time is plotted on the Y-axis.
  • Throughput is plotted on the X-axis.

All data points of the plot must be enumerated in the table described in paragraph 3.1.1.

3.1.3 Detailed Results

The SPEC tools optionally allow verbose output to be selected, in which case additional data are reported in a table (the statistics in the final three entries are sketched after this list):

  • Requested Load
  • Throughput in terms of operations per second rounded to the nearest whole number
  • File class, 1, 2, 3, or 4
  • Target Mix percentage
  • Actual Mix percentage. This is flagged as an error if the mix requirements of paragraph 3.1 are not met.
  • Operation Success Count
  • Operation Error Count. This is flagged as an error if the error rate requirements of paragraph 3.1 are not met.
  • Average Server Response Time (in Milliseconds rounded to the nearest tenth)
  • Standard Deviation of Server Response Time
  • 95% confidence interval Server Response Time
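
The final three entries are standard sample statistics. The sketch below shows the usual formulas: the sample standard deviation and a normal-approximation 95% confidence interval. Whether the SPEC tools compute the interval exactly this way is an assumption, and the response times in the example are invented data.

    /* Sketch of the standard formulas behind the last three table
     * entries: mean, sample standard deviation, and a 95% confidence
     * interval for the server response time (normal approximation). */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double rt[] = { 12.1, 10.4, 11.8, 13.0, 9.9 };  /* ms, example data */
        int n = sizeof(rt) / sizeof(rt[0]);
        double sum = 0.0, sumsq = 0.0, mean, sd, half;
        int i;

        for (i = 0; i < n; i++) {
            sum += rt[i];
            sumsq += rt[i] * rt[i];
        }
        mean = sum / n;
        sd = sqrt((sumsq - n * mean * mean) / (n - 1)); /* sample std dev */
        half = 1.96 * sd / sqrt(n);  /* 95% CI half-width, normal approx. */

        printf("mean %.1f ms, sd %.2f, 95%% CI [%.1f, %.1f]\n",
               mean, sd, mean - half, mean + half);
        return 0;
    }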

3.2 Server Configuration

The system configuration information that is required to duplicate published performance results must be reported. This list is not intended to be all-inclusive, nor is each feature in the list required to be described. The rule of thumb is: if it affects performance, or the feature is required to duplicate the results, describe it. All components must be generally available within 6 months of the original publication of a performance result.

3.2.1 Server Hardware

The following server hardware components must be reported:

  • Vendor's name
  • System model number, type and clock rate of processor, number of processors, and main memory size.
  • Size and organization of primary, secondary, and other caches, per processor. If a level of cache is shared among processors in a system, that should be stated in the "notes" section.
  • Memory configuration if this is an end-user option which may affect performance, e.g. interleaving and access time.
  • Other hardware, e.g., write caches or other accelerators
  • Number, type, model, and capacity of disk controllers and drives
  • Type of file system

3.2.2 Server Software

The following server software components must be reported:

  • HTTP (Web) Server software and version.
  • Operating System and version.
  • The values of MSL (maximum segment life) and TIME-WAIT. If TIME-WAIT is not equal to 2*MSL, that must be noted. (Reference section 4.2.2.13 of RFC 1122).
  • Any other software packages used during the benchmarking process.
  • Other clarifying information required to reproduce benchmark results (e.g., number of daemons, server buffer cache size, disk striping, non-default kernel parameters), as well as the logging mode, must be stated in the "notes" section.
  • Additionally, the submitter must be prepared to make available a description of each of the tuning features that were utilized (e.g. kernel parameters, web software settings, etc.) including the purpose of that tuning feature. Where possible, it should be noted how the values used differ from the default settings for that tuning feature.

3.3 Testbed Configuration

3.3.1 Network Configuration

A brief description of the network configuration used to achieve the benchmark results is required. The minimum information to be supplied is:

  • Number, type, and model of network controllers
  • Number and type of networks used
  • Base speed of network
  • A network configuration notes section may be used to list the following additional information:
    • Number, type, model, and relationship of external network components used to support the server (e.g., any external routers, hubs, or switches).
    • Relationship of load generators, load generator type, and networks (including routers, etc., if applicable) -- in short, which load generators are connected to which LAN segments. For example: "client1 and client2 on one ATM-622, client3 and client4 on a second ATM-622, and clients 5, 6, and 7 each on their own 100TX segment."

3.3.2 Load Generators

The following load generator hardware components must be reported:

  • Number of load generator (client) systems
  • Processes or threads concurrently generating load on each load generator
  • System model number, processor type and clock rate, number of processors
  • Main memory size
  • Network Controller
  • Operating System and Version
  • Compiler and version used to compile benchmark (client code)
  • Any non-default TCP or HTTP parameters

3.4 General Availability Dates

The dates of general customer availability, month and year, must be listed for the major components: hardware, HTTP server, and operating system. All the system, hardware, and software features are required to be available within 6 months of the date of test.

3.5 Test Sponsor

The reporting page must list the date the test was performed (month and year), the organization that performed the test and is reporting the results, and the SPEC license number of that organization.

3.6 Notes/Summary of Tuning Parameters

This section is used to document:

  • System state: single or multi-user
  • System tuning parameters other than default
  • Process tuning parameters other than default
  • Background load, if any
  • ANY portability changes made to the individual benchmark source code, including the module name and line number of each change.
  • Additional information such as compilation options may be listed
  • Critical customer-identifiable firmware or option versions such as network and disk controllers
  • Additional important information required to reproduce the results, which does not fit in the space allocated above, must be listed here.
  • If the configuration is large and complex, added information should be supplied either by a separate drawing of the configuration or by a detailed written description which is adequate to describe the system to a person who did not originally configure it.

3.7 Other Required Information

The following additional information is also required to appear on the results reporting page for SPECweb96 Release 1.0 results:

  • General Availability of the System Under Test.
  • The date (month/year) that the benchmark was run
  • The name and location of the organization that ran the benchmark
  • The SPEC license number

Additional information may be required to be provided for SPEC's results review.


4.0 Building the SPECweb96 Release 1.0 Benchmark

SPEC provides client driver software, which includes tools for running the benchmark and reporting its results. This software implements various checks for conformance with these run and reporting rules. Therefore, the SPEC software must be used; substitution of equivalent functionality (e.g., file set generation) may be done only with prior approval from SPEC. Any such substitution must be reviewed and deemed "performance-neutral" by the OSSC.

You may not change this software without prior approval from SPEC. SPEC permits minimal performance-neutral portability changes, but only with prior approval; all changes must be reviewed and deemed "performance-neutral" by the OSSC. Source code changes required for standards compliance must be reported to SPEC, citing the appropriate standards documents. SPEC will consider incorporating such changes in future releases. Whenever possible, SPEC will strive to develop and enhance the benchmark to be standards-compliant. A portability change will be allowed only if, without the change:

  • the benchmark code will not compile,
  • the benchmark code does not execute, or
  • the benchmark code produces invalid results,

and the changed code implements the same workload for the server in a performance-neutral manner.

Special libraries may be used in conjunction with the benchmark code as long as they do not replace routines in the benchmark source code, and they are not "benchmark-specific".

Driver software includes C code (ANSI C) and perl scripts (perl5). SPEC will provide prebuilt versions of perl and the driver code, or these may be recompiled from the provided source. SPEC requires the user to provide an OS and server software that support HTTP/1.0 as described in section 2.