SPECmail2008 Frequently Asked Questions

About The SPECmail2008 Benchmark

What is SPECmail2008?
SPECmail2008 is an industry-standard benchmark designed to measure a system's ability to act as a mail server compliant with the Internet standards Simple Mail Transfer Protocol (SMTP) and Internet Message Access Protocol, Version 4 (IMAP4). The benchmark models business user behavior by simulating a real-world workload experienced by enterprise-based e-mail services. The goal of SPECmail2008 is to enable objective comparisons of mail server products.
How many domains must I provision?
Only two are used. The benchmark assumes a single corporate domain holds all local e-mail accounts - e.g. user134@yourdomain.co.us. It uses a second domain to generate external e-mail traffic loads - e.g. someuser55@branch1.yourdomain.co.us. Note: The benchmark considers the subdomain branch1.yourdomain.co.us a separate e-mail domain.
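The local/external split can be illustrated with a minimal sketch. The names LOCAL_DOMAIN and is_local are hypothetical and are not part of the benchmark; the point is only that a subdomain does not count as the local domain.

```python
# Hypothetical names: LOCAL_DOMAIN and is_local() are illustrative only,
# not SPECmail2008 configuration keys or functions.
LOCAL_DOMAIN = "yourdomain.co.us"


def is_local(address: str, local_domain: str = LOCAL_DOMAIN) -> bool:
    """Return True only when the address's domain exactly equals the local domain."""
    _, _, domain = address.rpartition("@")
    return domain.lower() == local_domain.lower()


print(is_local("user134@yourdomain.co.us"))             # True
print(is_local("someuser55@branch1.yourdomain.co.us"))  # False
```

An exact comparison is used here rather than a suffix match, so that branch1.yourdomain.co.us is treated as a separate domain.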
What is the smallest number of users needed for a compliant run?
The benchmark must have at least 200 users to fulfill all of the message size and folder hierarchy distribution requirements.
What is the maximum number of users I must create?
The benchmark has been tested up through 4000 users. That is not a hardcoded limit. The main constraint has been disk storage.
What about more users?
More users in a single domain should work. However, the workload should not be extrapolated into the tens of thousands of users or beyond. That is usually the realm of outsourced e-mail services. The level of interaction between any two users is usually determined by corporate boundaries. This is reflected in the size of mail distribution lists and recipient counts. A 5000-user corporation has different workload characteristics than 100 corporations of 50 users each.
What is the smallest number of users supported?
The benchmark works with only 1 user. We suggest using 2, to make the SMTP and IMAP work correctly - one to send and one to receive.

What SPECmail2008 Implements

What is the IMAP4 command set used?
The benchmark uses only the IMAP4 command set defined by RFC 2060. To maximize interoperability, the benchmark does not use any IMAP4 extension commands or parameters.
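Because only the RFC 2060 command set is used, one quick sanity check is whether the SUT advertises IMAP4rev1 in its CAPABILITY response. The sketch below is not part of the benchmark; the function name is hypothetical, and it only parses a response line that you have already captured from the server.

```python
def advertises_imap4rev1(capability_line: str) -> bool:
    """Check an untagged CAPABILITY response for the IMAP4rev1 atom (case-insensitive)."""
    tokens = capability_line.strip().split()
    # Expected shape of the untagged response: "* CAPABILITY IMAP4rev1 ..."
    if len(tokens) < 3 or tokens[0] != "*" or tokens[1].upper() != "CAPABILITY":
        return False
    return "IMAP4REV1" in (t.upper() for t in tokens[2:])


print(advertises_imap4rev1("* CAPABILITY IMAP4rev1 AUTH=PLAIN"))  # True
print(advertises_imap4rev1("* CAPABILITY IMAP4"))                 # False
```

Extensions listed after IMAP4rev1 in the response are harmless here, since the benchmark does not use them.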
What does the benchmark test?
The SPECmail2008 benchmark uses different threads to emulate behaviors for particular mail clients: multi-session, single-session; both long and short duration mailbox activities; and short duration probes.
What is a SPECmail2008 Command Sequence?
A command sequence is a series of IMAP commands geared toward a specific purpose. There are 5 command sequences: 2 primary and 3 secondary. Primary command sequences persist through the life of the user, while secondary command sequences are transient and start and end as needed during the run.
What is a SPECmail2008 Client Type?
A Client Type is a unique combination of IMAP Command Sequences that approximates observed behavior in real-world IMAP4 e-mail clients.
What distinguishes client types from command sequences?
Client types model unique behaviors found in collected IMAP sessions during the SPECmail2008 benchmark development. These sequences are attributable to unique e-mail clients. Each client type uses at least one (1) of the primary command sequences. The ratio of client types is configurable within the benchmark using the config key CLIENT_TYPE_DISTRIBUTION.
How are users emulated?
Each load generator creates a user thread assigned to a client type. Each user thread generates one or more Command Sequence sessions into the SUT, according to the rules of that Client Type. Some Command Sequences perform as many operations as possible, throttled only by the response time of the SUT. Other Command Sequences follow predefined timings derived from real-world data.

Running SPECmail2008

What do I need to run the SPECmail2008 benchmark?
- Java Runtime Environment, version 1.5 or higher
- SPECmail2008 package, usually distributed as a specimap.tar file
- An e-mail SUT that supports IMAP4r1 and SMTP, with the MTA configured to allow message relays to the benchmark's SMTP sink
- A large amount of message storage space (an average of about 450 MB per user; some users will need more, others less)
- One host that acts as the benchmark manager and runs the SMTP sink
- One or more hosts that act as the benchmark load generators

What kind of hardware should we consider using?
We suggest multi-CPU (not just multi-core) hosts with 1 GB of RAM for each CPU (not each core). Keep in mind that each load generator emulates many users, from 1 to hundreds. Each of these user threads must remember the status of each user's IMAP folders. Each user thread group must also have one persistent connection and at least one other IMAP connection. All of these users share the same host, yet each performs the complex work that is normally done by a dedicated computer.
What if we have a few, large/powerful machines?
The initial reaction is to run a single load generator on such a multi-CPU/core host. However, JVM limitations restrict how well a single load generator utilizes the physical box. We suggest running multiple load generators instead. Start each with the "-p " parameter, using a different port number for each instance. This method utilizes the hardware much better than a single JVM instance.
Any special physical network considerations?
Confirm that the benchmark manager and load generators can talk directly to the SMTP and IMAP servers on the SUT. The SMTP server will also need the ability to initiate SMTP connections to the designated SMTP sink host (usually the manager). This may require some DNS settings.
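A simple way to confirm reachability before a run is to attempt a TCP connection from each benchmark host to the SUT's SMTP and IMAP ports. This is a minimal sketch, not a benchmark tool: the function names are hypothetical, and ports 25 and 143 are the conventional SMTP and IMAP ports, which your SUT may not use.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_all(targets):
    """Map each (host, port) pair to its reachability result."""
    return {(host, port): can_connect(host, port) for host, port in targets}
```

For example, calling check_all([("mailsut", 25), ("mailsut", 143)]) on the manager and on each load generator would report whether both services are reachable, where "mailsut" is a placeholder host name. The same check run on the SUT against the SMTP sink host covers the relay path.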
Any special DNS considerations?
Yes. Due to a special "feature" of Java RMI, the configuration settings should use simple host names, without domain references. These host names should be the primary names defined in either the hosts table or the DNS entries. Host name aliases cause the benchmark components to fail to rendezvous at the correct points during various phases of the benchmark run.
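The "simple host name" rule can be checked mechanically before a run. The sketch below uses hypothetical function names and sample host names; it only flags configured names that contain a domain part and does not detect aliases.

```python
def is_simple_host_name(name: str) -> bool:
    """True when the name is non-empty and has no domain part (no dots)."""
    return bool(name) and "." not in name


def names_to_fix(configured_names):
    """Return the configured names that are not simple host names."""
    return [n for n in configured_names if not is_simple_host_name(n)]


# Sample names are placeholders, not values from a real configuration.
print(names_to_fix(["manager1", "loadgen1.example.com", "loadgen2"]))
# prints ['loadgen1.example.com']
```

To spot aliases, Python's socket.gethostbyname_ex returns the primary host name together with its alias list, which can be compared by hand against the names in the configuration.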
What is the overall benchmark sequence?
The SPECmail2008 benchmark sequence goes through three distinct phases: initialization, benchmark load test runs, and generation of SPECmail2008 reports from the collected data. A compliant run consists of the verification and load test phases. The resulting output.raw file is used to create the official Disclosure.
When must the mail store be initialized?
The mail store must be initialized before any compliant run that will be published. Initialization takes a very long time, and its duration depends on the number of users.

One possible shortcut is to initialize the mail store once and then back up this data set. Future tests can restore this same backup image, provided they use the same number of test users. Keeping both the live mail store and the backup means the benchmark really needs about 900 MB of disk space per user.
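The storage arithmetic is easy to sketch. The function below is illustrative only: it uses the 450 MB per-user average quoted in this FAQ and assumes a retained backup simply doubles the requirement. Real mail stores add file system and server overhead that is not modeled here.

```python
MB_PER_USER = 450  # average from this FAQ; individual mailboxes vary


def storage_mb(users: int, keep_backup: bool = False) -> int:
    """Estimate message storage in MB, doubled when a backup image is kept."""
    copies = 2 if keep_backup else 1
    return users * MB_PER_USER * copies


print(storage_mb(200))         # 90000
print(storage_mb(4000, True))  # 3600000
```

So the 200-user minimum needs roughly 90 GB of message storage, and a 4000-user mail store with a backup image needs roughly 3.6 TB.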
What phases does the benchmark go through?
A compliant run consists of four phases: verify, ramp-up, steady-state (100%), and data collection.
- Verify ensures that the mail store's message and folder distributions comply with the benchmark requirements
- Ramp-up gradually increases the load to the requested load level
- Steady-state runs at a constant load level (100%) for the 1 hour required for a compliant run
- Data collection gathers data from all load generators and formats the final results
How is workload calculated?
The workload is derived from the total number of users (USER_END minus USER_START), the PEAK_PCT_USERS, and the CLIENT_TYPE_DISTRIBUTION.

The SUT may or may not be at 100% utilization.
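As a rough illustration of how these three inputs might combine, the sketch below scales the user count by the peak percentage and then splits the result across client types. This is an assumption, not the benchmark's actual formula: it treats PEAK_PCT_USERS as a percentage and CLIENT_TYPE_DISTRIBUTION as a mapping of relative shares, and the client type names are placeholders.

```python
def users_per_client_type(user_start, user_end, peak_pct_users, distribution):
    """Split the peak-hour user count across client types by relative share.

    Assumed semantics (not confirmed by the benchmark documentation):
    peak_pct_users is a percentage, distribution maps type name -> share.
    """
    total_users = user_end - user_start
    peak_users = total_users * peak_pct_users / 100.0
    share_sum = sum(distribution.values())
    return {name: peak_users * share / share_sum
            for name, share in distribution.items()}


# Placeholder values for illustration only.
print(users_per_client_type(0, 1000, 50, {"type_a": 60, "type_b": 40}))
# prints {'type_a': 300.0, 'type_b': 200.0}
```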
What version of Java should I use?
Sun Java 1.5 or later, from java.sun.com, is recommended.

Java versions that come preinstalled on the system are not recommended.
What type of mail messages are used?
The benchmark dynamically generates multi-part MIME encoded messages based on the defined message MIME parts and defined message size distributions.
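To show what such a message looks like, here is a minimal sketch using Python's standard email package. It is not the benchmark's generator: the function name, addresses, subject, and part sizes are all hypothetical, and the sizes would come from the benchmark's configured distributions rather than being passed in by hand.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(sender, recipient, part_sizes):
    """Build a multipart MIME message with one part per requested size (in bytes)."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "sample message"
    # First part: a plain-text body of the requested size.
    msg.attach(MIMEText("x" * part_sizes[0], "plain"))
    # Remaining parts: opaque binary attachments, base64-encoded by default.
    for size in part_sizes[1:]:
        msg.attach(MIMEApplication(b"\0" * size))
    return msg


message = build_message("user1@yourdomain.co.us",
                        "user2@yourdomain.co.us",
                        [100, 2048])
print(len(message.get_payload()))  # 2
```

Calling message.as_string() yields the wire form that an SMTP client would send.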
Can I customize the benchmark workload?
Yes. The Client Type distribution in the SPECmail_fixed.rc file defines the numbers of each type of command sequence generated. Changing this distribution changes the workload characteristics.
Can I change the actual IMAP commands, parameters or sequence?

No. These are embedded in the source.
How can I customize the benchmark workload?
The workload can be changed in the following ways:
- % of day's work that falls in the peak hour (distribution) (peak % users)
- MIME part size distribution (i.e. size of each MIME part in relation to the whole message)
- load factor (i.e. percent of workload to be generated by number of users)
- message recipient distribution
- messages received per peak hour
- client type distribution (among the 4 client behaviors)
- folder hierarchy distribution (depth and width)
What does the THREADS_PER_CLIENT config key affect?

This config key is used only for the non-load-test benchmark phases: initialization, cleaning, and verification. Its value determines the number of concurrent threads the benchmark uses during these maintenance tasks.

How does the benchmark determine the number of load generator threads during the real load test?
The number of simultaneous sessions active during the test period is a function of the Traffic Pattern distribution and the number of active users, that is, (USER_MAX minus USER_MIN) times PERCENT_ACTIVE.