SPECmail2009 Frequently Asked Questions

About The SPECmail2009 Benchmark

What is SPECmail2009?
SPECmail2009 is an industry-standard benchmark designed to measure a system's ability to act as a mail server compliant with the Internet standards Simple Mail Transfer Protocol (SMTP) and Internet Message Access Protocol, Version 4 (IMAP4). The benchmark models business user behavior by simulating a real-world workload experienced by enterprise-based e-mail services. The goal of SPECmail2009 is to enable objective comparisons of mail server products.
How many domains must I provision?
Only two are used. The benchmark assumes a single corporate domain holds all local e-mail accounts - e.g. user134@yourdomain.co.us. It uses a second domain to generate external e-mail traffic loads - e.g. someuser55@branch1.yourdomain.co.us. Note: The benchmark considers the subdomain branch1.yourdomain.co.us a separate e-mail domain.
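As a small illustration of the note above, the sketch below classifies addresses by exact domain match, so the subdomain counts as a different domain rather than as part of the corporate one. The helper name `classify` is hypothetical, and the domains are the example values from this answer.

```python
# Hypothetical helper, not part of the benchmark: classify an address as
# local or external traffic using the example domains from this FAQ.
LOCAL_DOMAIN = "yourdomain.co.us"
EXTERNAL_DOMAIN = "branch1.yourdomain.co.us"  # treated as a separate domain


def classify(address: str) -> str:
    """Return 'local', 'external', or 'unknown' using an exact domain match."""
    domain = address.rsplit("@", 1)[-1].lower()
    if domain == LOCAL_DOMAIN:
        return "local"
    if domain == EXTERNAL_DOMAIN:
        return "external"
    return "unknown"


print(classify("user134@yourdomain.co.us"))
print(classify("someuser55@branch1.yourdomain.co.us"))
```

A suffix match (for example `endswith`) would wrongly fold the subdomain into the local domain, which is why the comparison is exact.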
What is the smallest number of users needed for a compliant run?
The benchmark must have at least 250 users to fulfill all of the message size and folder hierarchy distribution requirements.
What is the maximum number of users I must create?
The benchmark has been tested up through 10000 users. That is not a hardcoded limit. The main constraint has been disk storage - both in terms of space and I/O operations.
What about more users?
More users in a single domain should work. However, workloads in the tens of thousands of users or higher move the infrastructure context into the realm of outsourced e-mail services, or of a select subset of very large corporations. The interaction level between any two users is usually determined by corporate bounds. This is reflected in the size of mail distribution lists and recipient counts. A 5000 user corporation has different workload characteristics than 100 corporations of 50 users (the outsourced service provider model).
What is the smallest number of users supported?
The benchmark works with only 1 user. We suggest using at least 2 so that both SMTP and IMAP work correctly: one user to send and one to receive. A better value would be something like 10 users. This allows the benchmark to create all client types and gives an idea of how the load generator (LG) might handle the working threads.
How is SPECmail2009 different from SPECmail2008?
SPECmail2009 supports secure TCP connections and updates e-mail traffic volumes, both message and message store profiles. The new e-mail traffic and message store structures reflect a much larger corporate mail server than the 2003 sample (40,000 instead of 2,700). The updated message profile reflects changes in both size and MIME structures, as content reflects richer media options with modern e-mail IMAP clients.
There have also been performance fixes that make this version run considerably lighter. We also removed the artificial limit of 100 total load generator threads imposed by SPECmail2008.

Lastly, modern network security policies dictate encrypted TCP network connections. SPECmail2009 adds support for both SSL and TLS.
What are the new published metrics?
SPECmail2009 produces two metrics based on whether or not the test used secure TCP connections. Setting both IMAP_SECURE and SMTP_SECURE configuration keys to on causes the benchmark to publish the SPECmail_Ent2009Secure metric. Any other combination produces the SPECmail_Ent2009 metric, even if some connections are encrypted.
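The selection rule above can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes the two keys hold the literal strings "on" or "off" and that a missing key counts as "off", which this FAQ does not state.

```python
def published_metric(config: dict) -> str:
    """Pick the metric name from the two *_SECURE keys described above.

    Assumption: values are the strings "on"/"off"; a missing key is "off".
    """
    def is_on(key: str) -> bool:
        return str(config.get(key, "off")).strip().lower() == "on"

    if is_on("IMAP_SECURE") and is_on("SMTP_SECURE"):
        return "SPECmail_Ent2009Secure"
    # Any other combination, even with one protocol encrypted.
    return "SPECmail_Ent2009"


print(published_metric({"IMAP_SECURE": "on", "SMTP_SECURE": "on"}))
print(published_metric({"IMAP_SECURE": "on", "SMTP_SECURE": "off"}))
```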

What SPECmail2009 Implements

What is the IMAP4 command set used?
The benchmark uses only the IMAP4 command set defined by RFC 2060. To maximize interoperability, the benchmark does not require any IMAP4 extended commands or parameters.
What does the benchmark test?
The SPECmail2009 benchmark uses different threads to emulate behaviors for particular mail clients: multi-session, single-session; both long and short duration mailbox activities; and short duration probes.
What is a SPECmail2009 Command Sequence?
A command sequence is a series of IMAP commands geared toward a specific purpose. There are 5 command sequences: 2 primary and 3 secondary. Primary command sequences are persistent through the duration of a day or week. The benchmark starts primary sessions before the actual workload begins. They terminate at the end of the workload. Secondary command sequences are transient and short-lived. These start and end as needed during the workload run.
What is a SPECmail2009 Client Type?
A Client Type is a unique combination of IMAP Command Sequences that approximates observed behavior in real-world e-mail IMAP4 clients.
What distinguishes client types from command sequences?
Client types model unique behaviors found in collected IMAP sessions during the SPECmail2009 benchmark development. These sequences are attributable to unique e-mail clients. Each client type uses at least one (1) of the primary command sequences. The ratio of client types is configurable within the benchmark using the config key CLIENT_TYPE_DISTRIBUTION.
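To make the idea of a configurable ratio concrete, here is a minimal sketch of how a set of users might be divided among client types. The percentages, the type names, and the rounding rule are all assumptions for illustration; the actual format of CLIENT_TYPE_DISTRIBUTION and the benchmark's own allocation method are not described in this FAQ.

```python
def allocate_users(total_users: int, distribution: dict) -> dict:
    """Split total_users across client types by integer percentages.

    Hypothetical illustration only. Leftover users from rounding down go
    to the types with the largest fractional share.
    """
    if sum(distribution.values()) != 100:
        raise ValueError("percentages must sum to 100")
    exact = {name: total_users * pct / 100 for name, pct in distribution.items()}
    counts = {name: int(share) for name, share in exact.items()}
    leftover = total_users - sum(counts.values())
    by_fraction = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_fraction[:leftover]:
        counts[name] += 1
    return counts


# Made-up ratio across four hypothetical client types.
ratio = {"type1": 40, "type2": 30, "type3": 20, "type4": 10}
print(allocate_users(250, ratio))
```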
How are users emulated?
Each load generator creates a user thread assigned to a client type. Each user thread generates one or more Command Sequence sessions into the SUT, according to the rules of that Client Type. Some Command Sequences perform as many operations as possible, throttled only by the response time of the SUT. Other IMAP command sequences follow predefined timings derived from real-world data.

Running SPECmail2009

What do I need to run the SPECmail2009 benchmark?
- Java Runtime Environment, version 1.5 or higher
- SPECmail2009 package, usually distributed as a specimap.tar file
- An e-mail SUT that supports IMAP4r1 and SMTP, with the MTA configured to allow message relays to the benchmark's SMTP sink
- A large amount of message storage space (an average of about 160 MB per user; some users will use more and others less)
- One host that acts as the benchmark manager and runs the SMTP sink
- One or more hosts that act as the benchmark load generators
- SSL certificates for the IMAP and MTA hosts, if SSL/TLS is active
What kind of hardware should we consider using?
We suggest multi-CPU hosts, not just multi-core hosts, with 1 GB of RAM for each CPU (not each core). Keep in mind that each load generator emulates many users, from 1 to hundreds. Each of these user threads must remember the status of its user's IMAP folders. Each user thread group must also have one persistent connection, and at least one other IMAP connection. All of these emulated users share the same host, yet each performs the complex work normally done by a dedicated computer.
What if we have a few, large/powerful machines?
The initial reaction is to run a single load generator on such a multi-CPU/core host. However, JVM limitations restrict how well a single load generator utilizes the physical machine. We suggest running multiple load generators instead. Start each with the "-p port_number" parameter, using a different port number for each instance. This method utilizes the hardware much better than a single JVM instance.
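As a rough sketch of this setup, the helper below builds one load generator command line per instance, each with its own port. The classpath and class name are taken from the errata example at the end of this FAQ, which elides its remaining arguments, so a real invocation may need more options. The base port 1200 and the function name are hypothetical.

```python
def load_generator_commands(instances: int, base_port: int) -> list:
    """Build one command line per load generator, each on a distinct port.

    Assumption: no arguments other than "-p port_number" are required.
    """
    return [
        ["java", "-classpath", "specimap.jar:check.jar",
         "specimapclient", "-p", str(base_port + i)]
        for i in range(instances)
    ]


# Example: four instances on one large host, ports chosen arbitrarily.
for command in load_generator_commands(4, 1200):
    print(" ".join(command))
```

Each printed line can be started as its own process (for example from a shell script), giving one JVM per load generator.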
Any special physical network considerations?
Most enterprise e-mail servers sit on a fast LAN. This means the minimum recommended is a 100BaseT switch. We suggest using Gigabit Ethernet switches if there are many load generators (20+) - usually due to a higher user count (2000 or more).
Any special logical network considerations?
Confirm that the benchmark manager and load generators can talk directly to the SMTP and IMAP servers on the SUT. The SMTP server will also need the ability to initiate SMTP connections to the designated SMTP sink host (usually the manager). This may require some DNS settings.
Any special DNS considerations?
Yes. Due to a special "feature" of Java RMI, the configuration settings should use simple host names, without domain references. These host names should be the primary names defined in either the hosts table or the DNS entries. Host name aliases cause the benchmark components to fail to rendezvous at the correct points during the various benchmark run phases.
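A quick pre-flight check along these lines can catch host names that carry a domain part. This is a hypothetical helper, not part of the benchmark, and the host names in the example are made up. It checks only for a domain suffix and cannot tell whether a name is an alias.

```python
def qualified_host_names(host_names: list) -> list:
    """Return the names that are not simple (they contain a dot or are empty)."""
    return [name for name in host_names if not name or "." in name]


# Made-up host names as they might appear in a configuration file.
configured = ["manager", "loadgen1", "loadgen2.yourdomain.co.us"]
for name in qualified_host_names(configured):
    print("not a simple host name:", name)
```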
Any special SMTP considerations?
Yes. Your SMTP server should retry failed relay attempts frequently, since the benchmark's SMTP sink intentionally simulates an unreliable server. A retry interval of 30 seconds is recommended.
What is the overall benchmark sequence?
The SPECmail2009 benchmark sequence goes through three distinct phases: initialization, benchmark load test runs, and generation of SPECmail2009 reports from the collected data. A compliant run consists of the verification and load test phases. The resulting output.raw file is used to create the official Disclosure.
When must the mail store be initialized?
The mail store must be initialized before a compliant run that will be published. This very long process depends on the number of users.

One possible shortcut is to initialize the mail store once and then back up this data set and the "*-initResults.out" files. Future tests can use this restored backup image and the *-initResults.out files for the same number of test users. This means the benchmark really needs 400 MB of disk space per user.
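For capacity planning, the two per-user figures given in this FAQ (about 160 MB for the mail store alone, 400 MB when a backup image is kept) can be turned into a rough estimate. The function name is hypothetical, the result uses 1 GB = 1024 MB, and actual usage varies per user.

```python
MB_PER_USER_STORE = 160        # average mail store size per user (from this FAQ)
MB_PER_USER_WITH_BACKUP = 400  # per-user figure when keeping a backup image


def storage_estimate_gb(users: int, with_backup: bool = False) -> float:
    """Rough disk estimate in GB (1 GB = 1024 MB) for a given user count."""
    per_user = MB_PER_USER_WITH_BACKUP if with_backup else MB_PER_USER_STORE
    return users * per_user / 1024


for users in (250, 10000):
    print(users, storage_estimate_gb(users), storage_estimate_gb(users, with_backup=True))
```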
What phases does the benchmark go through?
A compliant run consists of: verify, ramp-up, steady-state (100%), data collection
- Verify ensures that the mail store's message and folder distributions comply with the benchmark.
- Ramp-up gradually increases the load to the requested load level.
- Steady-state runs at a constant load level (100%) for the 1 hour required for a compliant run.
- Data collection gathers data from all load generators and formats the final results.
How is workload calculated?
The workload is derived from the total number of users (USER_END minus USER_START), the PEAK_PCT_USERS, and the CLIENT_TYPE_DISTRIBUTION.

The SUT may or may not be at 100% utilization.
What version of Java should I use?
The Sun Java runtime, version 1.5 or later, is recommended. Download it from java.sun.com.

Default-installed Java runtimes are not recommended.
What type of mail messages are used?
The benchmark dynamically generates multi-part MIME encoded messages based on the defined message MIME parts and defined message size distributions.
Can I customize the benchmark workload?
Yes. The Client Type distribution in the SPECmail_fixed.rc file defines the numbers of each type of command sequence generated. Changing this distribution changes the workload characteristics.
Can I change the actual IMAP commands, parameters or sequence?
No. These are embedded in the source.
How can I customize the benchmark workload?
The workload can be changed in the following ways:
- % of day's work that falls in the peak hour (distribution) (peak % users)
- MIME part size distribution (i.e., size of a MIME part in relation to the whole message)
- load factor (i.e., percent of workload to be generated by number of users)
- message recipient distribution
- messages received per peak hour
- client type distribution (among the 4 client behaviors)
- folder hierarchy distribution (depth and width)
- message MIME hierarchy distribution (depth and width)
What does the THREADS_PER_CLIENT config key affect?
This config key is used only for the non-loadtest benchmark phases - initialization, cleaning, verification. This value determines the number of concurrent threads the benchmark uses during these maintenance tasks.
How does the benchmark determine the number of load generator threads during the real load test?
The number of simultaneous sessions active during the course of the test period is a function of the Traffic Pattern distribution and the number of active users, that is, (USER_MAX minus USER_MIN) times PERCENT_ACTIVE.
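The active-user part of this calculation can be sketched as below. It assumes PERCENT_ACTIVE is expressed as a percentage (0 to 100) and takes the user count as USER_MAX minus USER_MIN exactly as stated above. The function name and sample values are hypothetical, and the Traffic Pattern distribution is not modeled.

```python
def active_users(user_min: int, user_max: int, percent_active: float) -> int:
    """Number of active users: (USER_MAX - USER_MIN) * PERCENT_ACTIVE.

    Assumption: percent_active is a percentage in the range 0-100.
    """
    total_users = user_max - user_min
    return round(total_users * percent_active / 100)


# Made-up values: 5000 users with 15% of them active.
print(active_users(1, 5001, 15))
```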
Which mail servers are known to interoperate with SPECmail2009?
- Communigate
- Dovecot
- Sun Java Messaging Server

Errata

Exchange Server. SPECmail2009 version 1.0 sets a non-standard flag (copied) on some messages. Exchange does not support this flag. Exchange counts each of the benchmark's attempts to set the copied flag as an invalid IMAP command. After 10 invalid IMAP commands Exchange terminates the client session which aborts the benchmark. This will be fixed in SPECmail2009 version 1.0.1, due October 2009.
The symptom of this issue is that a load generator exits with one of these errors:
- Error: ImapWorkerThread.doMyLongCommand received unexpected EOF during "userXtY UID STORE N +FLAGS (\DELETED)"; response was ""
- Error: ImapWorkerThread.doMyLongCommand received unexpected EOF during "userXtY UID STORE N +FLAGS (copied)"; response was ""
Internationalization/Localization (I18N/L10N). SPECmail2009 version 1.0 requires use of the US-English locale. Running the benchmark in a locale that uses comma instead of period as a decimal separator (e.g., 100,00% instead of 100.00%) causes the benchmark's verification phase to fail. This will be fixed in SPECmail2009 version 1.0.1, due October 2009.
The symptom of this issue is that the benchmark manager reports this error after the message-verify phase:
```
java.lang.ArrayIndexOutOfBoundsException: N
       at org.spec.specimap.SpecimapControl.printVerifyMessageSizeDistribution(SpecimapControl.java:716)
```
The workaround for this issue is to add the -Duser.country=US -Duser.language=en options to all invocations of java. For example:
```
$ java -Duser.country=US -Duser.language=en -classpath specimap.jar:check.jar specimapclient .....
$ java -Duser.country=US -Duser.language=en -classpath specimap.jar:check.jar specimap -clean -init -compliant ....
```
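The benchmark's own parsing code is not shown in this FAQ, so the following is only an analogous illustration of why a decimal comma can trip up code that expects a period. It uses a hypothetical `parse_percent` helper; in the benchmark itself the failure surfaces as the ArrayIndexOutOfBoundsException shown above rather than as a parse error.

```python
def parse_percent(text: str) -> float:
    """Parse a value such as "100.00%" that uses a period as decimal separator."""
    return float(text.rstrip("%"))


print(parse_percent("100.00%"))

try:
    parse_percent("100,00%")  # same value as formatted in a decimal-comma locale
except ValueError as error:
    print("cannot parse:", error)
```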