Benchmarking Methodology WG Minutes

WG Chair: Kevin Dubray
Minutes reported by Kevin Dubray

The Benchmarking Methodology Working Group met on Monday, 24 August 1998, from 1930h to 2200h. There were approximately 50 attendees.

Kevin Dubray opened the meeting with the presentation of the session's agenda:

1. Benchmarking Terminology for Firewall Performance. (D. Newman)
2. Terminology for Cell/Call Benchmarking. (J. Dunn)
3. Benchmarking Methodology for LAN Switching Devices. (B. Mandeville)
4. Latency Benchmarking Terminology. (B. Mandeville)

The agenda was left un-bashed.

Dubray announced that the BMWG had made good progress since the LA meeting: the Multicast Benchmarking Terminology draft was undergoing minor editorial changes following an AD review; the Firewall Performance Terminology draft was firming up nicely, as the bulk of the "connection" oriented issues seemed to be resolved on the mailing list; new editors had resurrected the cell/call benchmarking terminology draft; and the first methodology draft on LAN switch benchmarking had been delivered.

David Newman was then introduced to lead a discussion on the Firewall draft.

1. Benchmarking Terminology for Firewall Performance. (D. Newman)

David Newman identified the areas of change from the last two drafts. (See presentation in the Proceedings.)

A question was posed: "What sorts of attacks may impact performance?" David explained that there was currently no attempt in the I-D to benchmark firewall performance while the device is under attack. He thought that classifying the types of traffic used in testing firewalls might help. To that end, David mentioned that the newly added "illegal traffic" definition covers traffic used in an attack.

Newman identified the topics he would like to address during the session:

- Bit forwarding rate. (It's hard to come up with the general case.)
- What classes of traffic?
- Benchmarking firewalls under attack.

David put up a slide (slide 12) of some historical categories that were thought to be helpful in characterizing firewall performance. Examples included the number of email messages or the maximum number of concurrent telnet sessions. This led to a pointed discussion over using a bit vs. a frame forwarding rate metric. It was communicated that because firewalls handle traffic at various "layers," forcing a frame or bit metric may be counterproductive. Cynthia Martin suggested appendices to handle specifics related to the context of a particular technology, such as ATM.

Jeff Dunn again brought up the notion of PDUs (protocol data units) as units of measurement. The measurement unit could then be a bit, an octet, or a frame. So what characterizes a PDU? That's up to the tester. For example, one can define IP-over-ATM PDUs or TCP-over-IP-over-Ethernet PDUs as counting units; again, the unit of counting is a test realization left to the tester. In general, the group seemed to acknowledge the need to move to a more generic, transaction-based measurement paradigm. Specifically, the PDU-as-a-measurement-unit received support. (A sketch illustrating the idea appears at the end of this section.)

Harald Alvestrand asked what one does about input/output mismatch; email relays are examples of such mismatch. David said that he believes "goodput" provides for these mismatches.

With respect to firewall forwarding, David thought it was useful to construct a matrix with regard to traffic classes. (See slide 11 of the Newman presentation.) A question was offered regarding provisions in the draft for overhead characterization when the DUT is under attack. David replied that the matrix helps to provide for that type of characterization.

David said that he would make the appropriate changes to the draft and attempt to get it in shape for an AD review in December.
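To make the PDU-as-a-unit idea concrete, the following minimal sketch (hypothetical; the names and structure are illustrative and appear in no draft) shows a counter parameterized by a tester-defined PDU, so that the same traffic can be totaled in PDUs, octets, or bits:

    # Hypothetical sketch; the draft defines no such API. It illustrates the
    # point raised in the session: the tester, not the document, decides what
    # a PDU is, and counting then falls out in PDUs, octets, or bits.
    from dataclasses import dataclass

    @dataclass
    class PduCounter:
        # e.g. "IP-over-ATM" or "TCP-over-IP-over-Ethernet"; the definition
        # of the unit is a test realization left to the tester.
        pdu_name: str
        pdus: int = 0
        octets: int = 0

        def observe(self, pdu_octets: int) -> None:
            """Record one tester-defined PDU of the given size."""
            self.pdus += 1
            self.octets += pdu_octets

        @property
        def bits(self) -> int:
            return self.octets * 8

    # The same capture could be totaled against different PDU definitions.
    counter = PduCounter(pdu_name="TCP-over-IP-over-Ethernet")
    for size in (64, 576, 1500):
        counter.observe(size)
    print(counter.pdu_name, counter.pdus, counter.octets, counter.bits)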
2. Terminology for Cell/Call Benchmarking. (J. Dunn/C. Martin)

Jeff Dunn was introduced, and he immediately introduced Cynthia Martin as his co-conspirator in trying to pull the draft together. Jeff gave an insightful, yet entertaining, presentation on the history, purpose, and status of the Cell/Call Benchmarking draft. (Presentation included in the Proceedings.)

Jeff summarized the draft's motivation as providing metrics for NBMA technologies. Jeff cautioned that while many technologies follow "connection-oriented" paradigms (e.g., ATM and Frame Relay), these technologies may use clearly different components to achieve the same end. Moreover, Jeff reinforced the need not to "re-invent" new terms for known concepts but to leverage existing concepts to build understanding. Because of the varied contexts in which the subject matter could appear, Jeff requested the BMWG's help in populating the draft.

Jeff then presented the revised workplan with respect to the draft. The group had no issues. Jeff then stated one of the unwritten goals of the draft: to characterize the effect of good and poor throughput on higher layer functions. Jeff also said that the draft may need to address SONET in the future. With that, Jeff opened the floor for comments.

Dubray suggested that the "unwritten" be written into the draft (i.e., characterize the impact of lower layers on higher layer performance).

An attendee offered the question: with the interest in IP over SONET and its associated scrambling issues, how much related effort will go into this draft? Jeff replied that he thought addressing IP over SONET merited investigation, especially as scrambling may have a specific effect. He further pointed out that such investigation MUST be limited to characterization, as opposed to a direct determination of whether the IUT were scrambling correctly or not.

David Newman stated that it was nice and clean to say "security considerations: none," but is there traffic that could impair performance? Harald Alvestrand didn't think normal benchmarking presented a corresponding security issue. Dunn agreed, reminding folks that the context of the draft's benchmarks was a clinical exercise. Moreover, he believed, the traffic content was more a methodological detail.

Someone wondered at what level references to "connection" pertained. Is "connection" the same for Frame Relay as it is for ATM? In general, isn't the term "connection" very layer centric? Jeff retorted that, yes, the draft is very layer specific, but he thought it was appropriate to draw the required relationships when and where it made sense. (A sketch of these technology-specific notions of "connection" follows this section.)

With no further questions, Jeff thanked the group for its input and asked them to continue the discussion on the mailing list.
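As a hypothetical illustration of the "connection is layer centric" point (the types below are the illustration's own, not the draft's): a Frame Relay connection is identified by a DLCI, while an ATM connection is identified by a VPI/VCI pair, so a benchmark phrased purely in terms of "connections" must still bind to one of these technology-specific notions before a measurement is well defined.

    # Hypothetical sketch, not from the draft: "connection" resolves to
    # different identifiers in different technologies.
    from dataclasses import dataclass
    from typing import Union

    @dataclass(frozen=True)
    class FrameRelayConnection:
        dlci: int                 # Data Link Connection Identifier

    @dataclass(frozen=True)
    class AtmConnection:
        vpi: int                  # Virtual Path Identifier
        vci: int                  # Virtual Channel Identifier

    Connection = Union[FrameRelayConnection, AtmConnection]

    def describe(conn: Connection) -> str:
        # Render the generic term in its technology-specific form.
        if isinstance(conn, FrameRelayConnection):
            return f"Frame Relay connection, DLCI {conn.dlci}"
        return f"ATM connection, VPI/VCI {conn.vpi}/{conn.vci}"

    print(describe(FrameRelayConnection(dlci=100)))
    print(describe(AtmConnection(vpi=0, vci=32)))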
3. Benchmarking Methodology for LAN Switching Devices. (B. Mandeville)

Word was passed to the chair that the editor was sitting in a plane on the tarmac in a neighboring state. Dubray asked whether people thought it would be productive to start a discussion of the draft without one of its authors. The group thought that it would be beneficial. Dubray then attempted to lead a discussion on the first methodology draft for LAN switching devices.

It was mentioned that the first paragraph of the draft's introduction states that the document defines a set of tests. Shouldn't the document define the test methodologies specific to the tests defined in RFC 2285? More generally, many thought the procedural descriptions lacked the specificity required to generate consistent results. To the point: would two people executing the same stated procedure get similar results on the same tested platform (i.e., as a function of methodology)?

Another person questioned the generally prescribed reporting format. It was communicated that a more structured approach might be needed. Another attendee asked: if we require structured reporting formats, why not just cite ISO 9646?

There were several comments with respect to addressing and frames. One attendee asked where the draft states the method for determining how packets are destined to specific ports. Another questioned the wisdom of validating frame delivery based on address scrutiny alone; it would be better to have an independent validity check, such as a tag embedded in the frame.

Another person noted that several rates are identified throughout the draft. How are these rates calculated? What source(s) specify the calculations? (One common derivation is sketched at the end of this section.) A similar comment was received regarding burst size.

One person took issue with priming the device's address tables prior to a test run, as advocated in section 3, Test Set-up.

Another person identified inconsistencies with respect to spanning tree operation, this draft, and previous BMWG work. For example, RFCs 1242 and 1944 require the Spanning Tree Protocol to be enabled; this draft's section 3 suggests disabling Spanning Tree operation; and section 5.9 makes provisions for spanning tree operations. David Newman offered that this is most likely based on experience that not all IUTs allow Spanning Tree operation to be enabled or disabled. Another voice asked how one could form a basis for fair comparison if spanning tree was on in one DUT but not in another.

On the topic of address learning rate, the question of consistency of results was raised. It was thought that the stated methodology was not adequate for the determination of a known state, thereby compromising consistency. Deborah Stopp believed that flushing the tables would go a long way toward getting to a known state. It was generally agreed that getting the DUT to a known state was beneficial and that the procedures relative to the attainment of a known state were lacking in the current methodological descriptions.

On the monitoring of flooded traffic, a question was raised as to whether flooded traffic is counted as a good or bad event. Kim Martin thought it useful to report offered load and forwarding rates with respect to frame size.

It was thought that the draft occasionally presents definitive (and potentially questionable) conclusions that have no place in a document defining test methodology. It was thought that the document would be better served by defining input parameters, test procedures, and test outputs.

A question was raised asking why the draft was released with so many sections marked "to be done." The chair responded that the authors had acted on his request: the metrics defined were modular enough to be addressed individually. Moreover, the scope of the draft was discrete enough (having been defined by RFC 2285) that the approach of garnering commentary in a piecemeal fashion was not unsound.

The chair also responded to a charge made on the BMWG mailing list that the draft was a vehicle for a vendor-specific implementation: the chair thought ad hominem attacks were counterproductive; the presentation of alternatives to questionable metrics and methods was by far more productive. By having the BMWG choose the best _presented_ solution, the networking community and its vendors would be best served.
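On the question of how the draft's rates might be calculated, one commonly used derivation for Ethernet is sketched below. This is an assumption of the illustration, not something the draft specifies: the theoretical maximum frame rate follows from the line rate and the fixed per-frame wire overhead of the preamble and interframe gap.

    # Sketch of one common derivation; the draft itself cites no formula.
    PREAMBLE_AND_SFD_OCTETS = 8    # preamble + start-of-frame delimiter
    INTERFRAME_GAP_OCTETS = 12     # mandatory idle time, expressed in octets

    def max_frame_rate(line_rate_bps: int, frame_octets: int) -> float:
        """Theoretical maximum frames per second for a given frame size."""
        wire_octets = frame_octets + PREAMBLE_AND_SFD_OCTETS + INTERFRAME_GAP_OCTETS
        return line_rate_bps / (wire_octets * 8)

    # 10 Mb/s Ethernet: yields the familiar 14,880 frames/s for 64-octet frames.
    for size in (64, 512, 1518):
        print(size, round(max_frame_rate(10_000_000, size), 2))
    # -> 64: 14880.95, 512: 2349.62, 1518: 812.74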
Dubray postponed discussion of Latency issues until a later date.

With that, Dubray summarized the BMWG goals for the next session:

1. Publish the Multicast Benchmarking Terminology draft to reflect the AD review. ADs to forward the draft to the IESG.
2. Resolve the outstanding issues for the Firewall benchmarking terminology draft. Prepare the draft for AD review in December.
3. Lead discussion of issues for the Benchmarking Methodology for LAN Switching Devices draft on the BMWG mailing list. Revise and reissue.
4. Issue and discuss the first draft on Latency Benchmarking Terminology.