Benchmarking Methodology WG Minutes

WG Chair: Kevin Dubray
Minutes reported by Kevin Dubray

The Benchmarking Methodology Working Group met on Monday, 24 August 1998, from 1930h to 2200h. There were approximately 50 attendees.

Kevin Dubray opened the meeting with the presentation of the session's agenda:

1. Benchmarking Terminology for Firewall Performance. (D. Newman)
2. Terminology for Cell/Call Benchmarking. (J. Dunn)
3. Benchmarking Methodology for LAN Switching Devices. (B. Mandeville)
4. Latency Benchmarking Terminology. (B. Mandeville)

The agenda was left un-bashed.

Dubray announced that the BMWG had made good progress since the LA meeting: the Multicast Benchmarking Terminology draft was undergoing minor editorial changes following an AD review; the Firewall Performance Terminology draft was firming up nicely, as the bulk of the "connection" oriented issues seemed to be resolved on the mailing list; new editors had resurrected the cell/call benchmarking terminology draft; and the first methodology draft on LAN switch benchmarking had been delivered.

David Newman was then introduced to lead a discussion on the Firewall draft.

1. Benchmarking Terminology for Firewall Performance. (D. Newman)

David Newman identified the areas of change from the last two drafts. (See presentation in the Proceedings.)

A question was posed: "What sorts of attacks may impact performance?" David explained that there was currently no attempt in the I-D to benchmark firewall performance while the device is under attack. He thought that classifying the types of traffic used in testing firewalls might help. To that end, David mentioned that the newly added "illegal traffic" definition covers traffic used in an attack.

Newman identified the topics he would like to address during the session:

- Bit forwarding rate. (It's hard to come up with the general case.)
- What classes of traffic?
- Benchmarking firewalls under attack.

David put up a slide (slide 12) of some historical categories that were thought to be helpful in characterizing firewall performance. Examples included the number of email messages or the maximum number of concurrent telnet sessions. This led to a pointed discussion over using a bit vs. a frame forwarding rate metric. It was communicated that because firewalls handle traffic at various "layers," forcing a frame or bit metric may be counterproductive. Cynthia Martin suggested appendices to handle specifics related to the context of a particular technology, such as ATM.

Jeff Dunn again brought up the notion of PDUs (protocol data units) as units of measurement. The measurement unit could then be a bit, an octet, or a frame. So what characterizes a PDU? That's up to the tester. For example, one can define IP-over-ATM PDUs or TCP-over-IP-over-Ethernet PDUs as counting units; again, the unit of counting is a test realization left to the tester. In general, the group seemed to acknowledge the need to move to a more generic, transaction-based measurement paradigm. Specifically, the PDU-as-a-measurement-unit received support. (A sketch illustrating the idea appears at the end of this section.)

Harald Alvestrand asked what one does about input/output mismatch; email relays are examples of such mismatch. David said that he believes "goodput" provides for these mismatches.

With respect to firewall forwarding, David thought it was useful to construct a matrix with regard to traffic classes. (See slide 11 of the Newman presentation.) A question was offered regarding provisions in the draft for overhead characterization when the DUT is under attack. David replied that the matrix helps to provide for that type of characterization.

David said that he would make the appropriate changes to the draft and attempt to get it in shape for an AD review in December.
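To make the PDU-as-a-unit idea concrete, the following minimal sketch (hypothetical; the names and structure are illustrative and appear in no draft) shows a counter parameterized by a tester-defined PDU, so that the same traffic can be totaled in PDUs, octets, or bits:

    # Hypothetical sketch; the draft defines no such API. It illustrates the
    # point raised in the session: the tester, not the document, decides what
    # a PDU is, and counting then falls out in PDUs, octets, or bits.
    from dataclasses import dataclass

    @dataclass
    class PduCounter:
        # e.g. "IP-over-ATM" or "TCP-over-IP-over-Ethernet"; the definition
        # of the unit is a test realization left to the tester.
        pdu_name: str
        pdus: int = 0
        octets: int = 0

        def observe(self, pdu_octets: int) -> None:
            """Record one tester-defined PDU of the given size."""
            self.pdus += 1
            self.octets += pdu_octets

        @property
        def bits(self) -> int:
            return self.octets * 8

    # The same capture could be totaled against different PDU definitions.
    counter = PduCounter(pdu_name="TCP-over-IP-over-Ethernet")
    for size in (64, 576, 1500):
        counter.observe(size)
    print(counter.pdu_name, counter.pdus, counter.octets, counter.bits)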
2. Terminology for Cell/Call Benchmarking. (J. Dunn/C. Martin)

Jeff Dunn was introduced, and he immediately introduced Cynthia Martin as his co-conspirator in trying to pull the draft together. Jeff gave an insightful, yet entertaining, presentation on the history, purpose, and status of the Cell/Call Benchmarking draft. (Presentation included in the Proceedings.)

Jeff summarized the draft's motivation as providing metrics for NBMA technologies. Jeff cautioned that while many technologies follow "connection-oriented" paradigms (e.g., ATM and Frame Relay), these technologies may use clearly different components to achieve the same end. Moreover, Jeff reinforced the need not to "re-invent" new terms for known concepts but to leverage existing concepts to build understanding. Because of the varied contexts in which the subject matter could appear, Jeff requested the BMWG's help in populating the draft.

Jeff then presented the revised workplan with respect to the draft. The group had no issues. Jeff then stated one of the unwritten goals of the draft: to characterize the effect of good and poor throughput on higher layer functions. Jeff also said that the draft may need to address SONET in the future. With that, Jeff opened the floor for comments.

Dubray suggested that the "unwritten" be written into the draft (i.e., characterize the impact of lower layers on higher layer performance).

An attendee offered the question: with the interest in IP over SONET and its associated scrambling issues, how much related effort will go into this draft? Jeff replied that he thought addressing IP over SONET merited investigation, especially as scrambling may have a specific effect. He further pointed out that such investigation MUST be limited to characterization, as opposed to a direct determination of whether the IUT were scrambling correctly or not.

David Newman stated that it was nice and clean to say "security considerations: none," but is there traffic that could impair performance? Harald Alvestrand didn't think normal benchmarking presented a corresponding security issue. Dunn agreed, reminding folks that the context of the draft's benchmarks was a clinical exercise. Moreover, he believed, the traffic content was more a methodological detail.

Someone wondered at what level references to "connection" pertained. Is "connection" the same for Frame Relay as it is for ATM? In general, isn't the term "connection" very layer centric? Jeff retorted that, yes, the draft is very layer specific, but he thought it was appropriate to draw the required relationships when and where it made sense. (A sketch of these technology-specific notions of "connection" follows this section.)

With no further questions, Jeff thanked the group for its input and asked them to continue the discussion on the mailing list.
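As a hypothetical illustration of the "connection is layer centric" point (the types below are the illustration's own, not the draft's): a Frame Relay connection is identified by a DLCI, while an ATM connection is identified by a VPI/VCI pair, so a benchmark phrased purely in terms of "connections" must still bind to one of these technology-specific notions before a measurement is well defined.

    # Hypothetical sketch, not from the draft: "connection" resolves to
    # different identifiers in different technologies.
    from dataclasses import dataclass
    from typing import Union

    @dataclass(frozen=True)
    class FrameRelayConnection:
        dlci: int                 # Data Link Connection Identifier

    @dataclass(frozen=True)
    class AtmConnection:
        vpi: int                  # Virtual Path Identifier
        vci: int                  # Virtual Channel Identifier

    Connection = Union[FrameRelayConnection, AtmConnection]

    def describe(conn: Connection) -> str:
        # Render the generic term in its technology-specific form.
        if isinstance(conn, FrameRelayConnection):
            return f"Frame Relay connection, DLCI {conn.dlci}"
        return f"ATM connection, VPI/VCI {conn.vpi}/{conn.vci}"

    print(describe(FrameRelayConnection(dlci=100)))
    print(describe(AtmConnection(vpi=0, vci=32)))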
3. Benchmarking Methodology for LAN Switching Devices. (B. Mandeville)

Word was passed to the chair that the editor was sitting in a plane on the tarmac in a neighboring state. Dubray asked whether people thought it would be productive to start a discussion of the draft without one of its authors. The group thought that it would be beneficial. Dubray then attempted to lead a discussion on the first methodology draft for LAN switching devices.

It was mentioned that the first paragraph of the draft's introduction states that the document defines a set of tests. Shouldn't the document define the test methodologies specific to the tests defined in RFC 2285? More generally, many thought the procedural descriptions lacked the specificity required to generate consistent results. To the point: would two people executing the same stated procedure get similar results on the same tested platform (i.e., as a function of methodology)?

Another person questioned the generally prescribed reporting format. It was communicated that a more structured approach might be needed. Another attendee asked: if we require structured reporting formats, why not just cite ISO 9646?

There were several comments with respect to addressing and frames. One attendee asked where the draft states the method for determining how packets are destined to specific ports. Another questioned the wisdom of validating frame delivery based on address scrutiny alone; it would be better to have an independent validity check, such as a tag embedded in the frame.

Another person noted that several rates are identified throughout the draft. How are these rates calculated? What source(s) specify the calculations? (One common derivation is sketched at the end of this section.) A similar comment was received regarding burst size.

One person took issue with priming the device's address tables prior to a test run, as advocated in section 3, Test Set-up.

Another person identified inconsistencies with respect to spanning tree operation, this draft, and previous BMWG work. For example, RFCs 1242 and 1944 require the Spanning Tree Protocol to be enabled; this draft's section 3 suggests disabling Spanning Tree operation; and section 5.9 makes provisions for spanning tree operations. David Newman offered that this is most likely based on experience that not all IUTs allow Spanning Tree operation to be enabled or disabled. Another voice asked how one could form a basis for fair comparison if spanning tree was on in one DUT but not in another.

On the topic of address learning rate, the question of consistency of results was raised. It was thought that the stated methodology was not adequate for the determination of a known state, thereby compromising consistency. Deborah Stopp believed that flushing the tables would go a long way toward getting to a known state. It was generally agreed that getting the DUT to a known state was beneficial and that the procedures relative to the attainment of a known state were lacking in the current methodological descriptions.

On the monitoring of flooded traffic, a question was raised as to whether flooded traffic is counted as a good or bad event. Kim Martin thought it useful to report offered load and forwarding rates with respect to frame size.

It was thought that the draft occasionally presents definitive (and potentially questionable) conclusions that have no place in a document defining test methodology. It was thought that the document would be better served by defining input parameters, test procedures, and test outputs.

A question was raised asking why the draft was released with so many sections marked "to be done." The chair responded that the authors had acted on his request: the metrics defined were modular enough to be addressed individually. Moreover, the scope of the draft was discrete enough (having been defined by RFC 2285) that the approach of garnering commentary in a piecemeal fashion was not unsound.

The chair also responded to a charge made on the BMWG mailing list that the draft was a vehicle for a vendor-specific implementation: the chair thought ad hominem attacks were counterproductive; the presentation of alternatives to questionable metrics and methods was by far more productive. By having the BMWG choose the best _presented_ solution, the networking community and its vendors would be best served.
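On the question of how the draft's rates might be calculated, one commonly used derivation for Ethernet is sketched below. This is an assumption of the illustration, not something the draft specifies: the theoretical maximum frame rate follows from the line rate and the fixed per-frame wire overhead of the preamble and interframe gap.

    # Sketch of one common derivation; the draft itself cites no formula.
    PREAMBLE_AND_SFD_OCTETS = 8    # preamble + start-of-frame delimiter
    INTERFRAME_GAP_OCTETS = 12     # mandatory idle time, expressed in octets

    def max_frame_rate(line_rate_bps: int, frame_octets: int) -> float:
        """Theoretical maximum frames per second for a given frame size."""
        wire_octets = frame_octets + PREAMBLE_AND_SFD_OCTETS + INTERFRAME_GAP_OCTETS
        return line_rate_bps / (wire_octets * 8)

    # 10 Mb/s Ethernet: yields the familiar 14,880 frames/s for 64-octet frames.
    for size in (64, 512, 1518):
        print(size, round(max_frame_rate(10_000_000, size), 2))
    # -> 64: 14880.95, 512: 2349.62, 1518: 812.74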
Dubray postponed discussion of Latency issues until a later date.

With that, Dubray summarized the BMWG goals for the next session:

1. Publish the Multicast Benchmarking Terminology draft to reflect the AD review. ADs to forward the draft to the IESG.
2. Resolve the outstanding issues for the Firewall benchmarking terminology draft. Prepare the draft for AD review in December.
3. Lead discussion of issues for the Benchmarking Methodology for LAN Switching Devices draft on the BMWG mailing list. Revise and reissue.
4. Issue and discuss the first draft on Latency Benchmarking Terminology.