CURRENT MEETING REPORT

Minutes of the Common Indexing Protocol Working Group (FIND)

Reported by Patrik Faltstrom, Bunyip

The FIND Working Group met for the first time at the 34th IETF. Patrik Faltstrom chaired the meeting and began by reviewing the charter. It was pointed out that the Working Group's e-mail address is "find@bunyip.com" and no other.

The charter states the Working Group's goal as defining one, and only one, common indexing protocol that all directory services can use to pass indexing information. Patrik acknowledged that the work so far has been aimed at WHOIS++, and said he is depending on the group for help in making it work across directory protocols.

There are currently two drafts that came out of the WNILS Working Group: one on the Common Indexing Protocol, by Chris Weider, and one on the WHOIS++ mesh, by Patrik. Patrik intends the second version to include LDAP and PH.

In the CIP, directory information is indexed by having each leaf server supply its information to an indexing server (the centroid). When the indexing server receives a query, it can prune the branches of the tree that hold no matching information.
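The pruning step can be pictured with a small sketch. It assumes, for illustration only, that each leaf server's contribution is a plain set of lowercase tokens. It also assumes a query is forwarded only to servers whose set contains every query token. The server handles and tokens are hypothetical, and this is not the CIP wire format.

```python
"""Minimal sketch of centroid-based query pruning (illustrative assumptions)."""

from typing import Dict, List, Set


def prune(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the (hypothetical) server handles worth forwarding the query to.

    A server whose token set lacks any query token is pruned, because it
    cannot hold a record containing all of the query's tokens.
    """
    tokens = set(query.lower().split())
    return sorted(handle for handle, centroid in index.items()
                  if tokens <= centroid)


if __name__ == "__main__":
    # Hypothetical token sets collected by an indexing server.
    index = {
        "leaf-a": {"jane", "doe", "example.com"},
        "leaf-b": {"tim", "smith", "example.org"},
        "leaf-c": {"john", "doe", "example.com"},
    }
    print(prune(index, "Doe"))         # ['leaf-a', 'leaf-c']
    print(prune(index, "tim smith"))   # ['leaf-b']
    print(prune(index, "jane smith"))  # [] : every branch is pruned
```

The check is conservative in one direction only. A pruned server certainly lacks a match. A server that is kept may still turn out to have none, because its tokens can come from different records.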
(Note that the examples are in WHOIS++.)

The leaf server sends the DATA-CHANGED command:

   # DATA-CHANGED
      Version-number:
      Time-of-latest-centroid-change:
      Time-of-message-generation:
      Server-handle:
      Host-Name:
      Host-Port:
      Best-time-to-poll:
      Authentication-type:
      Authentication-data:
   # END

The centroid (indexing server) uses the Best-time-to-poll value to decide when to send a POLL command:

   # POLL
      Version-number:
      Type-of-poll:
      Poll-scope:
      Start-time:
      End-time:
      Template:
      Field:
      Server-handle:
      Host-Name:
      Host-Port:
      Hierarchy:
      Description:
      Authentication-type:
      Authentication-data:
   # END

The polled server sends back a CENTROID-CHANGES response:

   # CENTROID-CHANGES
      Version-number:
      Start-time:
      End-time:
      Server-handle:
      Case-sensitive:
      Authentication-type:
      Authentication-data:
      Compression-type:
      Size-of-compressed-data:
      Operation:
   # BEGIN TEMPLATE
      Template:
      Any-field:
   # BEGIN FIELD
      Field:
      Data:
   # END FIELD
   # END TEMPLATE
   # END CENTROID-CHANGES

Both the template and field blocks are repeatable. At present only transfers of the whole centroid are supported. The centroid is case-insensitive, uses the ISO 8859-1 character set, and is tokenized on white space and "@".

More information about the CIP is available at:

   http://www.bunyip.com/products/digger

Someone asked why this is needed when X.500 already has replication. The answer given was that it provides a base for the future: X.500 does not offer indexing, nor does it provide common indexing for all protocols. This model is also used for URN-to-URC resolution at Georgia Tech, and it may allow for Web indexing. Chris Weider pointed out that it will allow "a thousand flowers to bloom", a reference to the multiplicity of directory protocols becoming available. Asked about protocols other than WHOIS++, Patrik replied that he does not believe there will be any problem handling their indexing information.
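The centroid rules stated above cover three points: whole-centroid transfer, case-insensitive tokens, and tokenization on white space and "@". The sketch below applies those rules to a few records. It is a simplified illustration, not the CIP format. The record layout, the "USER" template name, the sample data and the one-token-per-"Data:" rendering are all assumptions, and the header fields of CENTROID-CHANGES are omitted.

```python
"""Sketch: building and rendering a WHOIS++-style centroid (simplified)."""

import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Dict[str, str]                    # field name -> value
Centroid = Dict[str, Dict[str, Set[str]]]  # template -> field -> tokens


def tokenize(value: str) -> Set[str]:
    """Split on white space and '@', fold to lower case, drop empty tokens."""
    return {tok.lower() for tok in re.split(r"[\s@]+", value) if tok}


def build_centroid(records: Iterable[Tuple[str, Record]]) -> Centroid:
    """Collect the distinct tokens seen in each field of each template."""
    centroid: Centroid = defaultdict(lambda: defaultdict(set))
    for template, record in records:
        for field, value in record.items():
            centroid[template][field] |= tokenize(value)
    return centroid


def render(centroid: Centroid) -> str:
    """Render the template/field nesting of a CENTROID-CHANGES body.

    Assumption: one token per 'Data:' line; header fields are omitted.
    """
    lines = ["# CENTROID-CHANGES"]
    for template in sorted(centroid):
        fields = centroid[template]
        lines += ["# BEGIN TEMPLATE", f"Template: {template}"]
        for field in sorted(fields):
            lines += ["# BEGIN FIELD", f"Field: {field}"]
            lines += [f"Data: {tok}" for tok in sorted(fields[field])]
            lines.append("# END FIELD")
        lines.append("# END TEMPLATE")
    lines.append("# END CENTROID-CHANGES")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical leaf-server records.
    records = [
        ("USER", {"Name": "Jane Doe", "Email": "jdoe@example.com"}),
        ("USER", {"Name": "John Doe", "Email": "JDoe@example.com"}),
    ]
    print(render(build_centroid(records)))
```

Because tokens are case-folded and deduplicated, "jdoe" and "JDoe" collapse into a single entry. This is what keeps a centroid much smaller than the data it summarizes.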
Patrik was also asked whether the Working Group should survey existing indexing schemes. He replied that he was looking for volunteers to do so.

There was a brief semantic discussion on whether this is an indexing protocol or a protocol for exchanging the data used to build an index. Patrik would like a common format for the index if possible. Each directory service would pass its index to the centroid, which would index that index. In fact, the index would be indexed again at each level of the tree. Issues to study include the trade-offs between the number of levels and the size of the indexes. Many factors are involved: the data itself, the reduction of indexes, and geography.

Tim Howes of the University of Michigan gave a presentation on a program he has written called centipede. Centipede connects over LDAP to a directory, which tells it what information to produce for the centroid index. Centipede produces the index, then connects to the target with the references and uses LDAP to install the index in the entry. It generates distinct values (whole names rather than tokens), which it passes up the tree. This larger index allows more precise searches and pruning. Tim gave the following URL for more information:

   http://www.umich.edu/~rsug/ldap

CIP has the advantage that a server knows who is indexing it; with centipede, it does not.

Chris Weider reminded the group that all of this was experimental and asked the group to think about what sorts of indexing information would be useful. The group identified the following issues:

   o Character sets
   o Tokenization algorithm
   o Legal issues
   o How to specify partial centroids
   o New server-to-ask records
   o Schema translations
   o Query results
   o Protocol issues
   o Security
   o Replication
   o Dealing with replicated data
   o Polling cycle detection

The group then focused on which issues might be simplest to address. These might be:

   o Common format and schema translation
   o Overall model
   o Given the name of a company, return a domain name
   o Index data stored in WHOIS, RWHOIS, WHOIS++, and X.500
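The difference between a token index and centipede's distinct whole values can be shown with a small example. In this sketch the server handles, the names, and the exact-full-name query are hypothetical. Both index kinds are modeled as simple per-server sets, which is an assumption rather than either system's real data structure. The sketch also includes a function for a parent level that "indexes the index" by taking the union of its children's entries.

```python
"""Sketch: token index vs. distinct-value index, plus one parent level."""

from typing import Dict, List, Set

# Hypothetical directory contents of two leaf servers.
DATA: Dict[str, List[str]] = {
    "leaf-a": ["Jane Smith", "Tim Brown"],
    "leaf-b": ["Tim Smith"],
}


def token_index(data: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Per-server set of lowercase tokens (names split on white space)."""
    return {server: {tok for name in names for tok in name.lower().split()}
            for server, names in data.items()}


def value_index(data: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Per-server set of whole lowercase names (distinct values)."""
    return {server: {name.lower() for name in names}
            for server, names in data.items()}


def servers_to_ask_tokens(index: Dict[str, Set[str]], query: str) -> List[str]:
    tokens = set(query.lower().split())
    return sorted(s for s, toks in index.items() if tokens <= toks)


def servers_to_ask_values(index: Dict[str, Set[str]], query: str) -> List[str]:
    wanted = " ".join(query.lower().split())
    return sorted(s for s, vals in index.items() if wanted in vals)


def index_subtree(children: Dict[str, Set[str]]) -> Set[str]:
    """A parent level indexes its children's index: the union of their entries."""
    merged: Set[str] = set()
    for entries in children.values():
        merged |= entries
    return merged


if __name__ == "__main__":
    # Tokens: leaf-a holds "tim" and "smith" in different records,
    # so it cannot be pruned.
    print(servers_to_ask_tokens(token_index(DATA), "Tim Smith"))  # ['leaf-a', 'leaf-b']
    # Whole values: only the server holding "tim smith" is asked.
    print(servers_to_ask_values(value_index(DATA), "Tim Smith"))  # ['leaf-b']
    top = {"subtree-1": index_subtree(value_index(DATA))}
    print(servers_to_ask_values(top, "Tim Smith"))  # ['subtree-1']
```

This is the trade-off noted in the minutes. Whole values prune more precisely. However, they do not collapse shared tokens, so the index passed up the tree generally grows larger.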
The group agreed that the plan of attack should be:

   1) Overall model
   2) Schema translation

Both Joann Ordille and Roland Hedberg have experience with schema translation that will be useful to the group. It was suggested that a registry of schemas, with their descriptions and capabilities, would be needed.

The group also asked what happens when the search result lives on a WHOIS++ server but the client speaks only LDAP. Proxies were suggested as one solution. Another would be to return URLs containing queries that could be handed to a server.

The group then elected to defer engineering discussions to the mailing list, and Patrik adjourned the meeting.
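A schema-translation registry of the kind suggested at the meeting might look like the following sketch, as a proxy could use it. Everything here is hypothetical and not something the group agreed on. That includes the registry layout, the schema names "whois++:USER" and "ldap:example-person", and the field-to-attribute mapping. Only cn, mail and telephoneNumber are real LDAP attribute names.

```python
"""Sketch of a hypothetical schema-translation registry for a proxy."""

from typing import Dict, Tuple

# Registry: (source schema, target schema) -> {source field: target attribute}.
REGISTRY: Dict[Tuple[str, str], Dict[str, str]] = {
    ("whois++:USER", "ldap:example-person"): {
        "Name": "cn",
        "Email": "mail",
        "Phone": "telephoneNumber",
    },
}


def translate(record: Dict[str, str], source: str,
              target: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (translated attributes, fields with no registered mapping)."""
    mapping = REGISTRY.get((source, target))
    if mapping is None:
        raise KeyError(f"no translation registered from {source} to {target}")
    translated: Dict[str, str] = {}
    unmapped: Dict[str, str] = {}
    for field, value in record.items():
        if field in mapping:
            translated[mapping[field]] = value
        else:
            unmapped[field] = value
    return translated, unmapped


if __name__ == "__main__":
    record = {"Name": "Jane Doe", "Email": "jdoe@example.com", "Handle": "JD1"}
    print(translate(record, "whois++:USER", "ldap:example-person"))
    # ({'cn': 'Jane Doe', 'mail': 'jdoe@example.com'}, {'Handle': 'JD1'})
```

Returning the unmapped fields rather than silently dropping them lets a proxy tell the LDAP client that part of the WHOIS++ result had no counterpart in the target schema. This ties into the "capabilities" part of the suggested registry.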