Greenstone tutorial exercise

Back to wiki
Back to index
Prerequisite: A large collection of HTML files—Tudor
Devised for Greenstone version: 2.60
Modified for Greenstone version: 2.70w

Enhanced collection of HTML files—Tudor

We return to the Tudor collection and add metadata that expresses a subject hierarchy. Then we build a classifier that exploits it by allowing readers to browse the documents about Monarchs, Relatives, Citizens, and Others separately.

Adding hierarchically-structured metadata and a Hierarchy classifier

  1. Open up your tudor collection (the original version, not the webtudor version), switch to the Enrich panel and select the citizens folder (a subfolder of englishhistory.net → tudor). Set its dc.Subject and Keywords metadata to Tudor period|Citizens. The vertical bar ("|") is a hierarchy marker. Selecting a folder and adding metadata has the effect of setting this metadata value for all files contained in this folder, its subfolders, and so on. A popup alerts you to this fact. Click <OK> to close the popup.

  1. Repeat for the monarchs and relative folders, setting their dc.Subject and Keywords metadata to Tudor period|Monarchs and Tudor period|Relatives respectively. Note that the hierarchy appears in the Existing values for dc.Subject and Keywords area.

    If you don't want to see the popup each time you add folder level metadata, tick the Do not show this warning again checkbox; it won't be displayed again.

  1. Finally, select all remaining files—the ones that are not in the citizens, monarchs, or relative folders—by selecting the first and shift-clicking the last. Set their dc.Subject and Keywords metadata to Tudor period|Others: this is done in a single operation (there is a short delay before it completes).

    When multiple files are selected in the left hand collection tree, all metadata values for all files are shown on the right hand side. Items that are common to all files are displayed in black—e.g. dc.Subject and Keywords—which others that pertain to only one or some of the files are displayed in grey—e.g. any extracted metadata.

    Metadata inherited from a parent folder is indicated by a folder icon to the left of the metadata name. Select one of the files in the relative folder to see this.

  1. Switch to the Design panel and select Browsing Classifiers from the left-hand list. Set the menu item for Select classifier to add: to Hierarchy; then click <Add Classifier...>.

  1. A window pops up to control the classifier's options. Change the metadata to dc.Subject and Keywords and then click <OK>.

  1. For tidiness' sake, remove the classifier for Source metadata (included by default) from the list of currently assigned classifiers, because this adds little to the collection.

  1. Now switch to the Create panel, build the collection, and preview it. Choose the new Subjects link that appears in the navigation bar, and click the bookshelves to navigate around the four-entry hierarchy that you have created.

Adding a hierarchical phrase browser (PHIND)

Next we'll add an interactive hierarchical phrase browsing classifier to this collection.

  1. Switch to the Design panel and choose the Browsing Classifiers item from the left-hand list.

  1. Choose Phind from the Select classifier to add: menu. Click <Add Classifier...>. A window pops asking for configuration options: leave the values at their preset defaults (this will base the phrase index on the full text) and click <OK>.

  1. Build the collection again, preview it, and try out the new option in the navigation bar. An interesting PHIND search term for this collection is "king".

Partitioning the full-text index based on metadata values

Next we partition the full-text index into four separate pieces. To do this we first define four subcollections obtained by "filtering" the documents according to a criterion based on their dc.Subject and Keywords metadata. Then an index is assigned to each subcollection. This will enable users to restrict a search to a subset of the documents.

  1. Switch to the Design panel, and click Partition Indexes. This feature is disabled because you are operating in Librarian mode (this is indicated in the title bar at the top of the window).

  1. Switch to Library Systems Specialist mode by going to Preferences... (on the File menu) and clicking <Mode>. Read about the other modes too.

  1. Return to the Partition Indexes section of the Design panel. Ensure that the Define Filters tab is selected (the default). Define a subcollection filter with name monarchs that matches against dc.Subject and Keywords, and type Monarchs as the regular expression to match with. Click <Add Filter>. This filter includes any file whose dc.Subject and Keywords metadata contains the word Monarchs.

  1. Define another filter, relatives, which matches dc.Subject and Keywords against the word Relatives. Define a third and fourth, citizens and others, which matches it against the words Citizens and Others respectively.

  1. Having defined the subcollections, we partition the index into corresponding parts. Click the Assign Partitions tab. Select the first subcollection and give it the name citizens; click <Add Partition>. Repeat for the other three subcollections, naming their partitions monarchs, others and relatives.

    The order they appear in the Assigned Subcollection Partitions list os the order they will appear in the drop down menu on the search page. You can change the order by using the <Move Up> and <Move Down> buttons.

  1. Build and preview the collection.

  1. The search page includes a pulldown menu that allows you to select one of these partitions for searching. For example, try searching the relatives partition for mary and then search the monarchs partition for the same thing.

  1. To allow users to search the collection as a whole as well as each subcollection individually, return to the Partition Indexes section of the Design panel and select the Assign Partitions tab. Type all into the Partition Name: and select all four subcollections by checking their boxes. Click <Add Partition>.

  1. To ensure that the all index appears first in the list on the reader's web page, use the <Move Up> button to get it to the top of the list here in the Design panel. Then build and preview the collection.

  1. Search for a common term (like the) in all five index partitions, and check that the numbers of words (not documents) add up.

  1. Return to Librarian mode, using Preferences... (on the File menu).

Controlling the building process

Finally we look at how the building process can be controlled. Developing a new collection usually involves numerous cycles of building, previewing, adjusting some enrich and design features, and so on. While prototyping, it is best to temporarily reduce the number of documents in the collection. This can be accomplished through the maxdocs parameter to the building process.

  1. Switch to the Create panel and view the options that are displayed in the top portion of the screen. Select maxdocs and set its numeric counter to 3. Now build.

  1. Preview the newly rebuilt collection's Titles A-Z page. Previously this listed more than a dozen pages per letter of the alphabet, but now there are just three—the first three files encountered by the building process.

  1. Go back to the Create panel and turn off the maxdocs option. Rebuild the collection so that all the documents are included.