Greenstone tutorial exercise
Open Archives Initiative (OAI) collection
This exercise explores service-level interoperability using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). So that you can do this on a stand-alone computer, we do not actually connect to the external server that is acting as the data provider. Instead we have provided an appropriate set of files that take the form of XML records produced by OAI-PMH.
One of Greenstone's documented example collections is sourced over OAI. This exercise takes you through the steps necessary to reconstruct it. (Note: this example is a collection of images: you will not be able to build it unless ImageMagick is installed on your computer.) You may wish to take a look at the documented example collection OAI demo now to see what this exercise will build.
-
Start a new collection called OAI Service Provider. Fill out the fields with appropriate information. You can leave the default metadata set as Dublin Core, although we do not make use of it.
-
In the Gather panel, locate the folder sample_files → oai → sample_small → oai. Drag this folder into the collection and drop it there.
-
During the copy operation, a popup window appears asking whether to add OAIPlug to the list of plug-ins used in the collection, because the Librarian Interface has not found an existing plug-in that can handle this file type. Press the <Add Plugin> button to include it.
The files for this collection consist of a set of images (in JCDLPICS → srcdocs) and a set of OAI records (in JCDLPICS) which contain metadata for the images.
When files are copied across like this, the Librarian Interface studies each one and uses its filename extension to check whether the collection contains a corresponding plug-in. No plug-in in the list is capable of processing the OAI file records that are copied across (they have the file extension .oai), so the Librarian Interface prompts you to add the appropriate plug-in.
Sometimes there is more than one plug-in that could process a file—for example, the .xml extension is used for many different XML formats. The popup window, therefore, offers a choice of all possible plug-ins that matched. It is normally easy to determine the correct choice. If you wish, you can ignore the prompt (click <Don't Add Plugin>), because plug-ins can be added later, in the Document Plugins section of the Design panel.
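The extension check described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not the Librarian Interface's actual code: the lookup table, the function names and the two "HypotheticalXMLPlug" entries are invented here, and only OAIPlug and ImagePlug are names taken from this exercise.

```python
import os

# Hypothetical table mapping plug-in names to the extensions they accept.
# Only OAIPlug and ImagePlug come from the exercise; the XML entries are
# invented to show how one extension can match several plug-ins.
PLUGIN_EXTENSIONS = {
    "OAIPlug": {".oai"},
    "ImagePlug": {".jpg", ".jpeg", ".gif", ".png"},
    "HypotheticalXMLPlugA": {".xml"},
    "HypotheticalXMLPlugB": {".xml"},
}


def candidate_plugins(filename, table=PLUGIN_EXTENSIONS):
    """Return every plug-in whose extension list matches the file."""
    extension = os.path.splitext(filename)[1].lower()
    return sorted(name for name, exts in table.items() if extension in exts)


def plugins_to_prompt_for(filename, collection_plugins):
    """Return the plug-ins to offer in a prompt, or [] if one is already present."""
    candidates = candidate_plugins(filename)
    if any(name in collection_plugins for name in candidates):
        return []  # the collection can already handle this file
    return candidates


if __name__ == "__main__":
    print(plugins_to_prompt_for("01dla14.oai", ["ImagePlug"]))
    print(plugins_to_prompt_for("01dla14.jpg", ["ImagePlug"]))
    print(candidate_plugins("records.xml"))
```

In this sketch an empty result means no prompt is needed, a single name corresponds to the OAIPlug prompt in this exercise, and several names correspond to the case where the popup offers a choice.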
-
You need to configure the image plug-in. In the Design panel, select the Document Plugins section, then select the plugin ImagePlug line and click <Configure Plugin...>. In the resulting popup window locate the screenviewsize option, switch it on, and type the number 300 in the box beside it to create a screen-view image of 300 pixels. Click <OK>.
-
Now switch to the Create panel and build and preview the collection.
OAIPlug will process the OAI records, and assign metadata to the images, which are processed by ImagePlug.
Like other collections we have built by relying on Greenstone defaults, the end result is passable but can be improved. The next steps refine the collection using the metadata harvested by OAI-PMH into the .oai files.
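To make the harvested metadata more concrete, here is a minimal Python sketch that reads Dublin Core fields out of an OAI-PMH style record. The sample record is invented for illustration (its identifier and field values are assumptions, and the real .oai files may be laid out differently), and the function is not how OAIPlug itself is implemented. Only the XML namespaces are the standard OAI-PMH and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

# Invented sample record; the real .oai files in this exercise may differ.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:01dla14</identifier>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>01dla14</dc:title>
      <dc:subject>Conference photographs</dc:subject>
      <dc:description>Delegates at the opening session</dc:description>
    </oai_dc:dc>
  </metadata>
</record>"""

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def extract_dublin_core(xml_text):
    """Collect Dublin Core elements as {"Title": [...], "Subject": [...], ...}."""
    prefix = "{" + DC_NAMESPACE + "}"
    metadata = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith(prefix):
            # Strip the namespace and capitalize, e.g. "subject" -> "Subject".
            name = element.tag[len(prefix):].capitalize()
            metadata.setdefault(name, []).append((element.text or "").strip())
    return metadata


if __name__ == "__main__":
    for name, values in extract_dublin_core(SAMPLE_RECORD).items():
        print(name, values)
```

Values are kept in lists because a Dublin Core element may be repeated within one record. The Subject and Description fields gathered this way are the kind of metadata the following steps use for classifiers and indexes.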
-
In the Browsing Classifiers section of the Design panel, delete the two AZList classifiers (ex.Title and ex.Source).
-
Add an AZCompactList classifier based on ex.Subject metadata.
-
Now add an AZCompactList classifier based on ex.Description metadata. In its configuration panel set mingroup to 2, mincompact to 1, maxcompact to 10 and buttonname to Captions.
Setting mingroup to 2 will mean that two or more documents with the same description will be grouped into a bookshelf; the default mingroup of 1 means that every document will get a bookshelf. mincompact and maxcompact control how many documents are grouped into each section of the horizontal A-Z list. In this case, each group can have as few as one document, and no more than ten.
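The effect of mingroup can be sketched in a few lines of Python. This is a simplified model under stated assumptions rather than the classifier's real implementation: the function name, the tuple layout and the sample documents are invented, and the splitting of the A-Z list controlled by mincompact and maxcompact is deliberately left out.

```python
from collections import defaultdict


def group_into_bookshelves(documents, mingroup=1):
    """Group (doc_id, description) pairs the way mingroup is described above.

    Descriptions shared by at least `mingroup` documents become a single
    bookshelf entry; the remaining documents are listed individually.
    """
    by_description = defaultdict(list)
    for doc_id, description in documents:
        by_description[description].append(doc_id)

    entries = []
    for description in sorted(by_description):
        doc_ids = by_description[description]
        if len(doc_ids) >= mingroup:
            entries.append(("bookshelf", description, doc_ids))
        else:
            entries.extend(("document", description, [d]) for d in doc_ids)
    return entries


if __name__ == "__main__":
    # Invented sample data for illustration only.
    docs = [("d1", "Poster session"), ("d2", "Poster session"), ("d3", "Keynote")]
    for entry in group_into_bookshelves(docs, mingroup=2):
        print(entry)
```

With mingroup set to 2, the two "Poster session" documents share a bookshelf and "Keynote" stays a plain document; with the default of 1, both descriptions would get a bookshelf.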
-
In the Search Indexes section of the Design panel, delete all indexes and add a new one called "captions" based on ex.Description metadata.
-
Build the collection and preview it.
Tweaking the presentation with format statements
-
In the Design panel, select Format Features. First replace the VList format statement with this:
<td>
{If}{[numleafdocs],[link][icon][/link],[link][thumbicon][/link]}
</td>
<td valign=middle>
{If}{[numleafdocs],[Title],<i>[Description]</i>}
</td>
You will find this text in the file vlist_tweak.txt in the sample_files → oai → format_tweaks folder. Remember to click <Replace Format> when finished.
This format statement customizes the appearance of vertical lists such as the search results and captions lists to show a thumbnail icon followed by Description metadata. Greenstone's default is to use extracted metadata, so [Description] is the same as [ex.Description].
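The branching in the VList statement can be modelled roughly in Python. This sketch assumes that {If} picks its first branch when the tested metadata ([numleafdocs] here) is non-empty, which is the case for bookshelf nodes, and its second branch otherwise. The function name, the dictionary layout and the placeholder icon strings are invented for illustration; Greenstone's runtime does not work on Python dictionaries.

```python
from html import escape


def render_vlist_row(node):
    """Render one vertical-list row, mirroring the two {If} tests above."""
    # A non-empty numleafdocs value marks a bookshelf (internal node).
    is_bookshelf = bool(node.get("numleafdocs"))

    # First cell: bookshelf icon for groups, thumbnail for documents.
    # The strings stand in for [link][icon][/link] and [link][thumbicon][/link].
    icon = "bookshelf-icon" if is_bookshelf else "thumbnail-icon"

    # Second cell: plain Title for groups, italic Description for documents.
    if is_bookshelf:
        label = escape(node.get("Title", ""))
    else:
        label = "<i>" + escape(node.get("Description", "")) + "</i>"

    return "<td>" + icon + "</td><td valign=middle>" + label + "</td>"


if __name__ == "__main__":
    print(render_vlist_row({"numleafdocs": "3", "Title": "Poster session"}))
    print(render_vlist_row({"Description": "Keynote"}))
```

The same test is applied twice, once per table cell, which is why the format statement contains two {If} clauses with the same condition.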
-
Next, select DocumentHeading from the Choose Feature pull-down list and change its format statement to:
<h3>[Subject]</h3>
Click <Replace Format>.
The document heading appears above the DETACH and NO HIGHLIGHTING buttons when you get to a document in the collection. By default DocumentHeading displays the document's ex.Title metadata. In this particular set of OAI exported records, titles are filenames of JPEG images, and the filenames are particularly uninformative (for example, 01dla14). You can see them in the Enrich panel if you select an image in oai → JCDLPICS → srcdocs and check its ex.Source and ex.Title metadata. The above format statement displays ex.Subject metadata instead.
-
Finally, you will have noticed that where the document itself should appear, you see only "This document has no text." To rectify this, select DocumentText in the Choose Feature pull-down list and use the following as its format statement (this text is in doctxt_tweak.txt in the format_tweaks folder mentioned earlier):
<center><table width=_pagewidth_ border=1>
<tr><td colspan=2 align=center>
<a href=[OrigURL]>[screenicon]</a></td></tr>
<tr><td>Caption:</td><td> <i>[Description]</i> <br>
(<a href=[OrigURL]>original [ImageWidth]x[ImageHeight] [ImageType] available</a>)
</td></tr>
<tr><td>Subject:</td><td> [Subject]</td></tr>
<tr><td>Publisher:</td><td> [Publisher]</td></tr>
<tr><td>Rights:</td><td> [Rights]</td></tr>
</table></center>
Click <Replace Format>.
This format statement alters how the document view is presented. It includes a screen-sized version of the image that hyperlinks back to the original larger version available on the web. Factual information extracted from the image, such as width, height and type, is also displayed.
-
Format statements are processed by the runtime system, so the collection does not need to be rebuilt for these changes to take effect. Switch to the Create panel and press <Preview Collection> to see the changes.
To expedite building, this collection contains fewer source documents than the pre-built version supplied with the Greenstone installation. However, after these modifications, its functionality is the same.