Greenstone tutorial exercise
Bibliographic collection
This exercise looks at adding fielded searching to a collection. Fielded searching is best used for metadata-rich collections. Here we use bibliographic data in MARC format. We also "explode" the database, which makes the metadata editable in the Librarian Interface.
-
Start a new collection called Beatles Bibliography which will contain a collection of MARC records on the Beatles, from the US Library of Congress. Enter the requested information and base it on -- New Collection --. There is no need to include any metadata sets because the metadata extracted from the MARC records will appear as extracted metadata. Deselect the Dublin Core metadata set, and click <OK>.
A No Metadata Sets Selected warning message will pop up, alerting you to the fact that you won't be able to manually assign metadata to the collection. In this collection, all the metadata will come from the MARC file; click <OK> to continue. (If you don't want to see this popup again, tick the Do not show this warning again checkbox.)
-
In the Gather panel, open the sample_files → marc folder, drag locbeatles50.marc into the right-hand pane and drop it there. A popup window asks whether you want to add MARCPlug to the collection to process this file. Click <Add Plugin>, because this plugin will be needed to process the MARC records.
-
In the Document Plugins section of the Design panel, remove every plugin from TextPlug through NULPlug by selecting each one in the Currently Assigned Plugins list and clicking <Remove Plugin> (ZIPPlug, GAPlug and MARCPlug remain). It is not strictly necessary to remove these redundant plugins, but it is good practice to include only the plugins that are needed, to avoid unwanted (and unexpected) side effects.
-
Now select Browsing Classifiers within the Design panel and remove the default classifier for Source metadata.
In the Search Indexes section, remove the ex.Source index.
In this collection all records are from the same file, so Source metadata, which is set to the filename, is not particularly interesting or useful.
-
Switch to the Create panel, build the collection, and preview it. Browse through the Titles A-Z and view a record or two. Try searching—for example, find items that include rock music.
-
Back in the Librarian Interface, go to the Browsing Classifiers section of the Design panel. Select AZCompactList from the Select classifier to add: drop-down menu, and click <Add Classifier...>. In the popup window, select ex.Subject as the metadata item. Click <OK>.
AZCompactList is like AZList, except that terms that appear multiple times in the hierarchy are automatically grouped together and a new node, shown as a bookshelf icon, is formed.
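The grouping behaviour described above can be pictured with a short Python sketch. It is not Greenstone's implementation: the function name, the `min_group` threshold and the sample records are assumptions made purely for illustration.

```python
from collections import defaultdict

def compact_list(records, min_group=2):
    """Group record titles by subject term.

    Terms shared by at least `min_group` records become a "bookshelf"
    node holding those titles; any other term points straight at its
    single record. Illustrative only, not Greenstone's own code.
    """
    by_term = defaultdict(list)
    for title, subjects in records:
        for term in subjects:
            by_term[term].append(title)

    nodes = []
    for term in sorted(by_term, key=str.lower):
        titles = by_term[term]
        if len(titles) >= min_group:
            nodes.append(("bookshelf", term, sorted(titles)))
        else:
            nodes.append(("document", term, titles))
    return nodes

# Hypothetical sample data: (title, list of subject terms)
sample = [
    ("Record A", ["Rock music", "Beatles"]),
    ("Record B", ["Rock music"]),
    ("Record C", ["Rock musicians"]),
]
nodes = compact_list(sample)
```

In this sample, only "Rock music" is shared by more than one record, so it is the only term that becomes a bookshelf.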
-
Build the collection and preview the result.
Adding fielded searching
-
In the Design panel select Search Types from the left-hand list and activate the Enable Advanced Searches option.
-
Add form searching to the collection by selecting form in the Search Types: menu and clicking <Add Search Type>. Remove plain searching by selecting plain in the Currently Assigned Search Types list, and clicking <Remove Search Type>.
-
Build the collection once again, and preview the results. Notice that the collection's home page no longer includes a query box. (This is because the search form is too big to fit here nicely.) To search, you have to click Search for words in the navigation bar. Note that the PREFERENCES page has changed to control the advanced searching options.
-
Look at the search form in the collection. There are two fields that can be searched: text and Title. Add some more fields to search on by going back to the Librarian Interface.
-
In the Design panel, go to the Search Indexes section. Add an index on subjects by selecting ex.Subject from the Build index on: list (and deselecting anything already selected), and giving it a name in the Index Name: box, e.g. "Subject". Click <Add Index>.
-
Rebuild the collection and preview the results. Notice the extra fields in the ... in field drop-down menus in the search form. You can do quite complicated queries by searching for words in different fields at the same time.
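To illustrate what searching several fields at the same time means, here is a minimal Python sketch of a fielded query over in-memory records. It is only a model of the idea; Greenstone's own indexer does the real work, and the record contents and the `fielded_search` function are hypothetical.

```python
def fielded_search(records, query):
    """Return the records in which every queried field contains its word.

    `query` maps a field name (e.g. "Title") to a single search word.
    Matching is a case-insensitive whole-word test, and all of the
    queried fields must match for a record to be returned.
    """
    hits = []
    for record in records:
        if all(
            word.lower() in record.get(field, "").lower().split()
            for field, word in query.items()
        ):
            hits.append(record)
    return hits

# Hypothetical records with the two fields indexed in this exercise
records = [
    {"Title": "Beatles forever", "Subject": "Rock music"},
    {"Title": "Rock and roll history", "Subject": "Popular music"},
]
both = fielded_search(records, {"Title": "beatles", "Subject": "rock"})
```

Here `both` holds only the first record, because the second one does not mention "beatles" in its Title field even though "rock" appears in its title.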
Exploding the database
-
Go to the Enrich panel and try to see the metadata. It doesn't appear! This is because the metadata is associated with records inside the file, not the file itself.
Metadata database files, such as MARC, CDS/ISIS and BibTeX, can be imported into Greenstone, but their metadata cannot be viewed in the Librarian Interface. To edit any metadata you need to go back to the program that created the file.
Greenstone provides a new way of exploding a metadata database so that each record appears as an individual document, with viewable and editable metadata. This process is irreversible: once this step has been done, the database is deleted and can no longer be used in its original program.
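To see why the metadata belongs to records inside the file rather than to the file itself, it helps to look at how a MARC file is laid out. In the ISO 2709 exchange format used for MARC, records are simply concatenated: each one begins with a leader whose first five bytes give the record length as ASCII digits, and ends with the record terminator byte 0x1D. The sketch below splits raw bytes into records on that basis. The function name is an assumption, and this is not how MARCPlug is implemented.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte

def split_marc_records(data):
    """Split the raw bytes of a MARC file into individual records.

    Uses the record length stored in the first five bytes of each
    record's leader to find where the next record starts.
    """
    records = []
    pos = 0
    while pos + 5 <= len(data):
        length = int(data[pos:pos + 5])
        if length < 5:
            raise ValueError(f"implausible record length at byte {pos}")
        record = data[pos:pos + length]
        if not record.endswith(RECORD_TERMINATOR):
            raise ValueError(f"record at byte {pos} is not terminated")
        records.append(record)
        pos += length
    return records
```

Applied to the bytes of a file such as locbeatles50.marc (opened in binary mode), the function would return one entry per bibliographic record, each carrying its own metadata fields.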
-
In the Gather panel, you may notice that the MARC database has a different coloured icon to other files. This green icon indicates that a file is a metadata database that can be exploded. Right-click on the file and choose Explode Metadata Database from the menu. A new window opens, containing options for the exploding process. A description of each option can be obtained by hovering the mouse over the option.
Turn on the metadata_set option by checking its box. This option indicates which metadata set to explode the metadata into. The default set is the "Exploded Metadata Set"—a metadata set which initially has no elements in it, but will receive a new element for each metadata field retrieved from the database.
-
Click <Explode> to start the exploding process. This may take a short while, depending on the size of the database.
-
Once exploding has finished, the MARC database file will have been deleted, and a folder created in its place. This folder contains an empty file for each record in the original database. The metadata for these records can be viewed and edited by switching to the Enrich panel.
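The outcome of exploding can be modelled roughly as follows. This Python sketch creates one empty .nul file per record and keeps each record's metadata alongside the file name, prefixed with exp. as in the exploded metadata set. The file naming scheme, the function name and the returned dictionary are assumptions for illustration; Greenstone's own explode process decides the real names and how the metadata is stored.

```python
import tempfile
from pathlib import Path

def explode(records, target_dir):
    """Create one empty .nul file per record and return its metadata.

    `records` is a list of dicts mapping field names to values. The
    result maps each (hypothetical) file name to that record's
    metadata, with every field name prefixed by "exp.".
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    metadata_by_file = {}
    for number, record in enumerate(records, start=1):
        name = f"record{number:04d}.nul"
        (target / name).touch()  # empty placeholder document
        metadata_by_file[name] = {f"exp.{k}": v for k, v in record.items()}
    return metadata_by_file

# Try it in a throwaway directory with one hypothetical record
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "exploded"
    result = explode([{"Title": "Example record"}], folder)
    created = sorted(p.name for p in folder.iterdir())
```

The placeholder files carry no content of their own, which is why the plugin list has to change in the next step.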
-
Because the MARC file is no longer present, and the collection contains empty (.nul) files, we need to change the list of plugins. In the Document Plugins section of the Design panel, remove MARCPlug and add NULPlug (use the default configuration).
-
Rebuild and preview the collection. You will notice that the Titles A-Z classifier displays the filename rather than the record title, the Subjects classifier is empty, searching no longer returns any results, and the document display is useless.
Reformatting the collection to use the exploded metadata
The collection previously used extracted (ex.) metadata, but now it uses exploded (exp.) metadata. The classifiers and search indexes were built on ex. metadata, which is why they no longer work properly. There is also no longer any text in the documents. Previously, MARCPlug stored the raw record as the "text" of each record. Now that the metadata is in the Librarian Interface, there is no longer the concept of a raw record, and so there is no text. We need to modify the collection design to take note of these changes.
-
In the Search Indexes section, change the Title index to use exp.Title: select the Title index in the Assigned Indexes list. Deselect ex.Title in the Build index on: list, and select exp.Title. Click <Replace Index>.
-
Remove the ex.Subject index by selecting it in the Assigned Indexes list and clicking <Remove Index>. Add an index on exp.Subject: type "Subject" in the Index Name: field, select exp.Subject in the Build index on: list (making sure nothing else is selected), and click <Add Index>.
-
The text index is no longer any use, so remove that index too.
-
To enable combined searching across all indexes at once, tick the Add combined searching over all indexes (allfields) checkbox, enter an appropriate name in the Index Name: field (e.g. "All Fields"), then click <Add Index>. Move this to the top of the list using the <Move Up> and <Move Down> buttons, so that it appears first in the drop-down list. Click <Set Default Index> so that it becomes the default field for searching.
-
In the Browsing Classifiers section, change the Title AZList to use exp.Title metadata. Double click the Title AZList in the Currently Assigned Classifiers list, and change the metadata option to use exp.Title. Click <OK>. Do the same thing for the Subject AZCompactList.
-
In the Format Features section, select VList in the list of assigned format statements.
-
There is no dls or dc Title, so replace {Or}{[dls.Title],[dc.Title],[ex.Title],Untitled} with {Or}{[exp.Title],[ex.Title],Untitled}.
-
There are no source or thumb icons, so remove the second line: <td valign="top">[ex.srclink]{Or}{[ex.thumbicon],[ex.srcicon]}[ex./srclink]</td>.
-
The ex.Source metadata is set to the nul filename, so remove that from the display: remove {If}{[ex.Source],<br><i>([ex.Source])</i>}
The resulting format statement looks like:
<td valign="top">[link][icon][/link]</td>
<td valign="top">[highlight]
{Or}{[exp.Title],[ex.Title],Untitled}
[/highlight]</td>
Click <Replace Format>.
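The effect of the {Or} macro in this format statement, which uses the first listed item that has a value and falls back to the literal text otherwise, can be sketched in Python. The function below is an illustration only; its name and the sample metadata are hypothetical.

```python
def or_format(metadata, names, fallback):
    """Return the first non-empty metadata value, else the fallback text."""
    for name in names:
        value = metadata.get(name, "")
        if value:
            return value
    return fallback

# A hypothetical exploded record has exp.Title but no ex.Title
exploded = {"exp.Title": "A hypothetical title"}
shown = or_format(exploded, ["exp.Title", "ex.Title"], "Untitled")

# A record with neither field falls back to the literal text
missing = or_format({}, ["exp.Title", "ex.Title"], "Untitled")
```

Listing exp.Title first means exploded records show their own title, while ex.Title and "Untitled" remain as fallbacks.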
-
Clear the DocumentHeading format statement by selecting it in the list of assigned format statements, deleting the contents in the HTML Format String, and clicking <Replace Format>. The record Title will be displayed as part of the DocumentText format, so we don't need it here.
-
Next, edit the DocumentText format statement. Delete the contents and replace them with:
<table>
<tr><td>Title:</td><td>[exp.Title]</td></tr>
<tr><td>Subject:</td><td>[exp.Subject]</td></tr>
<tr><td>Publisher:</td><td>[exp.Publisher]</td></tr>
</table>
Remember to click <Replace Format>.
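As a rough picture of what this format statement does for each record, the Python sketch below fills the bracketed metadata names in the same table with a record's values. It assumes that a field with no value is rendered as empty text, and it handles only plain [name] placeholders; the `render` function and the sample record are hypothetical and stand in for Greenstone's own format processing.

```python
import re

TEMPLATE = """<table>
<tr><td>Title:</td><td>[exp.Title]</td></tr>
<tr><td>Subject:</td><td>[exp.Subject]</td></tr>
<tr><td>Publisher:</td><td>[exp.Publisher]</td></tr>
</table>"""

def render(template, metadata):
    """Replace each [name] placeholder with the record's value.

    Placeholders whose metadata is absent become empty strings
    (an assumption made for this sketch).
    """
    return re.sub(
        r"\[([\w.]+)\]",
        lambda match: metadata.get(match.group(1), ""),
        template,
    )

# Hypothetical record with no Publisher value
page = render(TEMPLATE, {"exp.Title": "Example title", "exp.Subject": "Rock music"})
```

Because the sample record has no exp.Publisher value, the Publisher row in `page` is left with an empty cell.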
-
The DETACH and NO HIGHLIGHTING buttons are not very useful for this collection, so let's get rid of them. Edit the DocumentButtons format statement, make it empty, and click <Replace Format>.
-
Rebuild and preview the collection. The classifiers should be back to normal, searching should now work, and there should be a nice record display.