After an email request, we’ve developed a new plugin, MetadataXMPPlug, to extract Extensible Metadata Platform (XMP) information from PDFs. The plugin is very basic, but it should read most XMP information as long as the embedded metadata adheres to the RDF standard.
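To give a feel for what reading XMP involves, here is a minimal sketch in Python. It is not the plugin's code (Greenstone plugins are written in Perl), and the function name `extract_xmp` is made up for illustration. It assumes the XMP packet is stored uncompressed in the file, which is common for PDFs but not guaranteed, and it only handles simple properties and `rdf:li` lists.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def extract_xmp(data: bytes) -> dict:
    """Return {property name: [values]} from the first XMP packet in data.

    Hypothetical helper for illustration. Property names are in
    ElementTree's "{namespace}local" form.
    """
    match = PACKET_RE.search(data)
    if match is None:
        return {}  # no (uncompressed) XMP packet found
    root = ET.fromstring(match.group(1).strip())
    metadata = {}
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        # Simple properties may be written as attributes on rdf:Description;
        # skip RDF's own attributes such as rdf:about.
        for name, value in description.attrib.items():
            if not name.startswith(f"{{{RDF_NS}}}"):
                metadata.setdefault(name, []).append(value)
        # Other properties are child elements, either plain text or
        # an rdf:Seq / rdf:Bag / rdf:Alt container of rdf:li items.
        for prop in description:
            values = [li.text for li in prop.iter(f"{{{RDF_NS}}}li") if li.text]
            if not values and prop.text and prop.text.strip():
                values = [prop.text.strip()]
            if values:
                metadata.setdefault(prop.tag, []).extend(values)
    return metadata
```

Calling `extract_xmp(open("example.pdf", "rb").read())` on a PDF with embedded XMP would return a dictionary keyed by names such as `{http://purl.org/dc/elements/1.1/}title`; the file name here is only a placeholder.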
The neatest part of this project is that it makes use of Greenstone’s (relatively) new multipass import functionality. MetadataXMPPlug is used during the metadata_read pass to extract metadata from PDF files, while the files themselves are handled by PDFPlug during the later import pass.
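The two-pass idea can be sketched roughly as follows. This is a simplified illustration in Python, not Greenstone's actual plugin API: `run_import`, `xmp_metadata`, and `pdf_document` are hypothetical names, plugins are modelled as plain functions, and the rule that the first document plugin to accept a file handles it is an assumption of the sketch.

```python
def run_import(files, metadata_plugins, document_plugins):
    """Two-pass import: gather metadata for every file, then build documents."""
    collected = {}
    # Pass 1 (metadata_read): every metadata plugin sees every file and
    # contributes whatever metadata it can find. No file is consumed here.
    for name in files:
        for plugin in metadata_plugins:
            collected.setdefault(name, {}).update(plugin(name))

    # Pass 2 (import): the first document plugin that accepts a file
    # turns it into a document, with the pass-1 metadata attached.
    documents = []
    for name in files:
        for plugin in document_plugins:
            doc = plugin(name)
            if doc is not None:
                doc["metadata"] = collected.get(name, {})
                documents.append(doc)
                break
    return documents


# Hypothetical stand-ins for MetadataXMPPlug and PDFPlug.
def xmp_metadata(name):
    return {"dc.Title": "Example"} if name.endswith(".pdf") else {}


def pdf_document(name):
    return {"file": name, "type": "pdf"} if name.endswith(".pdf") else None
```

Because the metadata pass is separate, the XMP reader never has to know how a PDF becomes a document, and the PDF handler never has to know about XMP.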