HIST 6840 – Metadata in Archives

Hello future archivists!  Welcome.

For our demonstration on aggregated metadata, please fill out the following survey: https://forms.gle/FiKKND4JiVYFgMTQ6 

For access to the Prezi from today’s discussion, visit here: https://prezi.com/1axaaqjp-qcd/hist6840-metadata-in-research-and-archives/

The list of Archival and Digital resources presented in class is outlined below or the Google doc located here:  https://docs.google.com/document/d/10RaZueUjXtCD3rnjpTcCQDEJRS-KoBzTF1RhiakyWw8/edit?usp=sharing

Regional (Intermountain West) Digital Libraries

Aggregations of Digital Libraries

Aggregations of Archives (finding aids):

Standards

Cataloging Celebration Week

It’s Cataloging and Metadata Services celebration week at USU Libraries!  Take a look at this great interactive exhibit created by Andrea Payant, Becky Skeen, and Paul Daybell.  Then head over to USU Libraries’ Twitter and Instagram pages to participate and win prizes!

Twitter:

Instagram:

ARK de Triumph: Linking Finding Aids and Digital Libraries with Low-Tech Processes

Presentation given to the American Folklore Society: Archives and Library Section ConferenceAdvInArchiving2017_ArcDTriumph-DigitalLinking (1), Minneapolis, Minnesota, October 16, 2017

Presented by:

  • Andrea Payant, Metadata Librarian
  • Sara Skindelien, EAD and Metadata Specialist
  • Liz Woolcott, Head of Cataloging and Metadata Services

Abstract: 

As archives work to make their folklore and other special collections material available online, the segregation of finding aids and digital collections in silos presents a discovery and usability problem for patrons. Researchers discovering a finding aid for an oral history collection may not know that the recordings are digitized and available online in a separate database or vice versa. How can the workflows that feed these two different discovery points be combined to improve access to these valuable collections?

Utah State University (USU) Libraries realized that a cross-departmental workflow was necessary to connect both new and legacy finding aids with their digitized content to improve use and access. For legacy collections, the library created a low-tech process for semi-automating the batch linking of item and box level entries in EAD finding aids to the corresponding digitized material in CONTENTdm. For new collections, particularly born digital collections, the process enabled USU Libraries to simultaneously create a digital collection and a digitally linked finding aid all in one process.

In addition to the obvious benefit of linking finding aid inventories to their digital counterparts, this process implemented persistent identifiers that ensured the links would be consistent over time, even if databases were changed. The metadata staff minted ARKs identifiers for each object and crafted links between the finding aid and the digital library using those static ARKs instead of the system supplied URLs. This small, but important extra step will provide the flexibility to accommodate database migration in the future.

These processes used low-tech, familiar tools to lower the barrier for engaging more staff and student help in the work. This presentation will provide session participants with a discussion of the process, access to detailed step-by-step instructions, and will address future developments.

 

Serials MARC Cataloging Decision Making Workflow

The Cataloging and Metadata Services Unit at Utah State University ‘s Merrill-Cazier Library recently set out to map the workflows of the unit, including where material comes in, how it is handled, and where it goes after it leaves the unit.

Here is the workflow specific to the library’s serials work, created by our Special Collections and Serials cataloger Melanie Shaw.  These workflows  outline the procedures for serials that are cataloged in MARC for the integrated library system (ILS).

Previous workflow diagrams posted to this blog represented a high level overview of the work process for collections and materials, intended for non-catalogers.  This is the first in a series of workflows that will “zoom in” to show the step by step process of serials MARC cataloging.  Although it is more detailed, it is not meant to be exhaustive, but rather a visualization of the decision making process.

serials_cataloging_workflow_finalThis workflow diagram was made using LucidChart.

Electronic / Digital Resources MARC Cataloging Workflows

The Cataloging and Metadata Services Unit at Utah State University ‘s Merrill-Cazier Library recently set out to map the workflows of the unit, including where material comes in, how it is handled, and where it goes after it leaves the unit.

Here is the workflow specific to our electronic research collections.  These workflows  outline the basic procedures for electronic collections that are being cataloged in MARC for the integrated library system (ILS).

These workflows only look at the work done by the cataloging unit, but larger Technical Services division mapping is currently underway.  We anticipate that these workflows will look quite different when put in the context of the division as a whole.

This workflow diagram was made using LucidChart.

electroniccollections_cataloging_workflows-page-1

Physical Collections MARC Cataloging Workflows

The Cataloging and Metadata Services Unit at Utah State University ‘s Merrill-Cazier Library recently set out to map out the general workflows of the unit, including where material comes in, how it is handled, and where it goes after it leaves the unit.

Here is the workflow specific to our general research collections.  These workflows are specific to physical collections that are being cataloged in MARC for the integrated library system (ILS).

This workflow diagram was made using LucidChart.

physicalcollections_cataloging_workflows-page-1-1

 

Special Collections and Art Book Room MARC Cataloging Workflows

The Cataloging and Metadata Services Unit at Utah State University ‘s Merrill-Cazier Library recently set out to map out the general workflows of the unit, including where material comes in, how it is handled, and where it goes after it leaves the unit.

Here is the workflow specific to our unique material in the Special Collections and Archives division as well as our Art Book Room material.  These workflows are specific to physical collections that are being cataloged in MARC for the integrated library system (ILS).

This workflow diagram was made using LucidChart.

sca-workflows-page-1-1

Digital Content Linking Workflow

created by Utah State University Cataloging and Metadata Services Deptartment

This process semi-automates the batch linking of item and folder level entries in EAD finding aids to the corresponding digitized material in a digital repository (for USU Libraries, this is CONTENTdm.)  An overview of this workflow is available: http://journal.code4lib.org/articles/11997

Workflow for linking digital objects to corresponding items within EAD finding aids

Tools Needed:

  • Microsoft Excel
  • Microsoft Word
  • XML Editor (such as oXygen)

Purpose

This workflow is a step-by-step process that utilizes robust digital collection metadata to create EAD finding aids linked to digital objects at the item or folder level using the <dao> tag. [i] The process outlined below assumes that the metadata for digital objects will include the following fields: title, date, format, call number (or at minimum the collection, series, box, or folder information), and most importantly, the URL to access the item.

Workflow

Step 1.   Export collection metadata from the digital asset management system in a format that can be read as a spreadsheet.  For example, as a tab-delimited or comma-separated file.[ii]

  • Ensure that the title, format, date, call number (or at minimum it should include the collection, series, box, or folder information), and URL fields are included.  Having the call number split out into separate entities (series, box, folder, or item numbers) is preferred.

Step 2.   Edit the metadata for EAD

  • Open the collection metadata file in Microsoft Excel.
figure1Figure 1 – Digital collection metadata downloaded as a tab delimited file
  • Edit the spreadsheet to only include fields necessary for the EAD guide. Eliminate all metadata columns except the following:
  • Title
  • Format
  • Date
  • Call Number (or call number related fields)
  • URL
figure2Figure 2 – Edited Spreadsheet data (only including fields for EAD)
  • Check the entries and ensure that they are complete. The minimum requirement is Title and URL.
  • Depending on the complexity of the collection’s hierarchy, insert at least five empty columns at the beginning of the sheet and title them as follows (*columns reflected with an asterisk will vary and will need to reflect the hierarchical depth of the physical collection):
    • Component Level
    • Component Number
    • Box
    • Folder
    • *Item*
  • If the call number is not split out, add additional empty columns before/next to the “Call Number” column. These will serve as a workspace for separating the call number information into box, folder, or item numbers in the next step.
  • Separate the call number information into series, box, folder, item, etc. There are several Excel formulas that can help automate this.  One useful formula is:

Example Formula:

=MID(D2, 2,1) – This extracts certain characters within the text string.  For this example, the first call number listed below is 01:01:01. This refers to Box 1, Folder 1, Item 1 in the collection.  To extract each element, use the formula =MID(D2,2,1) which refers to cell “D2”, starting with the 2nd character and directs the program to extract 1 character.  This will extract the number “1” and place it in the cell the formula is initially entered in.  Replace the cell reference, starting character number and number of characters to obtain the specific series, box, folder, or item numbers necessary.

figure3

Figure 3 – Example Call Number (represents a collection with Box, Folder, and Item numbers reflected in the call number)
 figure4Figure 4 – How to use formulas to split the call number information into individual columns
  • When all of the separated information is sorted into the appropriate columns and rows as depicted above, copy and paste the columns as values. This can be done as follows:
    • Highlight the columns that have the separated call number information
    • Right click to copy the columns
    • Right click again in the same location and select “Paste Special”
    • Select “Values” – this will get rid of the formula information within the cells and just keep the product of the formula instead
  • Sort the entries in Call Number order and reflect the hierarchical structure of the physical collection.
  • In the Component Level column, indicate the component type using the approved EAD <c> level elements.[iii] In the Component Number column, indicate the component level number (ie. <c01> through <c12>.)  However, only indicate the variable number and do not include the “c” or “c0”portion of the element.

figure5

Figure 5 –Adding Component Level and Component Number Information (Note: this particular example has no hierarchy) 
  • Delete the Call Number column and any other “workspace” columns that were used to separate data.
  • Add a column at the end with the header: Resource Label
  • In the Resource Label column, enter the text that will be displayed as the link to the user. Copy the text for all rows. (For example, “Click to Access” or “Click to view.”)
  • The final spreadsheet should have the following columns (*columns reflected with an asterisk will vary and will need to reflect the hierarchical depth of the physical collection)
    • Component Level
    • Component Number
    • Box*
    • Folder*
    • Item*
    • Title
    • Format
    • Date
    • URL
    • Resource Label
  • Add in additional rows as necessary to create <c> level information for series, boxes, or folders that are not digital objects, but will serve as headers in the EAD guide. (see rows 2 and 3 in Figure 6 below for an example.)
 figure6Figure 6 – Example of a final spreadsheet (using collection data as seen in Figure 1 – Note: this collection has a hierarchy)

Step 3. Use the Mail Merge function to create a new XML container list with links to digital content [iv]

Once finished making the necessary edits to the spreadsheet, the next step is to utilize the mail merge function in Microsoft Word to create a new XML container list for the finding aid with the new links to digital content embedded.

  • To begin open a new Word document and insert an XML template like the one listed below. This template should represent the XML coding needed for a single item in a collection and you want to be sure to include the digital archival object <dao> and any attribute tagging necessary for the content linking to operate effectively.  The parts of the XML template that are highlighted in the angle brackets will be variable while the rest of the text will stay constant, or fixed.

Template for use with xlink namespace:

<c0«Component_Number»level=”«Component_Level»”>
<did>
<container type=”box”>«Box»</container>

<container type=”folder”>«Folder»</container>
<unitid>«Item»</unitid>
<unittitle encodinganalog=”title”>«Title»</unittitle>
<daogrp>
<resource xlink:label=”start”>«Resource_Label»</resource>
<daoloc xlink:label=”image” xlink:href=”«ARK_URL»” xlink:title=”digital image of «Title»”
xlink:role=”«Format»”/>
<arc xlink:form=”start” xlink:to=”image” xlink:show=”new” xlink:actuate=”onRequest”/>
</daogrp>
<unitdate>«Date»</unitdate>
</did>
</c0«Component_Number»>

Template for use without xlink namespace

<c0«Component_Number»level=”«Component_Level»”>
<did>
<container type=”box”>«Box»</container>

<container type=”folder”>«Folder»</container>
<unitid>«Item»</unitid>
<unittitle encodinganalog=”title”>«Title»</unittitle>
<daogrp>
<resource label=”start”>«Resource_Label»</resource>
<daoloc label=”image” href=”«ARK_URL»” title=”digital image of «Title»”
role=”«Format»”/>
<arc form=”start” to=”image” show=”new” actuate=”onRequest”/>
</daogrp>
<unitdate>«Date»</unitdate>
</did>
</c0«Component_Number»>

  • The highlighted portions above represent where the data from each column in the spreadsheet will be placed. As each row in the spreadsheet represents a new digital item, it will create a new a new <c0> unit for each row
  • To perform the mail merge, first go to the mailings tab in Word and click “Start Mail Merge.” Make sure “Normal Word Document” is selected.
 figure7Figure 7 – In the “Mailings” Tab select “Start the Mail Merge.” Make sure “Normal Word Document” is selected.
  • Click “Select Recipients” and choose “Use Existing List”
 figure8Figure 8 – Select an existing list (the edited spreadsheet)
  • A new window will open. Select a table and then select the correct spreadsheet.
 figure9Figure 9 – Select Table (the edited spreadsheet again)
  • Select the specific sheet in the spreadsheet file to use as the data source.

figure10

Figure 10 – Select the data source (sheet)
  • Next, assign columns from the spreadsheet to the corresponding EAD elements in the XML template. Begin by highlighting the first variable EAD element, then go to “Insert Merge Field” then select the matching field from the drop down list.  Repeat the same process for each of the variable EAD elements in the template.
figure11Figure 11 – Inserting merge fields for each column of information from your spreadsheet matching them to corresponding placeholder in the template
  • Once finished, complete the merge by selecting “Finish & Merge,” then select “Edit Individual Document.” A new window will open. Choose “All”

figure12

Figure 12 – Finish and merge all the records to generate a new word document
  • A new word document will be produced and the XML coding for each of the items in the collection will be visible with information inserted from the spreadsheet. One entry will be displayed on each page of the document
figure13 Figure 13 – A new word document with data populated by your spreadsheet in corresponding places within XML
  • Remove the extra blank space between entries by using the find and replace function in Word. Use ^b to find blanks and replace with ^|^| . This will insert two empty lines or manual line breaks between the entries.
 figure14Figure 14 – Removing blank space in the document
  • The word document should look like this:
figure15
Figure 15 – A completed XML container list in word
  • Make any necessary edits to finalize the XML container list (like removing empty tags.)
  • Copy the container list in Word and paste it into the <dsc> section of the master XML file for of the collection’s finding aid, overlaying the previous <dsc> content. Perform quality control on the XML.   NOTE: If the digital collection is a not a 1-to-1 relationship with the physical collection, some edits may need to be made.  It may be advisable to only copy and paste specific sections of the newly minted XML.  If the digital content only reflects random individual items in a collection, the process will need to be adjusted to copy and paste single items at a time.
 figure16Figure 16 – Copy the container list
figure17 Figure 17 – Paste the container list into the <dsc> section of the master finding aid
  • Once finished, upload your new finding aid complete with direct links to digital objects
figure18Figure 18 – The finding aid complete with links to digital objects in Archives West (Note: folder information is not seen here because it was added in the above procedures for demonstration purposes only.  For this collection, “1972.001.003” is a single item number and not a box or folder number.)

Notes and Uses

This batch process can be used in two ways to provide digital content links: a new XML formatted container list for an EAD finding aid or an updated version of an existing container list. With regards to updating an existing EAD container list, this process is most accommodating for digital collections that share a 1-to1 relationship with the physical collection, where all items in the physical collection have been digitized and are housed in one single digital collection.  Digital collections that contain more than one physical collection will be more problematic and require additional steps to sort.  Likewise, digital collections with content that comprise only a portion of a physical collection, instead of all of it, will also require additional steps.

_____________________________________________________________________________
[i] To see the specifications for the <dao> tag in EAD 2002, please visit: https://www.loc.gov/ead/tglib/elements/dao.html
[ii] This process will vary depending on the digital asset management system.  For CONTENTdm users, the process is outlined here: https://www.oclc.org/support/services/contentdm/help/collection-admin-help/exporting-metadata.en.html
[iii] EAD tag library: https://www.loc.gov/ead/tglib/att_gen.html
[iv] The mail merge process was not created by Utah State University, but was shared by the Utah State Archives with USU Libraries in 2008.  It can be found here:  http://archives.utah.gov/research/inventories/ead.html