Getting the data right 2 – ISBNs

For the most part we have found that using ISBNs with the CCM tool has produced excellent results. However, a couple of things about the way ISBNs are used in cataloguing and how Copac records work mean that searching and de-duplicating by ISBNs in CCM can occasionally be dangerous.

First, ISBNs for series are often recorded in the catalogue record for individual books in a series. So, for example, a search on CCM for ISBN 9025606199 (for the series Byzantinische Forschungen) returns 30 Copac records. Some of these records are for the series, but there are also records for 14 separate titles within this series. With ISBN de-duplication this is reduced to just one record. This could give the impression that a book that is in fact unique to your library is held at multiple places. In our opinion, therefore, unless you can be sure your list contains no series ISBNs, ISBN de-duplication should not be used for decisions such as withdrawals.
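
One way to spot series ISBNs in a list before uploading it is to group the local records by ISBN and flag any ISBN that is attached to more than one distinct title. The following is a minimal Python sketch of that check and is not part of CCM. The record layout, the function names, the volume titles and the second ISBN are all invented for illustration; only the series ISBN comes from the example above.

```python
from collections import defaultdict

# Hypothetical local records as (record id, title, ISBNs found in the record).
# The volume titles and the second ISBN are placeholders, not real Copac data.
RECORDS = [
    ("rec1", "Byzantinische Forschungen", ["9025606199"]),
    ("rec2", "Placeholder volume A", ["9025606199"]),
    ("rec3", "Placeholder volume B", ["9025606199"]),
    ("rec4", "Unrelated monograph", ["0000000001"]),
]


def group_by_isbn(records):
    """Map each ISBN to the list of (record id, title) pairs that carry it."""
    groups = defaultdict(list)
    for rec_id, title, isbns in records:
        for isbn in isbns:
            groups[isbn].append((rec_id, title))
    return dict(groups)


def suspected_series_isbns(records):
    """Return ISBNs shared by records with more than one distinct title."""
    suspects = {}
    for isbn, members in group_by_isbn(records).items():
        titles = {title.strip().casefold() for _, title in members}
        if len(titles) > 1:
            suspects[isbn] = sorted(titles)
    return suspects


if __name__ == "__main__":
    for isbn, titles in suspected_series_isbns(RECORDS).items():
        print(isbn, "is shared by", len(titles), "distinct titles")
```

Any ISBN flagged this way would need checking by eye, since variant titles for the same book would also trigger it.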

Secondly, it is common practice in cataloguing to record a print ISBN in an e-book record. Sometimes the 776 field is used, or the $z of the 020, but there are also a lot of e-book records in which the print ISBN is included in the 020$a (sometimes, indeed, the ISBN is shared by print and e-versions). Cataloguing practice keeps distinct records for the print version and the e-version of a book, but this does not seem to be the case in Copac records. So, for example, a search for ISBN 9780230238978 returns just one record, even though the holdings for it are both print (at e.g. UCL) and electronic (at KCL). This could give the impression that a book whose print version is in fact unique to your library is held elsewhere, when all the other holdings are for electronic versions. There is detailed information about the holdings in the full record, but for library staff to check these by batch would be very difficult, if not impossible. For withdrawal decisions the apparent merging of print and electronic resources on one record is problematic.
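
When compiling a file from the local catalogue, it may help to keep track of which field each ISBN came from, so that only the 020$a is treated as identifying the record's own format. The sketch below, in Python, assumes a simplified record layout (a list of tag and subfield pairs) rather than any real MARC library, and the ISBN values are placeholders.

```python
# A record is modelled as a list of (tag, [(subfield code, value), ...]) pairs.
# This layout is an assumption for illustration, not a real MARC library API.


def classify_isbns(record):
    """Sort a record's ISBNs by the MARC field and subfield they came from."""
    found = {"020a": [], "020z": [], "776z": []}
    for tag, subfields in record:
        for code, value in subfields:
            key = tag + code
            if key in found:
                found[key].append(value)
    return found


def own_format_isbns(record):
    """ISBNs recorded as belonging to this record's own format: 020 $a only."""
    return classify_isbns(record)["020a"]


# Hypothetical e-book record; the ISBN values are placeholders.
EBOOK = [
    ("020", [("a", "EBOOK-ISBN")]),
    ("020", [("z", "PRINT-ISBN")]),
    ("776", [("t", "Print version"), ("z", "PRINT-ISBN")]),
]

if __name__ == "__main__":
    print(classify_isbns(EBOOK))
    print(own_format_isbns(EBOOK))
```

This only helps where the print ISBN has been recorded in the 020$z or the 776. It cannot separate print from electronic where the print ISBN sits in the 020$a, or where the ISBN is genuinely shared.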

Posted in Uncategorized | Leave a comment

Shakespeare – some figures

In response to a parallel search being performed at SHL, we searched for four sets of records in the King’s catalogue:

  • Shakespeare as author 1920 – 1929
  • Shakespeare as subject 1920 – 1929
  • Shakespeare as author 2000 – 2009
  • Shakespeare as subject 2000 – 2009

For collection comparison it is necessary to look also at the results from SHL, and directly searching the CCM interface would be interesting too. But the results of our own searches, comparing with the entire catalogue, are also worth recording: they highlight the issues surrounding compiling a file from the local catalogue (quality of local data is not always as high as we would like) and determining what the results from the tool really mean.

Shakespeare as author – 1920s

The search of the King’s catalogue produced 43 records. We used CCM to compare our holdings with Senate House. Because of the use of “attributed author”, some of the records were for music based on Shakespeare’s works. Of the 43, 13 were apparently unique to King’s (CCM gave 13 unique records using multi-field deduplication levels 1 and 2, and 12 using level 3), but checking by eye it seems that in many cases several other copies of the book were held, with only minor cataloguing discrepancies. It was also immediately apparent that our 008 fields were incorrect in some cases, where the 260 field had a much later date.
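
A discrepancy between the coded date in the 008 and the date in the 260 can be checked for in bulk before a file is compiled. The Python sketch below assumes the 008 is available as a string (Date 1 occupies character positions 07-10) and the 260$c as free text; the function names and the sample values are hypothetical, and the sample 008 is truncated.

```python
import re


def date1_from_008(field_008):
    """Return Date 1 (positions 07-10 of the 008) as an int, or None if not a full year."""
    date1 = field_008[7:11]
    return int(date1) if re.fullmatch(r"[0-9]{4}", date1) else None


def year_from_260c(subfield_c):
    """Return the first four-digit year in a 260 $c string such as 'c1925.'."""
    match = re.search(r"[0-9]{4}", subfield_c)
    return int(match.group()) if match else None


def dates_disagree(field_008, subfield_c):
    """True only when both years can be read and they differ."""
    coded = date1_from_008(field_008)
    transcribed = year_from_260c(subfield_c)
    return coded is not None and transcribed is not None and coded != transcribed


if __name__ == "__main__":
    # Hypothetical, truncated 008 with Date 1 = 1925, against a 260 $c of 1987.
    print(dates_disagree("850101s1925    enk", "1987."))
```

Records with partly unknown dates in the 008 (for example "19uu") are skipped rather than flagged, so this is a conservative check.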

Shakespeare as subject – 1920s

The search produced 77 records, of which 76 matched with a record on Copac (with multi-field deduplication level 2) and 10 were unique to King’s.

Checking the 10 “unique” records by eye on Copac suggests that only 2 were actually held only at King’s.

Shakespeare as author – 2000s

The search produced 545 records.  These matched with 542 records on CCM using ISBN deduplication.

However, excluding electronic versions of older works left 87 records.

We matched these 87 records using CCM. Of the 87, 84 matched with a record on Copac using ISBN deduplication and 85 matched using multi-field deduplication level 2. 18 were apparently held at King’s only.

Shakespeare as subject – 2000s

The search produced 410 records. Uploading to CCM produced 338 matches with ISBN deduplication. The “missing” 72 records are mainly accounted for by the e-book series Shakespeare Survey, which has the ISBN of the print book and the ISBN of the series in each record. There are a small number of other books where King’s has 2 records for the same ISBN.

Of the matched records, 36 were held at King’s only. Looking at the King’s-only records, some are almost certainly cataloguing variants of the same work. Hamlet by Zeffirelli searched on Copac produces 18 video recordings, 1 videocassette, 2 DVDs, 1 video disc, 2 screenplays and a book. Most of the 18 video recordings seem to be the same 2000 edition held “only” by King’s.

 


RLUK Conference Pecha Kucha

Christine Wise, Associate Director, Historic Collections and Keeper of Special Collections at Senate House Libraries, and co-project manager, presented a Pecha Kucha session ‘Applying Collection Management Tools to Real-World collections’ at the RLUK Conference 2012 held in Newcastle earlier this month.


CCM for collection comparison – getting the data right

James Clark, Metadata Coordinator at King’s College London and co-project manager, describes some practical findings from applying CCM to compare collections:

The first major piece of work we did at King’s was with our collection of books on military studies (class U in LCC). We found 6055 records that fitted our criteria, so this gave us a substantial sample to test CCM. One issue highlighted by other institutions who have worked with CCM is de-duplication when ISBNs are not available, and this is something that we found problematic too.

Initially we ran the file without de-duplication. 6055 records mapped to 5885 Copac records (the difference arises because we have some duplicates in our catalogue and because some of our records haven’t made it into Copac) and 1505 records were identified as unique to King’s. When we ran the third level of de-duplication (author/title), with which we would expect most records to be de-duplicated, the number of records unique to King’s dropped to 1441.

As some of our records have ISBNs we can test the author/title de-duplication against ISBN de-duplication.  Of the original 1505 identified as unique, 531 had ISBNs. We ran these 531 records through ISBN de-duplication, which identified only 187 as unique. If this is any guide to the records without ISBNs (perhaps it is not!), then only a third of the 1505 records originally identified as unique would in fact be unique, and approximately 1000 would have matches. But author/title de-duplication only matched a further 64 records, reducing the number of apparently ‘unique’ records from 1505 to 1441.
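
The extrapolation in the paragraph above can be worked through as follows. This is only a restatement of the arithmetic in Python, using the figures quoted; the variable names are ours, and the projection rests on the stated assumption that records without ISBNs behave like those with ISBNs.

```python
# Figures taken from the paragraph above.
unique_no_dedup = 1505       # unique to King's with no de-duplication
unique_author_title = 1441   # unique after author/title de-duplication
with_isbn = 531              # of the 1505, records that carry an ISBN
unique_after_isbn = 187      # of the 531, still unique after ISBN de-duplication

# Share of the ISBN-bearing sample that stayed unique.
unique_rate = unique_after_isbn / with_isbn

# Assumption: records without ISBNs behave like those with ISBNs.
projected_unique = unique_no_dedup * unique_rate
projected_matches = unique_no_dedup - projected_unique

# What author/title de-duplication actually found.
found_by_author_title = unique_no_dedup - unique_author_title

print(f"unique rate in ISBN sample: {unique_rate:.1%}")
print(f"projected unique records:   {projected_unique:.0f}")
print(f"projected matches:          {projected_matches:.0f}")
print(f"matches found by author/title: {found_by_author_title}")
```

On these figures the projection is about 530 unique records and about 975 with matches, against the 64 matches that author/title de-duplication found.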

The number of duplicates is no doubt a consequence of the variations in cataloguing that exist among the records of the contributing libraries. How problematic it is really depends on what CCM is being used for. For preservation purposes, you would want the tool to be very cautious when de-duplicating records, and then at least you could use the tool to get a minimum number of other holding libraries. Although this is not ideal, it would still allow a significant amount of stock to be released in the knowledge that it is held elsewhere. Even without any de-duplication, 1542 (about 25%) of our books in this sample were identified as held at more than ten other libraries. Because of this, it is possible for the matching to be effective for practical purposes even though it is not perfect. If your record matches with ten others, it does not matter that it does not match with an eleventh.

However, when comparing your collection with just one other library, any failure to match records will have a significant effect, giving the impression that the libraries’ holdings are more discrete than they actually are. On the other hand, when comparing only two libraries, duplicate records are likely to be much rarer. When we ran the same file of 6055 records through the tool and compared only with Senate House Libraries (SHL), we found matches for 1218 of our records. We were able to extract an ISBN for 2645 of the records that did not match and run these through CCM comparing with SHL. 127 of these matched with SHL giving us a failure-to-match rate for system number searches of just under 5%. In our opinion this rate is low enough for reasonable conclusions to be drawn about the relative strengths of the two collections.
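
The failure-to-match rate quoted above is simply the share of re-tested records that matched on ISBN after failing to match on system number. A minimal Python sketch of the calculation, with a hypothetical function name:

```python
def failure_to_match_rate(late_matches, retested):
    """Share of re-tested records that matched only on the second (ISBN) pass."""
    if retested <= 0:
        raise ValueError("retested must be a positive number of records")
    return late_matches / retested


# Figures from the paragraph above: 127 of 2645 re-tested records matched on ISBN.
rate = failure_to_match_rate(127, 2645)
print(f"failure-to-match rate: {rate:.1%}")
```

The rate is measured only on the records for which an ISBN could be extracted, so it says nothing directly about the unmatched records without ISBNs.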


What would success look like?

Since the Library Systems programme meeting held in Birmingham on 13th July, we have been considering what success would look like for this project. This project has several discrete strands of activity and we have identified key success criteria for each. By the end of the project we will have:

  • Successfully applied CCM to the King’s and Senate House Libraries systems and collection contexts, and gained an understanding of the practical strengths and limitations of the existing tool set and how we can continue to use it to inform collection management and development in a sustainable way. A written case study will allow the rest of the community to benefit from our experience.
  • Proposed specific recommendations, requirements and aspirations to feed into the development road map for CCM, the Copac / RLUK metadata aggregation, and the overarching Discovery and Library Systems programmes.
  • Identified and articulated new use cases and solutions based on CCM that would be applicable to all research libraries.
  • Initiated practical collection activity at King’s and Senate House Libraries based on analysis of CCM output. This activity will comprise policy making as well as actions around acquisition, access, and marketing, and will provide improved collection services for staff and students at both institutions and beyond, as measured by activity data and user feedback.

Project budget

The budget for this project is £40,028, of which JISC are funding £30,434. Over 90% of the budget is allocated for staff costs for project members from King’s, Senate House Libraries, and RLUK. Some potential costs, such as dissemination, have been avoided by using existing channels provided by the project partners. The chart below provides a breakdown of the budget allocations.

Chart showing project budget allocations

Posted in Project plan | Leave a comment

CCM training day

Dr Shirley Cousins of MIMAS provided training to the Copac CCM tools project team on 25th July 2012.  Colleagues from King’s College London and Senate House Library were all present for what proved to be a very beneficial session in exploring the parameters and potential of the CCM tools.  Shirley ran through a number of case studies of work carried out using the CCM tools by the White Rose Consortium, as well as providing scenarios to consider their effectiveness and any wider implications.  Project staff had already experimented with the CCM tools prior to the training and this allowed for feedback on initial impressions of the tools.
