Getting the data right 2 – ISBNs

For the most part we have found that using ISBNs with the CCM tool has produced excellent results. However, a couple of things about the way ISBNs are used in cataloguing and how Copac records work mean that searching and de-duplicating by ISBNs in CCM can occasionally be dangerous.

First, ISBNs for series are often recorded in the catalogue record for individual books in a series. So, for example, a search on CCM for ISBN 9025606199 (for the series Byzantinische Forschungen) returns 30 Copac records. Some of these records are for the series, but there are also records for 14 separate titles within this series. With ISBN de-duplication this is reduced to just one record. This could give the impression that a book that is in fact unique to your library is held at multiple places. In our opinion, therefore, unless you can be sure your list contains no series ISBNs, ISBN de-duplication should not be used for decisions such as withdrawals.
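
The effect can be illustrated with a minimal Python sketch. The titles and library names below are invented for illustration (only the series ISBN comes from the example above), and grouping on the ISBN is an assumption about how ISBN de-duplication behaves in general, not a description of CCM's actual matching code.

```python
from collections import defaultdict

# Hypothetical records: three different volumes that all carry the series ISBN.
records = [
    {"title": "Volume A (hypothetical)", "isbn": "9025606199", "library": "King's"},
    {"title": "Volume B (hypothetical)", "isbn": "9025606199", "library": "Library X"},
    {"title": "Volume C (hypothetical)", "isbn": "9025606199", "library": "Library Y"},
]

def dedupe_by_isbn(recs):
    """Merge records that share an ISBN, collecting their titles and holding libraries."""
    merged = defaultdict(lambda: {"titles": set(), "libraries": set()})
    for rec in recs:
        merged[rec["isbn"]]["titles"].add(rec["title"])
        merged[rec["isbn"]]["libraries"].add(rec["library"])
    return dict(merged)

merged = dedupe_by_isbn(records)
for isbn, group in merged.items():
    # A single merged record now stands for several distinct titles.
    print(isbn, len(group["titles"]), "titles,", len(group["libraries"]), "libraries")
```

In this sketch Volume A is held only at King's, yet the merged record lists three holding libraries, which is exactly the false impression described above.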

Secondly, it is common practice in cataloguing to record a print ISBN in an e-book record. Sometimes the 776 field is used, or the $z of the 020, but there are also a lot of e-book records in which the print ISBN is included in the 020$a (sometimes, indeed, the ISBN is shared by print and e-versions). But in cataloguing there are distinct records for the print version and e-version of a book, whereas this does not seem to be the case in Copac records. So, for example, a search for ISBN 9780230238978 returns just one record, even though the holdings for it are both print (at e.g. UCL) and electronic (at KCL). This could give the impression that a book that is in fact unique to your library in the print version is held elsewhere, whereas all the other holdings are for electronic versions. There is detailed information about the holdings in the full record, but for library staff to check these by batch would be very difficult, if not impossible. For withdrawal decisions the apparent merging of print and electronic resources on one record is problematic.
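
As a rough sketch of the kind of check that would be needed, the Python below splits the holdings on one merged record by format. The data structure and the `format` labels are assumptions made for illustration; in practice the format would have to be derived from the holdings detail in the full record, and this is not something the CCM output is known to provide in this shape.

```python
# Hypothetical holdings attached to one merged record. The libraries come from
# the example above; the "format" field is an illustrative assumption.
holdings = [
    {"library": "UCL", "format": "print"},
    {"library": "KCL", "format": "electronic"},
]

def libraries_by_format(items):
    """Group holding libraries by format so print and electronic are counted separately."""
    by_format = {}
    for item in items:
        by_format.setdefault(item["format"], set()).add(item["library"])
    return by_format

by_format = libraries_by_format(holdings)
print_holders = by_format.get("print", set())
print(len(holdings), "holdings in total,", len(print_holders), "in print")
```

Counting holders per format, rather than per record, is what a withdrawal decision about a print copy would actually need.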

Posted in Uncategorized | Leave a comment

Shakespeare – some figures

In response to a parallel search being performed at SHL, we searched for 4 sets of records in the King’s catalogue:

  • Shakespeare as author 1920 – 1929
  • Shakespeare as subject 1920 – 1929
  • Shakespeare as author 2000 – 2009
  • Shakespeare as subject 2000 – 2009

For collection comparison it is necessary to look also at the results from SHL, and directly searching the CCM interface would be interesting too. But the results of our own searches, comparing with the entire catalogue, are also worth recording: they highlight the issues surrounding compiling a file from the local catalogue (quality of local data is not always as high as we would like) and determining what the results from the tool really mean.

Shakespeare as author – 1920s

The search of King’s catalogue produced 43 records.  We used CCM to compare our holdings with Senate House.  Due to the use of “attributed author” some of the records were for music based on Shakespeare’s works.  Of the 43, 13 were apparently unique to King’s (CCM gave 13 unique records using multi-field deduplication levels 1 and 2, and 12 using level 3), but checking by eye it seems that in many cases there were several other copies of the book held, with only minor cataloguing discrepancies.  It was also immediately apparent that in some cases our 008 fields were incorrect and the 260 field had a much later date.

Shakespeare as subject – 1920s

The search produced 77 records, of which 76 matched with a record on Copac (with multi-field deduplication level 2) and 10 were unique to King’s.

Checking the 10 “unique” records by eye on Copac suggests that only 2 were actually held only at King’s.

Shakespeare as author – 2000s

The search produced 545 records.  These matched with 542 records on CCM using ISBN deduplication.

However, excluding electronic versions of older works left 87 records.

We matched these 87 records using CCM.  Of the 87, 84 matched with a record on Copac using ISBN deduplication and 85 matched using multi-field deduplication level 2.  18 were apparently held at King’s only.

Shakespeare as subject – 2000s

The search produced 410 records. Uploading to CCM produced 338 matches with ISBN deduplication. The “missing” 72 records are mainly accounted for by the ebook series Shakespeare Survey which has the ISBN of the print book and the ISBN of the series in each record.  There are a small number of other books where King’s has 2 records for the same ISBN.

Of the matched records, 36 were held at King’s only. Some of these King’s-only records are almost certainly cataloguing variants of the same work.  A Copac search for Hamlet by Zeffirelli produces 18 video recordings, 1 videocassette, 2 DVDs, 1 video disc, 2 screenplays and a book.  Most of the 18 video recordings seem to be the same 2000 edition held “only” by King’s.



RLUK Conference Pecha Kucha

Christine Wise, Associate Director, Historic Collections and Keeper of Special Collections at Senate House Libraries, and co-project manager, presented a Pecha Kucha session ‘Applying Collection Management Tools to Real-World collections’ at the RLUK Conference 2012 held in Newcastle earlier this month.


CCM for collection comparison – getting the data right

James Clark, Metadata Coordinator at King’s College London and co-project manager, describes some practical findings from applying CCM to compare collections:

The first major piece of work we did at King’s was with our collection of books on military studies (class U in LCC). We found 6055 records that fitted our criteria, so this gave us a substantial sample to test CCM. One issue highlighted by other institutions who have worked with CCM is de-duplication when ISBNs are not available, and this is something that we found problematic too.

Initially we ran the file without de-duplication. 6055 records mapped to 5885 Copac records (the difference arises because we have some duplicates on our catalogue and because some of our records haven’t made it into Copac) and 1505 records were identified as unique to King’s. When we ran the third level of de-duplication (author/title), with which we would expect most records to be de-duplicated, the number of records unique to King’s dropped to 1441.

As some of our records have ISBNs we can test the author/title de-duplication against ISBN de-duplication.  Of the original 1505 identified as unique, 531 had ISBNs. We ran these 531 records through ISBN de-duplication, which identified only 187 as unique. If this is any guide to the records without ISBNs (perhaps it is not!), then only a third of the 1505 records originally identified as unique would in fact be unique, and approximately 1000 would have matches. But author/title de-duplication only matched a further 64 records, reducing the number of apparently ‘unique’ records from 1505 to 1441.
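
The extrapolation above can be reproduced with a few lines of Python. All figures are taken from the text; the only assumption (flagged above as uncertain) is that records without ISBNs would behave like those with ISBNs.

```python
unique_no_dedup = 1505            # unique to King's with no de-duplication
with_isbn = 531                   # of those, records that carry an ISBN
unique_after_isbn = 187           # still unique after ISBN de-duplication
unique_after_author_title = 1441  # unique after author/title de-duplication

# Share of the ISBN sample that remained unique (roughly a third).
unique_rate = unique_after_isbn / with_isbn

# Assumption: the same rate applies to all 1505 records, ISBN or not.
est_unique = round(unique_no_dedup * unique_rate)
est_matched = unique_no_dedup - est_unique

# What author/title de-duplication actually matched.
matched_by_author_title = unique_no_dedup - unique_after_author_title

print(f"{unique_rate:.1%} unique; estimated {est_matched} matches "
      f"versus {matched_by_author_title} found by author/title")
```

On these figures the estimate is about 975 matches, which is the "approximately 1000" quoted above, against the 64 that author/title de-duplication found.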

The number of duplicates is no doubt a consequence of the variations in cataloguing that exist among the records of the contributing libraries. How problematic it is really depends on what CCM is being used for. For preservation purposes, you would want the tool to be very cautious when de-duplicating records, and then at least you could use the tool to get a minimum number of other holding libraries. Although this is not ideal, it would still allow a significant amount of stock to be released in the knowledge that it is held elsewhere. Even without any de-duplication, 1542 (about 25%) of our books in this sample were identified as held at more than ten other libraries. Because of this, it is possible for the matching to be effective for practical purposes even though it is not perfect. If your record matches with ten others, it does not matter that it does not match with an eleventh.

However, when comparing your collection with just one other library, any failure to match records will have a significant effect, giving the impression that the libraries’ holdings are more discrete than they actually are. On the other hand, when comparing only two libraries, duplicate records are likely to be much rarer. When we ran the same file of 6055 records through the tool and compared only with Senate House Libraries (SHL), we found matches for 1218 of our records. We were able to extract an ISBN for 2645 of the records that did not match and run these through CCM comparing with SHL. 127 of these matched with SHL giving us a failure-to-match rate for system number searches of just under 5%. In our opinion this rate is low enough for reasonable conclusions to be drawn about the relative strengths of the two collections.
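
For reference, the failure-to-match rate quoted above works out as follows in Python. The figures are those given in the text, and the rate is simply the share of ISBN-checked non-matches that turned out to match after all.

```python
isbn_checked = 2645        # non-matching records for which an ISBN could be extracted
matched_on_isbn = 127      # of those, records that matched SHL when searched by ISBN

# Records the system-number search missed, as a share of those re-checked by ISBN.
failure_to_match_rate = matched_on_isbn / isbn_checked

print(f"failure-to-match rate: {failure_to_match_rate:.1%}")  # 4.8%
```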


What would success look like?

Since the Library Systems programme meeting held in Birmingham on 13th July, we have been considering what success would look like for this project. This project has several discrete strands of activity and we have identified key success criteria for each. By the end of the project we will have:

  • Successfully applied CCM to the King’s and Senate House Libraries systems and collection contexts, and gained an understanding of the practical strengths and limitations of the existing tool set and how we can continue to use it to inform collection management and development in a sustainable way. A written case study will allow the rest of the community to benefit from our experience.
  • Proposed specific recommendations, requirements and aspirations to feed into the development road map for CCM, the Copac / RLUK metadata aggregation, and the overarching Discovery and Library Systems programmes.
  • Identified and articulated new use cases and solutions based on CCM that would be applicable to all research libraries.
  • Initiated practical collection activity at King’s and Senate House Libraries based on analysis of CCM output. This activity will comprise policy making as well as actions around acquisition, access, and marketing, and will provide improved collection services for staff and students at both institutions and beyond, as measured by activity data and user feedback.

Project budget

The budget for this project is £40,028 of which JISC are funding £30,434. Over 90% of the budget is allocated for staff costs for project members from King’s, Senate House Libraries, and RLUK. Some potential costs have been avoided, for example dissemination, by using existing channels provided by the project partners. The chart below provides a breakdown of the budget allocations.

Chart showing project budget allocations

Posted in Project plan | Leave a comment

CCM training day

Dr Shirley Cousins of MIMAS provided training to the Copac CCM tools project team on 25th July 2012.  Colleagues from King’s College London and Senate House Library were all present for what proved to be a very beneficial session in exploring the parameters and potential of the CCM tools.  Shirley ran through a number of case studies of work carried out using the CCM tools by the White Rose Consortium, as well as providing scenarios to consider their effectiveness and any wider implications.  Project staff had already experimented with the CCM tools prior to the training and this allowed for feedback on initial impressions of the tools.


Workpackages and schedule

Workpackage 1: Project management and planning
Creation of the project team through secondment; establishment and first meeting of the project board; budget management; creation of project dissemination channels including a project blog.

Workpackage 2: Identifying and defining collection profiles
Identifying and defining the collection profiles for the project. Meetings of the project team, and workshops with Information Specialists, Academic Liaison Librarians, and academic research staff will build on existing activity at King’s College London (an ongoing College-wide comprehensive Collection review) and Senate House Libraries (the Collaborative Collection Development Initiative).

Workpackage 3: CCM training, and sharing of experience
Technical orientation and training of the project team from Shirley Cousins, Copac Services Manager at Mimas, to provide hands-on access to the CCM interface, opportunities to run analyses on smaller batches of test data, and explore the existing potential and limitations of the CCM tools available in preparation for workpackage 5.

Workpackage 4: Creating collection lists and assuring quality data
Creation of the data sets required for CCM analysis. Lists of bibliographic items comprising the specific collections identified and defined in workpackage 2 will be selected and generated from King’s College London and Senate House Libraries Library Management Systems (LMS) and Copac.

Workpackage 5: CCM data entry, preliminary output analysis and parameter refinement
Applying CCM to the collection lists created in workpackage 4. It should be noted that this workpackage is inherently iterative and the project team will expect to cycle through identical or similar workflows several times. An additional key component of this workpackage is agile refinement of collection data, CCM output, and workflows.

Workpackage 6: Analysis of final CCM outputs
Analysis by the project team of the final CCM outputs, including quantitative data and visualisations. The outputs will allow direct comparison between the chosen collection profiles at King’s College London and Senate House Libraries, as well as other research libraries in the Copac aggregation.

Workpackage 7: From CCM to policy and practical collection management
This workpackage will assess the feasibility of translating CCM output and analysis to collection management policy and decision-making. The project board and project team will consider if CCM has enabled the creation of a sufficiently robust evidence-base for the collection profiles selected for the project, and to what extent that evidence can be used to inform policy making and inform practical collection management actions.

Workpackage 8: Recommendations for CCM and Copac development
This package will produce a set of recommendations for the future development roadmap of CCM based on practical experience accumulated throughout the project. Recommendations will cover the usability of the CCM web interface; the options and parameters available for existing tools; desirable new tools, functionality and features; and requirements around the quality and details of records in the Copac aggregation. A key consideration for recommendation will be how scalable the project finds CCM for above campus collaborative collection management activity, for example how CCM handles large batches of records across large numbers of collections.

Workpackage 9: Dissemination
This workpackage will run for the length of the project and will use a variety of communication channels to disseminate activity news, and report outputs to stakeholders. Members of the project board and project team will also make presentations at any relevant events including programme meetings and workshops. RLUK will assist with external communications and outreach.


Posted in Project plan | Leave a comment

Aims, objectives, and outputs

This project has two overarching aims:

The first is to contribute to the development of future library systems by identifying, informing and testing bibliographic metadata requirements, particularly with regard to collection management, shared services, and making community data work harder.

The second is to take a significant step forward in practical collaborative collection management activity between King’s and Senate House, and in so doing not only better support the resource needs of staff and students using libraries at these institutions, but also provide a blueprint that other UK HEIs could follow or adapt.

Project objectives include:

  • To identify specific subject areas of King’s and Senate House Libraries collections where collaborative collection management would benefit research and learning needs
  • To apply CCM to those subject areas to explore existing CCM use cases as well as identify any new use cases
  • To create effective and efficient workflows for the application of CCM to real-world collection management
  • To test the capabilities of CCM and Copac metadata and feed into technological development of CCM, Copac and future library systems.

Overall, outputs will include:

  • a practical and relevant case study and feasibility study that builds upon the recent JISC funded CCM project and Discovery programme
  • practical solutions and documented workflows to enable real-world collection management using CCM
  • outcomes in collection policy and operations at King’s and Senate House Libraries, delivering new value for library customers at both institutions
  • recommendations for development of CCM
  • recommendations for the development of Copac and the RLUK bibliographic metadata aggregation
  • prepared ground for future collaborative partnerships in the JISC community
  • an assessment of options for sustainability, in particular the move from project to business as usual
Posted in Project plan | Tagged | Leave a comment


Welcome to the project blog for the JISC funded King’s College London and Senate House Libraries Copac Collection Management project. The project will be running for six months from June to November 2012 and we’ll be using this blog to report on the project plan and keep you updated on our progress.
