Full-Text search of Early Music Prints Online (F-TEMPO)

About F-TEMPO ↓

F-TEMPO is a publicly accessible online resource offering, for the first time, flexible, content-based, full-text searching of a large and growing collection of digital images of renaissance and early baroque music distributed amongst the world's music libraries. It has an easy-to-use online interface, and an API that allows the resource to be used and extended in an open-ended variety of ways. This makes the musical and other information contained within the resource available as usable knowledge to digital humanists and others. As the resource grows over time to approach a million pages, F-TEMPO will give scholars a unique perspective on the musical, social, commercial and political history of the early modern period.

The F-TEMPO project, funded by the British Academy and JISC through the Digital Humanities Research in the Humanities scheme, is being carried out at Goldsmiths, University of London. The Principal Investigator is Prof. Tim Crawford (Dept of Computing).

Email: t.crawford@gold.ac.uk

Objectives ↓

With the aid of the British Library and international music libraries, we will add already-digitised images to greatly expand ‘Early Music Online’ (EMO). EMO originated in a JISC Rapid Digitisation project in 2011; about 300 books of typeset music printed before 1600 were then catalogued and their page-images digitised from microfilms, and they are now freely available to the public. Access to the collection is provided by standard bibliographical metadata search; there is no facility for content-based exploration.

Among this extra music will be complete original published works of influential composers such as Marenzio and Monteverdi, and representative amounts by Josquin, Lassus, Palestrina, da Rore, etc. Through this resource, enriched with metadata, we could explore for the first time, for example, the networks of influence, distribution and fashion, and the effects on these of political, religious and social change over time, as represented in the output of the burgeoning 16th-century music publishing industry.

Full-text search ↓

Full-text searching is an absolute necessity for ‘distant reading’, the basis of much digital humanities research. The current limited availability of digital scores means musicologists cannot run full-text searches of the music of the past. F-TEMPO enhances and extends our proof-of-concept interface, providing research-quality full-text search of material of great interest to historical musicologists, other scholars and general users. We plan a system for music closely analogous to Google Books, which extracts the underlying textual content from scanned page-images and then indexes and searches it efficiently using the latest techniques of computer science.

Context ↓

Music exists in a variety of forms on the web, typically as printed or manuscript documents or as audio files. Whatever the format, it cannot be searched directly, unlike text. It requires processing to extract ‘features’ which may then be investigated using the techniques of digital scholarship. However, the amount of data generated by feature-extraction from either audio analysis or digital scores means full-text, content-based searching of anything approaching a large-scale musical resource tends to be slow and inefficient.

Vast amounts of music, mostly audio tracks, are now available through services such as Spotify, iTunes or YouTube. Music Information Retrieval (MIR) has mostly focused on audio material to make discovery and retrieval feasible from the Internet, with an emphasis from the music industry on the requirements of their paying customers.

Music in graphic form is also available online as PDF files rendering page-images of either original musical documents or modern, computer-generated music notation. Such resources are a surrogate for the paper-based books used in traditional musicology, but offer few advantages beyond convenience. Unlike the text-based and numerical materials which are increasingly the subject of ‘distant reading’ investigations in the digital humanities, they offer no facility for full-text search.

Optical Music Recognition ↓

For good-quality score images, there are Optical Music Recognition (OMR) programs which sometimes produce useful scores from printed music of simple texture; however, in general, OMR output contains errors due to misrecognised or missed symbols. The results often amount to musical gibberish, which severely limits the usefulness of OMR for creating large digital score collections from score images.

Our OMR program is Aruspix, which is highly reliable on good images from EMO, even though they have been digitised from microfilm.

Although OMR is far from perfect, users will often be happy to apply the methods of computer science to large collections containing noise. This is the principle behind searches in Google Books, which are based on Optical Character Recognition (OCR). Another online resource which has inspired F-TEMPO, this time based on images of Japanese woodblock prints, is Ukiyo-e Search, which permits instantaneous identification of even a poor-quality image of a print taken with a mobile phone.

SIMSSA is using OMR and MIR to work towards a very large virtual and distributed collection of music, accessible in the way we envisage to musicologists and musicians of all kinds. As we are Associate Partners in SIMSSA, F-TEMPO should be seen as a contribution to this international effort.

Indexing and Search ↓

Our approach, inspired equally by Google Books, Ukiyo-e Search, Peachnote and SIMSSA, but limited to the large repertory of early printed music of the 16th and early 17th centuries, uses state-of-the-art, scalable retrieval methods. It currently searches over 40,000 page-images for those similar to a query-page in less than a second. It successfully recovers matches when the query page is not complete, e.g. when page-breaks occur differently in the various editions of a piece. Also, close non-identical matches, as between voice-parts of a polyphonic work in imitative style, are highly ranked in results; similarly, different works based on the same musical content (as in different sections) are usually well-matched.

From the OMR output for each page, we extract diatonic pitch-interval strings. Because these record only the number of scale-steps between successive notes, they are robust to several types of OMR error involving wrong clefs, key-signatures and accidentals. From these strings we derive sets of Minimal Absent Words, a type of feature recently developed for bioinformatics analysis and retrieval. These sets are used as an index for fast and scalable search and retrieval.
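The pipeline described above can be illustrated with a minimal Python sketch. It is not the F-TEMPO implementation: the note spelling ("C4", "Eb4"), the letter code used for intervals, the function names, the length cap `max_len` and the use of Jaccard similarity to compare pages are all assumptions made for illustration. A minimal absent word is taken here, following the usual definition, to be a string that does not occur in the text although both its longest proper prefix and its longest proper suffix do; the brute-force search below is only suitable for short strings.

```python
LETTERS = "CDEFGAB"

def diatonic_step(note):
    """Diatonic step number of a note such as 'C4' or 'Eb4' (accidentals ignored).
    Assumes a single-digit octave at the end of the name."""
    return int(note[-1]) * 7 + LETTERS.index(note[0].upper())

def encode_interval(d):
    """Hypothetical letter code: '-' for a repeat, 'A', 'B', ... for steps up,
    'a', 'b', ... for steps down."""
    if d == 0:
        return "-"
    letter = chr(ord("A") + min(abs(d), 26) - 1)
    return letter if d > 0 else letter.lower()

def interval_string(notes):
    """Encode a note sequence as a string of diatonic intervals."""
    steps = [diatonic_step(n) for n in notes]
    return "".join(encode_interval(b - a) for a, b in zip(steps, steps[1:]))

def minimal_absent_words(s, max_len=4):
    """Brute-force minimal absent words of s with length between 2 and max_len."""
    factors = {s[i:j] for i in range(len(s))
               for j in range(i + 1, min(len(s), i + max_len) + 1)}
    alphabet = set(s)
    maws = set()
    for prefix in factors:            # prefix occurs in s by construction
        if len(prefix) >= max_len:
            continue
        for ch in alphabet:
            word = prefix + ch
            # absent from s, but its longest proper suffix is present
            if word not in factors and word[1:] in factors:
                maws.add(word)
    return maws

def jaccard(a, b):
    """Set similarity in [0, 1]; one assumed way to compare two pages."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

melody = ["C4", "D4", "E4", "C4", "G3"]
# The same notes misread a third higher (as with a wrong clef), and with
# spurious accidentals, give an identical interval string.
wrong_clef = ["E4", "F4", "G4", "E4", "B3"]
wrong_accidentals = ["C4", "D4", "Eb4", "C#4", "G3"]
codes = [interval_string(m) for m in (melody, wrong_clef, wrong_accidentals)]
page_features = minimal_absent_words(codes[0])
```

In this sketch a page would be represented by the set returned from `minimal_absent_words`, and candidate pages could be ranked by `jaccard` against the query page's set; a production index would need a far more efficient construction and lookup than this exhaustive version.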

Research questions ↓

  • ‘How did musical popularity change over time?’
    Musical fashions naturally evolve, so we shall carry out empirical research on the enduring popularity of some works by early masters, some of which survived for over a century. Such investigations have only been carried out ‘manually’ by piecemeal research, and some ‘hits’ may have been unfairly overlooked.
  • ‘How well is the relative popularity of early-modern music reflected in modern recordings since the 1950s?’
    We shall answer this question by using Linked Data to match works from the renaissance period with the available modern recordings.
  • ‘How different is church music from madrigals?’
    If a motet or mass movement is the query, the highly ranked but non-relevant matches also tend to be sacred pieces, while madrigals and similar queries return other secular music. This is a phenomenon of style that has not, as far as we know, been investigated at any scale.
  • ‘How many arrangements are there?’
    In early testing we identified an instrumental ricercar as a wordless transcription of a Latin motet, hitherto unknown to musicology. As the collection grows, we shall find more such unexpected concordances, and will identify works labelled in some printed sources as by ‘Incertus’ (Uncertain composer); we have found a few already!

People ↓

Principal Investigator

Tim Crawford (Goldsmiths, University of London, Department of Computing)

Technical advice, back-end programming and web hosting

Laurent Pugin and Rodolfo Zitellini (Répertoire International des Sources Musicales, Swiss Office)

Collaborators

Amelie Roper (British Library Music Department)

Philippe Vendrix (President, University of Tours, France)

Other participants and advisors

Jamie Forth and Golnaz Badkobeh (Goldsmiths, Computing Department), David Lewis (Oxford & Goldsmiths) and Dr Kevin Page (Oxford)