Thinking XML: Manage metadata with MusicBrainz

Digital media metadatabase uses RDF

Level: Introductory

Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.

01 Dec 2002

Since its emergence in the mid-1980s, digital music has seen plenty of controversy, and even the management of digital music metadata has been subject to its own share of drama. But sometimes out of political dust-ups, good technical solutions emerge. In this article, Uche Ogbuji introduces MusicBrainz, a project for managing digital media metadata. MusicBrainz uses RDF in its core data formats and, in so doing, offers some important technical advantages over its predecessors.

Digital music remains one of the big stories of the information age, both for the convenience it affords music lovers and for the business opportunities it has opened up for high-tech companies. You can put dozens or hundreds of albums in digital storage and catalog this music in any way you like. Because so much music these days is sold on CD, countless tools exist for gathering information on the artists and tracks to be maintained, or tagged, in the resulting digital formats (MP3, Ogg Vorbis, and so forth). This information is the common metadata of digital music.

In the early '90s, the Internet Compact Disc Database (CDDB) was born as a distributed database that matched CD characteristics to metadata. It grew rapidly through the efforts of many casual users, who contributed information on their CDs on the assumption that the CDDB system and software were open and free. In a very controversial move, a commercial interest now known as Gracenote imposed licensing restrictions on CDDB, prompting the development of several truly open alternatives. freedb.org and MusicBrainz are the most prominent of these initiatives. The former continues to use the CDDB format for its database, whereas MusicBrainz made a fresh start and completely revamped its digital music metadata format and system. The project chose RDF to play an important role in this effort.

MusicBrainz aims to be a metadatabase of digital audio and video that covers more than just CD track information. It's billed as an "open music encyclopedia." The openness is ensured by an explicit OpenContent license that's assigned to all MusicBrainz information. It is decentralized and ties together information at multiple Web locations. The server software is all readily available as open source. Currently there is information on about a million tracks. The basis of this data in RDF gives the service some unique advantages. First of all, each track, and all the other important concepts, have unique identifiers available in the form of URIs. With the URIs, a universal playlist can exist. This playlist can be published in compact forms and uniquely identifies a particular sequence of songs. CDDB does not have such global identifiers. MusicBrainz also defines RDF vocabularies for querying the encyclopedia.

Name that tune

The RDF subsystem of MusicBrainz is defined in the MusicBrainz Metadata Initiative 2.0 specification, which defines RDF for encyclopedia entries and for queries. MusicBrainz defines several base URIs (which they call namespaces) for the different (though related) RDF vocabularies it provides.

  • http://musicbrainz.org/mm/mm-2.0#: MusicBrainz Metadata namespace, usually associated with the prefix mm.
  • http://musicbrainz.org/mm/cdmp-1.0#: Compact Disc Lookup namespace, usually associated with the prefix cdmp.
  • http://musicbrainz.org/mm/mq-1.0#: MusicBrainz Query namespace, usually associated with the prefix mq.
  • http://musicbrainz.org/mm/mem-1.0#: MusicBrainz Extended Metadata namespace, usually associated with the prefix mem.

Let's concentrate on the mm and cdmp namespaces, as these are the most complete. mem is set up for extensions and refinements that are not yet in use. mq will probably become the focus of immediate activity in the project, but is not yet fully in place.

The MusicBrainz Metadata namespace covers core music metadata, using the following classes:

  • Artist: includes properties for the common name, and the name to be used for sorting (for example, "The Roots" could be sorted as "Roots, The"), as well as an RDF bag of the artists' albums.
  • Album: includes the dc:title property for the album title, as well as relationships to the artist and to an RDF sequence with the track listing.
  • Track: includes properties for track title, the creator, and the track number in the album.

MusicBrainz uses Dublin Core metadata elements wherever they make sense. As I discussed in the previous article, this allows MusicBrainz metadata to be somewhat accessible even to generic RDF agents.

Tracks are also given a property to connect them to their TRM Acoustic Fingerprint. TRM is a technology developed by Relatable, LLC as a unique bar code for digital media. Each TRM ID is a universally unique identifier (UUID). For example, the TRM for "Mellow My Man" by The Roots is f13069e3-da60-4782-82dd-a9f375e5c374. This information may optionally be used in digital rights management (DRM), though MusicBrainz is neutral on DRM issues.
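Because a TRM ID is a UUID, a client can at least check that a value has the right shape before storing or sending it. The following is a minimal sketch using Python's standard uuid module; the helper name is_uuid_shaped is made up for illustration, and the check says nothing about whether the ID actually exists in MusicBrainz.

```python
import uuid

# TRM ID quoted in the text for "Mellow My Man" by The Roots.
TRM_ID = "f13069e3-da60-4782-82dd-a9f375e5c374"


def is_uuid_shaped(text: str) -> bool:
    """Return True if text is a UUID in canonical 8-4-4-4-12 hex form.

    Hypothetical helper: it validates the format only, not whether the
    identifier is known to any MusicBrainz server.
    """
    try:
        # uuid.UUID also accepts braces and "urn:uuid:" prefixes, so compare
        # against the canonical string form to reject those variants.
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False


print(is_uuid_shaped(TRM_ID))       # True
print(is_uuid_shaped("not-a-trm"))  # False
```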

Listing 1 is an example of a MusicBrainz Metadata record.


Listing 1. Snapshot from a music metadata example
<rdf:RDF xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc  = "http://purl.org/dc/elements/1.1/"
         xmlns:mm  = "http://musicbrainz.org/mm/mm-2.0#">

  <mm:Artist rdf:about=
"http://musicbrainz.org/artist/8f6bd1e4-fbe1-4f50-aa9b-94c450ec0f11">
    <dc:title>Portishead</dc:title>
    <mm:sortName>Portishead</mm:sortName>
    <mm:albumList>
      <rdf:Bag>
        <rdf:li rdf:resource=
"http://musicbrainz.org/album/911e3f30-192e-4c3d-aa25-2a89d4202a3e"/>
        <rdf:li rdf:resource=
"http://musicbrainz.org/album/3677c7a6-03a6-4709-a7aa-edaea95ce473"/>
      </rdf:Bag>
    </mm:albumList>
  </mm:Artist>

  <mm:Album rdf:about=
"http://musicbrainz.org/album/911e3f30-192e-4c3d-aa25-2a89d4202a3e">
    <dc:title>Dummy</dc:title>
    <dc:creator rdf:resource=
"http://musicbrainz.org/artist/8f6bd1e4-fbe1-4f50-aa9b-94c450ec0f11"/>
    <mm:trackList>
      <rdf:Seq>
        <rdf:li rdf:resource=
"http://musicbrainz.org/track/8facb8ab-0b31-4d06-907f-0a9c9a72383c"/>
        <rdf:li rdf:resource=
"http://musicbrainz.org/track/44d90dca-5290-4cb3-af38-518818835f23"/>
<!--
Rest of the tracks snipped for brevity...
-->
      </rdf:Seq>
    </mm:trackList>
  </mm:Album>

  <mm:Album rdf:about=
"http://musicbrainz.org/album/3677c7a6-03a6-4709-a7aa-edaea95ce473">
    <dc:title>Roseland NYC Live</dc:title>
    <dc:creator rdf:resource=
"http://musicbrainz.org/artist/8f6bd1e4-fbe1-4f50-aa9b-94c450ec0f11"/>
    <mm:trackList>
      <rdf:Seq>
        <rdf:li rdf:resource=
"http://musicbrainz.org/track/1cf34447-7731-40a4-a2ba-347866a13c44"/>
        <rdf:li rdf:resource=
"http://musicbrainz.org/track/f71a27a7-4845-463c-9c67-ffb96a6b5a8f"/>
<!--
Rest of the tracks snipped for brevity...
-->
      </rdf:Seq>
    </mm:trackList>
  </mm:Album>

</rdf:RDF>

The album list is a bag because order is not pertinent. The track list is a sequence in order to preserve the track order. This is a bit redundant because each track already has a property with its track number.
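To make the bag-versus-sequence distinction concrete, here is a rough sketch that reads a trimmed copy of Listing 1 with Python's standard xml.etree.ElementTree module. It treats the record as plain XML rather than going through a real RDF parser, which is an assumption that holds only while the data keeps the exact element layout shown in the listing. The resources helper is a name invented for this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "rdf": RDF,
    "dc": "http://purl.org/dc/elements/1.1/",
    "mm": "http://musicbrainz.org/mm/mm-2.0#",
}

# Trimmed copy of Listing 1: the artist and the first album only.
RECORD = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:mm="http://musicbrainz.org/mm/mm-2.0#">
  <mm:Artist rdf:about="http://musicbrainz.org/artist/8f6bd1e4-fbe1-4f50-aa9b-94c450ec0f11">
    <dc:title>Portishead</dc:title>
    <mm:sortName>Portishead</mm:sortName>
    <mm:albumList>
      <rdf:Bag>
        <rdf:li rdf:resource="http://musicbrainz.org/album/911e3f30-192e-4c3d-aa25-2a89d4202a3e"/>
        <rdf:li rdf:resource="http://musicbrainz.org/album/3677c7a6-03a6-4709-a7aa-edaea95ce473"/>
      </rdf:Bag>
    </mm:albumList>
  </mm:Artist>
  <mm:Album rdf:about="http://musicbrainz.org/album/911e3f30-192e-4c3d-aa25-2a89d4202a3e">
    <dc:title>Dummy</dc:title>
    <dc:creator rdf:resource="http://musicbrainz.org/artist/8f6bd1e4-fbe1-4f50-aa9b-94c450ec0f11"/>
    <mm:trackList>
      <rdf:Seq>
        <rdf:li rdf:resource="http://musicbrainz.org/track/8facb8ab-0b31-4d06-907f-0a9c9a72383c"/>
        <rdf:li rdf:resource="http://musicbrainz.org/track/44d90dca-5290-4cb3-af38-518818835f23"/>
      </rdf:Seq>
    </mm:trackList>
  </mm:Album>
</rdf:RDF>
"""


def resources(container):
    """Return the rdf:resource URIs of a container's rdf:li members, in document order."""
    return [li.get(f"{{{RDF}}}resource") for li in container.findall("rdf:li", NS)]


root = ET.fromstring(RECORD)

artist = root.find("mm:Artist", NS)
# rdf:Bag: membership matters but order does not, so a set is a natural fit.
album_uris = set(resources(artist.find("mm:albumList/rdf:Bag", NS)))

album = root.find("mm:Album", NS)
# rdf:Seq: position encodes the track order, so keep a list.
track_uris = resources(album.find("mm:trackList/rdf:Seq", NS))

print(artist.findtext("dc:title", namespaces=NS), "has", len(album_uris), "albums")
print(album.findtext("dc:title", namespaces=NS), "tracks in order:")
for uri in track_uris:
    print("  ", uri)
```

Storing the bag as a set and the sequence as a list mirrors the intent of the two container types: a consumer may reorder the albums freely but must not reorder the tracks.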



Querying for CD information

MusicBrainz also defines a query service for CD metadata: the Compact Disc Metadata Proposal (CDMP). The protocol is very simple. You can HTTP POST an RDF query document to a MusicBrainz server, and get a response back in MusicBrainz metadata form similar to that in Listing 1, but with CDMP wrapper elements. You can also use an HTTP GET with some special query parameters. The most common scenario for CDMP is the case where a user puts a CD into a computer, and the CD player application fires up. It then reads the CD to determine the offsets of each track, which in many cases can be used to uniquely identify the CD. It sends these offsets to the MusicBrainz server in order to get the CD and track information of the CD that matches the offset data. Listing 2 is an example of such a query.


Listing 2. Sample query for CD and track information
<rdf:RDF xmlns:rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:dc   = "http://purl.org/dc/elements/1.1/"
     xmlns:cdmp = "http://musicbrainz.org/mm/cdmp-1.0#"
     xmlns:mm   = "http://musicbrainz.org/mm/mm-2.0#">

 <cdmp:LookupCD>
  <cdmp:offsets>150-17895-34567-51432-68025-87365-106380-123452-140620-157792-175650
  </cdmp:offsets>
 </cdmp:LookupCD>

</rdf:RDF>
        

In effect, this is a query consisting of an RDF object with the properties as query parameters. This is a common approach to representing queries in RDF, although it gets unwieldy as queries get more complex. Luckily, most MusicBrainz queries are pretty simple. Listing 3 is a sample response to the query in Listing 2.
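As a rough illustration of how a client might assemble such a query, the sketch below builds a cdmp:LookupCD document from a list of track offsets using Python's standard library. The function name build_lookup_query is invented for this example, and the code deliberately stops short of sending anything: the server address and any required HTTP headers are not given in this article, so they are left out rather than guessed.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CDMP = "http://musicbrainz.org/mm/cdmp-1.0#"

# Ask ElementTree to serialize with the same prefixes used in Listing 2.
ET.register_namespace("rdf", RDF)
ET.register_namespace("cdmp", CDMP)


def build_lookup_query(offsets):
    """Build a cdmp:LookupCD query document from integer track offsets.

    Hypothetical helper: the element names follow Listing 2, but the
    exact whitespace a server tolerates is an assumption.
    """
    if not offsets:
        raise ValueError("at least one offset is required")
    root = ET.Element(f"{{{RDF}}}RDF")
    lookup = ET.SubElement(root, f"{{{CDMP}}}LookupCD")
    offsets_el = ET.SubElement(lookup, f"{{{CDMP}}}offsets")
    # Offsets are joined with hyphens, as in Listing 2.
    offsets_el.text = "-".join(str(int(o)) for o in offsets)
    return ET.tostring(root, encoding="unicode")


query = build_lookup_query([150, 17895, 34567, 51432, 68025, 87365,
                            106380, 123452, 140620, 157792, 175650])
print(query)
```

The generated document declares only the rdf and cdmp namespaces, because those are the only ones the query elements use; Listing 2 also declares dc and mm, which do no harm.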


Listing 3. Sample response from CDMP lookup
<rdf:RDF xmlns:rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc   = "http://purl.org/dc/elements/1.1/"
         xmlns:cdmp = "http://musicbrainz.org/mm/cdmp-1.0#"
         xmlns:mm   = "http://musicbrainz.org/mm/mm-2.0#">
<cdmp:ResultCD>
  <cdmp:cd>
    <cdmp:CDMetadata>
      <dc:title>Rubycon</dc:title>
      <cdmp:cdmpId>ivDFb2Tw6HzN.XdYZFj5zr1Q9EY-</cdmp:cdmpId>
      <mm:Artist>
         <rdf:Description>
            <dc:title>Tangerine Dream</dc:title>
         </rdf:Description>
      </mm:Artist>
      <mm:trackList>
        <rdf:Seq>
          <rdf:li>
            <mm:Track>
               <dc:title>Rubycon (Part I)</dc:title>
               <mm:trackNum>1</mm:trackNum>
            </mm:Track>
          </rdf:li>
          <rdf:li>
            <mm:Track>
               <dc:title>Rubycon (Part II)</dc:title>
               <mm:trackNum>2</mm:trackNum>
            </mm:Track>
          </rdf:li>
        </rdf:Seq>
      </mm:trackList>
    </cdmp:CDMetadata>
  </cdmp:cd>
</cdmp:ResultCD>
</rdf:RDF>
        

The response is pretty much the MusicBrainz Metadata namespace format in CDMP wrapper classes. One advantage over CDDB is that multiple CD results can be returned from such a query, to deal with possible collisions between track offset details for different CDs.
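A client has to be prepared for more than one candidate CD in a response. The sketch below walks a copy of Listing 3 with Python's xml.etree.ElementTree and returns a list of candidates, however many there are. It also compares each mm:trackNum with the track's position in the rdf:Seq, since the two are meant to agree. The function parse_result and the dictionary keys are invented for this example, and treating the RDF as plain XML assumes the server keeps the layout shown in the listing.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "cdmp": "http://musicbrainz.org/mm/cdmp-1.0#",
    "mm": "http://musicbrainz.org/mm/mm-2.0#",
}

# Copy of the response in Listing 3.
RESPONSE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:cdmp="http://musicbrainz.org/mm/cdmp-1.0#"
         xmlns:mm="http://musicbrainz.org/mm/mm-2.0#">
<cdmp:ResultCD>
  <cdmp:cd>
    <cdmp:CDMetadata>
      <dc:title>Rubycon</dc:title>
      <cdmp:cdmpId>ivDFb2Tw6HzN.XdYZFj5zr1Q9EY-</cdmp:cdmpId>
      <mm:Artist>
        <rdf:Description>
          <dc:title>Tangerine Dream</dc:title>
        </rdf:Description>
      </mm:Artist>
      <mm:trackList>
        <rdf:Seq>
          <rdf:li>
            <mm:Track>
              <dc:title>Rubycon (Part I)</dc:title>
              <mm:trackNum>1</mm:trackNum>
            </mm:Track>
          </rdf:li>
          <rdf:li>
            <mm:Track>
              <dc:title>Rubycon (Part II)</dc:title>
              <mm:trackNum>2</mm:trackNum>
            </mm:Track>
          </rdf:li>
        </rdf:Seq>
      </mm:trackList>
    </cdmp:CDMetadata>
  </cdmp:cd>
</cdmp:ResultCD>
</rdf:RDF>
"""


def parse_result(xml_text):
    """Return one dict per cdmp:CDMetadata candidate found in a response."""
    root = ET.fromstring(xml_text)
    candidates = []
    for cd in root.iterfind("cdmp:ResultCD/cdmp:cd/cdmp:CDMetadata", NS):
        tracks = []
        track_path = "mm:trackList/rdf:Seq/rdf:li/mm:Track"
        for position, track in enumerate(cd.iterfind(track_path, NS), start=1):
            num_text = track.findtext("mm:trackNum", namespaces=NS)
            number = int(num_text) if num_text else None
            tracks.append({
                "title": track.findtext("dc:title", namespaces=NS),
                "number": number,
                # True when the stated number matches the sequence position.
                "in_order": number == position,
            })
        candidates.append({
            "title": cd.findtext("dc:title", namespaces=NS),
            "artist": cd.findtext("mm:Artist/rdf:Description/dc:title", namespaces=NS),
            "cdmp_id": cd.findtext("cdmp:cdmpId", namespaces=NS),
            "tracks": tracks,
        })
    return candidates


for cd in parse_result(RESPONSE):
    print(cd["artist"], "/", cd["title"])
    for t in cd["tracks"]:
        print("  ", t["number"], t["title"], "" if t["in_order"] else "(out of order)")
```

Returning a list even when there is a single match keeps the calling code the same whether the offsets identify one CD or collide with several.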

You can also make CDMP queries to search for CDs by exact or partial matches of title, artist, or other data. CDMP users can also submit new CD information. Usually, if your CD player does a lookup and cannot find the matching CD information, the software lets you enter the track data manually. You can then submit this data as a contribution to MusicBrainz. MusicBrainz has a moderation system in place to minimize abuse and unintentional errors in submissions. This is important, as demonstrated by recent cases where CDDB data was tainted by gag entries using foul language. Most bad data is much less egregious, and is more often a result of typos, transposed tracks, and the like. MusicBrainz allows users to edit entries after initial submission, subject to moderation.

CDMP was originally designed for cooperation with other open CD lookup systems, but such collaboration has been less healthy than hoped, so CDMP might be replaced with mq namespace queries, which are more focused on the general MusicBrainz encyclopedia concept.



Conclusion

MusicBrainz is important on several levels. For one thing, it demonstrates the power of communities dedicated to open technologies. They can often route efficiently around the damage caused by unscrupulous commercial interests. MusicBrainz was born when CDDB moved to a restrictive license, and the developers took the opportunity to redesign the CD information system to be more flexible, to have more features, and to support broader types of information. Users have contributed a huge amount of data to support the effort, and the database is a great public asset.

The use of RDF in MusicBrainz means that it can be readily integrated into other metadata initiatives. There are a few awkward things about the RDF forms. For one thing, they inherit all the awkwardness that comes with RDF 1.0 containers, and this clumsiness is compounded because container relationships are sometimes used alongside other relationships that express the same fact, so the two must be kept in sync. As an example, the ordering in the sequence specified by mm:trackList is redundant with the equivalent mm:trackNum properties. Despite these small technical flaws, the data ends up being very clean and readily usable by a lot of tools. For example, MusicBrainz borrows decent internationalization from basic XML capabilities, as opposed to the murky status of internationalization in the original CDDB. And even those who are not familiar with RDF can take advantage of this because of the open-source client libraries available for MusicBrainz. If you develop any applications for handling digital media, consider using MusicBrainz formats and protocols for metadata.




About the author

Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche@ogbuji.net.



