
7. Naming Files

Introduction

During the development of CCTunes, various options were implemented to name the files. The initial, and simplest, option was to just use the "Track ID" that is generated by iTunes to name the albums.

This had various drawbacks, the main one being that these ids are assigned on a per-track basis. So, from one generation to the next, the HTML files in your library could receive a different id. Also, as I quite often move my mp3 files from one share to another, or from my iPod to my computer, these track ids did not remain the same in my library.

In conclusion, I thought this method was not suited to the job. So, I implemented another one.

First Attempt - MD5 Sums

If you are a bit familiar with md5 sums, they are the first thing that comes to mind in a situation like this. An md5 sum is a hash of the album data, so whenever something in the album data changes, the album id changes too. If you combine that with a version control system (I use Subversion), it becomes easy to follow the changes in your album collection.
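As a rough illustration of the idea, here is a minimal sketch that derives an id from album data with the core Digest::MD5 module. The album record, its field names and the album_md5 helper are assumptions made up for this example; they are not taken from createId.command.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical album record; the field names are assumptions for this sketch.
my %album = (
	artist => 'Example Artist',
	title  => 'Example Album',
	tracks => [ 205061, 201743, 209084 ],   # track durations in ms
);

# Join the fields in a fixed order so the same data always gives the same id.
sub album_md5
{
	my ($album) = @_;
	my $data = join "\n", $album->{artist}, $album->{title}, @{ $album->{tracks} };
	return md5_hex($data);
}

my $id = album_md5(\%album);
print "$id\n";
```

Any change to one of the fields, even a single millisecond in one track, gives a completely different 32-character id.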

There are drawbacks to this approach too. The md5 sums are quite meaningless and rather long. On top of this, two different albums can in principle end up with the same sum (unlikely as that is in practice), so a mechanism would be needed to give a warning when the ids of two different albums are the same.
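Such a warning mechanism could look roughly like the sketch below. It is only an illustration: the find_id_clashes helper and the example ids and album names are hypothetical, and the check works for any kind of id, not just md5 sums.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns one warning message for every id that is shared by two different albums.
# Input: a list of [id, album name] pairs.
sub find_id_clashes
{
	my @pairs = @_;
	my (%seen, @warnings);
	for my $pair (@pairs)
	{
		my ($id, $album) = @$pair;
		if (exists $seen{$id} && $seen{$id} ne $album)
		{
			push @warnings, "id $id is used by both '$seen{$id}' and '$album'";
		}
		# Remember the first album that claimed this id.
		$seen{$id} = $album unless exists $seen{$id};
	}
	return @warnings;
}

# Made-up example data: the first and third album share an id.
my @warnings = find_id_clashes(
	[ 'aaaa0001', 'Album One' ],
	[ 'bbbb0002', 'Album Two' ],
	[ 'aaaa0001', 'Album Three' ],
);
print "Warning: $_\n" for @warnings;
```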

So, this is not the method implemented in the current version. It is still present, after the exit, in the createId.command file in the package, should you be interested in it.

Use what is there: cddb id's

There is, however, an existing and commonly accepted mechanism that produces an md5-like id but is better suited to music albums: the CDDB mechanism. Not that this one is free from criticism either. One of the first things I encountered on Google was a fairly harsh critique of the way the hash sums are calculated.

Nor was it easy to find out how the id is calculated. It's open source, but documented only by means of the source. Also, most of the packages creating such an id did an immediate lookup on either the freedb or the Gracenote servers and used data read directly from the CD, using lead-in times that did not seem retrievable from the ripped mp3 data.

In the end, I found a package that generates an id from mp3 data and did not seem to need any information from the CD itself. I don't know if it will create correct ids in every case, but it looks like the closest I can get for now. Further down, I'll explain why I think it will never create correct ids.

Here is the script that finally generates the ids.

Code listing 7.1: Generating freedb id's on a per album basis

#!/usr/bin/perl
# Prints a freedb-style disc ID for one album.
# Arguments: the track durations in milliseconds, in track order.

use strict;
use warnings;
use POSIX qw(floor);

# Sum of the decimal digits of a non-negative integer.
sub cddb_sum
{
	my ($n, $ret) = (shift, 0);
	for (split //, $n) { $ret += $_ }
	return $ret;
}

# Returns the disc ID as a string of 8 hex digits.
# @cdtoc holds the track durations in seconds (populated via get_mp3info earlier on).
sub cddb_discid
{
	my @cdtoc      = @_;
	my $n          = 0;
	my $total_time = 0;

	foreach my $track (@cdtoc)
	{
		# Round each duration to the nearest whole second.
		my $track_time = floor($track + .5);
		# Add the digit sum of this track's start time, then advance.
		$n          += cddb_sum($total_time);
		$total_time += $track_time;
	}
	return sprintf("%08x", ($n % 0xFF) << 24 | $total_time << 8 | scalar @cdtoc);
}

# Convert the millisecond arguments to seconds.
my @calcarr = map { $_ / 1000 } @ARGV;

print cddb_discid(@calcarr), "\n";

Why the id's don't correspond

An explanation of how the cddb ids are calculated can be found at freedb.org. As can be seen there, the only things that matter are the number of tracks and the track durations. There is a little program at http://jeremy.zawodny.com/c/discid/ that you can use to generate disc ids, or to verify what is going on in our case. We need to get the same information out of the mp3s, or out of the iTunes XML file. This turned out to be troublesome.

As an example, I will try to explain the calculation for Morrissey's Kill Uncle album. Not because it's particularly special, just because the CD was lying around at the time I tried these things, so it makes a good example.

We get timings from different sources: the iTunes GUI, of course; the iTunes XML file, which is where we would prefer to get the timings, since this is after all based on XML manipulation; discid, which reads the table of contents directly from the CD; and, last, the freedb website itself. I needed to modify discid a bit to get it to print the timings, but that was not too difficult to do.

Track   iTunes XML   freedb website   discid   iTunes UI   Cumulative ms,   Cumulative ms,
number  (ms)         (m:ss)           (s)      (m:ss)      rounding down    rounding to nearest
0                                     2
1       205061       3:25             205      3:25        61               61
2       201743       3:22             202      3:21        804              -196
3       209084       3:29             209      3:29        888              -112
4       212741       3:33             212      3:32        1629             -371
5       176666       2:57             177      2:56        2295             -705
6       119797       2:00             120      1:59        3092             -908
7       203363       3:23             203      3:23        3455             -545
8       334602       5:34             334      5:34        4057             -943
9       211800       3:32             212      3:31        4857             -1143
10      112431       1:52             113      1:52        5288             -712

The iTunes XML column holds the durations as stored in the XML file. It turns out these are stored in milliseconds, so divide them by 1000 to convert to seconds. Comparing them with the iTunes UI column settles little: in the table, the UI value is the XML value with the fraction dropped for every track (track 2: 201.743 s shown as 3:21; track 4: 212.741 s shown as 3:32), so the UI does not tell us how the other sources round.

The freedb website column would seem to be plain rounding to the nearest integer, if it weren't for track 8, which is all of a sudden rounded down to 334 s even though 334.602 s should round to 335 s. The relation with the discid timings is interesting too, because discid is the only program I tried that gave me the correct ID. The CD table of contents has a lead-in, which is the time in seconds at which the first track starts. This is shown in the table as the duration of track 0. Here too, there is no simple correlation between the values from the XML file and the values from discid: tracks 1 to 3 seem to indicate rounding to the nearest integer, but track 4 breaks this rule by all of a sudden expecting rounding down.
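As a cross-check on the discid timings, here is a minimal sketch that feeds the discid column of the table into the same kind of checksum, but lets the first track start at the lead-in instead of at zero. It assumes that the discid column holds whole-second track lengths and that each track starts at the lead-in plus the lengths of the tracks before it; the function names are made up for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum of the decimal digits of a non-negative integer.
sub digit_sum
{
	my ($n, $sum) = (shift, 0);
	$sum += $_ for split //, $n;
	return $sum;
}

# Disc id from a lead-in and whole-second track lengths.
# Assumption: a track's start offset is the lead-in plus all earlier lengths.
sub discid_with_leadin
{
	my ($lead_in, @lengths) = @_;
	my ($n, $offset, $total) = (0, $lead_in, 0);
	for my $len (@lengths)
	{
		$n      += digit_sum($offset);   # checksum over the start offsets
		$offset += $len;
		$total  += $len;                 # playing time excludes the lead-in
	}
	return sprintf("%08x", ($n % 0xFF) << 24 | $total << 8 | scalar @lengths);
}

# discid column of the table; the 2 second lead-in is listed there as track 0.
my @discid_seconds = (205, 202, 209, 212, 177, 120, 203, 334, 212, 113);
my $id = discid_with_leadin(2, @discid_seconds);
print "$id\n";   # prints 7307c30a
```

With these inputs the sketch prints 7307c30a, the ID that discid reports, which suggests that the difficulty lies in recovering these per-track timings from the mp3 or XML data rather than in the formula itself.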

So, how to proceed? It seemed an interesting idea to inspect the cumulative rounding error, so I've included it in the last two columns, for rounding down and for rounding to the nearest integer. Neither of the two columns seems to hold the key to the problem, but maybe I'm just missing something.
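For anyone who wants to experiment with other rounding rules, the two cumulative columns can be recomputed with a few lines of Perl. This is only a sketch: the cumulative_error helper is a name invented here, and the input is the iTunes XML column from the table.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(floor);

# Track durations in ms, taken from the iTunes XML column of the table.
my @ms = (205061, 201743, 209084, 212741, 176666,
          119797, 203363, 334602, 211800, 112431);

# Running sum of (actual ms - rounded ms) under a given rounding rule.
# $round is a code reference that maps seconds to whole seconds.
sub cumulative_error
{
	my ($round, @durations) = @_;
	my ($sum, @out) = (0);
	for my $d (@durations)
	{
		$sum += $d - 1000 * $round->($d / 1000);
		push @out, $sum;
	}
	return @out;
}

my @down    = cumulative_error(sub { floor($_[0]) },       @ms);
my @nearest = cumulative_error(sub { floor($_[0] + 0.5) }, @ms);

# One line per track: track number, rounding down, rounding to nearest.
printf "%2d %6d %6d\n", $_ + 1, $down[$_], $nearest[$_] for 0 .. $#ms;
```

Passing a different code reference as the first argument is enough to try another rule, for example one that rounds the cumulative time instead of each track.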

So, for now, we stick to rounding to the nearest integer, which gives a pretty close ID but not an exact match. Any ideas on getting an exact match are welcome. For the example, we get a freedb id of 7907c40a instead of the expected 7307c30a. You will have to admit it's similar.
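To see where exactly the two ids differ, it helps to split them back into the three fields that the script packs together. The sketch below does that; decode_discid is a helper name made up for this example, and the field layout is the one used by the sprintf call in the listing above (checksum byte, total seconds, track count).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split an 8-hex-digit disc id into its three packed fields.
sub decode_discid
{
	my ($id) = @_;
	my $value = hex $id;
	return (
		checksum => ($value >> 24) & 0xFF,    # digit-sum checksum modulo 255
		seconds  => ($value >> 8)  & 0xFFFF,  # total playing time in seconds
		tracks   => $value & 0xFF,            # number of tracks
	);
}

for my $id ('7907c40a', '7307c30a')
{
	my %f = decode_discid($id);
	print "$id: checksum=$f{checksum} seconds=$f{seconds} tracks=$f{tracks}\n";
}
```

Both ids agree on the 10 tracks; the generated one has a total time of 1988 s against 1987 s, and a checksum of 121 against 115.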

Todo: something about MP3Browser

Todo: url's for the cddb-id perl script.

Updated $LastChangedDate: 2005-01-07 20:53:38 +0100 (Fri, 07 Jan 2005) $
Kristof Van Landschoot
Author

Summary: Naming the files produced by CCTunes on a per album basis
Copyright 2003-2004 Coin-C bvba. Questions, Comments, Corrections? Email cctunes@coin-c.com.