7. Naming Files
Introduction
During the development of CCTunes, various options were implemented
to name the files. The initial, and simplest, option was to just
use the "Track ID" that iTunes generates to name the albums.
This had various drawbacks, the main one being that these id's are
assigned on a per-track basis. So, from one generation to the next,
the HTML files in your library could receive a different id.
Also, as I move my mp3 files from one share to another, or from my
iPod to my computer, quite often, these track id's did not remain
the same in my library.
In conclusion, I thought this method was not suited for the job. So,
I implemented another one.
First Attempt - MD5 Sums
If you're a bit familiar with md5 sums, they are the first thing that comes
to mind in situations like these. An md5 sum is a hash based
on the data of the album, so whenever something in
the album data changes, the album id changes as well. If you combine
that with a version control system (Subversion, in my case), it
becomes easy to follow the changes in your album collection.
There are drawbacks to this approach too. The md5 sums are quite meaningless
and rather long. On top of this, two different albums can in principle end up
with the same sum, so a mechanism would be needed to give a warning when the
ids of two different albums are the same.
So, this is not the method implemented in the current version. It is still
present in the createId.command file in the package, after the exit,
should you be interested in it.
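To make the idea concrete, here is a minimal sketch of an md5-based album id.
It is not the code from createId.command: the helper name album_md5_id, the
choice of fields and the separator character are all assumptions made for
this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical helper: hash the album data into a 32-character hex id.
# Which fields go into the hash is an assumption for illustration only.
sub album_md5_id
{
    my ($artist, $album, @tracks) = @_;
    # Join the fields with a separator that is unlikely to occur in names,
    # then hash the UTF-8 bytes (md5_hex refuses wide characters).
    my $data = join("\x1f", $artist, $album, @tracks);
    return md5_hex(encode_utf8($data));
}

# Placeholder track names; changing any of them changes the id.
my $id      = album_md5_id('Morrissey', 'Kill Uncle', 'Track one', 'Track two');
my $changed = album_md5_id('Morrissey', 'Kill Uncle', 'Track one', 'Track 2');
print "$id\n$changed\n";
```

The two printed ids differ completely even though only one track name
changed, which is what makes such an id useful for spotting changes and
useless for reading.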
Use what is there: cddb id's
But there already is a commonly accepted mechanism that creates an id much
like an md5 sum does, only better suited to music albums: the CDDB mechanism.
Not that this one is free from criticism either. One of the first things I
encountered on Google was a fairly harsh criticism of the way the hash sums
are calculated. Nor was it easy to find out how they are calculated. It's open
source, but documented only by means of the source. And most of the packages
creating such an id did an immediate lookup on either the freedb or Gracenote
servers and used data read directly from the CD, including lead-in times that
did not seem retrievable from the ripped mp3 data.
In the end, I found a package generating an id from mp3 data, and it did not seem
to need any information from the CD itself. I don't know if it creates correct
id's in every case, but it looks like the closest I can get for now. Further on,
I'll explain why I think it will never create correct id's.
Here is the script that finally generates the id's.
Code listing 7.1: Generating freedb id's on a per album basis

#!/usr/bin/perl
# Prints a freedb-style disc id for one album.
# Arguments: the track durations in milliseconds, in album order.
use strict;
use warnings;
use POSIX qw(floor);

# Returns the sum of the decimal digits of a number.
sub cddb_sum
{
    my ($n, $ret) = (shift, 0);
    for (split //, $n) { $ret += $_ }
    return $ret;
}

# Returns the disc ID as a string.
sub cddb_discid
{
    # kvl this is where the party's at: @cdtoc holds the track durations
    # in seconds (populated from get_mp3info in the original package)
    my @cdtoc = @_;
    my $n = 0;
    my $total_time = 0;
    foreach my $track (@cdtoc)
    {
        # Round the duration to the nearest whole second.
        my $track_time = floor($track + .5);
        # Add the digit sum of this track's start offset.
        $n += cddb_sum($total_time);
        $total_time += $track_time;
    }
    # Checksum byte, total seconds, number of tracks.
    return sprintf("%08x", ($n % 0xFF) << 24 | $total_time << 8 | @cdtoc);
}

# Convert the millisecond arguments to seconds.
my @calcarr = map { $_ / 1000 } @ARGV;
print cddb_discid(@calcarr), "\n";

Why the id's don't correspond
An explanation of how the cddb id's are calculated can be found at freedb.org.
As can be seen there, the only things that matter are the number of tracks and
the track durations. There is a little program at
http://jeremy.zawodny.com/c/discid/ that you can use to generate discid's,
or to verify what is going on in our case. We need to get the information out
of the mp3's, or out of the iTunes XML file. This turned out to be troublesome.
As an example, I will try to explain the calculation for Morrissey's Kill Uncle
album. Not because it's particularly special, just because the CD was lying around
at the time I tried these things, so it makes a good example.
We get timings from different sources. The iTunes GUI, of course. Also
the iTunes XML file, which is where we would prefer to get the timings, since
this is, after all, based on XML manipulation. Then there is discid, which reads
the table of contents directly from the CD. I needed to modify discid a
bit to get it to print the timings, but that was not too difficult to do. And
lastly, there are the timings from the freedb website itself.
| Track Number | Time in ms from iTunes XML | Time in m:ss from freedb website | Time in seconds from discid | Time in m:ss from iTunes UI | Cumulative ms from rounding down iTunes XML values | Cumulative ms from rounding iTunes XML values |
| 0 | | | 2 | | | |
| 1 | 205061 | 3:25 | 205 | 3:25 | 61 | 61 |
| 2 | 201743 | 3:22 | 202 | 3:21 | 804 | -196 |
| 3 | 209084 | 3:29 | 209 | 3:29 | 888 | -112 |
| 4 | 212741 | 3:33 | 212 | 3:32 | 1629 | -371 |
| 5 | 176666 | 2:57 | 177 | 2:56 | 2295 | -705 |
| 6 | 119797 | 2:00 | 120 | 1:59 | 3092 | -908 |
| 7 | 203363 | 3:23 | 203 | 3:23 | 3455 | -545 |
| 8 | 334602 | 5:34 | 334 | 5:34 | 4057 | -943 |
| 9 | 211800 | 3:32 | 212 | 3:31 | 4857 | -1143 |
| 10 | 112431 | 1:52 | 113 | 1:52 | 5288 | -712 |
In the iTunes XML column we can see the entries from the XML file. It turns out
these are stored in milliseconds, so divide them by 1000 to convert to seconds.
Compared with the iTunes UI column, the values in this table are all consistent
with simply dropping the fractional second: track 2 (201743 ms) is shown as
3:21 and track 4 (212741 ms) as 3:32. The other sources do not follow such a
simple rule.
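Getting the millisecond values out of the XML file can be sketched as follows.
This is a rough illustration, not the CCTunes code: it assumes the durations
sit under a Total Time key followed by an integer element, the XML fragment is
made up for the example, and total_times_ms is a hypothetical helper. A regular
expression is used only to keep the sketch short; it is no substitute for a
real XML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up fragment in the style of an iTunes library file (assumption).
my $xml = <<'XML';
<dict>
    <key>Track ID</key><integer>101</integer>
    <key>Total Time</key><integer>205061</integer>
</dict>
<dict>
    <key>Track ID</key><integer>102</integer>
    <key>Total Time</key><integer>201743</integer>
</dict>
XML

# Hypothetical helper: returns every Total Time value (in ms), in file order.
sub total_times_ms
{
    my ($text) = @_;
    return $text =~ m{<key>Total Time</key>\s*<integer>(\d+)</integer>}g;
}

# Divide by 1000 to get seconds, keeping the fraction.
printf "%d ms = %.3f s\n", $_, $_ / 1000 for total_times_ms($xml);
```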
The freedb column shows the entries from the freedb website. They would seem to
be correctly rounded to the nearest integer, if it wasn't for track 8, which is
all of a sudden rounded down to 334s even though nearest-integer rounding would
give 335s. The relation with the discid timings is also interesting, because
discid is the only program I tried that gave me the correct ID. The CD
table of contents has a lead-in, which is the time in seconds at which the first
track starts. This is indicated in the table as the duration of track 0. Here
too, there is no simple correlation between the values from the XML file
and the values from discid: tracks 1 to 3 seem to indicate rounding to
the nearest integer, but track 4 breaks this rule by all of a sudden expecting
rounding down.
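The discid column can be checked against the expected id of 7307c30a. The
following is a minimal sketch, not the discid source. It assumes the scheme
of summing the digits of each track's start offset in seconds, with the
lead-in counted in the offsets but not in the total playing time. The names
digit_sum and discid_from_toc are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum of the decimal digits of a number.
sub digit_sum
{
    my $sum = 0;
    $sum += $_ for split //, shift;
    return $sum;
}

# Hypothetical helper: builds the id from the lead-in and the track
# durations, all in whole seconds as printed by discid.
sub discid_from_toc
{
    my ($lead_in, @durations) = @_;
    my ($n, $offset, $total) = (0, $lead_in, 0);
    foreach my $d (@durations)
    {
        $n      += digit_sum($offset);  # start offset includes the lead-in
        $offset += $d;
        $total  += $d;                  # playing time excludes the lead-in
    }
    return sprintf("%08x", ($n % 0xFF) << 24 | $total << 8 | scalar(@durations));
}

# Values from the discid column of the table: track 0 is the lead-in.
print discid_from_toc(2, 205, 202, 209, 212, 177, 120, 203, 334, 212, 113), "\n";
# prints 7307c30a
```

With these inputs both the total of 1987 seconds and the checksum byte agree
with the expected id, so the difference has to come from the timings derived
from the mp3 data and from the missing lead-in.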
So, how to proceed? It could be interesting to inspect the cumulative rounding
errors, so I've included them in the last two columns, for rounding down and for
rounding to the nearest integer. Neither of the two columns seems to give a key
to the solution of the problem, but maybe I'm just missing things.
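For reference, the two cumulative columns can be recomputed from the XML
values with a few lines of Perl. This is only a sketch of how I read those
columns: each entry is the running total of the real duration minus the
rounded duration, in ms. The names to_seconds and cumulative_error are made
up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Track durations in ms, taken from the iTunes XML column of the table.
my @ms = (205061, 201743, 209084, 212741, 176666,
          119797, 203363, 334602, 211800, 112431);

# Two ways of turning milliseconds into whole seconds.
my %to_seconds = (
    down    => sub { int($_[0] / 1000) },
    nearest => sub { int(($_[0] + 500) / 1000) },
);

# Running total of (real duration - rounded duration), in ms.
sub cumulative_error
{
    my ($mode, @durations) = @_;
    my ($sum, @running) = (0);
    foreach my $t (@durations)
    {
        $sum += $t - 1000 * $to_seconds{$mode}->($t);
        push @running, $sum;
    }
    return @running;
}

print "down:    @{[ cumulative_error('down', @ms) ]}\n";
print "nearest: @{[ cumulative_error('nearest', @ms) ]}\n";
# down:    61 804 888 1629 2295 3092 3455 4057 4857 5288
# nearest: 61 -196 -112 -371 -705 -908 -545 -943 -1143 -712
```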
So, for now, we will stick to rounding to the nearest integer, which gives us
a pretty close ID, but not an exact match. Any ideas on getting an exact
match are welcome. For the example, we get a freedb id of 7907c40a
instead of the expected 7307c30a. You will have to admit it's similar.
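To see how similar the two ids are, they can be split back into their three
fields. The sketch below assumes the layout used by the sprintf in code
listing 7.1: one checksum byte, two bytes for the total time in seconds, and
one byte for the number of tracks. The name split_discid is made up for this
example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: splits an 8-digit hex id into
# (checksum byte, total seconds, number of tracks).
sub split_discid
{
    my $id = hex(shift);
    return (($id >> 24) & 0xFF, ($id >> 8) & 0xFFFF, $id & 0xFF);
}

foreach my $id ('7907c40a', '7307c30a')
{
    my ($checksum, $seconds, $tracks) = split_discid($id);
    print "$id: checksum $checksum, total $seconds s, $tracks tracks\n";
}
# 7907c40a: checksum 121, total 1988 s, 10 tracks
# 7307c30a: checksum 115, total 1987 s, 10 tracks
```

The track count matches and the total time is off by a single second. The
checksum byte differs more, since it depends on the digits of every track's
start offset.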
Todo: something about MP3Browser
Todo: url's for the cddb-id perl script.