Tuesday, January 1, 2008

Music metadata: the insanity continues

Ok, I've bought my wife an audio book of Upton Sinclair's "The Jungle", and one of my daughters the audio book collection of the Lord of the Rings trilogy. And of course, as I always do, I immediately tried to rip them to WMAs so they could be listened to on the variety of hardware I have at home.

And I immediately crashed full speed into the file metadata idiocy. Of course, CDDB does not know anything about Upton Sinclair. So I tried to edit the metadata in WMP10. Apparently, in their continuing quest to make the UI accessible to idiots, they've made it completely unusable. I managed to set the album name for the whole CD, but it would only set the artist per track. I absolutely could not figure out how to either make it forget my changes or apply them to all the tracks on the CD.

I ended up just ripping everything to Unknown Artist/Unknown Album XXX, then using my usual trick of creating the directory hierarchy Genre\Author\Album, and running a script to set the metadata in all the files to be derived from names of their ancestor directories.

Here's the script - for the occasion I rewrote it in Python:

import sys
import os


def _GetMetaDataFromDirName(dir_name):
    # The last three path components are taken as genre, author and album.
    all_dirs = dir_name.split('\\')
    if len(all_dirs) < 3:
        # Too shallow to hold genre\author\album; the caller skips these.
        return (None, None, None)
    return (all_dirs[-3], all_dirs[-2], all_dirs[-1])


def main(argv):
    if len(argv) < 2:
        print('Usage: python normalize_music.py dir_name')
        print('This will walk the directory tree that is presumed')
        print('to have *\\genre\\author\\album structure and set WMA')
        print('and MP3 file metadata accordingly')
        return

    base_dir = argv[1]
    for root, dirs, files in os.walk(base_dir):
        for file in files:
            full_file_name = os.path.join(root, file)
            cmdline = None
            if file.lower().endswith('.wma'):
                (genre, author, album) = _GetMetaDataFromDirName(root)
                if genre and author and album:
                    cmdline = ('c:\\bin\\meta.exe "WM/Genre=%s" "Artist=%s" '
                               '"Author=%s" "WM/Artist=%s" "WM/AlbumArtist=%s" '
                               '"WM/AlbumTitle=%s" "%s"' % (genre, author, author,
                                                            author, author, album,
                                                            full_file_name))
            if file.lower().endswith('.mp3'):
                (genre, author, album) = _GetMetaDataFromDirName(root)
                if genre and author and album:
                    cmdline = ('c:\\bin\\meta.exe "WM/Genre=%s" '
                               '"Author=%s" "WM/AlbumArtist=%s" '
                               '"WM/AlbumTitle=%s" "%s"' % (genre, author, author,
                                                            album, full_file_name))
            if cmdline:
                # Show the command before running it.
                print('\n\n')
                print(cmdline)
                os.system(cmdline)


if __name__ == '__main__':
    main(sys.argv)


It uses the command-line metadata-manipulation program I wrote a while ago (meta), available here: http://www.solyanik.com/drop/meta.zip. Note that you have to have both Windows Media Encoder 9 and .NET 2.0 installed to use it. But once you get all the dependencies, it's quite convenient to script dealing with metadata using it.
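One thing the script above glosses over is quoting: it pastes directory names straight into a command string, so an album name containing a double quote would break the command line. Here is a rough sketch of one way around that, building the arguments as a list instead. The helper names (metadata_from_dir, build_meta_command, tag_file) are made up for this illustration, it assumes meta.exe still lives in c:\bin, and the attribute names are simply copied from the WMA branch of the script.

```python
import subprocess
from pathlib import PureWindowsPath

META_EXE = r"c:\bin\meta.exe"  # assumed location, same as in the script above


def metadata_from_dir(dir_name):
    """Return (genre, author, album) from the last three folders, or None."""
    path = PureWindowsPath(dir_name)
    # Drop the drive/root part ("D:\\") so it is never mistaken for a genre.
    parts = path.parts[1:] if path.anchor else path.parts
    if len(parts) < 3:
        return None
    return parts[-3], parts[-2], parts[-1]


def build_meta_command(full_file_name, genre, author, album):
    """Build the meta.exe argument list for one WMA file."""
    return [
        META_EXE,
        "WM/Genre=%s" % genre,
        "Artist=%s" % author,
        "Author=%s" % author,
        "WM/Artist=%s" % author,
        "WM/AlbumArtist=%s" % author,
        "WM/AlbumTitle=%s" % album,
        full_file_name,
    ]


def tag_file(full_file_name, dir_name):
    """Run meta.exe on one file; returns its exit code, or None if skipped."""
    meta = metadata_from_dir(dir_name)
    if meta is None:
        return None
    # A list is handed to the process as separate arguments, so spaces in
    # names need no manual quoting.
    return subprocess.call(build_meta_command(full_file_name, *meta))
```

With this, a folder like D:\Music\Audiobooks\Upton Sinclair\The Jungle yields the genre "Audiobooks", the author "Upton Sinclair" and the album "The Jungle", and anything shallower than three folders is skipped.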

Ok, meanwhile, my daughter was ripping the "Fellowship of the Ring". When she was done, the folders were named differently - some "Fellowship of the Ring Disk X", some "The Fellowship of the Ring Disk Y", and all the file names inside the directories were different as well.

If only that were the worst of it! On closer examination I found that:
(1) Disk 1 and Disk 16 both ripped into the same directory, "Fellowship of the Ring", so disk 16 overwrote disk 1.
(2) Disk 14 ripped into the directory corresponding to disk 12, and overwrote those files as well.

So I had to re-rip these CDs, and run my script to reset the metadata correctly.
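Since the album name comes from the folder name, folders like "Fellowship of the Ring Disk X" and "The Fellowship of the Ring Disk Y" would end up as two different albums. Below is a minimal sketch of a check one could run over the ripped folder names before tagging. The function names and the "Title Disk NN" naming convention are my own assumptions for illustration, and it can only flag suspicious names; it cannot tell that files inside a folder were already overwritten.

```python
import re
from collections import defaultdict

# Optional leading "The", then the title, then "Disk"/"Disc" and a number.
_DISK_RE = re.compile(
    r"^(?:the\s+)?(?P<title>.*?)\s+dis[ck]\s*(?P<num>\d+)$", re.IGNORECASE)


def canonical_disk_folder(name):
    """Return 'Title Disk NN' for a disk folder name, or None if unnumbered."""
    match = _DISK_RE.match(name.strip())
    if match is None:
        return None
    return "%s Disk %02d" % (match.group("title"), int(match.group("num")))


def check_disk_folders(folder_names):
    """Return (unnumbered, clashes) for a list of ripped folder names.

    unnumbered: names with no disk number (a likely dumping ground for
    several disks). clashes: canonical name -> original names, for every
    canonical name that more than one folder maps to.
    """
    unnumbered = []
    by_canonical = defaultdict(list)
    for name in folder_names:
        canonical = canonical_disk_folder(name)
        if canonical is None:
            unnumbered.append(name)
        else:
            by_canonical[canonical].append(name)
    clashes = {c: names for c, names in by_canonical.items() if len(names) > 1}
    return unnumbered, clashes
```

Any folder reported as unnumbered (like the plain "Fellowship of the Ring" that swallowed two disks) is a candidate for re-ripping, and the rest can be renamed to their canonical names before the metadata script runs.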

I think the person who invented media classification by metadata deserves a monument with the inscription "Spit here!".

1 comment:

Ilyak said...

Then maybe you haven't tried MP3 metadata in Cyrillic.
There are myriad ways to store Cyrillic in ID3 tags, and most of them are standard-compliant XOR widely used.

Not to mention that half of us prefer To Have Titles This Way and the other half don't.

P.S. Tools like grip do their job just fine writing correct metadata and laying out files correctly. Just use them to grab your discs. Of course, you still have to name artists and albums consistently.