stevepedwards.com/DebianAdmin linux mint IT admin tips info

An Intro to Metachar Search Methods – Wreck the MP3s – ♫ !!

Common Metacharacters

^
[
.
$
{
*
(
\
+
)
|
?
<
>
The escape character is usually \

This Post evolved from a masochistic desire to improve my knowledge of regexes and metachars, possibly by using them to tidy up MP3 album/song titles. These can look horrendous on the command line, so they're a good place to practise regex operations.

ls PorcupineTree\ -\ Up\ the\ Downstair/Unknown\ Album\ \(e2118a0e\)\ -\ Track\

Space removal in file names is the first step. I started covering it in the Find with -Exec Post, as there are several ways to do it: sed, tr, awk and rename for a start. Examples of these appear in that Post, and in the PDF eBook Creation Post, where space removal was necessary for the task to work.

cat file.txt

this has one or more spaces in it

sed 's/ //g' < file.txt

thishasoneormorespacesinit

cat file.txt | tr -d " "

thishasoneormorespacesinit

awk '{ gsub(/ /, ""); print }' < file.txt

thishasoneormorespacesinit

touch file\ name\ with\ spaces.txt

rename 's/ //g' file\ name\ with\ spaces.txt

filenamewithspaces.txt

Note that none of these alters the original file's content, but rename DOES rename the file, unless a file with the new name already exists. That's its job!

This is important to understand before -exec rename … is added to the find command later.
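To make rename's "no overwrite" behaviour concrete, here is a minimal plain-sh sketch that does the same job with mv. It is an illustration, not how rename itself works. strip_spaces is a hypothetical name. It removes spaces from the last part of each path only, and it refuses to clobber an existing file, printing a message modelled on rename's:

```shell
#!/bin/sh
# Hypothetical helper: strip_spaces FILE...
# Removes every space from each file's NAME (not its directory path) and,
# like Perl rename, refuses to overwrite a file that already exists.
strip_spaces() {
    for old in "$@"; do
        dir=$(dirname -- "$old")
        base=$(basename -- "$old")
        new=$(printf '%s' "$base" | tr -d ' ')
        if [ "$new" = "$base" ]; then
            continue                      # no spaces, nothing to do
        fi
        if [ -e "$dir/$new" ]; then
            echo "$old not renamed: $dir/$new already exists" >&2
        else
            mv -- "$old" "$dir/$new"      # content untouched, name changed
        fi
    done
}
```

For example, `strip_spaces ./file\ name\ with\ spaces.txt` leaves `./filenamewithspaces.txt`, just as the rename example above does.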

I'm fascinated by the way Linux can manipulate text, but filters/metachars are a massive and complex subject. It needs to be studied and practised, ideally on replaceable or unimportant files, to gain experience and a feel for its weirdness. My large MP3 collection is ideal for this because of its complex names, and because I have backups for the inevitable deletions that can occur when you only wanted a name change!

What follows are some ideas on ways to approach the topic.

First, set yourself a problem to solve that will probably be worth re-using later. Start with the whitespace removal examples covered below.

This will always be useful to know on a Linux system. It also gives a good starting point from which to alter commands, building complexity and therefore understanding.

To start, it's a good idea to compare how many files and directories exist with how many your user owns, so you know you can change them. For my MP3 directory:

sudo find -type d | wc -l

559

find -type d -user stevee | wc -l

559

sudo find -type f | wc -l

7248

find -type f -user stevee | wc -l

7248
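Rather than comparing two pairs of counts by eye, find can count the exceptions directly with `! -user`. This is a minimal sketch. count_not_mine is a hypothetical name. It uses the numeric ID from `id -u`, which find's -user test accepts, and run without sudo it may miss directories you can't read:

```shell
#!/bin/sh
# Hypothetical helper: count_not_mine DIR
# Prints how many files and directories under DIR are NOT owned by the
# current user; 0 means you own (and can rename or delete) all of them.
count_not_mine() {
    find "$1" ! -user "$(id -u)" | wc -l
}
```

For the counts above, `count_not_mine .` in the MP3 directory should print 0.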

That is fine. But you need to work on a small sample first, as this is far too many to keep track of. Also get rid of any non-MP3 files, like album art, if they're not needed. This also gets your brain into find-command mode.

There are many .jpg files in here, so they can just go:

find -name '*jpg' | wc -l

422

find -name '*jpg' -exec rm -v {} \;

Some Thumbs.db and desktop.ini files can go too…
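A more cautious version of this clean-up lists first and deletes second. The sketch below assumes GNU find, for -iname and -delete. -iname is case-insensitive, so it catches .JPG as well as .jpg. The `\( … -o … \)` group handles all three kinds of clutter in one pass. list_junk and remove_junk are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helpers for clearing non-MP3 clutter under DIR.

# Dry run: only list what would be removed.
list_junk() {
    find "$1" -type f \( -iname '*.jpg' -o -iname 'Thumbs.db' \
        -o -iname 'desktop.ini' \) -print
}

# The real thing: print each name, then delete it.
remove_junk() {
    find "$1" -type f \( -iname '*.jpg' -o -iname 'Thumbs.db' \
        -o -iname 'desktop.ini' \) -print -delete
}
```

Running `list_junk . | wc -l` before `remove_junk .` gives the same kind of before/after check as the counts above.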

Now the total count of files and directories is:

find -name '*' | wc -l

6992

cd into one directory and work out as many generic regexes as you can for the various file name types in it before applying them globally, if you value your data. Make SURE you have backups of it all!

The whitespace find-and-removal starts with just the find part, for either directories or files. Most of mine had already had their whitespace removed before:

stevee@dellmint /Quadra/MP3 $ find -type d -name "*[[:blank:]]*"

./Steves Greatest Misses

stevee@dellmint /Quadra/MP3 $ find -type f -name "*[[:blank:]]*"

./All_Found_MP3s/Love Philter.mp3

./All_Found_MP3s/Lovely Brasilia.mp3

./All_Found_MP3s/Boo Bass Riff.mp3

./All_Found_MP3s/IVETE SANGALO & BABADO NOVO. Amor Perfeito (Axé) (Ao Vivo) ©GaliaBsAs.mp3….

Run the first command for directories only. From above, you know what happens with rename if a file of the same name without spaces already exists:

find -type d -name "*[[:blank:]]*" -exec rename "s/ //g" {} \;

./Steves Greatest Misses not renamed: ./StevesGreatestMisses already exists

Now amend it for files:

find -type f -name "*[[:blank:]]*" -exec rename "s/ //g" {} \;
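There is one catch with `-exec rename "s/ //g" {}`. The expression is applied to the whole path, so spaces in parent directories are changed too. Also, renaming a directory before find descends into it leaves find looking for a path that no longer exists. Below is a sketch of one way round this, assuming GNU find. -depth visits a directory's contents before the directory itself, and only the last part of each path is changed. It uses mv in place of rename. despace_tree is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: despace_tree DIR
# Removes blanks from every file AND directory name under DIR.
# -depth: contents are handled before their parent directory, and only the
# last path component is renamed, so queued paths stay valid until their turn.
despace_tree() {
    find "$1" -mindepth 1 -depth -name '*[[:blank:]]*' -exec sh -c '
        for old in "$@"; do
            dir=${old%/*}
            new=$(printf "%s" "${old##*/}" | tr -d "[:blank:]")
            if [ -e "$dir/$new" ]; then
                echo "$old not renamed: $dir/$new already exists" >&2
            else
                mv -- "$old" "$dir/$new"
            fi
        done
    ' sh {} +
}
```

As with rename, a name whose despaced twin already exists is left alone and reported on stderr.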

./All_Found_MP3s/Love Philter.mp3 not renamed: ./All_Found_MP3s/LovePhilter.mp3 already exists

./All_Found_MP3s/IVETE SANGALO & BABADO NOVO. Amor Perfeito (Axé) (Ao Vivo) ©GaliaBsAs.mp3 not renamed:….

As these already exist, the spaced originals can be removed by changing the -exec rename to rm. Note that rm -vr recurses into the directory, so it also removes any files inside that DO NOT have white space. Be careful! Unless you're sure you want that, use rm -dv for directories, which only removes a directory that is empty.

find -type d -name "*[[:blank:]]*" -exec rm -vr {} \;

removed ‘./Steves Greatest Misses/LovePhilterDance.mp3’

removed ‘./Steves Greatest Misses/Driving In Rain.mp3’

….

find -type f -name "*[[:blank:]]*" -exec rm -v {} \;

removed ‘./All_Found_MP3s/Love Philter.mp3’

removed ‘./IveteSangaloSonia/IVETE SANGALO & BABADO NOVO. Amor Perfeito (Axé) (Ao Vivo) ©GaliaBsAs.mp3’…..
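Deleting by name alone assumes the despaced twin really is the same song. A more cautious sketch removes a spaced file only when two conditions hold: its twin exists in the same directory, and the twin has byte-identical contents (checked with `cmp -s`). Everything else is reported as kept. remove_spaced_dupes is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: remove_spaced_dupes DIR
# Deletes a file whose name contains blanks ONLY if its despaced twin exists
# in the same directory and has identical contents; otherwise keeps it.
remove_spaced_dupes() {
    find "$1" -type f -name '*[[:blank:]]*' -exec sh -c '
        for old in "$@"; do
            dir=${old%/*}
            twin=$dir/$(printf "%s" "${old##*/}" | tr -d "[:blank:]")
            if [ -f "$twin" ] && cmp -s -- "$old" "$twin"; then
                rm -v -- "$old"
            else
                echo "kept: $old" >&2
            fi
        done
    ' sh {} +
}
```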

Now on to the next step: say, tidying the files that contain characters in () brackets. First, list every path containing a character outside the set -_./0-9a-zA-Z:

find -regex '.*[^-_./0-9a-zA-Z].*'

./MarinaElali/AlbumArt_{560D258D-8C82-455E-9968-4B33457C283A}_Large.jpg

./MarinaElali/04-MarinaElali-OneLastCry(Paginasdavidainternacional).mp3

./SugababesBest/UnknownAlbum(db0da20f)-Track08.mp3….

find -regex '.*[^-_./0-9a-zA-Z].*' | wc -l

2854

Alphanumerics inside () brackets are a bit too ambitious at first, so try numbers inside brackets first. How?

What do I already know? From Shotts' PDF, the metachars are:

Regular expression metacharacters consist of the following:

^ $ . [ ] { } - ? * + ( ) | \

How do you isolate them? The “()” don't appear explicitly in that regex. They show up in the results because the leading ^ negates the bracket expression. So [^-_./0-9a-zA-Z] matches any ONE character that is NOT in the listed set, and ( and ) aren't in it (nor are {, }, spaces and so on).

Experiment by simplifying that regex to just a number contained in “()”, maybe?

find -regex '.*([0-9]).*' | wc -l

79
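The regex flavour matters here. This sketch assumes GNU find. Its -regex must match the whole path, hence the .* at each end. It defaults to emacs-style syntax, in which a bare ( is literal and [0-9] is exactly one digit, so a name like (12) slips through. Switching to `-regextype posix-extended` allows + for "one or more digits", with \( as the literal bracket. The function names are hypothetical:

```shell
#!/bin/sh
# Default (emacs) syntax: bare ( ) are literal, [0-9] is ONE digit.
single_digit_brackets() {
    find "$1" -regex '.*([0-9]).*'
}

# posix-extended syntax: \( \) are the literal brackets and + means
# "one or more", so (12) is matched as well as (2).
any_digits_brackets() {
    find "$1" -regextype posix-extended -regex '.*\([0-9]+\).*'
}
```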

Analyse them by redirecting the list into a file in your home directory and reading it with less. Check that each entry has a number within () with characters either side, and nothing else inside the (). It seems so:

find -regex '.*([0-9]).*' > ~/numberbrackets.txt

./All_Found_MP3s/11Track11(2).mp3

./All_Found_MP3s/36Track36(2).mp3

….

Can just these parts of the titles be removed?

From above, sed's search-and-replace syntax uses forward slashes as delimiters, and so does the format for the rename command. Whitespace removal was achieved in three parts: searching each found file name for ONE space after the “s/”; replacing it with NO character between the next two “/” delimiters; and doing this globally (g), i.e. for EVERY space in the name rather than just the first. It's find that supplies EVERY file.

rename "s/ //g"
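The s/PATTERN/REPLACEMENT/FLAGS shape can be seen in isolation with sed on one sample name. Without g, only the first match on the line is replaced; with g, every match is:

```shell
#!/bin/sh
name='Driving In Rain.mp3'

# No flag: only the FIRST space on the line is removed.
first_only=$(printf '%s\n' "$name" | sed 's/ //')

# g flag: EVERY space on the line (here, in this one name) is removed.
every_one=$(printf '%s\n' "$name" | sed 's/ //g')

echo "$first_only"   # DrivingIn Rain.mp3
echo "$every_one"    # DrivingInRain.mp3
```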

If the same logic is used to replace the numbers using a range [0-9], will it leave the empty () behind or not? Let's check on one file only first. Note the testing problem here: there's no DRY-RUN option that I know of or have researched for either find or -exec..?
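For reference, the Perl rename that prints the "not renamed: … already exists" messages seen earlier does have a -n (no action) switch. It prints what it would rename without doing it. find itself can be dry-run by replacing -exec with -print, or by putting echo in front of the command. Another option is a preview that computes each new name with sed and touches nothing. Below is a sketch, with preview_rename a hypothetical name. It uses sed -E, so a rename-style \( still means a literal bracket, although Perl and POSIX regexes differ in plenty of other ways:

```shell
#!/bin/sh
# Hypothetical helper: preview_rename SED_EXPR PATH...
# Dry run only: prints "old -> new" for every path the expression would
# change, applying sed -E to the whole path as rename does.
# Nothing on disk is renamed.
preview_rename() {
    expr=$1
    shift
    for old in "$@"; do
        new=$(printf '%s\n' "$old" | sed -E "$expr")
        if [ "$new" != "$old" ]; then
            printf '%s -> %s\n' "$old" "$new"
        fi
    done
}
```

For example: `preview_rename 's/\([0-9]\)//g' ./All_Found_MP3s/*`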

Let's try it on the last file in the list, and gain some insight using ls. This shows that TAB completion stops at the ( after Track43, which needs to be escaped before it will TAB complete:

ls All_Found_MP3s/43Track43\(

43Track43(2).mp3 43Track43.mp3

This is a clue for the rename removal string, e.g. rename "s/\([0-9]\)//g"

As the “(” is a metachar, it needs to be escaped with a “\” to be read literally.

This also highlights that, during earlier copy operations (e.g. a Windows merge), a file with the same name already existed, which is one reason for the current (2) version.

If so, I can't rename the file in this test, as a file with the target name already exists.

find -name '*43Track43*'

./All_Found_MP3s/43Track43.mp3

./All_Found_MP3s/43Track43(2).mp3

It finds both tracks, with and without the (). So now add the rename regex part to show it works as required, although rename can't overwrite the existing file:

find -name '*43Track43*' -exec rename "s/\([0-9]\)//g" {} \;

./All_Found_MP3s/43Track43(2).mp3 not renamed: ./All_Found_MP3s/43Track43.mp3 already exists

Wow! It seems it would have worked exactly as required, removing the () too, if not for the already existing file! Progress! All I can do here is replace (2) with something “nicer” that tells me I made the change, such as a £ for clarity. First, make sure there are no files with a £ in their names already:

find -name '*£*' | wc -l

0

find -name '*43Track43*' -exec rename "s/\([0-9]\)/£/g" {} \;

Great! Now I get a renamed track with the original unchanged:

find -name '*43Track43*'

./All_Found_MP3s/43Track43.mp3

./All_Found_MP3s/43Track43£.mp3

./All_Found_MP3s/43Track43(2).mp3

As with the whitespace files above, once a renamed copy exists, the original can be deleted now, or later using this same regex now that I know it works. It depends whether you want to prove this example works en masse later. If not, remove it now.

rm -v All_Found_MP3s/43Track43\(2\).mp3

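The output can consist largely of StdError, so it's difficult to see what has changed or not. Piping into wc -l shows why: the "not renamed" messages go to StdError, which bypasses the pipe and still lands on the screen. rename prints nothing on StdOut unless given -v, so wc -l counts 0: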

find -regex '.*([0-9]).*' -exec rename "s/\([0-9]\)//g" {} \; | wc -l

./SteveStevens/SS-Track02(8).mp3 not renamed: ./SteveStevens/SS-Track02.mp3 already exists

./SteveStevens/SS-Track02(4).mp3 not renamed: ./SteveStevens/SS-Track02.mp3 already exists…

0

Sending StdError to /dev/null proves it: nothing at all appears on screen, so there is no StdOut:

find -regex '.*([0-9]).*' -exec rename "s/\([0-9]\)//g" {} \; 2> /dev/null
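The StdOut/StdError split can be seen with a hypothetical stand-in for rename that writes to both streams. Its messages loosely mimic Perl rename's, which prints "renamed as" lines only with -v. The order of the redirections matters: `2>&1 >/dev/null` sends StdError into the pipe and throws StdOut away, so only the complaints are counted:

```shell
#!/bin/sh
# Hypothetical stand-in for rename: one success line on stdout,
# two "not renamed" complaints on stderr.
fake_rename() {
    echo "./a(2).mp3 renamed as ./a.mp3"
    echo "./b(2).mp3 not renamed: ./b.mp3 already exists" >&2
    echo "./c(2).mp3 not renamed: ./c.mp3 already exists" >&2
}

# A pipe only carries stdout, so wc -l sees the one success line
# (stderr is discarded here rather than cluttering the screen).
fake_rename 2>/dev/null | wc -l        # prints 1

# stderr is pointed at the pipe first, then stdout goes to /dev/null,
# so only the "not renamed" lines are counted.
fake_rename 2>&1 >/dev/null | wc -l    # prints 2
```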

If you're confident of your results, do the changes globally, then the deletes globally. If all has gone well, the number of files changed will equal the number deleted, and your total file count will be the same.

If you really persist, you may eventually create a regex that renames many of your awful file names as you like, into a generic, tidier format (within the bounds of sanity, at least). If so, save it and publish it to the web for all to use! There is a script example for MP3s in Server Hacks 1, but I'm not typing it here..!

So what would be next in the horrendous-looking category? Probably those names with long strings inside {}. All of these are actually jpgs, but for the exercise:

find -name '*{*'

./AlbumArt_{866F7EC7-163A-4791-BB41-8081A3C9DF2B£_Small.jpg

./AlbumArt_{866F7EC7-163A-4791-BB41-8081A3C9DF2B£_Large.jpg
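For the {} names, one approach is a sed -E expression that matches from the opening brace up to the first closing brace with [^}]*. Below is a sketch on one name from the earlier listing, not applied to any files. The leading _ is included in the match so the result doesn't end up with a doubled underscore:

```shell
#!/bin/sh
name='AlbumArt_{560D258D-8C82-455E-9968-4B33457C283A}_Large.jpg'

# In sed -E a bare { starts an interval such as {2}, so \{ and \} are the
# literal braces; [^}]* means "anything up to the closing brace".
tidy=$(printf '%s\n' "$name" | sed -E 's/_\{[^}]*\}//')

echo "$tidy"   # AlbumArt_Large.jpg
```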

You can fight with that one... but as an aside, here's a great little music character, ♫, that popped up during this:

Try your backup drive with a find for that! Unless you've been to Brasil and know Ivete Sangalo...

find -name '*♫*'

./IveteSangalo-MTVAoVivo/PERERÊlyrics(IveteSangalo)♫_files
./IveteSangalo-MTVAoVivo/PERERÊlyrics(IveteSangalo)♫_files/PERERÊlyrics(IveteSangalo)♫.htm
