I was asked to help remove tabs from source code in a CVS repository. Simple, I said. Well, it was, with a couple of caveats.
WARNING: those with a weak bladder or IBS, start with single sample RCS files, then move up to working on a copy of your CVS repository before you try this at home!
With a little write, test and review I soon learnt:
- they only wanted to address their C code (.h and .c), and for that regexes are so handy.
- obviously not their makefiles (those tabs are kind of important)! It’s always worth rebuilding and testing the code once done.
- we’d better not touch any RCS files marked as binary either.
- we’ll not touch the cvsroot/CVSROOT admin files; there are no sources there anyway (nor should we mess around).
- the script has to run as root: it refuses to start otherwise, and it resets ownership and permissions on every file it rewrites.
- be careful with content that is RCS structure; we are touching source content only.
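The file-selection rules in that list can be sketched as one small predicate. This is only an illustration: `wanted_rcs_file` is a made-up name, and the pattern is anchored to a trailing `,v`, which is a little stricter than the script’s own test. The binary check is separate (the script asks rlog for that).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script below): decide from the
# path alone whether an RCS file is a candidate for detabbing.
sub wanted_rcs_file {
    my ($path) = @_;
    return 0 if $path =~ m{/cvsroot/CVSROOT(?:/|$)};  # admin area: hands off
    return 0 unless $path =~ m{\.[ch],v$};            # only foo.c,v and foo.h,v
    return 1;
}

# Made-up sample paths, just to show the decisions.
for my $path (qw(
    /cvsroot/proj/src/main.c,v
    /cvsroot/proj/include/util.h,v
    /cvsroot/proj/Makefile,v
    /cvsroot/CVSROOT/loginfo,v
    /cvsroot/proj/logo.png,v
)) {
    printf "%-36s %s\n", $path, wanted_rcs_file($path) ? 'process' : 'skip';
}
```

Only the first two sample paths are reported as `process`; the makefile, the admin file and the image are skipped.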
Here’s the code to modify a single RCS file or a complete repository. Actually I also realised that some people are slow to check in, so I wrote another version to handle checked-out files in home directories too. It’s basically the same but with no RCS tree structure to worry about. Again, be careful: don’t trash a group’s checked-out work, or you’ll not be popular!
#!/usr/bin/perl
# NOTES:
# give it a directory or a single file
# won't touch anything with dir pattern 'cvsroot/CVSROOT'
# only .c & .h (won't touch make files (we need those tabs))
# try running with >& to capture the log, then tail -f on log file
# or |& tee /usr/tmp/mylog.out
sub basename {
    my( $path, $ext ) = @_;
    my( $name );
    ( $name = $path ) =~ s#.*/##;    # strip the directory part
    if ( ! $ext ) {
        $name =~ s#\..*$##;          # strip the extension unless asked to keep it
    }
    $name;
}
sub useTmpFile {
    my ($newFile, $File ) = @_;
    my $gid = 8096; #cvsadmin
    my $uid = 0; #root
    my $mode = 0444; # -r--r--r--
    # move tmp file to live file
    # (list form of system: no shell, so odd characters in file names are safe)
    system ("mv", $newFile, $File);
    if ( $? != 0 ) {
        warn "Couldn't mv $newFile $File";
    }
    my $chmodcnt = chmod $mode, $File; # -r--r--r--
    if ( $chmodcnt != 1 ) {
        warn "Couldn't chmod mode=$mode file=$File";
    }
    #MAKE FILE owned ROOT.CVSADMIN
    my $chowncnt = chown $uid, $gid, $File; # `chown root.cvsadmin $f`;
    if ( $chowncnt != 1 ) {
        warn "Couldn't chown uid=$uid gid=$gid file=$File";
    }
    return;
}
sub convTabToSpaces {
    my ($tabLine) = @_;
    my $tab = "\t";
    my ($replace, $idx, $spaces, $mod);
    # index to 1st tab
    # index STR,SUBSTR,POSITION
    $idx = index ($tabLine, $tab , 0); #start pos of 1st tab
    while ( $idx != -1 ) {
        # pad out to the next 8-column tab stop
        $mod = $idx%8;
        if ( $mod != 0 ) {
            $spaces = ((int($idx/8)+1) * 8 ) - $idx;
        } else {
            $spaces = 8;
        }
        if ( $spaces == 1 ) {$replace = ' ';}
        if ( $spaces == 2 ) {$replace = '  ';}
        if ( $spaces == 3 ) {$replace = '   ';}
        if ( $spaces == 4 ) {$replace = '    ';}
        if ( $spaces == 5 ) {$replace = '     ';}
        if ( $spaces == 6 ) {$replace = '      ';}
        if ( $spaces == 7 ) {$replace = '       ';}
        if ( $spaces == 8 ) {$replace = '        ';}
        $tabLine =~ s/\t/$replace/; # replace the first tab
        $idx = index ($tabLine, $tab , $idx);
    } # while
    $tabLine;
}
sub detab {
    my ($file) = @_;
    open (IN, "<$file") || die "Can't open $file\n";
    $out = &basename($file, 1);
    open (OUT, ">/usr/tmp/$out") || die "Can't open /usr/tmp/$out\n";
    $datasect = 0;
    $changedFile = 0; # file modified flag
    while ( <IN> ) {
        # handle text\n@ or text\n@@ -> we're modifying text not rcs structure
        if ( $_ =~ /^text$/ ) {
            print ( OUT "$_" );
            $tmp = <IN>;
            #print ( OUT $tmp );
            if ( $tmp =~ /^\@\@$/ ) {
                # empty text block, nothing to detab
                $datasect = 0;
                print ( OUT $tmp );
                next;
            } else {
                $datasect = 1;
                $_ = $tmp;
            }
        } else {
            # a lone @ or @@ closes the current text block
            if ( ( $_ =~ /^\@\@$/ || $_ =~ /^\@$/ ) && $datasect == 1) {
                print ( OUT "$_" );
                $datasect = 0;
                next;
            }
        }
        if ( $datasect == 1 ) {
            $replace = '        '; # 8 spaces
            $tmpLine = $_;
            # scratch substitution, only used to see whether the line has tabs
            ($cnt) = $tmpLine =~ s/\t/$replace/g;
            if ( $cnt ne "" ) {
                $changedFile = 1;
                $detabLine = convTabToSpaces($_);
                print ( OUT $detabLine );
            } else {
                print ( OUT $_ );
            }
        } else {
            # not datasect so put out untouched
            print ( OUT "$_" );
        }
    } #input
    close(IN);
    close(OUT);
    if ( $changedFile == 1 ) {
        $main::fileCnt++;
        useTmpFile ("/usr/tmp/$out", $file);
        print "Replaced tabs in file $file\n";
        # debug start
        #system("rm /usr/tmp/$out");
        #if ( $? != 0 ) {
        #    warn "Couldn't rm /usr/tmp/$out";
        #}
        # debug end
    } else {
        #cleanup
        system("rm /usr/tmp/$out");
        if ( $? != 0 ) {
            warn "Couldn't rm /usr/tmp/$out";
        }
    }
    return;
}
# MAIN
######
# globals
$fileCnt = 0;
#AS ROOT ONLY
$me = getpwuid($<);
print "User: $me\n";
chop($host = `hostname`);
print "Host: $host\n";
if ($me ne "root" ) {
    die "must be root\n";
}
# TO RUN THRU e.g CVSROOT
if (@ARGV < 1){
    die "Gimme a CVSROOT_DIR or rcsFile\n";
} else {
    $dir = $ARGV[0];
    print "your dir or file = $dir\n";
}
# GET ALL FILES
# no directories
chop (@files = `find $dir -name \\* -type f -print `);
$totFileCnt = $#files+1;
print "Starting with $totFileCnt files\n";
$cmddir = "/usr/bin";
foreach $file ( @files ) {
    # don't touch the CVSROOT directory
    next if ( $file =~ /.*\/cvsroot\/CVSROOT.*/ );
    # only C stuff
    next unless $file =~ /.*\.c,.*/ || $file =~ /.*\.h,.*/;
    $filetype = 'binary'; # default - don't touch
    # drop if a binary file
    open (RLOG, "$cmddir/rlog $file |") || die "Can't rlog $file\n";
    while (<RLOG>) {
        # last if ($_ =~ /^=.*=$/); #not needed
        if ($_ =~ /^keyword substitution: b.*$/ ) {
            #print "is a binary file\n";
            $filetype = 'binary';
            last;
        } else {
            if ($_ =~ /^keyword substitution: kv$/) {
                #print "is a normal file\n";
                $filetype = 'normal';
                last;
            }
        }
    }
    close(RLOG);
    if ($filetype eq 'normal') {
        detab($file);
    }
}
print "tab_fix.perl modified $fileCnt files\n";
exit;
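The tab-stop arithmetic in convTabToSpaces boils down to “pad to the next multiple of 8”. A minimal standalone sketch, assuming 8-column tab stops as in the script; `expand_tabs` and its optional `$width` argument are names invented for this example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace each tab with enough spaces to reach the next tab stop.
# $width defaults to 8, matching the script above.
sub expand_tabs {
    my ($line, $width) = @_;
    $width ||= 8;
    my $idx;
    while ( ($idx = index($line, "\t")) != -1 ) {
        my $spaces = $width - ($idx % $width);    # always 1..$width
        substr($line, $idx, 1, ' ' x $spaces);    # swap that one tab for spaces
    }
    return $line;
}

# Show the expanded text with a ruler so the columns can be checked by eye.
print "0123456789012345678\n";
print expand_tabs($_), "\n" for "a\tb", "\tx", "ab\tcd\te";
```

Because the replacement is computed from the tab’s position in the already-expanded line, a tab after one character becomes seven spaces and a tab at column 8 becomes a full eight. One thing worth checking on real `,v` files: RCS wraps each text block in `@` delimiters and doubles a literal `@` as `@@`, so a column counted on the raw RCS line can differ from the column in the checked-out source for any tab that follows an `@` on the same line.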
5 responses so far
Alexandr Ciornii // July 18, 2009 at 6:53 pm |
Better:
$replace = ' ' x $spaces;
Alex // July 18, 2009 at 11:36 pm |
Thanks Alexandr
A little check using that pointer revealed this educational link
http://www.perlmonks.org/?node_id=203941
Steve Scaffidi // July 19, 2009 at 3:02 am |
While I love to see people proud to use Perl, I would strongly recommend that you take a look at the GNU indent utility. If you are running any sort of *nix it should be readily available through your usual package sources. If not, I’m confident it will compile and install from source without issue. If you’re using Windows, it’s already included in Cygwin.
Basically, if your code is all written in C, GNU Indent should readily solve all your code-formatting problems.
Here’s its website: http://www.gnu.org/software/indent/
It is not that unusual for a company to force all files to be re-formatted using the indent program as a CVS/SVN pre-commit hook. In fact, I highly recommend that practice.
Next, get to know the ‘find’ command. I see you’re calling it from your perl code, but I can see from what you’re doing with the results that GNU find can do everything you need *without* the extra processing.
Your final program will be reduced to something like this:
# find -exec indent \;
This is a much faster, more maintainable, and more flexible solution than the program above! This does not in any way belittle your efforts, nor does it call your task trivial... part of becoming a master of these things is learning to do *much* more with less typing, but more thinking.
Steve Scaffidi // July 19, 2009 at 3:08 am |
Whoops… in that example find command there was some stuff I put in angle-brackets! Let’s try that again!
# find <code dirs> <find opts> -exec indent <indent opts> \;
Alex // July 20, 2009 at 12:19 am |
Hi Steve
Maybe you did not spot that this was an “in repository” cleanup, inside CVS?
So in processing “,v” (i.e. RCS) files, the tabs found can be unrelated to the actual source code: there is the initial node / tree structure before the section with the source!
I was also reluctant to assume that a file extension guaranteed matching content, and so I checked for binary tags and such.
Significantly, by modifying the RCS files directly, I avoided setting off the chain of events that a checkout / checkin of many thousands of source files would cause.
Oh, it was a one off I wrote for a software house client (with complicated CVS triggers, tagging and release mechanisms)!
And so the “in situ” fix.
The code is a one off and a little messy (just a stream of code until it worked, with on-going client input / changes), plus my brain is rewired by Groovy and Python.
Actually I agree with your find comments (my code started out doing something different) and suggestion about indent, and maybe they will read this discussion!
I still used the same routine (minus RCS handling) on what the developers had in their sandboxes (work areas with source checked out) because it guaranteed that the changes outside of the repository would still match exactly what was now inside that CVS repository (hopefully all was correct, but if an error crept in, at least it would match!).
Does that make the exercise a little more clear? I thought that others just might get such a request, and there was no point in my not sharing.
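For anyone after the sandbox variant described above (same routine, minus the RCS handling), a minimal sketch might look like the following. It is not the version that was actually run: `detab_plain_file` is a made-up name, 8-column tab stops are assumed, and the temp file is written next to the original (via File::Temp) rather than in /usr/tmp.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: detab one checked-out source file in place.
# Returns 1 if the file was rewritten, 0 if it had no tabs.
sub detab_plain_file {
    my ($file) = @_;
    open my $in, '<', $file or die "Can't open $file: $!\n";
    # Temp file in the same directory, so rename() stays on one filesystem.
    my ($out, $tmp) = tempfile('detabXXXXXX', DIR => dirname($file));
    my $changed = 0;
    while ( my $line = <$in> ) {
        my $idx;
        while ( ($idx = index($line, "\t")) != -1 ) {
            substr($line, $idx, 1, ' ' x (8 - $idx % 8));  # pad to next tab stop
            $changed = 1;
        }
        print {$out} $line;
    }
    close $in;
    close $out or die "Can't write $tmp: $!\n";
    if ($changed) {
        my $mode = (stat $file)[2] & 07777;    # keep the original permissions
        chmod $mode, $tmp or warn "Couldn't chmod $tmp: $!\n";
        rename $tmp, $file or die "Couldn't rename $tmp to $file: $!\n";
    } else {
        unlink $tmp;                           # nothing to do, drop the temp file
    }
    return $changed;
}

# Usage: perl detab_plain.pl src/*.c include/*.h
for my $file (@ARGV) {
    print "Replaced tabs in file $file\n" if detab_plain_file($file);
}
```

Files without tabs are left alone, so their timestamps do not change and make will not rebuild them needlessly.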