Thursday 16 February 2012

HOWTO: Git - reauthor/fix author and committer email and author name after a git cvsimport

You might find yourself at some moment when your git repository imported from CVS does not contain all the correct names and email addresses of the commits which were once in CVS but are now part of your project history in your git repo. Or you might do a cvsimport which missed a few authors.

Let's suppose you first import the cvs repo into git, but then you realise you missed some authors.

Before being able to do a git cvsimport, you need a checkout of the module or cvs subdir that you want to turn into its own git repo.

For ease of use I defined CVSCMD as
cvs -z9 -d :pserver:my_cvs_id@cvs.server.com:/root_dir
You will need to replace the items written in italics according to you situation, more exactly, you need to define 'my_cvs_id', 'cvs.server.com' and 'root_dir'. If your acces method to the server is not pserver, you should change that accordingly. This information should be available from your project admin or pages.


Check out the desired module or even subdir of a module

$CVSCMD checkout -d localdirname MODULE/path/to/subdir

git cvsimport -A ../authors -m -z 600 -C ../new-git-repo -R

How to find out the commits which do need rewriting

The way to limit yourself only to the commits that had no cvs-git author and commit information on git-cvsimport time is to use a filter like this:
git log -E --author='^[^@]*$' --pretty=format:%h
This tells git log to print only the abbreviated hashes (%h) for the commits that have NO '@' sign in the 'Author:', which happens if no cvs user id to git author and email was provided in the authors file and git cvsimport time.

We will use this command's output to tell later git filter-branch which commits need rewriting. *

But before that...

How do we find if our authors file is complete?

For this task we'll use a slighly modified form of the previous command and some shell script magic.
git log -E --author='^[^@]*$' --pretty=format:%an | sort -u > all-leftout-cvs-authors
And now in all-leftout-cvs-authors we'll have a sorted list of all cvs id's which were not handled in the original git-cvsimport. In my case there are only 19 such ids:
$ wc -l all-leftout-cvs-authors
19 all-leftout-cvs-authors

Nice, that will be easy to fix. Now edit your all-leftout-cvs-authors file to add the relevant information in a format similar to this:
john = John van Code <john@code.temple.tld>
jimmy = Jimmy O'Document <jimmy@documenting.com>
In case you can't make a complete cvs-user-to-name-and-email map, you might want to use stubs of the following form in order to be able to easily identify later such commits, if you prefer (or you could let them unaltered at al ;-):
cvsid=cvsid <cvsid@cvs.server.com>

How to actually do the filtering to fix history (using git-filter-branch and a script)

After this is done, we'll need just one more piece, the command to do the altering itself which reads as follow (note that my final authors file is called new-authors and that I placed this in a script in order to be able to easily run it without trying to escape all spaces and such madness):

[ "$authors_file" ] || export authors_file=$HOME/new-authors

#git filter-branch -f --remap-cvs --env-filter '
git filter-branch -f --env-filter '

get_name () {
grep "^$1=" "$authors_file" | sed "s/^.*=\(.*\)\ .*$/\1/"
}

get_email () {
grep "^$1=" "$authors_file" | sed "s/^.*\ <\(.*\)>$/\1/"
}

if grep -q "^$GIT_COMMITTER_NAME" "$authors_file" ; then
GIT_AUTHOR_NAME=$(get_name "$GIT_COMMITTER_NAME") &&
GIT_AUTHOR_EMAIL=$(get_email "$GIT_COMMITTER_NAME") &&
GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" &&
GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL" &&
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL &&
export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
fi
' -- --all
You might wonder what's up with the commented git filter-branch line with the --remap-cvs option. This script will NOT work for you as long as you have the stock git-filter-branch script and keep the option --remap-cvs while not patching your git-filter-branch script (/usr/lib/git-core/git-filter-branch), but that option will provide a file with the mappings from the old to the new commit ids. If you want that function, too, you'll want to apply this patch to git-filter-branch:

diff --git a/git-filter-branch b/git-filter-branch
old mode 100644
new mode 100755
index ae602e3..d1f7ef6
--- a/git-filter-branch
+++ b/git-filter-branch
@@ -149,6 +149,11 @@ do
prune_empty=t
continue
;;
+ --remap-cvs)
+ shift
+ remap_cvs=t
+ continue
+ ;;
-*)
;;
*)
@@ -368,6 +373,33 @@ while read commit parents; do
die "could not write rewritten commit"
done <../revs

+# Rewrite the cvs-revisions file, if requested and the file exists
+
+ORIG_CVS_REVS_FILE="${GIT_DIR}/cvs-revisions"
+if [ -f "$ORIG_CVS_REVS_FILE" ]; then
+ if [ "$remap_cvs" ]; then
+ printf "CVS remapping requested\n"
+
+ CVS_REVS_FILE="$tempdir/cvs-revisions"
+ cp "$ORIG_CVS_REVS_FILE" "$CVS_REVS_FILE"
+ printf "\nFound $ORIG_CVS_REVS_FILE; will copy and alter it as $CVS_REVS_FILE\n"
+ cvs_remap__commit_count=0
+ newcommits="$(ls ../map/ | wc -l)"
+ for commit in ../map/* ; do
+ cvs_remap__commit_count=$(($cvs_remap__commit_count+1))
+ printf "\rRemap CVS commit $commit ($cvs_remap__commit_count/$newcommits)"
+
+ oldsha1="$(basename $commit)"
+ read newsha1 < $commit
+ sed -i "s@$oldsha1\$@$newsha1@" "$CVS_REVS_FILE"
+ done
+ else
+ warn "\nNo CVS remapping requested, but cvs-revisions file found. All CVS mappings will be lost.\n"
+ fi
+elif [ "$remap_cvs" ]; then
+ warn "\nWARNING: CVS remap was ignored, since no original cvs-revisions file was found\n"
+fi
+
# If we are filtering for paths, as in the case of a subdirectory
# filter, it is possible that a specified head is not in the set of
# rewritten commits, because it was pruned by the revision walker.
@@ -491,6 +523,11 @@ if [ "$filter_tag_name" ]; then
done
fi

+if [ "$remap_cvs" -a -f "$CVS_REVS_FILE" ]; then
+ mv "$ORIG_CVS_REVS_FILE" "$ORIG_CVS_REVS_FILE.original"
+ cp "$CVS_REVS_FILE" "$ORIG_CVS_REVS_FILE"
+fi
+
cd ../..
rm -rf "$tempdir"


Then, after running this script, let's call it filter, you should have a brand new git repo with the appropriate authors and their emails set.


P.S.: I have started writing this post some time ago but stopped just before the last part, the one with the filter script. I realise I might be missing something in the explanation, but if you have problems, please comment so I can help you fixing them.

P.P.S.: * I realised in the filter script at some point I wanted to do something like:
for R in $(git log -E --author='^[^@]*$' --pretty=format:%H | head -n 2) ; do
[the same git filter branch command above but ending in ...]
' $R
done
But I think I remember that $R didn't work on the whole history, but only on that revision, or some other weird of that sort. I know I ended up not filtering explicitly those revisions, but the entire history. I hope this helps.

Wednesday 1 February 2012

HOWTO: Windows, nmake, cygwin and path type detection

If you are using Windows, have cygwin installed and need to test if a path is absolute or relative in nmake (I know, how often does that happen?), here is the magic bit of code that manages to do just that:


cygwin_path = c:\cygwin\bin

echo = $(cygwin_path)\echo.exe
grep = $(cygwin_path)\grep.exe

#testpath = ..\..\rel\test\path
#testpath = rel\test\path
#testpath = \abs\test\path
testpath = c:\abs\test\path

!if ([ $(echo) '$(testpath)' | $(grep) -q -E '^^(\w:)?\\\\' ] == 0)
type = abs
!else
type = rel
!endif

test:
@$(echo) "testpath = $(testpath)"
@$(echo) "type = $(type)"


Probably there are other solutions, but this is the first I came up with. Another solution would be to use gnu make :-) .