Archive for the ‘mercurial’ tag
Don’t ever break MY trunk!
Nico’s last blog post touches on a subject that has been on my mind for some time now. I must first say that I am not writing this strictly in reaction to Nico’s post, and that I have not checked with him whether he agrees with the points I’m about to make. During my time at Kryptiva, it was pretty common to see what I will call WiP (work-in-progress) commits pushed to our team’s Mercurial repositories. The reasons usually given for pushing broken or incomplete changesets are the ones cited by Nico: people need to back up big changes they are making, or want to finish those changes from another computer.
It would be unacceptable to commit WiP changes to a centralized source control system like Subversion or CVS, because the repository can be checked out by other users at any point in time. Those users tend to expect a working tree, even though checking out from a public repository always carries some risk that whatever you get will not work. At the very least, the expectation is that the checked-out copy will compile.
In a distributed version control system (DVCS) like Git, everybody commits to their own copy of a repository. Changes get pushed across repositories in discrete bundles. Unless the programmer was careless, what ends up in the master repository is usually correct. So, even if programmers have committed broken changes at some point in the repository history, people who clone the repository will usually get a sound copy.
Committing broken code will rarely, if ever, hurt if all you work on are personal and/or small-scale, short-term projects. If you are a single programmer tracking changes to a project with Git and want to break your trunk every so often, then go on, be my guest. You are the only person who will suffer from your broken history. If you work in a group with several distributed repositories, then read the rest of this post to understand why committing a broken trunk is a bad thing.
History
The history of a code repository documents all the changes that were ever made to a project during its lifetime. As such, it’s the only external documentation that programmers will continuously maintain. This is not obvious when working on projects that have a few tens or maybe hundreds of commits. As long as the whole project fits in your head, you are unlikely to need its change history. That need arises when the project stretches over a long period and has thousands of commits. The change history is also very useful when a project changes hands.
WiP commits come into this picture because they usually carry a commit message that is not very explicit: “work in progress”, “to be continued”, “I’m not done”, “Finishing tomorrow”, etc. Such messages are useless when you need to inspect the project history or a blame/annotate log.
In effect, the WiP changesets are separated from the description of the change, which usually comes with the last commit made on the feature. Tracing back the reason for a change never becomes impossible, but it gets progressively more difficult as the project and the repository age.
Bisection
Bisection is a debugging technique found mostly in DVCS tools. It finds regressions by testing past commits of the repository history in a binary search pattern. At each step of the bisection procedure, the DVCS updates the working copy to the state of a past changeset. The bisection procedure then leaves the programmer to test the result, either by running automated tests or by reproducing the problem manually.
This graph represents a set of commits in a repository. The solid lines connect changesets in the project history. The dashed line represents the changesets visited by the bisection procedure. In this picture, bisection starts from F, known to be broken, and A, known to be good. The changesets consulted are, in order, D, B, then C, which is found to be the changeset that introduced the bug.
This graph illustrates what happens when a few WiP commits are introduced in the tree. WiP commits mean that the project cannot be compiled at every point in its history, which can make it impossible to find a regression using bisection.
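To make the procedure concrete, here is a minimal Python sketch of bisection over a linear history. It is not how hg bisect or git bisect are implemented (they walk the full changeset graph); it assumes the first changeset is known good, the last is known bad, and that once the bug appears it stays. `find_first_bad` and the `is_good` callback are hypothetical names, the callback standing in for “update to this changeset and run the tests”.

```python
def find_first_bad(revisions, is_good):
    """Return (first bad revision, revisions tested) for a linear history.

    Assumes revisions[0] is good, revisions[-1] is bad, and the history
    looks like good ... good bad ... bad.
    """
    good, bad = 0, len(revisions) - 1
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2          # halve the remaining range
        tested.append(revisions[mid])
        if is_good(revisions[mid]):
            good = mid                   # the bug came later
        else:
            bad = mid                    # the bug is here or earlier
    return revisions[bad], tested


history = ["A", "B", "C", "D", "E", "F"]
bug_introduced_in = "C"


def is_good(rev):
    return history.index(rev) < history.index(bug_introduced_in)


culprit, tested = find_first_bad(history, is_good)
print(culprit, tested)
```

The changesets visited depend on the shape of the history and on how the midpoint is chosen, so they need not match the sequence in the graph; on this straight line the search tests C and B and still converges on C.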
This is the most serious problem that can happen if you commit broken code to a repository used by a team. It can seriously hamper debugging in big shared repositories.
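Both Mercurial (hg bisect --skip) and Git (git bisect skip) let you mark a changeset as untestable, but skipping only helps so much. The sketch below, again assuming a linear history and using hypothetical names, extends the binary search with a “skip” outcome. When the WiP changesets sit right where the bug was introduced, the best it can return is a list of candidates instead of a single culprit.

```python
def find_candidates(revisions, test):
    """Bisect a linear history where test(rev) returns "good", "bad" or "skip".

    Assumes revisions[0] is good and revisions[-1] is bad. Returns the
    changesets that may have introduced the bug: one entry when bisection
    succeeds, several when skipped changesets hide the answer.
    """
    good, bad = 0, len(revisions) - 1
    skipped = set()
    while True:
        # Indices strictly between the bounds that are still worth testing.
        untested = [i for i in range(good + 1, bad) if i not in skipped]
        if not untested:
            # The first bad changeset lies after the last good one,
            # up to and including the first known bad one.
            return revisions[good + 1:bad + 1]
        mid = untested[len(untested) // 2]
        result = test(revisions[mid])
        if result == "skip":
            skipped.add(mid)
        elif result == "good":
            good = mid
        else:
            bad = mid


history = ["A", "B", "C", "D", "E", "F"]
wip = {"C", "D"}      # hypothetical changesets that do not compile
first_bad = "D"       # where the regression really comes from


def run_tests(rev):
    if rev in wip:
        return "skip"
    return "bad" if history.index(rev) >= history.index(first_bad) else "good"


candidates = find_candidates(history, run_tests)
print(candidates)
```

With C and D unbuildable and the bug introduced in D, the search can only report that the first bad changeset is one of C, D or E, and someone has to inspect those changes by hand.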
To be continued…
If you are not impressed by the two reasons I have explained here, then you need to read my next post. I think the best reason not to commit broken code is that a DVCS offers you all the tools you need to make proper commits. I’ll explain how with Git and Mercurial in my next post on this subject.
Compressing a year of timekeeping in 2 hours
I’m very bad at keeping track of the time I spend working. It requires manual input, and something to remind me to do that input. The latter part is where I usually fail and lose interest. As a result, last week I had to enter a year’s worth of timekeeping data, in a few hours, into a web application built for that purpose.
This is not as big a problem as it might seem to some people. At work, we use timekeeping to track how much time is spent on specific projects, not to keep a precise account of who is working at any given time.
The only place where that data was recorded was our revision control system, Mercurial. It keeps a detailed log of what was committed to each repository and, if the commit message was good, an explanation of why. Scanning each repository (all 72 of them) with the default log output would have been impractical.
Luckily, Mercurial has a lesser-known feature that presents log data in a terser way than the default: the --template switch, which is well explained in the Mercurial manual.
The command I’m using in the script below is something like this:
hg log --template "{date|shortdate} {author|email} {rev}\n"
Here is an excerpt of the output of this command.
...
2009-09-09 fdgonthier@kryptiva.com 1934
2009-09-21 fdgonthier@kryptiva.com 1935
2009-09-21 fdgonthier@kryptiva.com 1936
...
This shows some commits I made in a specific project during September 2009. It was then trivial to extract that data from all the repositories to see what I was working on and when. The following script loops over all my repositories and extracts from each log the dates in 2009 on which I committed something. Note that I have added another field to the template: the name of the directory containing the Mercurial repository. It is used to tell projects apart once the data has been collected.
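#!/bin/sh
# Collect my 2009 commits from every Mercurial repository in the current directory.
mkdir -p ~/churn
for i in */; do
    i=${i%/}    # strip the trailing slash left by the glob
    if [ -e "$i/.hg" ]; then
        echo "Churning $i"
        # One line per changeset: date, repository name, author e-mail, revision.
        (cd "$i" && \
            hg log \
            --template "{date|shortdate} $i {author|email} {rev}\n" | \
            grep -E "^2009.*(fdgonthier)") > ~/churn/"$i"
    fi
done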
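The filtering step of the script, keeping one year’s changesets by one author and tagging each line with its repository name, can also be sketched in Python, which makes it easy to check against sample output. This is only an illustration: `churn_lines` is a hypothetical helper, the sample input is made up, and it expects text in the format printed by the hg log --template command shown earlier (running hg itself, e.g. through subprocess, is left out).

```python
import re


def churn_lines(repo_name, log_output, year="2009", author="fdgonthier"):
    """Return the lines of log_output for `year` and `author`, tagged with repo_name.

    Each input line is expected to look like "DATE AUTHOR REV". This mirrors
    grep -E "^2009.*(fdgonthier)" plus the extra $i field in the template.
    """
    pattern = re.compile("^" + re.escape(year) + ".*" + re.escape(author))
    kept = []
    for line in log_output.splitlines():
        if pattern.search(line):
            date, rest = line.split(" ", 1)
            kept.append(" ".join([date, repo_name, rest]))
    return kept


log_output = (
    "2009-09-21 fdgonthier@kryptiva.com 1935\n"
    "2009-09-09 fdgonthier@kryptiva.com 1934\n"
    "2008-12-30 fdgonthier@kryptiva.com 1900\n"
)
for line in churn_lines("libfoo", log_output):
    print(line)
```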
From the files in the churn directory, it’s then trivial to get a picture of everything I worked on throughout the year: concatenate the files and sort the whole set of lines by date.
> cd ~/churn && cat * | sort | less
...
2009-03-03 bar-daemon fdgonthier@kryptiva.com 1803
2009-03-03 bar-daemon fdgonthier@kryptiva.com 1804
2009-03-04 libfoo fdgonthier@kryptiva.com 5
2009-03-04 libfoo fdgonthier@kryptiva.com 6
2009-03-04 bar-daemon fdgonthier@kryptiva.com 1805
2009-03-04 bar-daemon fdgonthier@kryptiva.com 1806
...
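Turning those lines into timekeeping entries may be easier with a per-project summary. A minimal sketch, assuming the churn line format above and treating any day with at least one commit as a day spent on that project (a crude approximation); `days_per_project` is a hypothetical helper and the input reuses the excerpt above as sample data.

```python
from collections import defaultdict


def days_per_project(lines):
    """Count distinct days with at least one commit, per project.

    Each line is expected in the churn format "DATE PROJECT AUTHOR REV".
    """
    days = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines, e.g. "..."
        date, project = fields[0], fields[1]
        days[project].add(date)
    return {project: len(dates) for project, dates in sorted(days.items())}


churn = """\
2009-03-03 bar-daemon fdgonthier@kryptiva.com 1803
2009-03-03 bar-daemon fdgonthier@kryptiva.com 1804
2009-03-04 libfoo fdgonthier@kryptiva.com 5
2009-03-04 libfoo fdgonthier@kryptiva.com 6
2009-03-04 bar-daemon fdgonthier@kryptiva.com 1805
2009-03-04 bar-daemon fdgonthier@kryptiva.com 1806
"""
print(days_per_project(churn.splitlines()))
```

A day with commits in two projects counts once for each, so how to split such days is still a judgment call.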
This is only as accurate as your repositories are clean. For example, it can be difficult to extract only your own changesets if you did not pay attention to configuring your commit username correctly. That happened to me in some contexts. I also had to use the revision numbers from the log to look up the content of some commits, because I could not remember which subproject they belonged to.
This is not something you want to have to do. It’s much more accurate, and easier, to feed the time-tracking program properly on a daily basis. There is no excuse not to do it, but if you tend to forget that kind of thing, this trick can help.
Mercurial shell prompt, followup
I have already found a solution to the problem I described in my last post. I was able to write a short Python script that gets the information out of the Mercurial repository using Mercurial’s own classes.
It was simpler than I expected. If I manage to clean it up, it would probably make a useful Mercurial extension.
For now, I’m only pasting the Python code here; I would like to set up a Git repository for my $HOME/bin scripts. This is definitely not the final product, but it’s already something you can use.
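#!/usr/bin/env python3
# Print the tip revision, branch and dirty flag of a Mercurial repository
# as shell variable assignments, ready to be eval'ed by the prompt code.
import os
import sys

from mercurial import hg, ui

try:
    # Create a ui object with the user's configuration loaded, like hg does.
    myui = ui.ui.load()
    # Instantiate the repository; Mercurial's Python API works with bytes paths.
    hg_repo = hg.repository(myui, os.fsencode(sys.argv[1]))
    # Get some information.
    hg_branch = hg_repo.dirstate.branch().decode("utf-8", "replace")
    hg_rev = hg_repo.changelog.rev(hg_repo.changelog.tip())
    # dirstate._dirty only means the in-memory dirstate has not been written
    # back yet; ask for the working directory status instead.
    st = hg_repo.status()
    if st.modified or st.added or st.removed or st.deleted:
        hg_dirty = "1"
    else:
        hg_dirty = ""
    print("hg_rev=%s\nhg_branch=%s\nhg_dirty=%s" % (hg_rev, hg_branch, hg_dirty))
except Exception:
    sys.exit(1)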
Mercurial shell prompt
I thought this could be interesting.
Based on this post by Mike Hommey, I decided to set up my own Mercurial-aware ZSH prompt.
Here is the relevant part of my .zshrc.
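function precmd {
    psvar=()
    # Only show Mercurial information at the root of a repository.
    if [ ! -e "$PWD/.hg" ]; then
        PROMPT=$DEFAULT_PROMPT
        return
    else
        # TODO: We need some way to know quickly if there are changes.
        # First hg call: tip revision and branch, as shell assignments for eval.
        hg_tip=`hg tip --template 'hg_rev={rev};\nhg_branch={branches}'`
        # Second hg call: modified, added, removed, deleted and ignored files
        # (-i lists ignored files too; drop it if they should not count).
        hg_status=`hg status -mardi`
        eval "$hg_tip"
        psvar[1]=$hg_rev
        # {branches} is empty for the default branch.
        if [[ "$hg_branch" == "" ]]; then
            psvar[2]=default
        else
            psvar[2]=$hg_branch
        fi
        if [[ "$hg_status" != "" ]]; then
            psvar[3]="*"
        else
            psvar[3]="-"
        fi
        PROMPT="[r:%1v b:%2v s:%3v]%n@%m:%1d > "
    fi
}
export DEFAULT_PROMPT="%n@%m:%~/ > "
export PROMPT=$DEFAULT_PROMPT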
See the TODO for the problem I currently have with this: it calls Mercurial twice. Mercurial is fast, but it’s still not a native program, and updating the prompt takes a noticeable amount of time. If I could get all the information in one call to hg I would be quite happy, but I feel I would need to hack some Python code and use the Mercurial classes.
The resulting prompt is compact and quite nice:
[r:4 b:debian s:-]fdgonthier@moka:spamassassin >
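The TODO could perhaps also be addressed without Python: hg identify with --num and --branch (hg id -nb) prints, in one call, the local revision number of the working directory’s parent, with a “+” appended when there are uncommitted changes, followed by the branch name, e.g. 4+ debian. Note that it describes the working directory’s parent, not tip, so it can show a different revision than precmd does. Below is a minimal sketch, with hypothetical helper names, that parses such a line and builds the same prompt string as precmd; running hg itself is left out.

```python
def parse_identify(line):
    """Parse `hg identify --num --branch` output such as "4+ debian".

    Returns (rev, branch, dirty). During a merge the number looks like
    "3+4+"; only the first parent's revision is kept.
    """
    num, branch = line.strip().split(None, 1)
    dirty = num.endswith("+")
    rev = int(num.rstrip("+").split("+")[0])
    return rev, branch, dirty


def format_prompt(rev, branch, dirty, user, host, dirname):
    """Build the same string as the PROMPT assignment in precmd."""
    # precmd falls back to "default" because {branches} is empty there.
    branch = branch or "default"
    status = "*" if dirty else "-"
    return "[r:%s b:%s s:%s]%s@%s:%s > " % (rev, branch, status, user, host, dirname)


rev, branch, dirty = parse_identify("4 debian\n")
print(format_prompt(rev, branch, dirty, "fdgonthier", "moka", "spamassassin"))
```

From Python, the input line could come from subprocess.run(["hg", "identify", "--num", "--branch"], capture_output=True, text=True).stdout; from zsh, a single hg id -nb call could replace both hg invocations in precmd.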


