Archive for January, 2010
Compressing a year of timekeeping in 2 hours
I’m very bad at keeping track of the time I spend working. Doing so tends to require manual input, and something to remind me to do the input. The latter part is where I usually fail and lose interest. This meant that last week I had to enter a year’s worth of timekeeping data in a few hours into the web application we use for that purpose.
This is not as opaque a problem as it might seem to some people. We use timekeeping at work to keep track of how much time is spent on specific projects, not to keep a precise account of who is working at a specific time.
The only place where that data is recorded is in our revision control system, Mercurial. It has a detailed log of what was committed to a repository and, if the commit message was good, an explanation of why. Scanning each repository (all 72 of them) with the default log command output would have been impractical.
Luckily, Mercurial has a lesser-known feature which allows users to present log data in a terser way than the default. This is the --template switch, which is pretty well explained in the Mercurial manual.
The command I’m using in the script below is something like this:
hg log --template "{date|shortdate} {author|email} {rev}\n"
Here is an excerpt of the output of this command.
...
2009-09-09 fdgonthier@kryptiva.com 1934
2009-09-21 fdgonthier@kryptiva.com 1935
2009-09-21 fdgonthier@kryptiva.com 1936
...
So this shows some commits I made in a specific project during September 2009. It was then trivial to extract that data from all the repositories to see what I was working on at what date. The following script loops over all my repositories and extracts from the log the dates in 2009 on which I committed something. Note that I have added another field to the template, which is the name of the directory containing the Mercurial repository. This will be used to distinguish between projects in the step after the data is obtained.
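#!/bin/sh
# Collect my 2009 commits from every Mercurial repository found in the
# current directory, one output file per repository.
mkdir -p ~/churn
for i in $(find . -maxdepth 1 -type d | cut -c 3-); do
    # Only directories that contain .hg are Mercurial repositories.
    if [ -e "$i/.hg" ]; then
        echo "Churning $i"
        (cd "$i" && \
            hg log \
                --template "{date|shortdate} $i {author|email} {rev}\n" |\
            grep -E "^2009.*(fdgonthier)") > ~/churn/"$i"
    fi
done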
From the files in the churn directory, it’s then trivial to get a picture of everything that was worked on throughout the year. Just cat the files together and sort the whole set of lines by date.
> cd ~/churn && cat * | sort | less
...
2009-03-03 bar-daemon fdgonthier@kryptiva.com 1803
2009-03-03 bar-daemon fdgonthier@kryptiva.com 1804
2009-03-04 libfoo fdgonthier@kryptiva.com 5
2009-03-04 libfoo fdgonthier@kryptiva.com 6
2009-03-04 bar-daemon fdgonthier@kryptiva.com 1805
2009-03-04 bar-daemon fdgonthier@kryptiva.com 1806
...
This will only be as accurate as your repositories are clean. For example, it might be difficult to extract only the changesets you made if you did not pay attention to correctly configuring your default commit name. It happened to me in some contexts. I also had to use the revision number from the log to look up the content of some commits because I could not remember which subproject they belonged to.
This is not something you want to have to do. It’s much more accurate and much easier to properly feed the timetracking program on a daily basis. There is no excuse not to do it properly, but if you tend to forget that kind of thing, this trick can help.
Interfacing with Microsoft products
I work for Kryptiva, which is a small company creating security and collaboration tools. Our server products are deployed on Ubuntu and Debian Linux, but our client-side products are naturally available on Windows. We host some of our server products ourselves, but some of our servers are deployed inside client networks, which are usually Microsoft Windows networks.
Creating applications for Microsoft networks and for Microsoft Windows desktops meant we had to deal with several Microsoft technologies. This might come as a surprise to readers of a Linux technology blog, but I think that not all Microsoft technologies are nightmarish messes to deal with. For what we have needed to do, Microsoft has fared neither worse nor better than anything else I have dealt with in my programming career.
Microsoft Active Directory is the building block of every Microsoft Windows network. It is a huge database with loads of information, which can happily be accessed using the open LDAP protocol. Working with Active Directory is intimidating at first because the data it provides isn’t intuitively interpretable. For a Linux user, it takes a little while to get used to the terminology and the features of a Microsoft Windows network. Once the basics are acquired, it’s actually pretty easy to get Active Directory to return the information one wants. Of course, it has Microsoft-style quirks you have to deal with, but everything is pretty decently documented, especially since Microsoft was forced to release its documentation. The people working on Samba 4 would certainly have a lot more to say than me about Active Directory, but I honestly can’t say many bad things about it.
Microsoft is a big company that has to deal with an enormous amount of code and software interfaces internally, but it also needs to answer to its clients, who together have a massive quantity of code interfacing with Microsoft code. This means that deprecating age-old APIs is hard, and can’t always be handled in a very elegant way.
I’m not arguing that APIs should never be deprecated, but there are good ways and bad ways to deprecate an API. MAPI (Messaging API) is an age-old interface that has been used for years to send and receive email messages on Windows machines. It’s a badly documented and quirky API, but since it’s the core of all messaging applications on Windows, it’s very powerful and difficult to do without. Sadly, Microsoft does not want people to use their shiny new .NET framework with the old MAPI. The reasons they give are very vague. To put it simply: it’s not compatible and will crash your program, but we can’t really explain why or when. The alternatives they suggest are simply not practical, or not as powerful as dealing with MAPI directly.
This brings us to the worst. The interface exposed by Microsoft to plug into Outlook is just
Details about the next Microsoft Outlook release, currently in beta, are being documented by Microsoft. In a recent newsletter, the makers of Addin Express linked to this overview of a change in the way Outlook 2010 shuts down.
Starting in the Outlook 2010 Beta release, Outlook, by default, does not signal add-ins that it is shutting down.
…
Most add-ins use these events to release references to Outlook COM objects and clear memory that was allocated during the session.
Thanks Microsoft, but what about network connections? Opened databases? Plain old data files?
I can only guess why Microsoft decided that it is important for Outlook to shut down as quickly as possible, but it was apparently deemed more important than data integrity. This is something I find questionable. Microsoft has left a backdoor open for system administrators to revert to the old behavior, but in practice this is not the kind of thing you like to ask system administrators to do. There must be a handful of hackish and unsupported workarounds to solve this problem, and Outlook programmers around the world will find and document most of them, but I’m pretty sure they could all do without that…
LD_PRELOAD fun
Here is a welcome digression from my previous Twitter-oriented posts. I’m starting to play around with the LD_PRELOAD feature of the Linux dynamic linker. For those who might not know what this feature is, here is the description from ld.so (8).
LD_PRELOAD
A whitespace-separated list of additional, user-specified, ELF
shared libraries to be loaded before all others. This can be
used to selectively override functions in other shared
libraries. For setuid/setgid ELF binaries, only libraries in
the standard search directories that are also setgid will be
loaded.
So in practical terms, any library you specify in the LD_PRELOAD environment variable will be loaded before any system library. This means that the dynamic symbols a program needs will be searched for in those libraries first, before being searched for anywhere else, so you can override any symbol you want from the standard libraries.
Let’s start with a rather juvenile example. This will change the behavior of the read (2) function in order to make the user believe a file has different content.
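#include <string.h>   /* memcpy */
#include <unistd.h>   /* ssize_t, size_t */

/* Overrides read(2): the first call returns a canned string,
 * every later call reports end of file. */
ssize_t read(int fd, void *buf, size_t count) {
    static int done = 0;
    if (!done) {
        char silly_str[] = "Haha you got overriden.\n";
        /* Copy at most count bytes and leave out the terminating NUL. */
        size_t len = sizeof(silly_str) - 1;
        size_t s = count > len ? len : count;
        memcpy(buf, silly_str, s);
        done = 1;
        return s;
    }
    else return 0;
}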
If you compile this into a shared library called, for example, libread.so (with gcc, the -shared and -fPIC options build one), you can test this code by running:
> /bin/cat /etc/fstab
# /etc/fstab: static file system information.
# ...
> LD_LIBRARY_PATH=. LD_PRELOAD=libread.so /bin/cat /etc/fstab
Haha you got overriden.
That in itself is just a rather silly prank you can play on a friend’s computer if you happen to have access to it. Experienced programmers will start seeing potential uses for LD_PRELOAD. I am getting to that.
The subject of our next example will be the honorable ls (1). ls uses the opendir (3) function to open a directory and browse its files. It should react properly if it can’t open the directory. One way to test this is to make opendir() return NULL and observe how the caller reacts. You can do that using LD_PRELOAD.
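#include <dirent.h>   /* DIR */
#include <stddef.h>   /* NULL */
/* Overrides opendir(3): every attempt to open a directory fails. */
DIR *opendir(const char *name) {
    return NULL;
}
> LD_LIBRARY_PATH=. LD_PRELOAD=libls1.so /bin/ls /tmp
/bin/ls: cannot open directory /tmp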
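The override above leaves errno untouched, so the caller learns that opendir() failed but not why. A minimal sketch of a variant, assuming you want to simulate a permission problem, sets errno before returning. EACCES is only one plausible choice I picked for illustration; it is not something the original test used.

```c
#include <dirent.h>   /* DIR */
#include <errno.h>    /* errno, EACCES */
#include <stddef.h>   /* NULL */

/* Overrides opendir(3): always fail as if permission had been denied. */
DIR *opendir(const char *name)
{
    (void)name;        /* the requested path is deliberately ignored */
    errno = EACCES;    /* assumed error code, chosen for the example */
    return NULL;
}
```

Built and preloaded the same way as the previous library, this should let you check that a program reports the reason for the failure and not only the failure itself.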
What can you do now if you want to preserve part of the behavior of the function, or modify the result it returns? Your preloaded library will then need to use libdl to dynamically load the function whose behavior it wants to modify.
The following example is a very simple override of the opendir (3) function which opens a different directory than the one the caller expects. I will explain this function in more detail below.
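#define _GNU_SOURCE   /* for RTLD_NEXT */
#include <dirent.h>
#include <dlfcn.h>

DIR *opendir(const char *name) {
    /* Pointer to the real opendir(), found further down the library chain. */
    DIR *(*libc_opendir)(const char *name);
    *(void **)(&libc_opendir) = dlsym(RTLD_NEXT, "opendir");
    return libc_opendir("/tmp");
}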
libdl is fortunately very simple to use. The naive approach would be to use dlopen (3) to open the C library, then get the pointer to the function you are calling using dlsym (3). In theory, this technique is valid and works, but doing that circumvents the LD_PRELOAD mechanism: preloaded libraries can be chained, and calling directly into the C library skips any other preloaded library that overrides the same function.
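For illustration, here is a sketch of that naive approach. The helper name opendir_from_libc and the library name "libc.so.6" are assumptions of mine. The latter is the usual name of the GNU C library on Linux and may differ on your system.

```c
#include <dirent.h>   /* DIR */
#include <dlfcn.h>    /* dlopen, dlsym */
#include <stddef.h>   /* NULL */

/* Hypothetical helper: fetch opendir straight from the C library.
 * "libc.so.6" is an assumption (the usual glibc name on Linux). */
DIR *opendir_from_libc(const char *name)
{
    DIR *(*libc_opendir)(const char *);
    void *libc = dlopen("libc.so.6", RTLD_LAZY);

    if (libc == NULL)
        return NULL;    /* the C library could not be opened */

    *(void **)(&libc_opendir) = dlsym(libc, "opendir");
    if (libc_opendir == NULL)
        return NULL;    /* the symbol was not found */

    /* This call goes directly to libc and bypasses every other
     * preloaded library that may also define opendir. The handle is
     * never closed, which is acceptable for a short sketch. */
    return libc_opendir(name);
}
```

The lookup is tied to one specific library, which is exactly why other libraries in the preload chain never get a chance to see the call.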
In practice, calling dlopen() on libc on an Ubuntu Karmic system made some programs crash and burn for reasons I will not attempt to explain. The next technique should be preferred on Linux systems, especially when dealing with the system C library.
dlsym() has an option that makes the Linux dynamic linker search for the right symbol to override. This is the RTLD_NEXT flag, which exists precisely for writing wrappers around dynamic library functions.
This leaves libdl the task of returning the pointer to the right symbol: with RTLD_NEXT, dlsym() returns the next occurrence of the symbol in the search order after the current library.
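Putting this together, a wrapper that keeps the original behavior and only observes the call might look like the sketch below. The static pointer caches the lookup so that dlsym() runs only once, and the opendir_calls counter is a hypothetical addition that exists only to make the interception visible.

```c
#define _GNU_SOURCE   /* for RTLD_NEXT on glibc */
#include <dirent.h>
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical counter, here only so the interception can be observed. */
int opendir_calls = 0;

/* Overrides opendir(3): log the request, then forward it unchanged. */
DIR *opendir(const char *name)
{
    /* Cached pointer to the next opendir in the lookup chain. */
    static DIR *(*next_opendir)(const char *) = NULL;

    if (next_opendir == NULL) {
        *(void **)(&next_opendir) = dlsym(RTLD_NEXT, "opendir");
        if (next_opendir == NULL) {
            /* Nothing further down the chain: fail instead of crashing. */
            const char *msg = dlerror();
            fprintf(stderr, "opendir wrapper: %s\n",
                    msg != NULL ? msg : "symbol not found");
            errno = ENOSYS;
            return NULL;
        }
    }

    opendir_calls++;
    fprintf(stderr, "opendir(\"%s\")\n", name);
    return next_opendir(name);
}
```

Because the lookup goes through RTLD_NEXT, a second preloaded library that also overrides opendir() would still be called in turn, which the dlopen() approach would not allow.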
The next and final example of the use of LD_PRELOAD will still use the valiant ls. In time for Christmas, this will modify the output of ls by randomizing the d_type field returned in the dirent structure by readdir (3). If you use colorized ls output, and I believe most of you probably do, you should see a pretty display of color whenever you list a directory by preloading this function.
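#define _GNU_SOURCE   /* for RTLD_NEXT, struct dirent64 and the DT_* constants */
#include <dirent.h>
#include <dlfcn.h>
#include <stdlib.h>   /* rand, srand */
#include <time.h>     /* time */

struct dirent64 *readdir64(DIR *dir) {
    static struct dirent64 *(* libc_readdir64)(DIR *dir) = NULL;
    struct dirent64 *dent;
    unsigned char rnd_dtype[7] = { DT_UNKNOWN, DT_REG,
                                   DT_DIR, DT_FIFO,
                                   DT_SOCK, DT_CHR,
                                   DT_BLK };
    /* Look up the real readdir64() and seed the generator on the first call only. */
    if (libc_readdir64 == NULL) {
        *(void **)(&libc_readdir64) = dlsym(RTLD_NEXT, "readdir64");
        srand(time(NULL));
    }
    dent = libc_readdir64(dir);
    /* Replace the entry type with a random one; NULL means end of directory. */
    if (dent != NULL)
        dent->d_type = rnd_dtype[rand() % 7];
    return dent;
}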
There is still a problem with this code on my new Ubuntu Hardy machine. The code from the preloaded library hangs before the program terminates. I do not understand why this happens, and a search for this bug did not turn up anything. The problem doesn’t happen with Ubuntu Karmic.
There is nothing new about using LD_PRELOAD this way. Several very nice libraries have been built with the intention of modifying the behavior of typical libraries.
- fakeroot: “fakeroot provides a fake root environment by means of LD_PRELOAD and SYSV IPC (or TCP) trickery.”
- fakechroot: fakechroot provides a fake chroot environment to programs.
- libtrash: “[...] the shared library which, when preloaded, implements a trash can under GNU/Linux”
- cowdancer: cowdancer is a userland implementation of a copy-on-write filesystem.
There are 29 projects matching LD_PRELOAD on freshmeat.net. You might have used some of them.
The code I have written for this demonstration is available on BitBucket.
