Wireshark-dev: Re: [Wireshark-dev] Wireshark Git Mirror Maintenance
From: Gerald Combs <gerald@xxxxxxxxxxxxx>
Date: Sun, 03 Aug 2014 15:20:12 -0700
On 8/3/14, 11:34 AM, Evan Huus wrote:
> On Mon, May 13, 2013 at 7:54 PM, Gerald Combs <gerald@xxxxxxxxxxxxx> wrote:
>
> > On 5/10/13 1:47 PM, Evan Huus wrote:
> > > Hi Gerald
> > >
> > > I just cloned the Wireshark git mirror onto a new machine and was
> > > surprised at how large it was to download. Running an aggressive git
> > > gc on the finished clone reduced the disk usage on my machine from
> > > ~500MB to ~150MB.
> > >
> > > I'm a bit surprised - git is supposed to automatically garbage collect
> > > repositories when they get too cluttered, but perhaps its threshold
> > > for automatic gc is just very high.
> > >
> > > I pinged Balint (CCed) about this and he suggested running gc on a
> > > weekly basis and gc --aggressive on a monthly basis on the server. It
> > > would probably save a non-trivial amount of bandwidth in the long term
> > > as more people clone the repository.
> >
> > It might be due to our particular circumstances (a bare repository only
> > updated via the mirror script) but git's automatic garbage collection
> > doesn't seem to happen very often. The mirror script runs "git gc
> > --auto" each time it synchronizes which keeps it from filling up the
> > disk (which happened early on) but as you point out there is room for
> > improvement. I added a cron job that runs "git gc --aggressive" each
> > week. Here is the output from a manual run, which includes "git
> > count-objects -v" before and after:
> >
> > 2013-05-13 14:38:12: Started.
> > 2013-05-13 14:38:12: Synchronizing repository wireshark
> > 2013-05-13 14:38:12: Object count start
> > count: 0
> > size: 0
> > in-pack: 316591
> > packs: 45
> > size-pack: 567146
> > prune-packable: 0
> > garbage: 0
> > 2013-05-13 14:38:12: Collecting garbage
> > 2013-05-13 15:09:56: Object count start
> > count: 0
> > size: 0
> > in-pack: 316596
> > packs: 2
> > size-pack: 127499
> > prune-packable: 0
> > garbage: 0
> > 2013-05-13 15:09:56: Done
>
> So it's been over a year since this conversation and we have actually
> migrated to Git/Gerrit so I have no idea what Gerrit is doing in this
> regard (is there even a "real" git repository backing it, or is it all
> internal magic?), but I recently came across [1] which suggests that
> repeated use of --aggressive maybe wasn't such a good idea after all.
>
> It suggests just sticking to regular `git gc` except in cases of large
> one-time imports (like we did on migration) at which point you should
> run the apparently-very-slow `git repack -a -d --depth=250 --window=250`.
>
> FWIW, a fresh clone from Gerrit right now is 213MB - my local repo is
> only 161MB, and my current desktop is actually not beefy enough to run
> the recommended repack command so I have no idea what improvement that
> would give.

It's a "real" git repository but any operations performed by Gerrit are
done using JGit. The weekly automatic number update script runs
`gerrit gc --all`, which uses JGit's garbage collector. Many sites
including Google appear to run it one or more times a day. We may want
to do the same.

I tried running `git repack -a -d --depth=250 --window=250` on the
server. It ran successfully and shrank the repository from 248 MB to
208 MB but now the OS X builders are timing out during `git fetch`...
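For anyone who wants to compare before/after figures like the ones in the quoted 2013 run, here is a minimal sketch. The helper names `count_field` and `percent_saved` are made up for illustration and are not part of git or of the mirror script; the only thing taken from git itself is the `git count-objects -v` output format, where `size-pack` is reported in KiB. The sample numbers are the ones from the log above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field (e.g. "size-pack")
# from `git count-objects -v` style output read on stdin.
count_field() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Hypothetical helper: percentage reduction going from size $1 to size $2.
percent_saved() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

# Figures copied from the 2013 manual run quoted above.
before=$(printf 'packs: 45\nsize-pack: 567146\n' | count_field size-pack)
after=$(printf 'packs: 2\nsize-pack: 127499\n' | count_field size-pack)

echo "size-pack: $before -> $after KiB ($(percent_saved "$before" "$after")% smaller)"
```

In a real repository the `printf` stand-ins would presumably be replaced by `git count-objects -v` run before and after the gc or repack step.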