Git commands I run before reading any code
grepsedawk
1931 points
404 comments
April 08, 2026
Related Discussions
Found 5 related stories in 61.2ms across 3,961 title embeddings via pgvector HNSW
- Contextual commits – An open standard for capturing the why in Git history vidimitrov · 30 pts · March 12, 2026 · 47% similar
- More on Version Control velmu · 66 pts · March 29, 2026 · 47% similar
- Claude Code runs Git reset --hard origin/main against project repo every 10 mins mthwsjc_ · 224 pts · March 29, 2026 · 47% similar
- United States Code (federal laws) in Git nickvido · 42 pts · April 03, 2026 · 47% similar
- Get Shit Done: A meta-prompting, context engineering and spec-driven dev system stefankuehnel · 267 pts · March 17, 2026 · 45% similar
Discussion Highlights (20 comments)
gherkinnn
These are some helpful heuristics, thanks. This list is also one of many arguments for maintaining good Git discipline.
pzmarzly
Jujutsu equivalents, if anyone is curious:
What Changes the Most
    jj log --no-graph -r 'ancestors(trunk()) & committer_date(after:"1 year ago")' \
      -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
      | sort | uniq -c | sort -nr | head -20
Who Built This
    jj log --no-graph -r 'ancestors(trunk()) & ~merges()' \
      -T 'self.author().name() ++ "\n"' \
      | sort | uniq -c | sort -nr
Where Do Bugs Cluster
    jj log --no-graph -r 'ancestors(trunk()) & description(regex:"(?i)fix|bug|broken")' \
      -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
      | sort | uniq -c | sort -nr | head -20
Is This Project Accelerating or Dying
    jj log --no-graph -r 'ancestors(trunk())' \
      -T 'self.committer().timestamp().format("%Y-%m") ++ "\n"' \
      | sort | uniq -c
How Often Is the Team Firefighting
    jj log --no-graph \
      -r 'ancestors(trunk()) & committer_date(after:"1 year ago") & description(regex:"(?i)revert|hotfix|emergency|rollback")'
Much more verbose, closer to programming than shell scripting. But fewer flags to remember.
ramon156
> The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.” The most changed file is the one people are afraid of touching?
seba_dos1
> If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Squash-merge workflows are stupid (you lose information without gaining anything in return, since it was easily filterable at retrieval anyway) and only useful as a workaround for people who don't know how to use git. But git stores the author and committer names separately, so what matters isn't who merged, but whether the squashed patchset consisted of commits with multiple authors (and even then you could record them with Co-authored-by trailers, though those are harder to use in one-liners like these).
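A minimal sketch of the author-vs-committer point: rank contributors by git's author field (%an/%ae), which squash merges keep separate from the committer, and also count any Co-authored-by trailers a squash left behind. The function name contributors is hypothetical, and the %(trailers:key=...,valueonly) placeholder assumes a reasonably recent git.

```shell
# Hypothetical helper: count commits per *author* (not committer) and fold in
# Co-authored-by trailer values, so a squash merge credits everyone it lists.
contributors() {
  {
    # One "Name <email>" line per commit, taken from the author field.
    git log --format='%an <%ae>'
    # Trailer values, e.g. "B <b@example.com>"; commits without one print blanks.
    git log --format='%(trailers:key=Co-authored-by,valueonly)'
  } | sed -e 's/^[[:space:]]*//' -e '/^$/d' |   # trim and drop blank lines
    sort | uniq -c | sort -nr
}
```

Run it from inside a repository; whoever pressed "merge" only shows up if they also authored or co-authored something.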
traceroute66
> The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. What a weird check and assumption. I mean, surely most of the "20 most-changed files" will be README and docs, plus language-specific lockfiles, etc.? So if you're not accounting for those in your git/jj syntax, you're going to end up with an awful lot of false-positive noise.
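One way to cut that noise without post-filtering is git's exclude pathspec magic, which also restricts the --name-only output. This is a sketch: the function name hot_files and the exclude list are assumptions to adapt per ecosystem, and it should be run from the repository root.

```shell
# Hypothetical helper: most-changed files in the last year, skipping docs and
# lockfiles. Patterns without ":(glob)" match in any directory, so "*.md"
# also excludes docs/guide.md.
hot_files() {
  git log --since="1 year ago" --format='' --name-only -- . \
      ':(exclude)*.md' \
      ':(exclude)*.lock' \
      ':(exclude)*package-lock.json' |
    sed '/^$/d' |                      # drop the blank line git prints per commit
    sort | uniq -c | sort -nr | head -20
}
```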
JetSetIlly
Some nice ideas, but the regexes should include word boundaries. For example:
    git log -i -E --grep="\b(fix|fixed|fixes|bug|broken)\b" --name-only --format='' | sort | uniq -c | sort -nr | head -20
I have a project with a large package named "debugger". The presence of "bug" within "debugger" causes the original command to go crazy.
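Before trusting counts built on a keyword regex, it can help to preview which commit subjects it actually matches. A minimal sketch, assuming GNU grep: \b is a GNU extension to POSIX ERE, so BSD grep may behave differently, and whether git's own --grep honors \b may likewise depend on the platform's regex library. The function name bug_subjects is hypothetical.

```shell
# Hypothetical filter: read commit subjects on stdin and keep only those
# containing a whole-word bug keyword, so "debugger" and "Prefix" don't count.
bug_subjects() {
  grep -Ei '\b(fix|fixed|fixes|bug|broken)\b'
}

# Usage inside a repository:
#   git log --format=%s | bug_subjects
```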
aa-jv
Great tips, added to notes.txt for future use. Another one I use is:
    $ alias gss='git for-each-ref --sort=-committerdate'
    $ gss
    ce652ca83817e83f6041f7e5cd177f2d023a5489 commit refs/heads/project-feature-development
    ce652ca83817e83f6041f7e5cd177f2d023a5489 commit refs/remotes/origin/project-feature-development
    1ef272ea1d3552b59c3d22478afa9819d90dfb39 commit refs/remotes/origin/feature/feature-removal-from-good-state
    c30b4c67298a5fa944d0b387119c1e5ddaf551f1 commit refs/remotes/origin/feature/feature-removal
    eda340eb2c9e75eeb650b5a8850b1879b6b1f704 commit refs/remotes/origin/HEAD
    eda340eb2c9e75eeb650b5a8850b1879b6b1f704 commit refs/remotes/origin/main
    3f874b24fd49c1011e6866c8ec0f259991a24c94 commit refs/heads/project-bugfix-emergency
    ...
This way I can see right away which branches are 'ahead' of the pack, what 'the pack' looks like, and what is up and coming. In fact, I use the 'gss' alias regularly to find out what's going on, i.e. "git fetch --all && gss". Doing this regularly, and even logging it to a file on login for history, helps me see activity in the repo without too much digging. I just watch the hashes.
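If watching hashes gets tedious, for-each-ref can print the commit age and the short ref name instead. A sketch of a variant, not the commenter's alias; the function name recent_refs is hypothetical, while %(committerdate:relative), %(refname:short) and the %09 tab escape are standard for-each-ref format syntax.

```shell
# Hypothetical variant of "gss": newest branch tips first, shown as
# "<relative commit age><TAB><short ref name>" for local and remote branches.
recent_refs() {
  git for-each-ref --sort=-committerdate \
    --format='%(committerdate:relative)%09%(refname:short)' \
    refs/heads refs/remotes
}
```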
mattrighetti
I have a summary alias that kind of does similar things:
    # summary: print a helpful summary of some typical metrics
    summary = "!f() { \
      printf \"Summary of this branch...\n\"; \
      printf \"%s\n\" $(git rev-parse --abbrev-ref HEAD); \
      printf \"%s first commit timestamp\n\" $(git log --date-order --format=%cI | tail -1); \
      printf \"%s latest commit timestamp\n\" $(git log -1 --date-order --format=%cI); \
      printf \"%d commit count\n\" $(git rev-list --count HEAD); \
      printf \"%d date count\n\" $(git log --format=oneline --format=\"%ad\" --date=format:\"%Y-%m-%d\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
      printf \"%d tag count\n\" $(git tag | wc -l); \
      printf \"%d author count\n\" $(git log --format=oneline --format=\"%aE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
      printf \"%d committer count\n\" $(git log --format=oneline --format=\"%cE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
      printf \"%d local branch count\n\" $(git branch | grep -v \" -> \" | wc -l); \
      printf \"%d remote branch count\n\" $(git branch -r | grep -v \" -> \" | wc -l); \
      printf \"\nSummary of this directory...\n\"; \
      printf \"%s\n\" $(pwd); \
      printf \"%d file count via git ls-files\n\" $(git ls-files | wc -l); \
      printf \"%d file count via find command\n\" $(find . | wc -l); \
      printf \"%d disk usage\n\" $(du -s | awk '{print $1}'); \
      printf \"\nMost-active authors, with commit count and %%...\n\"; git log-of-count-and-email | head -7; \
      printf \"\nMost-active dates, with commit count and %%...\n\"; git log-of-count-and-day | head -7; \
      printf \"\nMost-active files, with churn count\n\"; git churn | head -7; \
    }; f"
EDIT: props to https://github.com/GitAlias/gitalias
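A couple of the alias's lines have shorter equivalents, sketched below as standalone functions (names hypothetical, not part of GitAlias): sort -u replaces the awk array trick for distinct values, and git shortlog gives per-author commit counts natively. Passing HEAD explicitly matters, because with no revision and a non-terminal stdin, git shortlog reads a log from stdin instead.

```shell
# Distinct author emails (%aE respects .mailmap), counted without awk.
author_count() {
  git log --format='%aE' | sort -u | wc -l | tr -d ' '
}

# Commits per author, most active first, as "count<TAB>Name <email>".
top_authors() {
  git shortlog -sne HEAD | head -7
}
```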
croemer
Rather than using an LLM to write fluffy paragraphs explaining what each command does and what it tells them, the author should have shown their output (truncated if necessary).
boxed
Just looking at how often a file changes without knowing how big the file is seems a bit silly. Surely it should be changes/line or something?
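A rough way to try the changes-per-line idea: divide each file's commit count over the last year by its current line count. This is a sketch under stated assumptions: the function name churn_per_line is hypothetical, "size" means current lines (wc -l), and files that were deleted or are empty are skipped.

```shell
# Hypothetical helper. Output columns: changes per line, changes, lines, path.
# Runs in a subshell so the cd to the repo root (where git log's paths are
# rooted) doesn't leak into the caller's shell.
churn_per_line() (
  cd "$(git rev-parse --show-toplevel)" || exit 1
  git log --since="1 year ago" --format='' --name-only |
    sed '/^$/d' | sort | uniq -c |
    while read -r count path; do
      [ -f "$path" ] || continue            # deleted since, skip
      lines=$(wc -l < "$path")
      [ "$lines" -gt 0 ] || continue        # avoid dividing by zero
      awk -v c="$count" -v l="$lines" -v p="$path" \
        'BEGIN { printf "%.4f %d %d %s\n", c / l, c, l, p }'
    done |
    sort -nr | head -20
)
```

Commit count per line strongly favors tiny files, so looking at the raw count column alongside the ratio is probably wise.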
alkonaut
Trusting the messages to contain specific keywords seems optimistic. I don't think I've ever used "emergency" or "hotfix". "Revert" is sometimes created automatically by tools (e.g. when un-merging a PR).
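One signal that doesn't rely on people's vocabulary: git revert writes "This reverts commit <hash>." into the message body by default, so reverts made that way can be counted by that line. A sketch (function name hypothetical); reverts made with a custom message or by tools that don't keep that line will be missed.

```shell
# Hypothetical helper: number of commits in the last year whose message
# contains the default line written by "git revert" (fixed-string match).
revert_count() {
  git log --since="1 year ago" -F --grep='This reverts commit' --format='%h' |
    wc -l | tr -d ' '
}
```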
niedbalski
Ages ago, Google released an algorithm to identify hotspots in code by using commit messages. https://github.com/niedbalski/python-bugspots
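As the Bugspots idea is commonly described, each bug-fix commit adds 1 / (1 + e^(-12t + 12)) to every file it touched, where t is the commit time normalized to [0, 1], so recent fixes weigh close to 1 and old ones close to 0. Below is a rough sketch of that scoring, not the linked project's code. Assumptions: "bug fix" means the message matches the keyword regex, t is normalized between the oldest and newest matching commits rather than the whole history, no tracked path starts with "@", and bugspots is a hypothetical function name.

```shell
# Hypothetical Bugspots-style scorer: prints "<score> <path>", highest first.
bugspots() {
  git log -i -E --grep='\b(fix|fixes|fixed|bug)\b' --format='@%ct' --name-only |
    awk '
      /^@/ { t = substr($0, 2) + 0; next }   # commit line: remember timestamp
      NF {                                   # file line: record (file, time)
        n++; ts[n] = t; file[n] = $0
        if (n == 1 || t < min) min = t
        if (n == 1 || t > max) max = t
      }
      END {
        for (i = 1; i <= n; i++) {
          norm = (max > min) ? (ts[i] - min) / (max - min) : 1
          score[file[i]] += 1 / (1 + exp(-12 * norm + 12))
        }
        for (f in score) printf "%.4f %s\n", score[f], f
      }' |
    sort -nr | head -20
}
```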
user20251219
thank you - these are useful
whstl
> One caveat: squash-merge workflows compress authorship. If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Worth asking about the merge strategy before drawing conclusions. In my experience, when the team doesn't squash, this will reflect the messiest members of the team. The top committer on the repository I maintain has 8x more commits than the second one. They were fired before I joined and nobody even remembers what they did. Git itself says: not much, just changing the same few files over and over. Of course, if nobody is making a mess in their own commits, this is not an issue. But if they are, squashing can be much more truthful.
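The "same few files over and over" pattern can be made visible by putting each author's commit count next to the number of distinct files they touched. A minimal sketch; the function name is hypothetical, merges are excluded, and it assumes no tracked path starts with "@", which marks author lines here.

```shell
# Hypothetical helper: "<commits> commits, <files> distinct files: <author>",
# sorted by commit count. Many commits over few files suggests churn.
commits_vs_files() {
  git log --no-merges --format='@%an' --name-only |
    awk '
      /^@/ { a = substr($0, 2); commits[a]++; next }   # author line
      NF   { if (!seen[a SUBSEP $0]++) files[a]++ }    # first time a touches this file
      END  { for (a in commits)
               printf "%d commits, %d distinct files: %s\n", commits[a], files[a] + 0, a }' |
    sort -nr
}
```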
nola-a
For more insights on Git, check out https://github.com/nolasoft/okgit
fzaninotto
Instead of focusing on the top 20 files, you can map the entire codebase with data taken from git log using ArcheoloGit [1]. [1]: https://github.com/marmelab/ArcheoloGit
lpribis
I was curious what information I could glean from these for some popular repos. Caveat: I'm primarily a low-level embedded developer, so I don't interface with large open source projects at the source level very often (other than occasionally the Linux kernel). I chose some projects I use at random.
*Mainline linux*
Most changed files: pretty much what I expected for 1 and 2... the "cutting edge" of Linux development over other OSes: bpf and containers. The bpf verifier and AMD GPU driver might get a boost in this list due to the sheer LoC in those files (26K and 14K respectively). An Intel equivalent of amdgpu_dm is #21 in the list (drivers/gpu/drm/i915/display/intel_display.c), and nvidia is nowhere to be seen (presumably due to out-of-tree modules/blobs?).
    186 kernel/bpf/verifier.c
    174 fs/namespace.c
    162 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
    161 kernel/sched/ext.c
    159 fs/f2fs/f2fs.h
Bus factor: obviously none. The top 4:
    10399 Christoph Hellwig -> I only know his name because of drama last year regarding Rust bindings to the DMA subsystem
    8481 Mauro Carvalho Chehab -> I also know his name from the classic "Mauro, shut the fuck up!" Linus rant
    8413 Takashi Iwai -> Listed as maintainer for the sound subsystem; I think he manages ALSA
    8072 Al Viro -> His name is all over a bunch of filesystem code
Buggy files: Intel comes out on top of the GPU drivers this time (twice), along with KVM for x86(64), the main allocator, and BTRFS.
    1477 drivers/gpu/drm/i915/intel_display.c
    1406 MAINTAINERS
    1390 sound/pci/hda/patch_realtek.c
    1102 drivers/gpu/drm/i915/i915_drv.h
    943 arch/x86/kvm/x86.c
    928 mm/page_alloc.c
    871 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
    862 drivers/gpu/drm/i915/i915_reg.h
    840 fs/btrfs/inode.c
*GCC*
Most changed files: IR autovectorization code, RISC-V heuristics tables, and C++ template handling (pt.c is "parameterized types").
    152 gcc/tree-vect-stmts.cc
    145 gcc/config/riscv/riscv.cc
    131 gcc/tree-vect-loop.cc
    116 gcc/cp/pt.cc
Buggy files: DWARF debuginfo generation, x86 heuristics tables, RS6000(?!) heuristics tables. I had to look up RS6000; it's an IBM instruction set from the 90s lol. cp-tree.h is an interesting file; it seems to hold the main C(++) AST data structures.
    1017 gcc/dwarf2out.c
    885 gcc/config/i386/i386.c
    796 gcc/cp/cp-tree.h
    740 gcc/config/rs6000/rs6000.c
    720 gcc/cp/pt.c
*xfwm4*
Most changed files: the list is dominated by *.po localizations, so I filtered these out. Even after that, I discovered there has been very little active development in the last few years. If I extend the window to 4 years, I get:
1. src/client.c - Realizing this project is too "small" to glean much from. client.c is just the core X client management code. Makes sense.
2. src/placement.c - Other core window management code.
This has not told me much other than where most of the functionality of this project lies.
Bus factor: Pretty huge. Not really an issue in this case due to the lack of development, I guess.
    3298 Olivier Fourdan
    530 Anonymous
    319 Xfce Bot
    121 Jasper Huijsmans
Files with bug commits: Very similar distribution to the most changed files. Not enough data points in this one to draw any big conclusions.
I think these massive open projects (excl. xfwm) generally have pretty consistent code quality across the heavily trodden areas because of the amount of manpower available to refactor the pain points. I've yet to see an example of "god help you if you have to change that file" in e.g. Linux, but I have of course seen that situation many times in large proprietary codebases.
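The bus-factor readings above can be reduced to one crude number: the share of all commits made by the single most active author. A sketch only (function name hypothetical); what share counts as "risky" is a judgment call, and bot or "Anonymous" identities like the ones in the xfwm4 list would need filtering or a .mailmap first.

```shell
# Hypothetical helper: prints "<percent>% <top author>". HEAD is passed
# explicitly so git shortlog doesn't try to read a log from stdin.
top_author_share() {
  git shortlog -sn HEAD |
    awk -F '\t' '
      { total += $1; if (NR == 1) { top = $1; name = $2 } }   # $1 = count, $2 = name
      END { if (total) printf "%.1f%% %s\n", 100 * top / total, name }'
}
```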
bsuvc
I love how the author thinks developers write commit messages. All joking aside, it really is a chronic problem in the corporate world. Most codebases I encounter just have "changed stuff" or "hope this works now". It's a small minority of developers (myself included) who consider the git commit log important enough to spend time writing something meaningful. AI-generated commit messages help a lot here, if developers would actually use them (I hope they will).
baquero
I put it into a gist :) https://gist.github.com/aeimer/8edc0b25f3197c0986d3f2618f036...