Inside Git: Unpacking the .git Folder and How Version Control Actually Works

We have all been there: you type git add ., followed by a git commit -m "update", and hope for the best. For many developers, Git feels like a magic black box. You memorize the commands to keep your team happy, but the moment you face a merge conflict or a detached HEAD, the panic sets in.
To truly master Git, you have to stop memorizing commands and start building a mental model of its internal engine. Git is not just a version control system; at its core, it is a content-addressable filesystem.
Today, we are going to open the black box, look inside the hidden .git folder, and understand exactly how Git tracks your code.
The Heart of the Machine: The .git Folder
When you run git init in a new project, Git doesn't magically wrap your files in an invisible force field. It simply creates a hidden directory named .git.
If you delete this folder, your project is no longer a Git repository. You keep your files, but you lose your entire project history.
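Let's look at a simplified view of what lives inside a .git folder (a real one also holds a few extras such as hooks/ and description, and the index file only appears the first time you stage something):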
Plaintext
.git/
├── objects/   # The internal database (where your code history lives)
├── refs/      # Pointers to specific commits (your branches and tags)
├── HEAD       # A pointer indicating your currently active branch
├── index      # A binary file representing the Staging Area
└── config     # Repository-specific settings
The most important folder here is .git/objects. This is Git's internal database. Every time you save a version of your project, Git writes the data into this folder.
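To make the HEAD and refs/ entries a little more concrete, here is a minimal Python sketch of how HEAD can be followed to a commit hash. It builds a throwaway directory that only mimics the layout, so the branch name main and the placeholder hash are assumptions for illustration. It also handles only loose ref files; a real repository may keep some refs in a packed-refs file, which this sketch ignores.

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir: Path) -> str:
    """Return the commit hash HEAD currently resolves to (loose refs only)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Normal case: HEAD names a branch, and the branch file holds the hash.
        ref_path = git_dir / head[len("ref: "):]
        return ref_path.read_text().strip()
    # Detached HEAD: the file contains a commit hash directly.
    return head

# Build a throwaway directory that mimics the layout (not a real repository).
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    fake_commit = "a" * 40  # placeholder hash, purely illustrative

    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    (git_dir / "refs" / "heads" / "main").write_text(fake_commit + "\n")
    on_branch = resolve_head(git_dir)

    # Simulate a detached HEAD by writing the hash straight into HEAD.
    (git_dir / "HEAD").write_text(fake_commit + "\n")
    detached = resolve_head(git_dir)
```

Seen this way, a "detached HEAD" is less scary than it sounds: HEAD simply holds a commit hash instead of the name of a branch.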
Snapshots, Not Deltas
Before we look at the objects inside that database, we need to correct a massive misconception.
Many developers think Git tracks changes by storing deltas (e.g., "Line 4 changed from 'A' to 'B'"). Git's object model does not work this way. Instead, each commit records a snapshot of your entire project at that exact moment in time. If a file hasn't changed, Git doesn't store a new copy; the new snapshot simply points to the identical blob it already has. (Git does delta-compress objects when it packs them for storage and transfer, but that is a storage optimization underneath the snapshot model.) This snapshot model is a big part of what makes Git so fast at switching branches.
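The snapshot idea can be sketched in a few lines of Python. This is a toy model, not Git's code: the store dictionary stands in for the object database, and the file names and contents are made up for the example. The point is that two snapshots of a project share the entry for any file that did not change.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Name content the way Git names a blob: hash of a small header plus the bytes."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

def snapshot(files: dict[str, bytes], store: dict[str, bytes]) -> dict[str, str]:
    """Record every file in the project and return a path -> blob id listing."""
    listing = {}
    for path, content in files.items():
        oid = blob_id(content)
        store.setdefault(oid, content)  # already stored? then nothing new is written
        listing[path] = oid
    return listing

store: dict[str, bytes] = {}
first = snapshot({"index.html": b"<h1>Hi</h1>\n", "app.js": b"console.log(1);\n"}, store)
second = snapshot({"index.html": b"<h1>Hi</h1>\n", "app.js": b"console.log(2);\n"}, store)

# index.html is listed in both snapshots but stored once; only app.js added a blob.
print(len(store))  # 3
```

Both snapshots describe the whole project, yet the second one only cost one extra stored object.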
The Holy Trinity of Git Objects
When Git takes these snapshots, it uses three fundamental types of objects. Nearly everything in your repository's history is made up of these three things (a fourth type, the tag object, only shows up when you create annotated tags):
1. The Blob (Binary Large Object)
A blob stores the content of your file. That's it. It does not store the file name, the creation date, or the author. It only cares about the data inside the file (like your HTML or JavaScript code).
2. The Tree
If a blob is the file content, the tree is the directory. A tree object stores file names, permissions, and pointers to the blobs (or other trees) that belong in that directory.
3. The Commit
A commit is simply a wrapper around a tree. It provides the metadata: who made the save, when they made it, the commit message, and a pointer to the parent commit so Git can track the timeline. (The very first commit has no parent, and a merge commit has two or more.)
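To see why names live in trees and not in blobs, here is a hedged Python sketch that serializes blobs and trees by hand. The file names come from the example project in this article, and the contents are invented. The tree layout used here (mode, a space, the name, a NUL byte, then the raw 20-byte object id) follows Git's documented format, but sorting entries by plain name is a simplification of Git's real ordering rule for directories.

```python
import hashlib

def hash_object(obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>' + NUL + body, which is how Git names any object."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex object id) entries into the body of a tree."""
    body = b""
    # Simplification: sort by plain name (Git treats directory names specially).
    for mode, name, oid in sorted(entries, key=lambda e: e[1]):
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return body

# Blobs: content only, no file name anywhere in the hashed bytes.
html = hash_object("blob", b"<h1>Home</h1>\n")
js = hash_object("blob", b"console.log('hi');\n")

# Trees: this is where names and modes are attached to those blobs.
src_tree = hash_object("tree", tree_body([("100644", "app.js", js)]))
root_tree = hash_object("tree", tree_body([
    ("100644", "index.html", html),  # regular file
    ("40000", "src", src_tree),      # subdirectory, pointing at another tree
]))

empty_tree = hash_object("tree", tree_body([]))
print(empty_tree)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Renaming index.html would change the root tree's hash but leave the blob untouched, because the blob never knew its name in the first place.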
Diagram: How They Connect
Here is the mental model of how a single commit looks under the hood:
Plaintext
[ Commit Object ]
│  - Author: Jane Doe
│  - Message: "Add homepage"
│  - Parent: (Previous Commit)
│
└──> Points to -> [ Tree Object ] (Represents the root folder)
                     │
                     ├── "index.html" ----> [ Blob Object ] (HTML content)
                     └── "src/" ----------> [ Tree Object ] (src folder)
                                               └── "app.js" -> [ Blob Object ]
Hashes: The Glue Holding It Together
You might be wondering: How does Git name these objects in the database? By default, Git uses a hashing algorithm called SHA-1. Whenever Git saves an object (a blob, tree, or commit), it hashes a short header (the object's type and size) together with the content and generates a 40-character hexadecimal string (e.g., a1b2c3d4e5f6...).
Integrity: Because the hash is generated from the content, if even a single space is altered in your code, the resulting hash will be completely different. And because every commit's hash covers its tree and its parent, changing anything in history changes the hash of every commit that comes after it, so history cannot be altered quietly.
Storage: This is also how Git avoids duplicating files. If you have two identical images in different folders, their content is the same, so their SHA-1 hash is the same. Git only stores one Blob in the .git/objects folder!
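You can reproduce a blob's name yourself. The short Python sketch below assumes the default SHA-1 object format and hashes the header "blob <size>" plus a NUL byte in front of the content. The function name git_blob_hash is just a label chosen for this example.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Compute the name Git would give a blob holding these bytes (SHA-1 repos)."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

original = git_blob_hash(b"hello world\n")
altered = git_blob_hash(b"hello  world\n")  # one extra space
plain_sha1 = hashlib.sha1(b"hello world\n").hexdigest()  # no header: not Git's name

print(original)            # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(original == altered) # False
```

The first value should match what git hash-object prints for a file containing hello world followed by a newline, which is a handy way to check the sketch against a real Git installation.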
The Internal Flow: git add and git commit
Let's tie it all together by watching what happens in the .git folder when you run your daily commands. Imagine you just created a new file called style.css.
Step 1: git add style.css
When you add a file to the staging area, Git goes to work immediately:
It reads the content of style.css.
It generates a SHA-1 hash based on that content.
It creates a Blob object and saves it in the .git/objects folder.
It updates the .git/index file (the staging area) to say, "Hey, the file style.css now points to this new Blob."
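The steps above can be sketched in Python as follows. This is a rough approximation under a few assumptions: it writes a "loose" object the way Git lays them out on disk (zlib-compressed, under a folder named after the first two hash characters), it works in a temporary directory and not a real repository, and it uses a plain dictionary as a stand-in for the index, which in real Git is a binary file with more fields. The CSS content is made up.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def write_blob(git_dir: Path, content: bytes) -> str:
    """Store content as a loose blob object and return its hash."""
    data = b"blob " + str(len(content)).encode() + b"\0" + content
    oid = hashlib.sha1(data).hexdigest()
    path = git_dir / "objects" / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # identical content is never written twice
        path.write_bytes(zlib.compress(data))
    return oid

def read_object(git_dir: Path, oid: str) -> bytes:
    """Load a loose object and return its body without the header."""
    raw = zlib.decompress((git_dir / "objects" / oid[:2] / oid[2:]).read_bytes())
    _header, _, body = raw.partition(b"\0")
    return body

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    css = b"body { margin: 0; }\n"

    oid = write_blob(git_dir, css)  # read, hash, and save the blob
    index = {"style.css": oid}      # toy staging area: name -> blob hash

    stored = (git_dir / "objects" / oid[:2] / oid[2:]).is_file()
    round_trip = read_object(git_dir, index["style.css"])
```

Notice that no commit exists yet at this point: the blob is already safely in the object store, and only the staging area knows which file name it belongs to.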
Step 2: git commit -m "Add styles"
When you commit, Git finalizes the snapshot:
It looks at the staging area (.git/index).
It generates a Tree object that maps the name style.css to the Blob created in Step 1.
It creates a Commit object containing your name, the message "Add styles", and a pointer to that new Tree.
Finally, it moves your current branch pointer to this brand-new Commit object. HEAD points at that branch, so it follows along.
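Here is a compact, in-memory Python sketch of the commit steps above. It is a model under stated assumptions, not Git's implementation: objects and refs are plain dictionaries standing in for .git/objects and the ref files, the project is flat (no subfolders), the branch is assumed to be called main, and the email address and timestamps are invented for the example.

```python
import hashlib

objects: dict[str, tuple[str, bytes]] = {}               # stand-in for .git/objects
refs: dict[str, str] = {"HEAD": "ref: refs/heads/main"}  # stand-in for HEAD and refs/

def store(obj_type: str, body: bytes) -> str:
    """Hash an object Git-style and keep it in the toy object store."""
    header = f"{obj_type} {len(body)}\0".encode()
    oid = hashlib.sha1(header + body).hexdigest()
    objects[oid] = (obj_type, body)
    return oid

def commit(index: dict[str, str], message: str, author: str, when: int) -> str:
    # 1. Turn the staged entries into a tree (flat project, regular files only).
    tree_body = b"".join(
        f"100644 {name}".encode() + b"\0" + bytes.fromhex(oid)
        for name, oid in sorted(index.items())
    )
    tree = store("tree", tree_body)

    # 2. Wrap the tree in a commit that also names its parent, if there is one.
    branch = refs["HEAD"][len("ref: "):]
    parent = refs.get(branch)
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += [f"author {author} {when} +0000",
              f"committer {author} {when} +0000",
              "", message, ""]
    oid = store("commit", "\n".join(lines).encode())

    # 3. Move the branch to the new commit; HEAD still just names the branch.
    refs[branch] = oid
    return oid

index = {"style.css": store("blob", b"body { margin: 0; }\n")}
jane = "Jane Doe <jane@example.com>"  # hypothetical identity
first = commit(index, "Add styles", jane, 1700000000)

index["style.css"] = store("blob", b"body { margin: 0; padding: 0; }\n")
second = commit(index, "Tweak styles", jane, 1700000060)
```

After the second call, the branch entry holds the newest commit, and that commit's text contains a parent line naming the first one. That chain of parent lines is the timeline.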
Diagram: The Data Flow
Plaintext
Working Directory        Staging Area (.git/index)         Local Repo (.git/objects)
   [style.css]  ──────>  (Tracks file name + Hash)  ─────>  [Blob created immediately]
              (git add)
                                     │
                                     │
                      (git commit)   │
                                     └─────────────────>  [Tree & Commit created]
Summary
Git is entirely predictable once you know how it stores data.
The .git folder is a database.
Blobs hold content.
Trees hold folder structures and file names.
Commits hold metadata and tie the timeline together.
SHA-1 hashes name every object, so accidental corruption or tampering is detectable and identical content is stored only once.
Next time you type git commit, picture the blobs, trees, and commits linking together in your .git/objects folder. You aren't just typing commands anymore; you're manipulating a deeply elegant file system.

