Inside Git: Unpacking the .git Folder and How Version Control Actually Works

Published • 5 min read

We have all been there: you type git add ., followed by a git commit -m "update", and hope for the best. For many developers, Git feels like a magic black box. You memorize the commands to keep your team happy, but the moment you face a merge conflict or a detached HEAD, the panic sets in.

To truly master Git, you have to stop memorizing commands and start building a mental model of its internal engine. Git is not just a version control system; at its core, it is a content-addressable filesystem.

Today, we are going to open the black box, look inside the hidden .git folder, and understand exactly how Git tracks your code.


📂 The Heart of the Machine: The .git Folder

When you run git init in a new project, Git doesn't magically wrap your files in an invisible force field. It simply creates a hidden directory named .git.

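If you delete this folder, your project is no longer a Git repository. You keep your files, but you lose the local copy of your entire project history (commits already pushed to a remote or present in another clone still exist there).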

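Let's look at a simplified view of the most important entries in a .git folder (a real one also contains items such as hooks/ and info/, and the index file only appears once you stage your first file):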

Plaintext

.git/
├── objects/    # The internal database (where your code history lives)
├── refs/       # Pointers to specific commits (your branches and tags)
├── HEAD        # A pointer indicating your currently active branch
├── index       # A binary file representing the Staging Area
└── config      # Repository-specific settings

The most important folder here is .git/objects. This is Git's internal database. Every time you save a version of your project, Git writes the data into this folder.


📸 Snapshots, Not Deltas

Before we look at the objects inside that database, we need to correct a massive misconception.

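Many developers think Git tracks changes by storing deltas (e.g., "Line 4 changed from 'A' to 'B'"). That is not Git's model. Each commit records a snapshot of your entire project at that exact moment in time. If a file hasn't changed, Git doesn't store a new copy; the new snapshot simply references the same stored object as before. (Git does apply delta compression later, when it packs objects into packfiles, but that is a storage optimization and not how history is defined.) Because every commit already describes the complete state of the project, Git can switch branches without replaying a chain of changes, which is a large part of why checkouts are fast.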
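To make the snapshot idea concrete, here is a minimal Python sketch of a toy content-addressed store. It is not Git's actual code: the names content_id and take_snapshot are made up for illustration, and the ID is a plain SHA-1 of the bytes (real Git also hashes a small header).

```python
import hashlib

def content_id(data: bytes) -> str:
    """Toy content ID: plain SHA-1 of the bytes (an assumption, simpler than real Git)."""
    return hashlib.sha1(data).hexdigest()

def take_snapshot(store: dict, files: dict) -> dict:
    """Store each distinct content once and return a path -> ID map for this snapshot."""
    snapshot = {}
    for path, data in files.items():
        oid = content_id(data)
        store.setdefault(oid, data)  # content that is already stored is not written again
        snapshot[path] = oid
    return snapshot

store = {}
v1 = take_snapshot(store, {"index.html": b"<h1>Hi</h1>", "app.js": b"console.log(1)"})
v2 = take_snapshot(store, {"index.html": b"<h1>Hi</h1>", "app.js": b"console.log(2)"})

print(v1["index.html"] == v2["index.html"])  # True: unchanged file, same ID in both snapshots
print(len(store))                            # 3 stored contents for 4 file entries
```

Each snapshot is a complete description of the project, yet the unchanged index.html costs nothing extra the second time.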


The Holy Trinity of Git Objects

When Git takes these snapshots, it uses three fundamental types of objects. Every snapshot in your repository's history is built from these three things (Git has a fourth object type, the annotated tag, which we'll set aside here):

1. The Blob (Binary Large Object)

A blob stores the content of your file. That's it. It does not store the file name, the creation date, or the author. It only cares about the data inside the file (like your HTML or JavaScript code).

2. The Tree

If a blob is the file content, the tree is the directory. A tree object stores file names, permissions, and pointers to the blobs (or other trees) that belong in that directory.
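As a rough illustration of what a tree holds, the Python sketch below serializes a list of entries in the layout Git uses for tree objects: the mode and name as text, a NUL byte, then the 20 raw bytes of the entry's object ID. The helper names hash_object and tree_body are hypothetical, and the sorting is simplified (real Git compares directory names as if they ended with "/").

```python
import hashlib

def hash_object(kind: str, body: bytes) -> str:
    """Object ID: SHA-1 over '<kind> <size>', a NUL byte, then the body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries):
    """Serialize (mode, name, object_id) entries; hypothetical helper, simplified sorting."""
    body = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1]):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
    return body

blob_id = hash_object("blob", b"hello world\n")
body = tree_body([("100644", "hello.txt", blob_id)])  # 100644 = regular file

print(hash_object("tree", body))  # ID of a directory holding one file
print(hash_object("tree", b""))   # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
```

Notice that the file name lives only in the tree entry. The blob it points to knows nothing about what it is called.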

3. The Commit

A commit is simply a wrapper around a tree. It provides the metadata: who made the save, when they made it, the commit message, and a pointer to the parent commit (two or more for a merge, none for the very first commit) so Git can track the timeline.
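A commit object is stored as plain text, which the Python sketch below imitates. Treat it as an approximation under stated assumptions: the author, email, and timestamps are invented, the timezone is fixed at +0000, and commit_body is a hypothetical helper, not a Git API.

```python
import hashlib

def hash_object(kind: str, body: bytes) -> str:
    """Object ID: SHA-1 over '<kind> <size>', a NUL byte, then the body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree_id, parent_ids, author, timestamp, message):
    """Build the text of a commit: tree, parents, author, committer, blank line, message."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parent_ids]  # no parent lines for a first commit
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of a tree with no entries
jane = "Jane Doe <jane@example.com>"                     # hypothetical author

first = commit_body(EMPTY_TREE, [], jane, 1700000000, "Add homepage")
first_id = hash_object("commit", first)
second = commit_body(EMPTY_TREE, [first_id], jane, 1700000060, "Update homepage")

print(first.decode())
print(second.decode().splitlines()[1])  # the "parent <id>" line linking back to the first commit
```

Because the parent's ID is part of the text being hashed, each commit's ID depends on everything that came before it.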

Diagram: How They Connect

Here is the mental model of how a single commit looks under the hood:

Plaintext

[ Commit Object ]
 │  - Author: Jane Doe
 │  - Message: "Add homepage"
 │  - Parent: (Previous Commit)
 │
 └──> Points to -> [ Tree Object ] (Represents the root folder)
                     │
                     ├── "index.html" ----> [ Blob Object ] (HTML content)
                     └── "src/" ----------> [ Tree Object ] (src folder)
                                              └── "app.js" -> [ Blob Object ]

Hashes: The Glue Holding It Together

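You might be wondering: How does Git name these objects in the database? By default, Git uses a hashing algorithm called SHA-1 (newer versions can also create repositories that use SHA-256). Whenever Git saves an object (a blob, tree, or commit), it hashes a short header (the object's type and size) followed by the content, and generates a 40-character hexadecimal string (e.g., a1b2c3d4e5f6...). That string is the object's name in the database.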

  • Integrity: Because the hash is generated from the content, if even a single space is altered in your code, the resulting hash will be completely different. And because each commit's hash covers its tree and its parent, changing anything in the past changes the hash of every commit that follows, so history cannot be altered without the change being visible.

  • Storage: This is also how Git avoids duplicating files. If you have two identical images in different folders, their content is the same, so their SHA-1 hash is the same. Git only stores one Blob in the .git/objects folder!
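Both properties are easy to check with a few lines of Python. The sketch below computes a blob's ID the way Git does for SHA-1 repositories: it hashes the header "blob <size>", a NUL byte, and then the content. The function name git_blob_id and the sample file contents are just illustrative.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 over 'blob <size>', a NUL byte, and the file content."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# Same result as: echo "hello world" | git hash-object --stdin
print(git_blob_id(b"hello world\n"))   # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# Integrity: one extra space produces a completely different ID
print(git_blob_id(b"hello world \n"))

# Storage: identical bytes hash to the same ID no matter where the file lives
logo_a = git_blob_id(b"same bytes")    # imagine images/logo.png
logo_b = git_blob_id(b"same bytes")    # imagine assets/logo.png
print(logo_a == logo_b)                # True
```

The file's path never enters the hash, which is exactly why two copies of the same image collapse into a single blob.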


The Internal Flow: git add and git commit

Let's tie it all together by watching what happens in the .git folder when you run your daily commands. Imagine you just created a new file called style.css.

Step 1: git add style.css

When you add a file to the staging area, Git goes to work immediately:

  1. It reads the content of style.css.

  2. It generates a SHA-1 hash based on that content.

  3. It creates a Blob object and saves it in the .git/objects folder.

  4. It updates the .git/index file (the staging area) to say, "Hey, the file style.css now points to this new Blob."
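The four steps above can be approximated in Python. This sketch writes a loose object the way Git lays it out on disk (zlib-compressed, in a subfolder named after the first two hex characters of the hash), but it works in a scratch directory instead of a real repository, and it uses a plain dict as a stand-in for .git/index, which is really a binary file. The names write_blob and toy_index are hypothetical.

```python
import hashlib
import os
import tempfile
import zlib

def write_blob(git_dir: str, content: bytes) -> str:
    """Hash the content as a blob and save it as a loose object under objects/."""
    data = b"blob " + str(len(content)).encode() + b"\x00" + content
    oid = hashlib.sha1(data).hexdigest()
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    if not os.path.exists(path):  # identical content is already stored
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return oid

git_dir = os.path.join(tempfile.mkdtemp(), ".git")  # scratch directory, not a real repo
toy_index = {}                                      # stand-in for the staging area

css = b"body { margin: 0; }\n"
toy_index["style.css"] = write_blob(git_dir, css)   # steps 1-4 in one line

# Read the object back to confirm what was stored
oid = toy_index["style.css"]
with open(os.path.join(git_dir, "objects", oid[:2], oid[2:]), "rb") as f:
    raw = zlib.decompress(f.read())
print(raw.split(b"\x00", 1)[1] == css)  # True: header, NUL byte, then the original content
```

The key point is that the blob exists in the object database as soon as the file is staged, before any commit is made.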

Step 2: git commit -m "Add styles"

When you commit, Git finalizes the snapshot:

  1. It looks at the staging area (.git/index).

  2. It generates a Tree object that maps the name style.css to the Blob created in Step 1.

  3. It creates a Commit object containing your name, the message "Add styles", and a pointer to that new Tree.

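  4. Finally, it moves your current branch pointer (the file under .git/refs/heads/ that HEAD refers to) to this brand-new Commit object. HEAD itself keeps pointing at the branch; only in a detached HEAD state does HEAD store the commit hash directly.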
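The last step is worth a small sketch, because it explains what a detached HEAD is. In the common case HEAD is a text file naming a branch, and it is the branch file that receives the new commit ID. The Python below mimics that in a scratch directory; the function names read_head and advance are hypothetical, the commit ID is a placeholder, and real Git may also keep refs in a packed-refs file, which this ignores.

```python
import os
import tempfile

def read_head(git_dir: str) -> str:
    """Return the raw contents of the HEAD file."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        return f.read().strip()

def advance(git_dir: str, commit_id: str) -> None:
    """Record a new commit: move the branch HEAD names, or HEAD itself if detached."""
    head = read_head(git_dir)
    if head.startswith("ref: "):
        target = os.path.join(git_dir, *head[len("ref: "):].split("/"))
    else:
        target = os.path.join(git_dir, "HEAD")  # detached: HEAD holds a commit ID
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w") as f:
        f.write(commit_id + "\n")

git_dir = os.path.join(tempfile.mkdtemp(), ".git")  # scratch directory, not a real repo
os.makedirs(git_dir)
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/main\n")

new_commit = "a" * 40  # placeholder for a real 40-character commit ID
advance(git_dir, new_commit)

print(read_head(git_dir))  # ref: refs/heads/main
with open(os.path.join(git_dir, "refs", "heads", "main")) as f:
    branch_value = f.read().strip()
print(branch_value == new_commit)  # True: the branch moved, HEAD still names the branch
```

A branch, in other words, is nothing more than a small file containing one commit ID.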

Diagram: The Data Flow

Plaintext

Working Directory        Staging Area (.git/index)         Local Repo (.git/objects)
    [style.css]  ──────>  (Tracks file name + Hash) ─────>  [Blob created immediately]
    (git add)                                                
                                                                        │
                                                                        │
                                   (git commit)                         │
                                         └────────────────>  [Tree & Commit created]

Summary

Git is entirely predictable once you know how it stores data.

  • The .git folder is a database.

  • Blobs hold content.

  • Trees hold folder structures and file names.

  • Commits hold metadata and tie the timeline together.

  • SHA-1 hashes name every object, so accidental corruption is detectable and identical content is stored only once.

Next time you type git commit, picture the blobs, trees, and commits linking together in your .git/objects folder. You aren't just typing commands anymore; you're manipulating a deeply elegant file system.
