The three trees
In computing, trees are data structures used to represent files and directories. You can think of them as a mapping between file paths and their contents.
There are three important trees [1] in Git: working directory, staging area and repository [2]. Learning what they are and how different commands interact with them is key to using Git effectively.
Definitions
Working directory
As the name suggests, the working directory (a.k.a. working tree) is where you do work. It's what you interact with by using a text editor or IDE to change files.
When you create a new file in your working directory, it's initially considered untracked: Git doesn't know about it yet. At this point, you can either choose to track it (by staging it with Add and then committing it) or ignore it (by adding it to .gitignore).
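As a minimal sketch of the two options, assuming a throwaway repository (the file names notes.txt and debug.log are placeholders chosen for illustration):

```shell
# Throwaway repository; the path and file names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "notes" > notes.txt
echo "debug" > debug.log

# Both files start out untracked.
git status --short

# Option 1: ignore every .log file.
echo "*.log" > .gitignore

# Option 2: start tracking notes.txt by staging it.
git add notes.txt

tracked=$(git ls-files)                                 # files Git now knows about
untracked=$(git ls-files --others --exclude-standard)   # untracked and not ignored
echo "tracked: $tracked"
echo "untracked: $untracked"
```

In this sketch, .gitignore itself still shows up as untracked until you stage it too.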
Staging area
The staging area (a.k.a. index) is used in preparation for a commit. You can incrementally move changes from the working directory to the staging area, and then persist them by creating a commit.
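To illustrate the incremental part, here is a small sketch that stages only one of two modified files before committing. The repository, file names and identity are assumptions made up for the example:

```shell
# Throwaway repository; names and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Your Name"

echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "Add a and b"

# Change both files, but stage only one of them.
echo "a2" >> a.txt
echo "b2" >> b.txt
git add a.txt
git commit -q -m "Update a"

# The change to b.txt was never staged, so it stays in the working directory.
remaining=$(git diff --name-only)
committed_b=$(git show HEAD:b.txt)
echo "still modified: $remaining"
```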
Repository
The repository is used to store commits, which are curated snapshots of your files. By using references (e.g. branches, tags, HEAD) and/or revisions (e.g. HEAD~1, main@{2.weeks.ago}), you can select a specific commit and interact with the files at that point in time. If no references are specified, HEAD (the latest commit in the current branch) is usually assumed.
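As a rough sketch of selecting commits this way, the following reads one file at different points in history. The repository, the file name file.txt and the tag v1 are all assumptions for the example:

```shell
# Throwaway repository; names, tag and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Your Name"

echo "one" > file.txt
git add file.txt
git commit -q -m "First"

echo "two" > file.txt
git commit -q -am "Second"

# A tag is a reference that points at a fixed commit.
git tag v1 HEAD~1

latest=$(git show HEAD:file.txt)       # file as of the latest commit
previous=$(git show HEAD~1:file.txt)   # file one commit earlier
tagged=$(git show v1:file.txt)         # file at the tagged commit
echo "$latest $previous $tagged"
```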
Comparing and synchronizing trees
On a clean checkout, a tracked file is the same across the three trees:
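One way to see this for yourself, sketched in a throwaway repository (file.txt and the identity are placeholders), is to read the same file from each tree:

```shell
# Throwaway repository; names and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Your Name"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file.txt"

working=$(cat file.txt)               # working directory
staged=$(git show :file.txt)          # staging area
committed=$(git show HEAD:file.txt)   # repository (latest commit)
echo "$working / $staged / $committed"

# Nothing to report, because the three trees agree.
status=$(git status --short)
```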
If you change that file, you can compare the working directory to the staging area by using Diff:
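A minimal sketch of that comparison, again assuming a throwaway repository with a placeholder file.txt:

```shell
# Throwaway repository; names and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Your Name"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file.txt"

# Change the file in the working directory only.
echo "world" >> file.txt

# Working directory vs. staging area: shows the added line.
git --no-pager diff

unstaged=$(git diff --name-only)
echo "changed but not staged: $unstaged"
```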
If you then use Add to apply the change you made to the staging area, Diff would show no differences, because the working directory and the staging area are now in sync. However, you can compare the staging area to the repository by specifying the --staged flag:
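Sketched in a throwaway repository (file.txt and the identity are placeholders), the two comparisons look like this once the change is staged:

```shell
# Throwaway repository; names and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Your Name"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file.txt"

# Change the file, then stage the change.
echo "world" >> file.txt
git add file.txt

# Working directory vs. staging area: no differences.
unstaged=$(git diff --name-only)

# Staging area vs. repository: the staged change shows up here.
staged=$(git diff --staged --name-only)
git --no-pager diff --staged

echo "unstaged: '$unstaged', staged: '$staged'"
```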
At this point, using Commit promotes files in the staging area to a new commit in the repository, synchronizing the three trees again:
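As a sketch of that last step, assuming the same kind of throwaway repository with a placeholder file.txt:

```shell
# Throwaway repository; names and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Your Name"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file.txt"

# Change, stage, and commit.
echo "world" >> file.txt
git add file.txt
git commit -q -m "Add world"

# Both comparisons come back empty once the trees are in sync.
unstaged=$(git diff --name-only)
staged=$(git diff --staged --name-only)
commits=$(git rev-list --count HEAD)
echo "commits: $commits"
```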
Files can also be synchronized in the opposite direction by using Reset. The --mixed flag discards changes in the staging area to match the repository, while the --hard flag discards changes in both the staging area and the working directory.
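The difference between the two flags can be sketched as follows, in a throwaway repository with a placeholder file.txt. Be careful with --hard outside of an experiment like this one, since it throws away uncommitted work:

```shell
# Throwaway repository; names and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Your Name"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file.txt"

# Make a change and stage it.
echo "world" >> file.txt
git add file.txt

# --mixed: the staging area matches HEAD again, the working directory keeps the change.
git reset -q --mixed HEAD
staged_after_mixed=$(git diff --staged --name-only)
unstaged_after_mixed=$(git diff --name-only)

# --hard: the working directory is reset as well.
git reset -q --hard HEAD
content_after_hard=$(cat file.txt)

echo "after --mixed, unstaged: $unstaged_after_mixed"
echo "after --hard, content: $content_after_hard"
```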
Untracked files
By default, Diff only compares tracked files across trees. If you want it to include new files as well, you must first use Add with the -N/--intent-to-add flag.
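Here is a small sketch of the before and after, assuming a throwaway repository and a placeholder file called new.txt:

```shell
# Throwaway repository; names and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Your Name"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file.txt"

# A brand new, untracked file is invisible to a plain diff.
echo "draft" > new.txt
before=$(git diff --name-only)

# Record the intent to add it, without staging its content.
git add -N new.txt
after=$(git diff --name-only)

echo "before: '$before', after: '$after'"
```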
Going beyond
Many other commands in Git interact with at least one of the three trees. Can you guess how other common commands (e.g. Checkout, Clean, Stash, Merge, Rebase) interact with each tree? Let me know in the comments!
[1] Trees are technically a type of object in Git, and can be composed of blobs (i.e. files) and other trees. This article covers root trees, but that modifier is omitted for simplicity.
[2] Although the repository isn't actually a tree (for the purists, it's a directed acyclic graph), terminology such as branches and trunk has historically been used to describe it. For correctness, the repository contains commits, and each commit is associated with a tree. Also, “two trees and a graph” isn't as good a title.