How To Dissect Open Source Software

Step 0

You need a working knowledge of the language(s) the software is written in and its dev/deployment ecosystem. E.g. if you're dealing with a Python project, you should know what a requirements.txt file is.

You have no business dissecting a project if you can't even program in the language it's written in.

Step 1

You need to be able to build and/or run the copy of the software that you cloned from the hosted git repo. If the project doesn't have clear instructions on how to run it, you make your best guess and keep trying till it works.

This ensures that all subsequent work is done on valid code. In most cases, you should also note the exact commit. My preference is to check out the commit that ran successfully into a new branch and do all dissection work there.
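Here's a rough sketch of how that pinning could be scripted in Python. The helper names (`git`, `start_dissection_branch`) and the default branch name `dissect` are my own placeholders, not something the project will give you; the only real commands involved are `git rev-parse HEAD` and `git checkout -b`.

```python
import subprocess


def git(*args, cwd="."):
    """Run a git command inside `cwd` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def start_dissection_branch(repo_dir, branch="dissect"):
    """Record the commit that is currently checked out and branch off it.

    `branch` is a hypothetical name; pick whatever suits you.
    """
    commit = git("rev-parse", "HEAD", cwd=repo_dir)      # the exact commit that ran
    git("checkout", "-b", branch, commit, cwd=repo_dir)  # do all dissection work here
    return commit
```

Calling `start_dissection_branch("path/to/clone")` right after the first successful run returns the commit hash, which you can paste at the top of your notes.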

Step 2

You need to identify and understand the main execution flow for the program. This will include identifying the entry point in the code.

This is a very important step.
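For a Python project, one cheap way to shortlist entry-point candidates is to look for modules with a top-level `if __name__ == "__main__":` guard. The sketch below does that with the standard `ast` module. The function names are made up for illustration, and it assumes every `.py` file parses cleanly. It also won't catch entry points declared elsewhere (for example in packaging metadata), so treat the output as a starting list, not the answer.

```python
import ast
from pathlib import Path


def has_main_guard(source):
    """Return True if the source has a top-level `if __name__ == "__main__":`."""
    for node in ast.parse(source).body:
        if not isinstance(node, ast.If):
            continue
        test = node.test
        if (
            isinstance(test, ast.Compare)
            and isinstance(test.left, ast.Name)
            and test.left.id == "__name__"
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
            and isinstance(test.comparators[0], ast.Constant)
            and test.comparators[0].value == "__main__"
        ):
            return True
    return False


def find_entry_candidates(root):
    """List .py files under `root` (relative paths) that have a main guard."""
    return sorted(
        path.relative_to(root).as_posix()
        for path in Path(root).rglob("*.py")
        if has_main_guard(path.read_text(encoding="utf-8"))
    )
```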

Generally speaking, there are two types of execution flow: Direct and Open-Ended.

Most Command-Line Interface (CLI) programs have direct execution flow, i.e. you supply arguments, call the program and get a result.

Most GUI applications have Open-Ended execution flow. You continuously interact with the program until an exit signal reaches the main thread/process (triggered by the user or by another part of the program).
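To make the difference concrete, here's a toy sketch of both shapes in Python. Everything in it is invented for illustration: `direct_flow` stands in for a CLI program, and `open_ended_flow` fakes a GUI event loop by iterating over a list of event names where a real toolkit would block waiting for input.

```python
import argparse


def direct_flow(argv):
    """Direct: parse the arguments, compute once, return the result."""
    parser = argparse.ArgumentParser(prog="add")
    parser.add_argument("numbers", type=int, nargs="+")
    args = parser.parse_args(argv)
    return sum(args.numbers)


def open_ended_flow(events):
    """Open-ended: keep handling events until an exit event shows up."""
    handled = []
    for event in events:  # a real GUI would wait here for the next event
        if event == "quit":
            break  # the exit signal ends the loop
        handled.append(f"handled {event}")
    return handled
```

`direct_flow(["1", "2", "3"])` runs once and returns a value. `open_ended_flow(["click", "key", "quit", "click"])` keeps going until it sees `"quit"` and never reaches the final click. When you read a real codebase, finding that loop (or noticing there isn't one) tells you which kind of program you're dealing with.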

Step 3

Recursively document each section of the overall software.

These are the divisions I like to use:

1. Folder

2. File(Module)

3. Function

4. Line

I'll reiterate that it's important to understand the ecosystem of the language because it will give you insight into why projects are structured the way they are (assuming best practices are being followed).

In my experience, documenting up to file(module) level is sufficient for a high-level understanding of the software. If you want to contribute, you have to go down to the function level (or even the line level if you're feeling frisky :-) ).
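If the project is in Python, a skeleton for the folder, file and function levels can be generated instead of typed by hand. This is only a sketch: `outline` is a name I made up, it lists top-level functions only (no classes or methods), and it assumes every file parses. You still have to write the actual notes yourself.

```python
import ast
from pathlib import Path


def outline(root):
    """Map each .py file under `root` to the names of its top-level functions."""
    result = {}
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        result[path.relative_to(root).as_posix()] = [
            node.name
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        ]
    return result
```

The keys give you the folder and file levels (the relative path), and the values give you the function level to fill in as you go.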

Should you go down to the function/line level for the entire codebase?

No.

You only need to go to the function and/or line level for the module that contains the code you need to understand. This may lead to more line-level documentation of other modules. Once you start, you'll get a better idea of what else you'll need to look at, and in some cases you may even realize that you're looking at the wrong code.

Step 4

Test your understanding. This step and Step 3 above go hand-in-hand. You should continuously be adding little bits of code here and there and checking the effect of your changes. This ensures that the mental model you're developing for the codebase is correct.

You repeat these two steps ad nauseam until you're confident in your understanding of the codebase.
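One of the "little bits of code" I mean is a tracing wrapper: wrap a function you think you understand, run the program, and compare what was logged with what you predicted. A minimal sketch follows. `traced` is my own helper and `normalize` is a hypothetical stand-in for a function from the project under study.

```python
import functools


def traced(func, log):
    """Wrap `func` so each call appends (name, args, kwargs, result) to `log`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        log.append((func.__name__, args, kwargs, result))
        return result
    return wrapper


# Hypothetical function standing in for code from the project.
def normalize(name):
    return name.strip().lower()


calls = []
normalize = traced(normalize, calls)  # temporary change, made on the dissection branch
normalize("  Alice ")
```

If `calls` doesn't contain what you expected, your mental model is off somewhere, and that's exactly what you want to find out early.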

Step 5 (Bonus Step)

Dissect non-code files. This includes things like package files, dependency files, Makefiles, deployment scripts etc. It won't give you any direct understanding of the code itself, but you'll understand the ecosystem within which the code is developed, and in most cases you'll pick up some new ideas and techniques.
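As a small example, here's a sketch that pulls package names and version specifiers out of a requirements.txt so you can see at a glance what the project leans on. It's deliberately naive: `parse_requirements` is my own function, it only handles plain `name<specifier>` lines, and it skips option lines such as `-r` or `-e`. It is not a full parser for the format.

```python
import re


def parse_requirements(text):
    """Return (name, specifier) pairs from simple requirements.txt lines."""
    pairs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and option lines
            continue
        match = re.match(r"([A-Za-z0-9._-]+)\s*(.*)", line)
        if match:
            pairs.append((match.group(1), match.group(2).strip()))
    return pairs
```

Feeding it the contents of the file gives you a list like `[("somepackage", ">=1.0"), ("otherpackage", "")]`, which is a decent checklist of libraries to read up on.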

That's all there is to it. Now go out there and do some dissecting.