Automate Remote Git Cleanup

Cleanup of stale branches from remote Git Repository

In the previous article, we discussed how to clean up a local Git repository with a PowerShell script. That script can be scheduled to run at regular intervals, which saves us some time.

In this article, let’s discuss how to automate the cleanup of stale branches in a remote Git repository.

Brief Overview of Git Hosting

Git is the most widely used version control system nowadays, and it is quickly becoming the standard for version control.

Git is a distributed version control system. What does this mean? It means that when you clone a repository, you get a local copy of the code, and this local copy is a complete version control repository in itself. You can create branches in your local repository, commit your work to any of them, and view the history of those local commits. When you sync a local branch, its changes are pushed to the remote server. These fully functional local repositories make it easy to work offline or remotely.

That was a short intro to Git, in case you have not used it yet. Generally, when we start a project, we pick a service that hosts the Git server. For example,

  • Azure Repos (comes as part of Azure DevOps) is a set of version control tools that you can use to manage your code.
  • GitHub is another service that can be used to host the Git server.
  • GitLab, Bitbucket and AWS CodeCommit are some of the other well-known Git hosting services.

Whether your software project is large or small, using version control as soon as possible is a good idea.

The Problem Statement

As stated earlier, Git is a distributed version control system. The remote repository is hosted on a server, usually with a cloud hosting provider, while the local repository lives on our machine. Branches can be created on the remote as well as locally, which means that both the local repository and the remote repository can accumulate stale branches.

The local repository is on each developer’s machine, and it is that individual’s responsibility (and headache) to ensure that stale branches are cleaned up regularly. We have already seen how to automate the cleanup of local branches in the previous article. In this article, we are going to look at how the remote repository can be cleaned up and whether this task can be automated.

This article provides a generic approach that can be used with any Git hosting provider. In addition, there are separate sections on Azure DevOps and GitHub to see what options these two hosting providers offer. Other Git hosting providers may offer similar options, but the purpose of this article is to give you a basic idea of the options at hand. Once you have that idea, it will be easier to look for options specific to your hosting provider.

Why do we need to clean up remote branches?

Cleaning up remote branches in Git is a good practice for every project, as it helps keep your repository tidy and organized. Here are some reasons why you may want to clean up remote branches:

  • Performance: Having too many remote branches can slow down your git commands, especially when you fetch or pull changes from the remote repository. By deleting the branches that you don’t need anymore, you can reduce the network traffic and speed up your operations.
  • Clarity: Having too many remote branches can also make your repository hard to understand and maintain, especially if some of them have ambiguous or outdated names. By deleting the branches that you no longer care about, you can make your repository clearer and more consistent.

What are the options in Azure DevOps (i.e. Azure Repos)?

Azure DevOps is a widely used Git hosting provider. When we open a remote Git repository hosted in Azure DevOps, we can go to the Branches page, which shows three tabs: Mine, All and Stale. The tab names are self-explanatory. The Stale tab shows the list of stale branches. Azure DevOps considers a branch stale if it has had no commits for three months or longer.

I could not find any ready-to-use pipeline task to delete stale branches. Please let me know in the comments if you know of one.

Azure DevOps – Stale Branches View

What are the options in GitHub?

GitHub is another Git hosting provider, which is free to use and hosts numerous open source projects. Here as well, when we click on the View All Branches option, we see 4 different tabs with self-explanatory names. If we select the Stale tab, we can see all the stale branches. The question is, how does GitHub decide which branch is stale and which is not?

As per the documentation, a branch is considered stale if it has had no commits in the past 3 months. Another interesting thing I observed is that a GitHub Action called Stale Branches is available. It has two settings, days-before-stale and days-before-delete.

  • If a branch is inactive for more than days-before-stale days, a new issue is created with the branch name in its title, stating that the branch is stale.
  • If a stale branch stays inactive for more than days-before-delete days, it is deleted.

GitHub – Stale Branches View
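
The two thresholds described above can be modelled in a few lines of Python. This is only a sketch of the decision logic as this article describes it, not the action's actual implementation; the function name `classify_branch` and the example values 120 and 180 are assumptions made for illustration.

```python
def classify_branch(days_inactive: int, days_before_stale: int, days_before_delete: int) -> str:
    """Classify a branch by how long it has been inactive.

    Models the two settings described above: past the first threshold the
    branch is flagged as stale, past the second it is deleted.
    """
    if days_inactive > days_before_delete:
        return "delete"
    if days_inactive > days_before_stale:
        return "stale"
    return "active"


# Hypothetical thresholds: flag after 120 days, delete after 180 days.
for days in (30, 150, 200):
    print(days, classify_branch(days, days_before_stale=120, days_before_delete=180))
```

Keeping the two thresholds apart gives the team a window between the warning and the deletion in which someone can still claim the branch.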

What are some approaches to automate this cleanup?

Now we know the definition of a stale branch, as given in the documentation of the Git hosting providers mentioned above: any branch that has had no commits for more than 3 months.

There are a few approaches that we can use to automate the deletion of stale branches from a remote repository:
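
Expressed as code, this definition is a simple date comparison. The sketch below is in Python and assumes that "3 months" can be approximated as 90 days; the name `is_stale` and the sample dates are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "three months" is approximated as 90 days.
STALE_AFTER = timedelta(days=90)


def is_stale(last_commit: datetime, now: datetime, threshold: timedelta = STALE_AFTER) -> bool:
    """Return True when the newest commit on a branch is older than the threshold."""
    return now - last_commit > threshold


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(is_stale(datetime(2024, 1, 15, tzinfo=timezone.utc), now))  # True: 138 days old
print(is_stale(datetime(2024, 5, 20, tzinfo=timezone.utc), now))  # False: 12 days old
```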

  • Firstly, check whether there is a ready-to-use pipeline extension / action for cleaning up stale branches, and use it if so.
  • Secondly, we can write a script (PowerShell, Bash or Azure CLI) that gets the list of stale branches and then deletes them. In this approach, we blindly assume that a stale branch will never be needed in future and hence can be deleted. This script can also be included in a pipeline, and the pipeline can then be scheduled to run at regular intervals.
  • The third option is a more manual approach. Instead of assuming that a stale branch can be deleted, we can get the list of those branches and discuss it with the team. It can be done in two steps:
    • Stale branches which are merged: a list of all such branches can be compiled, and then we can ask team members (i.e. nag them) to delete those branches.
    • Stale branches which are not merged yet: as the second step, we can get the list of stale branches which were not merged to the main branch. Here we may not be able to identify the person who created the branch. We may be able to identify the person who made the latest commit, but that commit may have come in through a merge from another branch. Hence these stale branches may need longer discussions with the team to decide what action to take.
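
The two lists in the third approach can be derived from the output of `git branch -r --merged origin/main`, which prints the remote-tracking branches already merged into `origin/main`. Here is a minimal Python sketch (Python 3.9+), assuming the main branch is `origin/main` and that the stale branch names have already been collected. The helper names and the sample branch names are hypothetical.

```python
def parse_remote_branches(git_output: str) -> set[str]:
    """Parse `git branch -r` style output into a set of branch names."""
    branches = set()
    for line in git_output.splitlines():
        name = line.strip()
        # Skip blank lines and the symbolic "origin/HEAD -> origin/main" entry.
        if not name or "->" in name:
            continue
        branches.add(name)
    return branches


def split_by_merge_status(stale: list[str], merged_output: str,
                          main_branch: str = "origin/main") -> tuple[list[str], list[str]]:
    """Split stale branches into (merged, not merged) relative to the main branch."""
    merged = parse_remote_branches(merged_output) - {main_branch}
    candidates = [b for b in stale if b != main_branch]
    merged_stale = sorted(b for b in candidates if b in merged)
    unmerged_stale = sorted(b for b in candidates if b not in merged)
    return merged_stale, unmerged_stale


# Sample text in the shape printed by `git branch -r --merged origin/main`.
sample = "  origin/HEAD -> origin/main\n  origin/main\n  origin/feature/login\n"
merged, unmerged = split_by_merge_status(
    ["origin/feature/login", "origin/feature/report"], sample)
print("merged:", merged)        # merged: ['origin/feature/login']
print("not merged:", unmerged)  # not merged: ['origin/feature/report']
```

The first list can go straight to the team as a deletion reminder, while the second is the one that needs discussion.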

We can extend any of these approaches by laying down certain guidelines for the teams. For example, we can ask the team to follow a naming guideline that includes the user story number in the branch name. Then, in the stale branch cleanup script, we can use that number to make an API call to the product backlog and check whether the user story is marked as done. If it is done, we can delete the stale branch blindly.

Which approach is best?

Personally, I prefer to use a feature toggle service and short-lived branches. Feature toggles enable short-lived branches because we can commit in-progress code and keep it hidden behind a toggle. That helps teams reduce code review time as well. Another advantage is that if we follow this ‘short-lived branches’ approach strictly, we can blindly delete stale branches if there are any.

Also, three months is a long time. One may not remember why a certain branch was created. Or the creator may have left the organization, in which case no one may come forward and take the “risk” of deleting the stale branch. In any case, if discussions or nagging are needed before stale branches can be deleted, I think that would most probably delay the cleanup and thereby increase the project cost. For these two reasons, I always feel the team should opt for strict automated cleanup (of course, it has to be a team decision).
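
As a rough illustration of this idea, the Python sketch below extracts the story number from a branch name and asks a lookup function whether that story is done. Everything specific here is an assumption: the `feature/<number>-<description>` naming convention, the helper names, and the `is_story_done` callback, which stands in for whatever API your product backlog tool exposes.

```python
import re
from typing import Callable, Optional

# Hypothetical naming convention: feature/<story number>-<description>,
# for example feature/1234-login-page or bugfix/987-fix-typo.
STORY_PATTERN = re.compile(r"^(?:feature|bugfix)/(\d+)-")


def story_number(branch: str) -> Optional[int]:
    """Extract the user story number from a branch name, or None if absent."""
    match = STORY_PATTERN.match(branch)
    return int(match.group(1)) if match else None


def safe_to_delete(branch: str, is_story_done: Callable[[int], bool]) -> bool:
    """A stale branch is safe to delete only if its story exists and is done."""
    number = story_number(branch)
    return number is not None and is_story_done(number)


# Stand-in for the backlog API call: a fixed set of finished stories.
done_stories = {1234}
print(safe_to_delete("feature/1234-login-page", done_stories.__contains__))  # True
print(safe_to_delete("feature/5678-reporting", done_stories.__contains__))   # False
print(safe_to_delete("experiment", done_stories.__contains__))               # False
```

Branches that do not follow the naming guideline are never treated as safe, so they fall back to the manual discussion.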

Demo: Script for Cleanup

Enough of the theory! Now, let’s talk about the code to automate the cleanup.

The code snippet given below can be used to clean up all stale branches, i.e. those which have not had any commits for 3 months. The script takes a parameter, is_dry_run, which defaults to true. If this parameter is true, no deletion happens, but you get to see the branches which would be deleted. If it is set to false, the actual deletion happens. Use the dry run first to make sure you are not deleting any required or not-yet-merged branches.

The script given below does not consider whether the stale branch was merged or not. We can add that condition as well if needed.

Also, note that this script is meant to be executed from a local Git repository cloned from the remote, so it is not ideal for execution via a pipeline. If you want to integrate with a pipeline, check the options specific to your Git hosting provider. For example, if you are using GitHub, you can use the Stale Branches action. If you are using Azure DevOps, you can try scripting with the Azure CLI and then use the script in a pipeline PowerShell task.

NOTICE: Running this script will delete branches on the remote. Hence it should be executed only if you are sure about what you are doing. Also, this script is just meant to give you a quick start on the automation task. I am not responsible if you run this script and accidentally delete required (not-merged) branches.

Conclusion

Initially, I thought this would be just a short article, but I ended up writing a lot here!

In this article, we have seen how to get rid of unwanted remote branches in Git using PowerShell. There are other approaches on the internet as well, and every approach has its pros and cons. Choose the one that suits your needs and preferences.

I hope you find this article informative and helpful. Let me know your thoughts.
