Monday, September 28, 2020

Stale branches cleanup in Git repo

As code development moves forward, collaboration and experimentation flourish, and developers join and leave the team, Git repos start to accumulate stale branches. There is no exact definition of a "stale branch", but both Azure DevOps and GitHub have a Stale branches view. This view displays "... branches in the repo that haven't had any commits in three months or longer". There are many reasons why branches become stale, and eventually there will be a lot of them.


Apart from "polluting" the repo and making it harder to find branches, this situation has another side effect. When CI tools run pipelines, the worker machines (agents) have to clone the repo on each run. During cloning, Git creates reference files for the branches in the local folder .git/refs/remotes/origin. This translates into a lot of small IO operations that affect pipeline execution time.

Manual cleanup of stale branches can be a tedious process, especially when the repo has tens, hundreds, or even thousands of such branches. Below is a simple PowerShell script that helps automate the process.

$TTL = 90 #days
$borderTime = (Get-Date).AddDays(-$TTL)
git fetch origin
$remoteBranches = git branch -a | Where-Object {$_ -like '*remotes/origin/*'} | ForEach-Object {$_.Trim()}
$remoteBranches = $remoteBranches | Where-Object { ($_ -notlike 'remotes/origin/HEAD*') `
                                              -and ($_ -ne 'remotes/origin/master') }
foreach($branch in $remoteBranches){
    $branchName = ($branch.Split('/', 3))[2]
    $branchSHA = git rev-parse origin/$branchName
    $branchLastUpdate = [DateTime]::Parse($(git show -s --format=%ci $branchSHA))
    if($branchLastUpdate -lt $borderTime)
    {
        Write-Output "git push origin :$branchName"
    }
}

The script needs to be run in the local repo folder. It can be executed as a file or just pasted into a PowerShell console. As output, the script produces a list of "delete branch" git statements (without actually executing them):

git push origin :branch_1
git push origin :branch_2
git push origin :task/xyz
...
git push origin :feature/abc
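
For repos with thousands of branches, the same kind of list can probably be built from a single git call instead of two calls per branch. The following is only a minimal sketch under stated assumptions: the function name Get-StaleBranchCommand and its parameters are hypothetical, and it assumes a Git version that supports the committerdate:unix and refname:lstrip format options of git for-each-ref.

```powershell
# Hypothetical helper (not part of the original script).
# Input lines are expected in the form "<committer unix time> <branch name>".
function Get-StaleBranchCommand {
    param(
        [string[]] $RefLines,
        [datetime] $BorderTime,
        [string[]] $Exclude = @('HEAD', 'master')   # same exclusions as the script's filter
    )
    foreach ($line in $RefLines) {
        # Split only on the first space, branch names never contain spaces
        $unixTime, $branchName = $line.Split(' ', 2)
        if ($branchName -in $Exclude) { continue }
        $lastUpdate = [DateTimeOffset]::FromUnixTimeSeconds([long]$unixTime).UtcDateTime
        if ($lastUpdate -lt $BorderTime.ToUniversalTime()) {
            "git push origin :$branchName"
        }
    }
}

# One git call lists every remote branch with the date of its last commit;
# lstrip=3 drops the leading "refs/remotes/origin/" part of the ref name.
$refLines = git for-each-ref '--format=%(committerdate:unix) %(refname:lstrip=3)' refs/remotes/origin
Get-StaleBranchCommand -RefLines $refLines -BorderTime (Get-Date).AddDays(-90)
```

As with the script above, this only prints the "delete branch" statements and does not execute them.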

The list needs to be reviewed: the branches that have to be preserved must be removed from it. After that, each statement can be executed individually, or all of them at once as a batch. The excluded branches can be added to the script's branch filter (second Where-Object statement, lines 5-6) to reduce review time in the future.
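
One possible way to run the reviewed list as a batch is sketched below. It is an illustration, not a definitive implementation: the function name Invoke-BranchCleanup and the -Execute switch are hypothetical, and the sketch assumes the reviewed statements are passed in as plain strings in the "git push origin :branch" form shown above. Without -Execute it only reports what would be deleted.

```powershell
# Hypothetical helper (not part of the original script).
function Invoke-BranchCleanup {
    param(
        [string[]] $Statements,   # reviewed "git push origin :<branch>" lines
        [switch] $Execute         # without this switch nothing is deleted
    )
    foreach ($statement in $Statements) {
        if ([string]::IsNullOrWhiteSpace($statement)) { continue }
        # Accept only lines that look exactly like the generated statements
        if ($statement.Trim() -notmatch '^git push origin :(\S+)$') {
            Write-Warning "Skipping unexpected line: $statement"
            continue
        }
        $branchName = $Matches[1]
        if ($Execute) {
            git push origin ":$branchName"
        } else {
            "Would delete: $branchName"
        }
    }
}
```

For example, if the reviewed list was saved to a file (stale-branches.txt is an assumed name), Invoke-BranchCleanup -Statements (Get-Content .\stale-branches.txt) prints the dry-run report, and adding -Execute performs the deletions.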
