
Introducing git-xargs: an open source tool to update multiple GitHub repos


Zack Proser

APR 21, 2021 | 9 min read
Today we’re open sourcing git-xargs, a command-line (CLI) tool for making updates across multiple GitHub repositories with a single command. You give git-xargs a script to run and a list of repos, and it checks out each repo, runs your script against it, commits any changes, and opens pull requests with the results. At the end of each run, you get a detailed report on exactly what happened with each repo.
For example, have you ever needed to add a particular file across many repos at once? Or to run a search and replace to change your company or product name across 150 repos with one command? What about upgrading Terraform modules to all use the latest syntax? How about adding a CI/CD configuration file if it doesn’t already exist, or modifying it in place if it does, but only on a subset of repositories you select? You can handle these use cases and many more with a single git-xargs command. Just to give you a taste, here’s how you can use git-xargs to add a new file to every repository in your GitHub organization:
git-xargs \
  --branch-name add-contributions \
  --github-org my-example-org \
  --commit-message "Add CONTRIBUTIONS.txt" \
  touch CONTRIBUTIONS.txt
In this example, every repo in the my-example-org GitHub org will have a CONTRIBUTIONS.txt file added, and an easy-to-read report will be printed to STDOUT.
In this blog post, I’m going to cover the following topics:
  • Example use cases for this tool
  • Why we built git-xargs
  • How git-xargs works under the hood
  • How you can get started with git-xargs quickly

In the following sections, we’ll take a look at some of the use cases we found git-xargs useful for internally, as well as some suggested tasks it would be well suited for in the wild.
For example, here’s a script we ran across our repos to keep LICENSE files up to date:
  1. This script will create a LICENSE.txt file, if it doesn’t exist already, and put the MIT license in it.
  2. If a LICENSE.txt file already exists, it will update the copyright year.
#!/usr/bin/env bash

YEAR=$(date +"%Y")
FULLNAME="Gruntwork, LLC"

function create_license {
cat << EOF > LICENSE.txt
MIT License

Copyright (c) 2016 to $YEAR, $FULLNAME

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
EOF
}

# Copyrights should be declared as "$CREATION_YEAR to $CURRENT_YEAR"
# Therefore, this sed command will look to update the date immediately following the word "to"
function update_license_copyright_year {
  echo "Updating license copyright year to $(date +%Y)..."
  sed -i "s|to \([1-9][0-9][0-9][0-9]\)|to $(date +%Y)|" LICENSE.txt
  if [ $? -eq 0 ]; then
    echo "Success!"
  else
    echo "Error!"
  fi
}

# If the repo does not contain a LICENSE.txt file, then create one with the correct year
if [ ! -f "LICENSE.txt" ]; then
  echo "Could not find LICENSE.txt at root of repo, so adding one..."
  create_license
else
  update_license_copyright_year
fi
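Before pointing a script like this at 150 repos, it’s worth exercising the tricky part (the copyright-bump sed) in isolation. Here is a minimal sanity check, assuming GNU sed (on macOS you’d need `sed -i ''` instead, as the script’s comments hint at elsewhere in this post):

```shell
# Create a fake LICENSE.txt with a stale end year, run the same sed
# expression the script uses, and confirm the year after "to" is bumped.
printf 'Copyright (c) 2016 to 2020, Gruntwork, LLC\n' > LICENSE.txt
sed -i "s|to \([1-9][0-9][0-9][0-9]\)|to $(date +%Y)|" LICENSE.txt
cat LICENSE.txt
```

Note that the pattern anchors on the literal word "to" followed by four digits, so the creation year (2016) is left untouched.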

We use CircleCI extensively, and we wanted to begin leveraging their new contexts feature so that we could increase the security and control of our build projects. Using contexts is more maintainable and secure than copying and pasting the same secrets as environment variables into every repository that needs access to the same credentials, such as the AWS keys required to run Terratest integration tests on every build. By having all of our projects use a single CircleCI context, we could easily rotate our secrets in one place and have all of our projects pick up the change instantly. However, to get this working we also needed to update our CircleCI workflows syntax to version 2.0 across all our repos, as this is the earliest version that introduced support for contexts. Imagine our YAML files looked something like this:
workflows:
  # We need to upgrade this version to 2 across all our repos to be able to use the context nodes!
  version: 1
  build-and-test:
    jobs:
      - test:
          context:
            - Gruntwork Admin
          filters:
            tags:
              only: /^v.*/
      - build:
          context:
            - Gruntwork Admin
          requires:
            - test
          filters:
            tags:
              only: /^v.*/
      - deploy:
          context:
            - Gruntwork Admin
          requires:
            - build
          filters:
            tags:
              only: /^v.*/
            branches:
              ignore: /.*/
To accomplish this, we leveraged Mike Farah’s excellent yq tool to programmatically update all our .circleci/config.yml files. This script is a simplified version that shows how you might bump your CircleCI workflows to the version that supports contexts, but you can imagine how you could easily extend this to add or rotate CircleCI contexts, etc.
#!/usr/bin/env bash
echo "Upgrading CircleCI workflows syntax to 2..."
yq w -i .circleci/config.yml 'workflows.version' 2

# Remove stray merge tags from the final output
sed -i '/!!merge /d' .circleci/config.yml
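The second step exists because yq’s in-place rewrite can leave `!!merge` tags behind when the YAML uses anchors. You can verify that cleanup step on its own, without a real CircleCI config; a quick sketch, assuming GNU sed:

```shell
# Fake a config.yml containing a stray merge tag, run the same cleanup
# sed, and confirm only the !!merge line is removed.
printf 'workflows:\n  version: 2\n  !!merge <<: *defaults\n' > config.yml
sed -i '/!!merge /d' config.yml
cat config.yml
```

The `/!!merge /d` address deletes any line containing the tag while leaving the surrounding keys (like `version: 2`) intact.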

We recently renamed all of our repos to be compliant with HashiCorp’s Terraform Cloud, which required that every repo follow the terraform-aws-<module-name> naming format. Without git-xargs this would have been a pretty painful exercise, given the scope of our Terraform library. After we renamed all our repositories through the GitHub UI, we still needed to update all the internal references throughout our source code to point to the updated repo names (in READMEs, code, and URLs). We were able to write one Bash script that did this for us:
#!/usr/bin/env bash

# We renamed a ton of our repos to the Terraform Registry naming format. This script uses grep and sed to search for
# references to the old names and replace them with the new names.

# Bash doesn't have a good way to escape quotes in strings. The official solution is to list multiple strings next
# to each other, but that becomes unreadable, especially with regex.
# Therefore, to make our regex more readable, we are declaring clearly-named variables for the special characters we
# want to match, and including those in a string with no need for escaping or crazy concatenation
readonly DOUBLE_QUOTE='"'
readonly SINGLE_QUOTE="'"
readonly BACKTICK='`'
readonly START_OF_LINE='^'
readonly END_OF_LINE='$'
readonly FORWARD_SLASH='\/'
readonly DOT='\.'
readonly WHITESPACE='\s'
readonly OPEN_BRACKET='\['
readonly CLOSE_BRACKET='\]'
readonly OPEN_PAREN='\('
readonly CLOSE_PAREN='\)'
readonly OPEN_CURLY_BRACE='\{'
readonly CLOSE_CURLY_BRACE='\}'

# When replacing old names with new, these are regular expressions for the characters we allow before a name or after.
# We check these characters explicitly to make sure we don't accidentally replace one of the names when it just happens
# to appear as a substring in some unrelated word. E.g., "module-ci" should NOT be replaced in
# "gruntwork-module-circleci-helpers".
readonly ALLOWED_CHARS_BEFORE="($START_OF_LINE|$WHITESPACE|$FORWARD_SLASH|$DOUBLE_QUOTE|$SINGLE_QUOTE|$BACKTICK|$OPEN_BRACKET|$CLOSE_BRACKET|$OPEN_PAREN|$CLOSE_PAREN|$OPEN_CURLY_BRACE|$CLOSE_CURLY_BRACE)"
readonly ALLOWED_CHARS_AFTER="($END_OF_LINE|$WHITESPACE|$FORWARD_SLASH|$DOUBLE_QUOTE|$SINGLE_QUOTE|$BACKTICK|$OPEN_BRACKET|$CLOSE_BRACKET|$OPEN_PAREN|$CLOSE_PAREN|$OPEN_CURLY_BRACE|$CLOSE_CURLY_BRACE|$DOT)"

# The list of repos to replace, in pairs, where the first entry is the old name and the second entry is the new name
# (we use this ugly array instead of a map because the old Bash version on Mac doesn't support maps).
readonly REPLACEMENT_PAIRS=(
  "module-vpc" "terraform-aws-vpc"
  "module-aws-monitoring" "terraform-aws-monitoring"
  "package-directory-services" "terraform-aws-directory-services"
  "module-file-storage" "terraform-aws-file-storage"
  "module-ecs" "terraform-aws-ecs"
  "module-security" "terraform-aws-security"
  "cis-compliance-aws" "terraform-aws-cis-service-catalog"
  "aws-service-catalog" "terraform-aws-service-catalog"
  "aws-architecture-catalog" "terraform-aws-architecture-catalog"
  "package-terraform-utilities" "terraform-aws-utilities"
  "module-ci" "terraform-aws-ci"
  "module-asg" "terraform-aws-asg"
  "module-server" "terraform-aws-server"
  "package-beanstalk" "terraform-aws-beanstalk"
  "package-openvpn" "terraform-aws-openvpn"
  "module-data-storage" "terraform-aws-data-storage"
  "module-load-balancer" "terraform-aws-load-balancer"
  "package-zookeeper" "terraform-aws-zookeeper"
  "package-kafka" "terraform-aws-kafka"
  "package-messaging" "terraform-aws-messaging"
  "module-cache" "terraform-aws-cache"
  "package-static-assets" "terraform-aws-static-assets"
  "package-elk" "terraform-aws-elk"
  "package-mongodb" "terraform-aws-mongodb"
  "package-lambda" "terraform-aws-lambda"
  "package-datadog" "terraform-aws-datadog"
  "package-waf" "terraform-aws-waf"
  "package-sam" "terraform-aws-sam"
  "module-ci-pipeline-example" "terraform-aws-ci-pipeline-example"
)

# Finds all files in the repo to replace, taking care to exclude the .git folder, Terraform & Terragrunt scratch
# folders, binary files, and other files we shouldn't be touching.
function find_files_to_update {
  find . \
    -not -iwholename '*.git*' \
    -not -iwholename '*.terragrunt-cache*' \
    -not -iwholename '*.terraform*' \
    -not -iwholename '*.png' \
    -not -iwholename '*.jpg' \
    -not -iwholename '*.jpeg' \
    -not -iwholename '*.gif' \
    -not -iwholename '*.bmp' \
    -not -iwholename '*.tiff' \
    -not -iwholename '*.DS_Store*' \
    -not -iwholename '*.go' \
    -not -iwholename '*go.mod' \
    -not -iwholename '*go.sum' \
    -type f
}

# Format a regex replacement string for use with perl. The returned value will be of the format:
#
# s/<REPO_OLD_NAME_1>/<REPO_NEW_NAME_1>/g; s/<REPO_OLD_NAME_2>/<REPO_NEW_NAME_2>/g; s/<REPO_OLD_NAME_3>/<REPO_NEW_NAME_3>/g; ...
#
# This string will allow us to replace multiple values in a single call to Perl.
function format_replacement_string {
  local replacements=""
  local i old_name new_name
  for ((i=0; i<"${#REPLACEMENT_PAIRS[@]}"; i+=2)); do
    old_name="${REPLACEMENT_PAIRS[i]}"
    new_name="${REPLACEMENT_PAIRS[i+1]}"
    # This is the sed-like regex for the replacements we'll be doing. To help create this regex, I used a handy
    # online regex tester, which not only gives you nice highlighting, but even lets you define a bunch of test cases
    # to check against!
    replacements+=" s/$ALLOWED_CHARS_BEFORE$old_name$ALLOWED_CHARS_AFTER/\$1$new_name\$2/g;"
  done
  # Strip extra space at start of string:
  echo "${replacements# }"
}

# The main entrypoint for this script
function update_all_repo_names {
  local replacements
  replacements=$(format_replacement_string)
  local files_to_update
  files_to_update=($(find_files_to_update))
  local file
  for file in "${files_to_update[@]}"; do
    # I originally used sed, but on Mac, sed added an unnecessary newline at the end of every single file it touched,
    # so I switched to Perl. This also has the added benefit of allowing us to process multiple replacements in a
    # single call.
    perl -i -pe "${replacements[@]}" "$file"
  done
}

update_all_repo_names

Why did we build this? At Gruntwork, we maintain over 150 repositories containing hundreds of thousands of lines of code, and thousands of developers rely on this code in production. This means a large part of what we do is keep our code, especially our Infrastructure as Code (IaC) library, up to date with best practices, new releases (e.g., new Terraform versions), and security patches. In addition to constantly shipping improvements and new features for our IaC library and service catalog, we also have to stay on top of maintenance tasks that are constantly cropping up. And we do all of this with a team of fewer than 20 engineers! git-xargs helps us perform tasks that require updating many of our repositories at once more quickly and efficiently.

git-xargs allows you to run a script (or multiple scripts, e.g., Bash, Ruby, Python) against 5, 50, or 150 repos at once! You can select the exact repos to run it against either by supplying the --github-org flag to match every repo in your GitHub org, or by providing a flatfile that explicitly lists the repos you want it to operate on (see below for an example).
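The flatfile is just a newline-delimited list of repos in org/repo format. A sketch, with hypothetical repo names, assuming the --repos flag takes the path to such a file:

```shell
# Write a flatfile listing exactly which repos git-xargs should touch
# (these names are illustrative, not a real required set).
cat << 'EOF' > repos.txt
gruntwork-io/terraform-aws-vpc
gruntwork-io/terraform-aws-security
gruntwork-io/terraform-aws-ci
EOF

# Then point git-xargs at the file instead of a whole org:
# git-xargs \
#   --repos repos.txt \
#   --branch-name add-contributions \
#   --commit-message "Add CONTRIBUTIONS.txt" \
#   touch CONTRIBUTIONS.txt
```

This is handy when an org-wide run would be too broad, such as the CircleCI and repo-renaming use cases above that only applied to a subset of repos.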
  1. git-xargs will clone each of your selected repos to the /tmp/ directory of your local machine
  2. check out a local branch (whose name you specify)
  3. run all your selected scripts against your selected repos
  4. commit any changes in each of the repos (with a commit message you can optionally specify)
  5. push your local branch with your new commits to your repo’s remote
  6. call the GitHub API to open a pull request with a title and description that you can optionally specify. If you don’t specify these, git-xargs will use your commit message for the PR title and description if you provide one, or fall back to defaults for all three if you don’t
  7. print out a detailed run summary to STDOUT that explains exactly what happened with each repo, with links to successfully opened pull requests that you can quickly follow from your terminal. If any repos encountered errors at runtime (whether they couldn’t be cloned, a script failed during processing, etc.), all of this will be spelled out in detail in the final report so you know exactly what succeeded and what went wrong
git-xargs does all of this using goroutines, running against multiple repos concurrently, so it is pretty fast.

Here are some other starter ideas for scripts that would be good candidates to run via git-xargs:
  • Modify package.json files in place across repos to bump a common Node.js dependency using jq
  • Update your Terraform module library from Terraform 0.13 to 0.14
  • Remove stray files of any kind, when found, across repos using find and its -exec option
  • Add a new file of any kind with conditional contents to repos using heredoc syntax
  • Rename every instance of a company or product name that has changed using sed
  • Add baseline tests to repos that are missing them by copying over a common local folder where they are defined
  • Refactor multiple Golang tools to use new libraries by executing go get to install and uninstall packages, and modifying the source code files’ import references
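The heredoc idea above can be sketched as a small script of the same shape as the LICENSE example earlier: add a file only where it is missing. The file name and team handle here are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical git-xargs script: add a default CODEOWNERS file, but only
# to repos that don't already have one.
if [ ! -f CODEOWNERS ]; then
cat << 'EOF' > CODEOWNERS
# Hypothetical default owners; adjust per repo
* @my-example-org/maintainers
EOF
fi
```

Because git-xargs runs the script from the root of each checked-out repo, the relative path is all you need; repos that already define their own CODEOWNERS are left untouched.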

Please give git-xargs a shot and let us know what you think! Grab a copy of the binary from the git-xargs releases page, give it execute permissions, and run it with --help to get started:

chmod u+x git-xargs
git-xargs --help
If you have a good script that you believe is generic enough to be of use to many other people, please open a pull request against our scripts directory so that others may benefit from it! If you find bugs or have ideas for ways we could extend and improve git-xargs, please feel free to file a GitHub issue.
