MonoRepo vs Multiple Repo with some Microsoft .NET specifics


Published: 17 August 2018
Author: Jared Holgate
Category: Software Engineering
Tags: MonoRepo,Process,Git,.NET

Intro

This post is a discussion of why I am looking at a MonoRepo at all, and what I see as the pros and cons of a MonoRepo vs multiple repositories. This topic is widely discussed elsewhere; these are just my views today, given what I know right now.

Historically I have always worked with multiple repositories, usually one app, microservice, class library or solution (in Microsoft terminology) per repository. This has worked well for my team, but there are definitely some inefficiencies in working this way, and recently I’ve been looking at a MonoRepo as a way to address some of the process bottlenecks I see.

The evolution of the tooling we use, particularly TFS and Visual Studio 2017, has also made this a more realistic consideration.

What is a MonoRepo?

Just in case you aren't already aware, the difference between a MonoRepo and multiple repos is:

  1. With multiple repositories, you have a separate repository for each service / library you develop and combine them into a whole application at build / integration time.
  2. With a MonoRepo, you instead have a single repository with a folder structure inside it. Each folder contains an individual service / library. Commits, branches, clones, etc. work across your whole code base rather than one part of it.

The oft-cited example of a company using a MonoRepo is Google, who built their own bespoke source control system to enable it. Off-the-shelf systems have not handled this well historically, but I believe they are starting to catch up now.

Current Process

To describe the current workflow with a generic example, let’s assume we have an API and a UI that consumes it. Within the API repository there is a client class library that is used to test the API and is consumed by callers of the API. This client is built, packaged and pushed to our package feed on every CI build of the API.

In order for someone to create a new API method and associated UI change, they need to:

  1. Create 1 branch on each repo.
  2. Create the new API method and associated client changes, with relevant tests.
  3. Reference a pre-release version of the API client generated by the CI build in the UI project.
  4. Test the API client with the UI project and repeat the cycle of fix code, build, reference new client, test UI.
  5. Raise a pull request for the API branch so it gets merged to master. Repeat the cycle again if any code review issues are raised.
  6. Reference the new client package generated from the master branch build following the pull request merge.
  7. Raise a pull request for the UI branch...

I know it is possible to create a package locally for the API client and do a lot of testing prior to the CI build, but I generally don’t see that happening.

As you can see, this process is not optimal. The same applies to other types of shared libraries that are packaged (usually via NuGet or npm in our case), not just API clients.

Why Consider MonoRepo

The MonoRepo solves this by directly referencing the API client or other shared library in the consumer. This allows all development and testing to happen locally in a fast cycle. Only one pull request is required and the changes are part of the same version.
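To make the difference concrete, here is a minimal sketch of what "directly referencing" looks like in an SDK-style consumer project. The project and folder names (Ui.csproj, Acme.Api.Client, netcoreapp2.1) are made up for illustration; the point is the swap from a PackageReference on a published feed version to a ProjectReference on the client's source in the same repository.

```shell
# Write a hypothetical SDK-style consumer project file.
# All names and paths here are illustrative, not from a real solution.
cat > Ui.csproj <<'EOF'
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netcoreapp2.1</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <!-- Before (multiple repos): a pre-release package from the feed -->
    <!-- <PackageReference Include="Acme.Api.Client" Version="1.2.0-pre" /> -->
    <!-- After (MonoRepo): point straight at the client's source -->
    <ProjectReference Include="..\Api\Acme.Api.Client\Acme.Api.Client.csproj" />
  </ItemGroup>
</Project>
EOF

# Show that the consumer now carries a direct project reference.
grep '<ProjectReference' Ui.csproj
```

With this in place, the fix / build / test cycle happens entirely locally, with no package publish or version bump in the middle.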

Don’t get me wrong, package managers like NuGet are fantastic; they have served us well over the years and will continue to do so for third-party components, or components we develop that will never change. However, for components that do change, managing them is becoming more onerous, especially as our code base continues to expand.

Another benefit of a MonoRepo is that it makes it easier to genuinely share code. I find that, despite good intentions, having separate teams working in separate repos leads to a lot of code duplication and a lack of shared libraries.

I also believe a MonoRepo gives better visibility of dependency consumers. It allows us to easily trigger CI builds for consumers when a dependency is updated.
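The "trigger builds for consumers" idea can be sketched with plain git: derive the set of top-level folders a commit touched, and feed that list to the CI server as the builds to schedule. This is a minimal, self-contained demonstration in a throwaway repository; the folder names (Api, Ui, Shared) are made up.

```shell
set -e
# Throwaway monorepo with three hypothetical top-level components.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email ci@example.com && git config user.name CI
mkdir -p Api Ui Shared
echo v1 > Api/a.cs; echo v1 > Ui/u.cs; echo v1 > Shared/s.cs
git add . && git commit -qm "initial"

# A change lands in the shared library only.
echo v2 > Shared/s.cs
git add . && git commit -qm "update shared lib"

# Folders impacted by the latest commit -> builds a CI server could trigger.
git diff --name-only HEAD~1 HEAD | cut -d/ -f1 | sort -u
```

The last command prints only the folders touched by the new commit, which is essentially what TFS's folder-level build trigger filters do for you server-side.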

One of the biggest benefits I've seen so far in my experimentation is refactoring. It is so much easier to refactor code that you would normally avoid touching for fear of breaking something that depends on it.

So, in list form, here are some benefits that I've noted so far:

  1. Simpler development process.
  2. Easier code review.
  3. Less package and versioning hell.
  4. More visibility of dependencies.
  5. More code sharing (and actually doing it).
  6. More robust CI builds, especially for API consumers.
  7. Refactoring is a lot less scary.
  8. Potential for simpler local development environments.
  9. Potential to reduce the number of test environments required.
  10. Better knowledge sharing.
  11. Does not force you to deploy all code simultaneously; we can still deploy only the code that has changed.

I'm sure there are others I've not covered here.

Issues with a MonoRepo

As you can see, my main motivation for considering a move to a MonoRepo is removing waste and bottlenecks from our development process. However, I do have a number of concerns that I fear may come back to bite me if I make the wrong call.

Examples:

  1. The repository will become huge. Can Git handle this? We haven’t hit any issues in testing so far, and hopefully GVFS gives us a mitigation.
  2. People forget to add build triggers for dependencies. This will come down to education, definition of done, code reviews and maybe some kind of audit.
  3. Nested project dependencies in Microsoft solutions require the project dependencies of project dependencies to be referenced for a successful build. This is just not very nice and will probably make some members of my team balk. I am hoping that the benefits outweigh the cons here; for example, debugging will be a lot easier. Where possible, dependencies should also be consolidated and merged.
  4. Someone making a breaking change, or not versioning an API they update. This is not MonoRepo-specific, but it could have a larger impact, as all consumers get the latest version by default for their next deployment. Good automated testing and good practice (don’t make breaking changes) should make this go away.
  5. If you require permissions at the application or API level so that only certain developers can see certain bits of source code, you are going to struggle with a MonoRepo.
  6. A change to a base library could result in most of your code being rebuilt and 'force' you to re-deploy it all at once. I'm not sure yet whether this would ever really occur, but it's certainly a risk.

Microsoft Specific Innovations

Some things that have made moving to a MonoRepo a real possibility for us are:

  1. The latest project format from Microsoft means that NuGet references of project references are automatically restored for the consumer. Without this, a package restore would need to be run on the dependency before the consumer could be built. The new project format works across both .NET Framework and .NET Core projects.
  2. Build triggers in TFS can be filtered at the folder level. This means that a build will only be triggered on a commit that impacts files that we care about for that application.
  3. Pull request policies in TFS allow specifying optional builds with folder level filters. Similar to 2, but for pull request triggered builds.
  4. Pull request policies allow specifying specific teams or individuals as reviewers for specific folders.
  5. GVFS makes me a lot less concerned about the size and growth of the single repository.
  6. Not critical, this one, but it certainly makes life easier: Cake builds that support the latest versions of .NET Core and the new project structure. This means the same build steps can be run locally on a developer's PC without committing any code.

Git History

If you are worried about losing your Git history when you merge repositories, don’t be. Here is an example:

  1. Clone the MonoRepo. Open a command prompt, navigate to your root source control folder, then run this command:

git clone http://[TFSOrVSTSUrl]:8080/tfs/[Collection]/[Project]/_git/[MonoRepoName]

e.g.

git clone http://AcmeTFS:8080/tfs/Acme/Engineering/_git/MonoRepo

  2. Navigate into the MonoRepo. Add a remote pointing at the repository you want to merge using this command:

git remote add -f [RemoteName] [RemoteUrl]

e.g.

git remote add -f queueing http://AcmeTFS:8080/tfs/Acme/Engineering/_git/AcmeQueueing
  3. Merge the repository into the MonoRepo using this command:

git merge [RemoteName]/master --allow-unrelated-histories

e.g.

git merge queueing/master --allow-unrelated-histories
  4. You now have the repository merged, but all the files and folders are in the root directory. Create a new sub-directory for the merged repository (e.g. md queueing).

  5. Move the folders and files into the new sub-directory. The easiest way to do this on Windows is with TortoiseGit: right-click the folders and files, drag them into the directory and, when the prompt comes up, select 'Git Move versioned files here'. Alternatively, you can use the 'git mv' command.

IMPORTANT! If you don't use Git to move the files and folders you will lose your history and defeat the point of the merge!

  6. Now you can commit and push the changes:

git commit -m "Moved merged repository into a sub-folder"

git push
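The steps above can be exercised end-to-end with two throwaway local repositories standing in for the TFS remotes. Everything here (the queueing repo, file names, commit messages) is made up for the demonstration; the final git log shows that the original commit survives the move into the sub-folder.

```shell
set -e
work=$(mktemp -d)

# Stand-in for the repository to be absorbed (AcmeQueueing above).
git init -q "$work/queueing"
cd "$work/queueing"
git symbolic-ref HEAD refs/heads/master   # ensure the branch is 'master'
git config user.email dev@example.com && git config user.name Dev
echo "class Queue {}" > Queue.cs
git add . && git commit -qm "queueing: initial commit"

# Stand-in for the MonoRepo.
git init -q "$work/monorepo"
cd "$work/monorepo"
git config user.email dev@example.com && git config user.name Dev
echo "root" > README.txt
git add . && git commit -qm "monorepo: initial commit"

# Steps 2-6: add the remote, merge the unrelated history, move into a sub-folder.
git remote add -f queueing "$work/queueing"
git merge queueing/master --allow-unrelated-histories -m "Merge queueing repo"
mkdir queueing
git mv Queue.cs queueing/
git commit -qm "Moved merged repository into a sub-folder"

# History survives the move: the original commit is still reachable.
git log --follow --format=%s -- queueing/Queue.cs
```

The final log lists both the move commit and the original "queueing: initial commit", which is exactly the history you were afraid of losing.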

Conclusion

I’m afraid this has turned into a bit of a ramble, so I will try to conclude.

Based on my experimentation and the type of code base we work on, I think a MonoRepo would suit us. The main benefits I see are shifting left and removing bottlenecks from the delivery pipeline. There are no show-stopping reasons not to give it a go, anyway. Is it going to be right for every team? No, probably not. My advice is to try it on a small scale and see what you think.