Git Patterns and Anti-Patterns
DZone, Inc.
www.dzone.com
#178
CONTENTS INCLUDE: Step-Wise Migration, Hybrid SCM, Git Champions, Cheat Sheets, Replication, Local-Only Rebasing... and More!
Git is the most popular DVCS (Distributed Version Control System) in terms of market adoption. Grasping Git adequately, however, can be difficult. This Refcard is designed to help you transition to Git with confidence and understanding, and is especially suited to enterprise-scale application development. This card assumes familiarity with version control systems (VCS) in general, and at least passing familiarity with Git. (For Git basics, see Refcard #94: Getting Started with Git.)
Pattern / Description
Define one set of permissions for experimental branches and another for stable release branches.
#11. Git protocol strategy: Enforce compatibility of Git protocols with your ICT security standards.
#12. Identity enforcement via user registry: Make sure everyone is fully accountable for all commits.
GIT CHALLENGES
Most teams considering Git are already using some kind of VCS (e.g. Subversion, CVS or ClearCase). These teams face two key challenges.
DVCS makes you think differently. The concepts and terms proper to DVCS are deeply different from those proper to a traditional, centralized VCS. Translate these terms and concepts immediately.
Anti-pattern
DON'T migrate your code in a single step. This can sound easy at first, but a poorly planned initial migration can damage your repos and branches considerably.
Step 1: Define migration scope correctly
Take some time to define the scope of what is being migrated to Git. Rarely migrate everything: a large mass of stable code history, happily residing in your current VCS, will just weigh down what should be a lean tool. Basically:
DON'T migrate dead repositories and branches (unless you have independent reasons to decommission your legacy VCS, e.g. licensing restrictions).
DO wield a judicious "change-depth knife", carefully defining a maximum depth of changes in your branches' history to migrate.
Pattern / Description
#1. Step-wise migration: Don't migrate everything at once.
#2. Git champions: Acclimate your teams to DVCS-thinking by clustering migrations around champions.
Keep Git's guts transparent and use tools only after everyone knows how the Git CLI works.
Make sure cheat sheets are readily available, even after initial adoption.
For early adoption, and later for small and medium teams, a shared blessed repository will ease developers into Git and simplify CI.
For large enterprises, the single blessed repo pattern makes less sense.
Avoid anarchy, but enforce order via a well-understood strategy, not a restrictive VCS.
Define one topic branch for each individual feature.
Limit rebasing to local repositories or individual branches only.
Remember that the migration process itself has to physically create every intermediate file status and store all snapshot trees in the Git repository. This operation may take a considerable amount of time, and large chunks of change history may be unhelpful, given your current pipeline.
Step 2: Migrate branches
Here's some good news: Git provides out-of-the-box commands for migrating from Subversion (git svn) and CVS (git cvsimport). And now the bad news: these out-of-the-box commands can take a long time to run, so you may need to repeat the process if you don't want to block the productivity of your team.
DO use scripts to run the migration again and again. This will (a) help you predict execution times and (b) let you fine-tune the list of branches and repositories that you intend to migrate.
Step 3: Migrate build infrastructure
Continuous Integration (CI) and Continuous Delivery (CD) frameworks like Jenkins are not tied to any particular VCS. But adding a VCS to your release mix can tangle CI/CD processes significantly. For example, your release process may contain a lot of hardcoded commands for your existing VCS that need to be changed manually.
DO duplicate existing scripts and declare a script freeze on the original versions until the Git migration is completed.
Step 4: Define a cutover date
Rigorously avoid cutover creep on a project-by-project basis.
DON'T force the entire company into a single cutover date. It's much better to start with a few projects, get feedback on the transition while other projects are still getting ready to migrate, and improve the process for projects with later cutover dates.
DO set migrated projects to read-only in the legacy VCS at cutover so that developers can then commit to Git only. Otherwise, you'll risk stranding a few straggler projects in an old release paradigm.
Furthermore, the early-migrating teams will have developed expertise and confidence with Git when the later-migrating teams are just getting started, which will help the earlier team members serve as Git champions (see Pattern #3).
Step 5: Commit to Git!
But just in case:
DO keep a backup of your existing VCS and its build running until all projects are running on Git flawlessly. Migrating once is hard enough, and more depressing when the direction is backward.
DEFINE your roll-back procedure: you may run into unpredictable problems, such as people misusing the tool or instability in your Git server or build, and you must not let this interfere with your productivity. If the first Git migration attempt damages productivity, a second attempt is unlikely to happen.
Hybrid VCS
Throughout the migration process you need to manage more than one system at a time (i.e. legacy Subversion and the new Git repositories). You might also consider (though it is not recommended) keeping some projects in your legacy VCS, either because of their complex history or because migrating the maintenance of a consolidated (or end-of-life) project to a different system would cost too much.
DO use proper tools (e.g. CollabNet TeamForge or SubGit) to make sure that you are able to manage both systems consistently during migration.
The Bottom Line: Avoid big-bang migration
Doing everything at once will not make things simpler, even though you may think that people can use local code for a couple of days.
Productivity will almost certainly suffer if you migrate everything at once. If you migrate everything at once, you won't learn anything from early iterations. No teams will gain significantly more Git experience than others (except on local branches, which won't help much). And worse: if you migrate everything at once and something doesn't work right, you'll never know what went wrong.
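The scripted, depth-limited migration described in Steps 1 and 2 might be sketched roughly as follows. This is only an illustration, not the Refcard's own script: the repository URL, the start revision and the target directory are hypothetical placeholders, a standard trunk/branches/tags layout is assumed, and by default the script only prints the command it would run.

```shell
# Hypothetical values: replace with your own repository URL, depth and target.
SVN_URL="https://svn.example.com/repos/project"
START_REV=12000            # the change-depth knife: older history is not migrated
TARGET_DIR="project-git"
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the command (the default here)

run() {
  # Print the command in dry-run mode, otherwise execute it as given.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# --stdlayout assumes the usual trunk/branches/tags layout;
# -r limits how much history git svn has to replay.
run git svn clone --stdlayout -r "${START_REV}:HEAD" "$SVN_URL" "$TARGET_DIR"
```

Repeating the real run (DRY_RUN=0) with different START_REV values, and timing each run, gives the execution-time predictions that Step 2 asks for.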
Anti-pattern
Changing the culture of your developers is usually the hardest part of the migration process. Months after your migration cutover date, you might still hear people commenting that using the previous VCS was easier. Buy-in is always crucial, but especially for DVCS; and when buy-in is crucial, plain-vanilla tech training is not enough.
DO reward buy-in from motivated people who will experiment with the new tools before distribution to a wider audience.
DO choose the most dynamic and innovative members of your team to drive the change. Give them time to understand the new concepts in detail, and let them earn the title of Git champions.
DO advertise your Git champions throughout the company. Enterprise-wide backing will magnify your Git champions' effectiveness.
DO distribute your Git champions among different locations and teams. For DVCS, local support from developers on the same project or team is essential.
DON'T underestimate the complexity and danger of Git. It isn't reasonable to expect everyone to learn all 150+ commands from the documentation alone. For a tool this complex, the more local the support, the better. Also consider commercial support: http://www.collab.net/support/supportprograms#git
The Bottom Line: Never let inexperienced developers run with Git by themselves
When you're coding for fun, learn to walk by falling and getting back up. When you're coding for enterprise-level profit, leaving Git in the hands of inexperienced developers is like abandoning a toddler with a dagger in its hands. For a frightful example of this anti-pattern, consider this 2012 incident, in which Eclipse experienced a disastrous branch loss due to wrong Git command syntax: https://bugs.eclipse.org/bugs/show_bug.cgi?id=361707
There are management tools available that ease the migration to Git and help enforce security and control. A good overview is at www.collab.net/git
The path to learning how to use a DVCS effectively is long and difficult. Distributed systems are inherently complicated: you need to manage consistency between different states. Drive your first DVCS as close to the metal as possible. Use the command line only, until the way Git works is second nature. Start with the simplest Git commands:
Git identity settings (git config user.name, git config user.email)
Git local and remote operations (git push, git fetch, git pull)
Searching through Git history (git log, git diff, git bisect, git grep)
(Interactive) rebase (git rebase, git rebase -i)
At this stage, tools may come in handy to increase productivity. Tools may also help to:
Display your Git branch network topology
Display file status (untracked, changed, staged, committed) as decoration on your file icons
Perform two- or three-way file merges
The Bottom Line: Learn to feel how DVCS works by learning to think exactly as Git thinks
You may not be surprised to learn that the most popular tool for Subversion is TortoiseSVN (http://tortoisesvn.net), which makes version control pretty much a drag-and-drop file copy. So there's a good chance that newly minted Git users will expect to use similar tools for Git. But drag-and-drop file copy is nothing like using DVCS.
But as the dev-social graph gets more complex, peer-to-peer and replication capabilities can quickly get dangerous. To keep Git safe for medium and larger teams, choose one of the following two patterns.
Developers with no knowledge of Git and the underlying concepts of DVCS will be heavily reliant on cheat sheets, and that's okay. So:
PUBLICIZE pre-defined Git formulas for common tasks (e.g. http://refcardz.dzone.com/refcardz/getting-started-git, or http://www.cheatsheets.org/saved-copy/git-cheat-sheet.pdf).
If you're migrating from Subversion, use the following equivalence table:
Subversion / Git equivalence table
When teams increase in size and are distributed among different locations, or when you need to integrate with an ecosystem of Continuous Integration (CI) / Continuous Delivery (CD) for builds, tests and deployments, it is key to define and publish one repository (the "blessed" repository) as a single point of reference for the entire team.
Task                      | Subversion                              | Git
Repository creation       | svnadmin create repo                    | git init
Checkout server branch    | svn co url/branch                       | git clone -b branch url
File management           | svn add file; svn rm file; svn mv file  | git add file; git rm file; git mv file
Commit                    | svn commit                              | git commit -a; git push
Update                    | svn up                                  | git stash; git pull --rebase; git stash pop
Switch branches           | svn switch url/branch                   | git checkout branch
Local files status        | svn status                              | git status
Revert to latest version  | svn revert path                         | git checkout path
Display file history      | svn log path                            | git log path
Show annotated file       | svn blame path                          | git blame path
Show file differences     | svn diff -r rev path                    | git diff rev path
Create branch             | svn copy url1 branch                    | git branch branch initialrev
Create tag                | svn copy url1 tagname                   | git tag tagname
DON'T force large distributed enterprise organizations to use a single central blessed repository node, especially when bandwidth is limited and connectivity between sites is unreliable. You may also want to include extra resilience for high availability, even when bandwidth is not an issue.
DO set up server-side replication across sites by enabling push/pull among blessed repositories located in the main geographical locations for development. For example, you may have one or two US-based replicas, plus one for Europe, one for Asia-Pacific, and one for Latin America.
The Bottom Line: Avoid the temptation of the good old centralized days, but don't fall in love with peer-to-peer repos blindly
Sometimes companies new to DVCS will go full-blown distributed for no good reason (otherwise you might as well stick with Subversion, right?). Wrong. Git is very flexible in terms of distributedness. Just use Git exactly as each team requires. Unnecessary use of peer-to-peer repositories can also generate significant push-back, especially for large teams that are already unsure about DVCS.
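As a rough illustration of replication between blessed repositories, the sketch below mirrors one bare repository into another with git push --mirror. The two "sites" are just hypothetical directories on one machine, and the developer identity is made up. Real replication between sites would also need scheduling or hooks, and conflict handling if both replicas accept pushes, none of which is shown here.

```shell
set -e
sites=$(mktemp -d)                       # stand-in for two geographic sites
git init -q --bare "$sites/us.git"
git init -q --bare "$sites/eu.git"
git -C "$sites/us.git" symbolic-ref HEAD refs/heads/master

# A developer close to the US replica pushes a commit to it.
git clone -q "$sites/us.git" "$sites/dev" 2>/dev/null
git -C "$sites/dev" symbolic-ref HEAD refs/heads/master
git -C "$sites/dev" -c user.name="Dev" -c user.email="dev@example.com" \
    commit -q --allow-empty -m "First commit"
git -C "$sites/dev" push -q origin master

# Server-side replication: copy every ref from the US replica to the EU one.
git -C "$sites/us.git" push -q --mirror "$sites/eu.git"

git -C "$sites/eu.git" log --oneline master
```

After the mirror push, the EU replica serves the same master branch, so developers in Europe can clone and fetch locally.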
Git lets you define branches and organize them into hierarchical namespaces. For large enterprises, as for large codebases, careful namespace management is essential.
DEFINE your hierarchy of namespaces: start from refs/heads as the base namespace for all branches.
PUBLISH this hierarchy across your organization in multiple formats (e.g. printouts on office walls, commonly used internal wikis, shared screens, whiteboards), wherever your developers already store information on strategy.
It will then be clear what the branches represent. This clarity will help:
Release management
Production support and hot-fix management
Development of sprints and spikes/experiments
Hot Tip
Advanced Git servers, such as Google Gerrit Code Review, define additional namespaces (refs/for/) for managing code contribution and review, automating the creation and merging of implicit topic branches (identified by unique Change-Ids) and validating the maturity of contributions through scoring.
The Bottom Line: Avoid interleaved commits
To get a better gut feel for the problem, consider the following situation. Feature A has commits A1, A2 and A3; feature B has commits B1, B2 and B3; the master branch has commits A1, B1, A2, B2, A3, B3. If feature B is de-scoped for any reason (quality, time-to-market, security risks, etc.), the master branch history will have no point that can lead to a stable-xyz release. Then you'll need to cherry-pick A1, A2 and A3 into the main branch, which itself risks conflicts because of the missing commits in the history that relate to the de-scoped feature.
Here's a typical branching scheme in Git:
refs/heads/master (main line for future releases)
refs/heads/releases/stable-x.y.z (set of branches for integrating changes and stabilizing code for release)
refs/heads/user-xyz/mybranch (optional but very useful: namespace for individual users, allowing them to create their own private branches and share them for review)
The Bottom Line: Avoid DVCS anarchy by publishing a branch strategy
DON'T allow small teams to organize branches without any standards.
DO make it an organization-wide decision whether or not to define rules on branching. Git history is not necessarily a hierarchical structure; complete anarchy may lead to obscure branching histories.
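One way to make a published branch strategy checkable is a small naming test, sketched below. The function name branch_allowed is hypothetical, and the patterns simply mirror the namespaces named in this card (master, releases/stable-*, user-*/..., topics/...); a real policy would list whatever hierarchy your organization publishes.

```shell
# Returns 0 when a ref name fits the published namespace hierarchy.
branch_allowed() {
  case "$1" in
    refs/heads/master)             return 0 ;;
    refs/heads/releases/stable-*)  return 0 ;;
    refs/heads/topics/*)           return 0 ;;
    refs/heads/user-*/*)           return 0 ;;
    *)                             return 1 ;;
  esac
}

for ref in refs/heads/master refs/heads/releases/stable-1.2.0 \
           refs/heads/user-xyz/mybranch refs/heads/quickfix; do
  if branch_allowed "$ref"; then
    echo "ok     $ref"
  else
    echo "reject $ref"
  fi
done
```

A check like this could run in a server-side hook or in CI, so that branches outside the agreed namespaces are spotted early.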
git rebase is probably one of the most useful day-to-day commands in Git's hefty arsenal. It works as a time machine: it lets you replay a set of changes from a specific point in your project history. More impressively, commits can be changed, moved or even deleted. You can even create a temporary branch in order to abort the operation and restore the original set of commits. You can imagine how incredibly powerful this can be when the person in control knows exactly what needs to be done.
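The temporary-branch safety net mentioned above can be sketched as follows, in a throwaway local repository. The branch names, file names and identity are hypothetical; the point is only that a backup branch keeps the original commits reachable, so the rebase can be undone.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" config user.name "Dev"
git -C "$repo" config user.email "dev@example.com"

commit_file() {  # commit_file <file> <message>
  echo "$2" > "$repo/$1"
  git -C "$repo" add "$1"
  git -C "$repo" commit -q -m "$2"
}

commit_file base.txt "M1"
git -C "$repo" checkout -q -b topics/topic-abc
commit_file topic.txt "T1"
git -C "$repo" checkout -q master
commit_file main.txt "M2"
git -C "$repo" checkout -q topics/topic-abc

# Temporary branch that keeps pointing at the original commits.
git -C "$repo" branch backup/topic-abc
original=$(git -C "$repo" rev-parse HEAD)

# Replay T1 on top of the latest master (this creates a new commit).
git -C "$repo" rebase -q master
rebased=$(git -C "$repo" rev-parse HEAD)

# Changed your mind? Restore the original commits from the backup branch.
git -C "$repo" reset -q --hard backup/topic-abc
restored=$(git -C "$repo" rev-parse HEAD)
```

Once the rebased result has been checked and kept, the backup branch can be removed with git branch -D.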
Per-feature topic branching offers two advantages:
1. It lets features be developed in parallel (which also encourages good coding practices).
2. It lets features be merged based on quality and maturity level per feature (which often don't align across features).
Hot Tip
Git keeps unreferenced deleted objects in its store. They can be recovered by looking at the Git reflog (an audit of all reference changes). However, once their reflog entries have expired and git gc (Garbage Collection) is executed, unreferenced objects are physically removed and cannot be recovered anymore. This is a compliance concern for most enterprises.
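A minimal sketch of such a recovery, in a throwaway repository with a made-up identity and branch name: a branch is deleted, and its tip is then found again through the reflog of HEAD.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" config user.name "Dev"
git -C "$repo" config user.email "dev@example.com"

git -C "$repo" commit -q --allow-empty -m "Initial commit"
git -C "$repo" checkout -q -b experiment
git -C "$repo" commit -q --allow-empty -m "Work that is about to be lost"
lost=$(git -C "$repo" rev-parse HEAD)

git -C "$repo" checkout -q master
git -C "$repo" branch -q -D experiment     # no branch points at the commit now

git -C "$repo" reflog                      # the lost commit is still listed here
# HEAD@{1} is where HEAD was before the last checkout: the deleted branch tip.
recovered=$(git -C "$repo" rev-parse 'HEAD@{1}')
git -C "$repo" branch experiment "$recovered"
```

This only works in the repository where the commits were made and only while the reflog entry exists, which is why the reflog is no substitute for a real audit log.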
DO define a namespace for all topic branches (e.g. refs/heads/topics/topic-abc).
In a traditional VCS like Subversion, topic branches were typically avoided for two reasons:
1. Branching was easy and cheap but merging was painful and expensive.
2. Consequently, continuous integration of features was feasible only when all topics were in the same branch.
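To see how per-feature topic branches avoid interleaved commits, here is a small sketch in a throwaway repository. Features A and B each get their own branch under a topics/ namespace (the names and identity are hypothetical), and only feature A is merged, as if feature B had been de-scoped.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" config user.name "Dev"
git -C "$repo" config user.email "dev@example.com"

commit_file() {  # commit_file <file> <message>
  echo "$2" > "$repo/$1"
  git -C "$repo" add "$1"
  git -C "$repo" commit -q -m "$2"
}

commit_file README "Initial commit"

# Each feature is developed on its own topic branch, three commits each.
for feature in A B; do
  git -C "$repo" checkout -q -b "topics/feature-$feature" master
  for n in 1 2 3; do
    commit_file "feature-$feature.txt" "$feature$n"
  done
done

# Feature B is de-scoped: only feature A is merged into master.
git -C "$repo" checkout -q master
git -C "$repo" merge -q --no-ff -m "Merge feature A" topics/feature-A

git -C "$repo" log --format=%s master
```

The master history contains A1, A2 and A3 plus one merge commit, and nothing from feature B, so no cherry-picking is needed to cut a release without it.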
But when two developers are working on the same branch, and they both want to rebase it, the last one wins, and the loser isn't even notified that their push has been completely wiped out. Therefore:
DON'T push a rebased branch to a remote repository unless you are absolutely sure that nobody else has made any changes to that branch. If you ever need to force push, always notify everybody before you proceed.
DO consider enterprise-grade management tools that provide history protection (see Pattern #15).
The Bottom Line: Restrict force-push very carefully
When a consistent branching policy has been defined (see Patterns #7 and #8), the problem of preventing history loss is easier to resolve. Developers can simply respect the policies associated with different branch namespaces.
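The "last one wins" problem can be reproduced, and partly defused, in a local sketch. The example below is not from this card: it uses git push --force-with-lease (available in Git 1.8.5 and later), which refuses a forced push when the remote branch has moved since you last fetched it. The developers alice and bob and all paths are hypothetical.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/blessed.git"
git -C "$work/blessed.git" symbolic-ref HEAD refs/heads/master

new_clone() {  # new_clone <name>: clone the shared repo and set an identity
  git clone -q "$work/blessed.git" "$work/$1" 2>/dev/null
  git -C "$work/$1" symbolic-ref HEAD refs/heads/master
  git -C "$work/$1" config user.name "$1"
  git -C "$work/$1" config user.email "$1@example.com"
}

new_clone alice
git -C "$work/alice" commit -q --allow-empty -m "A1"
git -C "$work/alice" push -q origin master

new_clone bob
git -C "$work/bob" commit -q --allow-empty -m "B1"
git -C "$work/bob" push -q origin master
bob_tip=$(git -C "$work/bob" rev-parse HEAD)

# Alice rewrites her commit without fetching Bob's work first.
git -C "$work/alice" commit -q --amend --allow-empty -m "A1 (reworded)"
if git -C "$work/alice" push -q --force-with-lease origin master 2>/dev/null; then
  echo "forced push accepted"
else
  echo "forced push rejected: master moved since Alice last fetched"
fi
remote_tip=$(git -C "$work/blessed.git" rev-parse refs/heads/master)
```

A plain git push -f at the same point would have silently replaced Bob's commit. The lease is a client-side courtesy, though, so it complements server-side permissions rather than replacing them.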
DO choose a Git server tool that can implement branch-level or fine-grained permissions (not all currently support this feature). This will let you open force-push to private branch namespaces (e.g. refs/heads/user-xyz/mybranch), which will allow developers to benefit from the power of Git history rewriting. You should also restrict operations on common branches such as master, development or release.
DO regular code reviews within your development team when pushing to common stable or development branches. Even if you aren't deleting other developers' histories, you could break the build and block team integration tests for hours.
Git allows peer-to-peer code exchange without any identity or credentials check via a network or security protocol. (Git is designed to allow commits on behalf of a potentially unverified third person.) Because of this, you'll need to enforce identity on your own if you're going to maintain responsibility for commits. Note that Git distinguishes (a) the Author (who created the original committed code) from (b) the Committer (who last applied or amended the commit). Accordingly:
DO verify the identity of the Git Author according to your company or source code license guidelines. (Often licensing reasons will dictate whether you can allow external team members to contribute code.)
DO verify the identity of the Git Committer (or at least the declared e-mail address) against the user registry used to validate its credentials. For instance, a developer claiming to be linus@torvalds.org (which can be achieved simply by executing git config user.email linus@torvalds.org) will not be allowed to push unless they prove that they know the password associated with that e-mail address in the Company User Registry.
The Bottom Line: Avoid accidental flexibility
DON'T allow small workgroups to get rid of security. Small workgroups may think real identity enforcement is not needed, since the team members know each other and work face-to-face. But when you don't check identity in the real world, you'll often see many inconsistencies in a Git branch log: missing authors (Git provides a warning but it can be ignored), typos, use and abuse of nicknames. And then it becomes difficult to tell who made what changes to the code.
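The committer check could be sketched as below. The function verify_committer and the account jdoe@example.com are hypothetical; in a real setup the registered e-mail would come from your user registry after the push has been authenticated, and the check would run in a server-side hook or in the Git server itself.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master

# Anyone can declare any identity locally.
git -C "$repo" config user.name "Linus Torvalds"
git -C "$repo" config user.email "linus@torvalds.org"
git -C "$repo" commit -q --allow-empty -m "Commit with a self-declared identity"

# verify_committer <registered-email> <rev-or-range>
# Succeeds only if every commit was committed under the e-mail that the
# (hypothetical) user registry holds for the authenticated account.
verify_committer() {
  for email in $(git -C "$repo" log --format=%ce "$2"); do
    [ "$email" = "$1" ] || return 1
  done
  return 0
}

# The push was authenticated as jdoe@example.com, not as linus@torvalds.org.
if verify_committer "jdoe@example.com" HEAD; then
  echo "accept push"
else
  echo "reject push: committer e-mail does not match the authenticated user"
fi
```

The same loop with %ae instead of %ce would check the Author field, which is usually validated against licensing rules rather than against the pusher's own account.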
Hot Tip
Google Gerrit is the most widely used Git Server that allows both branch-level security and code-review.
The Bottom Line: Hold teams responsible, but use Git permissions to keep master and release branches safe
You might think that only large enterprises need to enforce security at this level, but Git's power can make it dangerous. The difference between a normal and a forced push is just a -f option or a + prefix on the Git command line. No warning, no confirmation request, and a slip of the finger kills an entire branch history. Git tools can make mistakes even easier. Once upon a time, a developer innocently clicked "Yes" on the message box "Your push is not fast-forward: do you want to force it?"
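If your Git server has no built-in branch-level permissions, the policy can be approximated with a server-side update hook, which Git calls with the ref name, the old commit and the new commit. The sketch below shows only the decision logic, exercised in a throwaway repository. The function names are hypothetical, the all-zero ID assumes SHA-1 repositories, and a real hook would run inside the server repository without the -C option.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" config user.name "Dev"
git -C "$repo" config user.email "dev@example.com"
git -C "$repo" commit -q --allow-empty -m "C1"
c1=$(git -C "$repo" rev-parse HEAD)
git -C "$repo" commit -q --allow-empty -m "C2"
c2=$(git -C "$repo" rev-parse HEAD)

zero=0000000000000000000000000000000000000000

# allow_update <refname> <old-sha> <new-sha>
allow_update() {
  case "$1" in
    refs/heads/user-*/*) return 0 ;;   # private namespace: rewrites allowed
  esac
  [ "$2" = "$zero" ] && return 0       # creating a new branch
  [ "$3" = "$zero" ] && return 1       # deleting a shared branch
  # Shared branches accept fast-forward updates only.
  git -C "$repo" merge-base --is-ancestor "$2" "$3"
}

check() {
  if allow_update "$@"; then echo "accept $1"; else echo "reject $1"; fi
}

check refs/heads/master "$c1" "$c2"              # normal fast-forward push
check refs/heads/master "$c2" "$c1"              # what a forced push would do
check refs/heads/user-xyz/mybranch "$c2" "$c1"   # rewrite in a private branch
```

The second call is rejected while the third is accepted, which is the split this pattern asks for: history rewriting in private namespaces, fast-forward only on shared branches.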
Part of the genius of Git is the division between the repository logic and protocol logic. You can have a complete Git development and exchange workflow without using any real network protocol implementation. In theory, that makes the network protocol choice more flexible. But in reality, because the SSH protocol is so common, this has led to some difficulties. Workgroups and companies typically have firewalls and ICT security rules, which are potentially incompatible with some of the protocols available in Git. For example, the Git native protocol does not provide any user authentication layer, so anybody can push and pull from a repository endpoint without an identity credential check. Other, more secure protocols (e.g. SSH) may be incompatible with your ICT security requirements, perhaps because of authentication of physical users on a remote Unix system exposed via OpenSSH, or because the authentication keys do not expire and are not archived in the company LDAP repository. Or more simply: the company firewall might restrict access to port 22. To prevent this from happening:
DO an up-front analysis of the most appropriate protocols for your company and PUBLISH them as standards across your development teams.
DON'T use the native Git protocol to push to any central repository. Use it only for peer-to-peer code exchange within your local network.
The distributed nature of Git certainly lends itself well to geographically distributed development. But side-by-side peer reviews can be challenging if developers don't share the same office space all the time.
DO codify workflows for code reviews with peers.
DO consider additional tools for code review. The leading code review tool is Google Gerrit (free, open source).
DO consider automating code build validation by incorporating automation into code reviews (e.g. with Jenkins).
Git's push -f command lets you rewrite history by erasing entire code branches and associated reflogs. Avoid the obvious dangers:
DO factor history protection into your overall Git management strategy.
DON'T rely on Git reflogs alone: they are not true audit logs.
DO perform frequent backups of the master repositories.
DO consider complementing Git with specialized third-party tools to provide Git history protection.
See this video for more on history protection for Git: http://www.youtube.com/watch?v=z_FN1NvneBw
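Even without extra tools, plain Git offers two basic safeguards, sketched below on a throwaway bare repository with hypothetical paths: the receive.denyNonFastForwards and receive.denyDeletes settings, and a mirror clone used as a backup. This is a baseline only; it applies to the whole repository rather than to individual branches, and it is not an audit log.

```shell
set -e
srv=$(mktemp -d)
git init -q --bare "$srv/master-repo.git"

# Refuse forced (non-fast-forward) pushes and branch deletions on this repo.
git -C "$srv/master-repo.git" config receive.denyNonFastForwards true
git -C "$srv/master-repo.git" config receive.denyDeletes true

# Backup: a mirror clone copies every ref of the master repository.
git clone -q --mirror "$srv/master-repo.git" "$srv/backup.git" 2>/dev/null

git -C "$srv/master-repo.git" config --get receive.denyNonFastForwards
```

Running git remote update in the backup on a schedule keeps the mirror fresh, though a mirror faithfully follows deletions too, so dated copies are still worth keeping.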
CONCLUSIONS
Git is a powerful tool for agile development but the flip-side of flexibility is chaos. Planning and driving Git adoption carefully is necessary to maintain code quality and avoid product delays. The inherent flexibility of DVCS needs to be harnessed at both adoption and iterated development stages.
Further Reading
Refcard #94: Getting Started with Git
Various Git resources and downloads
Go Agile with Git webinar series
More on Git history protection
For all its power, Git does not integrate easily with many ALM tools (e.g. trackers, quality management tools, build servers). This will change as Git grows more popular. But then integration quality is likely to suffer, because the Git code base is refactored frequently.
DON'T plan your Git strategy as a silo without considering other tools and processes.
DO discuss and review your Git strategy with project managers, product owners, build managers and quality managers.
DO explore management tools that provide governance and integration for third-party tools while shielding those integration points from a fast-changing Git code base.
Copyright 2013 DZone, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Version 1.0