Tag Archives: subversion

Automatically Resolving Version Conflicts in Maven POMs When Merging

I’ve recently had the regular task of releasing projects from multiple branches and then merging the branches together. Handling this and any conflicts hasn’t been a big hassle when the merges are done regularly, with one notable exception – Maven POM files. When they are merged, the version changes on both branches always conflict.

There are ways I’ve got around it to date:

  • Subversion’s --accept mine-conflict and similar options can help resolve them quickly, if you know ahead of time that they are the only conflicts and there are no other changes (see the example after this list)
  • If ending up with versions from the wrong branch, follow up with mvn versions:set -DnewVersion=... versions:commit
  • Sometimes a good merge tool or IDE will select the right version, or at least make it less repetitive to pick it for each file
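
For example, a merge on the Subversion side might look something like this (the branch name and version numbers are illustrative):

svn merge --accept mine-conflict ^/branches/release-1.2 .
# if the POMs still end up with the wrong version afterwards, force it:
mvn versions:set -DnewVersion=1.3-SNAPSHOT versions:commit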

However, I was finding that in projects with a large number of POMs and occasional other conflicts, these were all a little too tedious. I’ve whipped up a basic script that can be used to eliminate the POM-based conflicts first, so that the remaining conflicts are only those you really need to deal with: Automatically resolve conflicts in Maven POMs after a merge — Gist. It doesn’t handle a lot of edge cases, but should work well enough for most uses.
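
The core of the idea is something like the following minimal sketch for Git (this is not the Gist itself, and it assumes that keeping “our” side of each conflicted POM is what you want):

for pom in $(git diff --name-only --diff-filter=U -- '*pom.xml'); do
  git checkout --ours "$pom"   # keep our side of the conflicted POM
  git add "$pom"               # mark the conflict resolved
done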

This works pretty well with Git, and can be used with Subversion too (if you postpone all the conflicts with --accept postpone, or use a non-CLI tool to merge).

This is not something I’d want in a standard release/branch workflow – the problem could be avoided by different branching practices so that release commits are not merged. However, if someone finds it useful, I’m sure they can improve on the Gist above, or perhaps patch the Maven SCM or Versions plugins to provide the capability more directly.

Copying a Codeplex Subversion repository with rsvndump

As part of the process of migrating NPanday to the Apache Incubator, I had to find a way to extract the Subversion repository from Codeplex. The challenge here is that it isn’t actually a Subversion repository, but a TFS server running SVNBridge to appear like one. Its occasional quirks and timeouts had been part of the reason we decided to move. Here is what I’ve learned.

Available Tools

The aim was to get it as a dump of a Subversion repository so that it could be loaded into the ASF Subversion repository, retaining the full history. I tried everything to get it out of there – svnsync, cloning it as a Git repository, cloning it as a Mercurial repository, and other similar tools. All would time out or freak out at some point due to the nature of SVNBridge. I made some progress with tfs2svn (which seems to be what Codeplex uses to migrate repositories to Mercurial when needed), but I started to find that not only did it need regular manual intervention, the result wasn’t quite the same (e.g. Subversion properties came in as ..svnbridge hidden folders, and every comment had the original TFS revision number appended).

I had tried rsvndump earlier without a lot of success, but eventually gave it another try. While it needed some help, it made the most progress and was the tool that ended up being successful.

Patching rsvndump

It wasn’t anywhere near smooth sailing, and rsvndump needed some modifications to handle the task. So I brushed off my dusty C skills and made the required changes, which can be found in my GitHub fork of the project (pull request pending).

The main problem was that rsvndump expected to be able to dump the repository all in one hit. Given that Codeplex would time out on several requests, this made it impossible. It did allow selecting a subset of revisions, but even then it would first do a full svn log (which wouldn’t succeed), and beyond that it would traverse the revisions to construct a path hash (I believe for detecting moves, etc.).

To assist with these, I added a --log-window-size option, similar to git-svn. This retrieves the logs in multiple requests, avoiding problems with timeouts. Next, I added a --first-rev argument, which starts retrieving logs and content from a revision later than 0. While it introduced some risk of crashing due to missing revisions, in many circumstances it allowed a dump to be restarted from a later revision much more quickly.

The next problem was that Codeplex shares a single TFS repository between several projects, so your own revision numbers are not sequential. NPanday started at revision 21102 and ended at 60509 with a lot of gaps, having only 1427 revisions of its own. This wasn’t too much of a problem – because rsvndump was designed to deal with subdirectories of a Subversion repository, it expected the gaps, and the --first-rev argument helped deal with the big gap at the start. But another SVNBridge quirk was that svn copy operations copied from (current revision - 1) – even when that revision didn’t exist! To correct this, I had to adjust the code to search backwards through the revision numbers until it found one that existed, so that the copy operations were correct.

Finally, rsvndump added padding revisions to the dump file when a revision number was missing. This is helpful if you want to maintain the same revision numbers, but because of my use of --first-rev they were already out of step, and I was importing into an existing repository, so I decided to strip these out. For that, I added another flag, --omit-padding-revnums.

Running rsvndump

I ended up running a command like the following:

rsvndump --omit-padding-revnums --adjust-missing-revnums \
  --first-rev $FIRST --log-window-size 1000 -v --incremental \
  -r $REV1:$REV2 https://npanday.svn.codeplex.com/svn \
  3>&1 >&2 2>&3 3>&- >$REV1-$REV2.dump | tee $REV1-$REV2.log.txt

The first few arguments are the customisations described above (along with --adjust-missing-revnums, which makes the dumped revisions sequential). The next are the traditional svnadmin dump arguments that rsvndump honours. Finally, I swapped the output streams so that stdout (the dump itself) went to the dump file, while stderr went through tee to a log file I could also watch.
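
For what it’s worth, an equivalent but less cryptic redirection does the same job (the same options are elided here), sending stderr to the pipe before pointing stdout at the dump file:

rsvndump ... https://npanday.svn.codeplex.com/svn \
  2>&1 >$REV1-$REV2.dump | tee $REV1-$REV2.log.txt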

Other Codeplex SVNBridge Issues

With these changes in place, I was getting moderately successful dumps – but a few frustrating issues remained.

Firstly, many svn copy operations (such as creating a tag) were tracked file by file by Codeplex instead of at the top-level directory. This resulted in further timeouts that I couldn’t work around – we had already seen this manifest on the Codeplex repository as an inability to even list the /tags/ directory. I didn’t attempt to correct this; instead, I manually applied the copy operation again after loading the preceding dump, then continued.

For example:

svn cp \
  -m '[maven-release-plugin]  copy for tag npanday-project-1.2.1' \
  $REPO/trunk $REPO/releases/npanday-project-1.2.1

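# note: setting revision properties requires the repository's pre-revprop-change
# hook to allow it (on the local work repository a hook that simply exits 0 will do)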
svn ps svn:author "SND\jocaba_cp" --revprop -rHEAD $REPO
svn ps svn:date \
  2010-09-07T07:15:46.723000Z \
  --revprop -rHEAD $REPO

If that revision appeared in the dump file (either incomplete or not able to be applied), I’d delete it by searching for Revision-number: xyzxyz and deleting the lines up until the next revision.
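
This can also be scripted; assuming the dump contains only text content and the offending revision is, say, 12345 (illustrative), something like the following drops everything from that Revision-number header up to, but not including, the next one:

awk '/^Revision-number: 12345$/ { skip = 1 }
     /^Revision-number: / && $2 != 12345 { skip = 0 }
     !skip' $REV1-$REV2.dump >cleaned.dump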

Between tags and a few other stubborn revisions that wouldn’t come across (including one where even svn log wouldn’t succeed), I manually reconstructed 100 revisions like that. The upside was that it provided an opportunity to clean out some botched releases (due to the SVNBridge /tags/ issues) and branches that had never been used.

So the process was to dump as many revisions as possible, then apply them to a test repository, check it out, make the required modifications, and repeat. I captured all of this in a shell script so that at any time I could recreate the work repository and reapply all of the dumps and modifications to date. This became useful a few times, as I gradually identified inconsistencies against a checkout of the same revision from Codeplex where I had missed something.
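
The script was essentially a sketch along these lines (the dumps/ and fixups/ layout is illustrative):

#!/bin/sh
# rebuild the work repository from scratch, replaying every dump chunk
# and any recorded manual fix-ups in order
rm -rf work-repo
svnadmin create work-repo
for dump in dumps/*.dump; do    # chunks named so that they glob in order
  svnadmin load work-repo <"$dump"
  fixups="fixups/$(basename "$dump" .dump).sh"   # manual edits captured as a script
  [ -x "$fixups" ] && "$fixups"
done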

This still used a lot of bandwidth – starting at a given revision both reconstructs the path hash for the whole repository at that revision and fetches the “base revision”, which is a checkout of an entire revision, tags and all. So the process took a few days, running intermittently. I also had to start --first-rev at least one revision earlier (and sometimes more) to avoid a cryptic Subversion error message about the “editor drive”.

Properties were also quite quirky through SVNBridge, due to the way they are apparently stored in TFS, as described earlier. Some could not be removed (e.g. a bogus svn:mime-type), and some were set oddly (svn:ignore on a file, svn:eol-style on a directory). I chose to leave these alone and correct them after the import.
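
After the import, fixing them up is just ordinary property edits in a working copy, along these lines (the paths are illustrative):

svn propdel svn:mime-type some/file.cs        # bogus value left behind by SVNBridge
svn propdel svn:ignore some/file.cs           # svn:ignore only belongs on directories
svn propdel svn:eol-style some/directory      # svn:eol-style only belongs on files
svn commit -m "Clean up properties mangled by SVNBridge"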

Some properties went missing, which was part of a larger problem on SVNBridge with copying from an existing revision. If you copy within the working copy and then make a modification before committing, this doesn’t show up as A+ in the svn log output later, but simply as M. The dump knows the file was copied, but not that it was added, so it attempts to modify a non-existent file when being applied. What’s more, this step wipes out some properties that were set on directories.

In some cases I manually applied the revision; in others I made an edit in the dump file from:

Node-action: change

to

Node-action: add

Deleting directories hit snags as well. I’m unsure whether this was a problem in SVNBridge or rsvndump, but it would dump deletions for every path and file, like so:

/tags/npanday-1.2-RC1
/tags/npanday-1.2-RC1/pom.xml
/tags/npanday-1.2-RC1/dotnet
...

When applying the dump, it would successfully delete the first path, then fail on the others that had already been deleted in that first step. I ended up manually removing the nodes for all the later entries in these instances. They can be taken out four lines at a time (the two node lines plus two trailing blank lines):

Node-path: tags/npanday-1.2-RC1/pom.xml
Node-action: delete
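
With GNU sed, and provided the dump chunk contains only the revision doing the delete (so these paths don’t appear in any other nodes), the same edit can be scripted; the address deletes each matching Node-path line plus the three lines that follow it:

sed -i '/^Node-path: tags\/npanday-1.2-RC1\//,+3d' $REV1-$REV2.dump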

Final manipulations

Aside from the Codeplex quirks, for NPanday we needed to make some further manipulations. First, we changed the usernames to line up with their final accounts at the ASF, by repeatedly rewriting the svn:author revision property.
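
Scripted, that amounts to walking the revisions and rewriting the property; the account names here are purely illustrative, and the work repository needs a pre-revprop-change hook that allows the change:

REPO=file://$PWD/work-repo
for r in $(seq 1 $(svnlook youngest work-repo)); do
  old=$(svn propget svn:author --revprop -r$r $REPO)
  case "$old" in
    'SND\someuser_cp') new='their-asf-id' ;;   # Codeplex account -> ASF account
    *) continue ;;
  esac
  svn propset svn:author "$new" --revprop -r$r $REPO
done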

The dump was also loaded onto another partial Subversion repository that contained some intermediate history between leaving the incubator originally and arriving at Codeplex.

Loading to an existing Subversion repository and path

After all this was eventually done and there was a repository that matched the history of the Codeplex one, it needed to be dumped so it could be loaded into the ASF repository.

Normally, this would be a simple:

svnadmin dump --incremental --deltas work-repo >npanday.dump

However, the objective was to load this onto a path that already existed, because we wanted continuity with the history from the point at which the project was originally forked from the incubator.

To achieve this, I identified the revision in the dump that matched the content already in the ASF repository, which, due to the initial creation of branches and tags, was revision 4. I then dumped everything after it using:

svnadmin dump --incremental -r5:HEAD work-repo >npanday.dump

Originally, --deltas had been included to reduce the size, but we found that this caused checksum problems, possibly due to different line endings between r4 and the original in the ASF repository.
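
For reference, on the receiving side a dump like this is loaded with --parent-dir so that the paths land under the existing directory; the repository path here is hypothetical:

svnadmin load --parent-dir incubator/npanday /path/to/asf-repo <npanday.dump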

At last, this yielded a dump that could be loaded into the ASF repository, and the results can be seen here: http://svn.apache.org/viewvc/incubator/npanday. You can now see the historical continuity in files such as http://svn.apache.org/viewvc/incubator/npanday/trunk/pom.xml.

Conclusion

This took considerably more work than anticipated when we originally thought it would be a good idea to retain the history.

I found that there wasn’t a lot of information about these topics on the web, so I hope this post will help to expand that for those that might face this challenge in the future. I’ve also found that editing Subversion dump files (when not in delta-mode) is reasonably straightforward.

Interestingly, I’ve learned that Subversion 1.7 will include the ability to do a remote dump (svnrdump); however, I don’t believe this will work where svnsync is not supported (as was the case here), or support sub-paths as rsvndump does.

Updated Multi-module Support for Maven Release Plugin

Last week Dennis started things moving to have another release of the Maven Release Plugin. The release process should start very soon, so please join us on dev@maven.apache.org to help test it!

This is certainly a nice one to have out the door, not only because of the length of time since the last release, but also because it fixes some important bugs (Subversion 1.6 support, for starters) and improves multi-module support.

Having been bitten by the latter category myself very recently, I took the opportunity to get a couple of changes in.

Support for flat directory multi-module projects

This highly requested support was actually added by Deng way back in May last year, but it was only recently that I started using the new version of the plugin and discovered a small corner case, so I jumped in and made a couple of improvements and fixes.

While I would always recommend using a typical hierarchical Maven multi-module project, there are a number of existing projects using the flat structure, particularly in non-Java environments. It’s good that the release plugin can now support anything with a common trunk.

This means that projects like the following will now release correctly (run from the parent directory):

.
|-- release-parent
|   `-- pom.xml
|-- release-module2
|   `-- pom.xml
`-- release-module1
    `-- pom.xml

Not requiring artifacts to be in the local repository before releasing

This controversial issue has popped up a number of times and proven to be a real nuisance in releases: a multi-module project needed to be built locally before it could be released (counting the preparation build, that makes three full builds!), or at best it spouted a large number of warnings about missing dependencies on the artifacts it was yet to build.

In the end we decided to revert to the original behaviour and accept the limitations that come with it, while making the typical release faster and easier. The release:prepare-with-pom goal has been added to cater for the use case for which the dependency resolution was originally put in place. With this intended to be the 2.0 release of the plugin, we can stick to this behaviour going forward.
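
In practice the difference is just which goal you run; a sketch:

# the default goal no longer requires the module artifacts to be built locally first
mvn release:prepare
# opt in to the previous behaviour, which fully resolves the POMs up front
mvn release:prepare-with-pom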

Looking further ahead, Maven 3.0 adds capabilities for plugins to operate on their modules without building them first, which should allow a unified and enhanced release:prepare goal once more, but in the meantime we’ve opted for the best solution for the majority of Maven users today.

Working around --non-interactive problems in Leopard’s Subversion

Apparently, the --non-interactive flag is broken in Subversion as distributed with Leopard, and a fix is not yet available. Bad news for Maven users wanting to use any of the SCM tools.

Hopefully a fix will be available either through an update or a version of Subversion that can be compiled from source, but in the meantime I’ve put this shell script ahead of svn in my PATH:

Updated 15 Mar 08: Wendy pointed out you need to chop the username too. Now it’s even more hacky.

#!/bin/sh

# Strip the options that the Leopard-supplied svn chokes on; note that
# --username takes a value, so its argument has to be dropped as well.
while [ "$1" = "--non-interactive" -o "$1" = "--username" ]; do
  if [ "$1" = "--username" ]; then
    shift
  fi
  shift
done

/usr/bin/svn "$@"

Mmm, hacky.