Wednesday, October 16, 2013

Committing content to RTC SCM with the SDK

Now comes the fun part: committing changes to Rational Team Concert's SCM. We're going to write a program that commits new content to a file in RTC without loading it first. This isn't possible in the Eclipse, Visual Studio, or command line interfaces.

Note: This post uses internal API. It may change at any moment, stranding you on a specific version of RTC. 

You can download a zip of the Eclipse project, or load it from JazzHub.

Relevant Architecture

As you know, RTC SCM has repository workspaces. Workspaces record the directory structure of source code in the repository. We refer to files and folders in the workspace as items. Whenever a user changes an item, a change set is created that moves the item from a before state to an after state. 

If the user modifies the content of a file, the before state in the change set has the old content while the after state has the new content. 

Internally, items have state ids that are used to uniquely identify the before and after states.

Change sets are either active or complete. An active change set can be given new after states, while a complete change set cannot have new after states associated with it.

To avoid ambiguity, an item can only appear in one active change set per workspace. In other words, if I want to commit a new change to some item that is already in an active change set, I must commit to the existing change set.

Object Model

In this example we're going to be diving deeper into the guts of the Jazz platform than usual, so you'll need to know a little about how it works. Here's a quick sketch.

The RTC server stores objects. Every object has a unique item ID as well as a unique state ID. When one object references another, it does so with a handle. The handle consists of the item ID and, if the object cares about a specific version, the state ID. Implicitly both the object and handle have a type. 

To minimize bandwidth consumption, the RTC server and client usually exchange handles. But eventually they need to exchange real data. When that happens, the client will fetch the full object. In our example, when the client wants to learn more about a change set, it will fetch that change set. 

Fetched objects are immutable. The client cannot change them without getting a working copy first. Fields on the working copy may be set as appropriate, then the client saves it back to the RTC server. When the client modifies the object representing a file, it creates a working copy of that file. 

We'll see more and more of handles, fetches, and working copies as we explore this sample.
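
To make the handle, fetch, and working copy pattern concrete before we meet the real classes, here is a minimal sketch in plain Java. It is a toy model, not RTC API: the names ItemModelSketch, Handle, FullItem, WorkingCopy, and Server are invented for illustration, and the toy server only remembers the latest state of each item.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model of handles, full items, and working copies. None of this is RTC API. */
public class ItemModelSketch {

    /** A handle identifies an item and, optionally, one specific state of it. */
    static final class Handle {
        final UUID itemId;
        final UUID stateId;

        Handle(UUID itemId, UUID stateId) {
            this.itemId = itemId;
            this.stateId = stateId;
        }
    }

    /** A fetched item. Every field is final, so it cannot be changed in place. */
    static final class FullItem {
        final Handle handle;
        final String name;

        FullItem(Handle handle, String name) {
            this.handle = handle;
            this.name = name;
        }

        /** Returns a mutable copy that can be edited and then saved. */
        WorkingCopy getWorkingCopy() {
            return new WorkingCopy(handle.itemId, name);
        }
    }

    /** A mutable copy of an item's fields. */
    static final class WorkingCopy {
        final UUID itemId;
        String name;

        WorkingCopy(UUID itemId, String name) {
            this.itemId = itemId;
            this.name = name;
        }
    }

    /** Stand-in for the server: keeps only the latest state of each item. */
    static final class Server {
        private final Map<UUID, FullItem> latest = new HashMap<>();

        Handle create(String name) {
            Handle handle = new Handle(UUID.randomUUID(), UUID.randomUUID());
            latest.put(handle.itemId, new FullItem(handle, name));
            return handle;
        }

        /** Exchanges a lightweight handle for the full item (latest state only). */
        FullItem fetch(Handle handle) {
            return latest.get(handle.itemId);
        }

        /** Saving keeps the item ID but assigns a fresh state ID. */
        Handle save(WorkingCopy copy) {
            Handle handle = new Handle(copy.itemId, UUID.randomUUID());
            latest.put(copy.itemId, new FullItem(handle, copy.name));
            return handle;
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        Handle before = server.create("text.txt");

        // Fetch the full item, take a working copy, edit it, and save it back.
        WorkingCopy copy = server.fetch(before).getWorkingCopy();
        copy.name = "renamed.txt";
        Handle after = server.save(copy);

        System.out.println("same item:  " + before.itemId.equals(after.itemId));
        System.out.println("same state: " + before.stateId.equals(after.stateId));
    }
}
```

The point of the sketch is the shape of the round trip: the item ID survives the save, while the state ID changes.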

Pseudocode 

Our uploader will perform the following steps:
  1. Find the repository workspace and component named in the arguments. 
  2. Find the file item in the workspace. 
  3. Upload the file content to the repository.
  4. Choose a change set to record the modification. 
  5. Tell the repository to set the after state of our change set to be the uploaded content. 
You'll notice that step three is a little weird: we upload content and then create the change set, rather than doing it in one step. Welcome to the land of implementation details. 

API and SDKs

This post uses internal API. It may change at any moment, meaning that any code you write will be tied to a specific version of RTC. So don't get upset if that happens. In practice, this code is pretty stable, so it probably won't happen any time soon, but...

We're going to be committing from the client, so we need the following classes:
  • IWorkspaceConnection - the logical representation of a repository workspace available in the client side SDK. It provides handy caches and methods simplifying workspace access. 
  • IFileContentManager - provides read/write access to file content stored in the RTC SCM. (this is not supported API)
  • IFileItem - a file in RTC SCM. (this is not supported API)
  • IConfigurationOp - describes a change to an item in a repository workspace. The repository uses these to build change sets. 
We're using a 4.0.3 version of the server and SDK. Nothing here is specific to that version, so it should work with newer and older versions. Notice I say should.

Implementation

Our example program will take a number of arguments, including:
  1. the workspace to commit to
  2. the component to commit to
  3. the path in the repository workspace
  4. the path to the new file content
I'm not going to cover resolving the workspace and component names, since we've covered that before. Instead, we'll start on line 95, where we resolve the remote path.

Resolving the Repository Path

In order to commit changes to an existing item, we need to find the item's ID. Humans, the frail beings that we are, refer to files by path. But RTC doesn't care about paths and refers to files by their item ID. So we need to find the IFileItem for the path the user gave us on the command line. 

We do that in findRepositoryPath() by splitting the user's path into segments, then asking the IWorkspaceConnection for the directory structure of the component. The structure is represented by an IConfiguration, which allows us to query the item at the user's path:

private IFileItemHandle findRepositoryPath(IWorkspaceConnection wsConn, IComponent comp, String remotePath) throws TeamRepositoryException {
 String[] path = remotePath.split("(/|\\\\)");
 
 IConfiguration config = wsConn.configuration(comp);  
 IVersionableHandle itemAtPath = config.resolvePath(comp.getRootFolder(), path, null);
 
 if (itemAtPath == null) {
  throw new RuntimeException("Could not resolve path " + Arrays.asList(path));
 }
  
 if (itemAtPath instanceof IFileItemHandle) {
  return (IFileItemHandle) itemAtPath;
 }
  
 // Put the explanation in the exception itself so callers see why we failed
 throw new RuntimeException("We only like files, not " + itemAtPath.getClass().getSimpleName());
}

Line 143 performs the path query. The first argument is the root folder, which is the root of the component, and the second argument is the path specified by the user. The third argument is an IProgressMonitor - we don't care about those for this example.
The return value is a handle to an IFileItem, which carries the item type, the item id, and state id. Our example is limited to dealing with files, so we use an instanceof check on line 149 to ensure that it's a file. Repository workspaces can also hold folders (represented by IFolder) and symbolic links (represented by ISymbolicLink) but those would complicate the example, so we're ignoring them.
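
One detail of the split() call is easy to check without a server. The sketch below uses only the JDK and shows what the regular expression from findRepositoryPath() produces for mixed separators and for a leading slash. The PathSegments class and its toSegments() helper are hypothetical; whether resolvePath() tolerates empty segments isn't covered in this post, so the filtering is a defensive assumption rather than a requirement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PathSegments {

    /** Splits on '/' or '\' and drops empty segments, such as the one a leading slash creates. */
    static String[] toSegments(String remotePath) {
        List<String> segments = new ArrayList<>();
        for (String segment : remotePath.split("(/|\\\\)")) {
            if (!segment.isEmpty()) {
                segments.add(segment);
            }
        }
        return segments.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Mixed separators split cleanly: [src, main, text.txt]
        System.out.println(Arrays.toString("src\\main/text.txt".split("(/|\\\\)")));

        // A leading separator yields an empty first segment: [, src, text.txt]
        System.out.println(Arrays.toString("/src/text.txt".split("(/|\\\\)")));

        // The helper filters the empty segments out: [src, text.txt]
        System.out.println(Arrays.toString(toSegments("/src//text.txt")));
    }
}
```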

Uploading New Content

We upload content with the IFileContentManager. In RTC, content knows a few things about itself, namely the encoding, line delimiter (for text files), and an optional previous version. Since this is an example, we hardcode reasonable values: 
// Upload content
System.out.println("Uploading content of " + contentSource.getAbsolutePath());
IFileContentManager contentManager = FileSystemCore.getContentManager(repo);

IFileContent content = contentManager.storeContent("UTF-8", FileLineDelimiter.LINE_DELIMITER_NONE, new FileInputStreamProvider(contentSource), null, null);
The content is read multiple times during the upload, so we can't pass around an InputStream, since the first close() would render it useless. Instead, the caller passes in a factory that allows the stream to be repeatedly opened.

The multiple reads are due to a couple of implementation decisions: 
  1. Our content transport layer is unadulterated HTTP 1.0. In order to use Connection: Keep-Alive, we need to know the length of the stream before we upload it. We don't use content chunking. I don't know why. 
  2. We can't use the file size to determine the length, because we normalize *nix/DOS line endings on upload. Our only option is to walk the file to determine the length. 
But callers don't need to know that. Instead, they just have to subclass AbstractVersionedContentManagerInputStreamProvider and define reasonable methods. The FileInputStreamProvider is probably the minimal possible implementation. You can see it on line 49 of TrivialCommit.java.
The upload occurs as a single call and returns an IFileContent object. The content object is a client-side reference that is used to uniquely identify file text in the repository. 
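
To see why a reopenable source is needed, here is a minimal sketch of the two-pass idea using only the JDK. It is not the RTC implementation: StreamFactory, copyNormalized(), and send() are invented names, the "upload" is just a byte array, and the normalization is assumed to be a simple CRLF to LF conversion on an ASCII-compatible encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReopenableStreamSketch {

    /** Hypothetical stand-in for a stream provider: every call opens a fresh stream. */
    interface StreamFactory {
        InputStream open() throws IOException;
    }

    /**
     * Copies the stream to out, turning CRLF into LF, and returns the number of
     * bytes produced. Pass null for out to count without writing.
     */
    static long copyNormalized(InputStream in, OutputStream out) throws IOException {
        long count = 0;
        boolean pendingCr = false;
        int b;
        while ((b = in.read()) != -1) {
            if (pendingCr && b != '\n') {
                // The held-back CR was not part of a CRLF pair, so keep it.
                if (out != null) out.write('\r');
                count++;
            }
            pendingCr = (b == '\r');
            if (!pendingCr) {
                if (out != null) out.write(b);
                count++;
            }
        }
        if (pendingCr) {
            if (out != null) out.write('\r');
            count++;
        }
        return count;
    }

    /** Reads the source twice: once to learn the length, once to send the bytes. */
    static byte[] send(StreamFactory factory) {
        try {
            long declaredLength;
            try (InputStream first = factory.open()) {
                declaredLength = copyNormalized(first, null);
            }
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (InputStream second = factory.open()) {
                long sent = copyNormalized(second, sink);
                if (sent != declaredLength) {
                    throw new IOException("Content changed between reads");
                }
            }
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] source = "one\r\ntwo\r\n".getBytes(StandardCharsets.UTF_8);
        byte[] sent = send(() -> new ByteArrayInputStream(source));
        // prints "10 bytes on disk, 8 bytes sent"
        System.out.println(source.length + " bytes on disk, " + sent.length + " bytes sent");
    }
}
```

A single InputStream could not serve both passes, which is why the caller hands over something that can open the stream again.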

Creating the After State

There are two arcane steps we must take before creating the 'after' state for our change set: we need to inflate the IFileItemHandle from findRepositoryPath() into a full item, and then get a working copy of that full item.

IFileItem fileItem = (IFileItem) wsConn.configuration(comp).fetchCompleteItem(fileHandle, null);
fileItem = (IFileItem) fileItem.getWorkingCopy();

As we said above: items are passed between the RTC server and client as a handle consisting of the item id, the state id, and the item type. Handles can be considered cross-network pointers: they're a lightweight representation that allows programmers to refer to the item without worrying about its properties. In this case, we care about the IFileItem itself, so we get the full representation on line 106 by fetching it with IConfiguration.fetchCompleteItem().

Full items are immutable, so we have to ask the item for a working copy of itself on line 107 before we can change it. Once we have the working copy, we make changes to that. 

The handle/full-item/working-copy trichotomy gives us some powerful tools. 
  • We can pass handles across the network quickly, only fetching the full version of the ones we care about. 
  • We can batch fetching handles, to minimize the number of network operations.
  • Immutable full items allow long-lived portions of RTC to assume that the properties of the item haven't changed. 
  • Events fired when working copies are saved allow the RTC UI to update appropriately. 

Recording the Change

Our changes to the IFileItem are minor: we just update the content and modification date. The other fields keep the previous values.
fileItem.setContent(content);
fileItem.setFileTimestamp(new Date());

The working copy of the IFileItem is the 'after' state of the file change we want to make. To get the 'after' state into a change set we create an IConfigurationOp containing the working copy. The configuration op is used to inform the repository that a change set should be updated.

// Save the change into a change set
IConfigurationOpFactory opFactory = wsConn.configurationOpFactory();
ISaveOp saveOp = opFactory.save(fileItem);

We have IConfigurationOps because we want a common way of talking about item modifications. Aside from the milquetoast ISaveOp, we have ops that merge conflicts, delete items, or remove them from change sets.

We're almost done. We just need to find a change set to record the after state.

Finding the Change Set

In a truly trivial example, we would create a new change set and commit into that. But there's a restriction on commit: an item can only appear in one active change set per workspace. If we try committing it to another change set, the RTC repository will throw an exception. This use case is fairly common, so we'll handle it in selectChangeSetModifying().

private IChangeSetHandle selectChangeSetModifying(IWorkspaceConnection wsConn, IComponent comp, IFileItemHandle fileHandle) throws TeamRepositoryException {
 List<IChangeSetHandle> activeChangeSets = wsConn.activeChangeSets(comp);
  
 @SuppressWarnings("unchecked")
 List<IChangeSet> changes = wsConn.teamRepository().itemManager().fetchCompleteItems(activeChangeSets, IItemManager.DEFAULT, null);
 
 for (IChangeSet cs : changes) {
  for (IChange change : (List<IChange>)cs.changes()) {
   if (fileHandle.sameItemId(change.item())) {
    return cs;
   }
  }
 }
 
 return wsConn.createChangeSet(comp, null);
}

To avoid the exception, we walk the set of active change sets and look for one with a change to our IFileItem. The IWorkspaceConnection knows the list of active change sets, but only records them as handles, so we convert the IChangeSetHandles into full items with the IItemManager on line 240.

The IChangeSet expresses changes as IChange objects. Each of those modifies a single item, so we walk those to see if our IFileItem is already being modified. We know that only one change set may modify an item at a time, so we're safe to return when we find the first match on line 245.

There's always the possibility that there aren't any active change sets modifying our IFileItem, so line 250 creates a new IChangeSet if necessary.
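
The selection logic is easier to follow when stripped of the SDK types. The following sketch is a toy model in plain Java, not RTC API: ChangeSet and Workspace are invented classes, items are identified by plain strings, and commit() throws an IllegalStateException as a stand-in for whatever exception the real repository raises.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ChangeSetSelectionSketch {

    /** Toy change set: a name, an active flag, and the IDs of the items it modifies. */
    static final class ChangeSet {
        final String name;
        final Set<String> itemIds = new HashSet<>();
        boolean active = true;

        ChangeSet(String name) {
            this.name = name;
        }
    }

    /** Toy workspace that owns a list of change sets. */
    static final class Workspace {
        final List<ChangeSet> changeSets = new ArrayList<>();

        /** Reuses the active change set that already modifies the item, else creates one. */
        ChangeSet selectChangeSetModifying(String itemId) {
            for (ChangeSet cs : changeSets) {
                if (cs.active && cs.itemIds.contains(itemId)) {
                    return cs;
                }
            }
            ChangeSet created = new ChangeSet("cs-" + (changeSets.size() + 1));
            changeSets.add(created);
            return created;
        }

        /** Enforces the rule: an item may appear in only one active change set. */
        void commit(ChangeSet target, String itemId) {
            if (!target.active) {
                throw new IllegalStateException(target.name + " is complete");
            }
            for (ChangeSet cs : changeSets) {
                if (cs != target && cs.active && cs.itemIds.contains(itemId)) {
                    throw new IllegalStateException(itemId + " is already in " + cs.name);
                }
            }
            target.itemIds.add(itemId);
        }
    }

    public static void main(String[] args) {
        Workspace ws = new Workspace();

        ChangeSet first = ws.selectChangeSetModifying("file-1");
        ws.commit(first, "file-1");

        // A second commit to the same item lands in the same active change set.
        ChangeSet second = ws.selectChangeSetModifying("file-1");
        System.out.println("reused: " + (first == second));

        // Once the change set is complete, a new one is created instead.
        first.active = false;
        ChangeSet third = ws.selectChangeSetModifying("file-1");
        System.out.println("new change set: " + (third != first));
    }
}
```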


Committing the Change

We have all of the parts we need: an ISaveOp describing the after state of our item, a change set to record the change, and a workspace containing the change set. Our commit is almost anticlimactic: 

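// Save the change set. 'ops' is the collection of configuration ops to commit;
// here it holds only the ISaveOp created above.
System.out.println("Committing");
wsConn.commit(cs, ops, null);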


After the commit is over, we change the comment on the change set and complete it. We do that to make our change set easier to identify. In production code we would (probably) leave the change set open and we certainly wouldn't create such a pointless comment, since the completion time of the change set already has that information. 

Running the Program

The program takes a whopping seven arguments: the first three are the repository URI, user name and password. The next three are the repository workspace name, component name, and path of the item to save to. The last argument names the file that contains the bytes we want to commit.

Let's put together a simple example:
  1. Start your repository and connect to it with the RTC Eclipse client.
  2. Create a trivial repository workspace that has a few directories then open it in the Repository Files view. Mine looks like:

    The file text.txt is the target of our commit. 
  3. We'll use the Eclipse 'Trivial Commit' launch. You need to modify the paths and arguments before you can use it. You can do that by opening the Run menu and choosing Run Configurations. Edit the 'Trivial Commit' launch and modify the arguments as appropriate:

    The arguments are: repository URI, username, password, workspace name, component name, path of the item to commit in the repository workspace, and the path of the content to write to the remote workspace. 
  4. My initial content for text.txt is an empty file, while the content of /tmp/sample.txt is 'hello commit world'. To verify your content, open the target file from the Repository Files view in Eclipse. 
  5. Run the launch. It should chug for a few moments before performing the commit. 
  6. Refresh the Repository Files view and open text.txt.
  7. Et voilà!


    The file's history contains a change set with our automatically created comment.

Now you have a working commit using the RTC SDK. The sample demonstrates how to find an item in a repository workspace, traverse active change sets to find one that modifies a specific item, and add a change to a change set. Along the way you've learned a little more about RTC's item model. 

As always, comments are welcome, and the source code is available on JazzHub.

Monday, August 12, 2013

Getting your stuff - using the RTC SDK to zip a repository workspace

Have you ever wanted to create a zip archive of a repository workspace? This post describes how to use the RTC SDK (and some undocumented internals) to copy the contents of a repository workspace into a zip file.

Let's start with some background. A repository workspace contains components, each of which has a configuration - which is a ten dollar word for "file tree." The configuration provides access to the structure of the tree and has pointers to the file content.

A program that zips up a remote workspace needs to:
  1. log into the repository,
  2. find the workspace to zip,
  3. get the components in the workspace,
  4. get the configuration of each component, and finally
  5. walk the file tree to write each directory/file to our zip.
This post uses unsupported API. It is likely that the APIs will change without warning in future iterations of RTC - you should use the Source Control command line tool instead. You are using these APIs at your own risk.
Each step is addressed in its own section. The code samples are available on JazzHub or can be downloaded as a zip. You are expected to have a working development environment with the RTC SDK and RTC server configured properly.

Logging in to the repository

All of the information we're interested in is stored on the RTC repository. In order to access it, our program needs to log in:
ITeamRepository repo = TeamPlatform.getTeamRepositoryService().getTeamRepository(uri);
  
repo.registerLoginHandler(new MyLoginHandler(username, pw));

repo.login(null);
The MyLoginHandler class is included in the sample project. The username and pw are configured beforehand.

Finding the workspace

For our example, we find the workspace by searching for one with a specific name using the IWorkspaceManager#findWorkspaces() method.

The interesting classes here are the SCMPlatform singleton, which allows us to get an IWorkspaceManager, and the IWorkspaceSearchCriteria. The workspace manager answers simple queries about workspaces and provides access to IWorkspaceConnections, which we will learn more about below.

IWorkspaceManager mgr = SCMPlatform.getWorkspaceManager(repo);


// Find the named workspace
IWorkspaceSearchCriteria cri = IWorkspaceSearchCriteria.FACTORY.newInstance();
cri.getFilterByOwnerOptional().add(repo.loggedInContributor());
cri.setExactName(wsName);
List<IWorkspaceHandle> findWorkspaces = mgr.findWorkspaces(cri, 2, null);

if (findWorkspaces.size() == 0) {
 System.err.println("Couldn't find any workspaces named \"" + wsName + "\"");
 return false;
}

if (findWorkspaces.size() > 1) {
 System.err.println("Multiple workspaces named \"" + wsName + "\"");
 return false;
}
The IWorkspaceSearchCriteria allows us to build a query that matches a number of workspaces. The query fields are implicitly and'ed together to limit the workspaces returned. In an ideal world, we'll find exactly one workspace that matches our criteria.

Searching by workspace name is useful for our example, but it isn't the safest thing to do in a production environment, since there could be multiple workspaces with the same name. In production UI code, we would present the user with a choice of workspaces. If the code were headless, we would use the ID of the workspace we care about.

Because we want a modicum of correctness, our example ensures that there is exactly one visible workspace with the given name (lines 68-76).

The findWorkspaces() method returns a list of handles to the workspaces matching the criteria. A handle is a lightweight object that identifies an item in the repository. We'll use the handle later on to query the workspaces.

Getting the components in the workspace


Once we've found the IWorkspaceHandle, we need a richer representation to query the file tree. We do that by converting the handle into an IWorkspaceConnection (see line 78, below). The IWorkspaceConnection provides operations on a repository workspace, and caches information about the workspace.

Now that we have the connection, we start digging into the structure of the workspace. The topmost layer in the logical tree consists of components. Components split remote workspaces into logical groupings of files and folders. They aren't normally represented in the local filesystem when loaded, so our zip creator won't record them in the zip.

IWorkspaceConnection wsConn = mgr.getWorkspaceConnection(findWorkspaces.get(0), null);

// Start walking the workspace contents
IFileContentManager contentManager = FileSystemCore.getContentManager(repo);

File base = new File(System.getProperty("user.dir"));

FileOutputStream out = new FileOutputStream(new File(base, wsName + ".zip"));
try {
 ZipOutputStream zos = new ZipOutputStream(out);
 
 for (IComponentHandle compHandle : (List<IComponentHandle>)wsConn.getComponents()) {
  IConfiguration compConfig = wsConn.configuration(compHandle);

  // Fetch the items at the root of each component. We do this to initialize our 
  // queue of stuff to download.
  Map<String, IVersionableHandle> handles = compConfig.childEntriesForRoot(null);
  List<IVersionable> items = compConfig.fetchCompleteItems(new ArrayList<IVersionableHandle>(handles.values()), null);

  loadDirectory(contentManager, compConfig, zos, "", items);
 }
 
 zos.close();

} finally {
 out.close();
}
On line 89 we loop over each of the components, getting an IConfiguration. The configuration encapsulates the file/folder structure in the repository workspace, so we use that to walk the remote filesystem. The first part of the walk is on line 94 where we get the handles of the component's root items. (Note that we're dealing with root items: there could be files and symlinks at the top of the component hierarchy, as well as directories.)

Walking the file tree to fetch content


We're finally here! The fun part that involves getting file content from the repository. Unfortunately, this is also where we diverge from the supported API. In our last code snippet, you'll notice that we got an IFileContentManager on line 81. Sadly, that isn't part of the API anyone outside of the SCM Core is supposed to use. If you do use it, be aware that the class could change in future releases. You use it at your own risk.

With the legalese/honesty out of the way, let's look at how we use the forbidden API. Our loadDirectory method is called recursively to write the content of each directory into the zip file. It takes a list of IVersionable items as an argument. IVersionable is the common supertype of the things that live in a filesystem: files, folders, or symlinks. It is possible that other types of items could exist in the configuration - but let's pretend they don't, because, for the most part, they won't.

loadDirectory() loops over each versionable and either creates a directory in the zip or writes the file content into the zip. In the case of folders, it gets the children from the configuration (line 135), and then recursively calls itself:

if (v instanceof IFolder) {
 // Write the directory
 String dirPath = path + v.getName() + "/";
 zos.putNextEntry(new ZipEntry(dirPath));
 
 @SuppressWarnings("unchecked")
 Map<String, IVersionableHandle> children = compConfig.childEntries((IFolderHandle)v, null);
 @SuppressWarnings("unchecked")
 List<IVersionable> completeChildren = compConfig.fetchCompleteItems(new ArrayList<IVersionableHandle>(children.values()), null);

 loadDirectory(contentManager, compConfig, zos, dirPath, completeChildren);
}

More interesting things happen in the IFileItem block:

else if (v instanceof IFileItem) {
 // Get the file contents and write them into the directory
 IFileItem file = (IFileItem) v;
 zos.putNextEntry(new ZipEntry(path + v.getName()));
 
 InputStream in = contentManager.retrieveContentStream(file, file.getContent(), null);
 try {
  byte[] arr = new byte[1024];
  int w;
  while (-1 != (w = in.read(arr))) {
   zos.write(arr, 0, w);
  }
 } finally {
  // Always release the content stream, even if the copy fails
  in.close();
 }
 
 zos.closeEntry();
}

The fun part is on line 146 where we ask the (forbidden) IFileContentManager for the content of the file. We pass in the IFileItem as well as its IContent, which is a pointer to the blob of bytes stored in the RTC repository. The remainder of the block is anticlimactic: copying the content with a regular Java stream idiom.
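
If you want to experiment with the recursive walk without a server, the zip half of the problem can be exercised on its own. The sketch below uses only java.util.zip and an in-memory tree. ZipTreeSketch, Node, writeTree(), and zipAndList() are invented names that stand in for the configuration and loadDirectory(); no RTC classes are involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipTreeSketch {

    /** Toy tree node: folders have children, files have content. */
    static final class Node {
        final String name;
        final byte[] content; // null marks a folder
        final List<Node> children;

        private Node(String name, byte[] content, List<Node> children) {
            this.name = name;
            this.content = content;
            this.children = children;
        }

        static Node folder(String name, Node... children) {
            return new Node(name, null, Arrays.asList(children));
        }

        static Node file(String name, String text) {
            return new Node(name, text.getBytes(StandardCharsets.UTF_8), new ArrayList<Node>());
        }
    }

    /** Writes each node under the given path prefix, recursing into folders. */
    static void writeTree(ZipOutputStream zos, String path, List<Node> nodes) throws IOException {
        for (Node node : nodes) {
            if (node.content == null) {
                // Directory entries end with a slash and carry no data.
                String dirPath = path + node.name + "/";
                zos.putNextEntry(new ZipEntry(dirPath));
                zos.closeEntry();
                writeTree(zos, dirPath, node.children);
            } else {
                zos.putNextEntry(new ZipEntry(path + node.name));
                try (InputStream in = new ByteArrayInputStream(node.content)) {
                    byte[] buffer = new byte[1024];
                    int read;
                    while ((read = in.read(buffer)) != -1) {
                        zos.write(buffer, 0, read);
                    }
                }
                zos.closeEntry();
            }
        }
    }

    /** Zips the tree into memory, then reads the archive back and lists its entry names. */
    static List<String> zipAndList(List<Node> roots) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
                writeTree(zos, "", roots);
            }
            List<String> names = new ArrayList<>();
            try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                ZipEntry entry;
                while ((entry = zis.getNextEntry()) != null) {
                    names.add(entry.getName());
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Node> roots = Arrays.asList(
                Node.folder("src", Node.file("text.txt", "hello commit world")),
                Node.file("README", "hi"));
        // prints [src/, src/text.txt, README]
        System.out.println(zipAndList(roots));
    }
}
```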

Even though the file content portion of this example is forbidden API, the example helps to show how to use our APIs. The pattern of logging in, finding a workspace, and then using the IWorkspaceConnection to perform some operation may be useful in other contexts. The example doesn't begin to get into the practical complexities of getting content (normalizing line endings, storing file properties, handling symlinks), or the problems faced when merging into an existing filesystem.

You can download the full Eclipse project or poke at the project on JazzHub.

Thursday, July 4, 2013

Configuring Eclipse to use the RTC SDK

Last night I was poking around trying to figure out how to write a demo with the Rational Team Concert SDK. My first attempt failed. It wasn't until I found Ralph Schoon's excellent blog post on using the SDK that I finally figured out how to do it.

Since Ralph's post is long and it refers to an even longer PDF, I thought I'd present an abridged explanation:

  1. Download the RTC SDK and server. At the time of writing, 4.0.3 is the most recent release, so you might as well grab that.
  2. Extract the SDK to ~/rtc-sdk.
  3. Start Eclipse on a fresh workspace. 
  4. Set your target platform by:
    1. Open the Eclipse preferences and search for the Target Platform page.

    2. Click "Add" to create a new target platform. Initialize it with "Nothing."

    3. Edit the target platform.

    4. Add an Installation. Point it to ~/rtc-sdk.

    5. Set your target platform as default.
You're done! You can now write programs that use the RTC SDK.

To verify that your Eclipse is properly configured, download the attached project, and copy it into Eclipse. Get your server configured and started, then edit Start.java to use the URI of your server and credentials of the user you created.

You can run the test application with the "api demo" Eclipse launcher. It logs into the server, creates a repository workspace named "meow", and then lists all of the repository workspaces owned by the current user. At the very least, you should see "meow" in the Eclipse Console:
Expected output in a properly configured Eclipse

Sunday, January 27, 2013

When should you start using a source control?

My eyes were immediately drawn to this thread on Ars Technica: "When should I make the first commit to source control?", with answers from Stack Exchange users.

I'm from a team at IBM that is building a source control. So getting my opinion is a bit like asking your stock broker what he or she thinks about the merits of investing on the stock market. Of course I'm convinced the source control should be a transparent part of your development - you should not even have to think about it while you design and edit your code. It should just be there when you're in a bad state and you want to return to a known good one. In other words, you should not even ask yourself when to make the first commit...

A source control doesn't have to be in the way of your dev work. It knows when you save files, and in many IDEs and products out there, it even knows what task or defect you are working on. It can work in the background, safely pushing your changes from your local drive to a remote machine, so you can crash your hard drive or pursue your work from a separate machine in the evening.

A source control is now a basic feature like syntax highlighting, code assist or refactoring. It's a given that whatever you've done, you can review and roll back in time, fork and try something else. Suspend your current work, resume some other on-going work. Work with a buddy, with a team. And when you can do all that without thinking about having to use a source control, then we've succeeded.

The first tip I give about using a source control is to check-in frequently. By that I mean storing your changes frequently in the source control, which isn't the same as giving every one of your changes to your team. Many source controls have a staged approach - you can version control your own changes in a private way then control when you find them useful and good enough to be shared with others.

In the 8 years of RTC Source Control, we tested different source control workflows. In the early days, we were really keen on something called auto check-in. In 1.0, that feature was on by default. You save a file. Bing. That change is safely backed up on the RTC server, in your private workspace. Some of us were wondering - why bother asking the user when to manually check-in? Why not do it automatically for every file that is modified? It turned out to be a very divisive matter. 
  • Some really thought that was the way of the future - just have every change you make automatically be replicated in your backup on the server. 
  • And others really wanted to stay in control as to when their changes leave their local drive and become part of the source control's memory.
I was initially a strong proponent of the auto check-in workflow. I always thought the ones opposing it were a bit like the mathematician Gauss as described by Abel: "he is like the fox, who effaces his tracks in the sand with his tail". Like you don't want others to know all the wrong paths you've explored before you actually perfected the fix for a defect... That's too bad, because e.g. RTC Source Control has a way to highlight the initial and final versions in a change set and make the intermediate versions mostly hidden in regular operations. 

So why did we turn off auto check-in by default? Because many users were hit by some of its drawbacks.
  • Network latency when you work over a poor wifi connection at the airport
  • Big mess when a mistake you make in your IDE refactors 1000 files.
  • Your history is filled with meaningless changes (if you use auto-complete of change sets) or with huge change sets if you forget to help the tool and complete your change sets from time to time
I did experience all these annoyances myself and was convinced that it wasn't appropriate for a majority of users in its current form. It's fantastic for those users who like the convenience and remember to complete their change sets at appropriate stable moments. I don't use auto check-in anymore. But I frequently check-in my changes so that I can easily go back to a previous good state, and in RTC we make it easy with a simple "Check-in all" button. I only deliver my changes to the stream used by my team when it's ready - after a green personal build for example.

If the source control you use makes a commit sound like a complicated, slow and cumbersome task to perform, you're likely missing the greatest strength of a source control. It's invisible to you when things work, and it's there when you need it. So, in conclusion, next time you create a project, put it under version control right away... Version control is good for you even if you don't intend to share with other peers right away.