Make Better Things



I like to make better things.

Introduction to Version Control System

Version Control (aka Revision Control aka Source Control) lets you track your files over time. Why do you care? So when you mess up you can easily get back to a previous working version.

So Why Do We Need A Version Control System (VCS)?

Large, fast-changing projects with many authors need a Version Control System (geekspeak for “file database”) to track changes and avoid general chaos. A good VCS does the following:

  • Backup and Restore. Files are saved as they are edited, and you can jump to any moment in time. Need that file as it was on Feb 23, 2007? No problem.
  • Synchronization. Lets people share files and stay up-to-date with the latest version.
  • Short-term undo. Monkeying with a file and messed it up? Throw away your changes and go back to the “last known good” version in the database.
  • Long-term undo. Sometimes we mess up bad. Suppose you made a change a year ago, and it had a bug. Jump back to the old version, and see what change was made that day.
  • Track Changes. As files are updated, you can leave messages explaining why the change happened (stored in the VCS, not the file). This makes it easy to see how a file is evolving over time, and why.
  • Track Ownership. A VCS tags every change with the name of the person who made it. Helpful for blamestorming (or, more charitably, giving credit).
  • Sandboxing, or insurance against yourself. Making a big change? You can make temporary changes in an isolated area, test and work out the kinks before “checking in” your changes.
  • Branching and merging. A larger sandbox. You can branch a copy of your code into a separate area and modify it in isolation (tracking changes separately). Later, you can merge your work back into the common area.

Shared folders are quick and simple, but can’t beat these features.

Learn the Concepts

Most version control systems involve the following concepts, though the labels may be different.

Basic Setup

  • Repository (repo): The database storing the files.
  • Server: The computer storing the repo.
  • Client: The computer connecting to the repo.
  • Working Set/Working Copy: Your local directory of files, where you make changes.
  • Trunk/Main: The “primary” location for code in the repo. Think of code as a family tree — the “trunk” is the main line.

Basic Actions

  • Add: Put a file into the repo for the first time, i.e. begin tracking it with Version Control.
  • Revision: What version a file is on (v1, v2, v3, etc.).
  • Head: The latest revision in the repo.
  • Check out: Download a file from the repo.
  • Check in: Upload a file to the repository (if it has changed). The file gets a new revision number, and people can “check out” the latest one.
  • Checkin Message: A short message describing what was changed.
  • Changelog/History: A list of changes made to a file since it was created.
  • Update/Sync: Synchronize your files with the latest from the repository. This lets you grab the latest revisions of all files.
  • Revert: Throw away your local changes and reload the latest version from the repository.

Advanced Actions

  • Branch: Create a separate copy of a file/folder for private use (bug fixing, testing, etc). Branch is both a verb (“branch the code”) and a noun (“Which branch is it in?”).
  • Diff/Change/Delta: Finding the differences between two files. Useful for seeing what changed between revisions.
  • Merge (or patch): Apply the changes from one file to another, to bring it up-to-date. For example, you can merge features from one branch into another. (At Microsoft this was called Reverse Integrate and Forward Integrate)
  • Conflict: When pending changes to a file contradict each other (both changes cannot be applied).
  • Resolve: Fixing the changes that contradict each other and checking in the correct version.
  • Locking: “Taking control” of a file so nobody else can edit it until you unlock it. Some version control systems use this to avoid conflicts.
  • Breaking the lock: Forcibly unlocking a file so you can edit it. It may be needed if someone locks a file and goes on vacation.
  • Check out for edit: Checking out an “editable” version of a file. Some VCSes have editable files by default, others require an explicit command.

And a typical scenario goes like this:

Alice adds a file (list.txt) to the repository. She checks it out, makes a change (puts “milk” on the list), and checks it back in with a checkin message (“Added required item.”). The next morning, Bob updates his local working set and sees the latest revision of list.txt, which contains “milk”. He can browse the changelog or diff to see that Alice added “milk” the day before.

How it all works

Checkins

The simplest scenario is checking in a file (list.txt) and modifying it over time.

[Figure: Basic checkin in VCS]

Each time we check in a new version, we get a new revision (r1, r2, r3, etc.). In Subversion you’d do:

svn add list.txt
(modify the file)
svn ci list.txt -m "Changed the list"

The -m flag is the message to use for this checkin.

Checkouts and Editing

In reality, you might not keep checking in a file. You may have to check out, edit and check in. The cycle looks like this:

[Figure: Basic Checkout VCS]

If you don’t like your changes and want to start over, you can revert to the previous version and start again (or stop). When checking out, you get the latest revision by default. If you want, you can specify a particular revision. In Subversion, run:

svn co http://path/to/trunk (check out a working copy at the latest revision)
…edit list.txt…
svn revert list.txt (throw away changes)
svn update -r2 list.txt (get a particular revision of the file)
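To make the check in / check out cycle concrete, here is a minimal in-memory sketch. It is not how Subversion works internally: the `TinyRepo` class and its method names are made up for illustration, and it stores a full copy of every revision to keep the code short.

```python
class TinyRepo:
    """Hypothetical toy repository for a single file (illustration only)."""

    def __init__(self):
        # Each entry is (content, checkin message); index 0 holds r1.
        self.revisions = []

    def check_in(self, content, message):
        """Store a new revision and return its revision number."""
        self.revisions.append((content, message))
        return len(self.revisions)

    def check_out(self, revision=None):
        """Return the content of a revision (the head by default)."""
        if revision is None:
            revision = len(self.revisions)
        return self.revisions[revision - 1][0]

    def log(self):
        """Return the changelog as (revision label, message) pairs."""
        return [(f"r{i}", msg) for i, (_, msg) in enumerate(self.revisions, 1)]


repo = TinyRepo()
repo.check_in(["Milk"], "Added required item.")
repo.check_in(["Milk", "Eggs"], "Changed the list")

head = repo.check_out()        # latest revision
first = repo.check_out(1)      # a particular revision
history = repo.log()
print(head, first, history)
```

Reverting, in this model, is simply calling `check_out()` again and discarding whatever you had edited locally.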

Diffs

The trunk has a history of changes as a file evolves. Diffs are the changes you made while editing: imagine you can “peel” them off and apply them to a file:

[Figure: Basic diff in VCS]

For example, to go from r1 to r2, we add eggs (+Eggs). Imagine peeling off that red sticker and placing it on r1, to get r2.

And to get from r2 to r3, we add Juice (+Juice). To get from r3 to r4, we remove Juice and add Soup (-Juice, +Soup).

Many version control systems (Subversion among them) store diffs rather than full copies of the file. This saves disk space: 4 revisions of a file don’t mean we have 4 copies; we have 1 full copy and 3 small diffs. Pretty nifty, eh? In SVN, we diff two revisions of a file like this:

svn diff -r3:4 list.txt

Diffs help us notice changes (“How did you fix that bug again?”) and even apply them from one branch to another.

Bonus question: what’s the diff from r1 to r4?

+Eggs
+Soup

Notice how “Juice” wasn’t even involved — the direct jump from r1 to r4 doesn’t need that change, since Juice was overridden by Soup.
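The bonus question can be checked with a few lines of Python. This is only a sketch: the revision contents are assumed from the diffs described above (starting from a list that holds just Milk), and `list_diff` is a made-up helper that treats the list as a set of items rather than doing a real line-based diff.

```python
def list_diff(old, new):
    """Return (added, removed) items between two revisions of a list."""
    added = [item for item in new if item not in old]
    removed = [item for item in old if item not in new]
    return added, removed


# Assumed revision contents, reconstructed from the +/- changes in the text.
r1 = ["Milk"]
r2 = ["Milk", "Eggs"]
r3 = ["Milk", "Eggs", "Juice"]
r4 = ["Milk", "Eggs", "Soup"]

print(list_diff(r3, r4))  # (['Soup'], ['Juice'])
print(list_diff(r1, r4))  # (['Eggs', 'Soup'], [])
```

Juice never shows up in the r1 to r4 diff because it is in neither endpoint.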

Branching

Branches let us copy code into a separate folder so we can monkey with it separately:

[Figure: Basic branching in VCS]

For example, we can create a branch for new, experimental ideas for our list: crazy things like Rice or Eggo waffles. Depending on the version control system, creating a branch (copy) may change the revision number.

Now that we have a branch, we can change our code and work out the kinks. (“Hrm… waffles? I don’t know what the boss will think. Rice is a safe bet.”). Since we’re in a separate branch, we can make changes and test in isolation, knowing our changes won’t hurt anyone. And our branch history is under version control.

In Subversion, you create a branch simply by copying a directory to another.

svn copy http://path/to/trunk http://path/to/branch

So branching isn’t too tough of a concept: Pretend you copied your code into a different directory. You’ve probably branched your code in school projects, making sure you have a “fail safe” version you can return to if things blow up.

Merging

Branching sounds simple, right? Well, it’s not — figuring out how to merge changes from one branch to another can be tricky.

Let’s say we want to get the “Rice” feature from our experimental branch into the mainline. How would we do this? Diff r6 and r7 and apply that to the main line?

Wrongo. We only want to apply the changes that happened in the branch! That means we diff r5 and r6, and apply that to the main trunk:

[Figure: Basic merging in VCS]

If we diffed r6 and r7, we would lose the “Bread” feature that was in main. This is a subtle point — imagine “peeling off” the changes from the experimental branch (+Rice) and adding that to main. Main may have had other changes, which is ok — we just want to insert the Rice feature.

In Subversion, merging is very close to diffing. Inside the main trunk, run the command:

svn merge -r5:6 http://path/to/branch

This command diffs r5-r6 in the experimental branch and applies it to the current location. Older versions of Subversion (before 1.5) have no easy way to keep track of which merges have been applied, so if you’re not careful you may apply the same changes twice. With those versions, the usual advice is to keep a changelog message reminding you that you’ve already merged r5-r6 into main; Subversion 1.5 and later record merge history in the svn:mergeinfo property.
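The difference between the right and the wrong diff can be sketched in Python. The list contents below are assumptions chosen to match the Rice and Bread story, and `list_diff` and `apply_diff` are hypothetical helpers, not anything Subversion provides.

```python
def list_diff(old, new):
    """Return (added, removed) items between two revisions of a list."""
    added = [item for item in new if item not in old]
    removed = [item for item in old if item not in new]
    return added, removed


def apply_diff(target, added, removed):
    """Apply a diff to a list: drop removed items, then append added ones."""
    result = [item for item in target if item not in removed]
    result.extend(item for item in added if item not in result)
    return result


# Assumed contents for the revisions named in the text.
branch_r5 = ["Milk", "Eggs", "Soup"]           # branch as first copied from main
branch_r6 = ["Milk", "Eggs", "Soup", "Rice"]   # +Rice on the branch
main_r7 = ["Milk", "Eggs", "Soup", "Bread"]    # +Bread on main in the meantime

# Right: peel off only what changed inside the branch (r5 -> r6).
added, removed = list_diff(branch_r5, branch_r6)
merged = apply_diff(main_r7, added, removed)
print(merged)  # ['Milk', 'Eggs', 'Soup', 'Bread', 'Rice']

# Wrong: diffing main (r7) against the branch (r6) also "removes" Bread.
wrong_added, wrong_removed = list_diff(main_r7, branch_r6)
clobbered = apply_diff(main_r7, wrong_added, wrong_removed)
print(clobbered)  # ['Milk', 'Eggs', 'Soup', 'Rice']
```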

Conflicts

Many times, the VCS can automatically merge changes to different parts of a file. Conflicts can arise when changes appear that don’t gel: Joe wants to remove eggs and replace it with cheese (-eggs, +cheese), and Sue wants to replace eggs with a hot dog (-eggs, +hot dog).

[Figure: Basic conflicts in VCS]

At this point it’s a race: if Joe checks in first, that’s the change that goes through (and Sue can’t make her change).

When changes overlap and contradict like this, the VCS may report a conflict and not let you check in — it’s up to you to check in a newer version that resolves this dilemma. A few approaches:

  • Re-apply your changes. Sync to the latest version (r4) and re-apply your changes to this file: Add hot dog to the list that already has cheese.
  • Override their changes with yours. Check out the latest version (r4), copy over your version, and check your version in. In effect, this removes cheese and replaces it with hot dog.

Conflicts are infrequent but can be a pain. Usually I update to the latest and re-apply my changes.
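How a VCS decides that two changes conflict can be sketched as a three-way comparison against the common base version. This is a deliberate simplification: the hypothetical `merge_lines` below only handles lists of equal length, line by line, whereas real tools also cope with inserted and deleted lines.

```python
def merge_lines(base, mine, theirs):
    """Three-way merge of equal-length line lists.

    Returns (merged, conflicts). A conflicting line is left as None in
    merged and reported as (index, my_version, their_version).
    """
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:        # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:      # only they changed it
            merged.append(t)
        elif t == b:      # only I changed it
            merged.append(m)
        else:             # both changed it, in different ways
            merged.append(None)
            conflicts.append((i, m, t))
    return merged, conflicts


base = ["milk", "eggs", "juice"]
joe = ["milk", "cheese", "juice"]     # -eggs, +cheese
sue = ["milk", "hot dog", "juice"]    # -eggs, +hot dog
print(merge_lines(base, joe, sue))
# (['milk', None, 'juice'], [(1, 'cheese', 'hot dog')])

# Changes to different lines merge automatically.
ann = ["milk", "eggs", "soup"]        # hypothetical third editor: -juice, +soup
print(merge_lines(base, joe, ann))
# (['milk', 'cheese', 'soup'], [])
```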

Tagging

Who would have thought a version control system would be Web 2.0 compliant? Many systems let you tag (label) any revision for easy reference. This way you can refer to “Release 1.0” instead of a particular build number:

[Figure: Basic tagging in VCS]

In Subversion, tags are just branches that you agree not to edit; they are around for posterity, so you can see exactly what your version 1.0 release contained. Hence they end in a stub — there’s nowhere to go.

(in trunk)
svn copy http://path/to/revision http://path/to/tag

A real-life example: managing the Windows source code

  • There’s a main line with stable builds of Windows.
  • Each group (Networking, User Interface, Media Player, etc.) has its own branch to develop new features. These are under development and less stable than main.

You develop new features in your branch and “Reverse Integrate (RI)” to get them into Main. Later, you “Forward Integrate (FI)” to get the latest changes from Main into your branch.

Let’s say we’re at Media Player 10 and IE 6. The Media Player team makes version 11 in their own branch. When it’s ready and tested, there’s a patch from 10 to 11 which is applied to Main (just like the “Rice” example, but a tad more complicated). This is a reverse integration, from the branch to the trunk. The IE team can do the same thing.

Later, the Media Player team can pick up the latest code from other teams, like IE. In this case, Media Player forward integrates and gets the latest patches from main into their branch. This is like pulling in the “Bread” feature into the experimental branch, but again, more complicated.

So it’s RI and FI. Aye aye. This arrangement lets changes percolate throughout the branches, while keeping new code out of the main line. Cool, eh?

In reality, there are many layers of branches and sub-branches, along with quality metrics that determine when you get to RI. But you get the idea: branches help manage complexity. Now you know the basics of how one of the largest software projects is organized.

Memory management in Objective-C (iPhone)

One of the biggest obstacles most people have to get their head around when they first start Objective C / Cocoa development is the memory management – and it’s actually very beautiful when it finally dawns on you how simple it really is (though this may take a couple of proper projects!).

Memory management in C/C++

With traditional C/C++, determining responsibility for clearing up after unused objects is a bit of a nightmare. There is nothing in the language that specifies how this should be approached and it all depends on the communication and conventions used by individual programmers.

Imagine programmer A writes some code (Class A) which creates and allocates an object called Data and passes this object to programmer B for him to use in his code (Class B). Both programmer A and programmer B are using the same object Data. Who should delete the object Data when it is no longer used?

If Class A only creates and initializes the Data and no longer needs it, then Class B can safely delete the object after he is done with Data. However, how can he be sure that Class A no longer needs the Data? If Class A does still need the Data while Class B is using it, how do we know when to delete it? If Class A deletes the pointer when he’s done, Class B might try to reference it and you get a crash. If Class B deletes the pointer when he’s done, Class A might try to reference it and you get another crash. Of course if Class C is introduced which needs to share the same Data, it gets even more complicated.

There are conventions and patterns to handle these situations, essentially a form of communication saying “Hang on, don’t delete that object, I’m still using it” or “Ok, I’m done with it, do with it whatever you will”. That is exactly the reason reference counting was developed, and it is the main memory management technique used in Objective C.

Memory management in Objective C

Objective C uses ‘reference counting’ as its main memory management technique (wikipedia.org/wiki/Reference_counting). Every object keeps an internal count of how many times it’s ‘needed’. The system makes sure that objects that are needed are not deleted, and when an object is no longer needed it is deleted. This may sound like automatic garbage collection (the way it works in Java or AS3; see wikipedia.org/wiki/Garbage_collection_(computer_science)), but it is not.

The main difference is that with automatic GC (Java, AS3 etc.), a separate chunk of code periodically runs in the background to see which objects are still referenced and which are no longer needed. It then deletes any unused objects automatically, with no special handling required from the programmer (apart from making sure all references to objects are removed when not needed). With reference counting, the programmer is responsible for declaring when he needs an object and when he’s done with it, and the object is deleted immediately when it is no longer used, i.e. when its reference count drops to zero (see the Wikipedia links above for more on the matter).

Note: Objective C 2.0 also has an option to enable automatic garbage collection. However, garbage collection is not an option when developing for iPhone, so it’s still important to understand reference counts, object ownership etc.

Object Ownership

It’s important to understand the concept of object ownership. In Objective C, an object owner is someone (or a piece of code) that has explicitly said “Right, I need this object, don’t delete it”. This could be the person (or code) that created the object (e.g. Class A in the example above). Or it could be another person (or code) that received the object and needs it (e.g. Class B in the example above). Thus an object can have more than one owner. The number of owners an object has is also its reference count.

Object owners have the responsibility of telling the object when they are done with it. When they do, they are no longer an owner and the object’s reference count is decreased by one. When an object has no owners left (i.e. no one needs it, reference count is zero), it is deleted.

You can still use an object without being its owner (using it temporarily), but bear in mind that the object might (and probably will) get deleted in the near future. So if you need an object long-term, you should take ownership. If you only need the object there and then (e.g. within the function that you received it, or within that event loop – more on autorelease pools later), then you don’t need to take ownership.

Messages

The main messages you can send an object (regarding memory management) are:

alloc (e.g. [NSString alloc]): This allocates an instance of the object (in this case NSString). It also sets the reference count to 1. You are the only owner of this object. You must release the object when you are done with it. (OFFTOPIC: remember to call an init function on the newly allocated object: [[NSString alloc] init] or [[NSString alloc] initWithFormat:] etc.)

new (e.g. [NSString new]): This is basically a shorthand way of writing [[NSString alloc] init]. So same rules apply as alloc.

retain (e.g. [aString retain]): You call this when you are passed an existing object (memory has already been allocated for it elsewhere), and you want to tell the object that you need it as well. This is like saying “Ok, I need this object, don’t delete it till I’m done.”. The reference count is increased by one, and you are the new owner of this object (along with any other previous owners). You must release the object when you are done with it.

release (e.g. [aString release]): You call this when you are done using an object. You are no longer the owner of the object so the reference count is decreased by one. If reference count is now zero (i.e. no owners left), then the object is automatically deleted and the memory is freed, otherwise the object stays in memory for other owners to use. It is like saying “Right, I’m done with this object, you can delete it if no one else is using it”. If you are not the owner of an object, you should not call this. If you are the owner of an object, you must call this when you are done.

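autorelease (e.g. [aString autorelease]): This schedules a release for later instead of performing it immediately: the object is added to the current autorelease pool and receives a release when that pool is popped. It’s like saying “Ok, I’m done owning this object, but keep it in memory a little longer while I (or whoever I hand it to) do a few things with it, then you can delete it”. You send it to an object you own, typically just before returning it from a method. More on this later in the ‘autorelease pools’ section.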

So when dealing with Objective C pointers/objects, it’s important to remember to send the correct messages. The general rule of thumb is: If you own an object (allocate or retain it), you release it. If you don’t own it (came via convenience method or someone else allocated it), you don’t release it.
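The bookkeeping behind retain and release can be modelled in a few lines of Python. This is only a sketch of the counting rules described above, not of the Objective C runtime: `CountedObject` and its attributes are invented names, and “deallocation” is just a flag.

```python
class CountedObject:
    """Toy model of a reference-counted object (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.retain_count = 1      # like alloc: the creator is the first owner
        self.deallocated = False

    def retain(self):
        """A new owner declares that it needs the object."""
        self._check_alive()
        self.retain_count += 1
        return self

    def release(self):
        """An owner is done; the object dies when no owners remain."""
        self._check_alive()
        self.retain_count -= 1
        if self.retain_count == 0:
            self.deallocated = True

    def _check_alive(self):
        if self.deallocated:
            raise RuntimeError(f"{self.name} was already deallocated")


data = CountedObject("Data")   # Class A creates it: count is 1
data.retain()                  # Class B takes ownership: count is 2
data.release()                 # Class A is done: count is 1, still alive
print(data.deallocated)        # False
data.release()                 # Class B is done: count is 0
print(data.deallocated)        # True
```

Calling `release` once more at this point raises an error in the model, which mirrors why you should never release an object you do not own.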

Convenience methods

Many classes in Cocoa have what’s known as convenience methods. These are class methods (the equivalent of static methods in other languages) used for allocating and initializing objects directly. You are not the owner of the returned objects and they are deleted automatically when the autorelease pool is popped (generally at the end of the event loop, but this depends on you or the app).

E.g. the explicit way of allocating and initializing an NSNumber is:

NSNumber *aNumber = [[NSNumber alloc] initWithFloat:5.0f];

This creates a new instance of NSNumber, initializes it with the ‘initWithFloat’ method, and parameter 5.0f.
aNumber has a reference count of 1. You are the owner of aNumber and you must release it when you are done.
Using a convenience method would be:

NSNumber *aNumber = [NSNumber numberWithFloat:5.0f];

This also creates a new instance of NSNumber, initializes it with the ‘numberWithFloat’ method, and parameter 5.0f.

It also has a reference count of 1. But you are not the owner of aNumber and should not release it. The owner is the NSNumber class and the object will be deleted automatically at the end of the current scope (defined by the autorelease pool, more on this later). For now it’s safe to say you should not release the object, but keep in mind the object will not hang around for long.

Convenience methods generally have the same name as the relevant init function, but with the init replaced by the type of object. E.g. for NSNumber: initWithFloat -> numberWithFloat, initWithInt -> numberWithInt.

An example with NSString:


NSString *aString1 = [[NSString alloc] initWithFormat:@"Results are %i and %i", int1, int2]; // explicit allocation, you are the owner, you must release when you are done

NSString *aString2 = [NSString stringWithFormat:@"Results are %i and %i", int1, int2]; // convenience method, you are not the owner, the object will be deleted when the autorelease pool is popped.

Autorelease Pools

An autorelease pool is an instance of NSAutoreleasePool and defines a scope for temporary objects (objects which are to be autoreleased). Any objects which are to be autoreleased (e.g. objects you send the autorelease message to or created with convenience methods) are added to the current autorelease pool. When the autorelease pool is popped (released) all objects that were added to it are also automatically released. This is a simple way of managing automatic release for objects which are needed temporarily.

E.g. You want to create a bunch of objects for temporary calculations, and instead of keeping track of all the local variables you define and then calling release for all of them at the end of your function, you can create them all with autorelease (or convenience methods) safe in the knowledge that they are going to be released next time the autorelease pool is popped. Note: there is a downside to this which I’ll discuss in the Convenience vs Explicit section.

Autorelease pools can be nested, in which case autorelease objects are added to the latest autorelease pool to be created (the pools are stacked in a Last In First Out type stack).
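The stacking behaviour can be sketched with a toy model in Python. Everything here is hypothetical (the `Obj` and `AutoreleasePool` classes only imitate the behaviour described above, and this sketch only allows draining the innermost pool), but it shows that an autoreleased object always lands in the most recently created pool.

```python
class Obj:
    """Toy object with a reference count (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.retain_count = 1

    def release(self):
        self.retain_count -= 1


class AutoreleasePool:
    """Toy autorelease pool; pools form a last-in, first-out stack."""

    _stack = []  # the most recently created pool is the last element

    def __init__(self):
        self.objects = []
        AutoreleasePool._stack.append(self)

    @classmethod
    def autorelease(cls, obj):
        """Add obj to the current (innermost) pool for a later release."""
        cls._stack[-1].objects.append(obj)
        return obj

    def drain(self):
        """Pop this pool and release everything that was added to it."""
        if AutoreleasePool._stack[-1] is not self:
            raise RuntimeError("this sketch only drains the innermost pool")
        AutoreleasePool._stack.pop()
        for obj in self.objects:
            obj.release()
        self.objects = []


outer = AutoreleasePool()
a = AutoreleasePool.autorelease(Obj("a"))   # goes into outer
inner = AutoreleasePool()
b = AutoreleasePool.autorelease(Obj("b"))   # goes into inner

inner.drain()
counts_after_inner = (a.retain_count, b.retain_count)
print(counts_after_inner)   # (1, 0): only b has been released

outer.drain()
counts_after_outer = (a.retain_count, b.retain_count)
print(counts_after_outer)   # (0, 0)
```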

An example appears in the Autorelease, Convenience vs Explicit section.

Arrays, Dictionaries etc.

Arrays, dictionaries etc. generally retain any objects added to them. (When dealing with 3rd party collection type objects, always check the documentation to see if they retain or not). This means that these collections will take ownership of the object, and you do not need to retain before adding.
E.g. The following code will create a leak:

-(void) addNumberToArray:(float)aFloat {
 NSNumber *aNumber = [[NSNumber alloc] initWithFloat:aFloat]; // reference count is now 1, you are the owner
 [anArray addObject: aNumber]; // reference count is now 2, the array is also an owner as well as you.
}

You need to release the number after you’ve added it if you no longer need it elsewhere other than the array. The following code is correct:

-(void) addNumberToArray:(float)aFloat {
 NSNumber *aNumber = [[NSNumber alloc] initWithFloat:aFloat]; // reference count is now 1, you are the owner
 [anArray addObject: aNumber]; // reference count is now 2, the array is also an owner as well as you.
 [aNumber release]; // reference count is now 1, you are not the owner anymore
}

Now, when the array is released, or the object is removed from the array, the reference count drops once more as the array declares itself no longer an owner of the object, so the object is deleted.

Of course another way of doing the above safely is:

-(void) addNumberToArray:(float)aFloat {
 NSNumber *aNumber = [NSNumber numberWithFloat:aFloat]; // reference count is now 1, NSNumber is the owner, you are not
 [anArray addObject: aNumber]; // reference count is now 2, the array is also an owner as well as NSNumber.
}

Now when the autorelease pool is popped the NSNumber loses ownership and reference count drops to 1 (now only the array owns the number). When the array is released, or the object is removed from the array the reference count drops to zero and the number is deleted.

You may wonder which is a better way of doing this? Method 1 (explicitly using alloc and release), or method 2 (the convenience method). I generally preferred method 2 on OSX, because it looks simpler and is less code. The functionality looks identical – but it is not. It is actually better practise to use method 1 (especially when developing for iPhone) or use method 2 with your own autorelease pools, more on this below.

Autorelease, Convenience vs Explicit

You may be wondering what exactly the difference and/or benefit is of the following two approaches:

Explicit:

-(void) doStuff:(float)aFloat {
 NSNumber *aNumber = [[NSNumber alloc] initWithFloat:aFloat]; // refcount is 1, you are owner
 /// ... do a bunch of stuff with aNumber...
 ...
 [aNumber release]; // release aNumber
}

Autoreleased:

-(void) doStuff:(float)aFloat {
 NSNumber *aNumber = [NSNumber numberWithFloat:aFloat]; // refcount is 1, you are not owner, will be automatically released
 /// ... do a bunch of stuff with aNumber...
 ...
}

With the explicit approach, aNumber is released immediately at the end of doStuff and the memory is deallocated there and then. With the Autoreleased approach, the aNumber is released when the autorelease pool is popped, and generally that happens at the end of the event loop. So if you create quite a lot of autorelease objects during an event loop, they are all going to add up and you may run out of memory. In the above example it isn’t that clear but let me give another example:

Explicit:

-(void) doStuff:(float)aFloat {
 for(int i=0; i<100; i++) {
  NSNumber *aNumber = [[NSNumber alloc] initWithFloat:aFloat]; // refcount is 1, you are owner
  /// ... do a bunch of stuff with aNumber...
  ...
  [aNumber release]; // release aNumber
 }
}

Autoreleased:

-(void) doStuff:(float)aFloat {
 for(int i=0; i<100; i++) {
  NSNumber *aNumber = [NSNumber numberWithFloat:aFloat]; // refcount is 1, you are not owner, will be automatically released
  /// ... do a bunch of stuff with aNumber...
  ...
 }
}

Now you can see, in the first example we never have more than a single NSNumber in memory (the NSNumber is allocated at the beginning of, and deallocated at the end of each for loop). Whereas in the second example with each for loop, a new NSNumber is created while the old one is still hanging around in memory waiting for the autorelease pool to be released. On desktop systems with a lot of ram, you may have the luxury to decide which method you’d like to go for, but on limited memory platforms such as iPhone it’s pretty important to make sure objects are deleted as soon as they become unnecessary and not hang around.

Of course another option is to create your own autorelease pool, which would be especially useful if you are using lots of temporary objects and can’t be bothered to release them all individually. Consider the following code:

Explicit:

-(void) doStuff {
 for(int i=0; i<100; i++) {
  NSNumber *aNumber1 = [[NSNumber alloc] initWithFloat:1]; // refcount is 1, you are owner
  NSNumber *aNumber2 = [[NSNumber alloc] initWithFloat:2]; // refcount is 1, you are owner
  NSNumber *aNumber3 = [[NSNumber alloc] initWithFloat:3]; // refcount is 1, you are owner
  NSNumber *aNumber4 = [[NSNumber alloc] initWithFloat:4]; // refcount is 1, you are owner
  NSNumber *aNumber5 = [[NSNumber alloc] initWithFloat:5]; // refcount is 1, you are owner
  NSNumber *aNumber6 = [[NSNumber alloc] initWithFloat:6]; // refcount is 1, you are owner

  // ... do a bunch of stuff with all objects above.
  ...

  // release all objects
  [aNumber1 release];
  [aNumber2 release];
  [aNumber3 release];
  [aNumber4 release];
  [aNumber5 release];
  [aNumber6 release];
 }
}

Autoreleased:

-(void) doStuff {
 for(int i=0; i<100; i++) {
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init]; // create your own little autorelease pool

  // these objects get added to the autorelease pool you created above
  NSNumber *aNumber1 = [NSNumber numberWithFloat:1]; // refcount is 1, you are not owner, will be automatically released
  NSNumber *aNumber2 = [NSNumber numberWithFloat:2]; // refcount is 1, you are not owner, will be automatically released
  NSNumber *aNumber3 = [NSNumber numberWithFloat:3]; // refcount is 1, you are not owner, will be automatically released
  NSNumber *aNumber4 = [NSNumber numberWithFloat:4]; // refcount is 1, you are not owner, will be automatically released
  NSNumber *aNumber5 = [NSNumber numberWithFloat:5]; // refcount is 1, you are not owner, will be automatically released
  NSNumber *aNumber6 = [NSNumber numberWithFloat:6]; // refcount is 1, you are not owner, will be automatically released

  // ... do a bunch of stuff with all objects above.
  ...

  [pool release]; // all objects added to this pool (the ones above) are released
 }
}

In this case, both chunks of code essentially behave the same. In the first example 6 NSNumbers are created at the beginning of every for loop, and they are explicitly released at the end of each for loop (you own them). There are never more than 6 NSNumbers in memory.

In the second example you don’t own any of the NSNumbers, but by creating your own autorelease pool, you control their lifespan. Because you create and release an autorelease pool in the loop, the NSNumbers only live the duration of the for loop, so you never have more than 6 NSNumbers in memory. Had you not created the autorelease pool, at the end of every for loop you’d have 6 NSNumbers waiting to be deleted, and by the end of the function there’d be 6×100=600 NSNumbers hanging around in memory. Combine that with other autorelease objects allocated in other functions and you can have an awful lot of unused objects which are going to be released soon (so no memory leak), but potentially you may hit your memory limits if you don’t release as you go.
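The 6 versus 600 arithmetic can be double-checked with a small counting model in Python. It is only a sketch under simplifying assumptions: `peak_live_objects` is a made-up function that counts objects instead of allocating anything, and it assumes the enclosing pool is popped only after the whole loop has finished.

```python
def peak_live_objects(iterations, per_iteration, pool_per_iteration):
    """Return the largest number of temporary objects alive at one time.

    With pool_per_iteration=True, objects are released at the end of each
    loop iteration; otherwise they wait for the enclosing pool, which in
    this model is popped only after the loop ends.
    """
    live = 0
    peak = 0
    for _ in range(iterations):
        created_this_round = 0
        for _ in range(per_iteration):
            live += 1
            created_this_round += 1
            peak = max(peak, live)
        if pool_per_iteration:
            live -= created_this_round   # the local pool is released here
    return peak


print(peak_live_objects(100, 6, pool_per_iteration=True))    # 6
print(peak_live_objects(100, 6, pool_per_iteration=False))   # 600
```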