Iterating in Java [originally, Iterate through the Tulips]
Timothy J. Rolfe
Article accepted by Dr. Dobbs Journal It is available as a Word document as well.
Within Java, there is a powerful pair of object types built for going through the elements of a collection, the Iterator and the ListIterator. They provide an implementation-independent method for access, in which the user does not need to know whether the underlying implementation is some form of array or of linked list, and allows the user go through the collection without explicit indexing. This paper will examine how they might be used in one of the standard list operations (sorting), seeing how they can simplify the operation, but at the expense of efficiency.
For both kinds of iterators, the iterator keeps track of what will be examined next: conceptually, it can be considered to be sitting between elements within the collection, or at the front or the back. The interface Iterator is the simpler of the two. It only mandates three methods:
boolean hasNext(), whose purpose is quite clear in its name.
Object next(), which returns the next element in the collection.
void remove(), which removes from the collection the last element returned by next().
These methods facilitate the traversal of the collection, without worrying about linkages within linked lists or subscripting within arrays.
Note that the interface Iterator only traverses the collection in one direction, and only supports a single modifier for the collection, the remove. If you require a more powerful way to manipulate the collection, you need to use the interface ListIterator. This contains methods for traversing the collection in both directions (previous() as well as next()). In addition, it contains more methods for modifying the collection; beside add() and remove(), it also allows modification of the collection by replacing one element with another set(Object). Further, the ListIterator gives you access to the index within the collection of the element that will be accessed via either next() or previous(): nextIndex() and previousIndex().
When used with array-based collections, the iterator objects allow for removals and (with the list iterator) additions while hiding from the user the memory movements within the array that those operations require.
To "learn by doing", I'm going to work through three sorting algorithms, looking at what you need to do to implement them by directly manipulating a doubly-linked list, and then what you need to do to implement them through a ListIterator (since the Iterator itself is not powerful enough).
Selection Sort
About the simplest and most intuitive sorting algorithm is the one that poker players might use to put their card hands in order: initially, hold all of your jumbled cards in one hand, and then repeatedly move the largest available card to the back of the cards held in the other hand.
Listing One shows the implementation of this in Java 1.5.x (that is, using generic collections, similar to using templates within C++) within a linked list. Since it moves values around among the nodes of the list, the fact that this is a doubly-linked list does not enter into the code. Specifically, this is a method added to the class KWLinkedList that accompanies Koffman and Wolfgang's Objects, Abstraction, Data Structures and Design Using Java Version 5.0.
Listing One
// The list structure is unchanged, while data values are moved
// around within it. Since the class Node is a nested class, we
// have access to its private fields.
public void selectSort()
{ Node<E> front = head;
if ( front == null ) // Empty list
return;
// When only one item remains, we are finished: it is the largest
while ( front.next != null )
{ Node<E> check = front.next,
small = front;
E smallValue = small.data;
while ( check != null )
{ if ( smallValue.compareTo(check.data) > 0 )
{ small = check;
smallValue = small.data;
}
check = check.next;
}
// Finish the swap begun by setting smallValue
small.data = front.data;
front.data = smallValue;
// Advance to the next position in the list
front = front.next;
}
}
If, however, you were using the class java.util.LinkedList, you would not have access to the nodes. That means you can't just keep your finger on the node with the smallest value; all you can capture is what and where that smallest value is. That means that after you have found the smallest value, you have to send an iterator down to that position to access the node. Remember that the generation of an iterator at a given index requires the traversal of the linked list from its beginning down to that position linked list is not a random-access structure. While you could have one iterator acting like the outer loop (keeping track of the front of the remaining list), the iterator accomplishing the task of the nested loop would always need to traverse the entire list. It first traverses down to the starting point and then traverses to the end as you look for the smallest value.
If, however, you use an iterator, you can avoid many of these traversals by actually building a new list. To accomplish this, you first clone the existing list; that is, you generate a "shallow copy", a linked list that references the same nodes as the original list. Once you've done that, you can use the clear() method on the original list (that is, set its head and tail references to null and its size to zero) and build your new list there. Now you make a slight shift: repeatedly find the largest element in the old list, remove it, and put it at the front of the new list. This way each traversal of the old list gets ever shorter. Since you're building the new list in a stack-like manner, you effectively reverse the order, so that the resulting list goes from smallest to largest.
Listing Two shows the implementation of the algorithm described above. While it references the KWLinkedList, it is in fact using methods that also exist in java.util.LinkedList. Since this is a static method (within a little library of ListIterator sorting methods), it does not jump through the hoops necessary to use generics, but instead depends on using Comparable objects. The Java version 1.5.x compiler does get rather unhappy with the unchecked references!
Listing Two
public static void selectSort ( KWLinkedList work )
{ KWLinkedList oldList = work.clone();
ListIterator finder;
Comparable maxValue;
if ( oldList.size() < 2 ) // Empty or single-element list
return;
work.clear(); // Empty out the list to receive the new contents
// When only one item remains, it is the smallest and goes first
while ( oldList.size () > 1 )
{ int maxIndex = 0;
finder = oldList.listIterator(0);
maxValue = (Comparable) finder.next();
while ( finder.hasNext() )
{ Comparable testValue = (Comparable) finder.next();
if ( maxValue.compareTo(testValue) < 0 )
{ maxValue = testValue;
// We've just stepped PAST the item, so correct for that
maxIndex = finder.nextIndex() - 1;
}
}
// Note: this generates an extra traversal
finder = oldList.listIterator(maxIndex);
maxValue = (Comparable) finder.next();
finder.remove(); // Eliminate the element returned by finder.next()
work.addFirst(maxValue); // Add to the FRONT of the new list.
}
// Move the last value --- the smallest item
finder = oldList.listIterator(0);
maxValue = (Comparable) finder.next();
finder.remove(); // not really required, given garbage collection
work.addFirst(maxValue);
}
Insertion Sort
Another fairly intuitive sorting algorithm is the one that bridge players might use to put their card hands in order: from the pile of cards in front of you, pick up one card at a time and put into its proper place with the cards that you already have picked up.
Listing Three shows the implementation of this in Java 1.5.x within a linked list. Since the method does move nodes around, it is important that you correctly deal with the doubly-linked list. The head of the existing list is copied to a node that will be used to walk through that list. Then the initial list is transformed into a single-entry list. As you walk through the old list, you remove each node and then insert it into its proper place in the new list. This task is made somewhat complex by the doubly-linked list and the cases of inserting into a doubly-linked list at the front, middle, and back of the existing list. As often happens with linked lists, the code gets long handling all the cases.
Listing Three
public void insertSort()
{
Node<E> working = head;
// No work to do if the list is empty or has just one element
if ( head == null || head.next == null )
return;
// The nodes to be inserted begin with the SECOND
working = working.next;
// Transform the existing list into a one-node list
head.prev = head.next = null;
tail = head;
working.prev = null; // I.e., this is now the front of the list
while ( working != null )
{
Node<E> current = head;
Node<E> hold = working; // Insert node from old list
E holdData = hold.data; // Separate copy for convenience
working = working.next; // Advance old list pointer
if ( working != null )
working.prev = null; // Keep it a proper list
hold.next = hold.prev = null; // And remove links to old list
// Find where this element belongs in the list being generated
while ( current != null ) // Traverse past insertion point
// "<= 0" to guarantee a stable sort if ( holdData.compareTo(current.data) <="0" ) break; else current="current.next;" // The "hold" node belongs BEFORE current hold.next="current;" // True, however we got here. if ( current="=" null ) // Add to end of list { hold.prev="this.tail;" this.tail="this.tail.next" ="hold;" } else // Check whether this is the beginning or farther along { if ( current.prev !="null" )// Add in the middle { hold.prev="current.prev;" current.prev.next="hold;" current.prev="hold;" } else // Add at the front { hold.prev="null;" current.prev="hold;" head="hold;" } } } }
The iterator implementation, on the other hand, doesn't have to deal with all that messy doubly-linked list stuff, as seen in Listing Four. It can trust the ListIterator methods to remove and add elements correctly. The spirit is the same, though. Remove from the old list and insert into proper position within the new list. Start by cloning the list to be sorted, and then use the clear() method on the list received so that it will end up with the sorted contents. Move one element directly over from the old list to the new list (using the LinkedList method addFirst). Then walk through the old list, removing elements and explicitly inserting them into proper position within the new list (using the ListIterator method add). Since the ListIterator methods handle the messing linked list stuff, the program is briefer and clearer. Again, as a static method it does not use generics and so uses Comparable objects. The methods of KWLinkedList used are identical with those in java.util.LinkedList.
Listing Four
public static void insertSort ( KWLinkedList work )
{ KWLinkedList oldList = work.clone();
ListIterator oldIter = oldList.listIterator();
Comparable hold;
if ( ! oldIter.hasNext() ) // Empty list. Return.
return;
hold = (Comparable) oldIter.next(); // Pull first item
if ( ! oldIter.hasNext() ) // One-item list. Return
return;
work.clear(); // Empty out the list to receive the new contents
work.addFirst(hold); // First entry in the new list
oldIter.remove();
while ( oldIter.hasNext() )
{ ListIterator finder = work.listIterator(0);
hold = (Comparable) oldIter.next();
oldIter.remove(); // Explicit REMOVE this from the list.
while ( finder.hasNext() )
{ Comparable test = (Comparable) finder.next();
if ( test.compareTo(hold) > 0 )
{ finder.previous(); // Move past the in-position item
break;
}
}
finder.add(hold);
}
}
Merge Sort
Within the computing community, these two algorithms are well known as not being appropriate for large data sets (unless, for insertion sort, the data set is nearly in proper order). The time required increases as the square of the problem size. When dealing with linked lists, the merge sort algorithm is preferable the time required increases only slightly faster than the problem size. In the jargon, merge sort is an order(n log n) algorithm. Here the fundamental operation is merging two sorted lists into a single, larger sorted list. You also can invoke my favorite expression while teaching computing: "Recursion is your friend!"
Merge sort is best viewed as a recursive method. When you get the list to be sorted inside of merge sort, you check to see if it has size 0 or 1 so that it is already in order and you don't have anything to do. Otherwise you break the list into two lists of equal or nearly equal size, calling the first half the "left" list and the second half the "right" list. Recursively sort those two lists and then merge them into a final list. To do that, repeatedly check the beginning of the two lists. If the left entry is less than or equal to the right entry, move the left entry over to the final list, otherwise move the right. Once one of the two lists has gone empty, just move the rest of the other list over to the final list. [Note: we select from the left list on equal values to make this a "stable" sorting algorithm, namely one that does not undo an existing order. In other words, if in the original list item A has a lower index than item B, then if A and B are equal A still has a lower index than B in the sorted list.]
As you build more powerful sorting methods, you have to work harder. Listing Five shows the implementation of this in Java 1.5.x within a doubly-linked list. It also uses a nonstandard constructor that I added to KWLinkedList one that receives the references for the head, the tail, and the size and constructs a list with those fields, in the process insuring that the resulting list is a correct doubly-linked list. That means that the head node has a null previous link, and the tail node has a null next link. If the left-list size is size/2 (with truncation, if appropriate), you break the list in half by taking the appropriate number of steps from the head node to be sitting on the last node in the left list. From there you know what will be the head node in the right sublist (the one immediately after this one), and you know the last node in the right sublist since it is the last node in the input list. Once you've built the right sublist, you can easily build the left sublist, using the input list's head node as first and the node you're sitting on as the last.
Recursion is your friend: make the calls and these two lists have become sorted. Now you just have the merge process to go through. First, though, you can use a technique to avoid having the first entry as a special case: you use a dummy header. The real list begins after that single node. You keep track of the node that the next entry comes after call it the current node. You then set up to step through the left and right lists. You repeatedly check how the nodes at the front of those two lists compare, moving from the right list if that is strictly less than the left entry, or else from the left list. Once one of the two lists has gone empty, you can just tack the non-empty list to the back of the list you're building make that entire list come after the current node. Once you're finished, you just discard the dummy header. Since Java does your garbage collection for you, all you need to do is to update what head refers to.
Listing Five
public void mergeSort()
{ KWLinkedList<E> leftList, rightList;
Node<E> current = this.head;
int leftSize = this.size / 2,
rightSize = this.size - leftSize,
k; // loop variable
if ( this.size < 2 ) return;
// Traverse to the _end_ of the left sublist
// Start at 1 to go (leftSize-1) steps
for ( k = 1; k < leftSize; k++ )
current = current.next;
// NOTE: it is CRITICAL that the right list be generated first, since
// creation of the left list sets current.next to null.
rightList = new KWLinkedList<E> ( current.next, this.tail, rightSize );
leftList = new KWLinkedList<E> ( this.head, current, leftSize );
this.head = this.tail = null; this.size = 0;
// Recursively sort these two sublists.
leftList.mergeSort(); rightList.mergeSort();
// Start out the list with a dummy header for convenience ---
// that way the first entry is not a special case.
current = this.head = new Node<E>(null);
// Merge these two lists into a single list
// Execute until one of the two lists goes empty.
while (leftList.head != null && rightList.head != null )
{ // ? Pull from the right list ?
if (leftList.head.data.compareTo(rightList.head.data) > 0 )
{ current.next = rightList.head; // Link in the node
rightList.head = rightList.head.next;// Advance this list
}
else // pull from the left list.
{ current.next = leftList.head; // Link in the node
leftList.head = leftList.head.next; // Advance this list
}
current.next.prev = current; // Set the back link
current = current.next; // Update the current node
}
// Append the non-empty list at the back of the current list
if ( leftList.head != null ) // Left sublist has contents
{ current.next = leftList.head;
current.next.prev = current; // Set the back link
this.tail = leftList.tail; // and grab the tail setting
}
else // Right sublist has contents
{ current.next = rightList.head;
current.next.prev = current; // Set the back link
this.tail = rightList.tail; // and grab the tail setting
}
// Regenerate the size
this.size = leftList.size + rightList.size;
// REMOVE the dummy header --- the garbage collector will recover it.
this.head = this.head.next;
}
As seen in Listing Six, the iterator implementation doesn't have to deal with all that messy doubly-linked list stuff. Here, though, you break the list in half by expressly moving items into the left sublist. What remains of the list is then the right sublist you can just use the clone() method mentioned above to generate it. Then, as above, you can use the clear() method to empty out the initial list. As in the previous example, you just let recursion sort the two sublists and then begin the merge logic. The logic is the same, except that you're using iterators to accomplish it. Here, though, instead of moving nodes around you're explicitly adding items to the list you're building. You'll pay for this: your removals from the two sublists require garbage collection, and your insertions to the back of the new list require construction of new nodes. Also, once you've emptied out one of the sublists, you need to expressly empty out the remaining sublist and add all of its entries to the back of the new list. Not counting the size-1 lists (which require no processing) n-1 lists are processed, varying in size from n down to 2.
Listing Six
public static void mergeSort ( KWLinkedList working )
{
KWLinkedList left, right;
ListIterator iter = working.listIterator(0),
leftIter, rightIter;
int size = working.size(),
sizeLeft = size/2, index;
Comparable leftValue, rightValue;
if ( size < 2 ) return;
/*************** Split the list into two smaller lists ***************/
left = new KWLinkedList();
for ( index = 0; index < sizeLeft; index++ )
{ left.addLast((Comparable)iter.next());
iter.remove();
}
right = working.clone(); // The rest constitutes the right list
working.clear(); // Empty out for rebuilding
/*************** Recursively sort the two smaller lists ***************/
mergeSort(left); mergeSort(right);
/*************** Merge the two smaller lists ***************/
leftIter = left.listIterator(0);
leftValue = (Comparable) leftIter.next();
rightIter = right.listIterator(0);
rightValue = (Comparable) rightIter.next();
// Main loop: both lists are non-empty
while (leftValue != null && rightValue != null)
{ if (rightValue.compareTo(leftValue) < 0)
{ working.addLast(rightValue);
if ( rightIter.hasNext() )
rightValue = (Comparable) rightIter.next();
else
rightValue = null;
}
else
{ working.addLast(leftValue);
if ( leftIter.hasNext() )
leftValue = (Comparable) leftIter.next();
else
leftValue = null;
}
}
// Clean-up: we know which list is empty
if ( leftValue == null ) // Right list is not empty
{ working.addLast(rightValue);
while ( rightIter.hasNext() )
working.addLast((Comparable) rightIter.next());
}
else // Left list is not empty
{ working.addLast(leftValue);
while ( leftIter.hasNext() )
working.addLast((Comparable) leftIter.next());
}
}
Pay the Piper
As you look at the complexity of the explicit linked list implementation of insertion sort as compared with the relative simplicity of the iterator code, you might wonder, "Why not just always use the iterator version?" The answer is a simple one, known for a long time: TANSTAAFL "There Ain't No Such Thing As A Free Lunch", a phrase that Robert Heinlein popularized in his The Moon is a Harsh Mistress. If you depend on the iterator to do the linked list stuff, you have to pay the overhead of having another object for the task. The code shown in the listings appears in a benchmark program to determine experimentally their behaviors. It was written to save results into a "comma-separated value" file, which then allowed manipulation and graphing of the results within Excel. [The Excel workbook and benchmark program are available electronically from the Resource Center (see page x).]
The benchmark program runs both the direct implementation methods and the iterator methods for all three algorithms plus the static method Collections.sort(List) and captures averages for both the number of comparisons required and the milliseconds required to sort lists of varying sizes. [The number of comparisons is captured by using a wrapper class MyComparable, shown in Listing 7.]
The benchmark program was run on a Dell Optiplex GX620 computer with dual-core IntelŪ PentiumŪ 4 computer with 3.60GHz processors under Ubuntu Linux version 2.6.12-8-686-smp. The Java programs were compiled under Java version 1.5.0, and run with Java version 1.5.0_04. For each list size, the program averaged 10,000 random initial lists to obtain the average comparisons and the average milliseconds required.
As you might expect, the average number of comparisons for the two implementations of each algorithm does not change. Also, using Collections.sort on a Java API LinkedList ends up requiring about 6% more comparisons than directly implementing a merge sort. Since the documentation for Collections.sort indicates that it pulls the list contents into an array, sorts the array, and then puts the sorted contents back into the list, you're probably seeing the use of Arrays.sort, which uses merge sort when presented with an array of objects. It does one extra comparison, since after sorting the two array segments it checks whether the end of the left segment is less than or equal to the beginning of the right segment in which case it omits the merge.
It is the average milliseconds required to sort where you see massive differences between the methods that directly implement the liked list code for sorting and the methods that contract out the linked list manipulations to iterators. Figure 1 shows the averages times for the two order(n2) algorithms (selection sort and insertion sort). As is commonly the case, selection sort requires more time that insertion sort. Figure 2 shows the average times for the order(n log n) algorithms, the two explicit merge sort implementations and the use of Collections.sort. Note the change in scale between the two figures for lists of size 200, the iterator implementation of merge sort required less than 0.4 msec, which is faster than the direct implementation of select sort and the iterator implementation of insertion sort.
Listing Seven
public class MyComparable <E extends Comparable<E>> implements Comparable<E>
{ private static long count = 0;
private E value;
public MyComparable ( E item )
{ value = item; }
public E getValue()
{ return value; }
public String toString()
{ return value.toString(); }
public int compareTo(E o)
{ MyComparable<E> rt = (MyComparable<E>)o;
count++;
return value.compareTo(rt.value);
}
public int freeCompare(E o)
{ MyComparable<E> rt = (MyComparable<E>)o;
return value.compareTo(rt.value);
}
static public long nCompares()
{ long hold = count;
count = 0;
return hold;
}
}

Figure 1: Average Time
Required for Selection and Insertion Sort

Figure 2: Average Time Required for Merge Sort and
Collections.sort
Figure Three gets to the heart of the issue: it shows the ratios of the average time required for three algorithms between their direct implementations as linked list manipulations and the implementations by way of iterators. Selection sort requires about double the time when you use iterators, while insertion sort requires a bit less than double the time. With all of the garbage collection traffic for merge sort (emptying out lists to build other lists), you see a time ratio of about 2.5.
The time ratios also turn out to be operating-system dependent! These results come from running under a Linux version. If you run the same experiment under Microsoft Windows, you see the iterator implementations doing significantly worse. Since you are collecting wall-clock time (System.currentTimeMillis() drives the timings), you need to get as quiet an environment as possible. You can accomplish this by booting Windows into "Safe" mode. I did this in parallel with the Linux experiments on an identical machine. Specifically, the computer was running the Windows XP-Professional operating system in safe mode, with the code running in an MS-DOS window. It executed significantly slower (about half the speed, though that may be an artifact of running in safe mode). The time ratios have about the same placement as with the Linux runs, but at lists of size 200, insertion sort showed a time ratio of about 2.20, selection sort showed about 2.35, and merge sort showed about 2.94. This is despite the fact that the Windows XP runs were also performed under Java version 1.5.0_04. For some reason, under Windows the iterators operate a bit slower or the direct implementation operates a bit faster.
Of course, as is clear from Figure Two, your best bet is to just forget about writing your own sorting method and use the Java API version available in Collections.sort!

Figure 3: Time Ratios (Iterator Implementation / Direct
Implementation)
Acknowledgement
I wish to thank Tom Sathre of Denver, CO. He provided some useful comments as I was preparing this article.