Posted On: Monday, October 15th, 2007 (ResExcellence, Source Code, Strings)
Posted by: Paul Lefebvre

by Seth Willits

In this tutorial we’re going to write a CapacityString class that can vastly improve string performance in certain situations. I admit this tutorial isn’t exactly eye-catching, but I think for some of you it will be quite an eye-opener.

The Problem

Let’s say that you’re going to be importing some data from a file, processing it, and outputting the results to a string. Since this process can take quite a while, you’ll want to display a progress dialog with a progress bar that advances realistically based on the current position in the file. Normally you’d have something that looks like this:

bin = File.OpenAsBinaryFile
length = bin.Length 
 
For i = 1 To length Step 2048
   s = s + ProcessData( bin.Read(2048) )
   // show progress
Next

There’s nothing extraordinary about this code at all, but what if I told you that it could be made over 50 times faster? It’s quite possible, and easy.

Note that every time through the loop we assign a new value to the string “s”, and during that assignment REALbasic reallocates a block of memory to store the contents of that string. The problem is that allocating memory is actually pretty slow, so doing it over and over and over again is very inefficient. The solution is simply to allocate enough memory up front so that it never has to be reallocated. Sounds easy, and it is, but REALbasic strings can’t do this, so we need to do it ourselves using a MemoryBlock.

The CapacityString Class

Create a new class called CapacityString and add three properties to it: mCapacity As Integer, mLength As Integer, and mData As MemoryBlock. mData is the chunk of memory that we’re going to use to store the string. mCapacity caches the size of the MemoryBlock; it will always equal the value returned by mData.Size, but the fewer function calls we make, the faster the code will be. Because the string inside the MemoryBlock will almost never be the same size as the MemoryBlock itself, we use mLength to store the size of the string in bytes.

Sub Constructor(capacity as Integer)
   mCapacity = capacity
   mData = New MemoryBlock(mCapacity)
End Sub 
 
Function Operator_Convert() As String
   Return mData.StringValue(0, mLength)
End Function

The constructor initializes the mData MemoryBlock to have the capacity we want, and Operator_Convert is a handy method to return the string that is stored in the CapacityString.

The SetString method below sets the string in the CapacityString. The first thing each of the methods below does is check whether the string will actually fit inside the MemoryBlock. If it won’t, the method resizes the MemoryBlock (inline, since function calls would add overhead ;^) and then assigns the string.

Sub SetString(s as String)
   Dim slen As Integer = LenB(s)
   If mCapacity < slen Then
      mCapacity = slen
      mData.Size = mCapacity
   End If 
 
   mData.StringValue(0, slen) = s
   mLength = slen
End Sub 
 
Sub AppendString(s As String)
   Dim slen As Integer = LenB(s) 
 
   If mCapacity < mLength + slen Then
      mCapacity = mLength + slen
      mData.Size = mCapacity
   End If 
 
   mData.StringValue(mLength, slen) = s
   mLength = mLength + slen
End Sub

AppendString is similar to SetString but just adds the string onto the end. This is equivalent to “s = s + …”. The InsertString method below doesn’t have a direct equivalent in REALbasic’s String type, because with Strings you have to use Mid or Left and Right to insert text in the middle. So this not only speeds things up, but gives us extra functionality. That’s nice. :^)

Sub InsertString(location As Integer, s As String)
   Dim slen As Integer = LenB(s)
   If mCapacity < mLength + slen Then
      mCapacity = mLength + slen
      mData.Size = mCapacity
   End If 
 
   // 0 based
   location = location - 1
   mData.StringValue(location + slen, mLength - location) = mData.StringValue(location, mLength - location)
   mData.StringValue(location, slen) = s
   mLength = mLength + slen
End Sub
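
To see the three methods working together, here is a short usage sketch. It assumes the class is built exactly as shown above; the literal strings and variable names are made up for illustration. Note that InsertString takes a 1-based location (it subtracts 1 internally), so 7 means "before the seventh byte".

```realbasic
Dim cs As New CapacityString(64)   // room for 64 bytes up front

cs.SetString "Hello World"         // contents: "Hello World"
cs.AppendString "!"                // contents: "Hello World!"
cs.InsertString 7, "REALbasic "    // 1-based location: inserts before the "W"

Dim result As String = cs          // Operator_Convert turns it back into a String
MsgBox result                      // "Hello REALbasic World!"
```

Everything here fits in the initial 64 bytes (the final string is 22 bytes), so mData is never resized along the way.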

For a simple test of the class, you can use this code:

Sub Action()
   dim s as String
   dim cs as CapacityString
   dim bin as BinaryStream
   dim i, length as Integer
   dim time as Double
   dim file as FolderItem

   file = GetOpenFolderItem("")
 
   if file = nil then return 
 
   ///////////////////////
   // Using a String
   ///////////////////////
   time = Microseconds
   bin = File.OpenAsBinaryFile
   length = bin.Length 
 
   for i = 1 to length step 2048
      s = s + bin.Read(2048)
   next 
 
   time = (Microseconds - time) / 1000000
   MsgBox "String: " + Format(time, "###.##") + " seconds"
   bin.Close 
 
   ///////////////////////
   // Using a CapacityString
   ///////////////////////
   time = Microseconds
   bin = File.OpenAsBinaryFile
   length = bin.Length
   cs = New CapacityString(length) 
 
   for i = 1 to length step 2048
      cs.AppendString bin.Read(2048)
   next 
 
   time = (Microseconds - time) / 1000000
   MsgBox "CapacityString: " + Format(time, "###.##") + " seconds"
   bin.Close
End Sub
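
The timings tell you which version is faster, but not whether both versions produced the same data. If you want that reassurance, a few extra lines placed just before the End Sub of the test would do it. This is an optional sketch, not part of the original test; it reuses the s and cs variables from above and relies on StrComp with mode 0 for a byte-for-byte comparison.

```realbasic
// Optional sanity check (not part of the original test):
// confirm both approaches built the same bytes.
Dim result As String = cs            // Operator_Convert gives us a plain String

If StrComp(s, result, 0) = 0 Then    // mode 0 = binary comparison
   MsgBox "Both results match (" + Str(LenB(result)) + " bytes)"
Else
   MsgBox "Results differ!"
End If
```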

Finished

This isn’t a completely “finished” class, as it doesn’t take every string possibility into account, but it’s a solid foundation for anyone wanting to take the idea even further.
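
One direction worth exploring: as written, a CapacityString that outgrows its capacity grows to exactly the size it needs, so every later append resizes mData again. A common alternative is to grow the block geometrically (doubling, for example) so that resizes become rare. Here is a minimal sketch of that idea. EnsureCapacity is a hypothetical helper name, and the 16-byte floor and the doubling factor are arbitrary choices, not something the class above prescribes.

```realbasic
// Hypothetical helper: make sure mData can hold at least "needed" bytes,
// growing by doubling rather than to the exact size required.
Sub EnsureCapacity(needed As Integer)
   If mCapacity >= needed Then Return

   Dim newCapacity As Integer = mCapacity
   If newCapacity < 16 Then newCapacity = 16   // arbitrary floor so 0 can double

   While newCapacity < needed
      newCapacity = newCapacity * 2
   Wend

   mData.Size = newCapacity   // resize the block, as the methods above do
   mCapacity = newCapacity
End Sub
```

AppendString and InsertString could then call EnsureCapacity(mLength + slen) in place of their own If blocks. That costs one extra method call per append, so it is a trade-off against the inline checks used above.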

Download CapacityString REALbasic project

Originally published by ResExcellence
Reprinted with permission
