I am currently working on a HOWTO on VBA to OOB migration. As part of this I have been doing a benchmark comparison on their relative performance. The preliminary results are:
Run Time in Secs On 1.2Ghz P3M Laptop | VBA | OOB | Ratio |
ShellSort(Intensive Basic) | 0.29 | 19.24 | 1:66 |
Test Harness (Lots of UNO / COM Calls) | 1.11 | 19.66 | 1:18 |
The sample one-sigma was typically 3% or so over a batch of runs giving a broad comparison 1 Sec VBA = 1 Min OOB for pure code. On my Laptop OOB runs at about 50 KIPS which is fine for non-looping algo’s, but it could take a VBA migrator by nasty surprise when migrating a heavy one.
This was run under XP/SP2 agains MSOffice 2003. Also note that from PERFMON I could see that the Calc and Basic RTS are running one-task in SOFFICE.BIN. This was all user compute. It just looks like the design/performance of the Basic interpreter/RTS means that it runs like a total dog. FYI VBscript runs at about a 1:2 ratio — that is 30 times faster then OOB on the sort routine at least. I haven’t done the timing on using VBscript over the Automation Bridge yet.
BTW With the exception of toggling one True/False this runs unchanged on both VBA and OOB.
The Benchmark routine — A Shell Sort
Option Explicit ' ' Module ShellSort ' Public Function ShellSort(ByRef A() As Variant) As String ' ' Complements to Donald Knuth, "Art of Computer Programming ' Vol 3 - Sorting and Searching" ' ' There are various ShellSort variants. All use a process of sorting ' a set of collapsing subsequences. If you think of a bubble sort, the worse ' runtime is O(N*Nb) where Nb is the largest distanc of an entry in an ' unsorted position from its sorted position. The trick that the Shell ' algortihm exploits is to do the initial passes to do large swap bounds, ' decreasing these as the sort progresses, so the journey of an individual ' element from its start position look quite like a binary chop. ' ' The exchange sort variant below runs in about half the time of a bubble sort ' variant. You can also play bounds with the nesting strategy. This ' implementation uses a Sedgewick Sequence. RTFM for more info. ' ' The average sort time is O(N^(7/6)) and worst case O(N^(4/3)). For comparison ' The average sort time is O(N*Log N) and worst case O(N^2) for Quicksort. Dim Alb As Variant, Aub As Variant, Aspan As Variant Dim nSedgewick As Variant, nShell As Long, inc As Long Dim i As Long, j As Long, tmp As Variant Dim nMoves As Long, sDebug As String sDebug = "" Alb = LBound(A) Aub = UBound(A) Aspan = Aub - Alb + 1 nSedgewick = Array(1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905, 8929, 16001, _ 36289, 64769, 146305, 260609, 587521, 1045505, 2354689, _ 4188161, 9427969, 16764929, 37730305, 67084289, _ 150958081, 268386305, 999999999999#) nShell = 0 Do While nSedgewick(nShell + 1) < Aspan: nShell = nShell + 1: Loop Do While nShell >= 0: ' sort by insertion in increments of h inc = nSedgewick(nShell) For i = Alb + inc To Aub tmp = A(i) For j = i - inc To Alb Step -inc If A(j) <= tmp Then Exit For A(j + inc) = A(j) Next j A(j + inc) = tmp Next i sDebug = sDebug & Chr(10) & inc & ", " & nMoves & ", " & (Aub - Alb - inc) nShell = nShell - 1 Loop ' The real reason that this is a function is to avoid the variant syntax ' of sub calls in VBA vs. OOB ShellSort = sDebug End Function
Module TestHarness
Option Explicit #Const OOB = True ' Set this True or False depending on whether ' your code is in Calc or Excel Const nArray = 20000 Sub test() Dim x(), timer1, timer2, sDebug As String ReDim x(nArray - 1) ' Disable screen updating and AutoCalculate #If OOB Then Msgbox "def OOB" ThisComponent.addActionLock ThisComponent.lockControllers ThisComponent.enableAutomaticCalculation (False) #Else Application.ScreenUpdating = False Application.Calculation = xlCalculationManual #End If ' timer1 times the load and fetch overhad (UNO/COM intensive) timer1 = getTimeMS() LoadArray x, 1 ' timer1 times the sort (pure Basic runtime no UNO/COM) timer2 = getTimeMS() sDebug = ShellSort(x()) timer2 = getTimeMS() - timer2 StoreArray x, 2 timer1 = getTimeMS() - timer1 - timer2 ' Now post the debug back to the sheet StoreArray Array(sDebug, _ "Sort Time = " & (timer2 / 1000!) & " sec" & Chr(10) & _ "Load/Store Time = " & (timer1 / 1000!) & " sec" _ ), 3 ' Put Screen Updating and AutoCalculate back to defaults #If OOB Then ThisComponent.enableAutomaticCalculation (True) ThisComponent.unLockControllers ThisComponent.removeActionLock #Else Application.ScreenUpdating = True Application.Calculation = xlCalculationManual #End If End Sub Sub FillSheetColumn() Dim vArray(nArray), i, y For i = 0 To UBound(vArray) vArray(i) = Chr(Int(25 * Rnd() + Asc("A"))) & CStr(Int(10000000 * Rnd())) Next i StoreArray vArray, 1 End Sub Private Sub LoadArray(x, ByVal nCol) Dim i #If OOB Then With ThisComponent.Sheets(0) For i = 0 To UBound(x) x(i) = .getCellbyPosition(nCol - 1, i).String Next i End With #Else With ActiveWorkbook.Sheets(1) For i = 0 To UBound(x) x(i) = .Cells(i + 1, 1).Value Next i End With #End If End Sub Private Sub StoreArray(x, ByVal nCol) Dim i #If OOB Then With ThisComponent.Sheets(0) For i = 0 To UBound(x) .getCellbyPosition(nCol - 1, i).String = x(i) Next i End With #Else With ActiveWorkbook.Sheets(1) For i = 0 To UBound(x) .Cells(i + 1, nCol).Value = x(i) Next i End With #End If End Sub Function getTimeMS() As Long #If OOB Then getTimeMS = CLng(CSng(GetSystemTicks) / 0.98) #Else getTimeMS = CLng(Timer() * 1000) #End If End Function
Out of interest, I also recoded this routine in Perl to see how the Perl RTS copes with this algorithm. I just focussed on the sort timings and didn’t bother to get the automation bridge bit going, but here are the timings:
RTS | Time (sec) | Ratio |
OOB | 19.24 | |
VBA | 0.29 | 1:66 |
Perl | 0.86 | 1:22 |
So Perl is of the same order as VBscript and these set the benchmark for what this sorts of byte-code interpreter could achieve. I spend a couple of hours going through the 66K lines of compiler and RTS code so now I understand why OOB runs like such a dog. To really bottom the fine details would need to do an execution code walk, but that involves a lot of work to set up a test environment.
The nub of the problem is that compiler has a very simple code generation strategy and just about leaves everything to the byte-code interpreter in the RTS. The storing of variables, execution of functions all work off the stack and the code seems reasonable lean. The problem comes when you want to access the variables. So a = b+c gets translated to
FIND “b”, type1 GET FIND “c”, type2 GET PLUS FIND “d”, type3 SET
The problem is that the FIND has to do a lot of work, and this really hits you hard when you have something like:
For i = 0 To 20000 ThisComponent.Sheets(0).CellbyPosition(nCol - 1, i).String = x(i) Next i
This will call this find function (or one of its variants) 160,000 times. The work involved for a local variant which binds to primitive data type is reasonable lightweight (60,000 calls), array element access slightly worse (another 20,000), but the killer is the object and method accesses (the other 80,000) where the RTS does all the necessary reflection / inspection on the objects and methods / properties. 99.99% of this is duplicated and entirely redundant. What is worse here is that you think you might help by using the With construct, but is just syntactic sugar which is substituted at compile-time, so the following is equivalent to the above:
With ThisComponent.Sheets(0) For i = 0 To 20000 .CellbyPosition(nCol - 1, i).String = x(i) Next i End With
However recoding this as:
k = ThisComponent.Sheets(0) nm1 = nCol - 1 For i = 0 To j k.getCellbyPosition(nm1, i).setString(x(i)) Next i
manually hoists unnecessary work out the loop reducing the total runtime of this piece of code by by nearly 60%!
What is very clear is that the Basic compiler and RTS ignore two essential tenants of good environment design
- Never put off to runtime what you can do effectively at compile time
- If you have to do it at runtime, the cache invariant work so that you only do it once not thousands of times.
Now the first of these would require a major piece of re-architecture so couldn’t be taken lightly, but I reckon that you could implement a look-aside cache to bypass 90% of this repeated find work with perhaps a few weeks work. Isn’t this a worthy project if we have a 10x speed up of the Basic performance as a result?