Silverlight Hack

Silverlight & related .NET technologies

About Me

Welcome to Silverlighthack.com.  This is a site where you can find many articles on Silverlight, Windows Phone 7 and .NET related technologies.  

My name is Bart Czernicki.  I have been working with computers since 1988 and have over 12 professional years in the IT field focusing on architecture, technology strategy and product management.  I currently work as a Sr. Software Architect at a large software development company.

Below is the cover of my new book that shows how Silverlight's unique RIA features can be applied to create next-generation business intelligence (BI 2.0) applications.

Silverlight 4 Business Intelligence Soft 

Contact: bartczernicki@gmail.com

View Bart Czernickis profile on LinkedIn

NONE of the comments or opinions expressed here should be considered ofmy past or current employer(s).  The code provided is as-is without anyguarantees or warranties.

Silverlight MultiThreading with a Computational Process (Counting Primes)

Intro 

In my previous post about multithreading I looked at the importance of using multithreaded techniques to improve the performance of the UI controls. This post focuses on using multiple threads to improve computational performance inside Silverlight.

I saw this article by Tim Anderson comparing Silverlight and Flash in a computational situation by counting primes.  I also saw derivatives of this article comparing the performance of Silverlight vs JavaScript/TraceMonkey.  Silverlight is a clear winner in both articles.  However, we can do better.  This article will focus on how we can improve the performance of Tim's code by adding multithreaded processing inside Silverlight.  I am not focusing on math theory, the counting prime algorithm or hacks to make other versions better.  This article focuses on some straightforward Silverlight 2 multithreading optimizations.

kick it on DotNetKicks.com

Demo and Beta 2 Source Code are provided.

Update 10/14/2008..The demo is now on Silverlight 2 RTM and the Silverlight 2 Source Code is up.

Overview of the example & demo:

First, check out Tim Anderson's example and run the example on your machine to see what we are going to be optimizing.  Also, note the performance of the runtime on your workstation.  Run it a couple of times each to get a good baseline. Silverlight should be faster by a factor of 4.

 

 

So let's improve the code with more threads (remember, not the algorithm; I want to keep it the same).  I created a small application that has the standard implementation (standard code); an example with two threads and an example with three threads.  Click on the example below to launch it.

 

  • The first button is essentially Tom's code and performs the code in single thread
  • The second button spins up 2 threads using the BackgroundWorker to calculate the primes (using the same code in the first button)
  • The third button spins up 3 threads using the BackgroundWorker to calculate the primes  (using the same code in the first button)
  • The last input UI control is a textbox that allows you to enter an integer (1 to whatever) to add a factor of 1,000,000 to end with.  For example, 2 will calculate the primes for 2,000,000 integers;  3 will calculate for 3,000,000 integers.  This way we can increase the number of integers we want to calculate by simply changing the integer in the text box  (There is a specific reason why I implemented the prime counter to use this method as I cheat a little bit in the implementation.  More on that below)
Overview of the multithreaded implementation: 

My implementation of how the multiple threads calculate a single process (i.e., counting the primes from 1 -> 1,000,000) is very similar to the logic that is implemented by the Microsoft Parallel Extensions.  Check out the video on Channel 9 where Joe Duffy and Igor Ostrofsky explain how some of the parallel extensions work.  I would recommend watching the entire video; however, you can ffwd to about the 8 minute mark to get a decent understanding of the logic I am mimicking.  Below is a screen shot of the logic that the parallel LINQ uses when dealing with some kind of process on a data structure like an array:

Let's overview the logic that Parallel LINQ uses (It does MUCH more than this, but we are doing something simple at a high level):

  • We have an input array of 5 numbers.
  • In the logic above we are spinning up 2 threads.
    • The first thread will do some kind of processing on the integers: 3, 1, 2
    • The second thread will do some kind of processing on the integers: 5, 6
  • Each thread will store each product in an intermediate data structure
  • After both threads are completed, then the output from the two intermediate structures is combined and returned to the main thread.

This is exactly the same logic I am using for counting primes in a multithreaded environment.

  • We have a number of integers we want to count primes to (i.e., 1,000,000)
  • Say we want to use three threads to calculate primes from 1 -> 1,000,000
    • The first thread will handle the numbers between 1 -> 333,332
    • The second thread will handle the numbers between 333,333 -> 666,666
    • The third thread will handle the numbers between 666,667 -> 1,000,000
  • As each thread finishes, it adds to the count of the integers that are primes.
  • When all three threads finish, the final result is reported back to the main UI thread.
Overview of the key lines of code:

In my previous writeup, I used the BackgroundWorker to implement the processing on multiple threads for this demo.  In this example, I do everything manually and you could write logic to dynamically spin up the necessary threads as needed using delegates.  Like I mentioned, this code mimicks the high level logic in the parallel extensions library.  In the future, I hope that they release a parallel extensions dll for Silverlight which will do exactly what we are simulating in this example/demo.

The code below handles the calculating of primes for three threads:

  private void CalculatePrimesOptimizedThreeThreads_Click(object sender, RoutedEventArgs e)  {  // Reset the variables.  this.threadsReportingThree = 0;  this.multiThreadedPrimesThree = 0;  this.multiThreadedStartThree = DateTime.Now;  /*  You do not want to do this in production code.  I would do something like this once the delegates support async thread invocation.  Func callCalcAsync = this.calculateNumberOfPrimes;  callCalcAsync.BeginInvoke(1, 500000, [add callback method], null);  */  this.bw31.RunWorkerAsync();  this.bw32.RunWorkerAsync();  this.bw33.RunWorkerAsync();  }  

The code in the event handler above simply resets the timespan & count variables. Finally, it spins up the three seperate threads in order to process 1/3rd of the counting of the 1,000,000 primes.

  void bw31_DoWork(object sender, DoWorkEventArgs e)  {  e.Result = this.calculateNumberOfPrimes(1, 333332*this.multiplyFactor);  }  void bw32_DoWork(object sender, DoWorkEventArgs e)  {  e.Result = this.calculateNumberOfPrimes(333332 * this.multiplyFactor + 1, 666666 * this.multiplyFactor);  }  void bw33_DoWork(object sender, DoWorkEventArgs e)  {  e.Result = this.calculateNumberOfPrimes(666666 * this.multiplyFactor + 1, 1000000 * this.multiplyFactor);  }  

The code above shows the processing of each of the three threads do after they are initiated by the RunWorkerAsync() method (in the first code snippet).  Each of the threads calls the calculateNumberOfPrimes method with the respective procesing range passed in.  Simple stuff.

  void bw3_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)  {  // Poor way to do a waithandle. Don't do this in production code.  // Use the waithandle...that's what it's there for.  I am keeping this simple.   // until Silverlight 2 gets full async support  object lockObject = new object();  lock (lockObject)  {  this.threadsReportingThree++;  this.multiThreadedPrimesThree += (int)e.Result;  // Don't do anything until both threads have reported.  if (this.threadsReportingThree == 3)  {  DateTime end = DateTime.Now;  Double timetaken = end.Ticks - this.multiThreadedStartThree.Ticks;  timetaken = timetaken / 10000000;  this.lblCalculatePrimesOptimizedThreeThreads.Text = "Number of primes up to: " + 1000000 * this.multiplyFactor + " is " + this.multiThreadedPrimesThree.ToString() + ", counted in " + timetaken + " secs.";  }  }  }  

In the code above, we are performing the last step of our parallel extensions simulation.  We are synchornizing the result of the three background threads.  As each thread finishes, we add to the number of primes that each thread process has found.  When all three threads finish, we report the result the the UI (main thread).  Note, we do perform a lock on the processing code as we could appear in a race condition with two threads finishing at the same time.  The probability of this is small; however, not thinking about these things usually results in intermittent code and causes weird bugs to occur.  Furthermore,  the more threads to sync, the higher the probability of errors like this occuring.

Performance Gain Tests

Before we start looking at the demo numbers, we need to understand that this processing doesn't come free.  Spinning up extra threads from the threadpool, figuring out which section of work each thread should process, storing the results in intermediate structures and finally synchronizing the threads all have an associated processing cost with them.  Obviously, in production you could adjust the multithreaded logic based on your data/environment(s), make it configurable or make it dynamic and figure out which is the best way to process.  Longer running processes done on client machines with multiple cores will benefit from this multithreaded implementation than those that don't meet this criteria.

Performance Gain Tests - Test Core 2 Due 2.2 GigHz

The first test machine is a Core 2 Duo 2.2 GigHz.  The workstation has two physical cores, so it can handle multiple processes better than a single core processor.  Depending on your workstation specs, you may receive different results.  This is a typical standard desktop/laptop you could buy from Dell/HP in the last 1.5-2 years. 

Test 1 (counting the number of primes from 1 -> 1,000,000).  Factor is the default: 1

  • Single thread time: .362 seconds
  • Two threads time:  .256 seconds
  • Three threads time: .257 seconds

Notice right off that multithreading the computation already netted a nice performance gain of over 40%!  Notice there isn't much of a difference between two threads and three threads.  Remember, when I mentioned above that there is a cost associated with each thread you use in splitting up processing on each core.  Obviously, the cost here is roughly equal to the benefit.  Therefore, they wash each other out and three threads process in about the same total time as two threads.  One way to mitigate the cost of multiple threads on a core is to increase the amount of available cores :)

Test 2 (counting the number of primes from 1 -> 5,000,000).  Factor is the default: 5 (dramatically increases the processing time).  I ran these tests multiple times and picked the lowest run I received (You will receive different results as you may have different items running in the background on your workstation).

  • Single thread time: 3.194 seconds
  • Two threads time:  2.031 seconds
  • Three threads time: 1.719 seconds

In this example, we can see that the difference between the single threaded and dual threaded implementation is about 1.1 seconds!  That is a huge difference.  Let's look at the three thread implementation.  It actually came in 0.3 seconds lower than the two thread implementation.  So now we are seeing some benefits of our third implementation.  It is not very drastic; however, as we increase the processing needed, the difference gap will increase further in the time span.  The percentage different should remain roughly the same.

Performance Gain Tests - Dual Processor Quad-Core HaperTown 5420 (Intel Xeon 2.5 GigHz)

The second test machine is a server: Quad-Core 2.5 GigHz.  The server has 8 physical cores.  This test is just to show what results we come up with when the cores are increased by 4x. 

Test 1 (counting the number of primes from 1 -> 1,000,000).  Factor is the default: 1

  • Single thread time: .234 seconds
  • Two threads time:  .15625 seconds
  • Three threads time: .1093 seconds

With a higher class of processor, these times are a lot faster than the previous results.  However, with 2 extra cores now our three thread implementation is already 42% faster than the two thread implementation.  Note that during our test on a dual core, the three thread implementaton was indentical to the two thread implementation.

Test 2 (counting the number of primes from 1 -> 5,000,000).  Factor is the default: 5 (dramatically increases the processing time).  I ran these tests multiple times and picked the lowest run I received (You will receive different results as you may have different items running in the background on your workstation).

  • Single thread time: 2.25 seconds
  • Two threads time:  1.401 seconds
  • Three threads time: 0.997 seconds

The times are obviously faster and the margins are about equal between the last test (The three thread implementation is about 40%+ faster than two thread implementation). The performance of the three threads is 125% faster than a single threaded computation!

What's happening during the tests?

If you have a dual-core or a quad-core workstation, you can try this yourself by simply opening up task manager (while testing different implementations).  I set the factor to 15 to count the number of primes from 1 -> 15,000,000.  This will simulate a longer running process that hopefully registers properly on the task manager.

Above is a screen shot of the task manager CPU usage of both cores while performing the prime calculation for 15,000,000 integers.  The data in the red box was captured for the single threaded implementation.  The data in the blue box was captured for the multithreaded implementation (2 threads).  Notice from the graph that the duration for the single threaded implementation is longer than the multithreaded implementation.  Furthermore, during that duration the single threaded implementation does not take full advantage of both cores.  The CPU does some switching automatically for the process; however, it's not perfect.  Contrast this with the multithreaded implementation which spikes both cores to 100% and keeps it at 100% until the processing has finished.  The graph in the blue is the kind of graph you want to see from a fully multithreaded process.

Who cares?  Why would I want to do this in a web app?

In the world of Web 1.0 and maybe even Web 2.0 days, you probably would want to leave heavy processing to a server or move to a fat client application.  With the start of cloud computing, Web 3.0 or "Web OSes" more is expected from web sites inside the browser.  Multithreading isn't just about AJAX and calling a web service anymore.  Inside Silverlight, you can do some real powerful optimizations by leveraging multiple threads.  This can be useful in a lot of areas.  Just to name a few:

  • Silverlight game programming (i.e., AI)
  • Applications that handle a lot of data
  • Simple math/physics simulations
  • Server monitoring of processes, databases, etc.

If you are professional programmer reading this, I hope I don't have to convince you to utilize multithreaded patterns in your code to net 40% performance gains! :) Just take a look at the net gain on a server machine with three threads...125%!  This is with my simple/manual/hacky waithandle logic, so there is definitely more improvement to be netted here.

My last two articles gave detailed examples of multithreading, which were an intro to my next post in my Silverlight Master Series. 

Beta 2 Source code: SilverlightPrimes.zip (544.08 kb)

Silverlight 2 RTM code: SilverlightPrimes_RTM.zip (543.73 kb)

Posted: Sep 07 2008, 13:23 by Bart Czernicki | Comments (4) RSS comment feed |
  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5
Tags:
Social Bookmarks: E-mail | Kick it! | DZone it! | del.icio.us
blog comments powered by Disqus