I want to be a good boy, I really do!
Consider this scenario for me: I've got a SQL query which returns 7000 rows. I need to put a single value of this row into a combo box. This value is also the primary identifier for one of my custom business objects. That leaves me with the following three options:
- select the results out with a dataset and bind the combobox to the dataset
- select the results out with a datareader and iterate through the results, adding the values one at a time to the combobox
- select the results out with a datareader and for each value, create my custom business object and add it to a collection. Then bind the combobox to the collection
I actually tried each of these. I started out with solution #2 - but that took too long. On my 3Ghz machine with 1Gb RAM, it took over a minute to iterate through all the results and add the values to the combobox. Way too long. So I moved on to solution #3 - much more elegant. I create a collection of objects and all my databinding all over the form means I have virtually no code. Sweet. Except that took too long too. Remember my post on delegates? I implemented that technique and the process of creating all my CBOs took 40 seconds on a high powered HyperThreaded/Dual Core CPU. So that moved me to solution #1 - which takes about 2 seconds (I also used the worker thread, so that might be helping).
Let's review:
- Use a simple dataset (and 2 threads). This takes 2 seconds. Of course, I have to create my CBO from the selected value later, but that's fast when I'm only doing it once.
- Use a data reader and a single thread. This takes 75 seconds.
- Use a custom collection of CBOs and 2 threads. This takes 40 seconds.
Clearly solution #1 wins. Of course, it's not as elegant as #3, but it's so much faster that anyone who talks about architectual purity needs to get slapped. Now my question is why? What the F$#K is taking so long?
I have a theory - I'm using the Enterprise Library to handle my calls to the database. Hydrating my CBO makes a single call to the database and gets about 6 values out as output parameters. My theory is that b/c the EntLib handles the opening and closing of the database inside of the ExecuteReader function that I'm actually opening and closing the connection 7000 times! Now that totally sucks and if I was interested in writing the ADO code I'd move the open and close outside of the loop. But I'm not really interested in writing that code b/c even if I could speed up the connection to the database, I doubt I'd shave 37 seconds off of my time.
So my question to the general public is this: when is it right to use a dataset and when should I use a CBO? I'm not doing a lot of complex work here - just filling out some screens with some properties of some objects. Sounds like a job for a CBO collection, but I'm typically dealing with THOUSANDS of objects. Datasets seem to win on raw speed.
Thoughts? Comments?
-- Matt Ranlett