
reference data

Apr 24, 2011 at 8:10 AM

Hi all,

Been watching the Visual Studio database projects for a while now and I've got to say I love the guidance document.

One thing I think needs to be covered in more depth is how to handle deploying reference data. I had a look and couldn't find much of a mention of it at the moment (hopefully I didn't miss it!).

One of the main things stopping adoption of the project for us has been how to handle reference data properly with respect to adding, removing, and end-dating/obsoleting rows, and how to achieve this without circumventing the database projects.

I have seen some approaches that use the MERGE command, but I would be interested to know whether that is the best way to do it.

Any chance the next version of the documentation could include some guidance around this? That would be awesome!

Keep up the good work!

Mar 5, 2012 at 4:28 PM

It's curious to see that nobody has anything to say about this.  I've only seen blips about this issue, including in this very guidance document (pg. 71), and they all describe the simple reference table with just a few, or a few hundred, rows.  The guide clearly states that the deployment script "must be re-runnable or idempotent".  Easy enough.

What, however, is the best practice if the reference table has several hundred thousand rows, like a medication or medical procedure code reference table?  I've done some searching on the net and it just points right back to http://blogs.msdn.com/b/bahill/archive/2009/04/01/maintaining-and-synchronizing-your-reference-data.aspx.  The real issue is that VSDBCMD takes for-e-ver to insert the rows, whether it's performing a merge or doing an initial deployment with a post-deployment script full of INSERT commands.  That is obviously not a great solution, since deployments should really be as fast as possible.

The only things I can think of are to include a .dat file in the deployment and bulk insert/recreate the table every time, or to use the MERGE approach mentioned above.  I would be interested to hear what others have done to deal with this issue.
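For reference, this is roughly the idempotent MERGE pattern I have in mind for the small-table case (the table, columns, and values below are made-up placeholders, not something from the guide):

    -- Sketch of an idempotent post-deployment script for a small reference table.
    -- dbo.OrderStatus and its values are placeholders for illustration only.
    MERGE INTO dbo.OrderStatus AS target
    USING (VALUES
        (1, N'Open'),
        (2, N'Shipped'),
        (3, N'Cancelled')
    ) AS source (StatusId, StatusName)
        ON target.StatusId = source.StatusId
    WHEN MATCHED AND target.StatusName <> source.StatusName THEN
        UPDATE SET StatusName = source.StatusName
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (StatusId, StatusName) VALUES (source.StatusId, source.StatusName)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;

Re-running that is harmless on a small table, but it does not scale to the row counts I'm describing.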

Developer
Mar 6, 2012 at 6:47 AM

Hi dpachla,

You mentioned that inserting big chunks of data can be painful, because you want to minimize the deployment time and have your system up and running again in the shortest amount of time. Besides having static data captured in INSERT scripts, I also had truckloads of data that needed to be synced/updated during a delta deployment or an initial deployment. We took several approaches, but the one below is where we made sure we had the least impact on the deployment and on the existing database. (We imported and synced street data to support a street catalog.)

-Insert the data into a separate database using methods like BULK INSERT, SSIS loading, or bcp. Make sure you are able to place the data files of the reference catalog on a different spindle than the operational database, for performance reasons.

-Update your database schema

-Use MERGE (where available, depending on the compatibility level) to sync the data into your database

-Drop the database holding the reference data to clean house again.

This can either be done within the deployment process, using scripts in the pre-processor commands (we had a sample of that in the HOL, where we needed to cope with NULLability), or as a separate process. Using a separate, manual process is in most cases the better option: in our case the reference data did not change very often, so we could avoid the costly import and sync most of the time and only run it when needed.
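To illustrate, the scripted variant looked roughly like the following (the database, table, and file names are invented for the example; your schema will differ):

    -- 1. Load the reference data into a separate staging database
    --    (BULK INSERT shown here; SSIS or bcp work just as well).
    CREATE DATABASE RefStaging;
    GO
    USE RefStaging;
    GO
    CREATE TABLE dbo.Street (
        StreetId   int           NOT NULL PRIMARY KEY,
        Name       nvarchar(200) NOT NULL,
        PostalCode nvarchar(10)  NOT NULL
    );
    GO
    BULK INSERT dbo.Street
        FROM 'D:\RefData\streets.dat'
        WITH (TABLOCK);
    GO

    -- 2. Update the operational database schema (normal deployment).

    -- 3. Sync the staged data into the operational database with MERGE.
    USE OperationalDb;
    GO
    MERGE INTO dbo.Street AS target
    USING RefStaging.dbo.Street AS source
        ON target.StreetId = source.StreetId
    WHEN MATCHED THEN
        UPDATE SET Name = source.Name, PostalCode = source.PostalCode
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (StreetId, Name, PostalCode)
        VALUES (source.StreetId, source.Name, source.PostalCode);
    GO

    -- 4. Drop the staging database to clean house again.
    USE master;
    GO
    DROP DATABASE RefStaging;
    GO

The point of the separate staging database is that the expensive load never touches the operational database files; the MERGE is the only step that does.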

Does this make sense to you?

-Jens

Mar 6, 2012 at 2:21 PM

Hi Jens,

That definitely makes sense.  What you've explained will work for me too.  I now find myself asking: if there is going to be another database involved anyway, why not back up and restore a reference database to perform the merge?  That seems like it would be the fastest and most accurate option.  These types of tables typically come from a third party anyway and, just as in your case, they are not updated very often, so the costly merge process can be avoided in 99% of deployments.
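Something along these lines, I suppose (the database, file, and logical names are just placeholders):

    -- Restore the vendor-supplied reference database alongside the operational one.
    -- Logical and physical file names are placeholders; check them with RESTORE FILELISTONLY.
    RESTORE DATABASE RefCatalog
        FROM DISK = 'D:\RefData\RefCatalog.bak'
        WITH MOVE 'RefCatalog'     TO 'E:\Data\RefCatalog.mdf',
             MOVE 'RefCatalog_log' TO 'E:\Data\RefCatalog_log.ldf',
             REPLACE;
    GO
    -- ...then MERGE from RefCatalog into the operational tables as in the steps above,
    -- and drop RefCatalog once the sync is done.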

Thanks for the great info.

Developer
Mar 6, 2012 at 3:13 PM

Hi,

good to hear that this is also a feasible way for you. As for "It's curious to see that nobody has anything to say about this.", sorry about that. Sometimes the alerts slip under my radar in my personal inbox; it was not my intention to ignore anyone here. I encourage you to simply *bump* a thread if there is no response.

-Jens