Automated Workflow Environments and EMR

October 30, 2006

We now work in a new era of software development: we design not only standalone applications but also systems that communicate with each other and thus take part in a workflow.

Automating this workflow through the seamless integration of these apps is a task that challenges many of the industries that we work in.

Automated Workflow Environments are environments in which multiple systems contribute and communicate, so that the network of applications as a whole solves complex problems efficiently, without human interaction. You can call them Digital Ecosystems.

You can construct workflow nets to describe the complex problems that these systems solve. Workflow nets, a subclass of Petri nets, are attractive models for analyzing complex business processes. Because of their sound theoretical foundation, Petri nets have been used successfully to model and analyze processes from many domains, such as software and business processes. A Petri net is a directed graph with two kinds of nodes, places and transitions, where arcs connect a place to a transition or a transition to a place (never two places or two transitions). Each place can contain zero, one or more tokens. The state of a Petri net (its marking) is determined by the distribution of tokens over places. A transition is enabled, i.e. it can fire, if each of its input places contains at least one token. When the transition fires, it consumes one token from each input place and produces one token in each output place.
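The firing rule just described fits in a few lines of code. Below is a minimal place/transition net in Python, written as a sketch under the assumption that every arc has weight one; the class, its methods and the place names (`waiting`, `nurse_free`, `triaged`) are illustrative and not taken from any particular workflow tool.

```python
from collections import Counter


class PetriNet:
    """Minimal place/transition net in which every arc has weight 1."""

    def __init__(self, transitions):
        # Maps a transition name to (input places, output places).
        self.transitions = transitions

    def enabled(self, marking, name):
        """A transition is enabled when each input place holds at least one token."""
        inputs, _ = self.transitions[name]
        return all(marking.get(place, 0) >= 1 for place in inputs)

    def fire(self, marking, name):
        """Fire a transition and return the new marking (the old one is unchanged)."""
        if not self.enabled(marking, name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        result = Counter(marking)
        result.subtract(inputs)  # consume one token from each input place
        result.update(outputs)   # produce one token in each output place
        return result


# Hypothetical triage step: needs a waiting patient and a free nurse,
# and hands the nurse back once the patient is triaged.
net = PetriNet({"triage": (["waiting", "nurse_free"], ["triaged", "nurse_free"])})
m0 = Counter({"waiting": 2, "nurse_free": 1})
m1 = net.fire(m0, "triage")
m2 = net.fire(m1, "triage")
print(m1["waiting"], m1["triaged"], m1["nurse_free"])  # 1 1 1
print(m2["triaged"], net.enabled(m2, "triage"))        # 2 False
```

The nurse is modelled as a resource place that the transition both consumes and returns, so triage can never run twice at once with a single nurse.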

In a hospital environment, for example, the processes involved show complex and dynamic behavior that is difficult to control. A workflow net that models such a process provides good insight into it and, thanks to its formal representation, offers techniques for improved control.

Workflows are case oriented, which means that each activity executed in the workflow is performed for a specific case. In a hospital domain, a case corresponds to a patient and an activity corresponds to a medical activity. The process definition of a workflow assumes that a partial order or sequence exists between activities, which establishes which activities have to be executed in what order. In the Petri net formalism, workflow activities are modeled as transitions, and the causal dependencies between activities are modeled as places and arcs. Workflow routing uses four kinds of constructs: sequential, parallel, conditional and iterative routing. These constructs define the route taken by the ‘tokens’ (cases) through the workflow.
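The four routing constructs map directly onto net structure. As a hedged illustration, the Python sketch below encodes a hypothetical outpatient case (all place and transition names are invented for the example): a plain chain is sequential routing, one transition with two output places is an AND-split for parallel routing, a transition with two input places is the matching AND-join, two transitions competing for the same token give conditional routing, and an arc back to an earlier place gives iterative routing.

```python
from collections import Counter

# Hypothetical outpatient workflow net: transition -> (input places, output places).
WORKFLOW = {
    "register":       (["start"], ["registered"]),                        # sequential
    "examine":        (["registered"], ["lab_pending", "xray_pending"]),  # AND-split (parallel)
    "lab_test":       (["lab_pending"], ["lab_done"]),
    "repeat_lab":     (["lab_done"], ["lab_pending"]),                    # iterative: loop back
    "xray":           (["xray_pending"], ["xray_done"]),
    "review_results": (["lab_done", "xray_done"], ["reviewed"]),          # AND-join
    "discharge":      (["reviewed"], ["end"]),                            # conditional (XOR): both
    "admit":          (["reviewed"], ["end"]),                            # compete for one token
}


def enabled(marking, t):
    return all(marking[p] >= 1 for p in WORKFLOW[t][0])


def fire(marking, t):
    if not enabled(marking, t):
        raise ValueError(f"{t} is not enabled")
    inputs, outputs = WORKFLOW[t]
    m = Counter(marking)
    m.subtract(inputs)  # consume one token per input place
    m.update(outputs)   # produce one token per output place
    return m


# One token represents one patient case moving through the net.
m = Counter({"start": 1})
for t in ["register", "examine", "xray", "lab_test", "repeat_lab", "lab_test"]:
    m = fire(m, t)  # the x-ray and lab branches may interleave in any order
m = fire(m, "review_results")
print(enabled(m, "discharge"), enabled(m, "admit"))  # True True
m = fire(m, "discharge")
print(m["end"], enabled(m, "admit"))                 # 1 False
```

Which of the two competing transitions fires is decided outside the net (here, by the caller); the net only guarantees that once one of them has consumed the token, the other can no longer fire.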

Well, enough theory, how does this apply?

Think of this in practical terms, using the example of an EMR*, CPR* or HIS* system:
• A patient arrives at a hospital for a consultation or particular set of exams or procedures.
• The patient is registered if new to the hospital. A visit or encounter record is created in the Patient Chart (EMR) with vitals, allergies, current meds and insurance details.
• The physician examines the patient and orders labs, diagnostic exams or prescription medications, possibly using a handheld CPOE* device.
• The patient is scheduled for the exams in the RIS (Radiology Information System), the LIS (Laboratory Information System) or the HIS (Hospital Information System).
• The RIS, LIS or HIS sends HL7 messages to notify the Radiology, Cardiology, Lab or other departments in the hospital for the various workflows.
• The systems in these departments then send HL7, DICOM or proprietary messages to update the devices or modalities with the patient data (prior history, etc.).
• The nurses then take the patient to the required modalities in the exam/lab areas to perform the required activities.
• The patient finishes the hospital activities while the diagnosis continues. All the data gathered is consolidated and stored as rich structured reports or in multimedia formats in the various repositories, resulting in a summary patient encounter/visit record in the Electronic Patient Record in the EMR database.
• Other workflows could also be triggered, e.g. pharmacy and billing.
• The above is just the scenario for an OUTPATIENT; there are other workflows for INPATIENTS (ED, ICU and other patients).
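The HL7 messages in this flow (version 2.x) are plain text: segments separated by carriage returns, fields by `|`, components by `^`. As an illustration only, here is a minimal Python sketch that pulls the message type and patient identifiers out of a made-up ADT^A01 (admit/visit notification) message. The function `parse_hl7` is hypothetical; it assumes the default delimiters and ignores repetitions, escape sequences and sub-components, all of which a production interface engine must handle.

```python
def parse_hl7(message: str) -> dict[str, list[list[str]]]:
    """Split an HL7 v2 message into {segment ID: [field lists]}.

    Assumes the default delimiters (| between fields, ^ between components) and
    ignores repetitions, escape sequences and sub-components.
    """
    segments: dict[str, list[list[str]]] = {}
    for raw in message.replace("\n", "\r").split("\r"):
        if not raw:
            continue  # skip blank lines such as a trailing terminator
        fields = raw.split("|")
        if fields[0] == "MSH":
            # MSH-1 *is* the field separator, so re-insert it to keep HL7's numbering.
            fields.insert(1, "|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


msg = "\r".join([
    r"MSH|^~\&|REG|HOSP|RIS|HOSP|200610301200||ADT^A01|MSG00001|P|2.3",
    "PID|1||123456^^^HOSP^MR||DOE^JOHN||19600101|M",
    "PV1|1|O",
])
seg = parse_hl7(msg)
msh, pid, pv1 = seg["MSH"][0], seg["PID"][0], seg["PV1"][0]
print(msh[9])                # ADT^A01 (admit/visit notification)
print(pid[3].split("^")[0])  # 123456 (first component of PID-3, the patient identifier)
print(pid[5].split("^"))     # ['DOE', 'JOHN'] (PID-5, family and given name)
print(pv1[2])                # O (PV1-2 patient class: outpatient)
```

Keeping HL7's field numbers as list indexes (hence the re-inserted `|` in MSH) means the code can be read side by side with the segment definitions in the standard.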

The key problems in this ‘Automated Workflow Environment’ are:

• Accurate patient identification and portability, to ensure that the patient identity is unique across multiple systems, departments and possibly hospitals. The patient identity key is also essential for integrating patient healthcare across clinics, hospitals, regions (RHIO*) and states.
• Support for barcodes/RFID on patient wrist bands, prescriptions/medications and billing (using the MRN, account number, order number, visit number, etc.) to enable automation and quick, secure processing.
• Quick patient data retrieval and support for parallel transactions.
• Audits and logs for tracking access to the system.
• Support for PACS, emergency care, critical care (ICU/PACU), long-term care, periodic visits, point-of-care charting, medication administration, vital signs data acquisition, alarm notification, surveillance of patient monitors, smart IV pumps, ventilators and other care areas, including treatment by specialists in off-site clinics.
• Support for care plans, order sets and templates, results tracking and related transactions.
• Quick vital sign results and diagnostic reporting.
• Effective display of specialty content: diagnostic/research images and structured “rich” multimedia reports.
• Secure and efficient access to this data from the internet.
• Removal of paper documentation and effective transcription.
• Single Sign-On (SSO), security roles and ease of use for the various stakeholders: the patient, the RN, the physician, the specialist, IT support, etc.
• Seamless integration with current workflows and support for updates to hospital procedures.
• Modular deployment of new systems and processes, with a long-term roadmap and strategies to prevent costly upgrades or vendor changes.
• HIPAA, JCAHO and legal compliance, which involves an entire set of guidelines, privacy and security being the chief ones.
• Efficient, standardized communication between the different systems, either via “standard” HL7, DICOM or CCOW, or via proprietary formats.
• Support for a high-speed fiber network for high-resolution imaging systems such as MRI, X-ray, CT scan, etc.
• A high-speed, independent network for real-time patient monitoring systems and devices.
• Guaranteed, timely data storage and recovery with at least 99.9999% visible uptime.
• Original patient data available for at least 7 years, and compliance with FDA rules.
• Disaster recovery compliance and responsive performance under peak conditions.
• Optimized data storage ensuring low hardware costs.
• Plug ‘n’ Play of new systems and medical devices into the network, wireless communication among vital signs devices and servers, etc.
• Location tracking of patients and devices (RFID based) and bed tracking in the hospital.
• Centralized viewing of the entire set of patient data, by either the patient or his/her physician.
• Multi-lingual user interface possibilities (in future?).
• Correction of erroneous data and merging of patient records.
• Restructuring existing hospital workflows and processes so that this entire automated workflow environment works with a definite ROI and within a definite time period!
• Integration with billing, insurance and other financial systems related to the care charges.
• Future-proofing and support for new technologies such as Clinical Decision Support Systems (CDSS*); again, a long-term roadmap is essential.
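The first requirement, a patient identity that stays unique across systems, is usually met with a master patient index (MPI), which also covers the merging of duplicate patient records. The Python sketch below is hypothetical: real MPIs match records probabilistically on demographics (name, date of birth, address and so on), which is omitted here, and the class name, the system names and the `E000001` ID format are invented for illustration.

```python
import itertools


class MasterPatientIndex:
    """Hypothetical MPI: maps each (system, local MRN) pair to one enterprise ID."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._by_local: dict[tuple[str, str], str] = {}

    def register(self, system: str, mrn: str) -> str:
        """Return the enterprise ID for a local record, minting one if it is new."""
        key = (system, mrn)
        if key not in self._by_local:
            self._by_local[key] = f"E{next(self._ids):06d}"
        return self._by_local[key]

    def link(self, system: str, mrn: str, enterprise_id: str) -> None:
        """Attach a local record (e.g. from the RIS) to an existing enterprise ID."""
        self._by_local[(system, mrn)] = enterprise_id

    def merge(self, survivor: str, duplicate: str) -> None:
        """Re-point every local record of a duplicate identity to the survivor."""
        for key, eid in list(self._by_local.items()):
            if eid == duplicate:
                self._by_local[key] = survivor

    def local_ids(self, enterprise_id: str) -> list[tuple[str, str]]:
        """All (system, MRN) pairs that belong to one patient, sorted."""
        return sorted(k for k, v in self._by_local.items() if v == enterprise_id)


mpi = MasterPatientIndex()
e1 = mpi.register("HIS", "123456")
mpi.link("RIS", "R-998", e1)
e2 = mpi.register("LIS", "L-42")  # same patient, registered again by mistake
mpi.merge(survivor=e1, duplicate=e2)
print(e1, mpi.local_ids(e1))
# E000001 [('HIS', '123456'), ('LIS', 'L-42'), ('RIS', 'R-998')]
```

Each department keeps its own MRN; only the index knows that the three local records belong to one patient, which is what makes cross-department (and, with a shared index, cross-hospital) lookups possible.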

ROI: How does a hospital get returns on this IT investment?

  1. Minimization of errors – medication or surgical – and the associated risks
  2. Electronic trail of patient case history available to patient, insurance and physicians
  3. Reduced documentation and improvement in overall efficiency and throughput
  4. Patient referrals from satellite clinics, which can use the EMR’s external web links to document patient care, thus providing a continuous electronic report
  5. Possible pay-per-use by external clinics – to use EMR charting facilities
  6. Remote specialist consultation
  7. Efficient Charges, Billing and quicker settlements
  8. Better Clinical Decision Support – due to an electronic database of past treatments
  9. In the long term, efficiency means cheaper insurance which translates to volume income
  10. Better compliance with standards: HIPAA, privacy requirements, security
  11. Reduced workload due to Process Improvement across departments – ED, Obstetrics/Gynecology, Oncology/Radiology, Orthopedic, Cardiovascular, Pediatrics, Internal Medicine, Urology, General Surgery, Ophthalmology, General/family practice, Dermatology, Psychiatry
  12. Improved Healthcare with Proactive Patient Care due to CDSS
  13. Quality of patient care: a silent factor in a hospital’s revenue is the quality of patient care. One of its chief drivers is the quality of the information provided efficiently to physicians, through which they can make those critical decisions

Now, the big picture becomes clear.

Doesn’t the above set of requirements apply to any domain? This analysis is not limited to the hospital domain; the same is true for a biotech domain (where orders are received, data is processed and analyzed, and the processed data is presented or packaged), and similarly for the manufacturing, banking or insurance domains.

The need is for core engine software, based on EDI (Electronic Data Interchange), that integrates these mini workflows securely and effectively, helps in their process re-engineering, and uses common intersystem communication formats such as ANSI X12 or HL7 messages.
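To make “common intersystem communication formats” concrete: an X12 transaction is a string of segments, each a list of elements. The Python sketch below splits a made-up fragment of an 837 (health care claim) transaction. It assumes the common delimiters `~` (segment) and `*` (element); real interchanges declare their delimiters in the ISA envelope header, which this sketch neither includes nor reads, and `parse_x12` is a hypothetical helper, not a library function.

```python
def parse_x12(data: str, segment_sep: str = "~", element_sep: str = "*") -> list[list[str]]:
    """Split an X12 transaction into segments, each a list of elements.

    Assumes the delimiters are already known (here the common '~' and '*');
    a real interchange declares them in its ISA header, which is not handled.
    """
    return [seg.strip().split(element_sep)
            for seg in data.split(segment_sep) if seg.strip()]


# Made-up fragment of an 837 (health care claim) transaction set.
claim = ("ST*837*0001~"
         "BHT*0019*00*123*20061030*1200*CH~"
         "NM1*IL*1*DOE*JOHN****MI*123456~"
         "SE*4*0001~")
segments = parse_x12(claim)
nm1 = next(s for s in segments if s[0] == "NM1")
print(segments[0][1])          # 837
print(nm1[3], nm1[4], nm1[9])  # DOE JOHN 123456 (subscriber name and member ID)
# SE01 carries the number of segments from ST to SE inclusive: a cheap integrity check.
print(int(segments[-1][1]) == len(segments))  # True
```

Empty elements (the four `*` after `JOHN`) are kept as empty strings, so element positions such as NM109 stay aligned with the implementation guide.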

These Workflow Engines would be the hearts of the digital world!

*EMR – Electronic Medical Record
*CPR – Computerized Patient Record
*HIS – Hospital Information System
*CDSS – Clinical Decision Support System
*RHIO – Regional Health Information Organization
*CPOE – Computerized Physician Order Entry

Some of the information presented here draws on the following research papers and articles:
*Common Framework for health information networks
*Discovery of Workflow Models for Hospital Data
*Healthcare workflow
*CCOW-IHE Integration Profiles
*Hospital Network Management Best Practices
*12 Consumer Values for your wall

What about the latest IT trends and their applications in healthcare?

We already know about Google Earth, Google Hybrid Maps and the advantages of Web 2.0.
The next best thing is to search for the best shopping deal or the best real estate by area on a hybrid map. This recombinant technique of reusing existing web applications is called a mashup; map mashups often present the combined data as a heat map.
Mashups have applications in possibly everything from Healthcare to Manufacturing.
Omnimedix is developing and deploying a nationwide data mashup, Dossia: a secure, private, independent network for capturing medical information that provides universal access to this data, along with an authentication system for delivery to patients and consumers.

Click on the links below to see the current ‘best in class’ mashups:
* After-hours Emergency Doctors SMS system – transcribes voicemail into text and sends an SMS to doctors. A similar application can be used for a transcription mashup (based on Interactive Voice Response, IVR): Amazon Mturk, StrikeIron Global SMS and Voice XML
* Calendar with Messages – listen to your calendar and leave messages too. Mashup (based on IVR): 30 Boxes based on Voxeo, Google Calendar
* – Housing/Climate/Jobs/Schools
* Visual Classifieds Browser – Search Apartments, visually
* – Real Estate/Home pricing
* – Rent comparison
* – Real Estate Statistical Analysis
* – Rent/Real Estate/Home pricing – linked to Craigslist
* – Google Maps + Travel Videos
* – Wheel of Zip Code based restaurants
* More sample links at this site (an unofficial Google mashup tracker), including some notable sites:
* latest news from India by map
* read news by the map – slightly slow
* view news from Internet TV by map
* see a place in 360

What’s on the wish list? Well, a worldwide mashup for real estate, shopping, education and healthcare will do just fine. Read on to try out YOUR sample…
OpenKapow: the online mashup builder community that lets you easily make mashups. Use its visual scripting environment to create intelligent software robots that can make mashups from any site, with or without an API.
In the words of Dion Hinchcliffe, “Mashups are still new and simple, just like PCs were 20 years ago. The tools are barely there, but the potential is truly vast as hundreds of APIs are added to the public Web to build out of”.
Dion also covers the architecture and types of mashups here, with an update on recombinant web apps.

Keep up to date on web2.0 at

Will Silverlight, with its simplified vector-based graphics and XAML, its XML-based markup language (also used to define workflows in .NET), be the replacement for Flash and JavaFX?

Well, the technology is promising, and many multimedia web content providers, including news channels, have signed up for Microsoft Silverlight (formerly code-named “WPF/E”) because of its lightweight, browser-based viewer that streams “DVD” quality video using the patented VC-1 video codec.

Microsoft® Silverlight™ Streaming by Windows Live™ is a companion service for Silverlight that makes it easier for developers and designers to deliver and scale rich interactive media apps (RIAs) as part of their Silverlight applications. The service offers web designers and developers a free and convenient solution for hosting and streaming cross-platform, cross-browser media experiences and rich interactive applications that run on Windows™ XP+ and Mac OS 10.4+.

The only problem is that Linux is left out, since the Mono framework has not yet evolved sufficiently.

So, the new way to develop your AJAX RIA “multimedia web application” is to design the UI with an artist in Adobe Illustrator and mash it up with your existing RSS feeds, LINQ queries, JSON and XML-based web services, and REST and WCF services to deliver a richer, scalable web application.

Is it Knoppix or PCLinuxOS time?

September 19, 2006

Not yet tired of Windows? Well, read on to find out about the next-gen O/S out there…

* New capabilities: creating a remastered custom Live CD/DVD and booting from a USB flash drive

Let’s see, where do I start…

Well, my machine crashed. It’s a new Acer NLCI (4150 or 4650 series) laptop, and the problem was possibly a virus, because ntoskrnl.exe was corrupted: the drivers just stopped loading a few seconds into the boot, bringing the hard disc to a halt and the screen to a frozen white screen (talk about the old blue screen being upgraded by MS!!!)

Time to look around for a bootable repair kit, right? I got an XP CD and installed XP onto a different partition, but the next day that installation went into the same loop as above.

Then I found out that the hard disc may have suffered some permanent damage, thanks to all the travel lately.

I looked around for a solution (other than DOS, obviously) and found the best free O/S ever!

Knoppix is a free, open source Linux O/S that boots and runs from a CD/DVD (like PCLinuxOS 2007), with automatic hardware detection that recognizes all the devices: graphics card, sound card, USB, CD, DVD, RAID, modem, LAN card, printer-scanner (HP all-in-one), PCMCIA cards, SD card (Gateway works with PCLinuxOS), Bluetooth (works with PCLinuxOS), wireless (works with PCLinuxOS), external camera, webcam, you name it. It’s a solid system that detects hardware, installs and boots fast (PCLinuxOS 2007 boots in 30 seconds), and runs quickly on ordinary hardware! It can even mount my NTFS hard disc in read mode (the ntfs-3g driver supports reading and writing NTFS files), so I could copy my critical files to a USB drive in no time. Of course, part of the recovery process is being able to write CDs and DVDs, which Knoppix (and PCLinuxOS) supports from a right-click menu.

Knoppix, a flavor of Debian Linux, is a 640MB bootable Live CD with a collection of GNU/Linux software that brings up a Windows-like user interface and connects to the internet (after a minor configuration) through a DSL or cable modem, and voila, I’m online via Mozilla, Opera or Konqueror. Knoppix 5.1.1+ ships with Linux kernel 2.6.19+, KDE 3.5.5+/GNOME 2.16+ and ntfs-3g, and it has everything you expect from a true Windows system: OpenOffice 2.1+ (Word documents, Excel spreadsheets, PPT presentations), a PDF reader, Kaffeine (media player), GIMP (for those MS Paint users), Picasa from Google, a host of NTFS data recovery tools and good card games!

PCLinuxOS 2007 TR4 is a free, open source Linux O/S (a flavor of Mandrake/Mandriva Linux with the Linux kernel, KDE 3.5.7+/GNOME 2.16+, OpenOffice 2.3+ and 3D windows support). It has all the above software modules, and comparing PCLinuxOS 2007 TR4 with Knoppix 5.1.1, it is much better, since PCLOS can be remastered and installed on a USB flash drive very easily.

Live CD Knoppix (like Live CD PCLinuxOS) uses on-the-fly decompression to load the required modules into memory from the bootable CD/DVD, so the CD is locked by Knoppix and you can’t use the drive for writing or DVD viewing. Knoppix does include CD/DVD writer and movie viewer software, so I can use an external DVD/CD writer/reader to burn and read discs. You can also install the PCLinuxOS 2007 TR4 Live CD to a USB flash drive. The other pluses of these Live CDs: a messenger is already available, thanks to secure GAIM (now Pidgin), which can connect to Yahoo, Google Chat and AOL, among others. Kopete is better than other messengers because of its webcam support, but it needs Knoppix/PCLinuxOS installed to the hard disc. You can also create custom bootable CDs/DVDs/USB flash drives, since the default Live CDs are built as a read-only O/S (running from the CD) with default options.

Want to listen to quality music? Using StreamTuner on Linux, you can listen to live internet-streamed (128 to 256 kbps) quality radio from around the world on XMMS, for free!!!

Security: you can install the necessary Mozilla add-ons and the Shorewall or Firestarter firewall to boost your experience.
VLC, Gxine, Amarok and MPlayer are very good multimedia programs that cover all your multimedia needs.

Adobe Reader 8 is available for Linux for the PDF community.

OpenOffice and StarOffice are not perfect, but they are decent Linux office solutions.

If you are wondering about install time, there is none: the OS just boots off a CD/USB and you can use all the features of a full-fledged O/S. If you need to, you can install the O/S to the hard disc in 10 minutes.

So what are the minimum requirements of this new O/S? (Vista beware!)

· Intel-compatible CPU (i486 or later),
· 32 MB of RAM for text mode, at least 96 MB for graphics mode with KDE (at least 128 MB of RAM is recommended to use the various office products),
· bootable CD-ROM drive, or a boot floppy and standard CD-ROM (IDE/ATAPI or SCSI),
· standard SVGA-compatible graphics card,
· serial or PS/2 standard mouse or IMPS/2-compatible USB-mouse.

Before you comment, please note:

  • I know I could have called Acer support, since my laptop is within warranty. I didn’t call because I needed internet connectivity, not further delays and postal issues with mailing my hard disc out.
  • Knoppix and PCLinuxOS have a very good multi-session KDE 3.5+/GNOME 2.16+ windows environment, which I wanted, compared to a lighter environment like DSL (Damn Small Linux, a 50MB Linux with USB/mini-CD boot; that’s another topic). Another reason not to go for a USB boot was that the manufacturer had not upgraded the laptop BIOS, and the only way to upgrade the BIOS is through a working Windows XP!! Latest: I finally got a USB floppy drive and upgraded my BIOS. I also created a custom Live CD of PCLinuxOS 2007 TR4 and copied it to a USB flash drive, which is now bootable (don’t forget to install the MBR) and works on Acer 4152 and Gateway MX 6124 laptops.
  • I downloaded Knoppix 3.6 over a friend’s cable connection, burnt the CD in 10 minutes and was online on my Acer 4152 NLCI laptop in 15 minutes. Latest: I downloaded the Knoppix 5.1.1 Live DVD and the PCLinuxOS 2007 TR4 Live DVD; PCLinuxOS rates better, with custom Live DVD/USB flash boot support.
  • All the needed drivers and software, including Gaim, Office (Word, Excel, PPT), a PDF viewer, printer-scanner drivers and an RSS reader, were on the CD, so there was nothing to install, unlike MS Windows, where a plain O/S is useless!!!

Knoppix 3.6 Problems and PCLinuxOS 2007 TR4 Advantages:

  • The laptop CD/DVD drive is fully used/locked and I cannot eject it while I am logged into the Knoppix Live O/S. This is by design, since the modules are loaded dynamically from the CD. You can burn custom CDs by downloading the necessary software with apt-get.
  • I downloaded PCLinuxOS 0.93 and later upgraded to PCLinuxOS 2007 TR4 and they install to hard disc easily with a single click, also they have a neat way to create a custom Live CD/DVD, which can be created in 20 minutes flat using mklivecd/k3b
  • Webcam with messenger is an issue with gaim – PCLinuxOS 2007 fixed this problem with a new version of Kopete and many more web camera tools.
  • WPA wireless security is not supported with the default ndiswrapper setup, so you may still have to use the Windows wireless card drivers; otherwise, PCLinuxOS 2007 makes secure wireless a breeze, with support for secured 128-bit WEP.
  • The apt-get feature of most Linux flavors (combined with Synaptic in PCLinuxOS; YaST in openSUSE) is great for keeping your specific O/S features up to date and removing broken packages. The control you have over your custom machine software is simply great.
  • I can’t access the SD card inserted in the proprietary Acer laptop Texas Instruments SD/MMC card reader, but I can connect the Kodak digital camera via USB and upload the photos. Latest: the PCLinuxOS 2007 TR4 support forums have fixes for many Gateway laptop SD card reader issues, covering most SD cards and multi-card readers.
  • I can’t read MS Visio documents (but this is in development, and I will use Visio diagrams converted to JPEGs until then).
  • Open Office is not yet a very mature software and cannot reasonably compare to MS Office 2003 in either features or printing of Excel documents.
  • PCLinuxOS 2007 has Beryl, a decent 3D window manager with nice 3D window effects, but 3D windowing is still not a very refined concept and I would suggest uninstalling the Compiz and Beryl 3D software.

Well, that’s it from me, see ya… Keep your Knoppix CD/DVD or PCLinuxOS CD/DVD/USB ready…. As we say in Linux, there is true consumer choice, even though I personally vote for PCLinuxOS 2007 TR4.

Oh ya... Knoppix supports clusters: a multi-computer version called “ParallelKnoppix” is out, which converts a host of Windows machines into a Linux cluster farm. Descriptions are here –

Howto? – Another useful site to learn to use Linux –

Some more flavors that receive good desktop Linux reviews are

Other good sites with helpful linux links:

Simple SQL Server Performance Tips

July 29, 2005
  1. Always create a data model (ERD).
  2. Consider using an application block or a best practice based design.
  3. Make sure the database is normalized. This is very important, otherwise SQL Server will not produce optimized query plans (Tips for SQL Server 2005 Query Plans). For a one-to-many (1:m or m:1) relationship, the child table must carry the parent table’s primary key as a foreign key; for dependent tables, such as a Person–Address or a Product–Attribute relationship, the child’s primary key can be a composite of the parent’s primary key and a surrogate key. For a many-to-many (m:n) relationship, add a third (junction) table that holds the primary key combinations of both related tables.
  4. Make sure database security is controlled through views/stored procedures and finally roles.
  5. Make sure the columns used in common joins and WHERE conditions are indexed. Remember that a foreign key constraint does not create an index.
  6. Prefer inner joins to outer joins where possible, and use left outer joins only when foreign keys are nullable. Try to design around NULLs (avoid nullable foreign keys). Use SET ANSI_NULLS ON to ensure ANSI NULL compatibility. Remember: SELECT * FROM A1 WHERE b NOT IN (SELECT b FROM B1) returns no rows at all if any B1.b is NULL, because the NOT IN comparison evaluates to UNKNOWN; use NOT EXISTS instead.
  7. Keep transactions as short as possible.
  8. Reduce lock time. Try to develop your application so that it grabs locks at the latest possible time, and then releases them at the very earliest time.
  9. Always display the execution plan in Query Analyzer when testing stored procedures or ad-hoc SQL, and make sure clustered index seeks or nested loops are used. Avoid hash joins: heavy I/O or hash joins show up as CPU spikes in the Performance Monitor (perfmon).
  10. Avoid WHERE conditions that apply functions to columns, since SQL Server doesn’t have function-based indexes (short of indexing a computed column). For example, don’t write SELECT a, b FROM X WHERE CONVERT(datetime, some_column) > '10/10/2005'; instead, leave the column bare and move any conversion to the constant on the right-hand side. This helps query plan reuse and lets the optimizer use indexes on the column.
  11. Always run SQL Profiler while exercising your client application and check that the Duration column stays low; if it doesn’t, run the Index Tuning Wizard to find out whether indexes are missing for those queries.
  12. Always use connection pooling, so that the cost of opening a connection is not paid on every request. Connection strings must match exactly for connections to be pooled; if you use an NT user, connect to the database with the same user from the client. Remember: NT-based connection pooling through delegation doesn’t work correctly in ASP.NET, and it isn’t as scalable as a pool based on a SQL user. You can always encrypt the connection string in the web.config file.
  13. SQL Server .NET data provider is the fastest. The SQL Server .NET provider uses TDS (Tabular Data Stream, which is the native SQL Server data format) to communicate with SQL Server. The SQL Server .NET provider can be used to connect to SQL Server 7.0 and SQL Server 2000 databases, but not SQL Server 6.5 databases. If you need to connect to a SQL Server 6.5 database, the best overall choice is the OLE DB.NET data provider.
  14. Two-part names: always fully qualify tables, views and stored procedures, e.g. exec dbo.sp_storeusers or sp_sqlexec rsdb.dbo.sp_storeusers, to be compatible with future releases of SQL Server.
  15. SQL Server 2005 places no limits on server RAM, supports XML natively, has a built-in tuning advisor (the Database Engine Tuning Advisor) and works with the same SQL syntax as SQL Server 2000 (see also: Constant Scan and other operators of SQL 2005).
  16. Server-side cursors are not scalable in SQL Server, so avoid them.
  17. Cursors can be implicitly converted (degraded) to a more expensive cursor type when ORDER BY (not covered by an index), TOP, GROUP BY, UNION, DISTINCT, etc. is used.
  18. With ADO.NET, prefer DataReaders, then DataTables, then DataSets, in that order of increasing performance cost.
  19. Consider the NOLOCK hint, as in SELECT … FROM t WITH (NOLOCK). NOLOCK allows dirty reads, so it is useful only when readers greatly outnumber writers, i.e. when the data being read is not modified often. If appropriate, reduce lock escalation by using the ROWLOCK or PAGLOCK hints.
  20. Always de-allocate and close cursors, close connections.
  21. To check I/O costs, use SET STATISTICS IO ON; it reports the reads for each table touched (whether through an index, the clustered index or the table itself).
  22. A non-clustered index leads to a bookmark lookup when columns that are not in the index must be fetched from the clustered index or, for a heap, by row ID.
  23. Internationalization: always use UTC time in the database and plan for Unicode. Don’t assume a locale or a number of users; design for scalability. That said, don’t use the NVARCHAR or NCHAR data types unless you need to store 16-bit (Unicode) character data: they take up twice as much space as VARCHAR or CHAR, increasing server I/O and wasting space in your buffer cache.
  24. ADO.NET sends parameterized ad-hoc queries through sp_executesql, so their plans are cached and search pages are not a problem, provided you use the same connection string/pool. Run SELECT * FROM syscacheobjects to check the cache.
  25. Avoid SELECT *: fetching every column usually rules out covering indexes and can force a table scan. Also have at least one clustered index on every table (unless it is very small). A table scan happens when there is no usable index for the query, so each row must be evaluated; it is also done when all columns are requested or the WHERE condition doesn’t use any indexed column.
  26. Use SET rather than SELECT to assign a single value, e.g. SET @a = 10.
  27. OPENXML is costly because it loads the XML parser into SQL Server, so use BULK INSERT/bulk copy (bcp) instead.
  28. DBCC (originally the Database Consistency Checker, now a misnomer!):
    • DBCC FREEPROCCACHE – frees the procedure cache.
    • DBCC DBREINDEX – run at night: high cost, takes table locks, rebuilds indexes and reapplies the fill factor (which is otherwise applied only when an index is created).
    • DBCC CHECKDB – checks database consistency.
    • DBCC SHOWCONTIG – shows fragmentation (extent level, logical scan fragmentation, scan density).
    • DBCC INDEXDEFRAG – online operation that can run during the day: low cost, page locks, fixes logical scan fragmentation.
  29. Maintenance: update statistics every night and reindex every week.
  30. sp_who shows the SPIDs currently running, including blocked ones (see the blk column).
  31. Ask for less data over the wire: it’s better to work like Explorer, asking for the parent nodes first and then for child nodes on user request.
  32. Use of DISTINCT is not very scalable and often points to a data model error (the model may not be properly relational).
  33. The optimizer uses constraints, so define indexes, foreign keys, etc.
  34. A clustered index scan or full table scan usually means an index is missing. Run the Index Tuning Wizard in Thorough mode, on a Profiler trace captured while the application is running, to find the missing index. The Index Tuning Wizard can also be run on individual queries from SQL Query Analyzer.
  35. SQL Query Optimizer: the SELECT list determines whether a BOOKMARK LOOKUP is needed; the predicate columns (WHERE clause) determine whether a clustered or non-clustered index seek or scan is used (a BETWEEN range leads to a range scan); and the estimated number of resulting rows determines whether a clustered index scan is done.
  36. DBCC MEMORYSTATUS: does the Stolen value under Buffer Distribution increase steadily? Then SQL Server is either consuming a lot of memory or not releasing something. When an application acquires a lot of stolen memory, SQL Server cannot page it to disk as it can a data or index page; this memory must remain in SQL Server’s buffer pool and cannot be aged out. If the application uses cursors, the memory associated with a cursor is stolen memory while the cursor is open, so perhaps the application is opening cursors but not closing them before opening new ones.
  37. OR/IN clauses are often not performant (most of the time they result in a table scan), so use UNIONs for large queries.
  38. Always check for SQL injection vulnerabilities, including comment-based injection through web pages.
  39. A view is a “virtual table”: when a query uses a view (including views built on other views), SQL Server expands the view definitions into the query, so the query plan depends on the underlying SQL. If that SQL applies functions such as CONVERT or RTRIM to columns in the WHERE clause, indexes on those columns won’t be used, because SQL Server has no function-based indexes like Oracle’s.
  40. Data types: CHAR is padded with trailing spaces; VARCHAR is not. If the text data in a column varies greatly in length, use a VARCHAR data type instead of a CHAR data type. The amount of space saved by using VARCHAR over CHAR on variable-length columns can greatly reduce I/O reads, improving overall SQL Server performance. Don’t use FLOAT or REAL data types for primary keys, as they add unnecessary overhead that hurts performance. Use one of the integer data types instead.
  41. Avoid SQL Server application roles, which do not take advantage of connection pooling.
  42. Set following for all stored procs
    SET ANSI_NULLS ON — guarantees ansi null behaviour during concat, IN operations
    SET CONCAT_NULL_YIELDS_NULL ON — any string concat with NULL is NULL
    SET NOCOUNT ON — minimize network traffic.
  43. O/RM: object-relational mapping (O/RM) is a programming technique that links relational databases to object-oriented language concepts, creating, in effect, a “virtual object database.”
  44. Simple ways to optimize stored procedures:
    • Limit the use of cursors wherever possible. Use temp tables or table variables instead. Use cursors for small data sets only.
    • Make sure indexes are available and used by the query optimizer. Check the execution plan for confirmation.
    • Avoid using local variables in SQL statements in a stored procedure. They are not as optimizable as parameters, because their values are not known when the plan is compiled.
    • Use the SET NOCOUNT ON option to avoid sending unnecessary data to the client.
    • Keep transactions as short as possible to prevent unnecessary locking.
    • If your application allows, use the WITH (NOLOCK) table hint in SELECT statements to avoid generating read locks, accepting that the query may then read uncommitted (dirty) data. This is particularly helpful with reporting applications.
    • Format and comment stored procedure code to allow others to properly understand the logic of the procedure.
    • If you are executing dynamic SQL, use sp_executesql instead of EXEC. It allows better execution plan reuse and can be used with parameters.
    • Access tables across all stored procedures in the same logical order to prevent deadlocks from occurring.
    • Avoid non-optimizable search arguments such as not-equal (<>), NOT LIKE, and LIKE ‘%x’ (a leading wildcard).
    • Use SELECT TOP n [PERCENT] instead of SET ROWCOUNT n to limit the number of rows returned.
    • Avoid using wildcards such as SELECT * in stored procedures (or any SQL application for that matter).
    • When executing stored procedures from a client, using ADO for example, avoid requesting a refresh of the parameters for the stored procedure using the Parameters.Refresh() command. This command forces ADO to interrogate the database for the procedure’s parameters and causes excessive traffic and application slowdowns.
    • Break large queries into smaller, simpler ones. Use table variables or temp tables for temporary storage, if necessary.
    • Understand your chosen client library (DB-LIB, ODBC, OLE DB, ADO, ADO.NET, etc.) and the options you need to set to make queries execute as quickly as possible.
    • If your stored procedure generates one or more result sets, fetch those results immediately from the client to prevent prolonged locking. This is especially important if your client library is set to use server-side cursors.
    • Do not issue an ORDER BY clause in a SELECT statement if the order of rows returned is not important.
    • Put all DDL statements (like CREATE TABLE) before any DML statements (like INSERT). This helps prevent unwanted stored procedure recompiles.
    • Only use query hints if necessary. Query hints may help performance, but can prevent SQL Server from choosing the best execution plan. A query hint that works today may not work as well tomorrow if the underlying data changes in size or statistical distribution. Try not to outthink SQL Server’s query processor.
    • Consider using the SQL Server query governor cost limit option to prevent potentially long-running queries from ever executing.

    Best index tuning:

    • Examine queries closely and keep track of column joins and columns that appear in WHERE clauses. It’s easiest to do this at query creation time.
    • Look for queries that return result sets based on ranges of one or more columns and consider those columns for the clustered index.
    • Avoid creating clustered primary keys if the PK is on an IDENTITY or incrementing DATETIME column. This can create hot-spots at the end of a table and cause slow inserts if the table is “write” heavy.
    • Avoid excessive indexes on columns whose statistical distribution indicates poor selectivity, i.e. values found in a large number of rows, like gender (SQL Server will normally do a table scan in this case).
    • Avoid excessive indexes on tables that have a high proportion of writes vs. reads.
    • Run the Index Tuning Wizard on a Coefficient trace file or Profiler trace file to see whether you have missed any useful indexes.
    • Do not totally rely on the Index Tuning Wizard. Rely on your understanding of the queries executed and the database.
    • If possible, make sure each table has a clustered index, which may be declared in the primary key constraint (if you are using a data modeling tool, check the tool’s documentation on how to create a clustered PK).
    • Indexes take up extra drive space, slow down INSERTs and UPDATEs slightly, and require longer backup/replication times, but since most tables have a much higher proportion of reads to writes, you can usually increase overall performance by creating the necessary indexes rather than leaving them out.
    • Remember that the order of columns in a multi-column index is important. A query must make use of the columns as they are listed in the index to get the most performance increase. While you don’t need to use all columns, you cannot skip a column in the index and still receive index performance enhancement on that column.
    • Avoid creating unique indexes on columns that allow NULL values.
    • On tables whose writes far outweigh reads, consider changing the FILLFACTOR during index creation to a value that allows for adequate free space on the index pages to allow for optimal table inserts.
    • Make sure SQL Server is configured to auto update and auto create statistics. If these options cause undue strain on the server during business hours and you turn them off, make sure you manually update statistics as needed. Also note that running a SQL Server trace adds load and can slow down the server.
    • Consider rebuilding indexes on a periodic basis, by recreating them (consider using the DROP_EXISTING clause), using DBCC INDEXDEFRAG (SQL 2000), or DBCC DBREINDEX. These commands defragment an index and return the fill factor space to the leaf level of each index page. Consider a mix/match of each of these commands for your environment.
    • Avoid redundant indexes that share the same leading column. For example, if you have one index on (LastName, FirstName) and another on (LastName), drop the second one.
    • Avoid creating indexes on descriptive CHAR, NCHAR, VARCHAR, and NVARCHAR columns that are not accessed often. These indexes can be quite large. If you need an index on a descriptive column, consider using an indexed view on a smaller, computed portion of the column. For example, create a view:
      CREATE VIEW dbo.view_name WITH SCHEMABINDING AS SELECT ID, SUBSTRING(col, 1, 10) AS col FROM dbo.table_name
      Then index it. The first index on a view must be a unique clustered index, after which you can add an index on the reduced-size column col:
      CREATE UNIQUE CLUSTERED INDEX ix_view_name_id ON dbo.view_name (ID)
      CREATE INDEX name ON dbo.view_name (col)
      SQL Server can still use this index when you query the table directly (in Enterprise Edition; other editions need the NOEXPAND hint against the view), although in this example you would be limited to searching on the first 10 characters only. Note: indexed views were introduced in SQL Server 2000.
    • Use surrogate keys, like IDENTITY columns, for as many primary keys as possible. INT and BIGINT IDENTITY columns are smaller than corresponding alpha-numeric keys, have smaller corresponding indexes, and allow faster querying and joining.
    • If a column requires consistent sorting (ascending or descending order) in a query, for example:
      SELECT LastName, FirstName FROM Customers WHERE LastName LIKE 'N%' ORDER BY LastName DESC
      consider creating the index on that column in the same order, for example:
      CREATE CLUSTERED INDEX lastname_ndx ON Customers (LastName DESC, FirstName)
      This prevents SQL Server from performing an additional sort on the data.
    • Create covering indexes wherever possible. A covering index covers all columns selected and referenced in a query. This eliminates the need to go to the data pages, since all the information is available in the index itself.

    Benefits of using stored procedures

    • Stored procedures facilitate code reuse. You can execute the same stored procedure from multiple applications without having to rewrite anything.
    • Stored procedures encapsulate logic to get the desired result. You can change stored procedure code without affecting clients (assuming you keep the parameters the same and don’t remove any result set columns).
    • Stored procedures provide better security to your data. If you use stored procedures exclusively, you can remove direct Select, Insert, Update, and Delete rights from the tables and force developers to use stored procedures as the method for data access.
    • Stored procedures are a part of the database and go where the database goes (backup, replication, etc.).
    • Stored procedures improve performance. SQL Server combines multiple statements in a procedure into a unified execution plan.
    • Stored procedures reduce network traffic by preventing users from having to send large queries across the network.
    • SQL Server retains execution plans for stored procedures in the procedure cache. Execution plans are reused by SQL Server when possible, increasing performance. Note for SQL 7.0/2000: this feature is available to all SQL statements, even those outside stored procedures, if you use fully qualified object names.
  45. Top 10 Must Have Features in O/R Mapping Tools at:
    1. Flexible object mapping: tables and views mapping, multi-table mapping, naming conventions, attribute mapping, auto-generated columns, read-only columns, required columns, validation, formula fields, data type mapping
    2. Use of existing domain objects
    3. Transactional operations: COM+/MTS and stand-alone
    4. Relationships and life cycle management: one-to-one, many-to-one, one-to-many, many-to-many
    5. Object inheritance: one table per object or one table for all objects, handling insert, update, delete and load
    6. Static and dynamic queries
    7. Stored procedure calls
    8. Object caching
    9. Customization of generated code and re-engineering support
    10. Code template customization
  46. Perform an audit of the SQL Code
    Transact-SQL Checklist

    • Does the Transact-SQL code return more data than needed?
    • Are cursors being used when they don’t need to be?
    • Are UNION and UNION ALL properly used?
    • Is SELECT DISTINCT being used properly?
    • Does the WHERE clause make use of indexes in search criteria?
    • Are temp tables being used when they don’t need to be?
    • Are hints being properly used in queries?
    • Are views unnecessarily being used?
    • Are stored procedures being used whenever possible?
    • Inside stored procedures, is SET NOCOUNT ON being used?
    • Do any of your stored procedures start with sp_?
    • Are all stored procedures owned by DBO, and referred to in the form of databaseowner.objectname?
    • Are you using constraints or triggers for referential integrity?
    • Are transactions being kept as short as possible?
    • Is the application using stored procedures, strings of Transact-SQL code, or using an object model, like ADO, to communicate with SQL Server?
    • What method is the application using to communicate with SQL Server: DB-LIB, DAO, RDO, ADO, .NET?
    • Is the application using ODBC or OLE DB to communicate with SQL Server?
    • Is the application taking advantage of connection pooling?
    • Is the application properly opening, reusing, and closing connections?
    • Is the Transact-SQL code being sent to SQL Server optimized for SQL Server, or is it generic SQL?
    • Does the application return more data from SQL Server than it needs?
    • Does the application keep transactions open when the user is modifying data?
  47. Application Checklist

    Thanks to the authors at and the other sites listed above.
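To make tips 38, 43, and a few of the index-tuning notes in tip 44 more concrete, here is a minimal Python sketch. It uses the standard-library sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server. As a result, the plan text comes from SQLite’s EXPLAIN QUERY PLAN, not from a SQL Server showplan. The Customers table, the ix_last_first index, and the helper functions are all hypothetical.

The sketch shows four things:
- a parameterized query, which is the same idea as passing parameters to sp_executesql;
- an index search when the leading index column is used, versus a scan when it is skipped;
- a covering index;
- a tiny hand-rolled object-relational mapping into a dataclass.

```python
import sqlite3
from dataclasses import dataclass


def make_db() -> sqlite3.Connection:
    # Hypothetical schema; SQLite is only a stand-in for SQL Server here.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Customers ("
        "ID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT, Phone TEXT)"
    )
    conn.executemany(
        "INSERT INTO Customers (LastName, FirstName, Phone) VALUES (?, ?, ?)",
        [("Nash", "Ann", "555-0101"), ("Smith", "Bob", "555-0102"),
         ("Nolan", "Cy", "555-0103")],
    )
    # Multi-column index: LastName is the leading column.
    conn.execute("CREATE INDEX ix_last_first ON Customers (LastName, FirstName)")
    return conn


def find_by_last_name(conn, last_name):
    # The value is bound as a parameter, never spliced into the SQL text,
    # so input such as "x' OR '1'='1" is compared as a literal string.
    sql = "SELECT ID, LastName, FirstName FROM Customers WHERE LastName = ?"
    return conn.execute(sql, (last_name,)).fetchall()


def plan(conn, sql, params=()):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


@dataclass
class Customer:
    id: int
    last_name: str
    first_name: str


def load_customers(conn, prefix):
    # Minimal O/RM idea: each result row becomes a Customer object.
    sql = ("SELECT ID, LastName, FirstName FROM Customers "
           "WHERE LastName LIKE ? ORDER BY LastName DESC")
    return [Customer(*row) for row in conn.execute(sql, (prefix + "%",))]


conn = make_db()
print(find_by_last_name(conn, "Smith"))          # [(2, 'Smith', 'Bob')]
print(find_by_last_name(conn, "x' OR '1'='1"))   # []
# Leading index column in the WHERE clause: index search.
print(plan(conn, "SELECT Phone FROM Customers WHERE LastName = ?", ("Smith",)))
# Leading index column skipped: full table scan.
print(plan(conn, "SELECT Phone FROM Customers WHERE FirstName = ?", ("Bob",)))
# Every selected column is in the index: covering index, no table access.
print(plan(conn, "SELECT FirstName FROM Customers WHERE LastName = ?", ("Smith",)))
print([c.last_name for c in load_customers(conn, "N")])  # ['Nolan', 'Nash']
```

The exact plan wording varies between SQLite versions; older releases print “SEARCH TABLE …”. The search-versus-scan distinction, however, is the same one the multi-column index tip describes.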

.NET memory and performance improvement

January 17, 2005

Now that you have finished your .NET application, is memory usage bogging you down?

Limiting memory usage of .NET applications is a requirement that often arises in programs that allocate and use large amounts of memory. In the garbage-collected environment that the CLR offers, memory that is used to perform some calculation and then discarded is not collected immediately once it is no longer needed, so application memory usage can become quite high in some situations. In some scenarios, preserving memory for other processes is a higher priority than the raw speed of the memory-intensive .NET application, and it is better not to wait until all available memory is exhausted before a full garbage collection happens.

Hosting the CLR provides a way out. The IGCHostControl interface has a COM method, RequestVirtualMemLimit, which the runtime calls to ask the host for more virtual memory. By implementing it, a host can stop the application from hogging all the memory and waiting until the last instant for the GC to start freeing memory. To the CLR, a refused RequestVirtualMemLimit call looks the same as Windows running out of memory and returning a NULL pointer for a VirtualAlloc request. Rather than simply refusing to allocate any further memory, a gentler and more effective technique is to allow a small memory increase so that exception objects can be created successfully, and an OutOfMemoryException can be thrown and handled gracefully by managed code. If memory cannot be allocated for exception objects, the runtime will terminate without giving exception handlers a chance to execute, which is rarely the desired behavior.

Therefore, to place an effective cap on memory usage, an object implementing IGCHostControl needs to be provided to the runtime.

The catch is a “chicken and egg” problem. The ICorConfiguration interface, which is implemented by CorRuntimeHost, has a method called SetGCHostControl that allows an IGCHostControl-implementing object to be provided to the runtime. Unfortunately, it is not possible to retrieve an ICorConfiguration reference after the runtime has started. CorRuntimeHost’s QueryInterface fails with an error when ICorConfiguration is requested, and the ICorRuntimeHost::GetConfiguration method, which returns an ICorConfiguration reference, fails when it is called after startup. When certain hosting functionality is available only before the runtime is started, it cannot be used from managed code. Managed code can never execute before the runtime starts, so if the functionality is required, as it is with the memory-capping functions, the only option is to host the runtime explicitly from unmanaged code.
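The capping policy described above is easy to get subtly wrong. It grants growth up to the cap, then allows only a small grace amount so exception objects can still be allocated. Here is a minimal Python model of just that decision logic. It is not the hosting API: the real RequestVirtualMemLimit is a COM method implemented in unmanaged host code. The class name and the cap_mb and grace_mb parameters below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemoryCapPolicy:
    """Hypothetical model of a host's answer when the CLR asks for more memory."""
    cap_mb: int    # soft limit the host wants to enforce
    grace_mb: int  # small headroom so OutOfMemory exception objects can be allocated

    def granted_limit(self, requested_mb: int) -> int:
        # Under the cap: grant the request unchanged.
        if requested_mb <= self.cap_mb:
            return requested_mb
        # Over the cap: grant at most cap + grace instead of refusing outright,
        # so the runtime can still create and throw an OutOfMemoryException.
        return min(requested_mb, self.cap_mb + self.grace_mb)


policy = MemoryCapPolicy(cap_mb=512, grace_mb=16)
print(policy.granted_limit(400))  # 400
print(policy.granted_limit(520))  # 520
print(policy.granted_limit(600))  # 528
```

Any request beyond cap_mb + grace_mb is granted only up to that ceiling, which the runtime experiences as an allocation failure. The grace window is what gives managed exception handlers a chance to run.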
Read on at .
Thanks to the author for this insight into unmanaged code advantages in a managed world.

Looking forward to a better managed C++ in .NET 2.0

.NET Remoting

December 18, 2004

.NET Remoting is gaining a lot of importance, so here are some good links.


Thanks to the authors for this info.

Should 4+1 View-based Architecture be a standard for High Level Design documents?

October 7, 2004

The template and details are at:

“To describe a software architecture, we use a model composed of multiple views or perspectives. In order to eventually address large and challenging architectures, the model we propose is made up of five main views:

  • The logical view, which is the object model of the design (when an object-oriented design method is used),
  • the process view, which captures the concurrency, availability, performance and synchronization aspects of the design,
  • the physical view, which describes the mapping(s) of the software onto the hardware and reflects its distributed aspect,
  • the development view, which describes the static organization of the software in its development environment.

The description of an architecture (the decisions made) can be organized around these four views, and then illustrated by a few selected use cases, or scenarios, which become a fifth view.”

Thanks to the Author – Philippe Kruchten – and IEEE for this invaluable experience paper.

Nice article on Unit Test Patterns

September 30, 2004

Think you know all the patterns in unit testing? Think again. Here are the various unit testing patterns.

Unit Testing Patterns

Pass/Fail Patterns

  • The Simple-Test Pattern
  • The Code-Path Pattern
  • The Parameter-Range Pattern

Data Driven Test Patterns

  • The Simple-Test-Data Pattern
  • The Data-Transformation-Test Pattern

Data Transaction Patterns

  • The Simple-Data-I/O Pattern
  • The Constraint-Data Pattern
  • The Rollback Pattern

Collection Management Patterns

  • The Collection-Order Pattern
  • The Enumeration Pattern
  • The Collection-Constraint Pattern
  • The Collection-Indexing Pattern

Performance Patterns

  • The Performance-Test Pattern

Process Patterns

  • The Process-Sequence Pattern
  • The Process-State Pattern
  • The Process-Rule Pattern

Simulation Patterns

  • The Mock-Object Pattern
  • The Service-Simulation Pattern
  • The Bit-Error-Simulation Pattern
  • The Component-Simulation Pattern

Multithreading Patterns

  • The Signalled Pattern
  • The Deadlock-Resolution Pattern

Stress-Test Patterns

  • The Bulk-Data-Stress-Test Pattern
  • The Resource-Stress-Test Pattern
  • The Loading-Test Pattern

Presentation Layer Patterns

  • The View-State Test Pattern
  • The Model-State Test Pattern
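Several of these patterns translate directly into any xUnit-style framework. Here is a rough sketch of how four of them might look with Python’s standard unittest and unittest.mock modules: Simple-Test, Parameter-Range, Rollback, and Mock-Object. It is not taken from Marc Clifton’s article, which uses its own examples. The percent, Notifier, and report names are hypothetical code under test, and an in-memory sqlite3 database stands in for a real data store.

```python
import sqlite3
import unittest
from unittest import mock


def percent(part, whole):
    """Hypothetical function under test."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


class Notifier:
    """Hypothetical dependency that would normally send e-mail."""
    def send(self, message):
        raise RuntimeError("no mail server in tests")


def report(notifier, part, whole):
    value = percent(part, whole)
    notifier.send(f"{value:.1f}%")
    return value


class PatternExamples(unittest.TestCase):
    def test_simple(self):
        # Simple-Test: known input gives expected output, and bad input fails as expected.
        self.assertEqual(percent(1, 4), 25.0)
        with self.assertRaises(ValueError):
            percent(1, 0)

    def test_parameter_range(self):
        # Parameter-Range: sweep boundary and interior values.
        for part, whole, expected in [(0, 10, 0.0), (10, 10, 100.0), (5, 10, 50.0)]:
            with self.subTest(part=part, whole=whole):
                self.assertEqual(percent(part, whole), expected)
        for bad in (0, -1):
            with self.subTest(whole=bad):
                with self.assertRaises(ValueError):
                    percent(1, bad)

    def test_rollback(self):
        # Rollback: a failed transaction must leave the data unchanged.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.commit()
        try:
            with conn:  # commits on success, rolls back on an exception
                conn.execute("INSERT INTO t VALUES (1)")
                raise RuntimeError("simulated failure")
        except RuntimeError:
            pass
        self.assertEqual(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0], 0)

    def test_mock_object(self):
        # Mock-Object: replace the real Notifier and verify the interaction.
        fake = mock.Mock(spec=Notifier)
        self.assertEqual(report(fake, 1, 2), 50.0)
        fake.send.assert_called_once_with("50.0%")
```

Saved to a file, the suite can be run with python -m unittest. Each test method maps to one pattern, so a failure points directly at the behavior that broke.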

Read on at Advanced Unit Testing: Patterns by Marc Clifton

Thanks to the author for this material on Unit Testing.