In the last year I ran into two problems with the ODP.NET driver of Oracle on a Windows platform. Both were related to the use of BLOB fields and the order of CRUD-operations in several threads accessing the BLOB fields in a DB.
Since Oracle is not willing to do anything about it, you might run into in it, even in future releases of the ODP.NET driver. To help you to avoid the pitfall I wrote this document.
I have .NET application that has a workflow (no WCF, just plain C# code) to handle document transformations in several separate processing steps called Actions (using a pipe-and-filter pattern).
To allow for optimal parallelism every Action is running in its own thread and multiple documents can be crawling through the flow at the same time.
To persist the original message and the final result, we take a copy of the message and store in the DB at specific locations in the flow. We use BLOB fields to store the message content. The original message creates (Insert) a new record in the DB and at the end this record is updated with the final transformation result.
The DB Actions (green) are the asynchronous to the message handling flow (blue): we take a copy of the message but we do not wait on the DB Actions to complete. The message keeps travelling through the flow.
Since all the Actions are running on separate threads, they are all waiting on the OS to appoint them some CPU time. Sometimes this means that a message is moving very fast from left to right (the bleu flow)
- Firstly because DB Actions are slow compared to the basic transformation steps in the other Actions.
- Secondly because the CPU was just appointed faster to the threads of non-DB related Actions.
This can lead to the DB Update Action firing before the DB Insert Action is finished or even has started.
We’ve foreseen this problem. To avoid a complex scheduling mechanism we just added a retry mechanism to the DB Update Action to allow for the DB Insert to finish.
Technically this all works fine but we ran into a bug of the ODP.NET driver. A really annoying bug because the driver gets in a deadlock resulting in the above system coming to a grinding halt.
After some time spent on debugging my code I started focusing on the calls to and from Oracle. So I activated Oracle Tracing. You can do this by adding some registry setting to the machine executing your code.
Go to 'KEY_LOCAL_MACHINE\SOFTWARE\ORACLE\ODP.NET\Assembly_Version' You'll find two parameters:
- TraceFileName: any valid path name and file name.
0 = None
1 = Entry, exit, and SQL statement information
2 = Connection pooling statistics
4 = Distributed transactions (enlistment and delistment)
8 = User-mode dump creation upon unmanaged exception
16 = HA Event Information
32 = Load Balancing Information To obtain tracing on multiple objects, simply add the valid values.
Keep in mind that Oracle has made it so easy (pun intended) that it will start tracing any process on the machine that accesses the Oracle DB. So if you do this on a server, you'll end up very fast with some gigs of trace files!
To avoid showing the boring details of the traces, I just give you the results.
Basically it came down to Oracle having multithreading problems in its driver. When accessing BLOB fields, where updates are tried before inserts succeeded on a record, under high load the driver deadlocks.
In the trace file I saw that for BLOB fields, the driver keeps a local list used as temporary holding station to communicate BLOB-s between the .NET client and the DB server. The driver has also span off a thread responsible to clean-up this list. I couldn't deduct whether it was triggered by regular time intervals or based on the list's size. Anyway, it's this driver's proprietary thread that collided with the threads from my code accessing the DB.
The hard time I had to indentify the problem was due to the fact it only occurs when you really put some stress on the system. So if you locally test it on your development machine you'll never encounter it but once in production, under heavy load, your application becomes unresponsive.
When confronted Oracle with this, they just said they concur my results but weren't going to solve it. So I modified my flow and instead of doing asynchronous inserts (outside the flow) the inserts became part of the flow. The updates stayed asynchronous. The flow now looks like:
Sometimes the flow receives two times exactly the same message as input. When the DB Insert Action tries to create a record for the message it gets a primary key violation because the record is already there.
My first strategy was: keep it simple, just always try to insert and ignore any primary key violation. Apparently Oracle does not like this.
Again the driver gets into a deadlock when the system is under heavy load. Looking at the traces I saw the same problem as explained before: the driver's internal thread colliding with the driver's client thread.
Just always check if the record exists before trying to insert the same record.
Looking back at the first problem I'm not that annoyed by the bug. You could say it makes kind of sense of having your data access layer single threaded even if you have multi threaded business logic running above it.
This is not the case with the second problem. As user of an Oracle library I'm not aware of its internals. So even if I use the driver single threaded from my code it can lock up.
You might imagine I'm not that impressed by Oracle anymore. I lost valuable time in debugging this.