Always try to reproduce the error
The theory is quite simple: if you want to fix it, you must first see it in action. Otherwise you may spend time on work that does not get you to the solution as fast as seeing the error for yourself would.
If, at first, the error doesn't seem to repeat itself, don't get discouraged. Check every part of the system that could cause the difference: use the same application files or application server (or at least a very good copy), the same database, and test with the same client, transaction, date, amounts and so on. You never know which piece of information triggers the exact scenario that leads to the error, so your test should be as similar as possible to the initially reported error. If you need to, don't hesitate to ask for more details about what happened, as these details will help you trigger the error on demand.
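
To make this concrete, here is a minimal sketch in Python of writing down the reported details as data and replaying them. Everything in it is made up for illustration: the `ReportedCase` fields, the `next_month_start` function and its December bug are assumptions, not part of any real application. The point is only that the error shows up with the reported date and stays hidden with a "similar enough" one.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReportedCase:
    """The details collected from the error report (hypothetical fields)."""
    client_id: str
    transaction_id: str
    booking_date: date
    amount_cents: int


def next_month_start(d: date) -> date:
    # Hypothetical buggy code: it breaks in December,
    # because month 13 does not exist.
    return date(d.year, d.month + 1, 1)


def reproduce(case: ReportedCase) -> str:
    """Replay the reported case and say whether the error showed up."""
    try:
        next_month_start(case.booking_date)
    except ValueError as exc:
        return f"reproduced: {exc}"
    return "not reproduced"


# Same client, transaction and amount; only the date differs.
as_reported = ReportedCase("C-1", "T-1", date(2023, 12, 15), 10000)
almost_same = ReportedCase("C-1", "T-1", date(2023, 11, 15), 10000)
```

Replaying `as_reported` triggers the error, while `almost_same` runs cleanly, which is why the test should copy the report as closely as possible.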
Isolate the guilty code
You have some idea about which application code is triggered, which database settings matter and so on, but you can't really pinpoint the problem. This is the time to compare your expectations about what should happen with what really happens. Think of the blocks which together build the application workflow you are testing as a data chain, where each link has an expected output. For example, if you expect a data field to have a certain value relevant for later processing, check what its real value is. Did everything go as planned? If yes, then the problem lies further ahead, in another code block. If not, take a step back until you can tell which is the smallest piece that caused the problem.
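
The data chain idea can be sketched in a few lines of Python. The three stages below (`parse_amount`, `apply_discount`, `add_tax`) and their expected values are hypothetical, and the bug in `apply_discount` is planted on purpose. Amounts are kept in cents so the comparisons are exact.

```python
def parse_amount(raw: str) -> int:
    """Turn a string like '100,00' into cents."""
    units, cents = raw.split(",")
    return int(units) * 100 + int(cents)


def apply_discount(amount_cents: int) -> int:
    # Planted bug for the demonstration: the 10% discount is added
    # instead of being subtracted.
    return amount_cents + amount_cents // 10


def add_tax(amount_cents: int) -> int:
    """Add 20% tax."""
    return amount_cents * 120 // 100


# Each link of the chain: (name, function, output we expect from it).
STAGES = [
    ("parse_amount", parse_amount, 10000),
    ("apply_discount", apply_discount, 9000),
    ("add_tax", add_tax, 10800),
]


def first_failing_stage(value, stages):
    """Run the chain and return (name, actual, expected) for the first
    link whose output differs from the expectation, or None."""
    for name, func, expected in stages:
        value = func(value)
        if value != expected:
            return name, value, expected
    return None


result = first_failing_stage("100,00", STAGES)
```

Here `result` names `apply_discount` as the first link where reality and expectation part ways, so that is the block to look into next.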
Debug or "print" the intermediary data
When you still need more refinement to find the guilty code down to the exact line, there are two methods which give great results: debugging or printing data throughout the workflow.
Debugging is a great way to actually see the values of what happens behind the scenes during processing. It isn't hard, but it isn't easy either if you are new to it. Basically, it involves connecting in real time to the application engine and pausing it, so that you get a chance to study data values at intermediary points and work your way to the core of the problem. More on debugging in another blog post.
The other way is an easy one and it is applicable to most systems, from application servers to database functions or similar. The trick is to put some lines in the middle of the intermediary code which print the values of the data used in the calculation phase. "Print" does not necessarily mean text displayed in a console window: it can be an application log file, a temporary database table or the script output window of a database tool which runs your code. Any place where you can write some data and then check it is good enough for your goal.
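
As a rough illustration of the "print" approach, the Python sketch below writes each intermediary value through the standard `logging` module. The `compute_total` function and its parameters are invented for the example. By default the lines land on the console; passing `filename="trace.log"` (a made-up file name) to `logging.basicConfig` would send them to a log file instead.

```python
import logging

# Show DEBUG messages; without a filename they go to the console.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("invoice")


def compute_total(quantity: int, unit_price_cents: int, discount_percent: int) -> int:
    """Hypothetical calculation with a trace line after every step."""
    subtotal = quantity * unit_price_cents
    log.debug("subtotal=%r", subtotal)

    discount = subtotal * discount_percent // 100
    log.debug("discount=%r", discount)

    total = subtotal - discount
    log.debug("total=%r", total)
    return total


total = compute_total(3, 1000, 10)
```

Reading the trace from top to bottom shows the first value that is not what you expected, and that is the line to examine.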
Confirm your findings by changing the behavior
Once you think you have reached the bottom of your investigation and you have the problematic code at hand, you should do one last thing to confirm that you are on the right track: change the application behavior. You don't need to think about the final solution while you are still a little uncertain about having found the problem. Confirm your findings by altering the application code. For example, write a dummy value, make it do something different, or bypass it. Anything will do, as long as rerunning your test scenario shows that the original error has been replaced by the output of your temporary change.
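
A small Python sketch of this check follows, assuming a hypothetical `checkout` workflow in which `suspect_discount` is the code you believe is guilty. The suspect is swapped for a dummy that returns an easily recognizable value, and the scenario is run again.

```python
def suspect_discount(amount_cents: int) -> int:
    # The suspected culprit (hypothetical): adds the discount
    # instead of subtracting it.
    return amount_cents + amount_cents // 10


def sentinel_discount(amount_cents: int) -> int:
    # Temporary dummy behavior: a value that is easy to recognize
    # in the final output.
    return 100


def checkout(amount_cents: int, discount=suspect_discount) -> int:
    """Hypothetical workflow: discount first, then 20% tax."""
    return discount(amount_cents) * 120 // 100


original = checkout(10000)                            # the wrong result under investigation
altered = checkout(10000, discount=sentinel_discount)  # same scenario, dummy in place
```

If `altered` reflects the dummy value (here 100 plus tax) instead of the original wrong result, the suspected code really is on the path that produces the error. If the output does not change at all, the problem is somewhere else.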
Last, but not least...
Apply the final solution. Of course, this is specific to each case, so I can't really give you an opinion about it. At least, not in this blog post.
What about you: what are your methods for tracking down an error? Feel free to share your experiences in the comment area!