OK. NO TCO OR ROI HERE. WE LIED.
Posted on February 23rd, 2006 by Peter

This is a message to all the developers out there: Please don’t ship release code with debug prints!

We rely heavily on DebugView for debugging internal ORF releases, and every piece of software that pollutes the system debug output makes this harder. That hurts not just us but everyone else who uses this excellent debugging tool. Sure, we can exclude these processes from the trace, but time and again a new one shows up, such as Office 2003’s type32.exe:

[2932] Check exception: ..\dpgbase\regvalue.h (375), DPG::RegValue,class std::allocator > >::operator`class std::basic_string,class std::allocator >', addr = 0x6126CA2E, last error = 123
[2932] Check exception: ..\dpgbase\regvalue.h (375), DPG::RegValue,class std::allocator > >::operator`class std::basic_string,class std::allocator >', addr = 0x6126CA2E, last error = 123
[2932] Check exception: ..\dpgbase\regvalue.h (375), DPG::RegValue,class std::allocator > >::operator`class std::basic_string,class std::allocator >', addr = 0x6126CA2E, last error = 123
[2932] Check exception: ..\dpgbase\regvalue.h (375), DPG::RegValue,class std::allocator > >::operator`class std::basic_string,class std::allocator >', addr = 0x6126CA2E, last error = 123

Other notable offenders are RssReader (OK, that one is actually a debug release), Lookout for Outlook, and Winamp, which dumps its entire playlist when it turns over.

Come on, guys, it is only a matter of conditional compilation.
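A minimal C++ sketch of what "conditional compilation" means here, assuming the common convention that release builds define `NDEBUG` (Visual Studio's default Release configuration does). `DEBUG_TRACE`, `debug_trace_impl` and `kDebugTraceEnabled` are hypothetical names, not taken from ORF or any Windows header; `OutputDebugStringA` is the Win32 call whose output DebugView captures.

```cpp
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
#endif

// Sends a message to the debug output: OutputDebugStringA on Windows (what
// DebugView shows), stderr elsewhere so the sketch stays portable.
inline void debug_trace_impl(const char* msg) {
#ifdef _WIN32
    OutputDebugStringA(msg);
#else
    std::fputs(msg, stderr);
#endif
}

// Release builds conventionally define NDEBUG. There, DEBUG_TRACE expands to
// a no-op: the call vanishes and its argument is never evaluated.
#ifdef NDEBUG
constexpr bool kDebugTraceEnabled = false;
#define DEBUG_TRACE(msg) ((void)0)
#else
constexpr bool kDebugTraceEnabled = true;
#define DEBUG_TRACE(msg) debug_trace_impl(msg)
#endif
```

Because the release expansion drops the argument entirely, nothing with side effects should ever go inside `DEBUG_TRACE(...)`.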

Posted on February 17th, 2006 by Peter

Extreme Programming (XP) has a nice principle called “Refactor as you go”. XP says that we should favor the simplest solution to a problem. I like the idea for a number of reasons, but most of all for its cost-effectiveness. A simpler design is faster to implement, produces less code (and therefore fewer bugs), and results in source code that is easier to read. “Designing simple” means designing for today’s problems; solutions designed for tomorrow often take ages to implement and produce a large amount of code that has to be tested.

This does not come without a price, however. Sometimes the design proves too rigid for new requirements (see our last case), and often little attention is paid to modular design, reusability or consistency. This is where refactoring (improving code without changing its functionality) and “Refactor as you go” come into the picture. The principle says that we should refactor toward better code in small steps rather than making big changes at once, because big refactorings are hard to do, open the door to new bugs and break at least one more XP practice (“Small Releases”).

When programming methodologies and real life meet, you can feel the electricity in the air. Simple design serves short-term business goals very well (“get that new version shipped ASAP!”), but time-consuming and seemingly useless refactoring conflicts with them (“hey, why does it take two weeks to add a new report?”). It takes some self-discipline to take the Red Pill of refactoring instead of the Blue Pill of the Commando Pattern. After all, it is so much easier to get the job done quickly, leaving tons of “TODO: Refactor” comments behind, and it is so tempting to say that users care little about the software internals as long as it works. But sooner or later, the point always comes when you have to do a bigger refactoring.

Most of this week was spent on refactoring. Yes, it is a pretty big one, which means there is seemingly no progress: ORF PowerLogs are still not fully integrated into ORF. The refactoring will take at least one more week, but even in the mid-term these changes will pay for themselves, because the source code will be much easier to maintain. Not only will PowerLog integration be faster, but adding new tests to ORF or changing existing ones will also take less time.

Stay tuned.

Posted on February 10th, 2006 by Peter

[Screenshot: PowerLogs raw CSV]

I gave you a brief introduction to ORF 3.0 PowerLogs in my previous article; now I would like to tell you more about the details.

PowerLogs aim to provide much more detailed information than the current ORF text logs do. Some of this new data is needed for reporting, while some is merely for the user’s benefit, such as the subject of the email. Let’s see what’s new:

  • Email subject. Yes, the logs will tell you the subject of the email :) This will help in identifying false positives, and emails in general. We have to admit that logging the Message-ID did not really work out, but we could not log the subject before because of the ANSI character set limitation.
    Of course, this will only work at the On Arrival filtering point, as there is no such thing as “email” at the Before Arrival filtering point.
  • Complete test log of every email. This is required by Reporting—in order to generate test effectiveness reports for individual tests, we must be able to tell how many emails the IP Blacklist or the Spamcop DNSBL checked and caught.
  • List references. Got an email blacklisted by Keyword Filtering, but have no idea which filter caused it? No problem: the log will tell you which keyword filter it was. Not just the keyword filter comment (if any), but the keyword filter itself. Similarly, it will tell you the related IP Whitelist, Sender Blacklist, DNSBL or SURBL expression. This also means that you will be able to generate reports on the effectiveness of individual DNSBLs (maybe not in 3.0, but in a later version; the data is there).
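One way to picture list references is a small lookup table: each distinct list entry gets a numeric ID once, and log lines store only that ID. The sketch below assumes exactly that; `ReferenceTable`, its methods and the 1-based IDs are hypothetical illustrations, not ORF’s actual design.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reference table: each distinct (list, expression) pair, such
// as a keyword filter or a DNSBL zone, is assigned a numeric ID once.
class ReferenceTable {
public:
    using Entry = std::pair<std::string, std::string>;  // (list name, expression)

    // Returns the existing ID for the entry, or assigns the next free one.
    int id_for(const std::string& list, const std::string& expression) {
        Entry key = std::make_pair(list, expression);
        auto it = ids_.find(key);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(entries_.size()) + 1;  // IDs start at 1
        ids_.emplace(key, id);
        entries_.push_back(key);
        return id;
    }

    // Resolves an ID back to its entry, e.g. when rendering a report.
    // Throws std::out_of_range for unknown IDs.
    const Entry& entry(int id) const {
        return entries_.at(static_cast<std::size_t>(id - 1));
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::map<Entry, int> ids_;
    std::vector<Entry> entries_;
};
```

A log line then carries something like `1` instead of the full expression, and the same ID can be resolved for any number of log lines.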

Some more unsorted facts:

  • Logs and list references (see above) will be stored in separate files (n:1 relation, n log files and 1 reference file) to save disk space.
  • Both files will be in regular CSV format, with Unicode UTF-8 encoding. No more “Unicode comment cannot be logged” complaints :)
  • Timestamps will be in Coordinated Universal Time (UTC), to avoid Daylight Saving Time (Summer Time) problems.
  • The log file name format will be fixed and will always contain the date when the log was generated. The current ORF test logs offer flexibility in this regard, but unfortunately that flexibility adds a lot of extra complexity to log processing. The date will be generated in UTC, which might look a bit strange to those whose time zone is more than 1-2 hours off UTC.
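The CSV, UTF-8 and UTC points above can be sketched in a few lines of C++. This is an illustration under assumptions, not the PowerLogs format: the column layout and the `powerlog-YYYYMMDD.csv` file name pattern are hypothetical, and the quoting follows the common RFC 4180 convention.

```cpp
#include <cstddef>
#include <ctime>
#include <string>
#include <vector>

// Quotes a field RFC 4180 style: wrap it in double quotes if it contains a
// comma, quote or line break, and double any embedded quotes. UTF-8 text
// passes through untouched, since these ASCII bytes never occur inside a
// multi-byte UTF-8 sequence.
std::string csv_escape(const std::string& field) {
    if (field.find_first_of(",\"\r\n") == std::string::npos) return field;
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += '"';
        out += c;
    }
    out += '"';
    return out;
}

// Joins escaped fields into one CSV line (without the line terminator).
std::string csv_row(const std::vector<std::string>& fields) {
    std::string row;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) row += ',';
        row += csv_escape(fields[i]);
    }
    return row;
}

// ISO 8601 UTC timestamp, immune to Daylight Saving Time changes.
// std::gmtime is not thread-safe; a real logger would use gmtime_s/gmtime_r.
std::string utc_timestamp(std::time_t t) {
    char buf[32];
    std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", std::gmtime(&t));
    return buf;
}

// Hypothetical fixed file name pattern that always contains the UTC date.
std::string log_file_name(std::time_t t) {
    char buf[32];
    std::strftime(buf, sizeof buf, "powerlog-%Y%m%d.csv", std::gmtime(&t));
    return buf;
}
```

For example, `csv_row({utc_timestamp(1140652800), "Re: hello, world", "Keyword Filtering", "caught", "1"})` gives `2006-02-23T00:00:00Z,"Re: hello, world",Keyword Filtering,caught,1`, and that entry would go into `powerlog-20060223.csv` whatever the server’s local time zone is.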

Obviously, logging more data means larger log files. A lot larger. Because of this, real-time reporting that processes the full logs would perform very poorly. Our fine-tuned CSV reader processes about 8 Mb/s, which is nowhere near the performance needed to generate yearly reports reasonably fast (it would take hours on a high-load server). Also, those who run high-load servers will accumulate gigabytes of logs in a year, which few of them will keep around, not even for reporting.
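To put the reader speed in perspective, here is a back-of-the-envelope estimate. It assumes the quoted 8 Mb/s means 8 megabytes per second and uses a purely illustrative 100 GB of logs per year; neither figure is a measurement.

```latex
t = \frac{S}{r}
  = \frac{100 \times 1024\ \text{MB}}{8\ \text{MB/s}}
  = 12\,800\ \text{s}
  \approx 3.6\ \text{h}
```

At that rate the reader gets through roughly 28 GB per hour (8 MB/s × 3600 s), so every additional 28 GB of logs adds about another hour to a single yearly report.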

To reduce the time needed, ORF will generate preprocessed report files daily (or more often). As ORF users have to be able to generate reports for a specific time range, e.g. January 1 to July 1, 2006, we cannot simply maintain a single incremental report. Instead, we need full reports at a given resolution; when the user requests a report for, say, Q3 2006, ORF will summarize the preprocessed reports that fall within that range. Of course, preprocessed reports also take disk space, so the resolution will probably not be as fine as 1 second or 1 minute. As the range of reports and their exact contents are yet to be specified, the question of resolution is still open, but it may be 1 hour or 24 hours.
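A minimal sketch of the preprocessed-report idea, assuming fixed-size buckets (one hour in the test) keyed by their UTC start time, each holding simple checked/caught counters per test. `TestCounts`, `Bucket`, `record` and `summarize` are hypothetical names; as noted above, the real report contents and resolution are still open.

```cpp
#include <ctime>
#include <map>
#include <string>

// Hypothetical per-test counters kept in each preprocessed bucket.
struct TestCounts {
    long checked = 0;
    long caught = 0;
};

// One bucket (e.g. one hour of traffic): test name -> counters.
using Bucket = std::map<std::string, TestCounts>;

// All buckets, keyed by their UTC start time.
using PreprocessedReports = std::map<std::time_t, Bucket>;

// Adds one test result to the bucket that contains `when`.
// `resolution` is the bucket size in seconds, e.g. 3600 for one hour.
void record(PreprocessedReports& reports, std::time_t when, std::time_t resolution,
            const std::string& test, bool caught) {
    std::time_t start = when - when % resolution;  // round down to bucket start
    TestCounts& counts = reports[start][test];
    ++counts.checked;
    if (caught) ++counts.caught;
}

// Sums every bucket whose start time lies in [from, to). A range that does not
// begin on a bucket boundary is effectively rounded to the resolution.
Bucket summarize(const PreprocessedReports& reports, std::time_t from, std::time_t to) {
    Bucket total;
    for (auto it = reports.lower_bound(from); it != reports.end() && it->first < to; ++it) {
        for (const auto& entry : it->second) {
            total[entry.first].checked += entry.second.checked;
            total[entry.first].caught += entry.second.caught;
        }
    }
    return total;
}
```

With hourly buckets, a report for January 1 to July 1, 2006 sums at most 181 × 24 = 4344 small records instead of re-reading half a year of raw logs, which is the trade-off the resolution choice controls.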