ORF Reporting: Design, Part 2

In my previous post, I wrote about the performance concerns of ORF 3.0’s new Reporting feature. Alas, performance is not the only thing that makes the feature feel like a bit of a design nightmare.

One significant problem with the current log format is that it does not contain enough data for reporting, or the data it contains is not in the required format.

  • Logging of specific ORF events can be disabled. Not only can they be disabled, some are disabled by default: for instance, logging Before Arrival whitelist/accept events is turned off in a fresh ORF installation. This makes it impossible to compile reports about whitelists at the Before Arrival filtering point, because the reporting engine has no clue about what happens Before Arrival, except when the email (actually, the recipient) is blacklisted, which is logged.

    These defaults and this customizability were meant to make the logs easier to read and to save disk space. They certainly do, but they also break reporting.
  • Another scenario where data is missing: individual test reports. Currently, ORF does not log which tests were performed, except when a hit (i.e. whitelisting/blacklisting) or an error was generated by a test. We could make a pretty good guess from the “Tests: “ log entries, but this would require sequential processing of the logs (see later), and even then we would not know why a hit did not occur (e.g. there could have been a list exception, like with the AD Exception List).
  • Individual list items cannot be identified from the logs. If you ever want to see how many emails your new Viagra keyword filter caught, the current log format offers little help. The comment of the keyword filter is logged, but we have rarely seen comments assigned to keyword filters. That is not very user friendly, so we need to tell exactly which keyword filter fired. The current log format offers no way to fully identify the filter, though.
  • Reports would need time-sequential log processing. While this sounds fairly easy, it has its challenges, because the event date range cannot be determined from the file name, for two reasons. First, the file name format can be configured by the user. Second, due to time zone changes, a log file can be re-opened, so it is perfectly possible for a log to contain an event from 23:45 followed by an event from 23:12 on the same day.

    So we need to load all log files and sort all events into ascending order. Remember the performance concerns? Imagine sorting 365,000,000 event dates in real time! In addition, some reports, like the Greylisting report, need the events in their original order (an order which time zone changes can break).
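As a rough sketch of what this implies (the file contents, field layout, and timestamp format below are invented for illustration, not ORF’s actual log format), a reporting engine working from the current logs would have to load every event from every file and sort the combined set by parsed timestamp:

```python
from datetime import datetime

# Invented example: events from two log files. Note the out-of-order
# timestamps in the first file, as caused by a time zone change.
FILES = {
    "orf-A.log": ["2007-05-01 23:45:00 ACCEPT", "2007-05-01 23:12:00 REJECT"],
    "orf-B.log": ["2007-05-01 11:02:00 ACCEPT"],
}

def events_in_time_order(files):
    """Load all events from all files, then sort globally by timestamp.
    This is O(n log n) over the entire event set -- the cost the post
    warns about when n reaches hundreds of millions of events."""
    events = []
    for lines in files.values():
        for line in lines:
            stamp, action = line[:19], line[20:]
            events.append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), action))
    events.sort(key=lambda event: event[0])
    return events

for when, action in events_in_time_order(FILES):
    print(when, action)
```

Because neither the file names nor the in-file order can be trusted, there is no shortcut such as merging already-sorted files; everything must be loaded and sorted.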

For all the above reasons, you will unfortunately not be able to generate reports from your current logs. We are working on a new log format (work name “ORF PowerLog” :), built specifically for reporting, and we plan for this format to eventually take over the role of the current ORF logs (not in 3.0 – the Log Viewer will not support PowerLogs, due to time constraints).

The benefits of the new logs are numerous: for instance, they will finally break the no-Unicode law, subject logging will be supported, and it will at last be possible to fully identify which keyword filter caused a hit.
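For illustration, here is one way full filter identification could work (a sketch with an invented tab-separated field layout and hypothetical filter IDs; this is not the actual PowerLog format): log a stable per-entry identifier next to the optional comment, so reports can count hits per filter even when no comment was ever set.

```python
from collections import Counter

# Hypothetical log lines: tab-separated fields with a stable filter ID
# (e.g. "KW-0007") alongside the human-readable comment, which may be
# empty. The field layout is invented for this example.
LOG_LINES = [
    "2007-05-01 10:12:03\tBlacklist\tKeyword\tKW-0007\tViagra filter",
    "2007-05-01 10:13:44\tBlacklist\tKeyword\tKW-0007\t",
    "2007-05-01 10:15:10\tBlacklist\tKeyword\tKW-0012\t",
]

def hits_by_filter(lines):
    """Count blacklist hits per filter ID rather than per comment."""
    counts = Counter()
    for line in lines:
        _stamp, action, test, filter_id, _comment = line.split("\t")
        if action == "Blacklist" and test == "Keyword":
            counts[filter_id] += 1
    return counts

print(hits_by_filter(LOG_LINES))  # Counter({'KW-0007': 2, 'KW-0012': 1})
```

The point of the stable ID is that it survives comment edits and empty comments, so a report can always trace a hit back to one specific list entry.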

The drawback is the size, of course. The more information needs to be logged, the larger the log grows. We will do everything we can to make the new format as compact as possible, but growth of at least 200% can be expected.
