Why you should add log analysis to your performance engineering practice

Do I have to wait until performance testing to answer performance questions?

Customers often ask me why we have to wait until so late in the development process to tackle performance-related activities. The short answer is that you don't. Waiting reflects a practice based almost entirely on performance testing; not waiting reflects one grounded in performance engineering principles, which can be applied throughout the application lifecycle, from development all the way through production.

One of the more powerful and less invasive methods of investigating performance is to look at the logs an application generates. They can tell you quite a bit about application performance, as well as whether the application is deployed and working correctly in production.

We'll start with the items I typically look at in a live application, using the web or app server logs.

Web Log

  1. Do duplicate requests for the same resource from the same user show up in the logs? This is a telling sign that one of several issues is in play:
    - A misconfigured web server, for resources which should be cached at the client.
    - A misconfigured content distribution network (CDN) allowing requests through which should be cached at the CDN. I know of at least one customer who had an SSL site misconfigured for over a decade, which both reduced their ability to sustain load and distorted their hardware planning and budgeting for that entire period.
    - Developers using the incorrect request method from the client. You may have everything configured correctly, but your developers have decided to use a no-cache request method, which forces a re-request of the resource and adds load to the server.
  2. Do I have high request loads for common items such as images or font resources? If so, then I have a misconfigured web server or CDN which needs to be tuned to allow caching of these resource types, to avoid the extra load related to them.
  3. Is my load balancer working correctly? If I am configured for round robin, then I should see roughly the same number of end user IP addresses in the logs of each server. If there is a problem, it will become apparent when a count of remote hostnames by server is examined, and it can be addressed quite quickly. A misconfigured load balancer is one of the top three items which typically impact scalability in a performance test and in production. If you can find it passively, all the better.
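
The first and third checks above can be sketched in a few lines of Python. This is a minimal illustration, not a finished tool: it assumes the web servers write the Common Log Format, the function names (`duplicate_requests`, `clients_per_server`) are hypothetical, and the sample log lines are made up for the example.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [time] "METHOD path PROTO" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse(lines):
    """Yield one dict per line that matches the assumed format; skip the rest."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            yield match.groupdict()

def duplicate_requests(lines, threshold=2):
    """Return {(host, path): count} for resources one client fetched repeatedly."""
    counts = Counter((rec["host"], rec["path"]) for rec in parse(lines))
    return {key: n for key, n in counts.items() if n >= threshold}

def clients_per_server(logs_by_server):
    """Return {server: number of distinct client hosts seen in that server's log}."""
    return {
        server: len({rec["host"] for rec in parse(lines)})
        for server, lines in logs_by_server.items()
    }

# Made-up sample data standing in for two servers' access logs.
sample_web1 = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /img/logo.png HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:13:55:40 +0000] "GET /img/logo.png HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2023:13:55:41 +0000] "GET /index.html HTTP/1.1" 200 512',
]
sample_web2 = [
    '10.0.0.3 - - [10/Oct/2023:13:55:42 +0000] "GET /index.html HTTP/1.1" 200 512',
]

dupes = duplicate_requests(sample_web1)
spread = clients_per_server({"web1": sample_web1, "web2": sample_web2})
print(dupes)
print(spread)
```

A static resource that the same client fetches repeatedly with a full 200 response is a candidate for a caching review. Under round robin, the per-server client counts should come out roughly equal.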

Broadening the scope of the log examination, take a look at each of the application servers in your application. Make sure that the log level has not been set too high (too verbose), such as debug, in production. If you want to hurt your performance, one of the worst things you can do is run too high a log level. This turns the log disk subsystem into the drag anchor for full system performance.
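
One passive way to spot an overly verbose log level is to measure what share of the lines in a production log are debug output. The sketch below assumes the level appears in each line as an uppercase token such as `DEBUG` or `INFO`; that is a common convention, not a universal one, and the function name `level_share` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumption: each log line carries its level as an uppercase word.
LEVEL = re.compile(r"\b(TRACE|DEBUG|INFO|WARN(?:ING)?|ERROR|FATAL)\b")

def level_share(lines):
    """Return {level: fraction of matched lines} for an application log."""
    counts = Counter()
    for line in lines:
        match = LEVEL.search(line)
        if match:
            counts[match.group(1)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: n / total for level, n in counts.items()}

# Made-up sample lines for illustration.
sample_app_log = [
    "2023-10-10 13:55:36 DEBUG OrderService loading cart 42",
    "2023-10-10 13:55:36 DEBUG OrderService cart 42 has 3 items",
    "2023-10-10 13:55:37 INFO  OrderService order 9001 placed",
    "2023-10-10 13:55:37 DEBUG OrderService releasing connection",
]

shares = level_share(sample_app_log)
print(shares)
```

If debug or trace lines make up most of a production log, the log level is worth a second look.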

With the proper configuration on your logs you can even collect the amount of time required to satisfy the individual requests, allowing you to roll up information on the most expensive requests in the system.
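
As a rough sketch of such a roll-up, the example below assumes each access log line ends with the time taken to serve the request in microseconds, which is what Apache's `%D` format directive records when appended to the log format. Other servers use other units (nginx's `$request_time` is in seconds), so the parsing would need adjusting. The function name `most_expensive` and the sample lines are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: Common Log Format with the duration in microseconds as the last field.
TIMED_LINE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ (?P<micros>\d+)$'
)

def most_expensive(lines, top=5):
    """Return (path, request count, total ms, mean ms) rows, costliest first."""
    totals = defaultdict(lambda: [0, 0])  # path -> [count, total microseconds]
    for line in lines:
        match = TIMED_LINE.search(line.strip())
        if not match:
            continue
        path = match["path"].split("?")[0]  # group by path, ignoring the query string
        totals[path][0] += 1
        totals[path][1] += int(match["micros"])
    rows = []
    for path, (count, micros) in totals.items():
        total_ms = micros / 1000
        rows.append((path, count, total_ms, total_ms / count))
    rows.sort(key=lambda row: row[2], reverse=True)  # rank by total time consumed
    return rows[:top]

# Made-up sample data for illustration.
sample_timed = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /search?q=a HTTP/1.1" 200 1024 250000',
    '10.0.0.2 - - [10/Oct/2023:13:55:37 +0000] "GET /search?q=b HTTP/1.1" 200 1024 350000',
    '10.0.0.1 - - [10/Oct/2023:13:55:38 +0000] "GET /index.html HTTP/1.1" 200 512 20000',
]

report = most_expensive(sample_timed)
for row in report:
    print(row)
```

Ranking by total time rather than mean time surfaces requests that are individually cheap but called often enough to dominate the load. Sorting on the last element of each row instead gives the slowest requests on average.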

The absolute beauty of logs is that they are available almost as soon as the first line of code is written.

  • Developers are outputting debug information to logs to check for proper performance.
  • Time stamps are available to check how long code is taking to execute.
  • Examination is passive and non-intrusive to the actual act of execution.

I recommend collecting logs for analysis as soon as systems enter unit testing, timing each compile and execute cycle. When the time increases, the developer should have to explain why before merging code to the build tree. Creating a design that is functionally sound is often not difficult. Creating one that both performs well and meets the functional spec takes solid design. Catching small changes in performance at the unit test and component assembly stage will prevent larger impacts later in the development and deployment cycles.
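
A simple gate of this kind might look like the sketch below. It assumes the timings have already been extracted from the logs into dictionaries of seconds per test or build step. The function name `regressions`, the 10% tolerance, and the sample numbers are all illustrative assumptions, not recommendations from a specific tool.

```python
def regressions(baseline, current, tolerance=0.10):
    """Return {name: (baseline seconds, current seconds)} for steps that slowed down.

    A step is flagged when its current time exceeds the baseline by more
    than the given tolerance (0.10 means 10%). Steps without a baseline
    are skipped, since there is nothing to compare them against.
    """
    flagged = {}
    for name, seconds in current.items():
        before = baseline.get(name)
        if before is not None and seconds > before * (1 + tolerance):
            flagged[name] = (before, seconds)
    return flagged

# Made-up timings, in seconds, from two consecutive builds.
baseline_times = {"test_login": 0.20, "test_search": 1.00}
current_times = {"test_login": 0.21, "test_search": 1.40, "test_new": 0.05}

slower = regressions(baseline_times, current_times)
for name, (before, after) in slower.items():
    print(f"{name}: {before:.2f}s -> {after:.2f}s, please explain before merging")
```

The tolerance exists to absorb normal run-to-run noise. In practice it would need tuning against how much the timings vary on the build machines.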

How?

OK James, so now you have just given me another task on top of my performance testing duties. Sure, this lets my team get involved in projects at a potentially earlier stage, or help diagnose issues in production, which means higher value for my team. But what about all of this information? How do I make sense of all of these logs?

Tools to Manage Log Data