Java Application Latency Reduction


One of the hardest and most ambiguous problems I have dealt with in my application development career was reducing latency for a distributed data retrieval application.

It was a containerized Java application that served product ads on one of the biggest retail websites. The goal was to reduce latency in order to free up room for additional processing, especially for running and experimenting with more advanced machine learning models that could serve better ads to customers.

One of the techniques I used was memory analysis, to get insight into JVM memory usage. Although it sounds trivial, I ran into major roadblocks that took some time to work through. In the end, I overcame each of them and reduced the application's p99 latency from 400 ms to 240 ms.

Latency reduction was a new challenge for me, so I needed the right tools to tackle it. Many tools are available, both open source and paid, but I found the Eclipse Memory Analyzer Tool (MAT) to be the most useful among the free ones. There are plenty of articles on how to install and use MAT, so I won't go into that detail here.

In this article, I will cover the challenges related to memory analysis of large production applications and how to overcome them.

Challenges

  1. The JVM heap footprint of a large application is substantial; in my case it was around 100 GB. Analyzing such a big heap dump requires a lot of memory for the analyzer tool, and the analysis is usually slow on a regular laptop.
  2. A heap dump consumes roughly as much disk space as the heap itself. If there is not enough disk, the heap dump command fails, or in the worst case it fills up the root partition and takes down the host it runs on.
  3. A heap dump is a stop-the-world event. Taking one pauses all activity in the application, which can cause the service to fail its health checks and be terminated, making it hard to grab the heap dump file (a minimal sketch of triggering a dump follows this list).
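To make the discussion concrete, here is a minimal sketch of triggering a heap dump from inside the JVM using the HotSpot diagnostic MXBean; the same dump can also be taken from the host with jmap -dump or jcmd GC.heap_dump. The output path below is a placeholder, and passing true for the live flag writes only reachable objects, which forces a full GC first but usually produces a smaller file.

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: trigger a heap dump from inside the running JVM.
// Passing true for liveObjectsOnly forces a full GC first and writes only
// reachable objects, which usually produces a noticeably smaller .hprof file.
public final class HeapDumper {

    public static void dump(String outputPath, boolean liveObjectsOnly) throws Exception {
        HotSpotDiagnosticMXBean diagnostics = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // dumpHeap fails if the target file already exists, so pick a fresh path
        // on a volume with enough free space (roughly the size of the heap).
        diagnostics.dumpHeap(outputPath, liveObjectsOnly);
    }

    public static void main(String[] args) throws Exception {
        // /mnt/dumps is a placeholder; point it at the attached storage volume.
        dump("/mnt/dumps/app-heap.hprof", true);
    }
}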

Solution

  1. For a large heap dump, it is best to run the analysis on a cloud resource such as an AWS EC2 instance with sufficient memory and disk space.
  2. To solve the disk space issue: if the application is running on a cloud resource, it usually has a separate storage volume attached, and that volume can be enlarged before taking the heap dump.
  3. Check whether the application is monitored by a periodic health check, e.g., because it sits behind a load balancer. If so, take it out of the serving fleet before starting the heap dump command so that it is not terminated mid-dump.
  4. Take multiple heap dumps at intervals to capture how the service's state changes over time (a sketch follows this list).
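As a minimal sketch of point 4, the helper from the earlier sketch can be called a few times with a pause between dumps, so that MAT can compare how the retained heap changes between snapshots. The dump count, interval, and output directory here are assumptions for illustration, not values from the original setup.

import java.time.Instant;
import java.util.concurrent.TimeUnit;

// Minimal sketch: take a small number of heap dumps at fixed intervals so
// successive snapshots can be compared in MAT. Keep the count low, because
// each dump pauses the whole application.
public final class PeriodicHeapDumps {

    public static void main(String[] args) throws Exception {
        final int totalDumps = 3;
        final long intervalMinutes = 10;

        for (int i = 0; i < totalDumps; i++) {
            // Timestamped file name so successive dumps do not overwrite each other.
            String path = "/mnt/dumps/app-" + Instant.now().getEpochSecond() + ".hprof";
            HeapDumper.dump(path, true);
            if (i < totalDumps - 1) {
                Thread.sleep(TimeUnit.MINUTES.toMillis(intervalMinutes));
            }
        }
    }
}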

Improvements

  1. One of the biggest culprits was an in-memory cache with an excessively large retained heap, which drove frequent garbage collection and hurt latency.
  2. Memory analysis gave a major clue about how the data index used for retrieval was being handled. It turned out that the full index was loaded into the JVM heap and was also stored on tmpfs, using twice the required memory and again driving frequent garbage collection (see the sketch after this list).
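The article does not show the index format, but as a rough sketch of the fix for the second point: the copy that already lives on tmpfs can be memory-mapped instead of also being deserialized onto the heap. The mapping lives outside the Java heap, so it adds no garbage collection pressure and removes the duplicate copy. The flat, fixed-width record layout below is an assumption for illustration.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal sketch: map the index file that already sits on tmpfs instead of
// loading a second copy into the JVM heap. Note that a single MappedByteBuffer
// is limited to 2 GB, so a very large index has to be split across mappings.
public final class MappedIndex {

    private final MappedByteBuffer buffer;

    public MappedIndex(Path indexFile) throws IOException {
        try (FileChannel channel = FileChannel.open(indexFile, StandardOpenOption.READ)) {
            this.buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    // Hypothetical accessor for a fixed-width record layout, read by byte offset.
    public long readLongAt(int byteOffset) {
        return buffer.getLong(byteOffset);
    }
}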

Conclusion

Analyzing memory for any large-scale production application is critical.

Caching data within an application can be useful, but the cache should be closely monitored for any degradation over time.
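The article does not say which caching library was involved, but as a minimal sketch of what a bounded, monitored cache can look like, here is a size-limited LRU cache built on java.util.LinkedHashMap with hit and miss counters that can be exported to whatever metrics system the service already uses.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch: a size-bounded LRU cache, so the retained heap cannot grow
// without limit, plus counters that can be published as metrics to spot
// degradation (falling hit rate, cache pinned at its bound) over time.
public final class BoundedCache<K, V> {

    private final Map<K, V> entries;
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    public BoundedCache(int maxEntries) {
        // Access-ordered LinkedHashMap evicts the least recently used entry
        // once the configured bound is exceeded.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public synchronized V get(K key) {
        V value = entries.get(key);
        if (value != null) {
            hits.incrementAndGet();
        } else {
            misses.incrementAndGet();
        }
        return value;
    }

    public synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    public double hitRate() {
        long h = hits.get();
        long m = misses.get();
        return (h + m) == 0 ? 0.0 : (double) h / (h + m);
    }

    public synchronized int size() {
        return entries.size();
    }
}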

Heap dump analysis is a powerful technique, but without the right machines and tools it can become painful.

Keep an eye on the health check routines of production applications while taking heap dumps so that the dumps can be collected successfully.

To keep this article short, I did not dive too deeply into the details. If anyone wants more information, feel free to message me.
