Java performance improvement — Java 8+ streams vs loops and lists vs arrays
Thanks to the original writer : https://medium.com/levi-niners-crafts/java-performance-improvement-java-8-streams-vs-loops-and-lists-vs-arrays-e824136832d6
This is a recap of our experience with a single aspect of coding that is still very much in use and, by all accounts, will remain in use for the next 10 or so years. At Levi9 Serbia we always strive to write clean code and not blindly follow trends that may not be appropriate in every situation. As two of the reasons from 999 lists say:
we’ll share secret ingredients.
we think first and code later.
We hope the knowledge shared in this article will provide advice and guidelines on when to use which approach in Java and what to expect.
What are Java 8+ streams?
Java 8 introduced streams. Not to be confused with input/output streams, these Java 8+ streams can also process data that goes through them. It was hailed as a great new feature that allowed coders to write algorithms in a more readable (and therefore more maintainable) way.
Instead of writing complicated code with loops nested within or chained after other loops, the coder could simply write a pipeline in a few lines, and its intent would be clear at a glance (in most cases). This resulted in cleaner code and reduced the need for comments (a huge upside for me personally).
Code writing became faster. Code writing became easier. Knowing how to use Java streams became one of the most important requirements for being hired as a Java developer at most companies.
Until we realized just how much performance can suffer from overusing them.
To be clear, the streams introduced in Java 8 were slow, and the comparison from the title started to come up in many forms. Still, the advantages were clear, and once Java 11 arrived, streams were greatly optimized.
What will comparing loops to streams be like?
In this comparison, we will use OpenJDK 17, as that is the latest stable Java version at the time of writing this article. The scenario will be single-threaded, with no parallel execution. This is not because loops cannot be parallelized; that is simple to implement. We chose not to do it in order to keep things simple.
The scenario is simple as well, and the pipeline flow is quite similar to the flow we improved while working on a project for an energy company in the Netherlands.
This flow collected billing items for contracts with up to 1000 grid connections and calculated billing for a year of that contract. The scenario we will run, while simplistic, retains most of the pipeline structure and will be enough to show the difference in performance.
The scenario:
-Has many elements
-Each element has an array of numbers
-For each element to:
-Calculates average of all sums.
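As a rough illustration, a stream version of such a scenario could look like the sketch below. The class, record, and method names are hypothetical, and the per-element steps (sort the numbers, then sum them) are an assumption, since the list above only states the final averaging step. The actual benchmark code is in the repository linked later in the article.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names and the per-element steps (sort, then sum) are assumptions.
class StreamScenario {

    // One "element" holding an array of numbers.
    record Element(int[] numbers) {}

    // For each element: sort its numbers and sum them; then average all sums.
    static double averageOfSums(List<Element> elements) {
        return elements.stream()
                .map(e -> Arrays.stream(e.numbers()).sorted().toArray())
                .mapToLong(sorted -> Arrays.stream(sorted).asLongStream().sum())
                .average()
                .orElse(0.0); // an empty input yields 0.0 in this sketch
    }

    public static void main(String[] args) {
        List<Element> elements = List.of(
                new Element(new int[] {3, 1, 2}),
                new Element(new int[] {10, 20}));
        // The sums are 6 and 30, so the average is 18.0.
        System.out.println(averageOfSums(elements));
    }
}
```

The pipeline reads top to bottom as a description of the steps, which is the readability benefit discussed earlier.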
Results of running pipeline scenarios
Table 1 — Results of running pipeline scenario
As you can see in the table above, when the number of elements increases, the run time increases as well.
Clearly, this can be improved. During our work we have tried two approaches.
The code used can be seen here https://github.com/levinine/stream-loops. We ran the tests with the changes mentioned and these are the results:
Table 2 — Results of running pipeline scenario
Time units description:
Java List usage makes things slower. How?
This is to be expected given how the data is structured. In a LinkedList, each item is linked to the next and the previous one. In an ArrayList, there is an array of references pointing to all the items in the list, which makes it easier and faster to access the nth item. Lists also offer a lot of functionality that arrays do not have. Since we don’t need most of it, using lists just makes the whole data structure bloated and unnecessarily bulky.
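To make the structural difference concrete, here is a minimal sketch that sums the same values held in a List<Integer> and in an int[]. The class and method names are hypothetical, and this is not a benchmark. It only shows that the list holds references to boxed Integer objects, while the array holds the primitive values themselves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch comparing the two data layouts; not a benchmark.
class ListVersusArray {

    // Each get() follows a reference to an Integer object, which is then unboxed.
    static long sumList(List<Integer> values) {
        long sum = 0;
        for (int i = 0; i < values.size(); i++) {
            sum += values.get(i);
        }
        return sum;
    }

    // The primitive values are stored directly in the array, with no boxing.
    static long sumArray(int[] values) {
        long sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        int size = 1_000;
        List<Integer> list = new ArrayList<>(size);
        int[] array = new int[size];
        for (int i = 0; i < size; i++) {
            list.add(i);   // autoboxes the int into an Integer
            array[i] = i;  // stores the int as-is
        }
        System.out.println(sumList(list) + " " + sumArray(array));
    }
}
```

Both methods return the same result; what differs is the number of objects on the heap and the indirection needed to reach each value.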
Loops are faster even in Java 17
This may seem like a surprise at first, but it shouldn’t be. Making things light usually makes them faster. If the code is structured in a clear set of descriptively named functions, it will still be readable and maintainable.
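A loop-based calculation can stay readable when it is split into small, descriptively named functions. The sketch below assumes a scenario that sorts and sums each element's numbers and then averages the sums; all names are hypothetical and the code is an illustration, not the project's implementation.

```java
import java.util.Arrays;

// Hypothetical sketch: plain loops and arrays, split into small named functions.
class LoopScenario {

    // Returns a sorted copy so the caller's array is left untouched.
    static int[] sortedCopyOf(int[] numbers) {
        int[] copy = numbers.clone();
        Arrays.sort(copy);
        return copy;
    }

    static long sumOf(int[] numbers) {
        long sum = 0;
        for (int number : numbers) {
            sum += number;
        }
        return sum;
    }

    // For each element: sort and sum its numbers; then average all sums.
    static double averageOfSums(int[][] elements) {
        if (elements.length == 0) {
            return 0.0; // an empty input yields 0.0 in this sketch
        }
        long total = 0;
        for (int[] numbers : elements) {
            total += sumOf(sortedCopyOf(numbers));
        }
        return (double) total / elements.length;
    }

    public static void main(String[] args) {
        int[][] elements = {{3, 1, 2}, {10, 20}};
        // The sums are 6 and 30, so the average is 18.0.
        System.out.println(averageOfSums(elements));
    }
}
```

Each function name states one step, so the top-level method reads almost as plainly as a stream pipeline would.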
Because of the variety of functionality Java provides, people tend to forget that it is crazy fast. This is not hyperbole. This is a fact. When you limit yourself to the vanilla part of Java and the code is properly written, you can achieve speeds comparable to those of similar code in Go, albeit with a much larger memory footprint.
How well-structured usage of arrays and loops affected our billing items collection
Imagine that elements are grid connections and numbers are billing items. Also imagine that the first step in the pipeline isn’t just sorting but also reading billing items of a Gas/Electricity connection from DB and that many connections have the same billing items.
To deal with this, we used Hazelcast caching. We also corrected the existing iteration over the Guava multimap in which connections were grouped. Each change improved performance a bit, and we ended up reducing the billing items collection and processing time from the initial 27 days (for 1000 connections, which was unworkable because a circuit breaker would kill the process long before it finished) to 5 minutes.
How much of the speed improvement was due to the “loop instead of stream” and “array instead of list” approach? - About 2%.
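The idea behind the caching step can be sketched with the JDK alone. The example below is not Hazelcast and not the project's code: it uses a ConcurrentHashMap as a stand-in for the cache, and the class name, the String key, and the loadFromDatabase method are all hypothetical. It only illustrates that connections sharing the same billing items trigger a single expensive read.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a cache; the article's project used Hazelcast instead.
class BillingItemCache {

    private final Map<String, int[]> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();

    // Hypothetical placeholder for a slow database read.
    private int[] loadFromDatabase(String billingKey) {
        loads.incrementAndGet();
        return new int[] {billingKey.length(), 42}; // dummy data for illustration
    }

    // Loads the items on first request and reuses them for later requests.
    int[] billingItemsFor(String billingKey) {
        return cache.computeIfAbsent(billingKey, this::loadFromDatabase);
    }

    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        BillingItemCache cache = new BillingItemCache();
        cache.billingItemsFor("gas-standard");
        cache.billingItemsFor("gas-standard");         // served from the cache
        cache.billingItemsFor("electricity-standard"); // new key, loaded once
        System.out.println(cache.loadCount()); // prints 2
    }
}
```

A cache like this trades heap space for fewer reads, so it only pays off when enough memory is available to hold the cached entries.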
It did, however, drop heap usage significantly which allowed us to use Hazelcast caching without any problems or fear of OOM. And THAT made a significant speed difference.
So, dare I say, this approach provides great advantages when used.
Should we even use Java streams then?
A short answer is — absolutely.
Any way you slice it, code, once written, will have to be changed or adapted at some point. Readability and maintainability of the code will always be important for anyone who wants to be a professional Java developer.
A longer answer is — know when to use what. In most cases you should use a mix.
Sometimes you will use streams and sometimes loops. Sometimes you will use lists and sometimes arrays. The most important skill a Java developer dabbling in performance improvements should possess is the ability to use just enough stream pipelines to keep the code readable while using enough loops to make run times faster. While filtering out stuff from collections using streams is fun, there is no need for that. There is also no need to write a lot of code to do the same thing using loops. Almost any IDE today supports predefined snippets. Create some and use them. In many cases, the loop setup will be the same and the only thing that changes will be the filtering conditions. Using snippets will shorten coding time.
We hope this will provide some guidelines for improving Java code in legacy systems, as well as for creating new ones, since the differences listed here become more pronounced as the Java version moves closer to 8. This part of coding is often underappreciated and undervalued, and yet, when used correctly, it will provide immense improvements in your code execution.
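As one possible shape for such a snippet, the sketch below keeps the loop setup fixed and passes the filtering condition in as a parameter. The class and method names are hypothetical; it is an illustration of the idea, not a recommended utility.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

// Hypothetical sketch: a fixed loop setup where only the filtering condition changes.
class FilterLoop {

    // Keeps the values that satisfy the condition, preserving their order.
    static int[] filter(int[] values, IntPredicate condition) {
        int[] kept = new int[values.length];
        int count = 0;
        for (int value : values) {
            if (condition.test(value)) {
                kept[count++] = value;
            }
        }
        return Arrays.copyOf(kept, count); // trim to the number of kept values
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4, 5, 6};
        int[] even = filter(numbers, value -> value % 2 == 0);
        System.out.println(Arrays.toString(even)); // prints [2, 4, 6]
    }
}
```

The stream equivalent would be Arrays.stream(numbers).filter(value -> value % 2 == 0).toArray(); which of the two to use is the kind of judgment call described above.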
Thank you for reading, hope you had fun!