Planning a Java API for +10k users

This article focuses on some points that are necessary for good performance in applications serving thousands of users: things you generally only notice when a heavy load of requests and simultaneous access arrives.

Sure, this is not a silver bullet. The intention here is to bring up some important configurations that avoid problems, but it is only one piece of everything you need to take care of. I intend to write more articles like this, and I appreciate any suggestions for adjustments or new topics.

Stack reference:

For this article, I used Java and its well-known frameworks, Spring and Quarkus, as a reference, along with PostgreSQL, JPA, and some AWS services. In general, though, the ideas can be applied to any stack.

Database connection pool

Starting from the bottom: there's no exact formula for this configuration, you should test. A connection pool, in summary, is the set of connections your application is allowed to open and keep open. Creating a connection to the database is an expensive operation, so the less often you do it, the better; it's important to have some connections always active by default, closing them only when necessary. Implementations like Agroal (used by Quarkus) and HikariCP (used by Spring Boot) make this management easier. Knowing the basics, you now need to find out how many connections your database is able to support. One way to check, on PostgreSQL for example, is running:

SHOW max_connections;        

Let's say the response is 100. Keep in mind that you will need to configure at least 2 main parameters in your application: the minimum and maximum size of your pool.

Quarkus example:

quarkus.datasource.jdbc.min-size 
quarkus.datasource.jdbc.max-size        

SpringBoot example:

spring.datasource.hikari.minimum-idle
spring.datasource.hikari.maximum-pool-size        

To understand what values you can use, start by asking:

How far can my application scale up? In the case of a single monolithic application, you can consider setting your parameters close to the extremes, min: 10 / max: 90, for example.

If you are working in a microservices architecture, probably using Kubernetes, you may have a maxReplicas setting. Let's say you have a maximum of 10 replicas configured; then you need to consider all of them added up. The configuration could be something like min: 3 / max: 8. Remember, in our database example we have 100 connections available: at a peak of access, with all pods scaled up, we can reach 80 active connections (8 connections x 10 pods). It's important to leave some free margin.
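
Translated into configuration, that microservices example would look roughly like this (same values as discussed above):

Quarkus:

quarkus.datasource.jdbc.min-size=3
quarkus.datasource.jdbc.max-size=8

SpringBoot:

spring.datasource.hikari.minimum-idle=3
spring.datasource.hikari.maximum-pool-size=8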

In the case of lambdas (AWS Lambda, for example), this scaling up and down can be even more aggressive. If your lambdas hold connections to the database, pay attention: this kind of infrastructure tends to scale to 10, 100, 500 instances quickly. I would configure a really small pool for this sort of application, something like min: 1 / max: 3. A service such as Amazon RDS Proxy can help too, acting as an external pool and sharing the available connections.

Lazy loading

It is one of the biggest performance issues in applications using ORM frameworks like Hibernate. Consider this mapping:

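Something along these lines (a sketch; entity and field names are taken from the response shown below):

@Entity
public class Store {

    @Id
    @GeneratedValue
    private Long id;

    private String name;

    // @OneToMany associations are LAZY by default; shown explicitly here
    @OneToMany(mappedBy = "store", fetch = FetchType.LAZY)
    private List<Employee> employees;

    @OneToMany(mappedBy = "store", fetch = FetchType.LAZY)
    private List<Product> products;

    // getters and setters omitted
}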

Simple, right? Employees and Products are lazy here. Let's find all stores and expose them through some resource.

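The resource is roughly a plain find-all that returns the entities directly, something like this (Spring-style sketch, repository name assumed):

@GetMapping("/stores")
public List<Store> findAll() {
    // the entities go straight to the serializer, lazy collections included
    return storeRepository.findAll();
}
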
 [
    {
        "id": 1,
        "name": "Hey Doughnuts",
        "employees": [
            {
                "id": 1,
                "name": "Mateus"
            },
            {
                "id": 2,
                "name": "Jack"
            }
        ],
        "products": [
            {
                "id": 1,
                "name": "Red Velvet",
                "price": 5.99
            },
            {
                "id": 2,
                "name": "Chocolate",
                "price": 4.50
            }
        ]
    },
    {
        "id": 2,
        "name": "Chocolates Company",
        "employees": [
            {
                "id": 4,
                "name": "Jorge"
            }
        ],
        "products": []
    }
]        

An example of the response. Looks inoffensive, doesn't it? It's only a few small records in our database. Now let's look at the logs.


If you look at the logs, 3 selects were executed for a simple, small query. The Store entities were loaded, but to serialize the employees and the products, any Java serialization framework like Jackson needs to go through the getters of these lazy properties, consequently initializing them and forcing Hibernate to execute 2 more SQL statements.

Of course, there are plenty of ways to avoid lazy initialization, or even to not serialize those lists. There are many articles that can help you understand the N+1 problem and improve the performance of your queries, like: https://vladmihalcea.com/n-plus-1-query-problem

This is not the point here. My goal is to make you understand the impact of your main endpoints. How many selects are being triggered by each of the most used resources in your API? This diagnostic is extremely necessary. 1? 10? 50? Now multiply these numbers by a peak of access: 100 users, 1,000 users, 10,000, 100,000... Many companies have applications with 100 users running on expensive database clusters capable of handling thousands of connections, yet completely compromised by badly implemented SQL, always needing to pay more and more to keep the minimum necessary performance. This is not sustainable in the long run.

So, is this sort of implementation usual in your project?

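Something like the sketch below, where a lazy collection is touched for every loaded store:

List<Store> stores = storeRepository.findAll();              // 1 select
for (Store store : stores) {
    // touching the lazy collection fires 1 extra select per store
    System.out.println(store.getName() + " has " + store.getEmployees().size() + " employees");
}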

Considering a database with 100 stores, 101 selects will be made.

Load only necessary data

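A sketch of the kind of method I mean (names assumed):

public String getStoreName(Long storeId) {
    Store store = storeRepository.findById(storeId).orElseThrow();
    return store.getName();
}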

Can you see any problem with this method? This kind of implementation is very common, and I see this lazy strategy spread among many developers. It's a waste of resources, and depending on how many times it is used in your services, it can directly impact the overall performance.

In case you didn't spot the problem: if you only need the name in this method, why load the complete Store from the database? It can be even worse if there are database checks for each user request, such as security validations and user-context loading. Instead, create a query requesting only the attributes you need.
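
With Spring Data JPA, for example, a projection query can be as small as this (a sketch):

public interface StoreRepository extends JpaRepository<Store, Long> {

    // selects only the name column instead of hydrating the whole entity
    @Query("select s.name from Store s where s.id = :id")
    String findNameById(@Param("id") Long id);
}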

Also, it's very important that every endpoint returns small paginated results and offers good ways to filter. Your users don't want to dig through hundreds of records to find what they want. It's a waste of time for them and a waste of resources for the application. Instead, give the customer the opportunity to start with a keyword, and return a small page. Don't implement filters and pagination only programmatically, for example using streams and conditionals; forward this responsibility to the database, as in the sketch below.
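
A sketch of pushing the filter and the pagination down to the database with Spring Data (page size and method names are assumptions):

public interface StoreRepository extends JpaRepository<Store, Long> {

    // the keyword filter and the LIMIT/OFFSET are executed by the database
    Page<Store> findByNameContainingIgnoreCase(String keyword, Pageable pageable);
}

// usage in a resource
Page<Store> page = storeRepository.findByNameContainingIgnoreCase(keyword, PageRequest.of(0, 20));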

Cache

If there are thousands of requests hitting your application, many of them will ask for the same information many times. Between different users, and even the same user coming back to some resource again and again.

And for each request, what does your application need to load? You may fetch data from the database or even from third-party services to build a request context, for example. Especially in those cases, cache is essential. There is no need to build the same objects every time when you know the input will be exactly the same.

Example:

Let's say that for each request you need to fill some User object with more details. One query to the database, 2 calls to an external API, and that's it. See the sketch below as a reference:

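Roughly like this (all names here are assumptions):

public UserDetail buildUserDetail(Long userId) {
    User user = userRepository.findById(userId).orElseThrow();   // 1 database query
    Address address = addressApiClient.findByUserId(userId);     // external API call 1
    Plan plan = billingApiClient.findPlanByUserId(userId);       // external API call 2
    return new UserDetail(user, address, plan);
}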

Considering that the UserDetail does not change frequently, why not cache the complete object based on the userId? See the example:

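A sketch of the cached version; cache here stands for whatever store you pick (Redis, Caffeine, a simple map for tests):

public UserDetail loadUserDetail(Long userId) {
    UserDetail cached = cache.get(userId);
    if (cached != null) {
        return cached;                               // cache hit: no database, no external calls
    }
    UserDetail detail = buildUserDetail(userId);     // the expensive path from the previous sketch
    cache.put(userId, detail);
    return detail;
}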

Using SpringBoot, it can be even simpler: you can just annotate this method with @Cacheable, using the spring-boot-starter-cache library.

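Something like the following (remember to enable caching with @EnableCaching on a configuration class; the cache name is an assumption):

@Cacheable(value = "user-details", key = "#userId")
public UserDetail loadUserDetail(Long userId) {
    return buildUserDetail(userId);   // only executed on a cache miss
}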

Consider using an external cache such as Redis, not something embedded in your application. Covering your main endpoints and external requests will make a huge difference in read-heavy workloads.

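Pointing Spring's cache abstraction at Redis is mostly configuration; a sketch (the host is an assumption, and recent Spring Boot versions use the spring.data.redis.* prefix instead):

spring.cache.type=redis
spring.redis.host=my-redis-host
spring.redis.port=6379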


Foreach over large data sets

If you are designing an API for thousands of users, be careful when iterating over them. A common error you may see is java.lang.OutOfMemoryError. Why? Check an example:

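Something like this (names assumed):

List<User> users = userRepository.findAll();   // loads every single row into memory
for (User user : users) {
    emailService.send(user.getEmail(), "We have news for you!");
}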

In this case, we are fetching all users from the database and, for each of them, sending a message by email. It wouldn't be a problem if your table contained 10 or 100 records, but it starts to be when there are more than that. First, an OutOfMemoryError can be thrown due to insufficient space in the Java heap; after all, we are trying to bring everything into memory. Besides that, how long would this method run for 100,000 users? It's the sort of method that should be prohibited in large-scale applications.
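
One way to keep memory bounded is to process in pages; a sketch with Spring Data pagination (the page size is an assumption, and for very large volumes you would also move this off the request thread):

int pageNumber = 0;
Page<User> batch;
do {
    // loads at most 500 users per round trip instead of the whole table
    batch = userRepository.findAll(PageRequest.of(pageNumber++, 500));
    for (User user : batch) {
        emailService.send(user.getEmail(), "We have news for you!");
    }
} while (batch.hasNext());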

Long-running requests

Consider an endpoint that saves a sale. For each request, your API needs to generate a receipt PDF, fetch some tax data from a third-party integration, and send it back by email.

This is something your user doesn't need to wait for in front of a loading spinner. You can finish the operation, release your thread to handle other sales, and process the event in the background in a different pool, specialized in long-running processing. Don't compromise your main thread pool with long-running requests; otherwise, at a peak of access, your API will quickly be completely stuck with a few users and won't be able to handle even simple requests. If something can be processed asynchronously, do it!

So, it's important to review what strictly needs to be done synchronously, and is fast enough to be.
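
A minimal Spring sketch of that idea: accept the sale, answer quickly, and hand the slow work to a dedicated pool (bean and class names are assumptions):

@Configuration
@EnableAsync
class AsyncConfig {

    @Bean(name = "receiptExecutor")
    Executor receiptExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(2);
        executor.setMaxPoolSize(5);
        executor.setQueueCapacity(500);
        executor.initialize();
        return executor;
    }
}

@Service
class ReceiptService {

    // runs on the dedicated pool, not on the request thread
    @Async("receiptExecutor")
    public void generateAndSend(Sale sale) {
        // generate the PDF, call the tax integration, send the email
    }
}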

Static content storage

Files such as images, videos, and music: where should you store them?

I would say: please don't use a relational database for that. You may think of creating a BLOB column to save the binary and exposing it in some way through your API. It works, but it is heavy and not a performant way to save and fetch those files. Instead, consider putting them in an external file storage, for example AWS S3.

If your clients used to fetch an image from your API like:

GET example.api.com/files/1234567890        

Now, transferring this responsibility to an external service, they would call:

GET api-files.external.storage.com/1234567890        

"Ok, but how to link those images to some entity on my domain?".

Let's say you have a User with a thumbnail. For each uploaded photo, you can generate a UUID for that file, save only that id in the user table, and forward the file to the external storage. This way, the frontend has a direct id to fetch it by itself. Example:

User / API response:

{"user":{ "name":"example", "thumbnail":"12367-e89b-12d3-a456-426614" }}        

Frontend call to render the thumbnail:

GET api-files.external.storage.com/12367-e89b-12d3-a456-426614        
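
On the upload side, the sketch below shows the idea with the AWS SDK v2 (the bucket name and surrounding types are assumptions):

public String storeThumbnail(byte[] bytes, String contentType) {
    String fileId = UUID.randomUUID().toString();

    s3Client.putObject(
            PutObjectRequest.builder()
                    .bucket("my-files-bucket")
                    .key(fileId)
                    .contentType(contentType)
                    .build(),
            RequestBody.fromBytes(bytes));

    return fileId;   // only this id is saved in the user table
}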

This way, your application will not be busy serving large files; your requests will also do fewer blocking IO operations and respond faster.

Scalability

This article is focused on your API itself. Other topics, like scalability, would extend this article a lot, so I intend to cover them afterwards, but it is an important matter to keep in mind. Is your application scalable? If you scale horizontally, does it multiply your processing capacity or just give you more headaches? Concurrency problems, in-memory data spread across different instances, etc. Or, if you scale vertically, is your application always using the full capacity? Share your issues with me!

I hope this article has helped you perform better and helps your project. For any suggestion, feel free to contact me; it would be amazing to discuss different approaches to some of these topics. Also, if you liked it, feel free to share!


