Uploaded by

dodgeviper0065
Spring Boot

Quickstart
Spring in a nutshell
It’s a very popular framework for building Java applications, and it provides a large number of
helper classes and annotations.

The problem is that building a traditional Spring application is hard, because it
raises several questions up front: which JAR dependencies do you need, which kind of
configuration should you use (XML or Java), how do you install the server, etc.

Goals of Spring
 Lightweight development with Java POJOs (Plain Old Java Objects), making it much
simpler to build compared to the heavyweight EJBs from the early versions of J2EE
 Dependency injection to promote loose coupling: instead of hard-wiring your
objects together, you specify the wiring via a configuration file or annotations.
 Minimize boilerplate Java code

Spring Boot Solution


It makes it easier to get started with Spring development:

 Minimizes the amount of manual configuration (performs auto-configuration based
on properties files and the JAR classpath).
 Helps to resolve dependency conflicts (Maven or Gradle).
 Comes with an embedded HTTP server, so you can get started quickly.

Spring Boot and Spring


 Spring Boot uses Spring behind the scenes
 Spring Boot simply makes it easier to use Spring.

Deploying Spring Boot


 Spring Boot apps can also be deployed in the traditional way.
 Deploy a WAR (Web Application Archive) file to an external server: Tomcat, JBoss,
WebSphere, etc.

Spring FAQ
 Does Spring Boot replace Spring MVC, Spring REST, etc.?
o No. Instead, Spring Boot actually uses those technologies in the background.
Core Container
It’s the main item in Spring. It contains things like the Beans, SpEL (Spring Expression
Language), the Context and the Core.

Infrastructure

 Aspect Oriented Programming (AOP): Allows us to create application-wide services
(like logging, security, transactions, instrumentation) and apply these services to our
objects in a declarative fashion.
 Instrumentation: It’s where you can use class loader implementations, such as Java
agents, to remotely monitor the app with JMX (Java Management Extensions).
Data Access Layer
It contains all the necessary things to facilitate our communication with the database (such as
JDBC, an ORM, Transactions, etc.)

 JDBC: Spring’s JDBC helper classes can reduce the source code needed to work with the
database by nearly 50%.
 ORM: It means Object-to-Relational Mapping.
 JMS: It means Java Message Service, and it’s for sending asynchronous messages to a message
broker. Spring provides helper classes for JMS.
 Transactions: Adds transaction support. Spring makes heavy use of AOP behind the
scenes.

Web layer
It’s the layer where all web related classes go. It’s the home of the Spring MVC framework.
Spring Projects
Additional Spring modules built on top of the core Spring Framework. Think of them simply as
add-ons.

Only use what you need:

 Spring Cloud
 Spring Data
 Spring Batch
 Spring Security
 Spring Web Services
 Spring LDAP.
 Etc.

Maven
Maven is a project management tool, and its most popular uses are build management and
dependency management.

What problems does Maven solve?


When building your Java project, you may need additional JAR files. One approach is to manually
download these JAR files and add them one by one to the build path/classpath.

Maven solves this problem: we only specify which JAR files we need, and Maven
downloads them and adds them to the classpath when building and running the project.

Handling JAR dependencies


When Maven retrieves a project dependency, it will also download supporting dependencies.
For example, Spring depends on commons-logging.
Standard directory structure
Each development team dreams up its own directory structure, which is not ideal for
newcomers and is not standardized.

Maven solves this problem by providing a standard directory structure.

Benefits
 Most major IDEs have built-in support for Maven
o IDEs can easily read/import Maven projects.
 Maven projects are portable: developers can easily share Maven projects between
IDEs.

Advantages of Maven
 Dependency management
 Building and running your project without buildpath/classpath issues
 Standard directory structure

pom.xml
The POM (Project Object Model) file is the configuration file of our project.

It’s always located at the root of the Maven project.

POM file structure


It’s composed of three things:

 Project metadata
 Dependencies
 Plugins
Project coordinates
Project coordinates uniquely identify a project, in the same way that a precise address
(city, street, number) identifies a house.

Dependency coordinates
The coordinates of a dependency have the same elements as the project coordinates:

 Group ID, Artifact ID
 Version is optional, but it’s a best practice to always include it. This gives repeatable
builds, making sure that the project works with version X.X.X of the dependency.

These are also referred to as GAV (Group ID, Artifact ID and Version).

Maven wrapper files


 mvnw and mvnw.cmd: Allow us to run a Maven project. It’s not necessary to have
Maven installed or present on our PATH variable. If the correct version of Maven is NOT
found on the computer, then it’s downloaded. mvnw is for Linux/macOS and mvnw.cmd is
for Windows.
If the correct version of Maven is already installed, then we can ignore/delete the
mvnw files.
Spring Boot starters
Dependencies that contain “starter” in their artifact ID are curated collections of Maven
dependencies that are all compatible with each other.

e.g.

 spring-boot-starter-web:
o spring-web
o spring-webmvc
o hibernate-validator
o tomcat
o etc…

This saves the developer from maintaining a long list of individual dependencies and
checking whether the version of each one is compatible with the rest.

They’re tested and verified by the Spring Development Team.

This reduces the amount of Maven configuration.

Spring Boot Parent for Starters


Spring Boot provides a “Starter Parent”, a special starter that provides Maven defaults.

It’s also the place to override a default property.

If we are using the Starter Parent, we don’t need to specify the version of the Starter
dependencies, because they inherit it from the Starter Parent.
Also, the Starter Parent provides default configuration for the Spring Boot plugin.

Benefits
 Default configuration: Java version, UTF-8 encoding, etc.
 Dependency management: Spring Boot starters inherit their version from the Starter
Parent.
 Default configuration of the Spring Boot plugin.

Spring Boot Dev Tools


It’s a feature that automatically restarts the application when it detects changes in the code,
so we don’t have to restart it manually.

To use it, we only need to add the spring-boot-devtools dependency to the POM file.

In the case of IntelliJ, we need to do some configuration, because it doesn’t support Dev Tools by
default.

First, go to Preferences > Build, Execution, Deployment > Compiler and check
the box “Build project automatically”.

Then, go to Preferences > Advanced Settings and
check the box “Allow auto-make to start”.

Spring Boot Actuator


It exposes endpoints to monitor and manage your application.

With it, there’s no need to write additional code: you get endpoints to check the status of
the application out of the box.

The endpoints exposed by the Actuator are prefixed with /actuator.

Exposing endpoints
By default, only /health is exposed.

To expose other endpoints, we list them, separated by commas, in the
management.endpoints.web.exposure.include property, or use a wildcard (*) to expose all of the
endpoints. Also, for /info to show the info.* properties, we need to set
management.info.env.enabled=true.

e.g. To expose /health and /info:


management.endpoints.web.exposure.include=health,info
management.info.env.enabled=true
Exclude endpoints
Also, we can exclude endpoints using the following property in application.properties:
management.endpoints.web.exposure.exclude=health,info

/health
Checks the status of your application. It’s normally used by monitoring apps to see if the
application is up or down.

Health status is customizable based on the business logic.

/info
Can provide information about the application. It’s customizable and, by default, it’s
empty.

We need to add properties to application.properties to provide this info.

Properties that start with “info.” are used by the info endpoint.

e.g.
info.app.name=My cool APP
info.app.description=A really cool app, yohoo!
info.app.version=1.0.0
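
The prefix rule can be sketched in plain Java. This is only an illustration of the idea, not Actuator's implementation: the class and method names are hypothetical, and it simply keeps the keys that start with the lower-case prefix "info." (property keys are case-sensitive, so a key written as "Info.app.version" would be ignored).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper: filters properties the way the "info." prefix rule describes.
class InfoPropertiesDemo {

    // Returns the entries whose key starts with "info.", with the prefix removed.
    static Map<String, String> infoEntries(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> info = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("info.")) {
                info.put(key.substring("info.".length()), props.getProperty(key));
            }
        }
        return info;
    }

    public static void main(String[] args) {
        String text = "info.app.name=My cool APP\n"
                + "info.app.version=1.0.0\n"
                + "server.port=8080\n";
        // Only the two info.* entries are kept; server.port is not part of /info.
        System.out.println(infoEntries(text));
    }
}
```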

Other endpoints
Actuator offers more than 10 endpoints, among them:

 /auditevents: Audit events for your application.
 /beans: List of all beans registered in the Spring Application Context.
 /mappings: List of all @RequestMapping paths.
 etc.
Application properties
By default, Spring Boot will load properties from application.properties.

The properties and their values defined here can be used later in the application with the
@Value annotation.

No additional coding or configuration is required.

Also, you can use any name for the properties and define as many as you want.

Spring Boot properties


They’re the properties that can be configured in application.properties.

Spring Boot has more than 1000 properties to configure, and they’re grouped into the
following categories:
CORE - Logging properties
We can define the logging levels based on package names.

For example, logging.level.com.luv2code=INFO logs all INFO messages in the com.luv2code package. This
configuration also applies to all of its sub-packages.

The logging levels are:

 TRACE
 DEBUG
 INFO
 WARN
 ERROR
 FATAL
 OFF
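
How a package-level setting also covers sub-packages can be sketched as a lookup that walks up the package hierarchy. This is a simplified illustration with hypothetical names (LogLevelLookup, effectiveLevel), not the code of Spring Boot or of any logging library, and the root level used here is just an assumed value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Spring class: illustrates package-prefix level lookup.
class LogLevelLookup {

    // Mirrors entries such as logging.level.com.luv2code=INFO (key = package name).
    private final Map<String, String> levels = new HashMap<>();
    private final String rootLevel;

    LogLevelLookup(String rootLevel) {
        this.rootLevel = rootLevel;
    }

    void setLevel(String packageName, String level) {
        levels.put(packageName, level);
    }

    // Walks from the full logger name up through its parent packages.
    // The closest configured package wins; otherwise the root level applies.
    String effectiveLevel(String loggerName) {
        String name = loggerName;
        while (!name.isEmpty()) {
            String level = levels.get(name);
            if (level != null) {
                return level;
            }
            int dot = name.lastIndexOf('.');
            name = dot < 0 ? "" : name.substring(0, dot);
        }
        return rootLevel;
    }

    public static void main(String[] args) {
        LogLevelLookup lookup = new LogLevelLookup("WARN"); // assumed root level
        lookup.setLevel("com.luv2code", "INFO");
        // A class in a sub-package inherits the com.luv2code setting.
        System.out.println(lookup.effectiveLevel("com.luv2code.rest.DemoController"));
        // A class outside that package falls back to the root level.
        System.out.println(lookup.effectiveLevel("org.example.Other"));
    }
}
```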

Also, we can save the log to a file with the logging.file.name or logging.file.path property.

WEB- Web properties


We can configure web properties such as the server port (server.port), the context path
(server.servlet.context-path) or the session timeout (server.servlet.session.timeout).

ACTUATOR – Actuator properties


We already saw some of them, such as management.endpoints.web.exposure.include.
SECURITY – Security properties
Some of the properties for security are spring.security.user.name and spring.security.user.password.

DATA – Data properties


These properties define everything related to data access, such as the database connection URL,
username and password.

Static content
By default, Spring Boot will load static resources (such as JS files, CSS files, images, etc.) from the
“src/main/resources/static” directory.

Also, notice that if the packaging is JAR, then the content of
“src/main/webapp” is not packaged; that directory is used exclusively by WAR packaging.

Templates
Spring Boot includes auto-configuration for the following template engines:

 FreeMarker
 Thymeleaf
 Mustache
Spring Core
Inversion of Control (IoC)
It’s the approach of outsourcing the construction and management of objects.

With an object factory, we can create basic objects defined in configuration.

The Spring container works basically as an object factory.

We are going to say “give me a Coach object”, and Spring will determine which Coach object is
needed based on the configuration and return a reference to it.
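
The object-factory idea can be sketched without Spring. The following is a minimal illustration, assuming a hypothetical MiniObjectFactory class and a sample Coach interface. The caller only asks for "a coach", and the registration (standing in for configuration) decides which implementation is built.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch only: the caller asks for a Coach, the "configuration" decides which one.
class MiniObjectFactoryDemo {
    public static void main(String[] args) {
        MiniObjectFactory factory = new MiniObjectFactory();
        factory.register("coach", CricketCoach::new); // stands in for configuration
        Coach coach = factory.get("coach", Coach.class); // "give me a Coach object"
        System.out.println(coach.getDailyWorkout());
    }
}

interface Coach {
    String getDailyWorkout();
}

class CricketCoach implements Coach {
    public String getDailyWorkout() {
        return "Practice fast bowling for 15 minutes";
    }
}

// Hypothetical mini "container": creates objects on behalf of the caller.
class MiniObjectFactory {
    private final Map<String, Supplier<?>> recipes = new HashMap<>();

    void register(String name, Supplier<?> recipe) {
        recipes.put(name, recipe);
    }

    <T> T get(String name, Class<T> type) {
        Supplier<?> recipe = recipes.get(name);
        if (recipe == null) {
            throw new IllegalArgumentException("No object configured for: " + name);
        }
        return type.cast(recipe.get());
    }
}
```

Swapping CricketCoach for another implementation only changes the registration line; the code that uses the Coach stays the same.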

Spring container
Primary functions:

 Create and manage objects (IoC).
 Inject object dependencies (Dependency Injection).

Spring container configuration


We can use:

 XML configuration file (legacy)
 Java annotations (modern)
 Java source code (modern)

Dependency injection
Makes use of the dependency inversion principle.

The client delegates to another object the responsibility of providing its dependencies.

e.g.

We have a car factory and we want a car.


The car is composed of a lot of parts, so the car factory has to assemble it and give us a
functional car.

Returning to the Spring container example, when getting a Coach, the Coach object may have
additional dependencies.

For example, in this case, we get the Head Coach, and the Head Coach has a staff of assistant
coaches, physical trainers, medical staff, etc.

So we can say “give me everything that I need to make use of the given Coach object”, and the
object factory will return the object ready to use.

Another example is a Controller that wants to use a Coach object: the Coach is a dependency of the Controller.
Injection types
There are multiple types of injection with Spring.

We will cover the two recommended ones:

 Constructor Injection
 Setter Injection

For your information, the one that is not recommended is:

 Field injection

It’s not recommended because it makes the code harder to unit test.

Which one to use


 Constructor injection
o Use it when you have required dependencies.
o Generally recommended by the spring.io development team as first choice.
 Setter injection
o Use this when you have optional dependencies.
o If dependency is not provided, your app can provide reasonable default logic.
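
The difference between the two recommended styles can be shown in plain Java, wiring the objects by hand where Spring would do it for us. This is a sketch with hypothetical classes (DemoController, FortuneService); the comments mark where the Spring annotations would go.

```java
import java.util.Objects;

class InjectionDemo {
    public static void main(String[] args) {
        // Constructor injection: the required dependency is supplied at creation time.
        DemoController controller = new DemoController(new CricketCoach());
        System.out.println(controller.workout());
        System.out.println(controller.fortune()); // default logic, nothing injected yet

        // Setter injection: the optional dependency is supplied afterwards.
        controller.setFortuneService(() -> "Today is your lucky day");
        System.out.println(controller.fortune());
    }
}

interface Coach {
    String getDailyWorkout();
}

interface FortuneService {
    String getFortune();
}

class CricketCoach implements Coach {
    public String getDailyWorkout() {
        return "Practice fast bowling for 15 minutes";
    }
}

class DemoController {
    private final Coach coach;             // required dependency
    private FortuneService fortuneService; // optional dependency

    // With Spring, this constructor would be the injection point
    // (@Autowired is optional when the class has a single constructor).
    DemoController(Coach coach) {
        this.coach = Objects.requireNonNull(coach, "coach is required");
    }

    // With Spring, this setter would carry @Autowired(required = false).
    void setFortuneService(FortuneService fortuneService) {
        this.fortuneService = fortuneService;
    }

    String workout() {
        return coach.getDailyWorkout();
    }

    String fortune() {
        // Reasonable default when the optional dependency was not provided.
        return fortuneService != null ? fortuneService.getFortune() : "No fortune available";
    }
}
```

Because the dependencies arrive through the constructor or a setter, a unit test can pass in a stub directly, which is the main reason field injection is discouraged.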

@Autowired annotation
For dependency injection, Spring can use autowiring (@Autowired annotation).

Spring will look for a class that matches by type (class or interface).

Once found, Spring will inject it automatically, hence it is autowired.

For example, we want to inject the Coach implementation:

1. Spring will scan for @Component classes that implement the Coach interface.
2. If one is found, Spring will inject it, e.g. CricketCoach.

@Component annotation
It’s an annotation that marks the class as a Spring Bean.

A Spring Bean is just a regular Java class that’s managed by Spring.

The @Component annotation also makes the bean available for dependency injection.

Component Scanning
Spring will scan your Java classes for special annotations, such as @Component, and then
automatically register the beans in the Spring container.

By default, Spring Boot starts component scanning from the same package as your main Spring
Boot application class. It also scans sub-packages recursively.

This implicitly defines a base search package and allows you to leverage default component
scanning. There’s no need to explicitly reference the base package name.
@SpringBootApplication annotation
This annotation enables auto-configuration, component scanning and additional configuration.

Behind the scenes, it is composed of the following annotations:

 @EnableAutoConfiguration: Enables Spring Boot’s auto-configuration support.
 @ComponentScan: Enables component scanning of the current package. Also recursively
scans sub-packages.
 @Configuration: Able to register extra beans with @Bean or import other
configuration classes.

With the scanBasePackages attribute of this annotation, we can explicitly list the packages we want to scan
(including the main one, because the list replaces the default).

SpringApplication.run()
This method bootstraps our Spring Boot application and, behind the scenes, creates the
application context and registers all beans.

It also starts the embedded Tomcat server, etc.

@Qualifier annotation
If there’s more than one implementation of an interface to inject, we can use this annotation to
specify which implementation we want.

For example, we have multiple Coach implementations.

We specify that we want the CricketCoach class by using @Qualifier("cricketCoach").

The bean name is the same as the class name, except that the first character is lower-case.
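
The naming rule can be approximated with the JDK's own java.beans.Introspector.decapitalize method. This is a sketch of the convention, not a call into Spring; the helper name defaultBeanName is hypothetical. One detail worth knowing about this JDK method: when the first two characters are both upper-case (for example "URLService"), the name is left unchanged.

```java
import java.beans.Introspector;

class BeanNameDemo {

    // Approximates the default bean id: simple class name with the first letter lower-cased.
    static String defaultBeanName(Class<?> type) {
        return Introspector.decapitalize(type.getSimpleName());
    }

    public static void main(String[] args) {
        // The value to use in @Qualifier for the CricketCoach class.
        System.out.println(defaultBeanName(CricketCoach.class));
        // Names that start with two capitals are kept as they are.
        System.out.println(Introspector.decapitalize("URLService"));
    }
}

class CricketCoach {
}
```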
@Primary annotation
It’s an alternative to @Qualifier.

If there are multiple implementations of an interface, then Spring will inject the one with the
@Primary annotation.

However, only one of the implementations can be marked with @Primary. There
will be an error if multiple implementations of the same interface carry the
@Primary annotation.

Mixing @Primary and @Qualifier


Between them, @Qualifier has higher priority, meaning that Spring will inject the class
specified with @Qualifier instead of the one with the @Primary annotation.

It’s recommended to use @Qualifier because it’s more specific and has higher priority.
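
The priority between the two annotations can be sketched as a small selection routine. This is a simplified model with hypothetical types (Candidate, select), not Spring's actual resolution code, which has more rules than the ones described in these notes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: hypothetical types, not Spring's actual resolution code.
class CandidateSelectionDemo {

    // One registered implementation: its bean name and whether it is marked primary.
    static class Candidate {
        final String beanName;
        final boolean primary;

        Candidate(String beanName, boolean primary) {
            this.beanName = beanName;
            this.primary = primary;
        }
    }

    // Selection order: explicit qualifier first, then the primary candidate,
    // then a lone candidate; anything else is ambiguous and fails.
    static Candidate select(List<Candidate> candidates, String qualifier) {
        if (qualifier != null) {
            for (Candidate c : candidates) {
                if (c.beanName.equals(qualifier)) {
                    return c;
                }
            }
            throw new IllegalStateException("No bean named " + qualifier);
        }
        List<Candidate> primaries = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.primary) {
                primaries.add(c);
            }
        }
        if (primaries.size() == 1) {
            return primaries.get(0);
        }
        if (primaries.size() > 1) {
            throw new IllegalStateException("More than one primary bean");
        }
        if (candidates.size() == 1) {
            return candidates.get(0);
        }
        throw new IllegalStateException("Expected one matching bean, found " + candidates.size());
    }

    public static void main(String[] args) {
        List<Candidate> coaches = Arrays.asList(
                new Candidate("cricketCoach", false),
                new Candidate("trackCoach", true));
        System.out.println(select(coaches, "cricketCoach").beanName); // qualifier wins
        System.out.println(select(coaches, null).beanName);           // falls back to primary
    }
}
```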

Lazy Initialization
Instead of creating all beans up front, we can specify lazy initialization.

With lazy initialization, a bean will only be initialized in the following cases:

 It is needed for dependency injection.
 Or it is explicitly requested.

To use it, we have to add the @Lazy annotation to a given class.

To avoid putting @Lazy on every bean (including controllers), we can make all beans lazy by
adding a property to our application.properties:

spring.main.lazy-initialization = true

Advantages:

 Only creates objects as needed.
 May help with faster startup time if you have a large number of components.

Disadvantages:

 Web-related components (like @RestController) won’t be created until
requested.
 May not discover configuration issues until too late.
 Need to make sure you have enough memory for all beans once created.
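
The "create only when first requested" behaviour can be sketched with a small holder class. This is plain Java with hypothetical names (Lazy, created), meant only to show the timing of object creation, not how Spring implements @Lazy.

```java
import java.util.function.Supplier;

class LazyInitDemo {
    static int created = 0; // counts how many times the object was actually built

    // Hypothetical holder: builds the object on first request and then reuses it.
    static class Lazy<T> {
        private final Supplier<T> factory;
        private T instance;

        Lazy(Supplier<T> factory) {
            this.factory = factory;
        }

        synchronized T get() {
            if (instance == null) {
                instance = factory.get();
            }
            return instance;
        }
    }

    public static void main(String[] args) {
        Lazy<String> coach = new Lazy<>(() -> {
            created++;
            return "TrackCoach";
        });
        System.out.println("created before first use: " + created);
        coach.get();
        coach.get();
        System.out.println("created after two requests: " + created);
    }
}
```

The same sketch shows the main drawback: a failing factory would only be noticed on the first get() call, not at startup.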

Bean scopes
It refers to the lifecycle of a bean: how long the bean lives, how many instances
are created and how the bean is shared.

The default scope in Spring is singleton.

We can specify the scope by using the @Scope annotation, e.g. @Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE).

There are a few scopes, such as:

 Singleton: Creates a single shared instance of the bean. Default scope.
 Prototype: Creates a new bean instance for each container request. They’re lazy by
default.
 Request: Scoped to an HTTP web request. Only used for web apps.
 Session: Scoped to an HTTP web session. Only used for web apps.
 Global-Session: Scoped to a global HTTP web session. Only used for web apps.

Singleton
Spring Container creates only one instance of the bean by default.

It’s cached in memory and all dependency injections for the bean will reference the SAME
bean.
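
The difference between singleton and prototype scope can be sketched with a tiny registry. This is an illustration with a hypothetical MiniContainer class, not the Spring container: singleton beans are built once and cached, prototype beans are rebuilt on every request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch only: a hypothetical registry, not the Spring container.
class ScopeDemo {

    static class MiniContainer {
        private final Map<String, Supplier<?>> factories = new HashMap<>();
        private final Map<String, Boolean> singletonFlags = new HashMap<>();
        private final Map<String, Object> singletonCache = new HashMap<>();

        void register(String name, boolean singleton, Supplier<?> factory) {
            factories.put(name, factory);
            singletonFlags.put(name, singleton);
        }

        Object getBean(String name) {
            Supplier<?> factory = factories.get(name);
            if (factory == null) {
                throw new IllegalArgumentException("No bean named " + name);
            }
            if (singletonFlags.get(name)) {
                // Singleton: build once, cache, and hand out the same instance.
                return singletonCache.computeIfAbsent(name, key -> factory.get());
            }
            // Prototype: build a fresh instance on every request.
            return factory.get();
        }
    }

    public static void main(String[] args) {
        MiniContainer container = new MiniContainer();
        container.register("cricketCoach", true, Object::new);  // singleton
        container.register("trackCoach", false, Object::new);   // prototype
        // Same instance both times, so this prints true.
        System.out.println(container.getBean("cricketCoach") == container.getBean("cricketCoach"));
        // Two different instances, so this prints false.
        System.out.println(container.getBean("trackCoach") == container.getBean("trackCoach"));
    }
}
```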
Bean lifecycle
On startup, the lifecycle follows these steps:

1. Container is created.
2. Beans are instantiated.
3. Dependencies are injected.
4. Internal Spring processing.
5. Our custom init method. At this point the bean is ready to use.

And, when the container is shut down, it follows this step:

1. Our custom destroy method.

Bean Lifecycle Methods / Hooks


You can add custom code during bean initialization, calling custom business logic methods and
setting up handles to resources (db, sockets, files, etc.).

Also, you can add custom code during bean destruction, calling custom business logic methods
and cleaning up handles to resources (db, sockets, files, etc.).

It’s important to know that, for prototype-scoped beans, the destroy method is not
called.

Init: method configuration


To add our own custom initialization code to the bean, we annotate a method with
@PostConstruct.

Destroy: method configuration


And, to add our own custom destroy code to the bean, we annotate a method with
@PreDestroy.

Create beans by @Configuration


We can also create beans by declaring them with @Bean methods in a class with the @Configuration annotation.
By default, the bean id is the method name.

The main use case for this kind of bean declaration is to make an existing third-party class
available to the Spring framework.

In these scenarios, you may not have access to the source code of the third-party class, but you
would still like to use it as a Spring bean.

e.g.

With this, we configured the S3 Client as a Spring Bean using @Bean. Now we can inject
our own configured S3Client without configuring it over and over again.

Spring benefits
Spring is more than just Inversion of Control and Dependency Injection, but for small basic
apps, it may be hard to see the benefits of Spring.

Spring is targeted at enterprise, real-time/real-world applications.

Spring provides features such as:

 Database access and Transactions


 REST APIs and Web MVC
 Security
 Etc.
Hibernate/JPA
What is Hibernate?
It’s a framework for persisting/saving Java objects in a database. We can use it for saving and
retrieving data from the database

Benefits of Hibernate
 Hibernate handles all of the low-level SQL.
 Minimizes the amount of JDBC code you have to develop.
 Hibernate provides the Object-to-Relational Mapping (ORM).

Object-To-Relational Mapping (ORM)


With the help of a framework (e.g. Hibernate), the developer defines the mapping between a Java
class and a database table.

What is JPA?
Jakarta Persistence API (JPA), previously known as Java Persistence API, is the standard API for
Object-to-Relational Mapping (ORM).

It is only a specification: it defines a set of interfaces and requires an implementation to be usable.

JPA – Vendor Implementations


The JPA spec is a list of interfaces. We need implementations of
these interfaces, which are provided by vendors such as Hibernate or EclipseLink.

Benefits of JPA
 By having a standard API, you aren’t locked into a vendor’s implementation.
 Maintain portable, flexible code by coding to the JPA spec (interfaces).
 Can theoretically switch vendor implementations, e.g. if Vendor ABC stops supporting
their product, we could switch to Vendor XYZ without vendor lock-in.
JPA Flow
1. To manage the database, we need a DAO (Data Access Object).

2. Our DAO needs a JPA Entity Manager. The JPA Entity Manager is the main component for
saving/retrieving entities.
3. Our JPA Entity Manager needs a Data Source. The Data Source defines the database
connection info.
Both the JPA Entity Manager and the Data Source are automatically created by Spring Boot
(based on the application.properties file: JDBC URL, user id, password, etc.).

EntityManager use cases


 You need low-level control over the database operations and want to write custom
queries.
 It provides low-level access to JPA and lets you work directly with JPA entities.
 Complex queries that require advanced features such as native SQL queries or stored
procedure calls.
 When you have custom requirements that are not easily handled by higher-level
abstractions.

JPARepository use cases


 Provides commonly used CRUD operations out of the box, reducing the amount of
code you need to write.
 Additional features such as pagination, sorting.
 Generate queries based on method names.
 Can also create custom queries using @Query.
Entity Manager
Create an object
@Transactional
public void save(Student theStudent) {
    entityManager.persist(theStudent);
}
Read by id
public Student findById(Integer id) {
    return entityManager.find(Student.class, id);
}
Update an object
@Transactional
public void update(Student theStudent) {
    entityManager.merge(theStudent);
}
Delete an object
@Transactional
public void delete(Integer id) {
    Student theStudent = entityManager.find(Student.class, id);
    entityManager.remove(theStudent);
}

JPA Query Language (JPQL)


JPA has the JPA Query Language (or JPQL).

It’s a query language for retrieving objects and it’s similar in concept to SQL (where, like, order
by, join, in, etc…).

However, JPQL is based on entity name and entity fields.

Find by query

Named parameters
Also, we can use named parameters (written as a colon followed by a name, e.g. :theData) to avoid hard-coding values in the query.
Update by query

Delete by query

Create database tables


JPA/Hibernate provides an option to automatically create database tables. They’re created
based on Java code with JPA/Hibernate annotations.

Configuration
In application.properties, we can set the property in charge of creating the tables, which
is spring.jpa.hibernate.ddl-auto=create.

When the app is started, JPA/Hibernate will drop the tables and then create them.

There are other values that we can set besides create:

 none: no action will be performed.



JPA Terminology
@Entity
It’s a Java class that’s mapped to a database table.

At a minimum, the entity class:

 Must be annotated with @Entity.
 Must have a public or protected no-argument constructor. The class can have other
constructors as well.

@Table
It’s an annotation to specify the table-related information.

If it’s not provided, the table name is the same as the class name.
Relying on this default is not recommended, because if the class name changes, then it probably won’t match the
existing database table name.

@Column
It’s an annotation to specify the column information.

Its use is optional. If it’s not specified, the column name is the same as the Java field name.

Relying on this default is not recommended, because if the field name changes, then it probably
won’t match the existing database column.

@Id
Marks the column as a primary key, which identifies the row as unique and cannot contain null
values.

@GeneratedValue
It makes the database generate the column value by itself using, for example, auto-increment.

Some generation strategies are:

 GenerationType.AUTO: Pick an appropriate strategy for the particular database.
 GenerationType.IDENTITY: Assign primary keys using a database identity column.
 GenerationType.SEQUENCE: Assign primary keys using a database sequence.
 GenerationType.TABLE: Assign primary keys using an underlying database table to
ensure uniqueness.

You can also define your own custom generation strategy.

You have to create an implementation of org.hibernate.id.IdentifierGenerator and
override the method: public Serializable generate(..)

@RestController, @Repository and @Service


They’re specializations of @Component.

It’s useful to annotate classes with the proper annotation because, this way,
the classes are better suited for processing by tools or associating with aspects.
How does Hibernate/JPA relate to JDBC?
Hibernate/JPA uses JDBC for all database communications.

Automatic Data Source Configuration


In Spring Boot, Hibernate is the default implementation of JPA.

EntityManager is the main component for creating queries, etc. The EntityManager comes from
JPA.

Based on configs, Spring Boot will automatically create the beans (DataSource, EntityManager,
etc.). You can then inject these into your app, for example into your DAO.

Spring Boot will automatically configure your data source for you, based on entries from the Maven
pom file. Also, the DB connection info is read from application.properties:
spring.datasource.url=jdbc:mysql://localhost:3306/student_tracker
spring.datasource.username=springstudent
spring.datasource.password=springstudent

Also, there’s no need to give the JDBC driver class name. Spring Boot will automatically detect it
based on the URL.
REST CRUD APIs
REST
It stands for REpresentational State Transfer, and it’s a lightweight approach for communicating
between applications.

REST calls can be made over HTTP.

REST is language independent, so the client and server application can use ANY programming
language.

REST applications can also use any data format, where XML and JSON are commonly used.

Also, REST API, RESTful API, REST WS, RESTful WS, etc. they are all the same.

JSON
JSON Stands for JavaScript Object Notation, and it’s a lightweight data format for storing and
exchanging data.

It’s also language independent, not just JavaScript.


REST Over HTTP
The most common use of REST is over HTTP, and the HTTP methods map naturally onto CRUD
operations.

The client sends an HTTP request message and the service returns an HTTP response message.

HTTP Request Message


It has three parts:

 Request line: the HTTP command (GET, POST, etc.).
 Header variables: request metadata.
 Message body: contents of the message.
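
The three parts of a request can be seen by building one with the JDK's java.net.http.HttpRequest (Java 11+). This is only a sketch: the URL and JSON payload are made-up values, and the request is built but never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class HttpRequestPartsDemo {

    // Builds (but does not send) a request; the URL and payload are hypothetical.
    static HttpRequest buildRequest() {
        return HttpRequest.newBuilder(URI.create("http://localhost:8080/api/students"))
                .header("Content-Type", "application/json")                 // header variable
                .POST(HttpRequest.BodyPublishers.ofString("{\"firstName\":\"Mario\"}")) // body
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest();
        // Request line: HTTP command plus the path being called.
        System.out.println(request.method() + " " + request.uri().getPath());
        // Header variables: request metadata.
        System.out.println(request.headers().map());
        // Message body: present for POST, carried by a body publisher.
        System.out.println("has body: " + request.bodyPublisher().isPresent());
    }
}
```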

HTTP Response Message


It also has three parts:

 Response line: server protocol and status code (200, 404, 500, etc.).
 Header variables: response metadata.
 Message body: contents of message.

HTTP Response – Status Codes

MIME Content Type


The message format is described by the MIME (Multipurpose Internet Mail Extensions) content
type.

The basic syntax is type/sub-type, e.g. text/html, application/json, etc.


REST Annotations
@RestController
Marks the class as a controller and provides REST support.

@RequestMapping
Defines part of the URL used to call the API. It’s mostly used to define the base URL.

@PathVariable
Marks the parameter as a path variable, which means that the value of this parameter comes
from the URL.
@GetMapping("/students/{studentId}")
public Student getStudentById(@PathVariable int studentId) {
    return students.get(studentId);
}
@ExceptionHandler
Defines the method as an exception handler.
@ExceptionHandler
public ResponseEntity<StudentErrorResponse> handleException(StudentNotFoundException e) {
    StudentErrorResponse error = new StudentErrorResponse();

    error.setStatus(HttpStatus.NOT_FOUND.value());
    error.setMessage(e.getMessage());
    error.setTimeStamp(System.currentTimeMillis());

    return new ResponseEntity<>(error, HttpStatus.NOT_FOUND);
}
On its own, this annotation only covers exceptions thrown in the controller where the
method is declared.
@ControllerAdvice
It is similar to an interceptor/filter.

We can use it to pre-process requests to controllers and post-process responses to handle exceptions,
which makes it perfect for global exception handling.

This is a real-world use of AOP (Aspect Oriented Programming).

@ControllerAdvice
public class StudentRestExceptionHandler {
    @ExceptionHandler
    public ResponseEntity<StudentErrorResponse> handleException(StudentNotFoundException e) {
        StudentErrorResponse error = new StudentErrorResponse();

        error.setStatus(HttpStatus.NOT_FOUND.value());
        error.setMessage(e.getMessage());
        error.setTimeStamp(System.currentTimeMillis());

        return new ResponseEntity<>(error, HttpStatus.NOT_FOUND);
    }
}
@Service
Applied to service implementations.

Spring will automatically register the service implementation.


Rest Objects
ResponseEntity
It is a wrapper for the HTTP response. It provides fine-grained control to specify the HTTP status
code, HTTP headers and response body.

Java JSON Data Binding


Data binding is the process of converting JSON data to a Java POJO, and vice versa.

It’s also known as mapping, serialization/deserialization or
marshalling/unmarshalling: converting from one format to another.

JSON Data Binding with Jackson


Spring uses the Jackson Project behind the scenes.

Jackson handles data binding between JSON and Java POJOs.

Spring Boot Starter Web automatically includes the dependency for Jackson.

Jackson also supports data binding with XML.

By default, Jackson will call the appropriate getter/setter methods when you go from POJO to JSON
and vice versa.

JSON to Java POJO


Jackson calls the setters to convert from JSON to Java POJO.

It does NOT access internal private fields directly.


Java POJO to JSON
Jackson calls the getter methods to convert from Java POJO to JSON.
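
The getter-based idea can be sketched with plain reflection. This is not Jackson and does not reproduce its rules; it is a minimal illustration, with hypothetical names (toPropertyMap, Student), of how calling public getters exposes some values while a private field without a getter stays hidden.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: illustrates getter-based property discovery, not Jackson itself.
class GetterBindingDemo {

    public static class Student {
        private final String firstName;
        private final String internalNote = "not exposed"; // no getter for this field

        public Student(String firstName) {
            this.firstName = firstName;
        }

        public String getFirstName() {
            return firstName;
        }
    }

    // Collects one entry per public getXxx() method, keyed by the property name.
    static Map<String, Object> toPropertyMap(Object pojo) {
        Map<String, Object> properties = new TreeMap<>();
        for (Method method : pojo.getClass().getMethods()) {
            String name = method.getName();
            boolean isGetter = name.startsWith("get") && name.length() > 3
                    && method.getParameterCount() == 0 && !name.equals("getClass");
            if (isGetter) {
                String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                try {
                    properties.put(property, method.invoke(pojo));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Could not call " + name, e);
                }
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        // Only firstName appears; internalNote has no getter, so it is skipped.
        System.out.println(toPropertyMap(new Student("Mario")));
    }
}
```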

REST Exception Handling


We can handle the exceptions and return our desired output/error as a JSON.

REST API Design – Best practices


API Design Process
1. Review API requirements
2. Identify main resource/entity
3. Use HTTP methods to assign action on resource.

Bad practices
 DO NOT include actions in the endpoint, instead use HTTP methods to assign actions.

Design patterns
Service Layer
It’s an intermediate layer for custom business logic. It can integrate data from multiple sources
(DAOs/repositories).

It’s an implementation of the Service Facade design pattern.

Best practices
 Apply transactional boundaries at the service layer: it’s the responsibility of the service
layer to manage transaction boundaries.
Spring Data JPA
It helps us to minimize boilerplate DAO code by giving us, for example, JpaRepository, which gives
us CRUD implementations for free.
@Repository
public interface EmployeeRepository extends JpaRepository<Employee, Integer> {
}

JPARepository – Advanced features


 Extending and adding custom queries with JPQL
 Query Domain Specific Language (Query DSL)
 Defining custom methods (low-level coding)

Spring Data REST


It gives us a REST CRUD implementation for free, helping us to minimize boilerplate REST code.

Spring Data REST will expose the standard CRUD endpoints for each entity for free.

How does it work?


Spring Data REST scans the project for JpaRepository interfaces and exposes REST APIs for
the entity type of each repository.

By default, Spring Data REST creates the endpoints based on the entity type.

For the endpoint names, it uses a simple pluralized form:

 First character of Entity type is lowercase


 Then just adds an “s” to the entity
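
The naming rule above can be sketched in a few lines of plain Java. This only mirrors the simplified rule as stated in these notes (lowercase the first character, append an "s"); it is not Spring Data REST's own code, and EndpointNameSketch is a hypothetical name.

```java
public class EndpointNameSketch {

    // Applies the simple rule from the notes: lowercase the first character, then append "s".
    public static String endpointFor(String entityName) {
        if (entityName == null || entityName.isEmpty()) {
            throw new IllegalArgumentException("entityName must not be empty");
        }
        return "/" + Character.toLowerCase(entityName.charAt(0)) + entityName.substring(1) + "s";
    }

    public static void main(String[] args) {
        System.out.println(endpointFor("Employee")); // /employees
    }
}
```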

To make it work, we only have to add the spring-boot-starter-data-rest dependency.


HATEOAS
The Spring Data REST endpoints are HATEOAS compliant.

HATEOAS stands for Hypermedia as the Engine of Application State. Hypermedia-driven sites
provide information to access REST interfaces; think of it as metadata for REST data.

For a collection, the metadata includes page size, total elements, pages, etc.

Advanced features
 Supports pagination, sorting and searching.
 Extending and adding custom queries with JPQL.
 Query Domain Specific Language (Query DSL).

Configs and Sorting


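 We can specify the plural name/path with the @RepositoryRestResource annotation (its
path attribute).

 By default, Spring Data REST returns the first 20 elements. We can navigate to the
other pages of data using the page query parameter (zero-based), e.g. ?page=1.
 Properties such as spring.data.rest.base-path, spring.data.rest.default-page-size and
spring.data.rest.max-page-size are available, among others.

 You can sort by the property names of your entity with the sort query parameter, e.g.
?sort=lastName,desc (assuming the entity has a lastName property).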

REST API Security


Spring Security Model
Spring Security defines a framework for security. It’s implemented using Servlet filters in the
background.

There are two methods of securing an app: declarative and programmatic.

Spring Security with Servlet Filters


Servlet filters are used to pre-process/post-process web requests. They can route web requests
based on security logic.

Spring provides the bulk of its security functionality with servlet filters.


Spring Security in Action

Security Concepts
 Authentication: Check the user id and password against the credentials stored in the app/db.
 Authorization: Check whether the user has an authorized role.

Declarative Security
Define the application’s security constraints in configuration, in a class annotated with
@Configuration (the annotation used for all Java config).

It also provides separation of concerns between application code and security.

Programmatic Security
Spring Security provides an API for custom application config.

It provides greater customization for specific app requirements.

Enabling Spring Security


We just have to add the spring-boot-starter-security dependency.

Spring Security configuration


You can override the default user name and generated password with the spring.security.user.name
and spring.security.user.password properties.

Authentication and Authorization


There are different techniques for defining users, passwords and roles:

 In-memory
 JDBC
 LDAP
 Custom/Pluggable
 Others…
Spring Security Password Storage
In Spring Security, passwords are stored using a specific format: {id}encodedPassword, where the
id identifies the encoding.

In our examples, we use the following encodings:

 noop: Plain text passwords


 bcrypt: BCrypt password hashing, which is one-way hashing.

For example:

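@Configuration
public class DemoSecurityConfig {

    // In-memory users; {noop} marks the passwords as plain text (demo only).
    @Bean
    public InMemoryUserDetailsManager userDetailsManager() {
        UserDetails john = User.builder()
                .username("john")
                .password("{noop}test123")
                .roles("EMPLOYEE")
                .build();

        // mary and susan are built the same way; their passwords and roles here are illustrative.
        UserDetails mary = User.builder().username("mary")
                .password("{noop}test123").roles("EMPLOYEE", "MANAGER").build();
        UserDetails susan = User.builder().username("susan")
                .password("{noop}test123").roles("EMPLOYEE", "MANAGER", "ADMIN").build();

        return new InMemoryUserDetailsManager(john, mary, susan);
    }
}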
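
To illustrate the storage format, the sketch below splits a stored value such as {noop}test123 into the encoding id and the encoded password. It only demonstrates the format; Spring Security does this internally when it picks a password encoder, and PasswordFormatSketch is a hypothetical helper, not part of its API.

```java
public class PasswordFormatSketch {

    // Returns the encoding id from "{id}encodedPassword", or null when there is no prefix.
    public static String encoderId(String stored) {
        if (stored == null || !stored.startsWith("{")) {
            return null;
        }
        int end = stored.indexOf('}');
        return end > 1 ? stored.substring(1, end) : null;
    }

    // Returns the part after the "{id}" prefix (the whole value when there is no prefix).
    public static String encodedPart(String stored) {
        String id = encoderId(stored);
        return id == null ? stored : stored.substring(id.length() + 2);
    }

    public static void main(String[] args) {
        System.out.println(encoderId("{noop}test123"));   // noop
        System.out.println(encodedPart("{noop}test123")); // test123
    }
}
```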

Restricting Access to Roles


General syntax

e.g.
@Bean
public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
    // Restrict each HTTP method/path combination to a role
    http.authorizeHttpRequests(configurer ->
            configurer
                    .requestMatchers(HttpMethod.GET, "/api/employees").hasRole("EMPLOYEE")
                    .requestMatchers(HttpMethod.GET, "/api/employees/**").hasRole("EMPLOYEE")
                    .requestMatchers(HttpMethod.POST, "/api/employees").hasRole("MANAGER")
                    .requestMatchers(HttpMethod.PUT, "/api/employees").hasRole("MANAGER")
                    .requestMatchers(HttpMethod.DELETE, "/api/employees/**").hasRole("ADMIN"));

    // Use HTTP Basic authentication
    http.httpBasic(Customizer.withDefaults());

    // Disable CSRF protection (in general not required for stateless REST APIs)
    http.csrf(csrf -> csrf.disable());

    return http.build();
}

Cross-Site Request Forgery (CSRF)


Spring Security can protect against CSRF attacks.

Basically, what it does in the background is embed an additional authentication token into
all HTML forms. On subsequent requests, the web app verifies the token before processing.

The primary use case is traditional web applications (HTML forms, etc.).
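
The token mechanism described above can be sketched conceptually in plain Java. This is not Spring Security's implementation, only an illustration of the idea under simple assumptions: the server generates an unpredictable token, stores it in the session, embeds it in the form, and later compares the submitted value with the stored one. CsrfTokenSketch and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

public class CsrfTokenSketch {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates an unpredictable token to store in the session and embed in the HTML form.
    public static String newToken() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Accepts the request only if the submitted token matches the one stored in the session.
    // MessageDigest.isEqual is used as a time-constant comparison.
    public static boolean isValid(String sessionToken, String submittedToken) {
        if (sessionToken == null || submittedToken == null) {
            return false;
        }
        return MessageDigest.isEqual(
                sessionToken.getBytes(StandardCharsets.UTF_8),
                submittedToken.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String sessionToken = newToken();
        System.out.println(isValid(sessionToken, sessionToken)); // true
        System.out.println(isValid(sessionToken, null));         // false
    }
}
```

A request forged from another site cannot read the token embedded in our form, so it fails this check.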

When to use CSRF protection?


The Spring Security team recommends:

 Use CSRF protection for any normal browser web request.


 This includes traditional web apps with HTML forms to add/modify data.

If you are building a REST API for non-browser clients, you may want to disable CSRF
protection.

In general, it’s not required for stateless REST APIs that use POST, PUT, DELETE and/or PATCH.

Database Support in Spring Security


Spring Security can read user account info from a database. By default, you have to follow Spring
Security’s predefined table schemas.

You can also customize the table schemas, which is useful if you have custom tables specific to
your project.
Internally, Spring Security uses the “ROLE_” prefix, so a role such as EMPLOYEE is stored as the
authority ROLE_EMPLOYEE.

With the following bean, we tell Spring Security to use JDBC authentication with our data source.

@Bean
public UserDetailsManager userDetailsManager(DataSource dataSource) {
    return new JdbcUserDetailsManager(dataSource);
}
BCrypt
It’s an algorithm that performs one-way hashing, and the Spring Security team recommends it
for hashing the passwords stored in the database.

It adds a random salt to the password for additional protection and includes support to defeat
brute force attacks.

The salt is the first 22 characters after the last dollar sign:
$2y$13$<this is the salt, 22 chars><this is the password hash>
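
As a small illustration of that layout, the sketch below splits a bcrypt string into version, cost, salt and hash. It only parses the format; it does not compute or verify hashes. BcryptFormatSketch is a hypothetical helper, and the sample value in main is just a well-formed example string.

```java
public class BcryptFormatSketch {

    // Splits "$<version>$<cost>$<22-char salt><31-char hash>" into {version, cost, salt, hash}.
    public static String[] parse(String bcrypt) {
        String[] parts = bcrypt.split("\\$");
        // parts[0] is empty because the string starts with '$'.
        if (parts.length != 4 || parts[3].length() != 53) {
            throw new IllegalArgumentException("Not a bcrypt string: " + bcrypt);
        }
        String salt = parts[3].substring(0, 22);
        String hash = parts[3].substring(22);
        return new String[] {parts[1], parts[2], salt, hash};
    }

    public static void main(String[] args) {
        String[] fields = parse("$2a$12$R9h/cIPz0gi.URNNX3kh2OPST9/PgBkqquzi.Ss7KIUgO2t0jWMUW");
        System.out.println("version=" + fields[0] + ", cost=" + fields[1]);
        System.out.println("salt=" + fields[2]);
        System.out.println("hash=" + fields[3]);
    }
}
```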

Spring Security Login Process

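The password from the DB is NEVER decrypted, because bcrypt is a one-way hashing algorithm.
Instead, the password entered at login is hashed with the stored salt, and the result is
compared with the stored hash.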

Spring Security database support custom tables


We need to tell Spring how to query our custom tables.

We need to provide:

 Query to find user by user name


 Query to find authorities/roles by user name

To do so, we write the following code:


@Bean
public UserDetailsManager userDetailsManager(DataSource dataSource) {
    JdbcUserDetailsManager jdbcUserDetailsManager = new JdbcUserDetailsManager(dataSource);

    // Query to find a user by user name
    jdbcUserDetailsManager.setUsersByUsernameQuery(
            "select user_id, pw, active from members where user_id = ?");

    // Query to find the authorities/roles by user name
    jdbcUserDetailsManager.setAuthoritiesByUsernameQuery(
            "select user_id, roles from roles where user_id = ?");

    return jdbcUserDetailsManager;
}
Spring MVC
Thymeleaf
Thymeleaf is an open source Java templating engine, commonly used to generate the HTML
views for web apps.

However, it’s a general-purpose templating engine, so we can also use Thymeleaf outside of web apps.

We can create Java apps with Thymeleaf without Spring. It’s a separate project, unrelated
to spring.io, but there’s a lot of synergy between the two projects.

What is a Thymeleaf template?


It can be an HTML page with some Thymeleaf expressions, which add dynamic content to the
page.

Where is the Thymeleaf template processed?


In a web app, Thymeleaf is processed on the server. The results are included in HTML returned
to the browser.

Development process
There are three steps:

1. Add the dependency in the POM file.


2. Develop Spring MVC Controller

With the Thymeleaf dependency in the POM, Spring Boot auto-configures itself to use
Thymeleaf.
The String returned by the controller method is the view name: Spring resolves helloworld
to the template src/main/resources/templates/helloworld.html, which we have to create.

@Controller
public class DemoController {
    @GetMapping("/hello")
    public String sayHello(Model theModel) {
        theModel.addAttribute("theDate", new Date());

        return "helloworld";
    }
}

3. Create Thymeleaf template

We need the xmlns:th="http://www.thymeleaf.org" namespace declaration to use Thymeleaf expressions.


Also, there’s a Thymeleaf expression in the example that reads theDate, the attribute added to the model by our controller:
<!DOCTYPE HTML>
<html xmlns:th="http://www.thymeleaf.org">

<head>
<title>Thymeleaf Demo</title>
</head>

<body>
<p th:text="'Time on the server is ' + ${theDate}"/>
</body>

</html>

Where to place Thymeleaf template?


In Spring Boot, your Thymeleaf template files go in src/main/resources/templates. For web
apps, Thymeleaf templates have a .html extension.

Additional Features
 Looping and conditionals
 CSS and JavaScript integration
 Template layouts and fragments.

Using CSS with Thymeleaf templates


You have the option of using:

 Local CSS files as part of your project.


 Referencing remote CSS files.

Also, Spring Boot searches the following directories for static resources, from top to bottom:

1. /META-INF/resources
2. /resources
3. /static
4. /public


Development process
1. Create CSS file
Spring Boot will look for static resources in the directory src/main/resources/static.

The directory can contain any sub-directories.

2. Reference CSS in Thymeleaf template

3. Apply CSS

3rd Party CSS Libraries – Bootstrap


For a local installation, we have to download the files and add them to the /static/css directory.

We can also reference them remotely.


Useful code
 Send data to other endpoint:
<form th:action="@{/processForm}" method="GET">
<input type="text" name="studentName"
placeholder="What's your name?"/>
<input type="submit"/>
</form>
 Get data sent in form:
<body>
Hello World of Spring!
<br><br>
Student name: <span th:text="${param.studentName}"/>
</body>
 Data binding on HTML:
<body>

<h3>Student Registration Form</h3>

<form th:action="@{/processStudentForm}" th:object="${student}" method="POST">
First name: <input type="text" th:field="*{firstName}" />

<br><br>

Last name: <input type="text" th:field="*{lastName}" />

<br><br>

<input type="submit" value="Submit" />


</form>
</body>
 Drop down list:
Country:
<select th:field="*{country}">
<option th:value="Brazil">Brazil</option>
<option th:value="France">France</option>
<option th:value="Germany">Germany</option>
<option th:value="India">India</option>
</select>
 Radio buttons:
<input type="radio" th:field="*{favoriteLanguage}" th:each="tempLang :
${languages}" th:value="${tempLang}" th:text="${tempLang}"/>

 Check boxes:
<input type="checkbox" th:field="*{favoriteSystems}"
th:value="Linux">Linux</input>
<input type="checkbox" th:field="*{favoriteSystems}"
th:value="macOS">macOS</input>
<input type="checkbox" th:field="*{favoriteSystems}"
th:value="'Microsoft Windows'">Microsoft Windows</input>

 Dynamic list:
<ul>
<li th:each="tempSystem : ${student.favoriteSystems}" th:text="${tempSystem}"/>
</ul>

Spring MVC
Behind the Scenes
Components of a Spring MVC Application
 A set of web pages to lay out UI components.
 A collection of Spring beans (controllers, services, etc...).
 Spring configuration (XML, Annotations or Java).

Spring MVC Front Controller

The front controller, known as DispatcherServlet, is part of the Spring Framework, so it’s
already developed by the Spring dev team.

The parts that we have to create are:

 Model objects.


 View templates.
 Controller classes.

Controller
It’s code created by the developer, and it contains our business logic:

 Handle the request.


 Store/retrieve data (db, web service…)
 Place data in the model.

It also sends us to the appropriate view template.

Model
It contains our data, which is stored/retrieved via backend systems (database, web services,
Spring beans, etc.).

It’s where the data is placed, and that data can be any Java object/collection (Strings, objects,
info from the database, etc.).

Our View page can access data from the model.


View template
Spring MVC is flexible and supports many view templates; Thymeleaf is the recommended one.
The developer creates a page that displays data.

Other view templates


Other supported view templates are Groovy, Velocity, Freemarker, etc.

Sending Data with GET method

When a form uses the GET method, the form data is appended to the end of the URL as name/value
pairs, e.g. theUrl?field1=value1&field2=value2…
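
As an illustration of how the name/value pairs end up in the URL, here is a minimal plain-Java sketch that appends URL-encoded form fields to a URL, much like a browser does for a GET form. GetQueryStringSketch is a hypothetical helper; the /processForm path and the studentName field are taken from the examples in these notes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

public class GetQueryStringSketch {

    // Appends the fields to the URL as URL-encoded name=value pairs joined by '&'.
    public static String withQuery(String url, Map<String, String> fields) {
        if (fields.isEmpty()) {
            return url;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            query.add(encode(field.getKey()) + "=" + encode(field.getValue()));
        }
        return url + "?" + query;
    }

    private static String encode(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints /processForm?studentName=John+Doe
        System.out.println(withQuery("/processForm", Map.of("studentName", "John Doe")));
    }
}
```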

Sending Data with POST method

Form data is passed in the body of the HTTP request message.

Data Binding
Spring MVC forms can use data binding to automatically set or retrieve data from
a Java object/bean.

In our Spring controller, before we show the form, we must add a model attribute. This is a
bean that will hold the form data for the data binding.

In the HTML form, the name used in th:object must be the same as the name of the attribute
added to the model (student in our examples).

Also, the *{…} expressions in the inputs are shortcuts for student.firstName and student.lastName.

With this, when the form is loaded, Spring MVC reads student from the model and then calls
student.getFirstName() and student.getLastName(). When the form is submitted, it calls
student.setFirstName(…) and student.setLastName(…).

To receive the submitted data in the controller, we add the bean as a method parameter annotated
with @ModelAttribute("student"), and then read the values through its getters.
Drop-Down lists
A drop-down list sets the selected value on a property of the bound object through th:field.

Dynamic drop-down list


To generate a dynamic drop-down list, we first add a list to our model:
model.addAttribute("countries", countries);

Then we iterate over the list and generate an option for every entry:
Country:
<select th:field="*{country}">
<option th:each="tempCountry : ${countries}" th:value="${tempCountry}" th:text="${tempCountry}" />
</select>

Radio Buttons
To assign info to a field from a radio button, we can do this:

Dynamic radio buttons


The dynamic way works like the drop-down list, except that the th:each goes on the input:
<input type="radio" th:field="*{favoriteLanguage}" th:each="tempLang :
${languages}" th:value="${tempLang}" th:text="${tempLang}"/>
Check Boxes
Check boxes work the same as radio buttons. The only thing that changes is the type, which is
“checkbox”.

Note that if the value contains spaces, we have to put it in single quotes, just like the value
Microsoft Windows in the example.

Dynamic check boxes


Just like dynamic radio buttons, only changing the type to “checkbox”.
Validation
Java’s Standard Bean Validation API
Java has a standard Bean Validation API, which defines a metadata model and API for entity
validation.

Spring Boot and Thymeleaf also support the Bean Validation API.

Bean Validation Features


You can validate:

 Required.
 Length.
 Numbers.
 Using regular expressions.
 And custom validation.

Validation Annotations
Some validation annotations are:

 @NotNull: Checks that the annotated value is not null.


 @Min: Must be a number >= value.
 @Max: Must be a number <= value.
 @Size: Size must be within the given min/max bounds.
 @Pattern: Must match a regular expression pattern.
 @Future / @Past: Date must be in the future or in the past, relative to the current date.
 Others…
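
To make the meaning of these constraints concrete, the sketch below performs hand-written checks that mirror what @NotNull, @Size, @Min, @Max and @Pattern enforce. It deliberately does not use the Bean Validation API (which does this declaratively through the annotations); the class ValidationRulesSketch and the field names lastName, freePasses and postalCode, as well as the chosen limits, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ValidationRulesSketch {

    // Returns one error message per violated rule; an empty list means the input is valid.
    public static List<String> validate(String lastName, Integer freePasses, String postalCode) {
        List<String> errors = new ArrayList<>();

        // Mirrors @NotNull plus @Size(min = 1): the value is required and must not be empty.
        if (lastName == null || lastName.length() < 1) {
            errors.add("lastName is required");
        }
        // Mirrors @Min(0) and @Max(10): the number must be within [0, 10].
        if (freePasses != null && (freePasses < 0 || freePasses > 10)) {
            errors.add("freePasses must be between 0 and 10");
        }
        // Mirrors @Pattern: the whole value must match the regular expression.
        if (postalCode != null && !Pattern.matches("[a-zA-Z0-9]{5}", postalCode)) {
            errors.add("postalCode must be 5 letters or digits");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate("Doe", 5, "AB123")); // []
        System.out.println(validate(null, 11, "x").size()); // 3
    }
}
```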

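The controller is in charge of triggering the validation, by annotating the model attribute
parameter with @Valid.

The annotations are placed on the fields of the bean being validated.

Later, in the HTML, we have to check for errors and display them next to the affected field.


Also, we have to check for errors in the controller (through the BindingResult parameter) to
decide which view the user is sent to: back to the form if there are errors, or on to the
confirmation page otherwise.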

Objects
HttpServletRequest
Holds HTML form data.

Model
Container for our data. When it arrives as a parameter of a request-mapping method, it’s initially empty.

Annotations
@RequestMapping
By default, this mapping handles ALL HTTP methods, e.g. GET, POST, etc.

But we can specify the methods to handle through its method attribute, e.g.
method=RequestMethod.GET.

Any other HTTP request method will then get rejected.

@GetMapping
It’s the short way of writing @RequestMapping(path="/processForm",
method=RequestMethod.GET).

@PostMapping
It’s the short way of writing @RequestMapping(path="/processForm",
method=RequestMethod.POST).

Useful Code
 Get HTML form data and send data to the model.
@RequestMapping("/processFormVersionTwo")
public String letsShoutDude(HttpServletRequest request, Model model) {
    // Read the request parameter from the HTML form
    String theName = request.getParameter("studentName");
    theName = theName.toUpperCase();

    String result = "Yo! " + theName;

    // Add the message to the model
    model.addAttribute("message", result);

    return "helloworld";
}
 Get HTML form data and send data to the model, without using the HttpServletRequest
object.
@RequestMapping("/processFormVersionThree")
public String processFormVersionThree(@RequestParam("studentName") String theName,
                                      Model model) {
    theName = theName.toUpperCase();

    String result = "Hey my friend! " + theName;

    model.addAttribute("message", result);

    return "helloworld";
}
 Data binding on Java:
@GetMapping("/showStudentForm")
public String showForm(Model model) {
    // Bean that will hold the form data
    Student student = new Student();

    model.addAttribute("student", student);

    return "student-form";
}
@PostMapping("/processStudentForm")
public String processStudentForm(@ModelAttribute("student") Student student) {
    // The bean arrives already populated with the submitted form data
    System.out.println("student: " + student.getFirstName() + " " + student.getLastName());

    return "student-confirmation";
}
JPA/Hibernate advanced mappings
Mappings
One to One Mapping
Used mostly to split related information across two tables. E.g.:

One to Many/Many to One Mapping


Used mostly for parent-child relationships. E.g.:

Many to Many Mapping

Cascade
Cascading applies the same operation to related entities.

E.g. if I save an instructor, its details are saved too.

Likewise, if I delete the instructor, its details are deleted too.


Cascade delete
Whether to cascade deletes depends on the use case. For example, it’s not applicable here:

Cascade types
 PERSIST: If entity is persisted/saved, related entity will also be persisted.
 REMOVE: If entity is removed/deleted, related entity will also be deleted.
 REFRESH: If entity is refreshed, related entity will also be refreshed.
 DETACH: If entity is detached (not associated w/ session), then related entity will also
be detached.
 MERGE: If entity is merged, then related entity will also be merged.
 ALL: All of above cascade types.

Configure cascade type

Fetch types: Eager vs Lazy loading


Eager
Retrieves the entity together with all of its related entities up front.

Lazy
Retrieves the related entities only on request, when they are first accessed.
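
The difference can be illustrated with a plain-Java analogy. This is not how Hibernate implements lazy loading (it uses proxies tied to the session); it only shows the idea that the data is loaded on first access and then reused. FetchTypeSketch and Lazy are hypothetical names.

```java
import java.util.List;
import java.util.function.Supplier;

public class FetchTypeSketch {

    // Wraps a loader and runs it only on first access, caching the result (lazy loading).
    public static final class Lazy<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        public Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        public T get() {
            if (!loaded) {
                value = loader.get(); // stands in for the extra database query
                loaded = true;
            }
            return value;
        }

        public boolean isLoaded() {
            return loaded;
        }
    }

    public static void main(String[] args) {
        Lazy<List<String>> courses = new Lazy<>(() -> List.of("Java", "Spring"));
        System.out.println(courses.isLoaded()); // false
        System.out.println(courses.get());      // [Java, Spring]
        System.out.println(courses.isLoaded()); // true
    }
}
```

An eager association would correspond to calling the loader right away in the constructor, whether or not the data is ever used.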

Mapping directions
Uni-directional
It’s a one-way relationship.
Bi-directional
It’s when you can access the related object from both sides.

For this type of relationship, we use mappedBy to tell Hibernate which field on the other side
owns the association (in our example, to find the associated instructor).
Entity Lifecycle
Many operations can be performed on an entity:

 Detach: If entity is detached, it’s not associated with a Hibernate session


 Merge: If instance is detached from session, then merge will reattach to session.
 Persist: Transitions new instances to managed state. Next flush/commit will save in db.
 Remove: Transitions managed entity to be removed. Next flush/commit will delete
from db.
 Refresh: Reload/synch object with data from db. Prevents stale data (the entity has
different data from the database).
Real World Insights
Choosing Partitions Count & Replication Factor
Partitions Count & Replication Factor
These are the two most important parameters when creating a topic.

They impact performance and durability of the system overall.

It’s best to get the parameters right the first time.

 If the partition count increases during a topic’s lifecycle, you will break your key
ordering guarantees.
 If the replication factor increases during a topic’s lifecycle, you put more pressure on
your cluster, which can lead to an unexpected performance decrease.

Choosing the Partitions Count


Each partition can handle a throughput of a few MB/s (measure it for your setup!).

More partitions imply:

 Better parallelism, better throughput.


 Ability to run more consumers in a group to scale (at most as many active consumers per
group as partitions).
 Ability to leverage more brokers if you have a large cluster.
 BUT more elections to perform for ZooKeeper (if using ZooKeeper).
 BUT more open files on Kafka.

Guidelines:

 Partitions per topic = the million-dollar question


o (Intuition) Small cluster (< 6 brokers): 3 x # brokers.
o (Intuition) Big cluster (> 12 brokers): 2 x # brokers.
o Adjust for number of consumers you need to run in parallel at peak
throughput.
o Adjust for producer throughput (increase if super-high throughput or projected
increase in the next 2 years).
 TEST! Every Kafka cluster will have different performance.
 Don’t systematically create topics with 1000 partitions.
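
The rules of thumb above can be turned into a small starting-point calculation. This is only a sketch of the intuition, not an official formula: the notes give no rule for clusters of 6 to 12 brokers, so using 2 x brokers there is an assumption, and PartitionCountSketch is a hypothetical helper. The result still has to be validated by testing on your own cluster.

```java
public class PartitionCountSketch {

    // Starting point only: 3 x brokers for small clusters (< 6), 2 x brokers otherwise.
    // Using 2 x brokers for 6 to 12 brokers is an assumption; the notes only cover < 6 and > 12.
    public static int suggestedPartitions(int brokers, int peakConsumers) {
        if (brokers < 1) {
            throw new IllegalArgumentException("brokers must be >= 1");
        }
        int byBrokers = brokers < 6 ? 3 * brokers : 2 * brokers;
        // A group cannot use more consumers than partitions, so never go below the consumer count.
        return Math.max(byBrokers, peakConsumers);
    }

    public static void main(String[] args) {
        System.out.println(suggestedPartitions(3, 0));  // 9
        System.out.println(suggestedPartitions(15, 0)); // 30
        System.out.println(suggestedPartitions(3, 12)); // 12
    }
}
```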
Choosing the Replication Factor
Should be at least 2, usually 3, maximum 4.

The higher the replication factor (N):

 Better durability of your system (N-1 brokers can fail).


 Better availability of your system (N - min.insync.replicas brokers can fail if the producer uses acks=all)
 BUT more replication (higher latency if acks=all)
 BUT more disk space on your system (50% more if RF is 3 instead of 2).
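
The arithmetic behind these trade-offs is simple enough to write down. The sketch below just restates the formulas from the list above for a replication factor N; ReplicationFactorSketch and its method names are hypothetical.

```java
public class ReplicationFactorSketch {

    // Brokers that can fail without losing data: N - 1.
    public static int durableFailures(int replicationFactor) {
        return replicationFactor - 1;
    }

    // Brokers that can fail while acks=all producers can still write: N - min.insync.replicas.
    public static int availableFailures(int replicationFactor, int minInsyncReplicas) {
        return replicationFactor - minInsyncReplicas;
    }

    // Extra disk space, in percent, when moving from one replication factor to another.
    public static int extraDiskPercent(int fromReplicationFactor, int toReplicationFactor) {
        return (toReplicationFactor - fromReplicationFactor) * 100 / fromReplicationFactor;
    }

    public static void main(String[] args) {
        System.out.println(durableFailures(3));      // 2
        System.out.println(availableFailures(3, 2)); // 1
        System.out.println(extraDiskPercent(2, 3));  // 50
    }
}
```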

Guidelines:

 Set it to 3 to get started (you must have at least 3 brokers for that).
 If replication performance is an issue, get a better broker instead of lowering the RF.
 Never set it to 1 in production.

Cluster guidelines
Total number of partitions in the cluster:

 Kafka with ZooKeeper: max 200,000 partitions (Nov 2018) – ZooKeeper scaling limit.
o Still recommend a maximum of 4000 partitions per broker (soft limit).
 Kafka with KRaft: potentially millions of partitions.

If you need more partitions in your cluster, add brokers instead.

If you need more than 200,000 partitions in your cluster, follow the Netflix model and create
more Kafka clusters.

Overall, you don’t need a topic with 1000 partitions to achieve high throughput. Start at a
reasonable number and test the performance.

Topic Naming Conventions


Naming a topic is “free-for-all”. But it’s better to enforce guidelines in your cluster to ease
management. You are free to come up with your own guideline.
Paint bike shed topic naming convention
<message type>.<dataset name>.<data name>.<data format>

Message type:

 logging: for logging data (slf4j, syslog, etc.).


 queuing: for classical queuing use cases.
 tracking: for tracking events such as user clicks, page views, ad views, etc.
 etl/db: for ETL and CDC use cases such as database feeds.
 streaming: for intermediate topics created by stream processing pipelines.
 push: for data that’s being pushed from offline (batch computation) environments into
online environments.
 user: for user-specific data such as scratch and test topics.

The dataset name is analogous to a database name in traditional RDBMS systems. It’s used
as a category to group topics together.

The data name field is analogous to a table name in traditional RDBMS systems, though it’s
fine to include further dotted notation if developers wish to impose their own hierarchy within
the dataset namespace.

The data format is, for example, .avro, .json, .protobuf, .csv or .log.

Use snake_case.
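
A convention like this is easier to enforce with a small helper. The sketch below assembles a topic name from the four parts and rejects segments that are not snake_case. TopicNameSketch is a hypothetical helper, and the sample values (user_activity, page_views) are made up for illustration.

```java
import java.util.regex.Pattern;

public class TopicNameSketch {

    // snake_case segment: lowercase letters/digits, with single underscores between words.
    private static final Pattern SNAKE_CASE = Pattern.compile("[a-z0-9]+(_[a-z0-9]+)*");

    // Builds <message type>.<dataset name>.<data name>.<data format>.
    public static String topicName(String messageType, String datasetName,
                                   String dataName, String dataFormat) {
        String name = String.join(".", messageType, datasetName, dataName, dataFormat);
        // The data name may itself contain dots, so every dot-separated segment is checked.
        for (String segment : name.split("\\.", -1)) {
            if (!SNAKE_CASE.matcher(segment).matches()) {
                throw new IllegalArgumentException("Not snake_case: '" + segment + "' in " + name);
            }
        }
        return name;
    }

    public static void main(String[] args) {
        // Prints tracking.user_activity.page_views.avro
        System.out.println(topicName("tracking", "user_activity", "page_views", "avro"));
    }
}
```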
Advanced Topics Configuration
Changing a Topic Configuration
Why should I care about topic configuration?
Brokers have defaults for all topic configuration parameters.

These parameters impact performance and topic behavior.

Some topics may need different values than the defaults:

 Replication factor.
 # of Partitions.
 Message size.
 Compression level.
 Log Cleanup Policy.
 Min Insync Replicas.
 Other configurations.

A list of configurations can be found in official broker configs documentation.

Segments and Indexes


Partitions and Segments
Topics are made of partitions and partitions are made of segments (files).

Only one segment is ACTIVE (the one data is being written to)

Two segment settings:

 log.segment.bytes: the max size of a single segment in bytes (default 1 GB).


 segment.ms (log.roll.ms at the broker level): the time Kafka will wait before closing the
segment if it is not full (default 1 week).

Segments and Indexes


Segments come with two indexes (files):

 An offset to position index: helps Kafka find where to read from to find a message.
 A timestamp to offset index: helps Kafka find messages with a specific timestamp.
Segments: Why should I care?
A smaller log.segment.bytes (size, default 1 GB) means:

 More segments per partition.


 Log compaction happens more often.
 But Kafka must keep more files open (Error: Too many open files).

Ask yourself: how fast will I get new segments, based on my throughput?

A smaller segment.ms (time, default 1 week) means:

 You set a max frequency for log compaction (more frequent triggers).
 Maybe you want daily compaction instead of weekly.

Ask yourself: how often do I need log compaction to happen?

Log Cleanup Policies


Many Kafka clusters make data expire, according to a policy. That concept is called log cleanup.

 Policy 1: log.cleanup.policy = delete (Kafka default for all user topics)


o Delete based on age of data (default is one week).
o Delete based on max size of log (default is -1 == infinite).
 Policy 2: log.cleanup.policy = compact (Kafka default for topic __consumer_offsets)
o Delete based on keys of your messages.
o Will delete old duplicate keys after the active segment is committed.
o Infinite time and space retention.

Log Cleanup: Why and When?


Deleting data from Kafka allows you to:

 Control the size of the data on the disk, delete obsolete data.
 Overall: Limit maintenance work of the Kafka Cluster.

How often does log cleanup happen?

 Log cleanup happens on your partition segments!


 Smaller/More segments mean that log cleanup will happen more often.
 Log cleanup shouldn’t happen too often => takes CPU and RAM resources.
 The cleaner checks for work every 15 seconds (log.cleaner.backoff.ms).
Log Cleanup Policy: Delete
 log.retention.hours:
o Number of hours to keep data for (default is 168 – one week).
o Higher number means more disk space.
o Lower number means that less data is retained (if your consumers are down
for too long, they can miss data).
o Other parameters allowed: log.retention.ms, log.retention.minutes (smaller
unit has precedence).
 log.retention.bytes:
o Max size in Bytes for each partition (default is -1 – infinite).
o Useful to keep the size of a log under a threshold.

Use cases – two common pair of options:

 One week of retention: log.retention.hours = 168 and log.retention.bytes=-1.


 Infinite time retention bounded by 500 MB: log.retention.ms = -1 and
log.retention.bytes = 524288000.
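
The two option pairs above can be derived rather than typed by hand, which helps avoid unit mistakes. The sketch below only builds the setting names and values as a map, using the names from these notes; it does not talk to a Kafka cluster, and RetentionSettingsSketch is a hypothetical helper.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

public class RetentionSettingsSketch {

    // One week of retention, unbounded in size.
    public static Map<String, String> oneWeek() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("log.retention.hours", String.valueOf(Duration.ofDays(7).toHours()));
        settings.put("log.retention.bytes", "-1");
        return settings;
    }

    // Infinite time retention, bounded by the given size in MB (1 MB = 1024 * 1024 bytes).
    public static Map<String, String> boundedBySize(long megabytes) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("log.retention.ms", "-1");
        settings.put("log.retention.bytes", String.valueOf(megabytes * 1024 * 1024));
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(oneWeek());          // {log.retention.hours=168, log.retention.bytes=-1}
        System.out.println(boundedBySize(500)); // {log.retention.ms=-1, log.retention.bytes=524288000}
    }
}
```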

Log Cleanup Policy: Compact


Log compaction ensures that your log contains at least the last known value for a specific key
within a partition.

Very useful if we just require a SNAPSHOT instead of full history (such as for a data table in a
database).

The idea is that we only keep the latest “update” for a key in our log.

Example
We want a topic of employee-salary and we want to keep the most recent salary for our
employees:
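
The idea can be simulated in a few lines of plain Java. This is a conceptual sketch, not Kafka's log cleaner: it keeps only the latest message per key, while the surviving messages keep their original offsets and relative order. LogCompactionSketch is a hypothetical class, and the employee names and salaries are made-up sample data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LogCompactionSketch {

    // A message in the log: offset, key (employee) and value (salary).
    public static final class Message {
        public final long offset;
        public final String key;
        public final String value;

        public Message(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    // Keeps only the latest message per key; survivors keep their original offset and order.
    public static List<Message> compact(List<Message> log) {
        Map<String, Long> latestOffsetByKey = new HashMap<>();
        for (Message message : log) {
            latestOffsetByKey.put(message.key, message.offset);
        }
        List<Message> compacted = new ArrayList<>();
        for (Message message : log) {
            if (latestOffsetByKey.get(message.key) == message.offset) {
                compacted.add(message);
            }
        }
        return compacted;
    }

    public static void main(String[] args) {
        List<Message> log = List.of(
                new Message(0, "john", "80000"),
                new Message(1, "mark", "90000"),
                new Message(2, "lisa", "95000"),
                new Message(3, "john", "100000"));
        for (Message message : compact(log)) {
            System.out.println(message.offset + " " + message.key + " " + message.value);
        }
    }
}
```

In this sample, the message at offset 0 is dropped because a newer salary for the same key exists at offset 3; offsets 1, 2 and 3 remain unchanged.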
Log Compaction Guarantees
Any consumer that’s reading from the tail of a log (most current data) will still see all the
messages sent to the topic.

Message ordering is kept: log compaction only removes some messages, but does not re-order
them.

The offset of a message is immutable (it never changes). Offsets are just skipped if a message is
missing.

Deleted records can still be seen by consumers for a period of delete.retention.ms
(default is 24 hours).

Log Compaction Myth Busting


It doesn’t prevent you from pushing duplicate data to Kafka.

 De-duplication is done after a segment is committed.


 Your consumers will still read from tail as soon as the data arrives.

It doesn’t prevent you from reading duplicate data from Kafka. Same points as above.

Log compaction can fail from time to time.

 It’s an optimization, and the compaction thread might crash.


 Make sure you assign enough memory to it and that it gets triggered.
 Restart Kafka if log compaction is broken.

You can’t trigger log compaction using an API call (for now…)

How it works

Log compaction (log.cleanup.policy=compact) is impacted by:

 segment.ms (default 7 days): Max amount of time to wait to close the active segment.
 segment.bytes (default 1 GB): Max size of a segment.
 min.compaction.lag.ms (default 0): How long to wait before a message can be
compacted.
 delete.retention.ms (default 24 hours): Wait before deleting data marked for
compaction.
 min.cleanable.dirty.ratio (default 0.5): Higher => less, more efficient cleaning. Lower
=> opposite.
Unclean Leader Election
unclean.leader.election.enable
If all your In-Sync Replicas (ISR) go offline (but you still have out-of-sync replicas up), you have
the following options:

 Wait for an ISR to come back online (default)


 Enable unclean.leader.election.enable = true and start producing to non-ISR partitions.

If you enable unclean.leader.election.enable = true, you improve availability, but you will lose
data, because the messages on the former ISR that were never replicated will be discarded when
those replicas come back online and replicate data from the new leader.

Overall, this is a very dangerous setting, and its implications must be understood fully before
enabling it.

Use cases include metrics collection, log collection, and other cases where data loss is
somewhat acceptable, at the trade-off of availability.
Large Messages in Apache Kafka
Kafka has a default of 1 MB per message in topics, as large messages are considered inefficient
and an anti-pattern.

Two approaches to sending large messages:

1. Using an external store: store messages in HDFS, Amazon S3, Google Cloud Storage,
etc., and send a reference to that message to Apache Kafka.
2. Modifying Kafka parameters: must change broker, producer and consumer settings.

Option 1: Large Messages using External Store


Store the large message (e.g. video, archive file, etc…) outside of Kafka.

Send a reference of that message to Kafka.

Write custom code at the producer/consumer level to handle this pattern.
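
A minimal sketch of this pattern in plain Java follows. The in-memory map stands in for the external store (S3, HDFS, etc.) and no Kafka client is involved; only the small reference string would be sent through Kafka. ExternalStoreSketch and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class ExternalStoreSketch {

    // Stand-in for an external store such as S3 or HDFS (in-memory, for illustration only).
    private final Map<String, byte[]> store = new HashMap<>();

    // Producer side: saves the payload externally and returns the small reference to send to Kafka.
    public String storeAndGetReference(byte[] largePayload) {
        String reference = UUID.randomUUID().toString();
        store.put(reference, largePayload);
        return reference;
    }

    // Consumer side: resolves the reference read from Kafka back into the payload.
    public byte[] resolve(String reference) {
        byte[] payload = store.get(reference);
        if (payload == null) {
            throw new IllegalArgumentException("Unknown reference: " + reference);
        }
        return payload;
    }

    public static void main(String[] args) {
        ExternalStoreSketch externalStore = new ExternalStoreSketch();
        byte[] video = new byte[5 * 1024 * 1024]; // pretend this is a 5 MB file
        String reference = externalStore.storeAndGetReference(video);
        System.out.println("Reference sent through Kafka: " + reference);
        System.out.println("Bytes resolved by the consumer: " + externalStore.resolve(reference).length);
    }
}
```

The Kafka message stays tiny regardless of the payload size, at the cost of one extra lookup on the consumer side.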

Option 2: Sending large messages in Kafka (ex: 10 MB)


Topic-wise, Kafka-side, set the max message size to 10 MB:

 Broker side: modify message.max.bytes.


 Topic side: modify max.message.bytes.
 Warning: the settings have similar but different names; it’s not a typo!

Broker-wise, set the max replication fetch size to 10 MB:

 replica.fetch.max.bytes = 10485880 (in server.properties)

Consumer-side, we must increase the fetch size, or the consumer will crash:

 max.partition.fetch.bytes = 10485880

Producer-side, must increase the max request size:

 max.request.size = 10485880
