WEBLOGIC SERVER MONITORING BEST PRACTICES
Overview
Global organizations retain and refine their competitive edge through
innovation, process efficiency and productivity improvements. To support
these business needs, Administrators of the Weblogic Application Server are
constantly learning new features and adapting to the latest technologies. To
keep pace with this trend, there is a growing demand for best practices for
monitoring Weblogic servers.
Monitoring
Weblogic monitoring can be broadly classified as:
1. Availability monitoring.
2. Performance monitoring.
Availability Monitoring checks the availability of all the Weblogic Servers
and proactively alerts the Weblogic Administrators if anything goes wrong.
Performance Monitoring is the ongoing monitoring of performance indicators
in the Weblogic management system, such as Memory Utilization, CPU
Utilization and Garbage Collection, and proactively alerts the Weblogic
Administrators if any performance degradation is predicted.
Availability Monitoring
The following adverse conditions with the Weblogic environment can cause
the Weblogic Servers to become unavailable.
1. Weblogic Server Instance Unavailable
2. Weblogic Server Internal Errors
3. Critical File Systems Full
4. Capacity Limits Reached
5. Weblogic Session-Replication Issues
Weblogic Server Instance Unavailable: Weblogic Server Instances run as
processes on the system. Occasionally they may terminate abnormally
causing Weblogic Server to become unavailable. Partial outages also exist
when Weblogic Server is functioning normally for the currently connected
users but will not allow any new connections. Therefore, Weblogic Server
must be monitored from both the process and connection perspectives.
If an instance that must be available 24/7 goes down, alert the Weblogic
Administrators through email or SMS.
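
The two perspectives described above (process and connection) can be sketched as a small check. This is a minimal illustration, not a Weblogic tool: the function names are hypothetical, the PID and listen port are assumed to be known to the caller, and the process check relies on POSIX signal semantics.

```python
import os
import socket


def process_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a NEW TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def instance_status(running: bool, accepts_connections: bool) -> str:
    """Combine both checks into one status (labels are hypothetical)."""
    if not running:
        return "DOWN"
    if not accepts_connections:
        return "PARTIAL_OUTAGE"  # process alive, new connections refused
    return "UP"
```

A scheduler could call these checks every minute and send the email or SMS alert whenever the status is anything other than "UP".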
Weblogic Server Internal Errors: Weblogic Server writes internal errors
to the Standard Output Log file for the Instance. Although many of the
errors are not critical, all errors should be addressed in a timely manner.
Errors containing critical error codes must be detected and addressed
immediately.
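
As an illustration of separating critical errors from the rest, the sketch below scans log lines and classifies them by severity. It assumes each message is written as a series of angle-bracketed fields with the severity in the second field, and that "Emergency", "Alert" and "Critical" are the severities to escalate; both assumptions should be checked against the actual log files of the installation.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumption: these severities require immediate attention.
CRITICAL_SEVERITIES = {"Emergency", "Alert", "Critical"}
# Assumption: these should be addressed in a timely manner.
ERROR_SEVERITIES = {"Error"}

# Assumed layout: <timestamp> <Severity> <Subsystem> <message id> <text>
FIELD = re.compile(r"<([^<>]*)>")


def classify_line(line: str) -> Optional[str]:
    """Return 'critical', 'error', or None for a single log line."""
    fields = FIELD.findall(line)
    if len(fields) < 2:
        return None  # not a message header line
    severity = fields[1].strip()
    if severity in CRITICAL_SEVERITIES:
        return "critical"
    if severity in ERROR_SEVERITIES:
        return "error"
    return None


def scan_log(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, level, text) for every line that needs attention."""
    findings = []
    for number, line in enumerate(lines, start=1):
        level = classify_line(line)
        if level is not None:
            findings.append((number, level, line.rstrip("\n")))
    return findings
```

Lines classified as "critical" would trigger an immediate alert, while "error" lines could be collected into a periodic report.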
Critical File Systems Full: One of the most critical Weblogic Server
availability issues occurs when the Weblogic Server is unable to process
transactions because the file system allocated for log space is full. Log space
is used by transactions to store Weblogic Server updates and it is sized by
the Weblogic Administrator according to the number and size of the
transactions for that Server. Weblogic Server writes error and dump
information to file systems designated by the Weblogic Administrator. The
error and dump file systems are important because Weblogic Server will not
be able to log critical internal errors when error and dump file systems
become full. Internal errors usually lead to unavailability of the Weblogic
Server.
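
A file system check of this kind can be sketched with the Python standard library. The thresholds (80% and 90%) and the function names are illustrative assumptions; the paths to watch would be the log, error and dump file systems designated by the Weblogic Administrator.

```python
import shutil
from typing import Dict, Iterable


def usage_percent(path: str) -> float:
    """Return the used space of the file system holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # pseudo file systems may report a size of zero
    return 100.0 * usage.used / usage.total


def classify_usage(percent: float, warn_at: float = 80.0,
                   critical_at: float = 90.0) -> str:
    """Map a usage percentage to a status (thresholds are assumptions)."""
    if percent >= critical_at:
        return "CRITICAL"
    if percent >= warn_at:
        return "WARNING"
    return "OK"


def check_file_systems(paths: Iterable[str]) -> Dict[str, str]:
    """Return a status for each monitored path."""
    return {path: classify_usage(usage_percent(path)) for path in paths}
```

Alerting at a warning level well before the file system is full gives the Administrator time to archive or purge files before the server is affected.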
Weblogic Session-Replication Issues: Weblogic Server has the capability
to replicate all Sessions from one Instance to another. Although Weblogic
Server functions correctly when it is unable to replicate data, applications
using the replicated data may become unavailable. To a user, application
unavailability and Weblogic Server unavailability are the same.
Performance Monitoring
The following are some of the adverse conditions within the Weblogic Server
that can cause performance degradation.
Poorly Tuned Application: Applications that are poorly tuned typically
manifest themselves via high CPU and I/O utilization on the Weblogic
Server. This is due to the large number of sort and read operations that
must be performed by the Weblogic Server. Poorly tuned Weblogic Servers
typically manifest themselves via high I/O operations. This is due to a lack of
pre-allocated memory for caching and other critical internal processes.
Increasing the amount of memory allocated to the Weblogic Server will
typically solve most Weblogic Server tuning issues. This, however, can
actually decrease performance if there is a lack of memory resources on the
server because the Weblogic cache space will be swapped in and out of
memory. Therefore, it is also critical to review the CPU, memory, and I/O
utilization of the server as a whole. Tuning of server resources is not
addressed in this document.
The following are the key areas:
1. Heap Memory Utilization
2. CPU Utilization
3. Garbage Collection
4. Idle Thread Count
5. Socket Connection
6. Thread Dumps
7. Users Session Locked in the Weblogic Server
8. Multicasting
Lock Conflicts: Poorly tuned applications tend to have components that
lock objects or resources in the Weblogic Server for long periods. Although
Weblogic Server functions correctly under these conditions, the users may
experience poor response times or even application unavailability. It is
important to detect the processes that are preventing others from retrieving
information. Tuning the application code executed by that process will
improve the response time of the application. More importantly, the number
of users being blocked is an indication of the level of application
unavailability.
Note: Frequent lock conflicts typically indicate that there is an application
tuning issue.
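
One way to approximate the number of blocked users is to count blocked threads in a thread dump. The sketch below assumes a HotSpot-style dump in which each thread reports a line of the form "java.lang.Thread.State: BLOCKED"; other JVMs format their dumps differently, and a blocked thread is only a proxy for a blocked user.

```python
import re
from collections import Counter

# Assumption: HotSpot-style thread dump state lines.
STATE_LINE = re.compile(r"java\.lang\.Thread\.State:\s+([A-Z_]+)")


def count_thread_states(dump: str) -> Counter:
    """Count threads per state (RUNNABLE, BLOCKED, WAITING, ...)."""
    return Counter(STATE_LINE.findall(dump))


def blocked_ratio(dump: str) -> float:
    """Return the fraction of threads in the BLOCKED state (0.0 if none found)."""
    states = count_thread_states(dump)
    total = sum(states.values())
    return states["BLOCKED"] / total if total else 0.0
```

Comparing the ratio across dumps taken a few seconds apart helps distinguish a momentary lock from a persistent conflict.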
Poor Response Time: Applications occasionally have problems that cause
the user to experience poor response times. It is important to measure the
response time of the Weblogic Server so that poor response time periods can
be detected. Once such a period is detected, further research and analysis
can be performed to find the root cause of the response time issues.
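
Detecting a poor response time period can be sketched as follows. The probe, the threshold and the minimum run length are assumptions to be chosen per application; the probe would typically be a request to a representative page or service.

```python
import time
from typing import Callable, List, Sequence, Tuple


def measure(probe: Callable[[], object]) -> float:
    """Run the probe once and return its elapsed time in seconds."""
    start = time.perf_counter()
    probe()
    return time.perf_counter() - start


def poor_periods(samples: Sequence[Tuple[float, float]], threshold: float,
                 min_samples: int = 3) -> List[Tuple[float, float]]:
    """Find runs of consecutive slow samples.

    `samples` holds (timestamp, response time) pairs in time order. A period
    is reported as (first timestamp, last timestamp) when at least
    `min_samples` consecutive samples exceed `threshold`.
    """
    periods: List[Tuple[float, float]] = []
    run: List[float] = []
    for timestamp, response_time in samples:
        if response_time > threshold:
            run.append(timestamp)
            continue
        if len(run) >= min_samples:
            periods.append((run[0], run[-1]))
        run = []
    if len(run) >= min_samples:  # a slow run that lasts until the final sample
        periods.append((run[0], run[-1]))
    return periods
```

Requiring several consecutive slow samples avoids raising an alert for a single slow request, while the reported time window tells the Administrator where to start the root cause analysis.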
Device Errors: Occasional disk errors typically have no adverse effect on
the Weblogic Server. Frequent disk device errors have the potential to cause
Weblogic Server corruption and slow performance. Therefore, monitoring the
disk errors encountered by Weblogic Server is important. Network errors
cause frequent retransmits, which have an adverse effect on Weblogic Server
performance. It is a good idea to monitor the traffic between your client
applications and the server. An application designed and tuned for a slow
network performs well on a fast network; however, the opposite is not
true.
CPU Overload: CPU time is the amount of time that the Weblogic Server
spends processing the business methods that are deployed in the Weblogic
Server. If this is the main timed event, tuning the deployed business
methods and/or increasing server CPU resources will provide the greatest
performance improvement.
Conclusion
Both types of monitoring can be performed and measured using “JRockit
Mission Control”, which can be downloaded from the Oracle Weblogic site. This
tool gives a clear picture of both the monitoring and management of
Weblogic Servers.