Linux Chapter With TOC OCR

The document provides a comprehensive overview of Linux fundamentals, including its history, command line importance, user management, file systems, and system initialization processes. It covers topics such as the differences between SysV init and systemd, user and group management, file permissions, and storage management techniques like LVM. Additionally, it emphasizes the significance of the command line interface and offers practical commands for managing services and file systems.

Uploaded by

pracheth sp

Table of Contents

Linux Fundamentals
A Brief History Lesson
Why Command Line Matters
First Things First: Hit That Power Button!
SysV Init (Traditional)
Socket and Timer Units
Introduction to User Management
Creating and Managing Users
Group Management
Understanding File Permissions
Changing Permissions and Ownership
Special Permissions
Understanding Linux File Systems
Mounting and Unmounting File Systems
Managing Swap Space
Monitoring Storage Usage
File System Troubleshooting
Introduction to LVM (Logical Volume Management)
Creating and Managing Logical Volumes
Understanding Processes in Linux
Introduction to Shell Scripting
Using Variables in Shell Scripts
Conditionals in Shell Scripts
Loops in Shell Scripts
Real-world Example
Automating Tasks with Shell Scripts
Scheduling Jobs with Cron
How cron Works
Understanding Load Averages and CPU Utilization
Commonly Adjusted ulimit Parameters
Commonly Adjusted sysctl Parameters
Making sysctl Changes Persistent
Optimizing System Performance with
Keep Logs Manageable
Diagnosing Network-Related Problems
Real-World Use Case


The Path to Become an SRE Engineer
Abe Bazouie
Linux Fundamentals
Linux, Unix or Minix. Wait … what???
What Is Linux, Really?

● Linux is an open-source operating system kernel that powers millions of devices worldwide, from servers to smartphones.
● It’s a Unix-like system, inspired by its predecessors (Unix and Minix).
● Think of Linux as the foundation of a house: everything else (like your applications) is built on top of it.
A Brief History Lesson

● 1969: Unix was created at Bell Labs by Dennis Ritchie and Ken Thompson.
● 1987: Minix was developed by Andrew Tanenbaum.
● 1991: Linus Torvalds built Linux as a personal project.
Why Linux Rocks

Why Should You Care About Linux?

● Powers 96.3% of the world’s top servers (including Google, Facebook, and Netflix).
● Free and open-source (use it, modify it, share it).
● Highly reliable, secure, and customizable.
● Fun Fact: Even Android runs on Linux!
Unix vs. Linux vs. Minix

● Unix: The OG operating system, expensive and proprietary.
● Minix: Unix’s lightweight teaching-oriented cousin.
● Linux: The open-source, community-driven, and infinitely customizable offspring.
The GNU Project – The Building Blocks of Linux

● Founded in 1983 by Richard Stallman, the GNU Project aimed to create a free Unix-like operating system.
● GNU stands for “GNU's Not Unix” (a fun recursive acronym!).
● Provided essential utilities like compilers, editors, and shell programs: everything except the kernel.
The Philosophy of GNU

● The GNU Project championed the idea of free software (freedom, not price).
● Created the GPL (GNU General Public License) to ensure software freedom.
● Inspired the open-source movement, which drives modern software
development.
What Makes Linux Tick? Think of it Like a Restaurant

● Kernel – The Chef
○ The kernel is like the chef in a restaurant.
○ It takes care of the ingredients (hardware like CPU, memory, and storage) and cooks (manages) them to serve the dish (processes).
○ You never talk to the chef directly.

● Shell – The Waiter
○ The shell is like the waiter who takes your order (commands).
○ It listens to what you want, passes the request to the chef (kernel), and brings the results (output) back to you.
○ Common shells: Bash, Zsh, etc.

● Userland – The Dining Room
○ The userland is like the dining area, where you sit, relax, and enjoy.
○ It includes the menu (applications), utilities (like salt and pepper shakers), and everything you directly interact with.
○ Examples: Text editors, browsers, and system tools like ls.
Why Command Line Matters

The Power of the Command Line

● Direct communication with the system.
● Allows automation and scripting.
● Fun Fact: Command-line wizards are 50% cooler than GUI users (source: me :D).
First Things First: Hit That Power Button!

What happens once you push the power button to turn on your computer?

● Boot sequence: BIOS → Bootloader → Kernel → Init.
● The init process is like the conductor of an orchestra: it starts and manages all the system processes.
What’s Next?

● Deep dive into the init process: What it is, how it works, and why it matters.
● Understanding systemd, the modern init replacement.
Introduction to the init Process

● The init process is the first process started by the Linux kernel after booting.
● It has PID 1, meaning it's the ancestor of all other user-space processes on the system.
● Its job is to start system services and get the system ready for use (logging in, starting services like networking, etc.).
● Different types of init systems have existed, such as:
○ SysV init (older method)
○ systemd (modern replacement)
● As PID 1, init is the parent of other processes and adopts any that are orphaned.
● As PID 1, init also reaps zombie processes left behind by those orphans.
The Role of init in System Startup

How Does the init Process Work?

SysV init: Works by using runlevels to control which services start at boot. Each
runlevel represents a different system state (e.g., multi-user mode, single-user mode).

Init scripts: In SysV init, scripts located in /etc/init.d/ or /etc/rc.d/ define which
services start.

Runlevels:
● Runlevel 0: Halt (shuts down the system)
● Runlevel 1: Single-user mode (for system maintenance)
● Runlevel 3: Multi-user mode (text-based login)
● Runlevel 5: Multi-user mode (graphical login)
● Runlevel 6: Reboot
Systemd vs SysV Init: Modern Targets vs Traditional Runlevels

SysV Init (Traditional)


● Uses runlevels (0-6) to define system states.
Example:
○ Runlevel 3: Multi-user (text).
○ Runlevel 5: Multi-user (graphical).

Systemd (Modern)
● Replaces runlevels with target units.
Example Targets:
○ multi-user.target: Multi-user mode.
○ graphical.target: Graphical mode.

Key Improvement
● Targets are more descriptive and flexible, enabling faster boot times and custom
system states.
Systemd vs SysV

SysV Runlevel    systemd Target       Description
Runlevel 0       poweroff.target      Shuts down the system
Runlevel 1       rescue.target        Single-user mode for maintenance tasks
Runlevel 3       multi-user.target    Multi-user mode with no GUI (CLI only)
Runlevel 5       graphical.target     Multi-user mode with GUI (graphical login)
Runlevel 6       reboot.target        Reboots the system
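The runlevel-to-target mapping can also be written as a small lookup function, which may help when reading or porting older SysV-era scripts. This is only a sketch: runlevel_to_target is a hypothetical helper name, not a systemd command, and it covers just the runlevels listed here.

```shell
# Hypothetical helper: translate a SysV runlevel number into the
# equivalent systemd target.
runlevel_to_target() {
    case "$1" in
        0) echo "poweroff.target" ;;
        1) echo "rescue.target" ;;
        3) echo "multi-user.target" ;;
        5) echo "graphical.target" ;;
        6) echo "reboot.target" ;;
        *) echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

runlevel_to_target 3    # multi-user.target
runlevel_to_target 5    # graphical.target
```

On a machine running systemd, systemctl get-default reports which target the system boots into by default.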


Introduction to systemd

systemd is the default init system in most modern Linux distributions.

It replaces older init systems like SysV init to manage services, processes, and system boot.

It uses units to manage system components.


Managing Services with systemd

Start, stop, enable, or check the status of services using systemctl.

Example commands:

● systemctl start nginx.service: Start a service.
● systemctl status sshd.service: Check the status of a service.
● systemctl enable httpd.service: Enable a service to start on boot.
Target Units

Understanding Target Units:

Target units are used to group other units. For example:

● multi-user.target: Boots the system into multi-user mode (text-based).
● graphical.target: Boots the system into graphical mode (GUI).

Use systemctl isolate multi-user.target to switch between modes.


Socket and Timer Units

● Socket Units: Manage network connections (e.g., sshd.socket listens for SSH connections).
● Timer Units: Schedule tasks based on time (e.g., logrotate.timer rotates logs daily).

Examples:

● systemctl start sshd.socket: Start the socket for SSH connections.
● systemctl list-timers: View all active timers.
Journaling and Logs

systemd uses journald to manage logs for services and system processes.

View logs using journalctl:

● journalctl -b: View logs from the current boot.
● journalctl -u nginx.service: View logs for a specific service.
Introduction to User Management

● Each user is identified by a UID (User ID) and stored in /etc/passwd.

● Passwords are stored in hashed form in /etc/shadow.

● Users can be assigned a primary group and secondary groups.
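Because /etc/passwd is a colon-separated text file, ordinary text tools can read it. The sketch below assumes the conventional field order (name:password:UID:GID:comment:home:shell). uid_of is a hypothetical helper name, and its optional second argument exists only so the function can be pointed at a sample file instead of the real one.

```shell
# Hypothetical helper: print the UID of a user from a passwd-style file.
# Field 1 is the user name and field 3 is the UID.
uid_of() {
    user=$1
    passwd_file=${2:-/etc/passwd}
    awk -F: -v u="$user" '$1 == u { print $3 }' "$passwd_file"
}

# Build a small sample file so the example does not depend on real accounts.
sample=$(mktemp)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'abe:x:1001:1001:Abe:/home/abe:/bin/bash' > "$sample"

uid_of abe "$sample"     # prints 1001
uid_of root "$sample"    # prints 0
rm -f "$sample"
```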


Creating and Managing Users

useradd to create a new user.

usermod to modify user properties.

userdel to delete a user (add -r to also remove their home directory).

Example commands:

● useradd abe
● usermod -aG sudo abe
● userdel -r abe
Group Management

● Groups provide a way to assign permissions to multiple users.

● Use groupadd, usermod -aG, and groupdel for group management (without -a, usermod -G replaces the user's existing secondary groups).

● Understanding primary groups and secondary group memberships.

Understanding File Permissions

Every file has read (r), write (w), and execute (x) permissions.

Permissions are assigned to three classes: owner, group, and others.

Example:

● rwxr-xr-- means:
○ rwx for the owner.
○ r-x for the group.
○ r-- for others.
Changing Permissions and Ownership

Use chmod to change file permissions.

chown and chgrp to change file and group ownership.

Example:

● chmod 755 filename
● chown abe:developers file.txt
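To see how numeric modes line up with the rwx notation, the following sketch changes the mode of a scratch file and prints the result after each step. It assumes GNU stat (the -c option), as shipped with most Linux distributions; the temporary file is only there for illustration.

```shell
# Create a scratch file so nothing important is modified.
file=$(mktemp)

chmod 755 "$file"
stat -c '%a %A' "$file"    # 755 -rwxr-xr-x

chmod 640 "$file"
stat -c '%a %A' "$file"    # 640 -rw-r-----

# Symbolic form: add execute permission for the owner only.
chmod u+x "$file"
mode=$(stat -c '%a' "$file")
echo "$mode"               # 740

rm -f "$file"
```

Each octal digit is the sum of read (4), write (2), and execute (1) for owner, group, and others in turn.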
Special Permissions

● Setuid (chmod u+s): Executes the file with the permissions of the file owner.
● Setgid (chmod g+s): On an executable, runs it with the file's group; on a directory, files created inside inherit the directory's group.
● Sticky Bit (chmod +t): In a directory, only a file's owner (or the directory's owner, or root) can delete or rename that file.
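The special bits appear as a fourth leading digit in the octal mode and as an s or t in the symbolic listing. The sketch below shows this on a scratch directory; it assumes GNU stat, and the file name tool is purely illustrative.

```shell
# Work in a scratch directory so no real files are affected.
dir=$(mktemp -d)

# Sticky bit on a world-writable directory (the same arrangement /tmp uses).
chmod 1777 "$dir"
sticky_mode=$(stat -c '%a' "$dir")
echo "$sticky_mode"          # 1777
stat -c '%A' "$dir"          # drwxrwxrwt

# Setuid on an executable file: an "s" replaces the owner's "x".
touch "$dir/tool"
chmod 4755 "$dir/tool"
stat -c '%a %A' "$dir/tool"  # 4755 -rwsr-xr-x

rm -rf "$dir"
```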

Managing Privileges with sudo

● sudo (superuser do) allows users to execute commands with root privileges without needing to log in as the root user.
● The sudoers file (located at /etc/sudoers) defines which users and groups have sudo access.
● To edit the sudoers file, use the visudo command to avoid syntax errors.
● Best Practice: Avoid logging in as root directly. Instead, assign specific commands to users via sudo for security.

Commands:

● sudo command: Run a command with root privileges.
● sudo -i: Start an interactive shell with root privileges.
● sudo visudo: Edit the sudoers file safely.
Configuring sudo Permissions

sudoers file syntax:

● Format: user ALL=(ALL:ALL) ALL
● You can allow users to run specific commands by specifying them.
● Example: Allow a user to only run the systemctl command:
○ abe ALL=(ALL) /bin/systemctl

Using user groups to manage sudo access:

● Adding users to the sudo group grants them root privileges on distributions that configure it this way, such as Debian and Ubuntu (Red Hat-style systems use the wheel group):
○ usermod -aG sudo username
Understanding the "ALL" in sudoers Syntax

user ALL=(ALL:ALL) ALL

1. user: The specific user account or group you’re granting sudo permissions to (group names are prefixed with %).
2. The first "ALL":
○ Meaning: This means that the rule applies on all hosts (useful when one sudoers file is shared across many machines). If you’re only administering a single machine, it simply matches that machine.
3. The (ALL:ALL) part:
○ The first "ALL" (inside the parentheses): This represents the target user. It means
the user can execute commands as any user on the system (including root).
○ The second "ALL" (inside the parentheses): This represents the target group. It
means the user can execute commands as any group.
4. The last "ALL":
○ Meaning: This means that the user can run all commands (as opposed to
specifying particular commands).
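To make the four positions concrete, here is a rough sketch that splits a simple rule into its parts using shell parameter expansion. explain_sudoers_rule is a hypothetical helper written for this illustration; real sudoers syntax supports aliases, tags, and multiple entries that this deliberately ignores, and visudo remains the right tool for checking a real file.

```shell
# Hypothetical helper: split a simple sudoers rule of the form
#   user HOST=(RUNAS_USER:RUNAS_GROUP) COMMANDS
# into its four parts.
explain_sudoers_rule() {
    rule=$1
    who=${rule%% *}            # text before the first space
    rest=${rule#* }            # everything after it
    host=${rest%%=*}           # text before "="
    runas=${rest#*\(}          # drop everything up to "("
    runas=${runas%%\)*}        # keep what is inside the parentheses
    commands=${rest#*\) }      # text after ") "
    case "$runas" in
        *:*) run_user=${runas%%:*}; run_group=${runas#*:} ;;
        *)   run_user=$runas;       run_group="(not specified)" ;;
    esac
    echo "who=$who host=$host run_as_user=$run_user run_as_group=$run_group commands=$commands"
}

explain_sudoers_rule 'user ALL=(ALL:ALL) ALL'
explain_sudoers_rule 'abe ALL=(ALL) /bin/systemctl'
```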
File Systems and Storage Management
Understanding Linux File Systems

A file system manages how data is stored and retrieved from a disk.

Common Linux file systems:

● ext4: The most common default file system.
● XFS: Known for handling large files efficiently.
● Btrfs: Offers advanced features like snapshots and pooling.

Key terms: Mounting, Partitions, Swap Space.


Mounting and Unmounting File Systems

● Mounting attaches a file system to a specific directory, making it accessible.
● Commands:
a. mount /dev/sda1 /mnt: Mount a file system.
b. umount /mnt: Unmount a file system.
● Persistent mounts are configured in /etc/fstab for automatic mounting at boot.
Linux Filesystem Hierarchy

The Linux Filesystem Hierarchy Standard (FHS) defines the directory structure and contents.

It starts with the root / directory, which contains other key directories like /bin, /etc, /var, etc.

Each directory has a specific purpose:

● /bin: Essential command binaries (e.g., ls, cp).
● /etc: Configuration files for the system.
● /usr: User applications and files.
● /var: Variable files like logs and databases.
Understanding Key Directories in the Filesystem
● /root: Home directory for the root user.
● /tmp: Temporary files.
● /var/log: System log files.
● /home: Home directories for non-root users.
● /mnt: Temporary mount points.
● /opt: Optional software.
● /dev: Device files, such as hard drives and USB devices.
● /proc: Information about running processes.

Managing Swap Space

Swap ????

● Swap space is used as virtual memory when the system runs low on physical RAM.
● To create a swap file:
○ fallocate -l 1G /swapfile
○ chmod 600 /swapfile
○ mkswap /swapfile
○ swapon /swapfile
● Monitor swap usage using the free command.
Monitoring Storage Usage

Common tools:

● df: Shows disk space usage.
○ df -h: Shows human-readable disk space usage.
● du: Shows directory and file sizes.
○ du -sh /var/log: Shows the size of a directory.
● Quota: Disk space management for users.

Best practice: Regularly monitor and clean up disk space to avoid downtime.
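Regular monitoring is easier when the check is scripted. The following is a minimal sketch of a threshold check built on df; usage_percent and check_usage are hypothetical helper names, and the 90% limit is an arbitrary example value rather than a recommendation.

```shell
# Hypothetical helper: print the "Use%" figure for the file system
# holding the given path, as a bare number. df -P selects the stable
# POSIX output format, where the fifth column is the capacity.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: compare a usage figure against a threshold.
check_usage() {
    used=$1
    limit=$2
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: ${used}% used (limit ${limit}%)"
        return 1
    fi
    echo "OK: ${used}% used"
}

check_usage "$(usage_percent /)" 90 || true
```

A script like this could be run from cron and extended to send an alert instead of printing a message.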
File System Troubleshooting

● fsck: A tool for checking and repairing file system errors.
a. fsck /dev/sda1: Check and repair a file system (run it only on an unmounted file system).
● Use df and du for diagnosing disk space issues.
● Mounting options: Use options like ro (read-only) or noatime to control mount behavior.
Introduction to LVM (Logical Volume Management)

What is LVM?

● LVM allows you to manage and resize storage dynamically.
● LVM Structure:
○ Physical Volumes (PVs): The physical disks or partitions.
○ Volume Groups (VGs): Groups of physical volumes.
○ Logical Volumes (LVs): The storage units you create and manage.
● Key benefit: You can resize, add, or remove volumes without rebooting the system.

Setting Up and Managing LVM

Creating and Managing Logical Volumes

● Example commands:
○ pvcreate /dev/sda1: Create a physical volume.
○ vgcreate myvg /dev/sda1: Create a volume group.
○ lvcreate -L 10G -n mylv myvg: Create a logical volume.
○ mkfs.ext4 /dev/myvg/mylv: Format the logical volume with a file system.
○ mount /dev/myvg/mylv /mnt: Mount the logical volume.
● Resizing logical volumes:
○ lvextend -L +5G /dev/myvg/mylv: Increase the size of the logical volume.
○ resize2fs /dev/myvg/mylv: Resize the file system to match the logical
volume.
Monitoring LVM

● Use vgdisplay to show information about volume groups.
● Use lvdisplay to check logical volume details.
● Example:
○ vgdisplay myvg
○ lvdisplay /dev/myvg/mylv
● Best practice: Regularly monitor volume groups and logical volumes to ensure they have enough space.
Summary of File Systems and Storage Management

You should now know how to:

● Mount and unmount file systems.
● Manage swap space.
● Monitor disk usage using df and du.
● Troubleshoot file system issues with fsck.
● Set up and manage LVM for dynamic storage allocation.

Best practice: Keep storage well-monitored to prevent performance issues.


Introduction to Process Management

Understanding Processes in Linux:

● A process is a running instance of a program.
● Processes can be in the foreground (interacting with users) or background (running without user interaction).
● Processes have different priorities, which can be adjusted with nice and renice.
● Key Concepts:
○ Foreground vs. Background processes.
○ Process ID (PID): Every process has a unique identifier.
○ Parent and child processes: Processes can create other processes.

Managing Process Priorities with nice and renice

nice: Adjusts the priority of a process when it starts.

● Lower nice value = higher priority.

renice: Changes the priority of an already running process.

Example commands:

● nice -n 10 myscript.sh: Start a script with lower priority.
● renice -n -5 1234: Change the priority of process 1234 to a higher priority (negative values require root).
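The effect of nice can be observed without any special privileges, because raising the nice value (lowering priority) is always allowed. This short sketch relies on the fact that nice, run with no arguments, prints the niceness it is running with; the increment of 5 is an arbitrary example.

```shell
# With no arguments, nice prints the niceness of the current shell.
base=$(nice)

# Run a command with its niceness raised by 5 (that is, lower priority).
# The inner "nice" simply reports the value it is running with.
lowered=$(nice -n 5 nice)

echo "current niceness: $base"
echo "niceness under 'nice -n 5': $lowered"
```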

Monitoring Processes with top and htop

Using top and htop to Monitor Processes:

● top: Displays real-time system summary, including CPU, memory usage, and active processes.
○ Use top to identify resource-hungry processes.
● htop: An improved, interactive version of top with a more user-friendly interface.
● Key commands:
○ top: Start the process monitor.
○ htop: Start an interactive process monitor.

[Figures: a detailed walkthrough of top output and its resource columns]
Managing Processes with ps and kill

Viewing and Managing Processes with ps and kill

● ps: Lists processes running on the system. Use it to get details about
processes.
○ ps aux: List all running processes with details.
● kill: Sends signals to terminate or control processes.
○ kill -9 PID: Forcefully terminate a process.
● Key Concepts:
○ PID: Process ID, used to manage specific processes.
○ Signals: Control how processes are managed, such as termination
(SIGKILL) or stopping (SIGSTOP).
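The interplay of PIDs and signals can be tried safely on a throwaway background process. The sketch below uses sleep as a stand-in for a real service and sends SIGTERM first, since that gives a program the chance to clean up; kill -9 is better kept as a last resort.

```shell
# Start a long-running command in the background; $! holds its PID.
sleep 300 &
pid=$!

# Ask it to exit with SIGTERM (the default signal for kill).
kill -TERM "$pid"

# wait returns 128 + the signal number when the process died from a
# signal; SIGTERM is signal 15, so the status is 143.
status=0
wait "$pid" || status=$?
echo "exit status: $status"

# kill -0 sends no signal; it only checks whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid is still running"
else
    echo "process $pid is gone"
fi
```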

Managing Services with systemd

Configuring and Monitoring Services with systemd:

● systemd is the modern init system used to manage services and processes in Linux.
● Key commands:
○ systemctl start service: Start a service.
○ systemctl stop service: Stop a service.
○ systemctl status service: Check the status of a service.
● Use journalctl to view service logs:
○ journalctl -u service: View logs for a specific service.
Advanced Monitoring with strace and lsof

● strace: Monitors system calls made by a process. Useful for troubleshooting:
○ strace -p PID: Monitor system calls for a specific process.
● lsof: Lists open files and network connections for a process:
○ lsof -p PID: List files opened by a process.
● Use cases:
○ strace helps identify why a process is stuck or misbehaving.
○ lsof helps track what files or sockets are being used by a process.

Summary of Process Management and Monitoring

You should now understand:

● How to manage system processes with ps, kill, and systemctl.
● How to monitor processes using top, htop, strace, and lsof.
● The importance of adjusting process priorities with nice and renice.

Best Practice: Monitor critical processes regularly and use tools like strace and
lsof during incident response to pinpoint problems.
Shell Scripting
Introduction to Shell Scripting

● A shell script is a program written for the shell (command-line interpreter) to automate repetitive tasks.
● Bash is the most common shell used for scripting in Linux.
● Advantages: Automating routine tasks, speeding up workflows, and
ensuring consistency.
● Key concepts in shell scripting:
○ Variables
○ Conditionals
○ Loops
Variables in Shell Scripts

Using Variables in Shell Scripts

Variables store data that can be reused in the script.

Defining a variable:

● my_var="Hello, World!"
● Accessing the variable: echo $my_var

Variables can hold strings, numbers, and even command outputs.

Example:
name="Abe"
echo "Hello, $name!"
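The three kinds of values mentioned above (strings, numbers, and command output) can be seen side by side in a short sketch. The variable names are illustrative, and listing / is just a convenient command whose output can be captured.

```shell
name="Abe"
greeting="Hello, $name!"          # double quotes expand variables
literal='Hello, $name!'           # single quotes do not

# Command substitution stores a command's output in a variable.
file_count=$(ls / | wc -l)

# Arithmetic expansion works on numeric values.
total=$((2 + 3))

echo "$greeting"
echo "$literal"
echo "Entries in /: $file_count"
echo "2 + 3 = $total"
```

Note that there must be no spaces around the = sign in an assignment.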

Conditionals in Shell Scripts

Conditionals allow the script to take different actions based on conditions.

Example using if statements:

age=20   # example value
if [ "$age" -ge 18 ]; then
    echo "You are an adult."
else
    echo "You are not an adult."
fi
Understanding “if” Statements

What is an if statement?:

● An if statement is used to execute commands based on conditions.
● The basic structure is:

if [ condition ]; then
    # Commands to execute if condition is true
else
    # Commands to execute if condition is false
fi
Common conditions:
● Check if a file exists: [ -e /path/to/file ]
● Compare numbers: [ "$var" -eq 5 ]
● String comparisons: [ "$var" = "Hello" ]

Real-world Example:
● A script that checks if a directory exists before creating it:

if [ -d "/backup" ]; then
    echo "Backup directory exists."
else
    mkdir /backup
    echo "Backup directory created."
fi
Loops in Shell Scripts
Understanding “for” Loops in Shell Scripting

What is a for loop?:

● A for loop iterates over a list of items and executes commands for each item.
● The basic structure is:

for item in list; do
    # Commands to execute for each item
done
Real-world Example:

● A script that processes all .log files in a directory:

for file in /var/log/*.log; do
    [ -e "$file" ] || continue   # skip the literal pattern when no .log files exist
    gzip "$file"
    echo "$file has been compressed."
done

This loop iterates over each .log file in the /var/log directory and compresses it.
Understanding “while” Loops in Shell Scripting

What is a while loop?:

● A while loop repeats commands as long as a condition is true.
● The basic structure is:

while [ condition ]; do
    # Commands to execute
done
Real-world Example:

● A script that counts from 1 to 5:

counter=1
while [ "$counter" -le 5 ]; do
    echo "Counter: $counter"
    ((counter++))
done
Another Real-world Example:

● A script that pings a server until it responds:

while ! ping -c 1 google.com &> /dev/null; do
    echo "Waiting for the server to respond..."
    sleep 5
done
echo "Server is up!"

This script repeatedly pings google.com until a response is received.


Automating Tasks with Shell Scripts

● Examples of tasks that can be automated:
○ Backups: Automate backing up important files or directories.
○ Log Rotation: Rotate logs periodically to prevent large log files.
○ System Monitoring: Automate resource monitoring and alerting.
● Example: Automating a backup task using tar:

tar -czf /backup/home_backup.tar.gz /home/user

● Use cron to schedule this script daily.
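A slightly fuller version of the tar backup might add a date to the archive name and report whether the archive was written. The sketch below uses temporary directories as stand-ins for paths such as /home/user and /backup so that it can be run safely; the file names are illustrative.

```shell
# Temporary stand-ins for the real source and destination directories.
src_dir=$(mktemp -d)
backup_dir=$(mktemp -d)
echo "important data" > "$src_dir/notes.txt"

# Put the date in the file name so each run keeps its own archive.
archive="$backup_dir/home_backup_$(date +%Y-%m-%d).tar.gz"

# -C changes into the source directory first, so the archive stores
# relative paths instead of absolute ones.
if tar -czf "$archive" -C "$src_dir" .; then
    echo "Backup written to $archive"
else
    echo "Backup failed" >&2
fi

# List the archive contents to confirm what was saved.
contents=$(tar -tzf "$archive")
echo "$contents"

rm -rf "$src_dir" "$backup_dir"
```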


Scheduling Jobs with Cron

What is cron?:

● cron is a time-based job scheduler in Unix-like systems. It allows you to schedule scripts or commands to run at specific intervals (e.g., hourly, daily).
● The cron daemon (crond) runs in the background and checks for scheduled tasks.

What is crontab?:

● crontab is the cron table where you define the schedule for cron jobs.
● crontab file contains a list of cron jobs and their schedules for a user or
system.

● User crontabs: Each user can have their own crontab file, edited with
crontab -e. These are stored in /var/spool/cron/crontabs (exact location
may vary depending on the distribution).
● System-wide crontab: Located at /etc/crontab, this file is used for
scheduling system-wide tasks.
● Other cron directories:
○ /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, and
/etc/cron.monthly allow for scheduling scripts to run at hourly, daily,
weekly, or monthly intervals.

Key Differences between cron and crontab:

● cron refers to the background daemon that runs scheduled tasks.
● crontab is the file (or command) where the scheduling of tasks is defined.

Crontab Syntax:
● * * * * * /path/to/script.sh
○ Minute (0-59)
○ Hour (0-23)
○ Day of the Month (1-31)
○ Month (1-12)
○ Day of the Week (0-7, where 0 and 7 are both Sunday)
● Example: Run a backup script daily at midnight:
○ 0 0 * * * /home/user/backup.sh

Managing crontab:
● Edit crontab: crontab -e
● View crontab: crontab -l
● Remove crontab: crontab -r
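To make the field order easier to remember, the following sketch labels each part of a crontab entry. describe_cron is a hypothetical helper written for this illustration; it only splits the line and does not validate it the way cron itself would.

```shell
# Hypothetical helper: label the five time fields of a crontab entry.
# read splits the line on whitespace; the last variable receives the
# rest, so commands containing spaces stay intact.
describe_cron() {
    read -r minute hour day_of_month month day_of_week command <<EOF
$1
EOF
    printf 'minute=%s hour=%s day_of_month=%s month=%s day_of_week=%s command=%s\n' \
        "$minute" "$hour" "$day_of_month" "$month" "$day_of_week" "$command"
}

# Daily at midnight.
describe_cron '0 0 * * * /home/user/backup.sh'

# Every 15 minutes during hours 9 through 17, Monday to Friday.
describe_cron '*/15 9-17 * * 1-5 /path/to/script.sh'
```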
Each user has their own crontab file, where scheduled jobs are defined.

Crontab format:

● * * * * * /path/to/script.sh: Runs the script at a specific interval (e.g., every minute).

Example to run a backup script daily at midnight:

0 0 * * * /home/user/backup.sh
Yeah … still Cron … but last one!

1. How cron Works:
○ The cron daemon (crond) reads all crontab files (user-specific and system-wide) and cron directories.
○ It checks for tasks that match the current time and executes them.
2. User vs. System-wide Crontab:
○ User crontabs are specific to each user and are edited using crontab -e. These tasks run with the user's permissions.
○ System-wide crontab (/etc/crontab) can include tasks that affect the whole system and specify the user who should run the command.
3. Crontab Management:
○ Using crontab -e to edit the crontab file is safer than manually editing the file in /var/spool/cron/crontabs.
○ The cron directories (/etc/cron.*) are often used for simple scripts that should run on a regular basis without needing to edit the crontab file directly.
I lied … this is the last one :)

Recap the key points:

● cron is your go-to for automating tasks.
● Use crontab to define when and what tasks to run.
● Keep an eye on log files to ensure everything runs smoothly.

Reminder:

● Automate those repetitive tasks and make your life easier: just set it and let cron handle the rest!
Scheduling One-Time Jobs with “at”

at schedules a one-time job to run in the future.

Example:

at 3:00 PM tomorrow
at> /path/to/script.sh

Use cases: Running tasks later without having them repeat, e.g., restarting a server, running maintenance scripts.

Use atq to view pending jobs and atrm to remove them.


System Performance Tuning
Introduction to System Performance Tuning

Goal: understand how to monitor and optimize system performance for reliability.

Why It Matters:

● Proactive monitoring helps detect issues before they become critical.
● Optimization ensures efficient use of resources, reducing costs and increasing system stability.

Key Areas to Focus On:

● CPU Utilization
● Memory Usage
● Disk I/O
● System Limits
Monitoring CPU and Memory with “top” and “free”

top:

● Provides a real-time overview of CPU and memory usage.
● Key metrics to monitor:
○ %CPU: CPU usage of each process.
○ %MEM: Memory usage of each process.
○ Load Average: Represents the system load over 1, 5, and 15 minutes.
● Example: Use top to identify processes that are consuming the most resources.

free:

● Displays total, used, free, and available RAM and swap space.
● Example: free -h gives a human-readable summary of memory usage.
Monitoring Disk I/O with “iostat”

Using iostat for Disk I/O Monitoring

iostat:

● Part of the sysstat package, it helps monitor disk I/O and CPU performance.
● Provides statistics on disk reads/writes and CPU load.
● Example: iostat -x 5 provides extended I/O stats every 5 seconds.

Key Metrics:

● tps: Transactions per second.
● kB_read/s and kB_wrtn/s: Amount of data read and written per second.
● %util: Percentage of time the disk is busy. High values may indicate a bottleneck.
System Resource Monitoring with “vmstat” and “sar”
Using vmstat and sar for System Monitoring

vmstat:

● Reports information about processes, memory, paging, block I/O, and CPU
activity.
● Example: vmstat 5 displays system stats every 5 seconds.
● Key metrics:
○ r: Number of runnable processes (CPU queue length).
○ si/so: Swap-in and swap-out rates.
○ us/sy/id: CPU time spent in user/system/idle.
sar:

● Part of the sysstat package, it provides historical system performance data.
● Example: sar -u 5 10 reports CPU usage every 5 seconds for 10 intervals.
● Historical analysis: Compare past performance trends to current data.
Understanding Load Averages and CPU Utilization

Interpreting Load Averages and CPU Utilization

Load Average:

● Represents the average number of processes running or waiting for CPU time (on Linux it also counts processes blocked in uninterruptible I/O).
● Example Output: 0.50, 0.75, 1.25 for 1, 5, and 15 minutes.
● A load average of 1.0 on a single-core system means the CPU is, on average, fully occupied by one process.
● For multi-core systems, divide by the number of cores to assess load.
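The per-core division can be done in a couple of lines. In this sketch, load_per_core is a hypothetical helper; awk performs the division because the shell itself only handles integers, and the live reading at the end assumes a Linux system that provides /proc/loadavg and nproc.

```shell
# Hypothetical helper: divide a load average by the number of cores.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

load_per_core 2.00 4    # 0.50: about half the capacity of a 4-core machine
load_per_core 1.25 1    # 1.25: more work than a single core can handle

# On a live Linux system the inputs can come from /proc/loadavg and nproc.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one_min rest < /proc/loadavg
    echo "1-minute load per core: $(load_per_core "$one_min" "$(nproc)")"
fi
```

As a rough rule of thumb, a per-core value that stays above 1.0 suggests work is queuing up.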
CPU Utilization:

● %us: User space processes.
● %sy: System/kernel processes.
● %wa: Time CPU spends waiting for I/O operations.
● %id: Idle time (low values can indicate a busy system).

Example: Compare load averages to CPU utilization to determine if the system is CPU-bound.

Understanding ulimit

What is ulimit?
● ulimit is a shell built-in that shows and sets resource limits for the current shell and the processes it starts.
● These limits are essential for preventing resource exhaustion, such as excessive CPU usage or too many open files, which can degrade system performance.

Why It Matters for SREs:

● Proper use of ulimit helps maintain system stability by capping resource usage for user processes.
● Helps avoid scenarios where a misbehaving application consumes all system resources, leading to a Denial of Service (DoS).

In short, ulimit -a shows the maximum size or count allowed for resources such as buffers, core files, scheduling priority, file locks, and threads.
Commonly Adjusted ulimit Parameters:

● ulimit -n: Sets the maximum number of open file descriptors.
○ Example: ulimit -n 65535 increases the maximum number of open files.
○ Relevance: Increasing this limit is critical for high-traffic servers that handle many simultaneous connections (e.g., web servers, databases).

● ulimit -u: Sets the maximum number of user processes.
○ Example: ulimit -u 2048 limits the user to 2048 processes.
○ Relevance: Prevents a user or application from creating too many processes, which could overwhelm the system.

● ulimit -c: Controls the core dump size for debugging.
○ Example: ulimit -c unlimited allows core dumps of any size.
○ Relevance: Useful for troubleshooting application crashes by analyzing core dumps.
Understanding ulimit

Hands-On Example: Adjust the open file limit for the current shell session:

ulimit -n 10240 # Set maximum open files to 10,240

Pro Tip: A ulimit change lasts only for the current shell and its child processes.
Make changes permanent by editing configuration files, like /etc/security/limits.conf.
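Each limit has a soft value (enforced now) and a hard value (the ceiling a non-root user can raise the soft value to). The sketch below shows both and changes the soft limit inside a subshell so the calling shell is untouched. soft_nofile_in_subshell is a hypothetical helper name, and the example assumes the hard limit is at least 256.

```shell
# Show the soft and hard limits for open files in the current shell.
echo "soft nofile: $(ulimit -S -n)"
echo "hard nofile: $(ulimit -H -n)"

# Set the soft limit inside a subshell and print the result.
# The parentheses keep the change from leaking into the calling shell.
# soft_nofile_in_subshell is a hypothetical helper name.
soft_nofile_in_subshell() {
    ( ulimit -S -n "$1" && ulimit -S -n )
}

soft_nofile_in_subshell 256
```

Running a workload inside such a subshell is a low-risk way to try a limit before committing it to a configuration file.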
Understanding sysctl
What is sysctl?

● sysctl is a Linux tool that allows you to view and modify kernel parameters at runtime.
● It is often used to tune network, memory, and security settings.
● Why It Matters for SREs:
○ Helps optimize kernel settings for high performance and low latency
environments.
○ Allows you to make fine-tuned adjustments to the system to handle
production workloads.

Commonly Adjusted sysctl Parameters:

● net.core.somaxconn: Sets the upper limit on the listen backlog, the queue of
connections waiting to be accepted on a socket.
○ Example: sysctl -w net.core.somaxconn=1024
○ Relevance: Important for web servers that must absorb bursts of incoming
connections without refusing or dropping connection attempts.

● vm.swappiness: Controls how aggressively the kernel swaps memory pages out to disk
(default 60).
○ Example: sysctl -w vm.swappiness=10
○ Relevance: Lowering this value reduces the system's tendency to swap memory,
which can improve performance for memory-intensive applications.

● fs.file-max: Sets the system-wide maximum number of file handles the kernel can
allocate (ulimit -n, by contrast, applies per process).
○ Example: sysctl -w fs.file-max=100000
○ Relevance: Essential for applications that need to open many files simultaneously,
like large databases or logging systems.
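Before changing a parameter it helps to read its current value. Each sysctl key corresponds to a file under /proc/sys, with the dots replaced by slashes. The sketch below is illustrative only: sysctl_path is a hypothetical helper name, and the simple mapping does not handle keys that embed interface names containing dots.

```shell
# Map a sysctl key to its file under /proc/sys by swapping dots for slashes.
# sysctl_path is a hypothetical helper name.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print the current value of each parameter discussed above, when readable.
for key in net.core.somaxconn vm.swappiness fs.file-max; do
    path=$(sysctl_path "$key")
    if [ -r "$path" ]; then
        echo "$key = $(cat "$path")"
    else
        echo "$key: not readable at $path"
    fi
done
```

Recording these values before a change gives you something to roll back to.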

Making sysctl Changes Persistent:

● Changes made with sysctl -w are lost at reboot. To keep them, edit the file
/etc/sysctl.conf and add the desired parameters:

net.core.somaxconn = 1024
vm.swappiness = 10
fs.file-max = 100000

● Apply the file without rebooting:

sysctl -p

Pro Tip: Always test sysctl changes in a staging environment before applying
them in production.
Best Practices for Using ulimit and sysctl in Production

Understand the Impact:

● Improperly configuring ulimit or sysctl can negatively impact system stability.
● Always research the effects of each parameter before applying it.

Test in Staging:

● Test adjustments in a staging environment that mirrors production.
● Monitor for performance improvements and potential side effects.

Document Your Changes:

● Record any changes made to ulimit and sysctl settings.
● Keep notes on why the change was made and how it impacted performance.

Monitor After Applying Changes:

● Use tools like top, sar, and vmstat to monitor system resource usage after making
changes.
● Look for improvements in CPU utilization, memory usage, and network performance.
Optimizing System Performance with “ulimit” and “sysctl”

Advanced Tuning with ulimit and sysctl for SREs

Understand how to use ulimit and sysctl to optimize system performance for
production environments.

But what are ulimit and sysctl?


Troubleshooting and Log Management
Introduction to Troubleshooting and Log Management

Importance for SREs:

● Logs provide a record of system activities and errors, making them crucial
for diagnosing issues.
● Proactive log monitoring helps prevent issues from escalating into critical
incidents.

Focus Areas:

● Navigating logs in /var/log/.
● Using journalctl for systemd logs.
● Investigating boot failures, disk space issues, and system crashes.
● Diagnosing network issues with essential tools.
Navigating Logs in /var/log/

What is /var/log/?

● Directory where system logs are stored on Linux.
● Contains logs for system events, authentication, application errors, and more.

Key Log Files:

● /var/log/messages: General system logs on RHEL, CentOS, and Fedora.
● /var/log/syslog: General system messages and logs from various services on Debian
and Ubuntu.
● /var/log/auth.log: Authentication and login attempts (/var/log/secure on RHEL-based
systems).
● /var/log/dmesg: Kernel ring buffer messages captured at boot (hardware and driver
messages); the dmesg command shows the live buffer.

Example:

● Use tail -f /var/log/syslog to monitor logs in real time.
● Use grep to search for specific errors:

grep -i "error" /var/log/syslog
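Counting matches is often a quicker first signal than reading them. The sketch below runs against a throwaway file so it works without access to real system logs. count_matches is a hypothetical helper name and the sample log lines are made up for illustration.

```shell
# Count lines matching a pattern (case-insensitive) in a log file.
# count_matches is a hypothetical helper name.
count_matches() {
    grep -ci -- "$1" "$2"
}

# Self-contained demo on a temporary file instead of a real system log.
demo_log=$(mktemp)
printf '%s\n' \
    'Oct  1 10:00:01 host app[1]: started' \
    'Oct  1 10:00:05 host app[1]: ERROR: connection refused' \
    'Oct  1 10:00:09 host app[1]: error: retry failed' > "$demo_log"

count_matches "error" "$demo_log"          # number of matching lines
grep -i "error" "$demo_log" | tail -n 1    # most recent matching line
rm -f "$demo_log"
```

On a real host, replace the temporary file with /var/log/syslog or /var/log/messages, depending on the distribution.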


Using journalctl for Systemd Logs

What is journalctl?

● A command for querying and displaying logs collected by systemd's journal
service, journald.
● Allows filtering logs by service, priority, date, and boot sessions.

Key Commands:

● View all logs: journalctl (add -e to jump to the end and -x to add explanatory
text where available).
● Filter logs by time: journalctl --since "2023-10-01" --until "2023-10-02"
● View logs for a specific service: journalctl -u nginx
● View logs from the previous boot: journalctl -b -1

Example:

● Use journalctl -u sshd to troubleshoot SSH login issues (the unit is named ssh on
Debian and Ubuntu).
● Filter logs for critical errors (priority crit and more severe):

journalctl -p crit
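The filters can be combined in a single call. The sketch below wraps one such combination in a function. unit_errors_since is a hypothetical helper name; the flags themselves (-u, -p, --since, --no-pager) are standard journalctl options. The example call is guarded so the snippet does nothing on hosts without systemd.

```shell
# Print entries of priority err or more severe for one unit since a given time.
# unit_errors_since is a hypothetical helper name.
unit_errors_since() {
    journalctl -u "$1" -p err --since "$2" --no-pager
}

# Example call, attempted only when journalctl is installed.
if command -v journalctl >/dev/null 2>&1; then
    unit_errors_since sshd "1 hour ago" || true
fi
```

The --no-pager flag makes the output safe to pipe into grep or redirect to a file during an incident.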
Managing Logs with logrotate

Automating Log Management with logrotate

What is logrotate?
● A tool that automatically rotates, compresses, and deletes log files based on specified
criteria.
● Helps prevent log files from consuming too much disk space over time.
● Typically used for logs in the /var/log/ directory but can be configured for any log file.

Key Features:
● Rotation: Renames old log files and creates new ones (e.g., syslog becomes syslog.1).
● Compression: Compresses old logs to save space (e.g., .gz format).
● Retention: Keeps a specified number of old log files before deleting them.
● Custom Schedules: Rotate logs daily, weekly, monthly, or based on file size.

Configuration:
● Default configuration is in /etc/logrotate.conf.
● Custom configurations for specific services can be placed in /etc/logrotate.d/.

Rotate a custom log file daily and keep 7 compressed backups:

/var/log/myapp.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    create 0640 root root
}

Explanation:
● daily: Rotate logs every day.
● rotate 7: Keep 7 copies of old logs.
● compress: Compress old logs.
● missingok: Skip rotation without an error if the log file is missing.
● notifempty: Don't rotate if the log file is empty.
● create 0640 root root: Create the new log file with mode 0640, owned by user root
and group root.
Best Practices for Using logrotate

Keep Logs Manageable:


● Rotate logs daily for high-activity logs (e.g., web server logs).
● Use weekly or monthly rotation for less active logs.

Use Compression:
● Compressing logs saves disk space, especially for logs that contain a lot of text data.
● Use compress in the configuration to automatically gzip old logs.

Adjust Retention Based on Needs:

● For compliance: Retain logs for longer periods (e.g., rotate 30 with daily rotation
keeps about a month of logs).
● For space management: Retain fewer logs to prevent disks from filling up.

Monitor Log Rotation:

● Review the logrotate state file (/var/lib/logrotate/status on Debian and Ubuntu) to
see when each log was last rotated.
● Check logs for logrotate activity in /var/log/cron or /var/log/messages.
Investigating Common Issues

Boot Failures:

● Use dmesg and journalctl to check kernel logs for errors during boot.
● Look for error messages related to hardware or missing files.
● Example: journalctl -b to see logs from the current boot, or journalctl -b -1 for the
previous boot if the system failed and was restarted.

Disk Space Issues:

● Use df -h to check available disk space:
○ Identify which partitions are filling up.
● Use du -sh /path/* to see the size of each file or directory under /path and find
the largest.
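Sorting the du output puts the biggest consumers first. The sketch below is a minimal illustration: largest_entries is a hypothetical helper name, sizes are reported in KiB so that a plain numeric sort works, and hidden files are skipped because the shell glob does not match them.

```shell
# List the n largest entries directly under a directory, sizes in KiB.
# largest_entries is a hypothetical helper name.
largest_entries() {
    du -sk -- "$1"/* 2>/dev/null | sort -rn | head -n "$2"
}

# Show the five largest files or directories under /var/log.
largest_entries /var/log 5
```

Repeating the call on the largest directory it reports is a quick way to drill down to the file that is filling the partition.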

System Crashes:

● Check for OOM (Out of Memory) errors using dmesg or journalctl.
● Look in /var/log/messages or /var/log/syslog for panic or segfault entries.
Diagnosing Network-Related Problems

Essential Tools for Network Troubleshooting:

● ping: Test connectivity to a host.
○ Example: ping google.com to check internet connectivity and DNS resolution.
● traceroute: Identify the path packets take to reach a host.
○ Example: traceroute 8.8.8.8 to see the route to Google's DNS server.
● netstat: Show network connections, routing tables, and interface statistics.
○ Example: netstat -tuln to display listening TCP and UDP ports (ss -tuln is the
modern replacement on systems without netstat).

Real-World Use Case:

● Use ping to check if a server is reachable during an incident.
● Use traceroute to identify where packets are getting delayed or dropped.
● Use netstat with the -p flag to find which processes are using network ports,
useful for identifying rogue services.

Hands-On Example:

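● Run traceroute and interpret the results to identify network bottlenecks.
● Use netstat to find and terminate a problematic process. The -p flag adds the PID
and program name (run as root to see processes owned by other users); once the
process is identified, stop it with kill <PID>:

netstat -tulnp | grep ":80"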
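Extracting the PID from that output can be scripted. The sketch below reads text in the format printed by netstat -tulnp and prints the PIDs bound to a given port. pids_on_port is a hypothetical helper name, and the sample lines are made up for illustration so the snippet runs without netstat installed.

```shell
# Read `netstat -tulnp` style output on stdin and print the PIDs bound to a port.
# pids_on_port is a hypothetical helper name.
pids_on_port() {
    awk -v suffix=":$1" '
        {
            # Field 4 is the local address; require the port at its very end.
            n = length($4) - length(suffix) + 1
            if (n < 1 || substr($4, n) != suffix) next
            # The PID/Program column looks like "1234/nginx"; find it by shape.
            for (i = 5; i <= NF; i++)
                if ($i ~ /^[0-9]+\//) { split($i, part, "/"); print part[1]; break }
        }'
}

# Sample lines in netstat format (values are illustrative, not real output).
printf '%s\n' \
    'tcp   0   0 0.0.0.0:80      0.0.0.0:*   LISTEN   1234/nginx: master' \
    'tcp   0   0 127.0.0.1:8080  0.0.0.0:*   LISTEN   4321/java' | pids_on_port 80
```

Matching the port at the end of the address avoids the false positive that a plain grep ":80" produces for port 8080.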


Real-World Log Analysis Scenario

Scenario: A web server is slow to respond. How do you diagnose the issue?

Step-by-step Analysis:

1. Check Web Server Logs: journalctl -u nginx for errors.
2. Check System Resource Usage: top and free -h to see if the system is under
stress.
3. Check Disk Space: df -h to ensure logs are not filling up the disk.
4. Check Network Activity: netstat to see if unusual connections are affecting
the server.

Outcome: Identify and resolve a misconfigured firewall that was slowing down the
server's response time.
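The disk space check in step 3 is easy to automate. The sketch below reads df -P output and prints any mount point at or above a usage threshold. mounts_over is a hypothetical helper name and the 90 percent threshold is an arbitrary example; mount points or device names containing spaces are not handled.

```shell
# Read `df -P` output on stdin and print mount points at or above a usage threshold.
# mounts_over is a hypothetical helper name.
mounts_over() {
    awk -v limit="$1" 'NR > 1 {
        used = $5              # Capacity column, e.g. "95%"
        sub(/%/, "", used)
        if (used + 0 >= limit + 0) print $6, $5
    }'
}

# Report filesystems that are at least 90% full.
df -P 2>/dev/null | mounts_over 90
```

An empty result means no filesystem has crossed the threshold, so the investigation can move on to the next step.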
Yes, …You did it!
