Linux Initialization Process (2)
Initialization (2)
Taku Shimosawa
For the new book on the Linux kernel
Agenda
• Initialization function list
  • The functions called from the kernel startup function (start_kernel)
  • The functions called from some of the functions that start_kernel calls
    • setup_arch
    • rest_init, and the functions that follow it
• Initialization topics
  • Multiprocessor (SMP) Initialization
3. Initialization
At last, we have come here!
Initialization Overview
• Booting Code (arch/*/boot/)
  • Preparing CPU states, gathering HW information, decompressing vmlinux, etc.
• Low-level Initialization (arch/*/kernel/head*.S, head*.c)
  • Switching to the virtual memory world, getting prepared for C programs
• Initialization (init/main.c, start_kernel)
  • Initializing all the kernel features, including architecture-dependent parts (calls into arch/*/kernel, arch/*/mm, …)
• init/main.c (rest_init)
  • Creating the “init” process, and letting it do the rest of the initialization (setting up multiprocessing, scheduling)
• kernel/sched/idle.c (cpu_idle_loop)
  • “Swapper” (PID=0) now sleeps
• init/main.c (kernel_init)
  • Performing final initialization and “exec”ing the “init” user program
  • “init” (PID=1)
• Everything from the low-level initialization onward is part of vmlinux
start_kernel (1)
# Function Category Description
1 lockdep_init Debug Lock validator
2 smp_setup_processor_id* SMP Initialize processor ID (some architectures)
3 debug_objects_early_init Debug Lifetime debugging facility for objects
4 boot_init_stack_canary* Debug Decide the canary value for the stack protector
5 cgroup_init_early cgroup Early init for some cgroup subsystems
6 boot_cpu_init SMP Set the boot CPU in the various cpumasks
7 page_address_init MM Initialize hash for kmap (highmem)
8 setup_arch*
9 mm_init_owner MM Set init_mm’s owner to init_task
10 mm_init_cpumask MM Set the cpu mask pointer to the mm’s cpumask (only if CPUMASK_OFFSTACK)
11 setup_command_line Init Copy the command line parameters to a newly allocated buffer (allocated by memblock)
12 setup_nr_cpu_ids SMP Set “nr_cpu_ids” according to the last bit in
Functions with * : mostly architecture-dependent code
start_kernel (2)
# Function Category Description
13 setup_per_cpu_areas* SMP Allocate and initialize percpu areas
14 smp_prepare_boot_cpu* SMP Prepare for SMP boot
15 build_all_zonelists MM Initialize “zonelist”
16 page_alloc_init MM Add a handler for CPU hotplug (to drain pages)
17 parse_early_param Init Parse “early” options
18 parse_args Init Parse the rest of the options
19 jump_label_init Option Jump label (self-modification)
20 setup_log_buf Debug Allocate and initialize printk log buffer
21 pidhash_init Sched Initialize PID hash
22 vfs_caches_init FS Initialize various caches (kmem_cache) in VFS (dcache, inode, mnt, files, …)
23 sort_main_extable MM Sort the exception table (used in page faults)
24 trap_init* CPU Initialize trap handlers
start_kernel (3)
# Function Category Description
25 mm_init MM Initialize MM
25A page_cgroup_init_flatmem MM Allocate pages for page_cgroup
25B mem_init* MM Free pages to the buddy allocator
25C kmem_cache_init MM Initialize cache
25D percpu_init_late MM Replace per-cpu chunks with those allocated by slab
25E pgtable_init* MM Create cache for ptlock and pgtable (SH etc.)
25F vmalloc_init MM Initialize vmalloc
26 sched_init Sched Initialize scheduler
27 idr_cache_init Util Initialize IDR (ID to pointer translation)
28 rcu_init SMP Initialize RCU
29 tick_nohz_init Sched Initialize NOHZ (enable context tracking)
30 radix_tree_init Util Initialize radix tree (create cache, etc.)
31 early_irq_init* CPU Initialize irq_desc
start_kernel (4)
# Function Category Description
32 init_IRQ* CPU Initialize various IRQs (in x86, set gates for APIC interrupts, etc.)
33 tick_init Timer Tick broadcast (to emulate local timer)
34 init_timers Timer Timer stats, notifier, and timer softirq
35 hrtimers_init Timer hrtimer notifier, and hrtimer softirq
36 softirq_init Sched Tasklet lists, and tasklet softirqs
37 timekeeping_init Timer Clocksource
38 time_init* Timer (Platform-dependent) timer initialization
39 sched_clock_postinit Sched Start the hrtimer
40 perf_event_init Debug Perf events
41 profile_init Debug (Simple) profiler
42 call_function_init SMP Initialize csd (call single data) queue
local_irq_enable CPU At this point, interrupts are enabled
start_kernel (5)
# Function Category Description
43 kmem_cache_init_late MM Post-initialization of cache (slab)
44 console_init Console Call console initcalls
45 lockdep_info Debug Print lockdep information
46 locking_selftest Debug Test spinlocks, rwlocks, mutexes, and rwsemaphores
47 page_cgroup_init cgroup Page cgroup
48 debug_objects_mem_init Debug Enable dynamic allocation for debugobjects (#3), and replace the static ones with newly allocated ones
49 kmemleak_init Debug kmemleak (Memory leak check facility)
50 setup_per_cpu_pageset MM Per-cpu pageset
51 numa_policy_init MM NUMA (VMA) policy
52 late_time_init* Timer Late initialization (in x86, HPET and TSC are initialized)
start_kernel (6)
# Function Category Description
53 sched_clock_init Sched Set the time info for scheduler
54 calibrate_delay Timer Calibrate for the “delay” functions
55 pidmap_init Process Init PID map for initial PID namespace
56 anon_vma_init MM Create cache for “anon_vma”
57 acpi_early_init ACPI ACPI Subsystems, load DSDT
58 thread_info_cache_init Process Allocate cache for thread_info if its size is less than PAGE_SIZE
59 cred_init Security Task credential
60 fork_init Process Allocate a cache for task_struct
61 proc_caches_init MM Allocate caches for mm_struct, etc.
62 buffer_init FS Allocate a cache for buffer_head
63 key_init Security Allocate a cache for key_jar
64 security_init Security Call security_initcall’s
65 dbg_late_init Debug Late init for kgdb
start_kernel (7)
# Function Category Description
66 vfs_caches_init FS Allocate SLAB caches and hashtables for various VFS caches (dcache, inode_cache, …)
67 signals_init Sched Allocate a cache for sigqueue
68 page_writeback_init MM Initialize the ratio for the dirty pages
69 proc_root_init Procfs Create the root for procfs and some directories
70 cgroup_init Cgroup Initialize the rest of cgroups
71 cpuset_init Sched The top-level cpuset
72 taskstats_init_early Sched Task statistics exposed to the user level
73 delayacct_init Sched Task delay accounting
74 check_bugs* CPU Fix up for some architecture-dependent bugs (in x86_64, alternatives are initialized, and the first 2MB page is divided into 4K pages)
75 sfi_init_late SFI Map the area again by using ioremap
start_kernel (8)
# Function Category Description
76 ftrace_init Debug ftrace
77 rest_init
setup_arch (x86) (1)
# Function Category Description
1 memblock_reserve MM Reserve the text area
2 early_reserve_initrd MM Reserve the initrd area
3 clone_pgd_area, load_cr3 MM Switch to swapper_pg_dir (i386 only)
4 olpc_ofw_detect Platform OLPC OFW Stuff
5 early_trap_init CPU Init debug and int3 gate
6 early_cpu_init CPU Detect the CPU’s vendor (registered in cpu_dev_register: Intel, AMD, Cyrix…) and call early_init and bsp_init
7 early_ioremap_init MM Init early ioremap
8 setup_olpc_ofw_pgd Platform OLPC OFW Stuff
9 (Parsing boot parameters) Setup --
10 x86_init.oem.arch_setup Platform OEM-dependent setup (Intel MID etc.)
11 setup_memory_map MM Copy and print e820 information
12 parse_setup_data Setup Parse setup_data in boot_params
setup_arch (x86) (2)
# Function Category Description
13 copy_edd Setup Copy BIOS EDD information
14 (prepare init_mm) MM Set start_code, end_code, etc. for init_mm
15 (command line stuffs) Setup
16 x86_configure_nx MM Set ptemask according to whether NX is supported by the CPU
17 parse_early_param Setup (=#17 in start_kernel)
18 x86_report_nx MM Print NX information
19 memblock_x86_reserve_range_setup_data MM Reserve the setup_data area
20 acpi_mps_check SMP Check if ACPI is disabled and MPS code is not built-in
21 early_pci_dump_devices Device Dump PCI info before PCI is initialized
22 e820_reserve_setup_data MM Reserve the setup_data area in e820
23 finish_e820_parsing Setup Sanitize and print e820 info
setup_arch (x86) (4)
# Function Cat. Description
24 dmi_scan_machine DMI Check whether DMI (Desktop Management Interface) is present
25 dmi_memdev_walk DMI Walk through the DMI table
26 dmi_set_dump_stack_arch_desc DMI Set architecture description* for dump_stack
27 init_hypervisor_platform VM Get the hypervisor information and initialize (e.g. get Hz using a special I/O port when running on VMware)
28 probe_roms MM Request resources for Video ROM, Extension ROMs, etc.
29 insert_resource MM Insert resources for kernel’s code, data, BSS
30 e820_add_kernel_range MM Add kernel code and data areas to e820 if they are not marked as E820_RAM
31 trim_bios_range MM Reserve BIOS areas in e820
(*) Example of the architecture description (the “Hardware name:” line) in a kernel panic message:
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
CPU: 3 PID: 2763 Comm: irqbalance Tainted: G W 3.14.13 #1
Hardware name: Supermicro X9SRH-7F/7TF/X9SRH-7F77TF, BIOS 3.00 07/05/2013
setup_arch (x86) (5)
# Function Category Description
32 early_gart_iommu_check Device Check GART (Graphics Address Remapping Table)
33 (Substitute to max_pfn) MM Set max_pfn as the last page in e820
34 mtrr_bp_init CPU MTRRs (Memory Type Range Registers)
35 check_x2apic CPU Enable X2APIC if available
36 find_smp_config SMP Find the SMP config for Intel MP Spec.
37 reserve_ibft_region Device Reserve iSCSI Boot Format Table
38 early_alloc_pgt_buf MM Allocate page table buffer (to be used in the early stage)
39 reserve_brk MM Reserve brk area
40 cleanup_highmap MM Unmap out-of-range areas in the kernel map
41 memblock_set_current_limit MM Set the memblock’s allocation limit to ISA_END_ADDRESS
42 memblock_x86_fill MM Fill the memblock info according to e820
setup_arch (x86) (6)
# Function Category Description
43 early_reserve_e820_mpc_new SMP Allocate for mptable
44 setup_bios_corruption_check Setup Fill 64KB of low memory with some pattern to detect if BIOS corrupts the area
45 reserve_real_mode CPU/SMP Reserve some low memory for trampoline
46 trim_platform_memory_ranges Setup Special tricks (reserve) for some platforms (some Sandy Bridge)
47 trim_low_memory_range MM Reserve the first 4KB page in memblock
48 init_mem_mapping MM Reconstruct memory mapping
49 early_trap_pf_init CPU Set page fault handler
50 setup_real_mode CPU/SMP Set up the trampoline code
51 memblock_set_current_limit MM Change the limit to the last page mapped
52 dma_contiguous_reserve MM Allocate contiguous area for DMA
setup_arch (x86) (7)
# Function Cat. Description
53 setup_log_buf Debug Set up printk log buffer
54 reserve_initrd MM Reserve the initrd
55 acpi_initrd_override ACPI Find the ACPI override info in initrd
56 vsmp_init Setup vSMP (ScaleMP Inc.)
57 io_delay_init Setup Check DMI override for I/O delay strategy
58 acpi_boot_table_init ACPI ACPI BOOT table parsing
59 early_acpi_boot_init ACPI Parse MADT in ACPI
60 initmem_init MM Set up node information based on ACPI (if NUMA)
61 reserve_crashkernel Debug Reserve memory for crashkernel
62 memblock_find_dma_reserve MM Count the reserved pages in DMA zone
63 pagetable_init MM Initialize sparse mem, and zone sizes
64 tboot_init CPU Intel TXT (Trusted eXecution Technology) support
setup_arch (x86) (8)
# Function Cat. Description
65 map_vsyscall CPU Map vsyscall
66 generic_apic_probe CPU Probe APIC driver
67 early_quirks PCI Apply some quirks for certain devices
68 acpi_boot_init ACPI Parse (again) BOOT, FADT, MADT, HPET etc.
69 sfi_init SFI SFI (Simple Firmware Interface)
70 x86_dtb_init Setup Device tree
71 get_smp_config SMP (If ACPI is not found) construct the table
72 prefill_possible_map SMP Set the possible CPU map
73 init_cpu_to_node NUMA Set up the cpu to node map
74 init_apic_mappings CPU Set the local APIC address
75 x86_io_apic_ops.init CPU I/O APIC
76 kvm_guest_init Virt. KVM Guest (paravirt ops, etc.)
77 e820_reserve_resources MM Reserve resources for e820 entries
setup_arch (x86) (9)
# Function Cat. Description
78 e820_mark_nosave_regions PM Add non-RAM area in e820 to nosave regions
79 x86_init.resources.reserve_resources I/O Reserve standard I/O resources (Timer, KB,…)
80 e820_setup_gap MM Find the largest gap in e820, and let PCI use the gap to allocate new MMIO areas
81 x86_init.oem.banner Debug “Booting paravirtualized kernel on %s”
82 x86_init.timers.wallclock_init Timer (NOP; defined in MID only)
83 mcheck_init CPU Machine check (temperature)
84 arch_init_ideal_nops CPU Set the NOP instructions ideal to the current platform
85 register_refined_jiffies Timer Register “refined_jiffies” clocksource
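The gap search that the table attributes to e820_setup_gap (#80) can be illustrated with a small userspace sketch. This is not the kernel's code: struct mem_range, largest_gap, and the 4GB limit used in the usage note are hypothetical names and values chosen for illustration, and the map is assumed to be sorted by start address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one firmware memory-map entry. */
struct mem_range {
    uint64_t start;
    uint64_t size;
};

/*
 * Return the size of the largest hole below `limit` that is not covered
 * by any range, and store where it begins in *gap_start.
 * Assumption: `map` is sorted by start address.
 */
uint64_t largest_gap(const struct mem_range *map, size_t n,
                     uint64_t limit, uint64_t *gap_start)
{
    uint64_t best = 0;
    uint64_t cursor = 0;   /* end of the highest range seen so far */

    assert(map != NULL || n == 0);
    assert(gap_start != NULL);
    *gap_start = 0;

    for (size_t i = 0; i <= n; i++) {
        /* Next occupied address: the next range start, or the limit. */
        uint64_t next = (i < n && map[i].start < limit) ? map[i].start : limit;

        if (next > cursor && next - cursor > best) {
            best = next - cursor;
            *gap_start = cursor;
        }
        if (i == n || map[i].start >= limit)
            break;

        uint64_t end = map[i].start + map[i].size;
        if (end > limit)
            end = limit;
        if (end > cursor)
            cursor = end;
    }
    return best;
}
```

Calling largest_gap with a map of low memory, main RAM, and a small region near the top of the 32-bit space, and a limit of 0x100000000, reports the hole between the end of RAM and that top region, which is the kind of area a PCI layer could use for new MMIO windows.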
setup_arch (ARM) (1)
# Function Category Description
1 setup_processor CPU Processor initialization
2 setup_machine_fdt Setup Parse the device tree
3 setup_machine_tags Setup If #2 fails, parse the ATAGs
4 (prepare init_mm) MM Set start_code, end_code, etc. for init_mm
5 (command line stuffs) Setup (=#15 in x86)
6 parse_early_param Setup (=#17 in x86)
7 (sort meminfo) MM Sort the memory information
8 early_paging_init MM Recreate the page table prepared during boot
9 setup_dma_zone MM Set up the DMA zone information
10 sanity_check_meminfo MM Sanitize the meminfo
11 arm_memblock_init MM Add free memory from meminfo, and reserve various reserved areas
12 paging_init MM Permanent kmap area
setup_arch (ARM) (2)
# Function Category Description
13 request_standard_resources MM Reserve resources for system memory, video RAM
14 unflatten_device_tree Setup Create a tree from FDT
15 arm_dt_init_cpu_maps CPU Create CPU logical map based on the device tree
16 psci_init CPU Read the method to be used for CPU on, off, etc.
17 smp_init_cpus SMP Initialize the CPU cores available
18 smp_build_mpidr_hash SMP Precompute shifts required to get index from MPIDR (Multiprocessor ID register) value
19 hyp_mode_check Virt. Check if the CPU is running in HYP mode
20 reserve_crashkernel Debug Reserve memory for crashkernel
21 mdesc->init_early (Platform-specific initialization)
The rest of initialization
• rest_init (init/main.c)
  • Create two kernel threads
    • “init” (PID = 1, it eventually becomes the init user process)
    • “kthreadd” (PID = 2, so that init can create other kernel threads)
static noinline void __init_refok rest_init(void)
{
    rcu_scheduler_starting();
    ...
    kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
    numa_default_policy();
    pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
    rcu_read_lock();
    kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
    rcu_read_unlock();
    complete(&kthreadd_done);
    ...
    init_idle_bootup_task(current);
    schedule_preempt_disabled();
    ...
    cpu_startup_entry(CPUHP_ONLINE);
}
Idle task
• Before entering the idle loop, the boot task calls the scheduler.
• Then it calls the idle function.
    ...
    init_idle_bootup_task(current);
    schedule_preempt_disabled();
    ...
    cpu_startup_entry(CPUHP_ONLINE);
}

void __sched schedule_preempt_disabled(void)
{
    sched_preempt_enable_no_resched();
    schedule();
    preempt_disable();
}

void cpu_startup_entry(enum cpuhp_state state)
{
    ...
    __current_set_polling();
    arch_cpu_idle_prepare();
    cpu_idle_loop();
}
kernel_init
• Call the remaining init functions (kernel_init_freeable)
• Synchronize all the asynchronous operations
• Free the initmem (free_initmem)
• Mark read-only data as RO (and NX) (mark_rodata_ro)
• Set the system state to SYSTEM_RUNNING
• Set the current NUMA policy to default (numa_default_policy)
• Try to execve(2) the “init” process
  • If the rdinit parameter is set, exec that path
  • If the init parameter is set, exec that path
  • Try to run “/sbin/init”, “/etc/init”, “/bin/init”, “/bin/sh”
• If nothing worked, panic with a familiar message:
"No working init found. Try passing init= option to kernel. See Linux Documentation/init.txt for guidance."
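The search order for the “init” program listed above can be sketched in userspace C. This is a simplified illustration, not the code in init/main.c: choose_init, try_exec_fn, and the demo stubs are hypothetical names, the callback only pretends to start a program, and falling through to the default paths after a failed init= attempt is an assumption that follows the order of the bullets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns 0 if the program at `path` could be started. */
typedef int (*try_exec_fn)(const char *path);

/*
 * Walk the candidates in the order given on the slide and return the
 * first path that starts successfully, or NULL if none does (the point
 * at which the kernel would panic).
 */
const char *choose_init(const char *rdinit, const char *init_param,
                        try_exec_fn try_exec)
{
    static const char *const defaults[] = {
        "/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
    };

    assert(try_exec != NULL);

    if (rdinit && try_exec(rdinit) == 0)
        return rdinit;
    if (init_param && try_exec(init_param) == 0)
        return init_param;
    for (size_t i = 0; i < sizeof(defaults) / sizeof(defaults[0]); i++) {
        if (try_exec(defaults[i]) == 0)
            return defaults[i];
    }
    return NULL;
}

/* Demo stub: pretend that only /bin/sh exists. */
static int demo_only_bin_sh(const char *path)
{
    return strcmp(path, "/bin/sh") == 0 ? 0 : -1;
}

/* Demo stub: pretend that nothing can be executed. */
static int demo_nothing_works(const char *path)
{
    (void)path;
    return -1;
}
```

With demo_only_bin_sh as the callback and no parameters set, the function walks all four defaults and settles on “/bin/sh”; with demo_nothing_works it returns NULL, which corresponds to the panic message quoted above.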
kernel_init_freeable
• First, wait for the completion of kthreadd’s setup
• Set init’s allowed CPUs/mems to all CPUs and nodes
• Set cad_pid to init’s PID
• Prepare to boot other CPUs (smp_prepare_cpus)
• Call early initcalls (do_pre_smp_initcalls)
• Initialize lockup_detector (lockup_detector_init)
• Initialize multiprocessor (smp_init)
  • Boots up other cores/sockets
• Initialize the scheduler (sched_init_smp)
• Call the do_basic_setup function (-> Next slide)
• Open “/dev/console” and dup it twice (fd: 0 to 2)
• Check if the ramdisk is available
  • If not, try to mount root (prepare_namespace)
• Load the I/O scheduler (elevator) module
do_basic_setup
• Re-initialize cpuset to the active CPUs (cpuset_init_smp)
• Initialize user-mode helper (khelper)
• Initialize tmpfs (shmem_init)
• Initialize drivers (driver_init)
• Create proc directories and files for IRQs (init_irq_proc)
• Call constructors (do_ctors) (CONFIG_CONSTRUCTORS)
• Enable the user-mode helper workqueue
• Call all the initcalls (do_initcalls)
• Initialize random values (random_int_secret_init)
initcalls
• A facility for registering functions to be called during initialization (from the kernel_init_freeable function)
• Example
static int cpu_pm_init(void)
{
    register_syscore_ops(&cpu_pm_syscore_ops);
    return 0;
}
core_initcall(cpu_pm_init);
(kernel/cpu_pm.c)
Level of initcalls
• Several levels (which determine the calling order) are defined
Macro Lv. # Description
early_initcall early Called before SMP initialization
pure_initcall 0 No dependencies, variable initialization
core_initcall{,_sync} 1, 1s
postcore_initcall{,_sync} 2, 2s
arch_initcall{,_sync} 3, 3s
subsys_initcall{,_sync} 4, 4s
fs_initcall{,_sync} 5, 5s
rootfs_initcall rootfs
device_initcall{,_sync} 6, 6s
late_initcall{,_sync} 7, 7s
Initcall definition
• Pointers to the initcall functions are collected in dedicated sections
• Section name: “.initcall<level>.init”
  • E.g. for “core_initcall”, the section is “.initcall1.init”
#define __define_initcall(fn, id) \
    static initcall_t __initcall_##fn##id __used \
    __attribute__((__section__(".initcall" #id ".init"))) = fn; \
    LTO_REFERENCE_INITCALL(__initcall_##fn##id)
(include/linux/init.h)
In the LD script
#define INIT_CALLS \
    VMLINUX_SYMBOL(__initcall_start) = .; \
    *(.initcallearly.init) \
    INIT_CALLS_LEVEL(0) \
    INIT_CALLS_LEVEL(1) \
    INIT_CALLS_LEVEL(2) \
    INIT_CALLS_LEVEL(3) \
    INIT_CALLS_LEVEL(4) \
    INIT_CALLS_LEVEL(5) \
    INIT_CALLS_LEVEL(rootfs) \
    INIT_CALLS_LEVEL(6) \
    INIT_CALLS_LEVEL(7) \
    VMLINUX_SYMBOL(__initcall_end) = .;
(include/asm-generic/vmlinux.lds.h)
#define INIT_CALLS_LEVEL(level) \
    VMLINUX_SYMBOL(__initcall##level##_start) = .; \
    *(.initcall##level##.init) \
    *(.initcall##level##s.init)
(include/asm-generic/vmlinux.lds.h)
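The combination of the section attribute and the linker-provided start/end symbols can be tried out in userspace. The following is a minimal sketch, assuming GCC or Clang on an ELF platform with GNU ld or LLD, which synthesize __start_<name> and __stop_<name> for sections whose names are valid C identifiers. DEFINE_INITCALL, DECLARE_LEVEL, run_all_initcalls, and the demo functions are hypothetical names; the kernel instead uses its own linker script and section names, as shown above.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Place a pointer to fn in the ELF section named "initcall<level>". */
#define DEFINE_INITCALL(fn, level) \
    static initcall_t initcall_##fn##level \
    __attribute__((used, section("initcall" #level))) = fn

/* Boundary symbols that the linker synthesizes for each such section. */
#define DECLARE_LEVEL(level) \
    extern initcall_t __start_initcall##level[]; \
    extern initcall_t __stop_initcall##level[]

DECLARE_LEVEL(1);
DECLARE_LEVEL(6);

/* Records the level of each function in the order it actually ran. */
static int call_log[8];
static size_t call_count;

static int demo_core_init(void)
{
    call_log[call_count++] = 1;
    return 0;
}

static int demo_device_init(void)
{
    call_log[call_count++] = 6;
    return 0;
}

/* Registered out of level order on purpose. */
DEFINE_INITCALL(demo_device_init, 6);
DEFINE_INITCALL(demo_core_init, 1);

/* Call every function pointer stored between start and stop. */
static void run_level(initcall_t *start, initcall_t *stop)
{
    for (initcall_t *fn = start; fn < stop; fn++) {
        int ret = (*fn)();
        assert(ret == 0);
    }
}

/* Levels are walked in ascending order, regardless of definition order. */
void run_all_initcalls(void)
{
    run_level(__start_initcall1, __stop_initcall1);
    run_level(__start_initcall6, __stop_initcall6);
}
```

After one call to run_all_initcalls(), call_log holds 1 followed by 6: the level decides the order, not the place where the function was registered in the source file.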
Special initcalls
• console_initcall
  • Called from console_init (in start_kernel)
• security_initcall
  • Called from security_init (in start_kernel)
• When the initcall macros are used in loadable modules (not recommended), they are replaced by module_init
#else /* MODULE */
/* Don't use these in loadable modules, but some people do... */
#define early_initcall(fn) module_init(fn)
#define core_initcall(fn) module_init(fn)
...
(include/linux/init.h)
Initcall debug
• Kernel command-line option: “initcall_debug”
  • Shows debug messages
  • Prints a message before calling each initcall function and after it returns, together with the elapsed time
static int __init_or_module do_one_initcall_debug(initcall_t fn)
{
    ...
    pr_debug("calling %pF @ %i\n", fn, task_pid_nr(current));
    calltime = ktime_get();
    ret = fn();
    rettime = ktime_get();
    ...
    pr_debug("initcall %pF returned %d after %lld usecs\n",
             fn, ret, duration);
    ...
}
(init/main.c)
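The same measure-around-the-call pattern can be reproduced in userspace. The sketch below is only an analogy to the excerpt above: timed_initcall, now_ns, and demo_init are hypothetical names, and C11 timespec_get with TIME_UTC stands in for ktime_get (it is a wall clock, not a monotonic one, which is acceptable for a demonstration).

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

typedef int (*initcall_t)(void);

/* Current time in nanoseconds; a userspace stand-in for ktime_get(). */
static long long now_ns(void)
{
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Call fn, print a message before and after the call, and report the
 * elapsed time in microseconds through usecs_out.
 */
int timed_initcall(const char *name, initcall_t fn, long long *usecs_out)
{
    long long start, end;
    int ret;

    assert(fn != NULL && usecs_out != NULL);

    printf("calling %s\n", name);
    start = now_ns();
    ret = fn();
    end = now_ns();

    *usecs_out = (end - start) / 1000;
    printf("initcall %s returned %d after %lld usecs\n",
           name, ret, *usecs_out);
    return ret;
}

/* Trivial demo initcall that always succeeds. */
static int demo_init(void)
{
    return 0;
}
```

A call such as timed_initcall("demo_init", demo_init, &usecs) prints the two lines and leaves the duration in usecs; sorting such durations is the usual way to find slow initialization functions.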
4. Multiprocessor Initialization
Welcome to the world of concurrency!
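How are the multiple cores started?
• Two types
[Diagram: two timelines running from “HW Power On” through “Start Linux kernel” to “Initialize SMP”. In the first, only Core 0 runs at power-on and later wakes up Core 1, Core 2, …. In the second, all cores start at power-on, Core 1 and Core 2 “Stop & Wait”, and Core 0 wakes them up later.]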
How are the multiple cores started? (continued)
• The first type
  • x86, ARM, etc.
  • (x86) The first processor (core) is determined by HW, and is called “the bootstrap processor” (BSP). The remaining processors (cores) are called “application processors” (APs).
• The second type
  • PowerPC (some models), etc.
MP Detection
• How to detect the number of cores available in the hardware?
  • Firmware information
    • ACPI MADT (Multiple APIC Description Table) (x86)
    • SFI (Simple Firmware Interface) (Xeon Phi)
    • MP Configuration Table (very old x86)
    • DeviceTree (ARM)
    • Or hardcoded (ARM…)
  • Kernel boot parameters
    • nosmp
    • maxcpus=<n>
  • Kernel configuration
    • CONFIG_NR_CPUS
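How the three sources listed above might combine into the number of CPUs brought up at boot can be sketched as a small function. This is a simplified model under stated assumptions, not the kernel's logic: cpus_to_bring_up and DEMO_NO_LIMIT are hypothetical names, nosmp and maxcpus=0 are both treated as "boot CPU only", and CPU hotplug after boot is ignored.

```c
#include <assert.h>

/* Hypothetical marker meaning "maxcpus= was not given on the command line". */
#define DEMO_NO_LIMIT (-1)

/*
 * fw_cpus:        number of CPUs reported by firmware (ACPI, DT, ...)
 * config_nr_cpus: build-time upper bound (CONFIG_NR_CPUS)
 * nosmp:          non-zero if "nosmp" was passed
 * maxcpus:        value of "maxcpus=<n>", or DEMO_NO_LIMIT
 */
unsigned int cpus_to_bring_up(unsigned int fw_cpus,
                              unsigned int config_nr_cpus,
                              int nosmp, int maxcpus)
{
    unsigned int n;

    assert(config_nr_cpus >= 1);

    /* The build-time limit caps whatever the firmware reports. */
    n = fw_cpus < config_nr_cpus ? fw_cpus : config_nr_cpus;
    if (n < 1)
        n = 1;              /* the boot CPU is always there */

    if (nosmp)
        return 1;

    if (maxcpus != DEMO_NO_LIMIT) {
        unsigned int cap = maxcpus < 1 ? 1u : (unsigned int)maxcpus;

        if (n > cap)
            n = cap;
    }
    return n;
}
```

For example, a machine whose firmware reports 8 cores, running a kernel built with a limit of 64 and booted with maxcpus=2, would start with 2 CPUs in this model.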
MP Booting
• x86
  • INIT IPI
    • The sequence of INIT, INIT, STARTUP IPI.
  • NMI (for CPU0)
    • “This works to wake up soft offline CPU0 only”
• ARM
  • “enable-method” property in the device tree
  • Depends on the board (march)
• ARM64
  • “enable-method” property in the device tree
    • “spin-table”
      • Cores spin at some memory area (outside the kernel). When a value is written to the area, the core jumps to the written address.
    • “psci” (Power State Coordination Interface)
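The “spin-table” handshake described above can be modelled in a few lines of C. This is a single-threaded simulation for illustration only: release_addr, release_secondary, secondary_poll_once, and demo_secondary_entry are hypothetical names, a real secondary core would loop on the polling step (typically with a wait-for-event instruction), and the real release address lives in memory described by the device tree rather than in a C variable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*entry_fn)(void);

/* Hypothetical release slot; zero means "keep waiting". */
static _Atomic uintptr_t release_addr;

/* Set by the demo entry point so that the jump can be observed. */
static int secondary_started;

static void demo_secondary_entry(void)
{
    secondary_started = 1;
}

/* Boot CPU side: publish the address the waiting core should jump to. */
void release_secondary(entry_fn entry)
{
    assert(entry != NULL);
    atomic_store_explicit(&release_addr, (uintptr_t)entry,
                          memory_order_release);
}

/*
 * Secondary side: one polling step of the spin loop.
 * Returns 0 if the slot is still empty, 1 after jumping to the entry.
 */
int secondary_poll_once(void)
{
    uintptr_t addr = atomic_load_explicit(&release_addr,
                                          memory_order_acquire);

    if (addr == 0)
        return 0;
    ((entry_fn)addr)();
    return 1;
}
```

Polling before release_secondary(demo_secondary_entry) returns 0 and leaves secondary_started untouched; polling afterwards jumps to the published entry point and returns 1.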
AP Initialization
• After being woken up, where does an AP start executing?
• x86
  • First, the “trampoline code”
    • Switches from real mode to 32-bit or 64-bit mode
    • Located in very low memory, because the new core starts in real mode
  • Then, jump to the secondary entry point
    • 32-bit: startup_32_smp (arch/x86/kernel/head_32.S)
    • 64-bit: secondary_startup_64 (arch/x86/kernel/head_64.S)
• ARM64
  • First, “secondary_holding_pen” (arch/arm64/kernel/head.S)
    • After being woken up, all the cores are held in this function
  • Then, secondary_startup
AP Initialization (2)
• Initializes the CPU state for the new core at the assembly level
  • Paging on
  • Some special registers…
• Then, goes to the C code
  • start_secondary (in x86, arch/x86/kernel/smpboot.c)
  • secondary_start_kernel (in ARM/ARM64, arch/arm{,64}/kernel/smp.c)
• Finally, it goes to the idle loop as the boot task
  • cpu_startup_entry
start_secondary (x86)
# Function Category Description
1 cpu_init CPU Various CPU states
2 x86_cpuinit.early_percpu_clock_init
3 smp_callin SMP Notify the BSP of the AP’s boot-up
4 check_tsc_sync_target
5 set_cpu_online SMP Set the cpu_online_mask
6 x86_platform.nmi_init CPU
7 boot_init_stack_canary Debug
8 x86_cpuinit.setup_percpu_clockev
9 cpu_startup_entry
secondary_start_kernel (ARM64)
# Function Category Description
1 (Set the current mm to init_mm) MM
2 set_my_cpu_offset SMP Set per-cpu offset
3 cpu_set_reserved_ttbr0 CPU Set TTBR0 to the zero page
4 cpu_ops[cpu]->cpu_postboot CPU
5 notify_cpu_starting
6 smp_store_cpu_info
7 set_cpu_online
8 complete Notify the boot CPU of the core’s boot
9 cpu_startup_entry Go to the idle loop
(Notes)
• Naming conventions
  • BP? BSP?
  • Why do some functions have an e820_ prefix while others do not?

Linux Initialization Process (2)

  • 1.
    Initialization (2) Taku Shimosawa Pourle livre nouveau du Linux noyau 1
  • 2.
    Agenda • Initialization functionlist • The list of the functions called from the kernel startup function (start_kernel) • The list of the functions called from some function called from the start_kernel function • setup_arch • rest_init, and the following functions • Initialization topics • Multiprocessor (SMP) Initialization 2
  • 3.
    3. Initialization At last,we have come here! 3
  • 4.
    Initialization Overview 4 Booting Code (PreparingCPU states, Gathering HW information, Decompressing vmlinux etc.) arch/*/boot/ arch/*/kernel/head*.S, head*.c Low-level Initialization (Switching to virtual memory world, Getting prepared for C programs) init/main.c (startup_kernel) Initialization (Initializing all the kernel features including architecture-dependent parts) init/main.c (rest_init) Creating the ā€œinitā€ process, and letting it the rest initialization (Setting up multiprocessing, scheduling) kernel/sched/idle.c (cpu_idle_loop) ā€œSwapperā€ (PID=0) now sleeps init/main.c (kernel_init) Performing final initialization and ā€œExecā€ing the ā€œinitā€ user ā€œinitā€ (PID=1) arch/*/kernel, arch/*/mm, …Call vmlinux
  • 5.
    start_kernel (1) 5 # FunctionCategory Description 1 lockdep_init Debug Lock validator 2 smp_setup_processor_id* SMP Initialize processor ID (some architecture) 3 debug_objects_early_init Debug Lifetime debugging facility for objects 4 boot_init_stack_canary* Debug Decide the canary value for the stack protector 5 cgroup_init_early cgroup Early init for some cgroup subsystems 6 boot_cpu_init SMP Set the boot cpu for various cpumasks 7 page_address_init MM Initialize hash for kmap (highmem) 8 setup_arch* 9 mm_init_owner MM Set init_mm’s owner to init_task 10 mm_init_cpumask MM Set the cpu mask pointer to the mm’s cpumask (only if CPUMASK_OFFSTACK) 11 setup_command_line Init Copy the command line parameter to newly allocated buffer (allocated by memblock) 12 setup_nr_cpu_ids SMP Set ā€œnr_cpu_idsā€ according to the last bit in Functions with * : mostly architecture dependent codes
  • 6.
    start_kernel (2) 6 # FunctionCategory Description 13 setup_per_cpu_areas* SMP Allocate and initialize percpu areas 14 smp_prepare_boot_cpu* SMP Prepare for SMP boot 15 build_all_zonelists MM Initializes ā€œzonelistā€ 16 page_alloc_init MM Add a handler for CPU hotplug (to drain pages) 17 parse_early_param Init Parse ā€œearlyā€ options 18 parse_args Init Parse the rest of options 19 jump_label_init Option Jump label (self-modification) 20 setup_log_buf Debug Allocate and initialize printk log buffer 21 pidhash_init Sched Initialize PID hash 22 vfs_caches_init FS Initialize various caches (kmem_cache) in VFS (dcache, inode, mnt, files, …) 23 sort_main_extable MM Sort the exception table (used in page faults) 24 trap_init* CPU Initialize trap handlers
  • 7.
    start_kernel (3) 7 # FunctionCategory Description 25 mm_init MM Initialize MM 25A page_cgroup_init_flatmme MM Allocate pages for page_cgroup 25B mem_init* MM Free pages for buddy allocator 25C kmem_cache_init MM Initialize cache 25D percpu_init_late MM Replaces per-cpu chunks with those allocated by slab 25E pgtable_init* MM Create cache for ptlock and pgtable (SH etc.) 25F vmalloc_init MM Initialize vmalloc 26 sched_init Sched Initialize scheduler 27 idr_cache_init Util Initialize IDR (ID to pointer translation) 28 rcu_init SMP Initialize RCU 29 tick_nohz_init Sched Initialize NOHZ (enable context tracking) 30 radix_tree_init Util Initialize radix tree (create cache, etc.) 31 early_irq_init* CPU Initialize irq_desc.
  • 8.
    start_kernel (4) 8 # FunctionCategory Description 32 init_IRQ * CPU Initialize various IRQs (in x86, set gates for APIC interrupts, etc.) 33 tick_init Timer Tick broadcast (to emulate local timer) 34 init_timers Timer Timer stats, notifier, and timer softirq 35 hrtimers_init Timer hrtimer notifier, and hrtimer softirq 36 softirq_init Sched Tasklet lists, and tasklet softirqs 37 timekeeping_init Timer Clocksource 38 time_init * Timer (Platform-dependent) timer initialization 39 sched_clock_postinit Sched Start the hrtimer 40 perf_event_init Debug Perf events 41 profile_init Debug (Simple) profiler 42 call_function_init SMP Initialize csd (call single data) queue local_irq_enable CPU At this point, interrupts are enabled
  • 9.
    start_kernel (5) 9 # FunctionCategor y Description 43 kmem_cache_init_late MM Post-initialization of cache (slab) 44 console_init Console Call console initcalls 45 lockdep_info Debug Print lockdep information 46 locking_selftest Debug Test spinlocks, rwlocks, mutexes, and rwsemaphores 47 page_cgroup_init cgroup Page cgroup 48 debug_objects_mem_init Debug Enable dynamic allocation for debugobjects (#3), and replace static ones with newly allocated one 49 kmemleak_init Debug kmemleak (Memory leak check facility) 50 setup_per_cpu_pageset MM Per-cpu pageset 51 numa_policy_init MM NUMA (VMA) policy 52 late_time_init* Timer Late initialization (In x86, HPET and TSC are initialized)
  • 10.
    start_kernel (6) 10 # FunctionCategory Description 53 sched_clock_init Sched Set the time info for scheduler 54 calibrate_delay Timer Calibrate for the ā€œdelayā€ functions 55 pidmap_init Process Init PID map for initial PID namespace 56 anon_vma_init MM Create cache for ā€œanon_vmaā€ 57 acpi_early_init ACPI ACPI Subsystems, load DSDT 58 thread_info_cache_init Process Allocate cache for thread_info if its size is less than PAGE_SIZE 59 cred_init Security Task credential 60 fork_init Process Allocate a cache for task_struct 61 proc_caches_init MM Allocate caches for mm_struct, etc. 62 buffer_init FS Allocate a cache for buffer_head 63 key_init Security Allocate a cache for key_jar 64 security_init Security Call security_initcall’s 65 dbg_late_init Debug Late init for kgdb
  • 11.
    start_kernel (7) 11 # FunctionCategory Description 66 vfs_caches_init FS Allocate SLAB caches and hashtables for various VFS caches (dcache, inode_cache, …) 67 signals_init Sched Allocate a cache for sigqueue 68 page_writeback_init MM Initialize the ratio for the dirty pages 69 proc_root_init Procfs Create the root for procfs and some directories 70 cgroup_init Cgroup Initialize the rest of cgroups 71 cpuset_init Sched The top-level cpuset 72 taskstats_init_early Sched Task statistics exposed to the user level 73 delayacct_init Sched Task delay accounting 74 check_bugs* CPU Fix up for some architecture-dependent bugs (in x86_64, alternatives are initialized, and divide the first 2MB page into 4K pages) 75 sfi_init_late SFI Map again the area by using ioremap
  • 12.
    start_kernel (8) 12 # FunctionCategory Description 76 ftrace_init Debug ftrace 77 rest_init
  • 13.
    setup_arch (x86) (1) 13 #Function Category Description 1 memblock_reserve MM Reserve the text area 2 early_reserve_initrd MM Reserve the initrd area 3 clone_pgd_area, load_cr3 MM Switch to swapper_pg_dir (i386 only) 4 olpc_ofw_detect Platform OLPC OFW Stuff 5 early_trap_init CPU Init debug and int3 gate 6 early_cpu_init CPU Detect CPU’s vendor (registered in cpu_dev_register: Intel, AMD, Cyrix…) and calls early_init and bsp_init 7 early_ioremap_init MM Init early ioremap 8 setup_olpc_ofw_pgd Platform OLPC OFW Stuff 9 (Parsing boot parameters) Setup -- 10 x86_init.oem.arch_setup Platform OEM-dependent setup (Intel MID etc.) 11 setup_memory_map MM Copy and print e820 information 12 parse_setup_data Setup Parse setup_data in boot_params
  • 14.
    setup_arch (x86) (2) 14 #Function Category Description 13 copy_edd Setup Copy BIOS EDD information 14 (prepare init_mm) MM Set start_code, end_code, etc. for init_mm 15 (command line stuffs) Setup 16 x86_configure_nx MM Set ptemask according to whether NX is supported by CPU 17 parse_early_param Setup (=#17 in start_kernel) 18 x86_report_nx MM Print NX information 19 memblock_x86_reserve_r ange_setup_data MM Reserve the setup_data area 20 acpi_mps_check SMP Check if ACPI is disabled and MPS code is not built-in 21 early_pci_dump_devices Device Dump PCI info before PCI is initialized 22 e820_reserve_setup_data MM Reserve the setup_data area in e820 23 finish_e820_parsing Setup Sanitize e820 info and print e820 info.
  • 15.
    setup_arch (x86) (3) 15 #Function Category Description 13 copy_edd Setup Copy BIOS EDD information 14 (prepare init_mm) MM Set start_code, end_code, etc. for init_mm 15 (command line stuffs) Setup 16 x86_configure_nx MM Set ptemask according to whether NX is supported by CPU 17 parse_early_param Setup (=#17 in start_kernel) 18 x86_report_nx MM Print NX information 19 memblock_x86_reserve_r ange_setup_data MM Reserve the setup_data area 20 acpi_mps_check SMP Check if ACPI is disabled and MPS code is not built-in 21 early_pci_dump_devices Device Dump PCI info before PCI is initialized 22 e820_reserve_setup_data MM Reserve the setup_data area in e820 23 finish_e820_parsing Setup Sanitize e820 info and print e820 info.
  • 16.
    setup_arch (x86) (4)
    # | Function | Category | Description
    24 | dmi_scan_machine | DMI | Check whether DMI (Desktop Management Interface) is present
    25 | dmi_memdev_walk | DMI | Walk through the DMI table
    26 | dmi_set_dump_stack_arch_desc | DMI | Set the architecture description (*) for dump_stack
    27 | init_hypervisor_platform | VM | Get the hypervisor information and init (e.g. get Hz using a special I/O port when running on VMware)
    28 | probe_roms | MM | Request resources for Video ROM, Extension ROMs, etc.
    29 | insert_resource | MM | Insert resources for the kernel’s code, data, BSS
    30 | e820_add_kernel_range | MM | Add kernel code and data areas to e820 if they are not marked as E820_RAM
    31 | trim_bios_range | MM | Reserve BIOS areas in e820
    (*) Example:
        Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
        CPU: 3 PID: 2763 Comm: irqbalance Tainted: G W 3.14.13 #1
        Hardware name: Supermicro X9SRH-7F/7TF/X9SRH-7F77TF, BIOS 3.00 07/05/2013
  • 17.
    setup_arch (x86) (5)
    # | Function | Category | Description
    32 | early_gart_iommu_check | Device | Check GART (Graphics Address Remapping Table)
    33 | (Assign max_pfn) | MM | Set max_pfn to the last page in e820
    34 | mtrr_bp_init | CPU | MTRRs (Memory Type Range Registers)
    35 | check_x2apic | CPU | Enable X2APIC if available
    36 | find_smp_config | SMP | Find the SMP config for the Intel MP Spec.
    37 | reserve_ibft_region | Device | Reserve the iSCSI Boot Format Table
    38 | early_alloc_pgt_buf | MM | Allocate the page table buffer (to be used in the early stage)
    39 | reserve_brk | MM | Reserve the brk area
    40 | cleanup_highmap | MM | Unmap out-of-range areas in the kernel map
    41 | memblock_set_current_limit | MM | Set the memblock’s allocation limit to ISA_END_ADDRESS
    42 | memblock_x86_fill | MM | Fill the memblock info according to e820
  • 18.
    setup_arch (x86) (6)
    # | Function | Category | Description
    43 | early_reserve_e820_mpc_new | SMP | Allocate memory for the mptable
    44 | setup_bios_corruption_check | Setup | Fill 64KB of low memory with a pattern to detect whether the BIOS corrupts the area
    45 | reserve_real_mode | CPU/SMP | Reserve some low memory for the trampoline
    46 | trim_platform_memory_ranges | Setup | Special tricks (reserve) for some platforms (some Sandy Bridge)
    47 | trim_low_memory_range | MM | Reserve the first 4KB page in memblock
    48 | init_mem_mapping | MM | Reconstruct the memory mapping
    49 | early_trap_pf_init | CPU | Set the page fault handler
    50 | setup_real_mode | CPU/SMP | Set up the trampoline code
    51 | memblock_set_current_limit | MM | Change the limit to the last page mapped
    52 | dma_contiguous_reserve | MM | Allocate a contiguous area for DMA
  • 19.
    setup_arch (x86) (7)
    # | Function | Category | Description
    53 | setup_log_buf | Debug | Set up the printk log buffer
    54 | reserve_initrd | MM | Reserve the initrd
    55 | acpi_initrd_override | ACPI | Find the ACPI override info in the initrd
    56 | vsmp_init | Setup | vSMP (ScaleMP Inc.)
    57 | io_delay_init | Setup | Check the DMI override for the I/O delay strategy
    58 | acpi_boot_table_init | ACPI | ACPI BOOT table parsing
    59 | early_acpi_boot_init | ACPI | Parse the MADT in ACPI
    60 | initmem_init | MM | Set up node information based on ACPI (if NUMA)
    61 | reserve_crashkernel | Debug | Reserve memory for the crashkernel
    62 | memblock_find_dma_reserve | MM | Count the reserved pages in the DMA zone
    63 | pagetable_init | MM | Initialize sparse mem and zone sizes
    64 | tboot_init | CPU | Intel TXT (Trusted eXecution Technology) support
  • 20.
    setup_arch (x86) (8)
    # | Function | Category | Description
    65 | map_vsyscall | CPU | Map vsyscall
    66 | generic_apic_probe | CPU | Probe the APIC driver
    67 | early_quirks | PCI | Apply quirks for certain devices
    68 | acpi_boot_init | ACPI | Parse (again) BOOT, FADT, MADT, HPET etc.
    69 | sfi_init | SFI | SFI (Simple Firmware Interface)
    70 | x86_dtb_init | Setup | Device tree
    71 | get_smp_config | SMP | (If ACPI is not found) construct the table
    72 | prefill_possible_map | SMP | Set the possible CPU map
    73 | init_cpu_to_node | NUMA | Set up the CPU-to-node map
    74 | init_apic_mappings | CPU | Set the local APIC address
    75 | x86_io_apic_ops.init | CPU | I/O APIC
    76 | kvm_guest_init | Virt. | KVM guest (paravirt ops, etc.)
    77 | e820_reserve_resources | MM | Reserve resources for e820 entries
  • 21.
    setup_arch (x86) (9)
    # | Function | Category | Description
    78 | e820_mark_nosave_regions | PM | Add non-RAM areas in e820 to the nosave regions
    79 | x86_init.resources.reserve_resources | I/O | Reserve standard I/O resources (Timer, KB,…)
    80 | e820_setup_gap | MM | Find the largest gap in e820 and let PCI use it to allocate new MMIO areas
    81 | x86_init.oem.banner | Debug | “Booting paravirtualized kernel on %s”
    82 | x86_init.timers.wallclock_init | Timer | (NOP; defined in MID only)
    83 | mcheck_init | CPU | Machine check (temperature)
    84 | arch_init_ideal_nops | CPU | Set the NOP instructions ideal for the current platform
    85 | register_refined_jiffies | Timer | Register the “refined_jiffies” clocksource
  • 22.
    setup_arch (ARM) (1)
    # | Function | Category | Description
    1 | setup_processor | CPU | Processor initialization
    2 | setup_machine_fdt | Setup | Parse the device tree
    3 | setup_machine_tags | Setup | If #2 fails, parse the ATAGs
    4 | (prepare init_mm) | MM | Set start_code, end_code, etc. for init_mm
    5 | (command line stuffs) | Setup | (=#15 in x86)
    6 | parse_early_param | Setup | (=#17 in x86)
    7 | (sort meminfo) | MM | Sort the memory information
    8 | early_paging_init | MM | Recreate the page table prepared during boot
    9 | setup_dma_zone | MM | Set up the DMA zone information
    10 | sanity_check_meminfo | MM | Sanitize the meminfo
    11 | arm_memblock_init | MM | Add free memory from meminfo, and reserve various reserved areas
    12 | paging_init | MM | Permanent kmap area
  • 23.
    setup_arch (ARM) (2)
    # | Function | Category | Description
    13 | request_standard_resources | MM | Reserve resources for system memory, video RAM
    14 | unflatten_device_tree | Setup | Create a tree from the FDT
    15 | arm_dt_init_cpu_maps | CPU | Create the CPU logical map based on the device tree
    16 | psci_init | CPU | Read the method to be used for CPU on, off, etc.
    17 | smp_init_cpus | SMP | Initialize the CPU cores available
    18 | smp_build_mpidr_hash | SMP | Precompute the shifts required to get an index from an MPIDR (Multiprocessor Affinity Register) value
    19 | hyp_mode_check | Virt. | Check if the CPU is running in HYP mode
    20 | reserve_crashkernel | Debug | Reserve memory for the crashkernel
    21 | mdesc->init_early | | (Platform-specific initialization)
  • 24.
    The rest of initialization
    • rest_init (init/main.c)
      • Creates two kernel threads
        • “init” (PID = 1; it gradually becomes the init user process)
        • “kthreadd” (PID = 2; it lets init create other kernel threads)

    static noinline void __init_refok rest_init(void)
    {
        rcu_scheduler_starting();
        ...
        kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
        numa_default_policy();
        pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
        rcu_read_lock();
        kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
        rcu_read_unlock();
        complete(&kthreadd_done);
        ...
        init_idle_bootup_task(current);
        schedule_preempt_disabled();
        ...
        cpu_startup_entry(CPUHP_ONLINE);
    }
  • 25.
    Idle task
    • Before entering idle, it calls the scheduler.
    • Then, it calls the idle function.

        ...
        init_idle_bootup_task(current);
        schedule_preempt_disabled();
        ...
        cpu_startup_entry(CPUHP_ONLINE);
    }

    void __sched schedule_preempt_disabled(void)
    {
        sched_preempt_enable_no_resched();
        schedule();
        preempt_disable();
    }

    void cpu_startup_entry(enum cpuhp_state state)
    {
        ...
        __current_set_polling();
        arch_cpu_idle_prepare();
        cpu_idle_loop();
    }
  • 26.
    kernel_init
    • Call the remaining init functions (kernel_init_freeable)
    • Synchronize all the asynchronous operations
    • Free the initmem (free_initmem)
    • Mark read-only data as RO (and NX) (mark_rodata_ro)
    • Set the system state to SYSTEM_RUNNING
    • Set the current NUMA policy to default (numa_default_policy)
    • Try to execve(2) the “init” process
      • If the rdinit parameter is set, exec that path
      • If the init parameter is set, exec that path
      • Try to run “/sbin/init”, “/etc/init”, “/bin/init”, “/bin/sh”
    • If nothing worked, panic with a familiar message:
      "No working init found. Try passing init= option to kernel. See Linux Documentation/init.txt for guidance."
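    The lookup order in the last bullets can be sketched in plain user-space C. This is only an illustration of the ordering listed on the slide, not the kernel's code: pick_init, exec_fn, and the stub callbacks are hypothetical names, and the sketch treats a failed rdinit= or init= path as a simple fall-through to the next candidate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns 0 when "executing" path succeeds. */
typedef int (*exec_fn)(const char *path);

/*
 * Try the candidates in the order given on the slide and return the first
 * path the callback accepts. NULL means nothing worked, which is the point
 * where the kernel would panic.
 */
const char *pick_init(const char *rdinit, const char *init_param,
                      exec_fn exec_one)
{
    static const char *const defaults[] = {
        "/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
    };
    size_t i;

    if (rdinit && exec_one(rdinit) == 0)
        return rdinit;
    if (init_param && exec_one(init_param) == 0)
        return init_param;
    for (i = 0; i < sizeof(defaults) / sizeof(defaults[0]); i++)
        if (exec_one(defaults[i]) == 0)
            return defaults[i];
    return NULL;
}

/* Stub callbacks used only to exercise the sketch. */
int only_bin_sh(const char *path) { return strcmp(path, "/bin/sh") == 0 ? 0 : -1; }
int accept_all(const char *path) { (void)path; return 0; }
int nothing_works(const char *path) { (void)path; return -1; }
```

    With the only_bin_sh stub the function walks the whole default list and settles on "/bin/sh"; with nothing_works it returns NULL, matching the panic case.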
  • 27.
    kernel_init_freeable
    • First, wait for the completion of kthreadd’s setup
    • Set init’s allowed cpus/mems to all CPUs and nodes
    • Set cad_pid to init’s PID
    • Prepare to boot other CPUs (smp_prepare_cpus)
    • Call early initcalls (do_pre_smp_initcalls)
    • Initialize lockup_detector (lockup_detector_init)
    • Initialize multiprocessor (smp_init)
      • Boots up other cores/sockets
    • Initialize the scheduler (sched_init_smp)
    • Call the do_basic_setup function (-> next slide)
    • Open “/dev/console” and dup twice (fd: 0 to 2)
    • Check if the ramdisk is available
      • If not, try to mount root (prepare_namespace)
    • Load the I/O scheduler (elevator) module
  • 28.
    do_basic_setup
    • Re-initialize cpuset to the active CPUs (cpuset_init_smp)
    • Initialize user-mode helper (khelper)
    • Initialize tmpfs (shmem_init)
    • Initialize drivers (driver_init)
    • Create proc directories and files for IRQs (init_irq_proc)
    • Call constructors (do_ctors) (CONFIG_CONSTRUCTORS)
    • Enable the user-mode helper workqueue
    • Call all the initcalls (do_initcalls)
    • Initialize random values (random_int_secret_init)
  • 29.
    initcalls
    • Facility to call initialization functions during the initialization (in the kernel_init_freeable function)
    • Example

    static int cpu_pm_init(void)
    {
        register_syscore_ops(&cpu_pm_syscore_ops);
        return 0;
    }
    core_initcall(cpu_pm_init);
    (kernel/cpu_pm.c)
  • 30.
    Level of initcalls
    • Several levels (the order in which they are called) are defined
    Macro | Level | Description
    early_initcall | early | Called before SMP initialization
    pure_initcall | 0 | No dependencies; variable initialization
    core_initcall{,_sync} | 1, 1s |
    postcore_initcall{,_sync} | 2, 2s |
    arch_initcall{,_sync} | 3, 3s |
    subsys_initcall{,_sync} | 4, 4s |
    fs_initcall{,_sync} | 5, 5s |
    rootfs_initcall | rootfs |
    device_initcall{,_sync} | 6, 6s |
    late_initcall{,_sync} | 7, 7s |
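    To make the effect of the levels concrete, here is a minimal user-space sketch that runs registered functions strictly by level, regardless of the order in which they were registered. It assumes a simplified model with integer levels 0 to 7 only (no early, rootfs, or _sync variants); struct initcall_entry, run_initcalls, and the demo functions are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*initcall_t)(void);

/* Hypothetical registry entry: a function plus the level it belongs to. */
struct initcall_entry {
    int level;          /* 0 (pure) .. 7 (late) in this simplified model */
    initcall_t fn;
};

/*
 * Call every entry level by level, keeping registration order within a
 * level. Returns the number of functions called.
 */
int run_initcalls(const struct initcall_entry *table, size_t count)
{
    int level, calls = 0;
    size_t i;

    for (level = 0; level <= 7; level++) {
        for (i = 0; i < count; i++) {
            if (table[i].level == level) {
                table[i].fn();
                calls++;
            }
        }
    }
    return calls;
}

/* Demo functions record one letter each so the call order is visible. */
char trace[16];

static void record(char c)
{
    size_t len = strlen(trace);

    if (len + 1 < sizeof(trace))
        trace[len] = c;
}

int late_demo(void)   { record('L'); return 0; }
int device_demo(void) { record('D'); return 0; }
int core_demo(void)   { record('C'); return 0; }

/* Registered "backwards" on purpose: late, device, core. */
const struct initcall_entry demo_table[] = {
    { 7, late_demo }, { 6, device_demo }, { 1, core_demo },
};
```

    Running run_initcalls(demo_table, 3) leaves "CDL" in trace: the core-level function runs first even though it was registered last.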
  • 31.
    Initcall definition
    • Collect all the pointers to initcall functions in certain sections
      • Section name: “.initcall<lv>.init”
      • E.g. for “core_initcall”, the section will be “.initcall1.init”

    #define __define_initcall(fn, id) \
        static initcall_t __initcall_##fn##id __used \
        __attribute__((__section__(".initcall" #id ".init"))) = fn; \
        LTO_REFERENCE_INITCALL(__initcall_##fn##id)
    (include/linux/init.h)
  • 32.
    In the linker script

    #define INIT_CALLS \
        VMLINUX_SYMBOL(__initcall_start) = .; \
        *(.initcallearly.init) \
        INIT_CALLS_LEVEL(0) \
        INIT_CALLS_LEVEL(1) \
        INIT_CALLS_LEVEL(2) \
        INIT_CALLS_LEVEL(3) \
        INIT_CALLS_LEVEL(4) \
        INIT_CALLS_LEVEL(5) \
        INIT_CALLS_LEVEL(rootfs) \
        INIT_CALLS_LEVEL(6) \
        INIT_CALLS_LEVEL(7) \
        VMLINUX_SYMBOL(__initcall_end) = .;
    (include/asm-generic/vmlinux.lds.h)

    #define INIT_CALLS_LEVEL(level) \
        VMLINUX_SYMBOL(__initcall##level##_start) = .; \
        *(.initcall##level##.init) \
        *(.initcall##level##s.init)
    (include/asm-generic/vmlinux.lds.h)
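    The same idea, function pointers gathered into a named section and walked between start and end symbols, can be tried in user space. The sketch below is an assumption-laden analogue, not the kernel mechanism: it relies on GCC or Clang with an ELF linker such as GNU ld, which provides __start_<name> and __stop_<name> symbols for sections whose names are valid C identifiers, so no custom linker script is needed. The section name demo_initcalls, the demo_initcall macro, and the demo functions are hypothetical.

```c
#include <assert.h>

typedef int (*initcall_t)(void);

/* Store a pointer to fn in the ELF section "demo_initcalls". */
#define demo_initcall(fn) \
    static initcall_t demo_initcall_ptr_##fn \
    __attribute__((used, section("demo_initcalls"))) = fn

/* Provided by the linker for the section above (GNU ld behaviour). */
extern initcall_t __start_demo_initcalls[];
extern initcall_t __stop_demo_initcalls[];

int demo_counter;

static int first_init(void)  { demo_counter += 1;  return 0; }
static int second_init(void) { demo_counter += 10; return 0; }

demo_initcall(first_init);
demo_initcall(second_init);

/* Walk the section like an array and call each pointer; return the count. */
int run_demo_initcalls(void)
{
    initcall_t *p;
    int n = 0;

    for (p = __start_demo_initcalls; p < __stop_demo_initcalls; p++) {
        (*p)();
        n++;
    }
    return n;
}
```

    No central list of functions exists anywhere in the source; adding another demo_initcall(...) line in any file linked into the program extends the array automatically, which is the property the kernel's initcall macros rely on.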
  • 33.
    Special initcalls
    • console_initcall
      • Called from console_init (in start_kernel)
    • security_initcall
      • Called from security_init (in start_kernel)
    • When initcalls are used in loadable modules (not recommended), they are replaced by module_init

    #else /* MODULE */
    /* Don't use these in loadable modules, but some people do... */
    #define early_initcall(fn) module_init(fn)
    #define core_initcall(fn) module_init(fn)
    ...
    (include/linux/init.h)
  • 34.
    Initcall debug
    • Kernel command-line option: “initcall_debug”
      • Shows debug messages
      • Before calling each initcall function and after it returns, the kernel prints a message; the second one includes the elapsed time

    static int __init_or_module do_one_initcall_debug(initcall_t fn)
    {
        ...
        pr_debug("calling %pF @ %i\n", fn, task_pid_nr(current));
        calltime = ktime_get();
        ret = fn();
        rettime = ktime_get();
        ...
        pr_debug("initcall %pF returned %d after %lld usecs\n", fn, ret, duration);
        ...
    }
    (init/main.c)
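    The call/measure/report pattern above translates directly to user space. The following sketch assumes a POSIX system with clock_gettime and uses printf in place of pr_debug; call_timed and demo_init are hypothetical names, and the message format only imitates the kernel's.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

typedef int (*initcall_t)(void);

/*
 * Call fn, print a message before and after in the style of
 * initcall_debug, and return fn's result. The elapsed time in
 * microseconds is also stored in *usecs_out when it is not NULL.
 */
int call_timed(const char *name, initcall_t fn, long long *usecs_out)
{
    struct timespec t0, t1;
    long long nsecs, usecs;
    int ret;

    printf("calling  %s\n", name);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    ret = fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    nsecs = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
          + (long long)(t1.tv_nsec - t0.tv_nsec);
    usecs = nsecs / 1000;
    printf("initcall %s returned %d after %lld usecs\n", name, ret, usecs);

    if (usecs_out)
        *usecs_out = usecs;
    return ret;
}

/* Trivial function to time. */
int demo_init(void) { return 0; }
```

    A monotonic clock is used so that wall-clock adjustments during the call cannot produce a negative duration.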
  • 35.
    4. Multiprocessor Initialization
    Welcome to the world of concurrency!
  • 36.
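    How are the multiple cores started?
    • Two types
    [Diagram: timeline HW Power On → Start Linux kernel → Initialize SMP, drawn for two cases. First type: only Core 0 runs from power-on; it wakes up Core 1, Core 2, … during SMP initialization. Second type: Core 0, Core 1, and Core 2 all start at power-on; Core 1 and Core 2 stop and wait until Core 0 wakes them up.]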
  • 37.
    How are the multiple cores started?
    • The first type
      • x86, ARM, etc.
      • (x86) The first processor (core) is determined by HW and is called “the bootstrap processor” (BSP). The remaining processors (cores) are called “application processors” (APs).
    • The second type
      • PowerPC (some models), etc.
  • 38.
    MP Detection
    • How to detect the number of cores available in the hardware?
      • Firmware information
        • ACPI MADT (Multiple APIC Description Table) (x86)
        • SFI (Simple Firmware Interface) (Xeon Phi)
        • MP Configuration Table (very old x86)
        • Device tree (ARM)
      • Or hardcoded (ARM…)
    • Kernel boot parameters
      • nosmp
      • maxcpus=<n>
    • Kernel configuration
      • CONFIG_NR_CPUS
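    How these three sources combine can be sketched as a small function: start from the count the firmware reports, cap it by the build-time limit, then apply the command-line options. This is a simplified model rather than the kernel's parsing code; cpus_to_boot and its parameters are hypothetical, nosmp is treated as equivalent to maxcpus=0, and the boot CPU is assumed to always run.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Number of CPUs to run after boot in this simplified model:
 * min(detected, nr_cpus_config), further limited by "maxcpus=<n>" or
 * "nosmp" found in the space-separated command line. Never below 1,
 * because the boot CPU is already running.
 */
unsigned int cpus_to_boot(unsigned int detected, unsigned int nr_cpus_config,
                          const char *cmdline)
{
    unsigned int n = detected < nr_cpus_config ? detected : nr_cpus_config;
    const char *p = cmdline;

    while (*p) {
        size_t len = strcspn(p, " ");   /* length of the current token */

        if (len == 5 && strncmp(p, "nosmp", 5) == 0) {
            n = 1;
        } else if (len > 8 && strncmp(p, "maxcpus=", 8) == 0) {
            unsigned long limit = strtoul(p + 8, NULL, 10);

            if (limit == 0)
                limit = 1;              /* only the boot CPU */
            if (limit < n)
                n = (unsigned int)limit;
        }
        p += len;
        p += strspn(p, " ");            /* skip separators */
    }
    return n ? n : 1;
}
```

    For example, eight detected cores with a build-time limit of 64 and "maxcpus=2" on the command line yield 2, while the same machine booted with "nosmp" yields 1.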
  • 39.
    MP Booting
    • x86
      • INIT IPI
        • The sequence of INIT, INIT, STARTUP IPI.
      • NMI (for CPU0)
        • “This works to wake up soft offline CPU0 only”
    • ARM
      • “enable-method” property in the device tree
      • Depends on the board (march)
    • ARM64
      • “enable-method” property in the device tree
        • “spin-table”
          • Cores spin on some memory area (outside the kernel). When a value is written to the area, the core jumps to the written address.
        • “psci” (Power State Coordination Interface)
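    The “spin-table” handshake can be modelled in a few lines. The sketch below is a single-threaded simulation under stated assumptions: the release slot is an ordinary variable, zero means "keep waiting", and the function names are hypothetical. A real implementation also needs memory barriers, cache maintenance, and a wake-up event, all of which are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Secondary-core side: poll the release slot up to max_polls times.
 * Returns the entry address once it becomes non-zero, or 0 if the core
 * would still be spinning after max_polls reads.
 */
uint64_t spin_until_released(const volatile uint64_t *release_slot,
                             unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        uint64_t entry = *release_slot;

        if (entry != 0)
            return entry;   /* the core would jump to this address */
    }
    return 0;
}

/* Boot-CPU side: publish the address the secondary core should jump to. */
void release_secondary(volatile uint64_t *release_slot, uint64_t entry)
{
    *release_slot = entry;
}
```

    Before release_secondary is called the poll loop returns 0 however long it spins; afterwards the first read returns the published entry address.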
  • 40.
    AP Initialization
    • After being woken up, where does an AP start executing?
    • x86
      • First, the “trampoline code”
        • Switches from real mode to 32-bit or 64-bit mode
        • Located in very low memory, since the new core starts in real mode
      • Then, jumps to the secondary entry point
        • 32-bit: startup_32_smp (arch/x86/kernel/head_32.S)
        • 64-bit: secondary_startup_64 (arch/x86/kernel/head_64.S)
    • ARM64
      • First, “secondary_holding_pen” (arch/arm64/kernel/head.S)
        • After being woken up, all the cores are held in this function
      • Then, secondary_startup
  • 41.
    AP Initialization (2)
    • Initializes the CPU state for the new core at the assembly level
      • Paging on
      • Some special registers…
    • Then, goes to the C code
      • start_secondary (in x86, arch/x86/kernel/smpboot.c)
      • secondary_start_kernel (in ARM/ARM64, arch/arm{,64}/kernel/smp.c)
    • Finally, it goes to the idle loop as the boot task
      • cpu_startup_entry
  • 42.
    start_secondary (x86)
    # | Function | Category | Description
    1 | cpu_init | CPU | Various CPU states
    2 | x86_cpuinit.early_percpu_clock_init | |
    3 | smp_callin | SMP | Notify the BSP of the AP’s boot-up
    4 | check_tsc_sync_target | |
    5 | set_cpu_online | SMP | Set the cpu_online_mask
    6 | x86_platform.nmi_init | CPU |
    7 | boot_init_stack_canary | Debug |
    8 | x86_cpuinit.setup_percpu_clockev | |
    9 | cpu_startup_entry | |
  • 43.
    secondary_start_kernel (ARM64)
    # | Function | Category | Description
    1 | (Set the current mm to init_mm) | MM |
    2 | set_my_cpu_offset | SMP | Set the per-cpu offset
    3 | cpu_set_reserved_ttbr0 | CPU | Set TTBR0 to the zero page
    4 | cpu_ops[cpu]->cpu_postboot | CPU |
    5 | notify_cpu_starting | |
    6 | smp_store_cpu_info | |
    7 | set_cpu_online | |
    8 | complete | | Notify the boot CPU of the core’s boot
    9 | cpu_startup_entry | | Go to the idle loop
  • 44.
    (Notes)
    • Naming conventions
      • BP? BSP?
      • Why do some functions have an e820_ prefix while others do not?