@congto
Forked from reterVision/notes.md
Created March 25, 2018 10:54

Revisions

  1. @reterVision reterVision revised this gist Sep 24, 2014. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions notes.md
    @@ -429,7 +429,8 @@ utilization.

    **Bonding module**

    -**1.6 Understanding Linux performance metrics**
    +1.6 Understanding Linux performance metrics
    +===

    **Processor metrics**
    - CPU utilization
    @@ -467,5 +468,6 @@ utilization.
    - Blocks read/write per second
    - Kilobytes per second read/write

    -**2 Monitoring and benchmark tools**
    +2 Monitoring and benchmark tools
    +===

  2. @reterVision reterVision revised this gist Sep 24, 2014. 1 changed file with 152 additions and 1 deletion.
    153 changes: 152 additions & 1 deletion notes.md
    @@ -316,5 +316,156 @@ direct impact on a server's performance.

    - SCSI (?)

    -RAID and storage system
    +1.4 RAID and storage system
    +===

    1.5 Network subsystem
    ===

    **Networking implementation**

    The socket provides an interface for user applications.

    1. When an application sends data to its peer host, the application creates its
    data
    2. The application opens the socket and writes the data through the socket
    interface.
    3. The *socket buffer* is used to deal with the transferred data. The socket
    buffer holds a reference to the data, and it is the buffer that goes down
    through the layers.
    4. In each layer, appropriate operations such as parsing the headers, adding
    and modifying the headers, checksums, routing operations, fragmentation, and
    so on are performed. When the socket buffer goes down through the layers,
    the data itself is not copied between the layers. Because copying actual
    data between different layers is not efficient, the kernel avoids unnecessary
    overhead by just changing the reference in the socket buffer and passing it
    to the next layer.
    5. Finally, the data goes out to the wire from the network interface card.
    6. The Ethernet frame arrives at the network interface of the peer host.
    7. The frame is moved into the network interface card buffer if the MAC address
    matches the MAC address of the interface card.
    8. The network interface card eventually moves the packet into a socket buffer
    and issues a hard interrupt to the CPU.
    9. The CPU then processes the packet and moves it up through the layers until
    it arrives at (for example) a TCP port of an application such as Apache.

    **Socket buffer**

    ```
    /proc/sys/net/core/rmem_max
    /proc/sys/net/core/rmem_default
    /proc/sys/net/core/wmem_max
    /proc/sys/net/core/wmem_default
    /proc/sys/net/ipv4/tcp_mem
    /proc/sys/net/ipv4/tcp_rmem
    /proc/sys/net/ipv4/tcp_wmem
    ```
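These limits govern how large socket buffers may grow. As an illustration (a hedged sketch, not part of the original notes), an application can ask for a bigger receive buffer with `setsockopt()`; the kernel caps the request at `rmem_max`, and on Linux the value read back is doubled to account for bookkeeping overhead:

```python
import socket

# Sketch: request a larger receive buffer. The kernel caps the request
# at net.core.rmem_max; on Linux the granted value is doubled for
# bookkeeping overhead, so it may not match the request exactly.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(effective)  # actual size granted by the kernel
s.close()
```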

    **New API (NAPI)**

    The standard implementation of the network stack in Linux focuses more on
    reliability and low latency than on low overhead and high throughput.

    Gigabit Ethernet and modern applications can create thousands of packets per
    second, causing a large number of interrupts and context switches to occur.

    For the first packet, NAPI works just like the traditional implementation: it
    issues an interrupt for the first packet. But after the first packet, the
    interface goes into a polling mode. As long as there are packets in the DMA
    ring buffer of the network interface, no new interrupts will be caused,
    effectively reducing context switching and the associated overhead.
    Once the last packet is processed and the ring buffer is emptied, the
    interface card falls back into interrupt mode. NAPI also has the
    advantage of improved multiprocessor scalability by creating soft interrupts
    that can be handled by multiple processors.

    **Netfilter**

    You can manipulate and configure Netfilter using the iptables utility.

    - **Packet filtering**: If a packet matches a rule, Netfilter accepts or denies the
    packet or takes other appropriate action based on the defined rules.
    - **Address translation**: If a packet matches a rule, Netfilter alters the packet
    to meet the address translation requirements.

    **Netfilter Connection tracking**

    - NEW: a packet attempting to establish a new connection
    - ESTABLISHED: a packet that goes through an established connection
    - RELATED: a packet that is related to previous packets
    - INVALID: a packet in an unknown state due to being malformed or otherwise invalid

    **TCP/IP**

    * Connection establishment
    * Connection close
    - The client sends a FIN packet to the server to start the connection
    termination process.
    - The server sends an ACK of the FIN back, and then sends its own FIN packet
    to the client if it has no more data to send.
    - The client sends an ACK packet to the server to complete connection
    termination.
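The teardown above is visible to an application as a zero-length read: once the peer's FIN has arrived, `recv()` returns an empty byte string. A minimal loopback sketch (an illustration added here, assuming TCP on 127.0.0.1 is available):

```python
import socket
import threading

# Minimal sketch: when the peer closes its end (sending a FIN),
# recv() on our side returns b'' to signal end-of-stream.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)

def client(port):
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect(("127.0.0.1", port))
    c.close()  # triggers the FIN -> ACK/FIN -> ACK teardown

t = threading.Thread(target=client, args=(srv.getsockname()[1],))
t.start()
conn, _ = srv.accept()
data = conn.recv(1024)
print(data == b"")  # True: the peer closed the connection
t.join(); conn.close(); srv.close()
```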

    **Traffic control**

    - **TCP/IP transfer window**
    - Basically, the TCP transfer window is the maximum amount of data a given
    host can send or receive before requiring an ACK from the other side of
    the connection.
    - The window size is offered by the receiving host to the sending host
    via the window size field in the TCP header.

    - **Retransmission**
    - TCP/IP handles the timeouts and data retransmission problem by queuing
    packets and trying to send packets several times.

    **Offload**

    If the network adapter on your system supports hardware offload functionality,
    the kernel can offload part of its task to the adapter, which can reduce CPU
    utilization.

    - Checksum offload
    - TCP segmentation offload

    **Bonding module**

    **1.6 Understanding Linux performance metrics**

    **Processor metrics**
    - CPU utilization
    - User time
    - System time
    - Waiting
    - Idle time
    - Nice time
    - Load average
    - Runnable processes
    - Blocked
    - Context switch
    - Interrupts

    **Memory metrics**
    - Free memory
    - Swap usage
    - Buffer and cache
    - Slabs
    - Active versus inactive memory

    **Network interface metrics**
    - Packets received and sent
    - Bytes received and sent
    - Collisions per second
    - Packets dropped
    - Overruns
    - Errors

    **Block device metrics**
    - iowait
    - Average queue length
    - Average wait
    - Transfers per second
    - Blocks read/write per second
    - Kilobytes per second read/write

    **2 Monitoring and benchmark tools**

  3. @reterVision reterVision revised this gist Sep 22, 2014. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion notes.md
    @@ -312,7 +312,7 @@ direct impact on a server's performance.
    - Deadline
    - NOOP

    -** I/O device driver**
    +**I/O device driver**

    - SCSI (?)

  4. @reterVision reterVision revised this gist Sep 22, 2014. 1 changed file with 195 additions and 0 deletions.
    195 changes: 195 additions & 0 deletions notes.md
    @@ -106,6 +106,8 @@ Linux CPU scheduler
    ===

    **O(1)**
    http://en.wikipedia.org/wiki/O(1)_scheduler
    http://www.ibm.com/developerworks/library/l-completely-fair-scheduler/

    two process priority arrays

    @@ -123,3 +125,196 @@ arrays are switched, restarting the algorithm.
    1.2 Linux memory architecture
    ===

    32-bit architectures -- 4 GB address space (3 GB user space and 1 GB kernel
    space)
    64-bit architectures -- 512 GB or more for both user and kernel space.

    Virtual memory manager
    ===

    Applications do not allocate physical memory directly; they request a memory
    map of a certain size from the Linux kernel and in exchange receive a map in
    virtual memory.

    VM does not necessarily have to be mapped into physical memory. If your app
    allocates a large amount of memory, some of it might be mapped to the swap
    file on the disk subsystem.

    Applications usually do not write directly to the disk subsystem, but into
    cache or buffers.

    Page frame allocation
    ===

    A page is a group of contiguous linear addresses in physical memory (page frame)
    or virtual memory.

    A page is usually 4K bytes in size.
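The actual page size can be queried at run time; a small sketch for Unix-like systems (added here as an illustration):

```python
import os

# The system page size (typically 4 KiB on x86, 16 KiB on some ARM
# systems) can be queried through sysconf on Unix-like systems.
page_size = os.sysconf("SC_PAGE_SIZE")
print(page_size)
```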

    Buddy system
    ===

    The Linux kernel maintains its free pages by using a mechanism called a
    *buddy system*.

    The buddy system maintains free pages and tries to satisfy page allocation
    requests while keeping the memory area contiguous.

    When a page allocation attempt fails, page reclaiming is activated.
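The splitting half of the buddy system can be sketched as follows: a request is served from the smallest free block available, splitting larger blocks into "buddy" halves on the way down. This is a toy illustration added here, not the kernel's implementation:

```python
# Toy sketch of buddy-system allocation (not the kernel's code):
# free_lists[k] holds offsets of free blocks of size 2**k pages.
def buddy_alloc(free_lists, order):
    """Take a block of 2**order pages, splitting bigger blocks if needed."""
    for k in range(order, len(free_lists)):
        if free_lists[k]:
            block = free_lists[k].pop()
            # Split down to the requested order, freeing each buddy half.
            while k > order:
                k -= 1
                free_lists[k].append(block + (1 << k))
            return block
    return None  # allocation failed -> the kernel would start page reclaim

# Start with one free 16-page block (order 4) at offset 0.
free_lists = [[], [], [], [], [0]]
offset = buddy_alloc(free_lists, 0)
print(offset, free_lists)  # 0 [[1], [2], [4], [8], []]
```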

    Page frame reclaiming
    ===

    The *kswapd* kernel thread and the `try_to_free_page()` kernel function are
    responsible for page reclaiming.

    *kswapd* tries to find candidate pages to be taken out of the active pages
    based on the *LRU* principle.

    Pages are used mainly for two purposes: *page cache* and *process address space*.
    The page cache consists of pages mapped to files on disk.
    Pages that belong to a process address space are used for the heap and the stack.
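The LRU selection that *kswapd* relies on can be sketched with an ordered map (a deliberate simplification added here; the kernel's active/inactive lists are more elaborate):

```python
from collections import OrderedDict

# Simplified LRU sketch: touching a page moves it to the "recently
# used" end; reclaim evicts from the opposite end.
pages = OrderedDict()
for p in ["a", "b", "c"]:
    pages[p] = True               # page faulted in
pages.move_to_end("a")            # page "a" touched again
victim, _ = pages.popitem(last=False)  # reclaim the least recently used
print(victim)  # "b"
```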

    **swap**

    If the virtual memory manager in Linux realizes that a memory page has been
    allocated but not used for a significant amount of time, it moves this memory
    page to swap space.

    The fact that swap space is being used does not indicate a memory bottleneck;
    instead it proves how efficiently Linux handles system resources.

    1.3 Linux file systems
    ===

    Virtual file system
    ===

    VFS is an abstraction interface layer that resides between the user process
    and various types of Linux file system implementations.

    Journaling
    ===

    **non-journaling file system**
    *fsck* checks all the metadata and recovers consistency at the time of the
    next reboot. But when the system has a large volume, this takes a long time
    to complete. **The system is not operational during this process.**

    **journaling file system**
    A journaling file system writes data to be changed to an area called the
    journal area before writing the data to the actual file system. The journal
    area can be placed either inside or outside the file system. The data written
    to the journal area is called the journal log. It includes the changes to
    file system metadata and, where supported, the actual file data.

    Ext2
    ===

    The extended 2 file system is the predecessor of the extended 3 file system.

    * No journaling capabilities.
    * Starts with the boot sector and splits the entire file system into several
    small block groups. This contributes to a performance gain because the
    i-node table and the data blocks that hold user data can reside closer
    together on the disk platter, so seek time can be reduced.

    Ext3
    ===

    * Availability: Ext3 always writes data to the disks in a consistent way, so in
    case of an unclean shutdown, the server does not have to spend time checking
    the consistency of the data, thereby reducing system recovery from hours to
    seconds.
    * Data integrity: By specifying the journaling mode `data=journal` on the mount
    command, all data, both file data and metadata, is journaled.
    * Speed
    * Flexibility

    **Mode of journaling**
    * journal
    * ordered
    * writeback
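The journaling mode is selected at mount time with the `data=` option. As an illustration, a hypothetical `/etc/fstab` entry (device and mount point are made up) enabling full data journaling might look like:

```
/dev/sda2  /data  ext3  defaults,data=journal  0  2
```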

    1.4 Disk I/O subsystem
    ===

    Before a processor can decode and execute instructions, data should be retrieved
    all the way from sectors on a disk platter to the processor and its registers.
    The results of the executions can be written back to the disk.

    I/O subsystem architecture
    ===

    1. A process requests to write a file through the `write()` system call.
    2. The kernel updates the page cache mapped to the file.
    3. A `pdflush` kernel thread takes care of flushing the page cache to disk.
    4. The file system layer gathers the block buffers into a *bio* struct
    and submits a write request to the block device layer.
    5. The block device layer gets requests from upper layers and performs an I/O
    elevator operation and puts the requests into the I/O request queue.
    6. A device driver such as SCSI or other device specific drivers will take care
    of write operation.
    7. A disk device firmware performs hardware operations like seek head, rotation
    and data transfer to the sector on the platter.

    Cache
    ===

    **Memory hierarchy**

    L1 cache, L2 cache, L3 cache, RAM and some other caches between the CPU and
    disk.

    The higher the cache hit rate on faster memory is, the faster the access to
    the data.

    **Locality of reference**

    - The data most recently used has a high probability of being used in the near
    future (temporal locality).
    - The data that resides close to data which has been used has a high
    probability of being used (spatial locality).

    **Flushing a dirty buffer**

    When a process changes data, it changes the memory first, so at this time
    the data in memory and on disk is not identical, and the data in memory is
    referred to as a **dirty buffer**.

    The dirty buffer should be synchronized to the data on the disk as soon as
    possible, or the data in memory could be lost if a sudden crash occurs.

    The synchronization process for a dirty buffer is called **flush**.

    **kupdate** -- occurs on a regular basis.

    `/proc/sys/vm/dirty_background_ratio` -- the proportion of dirty buffers in
    memory at which background flushing starts.
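A hedged sketch of reading this tunable (added here as an illustration; it returns `None` where `/proc` is unavailable, e.g. on non-Linux systems):

```python
# Sketch: read the dirty-buffer background flush threshold (a
# percentage) on Linux; returns None if /proc is not available.
def dirty_background_ratio(path="/proc/sys/vm/dirty_background_ratio"):
    try:
        with open(path) as f:
            return int(f.read().strip())
    except OSError:
        return None

ratio = dirty_background_ratio()
print(ratio)
```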

    Block layer
    ===

    The block layer handles all the activity related to block device operation.

    The *bio* structure is an interface between the file system layer and the
    block layer.

    **Block sizes**

    The block size, the smallest amount of data that can be read from or written
    to a drive, can have a direct impact on a server's performance.

    **I/O elevator**

    - Anticipatory
    - Complete Fair Queuing
    - Deadline
    - NOOP

    ** I/O device driver**

    - SCSI (?)

    RAID and storage system
    ===
  5. @reterVision reterVision revised this gist Sep 22, 2014. 1 changed file with 84 additions and 1 deletion.
    85 changes: 84 additions & 1 deletion notes.md
    @@ -1,4 +1,4 @@
    -Linux process management
    +1.1 Linux process management
    ========================

    * process scheduling
    @@ -40,3 +40,86 @@ not need to copy resources on creation.

    Process priority and nice level
    ===

    Process priority is a number that determines the order in which the process
    is handled by the CPU; it is determined by dynamic priority and static priority.

    Linux supports `nice` levels from 19 (lowest priority) to -20 (highest priority).
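A process can inspect or raise its own nice level through the `nice()` system call; unprivileged processes may only increase it (i.e. lower their priority). A small sketch added as an illustration:

```python
import os

# nice(0) reads the current nice level without changing it; nice(5)
# would raise it by 5 (lower priority). Only privileged processes
# can decrease the nice value.
current = os.nice(0)
print(current)
```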

    Context switching
    ===

    During process execution, information on the running process is stored in
    registers on the processor and its cache. The set of data that is loaded to
    the register for the executing process is called the context.

    Interrupt handling
    ===

    The interrupt handler notifies the Linux kernel of an event. It tells the
    kernel to interrupt process execution and perform interrupt handling as
    quickly as possible because some devices require quick responsiveness.

    Interrupts cause `context switching`.

    In a multi-processor environment, interrupts are handled by each processor.
    Binding interrupts to a single physical processor could improve system
    performance.

    Process state
    ===

    Every process has its own state that shows what is currently happening in the
    process.

    * TASK_RUNNING
    * TASK_STOPPED
    * TASK_INTERRUPTIBLE
    * TASK_UNINTERRUPTIBLE
    * TASK_ZOMBIE
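On Linux these states surface as the one-letter `State:` field of `/proc/<pid>/status` (R, S, D, T, Z). A guarded sketch (added as an illustration) that reads the current process's own state:

```python
# Sketch: read this process's state from /proc/self/status on Linux
# (R=running, S=sleeping, D=uninterruptible, T=stopped, Z=zombie).
# Returns None where /proc is unavailable (e.g. non-Linux systems).
def my_state():
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("State:"):
                    return line.split()[1]  # e.g. "R"
    except OSError:
        return None

state = my_state()
print(state)
```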

    Zombie processes
    ===

    It is not possible to kill a zombie process with the kill command, because it
    is already considered dead. If you cannot get rid of a zombie, you can kill the
    parent process and then the zombie disappears as well.

    Process memory segments
    ===

    * Text segment
    * The area where executable code is stored
    * Data segment
    * The data segment consists of these three areas.
    - Data: The area where initialized data such as static variables are stored
    - BSS: The area where zero-initialized data is stored. The data is
    initialized to zero.
    * Heap segment
    - Heap: The area where `malloc()` allocates dynamic memory based on the
    demand. The heap grows towards higher addresses.
    * Stack segment
    * The area where local variables, function parameters, and the return
    address of a function are stored. The stack grows toward lower addresses.

    Linux CPU scheduler
    ===

    **O(1)**

    two process priority arrays

    * active
    * expired

    As processes are allocated a timeslice by the scheduler, based on their
    priority and prior blocking rate, they are placed in a list of processes for
    their priority in the active array. When they expire their timeslice, they are
    allocated a new timeslice and placed on the expired array.

    When all processes in the active array have expired their timeslice, the two
    arrays are switched, restarting the algorithm.
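The array-switching behaviour described above can be sketched as follows (a toy simplification added here, with a single priority level; the kernel keeps one list per priority):

```python
from collections import deque

# Toy sketch of the O(1) scheduler's two arrays: each task runs one
# timeslice, then moves to the expired array; when the active array
# empties, the arrays are swapped and the cycle restarts.
active = deque(["A", "B", "C"])
expired = deque()
order = []
for _ in range(6):            # run six timeslices
    if not active:            # everyone expired: swap the arrays
        active, expired = expired, active
    task = active.popleft()
    order.append(task)        # "run" the task for its timeslice
    expired.append(task)      # give it a fresh timeslice later
print(order)  # ['A', 'B', 'C', 'A', 'B', 'C']
```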

    1.2 Linux memory architecture
    ===

  6. @reterVision reterVision created this gist Sep 22, 2014.
    42 changes: 42 additions & 0 deletions notes.md
    @@ -0,0 +1,42 @@
    Linux process management
    ========================

    * process scheduling
    * interrupt handling
    * signaling
    * process prioritization
    * process switching
    * process state
    * process memory

    A process is an instance of execution that runs on a processor.

    `task_struct` -> `process descriptor`

    Life cycle of processes
    =======================

    `parent process` -> `fork()` -> `child process` -> `exec()` -> `child process`
    -> `exit()` -> `zombie process` -> `parent process`
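The `fork()` -> `exit()` -> `wait()` portion of this life cycle can be sketched on a Unix system (an illustration added here):

```python
import os

# Sketch of fork() -> exit() -> wait(): the child exits with a status
# code and stays a zombie until the parent reaps it with waitpid().
pid = os.fork()
if pid == 0:                        # child process
    os._exit(7)                     # becomes a zombie until reaped
_, status = os.waitpid(pid, 0)      # parent reaps the child
exit_code = os.WEXITSTATUS(status)
print(exit_code)  # 7
```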

    Copy On Write
    =============

    The kernel only assigns a new physical page to the child process when the
    child process calls `exec()`, which copies the new program into the address
    space of the child process.

    The child process will not be completely removed until the parent process
    learns of the termination of its child through the `wait()` system call.

    Thread
    ======

    A thread is an execution unit generated in a single process. It runs in
    parallel with other threads in the same process.

    Thread creation is less expensive than process creation because a thread does
    not need to copy resources on creation.
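Because threads share their process's address space, they can communicate through ordinary shared variables (guarded by a lock), which is part of what makes them cheaper than separate processes. A small sketch added as an illustration:

```python
import threading

# Threads share the process address space, so both workers update the
# same counter; the lock prevents lost updates from interleaving.
counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(10_000):
        with lock:
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 20000
```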

    Process priority and nice level
    ===