@congto
Forked from reterVision/notes.md
Created March 25, 2018 10:54

Revisions

  1. @reterVision reterVision revised this gist Sep 24, 2014. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions notes.md
    @@ -429,7 +429,8 @@ utilization.

    **Bonding module**

    -**1.6 Understanding Linux performance metrics**
    +1.6 Understanding Linux performance metrics
    +===

    **Processor metrics**
    - CPU utilization
    @@ -467,5 +468,6 @@ utilization.
    - Blocks read/write per second
    - Kilobytes per second read/write

    -**2 Monitoring and benchmark tools**
    +2 Monitoring and benchmark tools
    +===

  2. @reterVision reterVision revised this gist Sep 24, 2014. 1 changed file with 152 additions and 1 deletion.
    153 changes: 152 additions & 1 deletion notes.md
    @@ -316,5 +316,156 @@ direct impact on a server's performance.

    - SCSI (?)

    -RAID and storage system
    +1.4 RAID and storage system
    +===

    1.5 Network subsystem
    ===

    **Networking implementation**

    The socket provides an interface for user applications.

    1. When an application sends data to its peer host, the application creates its
    data
    2. The application opens the socket and writes the data through the socket
    interface.
    3. The *socket buffer* is used to deal with the transferred data. The socket
    buffer holds a reference to the data, and it is the buffer that goes down
    through the layers.
    4. In each layer, appropriate operations such as parsing the headers, adding
    and modifying the headers, checksums, routing operations, fragmentation, and
    so on are performed. When the socket buffer goes down through the layers,
    the data itself is not copied between the layers. Because copying actual
    data between different layers is not efficient, the kernel avoids unnecessary
    overhead by just changing the reference in the socket buffer and passing it
    to the next layer.
    5. Finally, the data goes out to the wire from the network interface card.
    6. The Ethernet frame arrives at the network interface of the peer host.
    7. The frame is moved into the network interface card buffer if the MAC address
    matches the MAC address of the interface card.
    8. The network interface card eventually moves the packet into a socket buffer
    and issues a hard interrupt to the CPU.
    9. The CPU then processes the packet and moves it up through the layers until
    it arrives at (for example) a TCP port of an application such as Apache.

    **Socket buffer**

    ```
    /proc/sys/net/core/rmem_max
    /proc/sys/net/core/rmem_default
    /proc/sys/net/core/wmem_max
    /proc/sys/net/core/wmem_default
    /proc/sys/net/ipv4/tcp_mem
    /proc/sys/net/ipv4/tcp_rmem
    /proc/sys/net/ipv4/tcp_wmem
    ```
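These limits govern how large socket buffers may grow. As an illustration (a hedged sketch, not part of the original notes), an application can ask for a bigger receive buffer with `setsockopt()`; the kernel caps the request at `rmem_max`, and on Linux the value read back is doubled to account for bookkeeping overhead:

```python
import socket

# Sketch: request a larger receive buffer. The kernel caps the request
# at net.core.rmem_max; on Linux the granted value is doubled for
# bookkeeping overhead, so it may not match the request exactly.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(effective)  # actual size granted by the kernel
s.close()
```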

    **New API (NAPI)**

    The standard implementation of the network stack in Linux focuses more on
    reliability and low latency than on low overhead and high throughput.

    Gigabit Ethernet and modern applications can create thousands of packets per
    second, causing a large number of interrupts and context switches to occur.

    For the first packet, NAPI works just like the traditional implementation: it
    issues an interrupt for the first packet. But after the first packet, the
    interface goes into a polling mode. As long as there are packets in the DMA
    ring buffer of the network interface, no new interrupts will be caused,
    effectively reducing context switching and the associated overhead.
    Once the last packet is processed and the ring buffer is emptied, the
    interface card falls back into interrupt mode. NAPI also has the
    advantage of improved multiprocessor scalability by creating soft interrupts
    that can be handled by multiple processors.

    **Netfilter**

    You can manipulate and configure Netfilter using the iptables utility.

    - **Packet filtering**: If a packet matches a rule, Netfilter accepts or denies the
    packet or takes other appropriate action based on the defined rules.
    - **Address translation**: If a packet matches a rule, Netfilter alters the packet
    to meet the address translation requirements.

    **Netfilter Connection tracking**

    - NEW: a packet attempting to establish a new connection
    - ESTABLISHED: a packet that goes through an established connection
    - RELATED: a packet that is related to previous packets
    - INVALID: a packet in an unknown state due to being malformed or otherwise invalid

    **TCP/IP**

    * Connection establishment
    * Connection close
    - The client sends a FIN packet to the server to start the connection
    termination process.
    - The server sends an ACK of the FIN back, and then sends its own FIN packet
    to the client if it has no more data to send.
    - The client sends an ACK packet to the server to complete connection
    termination.
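The teardown above is visible to an application as a zero-length read: once the peer's FIN has arrived, `recv()` returns an empty byte string. A minimal loopback sketch (an illustration added here, assuming TCP on 127.0.0.1 is available):

```python
import socket
import threading

# Minimal sketch: when the peer closes its end (sending a FIN),
# recv() on our side returns b'' to signal end-of-stream.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)

def client(port):
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect(("127.0.0.1", port))
    c.close()  # triggers the FIN -> ACK/FIN -> ACK teardown

t = threading.Thread(target=client, args=(srv.getsockname()[1],))
t.start()
conn, _ = srv.accept()
data = conn.recv(1024)
print(data == b"")  # True: the peer closed the connection
t.join(); conn.close(); srv.close()
```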

    **Traffic control**

    - **TCP/IP transfer window**
    - Basically, the TCP transfer window is the maximum amount of data a given
    host can send or receive before requiring an ACK from the other side of
    the connection.
    - The window size is offered by the receiving host to the sending host
    via the window size field in the TCP header.

    - **Retransmission**
    - TCP/IP handles the timeouts and data retransmission problem by queuing
    packets and trying to send packets several times.

    **Offload**

    If the network adapter on your system supports hardware offload functionality,
    the kernel can offload part of its task to the adapter, which can reduce CPU
    utilization.

    - Checksum offload
    - TCP segmentation offload

    **Bonding module**

    **1.6 Understanding Linux performance metrics**

    **Processor metrics**
    - CPU utilization
    - User time
    - System time
    - Waiting
    - Idle time
    - Nice time
    - Load average
    - Runnable processes
    - Blocked
    - Context switch
    - Interrupts

    **Memory metrics**
    - Free memory
    - Swap usage
    - Buffer and cache
    - Slabs
    - Active versus inactive memory

    **Network interface metrics**
    - Packets received and sent
    - Bytes received and sent
    - Collisions per second
    - Packets dropped
    - Overruns
    - Errors

    **Block device metrics**
    - iowait
    - Average queue length
    - Average wait
    - Transfers per second
    - Blocks read/write per second
    - Kilobytes per second read/write

    **2 Monitoring and benchmark tools**

  3. @reterVision reterVision revised this gist Sep 22, 2014. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion notes.md
    @@ -312,7 +312,7 @@ direct impact on a server's performance.
    - Deadline
    - NOOP

    -** I/O device driver**
    +**I/O device driver**

    - SCSI (?)

  4. @reterVision reterVision revised this gist Sep 22, 2014. 1 changed file with 195 additions and 0 deletions.
    195 changes: 195 additions & 0 deletions notes.md
    @@ -106,6 +106,8 @@ Linux CPU scheduler
    ===

    **O(1)**
    http://en.wikipedia.org/wiki/O(1)_scheduler
    http://www.ibm.com/developerworks/library/l-completely-fair-scheduler/

    two process priority arrays

    @@ -123,3 +125,196 @@ arrays are switched, restarting the algorithm.
    1.2 Linux memory architecture
    ===

    32-bit architectures -- 4 GB address space (3 GB user space and 1 GB kernel
    space)
    64-bit architectures -- 512 GB or more for both user and kernel space.

    Virtual memory manager
    ===

    Applications do not allocate physical memory directly; they request a memory
    map of a certain size from the Linux kernel and in exchange receive a map in
    virtual memory.

    VM does not necessarily have to be mapped into physical memory. If your app
    allocates a large amount of memory, some of it might be mapped to the swap
    file on the disk subsystem.

    Applications usually do not write directly to the disk subsystem, but into
    cache or buffers.

    Page frame allocation
    ===

    A page is a group of contiguous linear addresses in physical memory (page frame)
    or virtual memory.

    A page is usually 4K bytes in size.
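The actual page size can be queried at run time; a small sketch for Unix-like systems (added here as an illustration):

```python
import os

# The system page size (typically 4 KiB on x86, 16 KiB on some ARM
# systems) can be queried through sysconf on Unix-like systems.
page_size = os.sysconf("SC_PAGE_SIZE")
print(page_size)
```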

    Buddy system
    ===

    The Linux kernel maintains its free pages by using a mechanism called a
    *buddy system*.

    The buddy system maintains free pages and tries to satisfy page allocation
    requests while keeping the memory area contiguous.

    When a page allocation attempt fails, page reclaiming is activated.
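The splitting half of the buddy system can be sketched as follows: a request is served from the smallest free block available, splitting larger blocks into "buddy" halves on the way down. This is a toy illustration added here, not the kernel's implementation:

```python
# Toy sketch of buddy-system allocation (not the kernel's code):
# free_lists[k] holds offsets of free blocks of size 2**k pages.
def buddy_alloc(free_lists, order):
    """Take a block of 2**order pages, splitting bigger blocks if needed."""
    for k in range(order, len(free_lists)):
        if free_lists[k]:
            block = free_lists[k].pop()
            # Split down to the requested order, freeing each buddy half.
            while k > order:
                k -= 1
                free_lists[k].append(block + (1 << k))
            return block
    return None  # allocation failed -> the kernel would start page reclaim

# Start with one free 16-page block (order 4) at offset 0.
free_lists = [[], [], [], [], [0]]
offset = buddy_alloc(free_lists, 0)
print(offset, free_lists)  # 0 [[1], [2], [4], [8], []]
```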

    Page frame reclaiming
    ===

    The *kswapd* kernel thread and the `try_to_free_page()` kernel function are
    responsible for page reclaiming.

    *kswapd* tries to find candidate pages to be taken out of the active pages
    based on the *LRU* principle.

    Pages are used mainly for two purposes: *page cache* and *process address space*.
    The page cache consists of pages mapped to files on disk.
    Pages that belong to a process address space are used for the heap and the stack.
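The LRU selection that *kswapd* relies on can be sketched with an ordered map (a deliberate simplification added here; the kernel's active/inactive lists are more elaborate):

```python
from collections import OrderedDict

# Simplified LRU sketch: touching a page moves it to the "recently
# used" end; reclaim evicts from the opposite end.
pages = OrderedDict()
for p in ["a", "b", "c"]:
    pages[p] = True               # page faulted in
pages.move_to_end("a")            # page "a" touched again
victim, _ = pages.popitem(last=False)  # reclaim the least recently used
print(victim)  # "b"
```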

    **swap**

    If the virtual memory manager in Linux realizes that a memory page has been
    allocated but not used for a significant amount of time, it moves this memory
    page to swap space.

    The fact that swap space is being used does not indicate a memory bottleneck;
    instead it proves how efficiently Linux handles system resources.

    1.3 Linux file systems
    ===

    Virtual file system
    ===

    VFS is an abstraction interface layer that resides between the user process
    and various types of Linux file system implementations.

    Journaling
    ===

    **non-journaling file system**
    *fsck* checks all the metadata and recovers consistency at the time of the
    next reboot. But when the system has a large volume, this takes a long time
    to complete. **The system is not operational during this process.**

    **journaling file system**
    A journaling file system writes data to be changed to an area called the
    journal area before writing the data to the actual file system. The journal
    area can be placed either inside or outside the file system. The data written
    to the journal area is called the journal log. It includes the changes to
    file system metadata and, where supported, the actual file data.

    Ext2
    ===

    The extended 2 file system is the predecessor of the extended 3 file system.

    * No journaling capabilities.
    * Starts with the boot sector and splits the entire file system into several
    small block groups. This contributes to a performance gain because the
    i-node table and the data blocks that hold user data can reside closer
    together on the disk platter, so seek time can be reduced.

    Ext3
    ===

    * Availability: Ext3 always writes data to the disks in a consistent way, so in
    case of an unclean shutdown, the server does not have to spend time checking
    the consistency of the data, thereby reducing system recovery from hours to
    seconds.
    * Data integrity: By specifying the journaling mode `data=journal` on the mount
    command, all data, both file data and metadata, is journaled.
    * Speed
    * Flexibility

    **Mode of journaling**
    * journal
    * ordered
    * writeback
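The journaling mode is selected at mount time with the `data=` option. As an illustration, a hypothetical `/etc/fstab` entry (device and mount point are made up) enabling full data journaling might look like:

```
/dev/sda2  /data  ext3  defaults,data=journal  0  2
```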

    1.4 Disk I/O subsystem
    ===

    Before a processor can decode and execute instructions, data should be retrieved
    all the way from sectors on a disk platter to the processor and its registers.
    The results of the executions can be written back to the disk.

    I/O subsystem architecture
    ===

    1. A process requests to write a file through the `write()` system call.
    2. The kernel updates the page cache mapped to the file.
    3. A `pdflush` kernel thread takes care of flushing the page cache to disk.
    4. The file system layer gathers the block buffers into a *bio* struct
    and submits a write request to the block device layer.
    5. The block device layer gets requests from upper layers and performs an I/O
    elevator operation and puts the requests into the I/O request queue.
    6. A device driver such as SCSI or other device specific drivers will take care
    of write operation.
    7. A disk device firmware performs hardware operations like seek head, rotation
    and data transfer to the sector on the platter.

    Cache
    ===

    **Memory hierarchy**

    L1 cache, L2 cache, L3 cache, RAM and some other caches between the CPU and
    disk.

    The higher the cache hit rate on faster memory is, the faster the access to
    the data.

    **Locality of reference**

    - The data most recently used has a high probability of being used in the near
    future (temporal locality).
    - The data that resides close to data which has been used has a high
    probability of being used (spatial locality).

    **Flushing a dirty buffer**

    When a process changes data, it changes the memory first, so at this time
    the data in memory and on disk is not identical, and the data in memory is
    referred to as a **dirty buffer**.

    The dirty buffer should be synchronized to the data on the disk as soon as
    possible, or the data in memory could be lost if a sudden crash occurs.

    The synchronization process for a dirty buffer is called **flush**.

    **kupdate** -- occurs on a regular basis.

    `/proc/sys/vm/dirty_background_ratio` -- the proportion of dirty buffers in
    memory at which background flushing starts.
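A hedged sketch of reading this tunable (added here as an illustration; it returns `None` where `/proc` is unavailable, e.g. on non-Linux systems):

```python
# Sketch: read the dirty-buffer background flush threshold (a
# percentage) on Linux; returns None if /proc is not available.
def dirty_background_ratio(path="/proc/sys/vm/dirty_background_ratio"):
    try:
        with open(path) as f:
            return int(f.read().strip())
    except OSError:
        return None

ratio = dirty_background_ratio()
print(ratio)
```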

    Block layer
    ===

    The block layer handles all the activity related to block device operation.

    The *bio* structure is an interface between the file system layer and the
    block layer.

    **Block sizes**

    The block size, the smallest amount of data that can be read from or written
    to a drive, can have a direct impact on a server's performance.

    **I/O elevator**

    - Anticipatory
    - Complete Fair Queuing
    - Deadline
    - NOOP

    ** I/O device driver**

    - SCSI (?)

    RAID and storage system
    ===
  5. @reterVision reterVision revised this gist Sep 22, 2014. 1 changed file with 84 additions and 1 deletion.
    85 changes: 84 additions & 1 deletion notes.md
    @@ -1,4 +1,4 @@
    -Linux process management
    +1.1 Linux process management
    ========================

    * process scheduling
    @@ -40,3 +40,86 @@ not need to copy resources on creation.

    Process priority and nice level
    ===

    Process priority is a number that determines the order in which the process
    is handled by the CPU; it is determined by dynamic priority and static priority.

    Linux supports `nice` levels from 19 (lowest priority) to -20 (highest priority).
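A process can inspect or raise its own nice level through the `nice()` system call; unprivileged processes may only increase it (i.e. lower their priority). A small sketch added as an illustration:

```python
import os

# nice(0) reads the current nice level without changing it; nice(5)
# would raise it by 5 (lower priority). Only privileged processes
# can decrease the nice value.
current = os.nice(0)
print(current)
```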

    Context switching
    ===

    During process execution, information on the running process is stored in
    registers on the processor and its cache. The set of data that is loaded to
    the register for the executing process is called the context.

    Interrupt handling
    ===

    The interrupt handler notifies the Linux kernel of an event. It tells the
    kernel to interrupt process execution and perform interrupt handling as
    quickly as possible because some devices require quick responsiveness.

    Interrupts cause `context switching`.

    In a multi-processor environment, interrupts are handled by each processor.
    Binding interrupts to a single physical processor could improve system
    performance.

    Process state
    ===

    Every process has its own state that shows what is currently happening in the
    process.

    * TASK_RUNNING
    * TASK_STOPPED
    * TASK_INTERRUPTIBLE
    * TASK_UNINTERRUPTIBLE
    * TASK_ZOMBIE
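On Linux these states surface as the one-letter `State:` field of `/proc/<pid>/status` (R, S, D, T, Z). A guarded sketch (added as an illustration) that reads the current process's own state:

```python
# Sketch: read this process's state from /proc/self/status on Linux
# (R=running, S=sleeping, D=uninterruptible, T=stopped, Z=zombie).
# Returns None where /proc is unavailable (e.g. non-Linux systems).
def my_state():
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("State:"):
                    return line.split()[1]  # e.g. "R"
    except OSError:
        return None

state = my_state()
print(state)
```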

    Zombie processes
    ===

    It is not possible to kill a zombie process with the kill command, because it
    is already considered dead. If you cannot get rid of a zombie, you can kill the
    parent process and then the zombie disappears as well.

    Process memory segments
    ===

    * Text segment
    * The area where executable code is stored
    * Data segment
    * The data segment consists of these three areas.
    - Data: The area where initialized data such as static variables are stored
    - BSS: The area where zero-initialized data is stored. The data is
    initialized to zero.
    * Heap segment
    - Heap: The area where `malloc()` allocates dynamic memory based on the
    demand. The heap grows towards higher addresses.
    * Stack segment
    * The area where local variables, function parameters, and the return
    address of a function are stored. The stack grows toward lower addresses.

    Linux CPU scheduler
    ===

    **O(1)**

    two process priority arrays

    * active
    * expired

    As processes are allocated a timeslice by the scheduler, based on their
    priority and prior blocking rate, they are placed in a list of processes for
    their priority in the active array. When they expire their timeslice, they are
    allocated a new timeslice and placed on the expired array.

    When all processes in the active array have expired their timeslice, the two
    arrays are switched, restarting the algorithm.
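The array-switching behaviour described above can be sketched as follows (a toy simplification added here, with a single priority level; the kernel keeps one list per priority):

```python
from collections import deque

# Toy sketch of the O(1) scheduler's two arrays: each task runs one
# timeslice, then moves to the expired array; when the active array
# empties, the arrays are swapped and the cycle restarts.
active = deque(["A", "B", "C"])
expired = deque()
order = []
for _ in range(6):            # run six timeslices
    if not active:            # everyone expired: swap the arrays
        active, expired = expired, active
    task = active.popleft()
    order.append(task)        # "run" the task for its timeslice
    expired.append(task)      # give it a fresh timeslice later
print(order)  # ['A', 'B', 'C', 'A', 'B', 'C']
```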

    1.2 Linux memory architecture
    ===

  6. @reterVision reterVision created this gist Sep 22, 2014.
    42 changes: 42 additions & 0 deletions notes.md
    @@ -0,0 +1,42 @@
    Linux process management
    ========================

    * process scheduling
    * interrupt handling
    * signaling
    * process prioritization
    * process switching
    * process state
    * process memory

    A process is an instance of execution that runs on a processor.

    `task_struct` -> `process descriptor`

    Life cycle of processes
    =======================

    `parent process` -> `fork()` -> `child process` -> `exec()` -> `child process`
    -> `exit()` -> `zombie process` -> `parent process`
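The `fork()` -> `exit()` -> `wait()` portion of this life cycle can be sketched on a Unix system (an illustration added here):

```python
import os

# Sketch of fork() -> exit() -> wait(): the child exits with a status
# code and stays a zombie until the parent reaps it with waitpid().
pid = os.fork()
if pid == 0:                        # child process
    os._exit(7)                     # becomes a zombie until reaped
_, status = os.waitpid(pid, 0)      # parent reaps the child
exit_code = os.WEXITSTATUS(status)
print(exit_code)  # 7
```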

    Copy On Write
    =============

    The kernel only assigns a new physical page to the child process when the
    child process calls `exec()`, which copies the new program into the address
    space of the child process.

    The child process will not be completely removed until the parent process
    learns of the termination of its child through the `wait()` system call.

    Thread
    ======

    A thread is an execution unit generated in a single process. It runs in
    parallel with other threads in the same process.

    Thread creation is less expensive than process creation because a thread does
    not need to copy resources on creation.
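Because threads share their process's address space, they can communicate through ordinary shared variables (guarded by a lock), which is part of what makes them cheaper than separate processes. A small sketch added as an illustration:

```python
import threading

# Threads share the process address space, so both workers update the
# same counter; the lock prevents lost updates from interleaving.
counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(10_000):
        with lock:
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 20000
```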

    Process priority and nice level
    ===