Wednesday, January 13, 2016

Salomon Singer - Mutexes & Semaphores Demystified @ Barr Group, 13 May 2015

[Summary slide @ 46:48]

[Summary slide @ 47:16]

For the 54-minute video: https://vimeo.com/128176447

Salomon Singer, mailto:ssinger@barrgroup.com, is a principal engineer at the Barr Group with 30+ years of experience in embedded systems. In this excellent talk, he explains mutexes and semaphores and emphasizes the differences between them. Please note that there is a lot of wrong information on the Internet about how mutexes and semaphores differ!

I took the following notes while watching the talk:

11:25  Salomon Singer starts his talk after Michael Barr's introduction on

       the Barr Group etc.

12:45  Code at risk due to shared resources is called a critical section. Left

       unprotected, it can lead to a race condition.

15:00  Tasks are like cars approaching an intersection. Need a protocol to

       prevent mishaps.

16:00  Mutual Exclusion: the programmer must be involved in protecting

       the critical sections.

17:00  A mutex is an operating system data structure to ensure that tasks have

       exclusive access to shared variables or hardware resources. It is 
       short for "mutual exclusion". It is used to prevent race conditions.

17:30  Pseudocode:

           Acquire mutex
           // Critical section code
           Release mutex
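
       As a concrete illustration of this protocol (my example, not the
       speaker's code), here is the same pattern with POSIX threads; the
       shared counter and worker function are hypothetical:

           #include <pthread.h>
           #include <stdio.h>

           static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
           static int shared_counter = 0;           /* shared resource */

           static void *worker(void *arg)
           {
               (void)arg;
               for (int i = 0; i < 100000; i++) {
                   pthread_mutex_lock(&lock);       /* Acquire mutex    */
                   shared_counter++;                /* Critical section */
                   pthread_mutex_unlock(&lock);     /* Release mutex    */
               }
               return NULL;
           }

           int main(void)
           {
               pthread_t t1, t2;
               pthread_create(&t1, NULL, worker, NULL);
               pthread_create(&t2, NULL, worker, NULL);
               pthread_join(t1, NULL);
               pthread_join(t2, NULL);
               printf("counter = %d\n", shared_counter); /* 200000 every run */
               return 0;
           }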

17:55  Each mutex protects one shared resource.


18:30  Bathroom key analogy: only one person may use the bathroom at a time.

           Mutex: bathroom key
           Tasks: customers
           Shared resource: bathroom

19:00  One conceptual implementation:

           typedef struct {
               TCB *p_block_q; // queue of blocked tasks: a linked list
                               // of waiting tasks
                               // NULL if there is no waiting task
               int value;      // current mutex state: 1 -> available
           } mutex_t;          //                      0 -> not available

20:00  Each mutex is created at the request of a task. Pseudocode:

          // allocate space for a mutex
          m.p_block_q = (TCB *)NULL;  // no tasks waiting
          m.value = 1;                // initially available
          return &m;                  // return a handle

20:42  Mutexes are a totally scalable solution: if N tasks try to acquire the

       mutex, only the first one will succeed and the other N - 1 tasks will
       wait in the queue. When a task releases the mutex, one task from the
       wait queue will be made ready.
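
       A conceptual sketch (my reading of this behavior, not the speaker's
       code) of how acquire and release could work on the mutex_t shown at
       19:00; the helpers below are hypothetical RTOS internals:

           // Hypothetical RTOS internals (assumptions for this sketch):
           extern void disable_interrupts(void);
           extern void enable_interrupts(void);
           extern void block_current_task(TCB **p_queue); // enqueue caller
           extern void wake_one_task(TCB **p_queue);      // ready one waiter

           void mutex_acquire(mutex_t *p_m)
           {
               disable_interrupts();               // make the update atomic
               if (p_m->value == 1) {
                   p_m->value = 0;                 // grab it: now unavailable
                   enable_interrupts();
               } else {
                   // Join the wait queue; the scheduler runs other tasks and
                   // re-enables interrupts. When we run again, the releasing
                   // task has handed ownership directly to us.
                   block_current_task(&p_m->p_block_q);
                   enable_interrupts();
               }
           }

           void mutex_release(mutex_t *p_m)
           {
               disable_interrupts();
               if (p_m->p_block_q != (TCB *)NULL) {
                   wake_one_task(&p_m->p_block_q); // hand ownership over;
                                                   // value stays 0
               } else {
                   p_m->value = 1;                 // no waiters: available
               }
               enable_interrupts();
           }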

21:26  Summary: a mutex is created in the available state. Always use a mutex

       by protocol. Never release a mutex previously not acquired.

22:32  What if two tasks call the same function at about the same time?

       Each such function must be reentrant! Non-reentrant functions are
       the source of many problems that are extremely difficult if not 
       impossible to reproduce.

24:30  Requirements for reentrancy:

          - Safe to use automatic variables: each task has its own stack.
          - Unsafe to use static or dynamic variables unless they are
            protected with mutexes (see the sketch below).
          - Peripheral registers: access must always be protected!
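
       For example (my illustration, with hypothetical function names), a
       function that returns a pointer to a static buffer is not reentrant,
       while a version that uses only the caller's storage and its own
       automatic variables is:

           #include <stdio.h>

           // NOT reentrant: two tasks calling this at about the same time
           // will clobber each other's result in the shared static buffer.
           char *format_id_bad(int id)
           {
               static char buf[32];           // static: shared by all callers
               snprintf(buf, sizeof(buf), "ID-%04d", id);
               return buf;
           }

           // Reentrant: the caller supplies the storage, and only automatic
           // variables (on each task's own stack) are used internally.
           void format_id_good(int id, char *buf, size_t len)
           {
               snprintf(buf, len, "ID-%04d", id);
           }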

25:30  Example: TCP/IP networking. The network driver controls a single

       chip, which is effectively a shared global resource.

26:50  Reentrant functions are the best places to use mutexes.


27:27  Where and when to use mutexes:

           Rule 1: Use mutexes in "leaf node" source files, i.e. in source
                   files that don't make calls to any other source files.
           Rule 2: Avoid mutexes in "task level" source files. Prefer 
                   directional intertask communication like semaphores, 
                   mailboxes or message queues.

28:40  What is a semaphore? It is an RTOS-provided data structure, at its core

       an integer counter. It is safe for parallel-running tasks: all the data
       structure updates inside the RTOS are atomic.

29:15  Types of Semaphores:

            - Counting Semaphore: a multitasking-safe counter. Can be
              incremented/decremented atomically via API calls (see the
              sketch below).
            - Binary Semaphore: can be set to 1 or 0 atomically via API
              calls.
           - Mutual Exclusion Semaphore (Mutex): A special binary semaphore
             with additional capabilities.
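
       A sketch of the counting-semaphore case, using a FreeRTOS-style API
       purely for illustration (the talk does not name an RTOS); the task
       names and the limit of 16 items are assumptions:

           #include "FreeRTOS.h"
           #include "semphr.h"

           static SemaphoreHandle_t items_available;

           void producer_task(void *params)
           {
               (void)params;
               for (;;) {
                   /* ... put one item into a shared buffer ... */
                   xSemaphoreGive(items_available);  /* count it, atomically */
               }
           }

           void consumer_task(void *params)
           {
               (void)params;
               for (;;) {
                   /* block until at least one item has been counted */
                   xSemaphoreTake(items_available, portMAX_DELAY);
                   /* ... remove one item from the shared buffer ... */
               }
           }

           void counting_init(void)
           {
               /* at most 16 outstanding items, initially 0 */
               items_available = xSemaphoreCreateCounting(16, 0);
           }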

30:10  One conceptual implementation:

           typedef struct {
               TCB *p_block_q; // queue of blocked tasks: a linked list
                               // of waiting tasks
                               // NULL if there is no waiting task
               int value;      // current state: 0..N
           } sem_t;

30:45  OS Implementer: similarities keep coming!

       OS user: try to forget you ever saw this!

30:55  Because semaphores and mutexes have completely different uses!


31:20  ISRs can only make non-blocking calls:

           - It is never safe to wait for a semaphore.
           - However, it is safe to signal a semaphore: that will never block!

31:55  Tasks can safely wait and signal a semaphore.
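
       A minimal sketch of this signalling pattern, again in a FreeRTOS-style
       API for illustration only; the UART names and handlers are hypothetical:

           #include "FreeRTOS.h"
           #include "semphr.h"
           #include "task.h"

           static SemaphoreHandle_t data_ready;   /* binary semaphore, init 0 */

           void uart_rx_isr(void)                 /* hypothetical ISR */
           {
               BaseType_t woken = pdFALSE;
               /* Signalling never blocks, so it is safe from an ISR. */
               xSemaphoreGiveFromISR(data_ready, &woken);
               portYIELD_FROM_ISR(woken);
           }

           void uart_task(void *params)
           {
               (void)params;
               for (;;) {
                   /* Waiting may block, so only a task (never an ISR) waits. */
                   if (xSemaphoreTake(data_ready, portMAX_DELAY) == pdTRUE) {
                       /* ... process the received data ... */
                   }
               }
           }

           void uart_init(void)
           {
               data_ready = xSemaphoreCreateBinary();  /* created empty (0) */
           }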

32:00  Mutexes vs Semaphores:

            - Mutexes are to be used only for mutual exclusion. Always
              init to 1; wait() first and then leave().

32:30      - Semaphores are for one-to-one signalling. Can be init to any

             value. Typically init to 0. There is no prescribed order to
             use the API.

34:10  Problem: multiple identical resources. Analogy: 2+ bathrooms.


34:35  If you Google this problem, many sources say a counting semaphore will

       work! They are all wrong. A counting semaphore can't protect 2+ 
       resources! Need to know which one is available.
  
34:55  The right solution is to use a distinct mutex for each bathroom: one
       key for the women's bathroom and another key for the men's.
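
       One possible way (my illustration, not necessarily the speaker's exact
       scheme) to manage several interchangeable resources is one mutex per
       resource plus a try-lock to learn which one is free; shown here with
       POSIX threads and two "bathrooms":

           #include <pthread.h>
           #include <sched.h>

           static pthread_mutex_t bathroom[2] = {
               PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
           };

           /* Returns the index of the bathroom we now hold the key for. */
           int acquire_any_bathroom(void)
           {
               for (;;) {
                   for (int i = 0; i < 2; i++) {
                       if (pthread_mutex_trylock(&bathroom[i]) == 0) {
                           return i;        /* this one was available */
                       }
                   }
                   sched_yield();           /* none free yet: let others run */
               }
           }

           void release_bathroom(int i)
           {
               pthread_mutex_unlock(&bathroom[i]);
           }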

35:28  Priority inversion problem.

       A high-priority task needs a resource being used by a low-priority task,
       so it waits for the low-priority task. However, a medium-priority task,
       which has a higher priority than the low-priority task, preempts it.
       Consequently, the high-priority task waits for the low-priority task
       longer than necessary. This may cause the high-priority task to miss a
       deadline.

37:48  What are the implications of priority inversion? It is impossible to

       be certain that all deadlines will be met! Thus, we need a way to 
       prevent priority inversions.

38:35  Priority inversion workarounds for mutexes:

            - Highest locker or priority ceiling protocol. Can be implemented
              above the RTOS. Each shared resource is assigned a priority just
              above its users' priorities. Calls to raise and lower task
              priorities are used instead of mutex calls (see the sketch
              after this list).

39:28      - Priority inheritance protocol. Can be implemented only in the

             RTOS. The priority of a task that holds a resource is increased 
             automatically but only when a high-priority task wants the resource
             and only until the low-priority task releases it. Upon completion, 
             the priority of the task will be demoted. It scales nicely.
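
       A sketch of the "highest locker" idea from 38:35, implemented above the
       RTOS; the priority values and the task_* calls are hypothetical (most
       RTOSes offer equivalents), and larger numbers are assumed to mean
       higher priority:

           /* Hypothetical RTOS priority API (assumptions for this sketch): */
           extern int  task_self(void);
           extern int  task_get_priority(int task_id);
           extern void task_set_priority(int task_id, int priority);

           /* Chosen just above the highest-priority user of the printer. */
           #define PRINTER_CEILING  10

           static int saved_priority;

           void printer_lock(void)
           {
               /* Raising ourselves to the ceiling means no other user of the
                  printer can preempt us while we hold it, so no mutex call
                  is needed and priority inversion cannot occur. */
               saved_priority = task_get_priority(task_self());
               task_set_priority(task_self(), PRINTER_CEILING);
           }

           void printer_unlock(void)
           {
               task_set_priority(task_self(), saved_priority);  /* restore */
           }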

40:00  Priority ceiling example.


41:18  Priority inheritance example.


42:43  Priority inheritance scales nicely example.


44:34  Mars Pathfinder story: a priority inversion example. VxWorks has a mutex

       priority inheritance option, but it wasn't turned on!

47:50  Starts answering audience questions.









  
