PHPFixing

Showing posts with label mutex.

Tuesday, November 1, 2022

[FIXED] Why is the performance of rust tokio so poor? [release test result updated]

 November 01, 2022     channel, mutex, performance, rust, rust-tokio

Issue

The following scenarios are frequently used in asynchronous programming.

  • channel tx/rx;
  • mutex lock/unlock;
  • async task spawn;

So I ran some comparison tests on a low-performance cloud host (roughly equivalent to a J1900) as follows. I found that the performance of rust-tokio is very, very poor compared to go-lang's.

Is there any parameter that needs to be adjusted? Can a single-threaded executor improve it?
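For reference, the executor flavor for such tests can be selected explicitly; a minimal sketch using tokio 1.x's attribute syntax, where #[tokio::test] defaults to a single-threaded current_thread runtime:

    // Selecting the executor flavor explicitly (tokio 1.x).
    // #[tokio::test] defaults to a single-threaded (current_thread) runtime;
    // the multi-threaded scheduler must be requested explicitly.
    #[tokio::test(flavor = "multi_thread", worker_threads = 4)]
    async fn bench_on_multi_thread() {
        // ... benchmark body ...
    }

    #[tokio::test] // defaults to flavor = "current_thread"
    async fn bench_on_current_thread() {
        // ... benchmark body ...
    }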

Results:

  • tx/rx, time per op:

    go-lang: 112 ns;

    tokio::sync::mpsc::channel: 7387 ns;

    std::sync::mpsc::channel: 2705 ns;

    crossbeam: 1062 ns.

  • mutex lock/unlock, per op:

    tokio::sync::Mutex 4051 ns

    std::sync::Mutex 321 ns

  • spawn (not join), per op:

    tokio::spawn: 8445 ns

Rust tokio channel tx/rx test:

    #[tokio::test]
    async fn test_chan_benchmark() {
        let count = 100_000;
        let (tx, mut rx) = tokio::sync::mpsc::channel(10000);
        let start = std::time::SystemTime::now();
        let handle = tokio::spawn(async move {
            loop {
                let i = rx.recv().await.unwrap();
                if i == count - 1 {
                    break;
                }
            }
        });

        for i in 0..count {
            tx.send(i).await.unwrap();
        }
        drop(tx);

        handle.await.unwrap();
        let stop = std::time::SystemTime::now();
        let dur = stop.duration_since(start).unwrap();
        println!(
            "count={count}, cosume={}ms, ops={}ns",
            dur.as_millis(),
            dur.as_nanos() / count as u128,
        );
    }

Go channel tx/rx:

func TestChanPerformance(t *testing.T) {
    count := 1000000
    ch := make(chan int, count)
    rsp := make(chan int, 1)
    t1 := time.Now()
    go func() {
        for {
            if _, ok := <-ch; !ok {
                rsp <- 0
                break
            }
        }
    }()
    for i := 0; i < count; i++ {
        ch <- i
    }
    close(ch)
    <-rsp

    d := time.Since(t1)
    t.Logf("txrx %d times consumed %d ms, %d ns/op", count, d.Milliseconds(), d.Nanoseconds()/int64(count))
}

Mutex test:

    use std::sync::Arc;

    #[tokio::test]
    async fn bench_std_mutex() {
        for count in [1_000, 10_000, 100_000] {
            let start = std::time::SystemTime::now();

            let under = Arc::new(std::sync::Mutex::new(0));
            for _ in 0..count {
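                // the guard is dropped immediately, so each iteration measures one lock+unlock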
                let _ = under.lock().unwrap();
            }

            let stop = std::time::SystemTime::now();
            let dur = stop.duration_since(start).unwrap();
            println!(
                "count={count}, cosume={}ms, ops={}ns",
                dur.as_millis(),
                dur.as_nanos() / count as u128,
            );
        }
    }

Tokio spawn test:

    #[tokio::test]
    async fn bench_tokio_spawn() {
        let count = 100_000;
        //let mut ths = Vec::with_capacity(count);
        let start = std::time::SystemTime::now();
        for _ in 0..count {
            tokio::spawn(async move {});
        }
        let stop = std::time::SystemTime::now();
        let dur = stop.duration_since(start).unwrap();
        //for _ in 0..count {
        //    ths.pop().unwrap().await.unwrap();
        //}
        // do not wait for join, just spawn
        println!(
            "count={count}, cosume={}ms, ops={}ns",
            dur.as_millis(),
            dur.as_nanos() / count as u128,
        );
    }

=============UPDATED=========== Results with --release:

std::sync::Mutex: 13 ns;
tokio::sync::Mutex: 130 ns;
std::sync::mpsc::channel: 200 ns;
tokio::sync::mpsc::channel: 256 ns;
tokio::spawn: 553 ns;

Solution

Add --release to the cargo invocation to instruct the compiler to perform optimizations.
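For example, assuming the benchmarks above are run through cargo's built-in test harness, optimizations and test output can be enabled with:

    cargo test --release -- --nocapture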

To demonstrate just how much of a difference this makes, here is a simple add function compiled with and without optimizations:

pub fn add(a: u32, b: u32) -> u32 {
    a + b
}
  • with optimizations:
example::add:
        lea     eax, [rdi + rsi]
        ret
  • without optimizations:
example::add:
        push    rax
        add     edi, esi
        mov     dword ptr [rsp + 4], edi
        setb    al
        test    al, 1
        jne     .LBB0_2
        mov     eax, dword ptr [rsp + 4]
        pop     rcx
        ret
.LBB0_2:
        lea     rdi, [rip + str.0]
        lea     rdx, [rip + .L__unnamed_1]
        mov     rax, qword ptr [rip + core::panicking::panic@GOTPCREL]
        mov     esi, 28
        call    rax
        ud2

.L__unnamed_2:
        .ascii  "/app/example.rs"

.L__unnamed_1:
        .quad   .L__unnamed_2
        .asciz  "\017\000\000\000\000\000\000\000\002\000\000\000\005\000\000"

str.0:
        .ascii  "attempt to add with overflow"

Note that the optimized version no longer contains an overflow check. The overflow check is very useful during debugging, but it is also very slow.
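If the overflow checks are wanted even in optimized builds, Cargo can re-enable them per profile; a minimal sketch of the relevant Cargo.toml section:

    [profile.release]
    overflow-checks = true  # keep the debug-style overflow checks in release builds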



Answered By - Finomnis
Answer Checked By - Clifford M. (PHPFixing Volunteer)

Monday, October 24, 2022

[FIXED] How should I implement a concurrent approximate counter in rust?

 October 24, 2022     automatic-ref-counting, concurrency, mutex, operating-system, rust

Issue

I am reading chapter 29 of OS: the three easy pieces, which is about concurrent data structures. The first example of a concurrent data structure is the approximate counter. This data structure increments numbers by using a global mutex and several local mutexes with local counters. When a local counter hits a threshold, it grabs the global mutex and flushes its local count into the global counter.

This chapter shows the code in C. Since I'm practicing the Rust language, I've been trying to implement the data structure in Rust, but I encountered a Deref trait implementation error. It seems that Arc can be used mutably with only a few structs. How should I change my code?

The original code in C

typedef struct __counter_t {
    int global; // global count
    pthread_mutex_t glock; // global lock
    int local[NUMCPUS]; // per-CPU count
    pthread_mutex_t llock[NUMCPUS]; // ... and locks
    int threshold; // update frequency
} counter_t;

// init: record threshold, init locks, init values
// of all local counts and global count
void init(counter_t *c, int threshold) {
    c->threshold = threshold;
    c->global = 0;
    pthread_mutex_init(&c->glock, NULL);
    int i;

    for (i = 0; i < NUMCPUS; i++) {
        c->local[i] = 0;
        pthread_mutex_init(&c->llock[i], NULL);
    }
}


// update: usually, just grab local lock and update
// local amount; once local count has risen ’threshold’,
// grab global lock and transfer local values to it
void update(counter_t *c, int threadID, int amt) {
    int cpu = threadID % NUMCPUS;
    pthread_mutex_lock(&c->llock[cpu]);
    c->local[cpu] += amt;
    if (c->local[cpu] >= c->threshold) {
        // transfer to global (assumes amt>0)
        pthread_mutex_lock(&c->glock);
        c->global += c->local[cpu];
        pthread_mutex_unlock(&c->glock);
        c->local[cpu] = 0;
    }
    pthread_mutex_unlock(&c->llock[cpu]);
}

// get: just return global amount (approximate)
int get(counter_t *c) {
    pthread_mutex_lock(&c->glock);
    int val = c->global;
    pthread_mutex_unlock(&c->glock);
    return val; // only approximate!
}

My code:

use std::fmt;
use std::sync::{Arc, Mutex};
use std::thread;


pub struct Counter {
    value: Mutex<i32>
}

impl Counter {
    pub fn new() -> Self {
        Counter { value: Mutex::new(0)}
    }

    pub fn test_and_increment(&mut self) -> i32 {
        let mut value = self.value.lock().unwrap();
        *value += 1;

        if *value >= 10 {
            let old = *value;
            *value = 0;
            return old;
        }
        else {
            return 0;
        }
    }

    pub fn get(&mut self) -> i32 {
        *(self.value.lock().unwrap())
    }

    pub fn add(&mut self, value: i32) {
        *(self.value.lock().unwrap()) += value;
    }
}


impl fmt::Display for Counter {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", *self.value.lock().unwrap())
    }
}


pub struct ApproximateCounter {
    value: Counter,
    local_counters: [Counter; 4]
}

impl ApproximateCounter {
    pub fn new() -> Self {
        ApproximateCounter {
            value: Counter::new(),
            local_counters: [Counter::new(), Counter::new(), Counter::new(), Counter::new()]
        }
    }

    pub fn increment(&mut self, i: usize) {
        let local_value = self.local_counters[i].test_and_increment();

        if local_value > 0 {
            self.value.add(local_value);
        }
    }

    pub fn get(&mut self) -> i32 {
        self.value.get()
    }
}

fn main() {
    let mut counter = Arc::new(ApproximateCounter::new());
    let mut threads = Vec::new();
    for i in 0..4 {
        let c_counter = counter.clone();
        threads.push(thread::spawn(move || {
            for _ in 0..100 {
                c_counter.increment(i);
            }
        }));
    }
    for thread in threads {
        thread.join();
    }
    println!("{}", counter.get());
}

Error message:

error[E0596]: cannot borrow data in an `Arc` as mutable
  --> src/main.rs:54:21
   |
54 |                     c_counter.increment(i);
   |                     ^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Arc<ApproximateCounter>`

Solution

ApproximateCounter::increment takes &mut self, but it should take &self instead. Arc gives you a shared reference, so you cannot obtain a mutable reference to something held in an Arc. However, Mutex provides interior mutability, allowing you to mutate data behind a shared reference. So if you change the increment and test_and_increment methods to take shared references to self instead, your code should work.

A couple of minor things:

  • Your code won't compile, because inside struct Counter you have a Mutex but didn't specify its type; change it to Mutex<i32>.
  • The get and add methods should take &self as well.
  • When cloning an Arc, prefer Arc::clone(&thing) over thing.clone().
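
Putting these changes together, here is a minimal sketch of the corrected program (shared &self throughout, Arc::clone for the handles, and the use std::thread; import the original was missing):

    use std::sync::{Arc, Mutex};
    use std::thread;

    pub struct Counter {
        value: Mutex<i32>,
    }

    impl Counter {
        pub fn new() -> Self {
            Counter { value: Mutex::new(0) }
        }

        // &self is enough: the Mutex provides interior mutability.
        pub fn test_and_increment(&self) -> i32 {
            let mut value = self.value.lock().unwrap();
            *value += 1;
            if *value >= 10 {
                let old = *value;
                *value = 0;
                old
            } else {
                0
            }
        }

        pub fn get(&self) -> i32 {
            *self.value.lock().unwrap()
        }

        pub fn add(&self, value: i32) {
            *self.value.lock().unwrap() += value;
        }
    }

    pub struct ApproximateCounter {
        value: Counter,
        local_counters: [Counter; 4],
    }

    impl ApproximateCounter {
        pub fn new() -> Self {
            ApproximateCounter {
                value: Counter::new(),
                local_counters: [
                    Counter::new(),
                    Counter::new(),
                    Counter::new(),
                    Counter::new(),
                ],
            }
        }

        // Increment the local counter for slot i; flush into the global
        // counter once the local counter hits its threshold.
        pub fn increment(&self, i: usize) {
            let local_value = self.local_counters[i].test_and_increment();
            if local_value > 0 {
                self.value.add(local_value);
            }
        }

        pub fn get(&self) -> i32 {
            self.value.get()
        }
    }

    fn main() {
        let counter = Arc::new(ApproximateCounter::new());
        let mut threads = Vec::new();
        for i in 0..4 {
            let c_counter = Arc::clone(&counter);
            threads.push(thread::spawn(move || {
                for _ in 0..100 {
                    c_counter.increment(i);
                }
            }));
        }
        for thread in threads {
            thread.join().unwrap();
        }
        println!("{}", counter.get()); // prints 400
    }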


Answered By - Aleksander Krauze
Answer Checked By - Willingham (PHPFixing Volunteer)

Friday, September 30, 2022

[FIXED] How to find the number of threads waiting for mutex lock in Sync.Mutex?

 September 30, 2022     concurrency, go, multithreading, mutex

Issue

I am using this in a goroutine:

    a.Lock()
    // ... critical section ...
    a.Unlock()

Is there any API to know how many threads are waiting at a.Lock()?


Solution

Is there any API to know how many threads are waiting at a.Lock()?

No.

(And basically, if you need to know that, you should redesign your algorithm.)



Answered By - Volker
Answer Checked By - David Goodson (PHPFixing Volunteer)

[FIXED] What is the Swift equivalent to Objective-C's "@synchronized"?

 September 30, 2022     concurrency, mutex, swift

Issue

I've searched the Swift book, but can't find the Swift version of @synchronized. How do I do mutual exclusion in Swift?


Solution

With the advent of Swift concurrency, we would use actors.

You can use tasks to break up your program into isolated, concurrent pieces. Tasks are isolated from each other, which is what makes it safe for them to run at the same time, but sometimes you need to share some information between tasks. Actors let you safely share information between concurrent code.

Like classes, actors are reference types, so the comparison of value types and reference types in Classes Are Reference Types applies to actors as well as classes. Unlike classes, actors allow only one task to access their mutable state at a time, which makes it safe for code in multiple tasks to interact with the same instance of an actor. For example, here’s an actor that records temperatures:

actor TemperatureLogger {
    let label: String
    var measurements: [Int]
    private(set) var max: Int

    init(label: String, measurement: Int) {
        self.label = label
        self.measurements = [measurement]
        self.max = measurement
    }
}

You introduce an actor with the actor keyword, followed by its definition in a pair of braces. The TemperatureLogger actor has properties that other code outside the actor can access, and restricts the max property so only code inside the actor can update the maximum value.

For more information, see WWDC video Protect mutable state with Swift actors.


For the sake of completeness, the historical alternatives include:

  • GCD serial queue: This is a simple pre-concurrency approach to ensure that only one thread at a time will interact with the shared resource.

  • Reader-writer pattern with concurrent GCD queue: In reader-writer patterns, one uses a concurrent dispatch queue to perform synchronous, but concurrent, reads (but concurrent with other reads only, not writes) but perform writes asynchronously with a barrier (forcing writes to not be performed concurrently with anything else on that queue). This can offer a performance improvement over a simple GCD serial solution, but in practice, the advantage is modest and comes at the cost of additional complexity (e.g., you have to be careful about thread-explosion scenarios). IMHO, I tend to avoid this pattern, either sticking with the simplicity of the serial queue pattern, or, when the performance difference is critical, using a completely different pattern.

  • Locks: In my Swift tests, lock-based synchronization tends to be substantially faster than either of the GCD approaches. Locks come in a few flavors:

    • NSLock is a nice, relatively efficient lock mechanism.
    • In those cases where performance is of paramount concern, I use “unfair locks”, but you must be careful when using them from Swift (see https://stackoverflow.com/a/66525671/1271826).
    • For the sake of completeness, there is also the recursive lock. IMHO, I would favor simple NSLock over NSRecursiveLock. Recursive locks are subject to abuse and often indicate code smell.
    • You might see references to “spin locks”. Many years ago, they used to be employed where performance was of paramount concern, but they are now deprecated in favor of unfair locks.
  • Technically, one can use semaphores for synchronization, but it tends to be the slowest of all the alternatives.

I outline a few of my benchmark results here.

In short, nowadays I use actors for contemporary codebases, GCD serial queues for simple scenarios in non-async-await code, and locks in those rare cases where performance is essential.

And, needless to say, we often try to reduce the number of synchronizations altogether. If we can, we often use value types, where each thread gets its own copy. And where synchronization cannot be avoided, we try to minimize the number of those synchronizations where possible.



Answered By - Rob
Answer Checked By - Senaida (PHPFixing Volunteer)

[FIXED] Why is lock_guard a template?

 September 30, 2022     c++, concurrency, mutex

Issue

I just learned about std::lock_guard and I was wondering why it is a template.
Until now I have only seen std::lock_guard<std::mutex> with std::mutex inside the angle brackets.


Solution

Using std::lock_guard<std::mutex> is indeed quite common.
But you can use std::lock_guard with other mutex types:

  1. Various standard mutex types, e.g.: std::recursive_mutex.
  2. Your own mutex type. You can use any type, as long as it is a BasicLockable, i.e. it supports the required methods: lock(), unlock().


Answered By - wohlstad
Answer Checked By - Marie Seifert (PHPFixing Admin)