# Issue

I would like to find an efficient way to do the following lookup in a list:

```
L = list(10:15,11:20)
a = c(3,7)
b = numeric()
for(i in 1:length(a)) b[i] = L[[i]][a[i]]
```

I think `for` loops are inefficient and I imagine this can be done faster using, for example, `sapply`. My main goal is to do this efficiently when `L` is long.

# Solution

UPDATE:

Your aversion to a `for` loop may be unfounded. I've found that the relative performance can be very machine dependent. On my current machine, with `b` properly initialized, a base R `for` loop is slower only than an `Rcpp` solution, and that just barely. See the updated benchmark below, where `loop1` is the properly initialized version. However, I've tried this on other machines, and on some of them the `for` loops are indeed slower than the `apply` solutions.
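For reference, "properly initialized" here means allocating `b` at its final length before the loop, rather than growing it from an empty `numeric()` as in the question. A minimal sketch using the small example from the question:

```r
L <- list(10:15, 11:20)
a <- c(3, 7)

# Preallocate the result at its final length and type,
# so the loop only fills in existing slots
b <- integer(length(a))
for (i in seq_along(a)) {
  b[i] <- L[[i]][a[i]]
}
b
```

Using `seq_along(a)` instead of `1:length(a)` also keeps the loop from running when `a` is empty.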

A base R vectorized solution using `unlist`, `cumsum`, and `lengths`:

```
b <- unlist(L)[a + c(0, cumsum(lengths(L)[1:(length(L) - 1L)]))]
```
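To see why this works, it may help to split the one-liner into steps on the small example from the question. The names `flat` and `offsets` are only illustrative, and the `head(..., -1L)` form is a variant of my own that should also cope with a list of length one; the sketch assumes every element of `L` is a plain atomic vector of the same type.

```r
L <- list(10:15, 11:20)
a <- c(3, 7)

# Flatten the list into one long vector
flat <- unlist(L)

# Position in `flat` just before each list element starts:
# 0 for the first element, then the running total of the preceding lengths
offsets <- c(0L, head(cumsum(lengths(L)), -1L))

# Shift each within-element index by its element's offset
b <- flat[a + offsets]
b
```

Here `offsets` is `c(0, 6)`, so the lookup reads positions 3 and 13 of the flattened vector.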

Benchmarking (tossing in an `Rcpp` solution)*

```
library(purrr)

# Test data: 1e5 integer vectors of length 4 to 10,
# plus one random valid index into each of them
L <- lapply(sample(4:10, 1e5, TRUE), seq)
a <- sapply(lengths(L), function(x) sample(x, 1))

# Rcpp version of b[i] = L[[i]][a[i]] (note the shift to 0-based indexing)
Rcpp::cppFunction("IntegerVector ListIndex(const List& L, const IntegerVector& a) {
  const int n = a.size();
  IntegerVector b (n);
  for (int i = 0; i < n; i++) b(i) = as<IntegerVector>(L[i])(a(i) - 1);
  return b;
}")

microbenchmark::microbenchmark(
  sapply = sapply(1:length(a), function(x) L[[x]][a[x]]),
  vapply = vapply(seq_along(L), function(i) L[[i]][a[i]], integer(1)),
  purr = as.integer(imap_dbl(setNames(L, a), ~ .x[as.numeric(.y)])),
  unlist = unlist(L)[a + c(0, cumsum(lengths(L)[1:(length(L) - 1L)]))],
  rcpp = ListIndex(L, a),
  # loop1 preallocates b; loop2 grows it one element at a time
  loop1 = {b <- integer(length(a)); for(i in seq_along(a)) b[i] <- L[[i]][a[i]]; b},
  loop2 = {b <- integer(); for(i in seq_along(a)) b[i] <- L[[i]][a[i]]; b},
  check = "identical"
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> sapply 102.4199 113.72450 125.21764 119.72455 130.41480 291.5465 100
#> vapply 97.8447 107.33390 116.41775 112.33445 119.01680 189.9191 100
#> purr 226.9039 241.02305 258.34032 246.81175 257.87370 502.3446 100
#> unlist 29.4186 29.97935 32.05529 30.86130 33.02160 44.6751 100
#> rcpp 22.3468 22.78460 25.47667 23.48495 26.63935 37.2362 100
#> loop1 25.5240 27.34865 28.94650 28.02920 29.32110 42.9779 100
#> loop2 41.4726 46.04130 52.58843 51.00240 56.54375 88.3444 100
```

*I couldn't get akrun's `dplyr` solution to work with the larger vector.

Answered By - jblood94 Answer Checked By - Terry (PHPFixing Volunteer)
