# Question:

I have a data table that looks like

|userId|36|37|38|39|40|
|---|---|---|---|---|---|
|1|1|0|3|0|0|
|2|3|0|0|0|1|


Each numbered column (36-40) represents a week number. I want to calculate the number of weeks between the first occurrence of a non-zero value and the last.

For instance, for userId 1 in my dataset, the first non-zero value appears at week 36 and the last one appears at week 38, so the value I want is 38 - 36 = 2. For userId 2 it’s 40 - 36, which is 4.

I would like to store the data like:

|userId|lifespan|
|---|---|
|1|2|
|2|4|


# Answers:

## Answer 1:

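The general approach I would take is to melt the table into long format, convert the character column names to numeric week numbers, and then, for each userId, take the difference between the last and the first week with a non-zero value. Here is an example using data.table.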

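library(data.table)
# Rebuild the example table from the question ("|" is the column separator)
dt <- fread(text = "userId|36|37|38|39|40
1|1|0|3|0|0
2|3|0|0|0|1",
            sep = "|", header = TRUE)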

dt <- melt(dt, id.vars = "userId")
dt[, variable := as.numeric(as.character(variable))]
dt
#     userId variable value
#  1:      1       36     1
#  2:      2       36     3
#  3:      1       37     0
#  4:      2       37     0
#  5:      1       38     3
#  6:      2       38     0
#  7:      1       39     0
#  8:      2       39     0
#  9:      1       40     0
# 10:      2       40     1
# Keep only non-zero weeks, then take last week minus first week per user
dt[value != 0, .(lifespan = max(variable) - min(variable)), by = .(userId)]
#    userId lifespan
# 1:      1        2
# 2:      2        4
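
For comparison, the same "last non-zero week minus first non-zero week" logic can be sketched in base R without reshaping. This is only an illustrative alternative, not part of the original answer: the names `week_cols`, `weeks` and `result` are made up for the example, and returning `NA` for a user with no non-zero values is an assumption, since the question does not cover that case.

```r
# Example data from the question; check.names = FALSE keeps "36".."40" as column names
df <- read.table(text = "userId|36|37|38|39|40
1|1|0|3|0|0
2|3|0|0|0|1", header = TRUE, sep = "|", check.names = FALSE)

# Week numbers come from the column names (every column except userId)
week_cols <- setdiff(names(df), "userId")
weeks <- as.numeric(week_cols)

# For each row, collect the weeks with a non-zero value and take last minus first.
# A row with no non-zero value yields NA (assumption, see above).
lifespan <- apply(as.matrix(df[week_cols]), 1, function(row) {
  active <- weeks[row != 0]
  if (length(active) == 0) NA_real_ else max(active) - min(active)
})

result <- data.frame(userId = df$userId, lifespan = lifespan)
result
#   userId lifespan
# 1      1        2
# 2      2        4
```

Working row by row keeps the table in its wide form, which may be convenient for small data; the melt-based approach above is likely to scale better on large tables.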


## Answer comments:

This is exactly what I was after, thank you!
– Benirving92
21 mins ago

## Answer 2:

Here’s a dplyr method:

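library(dplyr)
library(tidyr)

# df is defined under "Data" below; read.table() prefixes numeric column names with "X"
df %>%
  gather(var, value, -userId) %>%
  mutate(var = as.numeric(sub("X", "", var))) %>%
  group_by(userId) %>%
  # keep the first and the last row with a non-zero value for each user
  slice(c(which.max(value != 0), max(which(value != 0)))) %>%
  summarize(lifespan = var[2] - var[1])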


Result:

# A tibble: 2 x 2
userId lifespan
<int>    <dbl>
1      1        2
2      2        4


Data:

df = read.table(text = "userId|36|37|38|39|40
1|1|0|3|0|0
2|3|0|0|0|1", header = TRUE, sep = "|")
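
In current tidyr releases `gather()` is superseded by `pivot_longer()`. As a hedged sketch rather than the answerer's own code, the same pipeline could be written as below, assuming dplyr and tidyr (1.0.0 or later) are installed. The column name `week` and the object `result` are chosen for this example, and filtering on `value != 0` means a user with no non-zero values would be dropped from the output.

```r
library(dplyr)
library(tidyr)

# Same example data as above; read.table() turns "36" into "X36", and so on
df <- read.table(text = "userId|36|37|38|39|40
1|1|0|3|0|0
2|3|0|0|0|1", header = TRUE, sep = "|")

result <- df %>%
  # long format: one row per userId and week
  pivot_longer(-userId, names_to = "week", values_to = "value") %>%
  # strip the "X" prefix and convert the week label to a number
  mutate(week = as.numeric(sub("X", "", week))) %>%
  # keep only weeks with activity, then take last minus first per user
  filter(value != 0) %>%
  group_by(userId) %>%
  summarize(lifespan = max(week) - min(week))

result
```

Filtering before summarizing avoids the positional `slice()` step, at the cost of silently omitting users whose values are all zero.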


## Original source:

https://stackoverflow.com/questions/47756325/find-index-of-first-and-last-occurrence-in-data-table
