# Question:

I have a data table that looks like

``````
|userId|36|37|38|39|40|
|1|1|0|3|0|0|
|2|3|0|0|0|1|
``````

Each numbered column (36-40) represents a week number. For every user, I want to calculate the number of weeks between the first occurrence of a non-zero value and the last.

For instance, for userId 1 in my dataset, the first value appears at week 36 and the last one appears at week 38, so the value I want is 38 - 36 = 2. For userId 2 it is 40 - 36 = 4.

I would like to store the data like:

``````
|userId|lifespan|
|1|2|
|2|4|
``````

# Answers:

## Answer 1:

The general approach I would take is to melt the table into long format, convert the week column names to numeric, and take the difference between the last and first non-zero week for each userId. Here is an example using `data.table`.

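``````
library(data.table)

# Read the sample data; header = TRUE keeps the week numbers as column names
dt <- fread("userId|36|37|38|39|40
1|1|0|3|0|0
2|3|0|0|0|1",
  sep = "|", header = TRUE)

# Reshape to long format: one row per (userId, week)
dt <- melt(dt, id.vars = "userId")
# melt() stores the old column names as a factor; convert them to numbers
dt[, variable := as.numeric(as.character(variable))]
dt
#     userId variable value
#  1:      1       36     1
#  2:      2       36     3
#  3:      1       37     0
#  4:      2       37     0
#  5:      1       38     3
#  6:      2       38     0
#  7:      1       39     0
#  8:      2       39     0
#  9:      1       40     0
# 10:      2       40     1

# Keep the non-zero rows, then take last week minus first week per user
dt[value != 0, .(lifespan = max(variable) - min(variable)), by = .(userId)]
#    userId lifespan
# 1:      1        2
# 2:      2        4
``````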
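For comparison, the same calculation can be sketched in base R without reshaping, by scanning each row for its non-zero positions. This is only an illustration, not part of the answer above: the names `weeks`, `vals`, and `result` are made up here, and returning `NA` for a user with no non-zero week is an assumption, since the question does not say what should happen in that case.

```r
# Sample data from the question; check.names = FALSE keeps "36".."40" as names
df <- read.table(text = "userId|36|37|38|39|40
1|1|0|3|0|0
2|3|0|0|0|1", header = TRUE, sep = "|", check.names = FALSE)

weeks <- as.numeric(names(df)[-1])   # week numbers taken from the column names
vals  <- as.matrix(df[, -1])         # one row per user, one column per week

lifespan <- apply(vals, 1, function(v) {
  nz <- which(v != 0)                       # positions of the non-zero weeks
  if (length(nz) == 0) return(NA_real_)     # assumption: no activity gives NA
  weeks[max(nz)] - weeks[min(nz)]           # last week minus first week
})

result <- data.frame(userId = df$userId, lifespan = lifespan)
result
```

On the sample data this gives a lifespan of 2 for userId 1 and 4 for userId 2, matching the `data.table` result.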

## Comments:

This is exactly what I was after, thank you!
– Benirving92
21 mins ago

## Answer 2:

Here’s a `dplyr` method:

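``````
library(dplyr)
library(tidyr)

df %>%
  gather(var, value, -userId) %>%                  # wide to long
  mutate(var = as.numeric(sub("X", "", var))) %>%  # read.table prefixed the names with "X"
  group_by(userId) %>%
  # keep the first and the last row with a non-zero value
  slice(c(which.max(value != 0), max(which(value != 0)))) %>%
  summarize(lifespan = var[2] - var[1])
``````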

Result:

``````
# A tibble: 2 x 2
  userId lifespan
   <int>    <dbl>
1      1        2
2      2        4
``````

Data:

``````
df = read.table(text = "userId|36|37|38|39|40
1|1|0|3|0|0
2|3|0|0|0|1", header = TRUE, sep = "|")
``````
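Since `gather()` is marked as superseded in current tidyr releases, the same idea could also be written with `pivot_longer()`. The sketch below is a variant, not the answer's original code: it assumes tidyr 1.0.0 or later, reads the data with `check.names = FALSE` so that no "X" prefix has to be stripped, and uses the illustrative names `week` and `res`. It also filters the non-zero rows and takes `max - min`, instead of slicing the first and last rows.

```r
library(dplyr)
library(tidyr)

# Same sample data; check.names = FALSE keeps the column names as "36".."40"
df <- read.table(text = "userId|36|37|38|39|40
1|1|0|3|0|0
2|3|0|0|0|1", header = TRUE, sep = "|", check.names = FALSE)

res <- df %>%
  pivot_longer(-userId, names_to = "week", values_to = "value") %>%  # wide to long
  mutate(week = as.numeric(week)) %>%      # column names arrive as character
  filter(value != 0) %>%                   # keep only the weeks with activity
  group_by(userId) %>%
  summarize(lifespan = max(week) - min(week))

res
```

A user whose weeks are all zero drops out of `res` entirely after the `filter()` step, so that case would need separate handling if it can occur.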

## Source:

https://stackoverflow.com/questions/47756325/find-index-of-first-and-last-occurrence-in-data-table
