I know I can use the
plyr and its friends to combine dataframes, and
merge as well, but so far I don’t know how to merge two dataframes with multiple columns based on 2 columns?
See the documentation on
?merge, which states:
By default the data frames are merged on the columns with names they both have,
but separate specifications of the columns can be given by by.x and by.y.
This clearly implies that
merge will merge data frames based on more than one column. From the final example given in the documentation:
x <- data.frame(k1=c(NA,NA,3,4,5), k2=c(1,NA,NA,4,5), data=1:5)
y <- data.frame(k1=c(NA,2,NA,4,5), k2=c(NA,NA,3,4,5), data=1:5)
merge(x, y, by=c("k1","k2")) # NA's match
This example was meant to demonstrate the use of
incomparables, but it illustrates merging using multiple columns as well. You can also specify separate columns in each of
Hope this helps;
df1 = data.frame(CustomerId=c(1:10),
Hobby = c(rep("sing", 4), rep("pingpong", 3), rep("hiking", 3)),
Product=c(rep("Toaster",3),rep("Phone", 2), rep("Radio",3), rep("Stereo", 2)))
df2 = data.frame(CustomerId=c(2,4,6, 8, 10),State=c(rep("Alabama",2),rep("Ohio",1), rep("Cal", 2)),
like=c("sing", 'hiking', "pingpong", 'hiking', "sing"))
df3 = merge(df1, df2, by.x=c("CustomerId", "Hobby"), by.y=c("CustomerId", "like"))
df2$like mean the same thing.