I am merging two data sets and end up with too few variables. What is going on?


The most likely problem is that you have duplicate variable names across your two files. When the same name appears in both files, only one copy survives the merge, so the merged file has fewer variables than you expect. It is probably best to use a suffix or prefix in variable names so that you can tell which items come from which file. For instance, p_hght might represent the person’s self-reported height and m_hght might represent height from a medical record.
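
As a rough sketch of the prefix approach in Stata, suppose two files, temps (self-reported) and tempm (medical record), each contain an id variable and a height variable. The variable name hght and the file name temps_renamed are assumptions made up for this illustration, and the merge 1:1 syntax assumes Stata 11 or later with id identifying one record per person in each file.

  * self-reported file: give height a p_ prefix and save a renamed copy
  use temps, clear
  rename hght p_hght
  save temps_renamed, replace

  * medical-record file: give height an m_ prefix
  use tempm, clear
  rename hght m_hght

  * id is now the only shared name, so both height variables are kept
  merge 1:1 id using temps_renamed

After a merge like this, the combined file should contain both p_hght and m_hght rather than a single height variable.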

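In SAS, when both files contain a variable with the same name, the copy from the first file listed in the merge statement is dropped and the copy from the last file listed is kept. The SAS documentation for the MERGE statement gives more detail about how duplicate variables are handled.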

Stata handles duplicate variable names in the reverse way from SAS. In other words, it keeps the copy from the file already in memory and drops the duplicate variable from the second file named, that is, the ‘using’ data file.

  use temps
  merge id using tempm

In this case, the height variable from the temporary medical file (tempm) would be dropped and the height variable from the self-reported file would be kept.

  use tempm
  merge id using temps

In this case, the height variable from the temporary self-report file (temps) would be dropped and the height variable from the medical file would be kept.
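
Before merging, it can help to see which names the two files share. The following is a minimal sketch, assuming a reasonably recent version of Stata and the same temps and tempm files as above. The macro names mastervars, usingvars and overlap are arbitrary choices for this example.

  * names of all variables in the file in memory
  use temps, clear
  unab mastervars : _all

  * names of all variables in the using file, without loading it
  describe using tempm, varlist
  local usingvars `r(varlist)'

  * names present in both files (this will include the merge key, id)
  local overlap : list mastervars & usingvars
  display "`overlap'"

Any name listed other than the merge key is a variable that would be kept from only one of the two files.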

