
DSDR KB

Q:  

I am merging two data sets and end up with too few variables. What is going on?

A:  

The most likely problem is that you have duplicate variable names across your two files. When two files share a variable name, only one copy survives the merge, so the merged file has fewer variables than you expect. It is best to add a prefix or suffix to variable names so that you can tell which items come from which file. For instance, p_hght might represent the person’s self-reported height and m_hght might represent height from a medical record.

In SAS, the duplicate variable is dropped from the first file in the merge statement. The following provides more detail about how SAS handles duplicate variables with merges:

http://www.psc.isr.umich.edu/dis/data/prgmlib/sas/merge1.html

Stata handles duplicate variable names in the reverse way from SAS. It keeps the variable from the first file (the master data set in memory) and drops the duplicate from the second file named, the ‘using’ data file.

  use temps
  merge id using tempm

In this case, the height variable from the temporary medical file (tempm) would be dropped and the height variable from the self-reported file would be kept.

  use tempm
  merge id using temps

In this case, the height variable from the temporary self-report file (temps) would be dropped and the height variable from the medical file would be kept.
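
To keep both height variables, one option is to rename them before merging so that the names no longer collide. The following is a minimal sketch rather than a definitive recipe. It assumes the height variable is called hght in both files (the actual name is not given here), that id uniquely identifies a person in each file, and it uses temps_renamed as a hypothetical name for the intermediate file.

```stata
  * Assumption: the height variable is named hght in both files.
  * Rename the self-reported height and save a sorted copy.
  use temps, clear
  rename hght p_hght
  sort id
  save temps_renamed, replace

  * Rename the medical-record height, then merge on id.
  use tempm, clear
  rename hght m_hght
  sort id
  merge id using temps_renamed
  * In Stata 11 and later the equivalent is: merge 1:1 id using temps_renamed
```

After this merge, both p_hght and m_hght should be present, because neither name appears in both files.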

