I am working with a ZIP code file from the Census Bureau. Some of the ZIP codes have XX or HH in the code, e.g., 298HH. How can I get rid of these ZIP codes? And what do the HH and XX codes represent anyway?


First, ZIP Code Tabulation Areas (ZCTAs) are the Census Bureau's representation of ZIP codes. An HH suffix indicates that the ZCTA represents a large water area (generally larger than 25 square miles). Because such a water area may border many five-digit ZCTAs, it was not assigned to any single five-digit code; it keeps only the three-digit prefix followed by HH, hence 298HH. The XX suffix is applied to large land areas where the Census Bureau had insufficient information to determine the five-digit code. These are generally rural areas with little settlement, such as parks, forest lands, and deserts.

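The easiest way to get rid of the ZCTAs that have XX or HH in the code is to perform an arithmetic operation on the ZCTA code. SAS converts the character value to a number, and any code that is not purely numeric (such as 298HH or 298XX) becomes a missing value, so those observations can then be deleted:

data a;
  infile "fake";  /* "fake" stands in for the path to your ZCTA file */
  input zcta $;
  zip = zcta + 0;  /* non-numeric codes such as 298HH become missing (.) */
  if missing(zip) then delete;  /* drop the HH and XX ZCTAs */
run;
proc print data=a; var zcta zip;
run;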
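If you would rather keep the ZCTA as a character variable, a possible alternative is to test the characters directly with the NOTDIGIT function instead of relying on a numeric conversion. This is only a sketch: the data set name zcta_clean and the sample values below are made up for illustration. Keeping the code as character preserves leading zeros (02134 stays 02134, whereas zcta + 0 yields 2134 unless you apply a Z5. format) and avoids the "Invalid numeric data" notes that the arithmetic approach writes to the log.

```sas
/* Sketch: keep ZCTA as character and drop any code containing a
   non-digit character (e.g. 298HH, 298XX). Sample values are hypothetical. */
data zcta_clean;
  length zcta $5;
  input zcta $;
  /* NOTDIGIT returns 0 only when every character is a digit;
     STRIP removes padding blanks so they are not counted as non-digits */
  if not missing(zcta) and notdigit(strip(zcta)) = 0;
  datalines;
29801
298HH
298XX
02134
;
run;

proc print data=zcta_clean;
run;
```

With these sample values, only 29801 and 02134 remain in zcta_clean.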

