They’re event data, sort of like this: country

The temptation is to do this:

    egen uniqueid = concat(country village year household)

The problem is that household 1 in year 1960 in village 19 in country 11 will have the same id as household 1 in year 1960 in village 119 in country 1: 111919601 for both. Put simply, multi-digit variables without leading zeros “squish” together and you risk non-uniqueness (“collision”). The fix is to convert each component to a zero-padded string before concatenating:

    gen str_country = string(int(country),"%02.0f") // %02.0f because 'country' is two digits
    gen str_village = string(int(village),"%02.0f") // use %03.0f if 'village' can reach three digits, as in the example above
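To make the collision concrete, here is a minimal sketch that rebuilds the two observations from the example and compares the naive and the zero-padded IDs. The widths chosen for each component (two digits for country and household, three for village, four for year) are assumptions made for this illustration; in real data, set each width to the largest number of digits the variable can take.

```stata
clear
input country village year household
11  19 1960 1
 1 119 1960 1
end

* Naive concatenation: the digits run together and both rows get the same id
egen naive_id = concat(country village year household)
assert naive_id[1] == naive_id[2]

* Zero-pad every component to a fixed width, then concatenate
* (widths are assumptions for this toy dataset)
gen str_country   = string(int(country),   "%02.0f")
gen str_village   = string(int(village),   "%03.0f")
gen str_year      = string(int(year),      "%04.0f")
gen str_household = string(int(household), "%02.0f")
egen uniqueid = concat(str_country str_village str_year str_household)

* isid exits with an error if uniqueid does not uniquely identify the rows
isid uniqueid
list country village year household naive_id uniqueid
```

If the id does not need to encode its components, `egen uniqueid = group(country village year household)` is another option: it numbers each distinct combination 1, 2, 3, and so on, at the cost of an id that is not human-readable.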
permanently specifies that, in addition to making the change right now, the new limit be remembered and become the default setting when you invoke Stata.

Someone asked me today how to create a unique ID from a dataset with four variables whose combination is unique.

nopromote prevents replace from promoting the variable; instead, the replacement values are truncated to fit the current storage type.
replace changes the contents of an existing variable. Because replace alters data, the command cannot be abbreviated.

set type specifies the default storage type assigned to new variables (such as those created by generate) when the storage type is not explicitly specified.

Options

before(varname) or after(varname) may be used with generate to place the newly generated variable in a specific position within the dataset. These options are primarily used by the Data Editor and are of limited use in other contexts. A more popular alternative for most users is order.

nopromote prevents replace from promoting the variable type to accommodate the change. For instance, consider a variable stored as an integer type (byte, int, or long), and assume that you replace some values with nonintegers. By default, replace changes the variable type to a floating point (float or double) and thus correctly stores the changed values. Similarly, replace promotes byte and int variables to longer integers (int and long) if the replacement value is an integer but is too large in absolute value for the current storage type. replace promotes strings to longer strings.
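The promotion rules can be checked on a toy dataset. The sketch below creates three byte variables (the names x, y, and z are made up for this illustration) and inspects their storage types after replace, with and without nopromote. It assumes the default setting for set type has not been changed.

```stata
clear
set obs 3

gen byte x = _n
replace x = 2.5 in 1               // noninteger value: x is promoted to a floating-point type

gen byte y = _n
replace y = 1000 in 1              // integer too large for byte: y is promoted to int

gen byte z = _n
replace z = 1000 in 1, nopromote   // z stays byte, so 1000 cannot be stored as is

describe x y z
```

describe shows the storage type of each variable, so it is a quick way to see whether a replace has silently changed a type.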
Specify default storage type assigned to new variables

See Description below for an explanation of str. For the other types, see data types.

by is allowed with generate and replace; see by.

Menu

generate
Data > Create or change data > Create new variable

replace
Data > Create or change data > Change contents of variable

Description

The values of the variable are specified by =exp. If no type is specified, the new variable type is determined by the type of result returned by =exp. A float variable (or a double, according to set type) is created if the result is numeric, and a string variable is created if the result is a string. In the latter case, if the string variable contains values longer than 2,045 characters or contains values with a binary 0 (\0), a strL variable is created. Otherwise, a str# variable is created, where # is the smallest string that will hold the result.

If a type is specified, the result returned by =exp must be string or numeric according to whether type is string or numeric. If str is specified, a strL or a str# variable is created using the same rules as above.
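As a rough illustration of these rules, the sketch below lets generate choose the storage type in some cases and specifies it in others. The variable names are hypothetical, and the float result for the first variable assumes set type is at its default.

```stata
clear
set obs 2

gen ratio = _n / 3                    // numeric result, no type given: float unless set type says double
gen name  = "obs " + string(_n)       // string result: smallest str# that fits ("obs 1" needs str5)
gen double precise = _n / 3           // explicit numeric type
gen str tag = "x"                     // str: Stata picks str# (or strL) by the same rules

describe
```

Running `set type double` before these commands would make ratio a double instead; the string variables are unaffected because their type depends only on the values they hold.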