Comments on Hungarian Notation and our Usage of It


These notes grew out of an attempt to explain how and why we use Hungarian Notation in much of our code. The discussion begins with some history, then moves on to explain why Hungarian Notation can still be relevant in a modern software shop.


Hungarian Notation comes in two main flavors: Apps Hungarian Notation and Systems Hungarian Notation.

Apps Hungarian Notation is the invention of Charles Simonyi and it concentrates on the SEMANTIC meaning of a variable. We will see this in the discussion below.

Systems Hungarian Notation was the invention of the Windows OS team. They adopted the basic structure of Apps Hungarian Notation, but used the prefixes to represent a variable's actual DATA-TYPE. Thus the Windows API is littered with type prefixes. When people extended the ideas to C++ they tended to adopt the Systems approach due to the ubiquity of the Windows API, and they prepended a set of SCOPE markers to the type prefixes.

With time Systems Hungarian Notation came to be known as Hungarian Notation and Apps Hungarian Notation became all but forgotten. Meanwhile, Systems Hungarian Notation began to accumulate a bad reputation because:

  1. In C++ most variables were no longer of primitive types, and people felt the need to expand the prefix system to include all the newly implemented types. Naturally this got out of hand very quickly.
  2. Compilers do a lot of type checking so the notation wasn't needed to enforce type-safety.
  3. Rapid Development Environments improved to the point where data-type information was easily available when reading code.
  4. Some people felt straitjacketed by having to conform to the naming conventions.
  5. Changing variable types presented large maintenance issues.

You will notice that in our code we often use Systems Hungarian Notation conventions for C++. This is because BD and AT (and later AD) came from a C++ shop where Systems Hungarian Notation was the norm. There is NO requirement to use Hungarian Notation on our project, but we would like to preserve consistency within code files. That means that when a legacy file is modified, developers should conform to Hungarian Notation if the pre-existing code does so.

To begin our discussion, let's review the conventions that we currently use:

Scope prefixes are used for all variables that are not local to a method:

m_ means a class/structure's member data
s_ means a class/structure's static data
c_ means a class/structure's constant data
a_ means a method argument (with ao_ and ar_ variations that mean out and ref arguments, respectively)
(no prefix) means a local variable

Data prefixes are used for primitive types and a few frequently used classes (we don't get trapped into inventing a cumbersome type system):

n   means any primitive integer type
e   means an enum
s   means a string
b   means a boolean
d   means a double
c   means a char (can also mean an integer that represents a count)
by  means a byte (sometimes bt or just b also)
ds  means a DataSet
dt  means a DateTime or DataTable
dv  means a DataView
dr  means a DataRow (sometimes 'row' is used also)
sdr means a SqlDataReader
rex means a Regular expression (sometimes reg, rgx, regex are used)
ex  means an Exception
txt means a TextBox control
rv  means a RequiredValidator control
lbl means a Label control
rb  means a RadioButton control
cb  means a CheckBox control
ddl means a drop down list
btn means a Button control
grd means a DataGrid control
arr means an array (sometimes 'rg' is also used)
lst means a list
map means a hashtable or data dictionary
(no prefix) means an instance of some other class (sometimes 'o' is used)
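
As a rough illustration of how the scope and data prefixes combine, here is a minimal C++ sketch. The Invoice class and every member in it are hypothetical and exist only to show the conventions in context. C++ has no out/ref keywords, so the ao_ argument is modeled here as a non-const reference.

```cpp
#include <string>
#include <vector>

// Hypothetical class; it exists only to show the prefixes in context.
class Invoice
{
public:
    explicit Invoice(const std::string& a_sCustomer)
        : m_sCustomer(a_sCustomer), m_bPaid(false)
    {
        ++s_nInstances;
    }

    void AddLine(double a_dAmount)
    {
        if (static_cast<int>(m_arrAmounts.size()) < c_nMaxLines)
            m_arrAmounts.push_back(a_dAmount);
    }

    void MarkPaid() { m_bPaid = true; }
    bool IsPaid() const { return m_bPaid; }
    const std::string& Customer() const { return m_sCustomer; }

    // ao_ marks an "out" argument: the method fills it in on success.
    bool TryGetTotal(double a_dTaxRate, double& ao_dTotal) const
    {
        if (m_arrAmounts.empty())
            return false;

        double dSubtotal = 0.0;          // local variable: data prefix only
        for (double dAmount : m_arrAmounts)
            dSubtotal += dAmount;

        ao_dTotal = dSubtotal * (1.0 + a_dTaxRate);
        return true;
    }

    static int InstanceCount() { return s_nInstances; }

private:
    static const int c_nMaxLines = 100;  // c_ constant data, n integer
    static int s_nInstances;             // s_ static data,   n integer
    std::string m_sCustomer;             // m_ member data,   s string
    bool m_bPaid;                        // m_ member data,   b boolean
    std::vector<double> m_arrAmounts;    // m_ member data,   arr array
};

int Invoice::s_nInstances = 0;
```

Inside TryGetTotal a reader can tell at a glance which names are arguments, which are member data, and which are locals, without looking anything up.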


Now, let us review Simonyi's original paper to get a better sense of Apps Hungarian Notation:

The first thing to define is what is meant by "type". Simonyi defines it as the set of operations that can be meaningfully applied to a variable. By this he does not mean the set of operators and methods, but something more important. The key word is "meaningfully", and that is how he brings in the concept of semantics. For example, a double that represents a width and a double that represents a length might get multiplied together, or even added together, but shouldn't get assigned to each other. Thus, Simonyi's concept of "type" is more restrictive than that of data-type.

Naming Rules:

  1. Quantities are named by their type, possibly followed by a qualifier. Some punctuation rule separates the two, and Simonyi recommends capitalizing the first letter of the qualifier.
    Examples: x is a type meaning x-coordinate, and it can stand alone as a variable name. rowFirst consists of the type row and the qualifier First.
  2. Qualifiers are only necessary to distinguish between variables of the same type within a given scope. There is a list of standard qualifiers which can be used, or the developer can use a custom qualifier that conveys meaning in the given context.
  3. Simple types are named by short tags that are chosen by the developer.
  4. Names of constructed types should be constructed from the constituent types. This combinatorial rule is the motivation for using short tags for simple types.
    Example: a pointer to a row would have a type prefix prow, constructed from short primitive tags 'p' and 'row'.

The article suggests that new tags be devised for new data structures, rather than combining the tags of the constituent data members. It also suggests that the name of a new data structure should have the tag as a prefix.
Example: struct winWindow {h handle, w width, l length} winWindow winMain = new winWindow().

Here is a full example from his paper:
if (co == coRed) then *mpcopx[coRed]+= dx;
At a glance we can see that the variable co is compared with a quantity of its own kind; coRed is also used as a subscript to an array whose domain is of the correct type. Furthermore, as we will see, the color is mapped into a pointer to x, which is de-referenced (by the * operator in this example) to yield an x type value, which is then incremented by a "delta x" type value. Such "dimensional analysis" does not guarantee that the program is completely free from bugs, but it does help to eliminate the most common kinds. It also lends a certain rhythm to the writing of the code: "Let's see, I have a co in hand and I need an x; do I have a mpcox? No, but there is a mpcopx that will give me a px; *px will get me the x..."

Notice that he is also using tag names to compose function names that map between types.
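
To make the quoted statement concrete, here is a minimal sketch of declarations under which it compiles as C++. The paper does not give these declarations, so everything around the statement is an assumption: the type names CO, X and DX, the use of int for coordinates, the coMax enumerator, and the ShiftRed wrapper.

```cpp
// Tags, following the quoted passage:
//   co     = color
//   x      = x-coordinate
//   dx     = difference between two x values
//   px     = pointer to an x
//   mpcopx = array of px indexed by co ("map from co to px")
enum CO { coRed, coGreen, coBlue, coMax };  // coMax: strict upper limit for co
typedef int X;                              // assumption: coordinates are ints
typedef int DX;

// Hypothetical wrapper around the statement from the paper.
void ShiftRed(CO co, X* mpcopx[], DX dx)
{
    if (co == coRed)
        *mpcopx[coRed] += dx;   // co -> px -> x, then x += dx
}
```

The compiler sees only ints and pointers to int here; it is the tags that tell the reader that adding dx to *mpcopx[coRed] is meaningful while adding it to mpcopx[coRed] would not be.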

Writability - this concept concerns the likelihood that two developers will write exactly the same code to achieve a given end. The claim is that if this is achieved, the code is necessarily easy to read and maintain because it has a certain inevitability about it. Once the solution to the problem at hand is understood, the actual writing of the code becomes somewhat mechanical. The claim is that a pre-agreed set of type tags facilitates this. A corollary is that one should avoid qualifiers whenever possible, since these come from the imagination of the developer. If qualifiers are necessary, it is preferable to select one from a list of standard qualifiers.

Procedure Naming Rules:

  1. Distinguish procedure names via punctuation. He suggests always starting with a capital letter. It is the large scope of procedure names that reduces the utility of beginning with type tags.
  2. If there is a return value, use the tag name (with first letter capitalized) of the return type.
  3. Express action of procedure with one or two words, usually verbs. Use punctuation for distinguishing words. He suggests capitalizing the first letter of each word.
  4. Append the tags of some or all the argument types if it seems appropriate to do so.
Examples:
InitSy - takes a type sy as its argument and initializes it
OpenFn - fn is the type of the argument and the procedure will "open" it, whatever that means.
FcFromBnRn - return type fc based on bn, rn input types

Standard type constructions:

pX - pointer to type x.
dX - difference between two instances of type x
cX - count of instances of type x.
mpXY - An array of y's indexed by x.  Read as 'map from x to y'.
rgX - An array of x's.  Read as 'range x'.
dnX - array indexed by type x.
eX - element of array dnX.
grpX - group of x's stored one after another.
bX - relative offset to a type x.
cbX - size of instances of x in bytes
cwX - size of instances of x in words
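
A few of these constructions can be sketched together in C++. This is only an illustration under assumed types: X, DX and CO, the custom qualifiers Low and High, and both function names are hypothetical, though the function names follow the procedure naming rules above (return tag first, argument tag last).

```cpp
#include <cstddef>

typedef int X;                       // assumed simple type: an x-coordinate
typedef int DX;                      // dX: difference between two X values
enum CO { coRed, coGreen, coMax };   // assumed simple type: a color

const std::size_t cbX = sizeof(X);   // cbX: size of one X in bytes

// rgx is an array of X and cx is the count of X values in it.
// Returns a DX: the distance between the lowest and highest element.
DX DxSpanOfRgx(const X rgx[], int cx)
{
    if (cx <= 0)
        return 0;

    X xLow = rgx[0];
    X xHigh = rgx[0];
    for (int ix = 1; ix < cx; ++ix)
    {
        if (rgx[ix] < xLow)  xLow = rgx[ix];
        if (rgx[ix] > xHigh) xHigh = rgx[ix];
    }
    return xHigh - xLow;             // x - x yields a dx
}

// mpcox is an array of X indexed by CO ("map from co to x").
// Returns a pX: a pointer to the X stored for the given color.
X* PxFromCo(X mpcox[], CO co)
{
    return &mpcox[co];
}
```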

Standard Qualifiers:

xFirst - the first element in an ordered set of x values
xLast - the last element in an ordered set of x values
xLim - the strict upper limit of an ordered set of x values.  A loop condition should read x < xLim.
xMax - strict upper limit for all x values
xMac - the current upper limit for a set of x values
xNil - distinguished Nil value of type x
xT - temporary instance of x
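
The difference between Lim, Last, Max and Mac is easiest to see on a small buffer. The sketch below is an assumption-laden illustration: the XBuf structure, the index type IX and both function names are hypothetical.

```cpp
typedef int X;    // assumed simple type: an x-coordinate
typedef int IX;   // assumed simple type: an index into an array of X

const IX ixMax = 8;    // Max: strict upper limit for all ix values (capacity)
const IX ixNil = -1;   // Nil: distinguished "no such index" value

// Hypothetical fixed-capacity buffer.
struct XBuf
{
    X  rgx[ixMax];
    IX ixMac;          // Mac: current upper limit; [0, ixMac) is in use
};

// Returns a flag (tag f): true if x was appended, false if the buffer is full.
bool FAppendX(XBuf& buf, X x)
{
    if (buf.ixMac >= ixMax)
        return false;
    buf.rgx[buf.ixMac++] = x;
    return true;
}

// Searches the half-open range [ixFirst, ixLim) and returns the index of
// the first match, or ixNil if there is none.
IX IxFindX(const XBuf& buf, X x, IX ixFirst, IX ixLim)
{
    for (IX ix = ixFirst; ix < ixLim; ++ix)
        if (buf.rgx[ix] == x)
            return ix;
    return ixNil;
}
```

A call such as IxFindX(buf, x, 0, ixLast) compiles, but the name ixLast in a position that expects an ixLim makes the off-by-one visible to the reader.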

Some Common Primitive Types:

f - flag (boolean, logical)
w - word with arbitrary contents
ch - character, usually in ASCII text
b - byte
sz - pointer to zero-terminated string
st - pointer to a string where first byte is the count of characters cch
h - pointer to a pointer (pp), AKA handle
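
The sz and st tags show why the tags carry more than the data-type. In the following sketch (function names are hypothetical) both arguments have the same C++ type, so the compiler would accept either string in either call; only the tag says which layout is expected.

```cpp
// sz: pointer to a zero-terminated string.
// Returns cch, the count of characters before the terminator.
int CchOfSz(const char* sz)
{
    int cch = 0;
    while (sz[cch] != '\0')
        ++cch;
    return cch;
}

// st: pointer to a string whose first byte holds the character count.
// Returns cch, read directly from that first byte.
int CchOfSt(const char* st)
{
    return static_cast<unsigned char>(st[0]);
}
```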


While it is interesting to review Simonyi's declared scheme, it is really one devised for his own team's needs, writing in C. The important points to take away are:

  1. The concept of "type" is distinct from data-type. While he stated this explicitly, his examples tend to concentrate on primitive data-types and so the evolution to Systems Hungarian Notation was natural.
  2. The actual tags to be used are up to the development team so there is no such thing as THE Hungarian Notation.
  3. The goal of the notation is to make it possible to detect type conflicts that the compiler cannot detect.

Joel Spolsky's article "Making Wrong Code Look Wrong" is an interesting take on all of this. It emphasizes the three key points about Apps Hungarian Notation listed above, and it presents a particularly relevant context: security in web applications.

In particular he discusses a naming convention where 's' means string and 'us' means unsafe string. Any input to a webpage is read into a variable with the prefix 'us'. Appropriately named methods validate the input, and the returned, validated string is assigned to a variable with the prefix 's'. He follows Simonyi's recommendation for function naming, so that a code reader can easily spot an unsafe string being used in a context that requires a safe string. He doesn't say how the validation functions are implemented, but you can bet that they use whitelists and other techniques discussed by OWASP.
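
A minimal sketch of the us/s convention in C++ might look like the following. All function names are hypothetical, and the choice of HTML escaping as the way to make a string "safe" is an assumption made for the example; a real application would use whatever validation or encoding suits its context.

```cpp
#include <string>

// Named by Simonyi's rule: returns an s built from a us.
// Assumption: "safe" here means HTML-escaped.
std::string SFromUs(const std::string& us)
{
    std::string s;
    for (char ch : us)
    {
        switch (ch)
        {
            case '&':  s += "&amp;";  break;
            case '<':  s += "&lt;";   break;
            case '>':  s += "&gt;";   break;
            case '"':  s += "&quot;"; break;
            case '\'': s += "&#39;";  break;
            default:   s += ch;       break;
        }
    }
    return s;
}

// Hypothetical output routine; by convention it only accepts an s.
std::string RenderGreeting(const std::string& sName)
{
    return "<p>Hello, " + sName + "</p>";
}

std::string SGreetingFromUs(const std::string& usName)
{
    // RenderGreeting(usName) would compile, since both are std::string,
    // but a "us" passed where an "s" is expected looks wrong on sight.
    std::string sName = SFromUs(usName);
    return RenderGreeting(sName);
}
```

Note that nothing here is enforced by the compiler. The protection comes entirely from a reader noticing that the prefixes on the two sides of a call or assignment do not match.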

Finally, I'd like to take a step back and look at the program of Hungarian Notation as a whole. My impression is that the whole thing revolves around the fact that compiler type-checking is inadequate when the type system does not fully capture the SEMANTIC TYPES used in an application. While Hungarian Notation is useful for finding type errors missed by compilers, it depends entirely on human discipline and consistency. For that reason I feel it is inadequate for this purpose. A better approach would be to carefully define a set of classes that do meet the full semantic needs of an application. For example, it would be relatively simple to implement struct UnsafeString and struct SafeString that encapsulate the difference between data that comes from sources external to the application and data that has been validated by the application. Taking this approach would leverage OOP principles and compiler type-checking so that inappropriate type usage becomes a compile-time error rather than a latent bug.
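
One possible shape for such a pair of types is sketched below in C++, using our own naming conventions. This is an illustration, not a vetted security component: the member names, the BuildGreeting function, and the whitelist (letters, digits and spaces only) are all assumptions.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>

// Wraps data that came from outside the application.
class UnsafeString
{
public:
    explicit UnsafeString(std::string a_sRaw) : m_sRaw(std::move(a_sRaw)) {}
    const std::string& Raw() const { return m_sRaw; }

private:
    std::string m_sRaw;
};

// Wraps data that has passed validation. The constructor is private, so
// Validate is the only way to obtain an instance.
class SafeString
{
public:
    // Assumed whitelist: letters, digits and spaces. Throws on anything else.
    static SafeString Validate(const UnsafeString& a_input)
    {
        for (char ch : a_input.Raw())
        {
            unsigned char uch = static_cast<unsigned char>(ch);
            if (!std::isalnum(uch) && ch != ' ')
                throw std::invalid_argument("input contains a disallowed character");
        }
        return SafeString(a_input.Raw());
    }

    const std::string& Value() const { return m_sValue; }

private:
    explicit SafeString(std::string a_sValue) : m_sValue(std::move(a_sValue)) {}
    std::string m_sValue;
};

// Accepts only validated data. Passing an UnsafeString or a bare
// std::string does not compile.
std::string BuildGreeting(const SafeString& a_name)
{
    return "<p>Hello, " + a_name.Value() + "</p>";
}
```

With these types the check that the us/s prefixes ask a human to perform is done by the compiler: there is no conversion from UnsafeString to SafeString other than Validate.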

This approach drastically reduces the justification for Hungarian Notation. However, I do not believe that the notation then becomes useless. It is still helpful to give instant scope/type information to the code reader. Although a good code editor makes this information readily accessible, the reader still has to take some physical action to bring it up, which takes time. Also, one normally cannot see scope/type information on multiple variables simultaneously. Another point is that we don't only view code in a good code editor. Occasionally code is copy/pasted into some text document. Even more to the point, our code-review tool does not provide scope/type information. For all these reasons I'm still very much in favor of using some variant of Hungarian Notation, no matter how well we can map the semantics of what we are coding to encapsulating structs and classes.