These notes grew out of an attempt to explain how and why we use Hungarian Notation in much of our code. The discussion begins with a historical explanation, then moves on to why Hungarian Notation can still be relevant to development in modern software shops.
Hungarian Notation comes in two main flavors: Apps Hungarian Notation and Systems Hungarian Notation.
Apps Hungarian Notation is the invention of Charles Simonyi and it concentrates on the SEMANTIC meaning of a variable. We will see this in the discussion below.
Systems Hungarian Notation was the invention of the Windows OS team. They adopted the basic structure of Apps Hungarian Notation, but used the prefixes to represent a variable's actual DATA-TYPE. Thus the Windows API is littered with type prefixes. When people extended the ideas to C++ they tended to adopt the Systems approach due to the ubiquity of the Windows API. They prepended the type system with a set of SCOPE markers.
With time Systems Hungarian Notation came to be known as Hungarian Notation and Apps Hungarian Notation became all but forgotten. Meanwhile, Systems Hungarian Notation began to accumulate a bad reputation because:
You will notice that in our code we often use Systems Hungarian Notation conventions for C++. This is because BD and AT (and later AD) came from a C++ shop where Systems Hungarian Notation was the norm. There is NO requirement to use Hungarian Notation on our project, but we would like to preserve consistency within code files. That means that when a legacy file is modified, developers should conform to Hungarian Notation if the pre-existing code does so.
To begin our discussion, let's review the conventions that we currently use:
Scope prefixes are used for all variables that are not local to a method:
m_ means a class/structure's member data
s_ means a class/structure's static data
c_ means a class/structure's constant data
a_ means a method argument (with ao_ and ar_ variations that mean out and ref arguments, respectively)
(no prefix) means a local variable
Data prefixes are used for primitive types and a few frequently used classes
(we don't get trapped into inventing a cumbersome type system)
n means any primitive integer type
e means an enum
s means a string
b means a boolean
d means a double
c means a char (can also mean an integer that represents a count)
by means a byte (sometimes bt or just b also)
ds means a DataSet
dt means a DateTime or DataTable
dv means a DataView
dr means a DataRow (sometimes 'row' is used also)
sdr means a SqlDataReader
rex means a Regular expression (sometimes reg, rgx, regex are used)
ex means an Exception
txt means a TextBox control
rv means a RequiredValidator control
lbl means a Label control
rb means a RadioButton control
cb means a CheckBox control
ddl means a drop down list
btn means a Button control
grd means a DataGrid control
arr means an array (sometimes 'rg' is also used)
lst means a list
map means a hashtable or data dictionary
(no prefix) means an instance of some other class (sometimes 'o' is used)
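To make the conventions above concrete, here is a minimal C++ sketch that combines the scope and data prefixes. The Invoice class, its members, and the tax rate are all hypothetical and invented for illustration; only the prefixes come from these notes. C++ has no out/ref keywords, so ao_ is shown on a non-const reference parameter.

```cpp
#include <string>
#include <vector>

// Hypothetical class; only the naming prefixes are taken from the notes.
class Invoice {
public:
    // a_ = method argument, s = string
    explicit Invoice(const std::string& a_sCustomer)
        : m_sCustomer(a_sCustomer), m_bPaid(false) {
        ++s_nInstances;
    }

    void AddLine(double a_dAmount) { m_arrAmounts.push_back(a_dAmount); }
    void MarkPaid() { m_bPaid = true; }
    bool IsPaid() const { return m_bPaid; }
    const std::string& Customer() const { return m_sCustomer; }

    // ao_ = "out" argument: the method writes its result into it.
    bool TryGetTotal(double& ao_dTotal) const {
        if (m_arrAmounts.empty()) return false;
        double dSum = 0.0;                        // local variable: no scope prefix
        for (double dAmount : m_arrAmounts) dSum += dAmount;
        ao_dTotal = dSum * (1.0 + c_dTaxRate);
        return true;
    }

    static int s_nInstances;                      // s_ = static data, n = integer
    static constexpr double c_dTaxRate = 0.25;    // c_ = constant data, d = double

private:
    std::string m_sCustomer;                      // m_ = member data, s = string
    bool m_bPaid;                                 // b = boolean
    std::vector<double> m_arrAmounts;             // arr = array
};

int Invoice::s_nInstances = 0;
```

Inside TryGetTotal a reader can tell which names are members, which are arguments, and which are locals without scrolling to a declaration.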
Now, let us review Simonyi's original paper to get a better sense of Apps Hungarian Notation:
The first thing to define is what is meant by "type". Simonyi defines it as the set of operations that can be meaningfully applied to a variable. By this he does not mean the set of operators and methods, but something more important. The key word in the phrase is "meaningfully", and that is how he brings in the concept of semantics. For example, a double that represents a width and a double that represents a length might get multiplied together, or even added together, but shouldn't get assigned to each other. Thus, Simonyi's concept of "type" is more restrictive than that of data-type.
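A small sketch of the width/length point, in C++. The tags w (width), l (length) and dw (difference between two widths) follow the style of the notation; the function names are hypothetical. To the compiler every value here is just a double, so the tags are the only thing that carries the semantic type.

```cpp
// w * l is meaningful: a width times a length yields an area.
double AreaFromWL(double wBox, double lBox) {
    return wBox * lBox;
}

// w + dw is meaningful: a width plus a width difference is still a width.
double WGrow(double wBox, double dwExtra) {
    return wBox + dwExtra;
}

// An assignment such as "wBox = lBox;" would compile without complaint,
// but the mismatched tags make the mistake visible to a human reader.
```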
Naming Rules:
The article suggests that new tags be devised for new data structures, rather than built by combining the tags of the constituent data members.
It also suggests that the name of a new data structure should carry its tag as a prefix.
Example: struct winWindow {h handle, w width, l length}
winWindow winMain = new winWindow().
Here is a full example from his paper:
if (co == coRed) then *mpcopx[coRed]+= dx;
At a glance we can see that the variable co is compared with a quantity
of its own kind; coRed is also used as a subscript to an array whose domain
is of the correct type. Furthermore, as we will see, the color is mapped
into a pointer to x, which is de-referenced (by the *operator in this example)
to yield an x type value, which is then incremented by a "delta x" type value.
Such "dimensional analysis" does not guarantee that the program is completely
free from bugs, but it does help to eliminate the most common kinds. It also
lends a certain rhythm to the writing of the code: "Let's see, I have a co in hand
and I need an x; do I have a mpcox? No, but there is a mpcopx that will give
me a px; *px will get me the x..."
Notice that he is also using tag names to compose function names that map between types.
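The quoted statement can be made to compile with a few declarations. The following C++ sketch is an assumption about what such declarations might look like: the uppercase type names, the Palette struct, and the BumpIfRed function are hypothetical and do not come from the paper.

```cpp
// Hypothetical declarations that give the tags something to refer to.
enum CO { coRed, coGreen, coBlue, coMax };  // co = color; coMax = strict upper limit of co

typedef int X;    // x  = a horizontal coordinate (a plain int to the compiler)
typedef int DX;   // dx = difference between two x values
typedef X*  PX;   // px = pointer to an x

struct Palette {
    X  rgx[coMax];      // rgx    = array of x values
    PX mpcopx[coMax];   // mpcopx = map from co to px (array of px indexed by co)

    Palette() {
        for (int i = 0; i < coMax; ++i) {
            rgx[i] = 0;
            mpcopx[i] = &rgx[i];
        }
    }
    // The pointers refer into this object, so copying is disabled.
    Palette(const Palette&) = delete;
    Palette& operator=(const Palette&) = delete;
};

// The statement from the paper, wrapped in a function.
void BumpIfRed(Palette& pal, CO co, DX dx) {
    if (co == coRed) *pal.mpcopx[coRed] += dx;
}
```

Reading the tags left to right reproduces the "dimensional analysis" from the quote: mpcopx indexed by a co gives a px, dereferencing a px gives an x, and an x may be incremented by a dx.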
Writability - the idea that two developers solving the same problem should end up writing essentially the same code. The claim is that if this is achieved, the code is necessarily easy to read and maintain, because it has a certain inevitability about it. Once the solution to the problem at hand is understood, the actual writing of the code becomes almost mechanical. A pre-agreed set of type tags is claimed to facilitate this. A corollary is that qualifiers should be avoided whenever possible, since they come from the imagination of the developer. If a qualifier is necessary, it is preferable to select one from a list of standard qualifiers.
Procedure Naming Rules:
Standard type constructions:
pX - pointer to type x.
dX - difference between two instances of type x.
cX - count of instances of type x.
mpXY - an array of y's indexed by x. Read as 'map from x to y'.
rgX - an array of x's. Read as 'range x'.
dnX - array indexed by type x.
eX - element of array dnX.
grpX - group of x's stored one after another.
bX - relative offset to a type x.
cbX - size of instances of x in bytes.
cwX - size of instances of x in words.
Standard Qualifiers:
xFirst - the first element in an ordered set of x values
xLast - the last element in an ordered set of x values
xLim - the strict upper limit of an ordered set of x values. Loop should be x < xLim.
xMax - strict upper limit for all x values
xMac - the current upper limit for a set of x values
xNil - distinguished Nil value of type x
xT - temporary instance of x
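The constructions and qualifiers above compose. As a rough C++ sketch, a fixed-size buffer can be described with rgX, pX, cX, xMax, xMac and xT. The type X, the XBuffer struct, its capacity of 4, and the function names are hypothetical; the leading F on FAppendX borrows the f (flag) tag for the return value.

```cpp
#include <cstddef>

typedef int X;   // hypothetical semantic type

struct XBuffer {
    X  rgx[4];   // rgx   = array of x
    X* pxMac;    // pxMac = current upper limit: one past the last stored x
    X* pxMax;    // pxMax = strict upper limit for every valid px

    XBuffer() : pxMac(rgx), pxMax(rgx + 4) {}
    // The pointers refer into this object, so copying is disabled.
    XBuffer(const XBuffer&) = delete;
    XBuffer& operator=(const XBuffer&) = delete;
};

// Returns a flag: false when the buffer is already full.
bool FAppendX(XBuffer& buf, X xNew) {
    if (buf.pxMac == buf.pxMax) return false;
    *buf.pxMac++ = xNew;
    return true;
}

// Sums the stored values using the "px < limit" loop shape.
X XSum(const XBuffer& buf) {
    X xT = 0;                                      // xT = temporary x
    for (const X* px = buf.rgx; px < buf.pxMac; ++px) xT += *px;
    return xT;
}

// cx = count of x's currently stored.
std::ptrdiff_t CxStored(const XBuffer& buf) {
    return buf.pxMac - buf.rgx;
}
```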
Some Common Primitive Types:
f - flag (boolean, logical)
w - word with arbitrary contents
ch - character, usually in ASCII text
b - byte
sz - pointer to zero-terminated string
st - pointer to a string whose first byte is the count of characters (cch)
h - pointer to a pointer (pp), AKA handle
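As an illustration of the st, sz, cch and f tags, here is a short C++ sketch that converts a counted string into a zero-terminated one. The function names are hypothetical, and the caller is assumed to supply a destination buffer of at least cch + 1 bytes.

```cpp
#include <cstring>

// st: the first byte holds the character count (cch), followed by the characters.
// sz: zero-terminated string. The caller must provide room for cch + 1 bytes.
void StToSz(const unsigned char* st, char* sz) {
    const unsigned cch = st[0];      // cch = count of characters
    std::memcpy(sz, st + 1, cch);
    sz[cch] = '\0';
}

// Returns a flag (f) telling whether the counted string is empty.
bool FEmptySt(const unsigned char* st) {
    return st[0] == 0;
}
```

Because the tags appear in both the argument names and the function name, passing an sz where an st is expected looks wrong at the call site even though both are byte pointers.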
While it is interesting to review Simonyi's declared scheme, it is really one devised for his own team's needs, writing in C. The important points that should be taken are:
Here is an interesting article about all this by Joel Spolsky. His point is to emphasize the three key advantages of Apps Hungarian Notation that I mentioned above, and he presents a particularly relevant context: security in web applications.
In particular, he discusses a naming convention where 's' means string and 'us' means unsafe string. Any input to a web page is read into a variable with the prefix 'us'. Appropriately named methods validate the input, and the returned, validated string is assigned to a variable with the prefix 's'. He follows Simonyi's recommendation for function naming, so that a code reader can easily spot an unsafe string being used in a context that requires a safe string. He doesn't say how the validation functions are implemented, but you can bet that they use whitelists and other techniques discussed by OWASP.
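A minimal C++ sketch of the us/s convention follows. The function names SFromUs and SGreeting are hypothetical, and HTML-encoding is used as the conversion purely as an assumption, since the implementation of the real functions is not described here.

```cpp
#include <string>

// us = unsafe string (raw input), s = safe string.
// Assumption: "making safe" is sketched as HTML-encoding five characters.
std::string SFromUs(const std::string& usInput) {
    std::string sOut;
    for (char ch : usInput) {
        switch (ch) {
            case '&':  sOut += "&amp;";  break;
            case '<':  sOut += "&lt;";   break;
            case '>':  sOut += "&gt;";   break;
            case '"':  sOut += "&quot;"; break;
            case '\'': sOut += "&#39;";  break;
            default:   sOut += ch;
        }
    }
    return sOut;
}

// Takes an s, returns an s: the tags document that only safe strings belong here.
std::string SGreeting(const std::string& sName) {
    return "<p>Hello, " + sName + "</p>";
}
```

A call such as SGreeting(SFromUs(usName)) reads correctly, with the tags lining up across the call. A call such as SGreeting(usName) compiles just as well, but the us next to a function that expects an s is what a reviewer is meant to notice.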
Finally, I'd like to take a step back and look at the program of Hungarian Notation as a whole. My impression is that the whole thing revolves around the fact that the type-checking done by compilers is inadequate when the type system does not fully capture the SEMANTIC TYPES used in an application. While Hungarian Notation is useful for finding type errors missed by compilers, the approach depends on human diligence and consistency. For that reason I feel it is inadequate for this purpose. A better approach would be to carefully define a set of classes that do meet the full semantic needs of an application. For example, it would be relatively simple to implement struct UnsafeString and struct SafeString that encapsulate the difference between data that comes from sources external to the application and data that has been validated by the application. Taking this approach would leverage OOP principles and compiler type-checking to eliminate, at compile time, bugs caused by inappropriate type usage.
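As a rough C++ sketch of that idea, the two wrapper types below make the compiler enforce the unsafe/safe distinction. All member and function names are hypothetical, and HTML-encoding stands in for whatever validation the application actually needs.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Wraps data that arrived from outside the application.
class UnsafeString {
public:
    explicit UnsafeString(std::string a_sRaw) : m_sRaw(std::move(a_sRaw)) {}
    const std::string& Raw() const { return m_sRaw; }
private:
    std::string m_sRaw;
};

// Wraps data that has passed through the conversion below.
class SafeString {
public:
    // The only public way to obtain a SafeString.
    // Assumption: HTML-encoding stands in for real validation.
    static SafeString FromUnsafe(const UnsafeString& a_usInput) {
        std::string sOut;
        for (char ch : a_usInput.Raw()) {
            switch (ch) {
                case '&': sOut += "&amp;"; break;
                case '<': sOut += "&lt;";  break;
                case '>': sOut += "&gt;";  break;
                default:  sOut += ch;
            }
        }
        return SafeString(std::move(sOut));
    }
    const std::string& Value() const { return m_sValue; }
private:
    explicit SafeString(std::string a_sValue) : m_sValue(std::move(a_sValue)) {}
    std::string m_sValue;
};

// Accepts only SafeString; passing an UnsafeString or a raw string fails to compile.
std::string RenderGreeting(const SafeString& a_sName) {
    return "<p>Hello, " + a_sName.Value() + "</p>";
}

// Compile-time checks that the shortcuts really are closed off.
static_assert(!std::is_constructible<SafeString, std::string>::value,
              "a raw string must not become a SafeString directly");
static_assert(!std::is_convertible<UnsafeString, SafeString>::value,
              "an UnsafeString must not convert implicitly");
```

With these types, RenderGreeting(UnsafeString("x")) is rejected by the compiler, where the naming convention alone could only make it look suspicious.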
This approach drastically reduces the justification for Hungarian Notation. However, I do not believe that the notation is then useless. It is still helpful to give instant scope/type information to the code reader. Although a good code editor will make this information readily accessible, it still requires the code reader to take some physical action to bring up the information...which takes time. Also, one normally cannot see scope/type information on multiple variables simultaneously. Another point that needs to be made is that we don't only view code in a good code editor. Occasionally code is copy/pasted into some text document. Even more to the point, our code-review tool does not provide scope/type information. For all these reasons I'm still very much in favor of using some variant of Hungarian Notation, no matter how well we can map the semantics of what we are coding to encapsulating structs and classes.