# EPR Approach to Intro Stat: Entities, Properties, and Variables

## Donald B. Macnaughton

Consider three questions:

1. Is the concept of "entity" the most fundamental concept of human reality?
2. Are the concepts of "entity" and "property of an entity" appropriate concepts to teach at the beginning of an introductory statistics course?
3. What is a variable?

With these questions in mind, let me describe an approach to presenting the concepts of entities, properties, and variables to students in an introductory statistics course.

# Entities

If you stop and observe your train of thought at this moment, you will probably agree that you think about "things". For example, during the course of a minute or so, you may think about, among other things, a friend, an appointment, today's weather, and an idea. Each of these things is an example of an "entity".

Many different types of entities exist, for example,

• physical objects
• processes
• organisms
• events
• ideas
• societal entities (e.g., educational institutions)
• symbols
• forces
• waves
• mathematical entities (e.g., sets, numbers, vectors).

People usually view entities as existing in two different places: in the external world and in their minds. We use entities in our minds mainly to stand for entities in the external world, much as we use a map to stand for its territory.

Most people begin to use the concept of "entity" when they are very young. Most of us use the concept automatically as a way of organizing the multitude of stimuli that enter our minds minute by minute when we are awake.

Since everything (every thing) can be usefully viewed as being an entity, the concept of "entity" may be the most fundamental concept of human reality.

Because people use the concept of "entity" almost entirely at an unconscious level, some people have difficulty grasping the fundamental role that the concept plays in their thought.

The concept of "entity" is further concealed because it seldom appears directly in discussion in (1) everyday life, (2) empirical research, or (3) statistics. Direct discussion is usually omitted because, when dealing with specific issues, it is usually not necessary to drill down all the way to the foundational concept and discuss "things" at such a basic level. Instead, discussions of specific issues usually concern one or more particular types of entities, which are best referred to by their type names. For example, medical researchers often study a type of entity called "human beings".

However, as I argue below, the concept of "entity" serves as a foundation for most other concepts in statistics and empirical research. Therefore, discussion of the concept of "entity" at the beginning of an introductory statistics course is invaluable.

# Properties of Entities

Every entity has associated with it a set of attributes or "properties". For example, all human beings have thousands of different properties, two of which are "height" and "blood group".

For any particular entity, each of its properties has a "value". We usually report the value of a property with words, with symbols, or with numbers. For example, your height might be 5 feet 9 inches.

If we need to know the value of a property of an entity, we can apply an appropriate measuring instrument to the entity. If the instrument is measuring properly, it will return a measurement to us that is an estimate of the value of the property in the entity at the time of the measurement. For example, if we need to know the (value of the) height (property) of a person, we can apply a height-measuring instrument (e.g., a tape measure) to the person, and the instrument will give us a number that is an estimate of the person's height.
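The measurement step described above can be sketched in a few lines of Python. This is only an illustrative model, not part of the original approach: the `Entity` class, the `measure` function, the property name `height_in`, and the assumption that the instrument's error is normally distributed are all hypothetical choices made for the example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A hypothetical entity with named properties and their (true) values."""
    name: str
    properties: dict = field(default_factory=dict)


def measure(entity, prop, error_sd=0.5, rng=None):
    """Return an estimate of the value of `prop` in `entity`.

    Assumption for illustration: the instrument reports the true value
    plus normally distributed error with standard deviation `error_sd`.
    """
    rng = rng or random.Random()
    true_value = entity.properties[prop]
    return true_value + rng.gauss(0.0, error_sd)


# A person whose height is 5 feet 9 inches (69 inches).
person = Entity("person A", {"height_in": 69.0, "blood_group": "O"})

# Applying the "tape measure" yields an estimate, not the value itself.
estimate = measure(person, "height_in", error_sd=0.5, rng=random.Random(1))
print(f"measured height: {estimate:.1f} inches")
```

The point of the sketch is the separation between the value of the property (stored in the entity) and the measurement (returned by the instrument), which is only an estimate of that value.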

# Variables

Empirical researchers and statisticians usually refer to properties of entities as variables. Similarly (but approaching from the opposite direction), when researchers or statisticians refer to a variable, they are usually referring (either specifically or generally) to some property of some type of entity.

Thus the important statistical concept of "variable" can be defined in terms of the three more fundamental concepts of "entity", "property of an entity", and "value of a property of an entity". A simple version of the definition is

A "variable" is a formal representation of a property of entities.

I discuss other definitions of the concept of "variable" in the appendix.
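For readers who think in code, the simple definition can be illustrated with a minimal Python sketch. The `Variable` class, its field names, and the three example people are hypothetical and serve only to show how a variable can stand for one property across many entities of the same type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """A formal representation of one property of one type of entity."""
    entity_type: str    # e.g. "human being"
    property_name: str  # e.g. "height_in"

    def value_for(self, entity: dict):
        """Look up the value of this variable's property in one entity."""
        return entity[self.property_name]


# Three hypothetical entities of the same type, each with its own values.
people = [
    {"name": "A", "height_in": 69, "blood_group": "O"},
    {"name": "B", "height_in": 64, "blood_group": "A"},
    {"name": "C", "height_in": 71, "blood_group": "AB"},
]

height = Variable("human being", "height_in")
blood_group = Variable("human being", "blood_group")

heights = [height.value_for(p) for p in people]
groups = [blood_group.value_for(p) for p in people]
print(heights)  # [69, 64, 71]
print(groups)   # ['O', 'A', 'AB']
```

Note that the variable itself holds no values. It names a property of a type of entity, and a value appears only when the variable is applied to a particular entity.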

# Usefulness of the Approach

I have argued that we can use the concepts of "entity", "property of an entity", and "value of a property of an entity" to define the concept of "variable". The three defining concepts are simple, intuitive, and fundamental. Thus I believe it is useful to introduce the three concepts at the beginning of an introductory statistics course, as a way of helping students to understand the concept of "variable".

I invite readers who disagree to present their views in the sci.stat.edu Usenet newsgroup.

The above points are part of a broader discussion of an approach to the introductory statistics course available at

http://www.matstat.com/teach/

# Appendix: Some Definitions of the Concept of "Variable"

To help evaluate the above characterization of a variable, it is useful to consider definitions of the concept that have been proposed by others.

Kruskal and Tanur (1978) lack entries for either "variable" or "random variable".

Kotz and Johnson (1982-1988) also do not define the term "variable". Their entry for "random variable" consists of "See PROBABILITY THEORY". In the entry for "probability theory" Heyde (1986) defines a random variable as a member of a certain class of real-valued functions of points in the sample space. Of course, the "points" are equivalent to entities.

Marriott (1990) gives the following definition:

variable Generally any quantity which varies. More precisely, a variable in the mathematical sense, i.e. a quantity which may take any one of a specified set of values. It is convenient to apply the same word to denote non-measurable characteristics, e.g. 'sex' is a variable in this sense since any human individual may take one of two 'values', male or female.

Marriott defines variables in terms of the concepts of "quantity" and "characteristic". These two somewhat abstract concepts are equivalent to the more tightly delineated concept of "property of an entity".

Marriott makes no reference to the general concept of "entity". However, it is clear that entities lurk in the background of his definition. For example, whenever a "sex" variable has a value, an entity, a particular organism whose sex has been determined ("measured"), is somewhere about. In fact, for virtually all variables, it is reasonable to see entities existing behind the variables. (The entities are whatever are associated with the rows in a standard computer-package data table, in which the columns represent variables.)
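The row-and-column picture in the parenthetical remark can be made concrete with a small Python sketch. The table contents and the helper names `column` and `entity_behind` are hypothetical. The sketch only illustrates the idea that each value of a variable belongs to some entity (row).

```python
# Each row describes one entity (here, a hypothetical organism);
# each column name is a variable.
columns = ["id", "species", "sex"]
rows = [
    ("e1", "human", "female"),
    ("e2", "human", "male"),
    ("e3", "dog", "female"),
]


def column(name):
    """Return the values of one variable, one value per entity (row)."""
    i = columns.index(name)
    return [row[i] for row in rows]


def entity_behind(row_index, name):
    """Return (entity id, value): the entity to which a given value belongs."""
    row = rows[row_index]
    return row[columns.index("id")], row[columns.index(name)]


sex_values = column("sex")
print(sex_values)               # one value per entity
print(entity_behind(1, "sex"))  # the entity behind the second value
```

Reading down a column gives the values of a variable. Reading across a row recovers the entity that those values describe.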

I believe that we should move the entities in statistical analysis to the foreground since, from the point of view of empirical researchers, the entities are an important and tangible aspect of the research, and as such should not be left lurking in the background.

Vogt (1993) defines a variable as

Variable Any finding (an attribute or characteristic) that can change, that can vary, or that can be expressed as more than one value or in various values or categories. The opposite of a variable is a constant. For example, height: 5'7", 5'8", and so on; or religion: Catholic, Protestant, Jewish, Other; or experimental treatment: Drug A, Drug B, Drug C.

Vogt uses the concepts of "finding", "attribute", and "characteristic" to define variables. As with the defining concepts in Marriott's definition, these three somewhat abstract concepts are all equivalent to the more tightly delineated concept of "property of an entity".

Like Marriott, Vogt makes no direct reference to the concept of "entity" although there are entities lurking in the background in each of his three examples.

Modern definitions of the concept of "variable" are beginning to embrace the concept of "entity", although entities, once referred to in a definition, are still often given short shrift in the rest of the discussion.

For example, Freedman, Pisani, Purves, and Adhikari define a variable as

A variable is a characteristic which changes from person to person in a study (1991, 40).

These writers use the concept of "entity" in their definition but they seem to assume that only people can be entities. (One suspects, however, that this limitation is not their actual intent, and is instead an editing error.)

Moore, in his exemplary introductory statistics textbook, begins by defining "individuals" as

the objects described by a set of data. Individuals may be people, but they may also be animals or things.

He then defines a "variable" as

any characteristic of an individual. A variable can take different values for different individuals (1995, 10).

Moore defines the concept of "individuals" (= "entities") in terms of the concepts of "objects", "people", "animals", and "things". Similarly, he defines the concept of "variable" in terms of the concept of "characteristic" (= "property").

(The choice of which names to use for the concepts "entity" and "property" is of some importance, with the choice perhaps being dictated by considerations of generality and ease of understanding. However, the present discussion is not about the choice of names for the concepts, but is about the concepts themselves, regardless of what we decide to call them.)

Note that Moore defines "individuals" in terms of the concept of a "set of data". If we assume that Moore is following the convention of defining each term in a conceptual system in terms of other more fundamental terms, his definition suggests that he views the concept of "set of data" as being more fundamental than the concept of "individual". Thus Moore appears to be taking a phenomenalistic approach.

My approach to defining the concept of "variable" is similar to Moore's, except that I suggest it is useful to view the concept of "entity" (= "individual") as being more fundamental than the concept of a "set of data". In fact, I suggest we leave the concept of "entity" as a primitive. And although we can illustrate the concept of "entity" for students by discussing many examples of entities, we should tell students that the concept itself will, to avoid circularity, be left verbally undefined.

[As noted above, humans acquire the concept of "entity" as young children through non-verbal linking of consistent sets of stimuli. Thus one could argue that the stimuli (sense data) that one receives are the fundamental units of reality. At a preconscious level this approach seems quite reasonable. But at the conscious level, which is the level at which all human discussion about statistics must operate, the concept of "entity" seems to hold sway as the concept that is the basis of all other concepts. (After all, even properties and sets of data are entities.) Thus at the discussion level it makes sense to designate the concept of "entity" as fundamental and therefore verbally undefined.]

Similarly, I believe that the concepts of "properties of entities" and "values of properties of entities" are best left as primitives, defined solely through human experience and through discussion of examples.

On the other hand, I agree with Moore that we can give the concept "variable" a formal or informal verbal definition in terms of the concepts of entities, properties, and values.

In their comprehensive unified view of many of the main statistical topics, Kendall, Stuart, and Ord characterize variables in a way that is similar to the approach described in this note, although they use somewhat different underpinnings. In particular, in volume 1 in the first sentence of chapter 1 they assert that the concept of "population" is "the fundamental notion in statistical theory" (1987, 1994). They then give five examples of different types of entities to illustrate what a population can be made of. However, although Kendall et al. populate their populations with entities, they do not recognize the concept of "entity" as being a concept in its own right, more fundamental than the concept of "population". Thus they seem to want to start in the ball game at second base.

In the second paragraph of chapter 1 Kendall et al. prepare for discussion of the concept of "variable" by discussing the concept of "properties". However, they concentrate on "properties of populations" as opposed to the more general concept of "properties of entities". (Entities are more general than populations because all populations are also entities, but not vice versa.) I believe that we should first introduce students to the fundamental concepts of "entity" and "property of an entity". Then we can define the concepts of "population" and "variable" in terms of those concepts. By building the discussion around what appear to be the most fundamental concepts of human reality, I believe we make the field of statistics substantially easier for students to understand.

# REFERENCES

Freedman, D., Pisani, R., Purves, R., and Adhikari, A. (1991), Statistics (2nd ed.), New York: Norton.

Heyde, C. C. (1986), "Probability Theory (Outline)" in Encyclopedia of Statistical Sciences (Vol. 7), ed. S. Kotz and N. L. Johnson, New York: John Wiley, pp. 248-252.

Kendall, M., Stuart, A., and Ord, J. K. (1987, 1994), Kendall's Advanced Theory of Statistics (5th and 6th eds., 3 vols), London: Charles Griffin, Edward Arnold.

Kotz, S. and Johnson, N. L., eds. (1982-1988), Encyclopedia of Statistical Sciences (9 vols), New York: John Wiley.

Kruskal, W. H. and Tanur, J. M., eds. (1978), International Encyclopedia of Statistics (2 vols), New York: Free Press.

Marriott, F. H. C. (1990), A Dictionary of Statistical Terms (5th ed.), Harlow, UK: Longman Scientific and Technical.

Moore, D. S. (1995), The Basic Practice of Statistics, New York: Freeman.

Vogt, W. P. (1993), Dictionary of Statistics and Methodology, Newbury Park, CA: Sage.