Quantitative Reasoning & Problem Solving
© 2014 Pacific Crest
Technique: Substitution
Category: Modifying
Ease of Use: ➋
Description: Replacement of data, for example, categorical data with numerical values
Benefits: Coding alpha to numerical, replacing missing values, capping values
Limitations: There are constraints of extrapolation, compromises of replacement values
Tool: If then function or replace
Application: Survey data, observations, addressing notational concerns
Example: replace(missing, average)
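The three uses of substitution named above (coding alpha to numerical, replacing missing values, capping values) can be sketched in Python. This is a minimal illustration, not the text's own tool: the function names, the codebook, and the sample values are all hypothetical, and missing entries are assumed to be recorded as None.

```python
from statistics import mean


def code_categories(values, codebook):
    """Replace each categorical label with its numeric code."""
    return [codebook[v] for v in values]


def replace_missing(values, missing=None):
    """Replace missing entries with the average of the observed entries."""
    observed = [v for v in values if v is not missing]
    average = mean(observed)
    return [average if v is missing else v for v in values]


def cap(values, upper):
    """Limit every value to at most `upper`."""
    return [min(v, upper) for v in values]


# Hypothetical survey responses and codebook (illustration only).
responses = ["low", "high", "medium", "high"]
codebook = {"low": 1, "medium": 2, "high": 3}
coded = code_categories(responses, codebook)

# Hypothetical ages with one missing observation.
ages = [34, None, 52, 46]
filled = replace_missing(ages)

# Hypothetical measurements capped at 100.
capped = cap([12, 250, 40], upper=100)
```

Each function returns a new list rather than changing its input, so the original values remain available after the substitution.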
Oops! Avoiding Common Errors
● Not saving existing data when performing a transformation
Example: age = ln(age)
Why? Once the ln transformation of age is stored back into age, we lose the original data stored in age, and we may need this data for other purposes.
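One way to avoid this error is to store the transformed values under a new attribute name. The sketch below assumes the data is held in a plain Python dictionary; the attribute name ln_age and the sample ages are hypothetical.

```python
import math

# Hypothetical data set with a single attribute (illustration only).
data = {"age": [25, 40, 60]}

# Store the transformation in a NEW attribute instead of overwriting "age".
data["ln_age"] = [math.log(a) for a in data["age"]]
```

After this step both age and ln_age are available, so the original values can still be used for other purposes.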
● Making an assumption about the meaning of an attribute
Example: Life expectancy of current life insurance policy holders. It is assumed that the value is based on the current age of the policy holder.
Why? The data was collected at the time the policy holder was awarded the insurance, since the insurance is for a term of 20 years. The critical decision is the life expectancy at the time of issue. Because current policy holders have held their policies for an average of 8 years, we are actually adding approximately 8 years to the number we thought we were using. You really have to analyze the attribute definition before using it.
● Failure to document new attributes and their means of creation
Example: We created a weighted index of four other variables to measure overall quality of the program and titled this “overall quality.”
Why? For people to trust and understand what this variable represents, the actual calculation that was done must be made explicit and verified. Each created transformation must be documented with why and how it was created.
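A documented version of such an index might look like the following sketch. The four variable names, the weights, and the scores are hypothetical, chosen only to show the calculation being made explicit; the source does not state which variables or weights were used.

```python
# Hypothetical weights for four hypothetical variables (illustration only).
# Why: combine four ratings into a single "overall quality" measure.
# How: overall_quality = sum of (weight * score), with weights summing to 1.
WEIGHTS = {
    "instruction": 0.4,
    "curriculum": 0.3,
    "facilities": 0.2,
    "support": 0.1,
}


def overall_quality(scores, weights=WEIGHTS):
    """Return the weighted index: sum of weight * score over all variables."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical scores for one program.
program = {"instruction": 4.0, "curriculum": 3.0, "facilities": 5.0, "support": 2.0}
index = overall_quality(program)
```

Keeping the weights and the formula next to the calculation lets a reader verify exactly how the index was produced.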
Are You Ready?
Before continuing, you should be able to ... (respond with “I can...” OR “Here’s my question...”)
● justify a set of data transformations based upon need and context
● explain the rationale behind each type of data transformation