© 2014 Pacific Crest
Technique
Search
Category
Reduction
Ease of Use
➌
Description
Slicing a set of data from a larger data set based upon a search criterion
Benefits
Selects the data of interest and filters out data that is not part of the analysis
Limitations
The observations not selected often contain information that may influence how the searched data is understood
Tool
Normally a search function
Application
Narrowing the focus on a specific population
Example
Search(Table, "age", 15 < age < 25)
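The Search example can be sketched in Python as a minimal illustration. The `search` helper, the sample rows, and the use of a function as the search criterion are assumptions made for this sketch; they are not the tool the text refers to.

```python
# Hypothetical sample table: each row is a dict (data invented for illustration).
table = [
    {"name": "Ana", "age": 14},
    {"name": "Ben", "age": 19},
    {"name": "Cy", "age": 25},
    {"name": "Di", "age": 22},
]


def search(rows, attribute, criterion):
    """Return only the rows whose value for `attribute` satisfies `criterion`."""
    return [row for row in rows if criterion(row[attribute])]


# Same criterion as the example: 15 < age < 25 (both bounds exclusive).
young_adults = search(table, "age", lambda age: 15 < age < 25)
print([row["name"] for row in young_adults])  # prints ['Ben', 'Di']
```

The rows for Ana and Cy are filtered out, which is exactly the limitation noted above: whatever those rows might have shown is no longer visible in the searched data.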
Technique
Select
Category
Reduction
Ease of Use
➋
Description
Slicing a set of data from a larger data set based upon some rule, such as every 10th row
Benefits
It is often easier to work on a smaller set of data to get a feel for the data before working on the whole data set
Limitations
The analysis may be biased by the sample and needs careful validation against the complete data
Tool
Select Function
Application
Reduction of the data to a fraction of the original data to build a model before testing
with the rest of the data
Example
Select(Table, 0.1) selects 1 of every 10 rows
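A minimal Python sketch of the Select example follows. The `select` helper is hypothetical, and it assumes the rule is systematic: keep the first row and then every row a fixed step later, where the step is the reciprocal of the requested fraction.

```python
def select(rows, fraction):
    """Keep one row out of every round(1 / fraction) rows, starting with the first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    step = round(1 / fraction)  # 0.1 -> every 10th row
    return rows[::step]


# Hypothetical 100-row table; each "row" is simply its row number.
table = list(range(1, 101))

sample = select(table, 0.1)
print(len(sample), sample[:3])  # prints 10 [1, 11, 21]
```

A model built on `sample` would then be checked against the remaining rows, as the application above describes.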
Technique
Sort
Category
Rearrangement
Ease of Use
➊
Description
Changes the order of the database using the sort type chosen for a specific attribute
Benefits
Allows study of sequencing patterns
Limitations
Blank values are separated from the rest of the data; the original order of entry is lost
Tool
Normally a sort function
Application
Determining the median, ordering time series, and clustering groups
Example
Sort(Table, "age") sorts the set of people from lowest to highest age
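The Sort example, together with the median application, can be sketched in Python as follows. The `sort_table` helper and the sample rows are assumptions for illustration, as is the choice to place blank values (represented here as `None`) at the end.

```python
import statistics

# Hypothetical sample table; one row has a blank (None) age.
table = [
    {"name": "Ana", "age": 31},
    {"name": "Ben", "age": None},
    {"name": "Cy", "age": 19},
    {"name": "Di", "age": 24},
]


def sort_table(rows, attribute):
    """Return a new list ordered lowest to highest by `attribute`, blanks last."""
    # The first key element pushes blanks to the end; the second orders the rest.
    return sorted(rows, key=lambda row: (row[attribute] is None, row[attribute] or 0))


by_age = sort_table(table, "age")
print([row["name"] for row in by_age])  # prints ['Cy', 'Di', 'Ana', 'Ben']

# Median of the non-blank ages.
ages = [row["age"] for row in by_age if row["age"] is not None]
median_age = statistics.median(ages)
print(median_age)  # prints 24
```

Because `sorted` returns a new list, `table` keeps its original order of entry; sorting in place would lose it, which is the limitation noted above.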
5.4 Transforming Data