Introduction
Categorical data plays a key role in data analysis, providing a structured way to capture qualitative relationships. Before running any models, simply examining the distribution of categorical data can provide valuable insights into underlying patterns.
Whether summarizing survey responses or exploring demographic trends, basic statistical tools, such as frequency counts and tabulations, help reveal these patterns.
GAUSS offers several tools for summarizing and visualizing categorical data, including:
- tabulate: Quickly compute cross-tabulations and summary tables.
- frequency: Generate frequency counts and relative frequencies.
- plotFreq: Create visual representations of frequency distributions.
In GAUSS 25, these functions received significant enhancements, making them more powerful and user-friendly. In this post, we'll explore these enhancements and demonstrate their practical applications.
Frequency Counts
The GAUSS frequency function generates frequency tables for categorical variables. In GAUSS 25, it has been enhanced to use metadata from dataframes, automatically detecting and displaying variable names. Additionally, the function now includes an option to sort the frequency table, making it easier to analyze distributions.
Example: Counting Product Categories
For this example, we'll use a hypothetical dataset containing 50 observations of two categorical variables: Product_Type and Area. You can download the dataset here.
To start, we'll load the data using loadd:
/*
** Sample product sales data
*/
// Import sales dataframe
product_data = loadd(__FILE_DIR $+ "product_data.csv");
// Preview data
head(product_data);
Product_Type Area
Electronics East
House Items West
Furnishings North
Toys East
House Items North
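Before tabulating, it can help to confirm that the file loaded as expected. The following is a minimal sketch, assuming product_data.csv sits next to the program file as in the loadd call above. The variable n_obs is an illustrative name, and this check is an optional extra rather than part of the original walkthrough.

```gauss
// Load the dataframe, as in the example above
product_data = loadd(__FILE_DIR $+ "product_data.csv");

// Number of observations; the text describes a dataset of 50
n_obs = rows(product_data);
print "Observations: " n_obs;

// Variable names stored in the dataframe metadata
print getColNames(product_data);
```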
Next, we will compute the frequency counts of the Product_Type variable:
// Compute frequency counts
frequency(product_data, "Product_Type");
=============================================
Product_Type     Count   Total %    Cum. %
=============================================
Clothes              8        16        16
Electronics         13        26        42
Furnishings         10        20        62
House Items          7        14        76
Toys                12        24       100
=============================================
Total               50       100
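The two percentage columns follow directly from the counts. As a minimal sketch (the vector n is typed in by hand from the table above, not returned by frequency, and the variable names are illustrative), the same figures can be reproduced with basic matrix operations:

```gauss
// Counts copied by hand from the frequency table (5x1 column vector)
n = { 8, 13, 10, 7, 12 };

// Relative frequency of each category, in percent
pct = 100 * n / sumc(n);

// Running total of the percentages
cum_pct = cumsumc(pct);

// Columns: count, percent, cumulative percent
print n~pct~cum_pct;
```

Each percentage is the count divided by the 50 observations, and the cumulative column is the running sum of those percentages, ending at 100.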
We can also generate a sorted frequency table, using the optional sorting argument:
// Compute sorted frequency counts
frequency(product_data, "Product_Type", 1);
=============================================
Product_Type     Count   Total %    Cum. %
=============================================
Electronics         13        26        26
Toys                12        24        50
Furnishings         10        20        70
Clothes              8        16        86
House Items          7        14       100
=============================================
Total               50       100
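Sorting changes only the row order, but the cumulative column depends on that order, which is why it differs between the two tables. A rough sketch of the idea, using hand-entered counts and the general-purpose sortc and rev functions (this illustrates the arithmetic and is not a description of how frequency works internally):

```gauss
// Counts copied by hand from the unsorted table
n = { 8, 13, 10, 7, 12 };

// sortc sorts ascending on column 1; rev flips the rows to descending
n_sorted = rev(sortc(n, 1));

// Cumulative percentages in the sorted order
cum_pct = cumsumc(100 * n_sorted / sumc(n_sorted));

// Columns: sorted count, cumulative percent
print n_sorted~cum_pct;
```

With the largest categories first, the cumulative column shows that the top two product types account for half of the observations.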
Tabulating Categorical Data
While frequency counts help us understand individual categories, the tabulate function allows us to explore relationships between categorical variables. This function performs cross-tabulations, offering deeper insights into categorical distributions. In GAUSS 25, it was enhanced with new options for calculating row and column percentages, making comparisons easier.
Example: Cross-Tabulating Product Type and Area
Now let's look at the relationship between Product_Type and Area.
// Generate cross-tabulation
call tabulate(product_data, "Product_Type ~ Area");
=====================================================================================
Product_Type                      Area                                    Total
=====================================================================================
                   East      North      South       West
Clothes               1          5          1          1               8
Electronics           5          1          5          2              13
Furnishings           3          3          1          3              10
House Items           1          3          2          1               7
Toys                  4          3          2          3              12
Total                14         15         11         10              50
=====================================================================================
By default, the tabulate function generates absolute counts. However, in some cases, relative frequencies provide more meaningful insights. In GAUSS 25, tabulate now includes options to calculate row and column percentages, making it easier to compare distributions across categories.
This is done using the tabControl structure and the rowPercent or columnPercent members.
- Row percentages show how each product type is distributed across areas, so each row sums to 100.
- Column percentages highlight the composition of product types within each area, so each column sums to 100.
/*
** Relative tabulations
*/
struct tabControl tCtl;
tCtl = tabControlCreate();
// Specify row percentages
tCtl.rowPercent = 1;
// Tabulate
call tabulate(product_data, "Product_Type ~ Area", tCtl);
=====================================================================================
Product_Type                      Area                                    Total
=====================================================================================
                   East      North      South       West
Clothes            12.5       62.5       12.5       12.5             100
Electronics        38.5        7.7       38.5       15.4             100
Furnishings        30.0       30.0       10.0       30.0             100
House Items        14.3       42.9       28.6       14.3             100
Toys               33.3       25.0       16.7       25.0              99
=====================================================================================
Table reports row percentages.
Alternatively, we can find the column percentages:
/*
** Relative column tabulations
*/
struct tabControl tCtl;
tCtl = tabControlCreate();
// Specify column percentages
tCtl.columnPercent = 1;
// Tabulate product types
call tabulate(product_data, "Product_Type ~ Area", tCtl);
===========================================================================
Product_Type                      Area
===========================================================================
                   East      North      South       West
Clothes             7.1       33.3        9.1       10.0
Electronics        35.7        6.7       45.5       20.0
Furnishings        21.4       20.0        9.1       30.0
House Items         7.1       20.0       18.2       10.0
Toys               28.6       20.0       18.2       30.0
Total               100        100        100        100
===========================================================================
Table reports column percentages.
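To make the difference between the two kinds of percentages concrete, here is a small sketch that derives both from the raw counts. The matrix tab is copied by hand from the count cross-tabulation, and the names tab, row_pct, and col_pct are illustrative. It uses only basic matrix operations and is not meant to describe how tabulate computes its output.

```gauss
// Counts copied by hand from the cross-tabulation
// Rows: Clothes, Electronics, Furnishings, House Items, Toys
// Columns: East, North, South, West
tab = { 1 5 1 1,
        5 1 5 2,
        3 3 1 3,
        1 3 2 1,
        4 3 2 3 };

// Row percentages: divide each row by its row total (5x4 ./ 5x1)
row_pct = 100 * tab ./ sumr(tab);

// Column percentages: divide each column by its column total (5x4 ./ 1x4)
col_pct = 100 * tab ./ sumc(tab)';

print "Row percentages:";
print row_pct;
print "Column percentages:";
print col_pct;
```

For example, Clothes in the North is 5 of 8 Clothes observations (62.5 percent of its row) but 5 of 15 North observations (33.3 percent of its column).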
Visualizing Distributions
While tables provide numerical insights, frequency plots offer an intuitive visual representation. GAUSS 25 enhancements to the plotFreq function include:
- Automatic category labeling for better readability.
- New support for the by keyword to split data by category.
- New percentage distributions.
Example: Visualizing Product Type Percent Distribution
To start, let's look at the percentage distribution of product type. To help with interpretation, we'll sort the graph by frequency and use a percentage axis:
// Sort frequencies
kind = 1;
// Report percentage axis
pct_axis = 1;
// Generate frequency plot
plotFreq(product_data, "Product_Type", kind, pct_axis);
Example: Visualizing Product Type Distribution by Area
Next, let's visualize the distribution of the product types across areas using the plotFreq function and the by keyword:
// Generate frequency plot
plotFreq(product_data, "Product_Type + by(Area)");
Conclusion
In this blog, we've demonstrated how updates to frequency, tabulate, and plotFreq in GAUSS 25 make categorical data analysis more efficient and insightful. These enhancements provide better readability, enhanced cross-tabulations, and more intuitive visualization options.
Further Reading
- Introduction to Categorical Variables
- Easy Management of Categorical Variables
- What is a GAUSS Dataframe and Why Should You Care?
- Getting Started With Survey Data In GAUSS
Eric has been working to build, distribute, and strengthen the GAUSS universe since 2012. He is an economist skilled in data analysis and software development. He has earned a B.A. and MSc in economics and engineering and has over 18 years of combined industry and academic experience in data analysis and research.


