Saturday, October 25, 2025

Exploring Categorical Data in GAUSS 25


Introduction

Categorical data plays a key role in data analysis, providing a structured way to capture qualitative relationships. Before running any models, simply examining the distribution of categorical data can provide valuable insights into underlying patterns.

Whether summarizing survey responses or exploring demographic trends, basic statistical tools, such as frequency counts and tabulations, help reveal these patterns.

GAUSS offers several tools for summarizing and visualizing categorical data, including:

  • tabulate: Quickly compute cross-tabulations and summary tables.
  • frequency: Generate frequency counts and relative frequencies.
  • plotFreq: Create visual representations of frequency distributions.

In GAUSS 25, these functions received significant enhancements, making them more powerful and user-friendly. In this post, we’ll explore these enhancements and demonstrate their practical applications.

Frequency Counts

The GAUSS frequency function generates frequency tables for categorical variables. In GAUSS 25, it has been enhanced to use metadata from dataframes, automatically detecting and displaying variable names. Additionally, the function now includes an option to sort the frequency table, making it easier to analyze distributions.

Example: Counting Product Categories

For this example, we’ll use a hypothetical dataset containing 50 observations of two categorical variables: Product_Type and Area. You can download the dataset here.

To start, we’ll load the data using loadd:

/*
** Sample product sales data
*/
// Import sales dataframe
product_data = loadd(__FILE_DIR $+ "product_data.csv");

// Preview data
head(product_data);
    Product_Type           Area
     Electronics             East
      House Items             West
       Furnishings            North
            Toys             East
      House Items            North

Next, we will compute the frequency counts of the Product_Type variable:

// Compute frequency counts
frequency(product_data, "Product_Type");
=============================================
   Product_Type    Count   Total %    Cum. %
=============================================

       Clothes         8        16        16
    Electronics        13        26        42
      Furnishings        10        20        62
     House Items         7        14        76
           Toys        12        24       100
=============================================
          Total        50       100
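
As a rough check on how the Total % and Cum. % columns relate to the counts, the arithmetic can be reproduced with ordinary matrix operations. This is a minimal sketch, not a description of how frequency works internally. The counts are copied by hand from the table above, and the variable names counts, total_pct, and cum_pct are illustrative.

```gauss
// Counts copied from the frequency table above, in table order:
// Clothes, Electronics, Furnishings, House Items, Toys
counts = { 8, 13, 10, 7, 12 };

// Share of each category, in percent of all observations
total_pct = 100 * counts ./ sumc(counts);

// Running total of the percentages
cum_pct = cumsumc(total_pct);

// Columns: count, total percent, cumulative percent
print counts~total_pct~cum_pct;
```

Because the cumulative column is a running sum, it depends on the row order, which is why it changes when the table is sorted.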

We can also generate a sorted frequency table, using the optional sorting argument:

// Compute sorted frequency counts
frequency(product_data, "Product_Type", 1);
=============================================
   Product_Type    Count   Total %    Cum. %
=============================================

    Electronics        13        26        26
           Toys        12        24        50
      Furnishings        10        20        70
       Clothes         8        16        86
     House Items         7        14       100
=============================================
          Total        50       100

Tabulating Categorical Data

While frequency counts help us understand individual categories, the tabulate function allows us to explore relationships between categorical variables. This function performs cross-tabulations, offering deeper insights into categorical distributions. In GAUSS 25, it was enhanced with new options for calculating row and column percentages, making comparisons easier.

Example: Cross-Tabulating Product Type and Area

Now let’s look at the relationship between Product_Type and Area.

// Generate cross-tabulation
call tabulate(product_data, "Product_Type ~ Area");
=====================================================================================
   Product_Type                              Area                             Total
=====================================================================================
                      East          North          South           West

       Clothes          1              5              1              1             8
    Electronics          5              1              5              2            13
      Furnishings          3              3              1              3            10
     House Items          1              3              2              1             7
           Toys          4              3              2              3            12
          Total         14             15             11             10            50

=====================================================================================

By default, the tabulate function generates absolute counts. However, in some cases, relative frequencies provide more meaningful insights. In GAUSS 25, tabulate now includes options to calculate row and column percentages, making it easier to compare distributions across categories.

This is done using the tabControl structure and its rowPercent or columnPercent members.

  • Row percentages show how each product type is distributed across areas; each row sums to 100, up to rounding.
  • Column percentages show the mix of product types within each area; each column sums to 100, up to rounding.

/*
** Relative tabulations
*/ 
struct tabControl tCtl;
tCtl = tabControlCreate();

// Specify row percentages
tCtl.rowPercent = 1;

// Tabulate
call tabulate(product_data, "Product_Type ~ Area", tCtl);
=====================================================================================
   Product_Type                               Area                            Total
=====================================================================================
                       East          North          South           West

       Clothes        12.5           62.5           12.5           12.5          100
    Electronics        38.5            7.7           38.5           15.4          100
      Furnishings        30.0           30.0           10.0           30.0          100
     House Items        14.3           42.9           28.6           14.3          100
           Toys        33.3           25.0           16.7           25.0           99

=====================================================================================
Table reports row percentages.

Alternatively, we can find the column percentages:
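
To see where the row percentages come from, the calculation can be sketched directly from the cell counts: each cell is divided by its row total. This is only an illustration under the assumption that the counts are typed in by hand from the cross-tabulation above. The names ct and row_pct are hypothetical, and nothing here is taken from the internals of tabulate.

```gauss
// Cell counts copied from the cross-tabulation above
// Rows: Clothes, Electronics, Furnishings, House Items, Toys
// Columns: East, North, South, West
ct = { 1 5 1 1,
       5 1 5 2,
       3 3 1 3,
       1 3 2 1,
       4 3 2 3 };

// Divide every cell by its row total and scale to percent
row_pct = 100 * ct ./ sumr(ct);

print row_pct;
```

For example, 5 of the 8 Clothes observations fall in North, which gives the 62.5 shown in the table.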

/*
** Relative column tabulations
*/ 
struct tabControl tCtl;
tCtl = tabControlCreate();

// Specify column percentages
tCtl.columnPercent = 1;

// Tabulate product types
call tabulate(product_data, "Product_Type ~ Area", tCtl);
===========================================================================
   Product_Type                                  Area
===========================================================================
                       East          North          South           West

       Clothes         7.1           33.3            9.1           10.0
    Electronics        35.7            6.7           45.5           20.0
      Furnishings        21.4           20.0            9.1           30.0
     House Items         7.1           20.0           18.2           10.0
           Toys        28.6           20.0           18.2           30.0
          Total         100            100            100            100

===========================================================================
Table reports column percentages.
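
The column percentages can be checked by hand in a similar way: each cell is divided by its column total instead of its row total. The sketch below assumes the cell counts are entered manually from the earlier cross-tabulation of counts. The names ct and col_pct are illustrative and are not part of the tabulate interface.

```gauss
// Cell counts from the count cross-tabulation
// Rows: Clothes, Electronics, Furnishings, House Items, Toys
// Columns: East, North, South, West
ct = { 1 5 1 1,
       5 1 5 2,
       3 3 1 3,
       1 3 2 1,
       4 3 2 3 };

// sumc returns the column totals as a column vector,
// so transpose it to line up with the columns of ct
col_pct = 100 * ct ./ sumc(ct)';

print col_pct;
```

Here 5 of the 11 South observations are Electronics, which matches the 45.5 reported for that cell.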

Visualizing Distributions

While tables provide numerical insights, frequency plots offer an intuitive visual representation. GAUSS 25 enhancements to the plotFreq function include:

  • Automatic category labeling for better readability.
  • New support for the by keyword to split data by category.
  • New percentage distributions.

Example: Visualizing Product Type Percent Distribution

To start, let’s look at the percentage distribution of product type. To help with interpretation, we’ll sort the graph by frequency and use a percentage axis:

// Sort frequencies
sort = 1;

// Report percentage axis
pct_axis = 1;

// Generate frequency plot
plotFreq(product_data, "Product_Type", sort, pct_axis);

Example: Visualizing Product Type Distribution by Area

Next, let’s visualize the distribution of the product types across areas using the plotFreq function and the by keyword:

// Generate frequency plot
plotFreq(product_data, "Product_Type + by(Area)");

Product distribution frequency plot.

Conclusion

In this blog, we’ve demonstrated how updates to frequency, tabulate, and plotFreq in GAUSS 25 make categorical data analysis more efficient and insightful. These enhancements provide better readability, enhanced cross-tabulations, and more intuitive visualization options.

Further Reading

  1. Introduction to Categorical Variables.
  2. Easy Management of Categorical Variables.
  3. What is a GAUSS Dataframe and Why Should You Care?
  4. Getting Started With Survey Data In GAUSS.
