Leveraging the LAG Function for Enhanced Data Processing in SAS Viya

By Bob Kettlewell

 

In this tip we outline how to use the LAG function in SAS Viya and some of the key things to watch out for.

CAS, or “Cloud Analytical Services”, is the engine included within SAS Viya that provides quick, in memory, data analytics and processing. CAS provides extremely quick processing speeds, and in most cases, SAS 9 code can be submitted to CAS with no modifications required.

The multi-threaded nature of CAS processing is only possible as the rows within a table are potentially split across different threads, over different nodes of the CAS server. However, functions or statements that rely on inter-row information might return unexpected results when executed in a multi-threaded DATA step. One example of this is the LAG function.

The LAG function is commonly used in SAS coding; and can be a key part in some data processing steps. It allows the programmer to do calculations across rows dynamically. It is often used to create cumulative totals, or extract differences over the spread of data.

This first example shows how the LAG function can be used in SAS 9.4:

The DATA step produces the following results; notice how the lagged variable contains the data of the previous row:

If we now run this step within CAS, the LAG function produces an output with missing data. This is because the DATA step is splitting the data into multiple threads and, as such, data in separate threads cannot be related using the LAG function:

The log indicates the DATA step ran in multiple threads:

Notice that the output table contains missing data because the data was split across multiple threads that could not be compared:

The solution to this issue is to force the DATA step to be performed in a single-threaded way using SINGLE=YES DATA statement option.

Using the same example as above, but adding the SINGLE=YES option on the data statement:

The log no longer mentions multiple threads, whilst still processing in CAS:

The output table now contains the correct values from the LAG function. This then allows you to run comparisons and calculate differences between rows, whilst retaining the performance benefits of CAS:

Back to Insights

Related content

Talk to us about how we can help