By Paul Shannon, Senior Managed Service Consultant.
Many analysts will be aware of limits within a dataset, such as the length of a character column or the maximum value a numeric column can store. Determining the correct size for dataset columns can be an inexact science, and a "safe" approach is often taken to prevent data from various sources being truncated or ETL jobs failing. However, any inefficiency is multiplied by millions of rows once a dataset enters production use, which can lead to significant performance costs.
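As a minimal illustration (the dataset WORK.CUSTOMERS and its column NAME are hypothetical), the gap between a character column's defined length and the longest value it actually stores can be checked with two simple queries:

    /* Defined length of the column, from the SAS dictionary tables */
    proc sql;
        select name, length as defined_length
            from dictionary.columns
            where libname = 'WORK'
              and memname = 'CUSTOMERS'
              and upcase(name) = 'NAME';

        /* Longest value actually stored in the data */
        select max(length(name)) as longest_value
            from work.customers;
    quit;

If the column is defined as $200 but no stored value is longer than 40 characters, 160 bytes per row are being wasted.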
Katalyze Data have developed a utility designed to review mature environments and processes. It can analyse any number of SAS datasets and the data stored within them, generating a report that highlights potential inefficiencies in the datasets.
In particular, the analysis focuses on the following areas (a sketch of the kind of check involved follows the list):
- character columns defined longer than the longest value they actually store;
- numeric columns sized larger than the values they hold require.
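These checks can be scoped to a whole library at once. A sketch of the kind of query involved, assuming a hypothetical library STAGING (the utility itself automates and extends this kind of review):

    /* List every column definition in the library so oversized      */
    /* character lengths and default 8-byte numerics can be reviewed */
    proc sql;
        select memname, name, type, length
            from dictionary.columns
            where libname = 'STAGING'
            order by memname, length desc;
    quit;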
The output of the analysis is a simple report designed to guide an analyst through efficiency edits in two phases:
- a high-level summary allows analysts to prioritise effort on the libraries with the biggest potential saving;
- a detailed breakdown then provides exactly which changes are required (sketched below).
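Once the detailed breakdown identifies an oversized column, the fix is a straightforward data step. A minimal sketch, assuming the analysis has confirmed that the hypothetical column NAME never exceeds 40 characters (the LENGTH statement must precede SET for the new length to take effect):

    /* Shrink the oversized column; safe only once the analysis */
    /* confirms no stored value exceeds the new length          */
    data work.customers_small;
        length name $40;        /* was $200 */
        set work.customers;
    run;

    /* Confirm the reduced observation length and page count */
    proc contents data=work.customers_small;
    run;

SAS will log a warning that multiple lengths were specified for NAME; this is expected, and the shorter length is applied.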
The immediate benefit is measured in the disk space consumed by a dataset. However, smaller datasets also mean less I/O, so the saving carries through to faster reads, writes, and overall job run times.
Please contact us to book a consultation:
info@katalyzedata.com
https://katalyzedata.com
+44 (0)1993 848010