By Ian Amaranayake
Open-source programming languages have been around for at least 30 years. So why has their popularity surged recently? To answer that, we need to understand the analytics landscape leading up to this point.
Within Life Sciences, commercially available software has traditionally been the de facto standard, and there are many reasons for this. Companies need the software they use to be secure and validated. By its very nature software is always changing, but it’s important that changes do not invalidate what has come before. In short, the software must be backward compatible, or at the very least, the programming language needs to be.
The results being produced by the software also need to be trusted.
Another factor is capability. Best-of-breed software like SAS offers advanced data manipulation, statistical analysis and graphical capabilities, which have helped promote its status as the adopted gold standard within the industry. The software landscape within clinical reporting is now being disrupted, however, and the source of that disruption is a principle that was first established in the mid-1990s.
The R programming language has its roots in academia, having been created by professors Ross Ihaka and Robert Gentleman and based on the tenets of being free and open-source. R was traditionally adopted within the pharmaceutical industry as a research tool, but rarely used to contribute towards drug submission work. For a long time, the concerns were around the openness of the language. What does openness mean? At its heart, it means the code is publicly available to be shared, modified, and extended. It’s fair to say the concerns were around how one could ensure that the results could be trusted, that changes would be backward compatible, and that the software would be validated. It’s easy to see why commercially available software held the upper hand.
So, what’s changed? There are many factors at play here, but surely one of the most prominent is choice. For a long time, the analytics market has had a small number of niche players, and the pace of change has been dictated by these organisations. With open-source, the pace of change is driven by the community, which is highly reactive and collaborative and, by its very nature, open to tens of thousands of developers. New capabilities and improvements can be developed at an accelerated pace. Tools like Shiny and the tidyverse, which offer improved and automated data management and reporting capabilities, have seen their usage grow rapidly over the past 10 years.
Open-source software is also highly accessible: what was once the preserve of large enterprises is now truly open to all.
The academic landscape has changed as well. The reasons for this are partly economic: academic institutions, like businesses, need to manage their costs, and open-source software provides the perfect solution. Certainly for R, you can almost think of it as returning to its roots. This has had a more fundamental impact on the industry, though. There are far more graduates with open-source skills entering industry today, and that trend is set to continue. When you consider that today’s programmers are tomorrow’s managers and decision makers, the adoption and popularity of open-source will continue to accelerate.
What about those concerns around validation and trust? Those tenets are also being challenged. This has partly been driven by commercial enterprises providing their own flavour of open-source software, but these challenges are also being met head-on by industry. Within life sciences, a group of pharmaceutical companies is collaborating to produce an opinionated set of R packages, referred to as the ‘pharmaverse’, with a shared vision to produce “pharma stack of open-source R packages to enable clinical reporting (from CRF to eSubmission)”. This initiative perfectly demonstrates how innovation and collaboration amongst industry professionals can provide a real driver for change.
The regulatory landscape is also changing, with regulators like the FDA collaborating as part of cross-industry working groups to explore open-source, R-based drug submissions.
At Katalyze Data, we believe the growth in the use of open-source tools will ultimately provide pharmaceutical companies with more choice and flexibility. Our validated statistical computing environment (SCE) offering brings together the best of both worlds, providing a workbench of open-source and commercial applications that will meet both current and future use cases within the industry.
Open-source programming, notably R, is revolutionizing statistical analysis in Life Sciences by offering more choices and accessibility. Academic institutions and industry collaboration, like the ‘pharmaverse,’ are driving its adoption, addressing concerns about trust and validation. Regulators, including the FDA, are also exploring open-source options, making it the future of Life Sciences analysis.
If you would like to discuss Katalyze Data’s SCE offering, please reach out to us for a free, no-obligation consultation.