Imagine walking into your brand-new home and discovering it has no access to potable water: your builder never installed the pipes and faucets that deliver water to the kitchen, bathroom, or laundry. Ready-to-use potable water in a new home is a basic need and an unconscious expectation of any home buyer in the developed world. Like potable water for a home, timely access to ready-to-use (RTU) data is fundamental and should be considered a basic need for digital transformation. Not long ago, The Economist declared that data has replaced oil as the world’s most valuable resource.1 Like crude oil, raw data is not valuable in itself, but only when it is contextualized and presented in an RTU format. Yet the expectation of timely access to RTU data for new manufacturing facilities seems to carry a much lower bar.
RTU data is a powerful enabler for teams supporting new facility projects: it lets them make data-driven decisions, maintain the scheduled cadence of project execution activities, avoid unplanned delays, and move to consistency runs and GMP manufacturing operations on time. Yet this need is rarely considered during the planning and design stages of most new manufacturing buildings, and this is especially true for new pharmaceutical/biologics manufacturing facilities. When designing and building new manufacturing sites, most resources and attention go to equipment, buildings, and other infrastructure, such as automation, execution and control systems, and basic raw data storage systems. Building the data pipelines that connect this valuable infrastructure to end users, and a data contextualization system that cleans, aggregates, and organizes data into an RTU/business-ready format, is often left outside the perimeter of planning, designing, and executing new building projects. Once a manufacturing building is in regular operation, the burden of building a data contextualization system typically falls on Data Science and other supporting business units. This means RTU data is not available to monitor and troubleshoot manufacturing operations from day one, i.e., right at the start of test/engineering runs of new pharmaceutical/biologics manufacturing facilities. It also means that either additional resources must be brought in, or existing resources must be stretched, to support manual collection (a.k.a. data pulling), aggregation, cleaning, and organization of data for process monitoring and decision making until an automated data pipeline and data contextualization system are built.
It is time for the industry to realize that the current paradigm of delivering new facilities needs a facelift when it comes to data readiness and digitalization. Just as a homeowner expects RTU water from day one of owning a house, industry must shift its current mindset and enable access to RTU data from day one. After all, when access to RTU data puts you in an advantageous position on financial, schedule, and compliance targets, why wouldn’t you change the current paradigm? In this article, we share what Sanofi’s Toronto site has done differently, and we hope the information provided here will be beneficial to the industry.
At Sanofi’s Toronto site, a new bulk vaccine facility representing an investment of over CAD 500 million is getting ready to start manufacturing operations. This new facility, named Building 100 (B100), presented an opportunity for the organization to build a data system that would provide RTU data to end users from day one. In the past, such data systems were developed months after a project reached the execution stage, because that allowed data management teams to learn where to find data, build data management systems, and test data connections using executed batch data. However, this meant the business did not have access to RTU data for supporting and troubleshooting the initial execution phase of operations, including engineering batches. Against this backdrop, the Data Science team on site embarked on a journey to build a data contextualization pipeline even before the start of any operational activities in B100. That meant building a data system before data existed.
Novelty of Work
B100, with its state-of-the-art automation systems, manufacturing execution system (MES), and PI data historian, presented a challenging environment for accessing business-ready data. Raw data from MES and PI is messy and not easily readable. Data from MES, PI, the laboratory information management system (LIMS), and the enterprise resource planning (ERP) system needed to be contextualized (i.e., cleaned, filtered, and enriched with business context) so that end users could avoid non-value-added manual effort. This required cross-functional partnerships among stakeholders to understand how data is structured in the different source systems, capture end-user requirements, and build novel data engineering solutions.
Serving the Voice of the Customer
Imagine for a second that your brand-new home is still under construction. You have been focusing on the bathroom and laundry room, with the plumber now finishing the final touches on the sink installation. Everything there is perfect, exactly as you imagined. Suddenly you get a call from the contractor: the kitchen is entirely complete! But wait a minute, you never provided the builder with your plan or even discussed how you needed the kitchen built. The builder simply decided they knew best. For some of us this might be a nightmare scenario, but this kind of situation does occur in industry when dealing with RTU data. The experts building out the infrastructure or contextualizing the data may assume they know what data is needed and how it should be structured and presented to end users. The Data Science team at Sanofi pivoted away from this mentality and made the “voice of the customer” a key driver for building the RTU data pipeline. The voice of the customer (the user requirements) is captured digitally in a user specification management interface (USMI) and used for data contextualization; this is crucial for delivering the RTU data solution right the first time. The USMI captures several mandatory pieces of product- and process-specific information: business parameter name, process area, data source, data type, associated equipment ID, etc. Additionally, the exact location of each business parameter in the source data systems is mapped in the USMI as a unique combination of computer-queryable field names.
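To make the idea of a USMI record concrete, the sketch below models one user-specified business parameter as a small Python dataclass. The field names and example values (e.g., `UsmiSpec`, `"FERM-01"`, the `pi_tag` mapping) are illustrative assumptions, not the actual USMI schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UsmiSpec:
    """One user-specified business parameter (field names are illustrative)."""
    parameter_name: str       # business-friendly name, e.g. "Harvest pH"
    process_area: str         # e.g. "Fermentation"
    data_source: str          # source system: "PI", "MES", "LIMS", "ERP"
    data_type: str            # e.g. "timeseries" or "discrete"
    equipment_id: str         # associated equipment identifier
    source_fields: tuple      # queryable field names locating the value in the source

# A hypothetical entry, as a user might register it in the USMI:
spec = UsmiSpec(
    parameter_name="Harvest pH",
    process_area="Fermentation",
    data_source="PI",
    data_type="timeseries",
    equipment_id="FERM-01",
    source_fields=("pi_tag", "FERM01.PH"),
)
```

Because each record carries both the business context and the exact source-system location, downstream tooling can resolve a parameter without any hard-coded knowledge of where its data lives.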
The contextualization engine has a live connection to the USMI. It extracts raw data from a variety of source systems, which are connected to the Sanofi Cloud Data Lake; the data contextualization engine is implemented in this cloud environment, where automated data transformation steps are carried out to match the user specifications. All information captured in the USMI is available before the start-up of B100 operations, including pre-engineering and engineering runs. This makes it possible to build the RTU data pipeline before the start of B100 operations. Since the USMI captures all product- and process-specific user requirements, the contextualization logic and its programming scripts do not need to be cluttered with any hard-coded information related to user specifications. This allows rapid changes/updates to the user specifications without changing the contextualization logic. As a result, changes to data contextualization can be made in a nimble way, reducing maintenance effort and turnaround time for end-user change requests.
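The separation described above, where specifications live in the USMI and the engine stays generic, can be sketched as a simple dispatch loop. Everything here is hypothetical (the extractor functions, the spec dictionaries, and the field names); the point is only that adding or changing a parameter edits the spec table, never the engine code:

```python
# Placeholder extractors; a real engine would query the cloud data lake.
def extract_pi(spec):
    return {"value": 7.1, "source": "PI"}

def extract_mes(spec):
    return {"value": "LOT-42", "source": "MES"}

# Dispatch table keyed by source system name.
EXTRACTORS = {"PI": extract_pi, "MES": extract_mes}

def contextualize(specs):
    """Run every user spec through its source-specific extractor,
    attaching business context taken from the spec itself."""
    rows = []
    for spec in specs:
        raw = EXTRACTORS[spec["data_source"]](spec)
        rows.append({**raw,
                     "parameter": spec["parameter_name"],
                     "process_area": spec["process_area"],
                     "equipment": spec["equipment_id"]})
    return rows

# Specs play the role of USMI records; none of this context is hard-coded above.
specs = [
    {"parameter_name": "Harvest pH", "process_area": "Fermentation",
     "data_source": "PI", "equipment_id": "FERM-01"},
]
rows = contextualize(specs)
```

When a user change request arrives, only the `specs` table changes; `contextualize` and the extractors are untouched, which is what keeps turnaround times short.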
Let’s take a step back for a moment. Everybody knows a building will not last on an incomplete foundation. Cracks form, moisture gets in, and collapse can even happen! Some of us are afraid to go down to the basement and check. During the infancy of our RTU data system project, the current state of the B100 data foundation was assessed, and the gaps identified provided opportunities to build foundational data system capabilities that support the organization and its digital transformation journey.
One key missing element was Event Frames. Event Frames (EFs) act as the backbone for some of the data contextualization needs. In B100, EFs are built on the PI data historian and organize the process stages along the time axis. Using trigger conditions, such as automation recipe steps or probe/sensor values, an Event Frame can begin and end to bracket a process step of interest. For example, a specific phase of a manufacturing step, such as the harvest stage of fermentation or the material charge for buffer preparation, can be identified using an EF. With EFs, it is possible to capture and monitor time-series or discrete data of interest based on specific process requirements. Hundreds of EFs required for data contextualization were identified and built in collaboration with process experts and the automation and engineering teams. These efforts have not only enabled RTU data flows; they will also allow the organization to benefit from these foundational data capabilities as it adopts additional digital solutions in the future.
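A minimal sketch of the bracketing idea behind Event Frames: a start trigger opens a frame, an end trigger closes it, and any data timestamped inside the frame belongs to that process step. The trigger names and the recipe log below are invented for illustration; real EFs are configured in the PI system, not hand-coded like this:

```python
def find_event_frames(samples, start_trigger, end_trigger):
    """Bracket process steps in a chronological list of (timestamp, signal)
    tuples: each frame opens at a start trigger and closes at the next
    matching end trigger."""
    frames, open_start = [], None
    for ts, signal in samples:
        if signal == start_trigger and open_start is None:
            open_start = ts
        elif signal == end_trigger and open_start is not None:
            frames.append((open_start, ts))
            open_start = None
    return frames

# Hypothetical automation recipe log for a fermentation harvest stage:
recipe_log = [
    (0, "IDLE"),
    (10, "HARVEST_START"),
    (55, "HARVEST_END"),
    (60, "IDLE"),
]
frames = find_event_frames(recipe_log, "HARVEST_START", "HARVEST_END")
# frames -> [(10, 55)]: the harvest stage is bracketed between t=10 and t=55
```

Once a frame’s start and end times are known, any time-series or discrete data falling inside that window can be pulled and attributed to that specific process step.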
Building an RTU data system from the ground up, before data exists, is no cakewalk. It requires strong technical competencies in data engineering and deep knowledge of how the source data systems work. Sanofi’s Toronto Data Science team walked into this challenge as a young team, with most key contributors either new to the industry or without prior experience in the new B100 data systems. Different stakeholder groups in the organization also did not have a firm grasp of what data contextualization was or what it would take to get an RTU system up and running. What made it possible was the strong collaboration and partnerships formed between stakeholders, including process subject matter experts, information technology (IT) specialists, source data system experts, and the Data Science team. Known as “iContext” within the organization, this project is not only creating a paradigm shift in how we deliver RTU data from a brand-new facility to the business; it has also increased the foundational and technical capabilities within the organization and paves the way to building high-caliber Data Science talent for the future.
References
1. The Economist. (2017, May 6). The world’s most valuable resource is no longer oil, but data. https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data