Wouldn’t it be wonderful to work from a blank sheet of paper? To design a solution from scratch? I wouldn’t know – all the work I’ve ever done has been on existing facilities. The nearest I ever got to “greenfield” was expansion.
For each project, I’ve always had to take into account decisions made by those who were there before me. It’s no different when considering functional safety – but here you also have to consider the compliance of the historic installation. And the earlier work may have been done before functional safety had been invented or (as is often the case) before it was properly understood. This paper looks at some of the issues that relate to “legacy” safety systems and modifications to those systems.
Review of the basics
There’s no escaping from some fundamentals. If we’re operating hazardous processes, we need to understand what the potential hazards are, confirm that the risk reduction solutions we have in place make those risks tolerable – and put systems in place to manage and monitor those solutions (and to confirm that the risk landscape doesn’t change). There’s no avoiding this. It doesn’t matter if the plant has been running for 4 years or 40. These are the essential building blocks – the foundation for all that follows.
We’re very used to re-validation of our hazard studies. Perhaps every 5 years, we’ll dust off our HazOp studies and look to see if there’s anything on the current P&IDs that we might have missed. If we’ve carried out Stage 4 Functional Safety Assessments (more on this later) they can be a key input into the re-validation of our hazard studies. If we find anything new, we’ll need to update our SIL Determination studies, such as our LOPAs. And this is where it gets difficult – how much risk reduction credit can you take for systems that were installed years ago (sometimes decades ago) that you know aren’t compliant with IEC 61511?
Quick revision on component compliance
IEC 61511 defines three different ways that components can be shown to be compliant (and therefore suitable for use in a safety instrumented function).
- Product manufacturers can design new products to be safe, by following functional safety management and the design rules for hardware and software found in IEC 61508 (known as “Route 1”, but think of it as “safe by design”).
- Product manufacturers can collect returns data on existing products and (retrospectively) confirm that they are reliable enough to be safe (known as “Route 2” and “Proven in Use”). The demonstration must be based on a sufficient number of devices used over a sufficient number of years of service.
- Duty holders (end users) can collect data on proof testing, inspections, etc. and confirm that the devices used are reliable enough to be safe (known as “Prior Use”). Again, the demonstration must be based on a sufficient number of devices used over a sufficient number of years of service.
These are the ONLY ways a component can be shown to be suitable for use in a safety instrumented function.
If only we’d known about Prior Use compliance
For our installed safety systems / trips, if we’d known about Prior Use, we could have kept data on proof tests, inspections, demands and spurious trips. Then we’d have been building a database that we could have used to confirm the performance of our legacy trips. If we had lots of the same devices installed (perhaps across multiple sites) we could even have built this data so that we had sufficient operating experience to meet the statistical confidence levels defined in the standard. But typically, we didn’t know about Prior Use and we didn’t keep the data. Even if it did exist once upon a time, it may be lost – especially if the business has been bought / sold over time.
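To give a feel for how much operating experience “sufficient” might mean, here is a minimal sketch that turns a count of dangerous failures and the accumulated device-hours into a one-sided upper bound on the failure rate. It assumes a constant failure rate (so failures follow a Poisson process) and a 70% confidence level, which is the figure commonly quoted for reliability data; both are assumptions for illustration, and the function names and fleet numbers are hypothetical.

```python
import math


def poisson_cdf(n: int, mu: float) -> float:
    """Probability of seeing n or fewer events when mu events are expected."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def upper_failure_rate(failures: int, device_hours: float,
                       confidence: float = 0.70) -> float:
    """One-sided upper bound on a constant failure rate, per hour.

    Finds the expected failure count mu at which observing `failures`
    or fewer would happen with probability (1 - confidence), then
    divides by the accumulated device-hours.
    """
    target = 1.0 - confidence
    lo, hi = 0.0, 1.0
    while poisson_cdf(failures, hi) > target:  # widen until the root is bracketed
        hi *= 2.0
    for _ in range(100):  # plain bisection, stdlib only
        mid = (lo + hi) / 2.0
        if poisson_cdf(failures, mid) > target:
            lo = mid
        else:
            hi = mid
    return hi / device_hours


# Hypothetical fleet: 20 identical transmitters, 10 years each, 1 dangerous failure found.
hours = 20 * 10 * 8760
print(f"Upper bound at 70% confidence: {upper_failure_rate(1, hours):.2e} per hour")
```

With a handful of devices the accumulated device-hours are small, so the bound stays high even when no failures have been seen. That is the arithmetic behind the remark that a single site often cannot get there alone.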
We are where we are
Given that we probably haven’t kept the data that would allow us to demonstrate Prior Use, what do we do? There are some fundamental questions that we need to answer if we are to take risk reduction credit for our legacy trips.
What does it do? A key check for taking a risk reduction credit for any protection layer is to confirm that it actually protects you. It follows that you can only confirm that a trip provides protection if you can describe what it does and then check that this amounts to actual protection.
Some key questions to consider:
- What does it sense?
- What is the trip point?
- What actions are taken at the trip point?
- What is actuated to achieve the safe state?
- Exactly what is the safe state?
- How fast does it need to act?
- What secondary actions does it perform? (Like setting alarms or passing signals to the control system)
- How much risk reduction does it give? In other words how reliable is it with respect to dangerous undetected failures?
The first of these questions (What does it do?) should be recorded in the Safety Requirements Specification (SRS). It’s possible, even likely, that for legacy safety trips there isn’t an actual SRS document. Never mind. It’s a useful and valuable exercise to write down even a very basic SRS (that describes what the trip does). That’s not to say it will be easy to write this down – it might not be. But the exercise is important and often very revealing. Work from first principles – inspect the equipment, look at any drawings. Don’t rely on hearsay – what the trip actually does and what people think it does can be very different.
We may have no data at all on historical proof tests and inspections. But we have to start somewhere, so use what information you can find to come up with a first estimate of the reliability of the safety trip. Remember that what we’re interested in is the dangerous undetected failure rate (this is the failure rate we plug into the calculations). We aren’t looking at overall reliability, only that part that would cause the trip to fail without us knowing about it – so that if there were a demand we’d only know the trip was faulty because it didn’t protect us. There may be supplier reliability data and there are various “generic” reliability databases that can be useful. Just remember that this is an initial “wet finger” estimate. It’s not sufficient – and we have to work to make it sufficient, by putting in place on-going data collection and monitoring.
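One way to make the “very basic SRS” concrete is to treat the key questions above as fields in a simple record, so that anything you cannot yet answer shows up as an open item. The sketch below is illustrative only: the class name, field names, tag number and tank example are all hypothetical, and a real SRS would carry far more than this.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BasicSRS:
    """A bare-bones record of what a legacy trip does (hypothetical fields)."""
    tag: str
    senses: str                       # What does it sense?
    trip_point: str                   # What is the trip point?
    actions: str                      # What actions are taken at the trip point?
    final_elements: List[str]         # What is actuated to achieve the safe state?
    safe_state: str                   # Exactly what is the safe state?
    response_time_s: Optional[float]  # How fast does it need to act?
    secondary_actions: List[str] = field(default_factory=list)
    target_pfd_avg: Optional[float] = None  # How much risk reduction is claimed?

    def open_questions(self) -> List[str]:
        """Names of the fields that are still blank or unknown."""
        return [name for name, value in vars(self).items()
                if value is None or value == "" or value == []]


# Hypothetical high-level trip on a storage tank, filled in from a site inspection.
srs = BasicSRS(
    tag="LZ-101",
    senses="Tank T-101 level",
    trip_point="90% of range, rising",
    actions="Close inlet valve",
    final_elements=["XV-101"],
    safe_state="Inlet isolated, transfer pump stopped",
    response_time_s=None,  # not yet established
    secondary_actions=["High-high level alarm to control system"],
)
print(srs.open_questions())
```

The list of open questions is a reasonable agenda for the first site walk-round: each blank is something to confirm from the equipment and drawings, not from hearsay.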
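As an illustration of where the dangerous undetected failure rate ends up, here is a minimal sketch of the widely used simplified calculation for a single-channel (1oo1) trip in low demand mode: the average probability of failure on demand is roughly the failure rate multiplied by half the proof test interval. It assumes a constant failure rate and a perfect proof test, and it ignores diagnostics and repair time. The failure rate used is a made-up “wet finger” number, not data for any real device.

```python
def pfd_avg_1oo1(lambda_du_per_hour: float, proof_test_interval_hours: float) -> float:
    """Simplified average PFD for one channel: lambda_DU * TI / 2."""
    return lambda_du_per_hour * proof_test_interval_hours / 2.0


def sil_band(pfd_avg: float) -> int:
    """Low demand mode SIL band for a PFDavg (0 means no SIL band is met).

    This covers only the random hardware failure target; architectural
    constraints and systematic capability also limit what can be claimed.
    """
    if pfd_avg >= 1e-1:
        return 0
    if pfd_avg >= 1e-2:
        return 1
    if pfd_avg >= 1e-3:
        return 2
    if pfd_avg >= 1e-4:
        return 3
    return 4


# Hypothetical first estimate: 5e-6 dangerous undetected failures per hour, tested yearly.
pfd = pfd_avg_1oo1(5e-6, 8760)
print(f"PFDavg = {pfd:.4f}, risk reduction factor = {1 / pfd:.0f}, SIL band {sil_band(pfd)}")
```

Changing the proof test interval in this sketch shows how directly the test frequency feeds the result, which is one reason the estimate is only as good as the testing regime that sits behind it.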
Start data collection now
It’s typical for us to have so few devices that we’ll never build a statistically significant confidence in our reliability data. Never mind. The journey is important even if we may never reach the destination. Record and analyse the data related to the performance of the trips:
- Proof tests – these tell us about dangerous undetected failures, our key area of focus when it comes to the reliability of our trips.
- Inspections – will often spot systematic failures, such as replacement of failed devices by non-approved alternatives (of course we can’t build confidence in a particular component if we don’t replace like-for-like).
- Demands – this tells us if our trip works “in the real world”. The number of demands is also an important KPI. If we are having more demands than we expected, something is not right. Maybe our initiating event frequency is out, maybe we have initiating events we didn’t foresee in the HazOp? Perhaps our other risk reduction measures aren’t working as they should? All of these are safety critical questions.
- Spurious trips – at one level we don’t mind if the safety trip is more enthusiastic than it should be. But if our safety trip continually disrupts production, operators find ways to defeat and bypass the trips, so a high level of spurious trips can lead to unsafe behaviour.
If we are to take credit for the safety trip as a safety instrumented function, then we need to instigate a proof testing regime. Since we now have a basic SRS that describes what the SIF should do, we can probably design an appropriate proof test. If the trip doesn’t include redundant channels then end-to-end testing (treating the SIF as a black box) may well be sufficient. If there are redundant channels then things will be more complex.
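The four kinds of record listed above can be kept in something as simple as an event log. The sketch below shows one possible shape for that log and a summary that compares the observed demand rate with the rate assumed in the SIL determination study. The record layout, the event names and the example history are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class TripEvent:
    tag: str
    when: date
    kind: str            # "proof_test", "inspection", "demand" or "spurious_trip"
    passed: bool = True  # for proof tests and demands: did the trip work?
    note: str = ""


def summarise(events: List[TripEvent], years_in_service: float,
              assumed_demands_per_year: float) -> Dict[str, object]:
    """Basic KPIs for one trip from its event log."""
    counts = Counter(e.kind for e in events)
    failed_tests = sum(1 for e in events
                       if e.kind == "proof_test" and not e.passed)
    demand_rate = counts["demand"] / years_in_service
    return {
        "proof_tests": counts["proof_test"],
        "failed_proof_tests": failed_tests,
        "demands_per_year": demand_rate,
        "more_demands_than_assumed": demand_rate > assumed_demands_per_year,
        "spurious_trips_per_year": counts["spurious_trip"] / years_in_service,
    }


# Hypothetical four-year history for one trip; the LOPA assumed 0.1 demands per year.
log = [
    TripEvent("LZ-101", date(2021, 3, 1), "proof_test"),
    TripEvent("LZ-101", date(2022, 3, 3), "proof_test", passed=False,
              note="valve stuck"),
    TripEvent("LZ-101", date(2022, 9, 14), "demand"),
    TripEvent("LZ-101", date(2023, 3, 2), "proof_test"),
    TripEvent("LZ-101", date(2023, 11, 20), "demand"),
    TripEvent("LZ-101", date(2024, 1, 8), "spurious_trip"),
]
print(summarise(log, years_in_service=4.0, assumed_demands_per_year=0.1))
```

In this made-up history the trip has seen two demands in four years against an assumption of one in ten, which is exactly the kind of mismatch that should send you back to the hazard study.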
Don’t forget that as well as testing the operation of the trip, secondary actions (e.g. setting alarms) should also be tested.
If you need more risk reduction
One of the outcomes of this work might be that you realise the existing safety trip doesn’t close the risk gap. Tucked away in IEC 61508 is a clause that can be useful. Rather than replacing the existing trip (that may well have worked well for many years), consider adding in an additional safety instrumented function. Clause 7.4.3 “Synthesis of elements to achieve the required systematic capability” in IEC 61508-2 is worth a read – in terms of systematic capability, it essentially describes how SIL 1 + SIL 1 = SIL 2 as long as you have diversity (which would almost always be the case when adding new instrumentation alongside a legacy trip).
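To see how big the remaining gap is, the LOPA-style arithmetic can be sketched as below: multiply the initiating event frequency by the PFD of each existing layer, compare with the tolerable frequency, and the ratio is the PFD the additional function would need. This assumes the layers are fully independent (no common cause), which is a simplification, and it addresses only the numerical target; the systematic capability argument in clause 7.4.3 is a separate matter. All numbers and function names are hypothetical.

```python
import math
from typing import Sequence


def mitigated_frequency(initiating_per_year: float,
                        layer_pfds: Sequence[float]) -> float:
    """Event frequency once every (assumed independent) layer has had its chance to act."""
    return initiating_per_year * math.prod(layer_pfds)


def required_pfd_for_new_sif(initiating_per_year: float,
                             existing_layer_pfds: Sequence[float],
                             target_per_year: float) -> float:
    """PFD an additional SIF would need to reach the target (1.0 means no gap)."""
    current = mitigated_frequency(initiating_per_year, existing_layer_pfds)
    return min(1.0, target_per_year / current)


# Hypothetical case: initiating event 0.1 per year, legacy trip credited with a
# PFD of 0.05, tolerable frequency 1e-4 per year.
needed = required_pfd_for_new_sif(0.1, [0.05], 1e-4)
print(f"Additional SIF needs PFDavg <= {needed:.3f} (risk reduction factor {1 / needed:.0f})")
```

In this example the legacy trip stays in service and the new function only has to supply a risk reduction factor of about 50, rather than a single replacement function having to supply all of it.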
Functional Safety Management and Functional Safety Assessment
Since we’re doing work covered by the functional safety lifecycle, we have to follow the principles of Functional Safety Management. At one level you can consider this to be simply good engineering practice: have a plan, use competent people, check things carefully, keep good documentation – so it need not be onerous. Additionally, we should perform functional safety assessments (FSA) and the Stage 4 FSA’s scope is operations and maintenance – so this covers management of safety instrumented functions, be they legacy or recent. The Stage 4 FSA will examine all the aspects of SIF management that this paper has mentioned, together with the implementation of Functional Safety Management. It will highlight areas of non-compliance which, together with the monitoring of SIF performance, will give you a clear sense of where improvement is needed.
Today is the first day of the rest of our lives
We may often have the thought “I wouldn’t have started from here”. But history can’t be changed, only the future can. For all our Safety Instrumented Functions, be they legacy, installed or new, we need to put the basics in place. Know what the hazards are, know what risk reduction is needed, describe what the SIF does, check that the SIF protects you and is reliable enough, test it regularly and keep data to analyse the performance. Follow functional safety management and carry out regular functional safety assessments. The future can be better than the past.