Safety Integrity Level Assessment

August 26th, 2008

So after hazards are determined, they have to be assigned to a protection level, and the risk associated with them assessed. A potential hazard is that an asteroid may fall on the ride, but the risk of that is pretty small. The following is an example of a safety integrity level assessment. Again, I was just playing with this, and don’t claim to be an expert at it.

silassess

To figure out how the SIL comes out, take the different levels I assigned to each hazard, and apply them to the chart at the top. Follow the lines across, and you get the SIL. This is a qualitative SIL assessment. The problem with the amusement industry is that there really isn’t enough data out there on incidents to use one of the quantitative assessment methods.
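For anyone who would rather see the "follow the lines across" step as a lookup, here is a rough sketch in Python. The category names, the row logic and the resulting SIL numbers are all assumptions made up for illustration. They are not the values from the chart above, and they are not copied from IEC 61508.

```python
# Illustrative qualitative risk graph. The category names and the SIL values
# they produce are assumptions made for this sketch; they are not taken from
# the linked chart or copied from IEC 61508.

# A lower demand rate moves the result toward a lower SIL.
DEMAND_SHIFT = {"high": 0, "medium": 1, "low": 2}


def graph_row(consequence: str, exposure: str, avoidance: str) -> int:
    """Follow the lines across the graph and return the row that is reached."""
    if consequence == "minor":
        return 0
    if consequence == "serious":
        # Exposure and avoidance only matter in the middle of the graph.
        return 1 + int(exposure == "frequent") + int(avoidance == "unlikely")
    if consequence == "fatal":
        return 3 + int(exposure == "frequent")
    if consequence == "multiple_fatal":
        return 5
    raise ValueError(f"unknown consequence: {consequence}")


def assess_sil(consequence: str, exposure: str, avoidance: str, demand: str) -> int:
    """Return 0 (no SIL requirement) through 4 for one hazard."""
    row = graph_row(consequence, exposure, avoidance)
    return max(0, min(4, row - DEMAND_SHIFT[demand]))


if __name__ == "__main__":
    print(assess_sil("serious", "frequent", "unlikely", "high"))
```

The point is only that a qualitative assessment is a table walk on judged categories, with no incident statistics required. A real assessment would use the calibrated chart from whichever standard applies.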

Any comments would be appreciated.

Some Ride Analysis Methods

August 18th, 2008

I am just digging through some old files for some documents I have played with over the years. I am not an expert at this stuff. I am trying to show, however, that it isn’t rocket science. If anyone would like to expand on any of these through comments, I would love to see it.

The first method I played with was called Preliminary Hazard Analysis. You can do a search on the internet on it, but basically, it is a brainstorming method that you use early in the design of a machine, usually before the machine design is nailed down. It would need to be supplemented with another method later in the design.

pha

Another method I looked at early on was accident analysis. In this method, you look at previous accident reports.

microsoft-word-accanal

Next time I will pull out some FMEA analysis. You will be able to see that the FMEA does not pick up some of these potential incidents.

note - I removed all of the text from these documents, and inserted a link to a pdf document - the formatting just wasn’t working with the text directly in the blog.

ASTM to IEC 61508 Comparison

August 11th, 2008

Last time I mentioned that the intent of the “Ride Analysis” was roughly to add a requirement for something approaching a safety life cycle to the amusement industry. This week, I’d like to look at what those requirements are in the ASTM standards, and point you to what they correspond to in the Safety Life Cycle Model.

Again, F2291 was set up to allow alternative safety models, so if you want to look at how this corresponds to Mil Std 882D, feel free. You just get to do the correspondence on that yourself.

Because the idea was not to lock anyone into a particular safety model and since some of the requirements roughly already existed, the references in the ASTM standards are not very specific. So here is the “cheat sheet”.

Section 5.1 of F2291 – the Ride Analysis

The ride analysis corresponds to several steps in the IEC 61508 model. Basically, it roughly corresponds to steps 1 through 5.

  1. Concept
  2. Overall Scope Definition
  3. Hazard and Risk Analysis
  4. Overall Safety Requirements
  5. Safety Requirements Allocation.

In section 5.1.1.3 it calls for a failure analysis, which is part of what is required as step 3 in the IEC 61508 model. ASTM narrowly defines the Hazard Analysis as a Failure Analysis, but then calls out several other areas that must be examined (restraint and containment analysis, 5.1.1.1, clearance envelope, 5.1.1.2, suitability for intended patrons, 5.1.2, etc.). As I mentioned in a previous post, failure analysis is too narrow, and a broader analysis method should be used, in addition to a failure analysis, to ensure all risks are covered.

In 5.1.4, it specifically states, “The ride analysis shall be documented listing the safety issues that were identified and the means used to mitigate each issue.” This is basically steps 4 and 5 in the IEC chart, where you define the Overall Safety Requirements, and then allocate those safety requirements to various protection methods.

Step 6 was not specifically lifted from IEC61508, but since a lot of the 61508 process is common sense, there is a requirement that corresponds. In sections 4, 6 and 7 of F770, the owner is required to take pertinent information from the manufacturer, and develop their operations and maintenance procedures from that.

Step 7, validation, is not well covered by ASTM. It kind of rolls into step 8, installation and commissioning, in the minds of most people in the amusement industry, but in IEC61508, the idea of validation is that the protection method sufficiently reduces the risk. It takes into account that the protection method can fail. Dr. Goble’s book goes into detail on what is meant by validation. Additionally, these websites provide further information.

Step 8 is similar to step 6: it wasn’t added to ASTM, because it already existed. In section 6 of F846, the manufacturer is required to test the ride to ensure it “conforms to the original design criteria.” So a commissioning test plan is required.

Step 9 is covered under section 11 in F2291. There would be a similar step for implementation of risk reduction to the other “systems” that are assigned to mitigate risks, but again, I am a controls guy, and IEC61508 is a controls standard.

Steps 10 and 11 are covered in various sections of F2291. If you want to see where, take the pdf version of 2291, and search for “Ride Analysis”. Basically, if something comes up in the ride analysis, the mechanical systems need to protect against the risk.

Step 12 rolls together with Step 8, as part of section 6 in F846.

Step 13 is another one that gets missed. Steps 7 and 13 tie together (step 7 is the planning for step 13). This is where all of the information from the steps before is looked at to make sure all hazards are dealt with, and the risk is sufficiently reduced. This roughly corresponds to what occurs when a State inspector reviews a new ride, but ASTM does not cover this subject. Some local authorities look at this in detail, some do not. But this is where the check that all i’s are dotted, and all t’s are crossed occurs. Without this, we are just depending on the manufacturer being sufficiently worried about lawsuits to comply.

This is where the states need to step up, and either perform the overall safety validation, or require a third party review. Just a signed statement from the manufacturer that “this ride was designed to ASTM standards” doesn’t cut it. If the manufacturer doesn’t understand the standards, then there is no protection. The manufacturer also may understand, but lie. Either way, the entire process is subverted.

Step 14 is also covered by existing ASTM standards which state that the owner must operate, maintain and repair the ride according to the manufacturer’s criteria.

Step 15 gets a little squishy, except that in Section 1 of F2291, it states that any major modifications must comply with F2291, which throws you back into a ride analysis.

Step 16, decommissioning is not usually a problem with an amusement ride. This is more intended for a system where a safety plan is required to shut a system down. Something like a nuclear power plant or chemical plant would have more concerns with decommissioning.

So next time, I will offer some examples of the documentation for some of these steps based on a little roundabout ride. I won’t have validation, but will show examples of the other steps as required by ASTM. This is a pretty simple ride system, but when I was in the industry looking for examples, no one would share. So I am sharing non-proprietary information on a ride system where the manufacturer no longer exists, so people can at least start to have a discussion.

The Safety Life Cycle

July 31st, 2008

Last time around, I jumped right into hazard analysis. I really should have started out with the idea of a safety life cycle. When I first encountered this concept, it was from a software perspective from some documents from Adelard. I didn’t take the time to dig into my archives for those documents, but if you are interested, the Adelard site is here

At the time, IEC 61508 was in progress, but not published. The first standard I found dealing with the same concept, but from the perspective of the entire control system was ISA S84.01, 1996. The life cycle in that standard looked like this:

The new version of S84.01 had a safety life cycle that matches the life cycle in IEC 61508, as the movement was to make S84.01 and IEC 61508 match (similar to the movement to match up IEC 60204 and NFPA 79). Unfortunately, I didn’t keep a copy of 61508, so I had to go find it referenced on a website:

There are some differences in the flow of the project, but the basic idea is, when you are building something, there needs to be a process where you look at the hazards involved, perform a risk assessment, determine your mitigation, test your mitigation, and then verify that all of this happened correctly.

This is a little different from the V-model that Adelard used, which can be found here. The V-model has more of a software bent to it, but the links between the planning and verifying are a little more clear.

Now realistically, all of this should happen on a project anyway. One problem used to be, people didn’t document the process. So there was no way to check to see if anything was missed, and even worse, in 10 years when all of the design people are scattered to the 4 winds, no one knows what is safety related, and what they can safely change. It’s like the old joke about the guy pulling a wall down in his house, and not realizing it is (was?) a load-bearing wall.

Further, with no defined process, there is a chance that even if someone picks up a risk early in the design process, it might get missed, or during the design, the mitigation might get dropped from the design.

Another benefit of this kind of process is that this is all done in a basically text document, that (hopefully) is more accessible to the non-technical members of the team than some of the engineering design documents would be. So the non-technical team members can determine if they are comfortable with the safety aspects of the equipment, before huge pieces of steel start showing up on site.

There are IEC documents that apply this same Safety Life Cycle concept to the overall machine, but since I was the controls guy, that was where my focus was. I think EN 292 is a good place to start if you want to research this subject for non-controls systems.

From the regulator’s point of view, this document trail is a great place to start reviewing a ride system. First, you can look at their hazard analysis, and verify that they picked up every risk applicable to this machine.

Next, you can look at the risk they assigned to the hazard. The higher the risk, the more important the mitigation should be. Is the risk assigned appropriate?

Next the mitigation should be described. There should be test procedures that tie back and verify the different hazards. The inspector can examine whether the procedure adequately tests the mitigation. Also, is the test something that can be repeated in the future to provide an ongoing test of the mitigation mechanism? The manufacturer should have performed the test, and there should be signatures. Actually, there should be signatures all through this process. This is important information, and whoever does this work should be proud enough to sign their name.
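The review described above is essentially a traceability check, and it can be sketched in a few lines of Python. The record layout below (`Hazard`, `Procedure`, the field names and the IDs) is hypothetical. No standard prescribes this format; it is just one way to show what "ties back" means.

```python
from dataclasses import dataclass, field


@dataclass
class Hazard:
    hazard_id: str
    description: str
    risk: str              # e.g. "high", "medium", "low"
    mitigation: str = ""   # empty means no mitigation was recorded


@dataclass
class Procedure:
    proc_id: str
    hazard_ids: list = field(default_factory=list)  # hazards this test verifies
    signed_by: str = ""    # empty means nobody has signed off


def review_findings(hazards, procedures):
    """Return a list of gaps an inspector would want explained."""
    findings = []
    known_ids = {h.hazard_id for h in hazards}

    # Index the procedures by the hazards they claim to verify.
    tests_for = {}
    for proc in procedures:
        for hazard_id in proc.hazard_ids:
            tests_for.setdefault(hazard_id, []).append(proc)
            if hazard_id not in known_ids:
                findings.append(f"{proc.proc_id}: references unknown hazard {hazard_id}")

    for hazard in hazards:
        if not hazard.mitigation:
            findings.append(f"{hazard.hazard_id}: no mitigation recorded")
        procs = tests_for.get(hazard.hazard_id, [])
        if not procs:
            findings.append(f"{hazard.hazard_id}: no test procedure")
        for proc in procs:
            if not proc.signed_by:
                findings.append(f"{hazard.hazard_id}: {proc.proc_id} is unsigned")
    return findings


if __name__ == "__main__":
    hazards = [
        Hazard("H1", "guest climbs out", "high", "restraint plus fence"),
        Hazard("H2", "fire in station", "high"),
    ]
    procedures = [Procedure("T1", ["H1"])]
    for finding in review_findings(hazards, procedures):
        print(finding)
```

An empty list does not prove the ride is safe. It only shows that every hazard on the list has a mitigation, a test and a signature, which is the minimum the paper trail should demonstrate.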

Now that I have given the overall road map, in future postings, we will look at some of this stuff in more detail. For your information, there is an alternative safety design process that some in the industry like better than the safety life cycle. If you are interested, you can google MIL-STD-882C. I wasn’t a fan, as most of the steps in that process started with “form a committee to……”. Some organizations may have enough manpower to staff 6-8 committees, but not at the places I worked.

So anyway, the Safety Life Cycle - ask for it by name!

FMEA and the Ride Analysis

July 24th, 2008

I am thinking of running a series on the “Ride Analysis” called for in ASTM 2291, but before I do that, I want to do a little ranting. Some politics went into the ride analysis section, so it never ended up being what it should have been.

The original intent was to have a hazard analysis à la IEC61508. When the whole thing started, Mike Miller and I were trying to shoehorn in the essence of 61508, but squeeze it down into enough pages that we could actually get it passed. It didn’t happen, but some people on the committee caught the hazard analysis concept, and wanted to apply it to the entire ride system. Great idea; as a matter of fact, while 61508 applies to the safety related control system, the hazard analysis comes from other European standards that would apply to the entire machine. What did get included will probably be fodder for a future posting.

Unfortunately, just like some attorneys don’t like calling certain pushbuttons “emergency stops”, objections were raised to the name hazard analysis. So the name “Ride Analysis” was applied. The problem with that name is, the term hazard analysis has a meaning that is defined in other standards, and understood by people familiar with those standards. The term ride analysis has a drive-by definition in 2291, but in general is not defined.

So, if you ask for a ride analysis in the amusement industry, what you will generally get is an FMEA, or Failure Mode and Effects Analysis. This is a good failure analysis method, but it does not fill the requirement for a hazard analysis. The problem is, you can have a hazard, without having a failure. So while the FMEA is a subset of a hazard analysis, the two are not equal. What if there is a fire? Nothing had to fail, but you definitely have a hazard. If a guest tries to climb out, nothing has to fail, but you definitely have a hazard. If a tree falls on the track - well I guess the tree failed, but that still isn’t going to show up in the manufacturer’s FMEA. There is nothing in the FMEA that calls for a fence around the ride, but there is definitely a hazard there to be protected against.
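The subset relationship can be made concrete with a small Python sketch. The `Hazard` record and its `failed_component` field are hypothetical, and the brake entry is a made-up example of an ordinary failure mode. The other entries are the examples from the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hazard:
    description: str
    # The ride component whose failure causes the hazard.
    # None means nothing on the ride had to fail.
    failed_component: Optional[str] = None


HAZARDS = [
    Hazard("brake does not close", failed_component="brake"),  # hypothetical entry
    Hazard("fire in the station"),
    Hazard("guest tries to climb out"),
    Hazard("tree falls on the track"),
    Hazard("bystander walks into the ride envelope (no fence)"),
]


def fmea_scope(hazards):
    """Hazards a component-by-component failure analysis would find."""
    return [h for h in hazards if h.failed_component is not None]


def missed_by_fmea(hazards):
    """Hazards that exist even though no ride component failed."""
    return [h for h in hazards if h.failed_component is None]


if __name__ == "__main__":
    for hazard in missed_by_fmea(HAZARDS):
        print("not in the FMEA:", hazard.description)
```

Everything in `fmea_scope` belongs in the hazard analysis, but the reverse is not true, which is why a second method is needed to pick up the rest.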

In the course of preparing for this section in 2291, I did some research on hazard analysis. I found a list (which I can no longer find on the internet) that listed some 200 types of hazard analysis. One was even called “naked man analysis.” I couldn’t find any description of that method, but I assume it is self explanatory. I tried to find the holy grail of hazard analysis. What I found was, there is no single tool that meets all needs. Basically, any machine is probably going to require more than one analysis method, and which methods you use vary depending on the equipment being examined, the industry it is used in, the structure of the company, and the qualifications of the persons performing the analysis. Here and here are some documents that discuss some of the different methods available.

As you read these, it becomes apparent that there is no silver bullet. In most cases, more than one method would need to be applied. FMEA can be one of these methods, but it should not be the only one.

Do I have to be prepared for a car falling out of the sky?

July 18th, 2008

This quote came from an engineer at what was at the time a major ride manufacturer. His comment came because his control system was very much a state machine (sequence based) and could get lost if a car suddenly showed up. When the system got lost, there was an opportunity for two cars to get uncomfortably familiar with each other. The problem is, a car can suddenly show up, and the control system has to be able to deal with it. Examples of cars suddenly “showing up”:

  • Someone reloads the plc program, returning it to the state it was in when the program was saved.

When the program on many PLCs is saved, the state of any latches or memory locations is also saved at that point. So the program may think the cars are at a, b and c, when in fact they are at d, e and f. The control system needs to be able to detect this problem, and must remain in a safe state. This situation is complicated by the manufacturers that put the memory module in their PLC, and set it to automatically transfer the program if the PLC memory has a fault. If the program memory is messed up, I’d rather have the PLC raise its hand, than try to deal with it by itself.

  • Mechanics moving cars around with the control system off.

Once in a while there is a situation where a car may be sitting right where the mechanics need to work. Sometimes they will turn the ride on, and move things the way the ride designer intended. And sometimes someone will get the bright idea that they can just release the brakes, push that car forward or back, and do the work they need to do.

  • Cars being removed and added to the ride.

This is similar to the item above except the car may be totally removed from the track for work to be done, and then replaced wherever it is convenient. This is exactly a “car falling out of the sky”.

  • Sensor failures missing a car’s transition

This shouldn’t be a problem, as a ride’s control system should detect sensor failures. But on older systems, it can and did happen.

  • Cars rolling backwards

Sometimes gravity wins over the mechanical system, and a car ends up going the wrong way. In that case, the ride system needs to protect against anything happening, either due to that car rolling backwards, or a car following running up into the car that isn’t where it is supposed to be.

So when an inspector is testing a ride system, I would have a few “ghost” cars appear, and see what the system does. And ask questions. “What happens if the car stops here?” Don’t accept the answer that a car can’t stop there. Remember the corkscrew that stopped upside down in the loop? That wasn’t supposed to happen either. If the manufacturer doesn’t seem to have thought things through, then test, test, test.

If they are a well prepared manufacturer, there should be items in the Ride Analysis discussing these issues, tied to procedures in the test plan to test the resultant system. Even so, a spot check outside of their little world (making sure that it doesn’t become a destructive test) is a good idea.
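To make the "ghost car" idea concrete, here is a minimal Python sketch of a block monitor that refuses to trust its remembered state. It assumes a simple one-way circuit of numbered blocks with an entry sensor on each. The class, its method names and the block layout are all hypothetical, and a real ride controller would live in safety-rated hardware, not Python.

```python
class BlockMonitor:
    """Compares where the controller thinks cars are with what the sensors report."""

    def __init__(self, num_blocks, occupied_blocks):
        self.num_blocks = num_blocks
        self.expected = set(occupied_blocks)  # remembered state; may be stale
        self.faulted = False
        self.fault_reason = ""

    def _fault(self, reason):
        # Latch the first fault; only a deliberate reset elsewhere should clear it.
        if not self.faulted:
            self.faulted = True
            self.fault_reason = reason

    def car_entered(self, block):
        """Entry sensor for `block` fired. Returns True if the event made sense."""
        if self.faulted:
            return False
        previous = (block - 1) % self.num_blocks
        if previous not in self.expected:
            self._fault(f"car entered block {block} with no car expected in block {previous}")
        elif block in self.expected:
            self._fault(f"car entered block {block}, which should already be occupied")
        else:
            self.expected.remove(previous)
            self.expected.add(block)
        return not self.faulted

    def verify_against_sensors(self, sensed_blocks):
        """Power-up or periodic check of remembered state against presence sensors."""
        if set(sensed_blocks) != self.expected:
            self._fault("remembered car positions do not match presence sensors")
        return not self.faulted

    def dispatch_permitted(self):
        return not self.faulted


if __name__ == "__main__":
    monitor = BlockMonitor(num_blocks=6, occupied_blocks={0, 3})
    monitor.car_entered(1)   # normal advance from block 0
    monitor.car_entered(5)   # nothing was in block 4: a car fell out of the sky
    print(monitor.dispatch_permitted(), monitor.fault_reason)
```

The design choice is that an unexpected event latches a fault and blocks dispatch, and the controller does not try to guess where the cars really are. This sketch does not handle rollback, which would need direction information from the sensors.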

You never know when one of those cars is going to fall out of the sky.

I’ve Got a Barn

July 5th, 2008

Once again, the “I’ve got a Barn” classification raises its head. In Tivoli a brand new roller coaster fell off the track.

This roller coaster company lists a number of “references” on its website. In fact, on RCDB, this company is only credited with one roller coaster, the one that failed.

The website mentions that the employees of the firm have extensive industry experience, but does not list any of the employees. The website also goes on to claim experience on many rides, including some that were built before the company existed. Obviously, they are trying to claim experience for their employees at other companies, but they don’t give us any information on how extensive that experience is. The employee might have been the shipping clerk at one of the major manufacturers while completing their accounting degree. There is no quantification of the experience here.

This is another blatant example of why inspection agencies must have the capability of performing engineering review. If you don’t have extensive background dealing with the salesmanship in the industry, and you take this website at face value, you might believe this company is experienced. But as Wendy’s used to say, “where’s the beef?” It is better to be capable of examining the actual engineering, and ensuring that the proposed ride is a quality piece that meets all necessary standards.

Until an investigation is completed, it is difficult to tell whether this is a manufacturing, design or some other problem. But an incident that occurs this early in the life of the ride is most likely the manufacturer’s problem.

(The term “I’ve got a Barn” comes from some of the Mickey Rooney musicals. I can’t remember which one or one of several, but generally, the kids were going to put on a show, the show got canceled, someone would say they had a barn, someone else had some costumes, let’s put on a show. I use it as a term to describe some of the companies that spring up that are not quite full blown ride manufacturers. Some of them do grow up to become real manufacturers, while others disappear as quickly as they appeared, leaving behind one or several orphan rides, that may be of less than stellar quality and safety.)

Now for a breaking Bulletin!

June 13th, 2008

I got an indication that I wasn’t clear in my last post. Intamin’s (lack of) response to the slipping issue was a contributing factor, but did not remove responsibility from the operator to maintain the rope. The issues I am bringing up are contributory issues that may have helped prevent this accident, which were not fully explored in the KDA report.

This time I want to look at how Intamin transmitted information to the owner. They claimed three things:

  1. There was a new rope specification
  2. There was a new manual
  3. There was a new rope inspection method (rag versus glove)

Which meant that they were totally blameless. And the KDA swallowed that, hook, line and sinker. Actually, it appears the lawsuit is buying that as well, since there is no mention of Intamin being named in the suit. Now over the years in the industry, there have been communication issues over new information from the manufacturer to the owner. Which led ASTM to codify that process in the F24 standards. I can’t remember which standard this started out in, but now it lives in F1193, section 14. (I helped in the process of moving stuff around, but I don’t want to dig out my notes to figure out where that section came from.)

The standard says “supplemental information notification bulletins delivered by the manufacturer of an amusement ride or device to the owner/operator that were not provided at the time of sale and contain new information or newly recommended inspections or testing, or both, shall be consistent with the following criteria in order to carry the force and effect of this practice.”

The section goes on to define alerts, bulletins and notifications, including sample forms, formats, and specific information to be provided.

Each of the above changes fits the criteria requiring a bulletin. Can Intamin produce a single bulletin for any one of these three changes? There has not been any mention of a bulletin in any of the news media, or in the KDA report. The process was set up to make sure that everyone knows exactly what the criticality and required response to a change is. And Intamin does not appear to be following that process.

So without the proper process, Intamin shares some of the blame for the proper information not getting into the hands of the people who need it. I would guess there are two reasons why some manufacturers don’t follow the bulletin process in ASTM. One is, there is a requirement that the bulletin give an explanation why the change is required. Some manufacturers don’t want to admit there have been problems, or how extensive they have been. (The joke in the industry is that all manufacturer’s employees are trained to say “we’ve never seen that problem before” when they answer the phone.) The other is that when there is a bulletin, if the owners disagree, they may push back. Whereas if the information is sent out more informally, it allows the manufacturer to claim in the media that they sent it out (as Intamin has done in this case) without actually forcing the owner to take action. Or maybe even hoping that the changes pass under the radar, so no one challenges them, while still giving the manufacturer the ability to claim they were sent, in the event of a problem.

Since my last post, there has been another incident on another Giant Drop. It will be interesting to see, as the event in Spain unfolds, whether Intamin sings the exact same songs, and we see the same pattern. KDA grabbed the low hanging, easy fruit in their report, and left a lot to rot on the tree. Intamin should have been disciplined for not following ASTM standards.

Slippery Superman

June 5th, 2008

Continuing the discussion of items that appeared in the KDA (Kentucky Department of Agriculture) report on the Superman incident that didn’t make it into the conclusion.

There was a string of bread crumbs running through the news reports and the KDA report. Bread crumbs? Or corn starch?

In the news reports, there was some information about corn starch being used to absorb excess lubricant on the rope. This was also mentioned in passing in one of the reports on the rope. The news media made a big deal about this, especially since it “wasn’t sanctioned by the manufacturer”. But the reason for the corn starch was never examined.

Slipping. The rope was slipping on the drum.

I have spent some time fighting a slipping problem myself. I never spoke to Intamin about this issue, but I have second hand knowledge that others had. (Since these cables don’t carry electricity, it wasn’t my place to fight that battle.) The response was to clean the ropes. Hmmm. Isn’t that removing lubricant as well? I don’t know if Intamin AG is aware of the slippage problem. There is a fairly effective firewall between AG and their customers. But it was discussed with Intamin’s USA branch.

The slipping rope is a risk itself. If the rope slips enough, the catch car or counterweight can overtravel. In an extreme case of overtravel, Bad Things ® can happen. The system actually has a fairly complex system to watch for rope slippage. (Dual encoders, along with multiple check points checking each encoder)
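As a rough illustration of that kind of monitoring, here is a Python sketch with two encoders cross-checked against each other and against fixed check points. The actual arrangement on the ride is not described here, so the encoder placement, the units and the limit values are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SlipMonitor:
    max_encoder_disagreement: float  # allowed difference between the two encoders
    max_checkpoint_error: float      # allowed encoder error at a known check point
    tripped: bool = False
    reason: str = ""

    def _trip(self, reason):
        # Latch the first trip so a later good reading cannot hide it.
        if not self.tripped:
            self.tripped = True
            self.reason = reason

    def compare_encoders(self, encoder_a, encoder_b):
        """Continuous check: both encoders should report the same position."""
        if abs(encoder_a - encoder_b) > self.max_encoder_disagreement:
            self._trip(f"encoders disagree: A={encoder_a}, B={encoder_b}")
        return not self.tripped

    def at_checkpoint(self, checkpoint_position, encoder_a, encoder_b):
        """A fixed switch at a surveyed position fired; each encoder must agree with it."""
        for name, value in (("A", encoder_a), ("B", encoder_b)):
            if abs(value - checkpoint_position) > self.max_checkpoint_error:
                self._trip(
                    f"encoder {name} reads {value} at check point {checkpoint_position}"
                )
        return not self.tripped


if __name__ == "__main__":
    monitor = SlipMonitor(max_encoder_disagreement=5.0, max_checkpoint_error=5.0)
    monitor.compare_encoders(100.0, 103.0)      # within limits
    monitor.at_checkpoint(200.0, 212.0, 211.0)  # both encoders drifted: rope slipped
    print(monitor.tripped, monitor.reason)
```

The check points matter because two encoders driven by the same slipping drum can agree with each other and both be wrong. Note also that the two limits are just numbers passed in at construction, and the check only protects anyone while those numbers stay tight.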

So the system was reporting that the rope was slipping. The manufacturer said to remove lubricant. The park found an “unapproved” way to do that. Maybe what should have happened, is the manufacturer should have looked at what could be done to fix the slippage problem. The slippage problem can make the machine very unreliable. And guests do not like being up the tower for an extended period of time. Think of this machine like an elevator. Do you want your elevator slipping? Do you want your elevator leaving you many feet in the air for 5-10 minutes?

The park says the rope slips. The machine says the rope slips. Even the expert hired by the KDA says the rope slips. Rope slippage is apparently enough of a problem that the machine has a fairly complex system to watch for it. And an opportunity to examine and address this issue with the manufacturer is missed.

So the slipping problem can continue. Due to the media attention, no one will use corn starch any longer. Now what happens when someone gets the bright idea to make the machine more “reliable” by opening up the limits on the slip monitoring? Hopefully that will never happen. But an opportunity to prevent that has been missed.

Maybe we do need some oversight from the CPSC.

Lopa on Superman

June 2nd, 2008

LOPA = Layer of Protection Analysis

I have been holding out on saying anything about the Kentucky Kingdom accident, as I was waiting for the accident report to come out. In some areas, the report was very thorough. In my opinion however, it didn’t look at everything. Then, in its conclusion it wants to tie up the problems that led to this incident with a neat little bow, blaming two parties:

1. SFKK maintenance

2. The main ride operator

As a controls engineer, I believe in contingencies. Just in case this protection method doesn’t work, you have a back up. If this horrible incident came down to just these two factors, I would actually be worried.  So I want to look at some of the contributing factors. In the next couple of weeks, I will examine the other factors that contributed to this accident.

Today, I want to look at the most obvious. The second operator. Some people might think the second operator is only there to help with loading, etc.  In the case of this ride, the main operator cannot see the entire ride. So the second operator is there to watch what the main operator cannot see.  The other side of the tower.

 See page 33 of the PDF file for Operator Panel 2

In the statements of both operators, the second operator told the main operator to hit the e-stop. But why didn’t the second operator hit her own e-stop? Now there may be some mitigating factors here. The second operator’s assigned position may not have been next to one of the two panels with e-stops. But this doesn’t seem likely, as the operator states in her statement that she was at a panel to initiate the cycle. The only panel that fits that description is OP2.

The second operator may have been trained to let the main operator control the e-stop.  Hopefully that is not the case.   But the e-stop would have been there. And the second operator could have hit it.

It is possible that the e-stop is no longer on that console. If this is the case, then this should be examined further in the KDA report. The manufacturer put that pushbutton there for a reason, and it should still be there.

The main operator should have been the protection in this kind of incident. In this case, the operator froze up, and did not respond.  The protection failed. But so did the secondary protection.  To really be a full and complete examination of this incident, the KDA should have explored why the second operator didn’t hit their e-stop.
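For readers new to LOPA, the usual arithmetic is simple: the frequency of the unwanted outcome is the frequency of the initiating event multiplied by the probability that each independent protection layer fails on demand. The Python sketch below shows that calculation. Every number in it is made up for illustration, since nothing in the KDA report gives failure rates for a rope or for an operator.

```python
from math import prod


def mitigated_frequency(initiating_per_year, layer_pfds):
    """Initiating event frequency times the failure probability of every layer.

    Each entry in layer_pfds is a probability of failure on demand (0 to 1).
    The layers are assumed to be independent of each other.
    """
    for pfd in layer_pfds:
        if not 0.0 <= pfd <= 1.0:
            raise ValueError(f"probability of failure on demand out of range: {pfd}")
    return initiating_per_year * prod(layer_pfds)


if __name__ == "__main__":
    # All values below are hypothetical.
    rope_event_per_year = 0.1
    main_operator_pfd = 0.1
    second_operator_pfd = 0.1

    both_layers = mitigated_frequency(
        rope_event_per_year, [main_operator_pfd, second_operator_pfd]
    )
    # A layer that cannot act (no e-stop in reach, or trained not to use it)
    # gets no credit, which is the same as a failure probability of 1.
    one_layer = mitigated_frequency(rope_event_per_year, [main_operator_pfd, 1.0])
    print(both_layers, one_layer)
```

The multiplication is only valid if the layers really are independent. Two operators who were trained the same way, or a second operator who defers to the first, may not count as two separate layers.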