Risk Management for the Vera Rubin Observatory — Large Synoptic Survey Telescope

Francia Riesco
Jul 24, 2021

This document describes the risk management system and risk matrix for the Large Synoptic Survey Telescope (LSST). It covers risk assessment, risk control, mitigation planning, contingency planning, and risk monitoring, and explains how the risk matrix ranks each risk by impact and likelihood.

The LSST management team wrote an exhaustive risk management plan. They identified risks in different parts of the project, such as the design and construction phases. Many of the risks identified during the design phase relate to the telescope lenses, cameras, and sensors. LSST systems engineering reduced risk by building prototypes and running operations and image simulations. For instance, the camera design went through mechanical, electrical, and software modeling and prototyping to mitigate technical risk.

Figure 2 LSST lens assembly prototype by LSST/NSF/AURA

Figure 3 LSST lens assembly on site, 2018, by LSST/NSF/AURA

Risk Management

Implementing a risk management plan is difficult in many projects because it requires the organization to accept and cooperate with the risk identification and mitigation plans, and it requires every member of the organization to take part in risk reduction and share in its benefits. Once the risk management goals are well defined, the next steps are to create a risk management plan, identify the risks, and assess them both quantitatively and qualitatively. The risk management team should then define risk responses and monitor the risks.

The LSST Risk Management Team (LRMT) created a process to identify, track, assess, monitor, mitigate, plan contingencies for, and manage risks. The LRMT maintains a Risk Management Plan (RMP), identified as LPM-20, that complies with the NSF standards for large facilities and projects. LPM-20 defines risk as the degree to which an event can occur that harms the LSST project or its activities (Gessner et al. 2014). A risk is also a combination of the probability of an event and its consequence. Risk is therefore intrinsic to any activity, whether it is small, large, or complicated.

The LSST team also uses a Contingency Management plan to estimate the cost associated with schedule and technical performance uncertainties. This gives them a plan for handling unforeseen issues that can affect the project's technical performance, delivery timing, and budget. According to LPM-20, Contingency Management handles both activity-based uncertainties and high-impact event-based uncertainties (Gessner et al. 2014). Moreover, the LSST team treats risk management as a continuous, proactive process that keeps risks at acceptable levels by tracking and monitoring responses.

They identified three main categories of risk to the project. Technical Risk relates to the LSST performance and specifications. Cost Risk is the risk of exceeding the project budget. Schedule Risk is the risk that the LSST project will not be ready at its scheduled milestones (Nordby et al. 2011). The LSST team also considers a fourth category, Programmatic Risk, which is produced by events beyond the control of the management team and can give rise to any of the other three.

LSST Risk Management works at two levels: the subsystem level, where risks are identified, and the project level, where the risk management board members review risks across the project. The LSST also holds monthly Risk Review meetings to add new risks and update existing ones. The Risk Review Board is made up of project managers and project scientists from each subsystem, such as Data, Camera, and Telescope (Selvy 2015).

Risk Assessment and Analysis Process

Risk Classification

The LSST risk management documentation defines four severity classes (Figure 4), with class 1 being the worst case. Each class defines the degree of severity, damage, or impact on the project (Nordby et al. 2011).

Figure 4 Risk Classification Table LSST/NSF/AURA

LSST Risk Management defines five Risk Probability Levels (Figure 5), which describe the likelihood that a risk will result in an accident or mishap (Nordby et al. 2011).

Figure 5 Risk Probability Levels Table LSST/NSF/AURA

LSST Risk Management also defines Mishap Risk Assessment Values (Figures 6 and 7), which are numerical values that combine the potential severity and the probability of a risk. The values run from 1 to 20, where 1 is catastrophic and 20 is completely improbable (Nordby et al. 2011).

Figure 6 Mishap Risk Category Table LSST/NSF/AURA

Figure 7 Mishap Risk Assessment Matrix LSST/NSF/AURA
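A matrix of this kind can be read as a simple lookup from a severity class and a probability level to a value between 1 and 20. The sketch below is only an illustration: the class and level names and the cell values are assumptions written in the style of the MIL-STD-882D example matrix, not a copy of the LSST tables in Figures 4 to 7.

```python
# Hypothetical labels: the article only confirms four severity classes
# (class 1 is the worst) and five probability levels.
SEVERITY = ["catastrophic", "critical", "marginal", "negligible"]
PROBABILITY = ["frequent", "probable", "occasional", "remote", "improbable"]

# Illustrative values only (assumed, not taken from Figure 7).
# Rows follow PROBABILITY, columns follow SEVERITY.
MATRIX = [
    [1, 3, 7, 13],     # frequent
    [2, 5, 9, 16],     # probable
    [4, 6, 11, 18],    # occasional
    [8, 10, 14, 19],   # remote
    [12, 15, 17, 20],  # improbable
]


def mishap_risk_value(severity: str, probability: str) -> int:
    """Look up the assessment value for one severity/probability pair."""
    row = PROBABILITY.index(probability)
    col = SEVERITY.index(severity)
    return MATRIX[row][col]


worst = mishap_risk_value("catastrophic", "frequent")
mildest = mishap_risk_value("negligible", "improbable")
```

Keeping the values in a table rather than a formula mirrors how such matrices are usually published: each cell is assigned by the risk team, so the ranking does not have to follow a product of the two scales.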

Risk Mitigation

The LSST Risk Management team developed several steps to reduce and mitigate risk. These strategies are designed to bring each risk down to an acceptable level in terms of cost, time, and system quality. Who oversees a risk depends on its mishap risk assessment value (Figure 8). For example, risks marked as High are monitored by AURA, SLAC, and NSF. Risks cataloged as Serious are treated as project-level risks and assigned to the LSST Project Director, Deputy Director, and Project Manager. The Project Manager documents every risk cataloged as Serious or High and keeps upper management informed about it. The Project Manager also makes sure that all the resources needed are used to resolve these risks and mitigate them to a lower level. Any risk that is still cataloged as High after it has been classified and mitigated will not be accepted by the LSST Project. Such risks receive high priority for resources so they can be mitigated through redesign (Gessner et al. 2014). Likewise, SLAC will not accept any residual assessment cataloged as High; it must be resolved to a lower level before the project continues. LSST also requires further documentation for any High or Serious risk, explaining why the risk level cannot be reduced (Nordby et al. 2011).

Figure 8 Risk Categories and mishap acceptance levels LSST/NSF/AURA
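The oversight and acceptance rules described in this section can be summarized as a small routing function. This is a sketch under stated assumptions: the names `OVERSEERS`, `Disposition`, and `review` are hypothetical, and the Medium and Low categories are assumed names, since the text only says who handles High and Serious risks.

```python
from dataclasses import dataclass

# Overseers named in the text. Medium and Low are assumed category names;
# the article does not say who reviews them, so they map to an empty list.
OVERSEERS = {
    "High": ["AURA", "SLAC", "NSF"],
    "Serious": ["LSST Project Director", "Deputy Director", "Project Manager"],
    "Medium": [],
    "Low": [],
}


@dataclass
class Disposition:
    overseers: list
    needs_justification: bool  # must document why the level cannot be reduced
    acceptable: bool           # False for a risk still High after mitigation


def review(category: str, mitigated: bool) -> Disposition:
    """Apply the acceptance rules from the text to one risk."""
    if category not in OVERSEERS:
        raise ValueError(f"unknown risk category: {category!r}")
    return Disposition(
        overseers=OVERSEERS[category],
        needs_justification=category in ("High", "Serious"),
        acceptable=not (mitigated and category == "High"),
    )


residual_high = review("High", mitigated=True)
residual_serious = review("Serious", mitigated=True)
```

Here `residual_high.acceptable` is false, which reflects the rule that a residual High risk has to be reduced, typically through redesign, before the project continues.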

The first level of mitigation for any risk at the LSST is to select a design that removes or reduces the risk. The source of the risk is eliminated by an alternative design that does not interfere with any other part of the LSST or its subsystems, for example by selecting non-flammable hydraulic fluid and avoiding hazardous materials. Another way to mitigate risk is to control its impact, for example by constraining the design with industry design or manufacturing standards. The LSST engineering team also reduced mishap risks by introducing Engineered Safety Features into the system design. These features add safety without requiring additional testing, control, or monitoring; examples include guards, physical barriers, and fuses. Finally, the LSST Risk Management team added procedures and training to mitigate risks, covering device protection, database protection, personnel safety, and emergency shutoff devices (Nordby et al. 2011).

Operation Simulations

The complete LSST project was prototyped using the Operations Simulator (OpSim). The prototype covered, among others, the LSST Observatory Control System, the Telescope Scheduler, the LSST communication middleware, the telescope cameras, the three lenses, and the data system (Delgado et al. 2014).

The different models were built with OpSim, and the simulation was divided into several modules (Figure 9). The Simulator Kernel Model is responsible for the interaction between all the models: it orchestrates the observation cycles and the telescope scheduler, provides the timestamps for transactions between systems, and mimics the behavior of the real-time sky survey. The Observatory Model is a realistic representation of the LSST observatory; its structure allows various model parameters to be perturbed so that engineers can test scenarios with reduced observatory performance. The Survey Database Model handles the creation, storage, and use of the database. It can simulate different types of databases but was modeled with MariaDB or SQLite. The Environmental Model simulates the weather and the astronomical sky conditions (Delgado et al. 2014).

Figure 9 Operation Simulator Models LSST/NSF/AURA
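To make the division of responsibilities concrete, here is a toy sketch of how a kernel might drive an observatory model, an environment model, and a survey database. It is not OpSim code: every class name, the slew rate, the exposure time, and the cloud threshold are hypothetical values chosen for illustration, and the `performance` parameter stands in for the kind of perturbation described above.

```python
import random
import sqlite3


class EnvironmentModel:
    """Stand-in for weather and sky conditions (assumed uniform cloud cover)."""

    def __init__(self, seed):
        self.rng = random.Random(seed)

    def cloud_cover(self):
        return self.rng.random()  # 0.0 = clear, 1.0 = overcast


class ObservatoryModel:
    """Stand-in for the telescope; performance < 1.0 models degraded slewing."""

    def __init__(self, slew_rate_deg_per_s=5.0, performance=1.0):
        self.effective_rate = slew_rate_deg_per_s * performance

    def slew_seconds(self, from_az, to_az):
        return abs(to_az - from_az) / self.effective_rate


class SurveyDatabase:
    """Stores simulated observations in an in-memory SQLite database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE observation (t_s REAL, az_deg REAL, cloud REAL)")

    def record(self, t_s, az_deg, cloud):
        self.conn.execute(
            "INSERT INTO observation VALUES (?, ?, ?)", (t_s, az_deg, cloud))

    def count(self):
        return self.conn.execute(
            "SELECT COUNT(*) FROM observation").fetchone()[0]


class SimulatorKernel:
    """Owns the simulated clock and orchestrates the other models."""

    def __init__(self, observatory, environment, database):
        self.observatory = observatory
        self.environment = environment
        self.database = database

    def run(self, target_azimuths, exposure_s=30.0, max_cloud=0.7):
        clock_s, az = 0.0, 0.0
        for target in target_azimuths:
            clock_s += self.observatory.slew_seconds(az, target)
            az = target
            cloud = self.environment.cloud_cover()
            if cloud <= max_cloud:  # skip the exposure when too cloudy
                clock_s += exposure_s
                self.database.record(clock_s, az, cloud)
        return clock_s


def simulate(performance, targets=(10.0, 95.0, 200.0, 40.0)):
    """Run one short simulated sequence and return (elapsed seconds, rows)."""
    kernel = SimulatorKernel(ObservatoryModel(performance=performance),
                             EnvironmentModel(seed=1), SurveyDatabase())
    elapsed_s = kernel.run(targets)
    return elapsed_s, kernel.database.count()


nominal_s, nominal_rows = simulate(performance=1.0)
degraded_s, degraded_rows = simulate(performance=0.5)
```

Running the same target list with `performance=0.5` takes longer than the nominal run while recording the same observations, which is the sort of reduced-performance scenario a perturbable observatory model makes it possible to study.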

In addition, the observatory and camera modeling showed that the cable wrap included in the model allowed more than 360 degrees of total rotation, with minimum and maximum positions of -270 and +270 degrees. The telescope model therefore identified two possible ways to point at a sky target, and the analysis gave priority to the closest distance and direction for better computing and data optimization (Delgado et al. 2014).
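The effect of the -270 to +270 degree limits can be sketched in a few lines. Because the wrap range spans more than a full turn, some azimuths can be reached at two different wrap positions, and a scheduler can pick the nearer one. The function names below are hypothetical, and choosing the smallest angular move is an assumed selection rule rather than the documented OpSim logic.

```python
WRAP_MIN_DEG, WRAP_MAX_DEG = -270.0, 270.0  # limits quoted in the text


def wrap_candidates(target_az_deg):
    """Return every cable-wrap position that points at the given azimuth."""
    base = target_az_deg % 360.0  # normalize to [0, 360)
    return [pos for pos in (base - 360.0, base)
            if WRAP_MIN_DEG <= pos <= WRAP_MAX_DEG]


def shortest_slew(current_wrap_deg, target_az_deg):
    """Pick the reachable wrap position closest to the current one.

    Returns (new wrap position, signed move in degrees).
    """
    best = min(wrap_candidates(target_az_deg),
               key=lambda pos: abs(pos - current_wrap_deg))
    return best, best - current_wrap_deg


two_ways = wrap_candidates(180.0)           # reachable at -180 and at +180
short_move = shortest_slew(-200.0, 180.0)   # picks the nearby -180 position
long_unwind = shortest_slew(250.0, 300.0)   # +300 is past the limit
```

The last call shows the cost of the limits: azimuth 300 is only 50 degrees away from a wrap position of 250, but +300 is outside the allowed range, so the only valid position is -60 and the telescope has to unwind by 310 degrees.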

The observation modeling and simulations from OpSim produced all the possible telescope pointings on the sky over ten years. OpSim also generated technical data that supported engineering analysis for database design and maintenance. The simulation conditions set boundaries for sky brightness, airmass, and weather changes. Validating the code helped identify better approaches to implementing and maintaining the telescope lenses, among other risks detected early through modeling (Delgado et al. 2014).

Telescope Camera Modeling

The LSST Camera is the most advanced camera for a terrestrial telescope in the world. It will be positioned in the middle of the telescope. The camera is a fully enclosed, self-contained assembly built around three lenses, L1, L2, and L3 (Figures 10, 11, 12, 13). It also has five filters held in a carousel and managed by an autochanger, so filters can be changed during observation surveys. In addition, the camera has a heat dissipation system and a shutter that provides accurate exposure control and blocks direct light to the focal plane of the camera (Nordby et al. 2011).

Figure 10 LSST Camera 3D model section view LSST/AURA/NSF

Figure 11 LSST Cryostat 3D Model section view LSST/NSF/AURA

Figure 12 LSST Camera partial Assembly Filter Exchange System LSST/NSF/AURA

Figure 13 LSST Filter Storage Carousel Mounted to Camera LSST/NSF/AURA

Several risks were identified during the 3D modeling and simulation design of the LSST camera. They related to unplanned component failure, misuse, non-traditional operation, and unforeseen hardware or environmental forces. Examples include thermal risks, such as overheating or the extreme cold of the Cryostat system (Nordby et al. 2011), mechanical risks, such as collisions, and structural risks related to the high seismic activity in Chile. All these risks were mitigated in the early design iterations. Another simulation addressed the risk of oxygen deficiency in the rooms: it set the maximum allowed purge gas flow at 200 cfh (cubic feet per hour) for rooms with a volume of 35,500 cu ft. At this maximum purge rate, personnel have at least 8 hours before the oxygen level in the room is reduced by 1%. More risks were identified during the assembly, integration, test, and operations phases, and they are documented in the LCA-14-A Risk Management plan (Nordby et al. 2011).
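The oxygen figures can be sanity-checked with a simple dilution model. The sketch below is not the analysis from LCA-14-A; it assumes a perfectly mixed room with no fresh-air ventilation, an oxygen-free purge gas, a normal oxygen concentration of 20.9%, and it reads "reduced by 1%" as one percentage point. Only the flow rate and room volume come from the text.

```python
import math

ROOM_VOLUME_CUFT = 35_500.0   # from the text
PURGE_FLOW_CFH = 200.0        # from the text (cubic feet per hour)
NORMAL_O2_PERCENT = 20.9      # assumption: typical oxygen content of air


def oxygen_percent(hours):
    """Oxygen concentration after `hours` of purge flow (well-mixed room)."""
    air_changes = PURGE_FLOW_CFH * hours / ROOM_VOLUME_CUFT
    return NORMAL_O2_PERCENT * math.exp(-air_changes)


def hours_until_drop(drop_points):
    """Time for the oxygen level to fall by `drop_points` percentage points."""
    remaining_fraction = (NORMAL_O2_PERCENT - drop_points) / NORMAL_O2_PERCENT
    return -(ROOM_VOLUME_CUFT / PURGE_FLOW_CFH) * math.log(remaining_fraction)


level_after_8_h = oxygen_percent(8.0)
time_to_lose_one_point = hours_until_drop(1.0)
```

Under these assumptions the oxygen level takes a little under nine hours to fall by one percentage point, which is consistent with the "at least 8 hours" margin quoted above. Real rooms have ventilation, so the actual margin would be larger.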

Track and Monitoring

Figure 14 Risk Monitoring Entry Flow LSST/NSF/AURA

The LSST project uses a Risk Management Tool to register and track risk entries (Figure 14). The tracking system has six significant fields for each risk (Selvy 2015). They include the risk identification ID, which depends on the LSST subsystem or project the risk belongs to; the timeframe in which the reporter expects the risk event could happen; and the assessed probability, which is the estimated probability that the risk will occur. The risk life cycle and responses depend on the severity and the impact on the project.

To determine when a risk is likely to occur, the LSST Risk Management team generated enough data during simulation and modeling to use Monte Carlo analysis (Figure 15) to estimate potential trigger dates. They created three different models. A Specific Trigger Date is a precise calendar date, such as a project milestone, a product delivery, or the finalization of a contract. A Random Occurrence is a risk that will happen, but on a random date; for example, bad weather may delay the project. A Distributed Occurrence is a set of identical risks distributed over several periods. For each of these time frames, the LSST Risk Management software uses Monte Carlo simulation to identify possible dates (Selvy 2015).

Figure 15 LSST Monte Carlo Analysis LSST/NSF/AURA
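The three occurrence models lend themselves to a short Monte Carlo sketch. This is an illustration, not the LSST tool: the function names, the uniform date distributions, and the reading of a distributed occurrence as one event per equal time window are all assumptions.

```python
import random
from datetime import date, timedelta


def specific_trigger_date(rng, trigger):
    """The risk can only fire on one known date, such as a milestone."""
    return [trigger]


def random_occurrence(rng, start, end):
    """The risk fires once, on a uniformly random day in [start, end]."""
    return [start + timedelta(days=rng.randint(0, (end - start).days))]


def distributed_occurrence(rng, start, end, periods):
    """One occurrence of the same risk in each of `periods` equal windows."""
    window = ((end - start).days + 1) // periods
    return [start + timedelta(days=i * window + rng.randrange(window))
            for i in range(periods)]


def simulate(draw_dates, probability, trials=1000, seed=42):
    """Run `trials` scenarios; each returns the dates on which the risk fired."""
    rng = random.Random(seed)
    runs = []
    for _ in range(trials):
        occurs = rng.random() < probability
        runs.append(draw_dates(rng) if occurs else [])
    return runs


start, end = date(2021, 1, 1), date(2021, 12, 31)
weather_runs = simulate(lambda rng: random_occurrence(rng, start, end),
                        probability=0.3)
hit_rate = sum(1 for run in weather_runs if run) / len(weather_runs)
quarterly = distributed_occurrence(random.Random(7), start, end, periods=4)
```

Counting how often each date appears across the runs gives a distribution of likely trigger dates, which is the kind of output a schedule or cost contingency analysis can then consume.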

Risk Matrix

A well-defined and detailed risk management plan can reduce a project's cost, shorten its development period, and improve its overall design and implementation. The LSST project created a documented Risk Management Plan and started tracking risks early in the project. Moreover, by using modeling and simulations, LSST engineers and scientists were able to mitigate the high risks and safety concerns associated with the summit location, the camera lenses, and the thermal control system. In the end, the LSST project has done an excellent job of incorporating risk management into the project through specific expectations and ongoing actions.

Figure 16 LSST Observatory 3D Model at the summit LSST/NSF/AURA

Figure 17 LSST construction site December 2018 LSST/NSF/AURA

References

Ahmed, M., Brill, S. R., Sengupta, R., Bryson, J., & Olson, L. (2003). Phases of risk management for the James Webb Space Telescope. 2003 IEEE Aerospace Conference Proceedings (Cat. №03TH8652), 2, 2_791–2_797. https://doi.org/10.1109/AERO.2003.1235490

Bremerton Safety Council Camera Subsystem Hazards Frank O’Neill Safety Support August 18, ppt download. (n.d.). Retrieved September 26, 2019, from https://slideplayer.com/slide/8179555/

Contributor, G. (n.d.). Establishing an Enterprise Risk Management (ERM) Framework. Retrieved September 26, 2019, from https://www.clearrisk.com/risk-management-blog/bid/56487/establishing-an-enterprise-risk-management-erm-framework-enterprise

Council, N. R., Sciences, D. on E. and P., Board, A. and S. E., Board, S. S., & Strategies, C. to R. N.-E.-O. S. and H. M. (2010). Defending Planet Earth: Near-Earth-Object Surveys and Hazard Mitigation Strategies. National Academies Press.

Delgado, F., Saha, A., Chandrasekharan, S., Cook, K., Petry, C., & Ridgway, S. (2014). The LSST operations simulator. Modeling, Systems Engineering, and Project Management for Astronomy VI, 9150, 915015. https://doi.org/10.1117/12.2056898

Hautmann, U. A. (2018). Technical engineering documentation for the construction, operation, and maintenance of the Large Synoptic Survey Telescope. Journal of Astronomical Telescopes, Instruments, and Systems, 4(4), 044005. https://doi.org/10.1117/1.JATIS.4.4.044005

Krabbendam, V. L., & Sweeney, D. (2010, July 1). The Large Synoptic Survey Telescope preliminary design overview. 7733, 77330D. https://doi.org/10.1117/12.857942

LCA-14-A-RELEASED-(CameraPHA).pdf. (n.d.). Retrieved from https://project.lsst.org/groups/sites/lsst.org.groups.safety/files/LCA-14-A-RELEASED-(CameraPHA).pdf

LSST Risk Management Process & Tool Training Brian Selvy Risk Review Board Chair / Sr. Systems Engineer T/CAM Training Meeting March 26, ppt video online download. (n.d.). Retrieved September 26, 2019, from https://slideplayer.com/slide/4900604/

LSST Tour | The Large Synoptic Survey Telescope. (n.d.). Retrieved September 22, 2019, from https://www.lsst.org/tour

PEP LPM54 2013-10-1.pdf. (n.d.). Retrieved from https://project.lsst.org/groups/sites/lsst.org.groups.safety/files/PEP%20LPM54%202013-10-1.pdf

Pfisterer, R. N., Ellis, K. S., & Pompea, S. M. (2010). The role of stray light modeling and analysis in telescope system engineering, performance assessment, and risk abatement. Modeling, Systems Engineering, and Project Management for Astronomy IV, 7738, 773811. https://doi.org/10.1117/12.858207

Posner, R. A. (2004). Catastrophe: Risk and Response. Oxford University Press.

Reil et al. - Commissioning Execution Plan.pdf. (n.d.). Retrieved from https://docushare.lsst.org/docushare/dsweb/Get/Rendition-56713/unknown

Reil, K., Claver, C., Riot, V., & Krabbendam, V. (n.d.). Commissioning Execution Plan. 30.

Reuter, M. A., Cook, K. H., Delgado, F., Petry, C. E., & Ridgway, S. T. (2016, August 8). Simulating the LSST OCS for conducting survey simulations using the LSST scheduler (G. Z. Angeli & P. Dierickx, Eds.). https://doi.org/10.1117/12.2232680

Snapshot. (n.d.). Retrieved from https://slideplayer.com/slide/8179555/

SPO 9 09 LSSTC annual report.pdf. (n.d.). Retrieved from https://www.noao.edu/news/spo/SPO%209%2009%20LSSTC%20annual%20report.pdf

Sweeney, D., Claver, C., Jacoby, S., Kantor, J., Krabbendam, V., & Kurita, N. (2010). Management evolution in the LSST project. Modeling, Systems Engineering, and Project Management for Astronomy IV, 7738, 77380P. https://doi.org/10.1117/12.857301

Tait, I. E. (2018). Risk management system at Gemini Observatory. Modeling, Systems Engineering, and Project Management for Astronomy VIII, 10705, 1070508. https://doi.org/10.1117/12.2313972


Francia Riesco

Software engineer. Interested in Data Science, Cosmology, and Computational Astrophysics. MLA Harvard, PhD. (c) CSU.