Transforming Working Practices to Overhaul Weather Forecasting Technology

Monday 1 June 2020 | Digital Transformation

By Jake Hoggans

In large-scale IT projects there are often challenges related to siloed information, communication, and approaches. From differing definitions of standard terminology to incompatibility between legacy and cloud tech stacks, there is often much to overcome on the way to success. At CACI IIG, rising above these challenges, we constantly push ourselves to improve working practices and facilitate collaboration in order to achieve significant operational improvements and commercial benefits for customers.

The secret to this is to remove barriers and work together more closely than ever before.

This blog delves into a collaborative venture between the Met Office and CACI Information Intelligence to transform working practices within one of their core projects, the Space Weather Forecasting system. 

We have been working in partnership with the Met Office for over four years to develop this capability. Our strong relationship with the Met Office saw our work on Space Weather shortlisted for Most Successful Cultural Transformation at the DevOps Industry Awards. For more information about this project, see our Case Study.

Here, we will focus on one key aspect of the system: pulling through new and improved data visualisation and forecasting models from science into operation. 

 

The problem

The time-to-production for a model was in the range of six to nine months – a major blocker to unlocking the value of scientific efforts. This was primarily caused by the programme’s necessary division into multiple groups, each consisting of people from various disciplines. That division required several handovers between groups which, even with everyone’s best intentions, often manifested as “throwing over the wall” moments.

The team identified an opportunity to improve the iteration time, which would allow new models to be exploited quickly while continuing to ensure they are scientifically accurate, stable, and maintainable.

In order to achieve this, the team needed to tackle several key challenges:

Technology stack
Models had complex dependencies, including specialist libraries and databases. The on-premises environment hampered techniques such as continuous deployment.

Data requirements
Scientists often work with non-operational data sets, which need further development before they can be used in production. Several input and output formats needed to be supported, including raw data, image files, and video files of various sizes.

Compute requirements
Execution times vary widely: the smallest models take seconds, while the largest take several hours.

Toolsets
Disparate toolsets make it difficult to collaborate in a transparent and frictionless manner.

Culture
The teams were familiar with a close, but siloed, relationship.

Resourcing
Fully utilised teams meant that any investment needed to deliver quick returns.

Skillset Inconsistency
Skills were not consistent across teams, which used different tools, techniques, and languages.

 

The target

With the weaknesses of the current processes understood, the joint Met Office and CACI team identified a set of aims for the initiative:

• Reduce elapsed time to production from months to weeks.
• Increase value of models by identifying additional opportunities.
• Improve the confidence in the model outputs.
• Improve autonomy of releases by reducing Space Weather’s usage of the scientific compute estate.

… and focused on three core objectives to achieve the aims:

• Improve the cross-visibility of priorities and dependencies.
• Break down silos with a ‘one-team’ ethos.
• Remove complexity from integration activities.

 

Approach

The teams took a multi-staged approach, which aimed to improve the process at each step:

1. Tightening the feedback loop: extending the development team’s agile ways of working.
2. Aligning toolsets: improving consistency in toolsets.
3. Introducing shared pipelines: exploiting public cloud to simplify continuous integration and deployment.

Each step delivered value in its own right, and by demonstrating the improvements early and often, stakeholders were continuously reassured that the investment was worthwhile.

Tightening the feedback loop

As the number of models in the onboarding queue increased, inefficiencies in communication were a major source of friction, causing delays and rework.

• Separate backlogs meant that priorities were contentious. 
• Lack of understanding between teams meant key information surfaced late. 
• Difficulty sharing code meant that models were treated as black boxes.

The teams focused on cultural and process changes to solve these issues.  Scientists started to attend stand-ups – initially as observers, but actively participating when the teams were collaborating closely on models.  The teams defined a joint Definition of Ready to ensure that everyone was aware of the vital information blocking progress of a story or task.  Wireframing activities were already employed, but these were shifted left and performed much earlier in the development cycle; this allowed the teams to do more in parallel while ensuring models delivered the necessary value to end users.

The result was a process which still had frequent handovers but with much less friction at each.


Toolset alignment

The next stage looked at reducing the number of handovers.  It was identified that many of these were caused by the opacity of each team’s toolsets.

• Two project backlogs meant that information needed to be duplicated. 
• Disparate source control mechanisms meant that changes to the model were not always shared with the development team in a timely manner; the development team did not feel empowered to make changes. 
• Documentation was not consistent across teams.

The teams revisited their tooling to overcome these issues.  They moved to a shared project board, with scientists encouraged to contribute to the model-based epics, stories, and tasks – this board became the single source of truth for all model tickets.

The teams standardised on a single source control system – with the development team working closely with the scientists to upskill them in the use of Git.  Shared source control allowed more transparent collaboration on source code – the developers could contribute to the model, and the scientists to the data engineering pipeline around it.

The final step was documentation.  Unfortunately, a fully shared wiki was not possible – instead, the teams agreed which types of documentation belonged in each team’s wiki, and a strategy for cross-referencing where appropriate.

The result was a process with fewer handovers, removing the need for re-planning and re-working, and streamlining the development process.  This created a greater understanding of how the work could be split into smaller parts, allowing for a tighter iterative approach for both teams.

Shared build pipelines

The final stage was the most technologically involved.  Despite the process improvements made so far, there were still invalid assumptions, unexpected complexity, and last-minute scope creep.

Limitations on the capacity and configuration of the on-premises system meant that models had to be deployed where convenient (e.g. available compute power or dependencies), rather than where they logically belonged in the system domain. 

Manual gate processes around deployment and release limited the speed of the release train, which hampered the ability to unlock value from iteratively delivered models and reduced the impact of the existing improvements.  Inconsistencies between development, testing, and production environments often caused unexpected incompatibilities and unforeseen release complexity.

Empowered by the Met Office’s adoption of AWS as a platform, the development team looked to move away from the traditional on-premises architecture for executing model code.  The team set about creating a solution which exploited several AWS technologies.

Infrastructure as Code and Containerisation, achieved using CloudFormation and Docker, allowed development and test environments to be created and torn down quickly, and ensured that they were representative of the final production environment.  This improved the team’s ability to perform automated testing and manual verification, and enhanced confidence that the results (especially performance) would reflect the production environment.  It also simplified the management of environmental dependencies.
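To make this concrete, here is a minimal Python sketch of the ephemeral-environment workflow described above, using boto3 to stand up and tear down a CloudFormation stack around a test run. The stack name, template path, and region are hypothetical, and the real pipeline’s templates and parameters would be more involved.

```python
import boto3

# CloudFormation client for the region hosting the environments
# (region name is illustrative).
cfn = boto3.client("cloudformation", region_name="eu-west-2")


def create_test_environment(stack_name: str, template_path: str) -> None:
    """Spin up an ephemeral, production-like stack from an IaC template."""
    with open(template_path) as f:
        template_body = f.read()
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_IAM"],  # allow the template to create IAM roles
    )
    # Block until the environment is fully provisioned before testing begins.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)


def tear_down_test_environment(stack_name: str) -> None:
    """Delete the stack once the test run is finished, so nothing lingers."""
    cfn.delete_stack(StackName=stack_name)
    cfn.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```

Because every environment is built from the same template, a stack created for testing is structurally identical to production – which is what gives confidence that test results will carry over.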

Using AWS-native technologies, including S3 and SQS, reduced the architecture’s complexity and improved its supportability.  Storing and archiving model inputs and outputs in S3 made them easier to share with the wider team, who used them to monitor the scientific accuracy of the model over time.
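As an illustration of that pattern, the sketch below uses boto3 to archive a model run’s output in S3 and publish a message to SQS so that downstream consumers know where to find it. The bucket name, queue URL, and object layout are hypothetical.

```python
import json

import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

# Hypothetical resource names, for illustration only.
OUTPUT_BUCKET = "spaceweather-model-outputs"
QUEUE_URL = "https://sqs.eu-west-2.amazonaws.com/123456789012/model-run-events"


def publish_model_output(model_name: str, run_id: str, local_path: str) -> None:
    """Archive a model run's output in S3 and notify downstream consumers."""
    key = f"{model_name}/{run_id}/output.dat"
    s3.upload_file(local_path, OUTPUT_BUCKET, key)
    # The message tells the wider team (and any accuracy-monitoring jobs)
    # exactly where the new output lives.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps(
            {"model": model_name, "run": run_id,
             "bucket": OUTPUT_BUCKET, "key": key}
        ),
    )
```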

All of this was plumbed together using CodePipeline to implement automated CI pipelines for both the model and operational wrap, allowing them to be explicitly linked.  Changes to either of these trigger the integration pipeline, with automatic assurances at each stage of the process.  In theory, scientists could change a model, validate the results, and push it to production without the need to extensively involve the development team.  The result was an integrated build and test pipeline that enables continuous integration and deployment of the model and operational wrap.
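In a real CodePipeline setup the source stage fires automatically on commits, but a small boto3 sketch (with a hypothetical pipeline name) shows how a run can be started by hand and how the stage-by-stage assurances can be inspected:

```python
import boto3

codepipeline = boto3.client("codepipeline")

PIPELINE = "space-weather-model-integration"  # hypothetical pipeline name


def trigger_integration() -> str:
    """Manually kick off the shared CI pipeline, e.g. after a model change."""
    response = codepipeline.start_pipeline_execution(name=PIPELINE)
    return response["pipelineExecutionId"]


def print_pipeline_status() -> None:
    """Report the latest result of each stage's automated checks."""
    state = codepipeline.get_pipeline_state(name=PIPELINE)
    for stage in state["stageStates"]:
        latest = stage.get("latestExecution", {})
        print(f"{stage['stageName']}: {latest.get('status', 'NOT_RUN')}")
```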

Overall outcome

The initiative has been a success.  Thanks to the effort of Met Office and CACI teams, the silos of the scientists and technologists have been broken down, which has resulted in significant improvements to the efficiency of the programme:

  • Model turnaround time: a five-times reduction in the time taken for a model to be fully deployed.
  • Availability and reliability: fewer production issues (tracked via tickets), thanks to consistent infrastructure and continuous quality assurance catching issues early.
  • Satisfaction and morale: scientists and technologists can work more closely together, and the updated infrastructure has reduced support challenges.
  • Senior stakeholder satisfaction: senior stakeholders within the business cite Space Weather as an exemplar case study for breaking down silos.

 

Summary

The Met Office strategy is to focus everything they do on delivering greater benefit and impact to users, and to achieve this through exceptional science, technology and operations. 

In partnership with us, the Met Office’s Space Weather programme has comprehensively demonstrated that collaborating in a joined-up way can bring benefits including better visibility; more efficient pull-through of advances in science; and technology optimisation.  As a result, more value is delivered to the end users, faster.

How has your organisation made inroads into adopting more collaborative ways of working?

Thank you to MOSWOC and the Met Office for providing the images used in this article. More information about space weather can be found on the Met Office website.
