Introduction
The Surveillance Data Platform (SDP) is a CDC program to streamline public health surveillance. This cross-agency effort stems from a call from multiple stakeholders, including Congress, state public health leaders, and federal advisory committees, asking for the development of a new surveillance strategy. The Surveillance Data Platform is part of the resulting CDC Surveillance Strategy whose goal is to improve public health efficiency and accelerate decision-making by building a collaborative that
...
In the absence of ECPaaS, each shared service would need to implement its own variant of each of the above resulting in additional development effort, overall complexity, and management difficulty.
...
While originally conceived as the platform for hosting SDP shared services, ECPaaS is not functionally limited to this use. ECPaaS provides a general-purpose application and service hosting platform suitable for most application and service hosting needs as illustrated in the high-level context diagram below.
Purpose and Intended Audience
This document describes the concept of operations (CONOPS) for ECPaaS. It is intended to assist CDC with planning for how to manage ECPaaS and outline how ECPaaS capabilities support each of the stakeholder groups identified below.
Operations: ECPaaS is owned and operated by the Information Technology Services Office (ITSO) Application Hosting Branch (AHB). Aspects of ECPaaS of interest to this team include: system patching and upgrades, user and project on-boarding and management, event logging, and resource quota enforcement.
Security: The staff of the Office of the Chief Information Security Officer (OCISO) also have a stake in ECPaaS. OCISO policy staff may need to update policies in light of new technologies used in ECPaaS. OCISO operations staff can take advantage of new ECPaaS capabilities for monitoring and incident response.
Developers: Developer teams within the CDC are likely to be the most common daily users of ECPaaS. These teams can take advantage of built-in resilience and horizontal scaling, and can also use DevOps tools offered by ECPaaS to create automated pipelines that move applications through development, testing, and deployment.
End Users: This group of users may never be aware that the services they are using are hosted on ECPaaS. Nevertheless, they will benefit from the features of ECPaaS that ensure that the services they rely on are always available and performant.
...
This document does not detail standard operating procedures or processes; these are the subject of separate documents being developed by the operations team.
ECPaaS Technology Underpinnings
The SDP Program is one of several CDC initiatives dedicated to making disease tracking more efficient through the use of cloud-based technology. ECPaaS was developed to provide the foundational IT infrastructure upon which SDP Shared Services are deployed and managed as illustrated in the high-level architecture diagram below.
ECPaaS is built from a cluster of physical or virtual server nodes using several open-source technologies described in the next section. Developers build shared services and deploy them on ECPaaS for use by one or more end users. ECPaaS takes care of distributing deployed services over the cluster nodes and provides the ability to scale services horizontally. It also provides resilience by deploying multiple load-balanced instances of those services. When necessary, existing CDC services can be re-used by ECPaaS services either directly or via an adaptor (e.g., to provide a new interface to a service that is natively accessed using a different mechanism).
SDP Shared Services represent a flexible concept that aims to allow CDC surveillance programs to select and incorporate appropriate functionality to expedite their data collection and analysis workflows. ECPaaS is designed to host capabilities implemented using heterogeneous technologies while providing a consistent interface to service consumers. These services are deployed to a microservices-based infrastructure in a manner that is scalable, resilient, and secure to support the needs of service consumers.
The following subsections describe the component technologies that are used with ECPaaS.
Docker
Docker has become the de facto industry standard for packaging and running an application in an isolated, secure, and lightweight environment called a container. Such containerized applications run directly on the host machine's kernel, avoiding the overhead of a hypervisor and guest operating system incurred by virtual machines. This lightweight approach allows a host machine to run many more Docker containers than it could host virtual machines and thus makes more efficient use of resources.
...
Docker images are very loosely comparable to a statically-linked executable file: they contain all of the items needed to run an application along with the entry point from which the application is launched. While an executable is run in an instance of an operating system process, a Docker image is run as an instance of a Docker container. A Docker container mates a Docker image with the host operating system kernel, an operating system process, and an additional layer of writable, ephemeral storage. Like executable files and processes, there can be multiple running Docker containers for a given Docker image, each isolated from all of the others.
Kubernetes
Kubernetes is a system for automating deployment, scaling and management of containerized applications. Kubernetes combines multiple compute servers (nodes) into a cluster and provides a mechanism for automated, distributed container execution to provide horizontal scalability and resiliency.
...
Pods are fronted by a Kubernetes service which acts as a load-balancing proxy for one or more replicated pods. Services provide a stable network address for clients as pod deployments change within the Kubernetes cluster.
RedHat OpenShift Container Platform
Rather than assemble ECPaaS from a set of open source projects and build an administrative function, the SDP Program Team selected a product that pre-integrates all of the required parts. The RedHat OpenShift Container Platform integrates Docker, Kubernetes, and many other open source projects into a unified platform that is available to purchase with commercial support.
...
- Users: Interactions with OpenShift are associated with a user. Users are granted permissions via role assignments, either directly or via group membership. Users must authenticate to access OpenShift, which supports various authentication mechanisms for integration with existing enterprise infrastructure.
- Projects: Access to OpenShift resources is managed using projects. Projects provide a structure to organize content, such as application pods and services, in isolation from other projects. Using the software defined network multitenant plug-in, each project's pods share a virtual network and that project's network traffic is isolated from that of other projects.
- Routes: Kubernetes services are exposed outside the OpenShift cluster using routes which give a service an externally-resolvable hostname. Routes can be secured via Transport Layer Security (TLS) or unsecured.
- Persistent Storage: Building on underlying Kubernetes capabilities, OpenShift manages a set of persistent volumes created by administrators. Persistent storage can be provided in a variety of ways including network file system (NFS) mounts, Fibre Channel, or Amazon Web Services (AWS) Elastic Block Store (EBS). Application developers create a persistent volume claim to request storage and OpenShift is responsible for finding a matching persistent volume and mounting that volume on the desired pod.
The above is a very brief review of major OpenShift capabilities; full information can be found in the online documentation.
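To make the persistent volume claim described in the list above more concrete, the following minimal sketch builds a claim manifest as a Python dictionary and serializes it to JSON, a format the Kubernetes API accepts alongside YAML. The claim name, requested size, and access mode are illustrative assumptions and are not taken from the ECPaaS configuration.

```python
import json


def persistent_volume_claim(name, size, access_mode="ReadWriteOnce"):
    """Build a Kubernetes PersistentVolumeClaim manifest as a plain dict.

    The name, size, and access mode are supplied by the caller; none of
    them are ECPaaS defaults.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical claim requesting 1 GiB of storage.
claim = persistent_volume_claim("example-data", "1Gi")
claim_json = json.dumps(claim, indent=2)
print(claim_json)
```

The platform, not the developer, is then responsible for matching a claim like this to one of the persistent volumes that administrators have created.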
Platform Administration
OpenShift provides both a Web-based graphical user interface (GUI) and a command line interface. Platform administrators and application developers can use either to perform the functions available to them dependent on their role assignments.
This section briefly introduces the administrative capabilities supported by RedHat OpenShift; full documentation can be found in the online administrators guide.
Roles and Responsibilities
Administration of ECPaaS and the services deployed upon it is distributed over a number of stakeholder roles.
Operations: Staff in this role are responsible for ECPaaS system patching and upgrades, user and project on-boarding, ECPaaS system monitoring, ECPaaS system resource management, developer support, security support, and ECPaaS system issue tracking and resolution.
Security: Staff in this role are responsible for the definition of security policies, security auditing and monitoring, and security incident response.
Developers: Staff in this role are responsible for the development and deployment of applications and services, requesting required ECPaaS system resources, service and application monitoring, service and application user on-boarding and support, and responding to security issues.
ECPaaS Clusters
ECPaaS consists of three separate clusters, each with a specific purpose.
- Development and Test: Used for the development of applications and services and for various types of testing prior to deploying to the integration cluster for final testing. May also be used to demonstrate pre-release versions of applications and services.
- Integration: Structured identically to the Production cluster, this cluster is used for integration, security and performance testing of applications and services prior to their rollout into production.
- Production: Hosts production services and applications.
Capacity Planning and Management
Operations staff monitor and track trends for various metrics for each cluster including cluster node resource usage (CPU, memory, etc.), persistent storage usage, network usage, and application and service usage. This information is used to inform decisions about cluster configuration, sizing, and resource requirements. This information is also used to identify projects that may benefit from different deployment approaches (horizontal scaling) or that are using cluster resources but are not being actively used (potential targets for sunsetting).
Account Management
User management is a significant part of any system deployment. The ability to manage user accounts in a centralized manner makes it easier and more efficient for IT operations staff to provision user accounts on a new system. It also makes it easier for the organization to audit user access because all user roles and permissions are managed using a single system. ECPaaS is configured to sync user and group memberships with the CDC Active Directory (AD) infrastructure that is currently used to administer federal and contract staff network accounts.
...
- Prune: In this phase, any user account or group that is not defined in the AD directory is removed.
- Sync: In this phase, any new users included in the `gp-r-openshift-users` group (or included groups) that don't already have user accounts are added. Also, any new groups that are included in the whitelist are created and populated with the users specified in the AD group.
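The two phases above amount to a pair of set operations. The sketch below is a simplified illustration only: it models users but not groups, treats the directory group membership as the full set of directory-defined accounts, and uses hypothetical user names. It is not the synchronization mechanism that ECPaaS actually runs.

```python
def prune_and_sync(platform_users, directory_users):
    """Return (pruned, added, resulting) sets of user names.

    platform_users: accounts that currently exist on the platform.
    directory_users: members of the directory group (and included groups).
    """
    platform_users = set(platform_users)
    directory_users = set(directory_users)

    # Prune: remove accounts that are not defined in the directory.
    pruned = platform_users - directory_users
    remaining = platform_users - pruned

    # Sync: add directory members that do not yet have an account.
    added = directory_users - remaining

    return pruned, added, remaining | added


# Hypothetical user names, for illustration only.
pruned, added, result = prune_and_sync(
    platform_users={"alice", "bob"},
    directory_users={"bob", "carol"},
)
print(sorted(pruned), sorted(added), sorted(result))
```

After both phases the platform accounts match the directory membership, which is the property that makes centralized auditing possible.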
Resource Quotas
A significant concern in any shared resource is the possibility that a subset of projects consume an inordinate amount of the overall system resources, thus preventing adequate levels of service from being delivered to other customers, or causing operations to invest in more capacity than planned or budgeted. RedHat OpenShift has a comprehensive system of quotas and limit ranges to facilitate control of cluster resources.
...
Limit ranges are also scoped to projects but they offer finer-grained control at the Kubernetes pod, Docker container or Docker image level. Minimum and maximum limits can be set for CPU and memory on a per-pod, container or image level, effectively controlling their resource consumption per cluster node.
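As a rough illustration of a limit range, the sketch below expresses pod-level and container-level CPU and memory bounds as a Kubernetes LimitRange manifest built in Python. The object name and all of the numeric values are assumptions chosen for the example; they are not ECPaaS settings, and image-level limits are left out of the sketch.

```python
import json

# Hypothetical limit range for a single project.
limit_range = {
    "apiVersion": "v1",
    "kind": "LimitRange",
    "metadata": {"name": "example-limits"},
    "spec": {
        "limits": [
            {
                # Bounds on the total resources of all containers in a pod.
                "type": "Pod",
                "min": {"cpu": "100m", "memory": "64Mi"},
                "max": {"cpu": "2", "memory": "2Gi"},
            },
            {
                # Bounds on each individual container.
                "type": "Container",
                "min": {"cpu": "50m", "memory": "32Mi"},
                "max": {"cpu": "1", "memory": "1Gi"},
            },
        ]
    },
}

limit_types = [entry["type"] for entry in limit_range["spec"]["limits"]]
print(json.dumps(limit_range, indent=2))
```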
Platform Security
The security policy and operations staff at the OCISO and division levels have a responsibility to ensure that the applications and systems connected to the CDC network do not present an unmanageable risk to the overall security posture of the agency. Security practices evolve over time to address new technologies and evolving attacks. ECPaaS includes several new technologies, including Docker containers, that present both security opportunities and challenges. Security opportunities include:
...
In addition to the issues outlined above, this section also describes how ECPaaS can provide a Computer Security Response Team (CSRT) with the necessary insights into the workings of the platform required to respond to an intrusion.
Trusted Image Registry
The underlying Docker images for all applications and services that are deployed on ECPaaS are stored in a trusted image registry. Kubernetes pulls images from this trusted registry when deploying pods and this is the only way that new Docker containers can be run within ECPaaS. This makes the trusted image registry an important control point from a security perspective since it offers a mechanism to ensure that only vetted Docker images are ever deployed. Images stored in the trusted registry can be vetted to ensure that they
...
ECPaaS adds additional tool support to OpenShift that provides Docker image scanning. The tool can scan Docker images to determine the software they include, determine if that software has known vulnerabilities and the severity of those vulnerabilities, and advise on versions of the software where those vulnerabilities are fixed.
Source-to-Image Builders
OpenShift supports a mechanism for packaging applications into Docker containers called Source-to-Image (S2I). Use of this mechanism provides operations and security staff assurance that the resulting Docker images were built by combining pre-approved builder images with application source code and associated dependencies. This should streamline the path to an operational service since security staff will only need to review the application-specific source code and dependencies, rather than the entire configuration of the particular language runtime stack that the application uses (since that is already approved as part of the S2I builder image).
Network Segmentation
The ECPaaS integration and production clusters are each split across two sub-networks that are configured to provide specific capabilities.
...
OpenShift provides the ability to limit the execution of pods to only run on specified nodes within the cluster. In ECPaaS this capability is used to ensure that pods that need to process requests that could originate outside of CDC only run on cluster nodes deployed in the external network segment. Similarly, this capability is also used to ensure that pods that need to access internal CDC systems only run on cluster nodes deployed in the internal network segment. Applications that need to both serve requests that could originate from outside of CDC, and access internal CDC systems need to be split into multiple pods, each confined to the appropriate network segment.
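One way to picture this confinement is with the Kubernetes `nodeSelector` field, which restricts a pod to nodes carrying a given label. The sketch below is an assumption about how such pinning could be expressed: the label key `network-segment`, the pod names, and the image references are all hypothetical, and the source does not state which mechanism or labels ECPaaS uses.

```python
def pod_for_segment(name, image, segment):
    """Build a Pod manifest pinned to nodes labelled with a network segment."""
    if segment not in ("internal", "external"):
        raise ValueError("segment must be 'internal' or 'external'")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # nodeSelector restricts scheduling to nodes carrying this label.
            # The label key "network-segment" is a hypothetical example.
            "nodeSelector": {"network-segment": segment},
            "containers": [{"name": name, "image": image}],
        },
    }


# An application split into two pods, one per network segment.
web = pod_for_segment("web-frontend", "registry.example/web:1.0", "external")
worker = pod_for_segment("data-worker", "registry.example/worker:1.0", "internal")
print(web["spec"]["nodeSelector"], worker["spec"]["nodeSelector"])
```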
Network Management and Monitoring
As described above, the use of a software defined network within the ECPaaS cluster creates some challenges for security staff tasked with monitoring network traffic. To address these challenges ECPaaS adds additional tool support to OpenShift that provides the following capabilities.
- Real-time network inspection at the cluster, application, and container level
- Kubernetes-aware tools that can track pods as their deployment shifts over time
- Automated application behavior discovery and security policy creation that reduces the need for manual configuration and rule maintenance
- Threat detection including distributed denial of service, and domain name system attacks
Data Encryption
Recent federal mandates require the use of Transport Layer Security (TLS) on all connections to federal computer systems. OpenShift provides built-in support for TLS-protected connections to hosted applications.
...
- Agent-based encryption at the file and volume level
- Transparent to applications, databases, or other infrastructure
- Policy-based access control at the user, group, or role level
- Data access audit logging
- Centralized policy and encryption key management
Authentication and Authorization
As outlined above, for operations, security, and developer access, ECPaaS is configured to sync user and group memberships with the CDC AD infrastructure. End-user access can be controlled either via an application-specific mechanism or, preferably, via integration with CDC Secure Access Management Services (SAMS).
Project Isolation
The ECPaaS clusters have been deployed with the multitenant network plugin. This ensures that every OpenShift project is, by default, network isolated from every other project running on the cluster and that an application running in one project is not able to view network traffic, or communicate with network services that reside in another project. If there is a business need to allow such communication, then the networks of multiple OpenShift projects can be joined together to allow such access.
...
Finally, due to the use of Docker images and containers, OpenShift ensures that third-party libraries introduced by one application are isolated to that particular application, and will not impact any others that happen to use the same application stack.
Developer and End User Support
ECPaaS operations staff are responsible for user and project on-boarding, developer support, and system issue tracking and resolution. This section describes how each of these areas is expected to function.
Issue Tracking
ECPaaS support requests are managed using the CDC Service Manager application and new requests may be entered using that application (available internal to CDC only) or by emailing them to the service request email account. Support requests can be submitted for ECPaaS or the existing SDP services that it hosts. It is recommended that support for future services is also integrated into Service Manager to provide a "one stop" support experience for all stakeholders.
User and Project On-boarding
As outlined above, for operations, security, and developer access, ECPaaS is configured to sync user and group memberships with the CDC AD infrastructure. Operations staff will be responsible for developing and maintaining a process for on-boarding users and assigning them appropriate ECPaaS roles. OpenShift administrators can give users access to certain projects, allow them to create new projects, and give them administrative rights within individual projects. OpenShift administrators can also disable or limit the creation of new projects on a per-user basis.
New projects are created based on a project template. Operations staff will be responsible for curating a set of project templates that are customized to mirror common CDC application configurations. Project templates can preselect certain components and provide a means of rapidly deploying a new application on ECPaaS. Project templates can also control pod deployment to ensure that only appropriate pods are deployed on cluster nodes in the internal and external network segments. If an appropriate, more specialized, project template is not available for a new project then that project can be provisioned based on a simple generic template and then customized as needed. It is recommended that operations staff provide a mechanism to gather requirements from developers for new project templates as part of the template curation process.
Availability
The ECPaaS clusters are configured to provide sufficient redundancy to ensure that hosted applications will remain accessible even in the face of multiple failures in the platform infrastructure.
Data Backup and Restore
Operations staff are responsible for ECPaaS backup. Operations staff will assist development teams with the formulation and execution of a backup strategy for their application data.
...
- Back up any databases that are hosted in containers on the ECPaaS cluster to a cluster node.
- Back up the cluster nodes themselves.
Container Database Backups
One of the cluster infrastructure nodes was designated as the container backup node. A shell script is run daily to find any PostgreSQL database containers that were deployed using the persistent PostgreSQL templates provided by OpenShift. For every such container found, the `pg_dumpall` command is executed in that container to dump all the hosted databases to a file with the following naming pattern: `[timestamp]-[project]:[dc]:[pod].sql` in the output directory designated for PostgreSQL database backups (`/var/accs/db_dumps/pgsql`). The `[timestamp]` reflects the time that the backup was started, and has the format `yyyymmdd-hhmm` where `yyyy` is the 4-digit year, `mm` is the 2-digit month (0 padded), `dd` is the 2-digit day of the month (0 padded), `hh` is the 2-digit hour (0 padded, 24 hour format), and `mm` is the 2-digit minute of the hour (0 padded). The `[project]` is the short project name (as used with `oc project [project]`). The `[dc]` and `[pod]` portions are the names of the deployment configuration and pod, respectively.
The last seven days' worth of backups are preserved on the cluster node. Additional scripts would need to be created to back up different types of hosted database systems, and have their output placed in appropriate subdirectories of `/var/accs/db_dumps`.
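The naming pattern and seven-day retention described above can be sketched in a few lines of Python. This is an illustration of the file naming convention only, not the backup script itself: the function names, the example project and pod names, and the approach of deriving a file's age from its name are assumptions.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y%m%d-%H%M"  # yyyymmdd-hhmm, zero padded, 24-hour clock
TIMESTAMP_LENGTH = 13             # len("yyyymmdd-hhmm")


def backup_file_name(started, project, dc, pod):
    """Return a name following [timestamp]-[project]:[dc]:[pod].sql."""
    timestamp = started.strftime(TIMESTAMP_FORMAT)
    return f"{timestamp}-{project}:{dc}:{pod}.sql"


def expired_backups(file_names, now, keep_days=7):
    """Return the names whose timestamp is more than keep_days before now."""
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for name in file_names:
        started = datetime.strptime(name[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        if started < cutoff:
            expired.append(name)
    return expired


# Hypothetical project, deployment configuration, and pod names.
old = backup_file_name(datetime(2018, 3, 5, 2, 30), "myproject", "postgresql", "postgresql-1-abcde")
new = backup_file_name(datetime(2018, 3, 10, 2, 30), "myproject", "postgresql", "postgresql-1-abcde")
to_delete = expired_backups([old, new], now=datetime(2018, 3, 13, 2, 30))
print(old)
print(to_delete)
```

Because the timestamp leads the file name, a plain alphabetical listing of the backup directory is also a chronological one.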
Cluster Node Backups
All of the nodes in the cluster are configured to be backed up with the normal CDC backup mechanism. The node designated as the container backup node also has the backup files generated by the backup script described above included in its backup set.
(Appendix) Shared Service Provider Recommendations
The following subsections provide recommendations for shared service providers intended to promote consistency across shared services deployed on ECPaaS.
Avoid Duplication
Before embarking on new service creation, potential shared service providers should become familiar with existing services to ensure they do not inadvertently duplicate existing functionality. If a similar service already exists it may be more cost effective to work with the service provider to add needed functionality than to create a whole new service.
Documentation
The existing shared services hosted on ECPaaS are documented in a publicly available CDC Website to promote discoverability. It is recommended that new shared services are added to the list hosted on this Website and that the documentation follows a similar structure to that used for existing services.
The application programming interfaces (APIs) of existing shared services are documented using the OpenAPI format. OpenAPI defines a language-agnostic format for describing RESTful APIs and its use makes it easier for service consumers to discover the capabilities of a service without access to the service source code or other human-oriented documentation. Shared service providers should consider describing their services using the OpenAPI format.
Support
As described above, ECPaaS support requests are managed using the CDC Service Manager application. It is recommended that shared service providers manage their support function using the same application to promote ease of integration and to ensure that support workflows can be managed in a single place rather than spanning multiple systems.
Shared service providers should document the support they offer including hours of operation, processes for initiating and managing requests, and expected issue turnaround times.
Software Development
Not all shared services will involve software development. The following subsections are specific to those that do.
Source Code Management and Continuous Delivery
OpenShift provides native support for Jenkins Continuous Integration/Continuous Delivery (CI/CD) pipelines. To make use of this functionality, shared service source code needs to be managed in a repository that is reachable from the ECPaaS clusters to allow Jenkins to access the source code, perform software builds, and execute tests. The source code repository could be public (e.g. GitHub) or private to CDC.
Depending on the source code management workflow adopted, Jenkins CI can be used to flag broken builds (as per a centralized workflow) or to assist with quality assurance prior to new code being incorporated in the main branch of the code (as per a feature branch workflow). Service provider teams should assess which workflow best suits them and then incorporate CI as far as possible to avail themselves of its significant benefits.
Use of Trusted Images
Operations staff maintain a trusted registry that contains a set of vetted images for popular applications and services to assist developers in selecting pre-approved foundational elements for their applications. Use of images from the trusted registry should streamline the path to an operational service since it will be based on pre-approved components that already incorporate required security controls. Shared service providers should work with operations and security staff to add new images and new versions of existing images to the trusted registry when needed.
Use of Source-to-Image Builders
As outlined earlier, OpenShift supports a mechanism for packaging applications into Docker containers called Source-to-Image (S2I). This mechanism can also be used independently of the OpenShift platform using the standalone S2I utility. By leveraging this mechanism, together with S2I builder images from the trusted registry, development teams will be able to package their applications in a Docker image derived from a pre-approved base image without having to know the particulars of the Dockerfile format or the security implications of particular settings in the Dockerfile. This should also streamline the path to an operational service since security staff will only need to review the application-specific additions to the pre-approved S2I builder image.
New builds using the Source-to-Image process can be automatically triggered when the builder image that the application uses is updated in the trusted registry. This makes it possible to roll out security fixes to all services that incorporate a particular image provided their builds are automated as recommended.
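The sketch below shows roughly what such an automated build definition could look like as an OpenShift BuildConfig, built here as a Python dictionary. It is an assumption-laden illustration: the service name, repository URL, and builder image name are hypothetical, and the API group shown depends on the OpenShift version in use.

```python
import json

# Hypothetical S2I build definition; every name and URL is illustrative.
build_config = {
    "apiVersion": "build.openshift.io/v1",  # assumed; varies by OpenShift version
    "kind": "BuildConfig",
    "metadata": {"name": "example-service"},
    "spec": {
        # Application source code to combine with the builder image.
        "source": {
            "type": "Git",
            "git": {"uri": "https://git.example/team/example-service.git"},
        },
        # Source strategy = Source-to-Image, using a builder image stream tag.
        "strategy": {
            "type": "Source",
            "sourceStrategy": {
                "from": {"kind": "ImageStreamTag", "name": "example-builder:latest"}
            },
        },
        "output": {
            "to": {"kind": "ImageStreamTag", "name": "example-service:latest"}
        },
        # Rebuild automatically when the builder image is updated.
        "triggers": [{"type": "ImageChange", "imageChange": {}}],
    },
}

trigger_types = [trigger["type"] for trigger in build_config["spec"]["triggers"]]
print(json.dumps(build_config, indent=2))
```

With a trigger of this kind in place, publishing a patched builder image is enough to start a rebuild of each service that depends on it.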
(Appendix) Glossary
- Container: Docker containers package a piece of software with a complete filesystem that contains everything needed to run: code, runtime, system tools, and system libraries (anything that can be installed on a server). This guarantees that the software will always run the same, regardless of its environment.
- DevOps: A term used to refer to a set of practices that emphasize the collaboration and communication of both software developers and IT professionals while automating the process of software delivery and infrastructure changes. It aims at establishing a culture and environment where building, testing, and releasing software can happen rapidly, frequently, and more reliably.
- Docker: Docker is a company that produces the leading software containerization platform. The Docker platform includes a specification that defines the container image format, the Linux system daemon that controls the lifecycle of Docker-formatted containers, and a command line interface (CLI) tool that is used to build, start, stop, and manipulate Docker-formatted containers.
- Dockerfile: The configuration file that controls how an application is packaged into a Docker container image.
- Incident Response: A Computer Security Response Team (CSRT) at the CDC will have a process that outlines how suspicious computer actions or activities are handled.
- Internet Protocol: The Internet Protocol (IP) is the principal communications protocol in the Internet protocol suite for relaying datagrams across network boundaries. Its routing function enables internetworking, and essentially establishes the Internet. IP has the task of delivering packets from the source host to the destination host solely based on the IP address in the packet headers. For this purpose, IP defines packet structures that encapsulate the data to be delivered. It also defines addressing methods that are used to label the datagram with source and destination information.
- Kubernetes: Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.
- OpenShift: An application platform based on Docker containers and Kubernetes container cluster management. It augments these components with additional capabilities, such as application lifecycle management and DevOps tooling.
- Platform as a Service: A category of cloud computing services that provides a platform allowing customers to develop, run, and manage applications without the complexity of building and maintaining the infrastructure typically associated with developing and launching an application.
- Source-to-Image: A process that controls the packaging of applications into Docker containers by specifying the source code repository and a builder image.
- Trusted Registry: A concept for a set of Docker container images that have been scanned, reviewed, and approved for use on the CDC computer networks.