API design guidance can be found on the REST API Guidance page. This page provides guidance on several aspects of API implementation at CDC. There are no mandatory components to this guidance, but readers are encouraged to consider the implications of not following this guidance (e.g. development of redundant functionality and difficulty integrating with CDC operations systems).
Version | Date | Description |
---|---|---|
0.9 | 2/8/2019 | Initial draft for review by SDP Tech Panel |
0.10 | 2/28/2019 | Added implementation language and framework guidance |
0.11 | 5/16/2019 | Added additional framework and development tools |
Language and framework choice will ultimately depend on the preferences and prior experience of team members. However, some general guidance applies regardless:
Web API frameworks exist for most popular programming languages and new ones appear regularly. The following is a non-exhaustive list of some popular Web API frameworks by programming language:
In addition to a Web API framework, the following tools may also prove useful:
The ability to monitor services provides many benefits to both the service owner and clients of the service:
Monitoring approaches can be broadly categorized as either passive or active. Each approach is described in the following subsections.
Passive monitoring infrastructure deals with information generated by a service in its normal course of execution. The information used in passive monitoring typically takes the form of logs or event streams. Monitoring infrastructure provides aggregation and visualization functionality to enable users to identify issues and track trends. Brice Figureau's 10 Commandments of Logging describes the following logging best practices in detail.
A particular service interaction can spawn a cascade of activity in downstream services. It is often useful to be able to link upstream and downstream events or log entries when investigating issues. This capability is known as distributed tracing and is implemented in several open source projects and products. A distributed tracing working group was chartered by the W3C in July 2018 to "define standards for interoperability between tracing tools" and has participation from several product vendors.
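As a minimal sketch of how upstream and downstream log entries can be linked, the Python below reads a trace identifier from an incoming request header, writes it into a structured log entry, and builds the header to send to a downstream service. It assumes the `traceparent` header layout from the W3C Trace Context specification (`version-traceid-parentid-flags`). The function names are hypothetical, and defaulting to the "sampled" flag when no valid header arrives is an assumption, not a recommendation from this guidance.

```python
import json
import re
import secrets

# Assumed layout (W3C Trace Context): version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Return (trace_id, parent_id, flags), or None if the header is missing or malformed."""
    match = TRACEPARENT_RE.fullmatch(header or "")
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero identifiers are treated as invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, flags


def child_traceparent(incoming):
    """Build the header for a downstream call: same trace ID, fresh span ID."""
    parsed = parse_traceparent(incoming)
    if parsed:
        trace_id, _parent_id, flags = parsed
    else:
        # No usable upstream context: start a new trace (flags "01" is an assumption).
        trace_id, flags = secrets.token_hex(16), "01"
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def log_entry(message, traceparent):
    """Render a JSON log line that carries the trace ID so entries can be joined later."""
    parsed = parse_traceparent(traceparent)
    return json.dumps({"message": message, "trace_id": parsed[0] if parsed else None}, sort_keys=True)
```

Because every service in the call chain logs the same `trace_id`, a log aggregator can retrieve all entries for one interaction with a single query on that field.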
At CDC, ITSO operations use Splunk to aggregate and manage logs from multiple systems. ITSO's Enterprise Container Platform as a Service (ECPaaS) includes a complete log aggregation and exploration capability as does the CA Application Gateway managed by MISO.
For local log management, a good starting point is the ELK Stack, which is composed of three open source projects: Elasticsearch, Logstash and Kibana. Often the Logstash component is replaced with Fluentd, a Cloud Native Computing Foundation (CNCF) member project. This variant is referred to as an EFK Stack. ECPaaS uses an EFK Stack for its built-in log aggregation and exploration functionality.
Active monitoring, often referred to as "health checks," relies on direct interaction between the monitoring infrastructure and a service. It typically takes the form of a monitoring agent invoking functionality provided by the service, ensuring a positive response, and capturing performance metrics.
An IETF internet draft proposes a standard health check response format and a cut-down example is included below for ease of reference.

    GET /health HTTP/1.1
    Host: example.org
    Accept: application/health+json

    HTTP/1.1 200 OK
    Content-Type: application/health+json
    Cache-Control: max-age=3600
    Connection: close

    {
      "status": "pass",
      "version": "1",
      "releaseID": "1.2.2",
      "notes": [""],
      "output": "",
      "serviceID": "f03e522f-1f44-4062-9b55-9587f91c9c41",
      "description": "health of authz service",
      "details": {
        "cassandra:responseTime": [
          {
            "componentId": "dfd6cf2b-1b6e-4412-a0b8-f6f7797a60d2",
            "componentType": "datastore",
            "observedValue": 250,
            "observedUnit": "ms",
            "status": "pass",
            "time": "2018-01-17T03:36:48Z",
            "output": ""
          }
        ],
        "uptime": [
          {
            "componentType": "system",
            "observedValue": 1209600.245,
            "observedUnit": "s",
            "status": "pass",
            "time": "2018-01-17T03:36:48Z"
          }
        ]
      },
      "links": {
        "about": "http://api.example.com/about/authz",
        "http://api.x.io/rel/thresholds": "http://api.x.io/about/authz/thresholds"
      }
    }

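On the service side, a response like the one above could be assembled from per-component checks. The sketch below is one possible approach rather than part of the draft: the function name `build_health` is hypothetical, the field names follow the example above, and the rule that the worst component status becomes the overall status is an assumption. The status-code mapping (200 for `pass` and `warn`, 503 for `fail`) is likewise an assumption, chosen to be consistent with the draft's use of success codes for passing services and error codes for failing ones.

```python
import json

# Severity order used to pick the overall status (worst component wins).
SEVERITY = {"pass": 0, "warn": 1, "fail": 2}
STATUS_BY_SEVERITY = ["pass", "warn", "fail"]


def build_health(details, release_id):
    """Aggregate per-component checks into (http_status, json_body)."""
    statuses = [check.get("status") for checks in details.values() for check in checks]
    # Unknown or missing statuses are treated as failures; no checks at all means "pass".
    worst = max((SEVERITY.get(status, 2) for status in statuses), default=0)
    overall = STATUS_BY_SEVERITY[worst]
    body = {"status": overall, "releaseID": release_id, "details": details}
    # Assumption: 200 for pass/warn, 503 for fail.
    http_status = 503 if overall == "fail" else 200
    return http_status, json.dumps(body)
```

A web framework handler for `/health` would call `build_health` and return the body with the `application/health+json` content type.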
Health check agents can be built into the platform that hosts a service (e.g., the application health probes built into ITSO's ECPaaS), or they can be implemented by an external monitoring system such as New Relic or Pingdom. One advantage of external monitoring is that it can detect problems that affect external clients but are invisible from within the hosting platform (e.g., network bottlenecks between external clients and the service).
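To illustrate what a health check agent does, the following is a minimal sketch of a single probe using only the Python standard library. It is not how ECPaaS, New Relic, or Pingdom implement their probes; the function names `interpret` and `probe`, the two-second default timeout, and the choice to treat `warn` as healthy are all assumptions made for this example.

```python
import json
import time
import urllib.request


def interpret(body):
    """Map a health response body to 'pass', 'warn' or 'fail'; anything unparseable is 'fail'."""
    try:
        status = json.loads(body).get("status")
    except (ValueError, AttributeError):
        return "fail"
    return status if status in ("pass", "warn", "fail") else "fail"


def probe(url, timeout=2.0):
    """Call a health endpoint once and return (healthy, status, elapsed_ms)."""
    request = urllib.request.Request(url, headers={"Accept": "application/health+json"})
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = interpret(response.read())
    except OSError:
        # Covers connection errors, timeouts, and HTTP 4xx/5xx responses (URLError is an OSError).
        status = "fail"
    elapsed_ms = (time.monotonic() - start) * 1000.0
    # Assumption: a "warn" status still counts as healthy for alerting purposes.
    return status in ("pass", "warn"), status, elapsed_ms
```

A monitoring loop would call `probe` on a schedule, record `elapsed_ms` as a performance metric, and raise an alert after some number of consecutive unhealthy results.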
User privacy needs to be considered when deciding what information to include in logs. IETF RFC 6302 documents logging best practices for internet-facing servers. RFC 6973 offers guidance for incorporating privacy into internet protocols, and a new internet draft proposes updates to RFC 6302 based on the guidance from RFC 6973. Readers are encouraged to review the internet draft and consider its recommendations carefully, particularly given the nature of data in use at CDC and new privacy regulations such as the European Union's General Data Protection Regulation (GDPR).
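One way to act on these considerations is to scrub identifying values before they reach a log. The sketch below uses a standard `logging.Filter` to mask email addresses and zero the last octet of IPv4 addresses. These two rules are illustrative assumptions and are not taken from RFC 6302, RFC 6973, or the internet draft. The names `scrub` and `ScrubFilter` are hypothetical, and the regular expressions are deliberately simple.

```python
import logging
import re

# Deliberately simple patterns, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IPV4_RE = re.compile(r"\b(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}\b")


def scrub(text):
    """Mask email addresses and zero the final octet of IPv4 addresses."""
    text = EMAIL_RE.sub("[email]", text)
    return IPV4_RE.sub(r"\1.0", text)


class ScrubFilter(logging.Filter):
    """Logging filter that rewrites each record's message through scrub()."""

    def filter(self, record):
        # Format the message first so values passed as arguments are scrubbed too.
        record.msg = scrub(record.getMessage())
        record.args = None
        return True  # keep the record, now scrubbed
```

Attaching the filter with `logger.addFilter(ScrubFilter())` applies it to every record logged through that logger, so individual call sites do not need to remember to redact.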
Services hosted on-premises at CDC or in the AHB CDC Cloud Services (ACCS) environment (which includes services hosted on ECPaaS) benefit from the protection provided by the HHS Trusted Internet Connection (TIC). HHS TIC benefits include:
Service providers should familiarize themselves with the TIC architecture, ensure their systems take full advantage of its capabilities, and avoid duplicating them unnecessarily.
The CA Application Gateway managed by MISO offers additional threat protection capabilities that should be reviewed for applicability.
Reliability and availability are key properties of a service, and both can be degraded by clients that overwhelm its capacity. Clients need not be malicious to cause harm: software bugs or overly enthusiastic adopters can easily generate a surge of requests that degrades service performance for others. Rate limiting (also called throttling) is a key enabler of reliable and available services.
Rate limiters manage the load on a service and can use different approaches including:
It may be appropriate to deploy several different types of rate limiters on the same service to deal with all eventualities.
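To make the idea of a rate limiter concrete, the following is a minimal sketch of a token bucket, one common technique for limiting request rates. It is offered as an illustration and is not necessarily one of the approaches this guidance has in mind. The class name, parameter names, and injectable clock are assumptions made for the example. The bucket is not thread-safe and keeps its state in a single process.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if the request may proceed, else False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A service would typically keep one bucket per client (for example, keyed by API key) and answer rejected requests with HTTP status 429 (Too Many Requests).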
Services deployed in CDC can take advantage of at least two existing rate limiting capabilities: