Gruntwork Newsletter, December 2018

Yevgeniy Brikman
Co-Founder
Published November 7, 2018

Once a month, we send out a newsletter to all Gruntwork customers that describes all the updates we’ve made in the last month, news in the DevOps industry, and important security updates. Note that many of the links below go to private repos in the Gruntwork Infrastructure as Code Library and Reference Architecture that are only accessible to customers.

Hello Grunts,

In the last month, we created modules for something many of you have been requesting for a long time: Kubernetes! We also worked with the InfluxData team to create and open source new modules to run InfluxDB Enterprise, created a talk and blog post about the lessons we’ve learned from writing over 300,000 lines of infrastructure code, added features and docs to make it easier to undeploy environments, and fixed several bugs in the Reference Architecture. In the DevOps world, AWS re:Invent was last week, and AWS released a huge number of new features and services, including a managed Kafka service (MSK), managed service discovery (Cloud Map), a managed service mesh (App Mesh), a managed time-series database (Timestream), a managed blockchain service, a managed forecasting service (based on machine learning), global (multi-region) Aurora databases, and much more.

Finally, the entire Gruntwork team will be on holidays for the last week of December and first week of January (specific dates below). Happy holidays!

As always, if you have any questions or need help, email us at support@gruntwork.io!

Gruntwork Updates

New module: Kubernetes and EKS Infrastructure Package (Private Beta)

Motivation: We wanted to make it 10x easier to provision and manage a Kubernetes cluster on AWS, as well as make it easier to package your applications to follow all the best practices when deploying on Kubernetes.

Solution: We began creating a set of reusable modules that allow you to deploy and manage Kubernetes clusters and Kubernetes services. We are happy to announce the availability of the first set of modules in a private beta. The modules allow you to:

  • Provision an EKS cluster with extendable security groups and IAM roles.
  • Provision and manage multiple clusters of EKS worker nodes in separate autoscaling groups.
  • Automatically configure your local kubectl to authenticate with the EKS cluster.
  • Manage IAM role to RBAC group authentication mappings as code.
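
For context on the last item: EKS reads its IAM-to-Kubernetes identity mappings from a ConfigMap named aws-auth in the kube-system namespace. The following is a minimal sketch of keeping such a mapping in Terraform using the Kubernetes provider. It is not the package-k8s implementation; the account ID, role name, username, and group name are hypothetical placeholders, and it assumes the kubernetes provider is already configured to talk to your cluster.

```hcl
# Sketch only: maps a hypothetical IAM role to a hypothetical RBAC group.
# A real aws-auth ConfigMap must also keep the entry for the worker node
# IAM role, or the nodes will no longer be able to join the cluster.
resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    # The heredoc body and end marker are left unindented on purpose.
    mapRoles = <<EOF
- rolearn: arn:aws:iam::111122223333:role/example-developers
  username: example-developer
  groups:
    - example-developers
EOF
  }
}
```

Anyone who assumes the mapped IAM role is then treated as a member of the listed Kubernetes group, so permissions can be granted with ordinary RBAC role bindings.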

This initial version is intended for development and learning purposes. The next version will include additional features that help manage a production-grade EKS cluster, including automatic scaling on demand, zero-downtime rollout of AMI updates, allocating IAM roles per pod in a cluster, and helper scripts to manage a secure Helm installation.

What to do about it: All of the new code is available in the package-k8s repository, currently available only to private beta participants. The private beta is open to any Gruntwork subscriber. If you are interested in participating, email us at support@gruntwork.io and we’ll grant you access (and if you’re not a subscriber, sign up now)!

New module: InfluxDB Enterprise for AWS

Motivation: The InfluxData team wanted to make it 10x easier for users to get up and running with InfluxDB Enterprise on AWS.

Solution: We created the terraform-aws-influx module, which is open source on GitHub under the Apache 2.0 License, and available in the Terraform Registry! This module makes it easy to spin up an InfluxDB Enterprise cluster in just a few lines of Terraform code:

provider "aws" {
  region = "us-east-1"
}

module "influxdb" {
  source  = "gruntwork-io/influx/aws"
  version = "0.1.0"

  aws_region    = "us-east-1"
  ami_id        = "ami_01a2b3c4d"
  license_key   = "0000-1111-2222-3333-4444"
  shared_secret = "xxxxxxxxxxx"
}

output "influxdb_url" {
  value = "${module.influxdb.lb_dns_name}"
}

Put the code above into a file called main.tf, run terraform init and terraform apply, and in a few minutes, you’ll have an InfluxDB Enterprise cluster running in AWS!

What to do about it: Give the InfluxDB modules a try and let us know how they work for you!

5 lessons learned from writing over 300,000 lines of infrastructure code

This October, Gruntwork co-founder Yevgeniy (Jim) Brikman gave a talk at HashiConf 2018 where he shared 5 key lessons we learned at Gruntwork while creating and maintaining a library of over 300,000 lines of infrastructure code that’s used in production by hundreds of companies. Check out the blog post for the video and slides from the talk, as well as a condensed, written version of the presentation: 5 lessons learned from writing over 300,000 lines of infrastructure code.

Holiday schedule

Motivation: At Gruntwork, we are a family-friendly company and believe in taking plenty of time off.

Solution: The entire Gruntwork team will be on vacation December 24th — January 4th. During this time, there may not be anyone around to respond to support inquiries, so please plan accordingly.

What to do about it: We hope you’re able to relax and enjoy some time off as well. Happy holidays!

Undeploying modules and environments

Motivation: Several customers let us know that it wasn’t obvious how to undeploy a module or an entire environment in their Reference Architecture; in fact, they were hitting confusing errors about S3 buckets.

Solution: We’ve added documentation on how to undeploy modules and environments in the Reference Architecture!

What to do about it: Check out the new docs and let us know if they make things easier. Also, note that the docs assume you have exposed a force_destroy setting on all modules that use S3 buckets, as shown in this commit.
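
As a rough illustration of what exposing a force_destroy setting can look like, here is a minimal sketch for a module that creates an S3 bucket. The variable name matches the setting mentioned above, but the resource name and bucket name are hypothetical, and the linked commit remains the authoritative reference for the Reference Architecture.

```hcl
# Hypothetical module input; defaults to false so buckets with content
# are protected from accidental deletion.
variable "force_destroy" {
  description = "If true, delete all objects in the bucket so it can be destroyed"
  default     = false
}

resource "aws_s3_bucket" "example" {
  bucket = "example-bucket-name" # placeholder bucket name

  # When true, Terraform empties the bucket before deleting it instead of
  # failing because the bucket is not empty.
  force_destroy = "${var.force_destroy}"
}
```

Note that the new value has to be applied before it takes effect: set force_destroy to true, run terraform apply, and only then run terraform destroy.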

Reference Architecture Bug Fixes

Motivation: We found a few minor bugs in the deployed reference architecture. Although the stack is still functional, these changes help improve the maintainability of the architecture. The following modules were affected:

  • Kafka (AMI)
  • Jenkins (AMI)
  • ELK

Solution: We implemented the bug fixes in our reference architecture example repos for Acme. The specific fixes are:

  • Fix Jenkins packer script to avoid prompt when upgrading packages
  • Pin Kafka to version 1.0.2 in packer script, since 1.0.0 is no longer available
  • Fix ELK stack to maintain individual configs for non-TLS version in single cluster
  • Fix ELK stack to have the load balancer point to master nodes instead of data nodes

What to do about it: Check out the diffs to see how to upgrade your stack. See this commit for the multi account version and this commit for the single account version.

gruntkms update

Motivation: gruntkms, our CLI utility that makes it easier to use AWS KMS to encrypt and decrypt secrets (and files with secrets), was running with an old version of the AWS SDK, so certain features (e.g., ECS Task IAM Roles) were not working correctly.

Solution: We’ve updated the gruntkms dependency versions so it’s now on the latest AWS Go SDK.

What to do about it: Update to gruntkms, v0.0.7. Note that the Dockerized sample apps in the Reference Architecture used to include a copy of the gruntkms binary. The new recommended way to install gruntkms in Docker is via an intermediate build (to ensure the secrets you use to access private repos don’t end up in the final Docker image), as shown in this commit.

Other updates

  • module-security, v0.15.4: The cloudtrail module now exposes a force_destroy flag you can use to forcibly delete all the contents of the CloudTrail S3 bucket when you run destroy.
  • module-security, v0.15.5: The cross-account-iam-roles, iam-groups, and iam-policies modules now all expose roles, groups, and policies, respectively, that grant permissions to talk to the CLI endpoints of Gruntwork Houston.
  • package-static-assets, v0.3.3: The s3-static-website module now exposes force_destroy_website and force_destroy_redirect flags. You can use these flags to force the S3 buckets in the module to be destroyed, even if they still have content in them.
  • package-static-assets, v0.3.4: You can now specify custom tags for all S3 buckets created by s3-static-website and s3-cloudfront modules using the new (optional) custom_tags parameter.
  • package-static-assets, v0.4.0: The s3-cloudfront module will now automatically create an AAAA alias record (in addition to the A record it always created) if is_ipv6_enabled and create_route53_entries are both set to true. This is necessary so your static websites work over IPv6.
  • module-data-storage, v0.8.0: Splits the single cluster_with_encryption resource into two permutations: cluster_with_encryption_serverless and cluster_with_encryption_provisioned. This is to get around the fact that you can only specify the scaling_configuration block when in serverless engine mode. This change is backwards incompatible, and you’ll need to follow the upgrade guide in the release notes to update your existing cluster.
  • module-data-storage, v0.8.1: You can now enable performance insights using two new (optional) parameters, performance_insights_enabled and performance_insights_kms_key_id.
  • module-ecs, v0.10.1: Fixes a bug in the check-ecs-service-deployment script where it did not properly detect the major Python version on certain OS versions.
  • module-ecs, v0.10.2: Implements preliminary work on Windows support for the check-ecs-service-deployment script by using Python instead of Bash for the entrypoint. Also rebuilds the binaries to include Windows versions of the dependencies.
  • module-server, v0.5.1: Fix volume_ids: readonly variable bug that would show up on Ubuntu 18.04 for mount-ebs-volume. Fix bug with missing is_nvme function in unmount-ebs-volume.
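
To illustrate why the module-data-storage v0.8.0 change above splits one resource into two, here is a minimal sketch of the general pattern. It is not the module's actual code: the variable names, identifiers, and capacity values are hypothetical. The underlying constraint is that the scaling_configuration block is only valid in serverless engine mode, and Terraform at the time had no way to include a nested block conditionally, so a common workaround is two resources toggled with count.

```hcl
# Hypothetical inputs for this sketch.
variable "engine_mode" {
  description = "Either provisioned or serverless"
  default     = "provisioned"
}

variable "master_password" {
  description = "Master password for the cluster"
}

# Created only in provisioned mode; no scaling_configuration block.
resource "aws_rds_cluster" "provisioned" {
  count = "${var.engine_mode == "provisioned" ? 1 : 0}"

  cluster_identifier  = "example-aurora"
  engine              = "aurora"
  engine_mode         = "provisioned"
  master_username     = "example_admin"
  master_password     = "${var.master_password}"
  storage_encrypted   = true
  skip_final_snapshot = true
}

# Created only in serverless mode, where scaling_configuration is allowed.
resource "aws_rds_cluster" "serverless" {
  count = "${var.engine_mode == "serverless" ? 1 : 0}"

  cluster_identifier  = "example-aurora"
  engine              = "aurora"
  engine_mode         = "serverless"
  master_username     = "example_admin"
  master_password     = "${var.master_password}"
  storage_encrypted   = true
  skip_final_snapshot = true

  scaling_configuration {
    auto_pause               = true
    min_capacity             = 2
    max_capacity             = 16
    seconds_until_auto_pause = 300
  }
}
```

Because the resource address changes under this pattern, an existing cluster has to be moved in the Terraform state, which is why a change like this is backwards incompatible and comes with an upgrade guide.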

Open source updates

  • Terragrunt, v0.17.2: You can now tell Terragrunt to assume an IAM role by setting the iam_role parameter in your terragrunt = { ... } configuration.
  • Terragrunt, v0.17.3: Fix a potential concurrency bug in how Terragrunt handled stdout/stderr that could lead to transient failures with hard-to-understand error messages.
  • Terratest, v0.13.13: Terratest now includes functions to support verification of a Kubernetes cluster. These functions are available in the k8s package.
  • Terratest, v0.13.14: Fixes a bug where ssh.CheckSshCommand and related functions were not returning the command output when it fails.
  • terraform-aws-consul, v0.4.2: The install-consul module now retries the download of the consul package to prevent transient failures.
  • terraform-aws-consul, v0.4.3: The consul-cluster module now exposes a new optional enabled_metrics parameter that you can use to specify the metrics that should be enabled in the underlying Auto Scaling Group.
  • terraform-aws-consul, v0.4.4: The run-consul module reorganized the way autopilot configurations can be set and has additional documentation on autopilot features.
  • terraform-aws-vault, v0.11.0: New examples were added on how to authenticate to Vault using the EC2 and IAM methods and on how to automatically unseal the Vault cluster. vault-cluster was updated to include the necessary policies for Auto-Unseal and to be more robust against transient failures. run-vault was updated to allow enabling Auto-Unseal and passing the necessary configuration. The Auto-Unseal feature is currently part of Vault Enterprise, but will be released to Vault open source as well starting with version 1.0, which is currently in beta.
  • terraform-aws-vault, v0.11.1: The vault-cluster module now allows specifying autoscaling group metrics.
  • terraform-google-vault, v0.1.2: The vault-cluster module now allows using different GCP projects for launching your cluster, fetching your compute image, and referencing your network resources. It also now uses a Regional Managed Instance Group instead of a Zonal Managed Instance Group. This way, Vault nodes are spread across multiple zones instead of being co-located in a single zone, which improves availability. The run-vault module now enables the Vault UI by default. Additionally, the private cluster example now creates a subnetwork with internal access to the Google API so a cluster can fetch information about its nodes without internet access.
  • gruntwork-installer, v0.0.22: Update to fetch v0.3.2, which includes several improvements, including support for GitHub Enterprise.
  • bash-commons, v0.1.0: Made a number of improvements to make the scripts more portable, use better shell script practices, and fix a few bugs.
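
As a quick illustration of the Terragrunt v0.17.2 change above, a configuration using the new iam_role parameter might look roughly like this. The account ID and role name are hypothetical placeholders, and the snippet assumes the terraform.tfvars-based configuration format that Terragrunt used at the time of this release.

```hcl
# terraform.tfvars
terragrunt = {
  # ARN of the IAM role Terragrunt should assume before running Terraform.
  # Hypothetical account ID and role name.
  iam_role = "arn:aws:iam::111122223333:role/example-terragrunt-role"
}
```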

DevOps News

AWS re:Invent announcements

What happened: AWS held its major annual conference, re:Invent, in Las Vegas, where it announced a ton of new services and features.

Why it matters: Here are a few of the highlights of what was announced at re:Invent:

  1. Amazon Managed Streaming for Kafka (MSK): Amazon is now offering a managed Kafka service. This makes it easier to use Kafka for streaming and event-driven apps without having to run and manage ZooKeeper and Kafka brokers yourself.
  2. AWS Cloud Map: Cloud Map provides service discovery for all AWS resources. It allows you to register any resource, such as a database, queue, or microservice with a custom name. Cloud Map then constantly checks the health of the resource to make sure the location is up-to-date. Other applications can then query the registry for the location of the resources needed based on the application version and deployment environment. Cloud Map is automatically integrated with ECS, Fargate, and EKS.
  3. AWS App Mesh: App Mesh is a service mesh that provides not just service discovery (as with Cloud Map), but also fine-grained control over routing and access control between microservices. Under the hood, App Mesh is built on top of the Envoy proxy. You can use App Mesh with ECS, EKS, and self-managed Kubernetes.
  4. Amazon Timestream: Amazon Timestream is a managed time-series database for collecting, storing, and processing time-series data such as server and network logs, sensor data, and industrial telemetry data for IoT and operational applications.
  5. Amazon Managed Blockchain: A fully managed service that makes it easy to create and manage scalable blockchain networks using the popular open source frameworks Hyperledger Fabric and Ethereum.
  6. Amazon Quantum Ledger Database (QLDB): A fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log owned by a central trusted authority.
  7. Amazon Forecast: A fully managed service that uses machine learning to create highly accurate forecasts.
  8. Aurora Global: A new feature that allows a single MySQL-compatible Aurora database to span multiple AWS regions, with fast replication (typically less than 1 second of lag) to enable low-latency global reads and disaster recovery from region-wide outages (secondary regions can be promoted to full read/write capabilities in less than 1 minute).
  9. AWS Transit Gateway: A managed service to help you build a hub-and-spoke network topology. You can connect your existing VPCs, data centers, remote offices, and remote gateways to a managed Transit Gateway, with full control over network routing and security, even if your VPCs, Active Directories, shared services, and other resources span multiple AWS accounts. You can attach up to 5000 VPCs to each gateway and each attachment can handle up to 50 Gbits/second of bursty traffic. You can attach your AWS VPN connections to a Transit Gateway today, with Direct Connect planned for early 2019.
  10. Custom runtimes for AWS Lambda: AWS Lambda now exposes a Runtime API that gives you a way to execute code in any language on top of AWS Lambda (see the new Ruby runtime for an example).
  11. Layers for AWS Lambda: Lambda Layers allow you to share code amongst your Lambda functions. You upload a zip file with the shared code (e.g., common dependencies, such as SDKs and libraries used by all your Lambda functions) and you can reference that zip file from any Lambda function to have access to the code within it.
  12. 8 new service integrations for AWS Step Functions: Your Step Functions state machines can now automatically trigger not only Lambda functions, but also ECS tasks, Fargate tasks, and read and write data to/from Amazon DynamoDB, Amazon SNS, Amazon SQS, AWS Batch, AWS Glue, and Amazon SageMaker.
  13. ALB can invoke Lambda: The Application Load Balancer (ALB) can now directly invoke your Lambda functions. This makes it possible to build serverless apps without using API Gateway.
  14. DynamoDB on-demand and transactions: DynamoDB has added support for an on-demand mode, where it scales automatically with load and you pay per request, rather than having to do capacity planning ahead of time. It has also added support for transactions, which provide atomicity, consistency, isolation, and durability (ACID) and make it simpler to apply coordinated, all-or-nothing changes to multiple items both within and across tables.
  15. CloudWatch Logs Insights: A fully integrated, interactive, and pay-as-you-go log analytics service for CloudWatch Logs that lets you explore, analyze, and visualize your logs instantly.

What to do about it: This is only a partial list of everything announced at re:Invent! For the full list, plus videos and more details, check out the re:Invent Product Announcements page.
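
As one concrete example from the list above, DynamoDB's on-demand mode can be selected in Terraform through the table's billing mode. This is a minimal sketch, assuming an AWS provider version recent enough to support the billing_mode argument; the table and attribute names are hypothetical.

```hcl
resource "aws_dynamodb_table" "example" {
  name     = "example-on-demand-table" # hypothetical table name
  hash_key = "id"

  # On-demand mode: pay per request, so no read_capacity or
  # write_capacity settings are specified.
  billing_mode = "PAY_PER_REQUEST"

  attribute {
    name = "id"
    type = "S"
  }
}
```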

Other AWS announcements

What happened: Earlier in the month, before re:Invent, AWS announced several other new features.

Why it matters: Here are a few of the highlights from last month:

  1. Aurora Serverless for PostgreSQL: Aurora Serverless is an on-demand database that can power down—and cost you essentially nothing—when you’re not using it and boot back up in seconds, which is great for pre-prod and test environments and infrequently-used apps. Now Aurora Serverless is available in both MySQL and PostgreSQL compatible flavors.
  2. Access Aurora Serverless as a RESTful API: You can now execute queries on your Aurora serverless databases using RESTful APIs (via the AWS SDK), without the need to use a standard database driver with a direct connection.
  3. Predictive Auto Scaling: AWS now supports Auto Scaling powered by machine learning. It monitors daily and weekly patterns and tries to adjust capacity automatically in line with a forecast.
  4. CloudWatch alarms support metric math: You can now create CloudWatch alarms not only on metric values, but also on mathematical operations (+, -, /, *, etc) on top of those metrics.
  5. Amazon Corretto OpenJDK distribution: Amazon is now offering a no-cost, multiplatform, production-ready distribution of the Open Java Development Kit (OpenJDK). Since Oracle JDK now requires a paid support license for production usage, this is an interesting OpenJDK flavor alternative to consider for running your JVM apps.

What to do about it: Check out the new features mentioned above and let us know (a) how they work for you and (b) if we need to update our modules to better support this new functionality.
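
To make the metric math item above more concrete, here is a sketch of a CloudWatch alarm on a computed error rate, written in Terraform. It assumes an AWS provider version that supports the metric_query block; the alarm name, load balancer dimension value, and threshold are hypothetical.

```hcl
resource "aws_cloudwatch_metric_alarm" "error_rate" {
  alarm_name          = "example-alb-error-rate" # hypothetical name
  alarm_description   = "ALB 5xx responses exceed 10% of requests"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 2
  threshold           = 10

  # The expression the alarm evaluates: errors as a percentage of requests.
  metric_query {
    id          = "e1"
    expression  = "m2/m1*100"
    label       = "Error Rate"
    return_data = true
  }

  # m1: total requests per two-minute period.
  metric_query {
    id = "m1"

    metric {
      metric_name = "RequestCount"
      namespace   = "AWS/ApplicationELB"
      period      = 120
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        LoadBalancer = "app/example-web" # hypothetical load balancer
      }
    }
  }

  # m2: 5xx responses generated by the load balancer in the same period.
  metric_query {
    id = "m2"

    metric {
      metric_name = "HTTPCode_ELB_5XX_Count"
      namespace   = "AWS/ApplicationELB"
      period      = 120
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        LoadBalancer = "app/example-web" # hypothetical load balancer
      }
    }
  }
}
```

Only the expression has return_data set to true, so the alarm triggers on the computed percentage rather than on either raw metric.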

Security Updates

Below is a list of critical security updates that may impact your services. We notify Gruntwork customers of these vulnerabilities as soon as we know of them via the Gruntwork Security Alerts mailing list. It is up to you to scan this list and decide which of these apply and what to do about them, but most of these are severe vulnerabilities, and we recommend patching them ASAP.

event-stream library

  • https://github.com/dominictarr/event-stream/issues/116: If you are using anything crypto-currency related and Node.js, then you may be affected. As discovered by @maths22, the target seems to have been identified as copay related libraries. It only executes successfully when a matching package is in use (assumed to be copay at this point). If you are using a crypto-currency related library and if you see flatmap-stream@0.1.1 after running npm ls event-stream flatmap-stream, you are most likely affected. We notified the Gruntwork Security Alerts mailing list about this vulnerability on November 27th, 2018.

Consul RCE Risk with specific configurations

  • https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations: HashiCorp is announcing the discovery of malware targeting Consul nodes with a specific configuration which allows remote code execution. You should take action if you have -enable-script-checks set to true, or are running Consul 0.9.0 or earlier, and the Consul API is available on an interface that can be accessed over the network. You may be vulnerable if: (1) The API is available on an interface that can be accessed over the network; (2) Script checks are enabled; (3) ACLs are disabled or an ACL token is compromised.
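
As a rough sketch of what a more conservative agent configuration could look like, the following Consul configuration (in HCL format) turns off script checks registered over the HTTP API and keeps the API on the loopback interface. Treat it as an assumption-laden starting point rather than HashiCorp's official guidance: enable_local_script_checks is only recognized by Consul versions that include that option, and the right bind address depends on your network setup. The linked post has the authoritative recommendations.

```hcl
# Do not run script checks registered through the HTTP API.
enable_script_checks = false

# Still allow script checks defined in local configuration files
# (only on Consul versions that support this option).
enable_local_script_checks = true

# Serve the HTTP API on loopback only, so it is not reachable
# from other hosts on the network.
addresses {
  http = "127.0.0.1"
}
```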

Kubernetes privilege escalation vulnerability

  • https://github.com/kubernetes/kubernetes/issues/71411: If you are managing a Kubernetes cluster, note that a vulnerability has been discovered that allows authenticated users to escalate their privileges beyond the RBAC role assigned to them or, under certain configurations, allows unauthenticated users to gain privileged access. Patches have been released for Kubernetes versions 1.10 (1.10.11), 1.11 (1.11.5), and 1.12 (1.12.3). The severity of the vulnerability depends on whether or not you are exposed to unauthenticated privilege escalation.
  • You are vulnerable to unauthenticated privilege escalation if: (1) you are managing your own Kubernetes cluster; (2) you are running any of the unpatched versions; and (3) you have unauthenticated access enabled with the aggregated API feature enabled. No immediate action is needed if you are running any of the following flavors of Kubernetes: (1) EKS: AWS released a statement that they are patching all existing EKS clusters and no customer action is needed; (2) kops: according to justinsb, kops-based clusters disable unauthenticated access by default and thus are not vulnerable to unauthenticated access; (3) GKE: Google has released a statement that they have patched all running GKE clusters; (4) AKS: according to CecileRobertMichon, Kubernetes clusters running on AKS disable unauthenticated access.
  • However, note that all Kubernetes clusters are vulnerable to authenticated user privilege escalation. If this is a concern to you, you should upgrade your Kubernetes cluster to the latest patch version.