@manuelzi
Forked from ejlp12/1_ecs_note.md
Created September 21, 2023 20:25

Revisions

  1. @ejlp12 ejlp12 revised this gist Mar 29, 2021. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion 1_ecs_note.md
    Original file line number Diff line number Diff line change
    @@ -37,7 +37,7 @@
    - Fargate
    - Understand the use case. [EC2 or Fargate?](https://containersonaws.com/introduction/ec2-or-aws-fargate/)
    - Understand its [Task Definition limitations & CPU/Memory configuration](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html). If they do not match your workload requirements, use the EC2 launch type.
    - Use EFS for persistent storage, but consider the performance; better to make the task immutable.
    - Use EFS for persistent storage, but consider the performance; an immutable task is always better.
    - Use it for general-purpose (burstable) workloads; assume your Fargate task will run on a 't' or 'm' type instance
    - Don't use it if you need GPU, high network bandwidth (50 Gbps, 100 Gbps), or very high vCPU or RAM
    - Use [Fargate on Spot instances](https://aws.amazon.com/blogs/compute/deep-dive-into-fargate-spot-to-run-your-ecs-tasks-for-up-to-70-less/) to reduce the cost. Or [use both Fargate & Fargate Spot using Capacity Provider](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html)
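
The note on mixing Fargate and Fargate Spot through a capacity provider strategy can be illustrated with a small model. This is a simplified sketch, not the actual ECS scheduler: it assumes each provider's `base` is satisfied first and the remaining tasks are divided in proportion to `weight`. The class and function names are hypothetical, and the real service may round differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProviderShare:
    name: str      # capacity provider name, e.g. "FARGATE" or "FARGATE_SPOT"
    weight: int    # relative share of the tasks placed beyond the base
    base: int = 0  # minimum number of tasks placed on this provider first


def split_tasks(desired: int, strategy: List[ProviderShare]) -> Dict[str, int]:
    """Approximate how `desired` tasks are divided across capacity providers."""
    counts: Dict[str, int] = {}
    remaining = desired
    # Step 1: satisfy each base, never exceeding the desired count.
    for provider in strategy:
        take = min(provider.base, remaining)
        counts[provider.name] = take
        remaining -= take
    # Step 2: divide what is left in proportion to the weights.
    weighted = [p for p in strategy if p.weight > 0]
    total_weight = sum(p.weight for p in weighted)
    if remaining == 0 or total_weight == 0:
        return counts
    for provider in weighted:
        counts[provider.name] += remaining * provider.weight // total_weight
    # Step 3: hand out rounding leftovers one by one (assumed tie-break order).
    leftover = desired - sum(counts.values())
    for provider in weighted[:leftover]:
        counts[provider.name] += 1
    return counts


strategy = [
    ProviderShare("FARGATE", weight=1, base=2),
    ProviderShare("FARGATE_SPOT", weight=3),
]
print(split_tasks(10, strategy))
```

With a base of 2 on `FARGATE` and a 1:3 weight ratio, a small on-demand floor stays available even when Spot capacity is reclaimed, while most tasks run at the Spot price.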
  2. @ejlp12 ejlp12 revised this gist Mar 29, 2021. 1 changed file with 10 additions and 2 deletions.
    12 changes: 10 additions & 2 deletions 1_ecs_note.md
    Original file line number Diff line number Diff line change
    @@ -33,13 +33,15 @@
    - Opt-in for `awsvpcTrunking` and use `awsvpc` as network mode
    - [Supported EC2 instance types](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-eni.html#eni-trunking-supported-instance-types) and read [some considerations](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-eni.html#eni-trunking-considerations)
    - {Day2} [Set up automated updates of EC2 instances](https://aws.amazon.com/blogs/compute/automatically-update-instances-in-an-amazon-ecs-cluster-using-the-ami-id-parameter/), since [doing it manually](https://aws.amazon.com/blogs/compute/refreshing-an-amazon-ecs-container-instance-cluster-with-a-new-ami/) is hard and error-prone

    - Fargate
    - Understand the use case. [EC2 or Fargate?](https://containersonaws.com/introduction/ec2-or-aws-fargate/)
    - Understand its [Task Definition limitations & CPU/Memory configuration](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html). If they do not match your workload requirements, use the EC2 launch type.
    - Use EFS for persistent storage, but consider the performance; better to make the task immutable.
    - Use it for general-purpose (burstable) workloads; assume your Fargate task will run on a 't' or 'm' type instance
    - Don't use it if you need GPU, high network bandwidth (50 Gbps, 100 Gbps), or very high vCPU or RAM
    - Use [Fargate on Spot instances](https://aws.amazon.com/blogs/compute/deep-dive-into-fargate-spot-to-run-your-ecs-tasks-for-up-to-70-less/) to reduce the cost. Or [use both Fargate & Fargate Spot using Capacity Provider](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html)

    - Networking
    - Use a separate VPC; don't mix in other services, e.g. EC2 instances that do not belong to the cluster.
    - Plan your VPC & Subnet CIDR, avoid complexity of using multiple CIDRs in a VPC
    @@ -49,10 +51,12 @@
    - ECS Cluster & ECR are better in the same Region
    - Use network mode = `awsvpc` for greater security using SG and easier troubleshooting (using VPC flow logs)
    - Use network mode = `host` if you want the task to bypass Docker's built-in virtual network and map container ports directly to the EC2 instance's network interface

    - Task Definition
    - Don't store secrets as plain-text env variables in the task definition; instead [use Parameter Store](https://aws.amazon.com/blogs/compute/managing-secrets-for-amazon-ecs-applications-using-parameter-store-and-iam-roles-for-tasks/), which is more secure.
    - Always set the [`healthCheck`](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_healthcheck) parameter in the Container Definition for tasks that will be part of an ECS service or use ECS Service Discovery.
    - Adjust the other health check parameters (`interval`, `timeout`, `retries`, `startPeriod`) based on your app's characteristics

    - Service
    - [Consider using a placement strategy](https://aws.amazon.com/blogs/containers/amazon-ecs-availability-best-practices/)
    - Use `attribute:ecs.availability-zone` as the spread field to spread the tasks being launched as evenly as possible across AZs
    @@ -64,20 +68,24 @@
    - Tune scaling parameters: health check grace period and scaling cooldowns
    - Prefer [Target Tracking Scaling Policies](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html) over Step Scaling Policies. Common scaling metrics are EC2 CPU utilization or request count per target of the ALB's target group.
    - [Use API Gateway](https://aws.amazon.com/blogs/compute/using-amazon-api-gateway-with-microservices-deployed-on-amazon-ecs/) to expose services

    - Observability
    - Send application logs to standard output and stream them to centralized logging. Take advantage of the `awslogs` log driver & CloudWatch
    - Enable CloudWatch Container Insights to collect more detailed monitoring metrics and logs.
    - Use X-Ray for transaction tracing when troubleshooting performance.

    - Deployment
    - [Blue/Green deployment](https://aws.amazon.com/blogs/compute/bluegreen-deployments-with-amazon-ecs/) using [CodePipeline, CodeBuild, CloudFormation and Lambda](https://github.com/aws-samples/ecs-blue-green-deployment)

    - [Security](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security.html)
    - Allow only the required ports in the SG of EC2 container instances (least privilege)
    - Grant least privilege to the container instance IAM role
    - Consider [using TLS end-to-end communication with NLB](https://aws.amazon.com/blogs/compute/maintaining-transport-layer-security-all-the-way-to-your-container-using-the-network-load-balancer-with-amazon-ecs/) and [evaluate some options to store/manage your certificates](https://aws.amazon.com/blogs/compute/maintaining-transport-layer-security-all-the-way-to-your-container-part-2-using-aws-certificate-manager-private-certificate-authority/)
    - Store sensitive values securely in AWS Systems Manager Parameter Store or AWS Secrets Manager, then [inject data into containers in the Container Definition](https://aws.amazon.com/premiumsupport/knowledge-center/ecs-data-security-container-task/) of a Task Definition. [Try the lab!](https://ecsworkshop.com/secrets/)


    - Cost Optimization
    - Right-size EC2 container instances
    - Tag all container instances
    - Consider using EC2 Spot and Fargate Spot


    [1]
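
The Task Definition, Security and Observability notes in this revision (health check tuning, secrets from Parameter Store, the `awslogs` driver) can be combined in one container definition. The sketch below builds it as a Python dictionary that serializes to task definition JSON. The container name, image URI, parameter ARN, port, health endpoint and log group are hypothetical placeholders. The numeric limits are the ranges the ECS documentation gives for `healthCheck`; verify them against the linked reference before relying on them.

```python
import json

# Hypothetical container: image, ARN, port and log group are placeholders.
container_definition = {
    "name": "web",
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web:1.0.0",
    "essential": True,
    "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
    # Secrets are referenced by ARN instead of being stored as env variables.
    "secrets": [
        {
            "name": "DB_PASSWORD",
            "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/web/db_password",
        }
    ],
    "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,     # seconds between checks
        "timeout": 5,       # seconds before a check counts as failed
        "retries": 3,       # consecutive failures before "unhealthy"
        "startPeriod": 60,  # grace period for slow-starting apps
    },
    "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "/ecs/web",
            "awslogs-region": "us-east-1",
            "awslogs-stream-prefix": "web",
        },
    },
}

# Allowed (min, max) per health check parameter, as documented for ECS.
HEALTH_CHECK_LIMITS = {
    "interval": (5, 300),
    "timeout": (2, 60),
    "retries": (1, 10),
    "startPeriod": (0, 300),
}


def health_check_errors(check: dict) -> list:
    """Return a list of human-readable problems; empty means the values fit."""
    errors = []
    for key, (low, high) in HEALTH_CHECK_LIMITS.items():
        value = check.get(key)
        if value is not None and not low <= value <= high:
            errors.append(f"{key}={value} is outside {low}..{high}")
    return errors


print(json.dumps(container_definition, indent=2))
print(health_check_errors(container_definition["healthCheck"]))
```

Running a check like `health_check_errors` in CI is one way to catch an out-of-range value before the task definition is registered.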
  3. @ejlp12 ejlp12 revised this gist Mar 29, 2021. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions 1_ecs_note.md
    Original file line number Diff line number Diff line change
    @@ -8,6 +8,7 @@
    - https://github.com/nathanpeck/ecs-cloudformation
    - https://github.com/awslabs/aws-cloudformation-templates/tree/master/aws/services/ECS
    - CDK for ECS: [blog](https://github.com/awslabs/aws-cloudformation-templates/tree/master/aws/services/ECS)
    - Terraform [examples](https://registry.terraform.io/modules/terraform-aws-modules/ecs/aws/latest/examples/complete-ecs)
    - Use EC2 Auto Scaling Group & Capacity Provider for better scaling
    - Enable the [AWS VPC Trunking](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-account-settings.html) setting at the account level for higher task density on [some EC2 instance types](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-eni.html#eni-trunking-supported-instance-types)
    - Favor configuring ECS Clusters with EC2 instances in at least 3 AZs, and keep the instance counts balanced across the AZs. More on [availability best practices](https://aws.amazon.com/blogs/containers/amazon-ecs-availability-best-practices/)
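
The account-level trunking opt-in mentioned above could be scripted roughly as follows. This is a sketch that assumes boto3 is installed and credentials are configured. The helper name `enable_eni_trunking` is hypothetical, while `put_account_setting_default` is the boto3 ECS client call for account defaults. The client is passed in so the function can be exercised without AWS access.

```python
def enable_eni_trunking(ecs_client) -> str:
    """Opt the account in to ENI trunking and return the resulting value.

    `ecs_client` is expected to behave like boto3.client("ecs").
    """
    response = ecs_client.put_account_setting_default(
        name="awsvpcTrunking",
        value="enabled",
    )
    return response["setting"]["value"]


# Usage against a real account (not executed here):
#   import boto3
#   print(enable_eni_trunking(boto3.client("ecs")))
```

As far as the linked considerations page describes, only container instances launched after the opt-in receive the higher ENI limit, so existing instances may need to be replaced.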
  4. @ejlp12 ejlp12 revised this gist Mar 29, 2021. 1 changed file with 8 additions and 1 deletion.
    9 changes: 8 additions & 1 deletion 1_ecs_note.md
    Original file line number Diff line number Diff line change
    @@ -4,6 +4,10 @@
    - Cluster
    - Use IaC for setting up resources eg. CloudFormation, AWS CDK, Terraform
    - Use [copilot](https://github.com/aws/copilot-cli) for simple setup
    - CloudFormation reference architecture [template](https://github.com/aws-samples/ecs-refarch-cloudformation)
    - https://github.com/nathanpeck/ecs-cloudformation
    - https://github.com/awslabs/aws-cloudformation-templates/tree/master/aws/services/ECS
    - CDK for ECS: [blog](https://github.com/awslabs/aws-cloudformation-templates/tree/master/aws/services/ECS)
    - Use EC2 Auto Scaling Group & Capacity Provider for better scaling
    - Enable the [AWS VPC Trunking](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-account-settings.html) setting at the account level for higher task density on [some EC2 instance types](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-eni.html#eni-trunking-supported-instance-types)
    - Favor configuring ECS Clusters with EC2 instances in at least 3 AZs, and keep the instance counts balanced across the AZs. More on [availability best practices](https://aws.amazon.com/blogs/containers/amazon-ecs-availability-best-practices/)
    @@ -72,4 +76,7 @@
    - Cost Optimization
    - Right-size EC2 container instances
    - Tag all container instances
    - Consider using EC2 Spot and Fargate Spot
    - Consider using EC2 Spot and Fargate Spot


    [1]
  5. @ejlp12 ejlp12 revised this gist Mar 29, 2021. No changes.
  6. @ejlp12 ejlp12 revised this gist Dec 1, 2020. 1 changed file with 7 additions and 1 deletion.
    8 changes: 7 additions & 1 deletion 1_ecs_note.md
    Original file line number Diff line number Diff line change
    @@ -66,4 +66,10 @@
    - Deployment
    - [Blue/Green deployment](https://aws.amazon.com/blogs/compute/bluegreen-deployments-with-amazon-ecs/) using [CodePipeline, CodeBuild, CloudFormation and Lambda](https://github.com/aws-samples/ecs-blue-green-deployment)
    - [Security](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security.html)
    - Consider [using TLS end-to-end communication with NLB](https://aws.amazon.com/blogs/compute/maintaining-transport-layer-security-all-the-way-to-your-container-using-the-network-load-balancer-with-amazon-ecs/) and [evaluate some options to store/manage your certificates](https://aws.amazon.com/blogs/compute/maintaining-transport-layer-security-all-the-way-to-your-container-part-2-using-aws-certificate-manager-private-certificate-authority/)
    - Allow only the required ports in the SG of EC2 container instances (least privilege)
    - Grant least privilege to the container instance IAM role
    - Consider [using TLS end-to-end communication with NLB](https://aws.amazon.com/blogs/compute/maintaining-transport-layer-security-all-the-way-to-your-container-using-the-network-load-balancer-with-amazon-ecs/) and [evaluate some options to store/manage your certificates](https://aws.amazon.com/blogs/compute/maintaining-transport-layer-security-all-the-way-to-your-container-part-2-using-aws-certificate-manager-private-certificate-authority/)
    - Cost Optimization
    - Right-size EC2 container instances
    - Tag all container instances
    - Consider using EC2 Spot and Fargate Spot
  7. @ejlp12 ejlp12 revised this gist Dec 1, 2020. 2 changed files with 11 additions and 3 deletions.
    11 changes: 9 additions & 2 deletions 1_ecs_note.md
    Original file line number Diff line number Diff line change
    @@ -14,9 +14,10 @@
    - Don't use public IP address (Turn off Auto-assign Public IP)
    - Make EC2 instances immutable.
    - Better not to expose SSH for remote login; use AWS Systems Manager Run Command & Session Manager instead.
    - Use Spot Instances whenever possible, e.g. for development environments
    - [Use Spot Instances](https://aws.amazon.com/blogs/compute/powering-your-amazon-ecs-cluster-with-amazon-ec2-spot-instances/) whenever possible, e.g. for development environments
    - Find instance types that are not frequently interrupted
    - Set the Spot maximum price a little higher than the average
    - Use [Spot Fleet](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet.html) to deploy the target capacity you request (expressed in terms of instances or a vCPU count)
    - Understand how the EC2 container instance works
    - Don't use reserved ports for your application (Linux TCP: 22, 2375, 2376, 51678, 51679, 51680)
    - Don't store log files or any persistent data in the container, since it will fill up Docker storage
    @@ -48,15 +49,21 @@
    - Always set the [`healthCheck`](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_healthcheck) parameter in the Container Definition for tasks that will be part of an ECS service or use ECS Service Discovery.
    - Adjust the other health check parameters (`interval`, `timeout`, `retries`, `startPeriod`) based on your app's characteristics
    - Service
    - [Consider using a placement strategy](https://aws.amazon.com/blogs/containers/amazon-ecs-availability-best-practices/)
    - Use `attribute:ecs.availability-zone` as the spread field to spread the tasks being launched as evenly as possible across AZs
    - Service Discovery
    - Use Amazon ECS Service Discovery/CloudMap (internal domain resolution) for inter-service communication inside a cluster.
    - Be aware of the difference between SRV and A records for DNS-based service lookup. An 'A' record is simple; with SRV records you might need to change your app code, since the app must resolve both the IP address and the port.
    - Highly recommended to use ALB instead of the Classic ELB: dynamic port mapping, more detailed monitoring & access logs
    - Use placement strategies and constraints to maximize resource utilization. [CDK example](https://github.com/aws-samples/aws-cdk-examples/tree/master/python/ecs/ecs-service-with-task-placement/), [Terraform example](https://www.terraform.io/docs/providers/aws/r/ecs_service.html)
    - Tune scaling parameters: health check grace period and scaling cooldowns
    - Prefer [Target Tracking Scaling Policies](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html) over Step Scaling Policies. Common scaling metrics are EC2 CPU utilization or request count per target of the ALB's target group.
    - [Use API Gateway](https://aws.amazon.com/blogs/compute/using-amazon-api-gateway-with-microservices-deployed-on-amazon-ecs/) to expose services
    - Observability
    - Send application logs to standard output and stream them to centralized logging. Take advantage of the `awslogs` log driver & CloudWatch
    - Enable CloudWatch Container Insights to collect more detailed monitoring metrics and logs.
    - Use X-Ray for transaction tracing when troubleshooting performance.

    - Deployment
    - [Blue/Green deployment](https://aws.amazon.com/blogs/compute/bluegreen-deployments-with-amazon-ecs/) using [CodePipeline, CodeBuild, CloudFormation and Lambda](https://github.com/aws-samples/ecs-blue-green-deployment)
    - [Security](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security.html)
    - Consider [using TLS end-to-end communication with NLB](https://aws.amazon.com/blogs/compute/maintaining-transport-layer-security-all-the-way-to-your-container-using-the-network-load-balancer-with-amazon-ecs/) and [evaluate some options to store/manage your certificates](https://aws.amazon.com/blogs/compute/maintaining-transport-layer-security-all-the-way-to-your-container-part-2-using-aws-certificate-manager-private-certificate-authority/)
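
The placement advice in this revision (spread across AZs, then use resources efficiently) maps to a `placementStrategy` list on the service. Below is a sketch of that list together with a toy model of the spread step. The function `spread_by_az` is a hypothetical illustration and not the ECS scheduler: it only shows why spreading keeps per-AZ counts within one task of each other.

```python
from typing import Dict, List

# Strategy as it would appear in a service definition: spread first, then binpack.
placement_strategy = [
    {"type": "spread", "field": "attribute:ecs.availability-zone"},
    {"type": "binpack", "field": "memory"},
]


def spread_by_az(task_count: int, zones: List[str]) -> Dict[str, int]:
    """Toy model: each new task goes to the zone that currently has the fewest."""
    placed = {zone: 0 for zone in zones}
    for _ in range(task_count):
        target = min(zones, key=lambda zone: placed[zone])  # ties: first zone wins
        placed[target] += 1
    return placed


print(spread_by_az(7, ["us-east-1a", "us-east-1b", "us-east-1c"]))
```

Ordering matters in the list: the first entry is applied first, so availability is prioritized over packing density in this sketch.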
    3 changes: 2 additions & 1 deletion 2_ecs_fargate.md
    Original file line number Diff line number Diff line change
    @@ -18,4 +18,5 @@
    - When a security or infrastructure update is needed
    - No notification before recycling process
    - Only affects tasks that are part of a service (not standalone tasks)
    -
    - Fargate makes no network throughput guarantees, nor does it guarantee equal CPU performance among tasks.
    - [Expose Fargate using API gateway, VPC Link & NLB](https://medium.com/swlh/deploy-container-in-ecs-fargate-behind-api-gateway-nlb-for-secure-optimal-accessibility-with-95542d5867c3)
  8. @ejlp12 ejlp12 revised this gist Dec 1, 2020. 1 changed file with 6 additions and 2 deletions.
    8 changes: 6 additions & 2 deletions 1_ecs_note.md
    Original file line number Diff line number Diff line change
    @@ -1,5 +1,6 @@
    ## ECS Best Practices

    - Understand and check [Service Quota of ECS/Fargate](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-quotas.html) and other related services
    - Cluster
    - Use IaC for setting up resources eg. CloudFormation, AWS CDK, Terraform
    - Use [copilot](https://github.com/aws/copilot-cli) for simple setup
    @@ -22,7 +23,9 @@
    - Look into the `/data` directory for troubleshooting (contains information about the cluster and the agent state)
    - Set the Container Agent config if you harden the OS using SELinux or AppArmor
    - For better performance, tune ECS_IMAGE_PULL_BEHAVIOR & Image/Task Clean up parameters based on how often you deploy [-](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/automated_image_cleanup.html)
    - Use ENI trunking to increase ENI density and place more tasks on an EC2 instance
    - [Optimize ECS task density](https://aws.amazon.com/blogs/compute/optimizing-amazon-ecs-task-density-using-awsvpc-network-mode/) using ENI trunking
    - Opt-in for `awsvpcTrunking` and use `awsvpc` as network mode
    - [Supported EC2 instance types](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-eni.html#eni-trunking-supported-instance-types) and read [some considerations](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-eni.html#eni-trunking-considerations)
    - {Day2} [Set up automated updates of EC2 instances](https://aws.amazon.com/blogs/compute/automatically-update-instances-in-an-amazon-ecs-cluster-using-the-ami-id-parameter/), since [doing it manually](https://aws.amazon.com/blogs/compute/refreshing-an-amazon-ecs-container-instance-cluster-with-a-new-ami/) is hard and error-prone
    - Fargate
    - Understand the use case. [EC2 or Fargate?](https://containersonaws.com/introduction/ec2-or-aws-fargate/)
    @@ -33,7 +36,8 @@
    - User [Fargate on Spot instances](https://aws.amazon.com/blogs/compute/deep-dive-into-fargate-spot-to-run-your-ecs-tasks-for-up-to-70-less/) to reduce the cost. Or [use both Fargate & Fargate Spot using Capacity Provider](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html)
    - Networking
    - Use a separate VPC; don't mix in other services, e.g. EC2 instances that do not belong to the cluster.
    - Plan CIDR Block carefully, use highest
    - Plan your VPC & Subnet CIDR, avoid complexity of using multiple CIDRs in a VPC
    - Use [IP address](https://network00.com/NetworkTools/IPv4AddressPlanner/) [tools](http://www.davidc.net/sites/default/subnets/subnets.html)
    - VPC & Subnetting architecture patterns: https://containersonaws.com/architecture/
    - Keep the Container Registry as close as possible to your cluster (for low latency and faster docker pull).
    - ECS Cluster & ECR are better in the same Region
  9. @ejlp12 ejlp12 revised this gist Oct 1, 2020. 1 changed file with 1 addition and 2 deletions.
    3 changes: 1 addition & 2 deletions 1_ecs_note.md
    Original file line number Diff line number Diff line change
    @@ -50,10 +50,9 @@
    - Highly recommended to use ALB instead of the Classic ELB: dynamic port mapping, more detailed monitoring & access logs
    - Use placement strategies and constraints to maximize resource utilization. [CDK example](https://github.com/aws-samples/aws-cdk-examples/tree/master/python/ecs/ecs-service-with-task-placement/), [Terraform example](https://www.terraform.io/docs/providers/aws/r/ecs_service.html)
    - Tune scaling parameters: health check grace period and scaling cooldowns
    - Recommend using Target Tracking Scaling Policies instead of Step Scaling Policies
    - Prefer [Target Tracking Scaling Policies](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html) over Step Scaling Policies. Common scaling metrics are EC2 CPU utilization or request count per target of the ALB's target group.
    - Observability
    - Send application logs to standard output and stream them to centralized logging. Take advantage of the `awslogs` log driver & CloudWatch
    - Enable CloudWatch Container Insights to collect more detailed monitoring metrics and logs.
    - Use X-Ray for transaction tracing when troubleshooting performance.
    -
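
The target tracking recommendation in this revision can be sketched as the parameters one might pass to Application Auto Scaling for an ECS service. The cluster and service names, the 60% target and the cooldown values are hypothetical. The example also assumes the service-level metric `ECSServiceAverageCPUUtilization` rather than the EC2-level metric the note mentions. The helper is only a rough proportional approximation of how target tracking sizes a service, not the exact algorithm.

```python
import math

# Parameters in the shape accepted by put_scaling_policy of the
# "application-autoscaling" API; names and numbers are placeholders.
scaling_policy = {
    "PolicyName": "cpu-target-tracking",
    "ServiceNamespace": "ecs",
    "ResourceId": "service/my-cluster/my-service",
    "ScalableDimension": "ecs:service:DesiredCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,   # seconds to wait after scaling out
        "ScaleInCooldown": 300,   # seconds to wait after scaling in
    },
}


def estimate_desired_count(current_tasks: int, metric_value: float,
                           target_value: float) -> int:
    """Rough estimate: scale the task count by metric/target, never below 1."""
    return max(1, math.ceil(current_tasks * metric_value / target_value))


target = scaling_policy["TargetTrackingScalingPolicyConfiguration"]["TargetValue"]
print(estimate_desired_count(4, 90.0, target))
```

A longer scale-in cooldown than scale-out cooldown is a common conservative choice: capacity is added quickly and removed slowly.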

  10. @ejlp12 ejlp12 revised this gist Sep 30, 2020. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions 1_ecs_note.md
    Original file line number Diff line number Diff line change
    @@ -1,3 +1,5 @@
    ## ECS Best Practices

    - Cluster
    - Use IaC for setting up resources eg. CloudFormation, AWS CDK, Terraform
    - Use [copilot](https://github.com/aws/copilot-cli) for simple setup
  11. @ejlp12 ejlp12 revised this gist Sep 30, 2020. 1 changed file with 3 additions and 3 deletions.
    6 changes: 3 additions & 3 deletions 1_ecs_note.md
    Original file line number Diff line number Diff line change
    @@ -2,7 +2,7 @@
    - Use IaC for setting up resources eg. CloudFormation, AWS CDK, Terraform
    - Use [copilot](https://github.com/aws/copilot-cli) for simple setup
    - Use EC2 Auto Scaling Group & Capacity Provider for better scaling
    - Enable [AWSVPCTrucking](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-account-settings.html) setting in account level for [higher task density in EC2](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-account-settings.html), then use ENI trunking
    - Enable the [AWS VPC Trunking](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-account-settings.html) setting at the account level for higher task density on [some EC2 instance types](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-eni.html#eni-trunking-supported-instance-types)
    - Favor configuring ECS Clusters with EC2 instances in at least 3 AZs, and keep the instance counts balanced across the AZs. More on [availability best practices](https://aws.amazon.com/blogs/containers/amazon-ecs-availability-best-practices/)
    - Use Amazon ECS-optimized AMIs.
    - Using a different OS is harder to maintain: OS upgrades, patching, Docker updates, ECS Agent updates, etc.
    @@ -23,8 +23,8 @@
    - Use ENI trunking to increase ENI density and place more tasks on an EC2 instance
    - {Day2} [Set up automated updates of EC2 instances](https://aws.amazon.com/blogs/compute/automatically-update-instances-in-an-amazon-ecs-cluster-using-the-ami-id-parameter/), since [doing it manually](https://aws.amazon.com/blogs/compute/refreshing-an-amazon-ecs-container-instance-cluster-with-a-new-ami/) is hard and error-prone
    - Fargate
    - Understand its [Task Definition limition & CPU/Memory configuration](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html). If it is not match with your workload requirement then use EC2 launch type.
    - Understand the use case. [EC2 or Fargate?](https://containersonaws.com/introduction/ec2-or-aws-fargate/)
    - Understand its [Task Definition limitations & CPU/Memory configuration](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html). If they do not match your workload requirements, use the EC2 launch type.
    - Use EFS for persistent storage, but consider the performance; better to make the task immutable.
    - Use it for general-purpose (burstable) workloads; assume your Fargate task will run on a 't' or 'm' type instance
    - Don't use it if you need GPU, high network bandwidth (50 Gbps, 100 Gbps), or very high vCPU or RAM
    @@ -43,7 +43,7 @@
    - Adjust other health check parameters: `interval, timeout, retries, startPeriod` based on your app characteristics
    - Service
    - Service Discovery
    - User Amazon ECS Service Discovery/CloudMap (internal domain resolution) for inter-service communication inside a cluster.
    - Use Amazon ECS Service Discovery/CloudMap (internal domain resolution) for inter-service communication inside a cluster.
    - Be aware of the difference between SRV and A records for DNS-based service lookup. An 'A' record is simple; with SRV records you might need to change your app code, since the app must resolve both the IP address and the port.
    - Highly recommended to use ALB instead of the Classic ELB: dynamic port mapping, more detailed monitoring & access logs
    - Use placement strategies and constraints to maximize resource utilization. [CDK example](https://github.com/aws-samples/aws-cdk-examples/tree/master/python/ecs/ecs-service-with-task-placement/), [Terraform example](https://www.terraform.io/docs/providers/aws/r/ecs_service.html)
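
The SRV versus A record point above is easier to see with a concrete record. The sketch below parses the data portion of an SRV answer in the standard `priority weight port target` layout. The record value and host name are made up for illustration. With an A record the app would only get an IP address and must already know the port; with SRV it has to read the port from the record and then resolve the target name.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int  # lower value is preferred
    weight: int    # relative share among records of equal priority
    port: int      # port the task is listening on (may be dynamic)
    target: str    # host name that still has to be resolved to an IP


def parse_srv(rdata: str) -> SrvRecord:
    """Parse the 'priority weight port target' text form of an SRV record."""
    priority, weight, port, target = rdata.split()
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


# Hypothetical answer for a service registered with dynamic host ports.
record = parse_srv("1 1 32768 task-abc123.backend.local.")
print(f"connect to {record.target}:{record.port}")
```

This extra lookup step is the code change the note warns about: clients that assume a fixed port cannot use SRV-based discovery without modification.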
  12. @ejlp12 ejlp12 revised this gist Sep 30, 2020. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion 2_ecs_fargate.md
    Original file line number Diff line number Diff line change
    @@ -1,5 +1,6 @@
    ### What you need to know (be aware of) when using ECS on Fargate.

    - **Limitation** Do not support all of the task definition parameters. [ref](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html)
    - **Limitation** Fargate does not support all of the task definition parameters. [ref](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html)
    - Cannot use privileged mode
    - Must use `awsvpc` mode -> Task will have an ENI and a primary private IP address
    - Cannot use _gpu_
  13. @ejlp12 ejlp12 revised this gist Sep 22, 2020. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion 1_ecs_note.md
    Original file line number Diff line number Diff line change
    @@ -39,7 +39,7 @@
    - Use network mode = `host` if you want the task to bypass Docker's built-in virtual network and map container ports directly to the EC2 instance's network interface
    - Task Definition
    - Don't store secrets as plain-text env variables in the task definition; instead [use Parameter Store](https://aws.amazon.com/blogs/compute/managing-secrets-for-amazon-ecs-applications-using-parameter-store-and-iam-roles-for-tasks/), which is more secure.
    - Always set the `healthCheck` parameter for tasks that will be part of an ECS service or use ECS Service Discovery.
    - Always set the [`healthCheck`](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_healthcheck) parameter in the Container Definition for tasks that will be part of an ECS service or use ECS Service Discovery.
    - Adjust the other health check parameters (`interval`, `timeout`, `retries`, `startPeriod`) based on your app's characteristics
    - Service
    - Service Discovery
  14. @ejlp12 ejlp12 renamed this gist Sep 22, 2020. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  15. @ejlp12 ejlp12 renamed this gist Sep 22, 2020. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  16. @ejlp12 ejlp12 revised this gist Sep 22, 2020. 1 changed file with 20 additions and 0 deletions.
    20 changes: 20 additions & 0 deletions ecs_fargate.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,20 @@

    - **Limitation** Do not support all of the task definition parameters. [ref](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html)
    - Cannot use privileged mode
    - Must use `awsvpc` mode -> Task will have an ENI and a primary private IP address
    - Cannot use _gpu_
    - No placement constraint
    - Task CPU and memory (min: 0.25 vCPU, 0.5GB RAM, max: 4 vCPU, 30 GB RAM)
    - Logging: awslogs, splunk, firelens, and fluentd
    - Optionally needs the Amazon ECS task execution IAM role to call other AWS services, e.g. ECR
    - Fargate platform version releases provide kernel or operating system updates, new features, bug fixes, or security updates
    - Task automated scheduled-retirement: you will be notified by email
    - The task is stopped or terminated by AWS. If it is part of a service, it will be replaced automatically.
    - Reason:
    - Irreparable failure of the underlying hardware
    - Task has a security vulnerability
    - Fargate task recycling
    - When a security or infrastructure update is needed
    - No notification before recycling process
    - Only affects tasks that are part of a service (not standalone tasks)
    -
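
The CPU and memory limits listed in these notes (0.25 to 4 vCPU, 0.5 to 30 GB) come with a fixed set of allowed pairings. The table below is a sketch of those pairings as documented for Fargate at the time of the notes, expressed in CPU units and MiB. Larger sizes may be available today, so treat it as an assumption to check against the linked reference. The function name is hypothetical.

```python
# Allowed memory values (MiB) per CPU setting (CPU units; 1024 = 1 vCPU),
# limited to the range quoted in the notes: 0.25-4 vCPU, 0.5-30 GB.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu_units: int, memory_mib: int) -> bool:
    """Check a task-level cpu/memory pair against the table above."""
    return memory_mib in FARGATE_SIZES.get(cpu_units, [])


print(is_valid_fargate_size(256, 512))    # smallest pairing in the table
print(is_valid_fargate_size(256, 4096))   # too much memory for 0.25 vCPU
```

If a workload needs a pairing outside the table (for example a lot of memory with little CPU), that is one of the signals the notes give for choosing the EC2 launch type instead.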
  17. @ejlp12 ejlp12 revised this gist Sep 22, 2020. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion ECS_Notes.md
    Original file line number Diff line number Diff line change
    @@ -1,6 +1,6 @@
    - Cluster
    - Use IaC for setting up resources eg. CloudFormation, AWS CDK, Terraform
    - Use (copilot](https://github.com/aws/copilot-cli for simple setup
    - Use [copilot](https://github.com/aws/copilot-cli) for simple setup
    - Use EC2 Auto Scaling Group & Capacity Provider for better scaling
    - Enable [AWSVPCTrucking](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-account-settings.html) setting in account level for [higher task density in EC2](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-account-settings.html), then use ENI trunking
    - Favor configuring ECS Clusters with EC2 instances in at least 3 AZs, and keep the instance counts balanced across the AZs. More on [availability best practices](https://aws.amazon.com/blogs/containers/amazon-ecs-availability-best-practices/)
  18. @ejlp12 ejlp12 revised this gist Sep 22, 2020. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions ECS_Notes.md
    Original file line number Diff line number Diff line change
    @@ -1,5 +1,6 @@
    - Cluster
    - Use IaC for setting up resources eg. CloudFormation, AWS CDK, Terraform
    - Use (copilot](https://github.com/aws/copilot-cli for simple setup
    - Use EC2 Auto Scaling Group & Capacity Provider for better scaling
    - Enable [AWSVPCTrucking](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-account-settings.html) setting in account level for [higher task density in EC2](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-account-settings.html), then use ENI trunking
    - Favor configuring ECS Clusters with EC2 instances in at least 3 AZs, and keep the instance counts balanced across the AZs. More on [availability best practices](https://aws.amazon.com/blogs/containers/amazon-ecs-availability-best-practices/)
  19. @ejlp12 ejlp12 revised this gist Jun 9, 2020. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions ECS_Notes.md
    Original file line number Diff line number Diff line change
    @@ -39,6 +39,7 @@
    - Task Definition
    - Don't store secrets as plain-text env variables in the task definition; instead [use Parameter Store](https://aws.amazon.com/blogs/compute/managing-secrets-for-amazon-ecs-applications-using-parameter-store-and-iam-roles-for-tasks/), which is more secure.
    - Always set the `healthCheck` parameter for tasks that will be part of an ECS service or use ECS Service Discovery.
    - Adjust the other health check parameters (`interval`, `timeout`, `retries`, `startPeriod`) based on your app's characteristics
    - Service
    - Service Discovery
    - Use Amazon ECS Service Discovery/Cloud Map (internal domain resolution) for inter-service communication inside a cluster.
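The health check parameters mentioned in this revision can be sketched as a small helper that builds the `healthCheck` block of a container definition. This is a minimal sketch, not part of the original notes: the function name `build_health_check`, the `curl` command, and port 8080 are hypothetical, and the allowed ranges are the ones I understand the ECS documentation to state, so verify them before relying on the validation.

```python
import json

# Assumed ranges for ECS container health check settings (seconds / attempts).
HEALTH_CHECK_LIMITS = {
    "interval": (5, 300),
    "timeout": (2, 60),
    "retries": (1, 10),
    "startPeriod": (0, 300),
}


def build_health_check(command, interval=30, timeout=5, retries=3, start_period=0):
    """Return a `healthCheck` dict for a container definition, validating ranges."""
    values = {
        "interval": interval,
        "timeout": timeout,
        "retries": retries,
        "startPeriod": start_period,
    }
    for name, value in values.items():
        low, high = HEALTH_CHECK_LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the allowed range {low}-{high}")
    return {"command": ["CMD-SHELL", command], **values}


# Hypothetical slow-starting app: allow 60 s of start-up before failures count.
health_check = build_health_check(
    "curl -f http://localhost:8080/health || exit 1",
    interval=15,
    timeout=5,
    retries=3,
    start_period=60,
)
print(json.dumps(health_check, indent=2))
```

A slow-starting app usually wants a larger `startPeriod` rather than more `retries`, so that genuine failures after start-up are still detected quickly.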
  20. @ejlp12 ejlp12 revised this gist Jun 9, 2020. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions ECS_Notes.md
    Original file line number Diff line number Diff line change
    @@ -38,6 +38,7 @@
    - Use network mode = `host` if you want the task to bypass Docker's built-in virtual network and map container ports directly to the EC2 instance's network interface
    - Task Definition
    - Don't store sensitive env variables in the task definition; [use Parameter Store](https://aws.amazon.com/blogs/compute/managing-secrets-for-amazon-ecs-applications-using-parameter-store-and-iam-roles-for-tasks/) instead, which is more secure.
    - Always set the `healthCheck` parameter for tasks that will be part of an ECS service or use ECS Service Discovery.
    - Service
    - Service Discovery
    - Use Amazon ECS Service Discovery/Cloud Map (internal domain resolution) for inter-service communication inside a cluster.
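One way to follow the Parameter Store advice in the Task Definition notes is to reference parameters through the `secrets` field of a container definition instead of putting values in `environment`. The sketch below only builds that fragment as a Python dict. The helper name, account ID, region, image, and parameter names are all hypothetical. As far as I know, the task execution role also needs permission to read the parameters (for example `ssm:GetParameters`).

```python
import json


def container_with_secrets(name, image, region, account_id, secret_params):
    """Build a container definition whose secrets come from SSM Parameter Store.

    `secret_params` maps env variable names to Parameter Store parameter names.
    Only the parameter ARN ends up in the task definition, never the value.
    """
    return {
        "name": name,
        "image": image,
        "essential": True,
        "secrets": [
            {
                "name": env_name,
                # A leading "/" in the parameter name is not repeated in the ARN.
                "valueFrom": f"arn:aws:ssm:{region}:{account_id}:parameter/{param.lstrip('/')}",
            }
            for env_name, param in sorted(secret_params.items())
        ],
    }


# Hypothetical values, for illustration only.
definition = container_with_secrets(
    name="web",
    image="nginx:latest",
    region="us-east-1",
    account_id="123456789012",
    secret_params={"DB_PASSWORD": "/prod/db/password"},
)
print(json.dumps(definition, indent=2))
```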
  21. @ejlp12 ejlp12 revised this gist Jun 9, 2020. 1 changed file with 3 additions and 4 deletions.
    7 changes: 3 additions & 4 deletions ECS_Notes.md
    Original file line number Diff line number Diff line change
    @@ -20,15 +20,14 @@
    - Set the container agent config if you harden the OS using SELinux or AppArmor
    - For better performance, tune `ECS_IMAGE_PULL_BEHAVIOR` and the [image/task cleanup parameters](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/automated_image_cleanup.html) based on how often you deploy
    - Use ENI trunking to increase ENI density and place more tasks on each EC2 instance
    - {Day2} [Set up automated updates of EC2 instances](https://aws.amazon.com/blogs/compute/automatically-update-instances-in-an-amazon-ecs-cluster-using-the-ami-id-parameter/), [doing it manually](https://aws.amazon.com/blogs/compute/refreshing-an-amazon-ecs-container-instance-cluster-with-a-new-ami/) is hard
    - {Day2} [Set up automated updates of EC2 instances](https://aws.amazon.com/blogs/compute/automatically-update-instances-in-an-amazon-ecs-cluster-using-the-ami-id-parameter/), since [doing it manually](https://aws.amazon.com/blogs/compute/refreshing-an-amazon-ecs-container-instance-cluster-with-a-new-ami/) is hard and error-prone
    - Fargate
    - Understand its [Task Definition limitations & CPU/Memory configuration](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html). If it does not match your workload requirements, use the EC2 launch type.
    - Which use case matches [EC2 or Fargate?](https://containersonaws.com/introduction/ec2-or-aws-fargate/)
    - Understand the use case. [EC2 or Fargate?](https://containersonaws.com/introduction/ec2-or-aws-fargate/)
    - Use EFS for persistent storage, but consider the performance; it is better to make the task immutable.
    - Use it for general-purpose (burstable) workloads
    - Use it for general-purpose (burstable) workloads; assume your Fargate task will run on a 't' or 'm' type of instance
    - Don't use it if you need GPU, high network bandwidth (50 Gbps, 100 Gbps), or very high vCPU or RAM
    - Use [Fargate on Spot instances](https://aws.amazon.com/blogs/compute/deep-dive-into-fargate-spot-to-run-your-ecs-tasks-for-up-to-70-less/) to reduce cost, or [use both Fargate & Fargate Spot with a Capacity Provider](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html)
    - By default, Fargate tasks are spread across Availability Zones
    - Networking
    - Use a separate VPC; don't mix it with other services, e.g. EC2 instances that do not belong to the cluster.
    - Plan CIDR Block carefully, use highest
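To make the CIDR planning point in the Networking notes concrete, the sketch below estimates how many IP addresses each subnet can offer. It assumes tasks run with their own network interface (as with Fargate or the `awsvpc` network mode), so every task consumes one address, and it subtracts the 5 addresses AWS reserves in each subnet. The VPC range and prefix length are hypothetical examples, not recommendations from the notes.

```python
import ipaddress

# AWS reserves 5 addresses in every subnet (the first four and the last one).
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr):
    """Number of addresses left for network interfaces in a subnet."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def plan_subnets(vpc_cidr, new_prefix):
    """Split a VPC CIDR into equal subnets and report usable addresses for each."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [(str(s), usable_ips(str(s))) for s in vpc.subnets(new_prefix=new_prefix)]


# Hypothetical VPC split into four /18 subnets.
for cidr, capacity in plan_subnets("10.0.0.0/16", 18):
    print(cidr, capacity)
```

The usable count is an upper bound on task capacity per subnet, since load balancers, instances, and other resources draw from the same pool.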
  22. @ejlp12 ejlp12 created this gist Jun 9, 2020.
    55 changes: 55 additions & 0 deletions ECS_Notes.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,55 @@
    - Cluster
    - Use IaC for setting up resources, e.g. CloudFormation, AWS CDK, Terraform
    - Use EC2 Auto Scaling Group & Capacity Provider for better scaling
    - Enable the [`awsvpcTrunking`](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-account-settings.html) setting at the account level for [higher task density on EC2](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-account-settings.html), then use ENI trunking
    - Favor configuring ECS clusters with EC2 instances in at least 3 AZs, and keep the instance counts balanced across the AZs. More on [availability best practices](https://aws.amazon.com/blogs/containers/amazon-ecs-availability-best-practices/)
    - Use Amazon ECS-optimized AMIs.
    - Using a different OS is hard to maintain: OS upgrades, patching, Docker updates, ECS agent updates, etc.
    - Subscribe to update notifications.
    - Launching EC2 Container Instance
    - Don't use public IP addresses (turn off Auto-assign Public IP)
    - Make EC2 instances immutable.
    - Better not to expose SSH for remote login; use AWS Systems Manager Run Command & Session Manager instead.
    - Use Spot Instances whenever possible, e.g. for development environments
    - Find instance types that are not frequently interrupted
    - Set the Spot price a little higher than average
    - Understand how the EC2 container instance works
    - Don't use reserved ports for your application (Linux TCP: 22, 2375, 2376, 51678, 51679, 51680)
    - Don't store log files or any persistent data in the container - it will fill up Docker storage
    - Look into the `/data` directory for troubleshooting (contains information about the cluster and the agent state)
    - Set the container agent config if you harden the OS using SELinux or AppArmor
    - For better performance, tune `ECS_IMAGE_PULL_BEHAVIOR` and the [image/task cleanup parameters](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/automated_image_cleanup.html) based on how often you deploy
    - Use ENI trunking to increase ENI density and place more tasks on each EC2 instance
    - {Day2} [Set up automated updates of EC2 instances](https://aws.amazon.com/blogs/compute/automatically-update-instances-in-an-amazon-ecs-cluster-using-the-ami-id-parameter/), [doing it manually](https://aws.amazon.com/blogs/compute/refreshing-an-amazon-ecs-container-instance-cluster-with-a-new-ami/) is hard
    - Fargate
    - Understand its [Task Definition limitations & CPU/Memory configuration](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html). If it does not match your workload requirements, use the EC2 launch type.
    - Which use case matches [EC2 or Fargate?](https://containersonaws.com/introduction/ec2-or-aws-fargate/)
    - Use EFS for persistent storage, but consider the performance; it is better to make the task immutable.
    - Use it for general-purpose (burstable) workloads
    - Don't use it if you need GPU, high network bandwidth (50 Gbps, 100 Gbps), or very high vCPU or RAM
    - Use [Fargate on Spot instances](https://aws.amazon.com/blogs/compute/deep-dive-into-fargate-spot-to-run-your-ecs-tasks-for-up-to-70-less/) to reduce cost, or [use both Fargate & Fargate Spot with a Capacity Provider](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html)
    - By default, Fargate tasks are spread across Availability Zones
    - Networking
    - Use a separate VPC; don't mix it with other services, e.g. EC2 instances that do not belong to the cluster.
    - Plan CIDR Block carefully, use highest
    - VPC & subnetting architecture patterns: https://containersonaws.com/architecture/
    - Keep the container registry as close as possible to your cluster (for low latency and faster `docker pull`).
    - Keep the ECS cluster and ECR in the same Region
    - Use network mode = `awsvpc` for greater security using security groups and easier troubleshooting (using VPC Flow Logs)
    - Use network mode = `host` if you want the task to bypass Docker's built-in virtual network and map container ports directly to the EC2 instance's network interface
    - Task Definition
    - Don't store sensitive env variables in the task definition; [use Parameter Store](https://aws.amazon.com/blogs/compute/managing-secrets-for-amazon-ecs-applications-using-parameter-store-and-iam-roles-for-tasks/) instead, which is more secure.
    - Service
    - Service Discovery
    - Use Amazon ECS Service Discovery/Cloud Map (internal domain resolution) for inter-service communication inside a cluster.
    - Be aware of the difference between SRV and A records for DNS-based service lookup. An 'A' record is simple; with SRV records you might need to change your app code, since the app must resolve both the IP address and the port.
    - It is highly recommended to use ALB instead of ELB: dynamic port mapping, more detailed monitoring & access logs
    - Use placement strategies and constraints to maximize resource utilization. [CDK example](https://github.com/aws-samples/aws-cdk-examples/tree/master/python/ecs/ecs-service-with-task-placement/), [Terraform example](https://www.terraform.io/docs/providers/aws/r/ecs_service.html)
    - Tune scaling parameters: health check grace period and scaling cooldowns
    - Prefer Target Tracking Scaling Policies over Step Scaling Policies
    - Observability
    - Send application logs to standard output and stream them to centralized logging. Take advantage of the `awslogs` driver & CloudWatch
    - Enable CloudWatch Container Insights to collect more detailed monitoring metrics and logs.
    - Use X-Ray transaction tracing to troubleshoot performance.
    -
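The Service notes above recommend target tracking policies and tuned cooldowns. A minimal sketch of what such a policy configuration could look like follows, written as the dict shape that Application Auto Scaling accepts for a target tracking policy. The helper name, the 60% CPU target, and the cooldown values are hypothetical starting points, not values from the notes.

```python
import json


def target_tracking_policy(target_cpu_percent, scale_in_cooldown=300, scale_out_cooldown=60):
    """Build a target tracking configuration on average ECS service CPU.

    Cooldowns are in seconds; a longer scale-in cooldown avoids removing
    tasks too eagerly after a short dip in load.
    """
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in the range (0, 100]")
    return {
        "TargetValue": float(target_cpu_percent),
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleInCooldown": scale_in_cooldown,
        "ScaleOutCooldown": scale_out_cooldown,
    }


# Hypothetical policy: keep average service CPU near 60%.
policy = target_tracking_policy(60)
print(json.dumps(policy, indent=2))
```

With target tracking, only the target value and cooldowns need tuning, whereas step scaling requires maintaining explicit thresholds and step adjustments.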