AWS DevOps Engineer Professional Study Notes
CI & CD:
========
2 core software development processes.
CI: the process of automating regular code commits, followed by an automated build and test process designed to highlight integration issues early. Additional tooling and functionality provided by Bamboo, CruiseControl, Jenkins, Go, TeamCity etc.
CD: a workflow-based process which accepts a tested software build payload from a CI server and automates the deployment into a working QA, Pre-prod or Prod environment.
AWS CodeDeploy and CodePipeline provide CI/CD services.
Elastic Beanstalk and CFN provide functionality which can be utilized by CI/CD servers.
Deployment Types:
=================
1. Single Target Deployment - small dev projects, legacy or non-HA infrastructure; an outage occurs in case of failure; testing opportunity is limited.
2. All-at-Once Deployment - deployment happens on multiple targets at once; requires orchestration tooling; suitable for non-critical apps in the 5-10 target range.
3. Minimum in-service Deployment - keeps a minimum number of targets in service and deploys in multiple stages; suitable for large environments; allows automated testing; no downtime.
4. Rolling Deployments - x targets per stage, deployed in multiple stages; after completion of stage 1, the next stage begins; orchestration and health checks required; can be the least time-efficient if x is small; allows automated testing; no downtime if x is not large enough to impact the application; can be paused, allowing multi-version testing.
5. Blue Green Deployment - deploy to a separate Green environment and update the code on Green; extra cost due to the duplicate environment during deployment; deployment is rapid; cutover and migration are clean (DNS change); rollback is easy (DNS regression); can be fully automated using CFN etc. Binary, no traffic split, not used for feature testing.
A/B Testing - distributes traffic between blue/green; allows gradual performance/stability/health analysis; allows new feature testing; rollback is quick; the end goal of A/B testing is not migration. Uses Route 53 for DNS resolution: 2 records, one pointing at A, the other at B, weighted/round robin.
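The weighted Route 53 behaviour described above can be simulated with a small sketch (the record names and weights are made-up examples, not real DNS records):

```python
import random

def pick_target(records, rng=random.random):
    """Pick an endpoint using Route 53-style weighted routing:
    each record is chosen with probability weight / total_weight."""
    total = sum(w for _, w in records)
    point = rng() * total
    for endpoint, weight in records:
        point -= weight
        if point < 0:
            return endpoint
    return records[-1][0]

# 90/10 split during an A/B test of a blue/green pair
records = [("blue.example.com", 90), ("green.example.com", 10)]
counts = {"blue.example.com": 0, "green.example.com": 0}
random.seed(1)
for _ in range(10_000):
    counts[pick_target(records)] += 1
# roughly 9000 blue / 1000 green
```

Shifting the weights gradually toward green is what turns an A/B test into a migration.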
Bootstrapping: a process that starts with a base image (ISO/AMI) and, via automation, builds on it to create a more complex object.
Cloud-init, CFN-init (a complex bootstrapping engine).
AMI (baking) based approach: a large number of AMIs, even for a small system change. Ex: quick start.
Bootstrapping: every option like OS, patches, dependencies and applications can be tweaked. Ex: quick launch.
Immutable Architecture - replace infra instead of upgrading or repairing faulty components; treat servers as unchangeable objects; don't diagnose and fix, throw away and re-create; nothing bootstrapped except the AMI. Example: Pets vs Cattle.
Container/Docker: contains only the application and its dependencies; achieves higher density and improved portability by removing the per-container guest OS.
Escape from dependency hell; consistent progression from DEV->TEST->QA->PROD; isolation; resource scheduling at the micro level; extreme code portability; micro-services. Docker Image (read only), Docker Container (holds everything required to run), Layers/UFS (combines layers using a union file system), Dockerfile, Docker Daemon/Engine, Docker Client (interface between user and Docker Engine), Docker Registry/Hub (private/public store for Docker images).
CloudFormation:
===============
Building block service to provision infra within AWS; Elastic Beanstalk uses CFN. JSON format. Stack (CFN unit of grouping infra), Template (JSON doc giving instructions to CFN on how to act and what to create/update), Stack Policy (IAM-style policy governing what/who can change; can be added via CLI or UI; can be updated but not removed).
Create CFN template -> Add template to CFN -> Create CFN Stack -> Resources (200 per template) -> Update template/Stack -> Delete Stack
Template Anatomy: Parameters (pass variables into a template), Mappings (allow processing of hashes by the CFN template), Resources, Outputs (results from the template); only Resources is mandatory. CFN can run scripts within instances and expand files within instances; each stack has a unique stack ID.
Use cases:
1. Template for manual deployment of bespoke infra.
2. Create repeatable patterned environments.
3. Run automated testing for CI/CD environments.
4. Define an environment once, deploy one or many times.
5. Manage infra config using software-development-style versioning and testing concepts.
IMPORTANT: a template should be designed to work for 1 or 1000 apps, in one or more regions.
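A minimal sketch of the template anatomy described above, showing the Parameters, Resources and Outputs sections (the resource name and AMI ID are placeholders):

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Parameters": {
    "InstanceTypeParam": { "Type": "String", "Default": "t2.micro" }
  },
  "Resources": {
    "MyInstance": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "ImageId": "ami-12345678",
        "InstanceType": { "Ref": "InstanceTypeParam" }
      }
    }
  },
  "Outputs": {
    "InstanceId": { "Value": { "Ref": "MyInstance" } }
  }
}
```

Only Resources is mandatory; making the instance type a Parameter is what lets the same template serve 1 or 1000 apps.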
CloudFormation Structure:
=========================
CloudWatch:
===========
Metric gathering service, monitoring/alerting service, a graphing service.
Metrics data is available for only 2 weeks. Anything needed beyond 2 weeks should be stored outside CloudWatch.
Namespaces - Ex: EBS, EC2; check the CloudWatch Developer Guide for all namespaces. Enable detailed monitoring for additional metrics under namespaces.
Metrics can be aggregated across Auto Scaling groups.
Custom Metrics:
Create an IAM role -> CloudWatch -> EC2 -> CloudWatch Full Access
Create an EC2 instance with the CloudWatch role -> SSH -> install Python, pip, Git and the AWS CLI -> git clone https://github.com/ACloudGuru/resources; cd resources;
cat time.sh; chmod a+x time.sh -> set up a cron job (crontab -e) */1 * * * * /home/ubuntu/resources/time.sh -> save -> CloudWatch minimum granularity is 1 min ->
Check CloudWatch for the new namespace with Unixtime metrics.
CloudWatch Alarms:
Initiate actions (e.g. an SNS notification) based on parameters specified against available metrics. The alarm period should equal the metric frequency (5 min or 1 min).
An alarm can't invoke actions unless its state changes. Alarm actions must be in the same region as the alarm. An AWS resource won't send data under certain conditions (ex: an EBS volume not attached to an EC2 instance won't send data to CloudWatch).
Alarm States: OK (metric within threshold), ALARM (metric outside threshold), INSUFFICIENT_DATA (metric not available / not enough data).
Limits: 5000 alarms per AWS account. Create or update an alarm using mon-put-metric-alarm, enable/disable using mon-[enable/disable]-alarm-actions, describe using mon-describe-alarms. An alarm can be created before the metric exists.
date +%s -> get the Unix time. An alarm can also be created using the CLI: aws cloudwatch put-metric-alarm.
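The three alarm states above reduce to a simple classification, sketched here as a pure function (threshold semantics simplified to "greater than"; the function name is mine, not CloudWatch's):

```python
def alarm_state(datapoints, threshold):
    """Classify a metric the way the alarm states above describe:
    no data -> INSUFFICIENT_DATA, breach -> ALARM, otherwise OK."""
    if not datapoints:
        return "INSUFFICIENT_DATA"
    latest = datapoints[-1]
    return "ALARM" if latest > threshold else "OK"

print(alarm_state([], 80))        # INSUFFICIENT_DATA
print(alarm_state([42, 55], 80))  # OK
print(alarm_state([42, 91], 80))  # ALARM
```

Note the state-change rule: actions only fire on the transition between these states, not on every evaluation.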
Autoscale based on CloudWatch Metrics: CPU/Network/Memory too high/low.
Attach the EC2 Full Access policy to the CloudWatch IAM role.
Using the aws autoscaling CLI, create an Auto Scaling launch configuration and Auto Scaling group.
Use aws autoscaling put-scaling-policy to create scale-in and scale-out policies.
Use aws cloudwatch put-metric-alarm to create an alarm that adds capacity using the scale-out policy ARN.
Use aws cloudwatch put-metric-alarm to create an alarm that removes capacity using the scale-in policy ARN.
sudo apt-get install stress -> to stress the CPU:
stress --cpu 2 --timeout 600
CloudWatch Logs: monitor your existing system, application and custom logs in real time.
Send your existing logs to CloudWatch; create patterns to look for in your logs; alert based on findings of these patterns.
Free agents for Ubuntu, Amazon Linux and Windows.
Main purposes:
1. Monitor logs from EC2 instances in real time (track the number of errors in application logs and send a notification if a threshold is exceeded).
2. Monitor AWS CloudTrail logged events (API activity such as a manual EC2 instance termination).
3. Archive log data (change the log retention setting to delete automatically).
Log Events - a record sent to CloudWatch Logs to be stored: a timestamp and a message.
Log Streams - a sequence of log events that share the same source (e.g. Apache access logs; empty streams are automatically deleted after 2 months).
Log Groups - groups of log streams that share the same retention, monitoring and access control settings.
Metric Filters - define how a service extracts metric observations from events and turns them into data points for a CloudWatch metric.
Retention Settings - how long events are kept; expired logs are deleted automatically.
Download the logs agent (cloudwatchlogs-commands.txt) and run the command.
Log group retention can be set from 1 day to 10 years.
CloudWatch Log Filters: filter log data pushed to CloudWatch; won't work on existing log data, only on data ingested after the filter is created; testing a filter only returns the first 50 results. A metric filter contains: 1. Filter Pattern 2. Metric Name 3. Metric Namespace 4. Metric Value.
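The error-counting use case above can be sketched as a toy metric filter (the pattern and log lines are invented; real filters use CloudWatch's own pattern syntax):

```python
def metric_filter(events, pattern="ERROR"):
    """Turn matching log events into metric data points,
    one count per matching event, like a CloudWatch metric filter."""
    return sum(1 for e in events if pattern in e)

logs = [
    "2024-01-01 INFO  request ok",
    "2024-01-01 ERROR db timeout",
    "2024-01-01 ERROR retry failed",
]
print(metric_filter(logs))  # 2 -> alarm if this exceeds a threshold
```

The resulting count is what the Metric Name/Namespace/Value settings publish, and what an alarm then watches.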
Modify rsyslog (/etc/rsyslog.d/50-default.conf) and remove auth on line number 9, then sudo service rsyslog restart.
while [ 1 ] ; do ssh xx.xx.xx.xx ; sleep 2 ; done
Real-Time Log Processing: subscription filters for RTLP - Kinesis Streams, Lambda, Kinesis Firehose.
Use the aws kinesis command to create/describe a stream and get the stream ARN. Update permissions.json with the stream and role ARNs.
Run aws logs put-subscription-filter to start real-time log processing.
Check GitHub: awslabs/cloudwatch-logs-subscription-consumer ==> one-click deploy for ELK.
CloudTrail:
A service that records all AWS API calls on the account and delivers them in a log. Logs contain the identity of who made the API call, the time of the call, the source IP of the call, the request parameters, and the response elements returned by the AWS service.
Purpose - to enable security analysis, track changes to your account, and provide compliance auditing.
Types - all regions or one region (logs delivered to an S3 bucket and stored using SSE; logs delivered within 15 mins; new log files every 5 min).
CloudWatch Events: similar to CloudTrail but faster; the central nervous system of AWS. Components: 1. Events - created on a state change, an API call, or application-level events generated by your own code. 2. Rules - match incoming events and route them for processing; not ordered. 3. Targets - where events are processed: i) Lambda functions ii) Kinesis streams iii) SNS topics iv) built-in targets.
Go to AWS Lambda from the console and copy cloudwatchevents.txt into the function configuration.
Go to CloudWatch, Create rule -> EC2 state change notification (shutting down), Lambda function, Configure rule details -> Create rule.
Delegation & Federation:
========================
Identity Federation - using your own IdP is IAM; allowing users in other AWS accounts access to resources is Delegation; allowing users from an external IdP is Federation.
2 types of Federation:
1. Corporate/Enterprise Identity Federation (LDAP, AD, SAML, AWS Directory Service)
2. Web Identity Federation (Amazon, Facebook, Google, Twitter, OpenID Connect) - allows an app or its users access to your AWS resources.
ROLES: an object which contains 2 policy documents: a TRUST policy (who is granted - ARN) and an ACCESS policy (what the entity can do - Actions).
SESSIONS: a set of temporary credentials - an access and secret key with an expiration - obtained via STS.
Service Delegation - EC2 or Lambda auto-refresh the session, i.e. temporary credentials are refreshed and managed on your behalf.
Console Multi-Account Access:
=============================
Login to the Prod account -> authenticate with access keys -> STS service -> STS:AssumeRole -> temp credentials -> Dev account.
Login to the Dev account -> Create IAM role -> TrustProductionUsersFullAdmin -> Role for cross-account access -> between AWS accounts -> Prod account AWS Account ID -> Attach policy -> Create role.
Login to the Prod account -> Switch Role -> Account name, Role, Display name, Color.
Corporate Identity Federation:
==============================
Allows use of an existing identity store for AWS access: AWS Directory Service, SAML, or a custom federation proxy. Uses the role architecture.
Temp access is granted by STS and obtained via the GetFederationToken or STS:AssumeRole operations.
AssumeRole session: min 15 minutes, max 1 hr, default 1 hr. GetFederationToken: min 15 min, max 36 hrs, default 12 hrs.
Allows separation of responsibilities and minimizes admin overhead.
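The session duration limits above can be captured in a small helper (a sketch using only the figures quoted in these notes; real STS takes a DurationSeconds parameter and rejects out-of-range values):

```python
# (min, max, default) session durations in seconds, per the notes above
LIMITS = {
    "AssumeRole":         (900, 3600, 3600),      # 15 min .. 1 h, default 1 h
    "GetFederationToken": (900, 129600, 43200),   # 15 min .. 36 h, default 12 h
}

def session_duration(operation, requested=None):
    """Return the session duration STS would grant, or raise if the
    requested duration is outside the operation's allowed range."""
    lo, hi, default = LIMITS[operation]
    if requested is None:
        return default
    if not lo <= requested <= hi:
        raise ValueError(f"{operation} duration must be {lo}-{hi}s")
    return requested

print(session_duration("AssumeRole"))                # 3600
print(session_duration("GetFederationToken", 3600))  # 3600
```

The much longer GetFederationToken ceiling is why it suits long-lived proxy sessions, while AssumeRole sessions are short and refreshed.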
Custom Proxy - Console - AssumeRole
-----------------------------------
1. Corporate user browses to the Fed Proxy at domain.com
2. Fed Proxy authenticates the user against LDAP
3. LDAP returns the user's groups to the Fed Proxy
4. Fed Proxy sends a list-roles request
5. STS returns the list of roles
6. User selects the appropriate role
7. Fed Proxy sends STS:AssumeRole
8. STS returns the STS:AssumeRole response
9. Fed Proxy generates a URL and redirects the user
10. User accesses the URL and gets console access
Custom Proxy - API - GetFederationToken
---------------------------------------
1. Corporate app browses to the Fed Proxy
2. Fed Proxy authenticates the app against LDAP
3. The directory sends entitlements to the Fed Proxy
4. Fed Proxy sends GetFederationToken to STS
5. STS returns the GetFederationToken response
6. Session established
7. App calls APIs
Both use cases need an IAM user. GetFederationToken does not support MFA.
SAML - Console - AssumeRoleWithSAML
-----------------------------------
1. Corporate user accesses AD FS
2. AD FS authenticates the user against the directory
3. A SAML token containing group membership is generated
4. User signs in with the SAML token at the AWS sign-in endpoint
5. AssumeRoleWithSAML is sent to STS
6. STS returns credentials
7. The AWS sign-in endpoint returns a console URL
8. The corporate user is redirected to the AWS console
No need to maintain a dedicated Fed Proxy for the application, and the proxy doesn't need any IAM permissions.
Web Identity Federation
=======================
Allows a trusted third party to authenticate users; avoids having to create and manage users; avoids users having multiple IDs; simplifies access control via roles.
Standard Web Identity Federation
================================
1. Mobile user authenticates with the Web Identity Provider (WIP)
2. WIP authenticates the identity
3. Mobile user calls AssumeRoleWithWebIdentity on STS via the API
4. STS validates with the WIP and receives a success/failure notification
5. On a success response, STS verifies the role's trust policy
6. STS provides temporary access credentials to the mobile user
7. Mobile user uses the temp credentials to use the service
Cognito: identity management and sync service; 2 product streams. A Cognito identity pool is a collection of identities; it allows 2 roles to be associated: one for authenticated users, the other for unauthenticated users.
Cognito Unauthenticated Flow:
1. Mobile user creates an unauthenticated identity
2. Cognito returns an OpenID token
3. Mobile user calls AssumeRoleWithWebIdentity on STS
4. STS validates with Cognito
5. STS returns AWS guest credentials
6. Mobile user writes data
Cognito can orchestrate the generation of an unauthenticated identity, merge an unauthenticated identity into an authenticated identity, and merge multiple identities into one object.
Cognito Authenticated Flow: Classic/Basic or Enhanced.
The first step is to log in to the Web Identity Provider; the rest is the same as the unauthenticated flow.
In the enhanced flow, the client communicates with Cognito the whole time.
Flows: pre-Cognito auth flow, unauthenticated or guest flow, simple Cognito flow, enhanced Cognito flow.
Why and when to use a web identity provider: when you need to publish an app or service to thousands of users.
High Availability and Elasticity:
=================================
Auto Scaling - Auto Scaling Group (Min, Desired, Max), Launch Configuration (AMI, instance type, keypair etc.), Scaling Plan (conditions/dynamic, time/scheduled).
Better fault tolerance, better availability, better cost management.
Limits: 100 launch configurations, 20 ASGs, 50 lifecycle hooks, 50 LBs per ASG (10 attached), 20 step adjustments per scaling policy.
Auto Scaling Lifecycle: the life of an EC2 instance attached to an ASG. ASG -> scale out -> instance Pending state -> EC2_INSTANCE_LAUNCHING lifecycle hook -> InService -> failed health check or scale in -> Terminating -> EC2_INSTANCE_TERMINATING lifecycle hook -> Terminated.
-> Troubleshoot instances -> Enter Standby (manually) -> Pending -> InService
-> Detach instances (manually) -> Detaching -> Detached -> standalone instance -> can be re-attached to an ASG.
Auto Scaling Lifecycle Hooks: allow custom actions as Auto Scaling launches or terminates instances (ex: install software, or copy logs before termination).
How it works:
1. Auto Scaling responds to a scale-out event by launching an instance.
2. Auto Scaling puts the instance into the Pending:Wait state.
3. Auto Scaling sends a message to the notification target defined for the hook, along with information and a token.
4. It waits until you tell it to continue or the timeout ends.
5. You perform custom actions: install software, etc.
6. By default the instance waits for an hour, then changes state to Pending:Proceed and enters the InService state.
Change the hook timeout via the heartbeat-timeout parameter using the AWS CLI; call complete-lifecycle-action to tell the ASG to proceed; call record-lifecycle-action-heartbeat to add more time to the timeout; 48 hrs max in the wait state.
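The wait-state timing rules above (one-hour default timeout, heartbeats extend it, 48-hour hard cap) can be sketched as a pure function; this is a simplification (a real heartbeat restarts the timeout clock) and the function name is mine, not the API's:

```python
DEFAULT_TIMEOUT = 3600   # Pending:Wait default timeout, seconds
MAX_WAIT = 48 * 3600     # hard cap on total time in the wait state

def remaining_wait(elapsed, heartbeats=0, timeout=DEFAULT_TIMEOUT):
    """Seconds left in Pending:Wait: each heartbeat grants another
    timeout window, but total wait never exceeds 48 hours."""
    deadline = min((heartbeats + 1) * timeout, MAX_WAIT)
    return max(deadline - elapsed, 0)

print(remaining_wait(0))                  # 3600
print(remaining_wait(3000, heartbeats=1)) # 4200 -> one heartbeat bought an hour
print(remaining_wait(MAX_WAIT + 10, heartbeats=999))  # 0 -> hook times out
```

When the clock hits zero, the default result (CONTINUE or ABANDON, per the hook definition) is applied automatically.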
Cooldowns: ensure the ASG doesn't launch/terminate more instances before the previous scaling activity takes effect; the cooldown starts when an instance enters the InService state.
ABANDON - causes Auto Scaling to terminate the instance and launch a new one.
CONTINUE - puts the instance into service.
Spot instances can be used, but termination can't be prevented when the Spot price goes up.
Launch Configuration Deep Dive: the template used by Auto Scaling; an ASG must specify an LC; only one LC per ASG at a time; an LC can't be modified after creation. An LC can be created from scratch or from a running EC2 instance (this can't detect block devices added after launch); the same LC can't be used for Spot and On-Demand.
Auto Scaling Group Deep Dive: a collection of instances; allows improved scaling and management of instances. When creating an ASG from a running EC2 instance: tags are not copied, block device mappings come from the AMI not from the instance, and the LB name is not copied.
Auto Scaling Group Self-Healing: create low-cost, self-healing, immutable infrastructure; good for Bastion/OpenVPN hosts etc.
sudo pip install beeswithmachineguns paramiko -> for testing Auto Scaling.
Set up access keys: vi .boto; chmod 600 .boto; ssh-keygen -t rsa -f bees; mv bees ~/.ssh/bees.pem; cat bees.pub and import the keypair in the AWS console. bees up -s 10 -g bees -k bees creates 10 instances; bees attack -n 1000 -c 250 -u http://elbdns attacks the LB; create CloudWatch alarms and scale-out happens; bees down.
Amazon RDS:
RDS provides: provisioned infra, installed DB software, automatic backups, DB software patching, synchronous data replication, automatic failover.
You provide: settings, schema, performance tuning.
RDS can be vertically scaled by changing the instance type. Multi-AZ doubles the price; reserved instances make it cheaper. Storage from 5 GB to 6 TB can be scaled live (except MS SQL Server).
Horizontal scaling - via Read Replicas (master/slave); great for high read/write ratios; one master replicates writes to slaves; up to 5 Read Replicas; can create a replica of a replica. Sharding - split tables into multiple DBs.
GP2 (3 IOPS/GB, burstable up to 3000 IOPS) or Provisioned IOPS storage (specify IOPS at creation; fast, predictable, consistent; suitable for I/O-intensive DB workloads).
Amazon Aurora:
Developed by Amazon. Fast, reliable, simple, cost effective; 5x the throughput of MySQL on the same hardware; compatible with MySQL 5.6. Storage is fault tolerant and self-healing; detects crashes and restarts without crash recovery or cache rebuilding; automatic failover to one of up to 15 Read Replicas; storage auto-scales from 10 GB to 64 TB.
Backups - automatic, continuous, incremental; point-in-time restore to within a second; 35 days retention; stored in S3; no impact on DB performance.
Snapshots - user initiated, stored in S3, incremental.
Database failure - maintains 6 copies across 3 AZs; recovery in a healthy AZ; PIT/snapshot restore.
Fault tolerance - data divided into 10 GB segments across many disks; transparently handles data loss; self-healing.
Replicas - Amazon Aurora Replicas (up to 15, no performance impact) and MySQL read replicas (up to 5, high performance impact).
Security - must be created in a VPC; SSL (AES-256) secures data in transit; can encrypt the DB with KMS; encrypted storage/backups.
CAN'T ENCRYPT AN UNENCRYPTED DB; MUST CREATE A NEW ONE WITH ENCRYPTION ENABLED.
DynamoDB Primer:
Fully managed NoSQL database service; predictable, fully manageable performance with seamless scalability; fully resilient and HA; performance scales linearly; fully integrated with IAM; suitable for mobile and web applications.
A collection of tables; tables are the highest-level structure within the DB. WCU: number of 1 KB blocks written per second; RCU: number of 4 KB blocks read per second.
Eventually consistent (lower cost per RCU) / immediately (strongly) consistent reads.
Hash Key / Partition Key
Range Key / Sort Key
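Capacity-unit sizing from the WCU/RCU definitions above, as a quick calculator (the item sizes and rates are made-up examples):

```python
import math

def rcu(item_kb, reads_per_sec, eventually_consistent=False):
    """RCU: one strongly consistent read of up to 4 KB per second;
    eventually consistent reads cost half as much."""
    units = math.ceil(item_kb / 4) * reads_per_sec
    return math.ceil(units / 2) if eventually_consistent else units

def wcu(item_kb, writes_per_sec):
    """WCU: one write of up to 1 KB per second."""
    return math.ceil(item_kb) * writes_per_sec

print(rcu(6, 10))        # 20 -> 6 KB rounds up to two 4 KB blocks
print(rcu(6, 10, True))  # 10 -> eventual consistency halves the cost
print(wcu(2.5, 10))      # 30 -> 2.5 KB rounds up to three 1 KB blocks
```

The halving for eventually consistent reads is why they are the cheaper default for read-heavy workloads.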
Partitions:
The underlying storage and processing nodes of DynamoDB. Initially one table -> one partition. One partition can store 10 GB and handles 3000 RCU and 1000 WCU. Data is distributed based on the Hash/Partition Key; can scale indefinitely with no decrease in performance; allocated WCU and RCU is split between partitions.
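The per-partition limits above imply a rough partition-count estimate (a sketch; this simplifies DynamoDB's internal behaviour):

```python
import math

def partitions(size_gb, rcu, wcu):
    """Estimate partition count: the larger of the throughput-driven
    count (3000 RCU / 1000 WCU per partition) and the size-driven
    count (10 GB per partition)."""
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(size_gb / 10)
    return max(by_throughput, by_size, 1)

print(partitions(8, 1000, 500))    # 1
print(partitions(55, 6000, 2000))  # 6 -> size needs 6, throughput only 4
```

Because allocated capacity is split evenly between partitions, growing the partition count dilutes each partition's share, which is why hot keys and sudden RCU increases hurt.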
GSI/LSI:
DynamoDB offers 2 data retrieval operations: SCAN (scans the entire table) and QUERY (selects single/multiple items by partition key value).
Indexes allow efficient queries.
Global Secondary Index - can be created at any time; can have an alternative partition & sort key; RCU and WCU are defined on the GSI. A GSI only supports eventually consistent reads.
Local Secondary Index - can only be created at table creation time; contains the partition key, the original sort key, a new sort key, and projected values.
LSI storage concerns - beware of item collections; an item collection's max size is 10 GB. ItemCollectionSizeLimitExceededException -> answer: an LSI exists and capacity exceeded 10 GB.
Streams & Replication:
Streams: an ordered record of updates to a DynamoDB table. If the stream is enabled, it records table changes and stores them for 24 hours. AWS guarantees no duplication and near-real-time delivery. Can be configured with 4 views:
KEYS_ONLY (only the key attributes are written to the stream)
NEW_IMAGE (the entire item POST-update)
OLD_IMAGE (the entire item PRE-update)
NEW_AND_OLD_IMAGES (the PRE- and POST-operation state)
Stream use cases: replication; triggers; games or large distributed apps with users worldwide; DR; a Lambda function triggered when items are added, performing analytics etc.
Replication: not built into DynamoDB. Create or select the table to be replicated, apply the CFN stack and wait, get the location from the URL in the CFN output, and test a simple cross-region replication.
Use SQS as a management write buffer.
A large increase in RCU is dangerous (it can trigger partition splits, diluting per-partition throughput).
Structure data based on key-space and temporal load/heat.
Use prefix/suffix key additions to improve keyspace load leveling.
Buffer reads/writes with SQS and caching.
SQS:
Reliable, scalable, hosted queue service for sending, storing and retrieving messages. A queue acts as a buffer between sender and receiver.
256 KB max message size; messages >256 KB are managed using the SQS Extended Client Library, which uses S3. Messages are delivered at least once; not FIFO; queues can be created in any region; messages are retained for up to 14 days; messages can be sent and read simultaneously; long polling reduces frequent polling (waits up to 20 secs if the queue is empty).
First 1 million requests free, $0.50 per subsequent million + data transfer charges. SQS queues scale automatically.
Amazon SQS architectures - Priority (2 queues, High/Low) and Fanout (an SNS topic feeding multiple SQS queues for parallel processing).
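The priority pattern above can be sketched as a poller that drains the high-priority queue first (plain deques stand in for real SQS queues; the message bodies are invented):

```python
from collections import deque

def poll(high, low):
    """Serve the high-priority queue first; fall back to the low
    queue only when high is empty, mirroring the two-queue pattern."""
    if high:
        return high.popleft()
    if low:
        return low.popleft()
    return None  # both empty -> a long poll would wait here

high = deque(["pay-order"])
low = deque(["send-newsletter", "rebuild-report"])
print(poll(high, low))  # pay-order
print(poll(high, low))  # send-newsletter
```

With real SQS the same effect comes from workers always calling receive-message on the high queue before the low one.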
Kinesis:
3 separate services:
Streams (collect & process large streams of data in real time),
Analytics (run standard SQL queries against streaming data),
Firehose (load streaming data into AWS from many sources into S3 and Redshift).
Kinesis Streams vs SQS - Kinesis is for data processed at the same time, or within 24 hrs, by different consumers; data can be reused within the 24 hr retention period.
SQS can feed multiple queues using fanout, but data cannot be reused; 14 days retention.
Operations:
API/CLI hints, tips, cheats; instance types; EBS performance; snapshots vs AMIs; tags.
Instance Types:
PV (the older default) & HVM.
HVM supports a wide selection of instance types/sizes.
Why use older instance generations? Very attractive spot pricing.
Certain features require HVM, like enhanced networking.
Instance family - T, M, C, R, G, I, D (48 TB HDD)
Instance generation - 1, 2, 3, 4
Instance size - Micro, Small, Medium, Large etc.
Instance Features:
1. EBS Optimization - dedicated bandwidth, consistency, higher throughput.
2. Enhanced Networking - AWS-supported SR-IOV; less jitter, higher throughput (traffic bypasses the hypervisor).
3. Instance Store Volumes - no resilience, high throughput, high IO (temporary block-level storage for buffers, caches etc.).
4. GPU Availability - media conversion, graphical or scientific compute, e.g. genomics (G family instances).
EBS Performance:
CAPACITY (GB), THROUGHPUT (MB/s read/write), BLOCK SIZE (KB per read/write), IOPS, LATENCY (delay between read & write).
Magnetic volumes (HDD): cold and near-archival workloads, not suitable for Prod.
GP2 - base performance of 3 IOPS per GB, burstable up to 3000, up to 160 MB/s throughput, max 10000 IOPS.
IO1 - provision up to 30 IOPS per GB, max throughput 320 MB/s, max 20000 IOPS.
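The GP2/IO1 numbers above reduce to a small baseline-IOPS calculator (a sketch using only the figures quoted in these notes, which reflect the limits of that era):

```python
def gp2_iops(size_gb):
    """GP2 baseline: 3 IOPS/GB, burstable to 3000, capped at 10000.
    Returns (baseline, effective peak including burst)."""
    baseline = min(3 * size_gb, 10_000)
    burst = max(baseline, 3_000) if baseline < 10_000 else baseline
    return baseline, burst

def io1_max_iops(size_gb):
    """IO1: provision up to 30 IOPS per GB, capped at 20000."""
    return min(30 * size_gb, 20_000)

print(gp2_iops(100))      # (300, 3000) -> small volume leans on burst
print(gp2_iops(5000))     # (10000, 10000) -> at the cap, burst irrelevant
print(io1_max_iops(500))  # 15000
```

This is the arithmetic behind the burst-pool tip below: a small GP2 volume's 3000-IOPS burst often beats paying for IO1.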
Storage performance depends on the EC2 instance type, IO profile, network speed, and the EBS volume itself.
IOPS vs throughput.
RAID - combine multiple EBS volumes to overcome single-volume limits (use a 128 or 256 KB stripe size for better performance).
GP2 burst pool - create small volumes, monitor peak/normal usage over time, and use GP2 rather than IO1 where the burst pool suffices.
Pre-warming volumes is no longer required for new volumes.
Volumes created from snapshots are lazily restored from S3 - force a full read of the volume to force a restore.
If using RAID 0 or LVM striping: quiesce IO, freeze the file system, then perform snapshots.
Snapshots are incremental; improve RTO and RPO by snapshotting often (less time and cost).
API/CLI Cheat Sheet:
Auto Scaling - enter-standby, exit-standby (move instances within an ASG into/out of the Standby state), create-launch-configuration, delete-launch-configuration, update-auto-scaling-group, put-lifecycle-hook, put-scaling-policy.
CloudWatch - put-metric-data, put-metric-alarm, disable-alarm-actions, enable-alarm-actions, set-alarm-state, list-metrics, get-metric-statistics.
DynamoDB - get-item (eventually consistent by default), batch-get-item (response size max 16 MB & max 100 items), query & scan, put-item, update-item, delete-item, batch-write-item (multiple items in a single request), create-table, update-table.
EC2 - run-instances (create instances), stop-instances (stops EBS-backed instances), start-instances, terminate-instances, describe-instances, wait (e.g. wait until a snapshot completes), create-image (creates an EBS-backed AMI from a stopped instance), create-snapshot, copy-image, copy-snapshot, create-volume, describe-tags (list tag pairs, ex: backup and prune EBS volumes).
S3 & S3 API - s3: mb, rb, mv, rm, sync, website; s3api: head-object, head-bucket, get/put-bucket-versioning, get/put-bucket-acl, put-bucket-notification-configuration.
SQS - add-permission, change-message-visibility (max 12 hrs), set-queue-attributes (changes can take up to 15 mins), send-message, receive-message (a non-zero wait-time-seconds parameter enables long polling, reducing CPU operations), delete-message.
STS - assume-role (returns a set of temp access credentials / cross-account access), assume-role-with-saml, assume-role-with-web-identity, get-session-token (MFA-enabled users).
Creating Snapshots, Pruning and Orphan Handling:
Data recovery from a snapshot happens in the background and the data is immediately available; you never lose data from a linked snapshot.
Tagging - add custom backup-related tags like type, retain-until, instanceId.
Backup framework - create snapshots hourly, daily, weekly, monthly.
Use a Lambda prune function or an EC2 worker instance to run describe-instances and delete snapshots based on expiration.
Useful links:
http://ozaws.com
https://read.acloud.guru
https://serverlesscode.com
https://paulwakeford.info
http://blog.aws.amazon.com
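The pruning step above can be sketched as a pure filter over snapshot tags (the tag names follow the tagging scheme suggested in these notes; the snapshot data is made up):

```python
from datetime import datetime, timezone

def expired(snapshots, now):
    """Return IDs of snapshots whose 'retain-until' tag is in the past,
    mirroring the Lambda prune function described above."""
    out = []
    for snap in snapshots:
        until = datetime.fromisoformat(snap["tags"]["retain-until"])
        if until < now:
            out.append(snap["id"])
    return out

snaps = [
    {"id": "snap-1", "tags": {"type": "daily",
                              "retain-until": "2024-01-01T00:00:00+00:00"}},
    {"id": "snap-2", "tags": {"type": "weekly",
                              "retain-until": "2030-01-01T00:00:00+00:00"}},
]
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(expired(snaps, now))  # ['snap-1']
```

A real prune function would fetch the snapshot list with describe-snapshots and call delete-snapshot on each expired ID.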
Practice exercises:
Write a CFN template to deploy an HA WordPress instance.
Write a CFN template to deploy a PHP website inside an ASG, reading from DynamoDB; then deploy an HTTP load-testing application and watch and manipulate (CLI and UI) the auto scaling.
Write a small Lambda function and use it as the backing for a custom resource in a CFN template.
Make changes to a CFN template and update the stack - become familiar with the impacts: replace, update, interrupt.
Download the EB example app, make changes, create Dev and Prod EB environments, make more changes, and observe the updates.
Deploy 2 instances with appropriate roles, bootstrap the CloudWatch Logs agent, and configure detailed log ingestion into CloudWatch.
  