@rebjan
Forked from j-mprabhakaran/GCP Architect Part-2
Created March 5, 2019 17:52
Revisions

  1. @j-mprabhakaran j-mprabhakaran created this gist Jan 20, 2019.
    Dataflow lifecycle
    concerns when migrating from on-premises to Google Cloud
    code snippet to troubleshoot and diagnose

    Part 2 - Hands-on with tools

    Role of Cloud Architect
    plans, designs and builds the infrastructure for an org to host their workload on GCP; able to plan to scale;
    scalability and automation

    The Importance of Hands-on Practice
    Practice

    Core Management Services
    Cloud Resource Manager(Quotas, IAM, Billing)
    Management Services: (IMPORTANT FOR EXAM)

    Organization Node and Folders
    Org -> Folders -> Projects -> Resources
    Org - Highest root node for all GCP resources; Org Admin(Highest level, useful for Auditing), Org Owner(reserved for G suite super admin)
    Folder - Group projects under org; share common IAM policies; Roles granted to folder

    Quotas
    caps on resources you can create; ex: 48 CPU per region, 5 static IP's per project; prevent unexpected spikes in usage;
    3 Types - Resources per project, API rate limit requests per project, Per region
    Increasing Quota caps - soft caps can be raised by request; support ticket or self service form; quota can be viewed on console; proactively request

    Labels
    Method of organization and segregation(projects & folders); Labels are a tool for organizing GCP resources; any resource can be labeled(via console, gcloud or API)
    64 labels per resource; key:value pair; Ex: Environment - env:prod, env:test; Owner or POC - owner:matt, contact:devops; Team or cost center - team:research,
    team:marketing; App component - component:backend, component:frontend; Resource set - state:readyfordeletion, state:inuse
    Tags(only for network/VPC resources, affect resource operations)
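
    The key:value convention above can be sketched in a few lines of Python. This is only an illustration: the resource names and the helper functions parse_labels and select_by_label are hypothetical, and the only limit enforced is the 64-labels-per-resource figure quoted in these notes.

```python
MAX_LABELS_PER_RESOURCE = 64  # limit quoted in the notes above


def parse_labels(pairs):
    """Turn ["env:prod", "team:research"] into {"env": "prod", "team": "research"}."""
    labels = {}
    for pair in pairs:
        key, sep, value = pair.partition(":")
        if not sep or not key:
            raise ValueError(f"expected key:value, got {pair!r}")
        labels[key] = value
    if len(labels) > MAX_LABELS_PER_RESOURCE:
        raise ValueError("too many labels for one resource")
    return labels


def select_by_label(resources, key, value):
    """Return the names of resources whose label `key` equals `value`."""
    return [name for name, labels in resources.items() if labels.get(key) == value]


# Hypothetical inventory using the example labels from the notes.
resources = {
    "web-1": parse_labels(["env:prod", "team:marketing", "component:frontend"]),
    "db-1": parse_labels(["env:prod", "team:research", "component:backend"]),
    "scratch-1": parse_labels(["env:test", "state:readyfordeletion"]),
}

print(select_by_label(resources, "env", "prod"))
print(select_by_label(resources, "state", "readyfordeletion"))
```

    Filtering on a label such as state:readyfordeletion is the kind of query that makes cleanup and cost reporting easier.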

    IAM & Admin -> Select entire Org -> Add -> Select Role -> Resource Manager, Select Folder Admin and Org Admin -> Create Folder
    Viewer, Editor, Owner - Primitive Permissions for GCP resources
    Using CLI:
    gcloud projects get-iam-policy pwnet-test1 --format json > iam.json
    ls
    nano iam.json
    gcloud projects set-iam-policy pwnet-test1 iam.json
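
    The manual "nano iam.json" step above can also be scripted. A minimal sketch, assuming the file holds the usual policy shape with a "bindings" list of role/members entries; the helper add_binding, the member addresses, and the etag value are made up for illustration.

```python
import json


def add_binding(policy, role, member):
    """Return a copy of an IAM policy dict with `member` granted `role`."""
    updated = json.loads(json.dumps(policy))  # deep copy via a JSON round trip
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Hypothetical policy, shaped like the output of get-iam-policy --format json.
policy = {
    "bindings": [{"role": "roles/owner", "members": ["user:owner@example.com"]}],
    "etag": "BwWKmjvelug=",
    "version": 1,
}

updated = add_binding(policy, "roles/viewer", "user:auditor@example.com")
print(json.dumps(updated, indent=2))
```

    Writing the result back to iam.json and running the set-iam-policy command above would apply it; keeping the etag from the fetched policy lets the API reject the write if the policy changed in the meantime.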

    Create Custom role for additional permissions

    Service accounts
    Project Creator, Billing Account Creator access required for creating new projects
    gcloud config list
    gsutil ls gs://pwnet-bucket1
    gsutil cp gs://pwnet-bucket1/* . # copy bucket contents to the current directory
    gsutil cp file.txt gs://pwnet-bucket1 # access denied if the instance's Storage API access is read-only; edit the instance and grant Read Write access


    IAM Best Practices
    Use Principle of least privilege
    Restrict service account access
    Restrict Service account admin role
    Careful with Owner role (Owner can change IAM policy)
    Rotate service account keys periodically
    Auditing - Cloud Audit Logs, Export Logs to cloud storage, restrict log access


    Billing:
    The bigger you scale, the greater the number of resources to track and pay for
    Billing roles defined in IAM
    Org is top in Hierarchy, Billing accounts linked to projects(Required Billing Account User)
    view billing info - 1.web console 2.export to cloud storage and big query 3.set budgets and alerts

    find all charges that were more than 3 dollars:
    SELECT product, resource_type, start_time, end_time,
    cost, project_id, project_name, project_labels_key, currency, currency_conversion_rate,
    usage_amount, usage_unit
    FROM `cloud-training-prod-bucket.arch_infra.billing_data`
    WHERE (cost > 3)

    find which product had the highest total number of records:
    SELECT product, COUNT(*) AS record_count
    FROM `cloud-training-prod-bucket.arch_infra.billing_data`
    GROUP BY product
    ORDER BY record_count DESC
    LIMIT 200

    which product most frequently cost more than a dollar:
    SELECT product, COUNT(*) AS charge_count
    FROM `cloud-training-prod-bucket.arch_infra.billing_data`
    WHERE (cost > 1)
    GROUP BY product
    ORDER BY charge_count DESC
    LIMIT 200
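
    The three billing questions above can be tried locally without BigQuery. The sketch below uses Python's built-in sqlite3 with a handful of invented rows; the table keeps only the product and cost columns, and an ORDER BY is used so the top product comes first. It illustrates the aggregation logic only and is not the BigQuery export schema.

```python
import sqlite3

# Invented sample rows (product, cost); not real billing data.
rows = [
    ("Compute Engine", 4.20),
    ("Compute Engine", 0.35),
    ("Cloud Storage", 1.10),
    ("Cloud Storage", 1.75),
    ("Cloud Storage", 0.02),
    ("BigQuery", 3.60),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE billing_data (product TEXT, cost REAL)")
conn.executemany("INSERT INTO billing_data VALUES (?, ?)", rows)

# Charges above 3 dollars.
over_three = conn.execute(
    "SELECT product, cost FROM billing_data WHERE cost > 3 ORDER BY cost DESC"
).fetchall()

# Number of records per product, largest first.
record_counts = conn.execute(
    "SELECT product, COUNT(*) AS n FROM billing_data "
    "GROUP BY product ORDER BY n DESC, product"
).fetchall()

# How often each product cost more than a dollar, most frequent first.
over_a_dollar = conn.execute(
    "SELECT product, COUNT(*) AS n FROM billing_data WHERE cost > 1 "
    "GROUP BY product ORDER BY n DESC, product"
).fetchall()

print(over_three)
print(record_counts)
print(over_a_dollar)
```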




    Stackdriver
    suite of tools for monitoring, logging, and tracking diagnostics for apps; native monitoring of both GCP and AWS; Dynamically discover all GCP resources
    1.Monitoring - monitor metrics, health checks, dashboard and alerts etc
    2.Logging - audit of activity
    3.Error Reporting - identify and understand app errors
    4.Trace - find latency bottlenecks in apps (App Engine)
    5.Debugger - find/fix code errors in prod

    Benefits - Multicloud monitoring, Identify trends and prevent problems before they occur, Centralized logging, Better signal-noise ratio, Find & fix problems faster
    3rd party integrations (SRE vendors) - BMC, Splunk, PagerDuty, Tenable, HipChat, netskope
    Best practice - single project for stackdriver monitoring, determine monitoring needs in advance, IAM controls

    Concepts:
    Pricing - Basic and Premium(Separate from GCP account status); Applies only to monitoring; new accounts 30 day free trial
    $8 per month per resource; 30 days log retention, 500 time series per chargeable resource, 250 metric types per project
    Stackdriver agent - software installed on VMs; recommended not required, agentless gets CPU, disk/network traffic, and uptime info; agent accesses
    additional resource and application info; requires premium tier, monitor many 3rd party apps(Apache, Kafka, MySQL, Nginx, Tomcat etc)

    $ curl -sSO https://dl.google.com/cloudagents/install-monitoring-agent.sh
    $ sudo bash install-monitoring-agent.sh

    Resources -> Metric Explorer, Cloud Storage etc
    Groups -> Create Groups (select a project with group of instances)
    Dashboard -> Create Dashboard
    Explore resource, Alerting, uptime checks

    Stackdriver logging:
    Concepts - repository for log data and events; store, search, analyze, monitor and alert; collect platform, system and app logs(agent); realtime/batch
    Associated by project; Log entry - record status or event; Log - named collection of log entries; retention period
    Audit Log Types - 1.Admin Activity(automatically turned on, requires IAM role logging/Logs viewer or Project viewer, always enabled no charge),
    2.Data Access(create modify or read user-provided data, requires IAM role logging/Private Logs viewer or Project Owner, Disabled charged on usage)
    Retention - Admin activity 400 days; Data access logs 7/30 days, Non audit logs 7/30 days
    Allotment - 50Gb per project / 50+14.25MB premium, overage charge $0.50 per GB per project
    Exporting logging data - 1.Cloud storage 2.BigQuery 3.stream to other source(pub/sub); requires project/destination bucket; create a filter;
    choose destination; filter and destination held in a sink
    Best practices - search for specific values, use adv filters, use adv viewing interface
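
    The allotment figures above reduce to simple arithmetic. A small sketch, assuming only the numbers quoted in these notes (50 GB per project, $0.50 per GB of overage); current pricing may differ, and the function name is illustrative.

```python
FREE_ALLOTMENT_GB = 50      # per project, as quoted in the notes
OVERAGE_USD_PER_GB = 0.50   # per GB above the allotment, as quoted in the notes


def monthly_overage_usd(ingested_gb):
    """Estimate the logging overage charge for one project in one month."""
    billable_gb = max(0.0, ingested_gb - FREE_ALLOTMENT_GB)
    return round(billable_gb * OVERAGE_USD_PER_GB, 2)


print(monthly_overage_usd(40))   # under the allotment
print(monthly_overage_usd(120))  # 70 GB over the allotment
```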

    HandsOn:
    view logs
    filter(basic/advanced views)
    turn on real time viewing
    export logs to cloud storage/big query
    enable data access logs
    gcloud projects get-iam-policy pwnet-test2 --format yaml > policy.yaml
    ls
    nano policy.yaml # add an auditConfigs: section
    gcloud projects set-iam-policy pwnet-test2 policy.yaml
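
    The auditConfigs section added in the editor step can be sketched as a dict edit. This assumes the policy is held as a Python dict (for example, loaded from a JSON export) and uses the allServices service name with the three Data Access log types; the helper name enable_data_access_logs and the sample policy are illustrative.

```python
import copy

DATA_ACCESS_LOG_TYPES = ("ADMIN_READ", "DATA_READ", "DATA_WRITE")


def enable_data_access_logs(policy, service="allServices"):
    """Return a copy of `policy` with Data Access audit logs enabled for `service`."""
    updated = copy.deepcopy(policy)
    # Drop any existing entry for the same service so it is not duplicated.
    audit_configs = [
        config for config in updated.get("auditConfigs", [])
        if config.get("service") != service
    ]
    audit_configs.append({
        "service": service,
        "auditLogConfigs": [{"logType": log_type} for log_type in DATA_ACCESS_LOG_TYPES],
    })
    updated["auditConfigs"] = audit_configs
    return updated


# Hypothetical starting policy without any audit configuration.
policy = {"bindings": [{"role": "roles/owner", "members": ["user:owner@example.com"]}]}
with_audit = enable_data_access_logs(policy)
print(with_audit["auditConfigs"])
```

    Data Access logs are charged on usage, so enabling them for a single service instead of allServices may be the more economical choice.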

    Trace, Error Reporting, and Debugger Concepts
    Error reporting - real time error monitoring; automatic and real time analysis; automatically enabled in App Engine;
    Trace - find performance bottlenecks(latency); collect data from GAE, LB, or apps with Stackdriver Trace SDK;automatically enabled in App Engine
    Debugger - Inspect app state without stopping or slowing app; doesn't require additional log statements; automatically enabled in App Engine standard


    GCP Core Building Blocks
    Google Cloud Storage - Unstructured data, virtually limitless size, Pay per use not allocation, primary unit is bucket, object inside bucket
    Storage Class - Regional, Multi-regional, Nearline, Coldline
    Changing storage class - cannot change from multi-regional to regional vice versa; gsutil to change class of existing object or move obj to another bucket

    gsutil(FOR CLOUD STORAGE)
    https://cloud.google.com/storage/docs/gsutil
    gsutil mb -l us-central1 -c nearline gs://pwnet-test1-test
    gsutil ls -l gs://pwnet-test1-test

    Cloud Storage Security
    Access Management principles - IAM and ACL
    IAM - granted at projects, resource or bucket level; Roles - Primitive, Standard Storage roles (independently from ACLs), Legacy roles (work with ACLs)
    ACLs - can be applied to buckets/objects; Objects inherit ACLs from default bucket ACL
    Best Practice - use IAM over ACL(enterprise grade access control, leaves audit trail); use ACL to grant access to obj without access to bucket
    signed URLs - timed access to object data (temporary access without google account)


    storage.cloud.google.com/bucketname

    Assign IAM role to bucket
    via console
    gsutil iam ch user:[email protected]:objectCreator,objectViewer gs://pwnet-test1-test
    gsutil iam ch -d user:[email protected]:objectCreator,objectViewer gs://pwnet-test1-test

    Assign ACL role to bucket and objects
    via console
    gsutil acl ch -u [email protected]:O gs://pwnet-test1-test
    gsutil acl ch -d [email protected] gs://pwnet-test1-test
    gsutil acl ch -u [email protected]:O gs://pwnet-test1-test/3.png # only access to object
    gsutil -m acl ch -u [email protected]:O gs://pwnet-test1-test/*

    Mixed owner/read permissions
    Storage Legacy Bucket Owner - create, upload, delete file but cannot view the contents
    Storage Object Creator, Storage Object viewer - create, upload and view but cannot delete

    signed URLs
    APIs & Services -> Create Credentials -> Service account key -> New service account -> select name and role -> create (JSON file downloaded)
    ssh session -> upload file -> mv pwnet-test1-xedefdefefe.json pwnet-cert.json
    gsutil signurl -d 10m pwnet-cert.json gs://pwnet-test1-test/3.png
    get the URL from output and give it to user who need access to the object

    Object Versioning and Lifecycle Management Concepts
    Object versioning - retrieve objects that are deleted or overwritten; applied at bucket level; disabled by default; when enabled, archived object
    versions increase bucket size; archived versions retain ACLs; Versioning properties - Generation (obj content change), Metageneration (metadata change)

    Object Lifecycle management
    Sets TTL on an object(to delete version/downgrade storage class); Applied to bucket level ; implemented with combination of rules, conditions, actions
    Rule - Specify set of conditions in order to take action
    Condition - criteria to meet before action; Age, CreatedBefore, IsLive, MatchesStorageClass, NumberOfNewerVersions
    Actions - Delete, SetStorageClass

    gsutil versioning help
    gsutil versioning get gs://pwnet-test1-test
    gsutil versioning set on gs://pwnet-test1-test
    gsutil ls -a gs://pwnet-test1-test

    gsutil lifecycle get gs://pwnet-test1-test > policy.json
    edit the file to change the rule
    gsutil lifecycle set policy.json gs://pwnet-test1-test
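
    For reference, a lifecycle configuration file has a "rule" list in which each entry pairs one action with one condition. The sketch below builds such a file in Python; the 30-day age, the storage classes, and the three-version threshold are arbitrary example values, not recommendations.

```python
import json

# Example rules only: the thresholds and classes are arbitrary illustrations.
lifecycle_policy = {
    "rule": [
        {
            # Downgrade objects older than 30 days to Nearline.
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 30, "matchesStorageClass": ["REGIONAL"]},
        },
        {
            # Delete archived versions once three newer versions exist.
            "action": {"type": "Delete"},
            "condition": {"isLive": False, "numNewerVersions": 3},
        },
    ]
}

policy_text = json.dumps(lifecycle_policy, indent=2)
print(policy_text)
```

    Saving the printed JSON as policy.json gives a file in the shape that the lifecycle set command above expects.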

    Bucket and Object Command Line A-Z
    gsutil ls -al gs://pwnet-test1-test #gets metageneration
    gsutil -m rewrite -s NEARLINE gs://pwnet-test1-test/* # turn versioning off first, then rewrite objects to a different storage class
    gsutil acl ch -u AllUsers:R gs://pwnet-test1-test/file.txt # shows as public link on console

    Interconnecting Networks
    Worldwide private network; communication between regions and on-premises never touches public internet; networking handled differently than others.
    SDN - traditional network(manage network hardware, high mgmt overhead req) SDN(Everything is virtualized)
    single global/cross region VPC; global internal DNS/load balancing/firewalls/routes; global public DNS; Rapid scaling with global LB(Layer 7/HTTP);
    Subnets within VPC group resources by region/zone; IP range between subnets dynamically expandable.
    Extend Google Private Network to On-premises - VPN, Cloud Interconnect, Direct Peering

    Connecting your Network to Google
    1. Dedicated Interconnect - Physically connect on-premise network to GCP VPC via Google Edge location; Useful for Hybrid env, High bandwidth traffic;
    Must be at supported peering location; can be direct with Google or ISP; $1700 per 10Gbps link, upto 80 Gbps total; Reduced egress fees
    Use Cases - On-premise data processing, low latency needs,
    2. Peering - connect business directly to google; 70+ location in 33 countries for Direct peering; Exchange BGP routes; Direct and Carrier Peering;
    Does not connect to internet; Also save on egress fees; 10 Gbps per link(direct), variable for carrier; Use case Ex: Private API access
    3. Cloud VPN - Site to site VPN connection over IPSec; connect internal network to GCP over encrypted tunnel over public internet; Up to 1.5 Gbps per tunnel;
    Can use multiple tunnels for increased performance; Static and dynamic routes(using Cloud Router); Supports IKEv1 and IKEv2 using shared secret;
    connect on-premises to GCP or connect two different VPCs on GCP; No site to client option available.
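
    A rough sizing sketch for the options above, using only the figures quoted in these notes (10 Gbps and $1700 per Dedicated Interconnect link, 80 Gbps maximum, 1.5 Gbps per VPN tunnel). It assumes bandwidth adds up linearly across links or tunnels, which real deployments may not achieve; the function names are illustrative.

```python
GBPS_PER_INTERCONNECT_LINK = 10
USD_PER_INTERCONNECT_LINK = 1700
MAX_INTERCONNECT_GBPS = 80
VPN_MBPS_PER_TUNNEL = 1500  # 1.5 Gbps, kept in Mbps to stay in integers


def ceil_div(numerator, denominator):
    """Integer ceiling division."""
    return (numerator + denominator - 1) // denominator


def interconnect_links(required_gbps):
    """Return (link_count, link_cost_usd) for a Dedicated Interconnect."""
    if required_gbps > MAX_INTERCONNECT_GBPS:
        raise ValueError("exceeds the 80 Gbps total quoted in the notes")
    links = ceil_div(required_gbps, GBPS_PER_INTERCONNECT_LINK)
    return links, links * USD_PER_INTERCONNECT_LINK


def vpn_tunnels(required_mbps):
    """Return the number of Cloud VPN tunnels needed for the bandwidth."""
    return ceil_div(required_mbps, VPN_MBPS_PER_TUNNEL)


print(interconnect_links(25))  # 25 Gbps needs three 10 Gbps links
print(vpn_tunnels(4000))       # 4 Gbps needs three 1.5 Gbps tunnels
```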

    CloudVPN
    connect on-premise network to GCP VPC; IPSec connection over VPN over public internet; traffic encrypted by one gateway, decrypted by other gateway.
    99.9% SLA, Site-to-site only; Upto 1.5Gbps per tunnel, can have multiple tunnel; Static and Dynamic routes
    Use case - Connect to on-premises or connect 2 different VPC network on GCP
    Requirement - VPN Gateway on both ends(peer), Peer Gateway must have static IP; Non conflicting CIDR range/subnet with rest of network
    Cloud Router - Static vs Dynamic routing; Static:create routing table for existing and new routes, Can't re-route if link fails; Dynamic:networks
    automatically discover topology changes via BGP; Can re-route if link fails
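
    The non-conflicting CIDR requirement can be checked before building the tunnel. A minimal sketch using Python's standard ipaddress module; the address ranges are invented examples and find_conflicts is a hypothetical helper.

```python
import ipaddress


def find_conflicts(on_prem_cidrs, vpc_cidrs):
    """Return (on_prem, vpc) pairs of CIDR ranges that overlap."""
    conflicts = []
    for on_prem in on_prem_cidrs:
        for vpc in vpc_cidrs:
            if ipaddress.ip_network(on_prem).overlaps(ipaddress.ip_network(vpc)):
                conflicts.append((on_prem, vpc))
    return conflicts


# Invented example ranges.
on_prem = ["10.0.0.0/16", "192.168.1.0/24"]
vpc_subnets = ["10.0.5.0/24", "10.128.0.0/20"]

print(find_conflicts(on_prem, vpc_subnets))
```

    An empty result means the two sides can be routed to each other without ambiguity.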

    To use Dynamic routing, change dynamic routing mode to Global on VPC network.
    Google ASN(65000-65001) and BGP address(169.254.0.1-169.254.0.2) required
    Tunnel IP is static IP of other VPN Gateway
    Add BGP session for Dynamic Routing

    gsutil cp gs://gcp-course-exercise-scripts/vpn-exercise-script.sh .
    bash vpn-exercise-script.sh

    Virtual Networks
    VPC Concepts
    subnets are region bound and can span multiple zones
    isolated per project; but can share between projects with Shared VPC
    Quotas - Hardcap of 7000 VMs in a VPC; IPv4 unicast traffic only; Most other quotas can be increased by request
    Network Tags - primary method of segmenting network traffic access; apply to firewall and network routes; individual instances are tagged

    Firewall - single firewall for entire VPC; manage both ingress and egress traffic; Deny all Ingress, Allow all egress; Conditions - source/target, port, protocols, Tags
    create firewall rules - ssh-icmp-instance2(Tag: restrict-access); internal-allow-all; ssh-allow; ping-allow
    Firewall Rules via Command Line - vnc-desktop
    firewall rule for port 5901
    vnc-allow; Target tag vnc-server; tcp:5901 # get the command line and paste in CLI
    gcloud compute firewall-rules create vnc-allow ......
    gcloud compute instances add-tags vnc-desktop --tags vnc-server
    Routes - software based, not limited by hardware; routes traffic leaving VMs; special case for advanced routing Many-to-one route, Proxy server;
    Routes+firewall rules combine to determine traffic access
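
    A toy model of how tags and rules interact, assuming only allow rules and the default "deny all ingress" described above. It ignores priorities, source ranges, and deny rules, so it is a simplification and not how GCP actually evaluates firewall rules; the rule names mirror the ones in these notes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AllowRule:
    name: str
    protocol: str
    port: Optional[int] = None              # None matches any port (e.g. icmp)
    target_tags: frozenset = frozenset()    # empty means "all instances"


def ingress_allowed(rules, instance_tags, protocol, port=None):
    """Return the name of the first matching allow rule, or None (implied deny)."""
    for rule in rules:
        if rule.protocol != protocol:
            continue
        if rule.port is not None and rule.port != port:
            continue
        if rule.target_tags and not (rule.target_tags & set(instance_tags)):
            continue
        return rule.name
    return None


rules = [
    AllowRule("ssh-allow", "tcp", 22),
    AllowRule("ping-allow", "icmp"),
    AllowRule("vnc-allow", "tcp", 5901, frozenset({"vnc-server"})),
]

print(ingress_allowed(rules, {"vnc-server"}, "tcp", 5901))  # tagged instance
print(ingress_allowed(rules, set(), "tcp", 5901))           # untagged instance
```

    The second call returns None because the vnc-allow rule only targets instances carrying the vnc-server tag, which is why the add-tags command is needed.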

    Shared VPC Concepts
    share VPC across projects within Org(Cross Project Networking)
    Host project - project hosting the shared VPC; Service project - project with permission to shared VPC; Standalone project - project not using shared VPC;
    Shared VPC admin - IAM role for admin of shared VPC; Service project admin - project admin of shared VPC service project
    Use cases - Separation of projects for access control/billing, but need access to same VPC environment; 2 tier web service; Hybrid cloud scenario
    IAM roles - Org Admin, Shared VPC admin(Org level role), Network user-compute.networkUser(Project level role)

    Compute Shared VPC admin role required to the user to enable Shared VPC;

    gsutil cp gs://gcp-course-exercise-scripts/firewall-exercise-script.sh .


    Compute Engine Deep Dive
    GCE, GKE, GAE all run on VMs; Single VM, Force multipliers, Automation, Autoscaling, Managed Instance Groups, Load Balancer, Custom Image, Disk manipulation,
    Metadata, Startup/Shutdown scripts,Snapshots, Persistent Disks, gcloud commands
    Disk concepts - Single root disk for OS; Persistent(most common, default, Not Directly attached) or Local SSD (Directly attached) or Cloud Storage Buckets
    Persistent - 64 TB in total, Scope of access zone, no RAID config necessary
    Local SSD - cannot be boot device, encrypted(Google Supplied Keys only), 375GB in size (can attach upto 8), must create on instance creation
    Cloud Storage Bucket - Not a root disk, Encrypted, Lower performance
    Disks are zone bound
    gcloud compute disks create disk03 --size 50GB --type pd-standard
    sudo lsblk
    sudo growpart /dev/sda 1
    sudo resize2fs /dev/sda1

    Images Concepts
    Images - create new instances, configure instance templates; access across projects
    Snapshots - periodic incremental backup of existing disk/instance, access only from within same project
    Images created from Persistent disk, another image in same project, image shared from another project, compressed image from cloud storage
    Image families(group related images together); Deprecating images - transition user away from older unsupported version in manageable way, Deprecation
    states: Deprecated, Obsolete, Deleted, Active(command only)
    Sharing and moving images - Require Compute Engine Image User role to host project; For managed instance group, service account must be granted role
    Export image to cloud storage - export image as a tar.gz to cloud storage(only linux); share with Image User role is preferable
    Hands On - Custom Images
    gcloud compute images describe-from-family webserver
    gcloud compute images deprecate webserver-base --state ACTIVE

    Snapshot Concepts
    For windows use VSS snapshots; run fstrim before taking snapshot(linux)
    gcloud compute disks snapshot website --zone us-central1-a --snapshot-names=website-backup-2
    gcloud compute snapshots list
    gcloud compute snapshots describe website-backup

    Startup and Shutdown scripts
    ease management of large number of VMs; easily and programmatically customize VM; key component to instance group and scaling capabilities
    always run as root/administrator; Input methods - Direct (script field in instance properties), Link to script on Cloud Storage
    Shutdown scripts - great for managed instance group/autoscaler; Ex: copy processed data to cloud storage, backup logs etc; Good to pair with preemptible
    Metadata server - Built into GCP; Manage config and env variables programmatically; Default and custom values; Key/value pair

    Metadata -> startup-script-url gs://pwnet-bucket1/startup_script.sh
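
    Custom metadata such as the startup-script-url key can be read back from inside a VM through the metadata server. A minimal sketch using only the standard library; it builds the request (which works anywhere) and defines a read helper that would only succeed on a Compute Engine instance. The helper names are illustrative.

```python
import urllib.request

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1"


def metadata_request(path):
    """Build a metadata server request; the Metadata-Flavor header is required."""
    url = f"{METADATA_ROOT}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def read_metadata(path, timeout=2.0):
    """Fetch a metadata value. Only reachable from inside a GCE instance."""
    with urllib.request.urlopen(metadata_request(path), timeout=timeout) as response:
        return response.read().decode("utf-8")


request = metadata_request("instance/attributes/startup-script-url")
print(request.full_url)
```

    On an instance configured as above, read_metadata("instance/attributes/startup-script-url") would be expected to return the gs:// path set in the metadata field.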


    Elastic Cloud Infrastructure: Scaling and Automation
    Load Balancing and Instance Groups
    Force Multipliers - Automation and Scaling - Scalable, Automatic
    Repeatable, documented, scalable, necessary for large architecture, reduce complexity
    Load Balancer, Instance Group, Autoscaling
    Load Balancer - distributes user network requests among a pool of instances; single frontend point of access; SDN; Global or regional in scope;
    traffic subject to firewalls; Types - Global External LB(HTTP/s, SSL/TCP Proxy), Regional External LB(Network TCP/UDP), Regional Internal LB
    HTTP(S) LB - Manages HTTP(s) requests; Global scope, IPv4 and IPv6, Distribute traffic by location or content requested; Paired with IG for backend;
    Native support for websocket protocol
    Network LB - non HTTP(s); balance requests by IP protocol data; Forwarding rules, Target pool
    Network Internal LB - private LB; used with multi-tier app; affects cloud router dynamic routing

    Instance Group and Autoscaling
    IG - group of instances; Manages as a group not one at a time; Managed IG and Unmanaged IG;
    Features - Autoscale, work with LB, Health check-ASG; Require Instance templates(Define Group config, Global); From template create Managed IG
    Networking - subject to firewall rules for allowed traffic; essential for LB; LB->Backend Service->Backend->IG ;
    Health checks - Auto healing; Managed IG only; if instance or service fails, delete and recreate identical
    Updating Managed IG - Managed Instance Group updater;
    Autoscaling - automatically scales IG; Managed IG only; Set by autoscaling policy; Set metric and threshold; set min and max instance count
    AS based on CPU usage, HTTP load balancing usage, Stackdriver monitoring metric, Multiple metrics
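
    The metric, threshold, and min/max settings combine roughly as follows. This is a simplified proportional model for intuition, not the actual Compute Engine autoscaler algorithm (which also considers cool-down periods and stabilization); the function name and percentages are illustrative.

```python
def recommended_size(current_instances, avg_cpu_pct, target_cpu_pct,
                     min_instances, max_instances):
    """Scale the group so average CPU would land near the target utilization."""
    if not 0 < target_cpu_pct <= 100:
        raise ValueError("target_cpu_pct must be in (0, 100]")
    total_load = current_instances * avg_cpu_pct
    # Integer ceiling division avoids floating point rounding surprises.
    needed = (total_load + target_cpu_pct - 1) // target_cpu_pct
    return max(min_instances, min(max_instances, needed))


print(recommended_size(4, 90, 60, 2, 10))  # overloaded group scales out
print(recommended_size(4, 20, 60, 2, 10))  # idle group shrinks to the minimum
print(recommended_size(4, 95, 60, 2, 5))   # capped by the maximum
```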

    simulate Load Testing via instance
    ab -n 500000 -c 1000 http://xx.xxx.xxx.xxx/


    Cloud Deployment Manager Concepts
    Infra deployment service; Automates creation/deployment of GCP resources(configuration files and templates); Standardize and repeatable;
    Used by Cloud Launcher to create, easy one click deployments
    How it works - Deploy with command line only; IaC; calls on API resources; Configuration file YAML format; contain resource section followed by list
    of resources; Resource components - Name, Type, Properties; Templates - config file contains templates; Python or JINJA2 format; reusable
    Manifest - Read only output of final config; Includes config Yaml, imported templates, expanded resource list; use for troubleshooting

    vm.yaml
    API call needs project name
    gcloud deployment-manager deployments create test-deployment --config vm.yaml
    gcloud deployment-manager deployments delete test-deployment
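
    Since templates can be written in Python, here is a sketch of what one might look like. Deployment Manager calls GenerateConfig(context) and expects a dict with a "resources" list; the project name comes from context.env, which is where the "API call needs project name" note shows up in the resource URLs. The property names (zone, machineType, sourceImage) and the FakeContext class are assumptions for local experimentation, not part of the original vm.yaml.

```python
COMPUTE_URL_BASE = "https://www.googleapis.com/compute/v1/"


def GenerateConfig(context):
    """Build a single-VM resource list from the deployment context."""
    project = context.env["project"]
    zone = context.properties["zone"]
    zone_url = "{}projects/{}/zones/{}".format(COMPUTE_URL_BASE, project, zone)

    instance = {
        "name": context.env["name"] + "-vm",
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": "{}/machineTypes/{}".format(
                zone_url, context.properties["machineType"]),
            "disks": [{
                "deviceName": "boot",
                "type": "PERSISTENT",
                "boot": True,
                "autoDelete": True,
                "initializeParams": {
                    "sourceImage": COMPUTE_URL_BASE + context.properties["sourceImage"],
                },
            }],
            "networkInterfaces": [{
                "network": "{}projects/{}/global/networks/default".format(
                    COMPUTE_URL_BASE, project),
            }],
        },
    }
    return {"resources": [instance]}


class FakeContext:
    """Hypothetical stand-in for the context object, for local testing only."""

    def __init__(self, env, properties):
        self.env = env
        self.properties = properties


context = FakeContext(
    env={"project": "pwnet-test1", "name": "test-deployment"},
    properties={
        "zone": "us-central1-a",
        "machineType": "f1-micro",
        "sourceImage": "projects/debian-cloud/global/images/family/debian-11",
    },
)
config = GenerateConfig(context)
print(config["resources"][0]["properties"]["machineType"])
```

    Keeping the zone, machine type, and image as properties is what makes the template reusable across deployments.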


    GKE/GAE Exam Perspective
    Infrastructure, how to build, how to manage, best practices
    GKE/GAE are managed Infra, developer/code focused; High level understanding of GKE/GAE; when to choose one of them over other options;
    Managing app engine versions, resizing k8s engine cluster

    Containers
    Container Resources
    Container Builder, Container Registry, GKE
    Container/Kubernetes Engine Cluster
    gcloud container clusters create bookshelf --zone us-central1-a --machine-type f1-micro --num-nodes 3
    gcloud command to input for changing the size of node pool
    gcloud container clusters resize bookshelf --size 5
    gcloud command to change machine type without stopping cluster
    migrate the instances to different node
    gcloud container clusters delete bookshelf

    App Engine Resources and Management
    Cloud Source Repository - Private Git repo hosted on GCP; Use with stackdriver to debug info alongside your code; connect to github/bitbucket;
    source code browser
    GAE Management - Cloud shell(preview in local env without deploying); versions+split traffic(Rollout update slowly);
    Firewall rules act differently(Default allow all, control access from IP ranges, cannot filter traffic type, Block malicious IP);
    Best Practices - Break app into microservices; Rollout update slowly with split traffic; Use blue-green deployment model
    Go to App Engine directory, create sandbox env using "dev_appserver.py ./app.yaml"

    Build and Deploy a Scalable Company Website
    Deploy a Cloud Network Monitoring Service to Monitor On-Premises Network