Ref: Key words for use in RFCs to Indicate Requirement Levels (RFC 2119): https://datatracker.ietf.org/doc/html/rfc2119
- Rethink best practices from scratch every day. Rethink from first principles often. We like original thought and common sense.
- The best code is no code. Next best is less code. Use tools that others have built and tested... if it suits the problem.
- If you build it, you are responsible for it - forever.
- Write code that less clever people (future you) can understand.
- Code -> Test -> Sanity Check and then -> Deliver.
- Write a lot... Emails, YouTrack, Git commits, Changelog, Code comments and then finally code.
- We like polyglot skills. If a solution looks elegant in a different language, go for it, as long as it is maintainable. But remember Elegant <> (Quick | Easy).
- Don't adopt solutions to problems Facebook and Netflix have, unless we have them as well.
- (May none stretch the seams in putting on the coat. May it suit him well, whom it best fits - Thoreau, Walden) Don't adopt solutions because they are the most popular thing at the moment. A solution should address a tangible problem, not an itch. Figure out all possible solutions and then choose the simplest. Simple <> Easy.
- Don't solve hypothetical problems. Farsight is good. Imagination is bad.
- Be adventurous. Use a 2-week-old library from some Iranian developer on GitHub, as long as you have the confidence to maintain it if needed. Or you should be clever enough to find a fork that works.
- We are cloud native. Azure for life.
- We will never have a system admin or DB admin. We will offload administration to the cloud provider and have a single cloud administrator.
- Focus on business and application, not on infrastructure.
- Real DevOps. You are responsible for what you deploy, end-to-end.
- System architecture is evolutionary and fluid. It is not a monument cast in stone.
- We will use the Azure service that best fits our current needs. We may move to a different Azure service when we grow and scale.
- We may shop from other cloud service providers for some services. We don't like all Azure services equally.
- Not all Azure services are approved for use. We will choose on a case-by-case basis.
- Different cloud services become economically viable at different scales of operations. We will not make assumptions about our scale or build for imaginary scale. We want to see the business grow and the infra cost graph go beyond a threshold before we re-architect the cloud services. Applications should support this fluidity.
- System architecture will always reflect today's traffic and workload and ensure that IT infra spend does not become a burden to the business.
- We are as aggressive in rearranging the blocks/services in the system architecture as we are in changing the business logic, if it is justified with invoices and cost estimates.
- Applications MUST be packaged as containers.
- All application repos MUST contain one or more `Dockerfile`s.
- The `Dockerfile` MAY be a multi-stage build, and the final container with the binary MUST contain the bare minimum artifacts required for running the application.
- Applications MUST follow semantic versioning.
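A minimal multi-stage `Dockerfile` sketch for a Node app along these lines (the stage names, paths, and build script are assumptions; adapt to your repo):

```dockerfile
# Build stage: install dependencies and compile.
FROM node:20.9.0-alpine AS build
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN corepack enable && pnpm install --frozen-lockfile
COPY . .
RUN pnpm build

# Final stage: only the artifacts needed at runtime.
FROM node:20.9.0-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
CMD ["node", "dist/index.js"]
```

Everything in the build stage (compilers, dev dependencies, sources) is discarded; only `dist` and the runtime dependencies reach the final image.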
- Applications SHOULD follow 12-factor app principles. What does this mean for us?
- Applications SHOULD be agnostic about where they are running or how they are packaged.
- The application SHOULD be agnostic of whether it is running in dev/uat/prod. It will simply have different environment variables for the different environments. There SHOULD be no code flow based on environment checks.
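A small sketch of the rule, assuming a hypothetical `PAYMENTS_API_URL` variable: the app reads the value it needs rather than branching on an environment name.

```typescript
// Anti-pattern: code flow based on an environment check.
// if (env.APP_ENV === "prod") { baseUrl = "https://api.example.com"; }

// Preferred: the behaviour comes from an env var; the app never knows
// which environment it is running in. (Names here are hypothetical.)
function paymentsBaseUrl(env: Record<string, string | undefined>): string {
  const url = env.PAYMENTS_API_URL;
  if (!url) throw new Error("PAYMENTS_API_URL is required");
  return url;
}

// Each environment simply supplies a different value for the same variable.
console.log(paymentsBaseUrl({ PAYMENTS_API_URL: "https://uat.example.internal" }));
```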
- Applications MUST log to `stdout`/`stderr`. Where or how the logs are collected is not an application concern.
- All secrets MUST be read from environment variables.
- All config options SHOULD be read from environment variables.
- Env vars SHOULD have a flat structure. On start-up, the application SHOULD print to stdout a JSON object with all the env vars being used for that execution.
- Apps MUST NOT do dynamic config updates.
- If the application is dependent on another service, the endpoints for that service MUST be read from environment variables.
- The application MUST NOT make any assumptions about the network. All private and public network addresses should be taken from environment variables.
- Applications MUST NOT be dependent on the local file system except for temporary storage.
- In short, the application will work the same running from the local machine, running inside a VM, running as a standalone container, running inside a container orchestration platform as long as the dependencies are available as supplied by the environment variables.
- Applications MUST share nothing. No shared volumes etc.
- All applications MUST have a `service-definition.yml`. (More on this later.)
- All applications SHOULD have a `Changelog.md` in the project root with the application version number.
- Applications MUST respect the `LOGLEVEL` environment variable. A subset of the severity levels of the RFC 5424 standard SHOULD be supported.
- Applications MUST use UTC for all dates and times.
- All web services MUST have a health check endpoint `/health`. The health check endpoint MUST return 200 OK if the service is healthy, plus whatever details you need to determine the health of the service.
- All web services MUST be authenticated.
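A framework-agnostic sketch of a `/health` handler; the individual checks (DB, cache) are hypothetical placeholders for whatever determines health in your service:

```typescript
type Health = { status: "ok" | "degraded"; checks: Record<string, boolean> };

// Pure function computing the response; the HTTP layer just serialises it.
function healthCheck(dbReachable: boolean, cacheReachable: boolean): { code: number; body: Health } {
  const checks = { db: dbReachable, cache: cacheReachable };
  const healthy = Object.values(checks).every(Boolean);
  return {
    code: healthy ? 200 : 503,
    body: { status: healthy ? "ok" : "degraded", checks },
  };
}
```

Returning the per-dependency check results in the body gives operators the "whatever details you need" part without a separate diagnostics endpoint.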
- All HTTP calls SHOULD have exponential backoff, and timeouts SHOULD be applied.
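A sketch of retry-with-backoff plus a per-attempt timeout. The attempt count and delays here are illustrative defaults, not a mandated policy:

```typescript
// Wrap any async call with retries, exponential backoff, and a per-attempt
// timeout delivered via AbortSignal (which fetch and most HTTP clients accept).
async function withRetry<T>(
  fn: (signal: AbortSignal) => Promise<T>,
  { attempts = 4, baseMs = 200, timeoutMs = 2000 } = {}
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    const ctrl = new AbortController();
    const timer = setTimeout(() => ctrl.abort(), timeoutMs); // per-attempt timeout
    try {
      return await fn(ctrl.signal);
    } catch (err) {
      lastErr = err;
      // Exponential backoff: 200ms, 400ms, 800ms, ... before the next attempt.
      await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastErr;
}
```

Usage would look like `withRetry((signal) => fetch(url, { signal }))`.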
- Alpine Linux MUST be used as the base image.
- LTS releases SHOULD be used (v3.20 at the time of writing).
- Alpine apk packages SHOULD be pulled from stable wherever possible. If you are pulling from edge, get it approved.
- Containers MUST be tagged explicitly with the application version number or a sequential build number. Containers SHOULD never be tagged `latest`.
- Node.js LTS MUST be used (v20.9.0 at the time of writing).
- `pnpm` MUST be the package manager. (We want repeatable, cacheable builds with ease on the CI. We don't want node_modules hammering the disk.)
- `ReactJS` or `Svelte` SHOULD be the frontend framework.
- `vite` MUST be the default build tool.
- CSR and SSR are both acceptable.
- All HTTP calls SHOULD have retries with exponential backoff and timeouts configured. This is for resilience across server restarts.
- We like monoliths and want to protect against a proliferation of micro-services and HTTP calls, i.e. we don't want to go too micro.
Well written monoliths are...
- Easy to version
- Efficient (function calls - no http)
- Easy to reason about. Single code base.
- Easy to debug.
- Easy to deploy and handle infrastructure.
- Dependencies are uniform. Helps upgrades.
- Bounded context <> Micro-service.
- Bounded context = separate source file in the repo.
- Bounded context = abstract it into a versioned library if you want to share across domains.
- Micro-service <> Application server
- The definition of a micro-service is not always a separate application server. It can be a library that encapsulates all interaction with that subsystem. Or even an interface which hides the implementation details.
- Dev team structure should not determine what becomes a micro-service.
- Merge conflicts from different teams committing to the same repo are not a reason to spin off a micro-service.
- Micro-services are good in the following scenarios
- Independent resource requirements
- Independent scaling requirements
- Pharmacy and Lab are drastically different domains and can be split into micro-services. Changing one is never going to change another.
- A payments service is convenient to put in a library and share. But it is also a good micro-service candidate if you want to rate limit.
- Important: We don't want to carry bounded contexts in business logic into the infrastructure layer
- If there are 2 micro-services which are consistently being changed in tandem, they should be merged.
- All micro-services will share a common Postgres instance unless you prove that we have Netflix-level problems. You can use different schemas within the same Postgres instance to achieve service separation.
- Micro-service checklist (incomplete..)
- Why can't this be a shared library or a function ?
- What are the service's unique scaling requirements ?
- What are the service's unique resource requirements ?
- What is the impact of the service/function going down ? Isolation level ?
- How will the service handle cleanups, transactions, and rollbacks of the entities it manages? i.e. you can encapsulate a bunch of updates in a SQL transaction, all of which roll back if one fails. This is not possible across micro-services; every exception has to be handled and rolled back manually. Make sure you have a good plan for orchestrating a sequence of micro-service calls for both success and intermediate failures.
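The contrast in the last checklist item can be made concrete with a hypothetical two-account transfer (table and column names are illustrative):

```sql
-- In a single database, a multi-table update is atomic:
BEGIN;
UPDATE account SET balance = balance - 100 WHERE account_id = 'A';
UPDATE account SET balance = balance + 100 WHERE account_id = 'B';
COMMIT;  -- or ROLLBACK on any failure; both updates revert together

-- Split across two micro-services, there is no such guarantee: if the
-- debit call succeeds and the credit call fails, the caller must issue
-- an explicit compensating action to undo the debit.
```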
- Applications MUST log to `stdout`/`stderr`. Where or how the logs are collected is not an application concern.
- Applications MUST produce only structured JSON logs.
- Applications MUST respect the `LOGLEVEL` environment variable. A subset of the severity levels of the RFC 5424 standard SHOULD be supported.
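A minimal structured-JSON log formatter honouring `LOGLEVEL`. The particular subset of RFC 5424 severities shown here (error > warning > info > debug) is an assumption; pick whichever subset suits you:

```typescript
// Severities ordered from most to least severe (a subset of RFC 5424).
const LEVELS = ["error", "warning", "info", "debug"] as const;
type Level = (typeof LEVELS)[number];

// Returns one JSON log line, or null if the message is below the
// configured LOGLEVEL. Timestamp is UTC via toISOString().
function makeLogLine(level: Level, msg: string, logLevel: Level): string | null {
  if (LEVELS.indexOf(level) > LEVELS.indexOf(logLevel)) return null;
  return JSON.stringify({ ts: new Date().toISOString(), level, msg });
}
```

The caller would write the returned line to `stdout` (or `stderr` for errors) and let the platform handle collection.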
- DB migrations MUST be handled by an independent tool like Atlas Go, Flyway, Liquibase, or something similar. It should be independent of the programming language ORM/framework. We don't want application programs to assume that they own the database layer.
- DB Schema and config SHOULD live in its own separate repo.
- DB migrations SHOULD run in a separate container in the stack, alongside the application container, with releases managed together.
- Primary keys MUST be named tablename_id, e.g. the Client ID column should be named client_id, not id. (Idea borrowed from Salesforce engineering to avoid naming collisions and for better readability of the code.)
- Primary key column types SHOULD be UUID v7 or ULID. i.e Primary keys should be both random and sortable.
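A hypothetical table illustrating both conventions (names are illustrative; how the UUIDv7/ULID values are generated is left to the application or migration tool):

```sql
CREATE TABLE client (
    client_id uuid PRIMARY KEY,  -- populated with UUIDv7: random yet time-sortable
    name      text NOT NULL
);

-- Joins stay readable because the key column has the same name everywhere:
-- SELECT * FROM invoice JOIN client USING (client_id);
```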
- Avoid breaking DB changes. If you have to change a column type, add a new column, update the application, and then delete the old column in a subsequent upgrade. This avoids downtime.
- DB changes SHOULD be incremental and have small batch sizes. We don't like massive schema changes going in together.
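The add-then-drop sequence above, sketched over two releases (column names are hypothetical):

```sql
-- Release N: add the new column and backfill. Old and new application
-- versions can both run against this schema.
ALTER TABLE client ADD COLUMN phone_e164 text;
UPDATE client SET phone_e164 = phone WHERE phone_e164 IS NULL;
-- (Deploy the application version that reads and writes phone_e164.)

-- Release N+1: once nothing references the old column, drop it.
ALTER TABLE client DROP COLUMN phone;
```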
- Cache (KV) should be implemented as a library. Try to use Cosmos DB initially as the implementation. When we scale and the costs justify running a managed Redis instance, we will change the implementation within the library. (Redis > 3x Cosmos DB price for estimated usage.)
- The cache should simply be used as a KV store. Don't use Redis for streaming or pub/sub. If you have such requirements, get in touch.
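A sketch of the cache-as-a-library idea: application code depends only on a small interface, so the backing store (Cosmos DB today, managed Redis later) can be swapped inside the library without touching callers. The interface and class names here are hypothetical:

```typescript
// The only surface application code sees.
interface KvCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds?: number): Promise<void>;
}

// In-memory implementation, handy for tests; the Cosmos DB-backed (and
// later Redis-backed) implementations would satisfy the same interface.
class MemoryCache implements KvCache {
  private store = new Map<string, string>();
  async get(key: string) {
    return this.store.get(key) ?? null;
  }
  async set(key: string, value: string, _ttlSeconds?: number) {
    this.store.set(key, value); // TTL ignored in this sketch
  }
}
```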
- All services MUST have a service-definition.yml file.
- Template coming soon...
- All cloud resources MUST fall under one of the following resource groups:
- HTI-Core-Prod
- HTI-Core-Staging
- HTI-Core-Dev
- HTI-DigitalMarketing-Prod
- HTI-DigitalMarketing-Staging
- HTI-World-Root (Globals like Hosted zones etc..)
- All cloud resources MUST be created with tags like so
- Owner -> git email
- Group -> One of the predefined HTI resource groups
- If a resource is found without the above tags, it will be automatically deprovisioned by a script that runs occasionally.
- All cloud resources for use in production MUST go through the cloud admin. Request should be raised with sufficient notice.
- Using a new cloud service will depend on the use case and MUST be pre-approved by the architect. Explain your use case and get it approved before you develop. We don't like all Azure services. Some are garbage.
- The following regions MUST be used for the specified purposes. Don't mix and match or use any other region without approval.
- India West - Prod
- India South - UAT/Staging
- India Central - Dev