Cloud-based microservice frameworks
Some of the open-source platforms available are Docker Swarm, Kubernetes (Google), and Mesos.
The most popular in the community and the internet industry seems to be Kubernetes, and it is picking up steam on the telecom front as well for the upcoming 5G Service-Based Architecture.
Kubernetes can work with more than one container runtime (rkt being one option), but the most popular combination uses Docker as the container runtime.
Kubernetes, a cloud orchestrator !!
Deployment automation of scaling out (increasing instances) and scaling in (decreasing instances).
Network plugins available include Flannel (popular, supports only IPv4), Calico (supports IPv4 and IPv6), and Weave Net.
As of version 1.13 (Dec 2018), Kubernetes does not support dual-stack IPv4/IPv6 inter-working. Another limitation: even if multiple interfaces are enabled for a Pod, Kubernetes does not recognize them for configuring service exposure and external communication (also true up to version 1.13, Dec 2018).
Will be adding more on Kubernetes networking in the coming days ...
SNAT/ DNAT
Kube-proxy has three modes to manage the routing between the different Kubernetes elements (e.g. Pods). It has evolved from user-space to iptables to IPVS, with iptables kept as the default. The problem with user-space mode was that every packet entering Kubernetes had to go from the kernel to user space and back to the kernel to get routed to the correct Pod instance. With iptables the kernel-to-user-space switching is avoided, but another problem appears when one defines thousands of services: each service definition adds an iptables entry for the virtual IP configured as the service IP. With thousands of such IPs configured in iptables, performance suffers, so k8s introduced IPVS.
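A quick way to check which mode a running kube-proxy is in (a minimal sketch; the ConfigMap name/namespace assumes a kubeadm-style cluster and the metrics port is the default 10249):

# Inspect the configured mode in the kube-proxy ConfigMap (kubeadm-style clusters)
kubectl -n kube-system get configmap kube-proxy -o yaml | grep -A1 "mode:"

# Ask a running kube-proxy which mode it is actually using (run on a node)
curl -s http://localhost:10249/proxyMode

# In IPVS mode, the virtual servers can be listed with ipvsadm (if installed)
sudo ipvsadm -Ln | head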
When packets travel into the k8s environment using the iptables/IPVS modes (kube-proxy), their destination address is modified from the service IP to the Pod IP before they are routed to a selected Pod. Let's understand what this routing is: when a packet lands on a service IP, k8s does not know your application logic, i.e. which Pod it should be routed to. The service encompasses multiple Pods running the same application logic in the back end. Given that, k8s simply selects an appropriate Pod (based on the k8s attributes defined for Pod selection) and routes the packet to the selected one. Hence any legacy application being ported to Kubernetes needs to get rid of hard bindings for its service exposure: any Pod can receive any packet for connection establishment, and thereafter packets are guaranteed to land on the same Pod where the connection was established.
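To see this on a node, the iptables chains that kube-proxy programs can be inspected; a rough sketch (KUBE-SERVICES/KUBE-SVC-*/KUBE-SEP-* are the chains kube-proxy creates in iptables mode, and my-service is a hypothetical service name):

# Every packet to a service virtual IP first hits the KUBE-SERVICES chain in the nat table
sudo iptables -t nat -L KUBE-SERVICES -n | grep my-service

# Each service gets a KUBE-SVC-* chain that statistically picks one KUBE-SEP-* endpoint chain
sudo iptables -t nat -S | grep KUBE-SVC | head

# A KUBE-SEP-* chain ends in a DNAT rule rewriting the service IP to a selected Pod IP:port
sudo iptables -t nat -S | grep -m 5 DNAT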
Networking in Kubernetes (a fast-moving topic)
Plugins available are:
Flannel/ Calico/ Weavenet
Contiv/ Cilium/ etc
==================
>>>>>>>>>> K8S Networking
https://dzone.com/articles/how-to-understand-and-setup-kubernetes-networking
Kubernetes default bridge - cbr0
Docker default bridge - docker0
Kubernetes networking challenges
Pod to Pod - intra-node communication - gets routed via cbr0/docker0 and does not go to the physical NIC of the node. No iptables involvement.
Pod to Pod - inter-node communication - gets redirected to the external network: after checking that the destination Pod IP is not available locally, cbr0/docker0 sends the packet to the physical NIC. No iptables involvement.
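A hedged way to see this intra-node vs. inter-node split is the node's routing table; the bridge name (cbr0, docker0, or cni0) and the Pod CIDR routes depend on the CNI plugin in use:

# The local Pod subnet is reachable through the node's container bridge
ip route | grep -E "cbr0|docker0|cni0"

# Pod subnets of other nodes are routed via the node network or an overlay device
ip route | grep -v default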
Egress/ Ingress Traffic
Ingress traffic - hits the public IP address exposed using a Service of type LoadBalancer, which then routes the packet to a specific node, destined for the specific NodePort where the service is running.
The packet lands on the CNE with the load balancer IP as the destination.
The packet lands on a node, with the node IP as the destination.
The packet lands on a Pod, with the Pod IP as the destination.
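A minimal sketch of the port mapping behind those three hops, assuming a hypothetical backend selected by app: web and listening on 8080; the LoadBalancer IP maps to a NodePort on every node, which in turn maps to the Pod's targetPort:

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: web                # hypothetical service name
spec:
  type: LoadBalancer       # external LB IP -> nodePort -> targetPort inside the Pod
  selector:
    app: web
  ports:
  - protocol: TCP
    port: 80               # port on the load-balancer/service IP
    targetPort: 8080       # container port inside the Pod
    nodePort: 30080        # port opened on every node (auto-assigned if omitted)
EOF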
Egress Traffic -
When a connection to an external IP is initiated by a Pod, the source IP is the Pod IP... more iptables rules, added by kube-proxy, do the SNAT (Source Network Address Translation), aka IP MASQUERADE. This tells the kernel to use the IP of the interface this packet is going out from in place of the source Pod IP. A conntrack entry is also kept to un-SNAT the reply.
Kube-proxy - like many other controllers in Kubernetes, it watches the API server for endpoint changes and updates the iptables rules accordingly.
Due to these iptables rules, whenever a packet is destined for a service IP, it’s DNATed (DNAT=Destination Network Address Translation), meaning the destination IP is changed from service IP to one of the endpoints — pod IP — chosen at random by iptables. This makes sure the load is evenly distributed among the backend pods.
When this DNAT happens, this info is stored in conntrack — the Linux connection tracking table (stores 5-tuple translations iptables has done: protocol, srcIP, srcPort, dstIP, dstPort). This is so that when a reply comes back, it can un-DNAT, meaning change the source IP from the Pod IP to the Service IP. This way, the client is unaware of how the packet flow is handled behind the scenes.
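These translations can be observed on a node with the conntrack tool (a hedged sketch; the conntrack package must be installed, and 10.96.0.0/12 is only the common default service CIDR):

# Dump the connection tracking table and look for entries involving service IPs
sudo conntrack -L | grep "10.96." | head

# Established TCP flows keep both directions, so replies can be un-DNATed back to the service IP
sudo conntrack -L -p tcp --state ESTABLISHED | head

# The SNAT/MASQUERADE side added by kube-proxy lives in the KUBE-POSTROUTING chain
sudo iptables -t nat -S KUBE-POSTROUTING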
network namespace
On every Kubernetes node, which is a Linux machine in this case, there's a root network namespace (root as in base, not the superuser): the root netns. The main network interface eth0 is in this root netns. Similarly, each pod has its own netns, with a virtual ethernet pair connecting it to the root netns. This is basically a pipe-pair with one end in the root netns and the other in the pod netns.
We name the pod-end eth0, so the pod doesn’t know about the underlying host and thinks that it has its own root network setup. The other end is named something like vethxxx.
You may list all these interfaces on your node using ifconfig or ip a commands.
This is done for all the pods on the node. For these pods to talk to each other, a linux ethernet bridge cbr0 is used. Docker uses a similar bridge named docker0. (use brctl show to see them)
The cbr0 bridge does ARPing to detect who has the destination IP: it sends a broadcast in the network subnet and, if someone replies, it sends the outbound packet to that Pod.
Question: does it keep a mapping of MAC addresses? Do Pod interfaces have MAC addresses? If not, how do these bridges keep that information registered, or does every packet trigger such ARP requests?
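A hedged way to poke at this from the node: veth interfaces do carry MAC addresses, and both the bridge's learned-MAC table and the ARP cache can be listed directly (the bridge name cbr0 below depends on the setup):

# Host side of each Pod's veth pair (the Pod end shows up as eth0 inside the Pod)
ip -br link show type veth

# Which bridge the veth ends are attached to (cbr0, docker0 or cni0 depending on the CNI)
brctl show

# MAC addresses the bridge has learned per port, so not every packet needs a fresh ARP
brctl showmacs cbr0

# ARP/neighbour cache of the root network namespace
ip neigh show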
Need of Overlay networks
Overlay networks are not required by default, however, they help in specific situations.
Like when we don’t have enough IP space, or network can’t handle the extra routes. Or maybe when we want some extra management features the overlays provide. One commonly seen case is when there’s a limit of how many routes the cloud provider route tables can handle.
For example, AWS route tables support up to 50 routes without impacting network performance. So if we have more than 50 Kubernetes nodes, AWS route table won’t be enough. In such cases, using an overlay network helps.
For example, in the Flannel case, a new virtual ethernet device called flannel0 is added to the root netns. With it, packets always pass through flannel0 when they move between cbr0 and eth0 for ingress/egress.
The flanneld daemon moves the routing burden from the kernel routing space to a daemon process. As flanneld talks to the Kubernetes API server or the underlying etcd, it knows about all the Pod IPs and which nodes they are on, so Flannel creates the mappings (in user space) from Pod IPs to node IPs.
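To check this on a Flannel node, a rough sketch (the device is flannel0 with the older UDP backend and flannel.1 with the VXLAN backend; the subnet file path is the Flannel default):

# The per-node Pod subnet that flanneld leased from the cluster-wide network
cat /run/flannel/subnet.env

# The flannel device that packets traverse between cbr0 and eth0
ip -d link show flannel0 2>/dev/null || ip -d link show flannel.1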
There could be slight differences among different implementations, but this is how overlay networks in Kubernetes work. There’s a common misconception that we have to use overlays when using Kubernetes. The truth is, it completely depends on the specific scenarios. So make sure you use it only when it’s absolutely needed.
==================
Multiple interfaces - Genie/ Multus
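For reference, a hedged sketch of how a second interface is usually requested with Multus: a NetworkAttachmentDefinition describes the extra network (the macvlan config, names and subnet below are made-up examples) and the Pod asks for it via the k8s.v1.cni.cncf.io/networks annotation; Genie uses its own annotation format, so check its docs.

cat <<'EOF' | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf                 # hypothetical secondary network
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "eth1",
             "ipam": { "type": "host-local", "subnet": "192.168.10.0/24" } }'
---
apiVersion: v1
kind: Pod
metadata:
  name: multi-if-pod                 # hypothetical Pod name
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-conf   # extra interface, shows up as net1 in the Pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
EOF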
Service Mesh in cloud environment
As microservice architectures started getting traction, the need arose to further isolate the inter-microservice communication logic from the developer and provide it built-in via standard plugins. That is the space where Istio and Envoy come into play. Istio is open source and provides the service mesh via a proxy service; Envoy, built by Lyft (the ride-hailing company), is the most popular option to use as the proxy for Istio.
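A minimal sketch of how the Envoy proxy typically gets attached when Istio with automatic sidecar injection is installed; the namespace name demo is hypothetical:

# Label a namespace so Istio's admission webhook injects the Envoy sidecar into new Pods
kubectl label namespace demo istio-injection=enabled

# After (re)deploying, each Pod carries an extra istio-proxy (Envoy) container
kubectl -n demo get pods -o jsonpath='{.items[*].spec.containers[*].name}'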
Service Exposure Architecture
With the Kubernetes framework you get a lot of options suited to internet or e-commerce style services, and they might not be suitable for every industry. So this is about moving from a purely dynamic scenario to a somewhat more predictable one.
CPUSet/ NUMA
Kubernetes allows defining the CPU (cpuset) and memory (RAM) allocation for a container; a hedged resource-limits sketch follows the links below.
The cloud application can use the following to check the resources allocated and their usage:
top, procstat, ss, nset
NUMA - is it applicable for a container? It may not be.
Memory and CPU resource
memory stats inside container - https://fabiokung.com/2014/03/13/memory-inside-linux-containers/
Knowing CPU allocated from within container - https://engineering.squarespace.com/blog/2017/understanding-linux-container-scheduling
cpu allocation to container - https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b
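A hedged sketch tying these together: requests/limits in the Pod spec, and the cgroup (v1) files an application can read from inside the container to see what it was actually given; the Pod, container and image names are made up.

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: sized-pod                    # hypothetical Pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "500m"                  # scheduling guarantee, mapped to CPU shares
        memory: "256Mi"
      limits:
        cpu: "1"                     # enforced via the CFS quota
        memory: "512Mi"              # enforced via the memory cgroup (OOM kill above this)
EOF

# From inside the container (cgroup v1 paths): the limits the kernel actually enforces
kubectl exec sized-pod -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes
kubectl exec sized-pod -- cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us /sys/fs/cgroup/cpu/cpu.cfs_period_us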
Can socket options for sockets created inside containers be changed?
There are options to set socket settings in a cloud container runtime:
1# By default, the container inherits the socket settings from the host.
2# Using safe (namespaced) sysctls, a few settings can be done at the Pod/container level (see the sketch below).
https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/
3# Can the application itself modify the sockets it creates?
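A minimal sketch of option 2#, using the Pod-level securityContext for namespaced ("safe") sysctls as described in the link above; the sysctl chosen here is just an example:

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-demo                          # hypothetical Pod
spec:
  securityContext:
    sysctls:
    - name: net.ipv4.ip_local_port_range     # a namespaced ("safe") sysctl
      value: "1024 65000"
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
EOF

# Verify from inside the container
kubectl exec sysctl-demo -- cat /proc/sys/net/ipv4/ip_local_port_range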
Container Application Platform - Platform as a Service (PaaS)
Kubernetes - by Google, now open source; an orchestration tool that helps manage the deployment of applications.
OpenShift - https://www.openshift.com by Red Hat, written in Go and AngularJS. OpenShift is a Platform as a Service, which is a layer on top of IaaS (e.g. OpenStack).
OpenStack - written mostly in Python; it is an open-source software platform for cloud computing.
Cloud Foundry - PaaS
Docker Swarm - achieves similar functionality to Kubernetes.
Why the cloud needs external load balancers -
You would have read above about how services are deployed in a cloud environment, with their respective service logic running inside Pod instances. A Pod is nothing but a namespaced environment to run one or more containers (the typical configuration runs a single container, but there are cases with two containers in a sidecar model, or where the second container is a headless service supporting the main container). Kube-proxy handles load balancing at the node/card/hardware-box level very well. The current default kube-proxy mode is iptables, which distributes incoming traffic among the Pods on that node with a random distribution. Kube-proxy also supports IPVS mode, i.e. IP Virtual Server running in the kernel, which reduces the iptables-rule overhead when deploying thousands of services in a cluster.
Let's get back to the topic: node-level distribution is handled by kube-proxy, but who distributes traffic to the nodes in the cluster? That is where the load balancer (LB) comes into the picture; the LB distributes the traffic to one of the nodes in the cluster. There are many L4-aware load balancers from popular cloud infrastructure providers. We are currently handling load balancing for bare-metal cards, and MetalLB is a known load-balancer service used by some vendors in their products. MetalLB works at the L2/L3 layer of the traffic and redirects/distributes it to one of the nodes in the cluster. MetalLB requires public IPs assigned for allocation to services, and the service definition in Kubernetes needs to define annotations to request the LB public IP address for accessing the service from the external world. Refer to the official link: https://metallb.universe.tf/usage/example/
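A hedged sketch of such a service definition, in line with the MetalLB usage example linked above: type LoadBalancer plus either an address-pool annotation or an explicit loadBalancerIP; the pool name and IP are placeholders that must match the MetalLB configuration.

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: my-app-lb                                   # hypothetical service
  annotations:
    metallb.universe.tf/address-pool: public-pool   # pool defined in the MetalLB ConfigMap
spec:
  type: LoadBalancer
  # loadBalancerIP: 203.0.113.10                    # optionally pin a specific IP from the pool
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
EOF

# The external IP allocated by MetalLB shows up under EXTERNAL-IP
kubectl get svc my-app-lb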
Difference between MetalLB and other load balancers (cloud-based, etc.)??
MetalLB
HAProxy
NGINX
Excellent explanation - https://www.dynatrace.com/news/blog/containers-relate-kubernetes-matters-openstack/
Database as a Service (DaaS)
MongoDB
Cloud Native Computing Foundation (https://www.cncf.io/) - https://horovits.wordpress.com/2015/08/13/cloud-native-computing-foundation-standardize-cloud-containers/
Let K8S Help you
Horizontal Pod Autoscaler - the HPA concept from Kubernetes
It is based on the system metrics collected from the Pods/containers. As of today, metrics-server is used as the measurement pipeline that feeds the K8S HPA logic; prior to that, K8S used Heapster for this purpose. Prometheus also supports similar functionality and is extensible so that custom application measurements can be plugged in. A hedged HPA sketch follows the links below.
https://brancz.com/2018/01/05/prometheus-vs-heapster-vs-kubernetes-metrics-apis/
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/autoscaling/horizontal-pod-autoscaler.md
https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
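A minimal HPA sketch driven by the CPU metric from metrics-server; the Deployment name web is hypothetical, and the same object can be created with kubectl autoscale:

cat <<'EOF' | kubectl apply -f -
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                          # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70   # average CPU (from metrics-server) that triggers scaling
EOF

# Equivalent one-liner
kubectl autoscale deployment web --cpu-percent=70 --min=2 --max=10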
Taints/ Tolerations
Prevent the load balancer from routing packets to a node that has no relevant Pods.
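A hedged sketch of the mechanics: taint the nodes you want to reserve, and give only the intended Pods a matching toleration (node name, key and values are made up):

# Taint a node: Pods without a matching toleration will not be scheduled on it
kubectl taint nodes node-1 dedicated=frontend:NoSchedule

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: frontend-pod                   # hypothetical Pod allowed on the tainted node
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "frontend"
    effect: "NoSchedule"
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
EOF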
SCTP support in Kubernetes
Kubernetes officially announced SCTP support as alpha in the 1.12 release and carried it forward in 1.13 (still behind a feature gate). The SCTP protocol is supported in K8S for internal services as well as for user applications, at the DNS discovery/service/cluster level, etc. Currently this support is limited to a single IP address; multihoming is not available as of April 29, 2019. K8S also has the limitation of not supporting multiple interfaces for Pods, which restricts an application to receiving all traffic on a single container interface and provides no way to segregate traffic per interface, if you remember how it used to be in our old days of blades/cards sitting in a chassis. Though multiple interfaces can be attached to a container using CNI plugins such as Genie (Huawei) or Multus (Intel), Kubernetes only recognizes the first interface for its Service and cluster interworking functionality, so design your applications accordingly. With the communities working on the open-source code, the day may not be far when K8S starts supporting this as well. A hedged SCTP service sketch follows the links below.
Genie - https://github.com/Huawei-PaaS/CNI-Genie/blob/master/docs/GettingStarted.md
Multus - https://builders.intel.com/docs/networkbuilders/adv-network-features-in-kubernetes-app-note.pdf
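A hedged sketch of an SCTP service in that 1.12/1.13 timeframe: the SCTPSupport feature gate has to be enabled on the control plane and kube-proxy, the CNI plugin must pass SCTP through, and the names/ports below are placeholders.

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: sctp-signalling-svc      # hypothetical SCTP-based service
spec:
  selector:
    app: sctp-app
  ports:
  - name: sctp-port
    protocol: SCTP               # needs the SCTPSupport feature gate in 1.12/1.13
    port: 3868
    targetPort: 3868
EOF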
DB Service/ Architecture
Provided via a bare-metal deployment and HA architecture
Provided via a cloud-based, layered-service architecture
Provided via a distributed DB architecture within the service
Persistent Storage
Logging
ELK/EFK are quite popular solutions for cloud logging. The logging architecture is basically this: a log client runs against each Pod/container defined in your service/namespace architecture, the container logs emitted on stdout/stderr are sent through the logging pipeline to a log server running on each node, and the logging service can then be accessed via its service IP/port to view the collected application logs for filtering/analysis purposes.
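For orientation, a hedged sketch of where those stdout/stderr logs can be looked at before and after the pipeline picks them up; the Pod/container names are placeholders and the node paths are the common kubelet/Docker defaults:

# Straight from the API server / kubelet
kubectl logs my-pod -c my-container --since=10m

# On the node: per-container log files that node-level agents (e.g. Fluentd) tail
ls /var/log/containers/
ls /var/log/pods/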
Measurement
Components - Prometheus, Alertmanager, Grafana
1 - Prometheus - a pull agent that retrieves the metrics exposed by a microservice via a port and API path.
Recently, for HA improvements, Prometheus has been enabled with an operator that lets microservices be discovered by Prometheus so their metrics can be scraped; the operator keeps track of discovering the running containers (using selected label tags).
2 - Alertmanager - works with rules that use the metrics from Prometheus (a TSDB, time-series DB) for alarming and alerting the operator; an example alert rule is sketched below.
3 - Grafana - further uses the Prometheus DB (TSDB) to fetch the metrics and present them via graphical charts, as a running/active dashboard for KPIs.
There are many other time-series DB offerings as well - Redis, VictoriaMetrics, etc.
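For illustration, a small alerting-rule file in the format from the Prometheus link above; the metric, job label and thresholds are invented for the example.

cat <<'EOF' > example-rules.yml
groups:
- name: example-alerts
  rules:
  - alert: HighRequestLatency            # hypothetical alert name
    expr: http_request_duration_seconds{quantile="0.9", job="my-app"} > 0.5
    for: 5m                              # condition must hold for 5 minutes before firing
    labels:
      severity: warning
    annotations:
      summary: "90th percentile latency above 500ms on {{ $labels.instance }}"
EOF

# Validate the rule file with promtool (ships with Prometheus)
promtool check rules example-rules.yml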
Tracing
Integration and Delivery Model
Git is quite popular among cloud-based application development projects, which typically use Jenkins to automate the build and sanity cycles.
Benefits of Git: widely used among open-source projects, freely available, and easy to integrate with automation frameworks.
About Git tagging/labeling the software code: http://airlinefiesta.com/git-branch-diagram/a-successful-git-branching-model-with-enterprise-support-seven-git-branch-diagram/
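A quick sketch of the tagging side of such a branching model (branch and version names are placeholders):

# Cut a release branch and mark the release with an annotated tag
git checkout -b release/1.2 develop
git tag -a v1.2.0 -m "release 1.2.0"

# Push the branch and the tag so CI (e.g. Jenkins) can pick them up
git push origin release/1.2
git push origin v1.2.0

# List existing tags
git tag -l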
CNCF - Cloud Native Computing Foundation
Check the health and approval status of third-party/open-source projects for the cloud-native environment at https://www.cncf.io/projects/
https://landscape.cncf.io/?project=graduated
Build tools - Jenkins
https://dzone.com/articles/learn-how-to-setup-a-cicd-pipeline-from-scratch
Build environment - Gradle for Java-based source code compilation. For the C/C++ code base, use ...
Analytics Tools
Tableau - Python based
Orange - Python based