Jan 15, 2020
Conftest is a tool that helps you write tests against structured configuration data. It relies on Rego, a query language that ships with a rich set of ready-to-use built-in functions. With it, you can write tests against the config types below:
- YAML/JSON
- INI
- TOML
- HOCON
- HCL/HCL2
- CUE
- Dockerfile
- EDN
- XML
When it comes to conftest's pros and cons, it has some unique features that other testing tools don't.
Pros:
You can:
- write declarative tests (policies) that go beyond simple assertions (a minimal example follows below).
- write tests against many kinds of config types.
- use the --combine flag to merge different files into one context so their values can be referenced globally.
- use the parse command to see how the inputs are parsed.
- combine different input types in one test run and apply a combined policy against them.
- pull/push policies from different kinds of sources such as S3, a Docker registry, a GitHub file, etc.
- find real-world examples in the examples/ folder.
Cons:
- Learning Rego can take a little time.
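As a quick illustration, here is what a minimal policy and a test run might look like. The manifest name and the policy itself are placeholders I made up for this sketch; conftest looks for policies in the ./policy directory by default.
# a tiny Rego policy: deny Deployments whose pods may run as root
mkdir -p policy && cat > policy/deployment.rego <<'EOF'
package main

deny[msg] {
  input.kind == "Deployment"
  not input.spec.template.spec.securityContext.runAsNonRoot
  msg := "containers must not run as root"
}
EOF

# test a manifest against the policy
conftest test deployment.yaml

# see how conftest parses the input
conftest parse deployment.yaml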
Finally, I encourage folks to look at both conftest's source code and the Rego language.
It's a simple, single-threaded command-line tool. I recommend integrating it into your organization, and PRs are welcome.
Here's the repo: https://github.com/instrumenta/conftest
Thanks!
Apr 16, 2019
Hello All,
In this article, I'm gonna show you how we moved our ETL processes to Spark jobs that run as Kubernetes pods.
Before that, we used custom Python code for our ETLs.
The problem with that approach was the need for a distributed key-value store; when we picked a solution like Redis, it created too much internal I/O between the slave Docker containers and Redis. The performance with Spark is much better.
Also, the master creates a number of slaves and manages the containers. Sometimes the docker-py library fails to communicate with the Docker engine and the master can't delete the slave or Redis containers, which causes idempotency problems.
You also have to distribute the slave containers across your Docker cluster, which means putting too many cross-functional concerns next to your business code.
We inspected the Spark-on-Kubernetes documentation because we were already using Kubernetes in our production environment.
We use Spark 2.3.3 for running on Kubernetes.
You can have a look at this: https://spark.apache.org/docs/2.3.3/running-on-kubernetes.html
Even though the Spark documentation says the feature is experimental for now, we started running Spark jobs on our Kubernetes cluster.
This feature allows us to run Spark across our cluster:
- It's easy to use.
- It's secured, because you have to create a dedicated service account for the Spark driver and executors.
- It exposes enough Kubernetes parameters (node selector for computation, core limits, number of executors, etc.).
We bundled the spark-submit script with our artifact JAR.
After this step, the Docker container can make a request to the k8s master and start the driver pod, and the driver pod creates the executors from the same image.
This allows us to bundle everything in one image. If the code changes, CI creates a new bundle and publishes it to the registry.
The image below describes the architecture.

First of all, you have to create a base image.
Download the "spark-2.3.3-bin-hadoop2.7" from here https://spark.apache.org/downloads.html and unzip it.
Create an image from this.
./bin/docker-image-tool.sh -r internal-registry-url.com:5000 -t base build
./bin/docker-image-tool.sh -r internal-registry-url.com:5000 -t base push
We created a multi-stage Dockerfile like this:
FROM hseeberger/scala-sbt:11.0.1_2.12.7_1.2.6 AS build-env
COPY . /app
WORKDIR /app
ENV SPARK_APPLICATION_MAIN_CLASS Main
RUN sbt update && \
sbt clean assembly
RUN SPARK_APPLICATION_JAR_LOCATION=`find /app/target -iname '*-assembly-*.jar' | head -n1` && \
export SPARK_APPLICATION_JAR_LOCATION && \
mkdir /publish && \
cp -R ${SPARK_APPLICATION_JAR_LOCATION} /publish/ && \
ls -la ${SPARK_APPLICATION_JAR_LOCATION} && \
ls -la /publish
FROM internal-registry-url.com:5000/spark:base
RUN apk add --no-cache tzdata
ENV TZ=Europe/Istanbul
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
COPY --from=build-env /publish/* /opt/spark/examples/jars/
COPY --from=build-env /app/secrets/* /opt/spark/secrets/
COPY --from=build-env /app/run.sh /opt/spark/
WORKDIR /opt/spark
CMD [ "/opt/spark/run.sh" ]
And our run.sh script looks like this:
#!/bin/bash
bin/spark-submit \
--master k8s://https://${KUBERNETS_MASTER}:6443 \
--deploy-mode cluster \
--name coverage-${MORDOR_ENV} \
--class Main \
--conf spark.executor.instances=${NUMBER_OF_EXECUTORS} \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.driverEnv.MORDOR_ENV=${MORDOR_ENV} \
--conf spark.kubernetes.driver.label.app=coverage-${MORDOR_ENV} \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.container.image=internal-registry-url.com:5000/coveragecalculator:${VERSION} \
--conf spark.kubernetes.driver.pod.name=coverage-${MORDOR_ENV} \
--conf spark.kubernetes.authenticate.submission.caCertFile=/opt/spark/secrets/${CRT_FILE} \
--conf spark.kubernetes.authenticate.submission.oauthToken=${CRT_TOKEN} \
--conf spark.kubernetes.driver.limit.cores=${DRIVER_CORE_LIMIT} \
--conf spark.kubernetes.executor.limit.cores=${EXECUTOR_CORE_LIMIT} \
local:///opt/spark/examples/jars/CoverageCalculator-assembly-0.1.jar
Notice that you have to place the secrets in the secrets/ folder in order to create the pods with a single image.
After the driver pod is created, it uses the internal executor pod creation logic that also ships in the spark:base image, as described in the Spark-on-Kubernetes documentation.
We created the pipelines as build-push -> run-on-qa-cluster -> run-on-preprod-cluster -> run-on-prod-cluster

The run scripts placed in the pipeline pass the parameters to run.sh, and we run it like this:
docker run -i --entrypoint /bin/bash -e KUBERNETS_MASTER='yourkubernetesmasterip' -e NUMBER_OF_EXECUTORS=5 -e MORDOR_ENV='qa' -e VERSION=$GO_PIPELINE_LABEL -e CRT_FILE='non_prod_ca.crt' -e CRT_TOKEN='THE_USER_CRT_TOKEN' -e DRIVER_CORE_LIMIT=2 -e EXECUTOR_CORE_LIMIT=2 -v /etc/resolv.conf:/etc/resolv.conf:ro -v /etc/localtime:/etc/localtime:ro 192.168.57.20:5000/coveragecalculator:$GO_PIPELINE_LABEL /opt/spark/run.sh
This command creates one driver pod with a core limit of 2.
After that, 5 executor pods are created from the spark:base image, each of them also with a core limit of 2.
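To verify, you can list the pods created by the submission. This assumes kubectl points at the same cluster and that the executor pod names share the app-name prefix ("coverage-qa" in this example); the driver pod name is set explicitly via spark.kubernetes.driver.pod.name above.
# list the driver and executor pods
kubectl get pods | grep coverage-qa
# follow the driver logs
kubectl logs -f coverage-qa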

Nov 19, 2018
Hi All,
In this article, I'm gonna talk about the Kubernetes Horizontal Pod Autoscaler object, the Custom Metrics API, and how we scale our APIs at Hepsiburada.
Before digging into HPA, take a look at https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale
HPA determines whether we need more pods and scales the number of Pods accordingly. You can scale on CPU and memory metrics using the "K8s Metrics Server".
However, Kubernetes 1.6 added support for custom metrics in the Horizontal Pod Autoscaler. With custom metrics, you can attach InfluxDB, Prometheus, or another third-party time-series database.
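For the basic CPU path, a resource-based HPA can be created in one line; the deployment name here is just a placeholder:
kubectl autoscale deployment podinfo --cpu-percent=80 --min=2 --max=10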
There is a nice project with ready-to-go YAMLs on GitHub, https://github.com/stefanprodan/k8s-prom-hpa, which describes the autoscale mechanism in detail.
Prometheus collects metrics from your applications/pods and stores them. You can control scraping with annotations in your deployment YAMLs.
The default path is "/metrics":
annotations:
  prometheus.io/scrape: 'true'
  prometheus.io/path: '/metrics-text'
The Custom Metrics API is responsible for collecting data from Prometheus and passing it to the HPA.
After you connect your HPA, you can test and verify that it's working properly:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
The exposed metrics, which also exist in Prometheus, are shown below.

For example, the "application_httprequests_active" metric is exposed by our API and can be used in an HPA like this:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: podinfo
  minReplicas: 5
  maxReplicas: 40
  metrics:
  - type: Pods
    pods:
      metricName: application_httprequests_active
      targetAverageValue: 1000
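Once the HPA is applied, you can check both the raw metric exposed through the custom metrics API and the autoscaler itself. The metric name and the default namespace below follow the example above; adjust them to your setup.
# read the current value of the custom metric for all pods in the default namespace
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/application_httprequests_active" | jq .
# watch the autoscaler's current/target values and the replica count
kubectl get hpa podinfo -w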
Below are snapshots of our Grafana dashboards, which are connected to Prometheus and show the autoscaling in Kubernetes. You can inspect the pod memory, and the newly created pods can be seen there. At "07:56" and "08:00" people started using the Search API more, and after the scaling kicked in, the metrics returned to normal.

Jul 17, 2018
It's been a long time since I wrote my last post. In this period, I mostly dug into Kubernetes. Kubernetes is a deployment automation system that manages containers in distributed environments. It simplifies common tasks like deployment, scaling, configuration, versioning, log management, and a lot more.
In this article, you will find how a dotnetcore app can be put into Kubernetes using blue-green deployment and pipeline as code. In this case, I used GoCD and its YAML plugin: https://github.com/tomzo/gocd-yaml-config-plugin
First of all, you have to dockerise your dotnetcore app. Here is an example snippet:
FROM microsoft/dotnet:2.0.5-sdk-2.1.4 AS build-env
WORKDIR /workdir
COPY . /workdir
RUN dotnet restore ./WebApp.sln
RUN dotnet test ./src/tests/WebApp.IntegrationTests
RUN dotnet test ./src/tests/WebApp.UnitTests
RUN dotnet publish ./src/WebApp/WebApp.csproj -c Release -o /publish
FROM microsoft/dotnet:2.0.5-runtime
WORKDIR /app
COPY --from=build-env ./publish .
EXPOSE 3333/tcp
CMD ["dotnet", "WebApp.dll", "--server.urls", "http://*:3333"]
After that, put a "kubernetes" folder in your project's root. The folder structure can be like this:
- kubernetes
-- deployment.yaml
-- service.yaml
-- switch_environment.sh
- src
....
- ci.gocd.yaml
- Dockerfile
- WebApp.sln
Your "deployment.yaml" should be like this :
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: webapp-${ENV}
spec:
  replicas: ${PODS}
  template:
    metadata:
      labels:
        app: webapp
        ENV: ${ENV}
    spec:
      containers:
      - name: webapp
        image: yourdockerregistry:5000/webapp:${IMAGE_TAG}
        resources:
          requests:
            cpu: "750m"
        ports:
        - containerPort: 3333
        readinessProbe:
          tcpSocket:
            port: 3333
          initialDelaySeconds: 15
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /status
            port: 3333
          initialDelaySeconds: 15
          periodSeconds: 10
      terminationGracePeriodSeconds: 30
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-${ENV}
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: webapp-${ENV}
  minReplicas: 10
  maxReplicas: 25
  metrics:
  - type: Pods
    pods:
      metricName: cpu_usage # metric coming from Prometheus; list available metrics with: kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
      targetAverageValue: 0.6 # if the average cpu_usage per pod goes over 0.6 (60%), pods will be scaled up
In this snippet, you will see some environment variables for parametric values like the image tag, the deployment environment, the blue-green switch, etc.
You can also use Helm for rolling deployments and version bump-ups, but I will use a much simpler tool: "envsubst".
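For instance, rendering and applying the deployment manifest with envsubst looks like this; the tag value is just a placeholder:
ENV=blue IMAGE_TAG=1.1.42 PODS=10 envsubst < deployment.yaml | kubectl apply -f -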
The other mechanism is horizontal scaling in the cluster. You can merge the deployment and the scaling into one YAML.
In this instance, I used the k8s custom metrics API.
Take a look if you want, or just skip it: https://github.com/stefanprodan/k8s-prom-hpa
And the service.yaml should be like this:
apiVersion: v1
kind: Service
metadata:
  name: webapp-svc
spec:
  type: NodePort
  ports:
  - port: 3333
    nodePort: 30333
    targetPort: 3333
    protocol: TCP
    name: http
  selector:
    app: webapp
    ENV: ${ENV}
We will use k8s selectors to get a blue-green switch for deployments. The selector will pick the matching pods and bind them to the service.
I used NodePort in order to bind the service to an external load balancer.
You can bind it like this:
AGENTIP1:30333 http://servicedns.com
AGENTIP2:30333 http://servicedns.com
AGENTIP3:30333 http://servicedns.com
You don't have to give every agent's IP to the load balancer, because k8s also does internal load balancing. (To be fair, this is not the best approach; managing the load balancing inside k8s is simply better.)
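To see which environment the service currently points at, you can read its selector directly (assuming kubectl is configured for the target cluster):
kubectl get svc webapp-svc -o jsonpath='{.spec.selector.ENV}'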
Your "switch_environment.sh" file can be like this.
#!/bin/bash
if [ -z "$1" ]
then
  echo "No argument supplied"
  exit 1
fi
if ! kubectl get svc $1
then
  echo "No service found : ${1}"
  exit 1
fi
ENVIRONMENT=$(kubectl describe svc $1 | grep ENV | awk '{print $2}' | cut -d"," -f1 | cut -d"=" -f2)
if [ "$ENVIRONMENT" == "blue" ]; then
  ENV=green envsubst < service.yaml | kubectl apply -f -
  echo "Switched to green"
else
  ENV=blue envsubst < service.yaml | kubectl apply -f -
  echo "Switched to blue"
fi
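You can also run the switch manually outside the pipeline; the argument is the service name defined above:
./switch_environment.sh webapp-svc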
Finally, tie all these pieces together in one GoCD YAML file:
format_version: 2
environments:
  WebAPI:
    pipelines:
      - webapp-build-and-push
      - webapp-deploy-to-prod-blue
      - webapp-deploy-to-prod-green
      - webapp-switch-environment
pipelines:
  webapp-build-and-push:
    group: webapp
    label_template: "1.1.${COUNT}"
    materials:
      project:
        git: http://github.com/example/webapp.git
        branch: master
        destination: app
    stages:
      - buildAndPush:
          clean_workspace: true
          jobs:
            buildAndPush:
              tasks:
                - exec:
                    working_directory: app/build-scripts
                    command: /bin/bash
                    arguments:
                      - -c
                      - './build-and-publish.sh'
  webapp-deploy-to-prod-blue:
    group: webapp
    label_template: "${webapp-build-and-push}"
    materials:
      webapp-build-and-push:
        type: pipeline
        pipeline: webapp-build-and-push
        stage: deploy
      project:
        git: http://github.com/example/webapp.git
        branch: master
        destination: app
    stages:
      - build:
          approval:
            type: manual
          clean_workspace: true
          jobs:
            build:
              tasks:
                - exec:
                    working_directory: app/kubernetes
                    command: /bin/bash
                    arguments:
                      - -c
                      - 'ENV=blue IMAGE_TAG=$GO_PIPELINE_LABEL PODS=10 envsubst < deployment.yaml | kubectl apply -f -'
                - exec:
                    working_directory: app/kubernetes
                    command: /bin/bash
                    arguments:
                      - -c
                      - 'kubectl rollout status deployment webapp-blue'
  webapp-deploy-to-prod-green:
    group: webapp
    label_template: "${webapp-build-and-push}"
    materials:
      webapp-build-and-push:
        type: pipeline
        pipeline: webapp-build-and-push
        stage: deploy
      project:
        git: http://github.com/example/webapp.git
        branch: master
        destination: app
    stages:
      - build:
          approval:
            type: manual
          clean_workspace: true
          jobs:
            build:
              tasks:
                - exec:
                    working_directory: app/kubernetes
                    command: /bin/bash
                    arguments:
                      - -c
                      - 'ENV=green IMAGE_TAG=$GO_PIPELINE_LABEL PODS=10 envsubst < deployment.yaml | kubectl apply -f -'
                - exec:
                    working_directory: app/kubernetes
                    command: /bin/bash
                    arguments:
                      - -c
                      - 'kubectl rollout status deployment webapp-green'
  webapp-switch-environment:
    group: webapp
    label_template: "${COUNT}"
    materials:
      webapp-build-and-push:
        type: pipeline
        pipeline: webapp-build-and-push
        stage: deploy
      project:
        git: http://github.com/example/webapp.git
        branch: master
        destination: app
    stages:
      - build:
          approval:
            type: manual
          clean_workspace: true
          jobs:
            build:
              tasks:
                - exec:
                    working_directory: app/kubernetes
                    command: /bin/bash
                    arguments:
                      - -c
                      - './switch_environment.sh webapp-svc'
Now you have 4 pipelines:
- webapp-build-and-push
- webapp-deploy-to-prod-blue
- webapp-deploy-to-prod-green
- webapp-switch-environment
You can define your build script to build and dockerise the application.
If you have Test and Staging environments, put them in the "gocd.yaml" too. (To keep it simple, I removed those lines.)
That's it! After that, you have:
- Dockerised dotnetcore app
- Kubernetes Deployment pipelines
- Blue-Green Switch Pipeline which controls kubernetes service (You have to configure kubectl for gocd agents)
- Horizontal Pod Autoscaler (CPU-based autoscale mechanism in the cluster)
Dec 12, 2017
Hi guys, in this article I'm gonna talk about how microservices are managed by Istio and why we should prefer it. Before Istio, people who owned a microservice architecture were complaining about managing microservices, visualizing the service mesh, monitoring distributed services, service discovery, and so on. The announcement of Istio came at a good time because of those issues.
Istio is a platform hosted on top of your Kubernetes cluster. You deploy your applications (containers) with a special sidecar proxy throughout your environment; the proxy intercepts all network communication between microservices and is configured and managed using Istio's control-plane functionality. This way you can use Envoy for managing the network, routing rules, etc. Service discovery is also supported by Istio, so you don't have to think about whether or not the network was configured correctly.
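For example, injecting the sidecar proxy into an existing manifest manually (the file name here is just a placeholder) can be done like this:
istioctl kube-inject -f deployment.yaml | kubectl apply -f -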
I did the PoC described on their website: https://istio.io/docs/guides/bookinfo.html. It's easy to follow because it's hosted on top of Kubernetes, and you can use your favorite cloud provider (I used GCP's fully managed Kubernetes cluster). Content-based request routing (A/B testing, for example), traffic shifting, and fault injection are the most satisfying parts of using Istio.
Istio also has ready-to-use plugins and add-ons; have a look at the add-ons section: Grafana/Prometheus for monitoring, Jaeger for tracing, dotviz for visualizing the service mesh... All of them are ready to inject, and you manage and customize them via their provisioning YAMLs.
Check out these URLs if you are interested:
https://istio.io/docs/concepts/what-is-istio/overview.html
https://github.com/istio/istio
Aug 28, 2017
>devops
non-programmer babysitting servers
>frontend developer
web programmer that can't do their job all the way
>fullstack developer
web programmer that can do their job all the way
>backend developer
a regular programmer
>systems architect
programmer that is too hot shit to actually make programs
>information security analyst
programmer that used to be script kiddie who wanted to be a hacker
>systems engineer
programmer who knows perl and RHEL (that's it)
>network engineer
non-programmer glorified tv cable guy
Jul 26, 2017
In this article, I wanna focus on Continuous Delivery. As described by Martin Fowler, Continuous Delivery is a software development discipline where you build software in such a way that it can be released to production at any time. This means our packages should be battle-tested, reliable, automatically deployable, and configurable. That's why we do Continuous Integration. Frequent builds, in turn, lead to more frequent releases. At that point, I'm on the side of trunk-based development rather than git flow. In my opinion, each commit should be deployed to the environments instead of waiting for a silly manual merging operation. We gain more agility, and each change becomes simpler and lower risk.
From the business perspective, this idea is simply perfect, because it allows organizations to adjust rapidly to changing market conditions.
From the developer perspective, we need to develop and deploy more carefully; adding more test suites to our pipeline, not only unit tests but also integration, contract, security tests, etc., is good. Maybe we should do more pair programming, which works like continuous code review. Frequent production releases make us more aware and we discover new approaches, and the approaches we find are, in fact, Continuous Delivery best practices.
Sep 11, 2016
DevOps...
Although it looks like the rising trend of recent times in the software world, at its core it is a set of concepts and disciplined processes. Taken together, we can actually call it a culture.
Literally, the word means the collaboration of the Development and Operations processes.
If that definition feels a bit abstract, let's examine the topic more deeply.
The definition that resonates with me most is this: developers being responsible for deploying the code they write. From that point of view, DevOps is not a responsibility that sits with only a few people; it is a trait every developer in your organization should have. Looked at through this lens, DevOps is actually a process, like Agile or Waterfall.
Spotify sees it as part of its Agile culture, and its teams are responsible for all of their products' processes: design, development, deployment, and so on (end to end). The teams are completely cross-functional. The automation journey starts with automating the teams' operational work and later extends to machine management, immutable infrastructure, and Infrastructure as Code. These teams write the tools that provide the automation and are made up of Site Reliability Engineers. The goal: self-healing systems and auto-scalable infrastructure. At this point we see manual configuration changes by humans dropping to zero and software systems managing the datacenters. Google's Borg, for instance, successfully orchestrates all operations across multiple datacenters. Software systems, rather than sysadmins, are in charge. For a similar technology, see http://mesos.apache.org/
The point to note is that Site Reliability Engineers are not a team that blocks the product-delivery teams; on the contrary, they build technology for those teams and offer it for developers to use.
What sounds interesting at this point is that the "DevOps Engineers" the market is looking for are actually Site Reliability Engineers. (See: Love DevOps? Wait until you meet SRE)
In my next post, I will cover the relationship between Continuous Delivery and DevOps.
Related links:
https://en.wikipedia.org/wiki/Continuous_delivery
http://martinfowler.com/bliki/ContinuousDelivery.html
https://www.youtube.com/watch?v=dxk8b9rSKOo
https://air.mozilla.org/continuous-delivery-at-google/