48082
Make an amazing climb in your career in an international team of experts. Our company provides technological services for the entire Schwarz Group across more than 30 countries in Europe and the US. Our vision is to be the leading ecosystem for a better life. We built the European sovereign cloud STACKIT. With XM Cyber we set new standards in defending against cybercrime. We run AI better than anyone. With us you will find a variety of opportunities to grow and do your best at your calling – IT. We exist to improve life with our products and services - for today's generation and future generations. We act future-proof!
The impact you will create:
- Infrastructure as Code (IaC): You use Terraform to architect and provision the underlying cloud infrastructure that powers our Kubernetes clusters, PostgreSQL databases, and Object Storage, ensuring environment parity from staging to production.
- Kubernetes Orchestration: You manage and evolve our production environments using Helm and GitOps workflows to keep the openDesk suite running smoothly across multi-tenant clusters.
- Disaster Recovery & Backups: You design and implement robust backup strategies in a cloud-native world using Velero. You ensure that cluster resources, persistent volumes, and application states are backed up to Object Storage (S3) and can be restored in minutes.
- Database Reliability: You take ownership of the persistence layer, tuning and securing PostgreSQL and MariaDB clusters to ensure high availability and data integrity.
- Identity & Access: You seamlessly integrate IdP/SSO solutions (OIDC/SAML) like Keycloak to provide a secure and unified login experience.
- Observability Engineering: You build a world-class monitoring stack using Prometheus (PromQL) and Grafana to turn raw system data into actionable insights and proactive alerts.
- Automation First: You replace manual "toil" with elegant code in Python, Go, or Bash, building the tooling that makes our deployments self-healing and repeatable.
Experience and skills you will need:
- Provisioning Mastery: Expert-level knowledge of Terraform. You know how to manage state, write modular code, and automate infra-level changes.
- Cloud Native Expertise: Deep experience with Kubernetes (CKA/CKS preferred). You understand how pods, services, and ingresses truly interact.
- Data Protection: Proven experience with Velero for Kubernetes backup/restore and working with Object Storage (S3-compatible API).
- Persistence Layer: Production-level experience with PostgreSQL and MariaDB, specifically regarding High Availability (HA) and backup/recovery strategies.
- Observability: Advanced skills in Prometheus and building high-fidelity Grafana dashboards that tell a clear story of system health.
- Identity Management: A strong understanding of IAM/SSO workflows and protocols like OAuth, OIDC, and SAML.
SITE RELIABILITY ENGINEER (SRE) - openDesk Platform (m/f/d)