We - Schwarz IT Hub - are the Romanian branch of Schwarz IT. We offer high-value IT services to all companies of the Schwarz Group, including Lidl, Kaufland, Schwarz Produktion and PreZero. As a digital hub and go-to technology provider for the group, we ensure access to innovative and scalable IT solutions, while always considering the latest technological developments. We are very proud of the business footprint we have created in Romania, and we look forward to a bright future within the Schwarz family.
DevOps Engineer AI - Focus on Operations & Infrastructure (m/f/d)
Our AI Hub develops and operates innovative, large-scale Computer Vision solutions. Our core architecture is already up and running, built on a robust cloud-native infrastructure:
- Orchestration & Deployment: Kubernetes, Argo CD, and Argo Workflows for complex ETL and model training pipelines.
- Data & Storage: PostgreSQL as the metadata repository and S3 Object Storage for large-scale image data.
- Infrastructure: Cloud-native environments (GCP, Azure, or StackIT), managed via Terraform.
Your primary mission will be to support the stability, monitoring, and ongoing maintenance of this existing platform, ensuring smooth daily operations.
The Impact You Will Create
- Support System Operations: You assist in ensuring the stable, secure, and reliable operation of our existing cloud infrastructure and production Computer Vision pipelines.
- Pipeline Maintenance and Optimization: You manage, maintain and optimize existing CI/CD processes via Argo CD and help run ETL and ML training pipelines using Argo Workflows.
- Monitoring & Observability: You work with our monitoring and observability stack (Prometheus, Loki, Grafana) to track system health, identify bottlenecks, and resolve technical issues.
- Infrastructure and Scaling: You help maintain and update our existing infrastructure landscape using Terraform and contribute to enhancing it.
- Collaboration with AI Engineers: You work closely with our AI development team, support them by provisioning developer VMs, and help ensure that new machine learning models are smoothly integrated into the production environment.
Experience And Skills You Will Need
- You have a solid base in software engineering. You bring about 2 to 4 years of professional experience in DevOps, MLOps, or Data Engineering, with practical experience in running and maintaining software solutions.
- Containerization & Orchestration: You have solid, hands-on experience working with Docker and Kubernetes (ideally 2+ years).
- Infrastructure as Code: You have practical experience working with Terraform (ideally 1–2 years).
- Cloud Platforms: You have practical experience with at least one major cloud provider (GCP, Azure or StackIT).
- Data & Storage: You are familiar with relational databases (PostgreSQL) and object storage (S3).
- Programming & ML Foundations: You have good programming skills in Python and a foundational understanding of AI/Deep Learning workflows.
- Mindset: You are a proactive team player who enjoys solving technical challenges reliably, writing clean code, and learning new cloud-native technologies on the job.
Bonus skills
- First exposure to Argo CD or Argo Workflows is a strong advantage.
- Interest in, or first exposure to, PyTorch or MLOps frameworks such as MLflow or Kubeflow is a plus.