Service Reliability Engineer for Cloud Commerce Operating System (m/f/d)

About us

Spryker Systems GmbH is a fast-growing technology company that offers leading manufacturers, brands, and sellers across all industries a flexible commerce solution for every customer-facing touchpoint, from online shops and mobile to voice, chatbot, blockchain, and IoT use cases. Our modern offices are located in Germany's digital metropolises, Berlin and Hamburg.
The international Spryker team constantly works with exciting new customers, technologies, and innovative approaches, and is looking for talented people to join us in revolutionizing the world of digital commerce.

In a Nutshell

Are you an experienced Service Reliability Engineer with a strong sense of ownership? Do you think that cloud-native is not just a technology but a mindset? Do you want to put the latest technologies to use for hundreds of customers across different industries?

Join us as a Service Reliability Engineer and help build the next generation of cloud and composition platforms that will revolutionize transactional business models.

Above all, we are open-minded, pragmatic, and agile. If you share that attitude, join our Spryker Technology Team and help us revolutionize the world of commerce.

Your challenges

Respond to production live-site incidents according to the established on-call schedule
Communicate the status, progress, and expected resolution of acute problems to incident managers, customers, and other stakeholders
Work with the product development team to define requirements for the infrastructure, networking, and operations toolsets needed to provision and maintain product lines
Solve day-to-day operational problems in the production environment
Monitor and ensure compliance with SLAs, SLOs, RPOs, and RTOs
Automate common tasks and processes
Write documentation, articles, and how-tos
Analyze problems across the full stack, from the virtual hardware to specific applications: read and analyze logs and metrics to identify root causes and resolve them, either permanently or with a workaround
Ensure a robust, stable, and secure back-end infrastructure to support the product portfolio
Build deep monitoring coverage by implementing inside-out, outside-in, and machine-learning-based monitors that enable early discovery and automatic resolution, pushing the system toward a 99.999% SLA
Design, build, and release platform updates, striving for full automation and regression detection
Stay up to date with industry trends

Your profile

Degree in Computer Science or Software Engineering, or equivalent experience
Customer-obsessed
Extensive experience with AWS or other major public cloud platforms
Experience with, and willingness to participate in, 24/7 on-call duty as part of the team; sharp thinking and strong troubleshooting skills, even during critical incidents
Experience working with high-scale, complex, cloud-based production environments
Experience managing incidents and analyzing and solving problems across the full stack
Experience with infrastructure-as-code and configuration management tools such as Terraform and Ansible
Good knowledge of New Relic, Blackfire, Tideways, or any other APM tool, and of production monitoring
Experience in automation and willingness to automate routine tasks
Excellent communication skills, both internally and externally
Experience writing technical articles, how-tos, and customer-facing communications
Experience working with Git, branching strategies, and git-flow
Basic knowledge of relational database management systems
Upper-intermediate English

Interested in this position?