REF # 20380


Cape Town

Systems Reliability Engineer/DevOps



A leader in cloud-based procurement solutions seeks a strong technical Systems Reliability Engineer/DevOps to ensure the availability, performance, monitoring, and incident response of its platforms and services, among other things. You will require a tertiary qualification in Computer Science, Engineering or a related discipline (or equivalent experience) and at least 2 years’ experience with AWS services (EC2, RDS, S3), as well as experience with server security, setting up backup and stability systems, Windows, Linux, Docker, Git and SQL Server. You must also be able to manage and install SSL certificates and write provisioning scripts in Ansible, Chef, Puppet or Terraform.



Security –

Security Management and configuration, response, and breach mitigation.

Ensure that all platforms and systems are in line with the required Security Standards.

Ensure critical system security using best-in-class cloud security solutions.

Remediate vulnerabilities on various environments.


Compliance –

IT governance compliance.

Help to ensure the delivery of infrastructure solutions.

Complete and perform daily, monthly, and yearly audit requirements.

Performance tuning and cost management.


Workload Management –

Provide business as usual support to your particular workload and environment.

Provide support on escalations for investigation and resolution of technical issues.

Help reduce the number of “Critical” incidents by investigating possible causes and fixes.

Monitor daily activities, including Change Requests, project deliverables and Jira ticket responsibilities, adopting a practical, methodical approach to identifying and resolving issues.

Ensure environments are structured and built with redundancy and “key component” failures in mind.

Make sure environments and workloads are run as cost-effectively as possible.

Evaluate new technologies to improve areas including, but not limited to, process flow, uptime, security and go-to-market time.

Support maintenance of layered software and infrastructure.

Identify where applications or hardware are having performance/reliability issues; analyse and formulate a proposed method to correct issues.

Work on and maintain continuous integration systems.

Application availability.

Performance Tuning and Cost Management.

Automated Deployment and Configuration Management.


Technical Processes and Development –

Develop (where and when necessary) and maintain support procedures.

Develop (where and when necessary) and maintain operating policies and procedures for workloads under management.

Develop (where and when necessary) backup and recovery solutions for workloads under management.

Automate tasks that can and should be automated.

Explore ways to constantly improve quality of existing services, processes and systems in order to maintain system effectiveness and reliability.

Work as part of the Engineering team to create a robust and responsive deployment and Integration process (CI/CD).

Version Control.


Communication –

Incident response.

Manage technical escalations, as necessary.

Communicate relevant technical business solutions to the identified internal or external stakeholder.



Qualifications –

Graduate-level qualified in Computer Science, Engineering or a related discipline or equivalent experience.

Technical certifications in key infrastructure services and applications (Azure Certifications in System Design or Administration advantageous).

Knowledge of IT governance standards (ISO 27001 and SOC) advantageous.


Experience/Skills –

2 Years’ experience with AWS services.

SRE experience specifically in general cloud application development and hosting with a focus on Microsoft Azure Cloud.

Reliability tools for preventative and predictive techniques.

Gateways. Application firewalls.

Log management and monitoring.

Microsoft advanced breach and security detection tooling.

Strong experience with and exposure to Azure networking, server topologies, TCP/IP and virtual networks.

DBA experience and knowledge.

Very good understanding of web app environments and server security.

Solid experience in building highly scalable server architectures.

Infrastructure and, especially, Platform as a Service (PaaS).

Automation experience with configuration management tools (Ansible, Chef, Puppet, Terraform etc.).


Be an expert in –

Working with AWS – EC2, RDS, S3

Server Security

Setting up backup and stability systems


Solid experience in:

Virtual machines – Windows and Linux


Git source code repositories

SQL server administration and maintenance

Setting up and securing highly available solutions

Managing and installing SSL Certificates

Configuring firewalls and VPNs

Configuring Nagios, New Relic or any other monitoring software

Writing provisioning scripts in Ansible, Chef, Puppet, Terraform etc.


Have the ability to:

Write Bash/PowerShell scripts

Script and automate deployment and configuration management of different projects on different environments (PowerShell etc.)

Use Azure DevOps for automated deployment

Understand complex software and system architecture

Set up multi-tier architectures


While we would really like to respond to every application, should you not be contacted for this position within 10 working days, please consider your application unsuccessful.




When applying for jobs, ensure that you meet the minimum job requirements. Only SA Citizens will be considered for this role. If you are not based in the location stated for a job, please note your relocation plans in all applications and correspondence.

Degree, Diploma, Permanent
