Job description
Summary of role
Helping set up the management procedures and tools for continuous software development, testing, and deployment, and ensuring they are scalable and failure-resistant.
Organizing automated infrastructure health reporting across all systems, and interfacing with internal business and technical stakeholders for alerting and response.
Creating and embedding failure management processes and tools so that outages can be detected and handled with minimal or no loss of service availability.
Main responsibilities:
Design, validate, automate, and document infrastructure management processes:
Encourage and build automated procedures wherever possible.
Set up and manage continuous integration and continuous deployment activities and tools (Git, Jenkins, etc.), and build in-house solutions when that is more efficient.
Inventory existing systems' dependencies and scaling bottlenecks, and propose solutions to be discussed with the CTO and the larger System Administrators team:
Ability to code and script in Linux / Unix environments.
Design, build and operate cloud services (e.g. AWS, Google, SoftLayer, Azure…)
Hunt for and identify weak points and opportunities to improve redundancy, then propose and implement solutions:
Use industry-standard “off-the-shelf” tools and in-house solutions.
Monitor all back-end infrastructure (servers, APIs, databases, networks, etc.).
Additionally:
Have the technical skill to review, verify, and validate software code (Python, Go, etc.).
Help write Business Continuity and Disaster Recovery white papers.
Ensure that procedures are up to date and relevant, and that the teams involved know and apply them (maintain a knowledge base, hold sharing sessions, run follow-up drills).
Non-technical duties:
Support an Incident Management and Root Cause Analysis culture in the development teams.
Work with the Support and QA teams to improve our incident response time.
Work closely with the CTO to relieve him of infrastructure projects and daily tasks.