Job Description
What do we do?
As a pioneer of digital transformation, GFT develops sustainable solutions across new technologies, from cloud engineering and artificial intelligence to blockchain/DLT. With its deep technological expertise, strong partnerships, and comprehensive market know-how, GFT advises clients in the financial and insurance sectors as well as in the manufacturing industry. Through the intelligent use of IT solutions, GFT increases productivity and creates added value for clients. Companies gain easy and safe access to scalable IT applications and innovative business models.
Who are we?
Having started in Germany in 1987, GFT Technologies has grown to become a trusted software engineering and consulting specialist for the international financial industry, counting many of the world’s largest and best-known banks among our clients. We are an organization that empowers you not only to explore but also to raise your potential and seek out opportunities that add value. At GFT, diversity, equality, and inclusion are at the core of who we are. Ensuring a diverse and inclusive working environment for all communities is one of the main pillars of our diversity strategy, based on our core values and culture. We have been certified for 2022/23 as a ‘Great place to work’ in the APAC region. So, if you want the opportunity to work with an outstanding and progressive organization, this position could be right for you.
Role Summary
As a Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, availability, and performance of our systems and services. You will work closely with development and operations teams to build and maintain observable, scalable, and reliable infrastructure on AWS, using Kubernetes for orchestration and Python for automation. Proficiency in resilience testing and capacity management is also essential for this role.
Key Responsibilities
Develop and maintain automation scripts and tools using Python to improve operational efficiency.
Collaborate with development teams to ensure best practices for observing, deploying, and maintaining applications.
Implement and manage monitoring and alerting solutions to proactively identify and resolve issues.
Participate in on-call rotations to provide 24/7 support for critical systems.
Monitor system performance and reliability, and respond to incidents to minimize downtime.
Conduct resilience testing to ensure system reliability under varying conditions.
Perform capacity planning and management to ensure systems can handle growth and peak demand.