Platform Engineer

Company: asobbi
Location: London
Closing Date: 19/10/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description
Position Overview:

As a Platform Engineer at an HPC (High Performance Computing) cloud provider, you will play a critical role in the design, development, and maintenance of our OpenStack platform. You will be responsible for ensuring the seamless operation of our cloud services, supporting our customers' high-demand computing needs, and implementing automation to optimise efficiency and scalability. This position requires a strong foundation in Linux administration, proficiency in scripting languages such as Python and Bash, expertise in automation tools like Ansible, and experience operating platforms at scale using cluster management systems like Kubernetes or Slurm. You will also be actively involved in managing Kubernetes clusters, troubleshooting networking issues, and deploying infrastructure as code using CI/CD pipelines.

Key Responsibilities:

Linux Administration: Demonstrated expertise in Linux administration, preferably with a Red Hat Certified Engineer (RHCE) certification. Proficient in managing and troubleshooting Linux-based systems to ensure optimal performance and reliability.

Development Skills: Proficiency in scripting languages such as Python and Bash for automation, tooling, and system administration tasks.

Automation and Configuration Management: Hands-on experience with Ansible for automation of infrastructure provisioning, configuration management, and application deployment.

Platform Operations at Scale: Experience operating platforms at scale, using cluster management systems such as Kubernetes or Slurm to manage high-performance computing workloads efficiently. Manage, maintain, and optimise Kubernetes clusters, ensuring high availability and efficient resource allocation.

Kubernetes Management: Deploy and manage Kubernetes clusters, supporting high-performance computing workloads, and ensuring scalability, reliability, and security. Troubleshoot and resolve Kubernetes-specific issues related to deployment, networking, and performance.

Networking Skills: Strong networking skills, including troubleshooting network issues, understanding network topology and protocols, and ensuring efficient traffic flow from source to destination.

Infrastructure as Code (IaC): Proficiency in implementing Infrastructure as Code (IaC) principles, deploying and managing infrastructure using version-controlled repositories (e.g., Git) and CI/CD pipelines.

Operations Focus: More emphasis on operational tasks than development, ensuring the reliability, scalability, and security of our cloud platform.

Qualifications and Experience:

Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).

Proven experience in a similar role, preferably within a cloud computing or HPC environment.

Experience working with OpenStack.

Strong problem-solving skills and the ability to troubleshoot complex technical issues.

Excellent communication and interpersonal skills, with the ability to work effectively in a collaborative team environment.

Capacity to adapt to a fast-paced, dynamic work environment and prioritise tasks effectively.

Certifications such as Certified Kubernetes Administrator (CKA) or Certified Slurm Administrator (CSA) would be advantageous.
