Job Title: Machine Learning Operations (ML Ops) Engineer
Location: Abu Dhabi, UAE
About Us: AI71 is a dynamic applied research team dedicated to creating reliable, ethical, and innovative AI-driven solutions tailored for knowledge workers. As part of the Technology Innovation Institute (TII), we leverage cutting-edge research to develop solutions that make a meaningful impact across various industries. Our vision is to shape the future of AI, driving progress in fields such as healthcare, finance, education, and beyond.
Position Overview: We are looking for a talented and motivated Machine Learning Operations (ML Ops) Engineer to join our team. In this role, you will work closely with data scientists, AI researchers, and software engineers to design, implement, and maintain the infrastructure and systems that support the deployment, monitoring, and scaling of machine learning models in production environments. You will be a key player in ensuring that our AI solutions are robust, reliable, and scalable.
Key Responsibilities:
· Collaborate with data scientists and software engineers to integrate machine learning models into production systems.
· Design and maintain end-to-end ML workflows, including data pipelines, model training, deployment, and monitoring.
· Develop and automate systems for continuous integration/continuous deployment (CI/CD) of machine learning models.
· Optimize model performance and ensure reliability and scalability in production environments.
· Build and maintain infrastructure for large-scale data storage, model deployment, and real-time analytics.
· Implement monitoring systems to track model performance, detect drift, and ensure data integrity.
· Work with cloud platforms (e.g., AWS, Azure, GCP) for model deployment and orchestration.
· Troubleshoot and resolve issues related to ML models and their deployment.
· Stay up to date with the latest ML Ops tools, technologies, and best practices to continually improve workflows.
Required Skills and Qualifications:
· Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.
· Proven experience in deploying and maintaining machine learning models in production environments.
· Strong experience with ML Ops tools and frameworks (e.g., MLflow, Kubeflow, TFX, Seldon).
· Proficiency in Python and experience with ML libraries such as TensorFlow, PyTorch, and scikit-learn.
· Experience working with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
· Familiarity with CI/CD practices and tools (e.g., Jenkins, GitLab CI, CircleCI).
· Strong knowledge of data storage technologies (e.g., SQL and NoSQL databases, cloud storage solutions).
· Experience in model monitoring, performance tuning, and troubleshooting in production environments.
· Proficiency with version control systems (e.g., Git) and experience collaborating in a team environment.
· Excellent problem-solving skills and attention to detail.
Preferred Qualifications:
· Experience with infrastructure automation tools such as Terraform or Ansible.
· Knowledge of DevOps principles and practices.
· Experience in managing and deploying models on large-scale distributed systems.
· Familiarity with AI ethics, explainability, and fairness in ML models.
· Experience in a fast-paced, research-driven environment.
Why Join AI71?
· Be part of an innovative and collaborative team working on impactful AI solutions.
· Opportunity to work on cutting-edge research and real-world AI applications.
· Access to state-of-the-art tools and technologies.
· Competitive salary and benefits package.
· A dynamic work environment that fosters personal and professional growth.