Job Title: Data Engineer (Spark/Scala/Python)
Job Location: Dubai, UAE
Job Duration: 12 months, extendable

Job Description:
We are looking for a highly skilled and motivated Spark Data Engineer to join our team. The ideal candidate will have a strong background in Apache Spark, data ingestion, data processing, and data integration, and will develop and maintain our dynamic data ingestion framework built on Spark. The candidate should have expertise in building scalable, high-performance, fault-tolerant data processing pipelines with Spark, and in optimizing Spark jobs for performance and scalability. The candidate should also have experience designing and implementing data models, handling data errors, implementing data quality and validation processes, and integrating Spark applications with other big data technologies in the Hadoop ecosystem.

Responsibilities:
• Develop and maintain a dynamic data ingestion framework using Apache Spark
• Implement data ingestion pipelines for batch processing and real-time streaming using Spark's batch and streaming APIs
• Design and implement data models using Spark's DataFrame and Dataset APIs
• Optimize Spark jobs for performance and scalability using techniques such as caching, broadcast variables, and data partitioning
• Implement error handling and fault-tolerance mechanisms for data errors, processing failures, and system failures in Spark applications
• Implement data quality and validation processes, including data profiling, data cleansing, and data validation rules, using Spark's data processing APIs
• Integrate Spark applications with other big data technologies in the Hadoop ecosystem, such as Hive, HBase, and Kafka
• Ensure data security by implementing data encryption, data masking, and data access controls in Spark applications
• Use version control systems such as Git for source code management, and apply DevOps practices such as continuous integration, continuous delivery, and automated deployments in Spark application development workflows

Qualifications:
• Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field
• Strong proficiency in Apache Spark, including Spark Core, Spark SQL, Spark Streaming, and Spark MLlib, with multiple production development and deployment cycles
• Proficiency in Scala or Python, with knowledge of functional programming concepts
• Experience developing and maintaining dynamic data ingestion frameworks using Spark
• Experience in data processing, data integration, and data modeling using Spark's DataFrame and Dataset APIs
• Knowledge of performance optimization techniques in Spark, including caching, broadcast variables, and data partitioning
• Experience implementing error handling and fault-tolerance mechanisms in Spark applications
• Knowledge of data quality and validation techniques using Spark's data processing APIs
• Familiarity with other big data technologies in the Hadoop ecosystem, such as Hive, HBase, and Kafka
• Experience implementing data security measures in Spark applications, such as data encryption, data masking, and data access controls
• Strong problem-solving skills and the ability to troubleshoot and resolve issues in Spark applications
• Proficiency with version control systems such as Git, and with DevOps practices in Spark application development workflows
• Excellent communication and collaboration skills, with the ability to work effectively in a team-oriented environment
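For candidates weighing the "data partitioning" requirement above: it refers to controlling how records are distributed across tasks by key. The sketch below is plain Python, not Spark code, and every name in it (partition_for, partition_records, key_fn) is illustrative rather than part of any Spark API; it mirrors the idea behind Spark's default hash partitioner, where equal keys always land in the same bucket.

```python
# Plain-Python sketch of hash partitioning, the idea behind Spark's
# default HashPartitioner. Illustrative only; these helper names are
# hypothetical and not part of any Spark API.

def partition_for(key, num_partitions):
    # Python's % always yields a non-negative index here.
    return hash(key) % num_partitions

def partition_records(records, key_fn, num_partitions):
    """Bucket records by key so that all records sharing a key land in
    the same bucket; joins and aggregations rely on this co-location."""
    buckets = [[] for _ in range(num_partitions)]
    for record in records:
        buckets[partition_for(key_fn(record), num_partitions)].append(record)
    return buckets
```

In real pipelines the design concern this raises is skew: a hot key routes all of its records to a single bucket, which is why repartitioning and key salting come up in Spark performance tuning.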
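The "data quality and validation rules" named in both lists usually amount to named predicates applied per record, with failing records quarantined rather than dropped silently. A minimal framework-agnostic sketch follows; all names in it (RULES, validate_row, split_valid_invalid) are illustrative, and in Spark these predicates would typically be expressed as DataFrame filter expressions instead.

```python
# Hypothetical sketch of row-level validation rules; plain Python,
# not a Spark API. Each rule is a named predicate over a record dict.

RULES = {
    "id_present": lambda row: row.get("id") is not None,
    "amount_non_negative": lambda row: isinstance(row.get("amount"), (int, float))
                                       and row["amount"] >= 0,
}

def validate_row(row):
    """Return the names of the rules the row violates (empty = valid)."""
    return [name for name, rule in RULES.items() if not rule(row)]

def split_valid_invalid(rows):
    """Split rows into (valid, quarantined) lists, the usual
    clean/quarantine split in an ingestion pipeline. Each entry
    carries the row and its list of violated rules."""
    valid, quarantined = [], []
    for row in rows:
        errors = validate_row(row)
        (valid if not errors else quarantined).append((row, errors))
    return valid, quarantined
```

Keeping the violated-rule names alongside each quarantined record supports the data-profiling side of the role: counting violations per rule over a batch is a cheap first profile of source data quality.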