Expert in Spark architecture on top of Hadoop. Taught Spark development (with Datasets and RDDs) using the MongoDB, CSV, JSON and XML libraries. Taught agile prototyping with notebooks such as Jupyter and Zeppelin.
Installation and configuration of a plug-and-play Datalab environment enabling execution on multiple clusters regardless of the target Hadoop distribution (Hortonworks, Cloudera, MapR or vanilla Apache Hadoop). Integration of the latest Spark release (2.1.0) and the Spark History Server, tested on Hortonworks 2.5 (on AWS and in local mode). Preparation of open-source tools for data engineers, data analysts and data scientists (e.g. Jupyter Notebook and Zeppelin running on top of Spark). Docker containerization of Hadoop nodes, Spark applications and notebook servers.