Data Engineering and Big Data Architecture projects driven by Data and Use Cases:
Data transformations with Spark (e.g., XML and log4j logs to columnar format), dashboards with Banana and the SolrCloud search engine, machine learning with Spark ML, prototyping with Jupyter Notebook and a Spark kernel, on-demand deployments of Spark applications, development in Python, Java, and Scala, and training sessions.
Rapid delivery of challenging log-analysis projects:
Leading a data-driven approach around the EDRMS (Electronic Document and Records Management System), closely involving the project owner in an agile way to clarify needs such as producing statistics and detecting outliers.
One of the challenges was to transform a wide variety of XML logs from the EDRMS with a Java Spark transformer, run complex aggregations with Spark SQL, index the results in SolrCloud, and finally analyze the clean, aggregated data with Banana dashboards.
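The first two steps of that pipeline (flattening XML log events into rows, then aggregating) can be illustrated with a minimal plain-Python sketch. It is not the Java Spark transformer used in the project: the sample log, the element name `event`, its attributes, and the helper names `xml_to_rows` and `count_by` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample log: element and attribute names are assumed, not taken
# from the real EDRMS logs.
SAMPLE_LOG = """<events>
  <event user="alice" action="open" durationMs="120"/>
  <event user="bob" action="open" durationMs="95"/>
  <event user="alice" action="delete" durationMs="300"/>
</events>"""


def xml_to_rows(xml_text):
    """Flatten each <event> element into a dict, one column per attribute."""
    root = ET.fromstring(xml_text)
    return [dict(event.attrib) for event in root.iter("event")]


def count_by(rows, column):
    """Count rows per distinct value of `column` (a simple GROUP BY ... COUNT)."""
    return Counter(row[column] for row in rows)


rows = xml_to_rows(SAMPLE_LOG)
events_per_user = count_by(rows, "user")
```

In a Spark setting the same idea would map to a DataFrame of flattened rows followed by a grouped aggregation, with the result then indexed for search and dashboards.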
Machine Learning POC for quality analysis of CRM data:
Using Spark ML and GraphLab Create to search for hidden anomalies in the data. Data preparation, followed by clustering and outlier-detection algorithms such as K-Means, GMM, and LOF.
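As an illustration of the LOF (Local Outlier Factor) idea mentioned above, here is a minimal plain-Python sketch. It is not the Spark ML or GraphLab Create code from the POC: the function name `lof_scores`, the sample points, and the choice `k=2` are assumptions, and the sketch simplifies the method by using exactly `k` nearest neighbors (ties are not expanded).

```python
import math


def lof_scores(points, k=2):
    """Return one LOF score per point; values well above 1 suggest an outlier.

    Simplified sketch: assumes distinct points (duplicates would give a zero
    reachability sum) and uses exactly k neighbors per point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors, k_dist = [], []
    for i in range(n):
        # Indices of the other points, nearest first.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        knn = order[:k]
        neighbors.append(knn)
        k_dist.append(dist[i][knn[-1]])  # distance to the k-th nearest neighbor

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean neighbor density divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical data: four points on a unit square plus one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
```

Here the isolated point `(5, 5)` receives the highest score, while the four square corners score about 1, which is how LOF flags records whose local density differs from that of their neighbors.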