
Detection of illicit activities

Introduction

This project followed the CRISP-DM data mining methodology.

While collecting data on the commercialization and transport of a product, several business rules were identified for detecting illegal activities that may be taking place in the context of the business.
Once identified, the x rules were subdivided into 2 groups:

  • Statistical Analysis: For the rules that are sensitive to trend and seasonality, two models were evaluated: ARIMA and PROPHET. PROPHET was selected because it gave the best flexibility and fit to the data in the presence of outliers. It was trained on the historical data to identify atypical values of the evaluated variables.
    Atypical values can be caused by a global phenomenon that affects a whole group of similar values. For that reason, atypical values were determined in two phases: values were first checked by groups of similar values, and those identified as anomalous were then passed through a second phase to determine whether, in their individual behavior, they were also outside the seasonality and historical trend.

The PROPHET model is used to identify atypical values (values outside the seasonality and historical trend). A value that falls outside an uncertainty range, which was previously evaluated and set under different parameters, is identified as an outlier only after it has gone through the two verification processes.
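The two-phase check can be sketched in plain Python. This is only an illustration of the idea, not the project's code: the names `Observation`, `two_phase_outliers`, `group_bounds` and `entity_bounds` are hypothetical, and the (lower, upper) intervals are assumed to have been produced beforehand by a fitted model (with Prophet, these would typically be the `yhat_lower` and `yhat_upper` columns of the forecast).

```python
from dataclasses import dataclass


@dataclass
class Observation:
    entity: str   # hypothetical id of the individual series
    group: str    # hypothetical label of its group of similar entities
    value: float  # observed value for the period under review


def is_outside(value, bounds):
    """Return True when value falls outside the (lower, upper) interval."""
    lower, upper = bounds
    return value < lower or value > upper


def two_phase_outliers(observations, group_bounds, entity_bounds):
    """Keep only the observations that fail both checks.

    Assumption: both dictionaries map a key to a (lower, upper)
    uncertainty interval computed beforehand by a forecast model.
    """
    flagged = []
    for obs in observations:
        # Phase 1: is the value atypical relative to its group?
        if not is_outside(obs.value, group_bounds[obs.group]):
            continue
        # Phase 2: is it also atypical relative to its own history?
        if is_outside(obs.value, entity_bounds[obs.entity]):
            flagged.append(obs)
    return flagged


# Made-up data, for illustration only.
observations = [
    Observation("route_a", "north", 120.0),
    Observation("route_b", "north", 310.0),
    Observation("route_c", "south", 95.0),
]
group_bounds = {"north": (80.0, 200.0), "south": (50.0, 90.0)}
entity_bounds = {
    "route_a": (90.0, 150.0),
    "route_b": (100.0, 250.0),
    "route_c": (60.0, 110.0),
}
flagged = two_phase_outliers(observations, group_bounds, entity_bounds)
print([obs.entity for obs in flagged])  # prints ['route_b']
```

Here `route_c` is atypical for its group but normal for its own history, so it is not reported. Only `route_b` fails both phases.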

  • Classification Analysis: For the rules that had been exactly identified, different data extraction and/or transformation processes were carried out in order to make tangible those actions that can be considered illicit in the context of the business.
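A rule of the exactly identified kind can be expressed as a named condition that is evaluated on each record. The sketch below shows one possible shape for this. The actual business rules are confidential, so the two rules, the field names and the sample records are invented for illustration.

```python
def apply_rules(records, rules):
    """Evaluate every rule on every record and list the matches."""
    findings = []
    for record in records:
        for rule_name, predicate in rules.items():
            if predicate(record):
                findings.append({"record_id": record["id"], "rule": rule_name})
    return findings


# Hypothetical rules, invented for illustration only.
RULES = {
    "delivered_exceeds_dispatched": lambda r: r["delivered"] > r["dispatched"],
    "missing_transport_permit": lambda r: not r["permit"],
}

# Made-up records, for illustration only.
records = [
    {"id": 1, "dispatched": 100, "delivered": 100, "permit": "P-01"},
    {"id": 2, "dispatched": 100, "delivered": 130, "permit": "P-02"},
    {"id": 3, "dispatched": 80, "delivered": 80, "permit": ""},
]
findings = apply_rules(records, RULES)
for finding in findings:
    print(finding)
```

Keeping the rules in a dictionary means a new rule can be added without touching the evaluation loop, and every finding carries the name of the rule that produced it.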

Finally, all the outliers identified by these two analyses were unified in a database in order to build a visualization report of the results of the proposed models.
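The unification step can be sketched as both analyses writing to one shared table that the report then reads. This is an assumption about the layout, not the project's schema: the table name `outliers`, its columns and the helper `store_findings` are hypothetical, and an in-memory SQLite database stands in for the real database so the example is self-contained.

```python
import sqlite3


def store_findings(conn, source, findings):
    """Append the findings of one analysis to the shared table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outliers ("
        "source TEXT NOT NULL, record_id TEXT NOT NULL, detail TEXT)"
    )
    conn.executemany(
        "INSERT INTO outliers (source, record_id, detail) VALUES (?, ?, ?)",
        [(source, str(f["record_id"]), f["detail"]) for f in findings],
    )
    conn.commit()


# In-memory database and made-up findings, for illustration only.
conn = sqlite3.connect(":memory:")
store_findings(conn, "statistical", [
    {"record_id": "route_b", "detail": "outside uncertainty interval"},
])
store_findings(conn, "classification", [
    {"record_id": 2, "detail": "delivered_exceeds_dispatched"},
    {"record_id": 3, "detail": "missing_transport_permit"},
])

# A report can then aggregate over both sources.
counts = dict(conn.execute("SELECT source, COUNT(*) FROM outliers GROUP BY source"))
print(counts)
```

The `source` column records which analysis produced each row, so the report can show the two groups of rules side by side or together.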

Once the models were evaluated and the results were validated, the process was automated.

No further information can be shared because the details are confidential.

Tools: SQL Server Management Studio, Anaconda, Jupyter Notebook, Git, Azure DevOps
Languages: SQL, Python
Python Libraries: Pandas, NumPy, Seaborn, Scikit-learn, Prophet, Arima, traceback

Andres Camilo Viloria Garcia
Data Scientist | Data Analyst