Main
Jaime Arboleda Castilla
Data Scientist | Mathematician | Software Engineer
Born in 1985 in Algeciras (Province of Cádiz, Spain).
I love gaining and sharing insights from data, as well as using those insights to make better decisions. I am always passionate about learning new things and I am able to do it fast. I am interesting in research and open problems, keeping up with new approaches in Machine Learning. Lately I have been very interested in Bayesian modeling and causal inference, although I have mostly worked on predictive modeling. I like programming and contributing to the open source community. I am easy to work with and I have always maintained good relationships with all my coworkers.
Professional Experience
Data Scientist
Remote work
2022 - 2021
On Safety and Security Analytics (SSA) team in ICS2 Project, TAXUD
Description
- Researching, designing and continuous developing of SSA models.
- Supporting deployment and orchestration of SSA models and analytic solutions for real time use.
- Supporting designing, organizing and monitoring of the SSA entire workflow.
- Developing Similarity Search and Anomaly Detection capabilities.
- Providing training for Public Officers across all Member States.
Projects
- Similarity search engine. It’s built on top of several custom components, including a new neural network architecture based on Transformers.
- Model comparison tool for performance analysis, built with Dash and Docker.
- Synthetic XML generation.
- XML Splitter tool, which is a custom pipeline for transforming a complex XML into a Pandas Dataframe, with many options.
Technologies
- Dataiku, RStudio, JupyterLab.
- Denodo, Oracle, PostgreSQL, Neo4j.
- SQL, Python, R, Bash, XML, JSON, HTML.
- Numpy, Pandas, Keras, Tensorflow, Dash.
- GitLab, Jenkins, Docker, Kubernetes, Apache Kafka.
Head of IT Unit and Data Scientist
Madrid, Spain
2021 - 2020
On Subdivision of Information Analysis Technologies and Fraud Investigation
Description
- Spanish Public Officer belonging to the Senior Corps of Systems, Information and Communication Technologies of the State Administration.
- Data Scientist and manager of a team of 8 people working on analytic models for tax fraud detection.
Projects
- Custom (based on KNN) clustering algorithm that, using the information of purchases and sales of each corporation, predicts whether its declared sector of economic activity is accurate or not. It was programmed in Scala.
- Identification of statistical position and undervaluation of goods fraud in Customs declarations by modifying an existing algorithm provided by European Commission.
- Classifier (using XGBoost) for predicting whenever a taxpayer is most likely to make a mistake when modifying some parts of its draft of Personal Income Tax Declaration. The goal was to provide a nudge message to those taxpayers in the event of modification, aiming at reducing filling errors.
- Classifier for predicting the risk of non-payment of debts with the Tax Agency, in order to anticipate precautionary measures.
- Classifier for predicting the risk of not paying its tax liabilities in due time for a given taxpayer. This model makes use of near real time information regarding all invoices collected in the previous months of the prediction.
- Regression model for predicting the total (declared or undeclared) incomes of a given family using all available information.
Technologies
- Python, Scala, Spark.
- Pandas, Numpy, Scikit-Learn, Xgboost, Luigi.
- Linux, Cloudera.
- SyBase IQ, DataStage.
Head of IT Unit
Madrid, Spain
2019 - 2017
On Subdivision of Software Development and Applications
Description
- Spanish Public Officer belonging to the Senior Corps of Systems, Information and Communication Technologies of the State Administration.
- Manager of a team of 15 people working on SW development for Personal Income Tax and applications.
Projects
- Web service for personal data ingestion, for the web app of Personal Income Tax Declarations.
- Cryptographic service for granting access credentials for the presentation of the Income Tax Declarations.
- Datawarehouse ingestion of Personal Income Tax Declarations.
- Risk analysis (combining rule-based risks, statistical risks and simple predictive models) for Personal Income Tax Declarations.
- Software development for the management and lifecycle of Personal Income Tax Declarations.
Technologies
- COBOL, Java, HTML, JavaScript.
- Web Services.
- DB2, Oracle.
- Z/OS, Linux.
- SyBase IQ, DataStage.
Head of IT Unit
Madrid, Spain
2017 - 2013
On Subdivision of Software Development and Applications
Description
- Spanish Public Officer belonging to the Senior Corps of Systems, Information and Communication Technologies of the State Administration.
- Manager of a team of 8 people working on SW development for Corporate Tax and applications.
Projects
- Datawarehouse ingestion of Corporate Tax Declarations.
- Risk analysis (combining rule-based risks, statistical risks and simple predictive models) for Corporate Tax Declarations.
- Software development for the management and lifecycle of Corporate Tax Declarations.
Technologies
- COBOL, Java.
- DB2, Oracle.
- Z/OS, Linux.
- SyBase IQ, DataStage.
Teaching Experience
Mathematics Teacher
Madrid, Spain
2022 - 2020
I regularly give a course on Mathematical Methods (Differential Equations, Harmonic Analysis and Complex Analysis) for students of third course of Aeronautic Engineering.
Lecturer
Webinar
Universidad Complutense, Madrid, Spain
2021
I gave a lecture in the Fiscality and Artificial Intelligence Webinar to present a project held at the Tax Agency, related to using the Nudge philosophy to boost voluntary tax compliance.
Mathematics Teacher
Madrid, Spain
2020 - 2017
I gave multiple courses to students of the Degree of Mathematics, including:
- Complex Analysis.
- Topology.
- Advanced Algebra.
- Statistics.
- Probability.
- Galois Theory.
- Integration and Measurement Theory.
- Physics.
Publications
Nudge Project
Paper
Aranzadi Thomson Reuters
2021
Paper on Nudge project, carried out at the Tax Agency, consisting of the application of Artificial Intelligence to help the assistance to the taxpayer and voluntary compliance with tax obligations. Published in Aranzadi Thomson Reuters, along with other works presented in the Webinar “Fiscality and Artificial Intelligence” organized by the Complutense University of Madrid.
Open Source Collaborations
Collaborator of Keras
Remote work
2022
I raised an issue after finding a bug, and weeks later I was able to fix it with a pull request that was merged on the main branch.
Developer of open source library
Remote work
2021
Python package that performs hyperparameter optimization, train, probability calibration and error validation of classification models using a Nested Cross-Validation approach.
Education and Training
European Commission Training
Internal training
Remote
2022 - 2021
- Cibersecurity
- Software Development and Agile Methodologies.
Spanish Tax Agency Training
Internal training
Madrid, Spain
2021 - 2013
- Advance Data Analysis.
- Machine Learning and Big Data.
- Geospatial Data Processing in R.
- Software Development and Agile Methodologies.
- Automatic Reporting tools.
- Data Warehousing.
- Blockchain and Cryptocurrencies.
- OSGI for Java development.
- SCRUM and Agile methodologies.
- Advanced Java programming.
Coursera
Courses and Specializations
Remote
2021 - 2017
- Bayesian Statistics: From Concept to Data Analysis.
- Bayesian Statistics: Techniques and Models.
- Bayesian Statistics: Mixture Models.
- Neural Networks and Deep Learning.
- Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization.
- Structuring Machine Learning Projects.
- Convolutional Neural Networks and Computer Vision.
- Sequence Models and Natural Language Processing.
- Machine Learning.
National Institute of Public Administrations (INAP)
Selective course
Madrid, Spain
2013 - 2012
Course for accessing the Senior Corps of Systems, Information and Communication Technologies of the State Administration.
- Passed the competitive examination to the Corps in the first attempt and with the second best place of 600 candidates.
- In the internship phase, I worked on an analysis of the security systems of the Ministry of Defense.
Universidad Española de Educación a Distancia (UNED)
Master’s Degree in High School Teacher Training
Madrid, Spain
2016 - 2014
- Average mark of 8.1.
- 215 hours of teaching internships at high school.
- Finished with a work on teaching Mathematics using programming tools.
Universidad Complutense de Madrid (UCM)
Master’s Degree in Mathematical Research
Madrid, Spain
2010 - 2009
- Average mark of 8.8.
- Carried out under the direction of Mr. Francisco Presas with a research work in geometric quantization.
Universidad Autónoma de Madrid (UAM)
Double Degree in Mathematics and Computer Science
Madrid, Spain
2009 - 2005
- Average mark of 9.5 and 25 Honor Distinctions.
- I studied additional courses out of the curriculum.
Honors and Awards
Consejo Superior de Investigaciones Científicas (CSIC)
Scolarships
Madrid, Spain
2011 - 2007
- I was granted a scholarship oriented to the realization of doctoral thesis with CSIC, stopped for personal reasons after obtaining the Master’s Degree in Mathematical Research.
- I was awarded to be part of a program to expand studies in Mathematics during summers. This allowed me to attend intensive courses during the summers of 2007 and 2008 at the Complutense University of Madrid.
Universidad Autónoma de Madrid
Honorable mention
Madrid, Spain
2009
Winner of honorable mention to the student with the overall best marks during the degree.
Comunidad de Madrid
Scolarships for outstanding academic performance
Madrid, Spain
2009 - 2004
- Course 2008/2009: Teaching supprt on Differential Geometry.
- Course 2007/2008: Work on numerical methods for solving equations.
- Course 2006/2007: Work on Neural Networks.
- Course 2005/2006: Work on Commutative Algebra.
- Course 2004/2005: Work on cryptography and number theory.