Algorithms for Data Analysis
This course is inspired by the one written by Charlotte Laclau, who kindly gave me her source files. For now, I use the Jupyter notebooks co-written by Charlotte Laclau and Julien Tissier. The content will change over the next few years according to feedback from students and teachers.
The present course aims to develop a solid understanding of various algorithms and techniques used for processing and analyzing data. This includes gaining proficiency in implementing and evaluating algorithms, selecting appropriate algorithms for specific tasks, understanding algorithmic complexity and efficiency, and applying algorithms to solve real-world data analysis problems. The goal is to equip students with the necessary knowledge and skills to effectively and efficiently analyze large datasets, extract valuable insights, and make data-driven decisions.
Prerequisites
Prior knowledge of the fundamentals of both mathematics (analysis, linear algebra, probability, statistics) and computer science is required. In particular, students are advised to be proficient in:
- Basic concepts of probability and statistical inference
- Vector manipulation and matrix multiplication
- Basics of calculus of variations
- Linear algebra (eigenvalues and eigenvectors, hyperplanes, projections, etc.)
- Python programming and vector manipulation using NumPy
Educational Goals
This course is designed to provide a quick introduction to data analysis algorithms for data engineers. As such, it meets different educational objectives:
- Understand the differences between artificial intelligence and machine learning
- Know which model / algorithm to choose depending on the task
- Learn how to train, validate and test machine learning models
- Discover the basics of numerical optimization
- Master data analysis Python libraries
- Apply the most popular supervised learning algorithms (linear regression, SVM, decision trees, etc.)
- Use two unsupervised learning algorithms (K-means, PCA)
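As a small taste of the "train, validate and test" objective listed above, here is a minimal sketch of a three-way data split written with the Python standard library only. The function name `train_val_test_split`, its ratios and the seed are illustrative assumptions; they are not taken from the course notebooks, which may instead rely on scikit-learn utilities.

```python
import random


def train_val_test_split(samples, val_ratio=0.2, test_ratio=0.2, seed=0):
    """Shuffle a copy of `samples` and cut it into three disjoint parts.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    shuffled = list(samples)                   # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)      # reproducible shuffle
    n_test = int(len(shuffled) * test_ratio)
    n_val = int(len(shuffled) * val_ratio)
    test = shuffled[:n_test]                   # held out until the very end
    val = shuffled[n_test:n_test + n_val]      # used to compare models / tune hyperparameters
    train = shuffled[n_test + n_val:]          # used to fit the model
    return train, val, test


data = list(range(100))                        # toy dataset: 100 sample indices
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))
```

The point of the third part is that the test set is only used once, after model selection on the validation set, so that the reported performance is not biased by the tuning process.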
Course Program
The course is given in 8 sessions of 3 hours each.
- NumPy
- Pandas
- Scikit-learn
- Practical exercise (Jupyter notebook)
- Behaviours of a telecom operator's customers (Jupyter notebook)
- Video games sales (Jupyter notebook)
- differences between AI and ML
- distinction between supervised and unsupervised learning
- training, validation and testing of ML models
- Tutorial class on simple supervised learning models (Slides)
- K-nearest neighbors
- Linear Regression
- Ridge Regression
- Practical exercises (Jupyter notebook)
- Logistic Regression
- Decision Trees
- Tutorial class on numerical optimization (Slides)
- Convex vs. nonconvex
- Role and impact of the step-size
- Practical exercises
- Toy examples (Jupyter notebook)
- Wine quality analysis (Jupyter notebook)
- SVM
- Neural Networks
- Practical exercises (Jupyter notebook)
- Determine if a mushroom is poisonous (Jupyter notebook)
- Predict cell phone prices (Jupyter notebook)
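To give an idea of the "role and impact of the step-size" topic from the tutorial class on numerical optimization, the sketch below runs plain gradient descent on the toy function f(x) = x², whose gradient is 2x. The function names, the starting point and the three step sizes are assumptions chosen for illustration; they do not come from the course slides or notebooks.

```python
def gradient_descent(grad, x0, step_size, n_steps):
    """Run plain gradient descent and return the list of iterates."""
    x = x0
    iterates = [x]
    for _ in range(n_steps):
        x = x - step_size * grad(x)   # update rule: x <- x - step_size * f'(x)
        iterates.append(x)
    return iterates


def grad_f(x):
    """Gradient of the toy objective f(x) = x**2."""
    return 2.0 * x


# On this objective each update multiplies x by (1 - 2 * step_size):
#   0.1 -> factor  0.8, the iterates shrink towards the minimum at 0
#   1.0 -> factor -1.0, the iterates bounce between x0 and -x0 forever
#   1.1 -> factor -1.2, the iterates grow in magnitude (divergence)
for step_size in (0.1, 1.0, 1.1):
    final = gradient_descent(grad_f, x0=5.0, step_size=step_size, n_steps=50)[-1]
    print(f"step_size={step_size}: |x_50| = {abs(final):.3e}")
```

Even on a convex one-dimensional problem, the same algorithm converges, stalls or diverges depending only on the step size, which is why this parameter deserves careful tuning.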
Course Materials
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License