Commit 467051bb by PLN (Algolia)

Slides: intro v0.6

parent c4e113e2
node_modules/
.idea/
dist/
@@ -5,6 +5,7 @@ color: #eee
colorSecondary: #333
backgroundColor: #111
paginate: true
footer: "ML101 | Paul-Louis NECH | INTECH 2022-2023"
---
# <!-- fit --> Hello World
@@ -81,6 +82,7 @@ $$ I_{xx}=\int\int_Ry^2f(x,y)\cdot{}dydx $$
![bg left](./img/01-midjourney.png)
---
<!-- _footer: "" -->
![bg 100%](./img/01-diffusion.png)
<br />
<br />
@@ -179,12 +181,44 @@ _color: black
---
## What does it mean to learn?
<!--
Behaviorism: learning = knowing how to _do_
Learned vs. innate: bee dances vs. chimpanzee culture
Classical conditioning (Pavlov and his bell)
-->
---
<!-- _footer: "" -->
![bg](https://image.beeplaza.com.au/wp-content/uploads/2019/12/bee-dance.jpg)
<!--
Learned vs. innate: bee dances vs. chimpanzee culture
-->
---
<!-- _footer: "" -->
![bg](https://i0.wp.com/www.throwcase.com/wp-content/uploads/2014/12/monkeys-e1419187612146.jpg?fit=1000%2C692&ssl=1)
---
## Conditioning
<!--
Classical vs. operant conditioning
- Classical: a Stimulus-Stimulus association
Pavlov: food -> salivation [Unconditioned Stimulus -> Unconditioned Response]
Before the food arrives, a bell rings [Neutral Stimulus]; the dog listens (Reaction)
Over time, at the bell alone [Conditioned Stimulus] the dog salivates [Conditioned Response], before even seeing food: a neutral stimulus has been bound to the original unconditioned one
- Operant conditioning, "trial and error": a Stimulus-Response association (or Stimulus-Response-Consequence)
  - "Sit", the dog sits, it gets petted
  - "Down", the dog sits, it gets nothing
  - Skinner box: a rat is placed in a Skinner box containing a lever that triggers a mechanism dispensing food pellets into the cage.
The rat randomly tries all sorts of actions on the objects in the cage, including the lever.
After it has pressed the lever a few times, each press followed by pellets, it stops making mistakes: pressing the lever has become the correct response.
Skinner:
-->
---
## What does it mean to learn?
@@ -211,6 +245,10 @@ _backgroundColor: white
---
### Is rote learning a problem?
<!-- To pass an exam? -->
<!-- And what if it's the driving test? Memorize the maneuvers of the previous route by heart? Or learn _generalized_ rules? -->
---
### Guessing the teacher's password
<!--
@@ -271,12 +309,113 @@ This signifies, with successive layers, there is loss of information about the i
How do I know what I don't know?
---
## Underfitting
![bg right 80%](https://upload.wikimedia.org/wikipedia/commons/5/55/Underfitted_Model.png)
<!-- Image by User:AAstein: https://upload.wikimedia.org/wikipedia/commons/5/55/Underfitted_Model.png -->
---
## Overfitting
![bg right 80%](https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Overfitting.svg/1920px-Overfitting.svg.png)
<!-- Image by User:Chabacano: https://commons.wikimedia.org/wiki/File:Overfitting.svg -->
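To make both failure modes concrete, a minimal sketch on hypothetical toy data (plain NumPy): too low a polynomial degree underfits, too high a degree fits the noise.

```python
import numpy as np

# Toy data (hypothetical): a noisy sine wave
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.shape)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, x_test.shape)

for degree in (1, 3, 12):  # underfit, reasonable, overfit
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Expect train error to keep falling as the degree grows while test error falls and then rises again: that turning point is the underfitting/overfitting trade-off.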
---
# Failures of ML
---
### MIT COVID FAILS
![bg right 100%](./img/01-covid-failed.png)
- Frankenstein data sets (see the sketch below) <!-- Driggs highlights the problem of what he calls Frankenstein data sets, which are spliced together from multiple sources and can contain duplicates. This means that some tools end up being tested on the same data they were trained on, making them appear more accurate than they are. -->
- Child == healthy? <!-- Many unwittingly used a data set that contained chest scans of children who did not have covid as their examples of what non-covid cases looked like. But as a result, the AIs learned to identify kids, not covid. -->
- Lying down == sick? <!-- Driggs’s group trained its own model using a data set that contained a mix of scans taken when patients were lying down and standing up. Because patients scanned while lying down were more likely to be seriously ill, the AI learned wrongly to predict serious covid risk from a person’s position. -->
- _Confounding factors_ <!-- In yet other cases, some AIs were found to be picking up on the text font that certain hospitals used to label the scans. As a result, fonts from hospitals with more serious caseloads became predictors of covid risk. -->
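A hedged sketch of the "Frankenstein data set" trap (file names are made up): merging overlapping sources and splitting without deduplication leaks training examples into the test set.

```python
from sklearn.model_selection import train_test_split

# Two hypothetical sources that share scans
source_a = [f"scan_{i}.png" for i in range(100)]
source_b = [f"scan_{i}.png" for i in range(50, 150)]  # 50 duplicates!

merged = source_a + source_b
train, test = train_test_split(merged, test_size=0.3, random_state=0)
print(len(set(train) & set(test)), "test scans were also trained on")

# The fix: deduplicate *before* splitting
train, test = train_test_split(sorted(set(merged)), test_size=0.3, random_state=0)
print(len(set(train) & set(test)), "after dedup")  # 0
```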
---
### COVID FAILS: How to fix it?
![bg right 100%](./img/01-covid-failed.png)
- ###### Better data would help, but <!-- but in times of crisis that’s a big ask. It’s more important to make the most of the data sets we have. The simplest move would be for AI teams to **collaborate more with clinicians** -->
- ###### "Researchers need to **share their models** and disclose **how they were trained**" <!-- so that others can test them and build on them. “Those are two things we could do today,” he says. “And they would solve maybe 50% of the issues that we identified.” -->
- ###### Standardized formats
- ###### Not-invented-here syndrome
<!-- most researchers rushed to develop their own models, rather than working together or improving existing ones. The result was that the collective effort of researchers around the world produced hundreds of mediocre tools, rather than a handful of properly trained and tested ones.
REPLICATION CRISIS! A problem common to all of science: incentives, game theory...
“The models are so similar—they almost all use the same techniques with minor tweaks, the same inputs—and they all make the same mistakes,” says Wynants. “If all these people making new models instead tested models that were already available, maybe we’d have something that could really help in the clinic by now.”
-->
---
<!--
“Until we buy into the idea, we’re doomed to repeat the same mistakes”
-->
> we need to sort out the
> _unsexy_ problems
> **before the sexy ones**
---
GALACTICA?
https://galactica.org/explore/
[Try it live ;)](https://huggingface.co/spaces/morenolq/galactica-base)
[LeCun hyped](https://twitter.com/ylecun/status/1592619400024428544)
---
[GALACTICA failed](https://www.technologyreview.com/2022/11/18/1063487/meta-large-language-model-ai-only-survived-three-days-gpt-3-science/)
<br />
> Like all language models, Galactica is a mindless bot that cannot tell fact from fiction.
> ~_MIT Tech Review_
[_Paper_](https://arxiv.org/abs/2211.09085)
---
Demos
https://huggingface.co/spaces?sort=modified&search=galactica
---
> Narrator voice: LMs have no access to "truth", or any kind of "information" beyond information about the distribution of word forms in their training data.
Emily M. Bender, Faculty Director, Computational Linguistics
https://twitter.com/emilymbender/status/1592993757498331136
https://twitter.com/Michael_J_Black/status/1593133722316189696
[LeCun not happy](https://twitter.com/ylecun/status/1593293058174500865)
---
TAY
-> Interface problems!
---
### Gaming the game: lazy AIs
<br />
@@ -305,51 +444,192 @@ Sacs de ifs
Amazon Mechanical Turk
---
Eliza? Turing test?
http://psych.fullerton.edu/mbirnbaum/psych101/Eliza.htm
---
Akinator?
https://fr.akinator.com/game
---
What are the limits of what ML can do?
https://hbr.org/2016/11/what-artificial-intelligence-can-and-cant-do-right-now
---
Andrew Ng's quote on limits
> If a typical person can do a mental task with
> **less than one second of thought**,
> we can probably automate it using AI
> either now or in the near future.
~ Andrew Ng, [What AI can and can't do](https://hbr.org/2016/11/what-artificial-intelligence-can-and-cant-do-right-now)
---
## Formalism: Features
$$ X \Rightarrow Y $$
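Concretely (a hypothetical toy example): X is a matrix of feature vectors, Y the targets the model should predict.

```python
import numpy as np

# Hypothetical features: [surface in m², rooms, distance to city in km]
X = np.array([
    [45.0,  2,  1.5],
    [80.0,  3,  5.0],
    [120.0, 5, 12.0],
])
Y = np.array([900.0, 1200.0, 1500.0])  # e.g. monthly rent in euros

# Learning = finding a function f with f(X) ≈ Y that also holds on new X
```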
---
## Metrics
---
<!--
backgroundColor: white
color: black
-->
## Precision & Recall
![bg right:40% 90%](./img/01-metrics.png)
---
Precision
![bg right:40% 90%](./img/01-precision.png)
---
Recall
![bg right:40% 90%](./img/01-recall.png)
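Both metrics fall out of the confusion-matrix counts; a minimal sketch with made-up counts:

```python
# Hypothetical counts from a confusion matrix
tp, fp, fn = 30, 10, 20

precision = tp / (tp + fp)  # of what we flagged, how much was right?
recall = tp / (tp + fn)     # of what we should find, how much did we find?
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f}  recall={recall:.2f}  f1={f1:.2f}")
# precision=0.75  recall=0.60  f1=0.67
```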
---
Supervised or not?
---
<!--
backgroundColor: black
color: white
-->
## Pioneers & Thinkers
---
### Turing
![bg right](https://1.bp.blogspot.com/-jqU6NCD5dVU/WBVezz7LZWI/AAAAAAAADkY/t7hQMwMuCZIXxgPl0ymB2M9WpvkPw3hvQCLcB/s200/Alan%2BTuring.jpg)
<!-- Turing test: misunderstandings
Against the other lines of thought, Turing provides a little “viva voce” that is intended to illustrate the kind of evidence that he supposes one might have that a machine is intelligent.
Given the right kinds of responses from the machine, we would naturally interpret its utterances as evidence of pleasure, grief, warmth, misery, anger, depression, etc.
Perhaps—though Turing doesn’t say this—the only way to make a machine of this kind would be to equip it with sensors, affective states, etc., i.e., in effect, to make an artificial person. However, the important point is that if the claims about self-consciousness, desires, emotions, etc. are right, then Turing can accept these claims with equanimity: his claim is then that a machine with a digital computing “brain” can have the full range of mental states that can be enjoyed by adult human beings.
-->
<!--
Also, Searle's Chinese Room experiment:
Searle imagines himself alone in a room following a computer program for responding to Chinese characters slipped under the door. Searle understands nothing of Chinese, and yet, by following the program for manipulating symbols and numerals just as a computer does, he sends appropriate strings of Chinese characters back out under the door, and this leads those outside to mistakenly suppose there is a Chinese speaker in the room.
The narrow conclusion of the argument is that programming a digital computer may make it appear to understand language but could not produce real understanding. Hence the “Turing Test” is inadequate. Searle argues that the thought experiment underscores the fact that computers merely use syntactic rules to manipulate symbol strings, but have no understanding of meaning or semantics.
https://plato.stanford.edu/entries/chinese-room/
His conclusion seems somewhat wrong to me, since "computer programs have no representation of meaning" is not always true; but in our specific case of Language Models he was really on point :not_bad: -->
- Turing test
-> [Coffee test](https://www.fastcompany.com/1568187/wozniak-could-computer-make-cup-coffee)?
<!-- Steve Wozniak: for general AI, Turing's test is outdated -->
---
### AI Summer: Dartmouth workshop
> _The 1956 Dartmouth Summer Research Project on Artificial Intelligence_
<!-- The study is to proceed on the basis of the conjecture that
every aspect of learning or any other feature of intelligence can in principle
be so precisely described
that a machine can be made to simulate it.
An attempt will be made to find:
- how to make machines use language
- form abstractions and concepts
- solve kinds of problems now reserved for humans
- and improve themselves. -->
> We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.
---
### AI Winter: 1984 and other disappointments
<!-- The AI winter was a result of such hype, due to over-inflated promises by developers, unnaturally high expectations from end-users, and extensive promotion in the media -->
<!-- Even today:
- AI researcher Rodney Brooks would complain in 2002 that "there's this stupid myth out there that AI has failed, but AI is around you every second of the day."[4]
- In 2005, Ray Kurzweil agreed: "Many observers still think that the AI winter was the end of the story and that nothing since has come of the AI field. Yet today many thousands of AI applications are deeply embedded in the infrastructure of every industry."
-->
<!-- Enthusiasm and optimism about AI has generally increased since its low point in the early 1990s. Beginning about 2012, interest in artificial intelligence (and especially the sub-field of machine learning) from the research and corporate communities led to a dramatic increase in funding and investment. -->
---
### Yann LeCun
<!-- Thesis on backpropagation
Father of Deep Learning
Yann Le Cun graduated from ESIEE Paris in 1983, then went to Université Pierre-et-Marie-Curie for a DEA and then a doctorate, obtained in 1987.
He quickly turned to machine learning research, and during his thesis proposed a variant of the gradient backpropagation algorithm, which since the early 1980s has made it possible to train neural networks.
He did his postdoc in Geoffrey Hinton's team.
Since the 1980s, Yann Le Cun has worked on machine learning and deep learning: a computer's ability to recognize representations (images, texts, videos, sounds) by being shown them many, many times.
In 1987 he joined the University of Toronto, and in 1988 the AT&T labs, where he developed supervised learning methods.
He then worked on the compression algorithms of the DjVu archive format, and on automatic recognition of bank checks.
He is a professor at New York University, where he founded the Center for Data Science. He notably works on technology for self-driving cars.
On December 9, 2013, Mark Zuckerberg invited him to join Facebook to create and lead the FAIR ("Facebook Artificial Intelligence Research") lab in New York, Menlo Park and, since 2015, Paris, notably to work on image and video recognition. He had previously turned down a similar offer from Google.
In 2016 he held the year-long "Informatique et sciences numériques" chair at the Collège de France.
In January 2018 he stepped down as head of Facebook's AI research division, handing it to Jérôme Pesenti, to take a research position as Facebook's Chief AI Scientist.
-->
![bg right ](https://www.controcorrenteblog.com/wp-content/uploads/2015/09/Yann-LeCun-1.jpg)
- [BackPropagation](http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf)
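A minimal sketch of what backpropagation computes (illustrative NumPy, not the 1989 paper's code): a forward pass, gradients by the chain rule, then a gradient-descent step.

```python
import numpy as np

# One hidden layer, tanh activation, mean-squared error (toy data)
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))                   # 8 samples, 2 features
y = (X[:, :1] * X[:, 1:] > 0).astype(float)  # toy XOR-like target

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.1

for step in range(500):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    y_hat = h @ W2 + b2

    # Backward pass: chain rule, layer by layer
    d_yhat = 2 * (y_hat - y) / len(X)       # gradient of mean((y_hat-y)^2)
    dW2, db2 = h.T @ d_yhat, d_yhat.sum(0)
    d_h = (d_yhat @ W2.T) * (1 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ d_h, d_h.sum(0)

    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```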
---
![bg](./img/01-lenet.gif)
---
## LeNet Architecture
![](./img/01-lenet.jpg)
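Roughly the same architecture in modern PyTorch, as a sketch (the original 1998 network used different subsampling and output details):

```python
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

logits = LeNet()(torch.randn(1, 1, 32, 32))  # batch of one 32x32 image
```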
---
### Geoffrey Hinton
---
### Andrew Ng
---
### Eliezer Yudkowsky
- [Machine Intelligence Research Institute _[intelligence.org]_](https://intelligence.org/research/)
- [LessWrong](https://www.lesswrong.com) & [Overcoming Bias](https://www.overcomingbias.com)
- AI Box Experiments
![bg right](https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1f9d45e-db99-41a5-b38b-c824fb9ef481_1258x1078.jpeg)
---
<!--
_backgroundColor: white
color: #111
footer: ""
-->
![bg 100%](./img/01-miri.png)
---
<!--
_backgroundColor: white
color: #111
footer: ""
-->
## LessWrong <3
- #### [Sequences on Rationality](https://www.lesswrong.com/rationality)
- ###### [Guessing the Teacher's Password](https://www.lesswrong.com/s/5uZQHpecjn7955faL/p/NMoLJuDJEms7Ku9XS)
- ###### ["Science" as a Curiosity-Stopper](https://www.lesswrong.com/posts/L22jhyY9ocXQNLqyE/science-as-curiosity-stopper)
- ###### [Say no to Complexity](https://www.lesswrong.com/s/5uZQHpecjn7955faL/p/kpRSCH7ALLcb6ucWM)
- ###### [Making Beliefs pay Rent](https://www.lesswrong.com/s/7gRSERQZbqTuLX5re/p/a7n8GdKiAZRX86T5A)
![bg right 120%](./img/01-lesswrong.png)
---
@@ -7,4 +7,46 @@ backgroundColor: #111
paginate: true
---
# Theory: Models,
# Models _everywhere_
---
## Perceptron
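A minimal sketch of Rosenblatt's perceptron learning rule (toy data made up for illustration): predict sign(w·x + b), and on a mistake nudge the weights toward the example.

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):           # yi in {-1, +1}
            if yi * (xi @ w + b) <= 0:     # misclassified
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy linearly separable data (hypothetical)
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # [ 1.  1. -1. -1.]
```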
---
## Neural Network
---
## LSTMs
---
## Deep Learning:
---
### Layers, Layers, Layers!
---
### Convolutions, Capsules and other tricks
---
### Attention! It's all you need
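The core operation behind the paper's title, scaled dot-product attention, as a minimal NumPy sketch: each output row is a weighted average of the value rows, with weights given by query-key similarity.

```python
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # query-key similarity
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)  # softmax over keys
    return weights @ V                         # weighted mix of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention(Q, K, V)  # one vector per query, shape (4, 8)
```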
---
### TRANSFORMERS
---
### Language Models
---
### Genetic Algorithms
---
### Reinforcement Agents
---
# Practice: Choosing a Model
---
### Based on the data
---
### Based on the resources
---
### Based on the use case
@@ -7,4 +7,11 @@ backgroundColor: #111
paginate: true
---
# Training Your Model
## How to set up
## When to stop
## If it's going to run all night...
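One common answer to "when to stop": early stopping on a validation metric, plus checkpoints so an overnight run can be resumed. A minimal sketch; `train_one_epoch`, `validate` and `save_checkpoint` are hypothetical placeholders.

```python
# Sketch only: train_one_epoch, validate, save_checkpoint are placeholders.
best_val, patience, bad_epochs = float("inf"), 5, 0

for epoch in range(1000):
    train_one_epoch(model, train_data)
    val_loss = validate(model, val_data)
    save_checkpoint(model, f"epoch_{epoch}.ckpt")  # resumable if the night goes wrong

    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # no val improvement for 5 epochs
            break
```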
@@ -8,8 +8,28 @@ paginate: true
---
# Testing Your Model
---
## By hand
---
## To avoid regressions
---
## Continuously
---
### When the data changes
---
### When the model evolves
---
### In your CI/CD
---
# Evaluating a model
On the target metrics
On its potential biases
- With CheckList
- With different users
---
## Adversarial perturbations
## Methods
### Train/test/val split
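A common 70/15/15 recipe via two calls to scikit-learn's `train_test_split` (dummy X and y for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # dummy features
y = np.arange(50) % 2              # dummy labels

X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)
# Tune on val; touch test only once, for the final report.
```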
@@ -14,8 +14,58 @@ paginate: true
HuggingFace!
OpenAI!
---
## Leave yourself a way out
---
## UX considerations
---
## Working with Large Language Models
(don't mistake them for a ghost in the shell :wink:)
[David Chalmers: Are Large Language Models sentient?](https://www.youtube.com/watch?v=-BcuCmf00_Y)
[LaMDA](https://archive.ph/1jdOO)
> AI ethicists warned Google not to impersonate humans. Now one of Google’s own thinks there’s a ghost in the machine.
---
## Tools
Dev, collaboration, etc.
### Libs fondamentales
NumPy
Pandas
### Frameworks importants
Jupyter Notebooks
PyTorch
TensorFlow
### ML to API
Flask
FastAPI
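A minimal sketch of serving a model behind an HTTP endpoint with FastAPI (the model call is a placeholder; here we return a dummy score):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Input(BaseModel):
    text: str

@app.post("/predict")
def predict(inp: Input):
    # model.predict(inp.text) would go here; echo a dummy score instead
    return {"label": "positive", "score": 0.99}

# Run with: uvicorn main:app --reload
```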
#### Deployment
Kubernetes and the like
KubeML? TensorFlow Mobile?
APIs! HF, OpenAI, etc
### Code quality
PyLint
Black
Pydantic
MyPy
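Why these tools, in one hypothetical snippet: type-annotated code that PyLint, Black, and MyPy can check statically, and that Pydantic validates at runtime:

```python
from pydantic import BaseModel, ValidationError

class Prediction(BaseModel):
    label: str
    score: float  # must be coercible to float

try:
    Prediction(label="cat", score="not a number")
except ValidationError as e:
    print(e)  # runtime validation catches the bad payload
```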