Brief History of AI

The history of artificial intelligence (AI) began in antiquity, with myths, legends, and stories of artificial beings endowed with intellect or consciousness by master craftsmen. Classical thinkers who tried to describe human thought as the mechanical manipulation of symbols sowed the seeds of today's AI.

This line of thought culminated in the 1940s with the invention of the programmable digital computer, a machine based on the abstract essence of mathematical reasoning. The device, and the theory behind it, led a small group of scientists to seriously consider the feasibility of building an electronic brain. Modern AI was born.

1943 - Neuron

Warren McCulloch and Walter Pitts, in "A Logical Calculus of the Ideas Immanent in Nervous Activity", proposed logical models that replicate the activity of biological neurons. The mathematical neuron was a fascinating concept even without a learning mechanism, and it would go on to form the basis for artificial neural networks and Deep Learning.
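
As an illustrative sketch (not code from the 1943 paper; the weights and thresholds below are chosen by hand, not learned), a McCulloch-Pitts unit is simply a threshold gate: it sums its binary inputs with fixed weights and fires only if the sum reaches a threshold.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of binary inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Logical AND: both inputs must be active.
assert mcculloch_pitts([1, 1], weights=[1, 1], threshold=2) == 1
assert mcculloch_pitts([1, 0], weights=[1, 1], threshold=2) == 0

# Logical OR: any single active input is enough.
assert mcculloch_pitts([0, 1], weights=[1, 1], threshold=1) == 1
```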


1957 - Perceptron

Frank Rosenblatt, in his technical report "The Perceptron: A Perceiving and Recognizing Automaton", developed a practical version of the McCulloch-Pitts neuron, the "Perceptron", capable of learning to perform binary classification. It was a breakthrough that would lead to a revolution in artificial neural networks in the years to come.
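
A minimal sketch of the idea in modern terms (a simplified reading, not Rosenblatt's original hardware or notation): the perceptron nudges its weights whenever it misclassifies a training example.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Perceptron learning rule: move the weights toward misclassified examples.
    X has shape (n_samples, n_features); y contains 0/1 labels."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            pred = 1 if np.dot(w, x_i) + b >= 0 else 0
            w += lr * (y_i - pred) * x_i   # no change when the prediction is correct
            b += lr * (y_i - pred)
    return w, b

# Toy linearly separable problem: logical AND.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
```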


1960 - Backpropagation Model

Henry J. Kelley demonstrates the first version of a continuous backpropagation model in "Gradient Theory of Optimal Flight Paths." The model is developed in the context of control theory, where it is presented as the "method of steepest descent", yet it forms the basis for later refinements and would eventually be used in artificial neural networks (ANNs).
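
In modern notation, the method of steepest descent is the familiar gradient-descent update (a generic statement, not the paper's exact formulation):

\[
\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} J(\theta_t),
\]

where \(J\) is the cost to be minimized and \(\eta > 0\) is the step size.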


1962 - Backpropagation With Chain Rule

In "The numerical solution of variational problems" Stuart Dreyfus presents a backpropagation model that employs a simple derivative chain rule rather than the dynamic programming employed by previous backpropagation models. This is yet another small step toward the future deep learning. 


1965 - Multilayer Neural Network

Alexey Grigoryevich Ivakhnenko and Valentin Grigoryevich Lapa develop a hierarchical representation of a neural network that uses polynomial activation functions and is trained with the Group Method of Data Handling (GMDH). It is now considered the first multilayer perceptron, and Ivakhnenko is often regarded as the father of Deep Learning.


1969 - The Fall Of Perceptron

In the book Perceptrons, Marvin Minsky and Seymour Papert demonstrate that Rosenblatt's perceptron cannot compute functions that are not linearly separable, such as XOR. Solving such problems requires stacking perceptrons in multiple layers, but the perceptron learning algorithm does not extend to such networks. This setback contributes to the first AI winter in neural network research.
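
The XOR case can be seen directly (a standard textbook sketch, not the book's full argument): a single perceptron computes a linear threshold, and no weights \(w_1, w_2\) and bias \(b\) can satisfy all four XOR constraints at once,

\[
\begin{aligned}
w_1\cdot 0 + w_2\cdot 0 + b &< 0, \\
w_1\cdot 1 + w_2\cdot 0 + b &\ge 0, \\
w_1\cdot 0 + w_2\cdot 1 + b &\ge 0, \\
w_1\cdot 1 + w_2\cdot 1 + b &< 0.
\end{aligned}
\]

Adding the two middle inequalities gives \(w_1 + w_2 + 2b \ge 0\), while adding the first and last gives \(w_1 + w_2 + 2b < 0\), a contradiction.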


1970 - Backpropagation Is Computer Coded

Seppo Linnainmaa publishes a general method of automatic differentiation, the reverse mode that underlies backpropagation, and implements it in computer code. The theory of backpropagation is now well developed, but it would not be applied to neural networks until the following decade.


1971 - Neural Network Goes Deep

In "Polynomial Theory of Complex Systems", Alexey Grigoryevich Ivakhnenko describes a deep network with eight layers trained with his much-cited Group Method of Data Handling, an approach that remained in use well into the new millennium.

1974-1980 - First Major "AI Winter"

1980 - Neocognitron – First CNN Architecture

Kunihiko Fukushima proposed the Neocognitron, a hierarchical, multilayer artificial neural network, as a mechanism for pattern recognition. It was used for handwritten Japanese character recognition and other pattern recognition tasks, and it inspired convolutional neural networks.

1982 - Hopfield Network – Early RNN

John Hopfield introduced an artificial neural network, the Hopfield Network, to store and retrieve memories the way the human brain does. It has a single layer of fully connected neurons, so the input and output must have the same size. It serves as a content-addressable memory system and would be important for later Recurrent Neural Network (RNN) models in the modern Deep Learning era.
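
A minimal sketch of the mechanism (a simplified modern reading with a toy pattern, not Hopfield's original formulation): patterns of +1/-1 values are stored in a symmetric weight matrix via a Hebbian rule, and a corrupted pattern is recalled by repeatedly applying a sign-threshold update.

```python
import numpy as np

def store(patterns):
    """Hebbian storage: sum of outer products of the stored +/-1 patterns."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0)            # no self-connections
    return W

def recall(W, state, steps=10):
    """Iteratively update the state until it settles into a stored memory."""
    s = state.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)
    return s

pattern = np.array([[1, -1, 1, -1, 1, -1, 1, -1]])
W = store(pattern)
noisy = pattern[0].copy()
noisy[0] *= -1                        # corrupt one bit
print(recall(W, noisy))               # recovers the stored pattern
```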

1982 - Proposal For Backpropagation In ANN

Paul Werbos publicly proposes the use of backpropagation for propagating errors during neural network training. His algorithm provides a "rule for updating the weights" of a multilayer network undergoing supervised learning, and it eventually led to the practical adoption of backpropagation by the neural network community.

1985 - Boltzmann Machine

David H. Ackley, Geoffrey Hinton, and Terrence Sejnowski create the Boltzmann Machine, a stochastic recurrent neural network. It has only visible and hidden units, with no separate output layer.

1986 - NETtalk – ANN Learns Speech

Terrence Sejnowski and Charles Rosenberg developed NETtalk, an artificial neural network, as a simplified model illustrating the complexity of learning human-level cognitive tasks. NETtalk is a program implemented as a connectionist model that learns to pronounce written English text: it is given text as input and its output is compared against the correct phonetic transcriptions.

1986 - Implementation Of Backpropagation

David Rumelhart, Geoffrey Hinton, and Ronald Williams describe a new learning procedure, back-propagation, for networks of neuron-like units. The procedure iteratively adjusts the weights of the network's connections in order to minimize a measure of the difference between the network's actual output vector and the desired output vector. Back-propagation differs from earlier, simpler methods such as the perceptron-convergence procedure in its ability to generate useful new features.
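
A minimal NumPy sketch of the procedure (a modern simplification, not the paper's notation or experiments): a two-layer network is trained on XOR, the task a single perceptron cannot learn, by propagating the output error backward with the chain rule and taking gradient-descent steps on the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 4))   # hidden layer weights
b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1))   # output layer weights
b2 = np.zeros(1)
lr = 0.5

for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: gradients of the squared error, layer by layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))   # should approach [0, 1, 1, 0]
```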

1986 - Restricted Boltzmann Machine

Paul Smolensky develops a variant of the Boltzmann machine in which there are no connections within the visible layer or within the hidden layer, only between the two. It is known as the Restricted Boltzmann Machine (RBM). It would become popular in later years, especially for building recommender systems.

1987-1993 - Second Major "AI Winter"

1989 - CNN Adopting Backpropagation

Yann LeCun uses backpropagation to train a convolutional neural network, which is successfully applied to the recognition of handwritten ZIP codes provided by the U.S. Postal Service. A single network learns the entire recognition process, from the normalized image of the character to the final classification. This is a groundbreaking moment, as it lays the foundation for modern computer vision with Deep Learning.

1989 - Universal Approximation Theorem

George Cybenko publishes the earliest version of the Universal Approximation Theorem in his paper "Approximation by superpositions of a sigmoidal function". The theorem shows that a feedforward neural network with a single hidden layer and a finite number of neurons can approximate any continuous function on a compact domain. Deep Learning gains even more credibility as a result.
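
In Cybenko's setting the approximating function is a finite sum of shifted, scaled sigmoids (stated informally here):

\[
F(x) = \sum_{i=1}^{N} \alpha_i \,\sigma\!\left(w_i^{\top} x + b_i\right),
\]

and the theorem says that for any continuous \(f\) on a compact set and any \(\varepsilon > 0\) there exist \(N\), \(\alpha_i\), \(w_i\), \(b_i\) such that \(|F(x) - f(x)| < \varepsilon\) everywhere on that set.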

1991 - Vanishing Gradient Problem

In his diploma thesis, Sepp Hochreiter identifies the vanishing gradient problem, which complicates the training of deep neural networks. In typical deep or recurrent networks, backpropagated error signals either shrink rapidly or grow out of bounds: they decay, or explode, exponentially with the number of layers. This insight shaped much of the Deep Learning research of the 1990s and 2000s.
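
The effect is easy to see for a chain of single-unit sigmoid layers (a standard illustration, not Hochreiter's exact analysis): the sigmoid's derivative is at most 1/4, so the error signal propagated back through \(L\) such layers is scaled by

\[
\prod_{\ell=1}^{L} \bigl|\sigma'(z_\ell)\, w_\ell\bigr|
\;\le\; \Bigl(\tfrac{1}{4}\,|w|_{\max}\Bigr)^{L},
\]

which shrinks geometrically for small weights and can blow up for large ones.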

1997 - Long Short-Term Memory

Sepp Hochreiter and Juergen Schmidhuber publish a groundbreaking paper on "Long Short-Term Memory" (LSTM). This is an advanced architecture for recurrent neural networks that preserves information over long sequences and mitigates the vanishing gradient problem for RNNs. LSTM would help revolutionize Deep Learning in the coming decades.
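
In its now-standard form (the forget gate was added shortly after the original paper), an LSTM cell carries a cell state \(c_t\) that is updated multiplicatively by learned gates, which is what lets error signals survive over long time spans:

\[
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), &
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), &
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, &
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
\]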

2006 - Deep Belief Network

Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh publish the paper "A fast learning algorithm for deep belief nets", in which they stack multiple RBMs and call the result a Deep Belief Network (DBN). DBNs can be viewed as a composition of simple unsupervised networks such as Restricted Boltzmann Machines (RBMs) or autoencoders, in which the hidden layer of each subnetwork serves as the visible layer for the next, with connections between layers but not within them. This composition allows a fast, layer-by-layer unsupervised training procedure that scales well to large datasets.
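
A rough sketch of the greedy layer-by-layer idea (a toy illustration with hypothetical layer sizes and learning rates, not the paper's algorithm or hyperparameters): train one RBM on the data with contrastive divergence, then feed its hidden activations to the next RBM as if they were visible data.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(v_data, n_hidden, epochs=50, lr=0.05):
    """Train a binary RBM with one step of contrastive divergence (CD-1)."""
    n_visible = v_data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)                      # visible biases
    b_h = np.zeros(n_hidden)                       # hidden biases
    for _ in range(epochs):
        # positive phase: hidden probabilities given the data
        h_prob = sigmoid(v_data @ W + b_h)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # negative phase: one reconstruction step
        v_recon = sigmoid(h_sample @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # CD-1 updates
        W += lr * (v_data.T @ h_prob - v_recon.T @ h_recon) / len(v_data)
        b_v += lr * (v_data - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_h

# Greedy stacking: each layer's hidden activations become the
# "visible" data for the next RBM.
data = (rng.random((100, 20)) < 0.5).astype(float)   # toy binary data
layer_input = data
for n_hidden in [16, 8]:
    W, b_h = train_rbm(layer_input, n_hidden)
    layer_input = sigmoid(layer_input @ W + b_h)
```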

2009 - Use of GPUs in Training

Stanford University researchers Rajat Raina, Anand Madhavan, and Andrew Ng publish a seminal paper discussing the promise of graphics processing units (GPUs) for machine learning applications. It was not long before GPUs became the de facto standard for training deep learning models: they are far better suited to the massively parallel processing of many identical, repeatable computations that machine learning requires, whereas CPUs are designed to handle fewer, more complex computations at a time.

2009 - ImageNet

The ImageNet project, initiated by AI researcher Fei-Fei Li, is a large visual database developed for visual object recognition research. More than 14 million images have been hand-annotated by the project and organized into more than 20,000 categories. The ImageNet project runs an annual software competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in which programs compete to correctly classify and recognize objects and scenes. The availability of this free database has been critical for extending and improving AI algorithms.

2011 - ReLU: Rectified Linear Unit

In their paper "Deep Sparse Rectifier Neural Networks," Xavier Glorot, Antoine Bordes, and Yoshua Bengio demonstrate that the ReLU activation function can mitigate the vanishing gradient problem. This means that, in addition to GPUs, the deep learning community now has another tool to avoid the long and inefficient training times of deep neural networks.
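
The rectifier itself is trivially simple, and its gradient is exactly 1 for positive inputs, so it does not shrink the backpropagated signal the way a saturating sigmoid does:

\[
\mathrm{ReLU}(x) = \max(0, x), \qquad
\mathrm{ReLU}'(x) =
\begin{cases}
1 & x > 0, \\
0 & x < 0.
\end{cases}
\]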

2012 - AlexNet

AlexNet is the name of a convolutional neural network (CNN) architecture developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. The key finding of the original work was that the depth of the model was critical to its high performance, which was computationally intensive but enabled by the use of graphics processing units (GPUs) during training. AlexNet is widely considered to be one of the most influential works in the field of computer vision and inspired many other papers to use CNNs and GPUs to accelerate Deep Learning.

2014 - GAN : Generative Adversarial Network

The Generative Adversarial Network (GAN) was developed by Ian Goodfellow and his colleagues. Given a training set, this technique learns to generate new data with the same statistics as the training set. GANs open up entirely new applications for Deep Learning in fashion, art, and science because of their ability to synthesize realistic data.
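
Training is a two-player minimax game, stated schematically here as in the original paper: a generator \(G\) maps noise \(z\) to samples, while a discriminator \(D\) tries to tell real data from generated data,

\[
\min_{G}\,\max_{D}\;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr].
\]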

2016 - AlphaGo Beats Human

DeepMind made headlines in 2016 when its program AlphaGo beat Lee Sedol, a world-champion Go professional. A more general program, AlphaZero, later beat the most powerful programs playing Go, chess, and shogi (Japanese chess) after a few days of playing against itself using reinforcement learning, taking the promise of Deep Learning to a whole new level.

2017 - Transformer

The Transformer is a deep learning architecture designed to process sequential input data, such as natural language. Unlike RNNs, however, Transformers do not need to process the data in order. Instead, the attention mechanism provides context for every position in the input sequence by weighting the importance of each piece of input data differently.

For example, if the input data is a natural language sentence, the transformer does not need to process the beginning of the sentence before the end. Rather, it identifies the context that confers meaning to each word in the sentence. This feature allows for more parallelization than RNNs and thus reduces training times. 

It is used primarily in natural language processing (NLP) and computer vision (CV).
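
At its core is scaled dot-product attention, as defined in "Attention Is All You Need": every position produces a query, a key, and a value, and each output is a weighted average of the values, with weights determined by how well the query matches each key,

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,
\]

where \(d_k\) is the dimension of the keys.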

2020 - OpenAI GPT-3

OpenAI unveils GPT-3, a natural language processing model with the remarkable ability to produce human-like text in response to a prompt. At its release, GPT-3 is the largest and most advanced language model in the world, with 175 billion parameters, trained on Microsoft Azure's AI supercomputer.

2021 - Monster AI models

GPT-3 has 175 billion parameters, more than 100 times as many as its predecessor, GPT-2. But GPT-3 is dwarfed by the class of 2021. Jurassic-1, a commercially available large language model launched by US startup AI21 Labs in September, edged out GPT-3 with 178 billion parameters. Gopher, a new model released by DeepMind in December, has 280 billion parameters. Megatron-Turing NLG has 530 billion. Google's Switch-Transformer and GLaM models have one and 1.2 trillion parameters, respectively.

The trend is not just in the US. This year the Chinese tech giant Huawei built a 200-billion-parameter language model called PanGu. Inspur, another Chinese firm, built Yuan 1.0, a 245-billion-parameter model. Baidu and Peng Cheng Laboratory, a research institute in Shenzhen, announced PCL-BAIDU Wenxin, a model with 280 billion parameters that Baidu is already using in a variety of applications, including internet search, news feeds, and smart speakers. And the Beijing Academy of AI announced Wu Dao 2.0, which has 1.75 trillion parameters.

Meanwhile, South Korean internet search firm Naver announced a model called HyperCLOVA, with 204 billion parameters.

For all the effort put into building new language models this year, AI is still stuck in GPT-3's shadow: "We thought we needed a new idea, but we got there just by scale." And yet, "in 10 or 20 years, large-scale models will be the norm." (Will Douglas Heaven)

2022 - BLOOM: AI in the community hands

All the monster AI models of recent years, like the GPTs and DALL-Es, stem from the immense resources of private tech companies, which have kept building bigger and better models, one after another: big models by big corporations with deep pockets. The concern is not only their technical specifications but also the fact that a handful of wealthy, for-profit research labs exercise near-absolute control over them.

That's about to change.

BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) is unique not because it is architecturally different from GPT-3 - indeed, it is quite similar to the models above - but because it is the starting point of a socio-political paradigm shift in AI that will define the coming years in the field, and that will break Big Tech's stranglehold over R&D: an AI paradigm in the hands of society.

BLOOM is the result of the BigScience Research Workshop, which includes the work of over 1000 researchers from around the world and counts on the collaboration and support of over 250 institutions. What they have in common is that they believe technology - and AI in particular - should be open, diverse, inclusive, responsible and accessible, for the benefit of humanity. BigScience members have published an ethical charter that establishes the values they adhere to in the development and deployment of these technologies.

BigScience and BLOOM are, without a doubt, the most notable attempt at bringing down the barriers that Big Tech has erected in the field of AI over the last decade, and the most sincere attempt to develop AI for the benefit of all. It is one of the first times open source has prevailed over corporate secrecy and control.

2023 - ChatGPT: Human-like AI generation

ChatGPT is a bot trained to generate human-like responses to user inputs. Through the wonders of machine learning, it’s acquired a remarkably expansive skillset. It can solve all your problems and answer all your questions. Or at least it will try to. 

It is, to borrow Arthur C. Clarke’s old formulation, “indistinguishable from magic.” 

It seems to understand what we ask. We begin to doubt its nature, and what makes us distinct: does it come with intelligence, too?

2025 - DeepSeek

Rise of Open-Source, Affordable LLMs 

DeepSeek's success has revolutionized AI, making advanced NLP accessible beyond the tech giants. This democratization enables researchers, developers, and smaller organizations to harness cutting-edge AI without prohibitive costs, driving innovation and collaboration across diverse fields.