Genealogy of Neural Networks - A short essay

Medical prescriptions from a chatbot on a website, songs recommended by a cylinder in a dining room, and cars on a highway driving without human assistance portray a world surrounded by “intelligent” entities - an intelligence presented as equivalent to that of a human brain. Indeed, all of the above technologies are equipped with a common architecture, a neural network that imitates the human brain to learn patterns in datasets, optimize its decisions, and predict results in order to act on the outside world. However, the genealogy (in the Foucauldian sense) of neural networks is not solely technological but also a ramification of the socio-political and economic relations of the mid-20th century, followed by significant advancements in the early decades of the 21st century.

One of the origins of the growth of neural networks (or connectionism) in our everyday technologies is the dream of a specific group of engineers and scientists of the mid-1900s to make computers go beyond performing calculations to making deductive decisions in the manner of a human brain. Psychologists like Frank Rosenblatt were already conducting research to create mathematical models that would use numerical manipulations to derive logic by simulating the structure and working of the brain. The Office of Naval Research complemented this stance in 1958 by funding Rosenblatt to build a machine, the perceptron (Rosenblatt, 1958), an abstract model of a brain consisting of neurons arranged in vertical layers connected to each other by weights (parameters, symbolic of the synaptic strengths of neurons in a brain). The perceptron learnt patterns in images by optimizing the numerical values of these weights through minimization of the error between the predicted output (a weighted sum of the inputs) and the true (already known) output. Although promising, the perceptron was shown to be capable of learning only patterns that were linearly separable. In other words, a perceptron was a single-layer neural network that could only recognize two distinct categories (binary classes like dogs and cats) in a dataset, visually separated by a line in a two-dimensional plane (or a hyperplane in higher dimensions). The inability of Rosenblatt’s perceptron to classify more than two classes or to learn complex non-linear patterns resulted in a stagnation of funding for research on connectionism, followed by a halt in neural network research after a strong critique by the MIT computer scientists Marvin Minsky and Seymour Papert in their 1969 book, Perceptrons (Minsky and Papert, 1988).
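Rosenblatt’s machine was built in hardware, but the learning rule it implemented translates directly into a few lines of modern code. The sketch below is a minimal, illustrative reconstruction in Python (using NumPy), not the original implementation; the variable names and the toy dataset are assumptions made for clarity.

```python
import numpy as np

def train_perceptron(inputs, labels, epochs=20, lr=0.1):
    """Minimal perceptron: learns a linear boundary between two classes (labels 0 or 1)."""
    weights = np.zeros(inputs.shape[1])   # "synaptic strengths", initially zero
    bias = 0.0
    for _ in range(epochs):
        for x, y_true in zip(inputs, labels):
            weighted_sum = np.dot(weights, x) + bias
            y_pred = 1 if weighted_sum > 0 else 0   # threshold "firing" of the output neuron
            error = y_true - y_pred                 # difference between true and predicted output
            weights += lr * error * x               # nudge the weights to reduce that error
            bias += lr * error
    return weights, bias

# A linearly separable toy dataset (logical AND): class 1 only when both inputs are 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(w, b)   # the learnt line (hyperplane) separating the two classes
```

A dataset such as XOR, which no single line can separate, is exactly the kind of pattern this procedure fails on - the limitation that Minsky and Papert formalized.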

The financial setbacks and critiques by peers, however, only lasted for about a decade. Funding revived by the Japanese government in the early 1980s for research in AI also encouraged research in connectionism, which led to a perceptron with multiple layers of neurons - the Multi-Layer Perceptron (MLP). Unlike the perceptron, the MLP was able to classify multiple classes and find non-linear patterns in high-dimensional datasets. The multiple layers of neurons, connected through numerical weights, were diagrammatically visualized from left to right (hence, feed-forward networks) - the first layer of neurons signaling the second, and so on. The last layer, called the output layer, was expanded from the single neuron of the perceptron to multiple neurons in the MLP, corresponding to the predicted probabilities of multiple classes. Moreover, mathematical functions like the sigmoid and the hyperbolic tangent (tanh), introduced between each layer of neurons, produced non-linear mathematical models able to classify complex high-dimensional patterns. These functions, known as activation functions, also reified the idea of a certain set of neurons firing in a human brain to produce a thought into a computational prediction in the case of an artificial agent. However, the lack of large-scale datasets and of computational power during the 1980s limited the number of layers of an MLP that could be trained.
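A rough sketch of such a forward pass in code may help fix the idea. The example below, again in Python with NumPy, assumes a two-layer MLP with a tanh activation and a softmax at the output to turn the last layer’s values into class probabilities; the sizes and names are illustrative, not taken from any historical system.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a two-layer MLP: tanh hidden layer, softmax output layer."""
    hidden = np.tanh(W1 @ x + b1)          # non-linear activation between the layers
    logits = W2 @ hidden + b2
    exp = np.exp(logits - logits.max())    # softmax: read the output neurons as class probabilities
    return exp / exp.sum()

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_classes = 4, 8, 3
W1, b1 = rng.normal(size=(n_hidden, n_inputs)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_classes, n_hidden)), np.zeros(n_classes)
print(mlp_forward(rng.normal(size=n_inputs), W1, b1, W2, b2))  # three numbers summing to 1.0
```

The printed output is one probability per class, which is what the multiple output neurons of an MLP make possible in contrast to the single thresholded neuron of the perceptron.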

The “feed forward” of neural networks was, in fact, a series of high-dimensional matrix-vector multiplications hidden behind the abstract connectionist idea, ultimately producing a vector of predicted outputs. Since performing such matrix-vector calculations on big datasets required high processing power, attention soon shifted to the graphics processing cards of the video game industry, dating back to the 1980s, to augment the computational power of CPUs. Moreover, access to large amounts of data or “big data”, such as the MNIST and CIFAR datasets (Deng, 2012; Torralba et al., 2008), in the first decades of the 21st century, along with the large-scale scraping of public information by platforms like Facebook and Google, led to the emergence of the field of Deep Learning (DL) (Goodfellow et al., 2016), which involved training neural networks with many more layers and connected neurons on big datasets. In parallel, advancements in cellular technologies, infrastructures for the distribution of internet access, and inexpensive large-scale computational power amplified data generation from the public, contributing to major breakthroughs around 2012 with the development of deep neural networks for speech recognition (combined with Hidden Markov Models), image classification (by the Google X lab), and object detection (by Convolutional Neural Networks), reducing prediction error rates to levels close to human judgement on such tasks, as this short history reveals.
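The computational point is easy to see in code: one layer of a network is a single matrix-vector product, and stacking many inputs side by side turns it into one large matrix-matrix product, which is exactly the kind of operation graphics hardware is built to accelerate. The snippet below is a schematic illustration in NumPy (running on a CPU); the layer sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 784))        # weights of one layer: 784 inputs (e.g. pixels) -> 256 neurons
b = np.zeros(256)

x = rng.normal(size=784)               # one input: a single matrix-vector multiplication
single = np.tanh(W @ x + b)

batch = rng.normal(size=(784, 128))    # 128 inputs stacked as columns: one matrix-matrix multiplication
many = np.tanh(W @ batch + b[:, None])
print(single.shape, many.shape)        # (256,) and (256, 128)
```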

A dataset of inputs labelled with (or without) their corresponding outputs (or labels), an architecture (i.e., the number of layers and the number of neurons in each layer), a choice of activation functions between the layers, a formulation of an error function (e.g., mean absolute error) to quantify the difference between the predicted and the known labels, and a hardware facility to perform heavy computations are thus what is needed to produce the intelligent entities that share a world with us (a toy version of such a training setup is sketched after this paragraph). However, such intelligence also shares a non-linear history of technical as well as socio-political and economic fluxes - varied disciplines, funding agencies, and rivalries among viewpoints bound to a dream of imitating the human brain - revealing the closely intertwined relations between the socio-material and the technological timelines.
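For concreteness, the sketch below ties those ingredients together in a few lines of Python, here using the PyTorch library (an assumption of this illustration, not a claim about any particular historical system); the dataset is synthetic and the architecture is deliberately tiny.

```python
import torch
from torch import nn

# Hardware facility for the heavy computation (falls back to CPU if no GPU is present).
device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. A (synthetic) dataset of inputs labelled with their corresponding outputs.
X = torch.randn(512, 4, device=device)
y = X.sum(dim=1, keepdim=True)          # toy target: the sum of the inputs

# 2. An architecture: layers and neurons, with an activation function in between.
model = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1)).to(device)

# 3. An error function: mean absolute error between predicted and known labels.
loss_fn = nn.L1Loss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 4. Training: repeatedly adjust the weights to reduce the error.
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
print(loss.item())                      # the remaining error after training
```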