Gen AI



We started with participant introductions. There were 12 of us, with ages ranging from 12 to 75. Except for the 12-year-old, who decided to make AI drawings on his dad’s phone for the second half of the session, toastmaster Abinash managed to keep his audience awake and involved. Abinash works with Persistent on AI solutions using IBM’s watsonx. He was helped by Arun Nair, who runs https://www.canspirit.ai/, a company that works in the AI space.

Intelligence, as per Abinash, is simply processing information to do cognitive work. Artificial intelligence is when a machine does the same. The competence of AI was first discussed by Alan Turing. In 1950, he proposed the Turing test: when a human interacts with two agents in a blind test, one human and one machine, she should not be able to distinguish between the two. In the fifties computing power was minimal, so most of the work on AI remained on paper. By the nineties some muscle had been added to computers. One of the pivotal moments in AI came in 1997, when IBM’s Deep Blue defeated world chess champion Garry Kasparov using brute-force computing. Brute force continues to be a hallmark of AI even today, albeit with a difference in approach. Scientists have long studied the functioning of the human brain, and around 2012 deep neural networks, loosely modelled on how the human brain learns, started making their mark. Google Translate was one of the earliest applications of Gen AI.

To model the human brain, you need simulated neurons. The geeks call them nodes, and they are nothing but software modules. Neural networks are programs that perform the mathematical calculations related to these nodes. Input nodes receive the data and pass it on to subsequent layers, which process it further until it reaches the output layer. If we have a binary (yes/no) classification problem, the output layer will have only one output node, which gives the result as 1 or 0. A number, called a weight, represents the connection between one node and another. The weight is positive if one node excites another, or negative if one node suppresses the other. Connections with higher weight values have more influence on the receiving nodes.
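To make the idea of nodes, layers and weights concrete, here is a minimal sketch of a network with two inputs, two hidden nodes and one output node. The weight values, the sigmoid activation and the function names are assumptions chosen for illustration; they were not part of the session.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_output(inputs, weights, biases):
    """Compute one layer: each node sums its weighted inputs, then applies sigmoid.

    weights[j][i] is the connection from input i to node j.
    A positive weight excites the node, a negative weight suppresses it.
    """
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(sigmoid(total))
    return outputs

def predict(inputs):
    """Run a 2-input, 2-hidden-node, 1-output network and return 1 or 0."""
    # Illustrative weights only; a real network learns these during training.
    hidden_weights = [[2.0, -1.0], [-1.5, 3.0]]
    hidden_biases = [0.0, 0.0]
    output_weights = [[1.0, -2.0]]
    output_biases = [0.0]

    hidden = layer_output(inputs, hidden_weights, hidden_biases)
    (score,) = layer_output(hidden, output_weights, output_biases)
    # A single output node answers the yes/no question.
    return 1 if score >= 0.5 else 0
```

With these made-up weights, `predict([1.0, 0.0])` answers 1 and `predict([0.0, 1.0])` answers 0, because the second hidden node feeds the output through a negative (suppressing) weight.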

Neural networks need much more training than other machine learning methods. They need millions of examples of training data rather than the hundreds or thousands that a simpler method might need. Artificial neural networks learn continuously from their mistakes. In simple terms, you can think of the data flowing from the input node to the output node through many different paths in the neural network. Only one path is the correct one that maps the input node to the correct output node. Each node makes a guess about the next node in the path and checks whether the guess was correct. Nodes assign higher weight values to paths that lead to more correct guesses and lower weight values to paths that lead to incorrect guesses. For the next data point, the nodes make a new prediction using the higher-weight paths. In this way, the network uses corrective feedback loops to improve its predictions.
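The feedback loop can be sketched with the simplest possible learner, a single node trained with the classic perceptron rule. This is a stand-in for illustration, not how large networks are actually trained (they use backpropagation). The AND dataset, learning rate and function names are assumptions.

```python
def train_perceptron(examples, epochs=20, learning_rate=0.1):
    """Learn the weights of a single node from (inputs, label) pairs."""
    n_inputs = len(examples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            total = sum(w * x for w, x in zip(weights, inputs)) + bias
            guess = 1 if total > 0 else 0
            error = label - guess  # 0 when the guess was right
            # Corrective feedback: nudge the weights only after a wrong guess.
            for i, x in enumerate(inputs):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
    return weights, bias

def classify(weights, bias, inputs):
    """Apply the trained node to a new input."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Labelled training data for logical AND: the answer is 1 only for (1, 1).
AND_EXAMPLES = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(AND_EXAMPLES)
```

After a handful of passes over the four examples the node stops making mistakes, and `classify(weights, bias, [1, 1])` returns 1 while the other three inputs return 0.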

Neural networks learn by initially processing several large sets of data. In supervised learning, data scientists give artificial neural networks labelled datasets that provide the right answer in advance. For example, a network training in facial recognition initially processes hundreds of thousands of images of human faces, with various terms related to ethnic origin, country, or emotion describing each image. Fun fact: most of the labelling work behind model training happens in low labor cost countries like India. The annotation of Tesla road images was done out of a Yerwada-based BPO. Companies like Google save labor by using security captchas to help with annotations. Some companies also use sites like Amazon Mechanical Turk and Upwork to offload annotation work.

The neural network slowly builds knowledge from these labelled datasets. After the network has been trained, it starts making guesses about the ethnic origin or emotion of a new image of a human face that it has never processed before. In traditional machine learning, a data scientist manually determines the set of relevant features that the software must analyse. This limits the software’s ability and makes it tedious to create and manage.

So it was time for a new approach, known in the jargon as deep learning, where the data scientist gives only raw data to the software. The deep learning network derives the features by itself and learns more independently. It can analyse unstructured datasets like text documents, identify which data attributes to prioritise, and solve more complex problems. By contrast, if you were training traditional machine learning software to identify an image of a pet correctly, you would need to take these steps:

  • Find and label thousands of pet images, like cats, dogs, horses, hamsters, parrots, and so on, manually.
  • Tell the machine learning software what features to look for so it can identify the image using elimination. For instance, it might count the number of legs, then check for eye shape, ear shape, tail, fur, and so on.
  • Manually assess and change the labelled datasets to improve the software’s accuracy. For example, if your training set has too many pictures of black cats, the software will correctly identify a black cat but not a white one.
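The manual steps above can be sketched in a few lines. The feature names, the elimination rules and the helper functions are hypothetical, invented for illustration. They show why hand-picked features are tedious to maintain and how a skewed training set can be spotted.

```python
from collections import Counter

def identify_pet(features):
    """Classify a pet by elimination, using hand-picked features (hypothetical rules)."""
    if features["legs"] == 2:
        return "parrot"
    if features["legs"] == 4 and features["ear_shape"] == "pointed" and features["has_fur"]:
        return "cat"
    if features["legs"] == 4 and features["ear_shape"] == "floppy":
        return "dog"
    # Every new animal needs another hand-written rule.
    return "unknown"

def label_balance(labels):
    """Return each label's share of the training set to reveal skew."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}
```

For instance, `label_balance(["black cat"] * 9 + ["white cat"])` reports that nine out of ten pictures are black cats, which is the kind of imbalance the third step asks you to correct by hand.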

In deep learning, the neural networks process all the images and automatically determine what they need to analyse: the number of legs and the face shape first, and the tails last, to correctly identify the animal in the image.

One well-known family of deep learning models is GPT, or Generative Pre-trained Transformers. GPT models analyse natural language queries, known as prompts, and predict the best possible response based on their understanding of language. To do that, they rely on the knowledge they gain from training on massive language datasets, with hundreds of billions of parameters. They can take input context into account and dynamically attend to different parts of the input, making them capable of generating long responses, not just the next word in a sequence. For example, when asked to generate a piece of Shakespeare-inspired content, a GPT model does so by remembering and reconstructing new phrases and entire sentences with a similar literary style.

The transformer architecture uses a self-attention mechanism to focus on different parts of the input text. Transformers pre-process text inputs as embeddings, which are mathematical representations of words. When encoded in vector space, words that are closer together are expected to be closer in meaning. These embeddings are processed through an encoder component that captures contextual information from the input. When it receives input, the transformer network’s encoder block separates words into embeddings and assigns a weight to each. Weights are parameters that indicate the relevance of words in a sentence. Position encoders help GPT models avoid ambiguity when the same word appears in different parts of a sentence. For example, position encoding allows the transformer model to tell apart the meanings of these sentences:

A dog chases a cat

A cat chases a dog
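Here is a small sketch of why position matters for these two sentences. The four-dimensional embeddings are made-up numbers, and the sinusoidal formula is one common position-encoding scheme from the original transformer design. Real GPT models may instead learn their position vectors, so treat this as an illustration only.

```python
import math

# Toy 4-dimensional word embeddings; the values are made up for illustration.
EMBEDDINGS = {
    "a":      [0.1, 0.0, 0.0, 0.1],
    "dog":    [0.9, 0.1, 0.3, 0.0],
    "cat":    [0.8, 0.2, 0.1, 0.1],
    "chases": [0.0, 0.7, 0.2, 0.5],
}

def positional_encoding(position, dim):
    """Sinusoidal encoding: even indices use sine, odd indices use cosine."""
    encoding = []
    for i in range(dim):
        angle = position / (10000 ** ((2 * (i // 2)) / dim))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

def encode(sentence, use_positions):
    """Return one vector per word, optionally with position information added."""
    vectors = []
    for position, word in enumerate(sentence.lower().split()):
        vector = list(EMBEDDINGS[word])
        if use_positions:
            pe = positional_encoding(position, len(vector))
            vector = [v + p for v, p in zip(vector, pe)]
        vectors.append(vector)
    return vectors

def bag_of_vectors(vectors):
    """Order-insensitive view of a sentence: its word vectors, sorted."""
    return sorted(vectors)
```

Without position information, both sentences reduce to the same collection of word vectors, so a model that ignores word order cannot tell who is chasing whom. Once the position encoding is added, "dog" in second place and "dog" in fifth place get different vectors, and the two sentences become distinguishable.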

The encoder processes the input sentence and generates a fixed-length vector representation. The decoder uses the vector representation to predict the requested output. Compared to their predecessors, transformers are more parallelizable because they do not process words sequentially one at a time, but instead process the entire input all at once during the learning cycle. Thanks to this, and to fine-tuning with supervised training and a process known as reinforcement learning from human feedback (RLHF), GPTs are able to give fluent answers to almost any input you provide.
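The self-attention step mentioned earlier can be sketched as scaled dot-product attention. To keep it short, this version assumes the queries, keys and values are the input vectors themselves. Real transformers first pass the inputs through learned projection matrices, which are omitted here. The function names are illustrative.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    # Each query is handled independently, which is what makes the
    # computation easy to run in parallel across all words at once.
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # Each output is a weighted mix of every word vector in the sentence.
        mixed = [sum(w * vec[i] for w, vec in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights
```

Every row of attention weights sums to 1, so each word's new representation is a blend of all the words in the input, weighted by how relevant they are to it.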

Unlike large language models (LLMs), which work with text, foundation models can use not just text but also audio and video. GPT-3 has 175 billion parameters, or weights, and one can loosely consider it to have the same capability as a rat’s brain. Engineers trained it on over 45 terabytes of data from sources like web texts, Common Crawl, books, and Wikipedia. The average quality of the datasets was improved prior to training as the model matured from 2018’s version 1 to 2020’s version 3. This training process required massive computing power: about 3,000 to 5,000 GPUs were used. One GPU has roughly the computing power of 3,000 laptops.
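To get a feel for the scale, here is a back-of-the-envelope calculation of how much memory the weights alone would occupy. The 175 billion figure is the one quoted above. The two bytes per parameter is an assumption (16-bit numbers), and the function name is made up for this sketch.

```python
def model_memory_gb(parameters, bytes_per_parameter=2):
    """Approximate memory needed just to store the weights, in gigabytes."""
    return parameters * bytes_per_parameter / 1e9

GPT3_PARAMETERS = 175_000_000_000  # parameter count quoted in the text

# Assumption: 2 bytes per weight (16-bit); 4 bytes would double the figure.
weights_gb = model_memory_gb(GPT3_PARAMETERS)
```

Under that assumption the weights take about 350 GB, far more than a single laptop holds in memory, which helps explain why training is spread over thousands of GPUs.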

Artificial neural networks thus attempt to solve complicated problems, like summarizing documents or recognizing faces, with greater accuracy. GPT models can be used to analyze customer feedback and summarize it in easily understandable text: you can collect customer sentiment data from sources like surveys, reviews, and live chats, then ask a GPT model to summarize it. They can enable virtual characters to converse naturally with human players in virtual reality. They can also provide a better search experience for help desk personnel, who can query the product knowledge base in conversational language to retrieve relevant product information.

But AI can go wrong, and sometimes in dangerous ways. AI jargon borrows the term hallucination from psychology. AI hallucinations occur when an AI model generates false or illogical information that isn’t based on real data or events, but is presented as fact. Although LLMs are designed to produce fluent and coherent text, they have no understanding of the underlying reality that they are describing. All they do is predict what the next word will be based on probability, not accuracy. Generative AI is not really intelligence; it’s pattern matching. These models are always generating something that’s statistically plausible, but not necessarily truthful. Even assigning the term “hallucination” to what these systems are doing is too generous, since it implies that artificial intelligence is capable of perception. It isn’t.
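A toy next-word predictor shows how "statistically plausible" and "true" can part ways. This bigram model is vastly simpler than an LLM and is only an analogy. The two-sentence corpus and the function names are made up for the example.

```python
import random
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        follows[current][following] += 1
    return follows

def next_word_probabilities(follows, word):
    """Convert follow-up counts into probabilities."""
    counts = follows.get(word, {})
    total = sum(counts.values())
    return {candidate: count / total for candidate, count in counts.items()}

def generate(follows, start, length, seed=0):
    """Sample a sentence one word at a time, guided only by probability."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length - 1):
        options = follows.get(words[-1])
        if not options:
            break
        candidates = list(options)
        weights = [options[c] for c in candidates]
        words.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(words)

# Both training sentences are true.
CORPUS = "paris is the capital of france . rome is the capital of italy ."
model = train_bigrams(CORPUS)
```

After the word "of", the model considers "france" and "italy" equally likely, so `generate(model, "rome", 6)` can just as easily produce "rome is the capital of france" as the correct sentence. Both are fluent and both fit the statistics, yet only one is true, and the model has no way of knowing which.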

Because the grammar and structure of these AI-generated sentences are so polished, they appear accurate. But they are not always. AI hallucinations are caused by a variety of factors, including biased or low-quality training data, a lack of context provided by the user, or insufficient programming in the model that keeps it from correctly interpreting information. So the future direction is towards explainable AI, a term used to describe an AI model, its expected impact and potential biases. It helps characterize model accuracy, fairness, transparency and outcomes in AI-powered decision making. Explainable AI is crucial for an organization in building trust and confidence when putting AI models into production. AI explainability also helps an organization adopt a responsible approach to AI development.

We earlier thought that AI would threaten the low-cognitive jobs, but today it looks like the high-cognitive ones are in danger. General-skill jobs for vanilla coders have started disappearing. Specialised coding engines like OctoGPT do a decent job of writing code. After AI and ML, the next buzzword may be prompt engineering. Google’s search engine users long ago realised the importance of typing in the right words to get meaningful results. This same common sense is now being called prompt engineering. The quality of the code depends on the quality of the prompt. Nandu offered a silver lining here: his contention was that most bugs have their genesis in incorrect user requirements. But if coding is going out of fashion, one wonders what the crystal ball holds for managers.

We ended with a Matrix-like discussion on the singularity, the point at which technological growth becomes uncontrollable and irreversible. In this scenario, an upgradable intelligent agent eventually enters a “runaway reaction” of self-improvement cycles, with each new and more intelligent generation appearing more and more rapidly, causing an “explosion” in intelligence and resulting in a powerful superintelligence that qualitatively far surpasses all human intelligence. So here are questions that should worry governments and tech leaders: Should AI engines have consciousness? Can we label Gospel.ai, which reportedly identifies 100 targets a day for the Israel Defense Forces in the conflict with Hamas, a bad actor? Can we feed ethics to AI engines?

Warning by Geoffrey Hinton: https://www.theguardian.com/technology/2023/may/02/geoffrey-hinton-godfather-of-ai-quits-google-warns-dangers-of-machine-learning

https://www.livemint.com/technology/tech-news/sorry-data-isn-t-really-the-new-oil-11700415117429.html
