It was early October 2023 when I realized that artificial intelligence (AI) had opened its eyes. I was reviewing recent articles on AI and ran across OpenAI's "GPT-4V(ision) System Card," released on Sept. 25, 2023. The paper was only a few days old when I saw it, and it took me a few minutes to understand what I was reading. GPT-4V, a "multi-modal LLM" (MLLM), is a large language model (LLM) trained on multiple modalities of content, not just text. In addition to the enormous amount of text data that ChatGPT and GPT-4 were trained on, GPT-4V has also been trained on visual content of all types, and it can be extended to additional modes of communication. In effect, it learns about the world around it not only by reading about it, but also by looking at pictures, charts, colors, facial expressions, buildings, the sky…everything. With its eyes open, AI has taken a cognitive leap forward.

Below, I describe some of the academic papers and research on MLLMs, then turn to legal and other important issues: those that are similar for LLMs and MLLMs, those that differ but are obvious improvements, and those that should prompt all of us to watch developments in this area carefully.