THE MOSAIC MIND OF AI

Transforming Text Data: Advanced Machine Learning Techniques for Data Intelligence and Visualization


Language:  Python  |  HTML/CSS/JAVASCRIPT  

After a long, exhausting battle, while sipping coffee the next morning, an intriguing thought popped into my head: "What a journey! I've developed a web app to transform text data into digital insights and to invite humanity to re-question the notions of AI mind and intelligence. Am I not, in essence, striving to decipher the computer 'mind' through its own language system along the way?"

Abstract

Language is one of the most powerful tools for expressing human thoughts and emotions, and it plays a crucial role in communication and in the development of complex societies and cultures. Yet the core role of language extends beyond communication. Noam Chomsky, a leading figure in linguistics, emphasized the profound internal aspect of language. He suggested that the primary function of language is related more to thought than to communication. This view underscores the important role of internal cognition in forming mental models. Thus, while language alone may not be the sole indicator of consciousness and intelligence, it forms part of a broader array of cognitive and behavioral indicators that collectively reflect the presence of a mind.

Despite significant advancements in Artificial Neural Networks (ANNs), particularly in Large Language Models (LLMs), which excel at tasks such as generating human-like text and answering questions, these models fundamentally lack a true understanding of language in the human sense, especially in terms of deep semantic comprehension. They struggle with the complexities and subtleties of language that human beings navigate effortlessly. So, while LLMs can demonstrate sophisticated linguistic knowledge, this does not necessarily imply that they possess their own 'mind'. What, then, would be required beyond sufficient language capabilities to build the 'internal models' that we associate with true self-awareness? A true intelligence, capable of self-reflection and regulation?

This project leverages machine learning and AI techniques, including Knowledge Graphs, Natural Language Processing (NLP), Distant Reading, Graph Neural Networks, and LLMs, to explore potential approaches to deciphering the concept of the "AI Mind" through domain specificity and computational cognitive dimensions. By analyzing text data with statistical algorithms and NLP techniques, this exploration aims to deepen our understanding of how AI transforms data into information, and information into insight, through analysis, visualization, and generation. It also asks whether there could be a future in which AI possesses the higher-order cognitive processing ability needed to understand the deep semantic meaning within human language.

By integrating nuanced human insights, this interactive and artistic representation of the "AI mind" seeks to offer humanity a novel and intriguing opportunity to collectively re-question the notions of 'Mind' and 'Intelligence'.

Research

In today's data-driven era, data analysis and narrative visualization remain powerful tools, despite their inherent limitations, for offering both detailed and macroscopic insights. Backed by mathematical algorithms and computational linguistic techniques, these tools give us a more holistic interpretation of information, one that extends beyond what even our highly sophisticated eyes can perceive. The idea of representing information and knowledge in graphical form traces back to ancient times, to cave paintings and hieroglyphs. These early forms of visual communication laid the groundwork for the modern concept of representing knowledge as a graph, which we see today in areas of artificial intelligence (AI) such as semantic networks, knowledge graphs, and even graph neural networks.

Semantic networks, first introduced in the 1960s, are a form of knowledge representation in AI and a foundational concept in computer science, particularly in information processing. They are primarily used to visualize complex sets of relationships between entities and to facilitate the understanding and analysis of specific domains. Semantic networks support reasoning about the connections and interactions between concepts through graph traversal techniques.
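
To make the idea of reasoning by graph traversal concrete, here is a minimal sketch. It is an illustration only, not code from this project: the toy network, the relation names (`is_a`, `can`), and the function `inherited_facts` are all hypothetical. It walks `is_a` links breadth-first so that a concept inherits the facts attached to its ancestors.

```python
from collections import deque

# Hypothetical toy semantic network: concept -> list of (relation, target).
SEMANTIC_NET = {
    "canary": [("is_a", "bird")],
    "bird": [("is_a", "animal"), ("can", "fly")],
    "animal": [("can", "breathe")],
}


def inherited_facts(concept, network):
    """Collect facts about `concept`, following `is_a` links breadth-first."""
    facts = []
    seen = {concept}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for relation, target in network.get(node, []):
            facts.append((node, relation, target))
            # Only `is_a` links lead to ancestors whose facts are inherited.
            if relation == "is_a" and target not in seen:
                seen.add(target)
                queue.append(target)
    return facts


facts = inherited_facts("canary", SEMANTIC_NET)
for subject, relation, target in facts:
    print(subject, relation, target)
```

Starting from "canary", the traversal reaches "bird" and then "animal", so the result includes facts that were never stated about canaries directly.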

The term "distant reading", an approach that complements semantic networks and other data structuring techniques, was coined by Franco Moretti in 2000. Working within literary studies, Moretti proposed it as a way to analyze large volumes of literary texts computationally, without focusing on individual texts in detail. Rather than closely analyzing specific texts, distant reading employs computational tools and techniques to identify patterns, trends, and structures across a broad corpus of literature. Through statistical analysis, knowledge graph visualization, and machine learning models, researchers can uncover insights into literary history, cultural trends, and thematic developments that traditional close reading often leaves obscured.
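
As a rough illustration of the statistical side of distant reading, the sketch below compares how often a few terms occur across documents instead of reading any one of them closely. The two short "documents", the chosen terms, and the function name `relative_frequencies` are hypothetical; a real analysis would load full book texts.

```python
import re
from collections import Counter


def relative_frequencies(text, terms):
    """Return each term's share of all word tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {term: counts[term] / total for term in terms}


# Hypothetical two-document corpus, for illustration only.
corpus = {
    "doc_a": "The ball was a great success and the ball ended late.",
    "doc_b": "The rabbit ran late, and the rabbit vanished.",
}

profile = {
    name: relative_frequencies(text, ["ball", "rabbit", "late"])
    for name, text in corpus.items()
}

for name, shares in profile.items():
    print(name, shares)
```

Normalizing by document length is one simple design choice that keeps a long novel and a short one comparable; other weighting schemes are equally possible.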

Even though our current LLMs are not built for domain-specific tasks and therefore do not focus on deep semantic understanding, both semantic networks and distant reading have enhanced the structure and functionality of modern AI systems used in large-scale search engines, recommendation systems, and semantic search. These methodologies build on the principles of older models to link data, enabling more sophisticated information retrieval, natural language processing, and AI applications. They are indispensable in today's machine learning and AI landscape for tasks that require an understanding of complex relationships and interdependencies in both textual and numerical data.

After an extensive review of related papers and a series of experiments, the following techniques and methodologies were explored and employed in the Modeling Lab to better visualize the "cognitive process of the AI mind".

Modeling lab

Data source

The datasets used in this application are categorized into three main types:

► Two large volumes of text data derived from classic books, plus three small real-world entity datasets from Corpora.

► Social media data, both numerical and categorical, sourced from Twitter and Facebook platforms.

► User-generated text data through real-time interaction within the web app.

Data Dictionary

Twitter dataset
Pride and Prejudice text data

Dataset 1: Large volumes of literary text data from the books "Pride and Prejudice" and "Alice's Adventures in Wonderland".

Size: 976 KB in total, containing 156,644 words in "Pride and Prejudice" and 26,432 words in "Alice's Adventures in Wonderland".
Source: Project Gutenberg: Free eBooks
Content: original text data from each ebook.
App page: Explore - Social Network Visualization - Pride and Prejudice

➢ Three real-world entity datasets from Corpora: celebrities.json; books.json; president_quote.json.
Size: 35 KB in total.
Source: Corpora Github
Content: A collection of small corpuses of interesting data for the creation of bots and similar stuff.
App page: Explore - Generate your own - Digital Story


Dataset 2: Social Media Datasets

➣ Twitter Posts

Size: 43 MB, including 416,124 pieces of real user-generated content from English-speaking Twitter users.
Source: Twitter posts
Content: This dataset is primarily used for training the sentiment analysis model. Each entry consists of a text segment representing a Twitter message and a corresponding label indicating the predominant emotion conveyed. The emotion labels are encoded as integers in six categories: sadness (0), joy (1), love (2), anger (3), fear (4), and surprise (5).
App page: Explore - Sentiment Analyzer
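
The label encoding above can be made concrete with a small sketch. The mapping itself comes from the dataset description; everything else here, including the sample rows and the function name `label_distribution`, is hypothetical and only illustrates how integer labels might be decoded and tallied before model training.

```python
from collections import Counter

# Integer-to-emotion encoding as listed in the dataset description.
EMOTIONS = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise",
}


def label_distribution(rows):
    """Count emotion names in an iterable of (text, label) pairs."""
    return Counter(EMOTIONS[label] for _, label in rows)


# Hypothetical rows for illustration only, not taken from the dataset.
sample_rows = [
    ("i feel so alone tonight", 0),
    ("what a wonderful day", 1),
    ("i cannot stop smiling", 1),
    ("that noise scared me", 4),
]

distribution = label_distribution(sample_rows)
print(distribution.most_common())
```

Checking the class distribution this way is a common first step, since a heavily imbalanced label set can skew a sentiment classifier.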


➣ Facebook Social Circles

Size: 5.3 MB, containing 34,791 real user-generated posts from Facebook.
Source: The data was collected from survey participants using a Facebook app.
Content: Facebook data has been anonymized by replacing the Facebook-internal ids for each user with a new value. Also, while feature vectors from this dataset have been provided, the interpretation of those features has been obscured. For instance, where the original dataset may have contained a feature "political=Democratic Party", the new data would simply contain "political=anonymized feature 1". Thus, using the anonymized data it is possible to determine whether two users have the same political affiliations, but not what their individual political affiliations represent.
App page: Explore - Social Network Visualization - Facebook Social Network
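
The property described above, that anonymized features still allow comparison between users without revealing what the features mean, can be sketched as follows. The three users, their feature strings, and the helper names are hypothetical; only the "anonymized feature N" naming pattern follows the description.

```python
# Hypothetical anonymized feature sets for three users.
features = {
    "user_a": {"political=anonymized feature 1", "education=anonymized feature 7"},
    "user_b": {"political=anonymized feature 1", "education=anonymized feature 3"},
    "user_c": {"political=anonymized feature 2"},
}


def shared_features(u, v, table):
    """Return the anonymized features that two users have in common."""
    return table[u] & table[v]


def jaccard(u, v, table):
    """Jaccard similarity of two users' feature sets (0.0 if both are empty)."""
    union = table[u] | table[v]
    return len(table[u] & table[v]) / len(union) if union else 0.0


print(shared_features("user_a", "user_b", features))
print(jaccard("user_a", "user_b", features))
print(jaccard("user_a", "user_c", features))
```

The comparison shows that user_a and user_b share a political affiliation, yet nothing in the data says which affiliation it is.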


Dataset 3: User-generated data through interaction within the app.

Size: 0.
All data is discarded once the user closes the site, as noted in the application.
Content: The user-generated data will be used for social network graph creation and story generation.
App page: Explore - Generate your own  


Knowledge graph analysis & visualization

For demonstration purposes, I'll illustrate the knowledge graph visualization and analysis process using the book "Pride and Prejudice".

Python Library

To perform the mathematical analysis and build the interactive graph representation, several Python libraries were used in the pipeline.
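
As a rough idea of what one step in such a pipeline might look like, the sketch below builds a weighted character co-occurrence graph from text. It is an assumption-laden illustration and not the project's actual code: it uses only the standard library in place of a graph library, and the sample text, character list, and function name `cooccurrence_edges` are hypothetical. Two characters are linked each time they are named in the same sentence.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(text, characters):
    """Weight each character pair by the number of sentences naming both."""
    edges = Counter()
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        present = sorted(name for name in characters if name in sentence)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


# Hypothetical sample text and character list, for illustration only.
sample = (
    "Elizabeth spoke with Darcy at the ball. "
    "Jane smiled at Bingley. "
    "Darcy later wrote to Elizabeth! "
    "Elizabeth walked alone."
)

edges = cooccurrence_edges(sample, ["Elizabeth", "Darcy", "Jane", "Bingley"])
for (source, target), weight in edges.most_common():
    print(source, target, weight)
```

Sorting the names inside each sentence keeps every pair in one canonical order, so the same relationship is never counted under two different keys.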


UX/UI

UX/UI Design

The term 'Data Science and Analysis' can be daunting, not only to the general public unfamiliar with its intricate processes but sometimes even to data scientists themselves. So for the UX/UI design in this project, I aimed to create an interface that is both concise and straightforward, using the clarity of graphical data visualization to simplify the complex computational processes underneath. Through the lens of statistical analysis and knowledge graphs, I intend to represent the text data in a more visually artistic and engaging manner.

With this intention in mind, the information architecture of the website is designed with a minimalist approach, featuring two sidebars for easy navigation and four functional pages, each serving a unique purpose. This layout lets users navigate the website intuitively without being overwhelmed by excessive options or complex structures.

Information Architecture

Logo Design

User Interface

Home Page - Pride and Prejudice Social Network Visualization

Facebook Social Community

AI Vision

Sentiment Analyzer

Network Visualization - Alice's Adventures in Wonderland

Generate your own - Electronic Story

Generate your own - Social Net Graph

Full-Stack Application Deployment

Web Framework
Streamlit

For this project, I opted to utilize Streamlit over building a custom server from scratch. The user interface, developed with additional HTML, CSS, and JavaScript, receives text inputs and establishes a WebSocket connection for data transfer to the Streamlit-powered Python backend. Then, a Machine Learning pipeline processes the data, and the results are visualized through an NLP pipeline, dynamically displaying interactive graphs on the web interface. Python handles backend data acquisition, analysis, and classification, while the Vis-Network library enhances the visual output.

Throughout development, I navigated challenges related to syntax differences, data formatting, and lengthy processing times. Through persistence, I learned how to combine the syntax of different languages on the same page, refined the app's infrastructure, and developed automated procedures for storing data in JSON format, enabling animated graph visualization on an HTML canvas without Python's typical constraints.
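
The JSON hand-off between Python and the browser can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function name `to_vis_network` and the sample edge weights are hypothetical. The output shape, with `nodes` carrying `id` and `label` and `edges` carrying `from`, `to`, and `value`, follows the data format that Vis-Network accepts.

```python
import json


def to_vis_network(edge_weights):
    """Convert {(source, target): weight} into vis-network style nodes and edges."""
    names = sorted({name for pair in edge_weights for name in pair})
    ids = {name: index for index, name in enumerate(names, start=1)}
    nodes = [{"id": ids[name], "label": name} for name in names]
    edges = [
        {"from": ids[source], "to": ids[target], "value": weight}
        for (source, target), weight in sorted(edge_weights.items())
    ]
    return {"nodes": nodes, "edges": edges}


# Hypothetical edge weights, for illustration only.
graph = to_vis_network({("Darcy", "Elizabeth"): 2, ("Bingley", "Jane"): 1})

# The serialized string is what a front-end script would parse and render.
payload = json.dumps(graph, indent=2)
print(payload)
```

Keeping the exchange format as plain JSON means the Python side only has to produce data, while layout and animation stay entirely in the JavaScript layer.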

Facebook Social Network - Visualization

Customized text data uploading system:

User Data Uploading system - generate graph & generate story

All user-entered data is automatically discarded once the user closes the website. This behavior is intentional and is clearly stated on the page to address data privacy concerns. Because no data is saved, users are given a download option to keep any interesting work they create with this system.
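
One way to offer a download without persisting anything is to serialize the user's work in memory only. The sketch below is a hypothetical illustration of that idea rather than the app's implementation; the function name `build_download` and the sample graph are assumptions.

```python
import io
import json


def build_download(graph):
    """Serialize user work to bytes in memory; nothing is written to disk."""
    buffer = io.BytesIO()
    buffer.write(json.dumps(graph, ensure_ascii=False).encode("utf-8"))
    return buffer.getvalue()


# Hypothetical user-generated graph, for illustration only.
user_graph = {"nodes": [{"id": 1, "label": "Alice"}], "edges": []}
download_bytes = build_download(user_graph)
print(len(download_bytes), "bytes ready for download")
```

In a Streamlit app, bytes like these could be passed as the `data` argument of `st.download_button`, so the file exists only in the user's browser download and in the running session.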

User generated graph through the graph network pipeline

User Generated Text Representation

Pipeline Productization

The deployment phase of any software or system, especially in fields like AI and software development, can be incredibly challenging. It often involves addressing numerous small details and resolving various bugs that weren't evident during earlier stages of development.

This was the first time I had deployed a full-stack web application single-handedly, truly a one-man-army effort. The difficulties I encountered during deployment were overwhelming: numerous errors, bugs, and package management issues, not to mention subtle discrepancies in the UI.

I spent three consecutive nights digging into these formidable challenges, which eventually taught me how to decipher obscure error messages from the back-end terminal on both systems. This newfound skill in turn helped me tackle the problems one by one and, in the end, deploy the app successfully.

After a long, exhausting battle, while sipping coffee the next morning, an intriguing thought popped into my head: "What a journey! I've built a web app to transform text data into digital insights and to invite humanity to re-question the notions of AI mind and intelligence. Am I not, in essence, striving to decipher the computer 'mind' through its own language system along the way?"

Successfully Deployed Web Application

Partial code of the web app

App Deployment back-end process  

Data Annotation

Data annotation, to me, is one of the most crucial elements of any machine learning project.
I devoted a good amount of time to carefully writing the data annotations, both in the app guidelines and in the documentation for each page. Here's a breakdown of the process:

FINAL THOUGHTS

In the middle of this project, I began drafting a paper titled "The Evolution of Cognitive Representation from Cortex to Computing: The Visualization of Mind". This work mainly reviews how the mind has been portrayed throughout the history of neuroscience, from both scientific and artistic perspectives. The final section of the paper briefly discusses the visualization of mind in the field of artificial intelligence, and how current AI models are akin to human cognitive function in both theory and practice. The discussion then circles back to this project.

As we trace the evolution of cognitive representation through historical contexts and delve into recent advancements and setbacks in AI, the convergence of artificial intelligence with our deeper understanding of human cognition raises profound questions about the nature of 'intelligence', 'cognition', and 'perception'. Although our current AI systems are inspired by animal and human nervous systems and are inherently connected to human cognition, they have clearly not yet achieved full parity with human capabilities. The human brain is, by nature, a highly sophisticated and complicated computational machine. Despite remarkable advancements in science, math, and technology, we have only begun to unravel its complexities. Whether AI will ever truly match human cognitive capabilities remains speculative and depends not only on technological advancements but also on deeper philosophical understanding and ethical considerations.

The intersection and convergence of cognitive science, technology, and art is, to me, a poetic way to illuminate the imperceptible aspects of human nature. It is my hope that through the technologies we've developed, the narratives we've crafted, and the data we've meticulously analyzed, both natural and artificial phenomena can be understood more intuitively and insightfully. AI, perhaps the most transformative tool in human history, holds the potential to make our world fundamentally more sustainable and peaceful, provided it is steered with the right ethical and philosophical guidance.

As we all strive to understand our own daily experiences of joy and sorrow, what might a future AGI's motivation be? How can we collectively forge an environment that fosters more joy and beauty in this potential 'digital mind', instead of perpetuating the biases and hatred that already exist within our species?

- THE END -
