If you suspend your transcription on amara.org, please add a timestamp below to indicate how far you progressed! This will help others to resume your work!
Please do not press “publish” on amara.org to save your progress, use “save draft” instead. Only press “publish” when you're done with quality control.
The talk will be conducted by sharing various experiments we've done under the umbrella of generative AI models. We will begin with a general idea of how we, as artists/programmers, perceive these models and our research on the workflow of these constructs. Then, we will further elaborate on our exploration of the Stable Diffusion pipeline and datasets. Throughout our investigation, we discovered that some essential parts are all based on the same few datasets, models, and algorithms. This causes us to think that if we investigate deeper into some specific mechanisms, we might be able to reflect on the bigger picture of some political discourses surrounding generative AI models. We deconstructed the models into three steps essential to understanding how they worked: dataset, embedding, and diffusions. Our examples are primarily based on Stable-Diffusion, but some concepts are interchangeable in other generative models.
As datasets and machine-learning models grow in scale and complexity, understanding their nuances becomes challenging. Large datasets, like the one for training Stable Diffusion, are filtered using algorithms often employing machine learning. To "enhance" image generation, LAION's extensive dataset underwent filtering with an aesthetic prediction algorithm that uses machine learning to score the aesthetics of an image with a strong bias towards water-color and oil paintings. Besides the aesthetic scoring of images, images are also scored with a not safe-for-work classifier that outputs a probability of an image containing explicit content . This algorithm comes with its own discriminatory tendencies that we explore in the talk and furthermore asks how and by whom we want our datasets to be filtered and constructed.
Many generative models are built upon Contrastive Language-Image Pre-training (CLIP) and its open-source version, Open-CLIP, which stochastically relates images and texts. These models connect images and text, digitize text, and calculate distances between words and images. However, they heavily rely on a large number of text-image pairs during training, potentially introducing biases into the database. We conducted experiments involving various "false labelling" scenarios and identified correlations. For instance, we used faces from ThisPersonDoesNotExist to determine "happiness" faces, explored ethnicities and occupations on different looks, and analyzed stock images of culturally diverse food. The results often align with human predictions, but does that mean anything?
In the third part, we take a closer look at the image generation process, focusing on the Stable Diffusion pipeline. Generative AI models, like Stable Diffusion, have the ability not only to generate images from text descriptions but also to process existing images. Depending on the settings, they can reproduce input images with great accuracy. However, errors accumulate with each iteration when this AI reproduction is recursively used as input. We observed that images gradually transform into purple patterns or a limited set of mundane concepts depending on the parameters and settings. This raises questions about the models' tendencies to default to learned patterns.