Computational creativity: generative creature design for concept art

Kyle Huang
11 min read · Oct 9, 2020

Abstract

With ever more powerful deep learning algorithms, computer graphics has been pushed to a new level. Generative adversarial networks (GANs) can now generate almost any type of photo-realistic image, given a dataset of sufficient size. However, most GAN use cases have been limited to the pursuit of lifelike graphics. In this article, I propose a new framework, "MonsterGAN," combining machine learning, design, and psychology. MonsterGAN is a prototype of a generative design system (DRCI) for concept artists, one that reduces the cognitive burden of creation and makes creativity scalable.

By encoding human ideas into feature vectors, I found that a machine can readily compute creative variations through latent-space operations. The combination of machines diverging variations and humans converging on solutions improves an artist's productivity and scales creativity.

What happens when computer vision passes the Turing test? Where and how can we use it? As a designer, I'm fascinated by these questions, because designers are the graphic wizards who deal with creativity and graphics daily. One bold idea comes to mind: can we have machine-conceptualized creativity?

In 1951, Fitts introduced the famous concept of function allocation and the MABA-MABA (Men-Are-Better-At/Machines-Are-Better-At) list, which explored how man and machine can work together as a team. In terms of the SRK taxonomy (skill-, rule-, and knowledge-based tasks), computers' capability has traditionally been limited to skill-based and rule-based tasks. With deep learning, I believe the level of automation has changed and machines can now do specific knowledge-based jobs, which makes it necessary to rethink the notion of function allocation.

I propose a new MABA-MABA list for modern challenges, which are mostly knowledge-based: have machines do the first 70% of a job and have humans pick up the last 30%. Machines are good at solving well-defined problems, with strengths in speed, precision, variation, scaling, and sensing; humans, on the other hand, are good at poorly defined jobs, with strengths in design, empathy, and generalization. With teamwork, we can also change the order of the process: man-machine-man, machine-man-machine, or machine-machine-man.

Methodology: a new creative-thinking workflow

In the early stage of creative thinking, the target is usually uncertain. This makes it impossible for machines to carry out creative thinking end to end. But what if we dismantle each step of creative thinking and allocate the tasks according to the new MABA-MABA list? As Wallas described, there are four stages in the creative-thinking process:

Preparation

The preparation step consists of observing, listening, asking, reading, collecting, comparing, contrasting, analyzing, and relating all kinds of objects and information.

Incubation

The incubation process is both conscious and unconscious. This step involves thinking about parts and relationships, reasoning, and often a fallow period.

Illumination

Inspiration very often appears during this fallow period [of incubation]. This probably accounts for the emphasis on releasing tension in order to be creative.

Verification

The step labeled verification is a period of hard work. This is the process of converting an idea into an object or into an articulated form.

I think there is an opportunity for machines to assist humans in the first two steps of the creative-thinking process: preparation and incubation. Uncertainty in a creative project usually stops people from delivering results on time, because there are too many possibilities and people tend to change their minds at the last second. What if we built a generative design system for the problem-solving processes of abduction and induction? It could cut the time we spend on preparation and incubation, and therefore "accelerate" the "Aha" moment.

Machine divergence and human convergence

In a traditional design-thinking process, people repeat cycles of divergence and convergence until they arrive at a solution; this is how they narrow down a direction and iterate on the work. The problem with such repetitive creative labor is that humans burn out, which limits how far creativity can scale. With the new MABA-MABA list, we can have machines diverge and humans converge. If we can encode ideas into vectors of numbers (exactly what deep learning is good at), it is reasonable to let machines diverge, because computers operate on vectors easily; this also decreases the cognitive workload and helps humans work faster.
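To make "diverging on vectors" concrete, here is a minimal sketch, assuming an idea has been encoded as a 512-dimensional latent vector (the dimension, names, and spread value are illustrative, not the exact MonsterGAN code):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# An "idea" encoded as a 512-dimensional latent vector
# (512 is StyleGAN's usual z size).
idea = rng.standard_normal(512)

def diverge(center, n_variations=16, spread=0.3):
    """Machine divergence: cheap vector arithmetic produces many
    candidate variations around one encoded idea, which a human
    can then converge on."""
    noise = rng.standard_normal((n_variations, center.shape[0]))
    return center + spread * noise

variations = diverge(idea)
print(variations.shape)  # (16, 512): sixteen candidate directions
```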

The importance of ambiguity

We know that generative adversarial networks are really difficult to train: both the quality and the quantity of the data need to be high. In reality, however, creative graphics data is usually scarce. This creates a dilemma, because the GAN's output quality plateaus at an unacceptable level. Fortunately, we can sidestep the problem by asking the GAN to generate ambiguous images. We no longer expect a model that produces "photo-realistic" results; instead, a "good enough" model is sufficient for humans to pick up from.

Why do we want ambiguous results? It turns out that ambiguity plays a vital role in creative thinking (Tamara Carleton, William Cockayne, and Larry Leifer, 2008, An Exploratory Study about the Role of Ambiguity During Complex Problem Solving).

This resolves the problem of low-quality results from limited datasets, because we only need abstract images to get inspired. Symbolically, the idea also matches the coarse-to-fine process in computer vision.

StyleGAN2: generating ambiguous sketches

I decided to train a StyleGAN2 model for concept art, a task that demands heavy creativity, and arrived at the idea of asking the model to generate abstract graphics. Given these low-level sketches as a foundation, concept artists saved a substantial amount of time. I believe such abstract graphics can, in some way, produce an emergence phenomenon for concept art. In this project, I used the implementation from the paper Training Generative Adversarial Networks with Limited Data.

The method used in the paper:

The NVIDIA research team considered a pipeline of 18 transformations grouped into 6 categories: pixel blitting (x-flips, 90° rotations, integer translation), more general geometric transformations, color transforms, image-space filtering, additive noise, and cutout.

During training, each image shown to the discriminator passed through a pre-defined set of transformations in a fixed order. The strength of the augmentations was controlled by a scalar p ∈ [0, 1]: each transformation was applied with probability p or skipped with probability 1 − p, using the same value of p for all transformations. The randomization was done separately for each augmentation and for each image in a minibatch. The generator is guided to produce only clean images as long as p remains below a practical safety limit.
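A hedged sketch of that probabilistic skipping scheme (the placeholder image and two-transform pipeline are illustrative; the real ADA pipeline applies 18 transforms in 6 categories):

```python
import random
import numpy as np

def augment(image, transforms, p=0.7):
    """A fixed pipeline of transformations, each applied independently
    with probability p (or skipped with probability 1 - p), as in the
    ADA scheme described above."""
    for t in transforms:          # pre-defined, fixed order
        if random.random() < p:   # apply with probability p ...
            image = t(image)      # ... or skip with probability 1 - p
    return image

# Illustrative pixel-blitting transforms on an H x W x C array.
pipeline = [
    lambda img: img[:, ::-1],   # x-flip
    lambda img: np.rot90(img),  # 90-degree rotation
]

fake_image = np.zeros((64, 64, 3))
augmented = augment(fake_image, pipeline, p=0.7)
```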

Experiment results

Now, back to ambiguity. We use the Fréchet Inception Distance (FID) to measure how well a GAN works (the lower the score, the closer the generated images are to the real distribution). In the case of MonsterGAN, however, a low FID score doesn't always mean "better" results. It turns out that although the lower-FID model did provide more texture "details" on the creatures, it lost diversity in shapes and forms (results shown below).

p = 0.7, with an FID50k score of 29.65, has more diversity in creature forms
Results for different parameters of StyleGAN2-ADA
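For reference, FID compares the means and covariances of Inception-v3 activations from real and generated images. Below is a minimal NumPy/SciPy sketch of the standard formula, not the exact FID50k evaluation code used here:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_gen):
    """Fréchet Inception Distance between two sets of Inception-v3
    activations of shape [n_samples, 2048]. Lower means the two
    distributions are closer."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):   # numerical noise can leave tiny
        covmean = covmean.real     # imaginary parts; discard them
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```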

MonsterGAN: Designing

Once we have a trained model, an artist can browse the form library and choose the most suitable shapes for their requirements. Instead of starting from scratch, which is usually the most time-consuming part, artists can pick several images they find interesting. Since the inputs to the GAN are noise vectors, this gives us infinite concepts.

Let's say this stony monster with big claws matches our direction. We can then use the image's input vector as the center of our starting point in latent space. By lowering the truncation psi and the sampling distance, we can achieve detail variation under a similar overall shape.
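A hedged sketch of how this step could look, assuming a StyleGAN2-style Z-to-W mapping network; the `mapping` callable, dimensions, and parameter values are placeholders rather than the exact MonsterGAN code:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def variations_near(z_center, w_avg, mapping, n=8, distance=0.2, psi=0.5):
    """Sample latents close to a chosen center, map them to W, then pull
    them toward the average w via the truncation trick (psi < 1 trades
    diversity for stability, keeping details tame under a similar shape)."""
    zs = z_center + distance * rng.standard_normal((n, z_center.shape[0]))
    ws = np.stack([mapping(z) for z in zs])
    return w_avg + psi * (ws - w_avg)

# Toy usage with an identity "mapping", just to show the shapes.
z0 = rng.standard_normal(512)
w_avg = np.zeros(512)
similar_ws = variations_near(z0, w_avg, mapping=lambda z: z)
print(similar_ws.shape)  # (8, 512)
```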

MonsterGAN: Latent space exploration

It could also be the case that we want to merge multiple directions, and this is where latent-space manipulation comes in. Traditionally, to change a feature of an image, we tweak the Z latent space. The images below show some results of Z-space manipulation for combining different shapes.
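For concreteness, a minimal sketch of the naive Z-space blend described here (names are illustrative):

```python
import numpy as np

def blend_z(z_a, z_b, t=0.5):
    """Linear blend of two creature latents in Z. Because Z is
    entangled, features from the two sources mix unpredictably,
    which is exactly the problem discussed next."""
    return (1.0 - t) * z_a + t * z_b
```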

As can be seen, manipulating the Z latent space is a rough way to control features, since the mapping from Z to the feature space tends to entangle the distributions of features. Even when we get an acceptable result, the art direction is essentially uncontrollable. Research related to this subject is discussed in the paper Analyzing and Improving the Image Quality of StyleGAN. The StyleGAN architecture addresses it with a mapping network of 8 fully connected layers that encodes Z into an intermediate latent space W.

In the style-mixing implementation there are 18 style layers. We can apply styles from a second latent to our target, overriding anywhere from 1 to all 18 of those layers. I ran some experiments on extracting a creature's features through the W latent space, and here's what I found (a sketch of the mixing step follows the list):

1. The result of using only 1 layer is subtle.
2. Mixing 3–5 layers works best.
3. Using all 18 layers makes the result essentially identical to the style source (the column image in a style-mixing grid).
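A minimal sketch of the mixing step, assuming the 18-layer, 512-dimensional "w-plus" layout of StyleGAN at 1024×1024 resolution (the layer indices are illustrative):

```python
import numpy as np

def style_mix(w_content, w_style, layers=range(3, 6)):
    """Broadcast a 512-d w vector to all 18 style layers (shape
    [18, 512]), then overwrite a few layers with the second source.
    Mixing roughly 3-5 layers worked best in my experiments."""
    mixed = np.tile(w_content, (18, 1))  # one 512-d style per layer
    for i in layers:
        mixed[i] = w_style               # inject the style source here
    return mixed
```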


MonsterGAN: Human refinement

Once satisfied with the model's results, concept artists can jump in and start working on refinement. Here's the beauty of ambiguity: since every person interprets the same abstract sketch differently, artists have more flexibility to exercise their creativity. As can be seen, many "errors" were transformed into new designs.

In this project, I collaborated with Steve Wu, a senior concept designer who specializes in creature design and has more than six years of experience in the film industry. The works shown in this article are credited to him. Check out Steve's amazing work.

In this design, the abstract visual cues inspired Steve in different ways. The textures on the creature inspired the teeth, the hair on the head, the wings, and the extension of the abdomen. Steve also decided to remove the block in the lower section of the legs.

This is a good example of how ambiguity inspires artists. The left side of the hog was originally a meaningless graphic generated by the model; surprisingly, Steve managed to turn it into the hog's snout. The shape of the hog's head was also influenced by the texture of the original image, which became the highlight of this design.

Again, this is an interesting example of an artist transforming the flaws of a result into art. Two fragments were supposed to be removed; instead, Steve turned them into two sparrows standing on the creature's horn.

Related GAN-based tool for compositing: GauGAN

After finishing the creature designs, artists can start working on the background and merge the creature into the scene, giving the concept a look and feel. Again, we could develop models to assist humans at each step (e.g. texture creation, lighting, color grading, scene creation). Since NVIDIA has already built GauGAN, a tool for generating scenery images, we use it directly.

Designed by Steve, with GauGAN.
Designed by Steve, with GauGAN.
Designed by Steve, with GauGAN.

Showcases of MonsterGAN

Evaluation

In a nutshell, I believe the latent space of big data provides a higher dimension of creativity by creating a new medium for people to sculpt their imagination and experience, because machine learning can extract information from the enormous datasets collected from mobile devices. In other words, Data Sculpturing (graph below) translates our indescribable subjects, our creativity, into a latent vector and re-creates the output through vector arithmetic to amplify the creator's creativity. The combination of machines diverging variations and humans converging on solutions improves an artist's productivity and makes creativity scalable.

Data Sculpturing

Bibliography

Carleton, Tamara, William Cockayne, and Larry Leifer (2008). An Exploratory Study about the Role of Ambiguity during Complex Problem Solving. 8–13.

Sio, Ut Na, and Thomas Ormerod (2009). Does Incubation Enhance Problem Solving? A Meta-Analytic Review. Psychological Bulletin, 135, 94–120. doi: 10.1037/a0014212.

Savic, Milos (2016). "Mathematical Problem-Solving via Wallas' Four Stages of Creativity: Implications for the Undergraduate Classroom." The Mathematics Enthusiast, Vol. 13, No. 3, Article 6.

Polya, George (1957). How To Solve It, 2nd ed. Princeton University Press. ISBN 0-691-08097-6.

Rasmussen, J. (1983). "Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models." IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(3), 257–266. doi: 10.1109/TSMC.1983.6313160.

Karras, Tero, Samuli Laine, and Timo Aila (2019). A Style-Based Generator Architecture for Generative Adversarial Networks.

Karras, Tero, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila (2020). Training Generative Adversarial Networks with Limited Data.

Cummings, Mary (2017). "Informing Autonomous System Design Through the Lens of Skill-, Rule-, and Knowledge-Based Behaviors." Journal of Cognitive Engineering and Decision Making, 12(3). doi: 10.1177/1555343417736461.

Parasuraman, R., T. B. Sheridan, and C. D. Wickens (2000). "A model for types and levels of human interaction with automation." IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 30(3), 286–297. doi: 10.1109/3468.844354.
