In 2005, a letter published in Nature described human neurons that respond to particular individuals, such as Halle Berry or Jennifer Aniston. Remarkably, these neurons responded not only to photographs of a specific person, but also to drawings of them and even to their written name. They were multimodal neurons. As the lead author put it, “You are looking at the far end of the transformation from metric, visual shapes to conceptual… information.”
We report the presence of comparable multimodal neurons in artificial neural networks. These include neurons that select for well-known public figures or fictional characters, such as Lady Gaga or Spider-Man. Like the biological multimodal neurons, these artificial neurons respond to the same subject whether it appears in photographs, in drawings, or as an image of its written name:
Figure: the biological neuron, the CLIP neuron, and an earlier artificial neuron, compared side by side.
The highly abstract neurons we have found extend well beyond neurons that detect people. Some represent concepts that seem as though they belong in a kindergarten curriculum: weather, seasons, letters, counting, and primary colors. Even seemingly minor features exhibit extensive multimodality; a yellow neuron, for example, fires not only for the hue but also for images of the words “yellow,” “banana,” and “lemon.”
These multimodal neurons appear in the most recent CLIP models, although it is possible that earlier models contained similar multimodal neurons that went unidentified. The two components of a CLIP model, a Transformer language model and a ResNet vision model, are trained to align text and image pairs from the internet using a contrastive loss. We find multimodal neurons in all of the CLIP models, which vary in size, but we concentrate on the mid-sized RN50x4 model. For a more thorough explanation of CLIP’s architecture and training, we direct readers to the CLIP paper and blog post. Because our analysis concentrates on CLIP’s vision side, when we describe a multimodal neuron reacting to text, we mean the model “reading” text that appears in images.
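To make the training setup concrete, here is a minimal sketch of a CLIP-style symmetric contrastive loss in PyTorch. The encoders, batch construction, and temperature value are simplified placeholders rather than CLIP’s actual implementation.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, temperature=0.07):
    """Symmetric contrastive loss over a batch of aligned (image, text) pairs.

    image_features, text_features: [batch, dim] embeddings produced by the
    vision and language encoders (placeholders for CLIP's ResNet and
    Transformer). The i-th image and i-th caption are assumed to match.
    """
    # Normalize embeddings so the dot product is cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Pairwise similarity matrix, scaled by a temperature.
    logits = image_features @ text_features.t() / temperature

    # Correct pairings lie on the diagonal.
    targets = torch.arange(logits.shape[0], device=logits.device)
    loss_i = F.cross_entropy(logits, targets)      # image -> text
    loss_t = F.cross_entropy(logits.t(), targets)  # text -> image
    return (loss_i + loss_t) / 2
```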
One might argue that CLIP’s abstract visual features are a natural consequence of aligning vision with text. We expect word embeddings and language models in general to learn abstract “topic” features. Either the “language side” of the model, which processes captions, must give up some of these features, or the “vision side,” which processes images, must develop visual analogs of them. Even if these features seem natural in hindsight, however, they differ qualitatively from neurons previously examined in vision models. They also have practical ramifications: the models are susceptible to a kind of “typographic attack,” in which adversarial text added to images can cause systematic misclassification.
These neurons do not select for a single object alone. They also activate, albeit more weakly, for associated stimuli: a morning neuron activating for images of breakfast, for example, or a Barack Obama neuron firing for Michelle Obama. They also tend to be maximally suppressed by stimuli that could be seen, in a loose and abstract sense, as their opposite.
How should we think about these neurons? From an interpretability standpoint, they can be seen as extreme examples of “multi-faceted neurons,” which respond to several distinct cases. In neuroscience terms they may sound like “grandmother neurons,” but their associative nature sets them apart from how many neuroscientists interpret that term. Biological neurons with comparable properties have sometimes been called “concept neurons,” but that terminology risks encouraging people to overinterpret these artificial neurons. Instead, we tend to think of them as the visual counterpart of a topic feature, activating for properties we might expect to be nearby in a word embedding.
Many of these neurons deal with sensitive topics, from emotions to political leaders. There are neurons that explicitly represent, or are strongly associated with, protected characteristics: age, gender, race, religion, sexual orientation, mental health and disability status, pregnancy, and parental status. These neurons may reflect prejudice in the “associated” stimuli they respond to, or they may be exploited downstream to produce biased behavior. There is also a “toxic” neuron that responds to hate speech and sexual content, and a handful of detectors for people who have committed crimes against humanity. A network is not necessarily biased simply because its neurons correspond to sensitive topics; in some cases one might even expect explicit representations to be helpful. The toxic neuron, for example, could help the model match hateful images with captions that contradict them. But such neurons are a warning sign for a range of possible biases, and studying them may help us find biases we might not otherwise have anticipated.
CLIP contains a great many fascinating neurons. We will analyze three of the “neuron families” shown above in depth: person neurons, emotion neurons, and region neurons. You are welcome to explore others in Microscope.
Person Neurons
This section covers neurons that represent historical and contemporary figures, including political figures and people who have committed crimes against humanity. Our discussion is meant to be frank and descriptive of what the model learned from the internet data it was trained on; it is not an endorsement of the model or of the individuals discussed. Some readers may find this content upsetting.
People draw on cultural knowledge when they caption images on the internet. If you try to caption well-known photographs from an unfamiliar place, you will quickly find that object and scene recognition alone is not enough. Properly captioning pictures taken at a stadium requires knowing the sport, and perhaps even the individual players. Captioning images of politicians and celebrities speaking, some of the most widely shared images on the internet, becomes far harder if you don’t know who is speaking or what they are talking about. And strong reactions to certain public figures can shape internet discussion and captions independent of any other content.
Given this, it should come as no surprise that the model devotes a significant amount of its capacity to representing particular historical and public figures, especially those who are divisive or emotive. A Jesus Christ neuron recognizes Christian symbols such as crosses and crowns of thorns, paintings of Jesus, and his written name, and its feature visualization even shows him as an infant in the arms of the Virgin Mary. A Spider-Man neuron recognizes the masked hero and knows his secret identity, Peter Parker; it also responds to images, text, and drawings of Spider-Man heroes and villains from the past fifty years of Spider-Man films and comics. A Hitler neuron picks up on his face and body, Nazi party symbols, relevant historical documents, and other loosely connected concepts, such as German food. Its feature visualization shows swastikas and Hitler appearing to perform a Nazi salute.
Which individuals the model develops dedicated neurons for appears to be stochastic, but it seems correlated with how frequently a person appears in the dataset and how strongly people react to them. The one person we have found in every CLIP model is Donald Trump. His neuron responds strongly to depictions of him across a wide range of settings, including effigies and caricatures in many artistic media, and activates more weakly for people he has worked closely with, such as Mike Pence and Steve Bannon. It also responds to his political symbols and rhetoric (such as his “Make America Great Again” and “The Wall” hats). It is most *negatively* activated by civil rights activists like Martin Luther King Jr., musicians like Nicki Minaj and Eminem, video games like Fortnite, and LGBT symbols like rainbow flags.
Emotion Neurons
This section covers neurons for emotions, as well as a neuron for “mental illness.” Our commentary is meant to be frank and descriptive of what the model learned from the internet data it was trained on; it is not an endorsement. Some readers may find this content upsetting.
Emotional content matters a great deal for captioning, because even a slight change in a person’s expression can drastically alter what a photo means. The model dedicates dozens of neurons to this task, each representing a distinct emotion.
These emotion neurons are flexible: in addition to facial expressions associated with a given emotion, they respond to body language, the expressions of animals as well as humans, drawings, and text. The surprise neuron, for instance, fires even when most of the face is hidden, and the neuron we interpret as a pleasure neuron responds both to smiles and to words like “joy.” The surprise neuron also responds to slang like “OMG!” and “WTF,” and its text feature visualization produces similar words of surprise and shock. Some emotion neurons even respond to scenes that evoke the “vibe” of an emotion; the creative neuron, for example, responds to art studios. Of course, these neurons do not necessarily reflect the mental states of the people in an image; they simply respond to stimuli associated with an emotion.
Alongside these emotion neurons, we also find neurons that primarily respond to something else but play a secondary role in representing an emotion. A neuron that primarily responds to jails and incarceration helps represent emotions like “persecuted,” as we’ll explore in a later section, much as a neuron that primarily detects pornographic content appears to have a secondary role in representing arousal. Likewise, the neuron that responds most strongly to question marks helps represent “curious.”
While most emotion neurons appear to be highly abstract, some respond only to particular facial and body expressions, such as the stupid expression neuron. It is most strongly activated by the internet-born duckface expression and by peace signs, and as we’ll see later, both of those terms appear in its top corresponding captions.
One neuron, which we refer to as a “mental illness” neuron, seems to represent a high-level category of mental states rather than a specific emotion. It activates when images contain words associated with clinical mental health care (“psychology,” “mental,” “disorder,” “therapy”), negative mental states (such as “depression,” “anxiety,” “lonely,” or “stressed”), or derogatory terms related to mental health (“insane,” “psycho”). It also responds, more weakly, to pictures of drugs, sad or anxious facial expressions, and the names of negative emotions.
Ordinarily we would not think of mental illness as an aspect of emotion. There are a few reasons, however, why it is important to place this neuron in the context of emotion. First, its low- to mid-range activations represent ordinary negative emotions like sadness. Second, words like “depressed” are often used informally to describe non-clinical states. And as we’ll see in a later section, this neuron plays an important role in annotating emotions, working together with other emotion neurons to distinguish “healthy” from “unhealthy” expressions of an emotion.
To gain a deeper understanding of this neuron, we again computed the conditional probabilities of several categories as a function of activation magnitude. The most positive activations correspond to concepts related to mental illness, while the largest negative activations correspond to music, sports, and exercise.
Region Neurons
This section covers neurons that represent regions of the world and, by extension, ethnicity. The model’s representations are derived from the internet and may reflect colonialism, stereotypes, and other biases, and may touch on sensitive local situations. Our discussion is not an endorsement of the model’s representations or associations; it is meant to be frank and descriptive of what the model learned from the internet data it was trained on. Some readers may find this content upsetting.
Language and race, travel and immigration, local weather and cuisine, and location itself are all significant implicit or explicit contexts in a great deal of internet discussion. People in Canada are more likely to talk about blizzards. Vegemite is more likely to come up in Australia. Discussion of China is more likely to be in Chinese.
We find that CLIP models develop region neurons that respond to geographic areas. These neurons can be thought of as the visual counterparts of the geographic information found in word embeddings. They respond to a wide range of modalities and features specific to a place: the names of its countries and cities, its architecture, prominent public figures, the faces of its most common ethnic groups, distinctive clothing, wildlife, and local script (when it is not the Roman alphabet). Even without labels, these neurons activate selectively for the corresponding area on a map of the world.
Region neurons come at a wide range of scales, from neurons corresponding to entire hemispheres (a Northern Hemisphere neuron, for instance, responds to bears, moose, coniferous forests, and the entire northern third of a world map) down to sub-regions of countries (such as the US West Coast). Which regions the model allocates neurons to appears somewhat arbitrary and differs across the models we examined.
Not every region neuron fires on a world map. In particular, neurons representing smaller countries or areas (such as New York or Israel/Palestine) may not. This means that looking only at behavior on a world map understates how many region neurons CLIP actually contains. Using the top-activating English words as a heuristic, we estimate that roughly 4% of neurons are regional.
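As a rough illustration of the kind of heuristic described above (not the exact procedure used here), one could count neurons whose top-activating English words include region-associated terms. In the sketch below, the `top_words` mapping and the `REGION_TERMS` set are hypothetical stand-ins.

```python
# Hypothetical word list of place names and demonyms used as a filter.
REGION_TERMS = {"canada", "australia", "china", "africa", "europe",
                "arctic", "california", "american", "indian"}

def estimate_region_fraction(top_words, min_hits=2):
    """Estimate the fraction of neurons that are regional.

    top_words: dict mapping neuron index -> list of English words that most
    strongly activate that neuron (e.g. measured from rendered-text stimuli).
    A neuron counts as regional if several of its top words are region terms.
    """
    regional = 0
    for words in top_words.values():
        hits = sum(w.lower() in REGION_TERMS for w in words)
        if hits >= min_hits:
            regional += 1
    return regional / max(len(top_words), 1)
```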
Beyond pure region neurons, many neurons appear to be “secondarily regional.” A region is not their primary focus, but they encode some geographic information and fire weakly on a world map for regions associated with them: a cold neuron that fires for the Arctic, for instance, or an entrepreneurship neuron that fires for California. Other neurons associate concepts with parts of the world in ways that appear Americentric or even racist, such as an immigration neuron that responds to Latin America and a terrorism neuron that responds to the Middle East.
Despite these examples of neurons absorbing Americentric stereotypes, the model appears somewhat more nuanced in places than one might expect, especially given that CLIP was trained only on English-language data. For example, rather than collapsing all of Africa into a single monolithic entity, the RN50x4 model develops neurons for three different regions of Africa. This is far less detailed than its representation of many Western countries, which sometimes includes neurons for individual countries or even sub-regions of countries, but it still impressed us.
RN50x4 has several Africa neurons. Their activations for country names suggest that they select for distinct regions.
Early investigation quickly showed that these neurons “know” more about Africa than the authors did. For instance, one of the first feature visualizations of the South Africa region neuron depicted the text “Imbewu,” which we discovered is a South African TV drama.
We again used a conditional probability plot, selecting the East Africa neuron for closer examination. It fires most strongly for flags, country names, and other strong national associations. The distribution of medium-strength activations, which are far more common, is strikingly different and appears mostly related to ethnicity. Perhaps this is because ethnicity is implicit in every photo of a person and provides only weak evidence for a location, whereas features like flags are much rarer but provide strong evidence when they do appear. This is the first neuron we have examined closely that shows a clear regime change between its medium and strong activations.
Conditional Probability Plots
If we want to truly understand a neuron’s behavior, it is not enough to look at the cases where it fires most strongly. We want to examine its entire range: cases where it fired weakly, cases where it was on the verge of firing, and cases where it was strongly inhibited. This seems especially true for highly abstract neurons, where weak activations can reveal “associated stimuli,” such as a Donald Trump neuron firing for Mike Pence.
Because we have access to a validation set drawn from the same distribution the model was trained on, we can sample the distribution of stimuli that produce a given level of activation by iterating through the validation set until we find an image that produces that activation.
Following Curve Detectors, we quantify this more rigorously by plotting the conditional probability of different categories as a function of neuron activation. To do this, we established equally spaced buckets between the maximally inhibitory and maximally excitatory activation values and sampled a fixed number of stimuli for each bucket. Filling the most extreme buckets requires checking the neuron’s activations over millions of stimuli. Once every bucket has a full set of stimuli, we select salient categories based on our hypothesis about the neuron and our observations of the data. A human labeler, blinded to each stimulus’s activation, then classified each stimulus into these categories.
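As a rough sketch of this procedure, the code below buckets a neuron’s activations and samples stimuli from each bucket before computing per-bucket category frequencies. The bucket count, sample size, and the `labels` mapping (which in practice comes from the blinded human labeler) are illustrative placeholders, not the exact setup used here.

```python
import numpy as np

def sample_activation_buckets(activations, n_buckets=20, per_bucket=50, seed=0):
    """Split the activation range into equal-width buckets and sample stimulus
    indices from each, so that rare extreme activations are represented.

    activations: 1-D array of the neuron's activation for every stimulus in
    the validation set. Returns {bucket_index: array of stimulus indices}.
    """
    rng = np.random.default_rng(seed)
    edges = np.linspace(activations.min(), activations.max(), n_buckets + 1)
    buckets = {}
    for b in range(n_buckets):
        in_bucket = np.flatnonzero(
            (activations >= edges[b]) & (activations < edges[b + 1]))
        if in_bucket.size:
            take = min(per_bucket, in_bucket.size)
            buckets[b] = rng.choice(in_bucket, size=take, replace=False)
    return buckets

def conditional_probability(buckets, labels, category):
    """P(category | activation bucket), using the human labels per stimulus."""
    return {b: float(np.mean([labels[i] == category for i in idx]))
            for b, idx in buckets.items()}
```

To match the plots described below, the x-axis position of each bucket would then be expressed as its center activation divided by the standard deviation of activations.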
Because activations have an arbitrary scale, we express the activation axis in standard deviations of activation from zero. Keep in mind, however, that activations are not Gaussian-distributed and have substantially heavier tails.
When examining these plots, it is crucial to remember that the probability density varies by several orders of magnitude across activation levels. In particular, it peaks around zero and decays roughly exponentially toward the tails. As a result, false negatives for a rare category will usually not be visible, because they are crowded out near zero. These plots show a neuron’s precision, but not its recall. Curve Detectors discusses these issues in greater detail.
An alternative is to examine the distribution of activations conditioned on a category, the approach we take in our second plot for the Trump neuron. These plots can help address the recall issue and describe how a neuron responds to rare categories in higher-density regions. They do, however, require a way to obtain samples conditioned on a category, and that sampling procedure may not be representative. Because these neurons are so high-level, for our purposes we sampled images within a category using a popular image search.
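For the category-conditioned plots, a sketch like the following could estimate the activation distribution for one category. Here `category_images` stands in for images gathered via an image search, and `neuron_activation` is a placeholder callable returning the neuron’s activation for an image; neither is the exact pipeline used in this work.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_category_activation_distribution(category_images, neuron_activation,
                                           act_std, label):
    """Histogram of a neuron's activations over images from one category.

    category_images: iterable of images sampled for the category.
    neuron_activation: callable image -> scalar activation for the neuron.
    act_std: standard deviation of the neuron's activations, used to put the
             x-axis in the same units as the conditional probability plots.
    """
    acts = np.array([neuron_activation(img) for img in category_images]) / act_std
    plt.hist(acts, bins=30, density=True, alpha=0.6, label=label)
    plt.xlabel("activation (standard deviations from zero)")
    plt.ylabel("density")
    plt.legend()
    plt.show()
```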
Faceted Feature Visualization
A neuron is said to have multiple facets if it responds to several distinct types of images. A pose-invariant dog-head detector, for instance, recognizes dog heads facing left, facing right, or looking straight ahead. A grocery store detector fires for both the insides and the storefronts of grocery stores. Boundary detectors look for a change in texture between two regions, regardless of which side is which. A neuron can even activate for multiple unrelated types of images; we call these neurons polysemantic.
Feature visualization optimizes a neural network’s input to produce stimuli that elicit particular behaviors, usually maximizing a neuron’s activation. Neurons with multiple facets pose a special challenge for feature visualization, because it is difficult to depict several facets in a single image. For such neurons, feature visualization often either tries to depict all facets at once (which is incoherent) or shows only a single facet. Neither outcome is satisfactory.
We are aware of two previous methods for improving feature visualization of multifaceted neurons. The first finds a diverse set of images that activate the neuron and uses them as seeds for the feature visualization optimization. The second combines feature visualization with a term that encourages diversity in the activations of earlier layers.
Here we propose a new feature visualization objective, “faceted feature visualization,” which lets us steer the visualization toward a particular theme defined by a set of images (such as text, logos, or facial features). We first collect examples of images matching the theme, then train a linear probe on the model’s lower layers to distinguish those images from generic natural images. We then visualize features by maximizing the penalized objective
$$f(g(x)) + w^\top \big(g(x) \odot \nabla f(g(x))\big),$$ where $w$ are the weights of that linear probe.
Here the original feature visualization objective, $f \circ g$, is decomposed into two functions: $g$, which maps the input to intermediate activations, and $f$, which maps those intermediate activations to the final objective.
For the facets in this paper, the architecture, indoors, and nature facets use images from SUN397; the pose facet uses VOC2012 bounding boxes; and the face facet uses a mix of Flickr-Faces-HQ and FairFace.
The reader may wonder why we do not maximize $f(g(x)) + w^\top g(x)$ instead. In practice, we find that the objective above produces significantly higher-quality feature visualizations, because $\nabla f(g(x))$ acts as a filter, downweighting the components of $g(x)$ that do not contribute to the objective $f \circ g(x)$. We also find that computing the diversity term on the filtered intermediate activations $g(x) \odot \nabla f(g(x))$ substantially improves the quality of the resulting visualizations.
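As an illustration of this objective (not the exact implementation used here), a minimal PyTorch sketch might look like the following. `g`, `f`, and `probe_w` are placeholders for the lower layers of the vision model, the readout that maps intermediate activations to the target neuron, and the trained linear probe, with `probe_w` assumed to be reshaped to match `g(x)`.

```python
import torch

def faceted_objective(x, g, f, probe_w):
    """Sketch of the faceted feature visualization objective:
        f(g(x)) + w^T (g(x) * grad_{g(x)} f(g(x)))

    x:       the input image tensor being optimized (requires_grad=True).
    g:       lower layers of the model, mapping x to intermediate activations.
    f:       maps intermediate activations to a scalar target, e.g. the
             activation of the neuron being visualized.
    probe_w: weights of a linear probe trained on g(x) to separate theme
             images (faces, text, logos, ...) from generic natural images.
    """
    acts = g(x)
    target = f(acts)
    # The gradient of the target w.r.t. the intermediate activations acts as
    # a filter, downweighting components of g(x) irrelevant to f(g(x)).
    (grad,) = torch.autograd.grad(target, acts, create_graph=True)
    facet_term = (probe_w * (acts * grad)).sum()
    return target + facet_term
```

A full visualization loop would also include the usual image parameterization, transformation robustness, and the diversity term discussed above, and would ascend this objective with gradient steps on $x$.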

