Machine Learning for Color Type Classification in Fashion: A Literature Review
March 17, 2025

Seasonal color analysis (also called personal color analysis) is a method of categorizing an individual’s coloring into seasons (Spring, Summer, Autumn, Winter) based on skin, hair, and eye colors. Modern approaches often use an extended 12-season system (each season subdivided into 3 sub-types) to create more nuanced personal color palettes (Park et al., 2018). In the fashion and beauty industry, this guides personalized color recommendations for clothing and makeup. Traditionally, determining someone’s color season required human experts and subjective judgment, but researchers in the last decade have explored machine learning to automate and standardize this process (Park et al., 2018). Below is a review of machine learning approaches for color type classification in fashion, focusing on the 12-season framework, along with relevant studies (2015–present), available datasets, key methods, and noted challenges.
Traditional ML Approaches to Color Classification
Early attempts relied on manually defined features and classical algorithms. A common strategy is to extract color features from specific facial regions (skin, hair, eyes) and apply rule-based or simple ML classifiers. For example, Park et al. (2018) identify a user’s personal colors by analyzing the average color of the pupils, hair, and skin in a selfie (Park et al., 2018). They determine the season by heuristics – for example, comparing the contrast between hair and skin tone to decide between Bright vs. Soft or Light vs. Deep seasons (Park et al., 2018). Similarly, an earlier approach by Ji-Ho et al. uses a personal color database and fuzzy logic rules to classify season; it defines representative RGB values for skin, hair, eye, and even wrist colors, and then applies fuzzy reasoning to match a person to a color category (Ji-Ho et al., 2016). Other classical techniques have included decision trees and distance-based classifiers. For instance, some researchers formed a structured dataset of facial color features (such as the RGB of skin) and trained a decision tree to predict the season (Ji-Ho et al., 2016). A simple distance-matching baseline was tested by the open-source ColorInsight project: it computed the L2 distance between a face’s skin color and reference palette points from the literature for each season, but this yielded only 20–30% accuracy (essentially random for 4 classes) (ColorInsight, 2023). Overall, traditional ML approaches had limited success on fine-grained color typing. They could sometimes distinguish broad warm/cool groups, but the more nuanced 12-season classification was unreliable, with certain types consistently misclassified or never predicted at all (ColorInsight, 2023). This motivated a turn to more powerful deep learning methods.
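To make the distance-based baseline concrete, the sketch below performs nearest-prototype matching of an average skin color against per-season reference colors in RGB space. The prototype values and decision rule are illustrative placeholders, not the actual palette points or logic used by ColorInsight.

```python
import numpy as np

# Hypothetical per-season skin prototypes in sRGB; real systems would take
# these from published seasonal palette literature.
SEASON_PROTOTYPES = {
    "Spring": np.array([234.0, 192.0, 168.0]),
    "Summer": np.array([222.0, 188.0, 180.0]),
    "Autumn": np.array([204.0, 160.0, 128.0]),
    "Winter": np.array([210.0, 176.0, 170.0]),
}

def classify_by_distance(mean_skin_rgb):
    """Return the season whose prototype is nearest in L2 distance."""
    skin = np.asarray(mean_skin_rgb, dtype=float)
    distances = {season: np.linalg.norm(skin - proto)
                 for season, proto in SEASON_PROTOTYPES.items()}
    return min(distances, key=distances.get)

# Example: average RGB sampled from the segmented skin region of a selfie.
print(classify_by_distance([215, 180, 160]))
```

One likely reason for the near-chance accuracy reported above is that such a matcher considers skin color alone and ignores the hair and eye contrast cues that human analysts use.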
Deep Learning Models for Color Type Classification
Recent studies leverage deep learning, especially Convolutional Neural Networks (CNNs), to learn complex color-season mappings directly from images. Instead of manually selecting features, CNN-based models can ingest a face or skin image and learn the relevant color tone features automatically. For example, Su et al. (2023) train a face color classification model using a MobileNetV3 CNN (a lightweight deep network) to categorize a user’s face into one of the 4 main seasons (Su et al., 2023). In their system, this season prediction is combined with other attributes (such as face shape and age via an Inception network) to recommend personalized clothing (Su et al., 2023). Another project reports that using a pre-trained ResNet CNN on segmented face images significantly outperformed color-space features, boosting accuracy from ~20% to ~60% for 4-season classification (ColorInsight, 2023). This indicates that CNNs can capture subtler distinctions (though 60% is still far from perfect, as discussed later).
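The published systems do not include code, so the following PyTorch sketch only illustrates the general recipe they describe: take an ImageNet-pretrained MobileNetV3 from torchvision and replace its final layer with a 4-way season head before fine-tuning.

```python
import torch.nn as nn
from torchvision import models

NUM_SEASONS = 4  # Spring, Summer, Autumn, Winter

# ImageNet-pretrained MobileNetV3; swap the last classifier layer for a
# 4-way season head, then fine-tune with cross-entropy on labeled faces.
model = models.mobilenet_v3_large(weights=models.MobileNet_V3_Large_Weights.DEFAULT)
model.classifier[3] = nn.Linear(model.classifier[3].in_features, NUM_SEASONS)

criterion = nn.CrossEntropyLoss()
# Training then proceeds as ordinary supervised fine-tuning over
# (face image, season label) pairs.
```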
Beyond standard CNNs, specialized and larger models have been explored. Stacchio et al. (2024) introduced the Deep Armocromia study, in which they fine-tuned state-of-the-art models on a new seasonal color dataset. They experimented with a pre-trained ResNeXt-50 CNN and a transformer-based face model called FaRL (General Facial Representation Learning) (FaRL, 2022). Fine-tuning FaRL (a ViT-based model for face analysis) achieved the best results on their data (Stacchio et al., 2024). Notably, their approach involved hierarchical classification: first predicting the primary season (Spring, Summer, Autumn, or Winter), then one of the 12 sub-season types (Stacchio et al., 2024). Recent transformer models (like FaRL or Vision Transformers) provide robust features for faces, but even these advanced models found the task difficult (e.g. ~55% accuracy on main seasons) (Stacchio et al., 2024). In summary, deep models (CNNs and even vision transformers) now lead the field for color type classification, generally outperforming earlier methods (ColorInsight, 2023). They are often used in a transfer learning setup, starting from models pre-trained on large face datasets and fine-tuning them on the smaller color-labeled datasets (Stacchio et al., 2024).
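How Stacchio et al. wired up the hierarchical season-then-subtype prediction is not reproduced here; the sketch below shows one plausible realization, assuming a shared ResNeXt-50 backbone with separate heads for the 4 main seasons and the 12 sub-types.

```python
import torch.nn as nn
from torchvision import models

class HierarchicalSeasonClassifier(nn.Module):
    """Shared backbone with two heads: 4 main seasons and 12 sub-types.
    The two-level labels follow the Deep Armocromia setup, but this exact
    architecture is only an illustrative guess."""

    def __init__(self):
        super().__init__()
        backbone = models.resnext50_32x4d(weights=models.ResNeXt50_32X4D_Weights.DEFAULT)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()              # keep pooled features only
        self.backbone = backbone
        self.season_head = nn.Linear(feat_dim, 4)    # Spring/Summer/Autumn/Winter
        self.subtype_head = nn.Linear(feat_dim, 12)  # 12 sub-seasons

    def forward(self, x):
        feats = self.backbone(x)
        return self.season_head(feats), self.subtype_head(feats)

def hierarchical_loss(season_logits, subtype_logits, season_y, subtype_y):
    # Joint cross-entropy over both levels; equal weighting is a free choice.
    ce = nn.functional.cross_entropy
    return ce(season_logits, season_y) + ce(subtype_logits, subtype_y)
```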
Recent Studies (2015–2024) on Fashion Color Analysis
Several academic works in the last decade have tackled automated personal color analysis:
- Oztel & Kazan (2015) – Proposed a virtual makeup application that analyzes face color to choose flattering lipstick shades (Oztel and Kazan, 2015).
- Ji-Ho et al. (2016) – Developed a systematic color selection system using a personal color knowledge base and fuzzy logic (Ji-Ho et al., 2016).
- Park et al. (2018) – “An Automatic Virtual Makeup Scheme Based on Personal Color Analysis.” In this study, a user’s season is first determined from a selfie by analyzing skin, hair, and iris colors, and then appropriate cosmetic colors are selected and virtually applied (Park et al., 2018).
- Su et al. (2023) – “Personalized clothing recommendation fusing the 4-season color system and users’ biological characteristics.” This work integrates color analysis into a fashion e-commerce recommendation engine, where a MobileNetV3-based model classifies the user’s face into one of the 4 seasons (Su et al., 2023). They also collected a training set by crawling approximately 1,000 face images labeled with seasons, though this dataset was not released publicly.
- Stacchio et al. (2024) – “Deep Armocromia: A Novel Dataset for Face Seasonal Color Analysis and Classification.” This preprint introduces a new dataset and benchmarks deep models for the 12-season (Armocromia) classification. The authors compiled approximately 5,000 face photos of public figures, each rigorously annotated into one of the 12 sub-types (e.g. Light Summer, Cool Winter) (Stacchio et al., 2024).
In addition to these academic studies, open-source implementations and smaller projects reflect similar trends. The Deep Seasonal Color Analysis System (DSCAS) is a GitHub project that combines classical and deep techniques to assign a color palette to a user from a selfie (DSCAS, 2022). Similarly, the ColorInsight project (2023) built a web app for personal color diagnosis using a FaRL-based face segmentation step followed by a fine-tuned ResNet classifier trained on a set of Korean celebrity images (ColorInsight, 2023).
Datasets for Fashion Color Classification
Until recently, a major obstacle was the lack of labeled datasets for personal color seasons. Researchers often had to compile their own data or rely on expert labeling. Some noteworthy datasets include:
- Deep Armocromia Dataset (2024) – Introduced by Stacchio et al. (2024), this is the most comprehensive public dataset for 12-season color analysis to date. It contains roughly 5,000 face images labeled into 4 seasons and 12 sub-seasons (with annotations following standardized Armocromia guidelines) (Stacchio et al., 2024). The dataset is freely available on GitHub for research use.
- Su et al.’s 4-Season Face Dataset (2023) – As part of their study, Su et al. (2023) collected approximately 1,000 web images of individuals labeled as Spring, Summer, Autumn, or Winter for training their classifier. However, this dataset was not released publicly (Su et al., 2023).
- ColorInsight Celebrity Dataset (2023) – The ColorInsight team gathered about 750 images of Korean celebrities, labeled by an expert into 4 season categories (ColorInsight, 2023). Due to privacy concerns, the raw images were not made publicly available.
- Capstonea Personal Color Dataset (2022) – A small open dataset on Roboflow Universe contributed by “Capstonea” contains 230 images labeled with 4 classes (Spring, Summer, Autumn, Winter) (Capstonea, 2022). Although limited in scale, it provides a starting point for experimentation.
- General Face Databases – While not specifically labeled by color season, large face datasets (e.g. CelebA or LAION-Face) have been used for pre-training and face parsing (Stacchio et al., 2024). For instance, FaRL was pre-trained on millions of face images with weak labels before being fine-tuned for color classification.
Importantly, the scarcity of public data has made comparative evaluation difficult. Many authors note that previous proprietary datasets were “not released and no information about how [they were] constructed” (Stacchio et al., 2024).
Feature Extraction, Training, and Evaluation Methods
Feature Extraction:
Approaches vary on whether to use handcrafted or learned features. Traditional methods extract specific color features (e.g. average RGB or HSV values) from skin patches, hair regions, eyes, etc. For example, Park et al. (2018) use detected iris, hair, and jaw/skin regions to sample color values (Park et al., 2018). Similarly, Ji-Ho’s fuzzy system took RGB inputs from skin, hair, eyes, and wrist, applying membership functions for each season’s prototypical color range (Ji-Ho et al., 2016). These approaches heavily rely on accurate face parsing – that is, segmenting the image into regions (skin, hair, eyes). Earlier works employed classical image processing or facial landmarks (e.g. using an Active Shape Model) to locate these regions (Park et al., 2018). In contrast, deep learning methods often feed the whole face image (or a cropped version) into a CNN so that the network learns the relevant color features. Still, many pipelines include an explicit segmentation step. For instance, the ColorInsight pipeline first uses a deep face parsing model (FaRL) to create a skin mask, then inputs only that region to the classifier (ColorInsight, 2023). Other studies experiment with feeding multiple cropped regions (one for hair, one for eyes, one for skin) into separate models or an ensemble, mimicking the human analyst’s approach (Su et al., 2023).
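A minimal sketch of the segmentation-then-sampling step, assuming some face-parsing model has already produced an integer label map; the region ids in the example are hypothetical rather than any particular parser's scheme.

```python
import numpy as np

def region_mean_colors(image_rgb, parsing_map, region_ids):
    """Average RGB per facial region, given a face-parsing label map.

    image_rgb:   (H, W, 3) uint8 array.
    parsing_map: (H, W) integer array produced by any face-parsing model.
    region_ids:  mapping from region name to the label id that model uses
                 (the ids in the example below are hypothetical).
    """
    features = {}
    for name, label in region_ids.items():
        mask = parsing_map == label
        if mask.any():
            features[name] = image_rgb[mask].mean(axis=0)
        else:
            features[name] = np.full(3, np.nan)  # region absent in this image
    return features

# Example usage with placeholder label ids: skin=1, hair=2, eyes=3
# feats = region_mean_colors(img, parsed, {"skin": 1, "hair": 2, "eyes": 3})
```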
Model Training:
Almost all recent studies use supervised learning with season labels as targets. Due to typically small dataset sizes, transfer learning is crucial—starting from a network pre-trained on a large dataset (e.g. ImageNet or VGGFace) and fine-tuning on the color classification task (Stacchio et al., 2024). Su et al. (2023) specifically mention using a MobileNetV3 with domain transfer learning (DTL) to improve generalization (Su et al., 2023). Stacchio et al. (2024) fine-tuned models such as ResNeXt50 and ViT-FaRL on their 5k-image dataset rather than training from scratch (Stacchio et al., 2024). Data augmentation is commonly used to expand the training set (for example, ColorInsight augmented its ~750 training images) (ColorInsight, 2023). Interestingly, 12-season classification can also be framed as a two-stage or hierarchical problem (first classifying the season, then the sub-type), an approach that Stacchio et al. (2024) suggest might mirror expert decision processes (Stacchio et al., 2024).
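The cited papers do not spell out their augmentation settings; the torchvision pipeline below is one cautious possibility. The choice to limit color jitter to mild brightness changes is this review's assumption, on the reasoning that hue or saturation shifts would alter the very color attributes the season label encodes.

```python
from torchvision import transforms

# Augmentation for color-season training images. Geometric transforms are
# safe, but hue/saturation jitter is deliberately avoided here because it
# would change the color information the label depends on (an assumption
# of this sketch, not a setting reported by the cited papers).
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.1),  # mild lighting variation only
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```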
Evaluation Metrics:
For classification tasks, standard metrics such as accuracy, precision/recall, and F1-score are employed (Stacchio et al., 2024). Given potential class imbalances or multiple plausible season assignments for borderline cases, some studies also report Top-K accuracy (for example, Stacchio et al. use Top-2 accuracy for the 4-class task and Top-3 for the 12-class task) (Stacchio et al., 2024). Confusion matrices are often used to identify which seasons are misclassified (e.g. frequent confusion between Bright Spring and Bright Winter). In some cases, evaluation goes beyond raw classification accuracy. For instance, Park et al. (2018) evaluated the end-to-end system (from color analysis to makeup recommendation) via user studies that measured satisfaction with the virtual makeup applied (Park et al., 2018). Additionally, Su et al. (2023) evaluated their recommender system by assessing how much incorporating personal color improved clothing suggestion relevance (Su et al., 2023).
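For reference, the metrics mentioned above can be computed with standard scikit-learn calls; the sketch below assumes integer class labels and a per-class score matrix from the model.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             top_k_accuracy_score)

def evaluate(y_true, y_pred, y_scores, k=2):
    """y_true/y_pred: integer season labels; y_scores: (N, num_classes) scores."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
        f"top_{k}_accuracy": top_k_accuracy_score(y_true, y_scores, k=k),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```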
Challenges and Limitations
Despite progress, several challenges remain in automating 12-season color analysis:
- Ambiguity and Subjectivity: The 12 seasonal categories are not always sharply defined. Many individuals fall on the border between two categories, making it difficult even for experts to agree on labels. As Park et al. (2018) note, personal color analysis has been “controversial since its beginnings” (Park et al., 2018). This ambiguity can confuse ML models, since a classification might be deemed correct by one expert and incorrect by another. It also complicates dataset creation – Stacchio et al. (2024) had to employ trained Armocromia experts and a strict annotation protocol to ensure consistency (Stacchio et al., 2024).
- Data Scarcity: Until recently, very few publicly available datasets with season labels existed. This forced many researchers to compile their own data or rely on expert labeling, increasing the risk of overfitting. Although datasets like Deep Armocromia have improved the situation, the field remains data-limited compared to other vision tasks (Stacchio et al., 2024).
- Intra-class Variability and Inter-class Overlap: Within each season, individuals can exhibit a wide range of appearances. Moreover, different seasonal categories can share very similar color characteristics – for instance, distinguishing between Cool Summer and Cool Winter may rely on subtle differences. Deep Armocromia’s results reveal significant confusion between such look-alike classes (Stacchio et al., 2024).
- Lighting and Image Conditions: Optimal color analysis requires neutral lighting and standard conditions. However, user selfies often come with varying lighting, white balance, and filters that can distort perceived colors. Although many studies attempt to mitigate this (e.g. through white-balancing or controlled capture conditions; a simple gray-world correction is sketched after this list), it remains an open challenge.
- Need for Accurate Face Parsing: Analyzing the wrong pixels (for example, including background or makeup) can lead to erroneous features. Robust detection of skin, hair, and eye regions is critical but challenging – advanced segmentation models (such as those used in ColorInsight and DSCAS) are required (ColorInsight, 2023).
- Performance and Generalization: Even with deep learning, reported accuracies for full 12-season classification remain relatively low (often in the 30–60% range). Moreover, models trained on a narrow demographic (e.g. Korean celebrities) may not generalize well to other ethnicities (ColorInsight, 2023). Increasing training data diversity and exploring synthetic augmentation are potential remedies.
- Real-Time and User Experience Constraints: For practical applications (e.g. mobile apps or web services), inference speed and resource usage are critical. Deep models often require a trade-off between complexity and deployability.
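As one example of the white-balancing mentioned in the lighting item above, a simple gray-world correction can be applied before color sampling; this is a generic color-constancy heuristic, not a method prescribed by the cited studies.

```python
import numpy as np

def gray_world_white_balance(image_rgb):
    """Gray-world correction: scale each channel so its mean equals the
    global mean, a crude color-constancy step for uncontrolled selfies."""
    img = image_rgb.astype(np.float64)
    channel_means = img.reshape(-1, 3).mean(axis=0)
    gains = channel_means.mean() / channel_means
    balanced = np.clip(img * gains, 0, 255)
    return balanced.astype(np.uint8)
```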
In summary, while machine learning has enabled automation of color type classification, the task remains challenging due to subtle intra- and inter-season differences and the inherent subjectivity of color labels (Stacchio et al., 2024). The best models so far only partially replicate expert-level classification, especially for the detailed 12-season palette. Future directions include exploring hierarchical models, multitask learning (e.g. combining color analysis with face attribute detection), and improved color constancy techniques to handle lighting variation.
References and Data Sources
- Park, J. et al. (2018). “An Automatic Virtual Makeup Scheme Based on Personal Color Analysis.” IMCOM 2018.
- Su, X. et al. (2023). “Personalized clothing recommendation fusing the 4-season color system and users’ biological characteristics.” Multimedia Tools and Applications, 83(5), 12597–12625.
- Stacchio, L. et al. (2024). “Deep Armocromia: A Novel Dataset for Face Seasonal Color Analysis and Classification.” Preprint.
- Oztel, G.Y. & Kazan, S. (2015). “Virtual Makeup Application Using Image Processing.” (Referenced in Park et al., 2018.)
- Ji-Ho et al. (2016). “Personal color analysis using color space algorithm.” (Referenced in Park et al., 2018.)
- Deep Armocromia Dataset (2024) – Public face image dataset labeled with 4 seasons and 12 sub-types (Stacchio et al., 2024); available on GitHub.
- Capstonea Personal Color Dataset (2022) – 230 images labeled with 4 season categories (Roboflow Universe).
- ColorInsight Project (2023) – Open-source personal color analysis demo using FaRL and ResNet on a Korean celebrity dataset.
- Deep Seasonal Color Analysis System (DSCAS) (2022) – GitHub project combining classical and deep methods for palette assignment.