In this work, we propose a method to extract representative features for fashion analysis from weakly annotated online fashion images. The proposed system consists of two stages. In the first stage, we detect clothing items in a fashion image: top clothes (t), bottom clothes (b) and one-pieces (o). In the second stage, we extract discriminative features from the detected regions for various applications of interest. Unlike previous work that relies heavily on well-annotated fashion data, we propose a way to collect fashion images from online resources and annotate them automatically. Based on this methodology, we create a new fashion dataset, called Web Attributes, to train our feature extractor. Experiments show that the extracted regional features capture local characteristics of fashion images well and offer better performance than previous work.

Fashion research covers various specialized tasks. Clothing parsing [1, 2, 6, 9] aims at segmenting individual fashion items in images. Fashion landmark localization [3, 4] aims at detecting human-defined virtual landmark points on fashion items. Clothing retrieval [3, 5] involves searching for the same fashion item in different images under different circumstances (e.g., lighting conditions, viewing angles, deformation, etc.). Clothing attribute classification [3, 10, 11] focuses on the characteristics of each fashion item. Clothing style classification [12] deals with the appearance style of individual persons. Fashion description [7] extracts features from fashion items, which can be utilized in solving multiple fashion problems.

Previous work has utilized online resources for fashion study without human annotations [6, 7, 8]. Fashion images in PaperDoll [6] were collected based on associated meta-data tags that denote attributes such as color, clothing item or occasion. For each input image, similar images were retrieved from the collected database using hand-crafted features. The prediction was then made by voting over the labels of the retrieved similar images to obtain a more robust result. Simo-Serra et al. [8] proposed a method to learn and predict how fashionable a person appears in a photo. To achieve this, a heterogeneous dataset called “Fashion144k” was collected automatically online. The dataset contains 11 information types such as the number of fans, location, post tags, etc. Furthermore, they proposed a conditional random field model based on all information types. However, the contribution of each information type and the impact of inaccurate tags are unclear.

The impact of noisy labels in the Fashion144k dataset was examined in [7], where a feature extraction network was proposed. It was observed that fashion data deviate too much from general object or scene datasets for such datasets to be helpful in fashion learning. Consequently, the authors sought a better way to utilize the noisy fashion dataset. To suppress the noise in each single label, they first filtered out images with fewer than three labels. Then, they constructed triplets of images, each containing one reference image, one similar image and one dissimilar image, based on the noisy labels. The network was trained jointly on noisy and cleaned labels. The resulting features outperformed prior work, including large-scale networks trained on ImageNet [13]. Their feature extraction network, however, has two main drawbacks. First, the training relies only on color- and category-related labels and, as a result, the obtained features cannot represent detailed characteristics of fashion items. Second, the network captures only the global characteristics of an image. It cannot separate individual fashion items (say, top and bottom) and extract features for each of them.
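
To make the label filtering and triplet construction attributed to [7] above more concrete, the following is a minimal sketch and not the implementation used in [7]. It assumes each image carries a set of noisy tags, and it measures similarity by the Jaccard overlap of those tag sets. The similarity measure, the thresholds `sim_thresh` and `dis_thresh`, and the names `jaccard` and `build_triplets` are all hypothetical choices made for illustration; only the rule of keeping images with at least three labels comes from the description above.

```python
import random
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two tag sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_triplets(
    labels: Dict[str, Set[str]],
    min_labels: int = 3,        # images with fewer labels are dropped
    sim_thresh: float = 0.5,    # hypothetical threshold for "similar"
    dis_thresh: float = 0.1,    # hypothetical threshold for "dissimilar"
    seed: int = 0,
) -> List[Tuple[str, str, str]]:
    """Return (reference, similar, dissimilar) image-id triplets."""
    rng = random.Random(seed)

    # Step 1: suppress label noise by keeping only well-tagged images.
    kept = {img: tags for img, tags in labels.items() if len(tags) >= min_labels}

    # Step 2: for each reference image, sample one similar and one
    # dissimilar image according to the overlap of their noisy tags.
    triplets = []
    for ref, ref_tags in kept.items():
        similar = [o for o, t in kept.items()
                   if o != ref and jaccard(ref_tags, t) >= sim_thresh]
        dissimilar = [o for o, t in kept.items()
                      if o != ref and jaccard(ref_tags, t) <= dis_thresh]
        if similar and dissimilar:
            triplets.append((ref, rng.choice(similar), rng.choice(dissimilar)))
    return triplets


# Toy example with made-up tags; "d" has only two labels and is filtered out.
toy_labels = {
    "a": {"red", "dress", "summer"},
    "b": {"red", "dress", "party"},
    "c": {"blue", "jeans", "casual"},
    "d": {"red", "dress"},
}
print(build_triplets(toy_labels))
```

Such triplets would typically feed a ranking or triplet loss that pulls the reference toward the similar image and away from the dissimilar one. Because the tags used here describe the whole image, a scheme of this kind yields global features only, which is the second drawback noted above.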