#258 | AttEdgeNet | pixel acc. 87.40 | mean acc. 67.17 | mean IoU 54.17 | f.w. IoU 78.55
Per-class IoU:
background 87.66 | hat 66.22 | hair 72.37 | glove 42.81 | sunglasses 28.04 | upper-clothes 70.66 | dress 40.00 | coat 58.60 | socks 44.72 | pants 75.14
jumpsuits 29.67 | scarf 25.05 | skirt 30.45 | face 74.12 | left-arm 64.12 | right-arm 65.69 | left-leg 57.94 | right-leg 57.72 | left-shoe 46.64 | right-shoe 45.83
Contributors: Li Tianpeng, Chen Jiansheng (Tsinghua University)
Description: We added two new classes (background on the human body and 3-pixel edges) when training our models. Two attention blocks were added to the ResNet backbone of PSPNet to combine low-level and high-level features. Finally, we used the predicted coarse result as an attention map, multiplied it with the Conv3 features, and trained an 8-layer subnet on the weighted features to generate more accurate results (see the sketch after this entry). Our single model achieves a mean IoU of 52.6; the final result is generated by an ensemble of five models.
Submitted: 2018-04-09 04:28:56

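The attention-based refinement AttEdgeNet describes can be pictured with a minimal PyTorch sketch: the coarse logits are collapsed into a single foreground attention map that reweights the Conv3 features before an 8-layer refinement subnet. The module name, channel sizes, and the foreground-collapse step are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineSubnet(nn.Module):
    """Hypothetical 8-layer subnet refining a coarse parsing map with Conv3 features."""
    def __init__(self, feat_channels=512, num_classes=20, width=256):
        super().__init__()
        layers, in_ch = [], feat_channels
        for _ in range(7):  # 7 conv layers + the 1x1 classifier = 8 layers
            layers += [nn.Conv2d(in_ch, width, 3, padding=1),
                       nn.BatchNorm2d(width),
                       nn.ReLU(inplace=True)]
            in_ch = width
        self.body = nn.Sequential(*layers)
        self.classifier = nn.Conv2d(width, num_classes, 1)

    def forward(self, conv3_feat, coarse_logits):
        # Collapse the coarse per-class logits into one foreground attention
        # map (all non-background probability mass) at Conv3 resolution.
        attn = torch.softmax(coarse_logits, dim=1)[:, 1:].sum(dim=1, keepdim=True)
        attn = F.interpolate(attn, size=conv3_feat.shape[-2:],
                             mode='bilinear', align_corners=False)
        # Multiply features by the attention map, then refine.
        return self.classifier(self.body(conv3_feat * attn))
```

The multiplication suppresses background activations, so the small subnet can spend its capacity on the person region rather than relearning where the person is.
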
#284 | Attention | pixel acc. 84.52 | mean acc. 54.83 | mean IoU 44.60 | f.w. IoU 74.03
Per-class IoU:
background 84.90 | hat 59.11 | hair 68.25 | glove 31.48 | sunglasses 20.76 | upper-clothes 65.47 | dress 30.71 | coat 53.47 | socks 34.67 | pants 68.47
jumpsuits 22.75 | scarf 14.97 | skirt 22.35 | face 71.89 | left-arm 52.54 | right-arm 55.31 | left-leg 39.77 | right-leg 39.67 | left-shoe 27.60 | right-shoe 27.85
Contributors: WuTao
Description: ABC
Submitted: 2018-05-03 15:45:48

#287 | n_v3 | pixel acc. 86.73 | mean acc. 61.93 | mean IoU 51.53 | f.w. IoU 77.30
Per-class IoU:
background 86.75 | hat 63.59 | hair 70.81 | glove 36.70 | sunglasses 22.68 | upper-clothes 68.55 | dress 37.10 | coat 57.35 | socks 38.33 | pants 73.04
jumpsuits 28.09 | scarf 12.35 | skirt 28.37 | face 74.33 | left-arm 62.98 | right-arm 64.23 | left-leg 58.71 | right-leg 58.89 | left-shoe 44.32 | right-shoe 43.40
Submitted: 2018-06-04 01:58:10

#283 | JD_BUPT | pixel acc. 87.42 | mean acc. 65.86 | mean IoU 54.44 | f.w. IoU 78.34
Per-class IoU:
background 87.26 | hat 65.10 | hair 72.21 | glove 42.71 | sunglasses 31.03 | upper-clothes 70.53 | dress 42.04 | coat 58.95 | socks 42.59 | pants 74.47
jumpsuits 32.27 | scarf 22.03 | skirt 33.40 | face 74.93 | left-arm 63.82 | right-arm 65.66 | left-leg 59.68 | right-leg 59.92 | left-shoe 45.37 | right-shoe 44.84
Contributors: Xinchen Liu (BUPT&JD), Meng Zhang (BUPT), Yanan Li (BUPT), Wenhui Gao (BUPT), Jiangtian Pan (JD AI Research), Wu Liu (JD AI Research), Huadong Ma (BUPT)
Description: (1) We revised and finetuned JPP-Net [1], SS-NAN [2], SSL [3], DenseNet [4], and RefineNet [5] on the LIP training set, then combined the five models with different fusion strategies. (2) We mined several hard classes to improve overall performance. (3) We used data augmentation, focal loss [6] (a sketch follows this entry), and image morphology.
[1] Liang et al., Look into Person: Joint Body Parsing & Pose Estimation Network and a New Benchmark, T-PAMI, 2018.
[2] Zhao et al., Self-Supervised Neural Aggregation Networks for Human Parsing, CVPR Workshop, 2017.
[3] Gong et al., Look into Person: Self-supervised Structure-sensitive Learning and a New Benchmark for Human Parsing, CVPR, 2017.
[4] Jegou et al., The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation, CVPR, 2017.
[5] Lin et al., RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation, CVPR, 2017.
[6] Lin et al., Focal Loss for Dense Object Detection, ICCV, 2017.
Submitted: 2018-06-10 12:59:11

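Since the JD_BUPT entry cites focal loss [6], here is a minimal per-pixel focal-loss sketch for multi-class parsing; the gamma value and ignore index are assumed defaults, not the team's reported settings.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0, ignore_index=255):
    """logits: (N, C, H, W) raw scores; target: (N, H, W) class indices."""
    log_p = F.log_softmax(logits, dim=1)
    # Per-pixel cross-entropy; ignored pixels get zero loss.
    ce = F.nll_loss(log_p, target, reduction='none', ignore_index=ignore_index)
    p_t = torch.exp(-ce)  # probability assigned to the true class
    # Down-weight easy pixels so hard ones dominate the gradient.
    return ((1.0 - p_t) ** gamma * ce).mean()
```

Down-weighting easy pixels complements the hard-class mining the team describes, since rare classes such as scarf and sunglasses contribute few easy examples.
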
#290 | densenet&deeplabv3+ | pixel acc. 81.56 | mean acc. 49.56 | mean IoU 37.92 | f.w. IoU 70.65
Per-class IoU:
background 83.19 | hat 50.91 | hair 63.94 | glove 15.78 | sunglasses 4.58 | upper-clothes 59.92 | dress 26.81 | coat 47.19 | socks 27.71 | pants 62.74
jumpsuits 17.66 | scarf 2.88 | skirt 11.60 | face 68.13 | left-arm 44.82 | right-arm 48.30 | left-leg 36.70 | right-leg 36.21 | left-shoe 25.92 | right-shoe 23.40
Contributors: hanqiuyuan
Description: I replaced the Xception backbone of the DeepLabv3+ model with a DenseNet; I also removed the last layer of the DenseNet and the max-pooling layer to obtain the feature map. The atrous rates I use are 3, 6, and 9 (see the ASPP sketch after this entry), and the output stride is 32. The crop size is 640 because the maximum image size in the dataset is 640. Finally, I concatenate Unit 1 of Block 2 with the resized and concatenated feature maps.
Submitted: 2018-05-27 05:33:01

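For the atrous rates (3, 6, 9) quoted above, a DeepLabv3+-style ASPP head might look like the following sketch; the channel counts and the surrounding DenseNet backbone are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel atrous branches at rates 3/6/9 plus a 1x1 branch, then a projection."""
    def __init__(self, in_ch, out_ch=256, rates=(3, 6, 9)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
             for r in rates])
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True))

    def forward(self, x):
        # Concatenate all parallel views of the feature map, then fuse.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```

With padding equal to the dilation rate, each 3x3 branch preserves spatial size, so the branch outputs concatenate cleanly before projection.
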
#292 | no-ssl clean data 10w | pixel acc. 83.73 | mean acc. 53.13 | mean IoU 42.69 | f.w. IoU 73.04
Per-class IoU:
background 84.52 | hat 57.40 | hair 67.45 | glove 29.90 | sunglasses 19.55 | upper-clothes 63.66 | dress 27.40 | coat 51.57 | socks 32.40 | pants 66.94
jumpsuits 21.71 | scarf 13.09 | skirt 20.17 | face 70.73 | left-arm 49.76 | right-arm 52.45 | left-leg 36.85 | right-leg 36.72 | left-shoe 26.20 | right-shoe 25.30
Submitted: 2018-05-30 07:46:04

#297 | xNet | pixel acc. 80.98 | mean acc. 47.50 | mean IoU 36.20 | f.w. IoU 70.20
Per-class IoU:
background 84.03 | hat 61.58 | hair 69.31 | glove 29.35 | sunglasses 22.69 | upper-clothes 58.93 | dress 27.68 | coat 45.39 | socks 31.34 | pants 65.49
jumpsuits 8.25 | scarf 9.27 | skirt 19.56 | face 72.61 | left-arm 2.73 | right-arm 38.18 | left-leg 12.44 | right-leg 32.50 | left-shoe 13.34 | right-shoe 19.33
Submitted: 2018-06-06 04:21:49

#301 | refine_net | pixel acc. 87.42 | mean acc. 65.86 | mean IoU 54.44 | f.w. IoU 78.34
Per-class IoU:
background 87.26 | hat 65.10 | hair 72.21 | glove 42.71 | sunglasses 31.03 | upper-clothes 70.53 | dress 42.04 | coat 58.95 | socks 42.59 | pants 74.47
jumpsuits 32.27 | scarf 22.03 | skirt 33.40 | face 74.93 | left-arm 63.82 | right-arm 65.66 | left-leg 59.68 | right-leg 59.92 | left-shoe 45.37 | right-shoe 44.84
Submitted: 2018-06-10 13:02:27

#270 | PSPse | pixel acc. 88.92 | mean acc. 67.78 | mean IoU 57.90 | f.w. IoU 80.59
Per-class IoU:
background 89.05 | hat 68.88 | hair 75.38 | glove 46.98 | sunglasses 36.80 | upper-clothes 73.11 | dress 38.31 | coat 60.97 | socks 50.19 | pants 77.11
jumpsuits 29.25 | scarf 26.10 | skirt 30.89 | face 77.87 | left-arm 69.74 | right-arm 71.15 | left-leg 66.43 | right-leg 66.33 | left-shoe 51.80 | right-shoe 51.76
Contributors: Ting Liu (BJTU)*, Tao Ruan (BJTU)*, Yunchao Wei (UIUC), Shikui Wei (BJTU), Jinjun Xiong (UIUC, IBM), Yao Zhao (BJTU), Thomas Huang (UIUC). (* equal contribution)
Description: We propose a novel PSPse ('se' means segmentation with edge) network, which consists of three key modules trained end-to-end for parsing: 1) a high-resolution embedding module; 2) a global-context embedding module; 3) an edge-perceiving module (a sketch of joint edge supervision follows this entry). The model is trained with synchronized batch normalization across multiple GPUs. With ResNet-101 as the backbone, a single PSPse model achieves an mIoU of 56.5% without any bells and whistles. Our best result is produced by an ensemble of three models.
Ref: [1] Pyramid Scene Parsing Network, CVPR 2017. [2] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, arXiv 2018. [3] Magic-wall: Visualizing Room Decoration, ACM MM 2017.
Submitted: 2018-06-10 15:52:56

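The edge-perceiving module suggests joint supervision of parsing and boundaries; below is a rough sketch of such a two-branch head with a joint loss. The shared-feature design, channel counts, and loss weighting are assumptions, not the authors' implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class ParsingEdgeHead(nn.Module):
    """Two 1x1 heads over shared features: per-class parsing logits and binary edge logits."""
    def __init__(self, in_ch=2048, num_classes=20):
        super().__init__()
        self.parsing = nn.Conv2d(in_ch, num_classes, 1)
        self.edge = nn.Conv2d(in_ch, 1, 1)

    def forward(self, feat):
        return self.parsing(feat), self.edge(feat)

def joint_loss(parsing_logits, edge_logits, labels, edge_labels, edge_weight=1.0):
    """labels: (N, H, W) class indices; edge_labels: (N, 1, H, W) {0,1} boundary mask."""
    # Parsing branch: per-pixel cross-entropy; edge branch: binary cross-entropy.
    loss_parse = F.cross_entropy(parsing_logits, labels, ignore_index=255)
    loss_edge = F.binary_cross_entropy_with_logits(edge_logits, edge_labels.float())
    return loss_parse + edge_weight * loss_edge
```

Sharing features between the two branches lets boundary gradients sharpen the parsing features where adjacent parts meet, which is the intuition behind adding edge supervision to a parsing network.
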
#285 | (unnamed) | pixel acc. 81.12 | mean acc. 47.20 | mean IoU 36.97 | f.w. IoU 69.68
Per-class IoU:
background 82.62 | hat 56.63 | hair 64.52 | glove 30.96 | sunglasses 26.47 | upper-clothes 61.50 | dress 24.97 | coat 49.33 | socks 35.64 | pants 64.00
jumpsuits 18.96 | scarf 6.60 | skirt 19.37 | face 68.26 | left-arm 16.30 | right-arm 29.89 | left-leg 21.25 | right-leg 22.47 | left-shoe 21.92 | right-shoe 17.80
Submitted: 2018-06-07 07:14:03